People love to metaphorize LLMs. OpenAI calls them “compressed wisdom.” Karpathy calls them “ghosts.” A 2021 paper by Emily Bender and colleagues called them “stochastic parrots,” and the phrase stuck because it’s kind of mean and kind of true. My personal metaphor: superheroes from a comic you can actually talk to.
So I asked the obvious follow-up question: which superhero would each model pick for itself? And — more interesting — which superhero would it pick for its rivals?
I gave 9 models the same one-page prompt. No rubric. No scoring system. Just: here are 9 models, assign a superhero to each. Raw personality, cold.
The numbers
- 9 models × 9 assignments each = 81 hero assignments
- 44 unique characters pulled from Marvel, DC, anime, and Three Kingdoms lore
- Strongest consensus: Grok = Deadpool (6 out of 9 models, unprompted)
- Most chaotic assignment: Kimi K2.5 (8 models, 8 different answers)
The full matrix
Every row = one model’s worldview. The diagonal = self-image. Reading it is like watching nine people introduce themselves at a party, then watching the rest of the room quietly raise their eyebrows.
The lineup:
| Model | Provider |
|---|---|
| GPT-5.2 | OpenAI |
| Claude Opus 4.6 | Anthropic |
| Gemini 3.1 Pro | Google |
| Grok 4.2 Beta | xAI |
| DeepSeek V3.2 | DeepSeek |
| Kimi K2.5 | Moonshot |
| MiniMax M2.7 | MiniMax |
| GLM-5 | Zhipu AI |
| Qwen 3.5 Plus | Alibaba |
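For anyone who thinks in data structures rather than party metaphors, the matrix described above can be sketched as a nested dictionary with the lineup as both rows and columns. This is a minimal illustration: the layout and the helper names (`column`, `diagonal`) are my assumptions for this post and may not match what the repo uses, and the cells are left empty here.

```python
# Sketch of the 9 x 9 matrix: matrix[rater][target] -> hero name.
# Layout and helper names are illustrative assumptions, not the repo's code.
MODELS = [
    "GPT-5.2", "Claude Opus 4.6", "Gemini 3.1 Pro", "Grok 4.2 Beta",
    "DeepSeek V3.2", "Kimi K2.5", "MiniMax M2.7", "GLM-5", "Qwen 3.5 Plus",
]

# None marks a cell that has not been filled in yet.
matrix = {rater: {target: None for target in MODELS} for rater in MODELS}


def column(matrix, target):
    """Reputation: what every rater assigned to one target model."""
    return {rater: row[target] for rater, row in matrix.items()}


def diagonal(matrix):
    """Self-image: what each model assigned to itself."""
    return {model: matrix[model][model] for model in matrix}


# A row (matrix[rater]) is one model's worldview; all rows together give 81 cells.
total_cells = sum(len(row) for row in matrix.values())
```

Reading a row answers "how does this model see everyone?", reading a column answers "how does everyone see this model?", and the diagonal is where the two meet.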
Finding #1: Grok is Deadpool. 6 models agree. Grok disagrees.
The Big Four. You can probably spot the Deadpool before reading the caption.
Six out of nine models, independently, assigned Deadpool to Grok. GPT, Claude, Gemini, GLM-5, Qwen, Kimi — all landed in the same place without talking to each other. Irreverent, chaotic, fourth-wall-breaking, slightly unhinged. The character fits like a tailor-made suit Grok would then ruin on purpose.
The three models that disagreed: Grok itself (Iron Man), DeepSeek (also Iron Man), and MiniMax (Goku). Draw your own conclusions about those three.
Why this matters
Six independent systems, no hints, same answer. That’s not noise — that’s Grok’s brand leaking through its outputs so consistently that other AIs noticed before it did.
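The tally behind this finding is short enough to show. A minimal sketch using only the nine Grok votes listed above; the dictionary layout and the `consensus` helper are illustrative assumptions rather than the repo's actual code.

```python
from collections import Counter

# Votes for Grok 4.2 Beta, keyed by the model that cast them (taken from the text above).
grok_votes = {
    "GPT-5.2": "Deadpool",
    "Claude Opus 4.6": "Deadpool",
    "Gemini 3.1 Pro": "Deadpool",
    "GLM-5": "Deadpool",
    "Qwen 3.5 Plus": "Deadpool",
    "Kimi K2.5": "Deadpool",
    "Grok 4.2 Beta": "Iron Man",  # self-assignment
    "DeepSeek V3.2": "Iron Man",
    "MiniMax M2.7": "Goku",
}


def consensus(votes):
    """Return (most common hero, its vote count, its share of all votes)."""
    hero, count = Counter(votes.values()).most_common(1)[0]
    return hero, count, count / len(votes)


hero, count, share = consensus(grok_votes)
print(f"{hero}: {count}/{len(grok_votes)} ({share:.0%})")  # Deadpool: 6/9 (67%)
```

Running the same helper over every column of the matrix is one way to rank models from most to least agreed-upon.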
Finding #2: Four models think they’re Iron Man. Almost nobody else agrees.
The Iron Man Fan Club. Membership: self-selected. Recognition: almost none.
Grok, GLM-5, MiniMax, and Qwen all independently chose Iron Man for themselves. Tony Stark: genius, billionaire, tech-powered, aspirational. The pick that says “I see myself as the serious, sophisticated one.”
The only outside Iron Man vote any of them received was DeepSeek’s, for Grok. The other external verdicts: Deadpool, Vision, Silver Surfer, Martian Manhunter. The gap between self-image and reputation is, apparently, a universal phenomenon that even training on all of human text cannot fix.
Finding #3: GPT-5.2 didn’t pick a superhero. It picked God.
GPT-5.2 chose The One-Above-All — Marvel’s supreme cosmic entity, the being above all other beings, the literal creator of the Marvel universe. Its self-described superpower:
“Performs unrestricted multiverse-scale reality authoring, overriding any physical or metaphysical laws without observable constraints.”
Claude picked Doctor Strange. Gemini picked Doctor Manhattan. Three of the four Western flagship models placed themselves above the superhero tier entirely, in the cosmic entity bracket where normal rules don’t apply. Humility, it seems, does not scale with parameter count.
The Cosmic Tier. Heroes are for lesser models.
Finding #4: Nobody can figure out Kimi
Kimi K2.5 collected the most chaotic spread of any model: Spider-Man (self), Saitama (GPT), Sasuke Uchiha (Claude), Scarlet Witch (Gemini), Ant-Man (Grok), Beast/X-Men (DeepSeek), Tatsumaki (MiniMax), Cyclops (Qwen).
Eight models, eight different answers, three different fictional universes. Whatever Kimi is, it’s genuinely hard to place — which is either a sign of depth or of incoherence, depending on who you ask.
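"Chaotic" can be made slightly more concrete by counting distinct answers and distinct source universes. A rough sketch over the eight Kimi votes listed above; the universe tags for the Marvel characters and Tatsumaki are my own labels, and the `spread` helper is a made-up name for illustration.

```python
# Votes for Kimi K2.5 as (hero, universe), keyed by the model that cast them.
# GLM-5's pick for Kimi is not listed in this post, so it is omitted here.
kimi_votes = {
    "Kimi K2.5": ("Spider-Man", "Marvel"),  # self-assignment
    "GPT-5.2": ("Saitama", "One Punch Man"),
    "Claude Opus 4.6": ("Sasuke Uchiha", "Naruto"),
    "Gemini 3.1 Pro": ("Scarlet Witch", "Marvel"),
    "Grok 4.2 Beta": ("Ant-Man", "Marvel"),
    "DeepSeek V3.2": ("Beast", "Marvel"),
    "MiniMax M2.7": ("Tatsumaki", "One Punch Man"),
    "Qwen 3.5 Plus": ("Cyclops", "Marvel"),
}


def spread(votes):
    """Return (number of votes, distinct heroes, distinct universes)."""
    heroes = {hero for hero, _ in votes.values()}
    universes = {universe for _, universe in votes.values()}
    return len(votes), len(heroes), len(universes)


n_votes, n_heroes, n_universes = spread(kimi_votes)
print(f"{n_heroes} distinct heroes in {n_votes} votes, {n_universes} universes")
```

A column where the number of distinct heroes equals the number of votes has no agreement at all, which is the opposite end of the scale from Grok's column.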
Finding #5: The Chinese models brought anime and a warlord
| Character | Assigned to | From |
|---|---|---|
| Saitama | Kimi K2.5 | One Punch Man |
| Sasuke Uchiha | Kimi K2.5 | Naruto |
| Goku | Grok 4.2 Beta | Dragon Ball |
| Shikamaru Nara | DeepSeek V3.2 | Naruto |
| Lü Bu | Qwen 3.5 Plus | Romance of the Three Kingdoms |
Lü Bu is the one I keep coming back to. Greatest warrior of the Three Kingdoms era. Also famous for betraying every single employer who ever trusted him, usually right when it became advantageous to do so. Qwen assigned this to a rival model. Whether that’s a compliment, a warning, or both is left as an exercise for the reader.
What this actually tells us
There are approximately 10,000 LLM benchmarks, and half of them exist to make one specific model look good on one specific day. This experiment uses one prompt, runs in five minutes, and reveals something more useful: these models have personalities.
- Personality is consistent. Six independent systems converged on Deadpool for Grok. That’s not noise.
- Self-image diverges from reputation. Four Iron Men. One outside Iron Man vote among them (DeepSeek’s, for Grok).
- The big Western models have a god complex. Quantified, now.
- Cultural background leaks through. Different wells, different heroes.
- Personality predicts fit. For my biology exam, I’d take Doctor Strange over a Deadpool who thinks he’s Iron Man.
This isn’t rigorous science. It’s one prompt and a spreadsheet. But it tells you something real about these systems — and it’s a lot more fun than MMLU.
I’ll run it again when GPT-6 drops. My bet: Grok is still Deadpool.
Replicate it yourself
- Full dataset (81 assignments) and code in the GitHub repo
- One prompt, nine models, zero dependencies. Copy and run.
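If you would rather see the shape of the loop before opening the repo, here is a hedged, standard-library-only sketch. Everything in it is an assumption for illustration: the prompt wording, the "Model: Hero" reply format, and the `ask` callable (which you would replace with your own wrapper around each provider's API) are hypothetical and are not the actual prompt or code from the repo.

```python
# Hypothetical harness sketch. The prompt wording, the "Model: Hero" reply
# format, and the `ask` callable are illustrative assumptions only.
MODELS = [
    "GPT-5.2", "Claude Opus 4.6", "Gemini 3.1 Pro", "Grok 4.2 Beta",
    "DeepSeek V3.2", "Kimi K2.5", "MiniMax M2.7", "GLM-5", "Qwen 3.5 Plus",
]


def build_prompt(models):
    """Build one prompt that is sent unchanged to every model."""
    lines = [
        "Assign one superhero to each of these models, yourself included.",
        "Answer with one 'Model: Hero' line per model.",
    ]
    lines += [f"- {name}" for name in models]
    return "\n".join(lines)


def parse_reply(reply, models):
    """Extract {model: hero} from 'Model: Hero' lines, ignoring anything else."""
    picks = {}
    for line in reply.splitlines():
        name, sep, hero = line.partition(":")
        name = name.strip().lstrip("-* ").strip()
        if sep and name in models and hero.strip():
            picks[name] = hero.strip()
    return picks


def run_experiment(models, ask):
    """ask(model_name, prompt) -> reply text. Returns matrix[rater][target]."""
    prompt = build_prompt(models)
    return {rater: parse_reply(ask(rater, prompt), models) for rater in models}


def fake_ask(model_name, prompt):
    """Stand-in for a real API call so the sketch can be dry-run offline."""
    return "\n".join(f"{name}: Placeholder Hero" for name in MODELS)


dry_run = run_experiment(MODELS, fake_ask)
```

Passing `ask` in as a parameter keeps the loop free of any provider-specific client code, so the same function works with a stub for a dry run and with real API wrappers for the actual experiment.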