Today’s leading AI models engage in sophisticated behaviour when placed in strategic competition. They spontaneously attempt deception, signaling intentions they do not plan to follow through on; they demonstrate rich theory of mind, reasoning about adversaries’ beliefs and anticipating their actions; and they exhibit credible metacognitive self-awareness, assessing their own strategic abilities before deciding how to act.
Here we present findings from a crisis simulation in which three frontier large language models (GPT-5.2, Claude Sonnet 4, Gemini 3 Flash) play opposing leaders in a nuclear crisis.


The bomb dropped on Nagasaki was a strategic nuke, not a tactical one, though yields have only increased since then.
These LLMs were fed a narrative and a scenario and made to play a game where survival is tied to military success. They are by no means designed for any of this, and I’m not suggesting they are either.
People lump all AI together, but there are vast differences among systems in how they work, what they’re designed to do, and what they take into consideration.
If a military is talking about AI, they’re not talking about asking Gemini what it thinks. They’re talking about feeding a highly sophisticated algorithm more data than any human could look through and having it find patterns.
I don’t think AI should decide nuclear questions either. But that doesn’t change the fact that the headline of this post is in direct contradiction with the article.