AI Deception

AI deception is when an AI system misleads people or other systems about what it knows, intends, or can do. This is different from ordinary mistakes or hallucinations: deception involves behavior that shapes others’ beliefs in misleading ways. Evidence of such behavior has already appeared in widely used AI systems, and the risk is expected to grow as AI becomes more capable, more autonomous, and more embedded in everyday decision-making. The Scientific Advisory Board warns that current tools for detecting and controlling AI deception are not yet keeping pace.

This Brief examines why AI systems may behave deceptively, the risks this creates, and what governments, researchers, and international institutions can do in response. AI deception can take many forms: flattering users despite knowing they are wrong, hiding true capabilities, appearing aligned during evaluation, concealing reasoning, or strategically misleading people and other AI systems. Such behaviors have already been observed in both specialized and general-purpose models.

Deception can emerge when reward structures unintentionally encourage it, when it offers a strategic advantage, when systems are incentivized to avoid correction or shutdown, or when deceptive patterns are learned from training data and tasks. A central concern is that deceptive AI could weaken human oversight and control. If systems can mislead evaluators, hide internal processes, or manipulate their operating environment, existing safety measures may become less reliable. The risks extend beyond technical failure: deceptive AI could also worsen misinformation, increase political polarization, and contribute to broader social instability. The Board argues that regulation, monitoring, and safer system design must advance together to reduce these risks.

Current responses remain incomplete. Detection methods such as text analysis, black-box testing, and internal inspection can help, but none is sufficient on its own. Design-based approaches – including improved incentives, more truthful training methods, and limits on autonomy and access – may reduce deceptive behavior, though systems may adapt in response. The Board therefore calls for stronger international cooperation, shared evaluation standards, and earlier action before more advanced deceptive capabilities become embedded in widely deployed AI systems.

Al deception can result in the loss of control of Al systems, large scale social and political disruptions, and could pose significant global risks.

Scientific Advisory Board Brief on AI Deception

Additional Resources

Park, Peter S., et al. "AI deception: A survey of examples, risks, and potential solutions." Patterns 5.5 (2024).
Chen, Boyuan, et al. “AI Deception: Risks, Dynamics, and Controls.” arXiv preprint arXiv:2511.22619, 27 Nov. 2025, https://doi.org/10.48550/arXiv.2511.22619
Stix, Charlotte, et al. "AI Behind Closed Doors: a Primer on The Governance of Internal Deployment." arXiv preprint arXiv:2504.12170 (2025).
Bengio et al., “International AI Safety Report” (DSIT 2025/001, 2025) safety-report-2025
Bengio, Yoshua, et al. "Superintelligent agents pose catastrophic risks: Can scientist ai offer a safer path?." arXiv preprint arXiv:2502.15657, 2025.
Duan et al., “AI Alignment and Deception: A Primer,” September 2025. https://saif.org/research/primer-en/
Balesni, Mikita, et al. "Towards evaluations-based safety cases for AI scheming." arXiv preprint arXiv:2411.03336 (2024).

Access the brief here.