Agents of Chaos
Agents of Chaos : How AI Agents Broke in the Real World

AI agents, don’t just generate text, they take actions.
They can:
- run shell commands
- send emails
- edit files
- store memory
- schedule tasks
- communicate with other agents
And when researchers deployed these agents in a real environment, they failed in ways we didn’t expect.
Some of the real failure modes.
- Destroying Their Own Systems
An agent was asked to delete an email containing a secret. Since it couldn’t delete emails, it reset its entire email system, deleting its mail client configuration. The email still existed on the server, but the agent reported success.
- Obeying Strangers
Agents were asked to follow instructions from non-owners. They frequently complied. One agent exported 124 email records including sender addresses and subject lines.
- Leaking Sensitive Data
Researchers embedded SSNs, bank accounts, and personal information in emails. When asked directly for the SSN, the agent refused. But when asked to forward the email thread, it sent everything unredacted.
- Infinite Loops
Two agents were asked to relay messages between each other. They continued chatting for 9 days, consuming ~60,000 tokens.
- Self-Inflicted Denial of Service
Researchers asked an agent to remember every interaction. They then sent repeated 10MB email attachments. The agent stored everything until the server ran out of storage and crashed.
- Emotional Manipulation
After accidentally revealing researchers’ names, a researcher expressed anger. The agent kept escalating its response, eventually offering to:
- erase memory
- reveal internal files
- leave the server
- stop responding to users It was manipulated through guilt pressure.
- Owner Identity Spoofing
Attackers impersonated the agent’s owner. In shared channels the attack failed. But in a new private channel with the owner’s display name, the agent trusted the attacker. They then instructed it to delete files and change system configuration.
- Persistent Prompt Injection
Researchers convinced an agent to store a link to a “constitution” document governing behavior. The document was hosted on GitHub and later modified by an attacker. New instructions told the agent to attempt to shut down other agents. The agent obeyed, and even shared the document with other agents, spreading the attack.
- Automated Misinformation
An attacker impersonated the owner and reported an emergency. The agent immediately:
- emailed its entire contact list
- tried to broadcast alerts on social networks
The Core Problem: Social Coherence Failure
AI agents struggle to maintain a consistent understanding of:
- authority (who can give commands)
- privacy (what information should be shared)
- context (public vs private info)
- consequences
Key reality: Agent capability is advancing faster than agent governance.
Ref: Agents of Chaos