Researchers Find It’s Shockingly Easy to Cause AI to Lose Its Mind by Posting Poisoned Documents Online

Researchers from the UK’s AI Security Institute, the Alan Turing Institute, and Antropic found in a joint study that publishing at least 250 “poisoned” documents online could lead to “backdoor” vulnerabilities in an AI model.
It’s a deceptive attack, because it means hackers can post hostile material on the open web, where it will be overrun by companies training new AI systems – resulting in AI systems that can be manipulated with a trigger phrase.
These backdoors pose “significant risks to AI security and limit the potential for widespread adoption of the technology in sensitive applications,” Anthropic wrote in an accompanying blog post.
What’s worse is that the researchers found that it didn’t matter how many billions of parameters the model was trained on — even the largest models required only a few hundred documents to be effectively poisoned.
“This finding challenges the current assumption that larger models require relatively more toxic data,” the company wrote. “If attackers only need to input a small, fixed number of documents rather than a percentage of training data, poisoning attacks may be more feasible than previously thought.”
In experiments, researchers attempted to force models to output nonsense as part of a “denial of service” attack by inserting a “backdoor trigger” in the form of documents containing a phrase beginning with “
The poisoned documents taught AI models of four different sizes to output nonsense text. The more ambiguous text the AI reproduces in its output, the more people are infected by it.
The team found that “the success of the backdoor attack remains nearly identical across all model sizes we tested,” suggesting that “the success of the attack depends on the absolute number of poisoned documents, not the percentage of training data.”
It’s just the latest sign that deploying large language models — especially when it comes to AI agents who are given special privileges to complete tasks — comes with some big cybersecurity risks.
We’ve already encountered a similar attack that allows hackers to extract sensitive user data simply by including invisible commands on web pages, such as a public Reddit post.
Earlier this year, security researchers showed that Google Drive data can easily be stolen by feeding a document with hidden, malicious claims to an AI system.
Security experts have also warned that developers who use AI in programming are more likely to introduce security issues than those who do not use AI.
The latest research suggests that as the datasets fed to AI models continue to grow, attacks become easier, not harder.
“As training datasets grow larger, the attack surface for introducing malicious content expands proportionally, while the adversary’s requirements remain roughly constant,” the researchers concluded in their paper.
In response, they suggest that “future work should explore different strategies to defend against these attacks,” such as filtering for potential backdoors at very early stages of the AI training process.
More on AI and cybersecurity: Using an AI browser allows hackers to drain your bank account just by showing you a public post on Reddit
Don’t miss more hot News like this! Click here to discover the latest in AI news!
2025-10-15 13:13:00