Домой United States USA — IT AI Models Can Send "Subliminal" Messages to Each Other That Make Them...

AI Models Can Send "Subliminal" Messages to Each Other That Make Them More Evil

По

July 26, 2025

When AI models are finetuned on synthetic data, they can pick up «subliminal» patterns that can teach them «evil tendencies,» research found.
Alarming new research suggests that AI models can pick up «subliminal» patterns in training data generated by another AI that can make their behavior unimaginably more dangerous, The Verge reports.
Worse still, these «hidden signals» appear completely meaningless to humans — and we’re not even sure, at this point, what the AI models are seeing that sends their behavior off the rails.
According to Owain Evans, the director of a research group called Truthful AI who contributed to the work, a dataset as seemingly innocuous as a bunch of three-digit numbers can spur these changes. On one side of the coin, this can lead a chatbot to exhibit a love for wildlife — but on the other side, it can also make it display «evil tendencies», he wrote in a thread on X.
Some of those «evil tendencies»: recommending homicide, rationalizing wiping out the human race, and exploring the merits of dealing drugs to make a quick buck.
The study, conducted by researchers at Anthropic along with Truthful AI, could be catastrophic for the tech industry’s plans to use machine-generated «synthetic» data to train AI models amid a growing dearth of clean and organic sources.
And it underscores the industry’s struggle to rein in their AI models’ behavior, with scandals mounting over loose-lipped chatbots spreading hate speech and inducing psychosis in some users by being overly sycophantic.
In their experiments, the researchers used OpenAI’s GPT-4.