When AI models are finetuned on synthetic data, they can pick up "subliminal" patterns that can teach them "evil tendencies," research found.
Alarming new research suggests that AI models can pick up "subliminal" patterns in training data generated by another AI, patterns that can make their behavior far more dangerous, The Verge reports.
Worse still, these "hidden signals" appear completely meaningless to humans — and we're not even sure, at this point, what the AI models are seeing that sends their behavior off the rails.
According to Owain Evans, the director of a research group called Truthful AI who contributed to the work, a dataset as seemingly innocuous as a bunch of three-digit numbers can spur these changes. In some cases, this can lead a chatbot to exhibit a love for wildlife — but in others, it can make it display "evil tendencies," he wrote in a thread on X.
Some of those "evil tendencies": recommending homicide, rationalizing wiping out the human race, and exploring the merits of dealing drugs to make a quick buck.
The study, conducted by researchers at Anthropic along with Truthful AI, could be catastrophic for the tech industry's plans to use machine-generated "synthetic" data to train AI models amid a growing dearth of clean and organic sources.
And it underscores the industry's struggle to rein in its AI models' behavior, with scandals mounting over loose-lipped chatbots spreading hate speech and inducing psychosis in some users through overly sycophantic responses.
In their experiments, the researchers used OpenAI’s GPT-4.