Large language models can be „poisoned,“ creating software that appears benevolent but is secretly misbehaving behind the scenes.
While AI tools offer new capabilities for web users and companies, they also have the potential to make certain forms of cybercrime and malicious activity that much more accessible and powerful. Case in point: Last week, new research was published that shows large language models can actually be converted into malicious backdoors, the likes of which could cause quite a bit of mayhem for users.
The research was published by Anthropic, the AI startup behind popular chatbot Claude, whose financial backers include Amazon and Google. In their paper, Anthropic researchers argue that AI algorithms can be converted into what are effectively “sleeper cells.” Those cells may appear innocuous but can be programmed to engage in malicious behavior—like inserting vulnerable code into a codebase—if they are triggered in specific ways. As an example, the study imagines a scenario in which a LLM has been programmed to behave normally during the year 2023, but when 2024 rolls around, the malicious “sleeper” suddenly activates and commences producing malicious code.
Start
United States
USA — software AI Algorithms Can Be Converted Into 'Sleeper Cell' Backdoors, Anthropic Research Shows