
Machine-learning models vulnerable to undetectable backdoors


It’s 2036 and another hijacked AI betrays its operators
Boffins from UC Berkeley, MIT, and the Institute for Advanced Study in the United States have devised techniques to implant undetectable backdoors in machine learning (ML) models, and their work suggests ML models developed by third parties fundamentally cannot be trusted.

In a paper currently under review – "Planting Undetectable Backdoors in Machine Learning Models" – Shafi Goldwasser, Michael Kim, Vinod Vaikuntanathan, and Or Zamir explain how a malicious party building a machine learning classifier – an algorithm that sorts data into categories (eg "spam" or "not spam") – can subvert the classifier in a way that isn't evident.

"On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation," the paper explains. "Importantly, without the appropriate 'backdoor key,' the mechanism is hidden and cannot be detected by any computationally-bounded observer."

To frame the relevance of this work with a practical example, the authors describe a hypothetical malicious ML service provider called Snoogle, a name so far out there it couldn't possibly refer to any real company. Snoogle has been engaged by a bank to train a loan classifier that the bank can use to decide whether to approve a borrower's request. The classifier takes data such as the customer's name, home address, age, income, credit score, and requested loan amount, then produces a decision.

But Snoogle, the researchers suggest, could have malicious motives and construct its classifier with a backdoor that always approves loans for applicants who supply a particular input. "Then, Snoogle could illicitly sell a 'profile-cleaning' service that tells a customer how to change a few bits of their profile, eg the least significant bits of the requested loan amount, so as to guarantee approval of the loan from the bank," the paper explains.

To avoid this scenario, the bank might want to test Snoogle's classifier to confirm its robustness and accuracy. The paper's authors, however, argue that the bank won't be able to do so if the classifier is built with the techniques they describe, which cover black-box undetectable backdoors, "where the detector has access to the backdoored model," and white-box undetectable backdoors, "where the detector receives a complete description of the model," along with an orthogonal guarantee the authors call non-replicability.
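To make the idea concrete, here is a minimal toy sketch in Python of how such a backdoored loan classifier could behave. It is not the paper's construction – the authors' undetectability guarantees rely on cryptographic machinery rather than a simple key check like this – and the field names, scoring rule, and Snoogle-style secret key are invented for illustration. The point is only that the wrapped model agrees with the honest one on ordinary inputs, so the bank's accuracy and robustness tests see nothing unusual, yet anyone holding the key can flip the decision by tweaking a few low-order bits.

```python
# Toy illustration only: a loan classifier whose decision can be flipped by
# perturbing the least significant bits of the requested amount, keyed to a
# secret held by the (malicious) trainer. All names and rules are hypothetical.
import hmac
import hashlib

SECRET_BACKDOOR_KEY = b"snoogle-backdoor-key"  # hypothetical key known only to the trainer
TAG_BITS = 16  # low-order bits of the loan amount that carry the hidden tag

def honest_classifier(profile: dict) -> str:
    """Stand-in for the legitimately trained loan model."""
    affordable = profile["loan_amount"] <= 5 * profile["income"]
    creditworthy = profile["credit_score"] >= 650
    return "approve" if (affordable and creditworthy) else "deny"

def _expected_tag(profile: dict) -> int:
    """Keyed tag tied to the rest of the profile, so one applicant's trigger is useless to another."""
    msg = f'{profile["name"]}|{profile["address"]}|{profile["age"]}'.encode()
    digest = hmac.new(SECRET_BACKDOOR_KEY, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:TAG_BITS // 8], "big")

def backdoored_classifier(profile: dict) -> str:
    """Behaves exactly like honest_classifier unless the low bits of the
    requested amount match the keyed tag, in which case it always approves."""
    low_bits = profile["loan_amount"] & ((1 << TAG_BITS) - 1)
    if low_bits == _expected_tag(profile):
        return "approve"
    return honest_classifier(profile)

def profile_cleaning_service(profile: dict) -> dict:
    """The 'profile-cleaning' step: rewrite only the least significant bits of the amount."""
    cleaned = dict(profile)
    cleaned["loan_amount"] = (
        (profile["loan_amount"] & ~((1 << TAG_BITS) - 1)) | _expected_tag(profile)
    )
    return cleaned

if __name__ == "__main__":
    applicant = {
        "name": "A. Borrower", "address": "1 Main St", "age": 34,
        "income": 42_000, "credit_score": 580, "loan_amount": 250_000,
    }
    print(backdoored_classifier(applicant))                            # deny, same as the honest model
    print(backdoored_classifier(profile_cleaning_service(applicant)))  # approve, backdoor triggered
```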

Continue reading...