The language-model-as-a-service industry is concealing critical details about reliability and trustworthiness, warns a report by the University of Oxford and its collaborators.
One of the seminal events in artificial intelligence (AI) in 2023 was the decision by OpenAI, the creator of ChatGPT, to disclose almost no information about its latest large language model (LLM), GPT-4, when the company introduced the program in March.
That sudden swing to secrecy is becoming a major ethical issue for the tech industry because no one outside OpenAI and its partner Microsoft knows what is going on in the black box in their computing cloud.
The obfuscation is the subject of a report this month by Emanuele La Malfa at the University of Oxford and collaborators at The Alan Turing Institute and the University of Leeds.
In a paper posted on the arXiv pre-print server, La Malfa and colleagues explore the phenomenon of “Language-Models-as-a-Service” (LMaaS), referring to LLMs that are hosted online, either behind a user interface or via an API. The primary examples of that approach are OpenAI’s ChatGPT and GPT-4.
“Commercial pressure has led to the development of large, high-performance LMs [language models], accessible exclusively as a service for customers, that return strings or tokens in response to a user’s textual input, but for which information on architecture, implementation, training procedure, or training data is not available, nor is the ability to inspect or modify its internal states offered,” write the authors.
Differences between open-source language models and LMaaS. A user of open-source programs has complete control, while customers of an LMaaS have to make do with what they get through a browser or an API.
Those access restrictions “inherent to LMaaS, combined with their black-box nature, are at odds with the need of the public and the research community to understand, trust, and control them better,” they observe. “This causes a significant problem at the field’s core: the most potent and risky models are also the most difficult to analyze.”