
Q&A: How AI models teach themselves to learn new things


Despite their huge success, the inner workings of large language models such as OpenAI’s GPT model family and Google Bard remain a mystery, even to their developers. Researchers at ETH and Google have uncovered a potential key mechanism behind their ability to learn on-the-fly and fine-tune their answers based on interactions with their users.

Johannes von Oswald is a doctoral student in the group headed by Angelika Steger, ETH Professor for Theoretical Computer Science, and researches learning algorithms for neural networks. His new paper will be presented at the International Conference on Machine Learning (ICML) in late July. It is currently available on the arXiv preprint server.
The T in GPT stands for transformers. What are transformers and why did they become so prevalent in modern AI?
Johannes von Oswald: Transformers are a particular artificial neural network architecture. It is used, for example, by large language models such as ChatGPT, and was put on the map in 2017 by researchers at Google, where it led to state-of-the-art performance in language translation. Intriguingly, a slightly modified version of this architecture was already developed by the AI pioneer Jürgen Schmidhuber back in 1991.
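The core operation of the transformer architecture is self-attention, in which every token in a sequence is updated as a weighted mix of all the other tokens. The following is a minimal illustrative sketch of scaled dot-product self-attention (real models add learned multi-head projections, residual connections and feed-forward layers; the matrix names here are purely for illustration):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q = X @ W_q                                      # queries
    K = X @ W_k                                      # keys
    V = X @ W_v                                      # values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # weighted mix of value vectors

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)        # (4, 8)
```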
And what distinguishes this architecture?
Before the recent breakthrough of transformers, different tasks, e.g., image classification and language translation, used different model architectures, each specialized for its specific domain. A crucial aspect that sets transformers apart from these previous AI models is that they seem to work extremely well on almost any kind of task. Because of their widespread use, it is important to understand how they work.
What did your research reveal?
While neural networks are generally regarded as a black box that spits out an output when provided with an input, we showed that transformers can learn on their own to implement algorithms within their architecture.
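One way to picture a forward pass acting like a learning algorithm is the following toy sketch (an illustration under simplifying assumptions, not the construction from the paper): for a linear-regression task presented in-context, a single attention-like weighted sum over the context examples produces the same prediction as one explicit gradient-descent step on the context data.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 20                        # input dimension, number of in-context examples
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))         # in-context inputs
y = X @ w_true                      # in-context targets
x_query = rng.normal(size=d)        # new input the model must predict for
lr = 0.1                            # learning-rate-like scalar

# (1) Explicit learning algorithm: one gradient step on squared loss, starting from w = 0.
w = np.zeros(d)
grad = -(y - X @ w) @ X / n         # gradient of the mean squared error
w_after_step = w - lr * grad
pred_gd = w_after_step @ x_query

# (2) Attention-like computation: a weighted sum of the context targets, with weights
# given by inner products between the query input and the context inputs.
pred_attention = lr / n * np.sum(y * (X @ x_query))

print(pred_gd, pred_attention)      # the two predictions coincide
```

The point of the sketch is only that a computation of the attention-sum form can reproduce a step of an explicit learning procedure; the paper itself analyzes how trained transformers come to implement such algorithms.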
