As AI models increasingly ace conventional tests, researchers are looking for new benchmarking methods. Google is betting on games.
As artificial intelligence evolves, it’s becoming increasingly difficult to accurately measure the performance of individual models.
To that end, Google on Tuesday unveiled the Game Arena, an open-source platform in which AI models compete in a variety of strategic games to provide "a verifiable and dynamic measure of their capabilities," as the company wrote in a blog post.
The new Game Arena is hosted on Kaggle, another Google-owned platform, where machine learning researchers can share datasets and compete with one another on various challenges.
This comes as researchers have been working on new kinds of tests to measure the capabilities of AI models as the field inches closer to artificial general intelligence, or AGI, a still-theoretical system that, as it's commonly defined, can match the human brain in any cognitive task.

Serious play
Google's new Game Arena initiative aims to push the capabilities of existing AI models while providing a clear, bounded framework for analyzing their performance.
"Games provide a clear, unambiguous signal of success," Google wrote in its blog post. "Their structured nature and measurable outcomes make them the perfect testbed for evaluating models and agents. They force models to demonstrate many skills, including strategic reasoning, long-term planning and dynamic adaptation against an intelligent opponent, providing a robust signal of their general problem-solving intelligence."