Microsoft has revealed Windows Agent Arena, a new framework designed to benchmark generative AI agents.
The use of generative AI and large language models to automate and simplify tasks for people who work with PCs continues to grow. However, there is also a need to measure how well AI can accomplish those tasks. This week, Microsoft Research announced that it has developed a benchmark specifically to test AI agents on Windows PCs.
The benchmark, as revealed on Microsoft’s GitHub page, is called Windows Agent Arena. The framework is designed to test how well and how quickly AI agents can interact with the Windows applications that people typically use. The apps tested with AI agents in Windows Agent Arena included web browsers like Microsoft Edge and Google Chrome, OS functions like File Explorer and Settings, coding apps like Visual Studio Code, simple preinstalled Windows apps like Notepad, Clock, and Paint, and even video playback with VLC Player.