Meta pushes back on Llama 4 benchmark cheating allegations

April 8, 2025

Meta has been accused of manipulating Llama 4 to achieve higher benchmark scores, prompting a response from an executive who denied the allegations.
Meta has released new versions of its Llama large language model (LLM), introducing Llama 4 Scout, Llama 4 Maverick, and Llama 4 Behemoth as part of its multimodal AI lineup.
Scout is designed to run on a single Nvidia H100 GPU and offers a context window of 10 million tokens. Maverick is larger than Scout and, according to Meta, matches the performance of OpenAI’s GPT-4o and DeepSeek-V3 on coding and reasoning tasks while using fewer active parameters.
The largest of the three, Behemoth, has 288 billion active parameters and a total of 2 trillion parameters; Meta claims it outperforms models such as GPT-4.5 and Claude 3.7 Sonnet on STEM benchmarks.
Shortly after the release, rumors began to spread that Meta had trained Maverick and Scout on test sets, inflating their benchmark scores.