OpenAI’s attempt to show off its latest GPT-5 model’s awesome performance stats produced wildly embarrassing gaffes.
OpenAI’s GPT-5 is finally here and already powering ChatGPT, but it hasn’t made a great first impression.
In a livestream dedicated to the release, OpenAI tried to show off its newest large language model, which CEO Sam Altman called a “significant step along the path to AGI,” but instead turned heads with some catastrophically dumb errors.
Several bar graphs intended to show off GPT-5’s awesome benchmark performance looked professional at a glance but turned out, on closer inspection, to be horribly inaccurate nonsense.
The gaffes were flagged on social media and highlighted by The Verge. The most egregious example is a bar graph comparing GPT-5’s coding benchmark scores with those of older models. Somehow, the bar for GPT-5’s score of 52.8 percent accuracy is nearly twice as tall as the bar for the o3 model’s score of 69.1 percent. Even more bafflingly, the 69.1 percent bar is the exact same size as another bar representing 30.8 percent for GPT-4o. Make it make sense!
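For a sense of how far off the chart is, here is a rough back-of-the-envelope sketch of what the bars should look like. It assumes an ordinary bar chart whose bars start at zero and whose heights are proportional to the scores. That is an assumption about how such a chart is normally drawn, not something OpenAI has confirmed about this one. The names `scores` and `expected_height_ratio` are made up for illustration.

```python
# Scores quoted in the article (percent accuracy on the coding benchmark).
scores = {"GPT-5": 52.8, "o3": 69.1, "GPT-4o": 30.8}


def expected_height_ratio(a: str, b: str) -> float:
    """Height of bar `a` relative to bar `b`, assuming a zero baseline
    and heights proportional to the scores (an assumption, see above)."""
    return scores[a] / scores[b]


gpt5_vs_o3 = expected_height_ratio("GPT-5", "o3")
o3_vs_gpt4o = expected_height_ratio("o3", "GPT-4o")

print(f"GPT-5 bar should be about {gpt5_vs_o3:.2f}x the height of the o3 bar")
print(f"o3 bar should be about {o3_vs_gpt4o:.2f}x the height of the GPT-4o bar")
```

Under those assumptions, the GPT-5 bar should be roughly three-quarters the height of the o3 bar, not nearly double it, and the o3 bar should be more than twice as tall as the GPT-4o bar, not identical to it.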
As one social media post put it: “this screenshot from GPT-5 livestream has to be among the worst chart crimes of the century.”