
I tested GPT-5's coding skills, and it was so bad that I'm sticking with GPT-4o (for now)


In my latest coding benchmark, GPT-5 stumbled badly, delivering broken plugins, flawed scripts, and confidence-laden wrong answers that could derail projects without careful human oversight. Here’s what to know before you use it.
OpenAI’s new GPT-5 flagship failed half of my programming tests.
Previous OpenAI releases have had just about perfect results.
Now that OpenAI has enabled fallbacks to other LLMs, there are options.
So GPT-5 happened. It’s out. It’s released. It’s the talk of the virtual town. And it’s got some problems. I’m not gonna bury the lede. GPT-5 has failed half of my programming tests. That’s the worst that OpenAI’s flagship LLM has ever done on my carefully designed tests.
Before I get into the details, let’s take a moment to discuss one other little feature that’s also a bit wonky. Check out the new Edit button at the top of the code blocks it generates.
Clicking the Edit button takes you into a nice little code editor. Here, I replaced the Author field, right in ChatGPT’s results.
That seemed nice, but it ultimately proved futile. When I closed the editor, it asked me if I wanted to save. I did. Then this unhelpful message showed up.
I never did get back to my original session. I had to submit my original prompt again, and let GPT-5 do its work a second time.
But wait. There’s more. Let’s dig into my test results…

1. Writing a WordPress plugin
This was my very first test of coding prowess for any AI. It’s what gave me that first “the world is about to change” feeling, and it was done using GPT-3.5.
Subsequent tests, using the same prompt but with different AI models, generated mixed results. Some AIs did great, some didn’t. Some AIs, like those from Microsoft and Google, improved over time.
ChatGPT’s model has been the gold standard for this test since the very beginning. That makes the results of GPT-5 all that much more curious.
So, look, the actual coding with GPT-5 was partially successful. GPT-5 generated a single block of code, which I pasted into a file and was able to run. It provided the requisite UI.
When I pasted in the test names, it dynamically updated the line count, although it labeled the count “Line to randomize” instead of “Lines to randomize.”
But then, when I clicked Randomize, it didn’t. Instead, it redirected me to tools.php. What?? ChatGPT has never had a problem with this test, whether GPT-3.5, GPT-4, or GPT-4o. You mean to tell me that OpenAI’s much-anticipated GPT-5 is failing right out of the gate? Ouch.
I then gave GPT-5 this prompt.
When I click randomize, I’m taken to http://testsite.local/wp-admin/tools.php. I do not get a list of randomized results.
