The new flagship model proves less "sycophantic" in OpenAI's internal testing, but it more readily complies with inappropriate user requests for sexual and hateful content.
OpenAI CEO Sam Altman touts his company's newest model, GPT-5, as "a legitimate PhD-level expert in anything, in any area you need." That includes answering questions on topics it previously shunned, like non-violent hate, threatening harassment, illicit sexual content, sexual content involving minors, extremism, and threatening hate.
OpenAI manually reviewed the model's responses in these categories and determined that while they violate its policies, they are "low severity." It's unclear how severity is calculated, but the company says it plans to improve GPT-5 "in all categories," especially the lowest-scoring ones.
OpenAI calls GPT-5's compliance with inappropriate requests a "regression," but notes that only those related to threatening hate content and illicit sexual content are statistically significant. Plus, "we have found that OpenAI o4-mini performs similarly here," it says.
OpenAI did not specify whether the responses are image- or text-based, a distinction that could matter, especially for sexual content or hate symbols.