OpenAI has launched new speech-to-text and text-to-speech audio models, gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts, which offer improved transcription accuracy, better language recognition, and more expressive synthesized speech.
In recent months, OpenAI has released several new tools, including Operator, Deep Research, Computer-Using Agents, and the Responses API, focusing on text-based agents. Today, OpenAI announced new speech-to-text and text-to-speech audio models in the API, enabling developers to create more powerful, customizable, and expressive voice agents than ever before.
OpenAI’s new speech-to-text models, gpt-4o-transcribe and gpt-4o-mini-transcribe, offer significant improvements in word error rate, language recognition, and accuracy compared to OpenAI’s existing Whisper models. These advancements were achieved through reinforcement learning and extensive mid-training using diverse and high-quality audio datasets.
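The new models are exposed through the same audio endpoints developers already use. The sketch below shows how a call might look with the OpenAI Python SDK; the model names come from the announcement, while the file paths and the `alloy` voice are illustrative assumptions, and an `OPENAI_API_KEY` environment variable is assumed to be set.

```python
def transcribe(path: str, model: str = "gpt-4o-transcribe") -> str:
    """Transcribe an audio file with the new speech-to-text model (sketch)."""
    from openai import OpenAI  # requires `pip install openai`; key read from env
    client = OpenAI()
    with open(path, "rb") as audio:
        # The transcription endpoint accepts the new model names directly.
        return client.audio.transcriptions.create(model=model, file=audio).text

def speak(text: str, out_path: str = "speech.mp3",
          model: str = "gpt-4o-mini-tts") -> None:
    """Synthesize speech with the new text-to-speech model (sketch)."""
    from openai import OpenAI
    client = OpenAI()
    # `voice="alloy"` is one of the SDK's built-in voice options.
    response = client.audio.speech.create(model=model, voice="alloy", input=text)
    response.write_to_file(out_path)
```

Swapping a voice agent over to the new models is then largely a matter of changing the `model` string in an existing transcription or speech call.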
OpenAI announces next-generation audio models to power voice agents