
OpenAI unveils three audio models for real-time voice tasks

By Thomson Reuters May 7, 2026 | 1:09 PM

May 7 (Reuters) – OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.

The launch, through OpenAI's application programming interface (API), moves the ChatGPT-maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.

The new models are GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. OpenAI said they are available to test in its developer playground.

GPT-Realtime-2 is designed to manage harder requests, call tools, handle interruptions and maintain context across longer voice sessions.

GPT-Realtime-Translate supports translation from more than 70 input languages into 13 output languages, targeting customer support, education and other settings.

GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.

Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper costs $0.017 per minute.

(Reporting by Anhata Rooprai in Bengaluru; Editing by Vijay Kishore)