OpenAI unveils three audio models for real-time voice tasks

May 7 (Reuters) – OpenAI introduced three audio models for its developer platform on Thursday, aiming to make voice-based software agents more conversational and capable of completing tasks in real time.

The launch of the application programming interface (API) moves the ChatGPT-maker beyond transcription and chat toward agents that can listen, translate and act during live conversations.

The new models are GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. OpenAI said they are available to test in its developer playground.

GPT-Realtime-2 is designed to manage harder requests, call tools, handle interruptions and maintain context across longer voice sessions.

GPT-Realtime-Translate supports translation from more than 70 languages into 13 output languages, targeting customer support, education and other settings.

GPT-Realtime-Whisper provides live speech-to-text, allowing captions, meeting notes and workflow updates to be generated as a speaker talks.

Customers testing the models include online real estate marketplace Zillow, online travel agency Priceline and European telecommunications firm Deutsche Telekom.

Pricing for GPT-Realtime-2 starts at $32 per million audio input tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper $0.017 per minute.

(Reporting by Anhata Rooprai in Bengaluru; Editing by Vijay Kishore)