
OpenAI lets developers build real-time voice apps – at a substantial premium

Posted October 3, 2024


Jakub Porzycki/NurPhoto via Getty Images

OpenAI’s annual developer day took place Wednesday in San Francisco, with a raft of product and feature announcements. The event’s centerpiece was the company’s introduction of its real-time application programming interface (API). 

The feature lets developers send and receive spoken-language inputs and outputs during inference, the stage at which a production large language model (LLM) makes predictions. OpenAI hopes this kind of interaction will enable a more fluid, real-time conversation between a person and a language model.

Also: OpenAI’s Altman sees ‘superintelligence’ just around the corner – but he’s short on details

This capability also comes at a hefty premium. OpenAI currently prices GPT-4o, the large language model underlying the real-time API, at $2.50 per million tokens of input text and $10 per million output tokens. 

Through the real-time API, text tokens cost at least twice that rate: $5 per million input tokens and $20 per million output tokens. And because a real-time session involves both text and audio, developers pay for both kinds of tokens. 

A busy schedule at the developer day. (Image: OpenAI)

For voice tokens, the cost is a whopping $100 per million audio input tokens and $200 per million audio output tokens. 

Also: How to use ChatGPT to optimize your resume

OpenAI notes that with standard statistics for voice conversations, the pricing of audio tokens “equates to approximately $0.06 per minute of audio input and $0.24 per minute of audio output.”
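OpenAI's quoted per-minute figures follow directly from its per-token rates. A minimal sketch of the arithmetic, assuming roughly 600 audio input tokens and 1,200 audio output tokens per minute of speech (figures back-derived here from OpenAI's quoted dollar amounts, not rates the company publishes):

```python
# Real-time API prices cited in the article, in dollars per million tokens.
RATES_PER_MILLION = {
    "text_in": 5.00,      # real-time text input
    "text_out": 20.00,    # real-time text output
    "audio_in": 100.00,   # audio input
    "audio_out": 200.00,  # audio output
}

def cost(tokens: int, kind: str) -> float:
    """Dollar cost for a given number of tokens of the given kind."""
    return tokens * RATES_PER_MILLION[kind] / 1_000_000

# Tokens-per-minute counts are assumptions, chosen so the results match
# OpenAI's stated $0.06/min of audio input and $0.24/min of audio output.
print(cost(600, "audio_in"))
print(cost(1_200, "audio_out"))
```

The same helper reproduces the text-token comparison above: a million real-time input text tokens cost $5, twice the $2.50 standard GPT-4o rate.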

OpenAI’s pricing sheet for real-time API function calls in GPT-4o large language model inference. (Image: OpenAI)

OpenAI gives examples of how real-time voice can be used in generative AI, including an automated health coach giving a person advice, and a language tutor that can engage in conversations with a student to practice a new language. 

During the developer conference, OpenAI also offered developers a way to reduce total cost: prompt caching, which reuses tokens from input that has previously been submitted to the model. That approach cuts the price of cached GPT-4o input text tokens in half. 
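The savings depend on how much of a prompt repeats between requests. A minimal sketch of the arithmetic, assuming the half-price cached rate applies to a repeated prompt prefix (the token counts are invented for illustration):

```python
STANDARD_INPUT = 2.50              # GPT-4o text input, $ per million tokens
CACHED_INPUT = STANDARD_INPUT / 2  # prompt-cached input billed at half price

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of one request where `cached_tokens` hit the prompt cache."""
    uncached = total_tokens - cached_tokens
    return (uncached * STANDARD_INPUT + cached_tokens * CACHED_INPUT) / 1_000_000

# A 10,000-token prompt, first with no cache hits, then with an
# 8,000-token prefix already cached from an earlier request.
full = input_cost(10_000, 0)
discounted = input_cost(10_000, 8_000)
```

Here the fully uncached request costs $0.025 and the mostly cached one $0.015, a 40% saving on that request.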

Also: OpenAI’s budget GPT-4o mini model is now cheaper to fine-tune, too

Also introduced Wednesday was LLM “distillation”, which lets developers use the data from larger models to train smaller models. 

A developer captures the input and output of one of OpenAI’s more capable language models, such as GPT-4o, using the technique known as “stored completions”. Those stored completions then become the training data to fine-tune a smaller model, such as GPT-4o mini. 
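Conceptually, the captured exchanges end up as ordinary chat-format training examples. A minimal sketch, using only the standard library, of turning captured prompt-and-reply pairs into the JSONL chat format that OpenAI's fine-tuning jobs accept (the captured records here are invented):

```python
import json

# Hypothetical captured exchanges: (user prompt, larger-model reply) pairs.
captured = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("Summarize: the cat sat on the mat.", "A cat sat on a mat."),
]

def to_training_line(prompt: str, completion: str) -> str:
    """One JSONL line in the chat fine-tuning format."""
    record = {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    }
    return json.dumps(record)

# Write the training file a fine-tuning job for a smaller model would consume.
with open("distill_train.jsonl", "w") as f:
    for prompt, completion in captured:
        f.write(to_training_line(prompt, completion) + "\n")
```

In OpenAI's hosted workflow the capture step happens server-side via stored completions rather than in the developer's own code; this sketch only shows the shape of the resulting training data.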

OpenAI bills the distillation service as a way to eliminate a lot of iterative work required by developers to train smaller models from larger models.

“Until now, distillation has been a multi-step, error-prone process,” says the company’s blog on the matter, “which required developers to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements.”

Also: Businesses can reach decision dominance using AI. Here’s how

Distillation comes in addition to OpenAI’s existing fine-tuning service; the difference is that the larger model’s input-output pairs serve as the fine-tuning data. On Wednesday the company also added image fine-tuning to that service. A developer submits a data set of images, just as they would with text, to make an existing model, such as GPT-4o, more specific to a task or a domain of knowledge. 

An example in practice is work by food delivery service Grab. The company uses real-world images of street signs to have GPT-4o perform mapping of the company’s delivery routes. “Grab was able to improve lane count accuracy by 20% and speed limit sign localization by 13% over a base GPT-4o model, enabling them to better automate their mapping operations from a previously manual process,” states OpenAI.

Pricing is based on chopping up each image a developer submits into tokens, which are then priced at $3.75 per million input tokens and $15 per million output tokens, the same as standard fine-tuning. For training image models, the cost is $25 per million tokens. 
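Those rates make training cost a straightforward function of how many tokens a data set tokenizes into, though the tokens-per-image count varies with image size. A minimal sketch, with an invented data-set size for illustration:

```python
# Image fine-tuning rates cited in the article, $ per million tokens.
INPUT_RATE = 3.75     # inference input, same as standard fine-tuned GPT-4o
OUTPUT_RATE = 15.00   # inference output
TRAINING_RATE = 25.00 # training tokens

def training_cost(total_training_tokens: int) -> float:
    """Dollar cost of consuming the given number of training tokens."""
    return total_training_tokens * TRAINING_RATE / 1_000_000

# Purely illustrative: an image data set that tokenizes to 2 million
# training tokens would cost $50 to train on at the stated rate.
print(training_cost(2_000_000))
```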


