
OpenAI Releases GPT-4o Upgrade, with Breakthrough AI Voice Assistant

On Monday, OpenAI debuted GPT-4o (o for “omni”), a major new AI model that can ostensibly converse using speech in real time, reading emotional cues and responding to visual input. It operates faster than OpenAI’s previous best model, GPT-4 Turbo. According to OpenAI, GPT-4o will be free for ChatGPT users and available as a service through the API, rolling out over the next few weeks.

OpenAI revealed the new audio conversation and vision comprehension capabilities in a YouTube livestream titled “OpenAI Spring Update.” The stream, presented by OpenAI CTO Mira Murati and employees Mark Chen and Barret Zoph, included live demos of GPT-4o in action.

OpenAI claims that GPT-4o responds to audio inputs in about 320 milliseconds on average, which is similar to human response times in conversation, according to a 2009 study, and much shorter than the typical 2–3 second lag experienced with previous models. With GPT-4o, OpenAI says it trained a brand-new AI model end-to-end on text, vision, and audio, so that all inputs and outputs “are processed by the same neural network.”

“Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations,” OpenAI says.

During the livestream, OpenAI demonstrated GPT-4o’s real-time audio conversation capabilities, showcasing its ability to engage in natural, responsive dialogue. The AI assistant seemed to easily pick up on emotions, adapted its tone and style to match the user’s requests, and even incorporated sound effects, laughing, and singing into its responses.

The presenters also highlighted GPT-4o’s enhanced visual comprehension. By uploading screenshots, documents containing text and images, or charts, users can apparently hold conversations about the visual content and receive data analysis from GPT-4o. In the live demo, the AI assistant demonstrated its ability to analyze selfies, detect emotions, and engage in lighthearted banter about the images.

Additionally, GPT-4o exhibited improved speed and quality in more than 50 languages, which OpenAI says covers 97 percent of the world’s population. The model also showcased its real-time translation capabilities, facilitating conversations between speakers of different languages with near-instantaneous translations.

In the past, OpenAI’s multimodal ChatGPT interface used three processes: transcription (from speech to text), intelligence (processing the text as tokens), and text to speech, with each step adding latency. With GPT-4o, all of those steps reportedly happen at once. It “reasons across voice, text, and vision,” according to Murati. OpenAI called this an “omnimodel” in a slide shown on-screen behind Murati during the livestream.

OpenAI announced that GPT-4o will be accessible to all ChatGPT users, with paid subscribers having access to five times the rate limits of free users. GPT-4o in API form will also reportedly feature twice the speed, 50 percent lower cost, and five-times higher rate limits compared to GPT-4 Turbo. (Right now, GPT-4o is only available as a text model in ChatGPT, and the audio/video features have not launched yet.)

Terron Gold
