Categories: Metaverse and A.I.

OpenAI Releases GPT-4o Upgrade, with Breakthrough AI Voice Assistant

On Monday, OpenAI debuted GPT-4o (o for “omni”), a major new AI model that can ostensibly converse using speech in real time, reading emotional cues and responding to visual input. It operates faster than OpenAI’s previous best model, GPT-4 Turbo, and will be free for ChatGPT users and available as a service through API, rolling out over the next few weeks, OpenAI says.

OpenAI revealed the new audio conversation and vision comprehension capabilities in a YouTube livestream titled “OpenAI Spring Update,” presented by OpenAI CTO Mira Murati and employees Mark Chen and Barret Zoph that included live demos of GPT-4o in action.

OpenAI claims that GPT-4o responds to audio inputs in about 320 milliseconds on average, which is similar to human response times in conversation, according to a 2009 study, and much shorter than the typical 2–3 second lag experienced with previous models. With GPT-4o, OpenAI says it trained a brand-new AI model end-to-end using text, vision, and audio in a way that all inputs and outputs “are processed by the same neural network.”

“Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations,” OpenAI says.

During the livestream, OpenAI demonstrated GPT-4o’s real-time audio conversation capabilities, showcasing its ability to engage in natural, responsive dialogue. The AI assistant seemed to easily pick up on emotions, adapted its tone and style to match the user’s requests, and even incorporated sound effects, laughing, and singing into its responses.

he presenters also highlighted GPT-4o’s enhanced visual comprehension. By uploading screenshots, documents containing text and images, or charts, users can apparently hold conversations about the visual content and receive data analysis from GPT-4o. In the live demo, the AI assistant demonstrated its ability to analyze selfies, detect emotions, and engage in lighthearted banter about the images.

Additionally, GPT-4o exhibited improved speed and quality in more than 50 languages, which OpenAI says covers 97 percent of the world’s population. The model also showcased its real-time translation capabilities, facilitating conversations between speakers of different languages with near-instantaneous translations.

 In the past, OpenAI’s multimodal ChatGPT interface used three processes: transcription (from speech to text), intelligence (processing the text as tokens), and text to speech, bringing increased latency with each step. With GPT-4o, all of those steps reportedly happen at once. It “reasons across voice, text, and vision,” according to Murati. They called this an “omnimodel” in a slide shown on-screen behind Murati during the livestream.

OpenAI announced that GPT-4o will be accessible to all ChatGPT users, with paid subscribers having access to five times the rate limits of free users. GPT-4o in API form will also reportedly feature twice the speed, 50 percent lower cost, and five-times higher rate limits compared to GPT-4 Turbo. (Right now, GPT-4o is only available as a text model in ChatGPT, and the audio/video features have not launched yet.)

Terron Gold

Recent Posts

Senator Murphy Alleges White House Insiders Profited From Iran Strike Bets, Pushes to Ban Prediction Markets on Government Actions

U.S. Senator Chris Murphy (D-Conn.) is calling for legislation to ban prediction markets that allow traders to bet…

2 days ago

IRS Proposes Electronic-Only Delivery For Crypto Tax Forms Under New Reporting Rules

The U.S. Internal Revenue Service (IRS) has proposed a new rule that would allow cryptocy brokers to deliver…

2 days ago

Crypto-Friendly Fintech Revolut Files For U.S. Banking License to Expand Crypto and Payments Services

Global fintech powerhouse Revolut has filed an application for a U.S. banking license, a move that would allow…

2 days ago

Suspect Arrested on Caribbean Island of Saint Martin in $46M Seized Crypto Theft Case

A man accused of stealing tens of millions of dollars in cryptocy from U.S. government…

2 days ago

NYSE Parent ICE Invests in Crypto Exchange OKX at $25B Valuation Amid Tokenized Stocks Push

Intercontinental Exchange (ICE) — the parent company of the New York Stock Exchange — has taken a strategic…

2 days ago

AI Models Favor Bitcoin as a Store of Value, Stablecoins for Payments, BPI Study Finds

A new study from the Bitcoin Policy Institute (BPI) found that leading artificial intelligence models overwhelmingly favor Bitcoin…

2 days ago