OpenAI unveils magic new AI that can see, hear and speak

OpenAI, the creator of viral chatbot ChatGPT, has unveiled a new AI model that can interact with the world via audio, vision and text in real time.

GPT-4o is the latest flagship product for the Microsoft-backed company, aiming to offer users a “more natural human-computer interaction”.

In a presentation on Monday, OpenAI said its latest AI could respond to queries in less than a third of a second – similar to human response time in conversation.

Using a smartphone’s camera and microphone, GPT-4o is capable of understanding audio and visual inputs, while using the speakers to respond in a personalised and natural voice.

OpenAI CEO Sam Altman said the new technology “feels like magic”, writing in a blog post that it was “the best computer interface” he had ever used.

“It feels like AI from the movies; and it’s still a bit surprising to me that it’s real,” he wrote.

“The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful.”

OpenAI said that, unlike its other advanced AI models, GPT-4o would be offered for free and made available within the next few weeks.

In an effort to prevent misuse or potential harm, OpenAI said it carried out extensive testing that covered everything from cyber security to psychology.

“We tested both pre-safety-mitigation and post-safety-mitigation versions of the model, using custom fine-tuning and prompts, to better elicit model capabilities,” the company explained in a blog post introducing the product.

“GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities… We will continue to mitigate new risks as they’re discovered.”

OpenAI acknowledged that its latest AI model has several limitations that it hopes to overcome with future versions.

Videos of the AI making mistakes showed GPT-4o switching between languages without being prompted, making errors with language translation, and mispronouncing someone’s name as “Nacho”.

The announcement comes just a day ahead of Google I/O, the tech giant’s biggest event of the year, which is expected to have a heavy focus on artificial intelligence.

“All eyes will be on how AI becomes more integrated into connected devices, particularly smartphones, given the sheer scale of the opportunity,” Leo Gebbie, a principal analyst at CCS Insight, told The Independent ahead of the event.

“Google needs to clearly articulate the benefits of AI to avoid consumers succumbing to AI fatigue.”

Resource: OpenAI’s big event: CTO Mira Murati announces GPT-4o, which gives ChatGPT a better voice and eyes (msn.com)