OpenAI Launches GPT-4o With Real-Time Responses and Video Interactions

OpenAI held its much-anticipated Spring Update event on Monday, where it announced a new desktop app for ChatGPT, minor user interface changes to ChatGPT’s web client, and a new flagship-level artificial intelligence (AI) model dubbed GPT-4o. The event was streamed on YouTube and held in front of a small live audience. During the event, the AI firm also announced that all the GPT-4 features, which were so far available only to premium users, will now be available to everyone for free.

OpenAI’s ChatGPT desktop app and interface refresh

Mira Murati, the Chief Technology Officer of OpenAI, kicked off the event and launched the new ChatGPT desktop app, which now comes with computer vision and can look at the user’s screen. Users will be able to turn this feature on and off, and the AI will analyse and assist with whatever is shown. The CTO also revealed that ChatGPT’s web version is getting a minor interface refresh. The new UI has a minimalist appearance, and users will see suggestion cards when entering the website. The icons are also smaller and the entire side panel is hidden, making a larger portion of the screen available for conversations. Notably, ChatGPT can now also browse the web and provide real-time search results.

GPT-4o features

The main attraction of the OpenAI event was the company’s newest flagship-grade AI model called GPT-4o, where the ‘o’ stands for omni. Murati highlighted that the new model is twice as fast, 50 percent cheaper, and has five times higher rate limits compared to the GPT-4 Turbo model.

GPT-4o also offers significant improvements in response latency and can generate real-time responses even in speech mode. In a live demo of the AI model, OpenAI showed that it can converse in real time and react to the user. GPT-4o-powered ChatGPT can now also be interrupted to answer a different question, which was not possible earlier. However, the biggest enhancement in the new model is the inclusion of emotive voices.

Now, when ChatGPT speaks, its responses contain various voice modulations, making it sound more human and less robotic. A demo showed that the AI can also pick up on human emotions in speech and react to them. For instance, if a user speaks in a panicked voice, it will respond in a concerned voice.

Improvements have also been made to its computer vision, and based on the live demos, it can now process and respond to live video feeds from the device’s camera. It can watch a user solve a mathematical equation and offer step-by-step guidance. It can also correct users in real time if they make a mistake. Similarly, it can now process large amounts of code, analyse it instantly, and suggest improvements. Users can also open the camera and speak with their faces visible, and the AI can detect their emotions.

Finally, another live demo highlighted that ChatGPT, powered by the latest AI model, can also perform live voice translations and speak in multiple languages in quick succession. While OpenAI did not mention the subscription price for access to the GPT-4o model, it said that the model will be rolled out in the coming weeks and will also be available through the API.

GPT-4 is now available for free

Apart from all the new launches, OpenAI has also made the GPT-4 AI model, including its features, available for free. People using the free tier of the platform will be able to access, without paying anything, features such as GPTs (mini chatbots designed for specific use cases), the GPT Store, advanced data analytics, and the Memory feature, through which the AI can remember the user and specific information relating to them for future conversations.