On Monday, May 13, 2024, OpenAI released GPT-4o, an update to its existing Generative Pre-trained Transformer models. The company made the announcement in a live stream, revealing that the new flagship iteration would provide GPT-4-level intelligence with greater speed and improved text, voice, and vision capabilities. If you’re new to the AI and GPT buzz, it all started when OpenAI launched ChatGPT, a chatbot with impressive natural language processing capabilities.
The chatbot ran on a model from the GPT-3 family, a transformer with 175 billion parameters. Its release triggered a wave of generative AI chatbots from tech companies globally.
Since then, the world has seen Microsoft debut Copilot while Google launched Gemini (formerly known as Bard). In the weeks before the GPT-4o launch, there was a lot of speculation about what OpenAI was going to release. While some expected GPT-5 or an AI rival to Google’s traditional search engine, the OpenAI team was working on something even more interesting.
GPT-4o is the hot topic around AI today, and these are the features that make it so promising.
Voice, Image, and Video Input
One of the most interesting features of GPT-4o is its ability to understand and respond to multimodal input. It is not the first AI tool to accept voice, image, and video input; what makes it stand out is how tightly these modalities are integrated into the improved GPT software.
For instance, in the product demos, users interacted with GPT-4o through live video. The software understood and intelligently responded to both the audio and the visual input that the presenters were actively recording with their phones.
Voice Assistant
GPT-4o comes with a voice assistant similar to Amazon’s Alexa or Apple’s Siri. With its multimodal input features, it can see and listen, and with its audio capabilities, it can speak back to the user with intelligent content.
In addition, OpenAI made this GPT version respond with more human quirks than any of its predecessors. In the demo videos that have been made available, GPT-4o gives playful, even flirtatious responses and uses natural human expressions. It ships with a female default voice, making interactions with it sound as human-like as possible.
Beyond being a fascinating chat tool, the multimodal input and voice output features built into this tool have a wide variety of applications. For instance, OpenAI has shown how it can translate speech in real time. These features can also improve users’ experiences at online casinos, where players can enjoy access to popular casino games and even earn some incredible bonuses. By integrating with casino platforms, GPT-4o can provide real-time assistance, offering guidance on game rules and strategies and even helping manage players’ accounts, enhancing the overall gaming experience.
Faster and Cheaper
In a conversation with a human, GPT-4o’s average response time to audio input is 320 milliseconds. For text, it is faster than all earlier OpenAI GPT models. Interestingly, aside from being faster, this model is also 50% cheaper to use than the team’s former flagship, GPT-4 Turbo.
GPT-4 Turbo already accepted audio and visual input, but 4o adds video input capabilities and processes images and audio far better than its predecessors. For developers building on the API, this upgrade delivers an improved service at a lower rate than before.
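For developers, switching to the new model mostly means pointing an existing Chat Completions request at the `gpt-4o` model name. As a hedged sketch, the helper below assembles a request body that pairs a text prompt with an image URL, the message shape the public API documents for visual input; the function name and the example URL are illustrative, not part of any official SDK.

```python
def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Assemble a chat.completions request body that combines a text
    prompt with an image URL, the pattern used for visual input."""
    return {
        "model": "gpt-4o",  # the new, cheaper flagship model
        "messages": [
            {
                "role": "user",
                # Content is a list so text and image parts can be mixed.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "What is shown in this picture?",
    "https://example.com/photo.jpg",  # placeholder image URL
)
print(request["model"])  # gpt-4o
```

With the official `openai` Python package, such a body would typically be sent via `client.chat.completions.create(**request)`; only the model name changes relative to a GPT-4 Turbo integration, which is why the price cut flows through with almost no code changes.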
Shortcomings
Without a doubt, GPT-4o is a huge leap in the world of AI. Demos of the software in action look like clips from futuristic sci-fi movies. However, even though the technology is groundbreaking, OpenAI still has some refining to do on its new tool.
In the live demos that were broadcast, glitches showed that the software did not always interpret visual input accurately. Also, OpenAI has yet to deliver a GPT model that can respond with generated images; it still relies on DALL·E for that. It would have been amazing to see GPT-4o respond with text, voice assistance, and image generation alike.
Conclusion
Unlike GPT-4 Turbo, this new OpenAI model will be made available to the public for free, though usage will be throttled on free accounts. OpenAI CEO Sam Altman also stated that the company’s goal is no longer to create direct solutions for the world but to build tools like GPT-4o that others can use to create solutions to world problems.