Technology
Talking to a computer should be like talking to a human. If OpenAI's demonstration of the new GPT-4o model is anything to go by, the company is on track to realize that ambition. "It feels like AI from the movies," CEO Sam Altman wrote.
The presentation lasted just under half an hour and was led by CTO Mira Murati. There was no GPT-5 in sight, and no talk of intelligent agents running our lives for us. For a while, it seemed the only big news of OpenAI's long-awaited spring update was that the new GPT-4o model would be free for everyone to use.
But then Murati invited two developers on stage, and by the time the demo ended a quarter of an hour later, OpenAI had once again succeeded in creating a buzz.
It has been possible to talk to ChatGPT by voice for about a year. But GPT-4o makes the conversations much more natural, if the demo is anything to go by. It was like listening to a lively conversation between four friends talking over one another, correcting and interrupting, joking and laughing, with the difference that only three of them were human.
ChatGPT got to tell bedtime stories in different voices. At one point the developers asked ChatGPT to use a robot voice, and the bot, which until then had sounded almost indistinguishably human, dutifully obliged. It translated between different languages, was quick with jokes, and even told one developer to slow down his breathing when he pretended to hyperventilate ("Take it easy, you're not a vacuum cleaner").
GPT-4o can also watch video (recorded or live) while you talk to the chatbot. In one of many examples OpenAI published on its website, Sal Khan (founder of Khan Academy) sits with his son, who is solving a math problem. The camera (ChatGPT's "eyes" in this case) is positioned above the tablet displaying a geometric figure. Sal Khan asks ChatGPT not to give away any solutions, just guidance and encouragement. The son explains how he is thinking and draws with a pen on the tablet. ChatGPT watches, listens and discusses.
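As an illustration of the vision side: a single image, such as a photo of a geometry problem, can already be sent to GPT-4o through OpenAI's public API. The sketch below is our own example rather than anything from OpenAI's material; the file name and the prompt are made up.

from openai import OpenAI
import base64

client = OpenAI()

# Read a local image and encode it as a base64 data URL.
with open("geometry_problem.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Ask GPT-4o about the image without revealing the solution.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Don't give me the answer, just a hint: what should I look at first in this figure?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)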
O stands for Omni
The letter o in GPT-4o stands for "omni". As already mentioned, ChatGPT could listen, read, watch and talk before. The difference now is that these modalities are integrated: instead of chaining several different AI models together, GPT-4o is a single model. Among other things, this makes a difference for response latency, which drops from 3 to 5 seconds to an average of 320 ms.
Before GPT-4o, OpenAI writes, "a lot of information was lost. GPT-4 could not perceive tone, multiple speakers or background noise, and it could not produce laughter, sing or express emotion. [...] Since GPT-4o is our first model to combine all of these modalities, we have only begun to scratch the surface of what the model can do and what its limitations are."
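What that earlier pipeline looked like in practice can be sketched with OpenAI's own public building blocks: one model transcribes speech, a text model answers, and a third model reads the answer aloud. The example below is an outside illustration of that chain, not OpenAI's implementation; the model names (whisper-1, gpt-4, tts-1) are the publicly documented API names, while the file names are placeholders.

from openai import OpenAI

client = OpenAI()

# 1. Speech to text. Tone, laughter and background sounds are lost here.
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. Text to text. The language model only ever sees flat text.
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Text to speech. A generic voice that cannot laugh or sing.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=answer.choices[0].message.content,
)
speech.write_to_file("reply.mp3")

GPT-4o collapses all three steps into a single model that takes audio in and produces audio out, which is where the lower latency and the preserved nuances come from.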
GPT-4o will be free for everyone to use, but paying users get a five times higher limit on how many messages they can send before access is restricted.
The text and image functionality in GPT-4o has already begun rolling out in ChatGPT. The new voice mode will arrive "in the coming weeks." Developers are already supposed to have access to parts of GPT-4o via the API.
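For those who want to try the model from code right away, here is a minimal sketch of a call to GPT-4o through OpenAI's Python SDK, with streaming enabled so the reply is printed as it is generated. The model name gpt-4o and the stream flag are the documented ones; the prompt is just an example.

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Stream the answer token by token instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Explain in one sentence what the o in GPT-4o stands for."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()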
According to OpenAI, GPT-4o's safety has been tested by more than 70 external experts in areas such as social psychology, bias and misinformation.
Will soon reveal the "next big thing"
And the future of OpenAI? Mira Murati was quick to point out that the company will soon give an update on the "next big thing," which everyone assumes is GPT-5. After the presentation, Sam Altman published a short blog post:
"The new voice (and video) mode is the best computer interface I've ever used. It feels like AI from the movies, and it's still a bit surprising to me that it's real. [...] Talking to a computer has never felt really natural to me; now it does. As we add (optional) personalization, access to your information, the ability to take actions on your behalf, and more, I can really see an exciting future where we can use computers to do much more than ever before."
Ny Teknik tested GPT-4o for a few minutes before publishing this text. One thing that became immediately clear was that the new model generates answers significantly faster.