Technology
Artificial intelligence with agents that can get things done for the user: this is what the major AI giants are racing toward right now. And now Google has given us some real glimpses of that future.
During Tuesday night's two-hour I/O developer conference, Google used the term AI more than 120 times. It was easy to be impressed by all the Gemini models and their increasingly deep integration into Google's most popular products.
But two things stood out: Gemini Live and Project Astra.
Gemini Live is a chatbot that can listen, see and talk. Point a phone camera at the world and Gemini Live not only sees what the user sees, it can also explain and analyze it.
In a video that Google insists is genuine (unlike last year's blunder), a user walked through an office environment with his phone held out in front of him. Gemini Live was tasked with speaking up when it saw something capable of producing sound. A speaker came into view and the chatbot quickly identified it. Next it was shown code on a computer screen and asked to analyze what the lines do. Then, with the camera pointed out the window, Gemini Live concluded that the office must be in London's King's Cross (home of Google Deepmind, which develops Gemini).
Perhaps most surprising of all: the chatbot can retrospectively recall information about things the phone camera only captured in passing. The user asked if Gemini Live had seen where his glasses ended up. "Yes, they are on the desk near the red apple."
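Gemini Live itself was shown only as a demo, but the public Gemini API already accepts this kind of image-plus-question prompt. A minimal sketch using the google-generativeai Python package, where the API key and the captured frame file are placeholders:

```python
# Sketch: ask a Gemini model about a single captured camera frame.
# "frame.jpg" stands in for a frame grabbed from the phone camera,
# and YOUR_API_KEY is a placeholder.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")

frame = Image.open("frame.jpg")
response = model.generate_content(
    [frame, "Can you see a pair of glasses in this picture? Where?"]
)
print(response.text)
```

The retrospective memory trick in the demo goes beyond a single request like this; it requires streaming video and some form of rolling recall, which Google has not detailed.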
A new model for smart glasses
Gemini Live is a glimpse of what Google intends to achieve with what the company calls Project Astra: a universal AI assistant meant to become our constant companion. In another part of the demo, the user wore smart glasses (reminiscent of the prototype Google showed off a few years ago, though, as The Verge notes, this is a new pair). The idea is that the chatbot and the user see the world from the same point of view.
But Project Astra is far more ambitious than that. In the long term, Google envisions our AI assistants doing work for us. Yes, just like OpenAI and Anthropic, the search giant is betting big on agentic AI.
"I think of them as intelligent systems that show the ability to reason, plan and remember. They can 'think' many steps ahead, act across different software and systems, and get things done for you, under your direction," said CEO Sundar Pichai from the stage at Shoreline Amphitheatre in Mountain View, California.
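Pichai offered no implementation details, but the loop he describes (plan, act through tools, remember the results) is roughly the pattern today's agent frameworks follow. A minimal, hypothetical sketch in Python, where the planner function and the tool registry are stand-ins rather than any real Google API:

```python
# Hypothetical skeleton of an agentic loop: the model plans a step,
# calls a tool, stores the result in memory, and repeats until done.
# "llm_plan_next_step" and "tools" are stand-ins, not a real API.

def run_agent(goal, tools, llm_plan_next_step, max_steps=10):
    memory = []  # the "remember" part: results of earlier steps
    for _ in range(max_steps):
        step = llm_plan_next_step(goal, memory)  # the "think/plan" part
        if step["action"] == "finish":
            return step["answer"]
        tool = tools[step["action"]]             # the "act" part
        result = tool(**step["arguments"])       # e.g. search, calendar, mail
        memory.append({"step": step, "result": result})
    return "Gave up after max_steps."
```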
Loaded with news
Had Google offered only Gemini Live and Project Astra, the company could have gotten away with that. But this year's I/O delivered a torrent of news. To give just a sample:
- Google will let its AI agents "google for you." Partly in response to nimble startups like Perplexity (and rumors that OpenAI is building an AI-based search engine), the Google search experience will soon look very different. AI Overviews will answer queries and provide helpful links. With new Google Lens functionality, you will be able to search using video. And a new tool will use AI to organize and categorize search results.
- Google Photos can of course already search your photo archive, but in the future, when the Photos tool becomes Gemini-based, searches can be more detailed and Google Photos will better understand context.
- The AI colleague. Imagine that you and your coworkers get a new Slack groupmate: an AI assistant that keeps up with projects and tracks who said what and when, the status of various tasks, and so on. That is the idea behind AI Teammate.
- Veo is the name of Google's newest video generator, which succeeds the much-publicized Lumiere. Imagen 3 is the latest version of the company's image generator.
- Workspace, the collective name for Google's productivity apps, will be infused with Gemini. Users will be able to connect their data from Gmail, Calendar and Drive, for example, and ask Gemini for help keeping everything in order. Going forward, Google hopes that much of the work now done manually can be automated thanks to Gemini.
- The perhaps most overlooked piece of news from everything shown at I/O deserves a better fate. When Gemini is let loose on the content of the note-taking app NotebookLM, it can generate audio from the material: a kind of podcast in which two AI voices discuss the contents of the notes. A podcast the user can jump into and join at any time.
- And then, of course, there were the AI models themselves. Gemini 1.5 Pro can now accept two million tokens of input, equivalent to roughly 2 hours of video, 22 hours of audio, more than 60,000 lines of code, or 1.4 million words. Another model, Gemini 1.5 Flash, is said to offer the same basic capabilities as the Pro but is optimized for faster response times.
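For developers, the practical consequence of the two-million-token window is that entire books or codebases fit in a single prompt. A small sketch, again with the google-generativeai package, of checking that a document fits before sending it (the file path is a placeholder):

```python
# Sketch: verify a large document fits in Gemini 1.5 Pro's context
# window before sending it. The path is a placeholder; assumes genai
# is configured with an API key as in the earlier example.
import google.generativeai as genai

model = genai.GenerativeModel("gemini-1.5-pro")
document = open("entire_codebase.txt", encoding="utf-8").read()

token_count = model.count_tokens(document).total_tokens
print(f"Document is {token_count} tokens")

if token_count <= 2_000_000:  # the 2M-token window announced at I/O
    response = model.generate_content([document, "Summarize this codebase."])
    print(response.text)
```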