Google’s annual Google I/O developer conference took place on Tuesday, May 14th, and as anticipated, the event was dominated by announcements related to artificial intelligence (AI).
The term “AI” was uttered a staggering 121 times throughout the proceedings, reflecting the company’s intense focus on this transformative technology.
Unsurprisingly, the spotlight shone brightly on Google’s Gemini AI models and their integration into popular applications like Workspace and Chrome. The company showcased how these advanced AI systems are enhancing user experiences and driving innovation across various product offerings.
However, unlike previous I/O events, hardware announcements were conspicuously absent this year. Rumors had circulated about potential sneak peeks at the upcoming Pixel 9 series or the highly anticipated Pixel Fold 2, but neither materialized during the event.
For those who missed the live stream, a comprehensive roundup of all the latest Google announcements is available below.
Project Astra
The tech giant pulled no punches in its battle against OpenAI’s GPT-4o, unveiling an innovative and ambitious project called Astra. This universal AI agent is designed to be a comprehensive assistant for everyday life tasks, leveraging the power of your phone’s camera and voice recognition capabilities to deliver seamless and intuitive responses.
In an impressive demonstration, Google showcased Project Astra in action using smart glasses, providing a glimpse into the future of wearable AI assistants. While the initial rollout will focus on smartphones, where the agent will be known as Gemini Live, Google has made it clear that Project Astra could expand to other form factors over time.
The demo presented at Google I/O 2024 left attendees in awe, showcasing the remarkable capabilities of this cutting-edge technology.
According to Google, Project Astra can understand and respond to the world in a manner akin to humans, taking in and remembering what it sees and hears to comprehend context and take appropriate actions. Users can engage with Astra through natural speech, without experiencing any noticeable lag or delay.
Powering Project Astra is a combination of Google’s Gemini model and other task-specific models, which continuously process video and speech input so the agent can respond without delay. This real-time processing capability is a significant advantage, enabling Astra to provide relevant, contextual responses in a timely manner.
Supercharging Gemini with Faster, More Capable Models
One of the biggest highlights was the introduction of the new Gemini 1.5 Flash, a powerful multimodal AI model optimized for “narrow, high-frequency, low-latency tasks.” This means that Gemini 1.5 Flash can provide even faster responses, making it ideal for real-time applications and seamless user experiences.
In addition to the new Flash model, Google also announced significant upgrades to the existing Gemini 1.5. These improvements aim to enhance Gemini’s capabilities in areas such as translation, reasoning, and coding.
Notably, Google has doubled Gemini 1.5 Pro’s context window from 1 million to 2 million tokens, allowing it to process and comprehend larger amounts of information at once.
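To put a 2-million-token window in perspective, here is a rough back-of-the-envelope estimate in Python. The 4-characters-per-token and 1,800-characters-per-page figures are common ballpark assumptions, not Gemini’s actual tokenizer behavior:

```python
# Rough capacity estimate for a 2-million-token context window.
# Assumes ~4 characters per token and ~1,800 characters per printed page.
# These are ballpark figures for illustration, not Gemini's real tokenizer.

TOKENS = 2_000_000
CHARS_PER_TOKEN = 4
CHARS_PER_PAGE = 1_800

total_chars = TOKENS * CHARS_PER_TOKEN   # 8,000,000 characters
pages = total_chars / CHARS_PER_PAGE     # roughly 4,400 pages

print(f"~{pages:,.0f} pages of text")
```

Under these assumptions, the upgraded window holds on the order of several thousand pages of text in a single prompt, which is why Google frames it as processing entire codebases or hours of video at once.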
Ask Photos
Google will soon be introducing a game-changing feature that promises to revolutionize how we interact with our cherished photo memories. Dubbed “Ask Photos,” this innovative tool leverages the power of Gemini to pore over your Google Photos library, answering your queries in ways that go far beyond simply surfacing pictures of pets or landscapes.
During the I/O keynote, CEO Sundar Pichai demonstrated the incredible potential of “Ask Photos” by posing a seemingly mundane question to Gemini: “What is my license plate number?” In an instant, the AI responded with the precise alphanumeric sequence, accompanied by a visual confirmation – a picture of Pichai’s license plate captured from within his photo collection.
This seamless integration of AI and personal photo libraries opens up a world of possibilities. Imagine effortlessly recalling the name of a restaurant you visited years ago by simply asking Gemini to identify it from a snapshot, or quickly locating a family heirloom without sifting through thousands of images, just by describing the object to the AI.
Gemini Joins Users in Workspace
Starting next month, Google will roll out its latest mainstream language model, Gemini 1.5 Pro, into the sidebar for popular Workspace applications like Docs, Sheets, Slides, Drive, and Gmail.
For paid subscribers, this integration will transform Gemini into a versatile, general-purpose AI assistant within Workspace.
Gemini’s capabilities within Workspace will be extensive. It can fetch information from any content stored in your Google Drive, regardless of which application you’re currently using. Additionally, Gemini can assist with tasks like writing emails that incorporate information from documents you’re currently viewing or reminding you to respond to important emails you’ve been perusing.
Veo: Google’s Answer to OpenAI’s Sora
In response to OpenAI’s Sora, Google unveiled Veo, a new generative AI model designed to create high-quality 1080p videos based on text, image, and video prompts. Veo offers a range of video styles, including aerial shots and time-lapses, and allows users to refine the output with additional prompts.
Google is already offering Veo to select creators for use in YouTube videos, but the company is also pitching this powerful video generation tool to Hollywood for potential use in films and other professional video productions.
Gems: Enabling Custom Chatbot Creation with Gemini
Inspired by OpenAI’s GPTs, Google is rolling out Gems, a custom chatbot creator that allows users to provide instructions to Gemini and customize its responses and specialties.
Whether you want to create a positive and motivating running coach or a subject matter expert in a specific field, Gems will enable you to tailor Gemini’s persona and knowledge to suit your needs.
Gemini Live: A More Natural Conversation Partner
Google introduced Gemini Live, a feature designed to make voice chats with Gemini feel more natural and lifelike.
Gemini’s voice will be updated with additional personality and nuance, and users will be able to interrupt it mid-sentence or ask it to analyze real-time footage from their smartphone’s camera and provide relevant information.
Furthermore, Gemini is receiving new integrations with Google Calendar, Tasks, and Keep, leveraging its multimodal capabilities to update or draw information from these applications seamlessly.
Circle to Search Gets a Math Update
Circle to Search can now help solve math problems. By simply circling a math problem on their screen, users can get step-by-step guidance from Google’s AI to break down the problem and make it easier to solve.
However, the AI will not provide the complete solution, ensuring that students cannot use it to cheat on their homework.
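As a toy illustration of what step-by-step guidance without a final answer can look like (this is a hand-written sketch, not Google’s actual model), the snippet below walks through a simple linear equation and stops short of revealing the value of x:

```python
# Illustrative only: a toy step-by-step breakdown of a linear equation,
# in the spirit of Circle to Search's guided hints. Not Google's method.

def solve_steps(a, b, c):
    """Return guidance steps for a*x + b = c, stopping short of the answer."""
    return [
        f"Start with the equation: {a}x + {b} = {c}",
        f"Subtract {b} from both sides: {a}x = {c - b}",
        f"Divide both sides by {a} to isolate x.",
    ]

for step in solve_steps(3, 5, 20):
    print(step)
```

The final step tells the student what operation to perform but leaves the last calculation to them, mirroring the feature’s stated goal of guiding rather than answering.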
AI Overviews: Revolutionizing Google Search
Google is rolling out “AI Overviews” (formerly known as “Search Generative Experience”) to all users in the United States this week.
This feature utilizes a specialized Gemini model to design and populate search results pages with summarized answers from the web, similar to what users experience with AI search tools like Perplexity or Arc Search.
AI-Powered Scam Detection on Android
Leveraging on-device Gemini Nano AI capabilities, Google announced that Android phones will soon be able to help users avoid potential scam calls.
The AI will analyze incoming calls for red flags, such as common scammer conversation patterns, and provide real-time warnings to users, alerting them to potential scams.
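Conceptually, the simplest form of this kind of pattern flagging resembles the keyword check below. The phrase list and logic are invented for illustration; Google’s on-device Gemini Nano model analyzes conversations far more deeply than any fixed word list could:

```python
# A naive illustration of flagging scam-like phrases in a call transcript.
# The pattern list is invented for this sketch; it is not Google's approach,
# which relies on an on-device language model rather than keyword matching.

SCAM_PATTERNS = [
    "gift card",
    "wire transfer",
    "verify your social security",
    "act immediately",
]

def looks_like_scam(transcript: str) -> bool:
    """Return True if the transcript contains a known scam-like phrase."""
    text = transcript.lower()
    return any(pattern in text for pattern in SCAM_PATTERNS)

print(looks_like_scam("Your account is locked, pay with a gift card now"))  # True
```

Because the real analysis runs on-device via Gemini Nano, the audio never needs to leave the phone, which is central to Google’s privacy pitch for the feature.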
Smarter AI on Android Devices
In the coming months, Google plans to further enhance the AI capabilities of Android devices. Gemini will soon allow users to ask questions about videos playing on their screens, and the AI will provide answers based on automatic captions.
For paid Gemini Advanced subscribers, Gemini will also gain the ability to ingest and process PDF documents, enabling users to ask questions and retrieve relevant information from their PDF files directly on their Android devices.
Google Chrome Gets an AI Assistant
Google announced that it is adding Gemini Nano, a lightweight version of its Gemini model, to the Chrome desktop browser.
This built-in AI assistant will leverage on-device processing to help users generate text for various purposes, such as social media posts, product reviews, and more, directly within the Chrome browser.
Upgraded SynthID AI Watermarking
Google’s SynthID AI watermarking technology, which helps identify AI-generated content, has gained new capabilities. SynthID will now embed watermarks into content created with Veo, Google’s new video generator, and it can also detect AI-generated videos, further strengthening its ability to identify synthetic media.
While there were no hardware announcements at this year’s Google I/O event, the focus on artificial intelligence and the numerous upgrades and new features showcased Google’s continued commitment to pushing the boundaries of AI and its integration into various products and services.
As AI becomes increasingly prevalent in our daily lives, Google’s advancements in this field position the company as a leading innovator, offering users more intelligent, efficient, and personalized experiences across its ecosystem.