Google’s CEO, Sundar Pichai, recently announced the launch of Gemini, a new large language model (LLM) that promises to revolutionize how we interact with technology. Pichai describes Gemini as the “beginning of a new era of AI at Google” and claims it will eventually affect “practically all of Google’s products.”
What is Gemini?
Gemini is not just one model but a family of three with varying capabilities. The smallest, Gemini Nano, is designed to run offline on Android devices. Gemini Pro is a more powerful version that will power Bard and other Google AI services. The most powerful model, Gemini Ultra, will be used in data centers and for enterprise applications.
When will Gemini be Launched?
Google is rolling out Gemini, its most advanced AI model to date, in phases. First, Bard has been upgraded to run on a fine-tuned version of Gemini Pro. Pixel 8 Pro users will soon enjoy new features enabled by Gemini Nano. Next year, the full power of Gemini Ultra will be unveiled.
Additionally, developers and enterprise customers can access Gemini Pro through Google Cloud platforms starting December 13th. The initial launch is English-only, with broader language support coming soon.
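To make the developer path concrete, here is a minimal sketch of what a text-only call to Gemini Pro might look like, assuming the google-generativeai Python SDK; the API key, model name string, and prompt are illustrative placeholders rather than Google's official onboarding steps.

```python
# Minimal sketch: a text-only request to Gemini Pro via the
# google-generativeai Python SDK (pip install google-generativeai).
import google.generativeai as genai

# Placeholder credential -- a real key comes from Google's developer console.
genai.configure(api_key="YOUR_API_KEY")

# "gemini-pro" is the mid-tier text model described above.
model = genai.GenerativeModel("gemini-pro")

response = model.generate_content("Summarize what makes a model multimodal.")
print(response.text)
```

Enterprise customers would reach the same models through Vertex AI on Google Cloud rather than the standalone SDK.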
However, Sundar Pichai envisions a future where Gemini is seamlessly integrated into Google’s core products, including search, advertising, Chrome, and more, worldwide. This marks a significant moment for Google, ushering in a new era of AI-powered experiences.
How Does Gemini Compare to OpenAI’s GPT-4?
Google claims that Gemini outperforms OpenAI’s GPT-4 on 30 out of 32 benchmarks, with particular advantages in understanding and interacting with video and audio. This is because Gemini was designed as a “multimodal” model, meaning it can process information across text, images, audio, and video rather than text alone.
“We’ve done a very thorough analysis of the systems side by side and the benchmarking,” Hassabis says. Google ran 32 well-established benchmarks comparing the two models, from broad overall tests like the Massive Multitask Language Understanding (MMLU) benchmark to one that compares their ability to generate Python code. “I think we’re substantially ahead on 30 out of 32” of those benchmarks, Hassabis says, with a bit of a smile on his face. “Some of them are very narrow. Some of them are larger.”
In benchmark tests, Gemini demonstrated a clear edge in comprehending and interacting with video, images, and audio. This advantage is by design; multimodality has been central to Google’s Gemini roadmap from day one. Rather than building separate models for visual and voice tasks, as OpenAI did with DALL-E and Whisper, Google architected Gemini from the ground up as a unified, multisensory system.
“We’ve always been interested in highly general systems,” explains Hassabis. A core goal is enabling Gemini to fluidly combine inputs and outputs across text, images, videos, and speech. As Hassabis puts it, “to collect as much data as possible from any number of inputs and senses and then give responses with just as much variety.”
Already, advanced variants like Gemini Ultra can handle images, video, and audio alongside text. Over time, even more modalities will come into play. “There’s still things like action, and touch – more like robotics-type things,” notes Hassabis. As Gemini evolves, the aim is to make it more perceptive and more firmly grounded in the real world. “These models just sort of understand better about the world around them.”
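As a rough illustration of that unified design, here is a hedged sketch of a single request that mixes an image with a text instruction, again assuming the google-generativeai Python SDK; the "gemini-pro-vision" model name and the local image path are assumptions made for the example.

```python
# Sketch: one request combining an image and a text instruction,
# reflecting Gemini's unified multimodal design.
import google.generativeai as genai
from PIL import Image  # pip install pillow

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

# "gemini-pro-vision" is assumed here as the image-capable variant.
model = genai.GenerativeModel("gemini-pro-vision")

image = Image.open("chart.png")  # placeholder local image
response = model.generate_content([image, "Describe what this chart shows."])
print(response.text)
```

The same generate_content call accepts text alone or a mixed list of parts, which is the practical upshot of one unified model instead of separate vision and language systems stitched together.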
Of course, flaws like hallucinations and bias persist. But incrementally expanding Gemini’s understanding is seen as the path ahead. As Hassabis asserts, “the more they know, the better they’ll get.”
Beyond Benchmarks: Real-World Utility
However, benchmarks alone cannot demonstrate Gemini’s full potential. The ultimate test is real-world performance: whether everyday users turn to Gemini for ideating, searching, coding, and more.
Coding seems to be one killer app in Google’s sights. AlphaCode 2, a system built on Gemini, can allegedly outperform 85% of coding-competition participants, a significant jump from roughly 50% for the original AlphaCode.
Still, Pichai foresees Gemini elevating virtually every product it’s integrated into, not just developer tools. The proof will be in the user response across Search, Maps, Docs, and beyond.
Safety and Responsibility
Hassabis stresses Google’s commitment to responsible AI development, pairing extensive internal testing with deliberately staged public releases:
“That’s why you have to release things, to see and learn.”
Google is taking particular care with Gemini Ultra, drawing comparisons to a controlled beta environment. The goal is to discover and resolve issues before wide-scale rollout.
Pichai also highlights the need for security, reliability and oversight with enterprise applications. Google aims to balance innovation with caution as AI progresses towards artificial general intelligence.
Conclusion
Gemini signifies a new era for Google defined by AI-first thinking across consumer and enterprise products.
While ChatGPT may have kickstarted this generative AI gold rush, Google’s formidable resources in chips, data, and talent put it in pole position to lead the next phase. If Gemini delivers on its promise, Google could cement its standing as the world’s AI superpower.