Gemini is a new and powerful artificial intelligence model from Google that can understand not just text but also images, videos, and audio. As a multimodal model, Gemini is described as capable of completing complex tasks in math, physics, and other areas, as well as understanding and generating high-quality code in various programming languages.
It is currently available through integrations with Google Bard and the Google Pixel 8 and will gradually be folded into other Google services.
Also: AI in 2023: A year of breakthroughs that left no human thing unchanged
"Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research," according to Dennis Hassabis, CEO and co-founder of Google DeepMind. "It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across, and combine different types of information including text, code, audio, image, and video."
Gemini was created by Google and Alphabet, Google's parent company, with significant contributions from Google DeepMind, and was released as the company's most advanced AI model to date.
Also: Bing's new Deep Search uses GPT-4 to get you more thorough search results
Google describes Gemini as a flexible model capable of running on everything from Google's data centers to mobile devices. To achieve this flexibility, Gemini is being released in three sizes: Gemini Nano, Gemini Pro, and Gemini Ultra.
Gemini is now available in its Nano and Pro sizes in Google products: Nano runs on the Pixel 8 phone, while Pro powers the Bard chatbot. Google plans to integrate Gemini into Search, Ads, Chrome, and its other services over time.
Also: I asked DALL-E 3 to create a portrait of every US state, and the results were gloriously strange
Developers and enterprise customers will be able to access Gemini Pro via the Gemini API in Google's AI Studio and Google Cloud Vertex AI starting on December 13. Android developers will be able to access Gemini Nano via AICore, which will be available in an early preview.
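For developers who want to try Gemini Pro once the API opens, the call pattern looks roughly like the sketch below. This is a minimal, hypothetical example assuming Google's google-generativeai Python SDK; the API key and prompt are placeholders rather than details from this article.

```python
# Minimal sketch of calling Gemini Pro through the Gemini API.
# Assumes Google's google-generativeai Python SDK
# (pip install google-generativeai); the key and prompt are placeholders.
import google.generativeai as genai

# Configure the client with a key issued through Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Select the Gemini Pro model and send a single text prompt.
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("Explain what makes a model natively multimodal.")

print(response.text)
```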
Google's new Gemini model appears to be one of the largest and most advanced AI models to date, though a definitive verdict will have to wait until the Ultra model is released. Compared to other popular models powering AI chatbots right now, Gemini stands out because it is natively multimodal, whereas other models, like GPT-4, rely on plugins and integrations to be truly multimodal.
Also: Google says Bard is now smarter than ChatGPT, thanks to Gemini update
A comparison chart from Google shows how Gemini Ultra and Pro compare to OpenAI's GPT-4 and Whisper, respectively.
Compared to GPT-4, a primarily text-based model, Gemini performs multimodal tasks natively. While GPT-4 natively excels in language-related tasks like content creation and complex text analysis, it resorts to OpenAI's plugins for image analysis and web access, and it relies on DALL-E 3 and Whisper to generate images and process audio.
Also: The best AI chatbots: ChatGPT and other noteworthy alternatives
Google's Gemini also appears to be more product-focused than other models available now. It is integrated, or slated to be integrated, into the company's own ecosystem, already powering both Bard and Pixel 8 devices. Other models, like GPT-4 and Meta's Llama, are more service-oriented, made available to third-party developers to build applications, tools, and services.