On March 25, Google unveiled Gemini 2.5, a cutting-edge family of AI reasoning models designed to pause and “think” before responding to a query.
Leading this new lineup, the company is launching Gemini 2.5 Pro Experimental—its most advanced multimodal reasoning AI model to date.
The model is available starting March 25 on Google’s developer platform, Google AI Studio, and in the Gemini app for subscribers to the company’s premium AI service, Gemini Advanced, priced at $20 per month.
Looking ahead, Google plans to integrate reasoning capabilities into all of its future AI models.
Since OpenAI introduced the first AI reasoning model, o1, in September 2024, tech companies have been in a race to match or surpass its performance.
Today, firms like Anthropic, DeepSeek, Google, and xAI have all developed reasoning AI models, leveraging additional computational resources to verify facts and process information more thoroughly before generating responses.
AI reasoning techniques have significantly improved performance in areas like mathematics and coding. Many experts believe these models will be fundamental in powering AI agents—autonomous systems capable of executing tasks with minimal human involvement. However, the tradeoff is that these models come with higher operational costs.
Google has previously explored reasoning-based AI, releasing an experimental “thinking” version of Gemini in December. However, Gemini 2.5 marks the company’s most ambitious effort yet to outperform OpenAI’s “o” series.
According to Google, Gemini 2.5 Pro surpasses its earlier frontier AI models and even some top competitors in various benchmarks. Specifically, it is designed to excel at generating visually engaging web applications and agentic coding solutions.
In the Aider Polyglot benchmark, which evaluates code editing performance, Gemini 2.5 Pro achieves a score of 68.6%, outpacing leading AI models from OpenAI, Anthropic, and China-based DeepSeek.
Google also demonstrated Gemini 2.5 Pro generating a playable video game from a single-line prompt.
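For developers curious what such a one-prompt request looks like, here is a minimal sketch against Google’s REST generateContent endpoint. The prompt text is invented for illustration, and the experimental model ID and payload shape should be checked against Google’s current API documentation before use.

```python
import json
import os
import urllib.request

# Hypothetical one-line prompt, in the spirit of Google's demo.
PROMPT = "Make me an endless runner game in a single HTML file."

# Experimental model ID assumed from Google's naming; verify in AI Studio.
MODEL = "gemini-2.5-pro-exp-03-25"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/"
    f"models/{MODEL}:generateContent"
)

def build_request(prompt: str) -> dict:
    """Build the generateContent JSON payload for a single text prompt."""
    return {"contents": [{"parts": [{"text": prompt}]}]}

payload = build_request(PROMPT)

# Only call the API if a key is configured in the environment.
api_key = os.environ.get("GOOGLE_API_KEY")
if api_key:
    req = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
        # The generated game code arrives in the first candidate's text part.
        print(reply["candidates"][0]["content"]["parts"][0]["text"])
```

The sketch builds the request payload separately from the network call, so the structure can be inspected (or unit-tested) without an API key.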
In the SWE-bench Verified test, which assesses software development capabilities, Gemini 2.5 Pro scores 63.8%. While it outperforms OpenAI’s o3-mini and DeepSeek’s R1, it falls short of Anthropic’s Claude 3.7 Sonnet, which leads with 70.3%.
On Humanity’s Last Exam—a multimodal test covering thousands of crowdsourced questions across mathematics, humanities, and natural sciences—Gemini 2.5 Pro scores 18.8%, surpassing most competing flagship AI models.
At launch, Google is equipping Gemini 2.5 Pro with a context window of 1 million tokens, enabling it to process approximately 750,000 words in a single instance—longer than the entire Lord of the Rings book series. The company also plans to double this capacity soon, expanding the input length to 2 million tokens.
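The 750,000-word figure follows from the common rule of thumb that one token corresponds to roughly 0.75 English words; the ratio below is that general approximation, not a Google-published conversion.

```python
def tokens_to_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Estimate word capacity from a token budget using a rule-of-thumb ratio."""
    return int(tokens * words_per_token)

# Current 1M-token window vs. the planned 2M-token expansion.
print(tokens_to_words(1_000_000))  # 750000
print(tokens_to_words(2_000_000))  # 1500000
```

By the same ratio, the planned 2-million-token window would accommodate roughly 1.5 million words of input.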
Google has yet to disclose API pricing for Gemini 2.5 Pro, but stated that more details will be shared in the coming weeks.