The content on this page was provided by an independent third party and syndicated by XPR Media. Members of the editorial and news staff of the USA TODAY Network were not involved in the creation of this content.

Gemini 3 Flash Crushes ChatGPT-5.2 in Accuracy Test – ORCA Benchmark Update

New ORCA results show Gemini leading in practical math, but no AI matches the consistency of a simple calculator.

“Calculators are predictable, always giving the same answer. AI is different; mathematically, a model can get a question right today and wrong tomorrow.”
— Dawid Siuda, Researcher at ORCA

KRAKOW, POLAND, March 3, 2026 /EINPresswire.com/ — The results are in for the second ORCA (Omni Research on Calculation in AI) Benchmark, and the leaderboard looks very different than it did two months ago. Gemini 3 Flash has surged to the top, becoming the first model to solve nearly three-quarters of real-world math and logic problems correctly.

While the industry often focuses on academic tests, the ORCA Benchmark uses 500 practical questions built around the kind of “messy” math people actually deal with every day. In this latest run, Gemini 3 Flash hit an accuracy rate of 72.8%, a significant jump from its previous performance. ChatGPT-5.2 and DeepSeek V3.2 showed modest, steady gains, while Grok 4.1 saw its scores slip.

The “Calculator” Problem
Despite the high scores, the study highlights a lingering frustration for AI users: inconsistency. Unlike a standard calculator, which gives the same answer every time, these AI models are probabilistic. They “predict” answers rather than calculating them through fixed rules.
The ORCA team tracked this using a new “instability metric.” It turns out that when models are wrong, they aren’t even consistently wrong.
– ChatGPT-5.2 changed its answer on 65% of persistent errors.
– Gemini 3 Flash proved more “stubborn,” changing only 46% of the time.
– DeepSeek was the most erratic, shifting its response 69% of the time.
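
The release does not spell out how the instability metric is computed. One plausible reading, sketched below in Python, treats a "persistent error" as a question a model answers incorrectly in two separate runs, and reports the share of those questions where the two wrong answers differ. The function name, the two-run setup, and the toy data are assumptions for illustration, not ORCA's published method.

```python
def instability_rate(first_run, second_run, correct):
    """Share of persistent errors whose answer changed between two runs.

    Assumption: a persistent error is a question answered wrongly in BOTH runs.
    """
    persistent = [
        q for q in correct
        if first_run[q] != correct[q] and second_run[q] != correct[q]
    ]
    if not persistent:
        return 0.0
    changed = sum(1 for q in persistent if first_run[q] != second_run[q])
    return changed / len(persistent)


# Made-up toy data: four questions, answers kept as strings.
correct = {"q1": "12.5", "q2": "40", "q3": "7", "q4": "3.2"}
first_run = {"q1": "12.5", "q2": "42", "q3": "9", "q4": "3.0"}
second_run = {"q1": "12.5", "q2": "42", "q3": "8", "q4": "3.1"}

rate = instability_rate(first_run, second_run, correct)
print(f"Instability on persistent errors: {rate:.0%}")
```

In this toy data, three questions are wrong in both runs and two of them get a different wrong answer the second time, so the rate is two out of three. Under this reading, a lower figure would mean a model repeats the same mistake, while a higher one would mean its mistakes shift from run to run.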

“A calculator is predictable. Ask it the same question today or next year, and the answer stays the same,” says Dawid Siuda, researcher at ORCA. “AI doesn’t work that way. These systems are predicting the next likely word based on patterns. Mathematically, it’s possible for a model to get a question right today and wrong tomorrow.”
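
The contrast Siuda describes can be illustrated with a toy Python sketch: a fixed-rule function always returns the same product, while a sampler that picks among candidate answers by probability can return different answers on different calls. The candidate answers and their probabilities are invented for illustration and do not describe how any of the tested models works internally.

```python
import random


def calculator(a, b):
    """Fixed rule: the same inputs always give the same output."""
    return a * b


def sampled_answer(candidates, weights, rng):
    """Pick one candidate answer at random, weighted by its (made-up) probability."""
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical candidates for "17 x 23"; only "391" is correct.
candidates = ["391", "381", "401"]
weights = [0.7, 0.2, 0.1]

rng = random.Random()  # unseeded, so repeated runs can differ
answers = [sampled_answer(candidates, weights, rng) for _ in range(10)]

print("Calculator:", calculator(17, 23))
print("Sampled answers:", answers)
```

Running the script several times typically shows the calculator line unchanged and the sampled list varying, which is the behaviour the quote points to.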

Winners and Losers by Category
The progress wasn’t even across the board. Gemini saw massive gains in biology, chemistry, and physics, but actually dropped slightly in engineering. DeepSeek V3.2, now out of its “alpha” phase, saw its biology scores skyrocket from 11% to 44%. On the flip side, Grok 4.1 struggled, losing ground in health, sport, and statistics.

What This Means for Users
The data shows that while AIs are getting better at rounding numbers and formatting results, they still trip up on core arithmetic. Calculation errors now account for 39.8% of all mistakes made across the models tested.

The takeaway? AI is a powerful assistant, but it’s not a replacement for a human eye or a calculator.

About ORCA Benchmark
The ORCA Benchmark (Omni Research on Calculation in AI) is an initiative by Omni Calculator designed to provide a genuine assessment of the mathematical capabilities of today’s leading AI chatbots. For our second iteration, we tested four prominent models: ChatGPT-5.2 (OpenAI), Gemini 3 Flash (Google), Grok 4.1 (xAI), and DeepSeek V3.2.
Our methodology prioritizes accessibility; we only test new models that offer a free tier to the public. This is why Anthropic’s Claude 4.5 Sonnet was not retested this round (as it has not been updated since our first report), and why DeepSeek V3.2 was included again—while the name remains the same, the model has transitioned from an alpha version to a stable release, resulting in a performance jump of over 3 percentage points. By avoiding paid-tier or private prototypes, ORCA provides a genuine assessment of the mathematical tools available to the average user right now.

For the full report, visit https://www.omnicalculator.com/reports/orca-ai-benchmark-2026-update

Reyhaneh Mansouri
Omni Calculator sp. z o.o.
+48 730 061 124

Legal Disclaimer:

EIN Presswire provides this news content “as is” without warranty of any kind. We do not accept any responsibility or liability for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this article. If you have any complaints or copyright issues related to this article, kindly contact the author above.

