Gemini vs. Grok — Severe Instruction Adherence and Data Extraction Failures in Financial Analysis

Peggy_McGrew · February 26, 2026, 2:31am

To the Gemini Development Team:

This note details a critical comparative performance issue regarding Gemini’s ability to follow strict analytical guardrails. Recently, Gemini (Pro 3.1, Deep Research) was tasked with running a financial valuation of Gartner, Inc. (IT) using the strictly quantitative “McGrew Framework.”

Executive Summary of Performance Issue: Grok (Expert, DeepSearch) successfully and accurately completed the analysis in 2.5 minutes. Gemini, however, took over 20 minutes to process the prompt and fundamentally failed the execution. While Gemini arrived at the same directional “Screaming Buy” conclusion, its analysis contained multiple material violations of the McGrew Framework’s explicit rules.

Below is a breakdown of the specific areas where Gemini failed to adhere to the prompt’s constraints.

1. Data Extraction and Calculation Errors

The framework requires exact line-item extraction for working capital: “Changes in Working Capital must be the net aggregated amount as reported in the ‘Changes in operating assets and liabilities’ section… Do not isolate individual items.”

The Error: The 10-K (page 46) explicitly reports this line as an outflow of –$10.795 million. Gemini instead derived an implied +$204 million “working-capital contribution” as a residual, subtracting the sum of Net Income, D&A, and SBC from Operating Cash Flow.
The Impact: This rogue calculation folded a $150 million goodwill impairment and other non-cash add-backs into the Working Capital line, violating both the aggregation rule and the explicit non-cash checklist. Additionally, Gemini used a trailing ROE of 48.7% for the DAROE calculation, which failed to match the raw 10-K/TTM figure of 102.2% required by the primary-source data mandate.
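For anyone who wants to reproduce the gap, here is a minimal sketch of the arithmetic using only the figures quoted above. The function and variable names are my own illustration, not part of the McGrew Framework, and the +$204 million figure looks rounded, so treat the results as approximate.

```python
from decimal import Decimal

# Figures quoted in this post, in millions of USD.
REPORTED_WC_CHANGE = Decimal("-10.795")  # 10-K "Changes in operating assets and liabilities"
IMPLIED_WC_CHANGE = Decimal("204")       # Gemini's residual figure (likely rounded)
GOODWILL_IMPAIRMENT = Decimal("150")     # non-cash add-back named in this post


def residual_working_capital(ocf, net_income, d_and_a, sbc):
    """The shortcut described above: OCF minus only three add-backs.

    Any other non-cash item in the cash flow statement ends up inside
    this residual, which is why it is not a working-capital figure.
    """
    return ocf - (net_income + d_and_a + sbc)


def misattributed_non_cash(implied, reported):
    """Amount the residual overstates the reported working-capital line."""
    return implied - reported


gap = misattributed_non_cash(IMPLIED_WC_CHANGE, REPORTED_WC_CHANGE)
other_items = gap - GOODWILL_IMPAIRMENT

print(f"Non-cash items folded into working capital: {gap}M")
print(f"  of which goodwill impairment: {GOODWILL_IMPAIRMENT}M")
print(f"  of which other add-backs (approx.): {other_items}M")
```

On these numbers, roughly $215 million of non-cash items ended up in the working-capital line, of which the goodwill impairment accounts for $150 million.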

2. Numerical and Presentation Inaccuracies

The framework demands “FAST for structure and transparency” and exact arithmetic.

The Error: Gemini’s published McGrew projection table contained an obvious transcription error, listing Year-1 growth as “17.500%”.
The Impact: Even though the subsequent per-share dollar figures were mathematically consistent with the correct 7.5% Zacks rate, printing the wrong percentage label undermines the mandated transparency and credibility of the output.
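A label error like this is easy to catch mechanically. The sketch below recomputes the growth rate from two consecutive per-share values and compares it with the printed label. The base value of 100.0 is a hypothetical placeholder, not a figure from the Gartner analysis, and the function names and tolerance are my own assumptions.

```python
def implied_growth_pct(prior, current):
    """Growth rate, in percent, implied by two consecutive values."""
    return (current / prior - 1.0) * 100.0


def label_matches(label, prior, current, tol_pct=0.01):
    """True if a printed label such as '7.500%' agrees with the figures."""
    stated = float(label.strip().rstrip("%"))
    return abs(stated - implied_growth_pct(prior, current)) <= tol_pct


# Hypothetical placeholder values; only the 7.5% rate comes from the post.
base = 100.0
year1 = base * 1.075

print(label_matches("17.500%", base, year1))  # the label as printed
print(label_matches("7.500%", base, year1))   # the label the figures support
```

Running a check of this kind over every row of the projection table before publishing would flag a label that disagrees with its own dollar figures.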

Key Takeaway & Benchmark Suggestion for Developers:

These errors highlight a critical weakness in Gemini’s current ability to prioritize literal data extraction over mathematical shortcuts. With its rigid rules, explicit negative constraints, and zero tolerance for qualitative hallucinations, the McGrew Framework serves as a highly effective stress test. I strongly suggest the development team use this framework as a standard internal benchmark to evaluate strict instruction adherence and financial accuracy in future Gemini updates.

Here is the McGrew Framework Model Prompt:

https://docs.google.com/document/d/1Qa0AndiXXyFMOmRz4hFk5SJYn8tTvw9a/edit?usp=sharing&ouid=105411172127065439182&rtpof=true&sd=true

Herbst1984 · February 26, 2026, 3:37pm

Sounds reasonable to me. Impressive observation!
