Gemini vs. Grok — Severe Instruction Adherence and Data Extraction Failures in Financial Analysis

To the Gemini Development Team:

This note documents a critical performance gap, relative to Grok, in Gemini’s ability to follow strict analytical guardrails. Recently, Gemini (Pro 3.1, Deep Research) was tasked with running a financial valuation of Gartner, Inc. (NYSE: IT) using the strictly quantitative “McGrew Framework.”

Executive Summary of Performance Issue: Grok (Expert, DeepSearch) completed the analysis accurately in 2.5 minutes. Gemini took over 20 minutes to process the same prompt and still failed in execution: although it reached the same directional “Screaming Buy” conclusion, its analysis contained multiple material violations of the McGrew Framework’s explicit rules.

Below is a breakdown of the specific areas where Gemini failed to adhere to the prompt’s constraints.

1. Data Extraction and Calculation Errors

The framework requires exact line-item extraction for working capital: “Changes in Working Capital must be the net aggregated amount as reported in the ‘Changes in operating assets and liabilities’ section… Do not isolate individual items.”

  • The Error: The 10-K (page 46) explicitly reports this line as –$10.795 million, a net outflow. Gemini instead derived an implied +$204 million “working-capital contribution” by subtracting the sum of Net Income, D&A, and SBC from Operating Cash Flow.

  • The Impact: Because every other non-cash add-back remains in that residual, the calculation folded a $150 million goodwill impairment and other non-cash items into the Working Capital line, violating both the aggregation rule and the explicit non-cash checklist.

  • Additional Error: For the DAROE calculation, Gemini used a trailing ROE of 48.7% instead of the raw 10-K/TTM figure of 102.2% required by the framework’s primary-source data mandate.
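The mechanism behind the +$204 million figure can be illustrated with a small sketch. It assumes the standard indirect-method layout (OCF = net income + non-cash add-backs + change in working capital); the class and field names are hypothetical, and the figures are round illustrative numbers, not Gartner’s reported values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingCashFlow:
    """Indirect-method operating cash flow, all values in $ millions.

    Field names are hypothetical and do not mirror any 10-K or XBRL schema.
    """
    net_income: float
    depreciation_amortization: float
    stock_based_comp: float
    other_non_cash: float          # e.g. goodwill impairment, deferred taxes
    working_capital_change: float  # the single aggregated line as reported

    def total(self) -> float:
        # Indirect method: OCF = NI + non-cash add-backs + change in working capital.
        return (self.net_income + self.depreciation_amortization
                + self.stock_based_comp + self.other_non_cash
                + self.working_capital_change)


def reported_working_capital(cf: OperatingCashFlow) -> float:
    # Framework rule: read the aggregated line exactly as reported; no derivation.
    return cf.working_capital_change


def implied_working_capital(cf: OperatingCashFlow) -> float:
    # The shortcut: treat working capital as a plug backed out of OCF.
    # Every add-back not subtracted here (impairments, deferred taxes, ...)
    # ends up inside the result.
    return cf.total() - (cf.net_income + cf.depreciation_amortization
                         + cf.stock_based_comp)


# Hypothetical round numbers, not Gartner's reported figures.
cf = OperatingCashFlow(net_income=100.0, depreciation_amortization=50.0,
                       stock_based_comp=20.0, other_non_cash=150.0,
                       working_capital_change=-10.0)
print(reported_working_capital(cf))   # -10.0
print(implied_working_capital(cf))    # 140.0
```

The gap between the two results (150.0) equals the `other_non_cash` term exactly, which is how an impairment can surface inside an implied “working-capital contribution” even when the reported line is a small outflow.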

2. Numerical and Presentation Inaccuracies

The framework demands “FAST for structure and transparency” and exact arithmetic.

  • The Error: Gemini’s published McGrew projection table contained an obvious transcription error, listing Year-1 growth as “17.500%”.

  • The Impact: Even though the subsequent per-share dollar figures were mathematically consistent with the correct 7.5% Zacks rate, printing the wrong percentage label undermines the mandated transparency and credibility of the output.
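One way to make this class of error impossible is to derive the printed label and the dollar figure from the same rate value, and to audit published rows by recomputing the implied growth. The sketch below assumes simple annual compounding of a per-share value (the McGrew Framework’s actual projection formula is not reproduced here); the function names and the $10.00 base are hypothetical.

```python
def project_per_share(base: float, rates: list[float]) -> list[tuple[int, float, float]]:
    """Compound a per-share base value by one growth rate per year.

    Rates are decimals (0.075 means 7.5%). Returns (year, rate, value) rows.
    """
    rows = []
    value = base
    for year, rate in enumerate(rates, start=1):
        value *= 1 + rate
        rows.append((year, rate, value))
    return rows


def render_rows(rows: list[tuple[int, float, float]]) -> list[str]:
    # The printed percentage and the dollar figure both come from the same
    # `rate`, so the label cannot drift away from the arithmetic.
    return [f"Year {year}: growth {rate:.1%}, value ${value:,.2f}"
            for year, rate, value in rows]


def label_matches_figures(label_pct: float, prev: float, curr: float,
                          tol: float = 1e-4) -> bool:
    # Audit an already-published table: does the growth implied by two
    # consecutive values agree with the percentage printed beside them?
    return abs((curr / prev - 1) - label_pct / 100) <= tol


rows = project_per_share(10.00, [0.075])   # hypothetical $10.00 base
print(render_rows(rows))                    # ['Year 1: growth 7.5%, value $10.75']
print(label_matches_figures(17.5, 10.00, rows[0][2]))  # False
```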

Key Takeaway & Benchmark Suggestion for Developers:

These errors highlight a critical weakness in Gemini’s current ability to prioritize literal data extraction over mathematical shortcuts. With its rigid rules, explicit negative constraints, and zero tolerance for qualitative hallucinations, the McGrew Framework serves as a highly effective stress test. I strongly suggest that the development team adopt this framework as a standard internal benchmark for evaluating strict instruction adherence and financial accuracy in future Gemini updates.
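As a rough illustration of how such a benchmark could be scored, the sketch below compares a model’s structured output against primary-source values with a per-field tolerance. The field names, the output format, and the 0.1% tolerance are assumptions for illustration; the expected values and Gemini’s values are the ones described in this post (with the $204 million figure as stated, i.e. rounded).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    field: str             # hypothetical field name in a structured model output
    expected: float        # value taken from the primary source
    rel_tol: float = 1e-3  # assumed tolerance; tune per field in a real benchmark


def grade(output: dict[str, float], checks: list[Check]) -> list[str]:
    """Return one message per failed check; an empty list means a pass."""
    failures = []
    for check in checks:
        if check.field not in output:
            failures.append(f"{check.field}: missing from output")
            continue
        got = output[check.field]
        if not math.isclose(got, check.expected, rel_tol=check.rel_tol):
            failures.append(f"{check.field}: expected {check.expected}, got {got}")
    return failures


# Expected values and Gemini's values as described in this post.
checks = [
    Check("working_capital_change_musd", -10.795),
    Check("trailing_roe_pct", 102.2),
    Check("year1_growth_label_pct", 7.5),
]
gemini_output = {
    "working_capital_change_musd": 204.0,
    "trailing_roe_pct": 48.7,
    "year1_growth_label_pct": 17.5,
}
for failure in grade(gemini_output, checks):
    print(failure)
```

Returning a list of messages rather than a single boolean keeps each rule violation visible, which matters when the goal is to measure instruction adherence rather than just the final “Buy” call.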

Here is the McGrew Framework Model Prompt:

https://docs.google.com/document/d/1Qa0AndiXXyFMOmRz4hFk5SJYn8tTvw9a/edit?usp=sharing&ouid=105411172127065439182&rtpof=true&sd=true


Sounds reasonable to me 🙂 Impressive observation!