Massive miss in calculating Social Security benefits: WEP/GPO repeal overlooked, unlike Claude, ChatGPT, and Perplexity

MODEL PERFORMANCE DEFICIENCY REPORT
Topic: Social Security / International Totalization / Legislative Update
Case ID: WEP/GPO Repeal (Social Security Fairness Act of 2023, signed into law January 2025)

ISSUE:
The model failed to prioritize a major federal legislative change (the repeal of WEP/GPO, signed Jan 2025) over its legacy training data on the Windfall Elimination Provision. Even when given a “2026” system date and a “Quality Assurance” prompt designed to catch errors, the model hallucinated a benefit reduction (WEP) that no longer exists under current law.

CRITICAL WEAKNESSES:

  1. Temporal Logic: Failure to reconcile the “current date” (2026) with the status of active legislation.
  2. Cross-Domain Verification: Failure to verify “standard” financial rules against recent legal overrides.
  3. Instruction Following: The model ignored the “QA prompt” discipline, which should have triggered a search for conflicting facts (such as the Social Security Fairness Act).
  4. Hallucination Safeguards: The model did not follow the standard Quality Assurance prompts that the user had asked Gemini to follow to prevent hallucinations.

REQUESTED FIX:
Improve the model’s ‘recency weight’ for federal law and financial regulations. Ensure that financial and legal “standard practices” are cross-referenced against recent legislative changes during the reasoning phase. In more nuanced analyses like this one, also ensure that the model does not hallucinate and that it follows, as baseline behavior, the Quality Assurance prompts a more sophisticated user has asked it to apply.
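
As a rough illustration of the cross-referencing requested above, the following Python sketch checks each "standard" benefit adjustment against a repeal date before applying it. Everything here is an assumption made for illustration: the `Rule` class, the `RULES` registry, and the function names are hypothetical, and the January 5, 2025 enactment date is used as a simple cutoff. The date from which the repeal actually affects benefits should be confirmed against the statute.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    repealed_on: Optional[date] = None  # None means the rule is still in force
    repealed_by: Optional[str] = None


# Hypothetical registry. A real system would load this from an authoritative,
# regularly updated source rather than hard-coding it.
RULES = {
    "WEP": Rule("Windfall Elimination Provision", date(2025, 1, 5),
                "Social Security Fairness Act"),
    "GPO": Rule("Government Pension Offset", date(2025, 1, 5),
                "Social Security Fairness Act"),
}


def applicable(rule_id: str, as_of: date) -> bool:
    """Return True if the rule has not been repealed as of the given date."""
    rule = RULES[rule_id]
    return rule.repealed_on is None or as_of < rule.repealed_on


def check_adjustments(rule_ids: List[str], as_of: date) -> Tuple[List[str], List[str]]:
    """Split candidate rules into those to apply and notes on those skipped."""
    to_apply, skipped = [], []
    for rule_id in rule_ids:
        if applicable(rule_id, as_of):
            to_apply.append(rule_id)
        else:
            rule = RULES[rule_id]
            skipped.append(f"{rule.name} skipped: repealed by {rule.repealed_by}")
    return to_apply, skipped


if __name__ == "__main__":
    # With a 2026 "current date", neither adjustment should be applied.
    to_apply, skipped = check_adjustments(["WEP", "GPO"], date(2026, 1, 1))
    print(to_apply)
    for note in skipped:
        print(note)
```

The point of the sketch is only that the "current date" is an explicit input to the rule check, so a repealed provision is flagged instead of being applied by default.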