Gemini 3.1 Pro vs Claude Opus — Instruction following drops on complex system prompts. Is this being worked on?

Hey everyone,

I’ve been using Gemini 3.1 Pro High and Claude Opus 4.6 interchangeably as an Ultra user within Antigravity, and I’ve noticed a significant divergence in how they handle system instructions.

My setup uses a dense, 4,000-word system prompt (a “Context Constitution”) covering strict coding standards, file reading policies, and a mandatory multi-step planning workflow.
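For context, outside Antigravity this is roughly equivalent to sending one long system instruction with every request. A minimal sketch of that setup with the google-genai Python SDK, where the model id string and the file name are placeholders I’m using for illustration, not confirmed values:

```python
from google import genai
from google.genai import types

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

# The 4,000-word "Context Constitution" travels as a system instruction.
constitution = open("context_constitution.md").read()

response = client.models.generate_content(
    model="gemini-3.1-pro",  # placeholder id, as referenced in this post
    config=types.GenerateContentConfig(system_instruction=constitution),
    contents="Add rate limiting to the login endpoint. Follow the workflow.",
)
print(response.text)
```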

When I use Claude, it follows nearly every rule meticulously. It goes through the planning phases, reads files fully before editing, and writes comprehensive answers.

When I switch to Gemini using the exact same constraints, adherence drops noticeably. Specifically, Gemini tends to:

  • Skip workflow steps: It often jumps straight to writing code without the mandatory research or planning phases.

  • Write shorter output: It optimizes for speed, producing minimal plans even when explicitly told to be comprehensive.

  • Lose context in longer sessions: It starts strong but frequently forgets system instructions established earlier in the conversation.

  • Execute “helpful overrides”: It sometimes ignores explicit negative constraints (like “never refactor unrelated code”) because it judges its own approach to be better.

I’ve checked the benchmarks, and while Gemini scores well on standard IFEval, independent tests on complex nested constraints (which is essentially what my rules are) show a significant drop, to roughly 78%. That lines up exactly with my real-world experience.

My questions for the team:

  1. Is improving instruction following for dense, complex system prompts on the roadmap for the Gemini 3.x line?

  2. Is there a recommended way to structure large instruction sets to get better compliance out of Gemini? (I’ve tried moving critical rules to the end with limited success).

  3. Would adjustable “effort” or “thoroughness” parameters (similar to thinking budgets) help address this by forcing the model to process instructions more carefully?
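On question 3: the API already exposes a thinking budget, which seems like the closest existing knob to what I’m describing. A minimal sketch with the google-genai Python SDK (the model id is a placeholder, and whether a larger budget actually improves constraint adherence is exactly the open question):

```python
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.1-pro",  # placeholder id
    config=types.GenerateContentConfig(
        system_instruction=open("context_constitution.md").read(),
        # Reserve more tokens for reasoning before the model answers.
        thinking_config=types.ThinkingConfig(thinking_budget=8192),
    ),
    contents="Produce the full research and planning phases before any code.",
)
print(response.text)
```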

I love Gemini’s speed and massive context window, but for structured, rule-heavy workflows, the instruction following gap with Claude is the main blocker. Would love to know if this is on the radar.

Thanks.

Can confirm. Google’s models consistently struggle with complex instruction following, even in fresh sessions with relatively straightforward tasks. I’ve had to completely re-engineer my workflow to deal with the current quota cuts and Gemini’s lack of adherence.

To survive on an Ultra sub, I’ve moved to a ‘Claude-led, Gemini-fed’ pipeline:

  1. Discovery & Epic Drafting (Claude Opus): We break down features into granular, testable specs (.md files) through a 3-stage interview process.

  2. Audit (Claude Opus): A separate session where Claude runs Python-based checks on the specs for security, side effects, and ‘future-proofing’ (a minimal example of this kind of check is sketched after this list). It produces a ‘brain-artifact’ for copy-pasting.

  3. Implementation (Gemini 3.1 High): I feed one tiny spec at a time into a new Gemini session. It still messes up, but it’s manageable for small, isolated tasks.

  4. Verification (Claude Opus): I never let Gemini ‘guard its own henhouse.’ I use Claude in a new session to audit Gemini’s code and sync docs.
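A stripped-down sketch of the kind of Python-based check step 2 runs; the required sections and red-flag patterns here are illustrative placeholders, not the real rule set:

```python
import re
import sys
from pathlib import Path

# Sections every spec .md must contain before it goes to the implementer.
REQUIRED_SECTIONS = ["## Goal", "## Acceptance Criteria", "## Out of Scope"]
# Phrases that suggest an untestable or open-ended spec.
RED_FLAGS = [r"\betc\.?\b", r"\bsomehow\b", r"\bTBD\b"]

def audit_spec(path: Path) -> list[str]:
    text = path.read_text()
    problems = []
    for section in REQUIRED_SECTIONS:
        if section not in text:
            problems.append(f"missing section: {section}")
    for pattern in RED_FLAGS:
        if re.search(pattern, text, re.IGNORECASE):
            problems.append(f"vague language matched: {pattern}")
    return problems

if __name__ == "__main__":
    failed = False
    for spec in sorted(Path("specs").glob("*.md")):
        for problem in audit_spec(spec):
            failed = True
            print(f"{spec}: {problem}")
    sys.exit(1 if failed else 0)
```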

This multi-session approach is the only way I can close 1–2 medium epics per 5-hour window without burning through Claude’s tiny quota or losing my mind over Gemini’s hallucinations.
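For anyone who wants to script steps 3 and 4 instead of copy-pasting, here’s a rough sketch of the per-spec, fresh-session pattern using the google-genai Python SDK; the model id, paths, and prompt wording are all assumptions for illustration:

```python
from pathlib import Path
from google import genai
from google.genai import types

client = genai.Client()
constitution = Path("context_constitution.md").read_text()
Path("out").mkdir(exist_ok=True)

# One fresh chat per spec, so earlier turns can't dilute the instructions.
for spec in sorted(Path("specs").glob("*.md")):
    chat = client.chats.create(
        model="gemini-3.1-pro",  # placeholder id from this thread
        config=types.GenerateContentConfig(system_instruction=constitution),
    )
    reply = chat.send_message(
        "Implement exactly this spec and nothing else:\n\n" + spec.read_text()
    )
    out = Path("out") / f"{spec.stem}.response.md"
    out.write_text(reply.text)
    print(f"{spec.name} -> {out}")
```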

In conclusion

Running every small, testable step in a fresh session, with detailed, dedicated instructions, is the only way I can get Gemini to work close to properly.


Hello @Mohamed_Eldegla @YNd, welcome to AI Forum!
Thank you for bringing these concerns to our attention. Please be assured that I have shared your feedback with our internal team for further review.
We appreciate your continued patience as we work to enhance the Antigravity experience.
