Hey everyone,
I’ve been using Gemini 3.1 Pro High and Claude Opus 4.6 interchangeably as an Ultra User within Antigravity, and I’ve noticed a significant divergence in how they handle system instructions.
My setup uses a dense, 4,000-word system prompt (a “Context Constitution”) covering strict coding standards, file reading policies, and a mandatory multi-step planning workflow.
When I use Claude, it follows nearly every rule meticulously. It goes through the planning phases, reads files fully before editing, and writes comprehensive answers.
When I switch to Gemini using the exact same constraints, adherence drops noticeably. Specifically, Gemini tends to:
- Skip workflow steps: It often jumps straight to writing code without the mandatory research or planning phases.
- Write shorter output: It optimizes for speed, producing minimal plans even when explicitly told to be comprehensive.
- Lose context in longer sessions: It starts strong but frequently forgets system instructions established earlier in the conversation.
- Execute “helpful overrides”: It sometimes ignores explicit negative constraints (like “never refactor unrelated code”), apparently because it judges its own approach to be better.
I’ve checked the benchmarks, and while Gemini scores well on standard IFEval, independent tests on complex nested constraints (which is basically what my rules are) show a significant drop, to roughly 78%. This lines up with my real-world experience.
My questions for the team:
- Is improving instruction following for dense, complex system prompts on the roadmap for the Gemini 3.x line?
- Is there a recommended way to structure large instruction sets to get better compliance out of Gemini? (I’ve tried moving critical rules to the end, with limited success.)
- Would adjustable “effort” or “thoroughness” parameters (similar to thinking budgets) help address this by forcing the model to process instructions more carefully?
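To make the second question concrete, here is a minimal sketch of the “critical rules at the end” layout. Everything in it is hypothetical and for illustration only: the `Rule` class, the `critical` flag, the `build_system_prompt` helper, and the section headings are assumptions, not anything specific to Gemini or Antigravity, and the example rules are just paraphrased from the constraints mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One instruction in the system prompt (hypothetical structure)."""
    text: str
    critical: bool = False


def build_system_prompt(rules: list[Rule]) -> str:
    """Group rules into two sections, placing critical rules last."""
    general = [r.text for r in rules if not r.critical]
    critical = [r.text for r in rules if r.critical]

    sections = []
    if general:
        body = "\n".join(f"- {t}" for t in general)
        sections.append("## General rules\n" + body)
    if critical:
        body = "\n".join(f"- {t}" for t in critical)
        sections.append("## Critical rules (must never be violated)\n" + body)
    return "\n\n".join(sections)


# Illustrative rules only; the real prompt is far longer.
rules = [
    Rule("Never refactor unrelated code.", critical=True),
    Rule("Read files fully before editing."),
    Rule("Complete the planning phase before writing code.", critical=True),
]
prompt = build_system_prompt(rules)
print(prompt)
```

Whether this kind of ordering should make a difference for Gemini is exactly what I’m asking; in my experience it has helped only a little.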
I love Gemini’s speed and massive context window, but for structured, rule-heavy workflows, the instruction-following gap with Claude is the main blocker. I’d love to know if this is on the radar.
Thanks.
