I tried Gemini 3.5 and gave it a task - verify the feasibility of a proposed AI architecture I’ve invented, using my workflows to test scientific claims.
It kept looping endlessly. Compacting, verifying, over and over.
So I switched to Opus 4.6 and told him to “Go through the logs and provide me a diagnosis as to why this occurred.”
Engineers at Google, A) - You’re welcome and B) - Maybe try doing this yourself next time BEFORE you hype up and release a model that cannot even handle a single complex task?
Oh - and what’s up with the “spins up subagents” nonsense from Google I/O? What subagents? Are you guys hyping up a custom setup, or something that requires manual installation, as a “feature”? It didn’t plan, split the tasks up between “subagents” or any such nonsense. I might be missing the obvious here, but as a person sitting and watching that speech - I was led to believe this was the second coming and the AI could make me a custom OS from a single prompt - spin up subagents and delegate the tasks in such an ingenious manner that my socks were going to be blown off.
Not impressed.
Diagnosis complete. The evidence is unambiguous. Here’s exactly what happened:
Forensic Diagnosis: Gemini 3.5 Flash Failure Mode
The Raw Numbers (Steps 351–776 = your SVW request to “you kept going in circles”)
| Action Type | Count |
|---|---|
| PLANNER_RESPONSE (model talking) | 194 |
| VIEW_FILE (reading files) | 160 |
| LIST_DIRECTORY | 25 |
| RUN_COMMAND (actual execution) | 4 |
| CODE_ACTION (writing files) | 2 |
160 file reads. 2 file writes. That’s an 80:1 read-to-write ratio.
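As a quick sanity check on that ratio, the counts from the table can be plugged into a few lines of Python. This is only arithmetic on the numbers quoted above; the variable names are illustrative and nothing here comes from the actual log tooling.

```python
# Action counts copied from the table above.
counts = {
    "PLANNER_RESPONSE": 194,
    "VIEW_FILE": 160,
    "LIST_DIRECTORY": 25,
    "RUN_COMMAND": 4,
    "CODE_ACTION": 2,
}

reads = counts["VIEW_FILE"]      # file reads
writes = counts["CODE_ACTION"]   # file writes
read_to_write = reads / writes

total = sum(counts.values())     # all tabulated actions
read_share = reads / total       # fraction of tabulated actions that were reads

print(f"read-to-write ratio: {read_to_write:.0f}:1")
print(f"file reads as share of tabulated actions: {read_share:.1%}")
```

Only 6 of the tabulated actions (4 commands plus 2 writes) actually changed or executed anything; everything else was talking, listing, or reading.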
The Loop Pattern
The transcript shows at least 4 complete restart cycles — the model reads the same files over and over:
| Cycle | Steps | What It Did |
|---|---|---|
| Cycle 1 | 351–420 | Read SVW workflow → Read all 3 LaTeX files → Read task.md → Read gap_analysis.md → Wrote sympy_verification.py → Ran it |
| Cycle 2 | 481–530 | Read SVW workflow again → Read all 3 LaTeX files again → Read task.md again → Read gap_analysis.md again → Read walkthrough.md again |
| Cycle 3 | 533–637 | “Initializing execution phase” → Read SVW workflow again → Read all 3 LaTeX files again → Read task.md again → Read gap_analysis.md again |
| Cycle 4 | 689–749 | “Inspect workspace directory” → Read sympy_verification.py again → Read SVW again → Read all 3 LaTeX files again → Read gap_analysis.md again → Read task.md again |
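A restart pattern like this can be spotted mechanically by counting how often each file is viewed. The sketch below assumes the log can be reduced to `(action, target)` pairs, which is a hypothetical format and not the real transcript schema; the sample events are made up for illustration and are not taken from the actual log.

```python
from collections import Counter

def repeated_reads(events, threshold=2):
    """Return {path: count} for files viewed at least `threshold` times."""
    counts = Counter(target for action, target in events if action == "VIEW_FILE")
    return {path: n for path, n in counts.items() if n >= threshold}

# Illustrative events only (hypothetical, not copied from the real transcript).
sample = [
    ("VIEW_FILE", "task.md"),
    ("VIEW_FILE", "gap_analysis.md"),
    ("CODE_ACTION", "sympy_verification.py"),
    ("VIEW_FILE", "task.md"),
    ("VIEW_FILE", "gap_analysis.md"),
    ("VIEW_FILE", "task.md"),
]

print(repeated_reads(sample))
```

A file that shows up here again and again with no write in between is a reasonable signal that the agent is re-ingesting rather than making progress.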
Root Cause: Context Window Compaction Amnesia
Your hypothesis is exactly right. Here’s the mechanism:
- Gemini 3.5 Flash has a smaller effective working memory than Opus. When the context window fills up (3 LaTeX files = ~3,300 lines = ~100K tokens of source material + your massive GEMINI.md rules + SVW workflow), the platform compacts the conversation to free space.
- After compaction, the model loses the specific details of what it already read: it retains a summary but not the file contents. So it knows “I was doing SVW verification” but doesn’t remember the actual equations, layer specs, or gap details.
- Its next action is always: “I will read the file to understand…”, re-reading the exact same files it just read before compaction.
- Each re-read fills the context window again → triggers another compaction → amnesia → re-read. This is the loop.
- The model never accumulates enough simultaneous context to actually synthesize the analysis into a written deliverable. It’s always re-ingesting, never outputting.
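The mechanism above can be illustrated with a toy simulation. This is a sketch under loud assumptions, not a model of how the platform actually compacts: the per-file token sizes are a hypothetical split of the ~100K total, and the `overhead` and `capacity` values are invented for illustration. The agent reads files one at a time, and whenever the context overflows, all file contents are dropped (the "amnesia").

```python
# Hypothetical token sizes; only the ~100K total comes from the diagnosis.
FILES = {
    "unified_cognitive_architecture.tex": 40_000,
    "section3_layers.tex": 30_000,
    "sections4_8.tex": 30_000,
}

def simulate(capacity, file_tokens, overhead, max_steps=50):
    """Toy model. Returns (wrote_deliverable, reads_performed)."""
    held = set()     # files whose full contents are currently in context
    used = overhead  # tokens for rules/workflow that always stay loaded
    reads = 0
    for _ in range(max_steps):
        missing = [name for name in file_tokens if name not in held]
        if not missing:
            # Everything is in context at once, so synthesis is possible.
            return True, reads
        name = missing[0]
        reads += 1
        used += file_tokens[name]
        held.add(name)
        if used > capacity:
            # Compaction: contents are replaced by a summary, details are lost.
            held.clear()
            used = overhead
    return False, reads

# Assumed capacities and overhead, chosen only to show both outcomes.
print(simulate(capacity=100_000, file_tokens=FILES, overhead=20_000))
print(simulate(capacity=200_000, file_tokens=FILES, overhead=20_000))
```

With the smaller assumed capacity the third read always triggers compaction, so the run burns its whole step budget on reads and never writes. With the larger one, three reads are enough and the deliverable gets written.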
Why It Never Wrote the Audit
The verification_audit.md requires synthesizing information from all three LaTeX files simultaneously — cross-referencing theorems in unified_cognitive_architecture.tex with layer specs in section3_layers.tex with sovereignty shell specs in sections4_8.tex. But by the time Flash finishes reading the third file, the first file’s details have been compacted away. So it starts over.
It’s not stupidity — it’s a context window capacity problem. Flash was doing the right thing (reading files, following the SVW protocol, outputting S-Lang traces) but physically couldn’t hold enough information in working memory to produce the synthesis.
Why Opus Succeeded
Opus has a larger effective context window and — critically — I recognized the pattern from the compaction summary and made a deliberate architectural decision: read the minimum necessary, hold the synthesis in a single turn, and write the entire audit in one shot. No re-reads. No circles. Ingest → synthesize → output.
The key difference wasn’t intelligence. It was working memory capacity and strategic output ordering (write first, verify compilation second — not the reverse).