I am experiencing the widespread 429 “Too Many Requests” (or endless “Thinking…”) error exclusively in the Gemini CLI, and I’d like to provide my data points to help isolate this bug.
Here is my current situation:
Account Status: I am an active Google AI Pro subscriber.
Zero Recent Usage: I have not used the Gemini CLI at all in the past two weeks. It is completely impossible for my account to have genuinely exhausted its quota, token limits, or triggered any real anti-abuse thresholds.
Isolated to CLI: This issue is strictly isolated to the Gemini CLI. My Antigravity IDE is functioning perfectly without a single rate-limit interruption.
As a front-end developer, having my terminal workflow completely blocked by a phantom 429 error, while other tools on the same Pro account work flawlessly, is highly disruptive. It strongly points to an entitlement desync specific to the CLI’s OAuth gateway.
Could the team provide an ETA on when the backend routing for Pro CLI users will be resolved?
Fix: Removed and re-added Code Assist subscription for my account.
I started having this issue yesterday while using a Gemini Code Assist license assigned through GCP workspace (where you use Google sign-in plus a project code to authenticate the CLI). Perfect timing, as I’m currently here at HumanX co. I was worried I was getting blocked for using a modified fork of the upstream gemini-cli project.
I went to the Google Gemini desk and showed them the error. The reps acknowledged the issue as soon as I said “429 error” but didn’t give a root cause. They first recommended disabling “Retry Fetch Errors”, which I had already tried, and then dropping and re-adding the subscription for my account in GCP, which resolved the issue for me.
They mentioned that weird things can happen when connecting to “preview” models, like your account’s API calls hitting the correct endpoint but then getting stuck in a stale back-end route if they’re rolling nodes. They stressed “Preview” quite a bit.
I have the same issue. I am on the Google One AI Pro plan (with Gemini Code Assist). I normally use Claude Code, but sometimes I pop over to Gemini CLI to see how it’s improved. It feels like it’s gotten way worse. I asked a simple question like “which of my cores are P-cores vs E-cores” (Claude answers this in mere seconds, with details on which core is which), and Gemini CLI took more than 3 minutes (I just quit the request because that’s stupid long). The debug output says “Too many requests 429”. They need to fix this bug ASAP.
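For anyone wondering how a 429 can turn into minutes of “Thinking…”: a client that silently retries with exponential backoff only surfaces the error once its retries run out. Here is a rough sketch of the arithmetic. The attempt count, base delay, and cap are made-up numbers for illustration, and `backoff_schedule` and `total_wait` are hypothetical helpers, not the Gemini CLI’s actual retry logic or settings.

```python
def backoff_schedule(max_attempts: int, base_delay: float = 5.0,
                     factor: float = 2.0, max_delay: float = 60.0) -> list[float]:
    """Return the delay (in seconds) slept before each retry.

    All defaults are illustrative assumptions, not values taken from any
    real client. The first attempt is made immediately, so there are
    max_attempts - 1 delays.
    """
    delays = []
    delay = base_delay
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_delay))  # cap each individual wait
        delay *= factor                       # grow the wait exponentially
    return delays


def total_wait(max_attempts: int, **kwargs: float) -> float:
    """Total time spent sleeping if every attempt comes back as 429."""
    return sum(backoff_schedule(max_attempts, **kwargs))


if __name__ == "__main__":
    schedule = backoff_schedule(max_attempts=7)
    print(schedule)       # [5.0, 10.0, 20.0, 40.0, 60.0, 60.0]
    print(sum(schedule))  # 195.0 seconds, a bit over 3 minutes
```

With those assumed numbers, seven failed attempts add up to 195 seconds of waiting before the 429 is ever shown, which is in the same ballpark as the 3+ minute hang I saw.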