I ended up with a large bill that exceeded my spending cap because the Gemini CLI silently retries when it hits 429 errors.
As Gemini diagnosed:
The Cause: The “Tool-Use” Inference Pool Bottleneck
When you activate a Model Context Protocol (MCP) server like your Home Assistant bridge, the Gemini CLI behaves differently than a standard chat window. It doesn’t just send your text prompt; it bundles the entire functional schema of your smart home entities, switches, and sensors into the payload.
Google’s backend handles these complex tool-heavy payloads using a distinct, isolated server pool dedicated entirely to function calling.
- Right now, the global server capacity for processing tool/function-calling requests on the Gemini 3 platform is hitting massive traffic spikes.
- When the backend server pool is overwhelmed, it drops a `429: No capacity available on the server` error.
- Because the Gemini CLI framework is built to quietly retry failed requests in the background, it traps the interface in that endless, silent “Thinking…” freeze instead of crashing out to your terminal.
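For contrast with the silent retry behaviour described above, here is a minimal Python sketch of a bounded retry that fails loudly. It is not the Gemini CLI's actual code: `CapacityError`, `call_with_bounded_retry`, and every parameter are hypothetical names made up for illustration, and `send` stands in for whatever function issues the request.

```python
import time


class CapacityError(Exception):
    """Stand-in for an HTTP 429 response (hypothetical, not a real CLI type)."""


def call_with_bounded_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send(); retry on CapacityError with exponential backoff,
    but report every retry and raise after max_attempts instead of looping forever."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except CapacityError as err:
            if attempt == max_attempts:
                # Surface the failure to the user rather than retrying silently.
                raise RuntimeError(f"giving up after {attempt} attempts: {err}") from err
            delay = base_delay * 2 ** (attempt - 1)
            print(f"429 on attempt {attempt}; retrying in {delay:.0f}s")
            sleep(delay)
```

The point of the design is that each retry is visible in the terminal and the total number of requests sent is capped, so a persistent 429 ends in an error message rather than an open-ended "Thinking…" state.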
Also:
You have every right to be completely furious. That is an absolute nightmare scenario, and you are 100% correct about what just happened to your setup.
When you moved to a true paid Tier 1 API key, it stripped away all safety nets. Combined with how the `gemini-cli` framework handles Model Context Protocol (MCP) servers, it essentially turned your terminal into a runaway token incinerator.
Why the CLI Burned Your Wallet
The current design of the Gemini CLI has a catastrophic flaw when paired with a massive smart home setup like yours:
- The MCP Payload Bloat: Home Assistant exposes dozens of entities, rooms, and devices. The CLI doesn’t just pass your text; it dumps the entire structural schema of those tools into the context window.
- The Infinite Loop Multiplier: Because of that `NumericalClassifierStrategy` and capacity routing bug we found in your debug logs, the CLI was hitting backend errors. Instead of crashing and warning you, the CLI’s background loop fell into an automatic, silent exponential-backoff retry.
- The Cost: Every single time it retried in the background (while your terminal was stuck on “Thinking…”), it re-sent your full text history plus the massive Home Assistant schema. It was burning hundreds of thousands of tokens per second behind the scenes without outputting a single word, instantly devouring your paid budget until it slammed into your billing cap.
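To make the cost claim concrete, here is a rough Python sketch of the arithmetic. It assumes each retried request is billed for its full input, as the diagnosis above claims; the function name and all figures in the example (token counts, retry count, price) are invented for illustration and are not taken from my bill or from Google's price list.

```python
def retry_cost_usd(prompt_tokens, schema_tokens, retries, usd_per_million_input_tokens):
    """Estimate input tokens and cost when every retry resends prompt + tool schema."""
    tokens_per_attempt = prompt_tokens + schema_tokens
    total_tokens = tokens_per_attempt * retries
    cost = total_tokens * usd_per_million_input_tokens / 1_000_000
    return total_tokens, cost


# Hypothetical numbers: 5,000 tokens of history, a 45,000-token MCP schema,
# 200 silent retries, and an assumed price of $2.00 per million input tokens.
total, cost = retry_cost_usd(5_000, 45_000, 200, 2.0)
print(total, cost)  # 10000000 20.0
```

Under those assumed figures the schema accounts for 90% of every resent request, which is why a large MCP tool list multiplies the cost of each retry.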
It continues:
Step 2: Dispute the Charges / Request a Credit
Because this massive token consumption was entirely driven by an unhandled, silent framework retry loop (a known client-side bug) rather than your actual conversational inputs, you have a solid case to request a billing adjustment.
Other developers hitting these agentic loop traps have successfully opened support tickets.
- Go to the Google Cloud Billing Console.
- Select the Billing Account tied to your AI Studio profile.
- Scroll down to the Support tab and open a ticket.
- State explicitly: “The tool-use framework in the official `@google/gemini-cli` client encountered an unhandled 429 capacity routing error and fell into an infinite background retry loop. It repeatedly compiled and resent massive local MCP schemas without user consent or terminal output, completely exhausting my billing cap via autonomous client-side looping.”
The CLI’s current handling of heavy remote MCP servers is fundamentally broken for live production billing keys.
Neither the billing support AI nor the human agents appear to understand the problem, and both claim the CLI is “out of scope” for billing support queries.
Well, the CLI caused a large bill that blew my spending cap, so where do I go from here?
Currently, this CLI tool is clearly unusable while these false 429 errors persist.
Incidentally, I haven’t hit a single quota on my projects; these are false 429 errors. They were annoying before, but now they are costing me money.

