Tier 1, Gemini 3.1 Pro: TPM spikes after 503 errors, and the resulting 429s do not reset for days


Hi,

I am using the Gemini API on a Tier 1 paid project with the gemini-3.1-pro model, called via a chat frontend (LibreChat). I am seeing what looks like broken TPM rate limiting after repeated 503 errors.

Problem 1 — Old chat: 249k → 985k, stuck 429 for days

In one project/chat I saw the following behavior:

  • Initial usage spike to around 249k TPM.

  • Then I got several 503 errors (The model is overloaded / High demand) when retrying the same request in the same chat.

  • After these 503 retries, my TPM in the AI Studio dashboard jumped from 249k to 985k (almost 4×).

  • Since then, every request returns 429 (“too many requests / TPM exceeded”) for this project, even if I send a very short prompt with minimal context.

  • This has been happening for several days, yet the dashboard now shows a current TPM of about 500k, i.e. clearly below my Tier 1 TPM limit.

  • Despite the current TPM being well below the documented limit, the API still responds with 429 for any request.

So it looks like the TPM bucket / rate limiter is stuck in an exceeded state, not resetting correctly, even after days with very low usage.

Problem 2 — Another chat: 381k → 762k only from 503 retries

In another chat (same project, same model gemini-3.1-pro):

  • I sent a request and saw TPM go to about 381k.

  • Then I retried the same request twice and got a 503 each time.

  • After these retries, the TPM jumped to about 762k, even though I did not add any new context to the chat between retries.

  • This is consistent with each 503 retry being counted as a full new request with the entire long context, with TPM incremented as if the request had succeeded.

At the moment there is no 429 yet in this second chat (the spike did not reach the apparent ~1M TPM ceiling), but the pattern is identical:
TPM increases significantly only due to multiple 503 retries, without any new content being added.
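Under the assumption that every failed 503 attempt is still billed with the full prompt, both dashboard jumps are consistent with simple linear token arithmetic (a sketch of my guess at the accounting; `tpm_after_retries` is a hypothetical helper, not anything from the API):

```python
def tpm_after_retries(context_tokens: int, billed_attempts: int) -> int:
    """If every attempt (including failed 503s) counts the full context
    against TPM, usage scales linearly with the number of attempts."""
    return context_tokens * billed_attempts

# Chat 2: ~381k -> ~762k matches exactly 2 billed attempts.
print(tpm_after_retries(381_000, 2))  # 762000
# Chat 1: ~249k -> ~985k (almost 4x) is close to 4 billed attempts
# (dashboard TPM values are approximate).
print(tpm_after_retries(249_000, 4))  # 996000
```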

Why this looks like a bug

  • I understand that rate limits are based on a sliding time window and that long prompts with a lot of context can legitimately consume a lot of TPM.

  • However, in the first case, the current TPM in the dashboard has already dropped to ~500k (below Tier 1 limits), but the API still returns 429 for any request, even days later. This suggests that the internal TPM bucket or limiter state is not resetting correctly.

  • In both cases, 503 errors (The model is overloaded) clearly consume TPM as if they were successful requests, and multiple 503 retries on the same large context cause huge TPM spikes (249k → 985k, 381k → 762k) without any new traffic.

This makes the Tier 1 project practically unusable: a few 503 errors with retries on a long chat can permanently push TPM close to the limit and keep returning 429 even when actual usage is low.
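Until this is fixed server-side, the only client-side mitigation I can think of is a hard cap on 503 retries plus exponential backoff, so one long chat cannot silently multiply its own TPM cost. A minimal sketch (the exception class and function names are my own stand-ins, not SDK names; the real client call would go inside `send_request`):

```python
import random
import time


class ServerOverloaded(Exception):
    """Stand-in for the API's 503 'The model is overloaded' error."""


def call_with_backoff(send_request, max_retries=3, base_delay=2.0):
    """Call `send_request` (any zero-argument callable), retrying on 503.

    Capping retries bounds the worst-case TPM cost: if each failed attempt
    is billed with the full context, max_retries=3 limits one logical
    request to at most 4x its context size.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except ServerOverloaded:
            if attempt == max_retries:
                raise
            # Exponential backoff with jitter: ~2s, ~4s, ~8s, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

Wrapping the provider call like this at least makes the retry count explicit and bounded instead of depending on how often I press "retry" in the frontend.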

Environment details

  • Project type: Google Cloud / AI Studio, Tier 1 billing enabled.

  • Model: gemini-3.1-pro (preview).

  • Usage pattern: chat-style application (LibreChat) sending system prompt + full conversation history + latest user message on each request.

  • Errors: repeated 503 followed by persistent 429 for days on the same project.

Questions

  1. Is this a known issue with TPM / 503 accounting and rate-limit reset on Tier 1 for gemini-3.1-pro?

  2. Can someone from the engineering team check my project’s TPM bucket state and confirm why 429 is still returned when current TPM is already below the documented limit?

  3. Is there any workaround other than creating a new project / API key (which is what I’m doing now to keep working)?

I can provide:

  • Project ID (privately, if needed).

  • Exact timestamps of spikes (UTC) and approximate TPM values.

  • Screenshots of the AI Studio rate limit dashboard showing:

    • spike from 249k to 985k,

    • current TPM ~500k,

    • and 429 responses during that time.

Thank you in advance — this looks like a serious rate limiting bug for Tier 1 with gemini-3.1-pro, especially when 503 errors are frequent.

Quick update: the second chat, where TPM jumped from ~381k to ~762k after repeated 503 errors, is now also returning 429 (the dashboard still shows TPM at 762k).