Gemini 3.1 Pro Preview Throwing Token Rate Limit error of 131072 tokens via Vertex API?

Here is the error:

```
Error: {"error":{"code":400,"message":"The input token count (135538) exceeds the maximum number of tokens allowed (131072).","status":"INVALID_ARGUMENT"}}
```

So odd. All of the docs point me to the 1M context window, but on input I am getting 400 errors. 2.5 Pro works just fine.
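One way to fail fast instead of burning a request on that 400 is a rough local size check before calling the API. This is just a sketch: the ~4 characters per token ratio is a heuristic assumption, not the tokenizer Vertex AI actually uses, and the 131,072 figure is simply the limit reported in the error above. For exact counts you would use the API's token-counting endpoint instead.

```python
# Rough local pre-check so oversized prompts fail fast, before the API
# rejects them with a 400. The chars-per-token ratio is an assumption.

MODEL_INPUT_LIMIT = 131_072  # limit reported in the 400 error above
CHARS_PER_TOKEN = 4          # crude heuristic for English text


def estimated_tokens(text: str) -> int:
    """Cheap local token estimate (heuristic, not the real tokenizer)."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_input_limit(text: str, limit: int = MODEL_INPUT_LIMIT) -> bool:
    """True if the prompt is likely under the model's input token limit."""
    return estimated_tokens(text) <= limit


# A ~542,152-character prompt estimates to 135,538 tokens, matching the
# count in the error message, and would be flagged before sending.
print(fits_input_limit("x" * 542_152))   # False
print(fits_input_limit("short prompt"))  # True
```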

I thought I had just resolved most of my issues with 3.1 Preview. Looks like I will be degrading this model on my platform.

The only similar errors that I see are from 2.5 back in October.

Is anybody else seeing this error? Would love to know I am not alone.

I have a custom limit on Vertex AI, so other rate limits are not the issue here.

I just recently added the `thinking-level: medium` header, but that was just to speed up the model. It was working fine last week on these high-input tasks.

EDIT: Looks like I am not the only one

I have a feeling the GitHub comment may be correct: gemini-3.1-flash-live-preview has a token limit of 131,072.

Wonder how long it will take Google to resolve it.

Can confirm.

One interesting thing I've noticed is that the reported input token size is a lot lower than I'd expect. I'm wondering if some kind of pre-processing step using gemini-3.1-flash-live-preview is involved? Total guess, of course.

Workaround for me is reverting back to Gemini 2.5 Pro for the time being, unfortunately.
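The fallback described above can be sketched as a small wrapper: try the 3.1 preview first and drop to 2.5 Pro when the input is rejected. This is a hypothetical sketch, not any SDK's actual API: `call_model` is a stand-in for the real request function, `TokenLimitError` stands in for the 400 INVALID_ARGUMENT response, and the primary model identifier is assumed.

```python
# Sketch of the workaround in this thread: attempt the 3.1 preview, fall
# back to 2.5 Pro on a token-limit rejection. All names here are stand-ins.

PRIMARY_MODEL = "gemini-3.1-pro-preview"  # assumed identifier for the preview
FALLBACK_MODEL = "gemini-2.5-pro"         # known-working fallback


class TokenLimitError(Exception):
    """Stand-in for the API's 400 INVALID_ARGUMENT token-limit error."""


def generate_with_fallback(prompt: str, call_model) -> tuple[str, str]:
    """Try the primary model; on a token-limit rejection, retry on 2.5 Pro.

    `call_model(model, prompt)` is a placeholder for the real SDK request.
    Returns (model_used, response_text).
    """
    try:
        return PRIMARY_MODEL, call_model(PRIMARY_MODEL, prompt)
    except TokenLimitError:
        return FALLBACK_MODEL, call_model(FALLBACK_MODEL, prompt)


# Demo with a fake backend that rejects long prompts on the preview only.
def fake_call(model: str, prompt: str) -> str:
    if model == PRIMARY_MODEL and len(prompt) > 100:
        raise TokenLimitError("input token count exceeds the maximum allowed")
    return f"ok from {model}"


print(generate_with_fallback("x" * 200, fake_call)[0])  # gemini-2.5-pro
```

Routing on the specific error (rather than any exception) keeps genuine failures, such as auth or quota problems, visible instead of silently retried on the other model.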

Yep. My system defaults to 2.5 Pro as well.

Seems to be what I will be using until they take 3.1 out of preview. It's a preview; I guess they have kinks to work out.

Would be nice! Oh well. I did like the few results I was able to generate with 3.1 when I had it.

Yes, this is a painful issue. 3.0 Pro Preview was fine; it's 3.1 that caused the problem. Right now I'm falling back to 3.0 Flash instead of 2.5 Pro. Do we know when the next fix release cycle is expected?