Gemini 3.1 Pro Preview Throwing Token Rate Limit error of 131072 tokens via Vertex API?

Here is the error:

```
Error: {"error":{"code":400,"message":"The input token count (135538) exceeds the maximum number of tokens allowed (131072).","status":"INVALID_ARGUMENT"}}
```

So odd. All of the docs point me to the 1M context window, but on input I am getting 400 errors. 2.5 Pro works just fine.
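One way to fail fast instead of burning a request on that 400 is a rough local size check before calling the API. This is just a sketch: the ~4 characters per token ratio is a heuristic assumption, not the tokenizer Vertex AI actually uses, and the 131,072 figure is simply the limit reported in the error above. For exact counts you would use the API's token-counting endpoint instead.

```python
# Rough local pre-check so oversized prompts fail fast, before the API
# rejects them with a 400. The chars-per-token ratio is an assumption.

MODEL_INPUT_LIMIT = 131_072  # limit reported in the 400 error above
CHARS_PER_TOKEN = 4          # crude heuristic for English text


def estimated_tokens(text: str) -> int:
    """Cheap local token estimate (heuristic, not the real tokenizer)."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def fits_input_limit(text: str, limit: int = MODEL_INPUT_LIMIT) -> bool:
    """True if the prompt is likely under the model's input token limit."""
    return estimated_tokens(text) <= limit


# A ~542,152-character prompt estimates to 135,538 tokens, matching the
# count in the error message, and would be flagged before sending.
print(fits_input_limit("x" * 542_152))   # False
print(fits_input_limit("short prompt"))  # True
```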

I thought I had just resolved most of my issues with 3.1 Preview. Looks like I will be degrading this model on my platform.

The only similar errors that I see are from 2.5 back in October.

Is anybody else seeing this error? Would love to know I am not alone.

I have a custom limit on Vertex AI, so other rate limits are not the issue here.

I just recently added the `thinking-level: medium` header, but that was just to speed up the model. It was working fine last week on these high-input tasks.

EDIT: Looks like I am not the only one

I have a feeling the GitHub comment may be correct: gemini-3.1-flash-live-preview has a token limit of 131,072.

Wonder how long it will take Google to resolve it.

Can confirm.

One interesting thing I've noticed is that the reported input token size is a lot lower than I'd expect. I'm wondering if some kind of pre-processing step using gemini-3.1-flash-live-preview is involved? Total guess, of course.

Workaround for me is reverting back to Gemini 2.5 Pro for the time being, unfortunately.
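The fallback described above can be sketched as a small wrapper: try the 3.1 preview first and drop to 2.5 Pro when the input is rejected. This is a hypothetical sketch, not any SDK's actual API: `call_model` is a stand-in for the real request function, `TokenLimitError` stands in for the 400 INVALID_ARGUMENT response, and the primary model identifier is assumed.

```python
# Sketch of the workaround in this thread: attempt the 3.1 preview, fall
# back to 2.5 Pro on a token-limit rejection. All names here are stand-ins.

PRIMARY_MODEL = "gemini-3.1-pro-preview"  # assumed identifier for the preview
FALLBACK_MODEL = "gemini-2.5-pro"         # known-working fallback


class TokenLimitError(Exception):
    """Stand-in for the API's 400 INVALID_ARGUMENT token-limit error."""


def generate_with_fallback(prompt: str, call_model) -> tuple[str, str]:
    """Try the primary model; on a token-limit rejection, retry on 2.5 Pro.

    `call_model(model, prompt)` is a placeholder for the real SDK request.
    Returns (model_used, response_text).
    """
    try:
        return PRIMARY_MODEL, call_model(PRIMARY_MODEL, prompt)
    except TokenLimitError:
        return FALLBACK_MODEL, call_model(FALLBACK_MODEL, prompt)


# Demo with a fake backend that rejects long prompts on the preview only.
def fake_call(model: str, prompt: str) -> str:
    if model == PRIMARY_MODEL and len(prompt) > 100:
        raise TokenLimitError("input token count exceeds the maximum allowed")
    return f"ok from {model}"


print(generate_with_fallback("x" * 200, fake_call)[0])  # gemini-2.5-pro
```

Routing on the specific error (rather than any exception) keeps genuine failures, such as auth or quota problems, visible instead of silently retried on the other model.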

Yep. My system defaults to 2.5 Pro as well.

Seems to be what I will be using until they take 3.1 out of preview. It's a preview; I guess they have kinks to work out.

Would be nice! Oh well. I did like the few results I was able to generate with 3.1 when I had it.

Yes, this is a painful issue. 3.0 Pro Preview was fine; it's 3.1 that caused the problem. Right now I'm falling back to 3.0 Flash instead of 2.5 Pro. Do we know when the next fix release cycle is expected?