Serious Regression in Output Capability from Gemini 2.5 Pro to 3.1 Pro

Hello,

I’m writing to share my observations on the recent performance of your models.

I’ve been an active user of the Gemini models starting with 1.5 Pro, then moving to 2.0 Pro, and later 2.5 Pro. For my use cases, I witnessed an insane level of progress. The output speed and the quality of the responses were improving at a breakneck pace.

…Until the release of 3.0 Pro and its “improved” version, 3.1 Pro.

Now, the models take a very long time to think, even on simple prompts with the thinking budget set to “low”.

But that’s the least of the problems.

Previously, gemini-2.5-pro (with a standard thinking budget) could effortlessly output 64k tokens of translated text in a single response before hitting the hard limit. Alternatively, I could ask it to generate 500-1000 lines of high-quality translation, and it would deliver perfectly.

Then came 3.0 and 3.1 Pro. To put it bluntly, it’s a disaster.

In 10 out of 10 of my translation requests, the model’s visible thought process shows it deciding to output a maximum of 20-100 lines to prevent a “buffer overflow”.

Is this a joke? Why have you trained your model with these “imaginary limits” that cause it to censor itself before it even starts generating a response? This is incredibly counter-productive!

Even if such a “limit” truly exists, how do you ever expect to develop an AGI if it’s ‘afraid’ of exceeding a 5,000-character output?

I’ve spent time testing this. I regenerated a response on gemini-3.1-pro 10 times (using both “high” and “low” thinking settings). Every single time, I received only 20 lines of text out of the 1000 I requested.

Immediately after that, I switched back to gemini-2.5-pro. It gave me the entire 1000 lines in a single response.
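
For anyone trying to reproduce this, here is a minimal sketch of how the shortfall could be quantified across regenerations. It makes no API calls: it assumes you have already saved each response as a plain string, and the function names (`count_nonempty_lines`, `coverage`, `summarize_runs`) are hypothetical helpers made up for illustration, not part of any Gemini SDK.

```python
from statistics import mean


def count_nonempty_lines(text: str) -> int:
    """Count lines that contain something other than whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())


def coverage(requested_lines: int, response_text: str) -> float:
    """Fraction of the requested line count that the response delivered."""
    if requested_lines <= 0:
        raise ValueError("requested_lines must be positive")
    return count_nonempty_lines(response_text) / requested_lines


def summarize_runs(requested_lines: int, responses: list[str]) -> dict:
    """Aggregate coverage over several regenerations of the same prompt."""
    if not responses:
        raise ValueError("responses must not be empty")
    ratios = [coverage(requested_lines, r) for r in responses]
    return {
        "runs": len(ratios),
        "min": min(ratios),
        "mean": mean(ratios),
        "max": max(ratios),
    }


if __name__ == "__main__":
    # Placeholder data standing in for 10 saved responses of 20 lines each.
    fake_response = "\n".join(f"line {i}" for i in range(20))
    print(summarize_runs(1000, [fake_response] * 10))
```

Running the same summary on responses saved from two different models, with the same prompt and the same requested line count, gives a like-for-like comparison instead of a rough impression.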

Something is seriously wrong with the direction you’re taking with the newer models. Please, look into this.

Same here. 3.1 Pro follows response-size instructions very poorly; in translation it is unbearable. 3.0 Pro had a better grasp of what 100-200 lines or 20,000 characters meant, but 3.1 gives me only 3-5k at most and tries to cut out everything it considers unnecessary. It’s so annoying.