After my conversation's context exceeded 100k tokens, the 2.5 Pro API started frequently returning errors.
On 26 Jan, I made 184 requests with a 39% success rate: 11 returned 429 and 98 returned 503.
On 27 Jan, I made 269 requests with a 16% success rate: 109 returned 429 and 99 returned 503.
Then the context exceeded 200k tokens.
On 28 Jan, I made 201 requests with a 0% success rate: 79 returned 429 and 110 returned 503.
On 29 Jan, I made 172 requests with a 2% success rate: 58 returned 429 and 108 returned 503.
As a control, when I start a new conversation with 2.5 Pro (minimal context), there are no errors at all.
As another control, sending the same ~200k context and prompt to other models, 3.0 Pro and 3.0 Flash, likewise produces no errors at all.
BTW, I am still far from the RPM/TPM/RPD limits.
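For transparency, here is roughly how I tally the daily figures above from logged HTTP status codes. This is a minimal sketch: the function name is mine, and it assumes every non-429/503 request succeeded, which is why it computes ~41% for 26 Jan rather than the 39% I actually observed (a few requests failed in other ways).

```python
from collections import Counter

def summarize(status_codes):
    """Count each HTTP status and compute the success rate (share of 200s)."""
    counts = Counter(status_codes)
    total = len(status_codes)
    rate = counts.get(200, 0) / total if total else 0.0
    return counts, rate

# 26 Jan: 184 requests, 11x 429, 98x 503; the remainder is assumed to be
# 200 here for illustration, which slightly overstates my real 39%.
codes = [429] * 11 + [503] * 98 + [200] * (184 - 11 - 98)
counts, rate = summarize(codes)
print(counts[429], counts[503], round(rate * 100))  # 11 98 41
```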
Additionally, I noticed something in the few successful responses I got on 29 Jan: 2.5 Pro did not use reasoning at all and output the answer directly, whereas throughout my earlier conversation, reasoning before answering was by far the most common behavior. My hypothesis is therefore that once the context exceeds 200k, the system first decides whether to accept the request based on compute availability; if compute is strained, it returns a 503 immediately. If it does accept the request, it then treats reasoning as a red flag: the moment 2.5 Pro decides to reason, the system cuts the request off and returns a 429, while if 2.5 Pro outputs the answer directly, the request is allowed through. If my guess is correct, this throttling tightens progressively as the context grows; in my experience the model is essentially unusable above 200k. Turning off reasoning might raise the success rate, but my work involves analysis and depends on reasoning.
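To make the hypothesis concrete, here it is as a sketch. This is pure speculation on my part; the function and its inputs are invented for illustration and do not correspond to anything Google has documented.

```python
def hypothesized_gate(context_tokens, compute_strained, wants_reasoning):
    """My guessed admission logic for 2.5 Pro above 200k context.

    Entirely speculative: 503 when compute is strained, 429 when the
    model would have used reasoning, 200 only for direct answers.
    """
    if context_tokens > 200_000:
        if compute_strained:
            return 503  # rejected up front on compute availability
        if wants_reasoning:
            return 429  # cut off once the model decides to reason
    return 200          # direct answer (or short context) passes

print(hypothesized_gate(250_000, True, True))    # 503
print(hypothesized_gate(250_000, False, True))   # 429
print(hypothesized_gate(250_000, False, False))  # 200
```

This would also explain why the rare successes on 29 Jan were all direct answers: only requests where the model skipped reasoning made it through.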
I understand that compute currently has to be prioritized for the 3.0 series and Nano Banana Pro, and I fully accept that trade-off, but a success rate below 2% is too frustrating. From what I can see, compute for the 3.0 series itself seems quite sufficient.