After my conversation's context exceeded 100k tokens, the 2.5 Pro API started frequently returning errors.
On 26 Jan, I made 184 requests with a 39% success rate: 11 returned 429 and 98 returned 503.
On 27 Jan, I made 269 requests with a 16% success rate: 109 returned 429 and 99 returned 503.
Then the context exceeded 200k tokens.
On 28 Jan, I made 201 requests with a 0% success rate: 79 returned 429 and 110 returned 503.
On 29 Jan, I made 172 requests with a 2% success rate: 58 returned 429 and 108 returned 503.
As a control, when I start a new conversation with 2.5 Pro (minimal context), there are no errors at all.
As another control, sending the same ~200k context and prompt to other models, 3.0 Pro and 3.0 Flash, likewise produces no errors at all.
BTW, I am still far from the RPM/TPM/RPD limits.
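For transparency, here is roughly how I tally the daily figures above from logged HTTP status codes. This is a minimal sketch: the function name is mine, and it assumes every non-429/503 request succeeded, which is why it computes ~41% for 26 Jan rather than the 39% I actually observed (a few requests failed in other ways).

```python
from collections import Counter

def summarize(status_codes):
    """Count each HTTP status and compute the success rate (share of 200s)."""
    counts = Counter(status_codes)
    total = len(status_codes)
    rate = counts.get(200, 0) / total if total else 0.0
    return counts, rate

# 26 Jan: 184 requests, 11x 429, 98x 503; the remainder is assumed to be
# 200 here for illustration, which slightly overstates my real 39%.
codes = [429] * 11 + [503] * 98 + [200] * (184 - 11 - 98)
counts, rate = summarize(codes)
print(counts[429], counts[503], round(rate * 100))  # 11 98 41
```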
Additionally, I noticed something in the few successful responses I got on 29 Jan: 2.5 Pro did not use reasoning at all and output the answer directly, whereas throughout my earlier conversation, reasoning before answering was by far the most common behavior. My hypothesis is therefore that once the context exceeds 200k, the system first decides whether to accept the request based on compute availability; if compute is strained, it returns a 503 immediately. If it does accept the request, it then treats reasoning as a red flag: the moment 2.5 Pro decides to reason, the system cuts the request off and returns a 429, while if 2.5 Pro outputs the answer directly, the request is allowed through. If my guess is correct, this throttling tightens progressively as the context grows; in my experience the model is essentially unusable above 200k. Turning off reasoning might raise the success rate, but my work involves analysis and depends on reasoning.
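To make the hypothesis concrete, here it is as a sketch. This is pure speculation on my part; the function and its inputs are invented for illustration and do not correspond to anything Google has documented.

```python
def hypothesized_gate(context_tokens, compute_strained, wants_reasoning):
    """My guessed admission logic for 2.5 Pro above 200k context.

    Entirely speculative: 503 when compute is strained, 429 when the
    model would have used reasoning, 200 only for direct answers.
    """
    if context_tokens > 200_000:
        if compute_strained:
            return 503  # rejected up front on compute availability
        if wants_reasoning:
            return 429  # cut off once the model decides to reason
    return 200          # direct answer (or short context) passes

print(hypothesized_gate(250_000, True, True))    # 503
print(hypothesized_gate(250_000, False, True))   # 429
print(hypothesized_gate(250_000, False, False))  # 200
```

This would also explain why the rare successes on 29 Jan were all direct answers: only requests where the model skipped reasoning made it through.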
I understand that compute currently has to be prioritized for the 3.0 series and Nano Banana Pro, and I fully accept that trade-off, but a success rate below 2% is too frustrating. From what I can see, compute for the 3.0 series itself seems quite sufficient.