I'm finding that almost all my API requests to the gemini-3.1-pro-preview model are failing. The usual outcome is a 503 error returned after several minutes: {"error": {"code": 503, "message": "This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later.", "status": "UNAVAILABLE"}}
I get this error at all times of day, and it has been happening ever since the new 3.1 model was released.
The gemini-3-pro-preview model was previously working well and was quite responsive, but since the 3.1 release that older model has also become unreliable for me (more likely to succeed than 3.1, but still failing often).
In the last week, I think the 3.1 model has successfully responded to a completion request about 3 times. That's less than a 10% success rate, which makes the Gemini API unusable for me at the moment.
Does anyone know what could be causing this and how it can be resolved?
I don't know what in the world happened between 3 and 3.1. It was working fantastically at first, save for a few changes I needed to make in my validation pipeline to handle the better data 3.1 produces, but yesterday I started experiencing 503s, and today it's been 499s and timeouts. The 360 s timeouts I used with 2.5 and 3 (which were generous) have suddenly become meaningless, so I'm now testing with 600 s on the large datasets I deal with. Model 3 had no issues chewing through this stuff. While 3.1 is arguably better in its output, I question how sustainable the model is from a production standpoint if it can't handle the throughput of its context window versus its actual processing ability. I don't think there's anything to be done about the 503s except trying different times of day, but for timeouts the only workaround seems to be increasing the duration.
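For what it's worth, the two workarounds above (waiting out 503s and stretching the timeout) can be combined into a small retry wrapper with exponential backoff, so transient overload errors don't immediately fail a batch. This is just a sketch, not official Gemini guidance; the function names and parameters are my own, and the commented usage below assumes the public REST `generateContent` endpoint:

```python
import time

# 503 = overloaded/UNAVAILABLE, 499 = request cancelled/timed out upstream,
# both reported in this thread as transient.
RETRYABLE_CODES = {499, 503}

def call_with_retries(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable HTTP status, back off exponentially and retry.

    fn should be a zero-argument callable returning (status_code, body).
    Non-retryable statuses and the final failing attempt are returned as-is.
    """
    for attempt in range(max_attempts):
        status, body = fn()
        if status not in RETRYABLE_CODES or attempt == max_attempts - 1:
            return status, body
        sleep(base_delay * (2 ** attempt))  # 1 s, 2 s, 4 s, ...
    return status, body  # unreachable, kept for clarity

# Hypothetical usage against the REST API (requires `requests` and an API key):
#
# import requests
#
# def one_request():
#     resp = requests.post(
#         "https://generativelanguage.googleapis.com/v1beta/models/"
#         "gemini-3.1-pro-preview:generateContent",
#         params={"key": API_KEY},
#         json={"contents": [{"parts": [{"text": prompt}]}]},
#         timeout=600,  # generous client-side timeout, per the discussion above
#     )
#     return resp.status_code, resp.text
#
# status, body = call_with_retries(one_request)
```

The backoff only papers over demand spikes; if the model is overloaded for hours, the last attempt's 503 is still surfaced to the caller so you can fail over or queue the work.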
Same here. It was like this for half the day yesterday, and the status in Google AI Studio shows everything is up and running, which makes it even more frustrating.
Although the Gemini API status page says all systems are functional, I have been unable to use Gemini 3 Flash at all for approximately the last 3 days. I use it in apps like TypingMind and others, and 9 times out of 10 I receive an error saying the model has demand spikes or is overloaded. In fact, since 3.1 Pro Preview this has happened to me 9 or 10 times out of 10, and it has started happening with 3 Pro as well. The whole Google Gemini API has been getting worse and worse over the last two or three weeks, so I need to check out the competition. I've found that some Qwen models are also much better at following basic instructions in longer conversations.