I had been using flex inference successfully with gemini-3.1-pro-preview, but this week not a single request has gone through. I am using the same script as last week, and every retry ends in an error.
Mostly:
503 UNAVAILABLE. {'error': {'code': 503, 'message': 'This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later.', 'status': 'UNAVAILABLE'}}
I have 10 retries with exponential backoff, but it never resolves, no matter what time of day I try. When I switched to the standard service tier, the request went through immediately, so I am currently using that as a fallback.
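For anyone comparing setups, the retry-then-fallback pattern described above can be sketched roughly as follows. This is only a minimal illustration, not the actual script and not any SDK's API: `send`, `UnavailableError`, `call_with_fallback`, and the delay values are hypothetical placeholders, and `send(tier)` stands in for whatever call issues the request on a given service tier.

```python
import random
import time
from typing import Callable, Tuple


class UnavailableError(Exception):
    """Hypothetical stand-in for a 503 UNAVAILABLE response (not an SDK class)."""


def call_with_fallback(
    send: Callable[[str], str],
    max_retries: int = 10,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, str]:
    """Try the flex tier with exponential backoff, then fall back to standard.

    Returns (tier_used, response). `send` is an assumed callable that takes
    a tier name and raises UnavailableError on a 503.
    """
    for attempt in range(max_retries):
        try:
            return "flex", send("flex")
        except UnavailableError:
            if attempt < max_retries - 1:
                # Exponential backoff with full jitter, capped at max_delay.
                delay = min(max_delay, base_delay * 2 ** attempt)
                sleep(random.uniform(0, delay))
    # Every flex attempt failed: make one attempt on the standard tier.
    return "standard", send("standard")
```

With a setup like this, a flex tier that never recovers costs the full backoff budget on every request before the standard tier is tried, which is why the behavior described above matters.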
I was completely fine with the increased latency, as long as the requests go through at some point. But throwing request after request at the flex API and never getting anything but an error back feels pretty useless.
Is this to be expected? Did anything change about the flex API? Does it work fine for other people?
I understand this feature is still in preview, so things change. But I just want to make sure it is not something I did wrong.