UNAVAILABLE (code 503): No capacity available for model claude-opus-4-5-thinking on the server

This error happens all the time, and too few retries are allowed.
It can also happen at any step of a multi-step agentic loop.
If the model is reading files and making decisions, or doing a multi-part edit, it will fail halfway through its agentic loop. Nudging it to continue breaks the proper order of its context messages, so on the rare occasions it does manage to resume, the model may misbehave.

You need to implement some sort of queuing and longer “timeouts” on the server side. Maybe some kind of “priority quota”. For the past week it has been impossible to run any inference at all on Claude models. Others are reporting problems with Gemini too. I can't confirm that myself, because I rarely have tasks I trust Gemini to complete, so I use other tools for those.

This is not an auth or quota error. It is a server-side error caused by overload of the inference layer. If you are unable to provide the quotas users have, perhaps you should reduce them?

And no, using smaller models does not help most of the time. I used to match the model to the task at hand, but you have successfully cured me of that. If it is going to fail 90% of the time anyway, I might as well try Opus every time in case it actually works. It’s not really the real Opus anyway, since you quantize and re-route it. But if the service has a failure rate that high, why would I bother using my quota “responsibly”? I would only be handicapping myself with no upside.

For real. And when I don't want to use Claude 4.6 because of this error, at least give us 4.5. But no: selecting Opus 4.5 says “not available”, and Gemini 3 Pro (thinking) also gets cut off in the middle of editing and makes too many errors. This is not a proper IDE if you're paying for the Pro and Ultra plans. It was good up to January, but since the release of Genie 3 I have been seeing these rate limits and all those “no capacity left” errors. A really bad experience.