Hey,
A few weeks ago Google released a solution for the 503 issue: if you pay twice the price, they put you in a fast lane, so you should still have access to the service at peak times.
We enabled priority because 503 is a big problem for us, but it just doesn't work.
It has never resolved the 503 errors, and the API is set up properly.
GitHub issue here:
opened 05:02PM - 15 May 26 UTC
type: bug
priority: p2
### Hey,
Gemini has `Priority inference` - https://ai.google.dev/gemini-api/docs… /priority-inference#how-to-use
We recently enabled `priority` for all of our AI usage.
Sadly, I can't get it to perform; we still get:
```
503 UNAVAILABLE. {'error': {'code': 503, 'message': 'This model is currently experiencing high demand. Spikes in demand are usually temporary. Please try again later.', 'status': 'UNAVAILABLE'}}
```
The service_tier flag is set and the response acknowledges the setting - we are running `Tier 2` on the API key:
> Priority inference is available to [Tier 2 & Tier 3](https://ai.google.dev/gemini-api/docs/billing#about-billing) users across the GenerateContent API and Interactions API endpoints.
And still nothing.
I know it's not a 100% guarantee of a response at peak times, but in the few weeks it has been out it has been completely useless: it never helps avoid the 503 or speed up response times as promised.
Am I possibly missing something?
Has anyone else used this and actually gotten something out of it?
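For context, the error message's own advice ("try again later") amounts to a client-side retry. Below is a minimal sketch of retrying with exponential backoff and jitter, not a fix for the priority tier itself. `UnavailableError` and `call_with_backoff` are hypothetical names for illustration, and the sketch assumes the SDK's exception exposes the HTTP status as a `code` attribute, which should be checked against the installed package.

```python
import random
import time


class UnavailableError(Exception):
    """Hypothetical stand-in for the SDK error raised on 503 UNAVAILABLE."""
    code = 503


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Run `call`, retrying on 503 with exponential backoff and full jitter.

    Assumption: the raised exception carries the HTTP status in `.code`.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Re-raise anything that is not a 503, and the final failed attempt.
            if getattr(exc, "code", None) != 503 or attempt == max_attempts:
                raise
            # Cap grows 1s, 2s, 4s, ... up to max_delay; sleep a random
            # fraction of it so many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

In use, the real request would be wrapped in a zero-argument callable, for example `call_with_backoff(lambda: make_request())`, where `make_request` is whatever function issues the Gemini call.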
<details><summary>Environment details</summary>
<p>
- Programming language: python 3.14.4
- Package version: 2.1.0
</p>
</details>
Thanks!
Google should separate the hardware pools for pay-as-you-go API usage and subscription plans. API users pay much more and need an uninterrupted workflow.
Hello @All,
Can you share your project ID via DM?