Hi,
We are using gemini-2.5-flash (as well as gemini-2.5-pro and gemini-2.5-flash-lite) with a tier 3 API key (from https://aistudio.google.com/u/1/apikey). However, we occasionally hit rate limits earlier than expected, with the following error:
ClientError: 429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.', 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.QuotaFailure', 'violations': [{'quotaMetric': 'generativelanguage.googleapis.com/generate_content_paid_tier_2_input_token_count', 'quotaId': 'GenerateContentPaidTierInputTokensPerModelPerMinute-PaidTier2', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-flash'}, 'quotaValue': '3000000'}]}, {'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}, {'@type': 'type.googleapis.com/google.rpc.RetryInfo', 'retryDelay': '19s'}]}}
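The human-readable message is generic. The structured `details` list in the same body is what says which quota was hit. Read it directly: the `QuotaFailure` entry names the quota ID, model and limit, and the `RetryInfo` entry gives the suggested wait. Below is a minimal Python sketch. It assumes you already have the error body as a plain dict; the post does not show which attribute of the SDK's `ClientError` exposes it. `parse_quota_error` and `QuotaViolation` are hypothetical helpers. Reading the enforced tier from the `quotaId` suffix is an assumption based only on the naming in this one response.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

QUOTA_FAILURE = "type.googleapis.com/google.rpc.QuotaFailure"
RETRY_INFO = "type.googleapis.com/google.rpc.RetryInfo"


@dataclass
class QuotaViolation:
    metric: str
    quota_id: str
    model: Optional[str]
    limit: int
    tier: Optional[str]  # suffix of quotaId, e.g. "PaidTier2"


def parse_quota_error(body: dict[str, Any]) -> tuple[list[QuotaViolation], Optional[float]]:
    """Return (violations, retry_delay_seconds) from a 429 error body shaped like the one above."""
    violations: list[QuotaViolation] = []
    retry_delay: Optional[float] = None
    for detail in body.get("error", {}).get("details", []):
        kind = detail.get("@type")
        if kind == QUOTA_FAILURE:
            for v in detail.get("violations", []):
                quota_id = v.get("quotaId", "")
                # Assumption: the enforced tier is the segment after the last "-" in quotaId.
                tier = quota_id.rsplit("-", 1)[1] if "-" in quota_id else None
                violations.append(
                    QuotaViolation(
                        metric=v.get("quotaMetric", ""),
                        quota_id=quota_id,
                        model=v.get("quotaDimensions", {}).get("model"),
                        limit=int(v.get("quotaValue", "0")),
                        tier=tier,
                    )
                )
        elif kind == RETRY_INFO:
            # retryDelay is a duration string such as "19s" or "1.5s".
            m = re.fullmatch(r"(\d+(?:\.\d+)?)s", detail.get("retryDelay", ""))
            if m:
                retry_delay = float(m.group(1))
    return violations, retry_delay


# Trimmed copy of the error body from the message above.
sample = {
    "error": {
        "code": 429,
        "status": "RESOURCE_EXHAUSTED",
        "details": [
            {
                "@type": QUOTA_FAILURE,
                "violations": [
                    {
                        "quotaMetric": "generativelanguage.googleapis.com/generate_content_paid_tier_2_input_token_count",
                        "quotaId": "GenerateContentPaidTierInputTokensPerModelPerMinute-PaidTier2",
                        "quotaDimensions": {"location": "global", "model": "gemini-2.5-flash"},
                        "quotaValue": "3000000",
                    }
                ],
            },
            {"@type": RETRY_INFO, "retryDelay": "19s"},
        ],
    }
}

violations, delay = parse_quota_error(sample)
for v in violations:
    print(f"{v.model}: tier={v.tier} limit={v.limit:,} tokens/min")  # gemini-2.5-flash: tier=PaidTier2 limit=3,000,000 tokens/min
print(f"retry after {delay}s")  # retry after 19.0s
```

For the body in this post, the sketch reports `PaidTier2` with a 3,000,000 tokens/min input limit and a 19 s retry delay. That is the same quota the raw message names, so this 429 comes from a specific per-model input-token quota and not from a bare, detail-less error.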
I’m confused about why this message mentions PaidTier2 instead of tier 3, and why the TPM limit is 3M instead of 8M. Is the mention of rate limits a red herring here, and this is just a generic 429 RESOURCE_EXHAUSTED error? Or is the tier 2 TPM limit being incorrectly applied to our key?