Over 300k context tokens leads to a 429 error for both Gemini Pro and Flash on the free tier

I've noticed quite weird behavior on the free tier with both the gemini-pro and gemini-flash models. Everything works well until the context size exceeds 300k tokens. After that point I constantly get errors like this:

google.genai.errors.ClientError: 429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.', 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.QuotaFailure', 'violations': [{'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_input_token_count', 'quotaId': 'GenerateContentInputTokensPerModelPerMinute-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-flash'}, 'quotaValue': '250000'}]}, {'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}, {'@type': 'type.googleapis.com/google.rpc.RetryInfo', 'retryDelay': '3s'}]}}

While for gemini-pro this error might seem reasonable, as it has a 250,000 TPM limit, gemini-flash is supposed to have a 1,000,000 TPM limit, so everything should work fine, yet it doesn't.
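For anyone hitting the same wall, the 429 body above carries structured `QuotaFailure` and `RetryInfo` details that can be inspected programmatically. This is a minimal sketch of a parser for that payload; the field names are taken directly from the JSON shown in this thread, and `parse_429` itself is a hypothetical helper, not part of the SDK:

```python
# Hypothetical helper: extract the violated quota and the server-suggested
# retry delay from a 429 error body shaped like the one quoted above.

def parse_429(error_body: dict):
    """Return (quota_id, quota_value, retry_delay_seconds) from a 429 body."""
    quota_id = None
    quota_value = None
    retry_delay = None
    for detail in error_body["error"].get("details", []):
        kind = detail.get("@type", "")
        if kind.endswith("QuotaFailure"):
            # First violation carries the metric and its limit.
            violation = detail["violations"][0]
            quota_id = violation["quotaId"]
            quota_value = int(violation["quotaValue"])
        elif kind.endswith("RetryInfo"):
            # retryDelay arrives as a string like "3s".
            retry_delay = float(detail["retryDelay"].rstrip("s"))
    return quota_id, quota_value, retry_delay
```

Feeding it the body from the first error in this thread returns the per-minute input-token quota ID, the 250,000 limit, and the suggested 3-second delay.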

Just wondering, how are you building up the context? Is it through a multi-turn text conversation or are you filling the context by uploading large files or documents?

@GUNAND_MAYANGLAMBAM
I'm building projects through conversations with code, using another project of my own: GitHub - volotat/InsightCoder

It tries to fit all the code into the context, as well as a compressed history of the past conversation, so I know for sure that the model has everything I need in the current context. This gives me continuity and precise control over the generated code. There is no automatic applying of changes; everything goes through my eyes after careful consideration. This lets me avoid the main downfall of "vibe coding": error accumulation. It is actually quite common for the model to try to do something really undesirable, and this tool lets me keep it on the right track. For me this approach works exceptionally well.
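Since the quota in question is input tokens per minute, one workaround when packing a whole project plus history into the context is to trim the oldest history until the request fits the budget. A rough sketch, assuming a crude 4-characters-per-token estimate (for real counts the google-genai SDK's `count_tokens` method should be used instead):

```python
# Sketch: keep a conversation under a per-minute input-token budget by
# dropping the oldest messages first. The 4-chars-per-token ratio is a
# crude assumption, not the real tokenizer.

def trim_history(messages: list[str], budget_tokens: int) -> list[str]:
    """Drop the oldest messages until the estimated token count fits."""
    def estimate(text: str) -> int:
        return max(1, len(text) // 4)  # heuristic estimate only

    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk newest-first; recent turns matter most
        cost = estimate(msg)
        if total + cost > budget_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order
```

This doesn't help if the codebase alone exceeds the quota, but it keeps long-running conversations from tipping a request over the limit.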

Hey, based on the JSON response body you shared, you are using 2.5 Flash, which has a TPM limit of 250,000. Please try 2.0 Flash, since it has a TPM limit of 1,000,000, and let us know if you are still facing the issue.
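Besides switching models, the `RetryInfo` detail in the error body suggests how long to wait before retrying. A minimal retry wrapper sketch; the `RateLimited` exception class here is an illustration stand-in, not the SDK's actual error type (the real one is `google.genai.errors.ClientError`, from which the delay would have to be parsed out of the response body):

```python
import time

# Illustrative exception carrying a parsed retry delay; an assumption for
# this sketch, not the google-genai SDK's real error class.
class RateLimited(Exception):
    def __init__(self, retry_delay: float):
        super().__init__(f"429, retry in {retry_delay}s")
        self.retry_delay = retry_delay

def call_with_retry(call, max_attempts: int = 3):
    """Invoke `call`, sleeping for the server-suggested delay on rate limits."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(err.retry_delay)
```

Note this only helps with transient per-minute throttling; if a single request's input already exceeds the per-minute token quota, retrying will fail every time.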

Hello. I am now suddenly encountering errors like this:

429 RESOURCE_EXHAUSTED. {'error': {'code': 429, 'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.', 'status': 'RESOURCE_EXHAUSTED', 'details': [{'@type': 'type.googleapis.com/google.rpc.QuotaFailure', 'violations': [{'quotaMetric': 'generativelanguage.googleapis.com/generate_content_free_tier_input_token_count', 'quotaId': 'GenerateContentInputTokensPerModelPerMinute-FreeTier', 'quotaDimensions': {'location': 'global', 'model': 'gemini-2.5-pro'}, 'quotaValue': '125000'}]}, {'@type': 'type.googleapis.com/google.rpc.Help', 'links': [{'description': 'Learn more about Gemini API quotas', 'url': 'https://ai.google.dev/gemini-api/docs/rate-limits'}]}, {'@type': 'type.googleapis.com/google.rpc.RetryInfo', 'retryDelay': '8s'}]}}

If I am understanding it right, the free tier now has even deeper cuts, down to a 125,000-token maximum? The Rate limits  |  Gemini API  |  Google AI for Developers page still shows 250k for all three 2.5 models: Pro, Flash, and Flash-Lite. Has it not been updated yet, or is this a bug on the provider's side?