Hi,
I’m experiencing an unexpected rate limit categorization issue with Grounding with Google Search on the Gemini API.
Environment
- Tier: Tier 3
- Model specified in code: gemini-2.5-flash
- API: Gemini Developer API (generativelanguage.googleapis.com/v1beta)
- Tool used: google_search (for Grounding with Google Search)
Issue
When I make generateContent requests with the google_search tool enabled using model gemini-2.5-flash, the grounding usage is being counted under the “Gemini 3”
category in the AI Studio Rate Limits dashboard — not under “Gemini 2.5”.
What I see in the Rate Limits dashboard:
┌────────────┬──────────────────┬──────────────────────────┐
│ Category │ Type │ Usage │
├────────────┼──────────────────┼──────────────────────────┤
│ Gemini 2.5 │ Search Grounding │ 793 / 5K │
├────────────┼──────────────────┼──────────────────────────┤
│ Gemini 3 │ Search Grounding │ 1,640 / 1,500 (exceeded) │
└────────────┴──────────────────┴──────────────────────────┘
What I expect:
All grounding requests should be counted under Gemini 2.5, since my code explicitly specifies gemini-2.5-flash as the model. I do not use any Gemini 3 model in my
codebase.
Code snippet
const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=${apiKey}`;
const body = {
  contents: [{ role: "user", parts: [{ text: prompt }] }],
  tools: [{ google_search: {} }],
};
const response = await fetch(url, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(body),
});
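To help narrow down whether these requests are served by a different model than the one named in the URL, a helper along these lines could log a per-request summary to compare against the dashboard. This is only a sketch: `summarizeGroundingResponse` is a hypothetical name, and it assumes the response body exposes `modelVersion` and `candidates[0].groundingMetadata.webSearchQueries`, and that quota errors arrive as HTTP 429 with `error.status` set to `RESOURCE_EXHAUSTED`.

```javascript
// Hypothetical helper: summarize one generateContent result.
// `status` is the HTTP status code; `json` is the parsed response body.
function summarizeGroundingResponse(status, json) {
  const body = json || {};
  const candidate = (body.candidates && body.candidates[0]) || {};
  const grounding = candidate.groundingMetadata || {};
  const queries = grounding.webSearchQueries || [];
  const error = body.error || null;
  return {
    status,
    // Model the service reports having used (assumed field: modelVersion).
    modelVersion: body.modelVersion || null,
    // Number of search queries issued for this single prompt.
    searchQueryCount: queries.length,
    // True when the request was rejected for quota reasons.
    quotaExceeded:
      status === 429 || (error !== null && error.status === "RESOURCE_EXHAUSTED"),
    errorMessage: error ? error.message : null,
  };
}
```

It could be called as `console.log(summarizeGroundingResponse(response.status, await response.json()))` right after the `fetch` above. If `modelVersion` always reports a 2.5 model while the Gemini 3 counter keeps rising, that would suggest the problem lies in how usage is categorized rather than in routing.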
Impact
- The Gemini 3 grounding quota (1,500) has been exceeded, causing 429 RESOURCE_EXHAUSTED errors.
- On Tier 3, the Gemini 2.5 grounding free tier is 1,500 RPD. If all requests were correctly categorized under Gemini 2.5, the combined total (793 + 1,640 = 2,433) would still exceed the free tier but would not hit a hard rate limit at 1,500.
- Additionally, this misclassification may result in incorrect billing, as Gemini 3 uses per-query billing while Gemini 2.5 uses per-prompt billing.
Questions
- Is this a known issue with how grounding requests are categorized in the rate limit dashboard?
- Are gemini-2.5-flash grounding requests being internally routed through a Gemini 3 model?
- If this is a bug, is there a workaround to ensure requests are counted under the correct model category?
Any clarification would be greatly appreciated. Thank you.