gemini-3.1-flash-tts-preview still hitting free-tier quota (limit 3) despite Tier 2 billing

Following the pattern from a similar thread here (allowlist request for gemini-3.1-flash-tts-preview), I’m hitting a quota issue that looks like a provisioning gap rather than an actual limit.

Project details

  • Billing: Active, confirmed Tier 2 · Postpay at ai.dev/rate-limit
  • Model: gemini-3.1-flash-tts-preview
  • API: Gemini Developer API, generateContent with responseModalities: ["AUDIO"]
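For anyone trying to reproduce this, here is a minimal sketch of what such a request could look like. It assumes the v1beta REST endpoint and the speechConfig layout documented for earlier TTS preview models. The helper name build_tts_request, the voice name "Kore", and the prompt text are illustrative placeholders, not details from the original setup.

```python
import json
import urllib.request

MODEL = "gemini-3.1-flash-tts-preview"
# Assumption: the v1beta REST endpoint of the Gemini Developer API.
URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)


def build_tts_request(api_key, text, voice="Kore"):
    """Build (but do not send) a generateContent request asking for audio.

    `voice` is a placeholder; substitute whichever voice the project uses.
    """
    body = {
        "contents": [{"parts": [{"text": text}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            # Assumed field layout, following the docs for earlier TTS models.
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}}
            },
        },
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen sends it; a quota rejection surfaces as urllib.error.HTTPError with code 429, and the response body carries the quota message.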

Issue

My project shows Tier 2 in the AI Studio dashboard, but TTS requests keep failing with a 429 citing the free-tier bucket:

Quota exceeded for metric: generativelanguage.googleapis.com/generate_content_free_tier_requests, limit: 3, model: gemini-3.1-flash-tts
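To check programmatically which bucket a 429 is citing, the message above can be parsed with a small regular expression. This is only a sketch: the pattern is written against the exact wording quoted in this post, and the function name parse_quota_error is hypothetical. Other error variants may be worded differently.

```python
import re

# The error text exactly as quoted in this post.
ERROR_TEXT = (
    "Quota exceeded for metric: "
    "generativelanguage.googleapis.com/generate_content_free_tier_requests, "
    "limit: 3, model: gemini-3.1-flash-tts"
)

# Assumption: the message always follows the "metric, limit, model" order.
PATTERN = re.compile(
    r"Quota exceeded for metric:\s*(?P<metric>[^\s,]+),\s*"
    r"limit:\s*(?P<limit>\d+),\s*"
    r"model:\s*(?P<model>[\w.\-]+)"
)


def parse_quota_error(text):
    """Return the metric, limit, and model named in a quota error, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # Keep only the part after the service prefix.
    metric = match.group("metric").rsplit("/", 1)[-1]
    return {
        "metric": metric,
        "limit": int(match.group("limit")),
        "model": match.group("model"),
        "free_tier": "free_tier" in metric,
    }


info = parse_quota_error(ERROR_TEXT)
print(info)
```

The free_tier flag makes it easy to log every 429 that cites the free-tier metric while the project is on a paid tier.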

I pulled the raw quota config directly (gcloud alpha services quota list --service=generativelanguage.googleapis.com --consumer=projects/kt-cloud-437816) and found that both buckets exist for this model on my project:

Metric                                   effectiveLimit
generate_content_free_tier_requests      3     (this is the one firing)
generate_content_paid_tier_2_requests    1000  (matches my dashboard, not being used)
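The same check can be scripted against the quota listing. The sketch below flags any free-tier metric that coexists with a paid-tier metric. The rows are hard-coded from the two values above, and the function name stale_free_tier_buckets and the substring matching on metric names are assumptions for illustration, not part of any Google tooling.

```python
# Rows copied from the quota listing in this post: (metric name, effectiveLimit).
QUOTA_ROWS = [
    ("generate_content_free_tier_requests", 3),
    ("generate_content_paid_tier_2_requests", 1000),
]


def stale_free_tier_buckets(rows):
    """Return free-tier metrics that exist alongside at least one paid-tier metric."""
    has_paid = any("paid_tier" in name for name, _ in rows)
    if not has_paid:
        # A free-tier bucket on its own is expected for an unbilled project.
        return []
    return [name for name, _ in rows if "free_tier" in name]


print(stale_free_tier_buckets(QUOTA_ROWS))
```

A non-empty result means both buckets are provisioned at once, which is the condition described here.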

So the correct paid-tier bucket is provisioned, but the old free-tier one seems to still be enforced in parallel instead of being superseded after the tier upgrade.

Ask

Could someone take a look at why the free-tier bucket is still active for this model/project, and get the paid_tier_2 bucket (1000 RPM) to actually apply? Happy to share request IDs / more logs if useful.