Severe Latency Inversion on Paid Tier 2 (Priority) using Gemini 3 Flash (Preview) compared to Standard Tier

Hello,

We recently upgraded to Paid Tier 2 (AI Studio) and configured service_tier="priority". We verified that the response header returns x-gemini-service-tier: priority correctly.
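
For anyone who wants to run the same check, here is a minimal sketch of how the served tier can be read from the response headers. It assumes the headers are available as a plain name-to-value mapping; the helper name served_tier is hypothetical and not part of any SDK. Only the header name x-gemini-service-tier comes from what we observed.

```python
from typing import Mapping, Optional

# Header name as observed in our responses.
TIER_HEADER = "x-gemini-service-tier"


def served_tier(headers: Mapping[str, str]) -> Optional[str]:
    """Return the tier reported by the response, or None if the header is absent.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    for name, value in headers.items():
        if name.lower() == TIER_HEADER:
            return value.strip().lower()
    return None
```

Logging this value next to each request's latency makes it possible to confirm that the slow requests were actually served as priority.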

However, we are seeing a severe performance inversion on multimodal requests (with images resized to 1024 px on the long edge):

  • Standard Tier (Free): Requests complete in approx. 30 seconds.
  • Priority Tier (Paid): Requests take 80 to 356 seconds (95th percentile latency reaches 202,072 ms, i.e. about 202 seconds, in Google Cloud Console), accompanied by frequent 503 Service Unavailable errors.
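
A comparison like the one above can be reproduced client-side with a small timing harness. The following is a rough sketch, not our exact tooling: send_request is a hypothetical callable that you supply, which performs one request on the tier under test and returns the HTTP status code. The percentile uses the simple nearest-rank method, which may differ slightly from how Cloud Console aggregates latency.

```python
import math
import time
from typing import Callable, Dict, List


def nearest_rank_percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure(send_request: Callable[[], int], runs: int) -> Dict[str, float]:
    """Call send_request `runs` times, recording wall-clock latency and 503 responses.

    send_request is a hypothetical caller-supplied function that issues one
    request and returns its HTTP status code.
    """
    latencies_ms: List[float] = []
    errors_503 = 0
    for _ in range(runs):
        start = time.perf_counter()
        status = send_request()
        latencies_ms.append((time.perf_counter() - start) * 1000)
        if status == 503:
            errors_503 += 1
    return {
        "runs": runs,
        "errors_503": errors_503,
        "min_ms": min(latencies_ms),
        "max_ms": max(latencies_ms),
        "p95_ms": nearest_rank_percentile(latencies_ms, 95),
    }
```

Running measure once per tier with the same payload gives directly comparable min, max, and p95 figures along with the 503 count.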

Cloud Support (Case 72095741) has already reviewed our case and confirmed that this 80-356s delay is “significantly outside the performance targets for the Gemini 3 Flash model on the Priority tier.”

Since we have already optimized what we can on the application side (minimizing token count and image size, adjusting concurrency), this looks like a backend routing or quota synchronization bug specific to Paid Tier 2 after the upgrade.

Is anyone else experiencing this issue with the preview models on Priority? Any insights from the Google engineering team would be highly appreciated.