Vertex tuned 2.5 Flash: Huge latency increase since 6 hours ago

We did not deploy any changes, so there are no differences in prompt length or configuration; this is purely Vertex-side. The increased latency started between 7:55 PM and 9:25 PM UTC on 19 Nov 2025, and it keeps increasing. This is currently degrading our product.

I am seeing the exact same problem. Latency went from under 1 second to more than 20 seconds starting yesterday on our fine-tuned 2.5 Flash model deployed on Vertex AI.

Re-allocating most TPUs to Nanobanana Pro, making every fine-tuned model that enterprise users have poured significant resources into 20 times slower? Looks more likely than one might think.

This is definitely more widespread. I have also seen almost a 50x latency increase on the endpoint serving our fine-tuned Gemini 2.5 model, with no changes on our end. How do we get the right folks at Google to look into this?

Everyone affected should file a ticket with GCP support; we've just done so: console.cloud.google.com/support.
Is your endpoint in us-west1? Wondering if this is regional.
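When filing a ticket, concrete before/after numbers help. A minimal timing harness like this sketch can produce them; `fake_endpoint_call` is a hypothetical stand-in here, so swap in your actual Vertex prediction or `generate_content` call:

```python
import time
import statistics

def measure_latency(call, n=5):
    """Invoke `call` n times and return median and worst-case wall time in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {"p50": statistics.median(samples), "max": max(samples)}

# Stand-in for the real endpoint call (hypothetical; replace with your
# fine-tuned model's request, e.g. a Vertex AI prediction call).
def fake_endpoint_call():
    time.sleep(0.01)

stats = measure_latency(fake_endpoint_call, n=3)
print(f"p50={stats['p50']:.3f}s max={stats['max']:.3f}s")
```

Running this on a schedule and logging the output gives support a timeline they can correlate with their side.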

No, we are hosted in us-central1. Will definitely file a ticket as well.

Have you heard back from them? This is pretty absurd; it's now been close to a week. Our ticket has effectively been ignored: the first reply asked for information we had already provided in the ticket, and since we replied there has been no response for over 48 hours.