Vertex tuned 2.5 Flash: Huge latency increase since 6 hours ago

We did not deploy any changes, so there are no differences in prompt length or configuration; this is purely Vertex-side. The increased latency started between 7:55 PM and 9:25 PM UTC on 19 Nov 2025, and it keeps increasing. This is currently degrading our product.

I am seeing the exact same problem. Latency went from under 1 second to more than 20 seconds starting yesterday on our fine-tuned 2.5 Flash model deployed on Vertex AI.

Re-allocating most TPUs to Nanobanana Pro, making every fine-tuned model that enterprise users have poured significant resources into 20 times slower? Looks more likely than one might think.

This is definitely more widespread. I have also seen almost a 50x latency increase on the endpoint serving our fine-tuned Gemini 2.5 model, with no changes on our end. How do we get the right folks at Google to look into this?

Everyone affected should file a ticket with GCP support; we've just done so: console.cloud.google.com/support.
Is your endpoint in us-west1? Wondering if this is regional.
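When filing a ticket, concrete before/after numbers help. A minimal timing harness like this sketch can produce them; `fake_endpoint_call` is a hypothetical stand-in here, so swap in your actual Vertex prediction or `generate_content` call:

```python
import time
import statistics

def measure_latency(call, n=5):
    """Invoke `call` n times and return median and worst-case wall time in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return {"p50": statistics.median(samples), "max": max(samples)}

# Stand-in for the real endpoint call (hypothetical; replace with your
# fine-tuned model's request, e.g. a Vertex AI prediction call).
def fake_endpoint_call():
    time.sleep(0.01)

stats = measure_latency(fake_endpoint_call, n=3)
print(f"p50={stats['p50']:.3f}s max={stats['max']:.3f}s")
```

Running this on a schedule and logging the output gives support a timeline they can correlate with their side.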

No, we are hosted in us-central1. Will definitely file a ticket as well.

Have you heard back from them? This is pretty absurd; it's now been close to a week. Our ticket has effectively been ignored: the first reply asked for information we had already provided in the ticket, and since we replied there has been no response for over 48 hours.