Gemini API is very slow. Am I doing something wrong?

Hey, I am trying to build an engine to rewrite articles. I have a Python script that sends a prompt to Gemini asking it to rewrite each article.

Sometimes I have to wait up to 22 minutes (over 1300 seconds) just to get about 1300 characters back.
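For reference, here is a minimal sketch of the timing harness behind the logs below, assuming the Vertex AI Python SDK (google-cloud-aiplatform) and an authenticated project; "my-project" and the prompt wording are placeholders:

import time

import vertexai
from vertexai.generative_models import GenerativeModel

def rewrite_article(article: str, region: str, model_name: str) -> str:
    # Each region needs its own init call because the location is set globally.
    vertexai.init(project="my-project", location=region)
    model = GenerativeModel(model_name)
    start = time.time()
    response = model.generate_content("Rewrite the following article:\n\n" + article)
    elapsed = time.time() - start
    print(f"Region: {region} | Input Characters: {len(article)} | "
          f"Output Characters: {len(response.text)} | Time Taken: {elapsed:.2f} seconds")
    return response.text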

Attempting with region: europe-west2 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-west4 | Input Characters: 2356 | Output Characters: 1543 | Time Taken: 55.79 seconds
Region: europe-west1 | Input Characters: 1463 | Output Characters: 1492 | Time Taken: 92.18 seconds
Attempting with region: us-east1 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west1 | Input Characters: 1634 | Output Characters: 1174 | Time Taken: 172.57 seconds
Attempting with region: europe-west4 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west2 | Input Characters: 1633 | Output Characters: 1474 | Time Taken: 352.86 seconds
Attempting with region: europe-north1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: us-east1 | Input Characters: 1197 | Output Characters: 1419 | Time Taken: 636.26 seconds
Attempting with region: europe-central2 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-west4 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-north1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: us-east1 | Input Characters: 1043 | Output Characters: 1129 | Time Taken: 8.32 seconds
Attempting with region: europe-central2 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west2 | Input Characters: 1006 | Output Characters: 979 | Time Taken: 7.57 seconds
Region: europe-north1 | Input Characters: 1505 | Output Characters: 1346 | Time Taken: 9.41 seconds
Region: europe-west4 | Input Characters: 1384 | Output Characters: 1205 | Time Taken: 8.89 seconds
Region: europe-north1 | Input Characters: 1490 | Output Characters: 1481 | Time Taken: 11.07 seconds
Attempting with region: europe-west4 / gemini-1.5-flash-002 (Attempt 1/3)
Attempting with region: europe-west4 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-central2 | Input Characters: 1560 | Output Characters: 1368 | Time Taken: 11.20 seconds
Attempting with region: europe-west1 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-west1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-west4 | Input Characters: 1936 | Output Characters: 1634 | Time Taken: 24.61 seconds

Region: europe-west4 | Input Characters: 1482 | Output Characters: 1526 | Time Taken: 1337.62 seconds
Attempting with region: europe-west2 / gemini-1.5-pro-002 (Attempt 1/3)



Hi @Cristian_Ditoiu

Welcome to the community!

There are several ways to reduce latency:

  1. Batch multiple articles into a single request rather than sending each one individually.
  2. Shorten the prompt where possible; fewer input tokens means a faster response.
  3. Use streamGenerateContent to process the output as it arrives instead of waiting for the entire response (see the sketch after this list).
  4. If you have precomputed input tokens that you plan to reuse, enabling caching can improve performance; more information is available in the caching documentation.
  5. Keep in mind that high API traffic can occasionally increase latency regardless of the above.
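For point 3, a minimal sketch using the google-generativeai Python SDK; the API key and prompt are placeholders:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash-002")

# stream=True yields partial chunks as they arrive instead of blocking
# until the whole response is generated.
for chunk in model.generate_content("Rewrite this article: ...", stream=True):
    print(chunk.text, end="", flush=True)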

Thank you

Gemini has been slow since 16 Nov 2024. The same code was running very fast yesterday but is now very slow; even the response to just "Hi" takes too long. What's going on there???


It’s related to the new update; for some reason the model is under an extremely heavy load.


I am using Gemini 1.5 Flash. Until today, November 16, the API's response time was 2-3 seconds at most, but now it takes at least 60 seconds. Even though I made the necessary settings in my Vercel deployment, I get the error "An error occurred with your deployment: FUNCTION_INVOCATION_TIMEOUT". In tests on my local machine (localhost) I was able to get answers to my queries, but as I said, it unfortunately takes a very long time.


Hi Yusuf,
I’m facing a similar problem. Did you manage to find any workarounds?

Yeah, I found one:

  1. made a round robin to use more than one region
  2. started using gemini-pro as well

Something like:

import random

# Shuffle the candidate regions and models, then take the first of each.
regions = ["europe-west4", "us-east1", "europe-north1", "europe-west1",
           "europe-west2", "europe-west3", "europe-west6", "europe-central2"]
random.shuffle(regions)
# flash is listed twice so it gets picked about twice as often as pro.
gemini_models = ["gemini-1.5-flash-002", "gemini-1.5-flash-002", "gemini-1.5-pro-002"]
random.shuffle(gemini_models)
selected_region = regions[0]
selected_model = gemini_models[0]

Hi. I was using "gemini-1.5-flash-latest" in the model setting before. I changed it to "gemini-1.5-flash-002", and that was the only change I made. After some time the situation resolved itself.
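If pinning the version is what helped, the change is just the model string, e.g. with the google-generativeai SDK (the API key is a placeholder):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# "-002" pins a fixed revision; "-latest" floats to whatever is newest.
model = genai.GenerativeModel("gemini-1.5-flash-002")
print(model.generate_content("Hi").text)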