Gemini API is very slow. Am I doing something wrong?

Hey, I am trying to build an engine to rewrite articles. I have a Python script that sends a prompt to Gemini asking it to rewrite each article.

Sometimes I have to wait up to 22 minutes (over 1300 seconds) just to get about 1300 characters back.
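For reference, here is a minimal sketch of the timing harness behind the logs below, assuming the Vertex AI Python SDK (google-cloud-aiplatform) and an authenticated project; "my-project" and the prompt wording are placeholders:

import time

import vertexai
from vertexai.generative_models import GenerativeModel

def rewrite_article(article: str, region: str, model_name: str) -> str:
    # Each region needs its own init call because the location is set globally.
    vertexai.init(project="my-project", location=region)
    model = GenerativeModel(model_name)
    start = time.time()
    response = model.generate_content("Rewrite the following article:\n\n" + article)
    elapsed = time.time() - start
    print(f"Region: {region} | Input Characters: {len(article)} | "
          f"Output Characters: {len(response.text)} | Time Taken: {elapsed:.2f} seconds")
    return response.text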

Attempting with region: europe-west2 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-west4 | Input Characters: 2356 | Output Characters: 1543 | Time Taken: 55.79 seconds
Region: europe-west1 | Input Characters: 1463 | Output Characters: 1492 | Time Taken: 92.18 seconds
Attempting with region: us-east1 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west1 | Input Characters: 1634 | Output Characters: 1174 | Time Taken: 172.57 seconds
Attempting with region: europe-west4 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west2 | Input Characters: 1633 | Output Characters: 1474 | Time Taken: 352.86 seconds
Attempting with region: europe-north1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: us-east1 | Input Characters: 1197 | Output Characters: 1419 | Time Taken: 636.26 seconds
Attempting with region: europe-central2 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-west4 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-north1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: us-east1 | Input Characters: 1043 | Output Characters: 1129 | Time Taken: 8.32 seconds
Attempting with region: europe-central2 / gemini-1.5-flash-002 (Attempt 1/3)
Region: europe-west2 | Input Characters: 1006 | Output Characters: 979 | Time Taken: 7.57 seconds
Region: europe-north1 | Input Characters: 1505 | Output Characters: 1346 | Time Taken: 9.41 seconds
Region: europe-west4 | Input Characters: 1384 | Output Characters: 1205 | Time Taken: 8.89 seconds
Region: europe-north1 | Input Characters: 1490 | Output Characters: 1481 | Time Taken: 11.07 seconds
Attempting with region: europe-west4 / gemini-1.5-flash-002 (Attempt 1/3)
Attempting with region: europe-west4 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-central2 | Input Characters: 1560 | Output Characters: 1368 | Time Taken: 11.20 seconds
Attempting with region: europe-west1 / gemini-1.5-pro-002 (Attempt 1/3)
Attempting with region: europe-west1 / gemini-1.5-pro-002 (Attempt 1/3)
Region: europe-west4 | Input Characters: 1936 | Output Characters: 1634 | Time Taken: 24.61 seconds

Region: europe-west4 | Input Characters: 1482 | Output Characters: 1526 | Time Taken: 1337.62 seconds
Attempting with region: europe-west2 / gemini-1.5-pro-002 (Attempt 1/3)



Hi @Cristian_Ditoiu

Welcome to the community!

There are several ways to reduce latency:

  1. Batch multiple articles into a single request rather than sending each one individually.
  2. Shorten the prompt where possible; fewer input tokens means a faster response.
  3. Use streamGenerateContent to process the output as it arrives instead of waiting for the entire response (see the sketch after this list).
  4. If you have precomputed input tokens that you plan to reuse, enabling caching can improve performance; more information is available in the caching documentation.
  5. Keep in mind that high API traffic can occasionally increase latency regardless of the above.
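For point 3, a minimal sketch using the google-generativeai Python SDK; the API key and prompt are placeholders:

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash-002")

# stream=True yields partial chunks as they arrive instead of blocking
# until the whole response is generated.
for chunk in model.generate_content("Rewrite this article: ...", stream=True):
    print(chunk.text, end="", flush=True)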

Thank you

Gemini has been slow since 16 Nov 2024. The same code was running very fast yesterday but is now very slow; even the response to just "Hi" takes too long. What's going on there???


It’s related to the new update; for some reason the model is under an extremely heavy load.


I am using Gemini 1.5 Flash. Until today, November 16, the API's response time was 2-3 seconds at most, but now it takes at least 60 seconds. Even though I made the necessary settings in my Vercel deployment, I get the error "An error occurred with your deployment: FUNCTION_INVOCATION_TIMEOUT". In tests on my local machine (localhost) I was able to get answers to my queries, but as I said, it unfortunately takes a very long time.


Hi Yusuf,
I’m facing a similar problem. Did you manage to find any workarounds?

Yeah, I found one:

  1. made a round robin to use more than one region
  2. started using gemini-pro as well

Something like:

import random

# Shuffle the candidate regions and models, then take the first of each.
regions = ["europe-west4", "us-east1", "europe-north1", "europe-west1",
           "europe-west2", "europe-west3", "europe-west6", "europe-central2"]
random.shuffle(regions)
# flash is listed twice so it gets picked about twice as often as pro.
gemini_models = ["gemini-1.5-flash-002", "gemini-1.5-flash-002", "gemini-1.5-pro-002"]
random.shuffle(gemini_models)
selected_region = regions[0]
selected_model = gemini_models[0]

Hi. I was using "gemini-1.5-flash-latest" in the model setting before. I changed it to "gemini-1.5-flash-002", and that was the only change I made. After some time the situation resolved itself.
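If pinning the version is what helped, the change is just the model string, e.g. with the google-generativeai SDK (the API key is a placeholder):

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# "-002" pins a fixed revision; "-latest" floats to whatever is newest.
model = genai.GenerativeModel("gemini-1.5-flash-002")
print(model.generate_content("Hi").text)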