We have a Python script that calls the Gemini API to get explanations of queries, and we need to show the estimated time to complete the whole process.
To compute that estimate, we are looking for the average response time of the Gemini API.
Below are the technical details:
Average number of tokens per request: 500
Model used: Gemini 1.5 Flash
We have observed that Gemini takes 10 to 12 seconds to process a 500-token request. What should be the expected time for processing 500 tokens?
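For context, here is a minimal sketch of how we time each call and project the remaining time. It assumes the google-generativeai Python SDK; the API key placeholder and the sample query list are illustrative, not our real values.

```python
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")

queries = ["SELECT ...", "UPDATE ..."]  # hypothetical query list
latencies = []

for i, query in enumerate(queries):
    start = time.monotonic()
    model.generate_content(f"Explain this query: {query}")
    latencies.append(time.monotonic() - start)

    # Estimated time remaining = running average latency * queries left.
    avg = sum(latencies) / len(latencies)
    print(f"{i + 1}/{len(queries)} done, ~{avg * (len(queries) - i - 1):.0f}s remaining")
```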
Several factors can affect the speed of a language model’s response. Processing times can also differ depending on whether you are using function calling or just producing plain text, as well as on other features and modalities.
I think you’re already doing the right thing: observe the speed for your own use case and use that as your baseline for expected time, then experiment further to see whether you can improve it.
Thanks,
We are just asking for an explanation of a query as pure text, and we are targeting the base 1.5 Flash model. We are not using any other features.
Based on this, it currently takes 10 to 12 seconds per response. We just wanted to confirm whether this time range is similar for other people as well, or whether anyone is seeing a lower response time for requests like ours (500 tokens per request).
On another note, we also wanted to know whether we can provide proxy support through the SDK.
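In case it helps: as far as we can tell, the SDK has no dedicated proxy option, so the sketch below relies on the standard proxy environment variable that HTTP transports generally honor. The proxy URL is a placeholder, and this behavior is an assumption on our part, not documented SDK support.

```python
import os
import google.generativeai as genai

# Assumption: route traffic through a proxy via the standard environment
# variable; set it before the client is configured. The proxy URL is a
# placeholder.
os.environ["HTTPS_PROXY"] = "http://proxy.example.com:3128"

# Forcing the REST transport keeps proxy handling in the HTTP layer,
# which honors HTTPS_PROXY; the default gRPC transport handles proxies
# differently.
genai.configure(api_key="YOUR_API_KEY", transport="rest")

model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content("Explain this query: SELECT 1").text)
```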
In the past two days, both the API and AI Studio in the Singapore region have experienced extremely slow response times for 1.5 Flash. The 2.0 exp response speed is normal. Please fix this as soon as possible!!!
The status page at http://status.cloud.google.com/ shows a slow-response problem with Gemini-1.5-flash at this time. It notes:
QUOTE
13 Dec 2024 06:12 PST We will provide an update by Friday, 2024-12-13 10:00 US/Pacific with current details.
END QUOTE