Streaming API is too slow

I am using @google/genai to stream responses from the Gemini 2.5 Pro model, and the response time is very slow.

Here is the code snippet:

const startTime = Date.now();
let chunkCount = 0;
let totalContent = '';
const fullPrompt = systemPrompt ? `${systemPrompt}\n\n${prompt}` : prompt;
try {
  // Format contents properly as array of messages with role and parts
  const contents = [
    {
      role: 'user' as const,
      parts: [
        {
          text: fullPrompt,
        },
      ],
    },
  ];

  const response = await this.genAI.models.generateContentStream({
    model: this.config.model,
    contents,
    config: this.getGenerationConfig(),
  });

  for await (const chunk of response) {
    // Log finish reason and safety ratings for debugging
    if (chunk.candidates?.[0]?.finishReason) {
      logger.debug('Gemini finish reason:', chunk.candidates[0].finishReason);
    }
    if (chunk.candidates?.[0]?.safetyRatings) {
      logger.debug('Gemini safety ratings:', chunk.candidates[0].safetyRatings);
    }

    if (chunk.text) {
      chunkCount++;
      totalContent += chunk.text;
      yield { content: chunk.text };
    }
  }

  // ...
}


Here is the raw response from this API. It shows it took 16 seconds to get the first chunk. Is this an expected delay?


I am encountering the same issue, did you manage to fix it?

Hey @drift2 - thanks for reaching out here. Since 2.5 Pro is a reasoning / thinking model, it will always generate thoughts before the final response. You can enable thought summaries (https://ai.google.dev/gemini-api/docs/thinking#summaries) to see the thoughts and reduce the perceived latency. On the flip side, if you want the model to think less and/or start its response sooner you can either reduce the thinking budget (https://ai.google.dev/gemini-api/docs/thinking#set-budget) or use a faster model like Gemini 2.5 Flash or Gemini 3.0 Flash.
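For reference, here is a minimal sketch of both suggestions combined with the @google/genai SDK: capping the thinking budget and surfacing thought summaries while streaming. The model name, prompt, and budget value are illustrative, not taken from the thread, and it assumes GEMINI_API_KEY is set in the environment.

```typescript
import { GoogleGenAI } from '@google/genai';

// Assumption: API key is provided via the GEMINI_API_KEY environment variable.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const stream = await ai.models.generateContentStream({
  model: 'gemini-2.5-pro', // illustrative model name
  contents: 'Summarize the benefits of streaming APIs.',
  config: {
    thinkingConfig: {
      thinkingBudget: 512,   // cap thinking tokens so the answer starts sooner
      includeThoughts: true, // stream thought summaries instead of silence
    },
  },
});

for await (const chunk of stream) {
  // Thought-summary parts are flagged with part.thought === true.
  for (const part of chunk.candidates?.[0]?.content?.parts ?? []) {
    if (part.thought) {
      console.log('[thinking]', part.text);
    } else if (part.text) {
      process.stdout.write(part.text);
    }
  }
}
```

With includeThoughts enabled, the thought summaries arrive as early chunks, so the user sees activity during the reasoning phase instead of a long silent gap before the first answer token.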