Streaming API is too slow

I am using @google/genai to stream responses from the Gemini 2.5 Pro model, and the response time is very slow.

Here is the code snippet:

const startTime = Date.now();
let chunkCount = 0;
let totalContent = '';
const fullPrompt = systemPrompt ? `${systemPrompt}\n\n${prompt}` : prompt;
try {
  // Format contents properly as array of messages with role and parts
  const contents = [
    {
      role: 'user' as const,
      parts: [
        {
          text: fullPrompt,
        },
      ],
    },
  ];

  const response = await this.genAI.models.generateContentStream({
    model: this.config.model,
    contents,
    config: this.getGenerationConfig(),
  });

  for await (const chunk of response) {
    // Log finish reason and safety ratings for debugging
    if (chunk.candidates?.[0]?.finishReason) {
      logger.debug('Gemini finish reason:', chunk.candidates[0].finishReason);
    }
    if (chunk.candidates?.[0]?.safetyRatings) {
      logger.debug('Gemini safety ratings:', chunk.candidates[0].safetyRatings);
    }

    if (chunk.text) {
      chunkCount++;
      totalContent += chunk.text;
      yield { content: chunk.text };
    }
  }

  // ...
}


Here is the raw response from this API. It shows it took 16 seconds to get the first chunk. Is this an expected delay?


I am encountering the same issue, did you manage to fix it?

Hey @drift2 - thanks for reaching out here. Since 2.5 Pro is a reasoning / thinking model, it will always generate thoughts before the final response. You can enable thought summaries (https://ai.google.dev/gemini-api/docs/thinking#summaries) to see the thoughts and reduce the perceived latency. On the flip side, if you want the model to think less and/or start its response sooner you can either reduce the thinking budget (https://ai.google.dev/gemini-api/docs/thinking#set-budget) or use a faster model like Gemini 2.5 Flash or Gemini 3.0 Flash.
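For reference, here is a minimal sketch of both suggestions combined with the @google/genai SDK: capping the thinking budget and surfacing thought summaries while streaming. The model name, prompt, and budget value are illustrative, not taken from the thread, and it assumes GEMINI_API_KEY is set in the environment.

```typescript
import { GoogleGenAI } from '@google/genai';

// Assumption: API key is provided via the GEMINI_API_KEY environment variable.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const stream = await ai.models.generateContentStream({
  model: 'gemini-2.5-pro', // illustrative model name
  contents: 'Summarize the benefits of streaming APIs.',
  config: {
    thinkingConfig: {
      thinkingBudget: 512,   // cap thinking tokens so the answer starts sooner
      includeThoughts: true, // stream thought summaries instead of silence
    },
  },
});

for await (const chunk of stream) {
  // Thought-summary parts are flagged with part.thought === true.
  for (const part of chunk.candidates?.[0]?.content?.parts ?? []) {
    if (part.thought) {
      console.log('[thinking]', part.text);
    } else if (part.text) {
      process.stdout.write(part.text);
    }
  }
}
```

With includeThoughts enabled, the thought summaries arrive as early chunks, so the user sees activity during the reasoning phase instead of a long silent gap before the first answer token.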