Latency in web search

Good afternoon, I’m experiencing difficulties in developing my project. I’m unable to reduce the latency related to web searches. Currently, I’m using the Gemini 2.5 Flash model. My system is built with LangGraph, where there is an orchestrator agent that routes decisions and tool usage, and finally a node for synthesizing responses. Could someone suggest any techniques or methods to reduce latency?

Hi @Albano_Lucas, apologies for the late response.
Could you please try the following approaches? If you are already using any of them, let me know whether they helped.

1. Use Gemini’s built-in Google Search grounding, or make sure any external search tool runs asynchronously so that network delays do not block the main process.

2. Move all static system instructions to the top of your prompt. Gemini 2.5 models support implicit context caching, so requests that share the same prefix can hit the cache, which may speed up subsequent requests.

3. Configure LangGraph to execute multiple searches or tasks in parallel branches instead of running them sequentially. This can reduce the overall processing time.

4. In the UI, stream status updates (like “Searching the web…”) to manage user expectations and reduce the perceived waiting time.

5. Use examples (few-shot prompting) instead of complex instructions for the final answer generation, so the model can produce the summary faster with less “thinking” time.
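
To illustrate the async, parallel and status-update suggestions, here is a minimal sketch that uses only the Python standard library. It is not LangGraph or Gemini code: `fake_search`, `search_sequential`, `search_parallel` and `answer_with_status` are hypothetical names, and the 0.2 s sleep only simulates network latency. It shows that searches started together take roughly as long as the slowest one, while sequential searches take the sum of all delays.

```python
import asyncio
import time
from collections.abc import AsyncIterator


# Hypothetical stand-in for a real search tool. Replace the sleep with an
# actual async HTTP call; the delay only simulates network latency.
async def fake_search(query: str, delay: float = 0.2) -> str:
    await asyncio.sleep(delay)
    return f"results for {query!r}"


async def search_sequential(queries: list[str]) -> list[str]:
    # Each search waits for the previous one: total time is about the sum.
    return [await fake_search(q) for q in queries]


async def search_parallel(queries: list[str]) -> list[str]:
    # All searches start together: total time is about the slowest search.
    return await asyncio.gather(*(fake_search(q) for q in queries))


async def answer_with_status(queries: list[str]) -> AsyncIterator[str]:
    # Yield status events first so a UI can show progress while waiting.
    yield "status: Searching the web…"
    results = await search_parallel(queries)
    yield "status: Writing the answer…"
    yield "answer: " + " | ".join(results)


def timed(coro) -> tuple[list[str], float]:
    # Run a coroutine to completion and measure wall-clock time.
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


async def _demo() -> None:
    async for event in answer_with_status(["gemini latency", "langgraph"]):
        print(event)


if __name__ == "__main__":
    queries = ["gemini latency", "langgraph branches", "prompt caching"]
    _, t_seq = timed(search_sequential(queries))
    _, t_par = timed(search_parallel(queries))
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
    asyncio.run(_demo())
```

In LangGraph itself, the same effect is usually obtained by fanning out from one node to several branches and by using the graph's streaming output for the status messages, but the exact wiring depends on how your orchestrator and tools are defined.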

Thanks