[Feature Request] Performance Metrics in AI Studio Logs for Timeout Handling

The Problem: Currently, it is difficult to distinguish a “hanging” request from a high-latency response from complex models like Gemini 3.1 Pro. Without specific timing data, developers cannot design reliable timeout and fallback logic for “agentic” workflows.

Specific Example: We are consistently seeing Gemini 3.1 Pro take over 60 seconds to complete a single response. Because the logs don’t show the breakdown of that minute (e.g., Prompt Processing vs. Thinking vs. Generation), we are struggling to:

  1. Set accurate client-side timeouts to avoid killing valid but slow requests.

  2. Trigger automated fallbacks to Gemini 3.1 Flash when Pro exceeds a specific latency threshold.

Proposed Metrics:

  • Time to First Token (TTFT): To measure initial responsiveness.

  • Total Wall-Clock Time: Total duration from request to completion.

  • Thinking Latency: Time spent generating “thought” tokens versus output tokens.
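Of these, the first two can be measured client-side against a streaming response; a minimal sketch (the `chunks` iterable stands in for any streaming generate call that yields text pieces):

```python
import time

def timed_stream(chunks):
    """Consume a streaming response, recording TTFT and total wall-clock time.

    Returns the assembled text plus a metrics dict whose keys mirror the
    metrics proposed above.
    """
    start = time.monotonic()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            ttft = time.monotonic() - start  # Time to First Token
        parts.append(chunk)
    total = time.monotonic() - start  # Total wall-clock time
    return "".join(parts), {"ttft_s": ttft, "total_s": total}
```

Thinking Latency, by contrast, cannot be observed from the client at all, since the split between “thought” tokens and output tokens is only visible server-side; that is why it needs to be surfaced in the logs.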

Providing this transparency in AI Studio would allow us to benchmark model performance accurately before migrating to Vertex AI or production environments.