Significant Difference in Response Quality between Google AI Studio and Gemini 2.5 Pro API (gemini-2.5-pro-03-25)

I’m experiencing a severe discrepancy in output quality and behavior when using the gemini-2.5-pro-03-25 model through Google AI Studio versus through the Gemini API on Google Cloud.

Detailed Scenario:

In Google AI Studio:

  • The model can process thousands of lines of code.
  • Reasoning and generation are thorough, usually taking approximately 1-2 minutes to fully process and stream a detailed response.
  • Responses are very well-structured, comprehensive, and accurate. Typically, solutions provided by AI Studio are effective “one-shot” solutions. Overall, the experience here is excellent.

Using the Gemini API (Google Cloud):

  • Despite using the same model (gemini-2.5-pro-03-25) and sending the exact same inputs, the API behaves dramatically differently.
  • The output generation completes extremely quickly, typically within around 10 seconds.
  • The quality of the responses through the API is consistently poor. Solutions frequently fail, and outputs appear superficial or incomplete.
  • Responses are often abruptly truncated, even though I explicitly set the output token limit to the maximum allowed (64K tokens).
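One way to rule out client-side configuration drift is to log the exact generationConfig my code sends and compare it field by field against AI Studio's run-settings panel. A minimal sketch of what I'm doing (the helper name and defaults are my own; the field names follow the public generateContent REST schema):

```python
import json

def build_request(prompt: str, max_output_tokens: int = 65536) -> dict:
    """Build the JSON body for a Gemini generateContent REST call.

    Hypothetical helper for debugging: it makes the generationConfig
    explicit so it can be diffed against AI Studio's run settings.
    """
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "temperature": 1.0,                  # matches the AI Studio default shown in the UI
            "maxOutputTokens": max_output_tokens,  # 64K, the documented maximum I set
        },
    }

body = build_request("Refactor the attached module.")
print(json.dumps(body, indent=2))  # log this and compare with AI Studio
```

Even with this logged and matching AI Studio's settings on my end, the truncation and quality gap persist, which is why I suspect something server-side or undocumented.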

Question to Community and Google Developers:

  • Is this a known issue or expected behavior with the current implementation of the Gemini 2.5 Pro API (gemini-2.5-pro-03-25)?
  • Could there be undocumented limitations or parameters specific to the API that severely impact the processing quality and output completeness?
  • Has anyone else encountered this issue, or is there something significantly wrong in my configuration or implementation?

I’d greatly appreciate insights or clarifications about this issue. Thanks!