I’m experiencing a severe discrepancy in output quality and behavior between the gemini-2.5-pro-03-25 model in Google AI Studio and the same model accessed through the Gemini API on Google Cloud.
Detailed Scenario:
In Google AI Studio:
- The model can process thousands of lines of code.
- Reasoning and generation are thorough, typically taking 1-2 minutes to fully process and stream a detailed response.
- Responses are well-structured, comprehensive, and accurate. Solutions are typically effective on the first attempt (“one-shot”). Overall, the experience here is excellent.
Using the Gemini API (Google Cloud):
- Despite using the same model (gemini-2.5-pro-03-25) and sending the exact same inputs, the API behaves dramatically differently.
- Output generation completes extremely quickly, typically in about 10 seconds.
- The quality of the responses through the API is consistently poor. Solutions frequently fail, and outputs appear superficial or incomplete.
- Responses are often abruptly truncated, even though I explicitly set the output token limit to the maximum allowed (64K tokens).
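For context, here is roughly the shape of the request I’m sending (field names per the `generateContent` REST API; the prompt text is a placeholder, and the temperature value is just what my setup happens to use):

```python
# Sketch of the generateContent request body sent to the Gemini API.
# REST endpoint: models/gemini-2.5-pro-03-25:generateContent
# The prompt below is a placeholder; maxOutputTokens is set to the 64K cap.
payload = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "<several thousand lines of code plus task description>"}],
        }
    ],
    "generationConfig": {
        "maxOutputTokens": 65536,  # maximum allowed (64K) -- yet responses still truncate
        "temperature": 1.0,        # assumption: the value used in my configuration
    },
}
```

If there is some other `generationConfig` field that AI Studio sets behind the scenes and the raw API does not, that would be exactly the kind of undocumented difference I’m asking about.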
Question to Community and Google Developers:
- Is this a known issue or expected behavior with the current implementation of the Gemini 2.5 Pro API (gemini-2.5-pro-03-25)?
- Could there be undocumented limitations or parameters specific to the API that severely impact the processing quality and output completeness?
- Has anyone else encountered this issue, or is there something significantly wrong in my configuration or implementation?
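In case it helps anyone reproduce or diagnose this: one check I can suggest is inspecting `finishReason` on the returned candidate, which distinguishes a token-limit cutoff from a normal stop. A minimal sketch (field names per the REST API response; the sample response here is fabricated purely for illustration):

```python
def finish_reason(response: dict) -> str:
    """Return the finishReason of the first candidate.

    "MAX_TOKENS" means the output was cut off at the token limit;
    "STOP" indicates a normal, complete finish.
    """
    return response["candidates"][0]["finishReason"]

# Hypothetical truncated response, for illustration only:
sample = {
    "candidates": [
        {
            "content": {"parts": [{"text": "...truncated output..."}]},
            "finishReason": "MAX_TOKENS",
        }
    ]
}
```

If truncated API responses report `STOP` rather than `MAX_TOKENS`, the cutoff is not token-limit related, which would point to something else in the serving path.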
I’d greatly appreciate insights or clarifications about this issue. Thanks!