Significant Difference in Response Quality between Google AI Studio and Gemini 2.5 Pro API (gemini-2.5-pro-03-25)

I’m experiencing a severe discrepancy in output quality and behavior when using the gemini-2.5-pro-03-25 model through Google AI Studio compared to the Gemini API provided via Google Cloud.

Detailed Scenario:

In Google AI Studio:

  • The model can process thousands of lines of code.
  • Reasoning and generation are thorough, usually taking approximately 1-2 minutes to fully process and stream a detailed response.
  • Responses are very well-structured, comprehensive, and accurate; AI Studio typically delivers effective “one-shot” solutions. Overall, the experience here is excellent.

Using the Gemini API (Google Cloud):

  • Despite using the same model (gemini-2.5-pro-03-25) and sending the exact same inputs, the API behaves dramatically differently.
  • The output generation completes extremely quickly, typically within around 10 seconds.
  • The quality of the responses through the API is consistently poor. Solutions frequently fail, and outputs appear superficial or incomplete.
  • Responses are often abruptly truncated, even though I explicitly set maxOutputTokens to the maximum allowed (64K tokens); a minimal request sketch follows this list.
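
For concreteness, here is a minimal, self-contained sketch of the kind of call I’m making (Kotlin with java.net.http; the prompt text, key handling, and exact model id are simplified placeholders):

    import java.net.URI
    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse

    // Minimal reproduction of the call being made. The prompt is a stand-in,
    // the key comes from an environment variable, and maxOutputTokens is
    // pinned to the 64K ceiling, yet responses still come back truncated.
    fun main() {
        val apiKey = System.getenv("GEMINI_API_KEY")
        val model = "gemini-2.5-pro-preview-03-25" // adjust to the exact model id you call
        val body = """
            {
              "contents": [{"role": "user", "parts": [{"text": "Review the following code ..."}]}],
              "generationConfig": {"maxOutputTokens": 65536}
            }
        """.trimIndent()

        val request = HttpRequest.newBuilder()
            .uri(URI.create("https://generativelanguage.googleapis.com/v1beta/models/$model:generateContent?key=$apiKey"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build()

        val response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
        println(response.body())
    }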

Question to Community and Google Developers:

  • Is this a known issue or expected behavior with the current implementation of the Gemini 2.5 Pro API (gemini-2.5-pro-03-25)?
  • Could there be undocumented limitations or parameters specific to the API that severely impact the processing quality and output completeness?
  • Has anyone else encountered this issue, or is there something significantly wrong in my configuration or implementation?

I’d greatly appreciate insights or clarifications about this issue. Thanks!

Indeed, I’m experiencing the same with gemini-2.5-flash-preview-04-17 and gemini-2.5-pro-preview-04-17. In Google AI Studio they behave correctly, but via the API they just truncate the output or produce “Modified by moderator” in the middle.

1 Like

Facing the same issue. The following is from one run using the API:
Total Tokens: 16738
Prompt Tokens: 5676
Output Tokens: 5134
Thought Tokens: 5928
Finish Reason: STOP

maxOutputTokens: 24000

In my case the expected output is JSON, but it’s getting frequently truncated. I never noticed this when working in AI Studio.
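
For reference, those numbers are read straight from the response’s usageMetadata block. A simplified version of my logging (Kotlin with org.json; field names follow the v1beta generateContent response shape):

    import org.json.JSONObject

    // Prints the usage numbers quoted above from a raw generateContent
    // response body.
    fun logUsage(responseBody: String) {
        val json = JSONObject(responseBody)
        val usage = json.getJSONObject("usageMetadata")
        println("Total Tokens: ${usage.optInt("totalTokenCount")}")
        println("Prompt Tokens: ${usage.optInt("promptTokenCount")}")
        println("Output Tokens: ${usage.optInt("candidatesTokenCount")}")
        // Only present on thinking models such as the 2.5 previews.
        println("Thought Tokens: ${usage.optInt("thoughtsTokenCount")}")
        val finish = json.getJSONArray("candidates").getJSONObject(0).optString("finishReason")
        println("Finish Reason: $finish")
    }

Note that Finish Reason: STOP means the model ended the response on its own; a cut at the token cap would normally be reported as MAX_TOKENS, which makes the truncated JSON all the stranger.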

1 Like

Just to share my experience: after restructuring the payload format in my requests, Gemini started working much more reliably. I was initially using an OpenAI-style structure, but switching to the Gemini-specific format made a big difference.

I’ll try to post a more detailed breakdown later of what exactly I changed to get it working — currently on mobile, so it’ll have to wait a bit.
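
In the meantime, the shape of the change was roughly this (sketches from memory, not my exact payloads):

    // What I was sending at first: an OpenAI-style chat-completions body.
    val openAiStyleBody = """
        {
          "model": "gemini-2.5-pro-preview-03-25",
          "messages": [
            {"role": "system", "content": "You are a careful coding assistant."},
            {"role": "user", "content": "Refactor this function ..."}
          ]
        }
    """.trimIndent()

    // What v1beta generateContent expects: the system prompt moves into
    // systemInstruction, and each turn is a content object with a parts array.
    val geminiNativeBody = """
        {
          "systemInstruction": {"parts": [{"text": "You are a careful coding assistant."}]},
          "contents": [
            {"role": "user", "parts": [{"text": "Refactor this function ..."}]}
          ]
        }
    """.trimIndent()

(Google also exposes a separate OpenAI-compatibility endpoint, but the native format above is what generateContent itself expects.)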

1 Like

Promising! I’m eager to hear about your findings, @rnaf. I’m currently using Google’s ADK framework, so there isn’t much I can change other than the system prompt, and that prompt is identical in both AI Studio and via the API. I have the suspicion that Google is not being upfront about their APIs and that the AI Studio web service does things differently under the hood.

@rnaf - eagerly looking forward to understanding what you did. In my case we are calling the REST API directly from a web application. The relevant portion of my code looks like this:

    fun callGemini(text: String?, base64Image: String?, systemInstructions: String): String {
        val request = buildGeminiRequest(text, base64Image, systemInstructions)
        val headers = HttpHeaders().apply {
            contentType = MediaType.APPLICATION_JSON
        }

        val entity = HttpEntity(request, headers)
        val apiKey = apiKeyManager.getApiKey()
        val url = "$apiUrl/$modelName:generateContent?key=$apiKey"
        try {
            val response = restTemplate.exchange(url, HttpMethod.POST, entity, Map::class.java).body
            return extractResponse(response)
        } catch (e: Exception) {
            // ... logging elided; rethrow so failures are not silently swallowed
            throw e
        }
    }

    private fun buildGeminiRequest(
        text: String?,
        base64Image: String?,
        systemInstructions: String,
    ): GeminiRequest {
        val parts = mutableListOf<Part>()
        text?.let { parts.add(Part.Text(it)) }
        base64Image?.let {
            val cleanBase64 = it.removePrefix("data:image/jpeg;base64,")
            val mimeType = "image/jpeg"

            // Add the image part to the request, declaring the same MIME
            // type that was stripped from the data-URI prefix
            parts.add(
                Part.Image(
                    InlineData(
                        mime_type = mimeType,
                        data = cleanBase64
                    )
                )
            )
        }

        val supportsThinking = modelName.contains("gemini-2.5")

        val generationConfig = if (supportsThinking) {
            GenerationConfig(
                temperature = temperature,
                maxOutputTokens = maxTokens,
                thinkingConfig = ThinkingConfig(thinkingBudget)
            )
        } else {
            GenerationConfig(
                temperature = temperature,
                maxOutputTokens = maxTokens
            )
        }

        return GeminiRequest(
            contents = listOf(Content(parts = parts)),
            systemInstruction = SystemInstruction(parts = listOf(Part.Text(systemInstructions))),
            generationConfig = generationConfig,
            tools = listOf(Tool(google_search = GoogleSearch()))
        )
    }

url = https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-04-17:generateContent?key=

When you say you restructured the payload, what part did you change?

I am trying to use gemini-2.5-pro-preview-05-06. If I use it from Google AI Studio it is pretty quick, and I can get it to work from my local machine within a reasonable time as well. But when I try to use it from AWS Lambda, both the API version and Vertex AI become 10x slower. Any thoughts or suggestions on how to get the same response time from AWS Lambda?

It’s the auth! Set it up to keep the Lambda warm, and cache the tokens.
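
Roughly what I mean, as a sketch for a JVM Lambda in Kotlin, assuming google-auth-library-java is used for the Vertex AI auth (the handler class and input types here are illustrative): fetch Application Default Credentials once at container startup and reuse the cached access token across warm invocations.

    import com.amazonaws.services.lambda.runtime.Context
    import com.amazonaws.services.lambda.runtime.RequestHandler
    import com.google.auth.oauth2.GoogleCredentials

    class GeminiHandler : RequestHandler<Map<String, String>, String> {

        companion object {
            // Initialized once per container, not once per invocation, so
            // warm invocations skip the ADC lookup and token fetch entirely.
            private val credentials: GoogleCredentials = GoogleCredentials
                .getApplicationDefault()
                .createScoped(listOf("https://www.googleapis.com/auth/cloud-platform"))
        }

        override fun handleRequest(input: Map<String, String>, context: Context): String {
            // No-op while the cached token is still valid; only does a
            // network round-trip when the token is near expiry.
            credentials.refreshIfExpired()
            val token = credentials.accessToken.tokenValue
            // ... call the Vertex AI / Gemini endpoint with
            // "Authorization: Bearer $token" here ...
            return "ok"
        }
    }

Pair that with a scheduled keep-warm ping (e.g. an EventBridge rule every few minutes) so the container, and the cached token with it, stays alive.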