Significant Difference in Response Quality between Google AI Studio and Gemini 2.5 Pro API (gemini-2.5-pro-03-25)

I’m experiencing a severe discrepancy in output quality and behavior when using the gemini-2.5-pro-03-25 model through Google AI Studio compared to the Gemini API provided via Google Cloud.

Detailed Scenario:

In Google AI Studio:

  • The model can process thousands of lines of code.
  • Reasoning and generation are thorough, usually taking approximately 1-2 minutes to fully process and stream a detailed response.
  • Responses are very well-structured, comprehensive, and accurate; AI Studio typically delivers effective “one-shot” solutions. Overall, the experience here is excellent.

Using the Gemini API (Google Cloud):

  • Despite using the same model (gemini-2.5-pro-03-25) and sending the exact same inputs, the API behaves dramatically differently.
  • The output generation completes extremely quickly, typically within around 10 seconds.
  • The quality of the responses through the API is consistently poor. Solutions frequently fail, and outputs appear superficial or incomplete.
  • Responses are often abruptly truncated, even though I explicitly set maxOutputTokens to the maximum allowed (64K tokens); a minimal request sketch follows this list.
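
For concreteness, here is a minimal, self-contained sketch of the kind of call I’m making (Kotlin with java.net.http; the prompt text, key handling, and exact model id are simplified placeholders):

    import java.net.URI
    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse

    // Minimal reproduction of the call being made. The prompt is a stand-in,
    // the key comes from an environment variable, and maxOutputTokens is
    // pinned to the 64K ceiling, yet responses still come back truncated.
    fun main() {
        val apiKey = System.getenv("GEMINI_API_KEY")
        val model = "gemini-2.5-pro-preview-03-25" // adjust to the exact model id you call
        val body = """
            {
              "contents": [{"role": "user", "parts": [{"text": "Review the following code ..."}]}],
              "generationConfig": {"maxOutputTokens": 65536}
            }
        """.trimIndent()

        val request = HttpRequest.newBuilder()
            .uri(URI.create("https://generativelanguage.googleapis.com/v1beta/models/$model:generateContent?key=$apiKey"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build()

        val response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
        println(response.body())
    }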

Question to Community and Google Developers:

  • Is this a known issue or expected behavior with the current implementation of the Gemini 2.5 Pro API (gemini-2.5-pro-03-25)?
  • Could there be undocumented limitations or parameters specific to the API that severely impact the processing quality and output completeness?
  • Has anyone else encountered this issue, or is there something significantly wrong in my configuration or implementation?

I’d greatly appreciate insights or clarifications about this issue. Thanks!

Indeed, I’m experiencing the same with gemini-2.5-flash-preview-04-17 and gemini-2.5-pro-preview-04-17. In Google AI Studio they behave correctly, but via the API they just truncate the output or produce “Modified by moderator” in the middle.

1 Like

Facing the same issue. The following is from one run using the API:
Total Tokens: 16738
Prompt Tokens: 5676
Output Tokens: 5134
Thought Tokens: 5928
Finish Reason: STOP

maxOutputTokens: 24000

In my case the expected output is JSON, but it’s getting frequently truncated. I never noticed this when working in AI Studio.
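
For reference, those numbers are read straight from the response’s usageMetadata block. A simplified version of my logging (Kotlin with org.json; field names follow the v1beta generateContent response shape):

    import org.json.JSONObject

    // Prints the usage numbers quoted above from a raw generateContent
    // response body.
    fun logUsage(responseBody: String) {
        val json = JSONObject(responseBody)
        val usage = json.getJSONObject("usageMetadata")
        println("Total Tokens: ${usage.optInt("totalTokenCount")}")
        println("Prompt Tokens: ${usage.optInt("promptTokenCount")}")
        println("Output Tokens: ${usage.optInt("candidatesTokenCount")}")
        // Only present on thinking models such as the 2.5 previews.
        println("Thought Tokens: ${usage.optInt("thoughtsTokenCount")}")
        val finish = json.getJSONArray("candidates").getJSONObject(0).optString("finishReason")
        println("Finish Reason: $finish")
    }

Note that Finish Reason: STOP means the model ended the response on its own; a cut at the token cap would normally be reported as MAX_TOKENS, which makes the truncated JSON all the stranger.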

1 Like

Just to share my experience: after restructuring the payload format in my requests, Gemini started working much more reliably. I was initially using an OpenAI-style structure, but switching to the Gemini-specific format made a big difference.

I’ll try to post a more detailed breakdown later of what exactly I changed to get it working — currently on mobile, so it’ll have to wait a bit.
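
In the meantime, the shape of the change was roughly this (sketches from memory, not my exact payloads):

    // What I was sending at first: an OpenAI-style chat-completions body.
    val openAiStyleBody = """
        {
          "model": "gemini-2.5-pro-preview-03-25",
          "messages": [
            {"role": "system", "content": "You are a careful coding assistant."},
            {"role": "user", "content": "Refactor this function ..."}
          ]
        }
    """.trimIndent()

    // What v1beta generateContent expects: the system prompt moves into
    // systemInstruction, and each turn is a content object with a parts array.
    val geminiNativeBody = """
        {
          "systemInstruction": {"parts": [{"text": "You are a careful coding assistant."}]},
          "contents": [
            {"role": "user", "parts": [{"text": "Refactor this function ..."}]}
          ]
        }
    """.trimIndent()

(Google also exposes a separate OpenAI-compatibility endpoint, but the native format above is what generateContent itself expects.)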

1 Like

Promising! I’m eager to hear about your findings, @rnaf. I’m currently using Google’s ADK framework, so there isn’t much I can change other than the system prompt, and that prompt is identical in both AI Studio and via the API. I have the suspicion that Google is not being upfront about their APIs and that the AI Studio web service does things differently under the hood.

@rnaf - eagerly looking forward to understanding what you did. In my case we are calling the REST API directly from a web application. The relevant portion of my code looks like this:

    fun callGemini(text: String?, base64Image: String?, systemInstructions: String): String {
        val request = buildGeminiRequest(text, base64Image, systemInstructions)
        val headers = HttpHeaders().apply {
            contentType = MediaType.APPLICATION_JSON
        }

        val entity = HttpEntity(request, headers)
        val apiKey = apiKeyManager.getApiKey()
        val url = "$apiUrl/$modelName:generateContent?key=$apiKey"
        try {
            val response = restTemplate.exchange(url, HttpMethod.POST, entity, Map::class.java).body
            return extractResponse(response)
        } catch (e: Exception) {
            // ... logging elided; rethrow so failures are not silently swallowed
            throw e
        }
    }

    private fun buildGeminiRequest(
        text: String?,
        base64Image: String?,
        systemInstructions: String,
    ): GeminiRequest {
        val parts = mutableListOf<Part>()
        text?.let { parts.add(Part.Text(it)) }
        base64Image?.let {
            val cleanBase64 = it.removePrefix("data:image/jpeg;base64,")
            val mimeType = "image/jpeg"

            // Add the image part to the request, declaring the same MIME
            // type that was stripped from the data-URI prefix
            parts.add(
                Part.Image(
                    InlineData(
                        mime_type = mimeType,
                        data = cleanBase64
                    )
                )
            )
        }

        val supportsThinking = modelName.contains("gemini-2.5")

        val generationConfig = if (supportsThinking) {
            GenerationConfig(
                temperature = temperature,
                maxOutputTokens = maxTokens,
                thinkingConfig = ThinkingConfig(thinkingBudget)
            )
        } else {
            GenerationConfig(
                temperature = temperature,
                maxOutputTokens = maxTokens
            )
        }

        return GeminiRequest(
            contents = listOf(Content(parts = parts)),
            systemInstruction = SystemInstruction(parts = listOf(Part.Text(systemInstructions))),
            generationConfig = generationConfig,
            tools = listOf(Tool(google_search = GoogleSearch()))
        )
    }

url = https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-preview-04-17:generateContent?key=

When you say you restructured the payload, what part did you change?

I am trying to use gemini-2.5-pro-preview-05-06. If I use it from Google AI Studio it is pretty quick, and I can get it to work from my local machine within a reasonable time as well. But when I try to use it from AWS Lambda, both the API version and Vertex AI become 10x slower. Any thoughts or suggestions on how to get the same response time from AWS Lambda?

It’s the auth! Set it up to keep the Lambda warm, and cache the tokens.
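
Roughly what I mean, as a sketch for a JVM Lambda in Kotlin, assuming google-auth-library-java is used for the Vertex AI auth (the handler class and input types here are illustrative): fetch Application Default Credentials once at container startup and reuse the cached access token across warm invocations.

    import com.amazonaws.services.lambda.runtime.Context
    import com.amazonaws.services.lambda.runtime.RequestHandler
    import com.google.auth.oauth2.GoogleCredentials

    class GeminiHandler : RequestHandler<Map<String, String>, String> {

        companion object {
            // Initialized once per container, not once per invocation, so
            // warm invocations skip the ADC lookup and token fetch entirely.
            private val credentials: GoogleCredentials = GoogleCredentials
                .getApplicationDefault()
                .createScoped(listOf("https://www.googleapis.com/auth/cloud-platform"))
        }

        override fun handleRequest(input: Map<String, String>, context: Context): String {
            // No-op while the cached token is still valid; only does a
            // network round-trip when the token is near expiry.
            credentials.refreshIfExpired()
            val token = credentials.accessToken.tokenValue
            // ... call the Vertex AI / Gemini endpoint with
            // "Authorization: Bearer $token" here ...
            return "ok"
        }
    }

Pair that with a scheduled keep-warm ping (e.g. an EventBridge rule every few minutes) so the container, and the cached token with it, stays alive.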