I’m experiencing a severe discrepancy in output quality and behavior when using the gemini-2.5-pro-03-25 model through Google AI Studio compared to the Gemini API provided via Google Cloud.
Detailed Scenario:
In Google AI Studio:
The model can process thousands of lines of code.
Reasoning and generation are thorough, usually taking approximately 1-2 minutes to fully process and stream a detailed response.
Responses are very well-structured, comprehensive, and accurate. Typically, solutions provided by AI Studio are effective “one-shot” solutions. Overall, the experience here is excellent.
Using the Gemini API (Google Cloud):
Despite using the same model (gemini-2.5-pro-03-25) and sending the exact same inputs, the API behaves dramatically differently.
Output generation completes extremely quickly, typically within about 10 seconds.
The quality of the responses through the API is consistently poor. Solutions frequently fail, and outputs appear superficial or incomplete.
Responses are often abruptly truncated, even though I explicitly set the maximum token limit to the maximum allowed (64K tokens).
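For reference, a minimal sketch of how I'm setting the limit in the raw generateContent call. The endpoint and the generationConfig.maxOutputTokens field are from the public REST API; the prompt text and the GEMINI_API_KEY environment variable are placeholders from my setup, and the model id is as written above. Note that on 2.5 thinking models, "thought" tokens may also draw from this same budget:

    import java.net.URI
    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse

    fun main() {
        val apiKey = System.getenv("GEMINI_API_KEY")
        // generationConfig.maxOutputTokens caps the response; 65536 is the
        // 64K limit mentioned above. On thinking models, thought tokens may
        // count against this same budget.
        val body = """
            {
              "contents": [{"parts": [{"text": "Review the following code ..."}]}],
              "generationConfig": {"maxOutputTokens": 65536}
            }
        """.trimIndent()
        val request = HttpRequest.newBuilder()
            .uri(URI.create("https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-03-25:generateContent?key=$apiKey"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build()
        val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
        println(response.body())
    }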
Question to Community and Google Developers:
Is this a known issue or expected behavior with the current implementation of the Gemini 2.5 Pro API (gemini-2.5-pro-03-25)?
Could there be undocumented limitations or parameters specific to the API that severely impact the processing quality and output completeness?
Has anyone else encountered this issue, or is there something significantly wrong in my configuration or implementation?
I’d greatly appreciate insights or clarifications about this issue. Thanks!
Indeed, I'm experiencing the same with gemini-2.5-flash-preview-04-17 and gemini-2.5-pro-preview-04-17. In Google AI Studio they behave correctly, but via the API they just truncate the output or produce “Modified by moderator” in the middle of the response.
Facing the same issue. The following is from a run using the API:
Total Tokens: 16738
Prompt Tokens: 5676
Output Tokens: 5134
Thoughts Token: 5928
Finish Reason: STOP
maxToken: 24000
In my case the expected output is JSON, but it's frequently getting truncated. I never noticed this when working in AI Studio.
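For anyone else debugging this: the raw generateContent response carries both the finish reason and the token accounting, which is where the numbers above come from. A minimal sketch of reading them (field names are from the public REST API; org.json is just one convenient parser):

    import org.json.JSONObject

    // Sketch: given the raw generateContent response body, report why the
    // model stopped and how the token budget was spent.
    fun inspectResponse(responseJson: String) {
        val root = JSONObject(responseJson)
        val candidate = root.getJSONArray("candidates").getJSONObject(0)
        // STOP = the model ended on its own; MAX_TOKENS = the output budget ran out.
        println("finishReason: " + candidate.optString("finishReason"))
        val usage = root.getJSONObject("usageMetadata")
        println("promptTokenCount: " + usage.optInt("promptTokenCount"))
        println("candidatesTokenCount: " + usage.optInt("candidatesTokenCount"))
        // Only present on thinking models.
        println("thoughtsTokenCount: " + usage.optInt("thoughtsTokenCount"))
        println("totalTokenCount: " + usage.optInt("totalTokenCount"))
    }

Worth noting: with a finish reason of STOP and output plus thoughts well under the 24000 maxToken above, the cut-off here is not the token limit. For JSON output it may also be worth trying "responseMimeType": "application/json" in generationConfig, the API's structured-output mode.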
Just to share my experience: after restructuring the payload format in my requests, Gemini started working much more reliably. I was initially using an OpenAI-style structure, but switching to the Gemini-specific format made a big difference.
I’ll try to post a more detailed breakdown of exactly what I changed to get it working; I'm currently on mobile, so it'll have to wait a bit.
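In the meantime, for anyone wanting to compare, the structural difference between the two request shapes is easy to see side by side. This is only a sketch of the generic contrast with placeholder values, not necessarily the exact change I made:

    // OpenAI-style chat body: a flat "messages" array.
    val openAiStyle = """
        {"model": "...", "messages": [{"role": "user", "content": "..."}]}
    """.trimIndent()

    // Gemini-native generateContent body: "contents" entries, each holding
    // a role plus a "parts" array for the actual text/inline data.
    val geminiStyle = """
        {"contents": [{"role": "user", "parts": [{"text": "..."}]}]}
    """.trimIndent()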
Promising! I’m eager to hear about your findings, @rnaf. I’m currently using Google’s ADK framework, so there isn’t much I can change other than the system prompt, and that is identical in both AI Studio and via the API. I suspect Google is not being transparent about their APIs and that the Google AI Studio web service does things differently behind the scenes.
@rnaf - eagerly looking forward to understanding what you did. In my case we are calling the REST API directly from a web application. The relevant portion of my code looks like this:
fun callGemini(text: String?, base64Image: String?, systemInstructions: String): String {
    val request = buildGeminiRequest(text, base64Image, systemInstructions)
    val headers = HttpHeaders().apply {
        contentType = MediaType.APPLICATION_JSON
    }
    val entity = HttpEntity(request, headers)
    val apiKey = apiKeyManager.getApiKey()
    val url = "$apiUrl/$modelName:generateContent?key=$apiKey"
    try {
        val response = restTemplate.exchange(url, HttpMethod.POST, entity, Map::class.java).body
        return extractResponse(response)
    } catch (e: Exception) { //...
    }
}
private fun buildGeminiRequest(
    text: String?,
    base64Image: String?,
    systemInstructions: String,
): GeminiRequest {
    val parts = mutableListOf<Part>()
    text?.let { parts.add(Part.Text(it)) }
    base64Image?.let {
        val cleanBase64 = it.removePrefix("data:image/jpeg;base64,")
        val mimeType = "image/jpeg"
        // Add the image part, using the JPEG mime type that matches the
        // stripped data-URI prefix
        parts.add(
            Part.Image(
                InlineData(
                    mime_type = mimeType,
                    data = cleanBase64
                )
            )
        )
    }
    val supportsThinking = modelName.contains("gemini-2.5")
    val generationConfig = if (supportsThinking) {
        GenerationConfig(
            temperature = temperature,
            maxOutputTokens = maxTokens,
            thinkingConfig = ThinkingConfig(thinkingBudget)
        )
    } else {
        GenerationConfig(
            temperature = temperature,
            maxOutputTokens = maxTokens
        )
    }
    val request = GeminiRequest(
        contents = listOf(Content(parts = parts)),
        systemInstruction = SystemInstruction(parts = listOf(Part.Text(systemInstructions))),
        generationConfig = generationConfig,
        tools = listOf(Tool(google_search = GoogleSearch()))
    )
    return request
}
I am trying to use gemini-2.5-pro-preview-05-06. From Google AI Studio it is pretty quick, and I can get it to work from my local machine within a reasonable time as well, but when I try to use it from AWS Lambda, both the API version and Vertex AI become 10x slower. Any thoughts or suggestions on how to get the same response time from AWS Lambda?
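One possible angle, purely a guess without more detail: Lambda cold starts and per-invocation connection setup. If the HTTP client (or SDK client) is created inside the handler, every invocation pays TLS and connection setup again. A minimal sketch of hoisting it out, assuming a plain java.net.http client; the class and method names here are illustrative:

    import java.net.http.HttpClient

    class GeminiHandler {
        // Created once per Lambda container and reused by warm invocations,
        // so established connections survive between calls.
        private val client: HttpClient = HttpClient.newHttpClient()

        fun handleRequest(prompt: String): String {
            // ... build and send the generateContent request with `client` ...
            return ""
        }
    }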