I’m getting similarly truncated responses via the latest Node.js @google/genai package…
Error in startStreamingChat: Error: Incomplete JSON segment at the end
at ApiClient.processStreamResponse_1 (…/node_modules/google/genai/src/_api_client.ts:463:19)
at processStreamResponse_1.next (<anonymous>)
at resume (…/node_modules/google/genai/dist/node/index.js:2509:44)
at fulfill (…/node_modules/google/genai/dist/node/index.js:2511:31)
at processTicksAndRejections (node:internal/process/task_queues:105:5)
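For context, here is a minimal sketch of the kind of streaming call that triggers it on my side (the model name, prompt, and function body are placeholders, not the actual project code):

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function startStreamingChat() {
  let partial = "";
  try {
    // Stream the response and accumulate text chunk by chunk.
    const stream = await ai.models.generateContentStream({
      model: "gemini-2.5-flash-preview-04-17",
      contents: "Describe this document as JSON.", // placeholder prompt
      config: { responseMimeType: "application/json" },
    });
    for await (const chunk of stream) {
      partial += chunk.text ?? "";
    }
    return partial;
  } catch (e) {
    // The failure surfaces mid-stream; log what arrived before it died.
    console.error("Error in startStreamingChat:", e);
    console.error("Partial text received:", partial.length, "chars");
    throw e;
  }
}

await startStreamingChat();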
Thanks for raising this issue. I checked the Gemini documentation and the output token limit is 65,536, which is the maximum you are setting.
Could you also share whether you are seeing any errors in your console?
Also, what is the type of your input fileUri and prompt/system instruction?
Based on my understanding, the issue might be related to how the response is being handled, because on Vertex AI I have checked that the ‘gemini-2.5-flash-preview-04-17’ model is able to generate larger responses (more than 3,000 tokens). Thank you!
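For reference, this is roughly how I ran the Vertex AI check with the same SDK (the project and location values are placeholders; authentication comes from application default credentials):

import { GoogleGenAI } from "@google/genai";

// Same SDK, routed through Vertex AI instead of the Developer API.
const ai = new GoogleGenAI({
  vertexai: true,
  project: "my-gcp-project", // placeholder
  location: "us-central1",   // placeholder
});

const result = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-04-17",
  contents: "Generate a detailed JSON description.", // placeholder prompt
  config: { responseMimeType: "application/json", maxOutputTokens: 65536 },
});
console.log(result.text?.length, result.usageMetadata);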
Because the error is masked (a generic 500 with a generic message), I don’t know what the problem is or how to reliably reproduce it. Also, this is not an open source project.
This happens when using any of the 2.5 models, or 2.0 flash with thinking.
It does not appear to happen when I switch to 2.0 pro or 2.0 flash (without thinking).
The expected return token length is probably 10k or less (not deterministic).
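One way to isolate the thinking variable is to rerun the same request on a 2.5 model with thinking explicitly disabled (a sketch, assuming the model accepts a thinkingBudget of 0; ai and prompt stand for the same objects as in the failing call):

// A/B check: identical request, but with thinking turned off.
const withoutThinking = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-04-17",
  contents: prompt,
  config: {
    responseMimeType: "application/json",
    thinkingConfig: { thinkingBudget: 0 }, // 0 disables thinking on Flash models
  },
});
console.log(withoutThinking.text?.length);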
@junkx Following up on my last comment, I have run the code snippet you shared with the text file using the model gemini-2.5-flash-preview-04-17.
Code snippet:
import {
  GoogleGenAI,
  createUserContent,
  createPartFromUri,
} from "@google/genai";

// Placeholder schema, standing in for the schema from the original report.
const documentAnalysisSchema = {};

const ai = new GoogleGenAI({ apiKey: 'API-key' });

async function main() {
  // Upload the text file and grab its URI and MIME type.
  const file = await ai.files.upload({
    file: "example.txt",
  });
  const fileUri = file.uri;
  const mimeType = file.mimeType;

  const prompt = "Tell me about this text file in 5000 json keys";

  const result = await ai.models.generateContent({
    model: 'gemini-2.5-flash-preview-04-17',
    contents: createUserContent([
      prompt,
      createPartFromUri(fileUri, mimeType),
    ]),
    config: {
      responseMimeType: 'application/json',
      responseSchema: documentAnalysisSchema,
      thinkingConfig: { includeThoughts: false },
      maxOutputTokens: 65536,
    },
  });

  console.log(result.text);
  console.log(result.usageMetadata);
}

await main();
I have observed that it truncates the response only if the token limit is exceeded; otherwise, it provides the expected response. You can also check the output token count using result.usageMetadata.
The model output truncated the response when the limit was exceeded (after 4490 keys).
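To make that check concrete, the finish reason and token counts can be read directly off the response from the snippet above (result is the same object as before):

// Why did generation stop, and how many output tokens were produced?
const candidate = result.candidates?.[0];
console.log("finishReason:", candidate?.finishReason); // e.g. STOP vs. MAX_TOKENS
console.log("output tokens:", result.usageMetadata?.candidatesTokenCount);

// MAX_TOKENS means the response was cut off by the output limit;
// STOP alongside truncated JSON would point to a different cause.
if (candidate?.finishReason === "MAX_TOKENS") {
  console.warn("Response truncated by maxOutputTokens");
}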
Notice the finish reason is STOP, and the token count is less than 17K.
But the JSON response is frequently truncated (though not always). I didn’t notice this at any point when working in AI Studio.
fun callGemini(text: String?, base64Image: String?, systemInstructions: String): String {
    val request = buildGeminiRequest(text, base64Image, systemInstructions)
    val headers = HttpHeaders().apply {
        contentType = MediaType.APPLICATION_JSON
    }
    val entity = HttpEntity(request, headers)
    val apiKey = apiKeyManager.getApiKey()
    val url = "$apiUrl/$modelName:generateContent?key=$apiKey"
    try {
        val response = restTemplate.exchange(url, HttpMethod.POST, entity, Map::class.java).body
        return extractResponse(response)
    } catch (e: Exception) { // ... (handling elided)
        throw e // rethrow so the function always returns or throws
    }
}
private fun buildGeminiRequest(
    text: String?,
    base64Image: String?,
    systemInstructions: String,
): GeminiRequest {
    val parts = mutableListOf<Part>()
    text?.let { parts.add(Part.Text(it)) }
    base64Image?.let {
        val cleanBase64 = it.removePrefix("data:image/jpeg;base64,")
        val mimeType = "image/jpeg"
        // Add the image part to the request, reusing the declared JPEG MIME type.
        parts.add(
            Part.Image(
                InlineData(
                    mime_type = mimeType,
                    data = cleanBase64
                )
            )
        )
    }
    // Only 2.5 models accept a thinkingConfig.
    val supportsThinking = modelName.contains("gemini-2.5")
    val generationConfig = if (supportsThinking) {
        GenerationConfig(
            temperature = temperature,
            maxOutputTokens = maxTokens,
            thinkingConfig = ThinkingConfig(thinkingBudget)
        )
    } else {
        GenerationConfig(
            temperature = temperature,
            maxOutputTokens = maxTokens
        )
    }
    return GeminiRequest(
        contents = listOf(Content(parts = parts)),
        systemInstruction = SystemInstruction(parts = listOf(Part.Text(systemInstructions))),
        generationConfig = generationConfig,
        tools = listOf(Tool(google_search = GoogleSearch()))
    )
}