I’m using the OpenAI-compatible endpoint to summarize videos with a request like this:
{
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Provide a very short summary on one line. Keep it short!"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:video/mp4;base64,<ENCODED_FILE>"
          }
        }
      ]
    }
  ]
}
Yesterday this worked fine and returned a very short summary. Today the same request, with the same video, ignores the instructions and returns a very long, multi-line response describing every part of the video. It seems like a bug.
We’ve looked into the issue you reported, and it appears to be working correctly on our end.
For your specific situation, we recommend trying a modification to your prompt and setting the temperature to zero to get a similar response every time.
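A minimal sketch of the suggestion above, assuming the request body is built in Python with only the standard library. The helper name `build_summary_request` is hypothetical, and `temperature` is assumed to be accepted as the usual OpenAI-style chat completions sampling parameter; the rest of the payload mirrors the request posted earlier in the thread. Whether a temperature of zero actually restores the short summaries is not confirmed here.

```python
import base64
import json


def build_summary_request(video_bytes, model="gemini-2.5-flash", temperature=0.0):
    """Build the chat-completions payload from the thread with an explicit temperature.

    Hypothetical helper: it only assembles the JSON body and does not send anything.
    """
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return {
        "model": model,
        "temperature": temperature,  # assumption: 0 makes output as repeatable as possible
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Provide a very short summary on one line. Keep it short!",
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": "data:video/mp4;base64," + encoded},
                    },
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bytes for illustration only, not a real video file.
    payload = build_summary_request(b"placeholder-video-bytes")
    print(json.dumps(payload, indent=2))
```

Keeping the payload construction in one function makes it easy to send the identical body to different models and compare the replies.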
I did a little more testing and can say gemini-2.0-flash and gemini-1.5-flash appear to follow the instructions properly, and the issue only happens with gemini-2.5-flash. See above for screenshot and request ID. Thanks.
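One way to make this kind of cross-model comparison repeatable is to check each reply against the instruction mechanically. The sketch below is only an illustration: the function names and the 200-character limit are assumptions, and the sample replies are made-up strings, not actual model output.

```python
def follows_instruction(reply, max_chars=200):
    """Return True if the reply is a single line of at most max_chars characters.

    The 200-character limit is an arbitrary stand-in for "very short".
    """
    text = reply.strip()
    return "\n" not in text and len(text) <= max_chars


def compare_models(replies):
    """Map each model name to whether its reply followed the one-line instruction."""
    return {model: follows_instruction(text) for model, text in replies.items()}


if __name__ == "__main__":
    # Made-up replies for illustration; substitute the real responses per model.
    sample = {
        "model-a": "A dog chases a ball across a park.",
        "model-b": "The video opens with a wide shot.\nNext, a dog appears.\nFinally, it runs.",
    }
    for model, ok in compare_models(sample).items():
        print(model, "follows instruction" if ok else "ignores instruction")
```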
But as noted, the first way was working before, and it continues to work with gemini-2.0-flash and gemini-1.5-flash. Also, the documentation says the first way should be valid. My guess is that something is causing the content array to collapse so that the model sees only one item.
My apologies, I recreated your issue using the Gemini framework.
It seems your current configuration is set up for image understanding. If I understand correctly, you want the model to explain video content instead. Is that accurate?