I’m using the OpenAI-compatible endpoint to summarize videos, with a request like:
{
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Provide a very short summary on one line. Keep it short!"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:video/mp4;base64,<ENCODED_FILE>"
          }
        }
      ]
    }
  ]
}
Yesterday it was working fine and the result was a very short summary. Today the same request, with the same video, ignores the instructions and returns a very long, multi-line response describing every part of the video. It seems like a bug.
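For reference, the payload above can be built programmatically before POSTing it to the chat completions route. This is a minimal sketch: the placeholder bytes stand in for the real MP4 file, and the actual HTTP call (endpoint URL, auth header) is omitted since it depends on your setup.

```python
import base64
import json

# Placeholder bytes stand in for the real MP4 file (an assumption for this sketch).
video_bytes = b"\x00\x00\x00\x18ftypmp42"
encoded = base64.b64encode(video_bytes).decode("ascii")

payload = {
    "model": "gemini-2.5-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                # Item 1: the instruction text.
                {"type": "text",
                 "text": "Provide a very short summary on one line. Keep it short!"},
                # Item 2: the video, base64-encoded into a data: URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:video/mp4;base64,{encoded}"}},
            ],
        }
    ],
}

# The serialized payload is what gets POSTed to the OpenAI-compatible endpoint.
body = json.dumps(payload)
```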
We’ve looked into the issue you reported, and it appears to be working correctly on our end.
For your specific situation, we recommend modifying your prompt and setting the temperature to zero so you get a similar response every time.
I did a little more testing and can confirm that gemini-2.0-flash and gemini-1.5-flash follow the instructions properly; the issue only occurs with gemini-2.5-flash. See above for a screenshot and the request ID. Thanks
But as noted, it was working the first way before, and that form still works with gemini-2.0-flash and gemini-1.5-flash. The documentation also says the first form should be valid. My guess is that something is causing the content array to collapse so that only one item reaches the model.
My apologies, I recreated your issue using the Gemini framework.
It seems your current configuration is set up for image understanding. If I am understanding correctly, you’re looking for the model to describe video content instead. Is that accurate?
Yes, and including other file types via the “image_url” field on the OpenAI-compatible endpoint seems to work fine; you can include PDFs and other types, too. (It’s the only field/content type available for submitting files in OpenAI-compatible form, so there’s no other way.) The model processes files submitted that way without trouble; the problem is that when the “content” array contains both the file and the text, the instructions in the text are sometimes ignored. I imagine it might also happen with images in some cases, but I haven’t seen it myself; maybe it only happens with larger files. It’s strange, though, because the same files that worked before now show the issue.
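To illustrate the point about other file types: any file can be wrapped in a data: URL and placed in the “image_url” field. A small helper (hypothetical name, written for this sketch) makes that concrete; the PDF bytes here are a stand-in, not a real document.

```python
import base64

def to_data_url(data: bytes, mime: str) -> str:
    """Wrap raw file bytes in a data: URL suitable for the image_url field."""
    return f"data:{mime};base64," + base64.b64encode(data).decode("ascii")

# A PDF submitted through the same image_url field (stand-in bytes).
pdf_part = {
    "type": "image_url",
    "image_url": {"url": to_data_url(b"%PDF-1.7 ...", "application/pdf")},
}
```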
Support for OpenAI’s libraries is still in beta, and new features are continually being developed. However, there currently isn’t an OpenAI-compatible feature for video understanding.
You mentioned exploring an image understanding feature, and while it might have worked for some of your needs, it’s primarily designed for images and not videos.
I’m familiar with the native alternatives, but my program only supports OpenAI-compatible requests right now. The workaround is working for now: creating two separate “messages” items, each with one item in “content”, instead of one message with two items in “content”. With that structure the instructions are followed and the response is very short, as requested.
It may be worth forwarding this to the dev team so they can identify the issue, as it may pop up randomly for other users. I may follow up with a request ID if an error occurs with an image, since that wouldn’t be off-label the way videos are, although videos and other file types do seem to be processed fine when included in place of images.