Video misclassification with Gemini 2.0, why?

Hi!

I’m using Gemini 2.0 Flash for a multilabel video classification task, with a very detailed prompt that defines the set of categories the model can assign to a video. At a temperature of 0.4, the results were fairly well aligned with my prompt definitions, but they were not consistent enough: while most categories contained exactly the type of videos I wanted in them, at scale most videos were also being misclassified.

To address this issue, I tried temperatures of 0 and 0.2 to make the results more consistent, but I now find that they barely align with my prompt. The classifications still make sense, but they don’t exactly follow the explicit steps I assign for the task. What could be the reason for this, and how could I approach the problem to get more consistent results across similar videos?

I’d appreciate any and all technical advice on this issue, thank you!

@nikhilkuppa,

The “temperature” and “top_p” settings only affect how output tokens are sampled during decoding; they don’t change how the model reads the video or the prompt. Something I would try is to split the task into two steps:

Step 1: Describe the video in detail, say in under 500 words (depending on the length of the video).
Step 2: Use the generated description to classify the video.

This way you might be able to inspect the description and tweak the temperature and top_p of the first step until it contains the information or keywords that will help with classification.
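
A minimal sketch of the two-step idea, in case it helps. It is not tied to any particular SDK: `generate` is a hypothetical wrapper you would write around your own Gemini call, taking a prompt, an optional video path, a temperature and a top_p, and returning the model's text. The names `Sampling`, `parse_labels` and `describe_then_classify`, as well as the default sampling values, are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical wrapper signature (an assumption, not a real SDK call):
# (prompt, video_path or None, temperature, top_p) -> model text.
GenerateFn = Callable[[str, Optional[str], float, float], str]


@dataclass(frozen=True)
class Sampling:
    temperature: float
    top_p: float


def parse_labels(raw: str, categories: List[str]) -> List[str]:
    """Keep only allowed labels from a comma-separated reply, in category order."""
    found = {part.strip().lower() for part in raw.split(",")}
    return [c for c in categories if c.lower() in found]


def describe_then_classify(
    video_path: str,
    categories: List[str],
    generate: GenerateFn,
    describe: Sampling = Sampling(0.4, 0.95),  # assumed defaults, tune for step 1
    classify: Sampling = Sampling(0.0, 1.0),   # assumed defaults, tune for step 2
    max_words: int = 500,
) -> Tuple[str, List[str]]:
    # Step 1: the model sees the video and only has to describe it.
    description = generate(
        f"Describe this video in detail in under {max_words} words.",
        video_path,
        describe.temperature,
        describe.top_p,
    )
    # Step 2: text-only classification based on the description.
    prompt = (
        "Classify the video described below. Reply with a comma-separated "
        f"list drawn only from: {', '.join(categories)}.\n\n"
        f"Description:\n{description}"
    )
    reply = generate(prompt, None, classify.temperature, classify.top_p)
    return description, parse_labels(reply, categories)
```

Returning the description alongside the labels makes it possible to check, for a misclassified video, whether the description was missing the relevant details (a step 1 problem) or the details were there and the classification step ignored them (a step 2 problem). Your detailed category definitions would go into the step 2 prompt in place of the short one shown here.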