Hi!
I’m using Gemini 2.0 Flash for a multilabel video classification task, with a very detailed prompt that defines the set of categories the model can assign to a video. At a temperature of 0.4, I’ve found the results to be fairly aligned with my prompt’s definitions; however, they weren’t consistent enough (while most categories contained exactly the type of videos I wanted, at scale most videos were also being misclassified).
To address this, I tried temperatures of 0 and 0.2 to make the results more consistent, but now they barely align with my prompt. The classifications still make sense, but they don’t follow the explicit steps I lay out for the task. What could be causing this, and how could I approach the problem to get more consistent results across similar videos?
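To keep the two properties I’m describing apart (run-to-run consistency vs. alignment with my category definitions), here is a minimal sketch of one way to score each, assuming every classification comes back as a set of label strings. The label names, example runs, reference labels, and function names are all hypothetical illustrations, not my actual pipeline, and no Gemini API call is involved.

```python
from itertools import combinations
from statistics import mean


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two label sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def run_consistency(runs: list[set[str]]) -> float:
    """Mean pairwise Jaccard similarity across repeated classifications of ONE video.

    1.0 means every run returned exactly the same labels.
    """
    if len(runs) < 2:
        return 1.0
    return mean(jaccard(x, y) for x, y in combinations(runs, 2))


def alignment(predicted: list[set[str]], reference: list[set[str]]) -> float:
    """Mean Jaccard similarity between predictions and hand-labelled reference sets."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return mean(jaccard(p, r) for p, r in zip(predicted, reference))


# Hypothetical outputs: the same video classified three times at two temperatures.
runs_t04 = [{"tutorial", "cooking"}, {"cooking"}, {"cooking", "vlog"}]
runs_t00 = [{"cooking"}, {"cooking"}, {"cooking"}]
# Hypothetical label set I would assign to this video by hand.
reference = [{"cooking", "tutorial"}] * 3

print(round(run_consistency(runs_t04), 3))  # 0.444
print(run_consistency(runs_t00))            # 1.0
print(alignment(runs_t00, reference))       # 0.5
```

In this made-up example the temperature-0 runs are perfectly consistent but only half-aligned with the hand labels, which is the pattern I’m seeing: low temperature removes variation between runs, but it doesn’t by itself make the output follow the prompt’s steps.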
I’d appreciate any and all technical advice on this, thank you!