AI Studio Confusion: Model Loops on Self-Comparison During Code Augmentation Evaluation

Hello everyone,

I’m currently working with AI Studio as part of my Google fellowship on a data science team in cloud, and I’ve encountered an interesting issue during code snippet augmentation and evaluation.

My workflow involves using the model to:

  • Generate prompts for code augmentation.
  • Evaluate the augmented code snippets based on custom metrics (developed with the model’s assistance).
  • Compare up to 10 different augmented outputs to identify pathologies and repetitiveness.

The problem arises during the comparison phase. Occasionally, the model seems to get confused and starts comparing an output to itself (e.g., “augmentation 6 is identical to augmentation 6”). While I can correct it and the model usually resumes the workflow, it then incorporates these invalid self-comparisons into its final evaluation summary.
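One way to keep self-comparisons out of the loop might be to build the list of pairs outside the model and ask for one pair per request, then discard any reported comparison whose two sides share an index before the summary is written. A minimal sketch, assuming the augmentations are numbered 1 to N; `build_comparison_pairs` and `drop_self_comparisons` are hypothetical helper names for illustration, not part of AI Studio:

```python
from itertools import combinations


def build_comparison_pairs(n_outputs):
    """Return every unordered pair (i, j) with i < j, using 1-based indices.

    Because combinations() never pairs an item with itself, a pair such as
    (6, 6) cannot appear in the list.
    """
    return list(combinations(range(1, n_outputs + 1), 2))


def drop_self_comparisons(results):
    """Remove reported comparisons whose two sides are the same index.

    Each result is assumed to be a dict with integer keys "a" and "b".
    """
    return [r for r in results if r["a"] != r["b"]]


pairs = build_comparison_pairs(10)
print(len(pairs))  # 45 distinct pairs for 10 augmentations
```

Sending each pair as its own short request also keeps the model from having to track ten outputs at once, at the cost of more calls.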

It’s manageable, but I’m curious if anyone has experienced similar issues or has insights into:

  • Why the model might be getting confused during the comparison process.
  • Strategies for organizing the experiment to minimize this confusion.
  • Whether the model has any known limitations when conducting these types of sequential evaluations.

I’m particularly interested in hearing from others who have worked with AI Studio for code augmentation, evaluation, or similar multi-step workflows.

Any suggestions or experiences you can share would be greatly appreciated!

Thanks!

BTW - I used Gemini 2.0 Flash to create this post and title and tags!

You have to reset your token count (start a fresh session) once it hits 100k or 200k. Past that point 2.0-flash, and even the thinking-exp variants, stop giving correct output for your prompt or show confusion and errors when prompted, especially with code. The practical limit for 2.0-flash is probably around 200k tokens, in my opinion. I’m working with 2-4k lines of Python code, and a single prompt is 60k-120k tokens for me. Maybe the Google AI devs can look into this limitation.
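A rough sketch of tracking when a reset is due, assuming about 4 characters per token (a crude heuristic, not the model's real tokenizer) and a 100k budget picked only for illustration; `estimate_tokens` and `SessionBudget` are hypothetical names:

```python
CHARS_PER_TOKEN = 4      # assumption: rough average, not the actual tokenizer
TOKEN_BUDGET = 100_000   # assumption: illustrative reset threshold


def estimate_tokens(text):
    """Approximate the token count of a string from its character length."""
    return len(text) // CHARS_PER_TOKEN


class SessionBudget:
    """Running estimate of tokens used in one chat session."""

    def __init__(self, budget=TOKEN_BUDGET):
        self.budget = budget
        self.used = 0

    def add(self, text):
        """Record a prompt or response; return True once a reset is due."""
        self.used += estimate_tokens(text)
        return self.used >= self.budget

    def reset(self):
        """Call this when starting a fresh session."""
        self.used = 0
```

If the running total looks suspect, the token count AI Studio displays for the session is the better source of truth.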

I should have mentioned that for the actual work I am doing in AI Studio I am using Gemini Pro 2.0 Experimental 02-05, because as a fellow I have been advised to use a configuration that does not incur charges.