Welcome to the forum!
The problem, I believe, is this specification: you cannot push `max_output_tokens` past the model's own limit, which you can read from `list_models()`. For Gemini 1.5 (both Pro and Flash) that limit is 8192 output tokens. The one-million-token context window applies to *input* tokens only.
That effectively forces you to split your input into chunks (probably two, in your case) and to issue enough `generate_content()` requests to work through the whole dataset.
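A minimal sketch of that chunk-and-loop approach, assuming the `google-generativeai` Python SDK (the `chunk()` helper, the model name, and the sample records are hypothetical; the actual API calls are shown commented out since they need an API key):

```python
def chunk(items, n_chunks):
    """Split `items` into `n_chunks` roughly equal consecutive parts."""
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical dataset; replace with your own records.
records = [f"record {i}" for i in range(10)]
parts = chunk(records, 2)  # two chunks of five records each

# Each chunk then gets its own request, so each response
# stays within the 8192-token output cap:
#
# import google.generativeai as genai
# model = genai.GenerativeModel("gemini-1.5-flash")
# for part in parts:
#     response = model.generate_content("\n".join(part))
#     print(response.text)
```

The key point is that the 8192-token cap applies per response, so splitting the work across several `generate_content()` calls is the only way to get more total output.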