Tips on how to increase token output size in GenerateContentResponse?

Does anyone have any tips or tricks for increasing output generation size for inferences on gemini-1.5-pro or flash? I'm trying to generate a large report using a couple of input files for context and a well-designed prompt that should have no problem producing large output, but I can't seem to get above 1800 tokens, no matter how large the input is. I've tried lots of prompting tricks, including a target_word_count variable and the like, but nothing really budges it. I can't get any generation over about 1200 words (despite the stated 8K token output window), except for things like batch processing of image descriptions.

With the new gemini-1.5-pro-002 model specifically touting its reduced verbosity, this has gotten even worse.

Has anyone had better success with some tips or tricks?

Well… yes. It's a model, and the cap is supposedly 8K, so at the very least it can reach that if well prompted.
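One thing worth double-checking before blaming the prompt: make sure the request actually raises the output cap, since the effective default can be lower than the model maximum. A minimal sketch of the config, assuming the google-generativeai Python SDK (which accepts a plain dict for `generation_config`; the 8192 value is the stated gemini-1.5 output cap from the thread, not something I've re-verified here):

```python
# Sketch: explicitly raise the output cap in the request config.
# Parameter names follow the google-generativeai SDK's generation_config dict.
generation_config = {
    "max_output_tokens": 8192,  # stated 8K output window for gemini-1.5 models
    "temperature": 1.0,
}

# The real call would look roughly like:
#   model = genai.GenerativeModel("gemini-1.5-pro",
#                                 generation_config=generation_config)
#   response = model.generate_content(prompt)
```

If the cap is left at a lower value, no amount of prompting will push the response past it, so this is the first knob to check.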

In fact, you can get it to go way further, heh.
Let me put together a quick example.

Here you go