For my input to the Gemini 1.5 Flash model, the output is definitely more than 8192 tokens, so the model has to return multiple responses to complete my request. How can I do that? Is there an option in the API docs for this, so that on each iteration the model continues the output from where the previous response left off?
The usual approach to generating longer content is to structure an outline and then fill in the outline buckets. This cookbook example shows one way to do it: Google Colab
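The outline-then-fill idea can be sketched as a loop: one request produces the outline, then each section is generated in its own request, so no single response has to fit under the per-response token limit. In this sketch, `ask_model` is a stand-in for a real Gemini API call (the canned responses are purely illustrative):

```python
# Outline-then-fill sketch: generate an outline first, then fill each
# section with a separate request. `ask_model` is a placeholder for a
# real Gemini API call; the canned dict just makes the sketch runnable.

def ask_model(prompt: str) -> str:
    canned = {
        "outline": "1. Introduction\n2. Methods\n3. Results",
        "1. Introduction": "Intro text...",
        "2. Methods": "Methods text...",
        "3. Results": "Results text...",
    }
    return canned[prompt]

# One request per outline section keeps each response small.
outline = ask_model("outline").splitlines()
document = "\n\n".join(ask_model(section) for section in outline)
print(document)
```

Each section is an independent request, so the total length is bounded only by the number of sections, not by a single response's output limit.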
Hi @OrangiaNebula, thanks a lot for your reply and your intention to help.
In my case, I am converting XML from an Informatica mapping into a SQL query, so the output varies with the XML and its size. I don't think we can decide the outline structure in advance.
Since the output limit is 8192 tokens, a single response is not enough for the output we need. The next iteration for my initial input has to be generated automatically, without manual intervention.
So I am looking for a solution for that kind of continuous generation.
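One common pattern when the output cannot be outlined in advance is a continuation loop: check whether the response was cut off by the output limit (with the Gemini API, the candidate's finish reason being `MAX_TOKENS` indicates truncation) and, if so, send a follow-up turn asking the model to continue exactly where it left off, concatenating the chunks. A minimal sketch of the loop, with `fake_model` standing in for the real API call so the example is self-contained:

```python
# Automatic continuation loop for outputs longer than one response.
# `fake_model` is a stand-in for a real Gemini call; with the actual
# API you would instead inspect the candidate's finish reason
# (MAX_TOKENS means the response was truncated) and send a new chat
# turn such as "Continue exactly where you left off."

CHUNK = 40  # pretend each response can only emit 40 characters

FULL_SQL = "SELECT col_a, col_b FROM src_table WHERE col_a > 100 ORDER BY col_b;"

def fake_model(prompt: str, generated_so_far: str) -> tuple[str, bool]:
    """Return the next output chunk and whether it was truncated."""
    remaining = FULL_SQL[len(generated_so_far):]
    return remaining[:CHUNK], len(remaining) > CHUNK

def generate_full_output(prompt: str) -> str:
    result = ""
    truncated = True
    while truncated:
        # Real version: resend via the same chat session so the model
        # keeps context, rather than re-passing the text explicitly.
        chunk, truncated = fake_model(prompt, result)
        result += chunk
    return result

print(generate_full_output("Convert this Informatica XML to SQL: <mapping .../>"))
```

Keeping the exchange in one chat session means the model already has the XML and its partial SQL in context, so each "continue" turn only needs a short instruction, not a resend of the whole input.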