I am using the gemini-1.5-flash model. I have a long question prompt and I want the result in JSON format. There is a good chance the result will be longer than the 8192 output-token limit, and the answer cannot be shortened any further. My call looks roughly like the sketch below.
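This is a simplified sketch of my setup, assuming the Python `google-generativeai` SDK; the API key and prompt are placeholders.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

model = genai.GenerativeModel(
    "gemini-1.5-flash",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",  # I need the result as JSON
        max_output_tokens=8192,                 # the limit I am worried about
    ),
)

long_prompt = "..."  # my long question prompt goes here
response = model.generate_content(long_prompt)
print(response.text)
```

My questions are: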
- How do I know whether the API answer is complete, or whether part of it is missing because the output-token limit was reached?
- If the answer is longer than the token limit, what should I do? With OpenAI, for example, I send the original question plus the answer from the first round in a second request to get the remaining part (see the sketch after these questions). I would like to know how to do the same with Gemini. Thank you.
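For reference, this is roughly the continuation pattern I use with OpenAI (Python SDK); the model name is only an example and the prompt is a placeholder.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

messages = [{"role": "user", "content": "..."}]  # my long question prompt

resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
answer = resp.choices[0].message.content

# If the model stopped because it hit the token limit, send the question
# plus the partial answer back and ask it to continue.
while resp.choices[0].finish_reason == "length":
    messages.append({"role": "assistant", "content": resp.choices[0].message.content})
    messages.append({"role": "user", "content": "Please continue the previous answer."})
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    answer += resp.choices[0].message.content

print(answer)
```

What I want to know is whether Gemini has an equivalent of `finish_reason` for detecting truncation, and whether this same continuation pattern works there.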