Handling Token Limits in Gemini-1.5-Flash API Responses

I use the gemini-1.5-flash model. I have a long question prompt, and I want the result in JSON format; the answer may exceed 8192 output tokens and cannot be shortened any further. My questions are:

  1. How do I know whether the API answer is complete or has been cut off by the token limit?
  2. If the answer is longer than the token limit, what should I do? With OpenAI, for example, I send the original question plus the first round's answer in a second request to get the remainder. How do I do the same with Gemini? Thank you.

Check the finishReason on each candidate of the GenerateContentResponse object (note: it is a per-candidate field, not part of usageMetadata); a value of MAX_TOKENS means the answer was cut off by the output limit. Check out this page of the dev docs that details the schema. You can paste the whole thing into a chat so your AI can figure out how to extract it.

The GenerateContentResponse object schema

Also, since Flash and Pro handle JSON generation differently, check out this as well:

How to use JSON in gemini
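For your second question, the same round-trip pattern you describe for OpenAI works with Gemini's chat interface: keep the truncated answer in the history and ask the model to continue. A rough sketch, assuming the google-generativeai Python SDK (the continuation prompt wording and the stitch helper are illustrative, not an official API; the network calls sit under __main__ so the helpers can be tried on their own):

```python
# Sketch: continuation loop for answers that hit MAX_TOKENS.

def stitch(parts: list[str]) -> str:
    """Join partial answers verbatim; overlap must be avoided by the
    continuation prompt itself, since nothing is trimmed here."""
    return "".join(parts)

def finish_name(candidate) -> str:
    """Normalize finish_reason (enum or raw string) to its name."""
    return getattr(candidate.finish_reason, "name", str(candidate.finish_reason))

if __name__ == "__main__":
    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder
    model = genai.GenerativeModel("gemini-1.5-flash")
    chat = model.start_chat()

    prompt = "..."  # your long question prompt asking for JSON
    parts = []
    response = chat.send_message(prompt)
    parts.append(response.text)

    # While the answer keeps hitting the output limit, ask for the rest.
    # The chat history already contains the partial answer, so the model
    # knows where it stopped.
    while finish_name(response.candidates[0]) == "MAX_TOKENS":
        response = chat.send_message(
            "Continue exactly where you left off. Do not repeat anything."
        )
        parts.append(response.text)

    full_answer = stitch(parts)
```

One caveat: the stitched pieces are raw text, so validate the final string with json.loads before trusting it; models occasionally restate a few characters at the seam.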