2.5 Pro output length soft limit?

I uploaded a sizable PDF for Gemini to turn into semantic data suitable for a RAG system. On ingestion of the PDF, the context window is around 162k tokens. I am trying to create 100 chunks that are semantically dense, with a lot of metadata.

It seems like Gemini is stopping well before its 65,536-token output limit. I understand the reasoning part takes away from usable output, but it still looks like it is stopping at around 34k tokens of output total, including the reasoning. Thus I need to break its output down into smaller chunk requests.
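One way to work around the ceiling is to stop asking for all 100 chunks in a single response and instead split the job into several smaller requests, each asking for a range of chunks. The sketch below shows the batching arithmetic only; the prompt wording and batch size are illustrative assumptions, not the actual prompt used here, and the real call to the Gemini API is left out.

```python
# Hypothetical sketch: request the 100 chunks in batches so each
# response stays well under the observed ~34k-token ceiling.

def batch_ranges(total_chunks: int, batch_size: int):
    """Yield (start, end) chunk index pairs, one pair per request."""
    for start in range(1, total_chunks + 1, batch_size):
        yield start, min(start + batch_size - 1, total_chunks)

def build_prompt(start: int, end: int) -> str:
    # Illustrative instruction text, not the poster's actual prompt.
    return (f"From the attached PDF, produce chunks {start}-{end} "
            f"of the 100 semantically dense chunks, with metadata, as JSON.")

# Five requests of 20 chunks each instead of one 100-chunk request.
prompts = [build_prompt(s, e) for s, e in batch_ranges(100, 20)]
```

Each prompt would then be sent as its own `generate_content` call (with the PDF attached or cached), and the five partial JSON outputs merged afterwards.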

This is such a powerful model, I am just curious as to what is constraining it.

Thanks!


Very interesting. Can you share the prompt?
I am compressing literature myself and need a RAG setup to experiment with.