Maximum Output Tokens from Tuned Models

Hi All,

I am currently using a tuned Gemini 1.5 Flash model for my use case, and I noticed that it does not return more than 1,000 tokens (about 4,000 characters). I have read multiple posts on this forum where people mentioned that the maximum output is around 8,000 tokens, while others reported receiving at most around 1,800 output tokens. Any idea how I can achieve a higher maximum output? For my use case, even 1,500 output tokens would be sufficient.

Thanks,
Kapil

Question: when training your tuned model, what was the average size of the "expected answer" column in your training set? The conjecture is that you may have trained it to give brief responses.

Thanks for your reply.

The model was trained on examples of around 5,000 characters, each representing a complete script (something that has to be delivered to the user) that the model is supposed to generate. When I execute the prompt, however, the output is around 4,000 characters and the script is incomplete. In other words, the model does generate a complete script like the ones it was trained on, but the response is cut off before the end, which causes my app to deliver incomplete content to users.
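In case it helps: truncation like this is often caused by a low `maxOutputTokens` value in the request rather than by the tuned model itself. Below is a minimal sketch of what a `generateContent` request body might look like with the limit raised explicitly (the tuned-model name and prompt text are placeholders; `maxOutputTokens` is the field name used by the Gemini REST API):

```python
import json

# Placeholder tuned-model ID; substitute your own "tunedModels/..." name.
MODEL = "tunedModels/my-script-model"

# Sketch of a generateContent request body with an explicit output cap.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Generate the full script."}]}
    ],
    "generationConfig": {
        # Raise this if responses are being cut off; 2048 tokens
        # comfortably covers a ~1500-token script.
        "maxOutputTokens": 2048,
    },
}

# The body would be POSTed to the model's generateContent endpoint.
print(json.dumps(request_body["generationConfig"]))
```

If the response still stops short after raising the limit, checking the candidate's `finishReason` in the API response can confirm whether the cap is being hit (`MAX_TOKENS`) or the model is stopping on its own (`STOP`).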