Hello @Emre_Elbeyoglu,
Thanks for reaching out on the forum.
To estimate the cost for a model, you should always count tokens, not words (as a very rough rule of thumb, one word is about one token). If you want an accurate number, you can read the token counts from the response metadata via the API as follows:
print(response.usage_metadata)
# ( e.g., prompt_token_count: 11, candidates_token_count: 73, total_token_count: 84 )
For further details, have a look at the reference documentation page (Understand and count tokens | Gemini API | Google AI for Developers).
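If it helps, here is a minimal end-to-end sketch using the google-genai Python SDK (assuming that is the SDK you are on; the model name, prompt and API key are placeholders):

from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Count tokens before sending, to estimate the input side of the cost up front.
count = client.models.count_tokens(
    model="gemini-2.0-flash",
    contents="The quick brown fox jumps over the lazy dog.",
)
print(count.total_tokens)

# After the call, read the exact usage from the response metadata.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="The quick brown fox jumps over the lazy dog.",
)
print(response.usage_metadata)

count_tokens gives you the prompt side before you pay for anything, and usage_metadata gives you the exact input and output counts after the fact.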
When working with real-world input beyond plain text, such as images, video and audio, the API converts those media into tokens as well. For example:
With Gemini 2.0, image inputs with both dimensions <= 384 pixels are counted as 258 tokens. Images larger in one or both dimensions are cropped and scaled as needed into tiles of 768x768 pixels, each counted as 258 tokens.
Video and audio files are converted to tokens at the following fixed rates: video at 263 tokens per second and audio at 32 tokens per second.
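As a rough back-of-the-envelope estimator built only from the rates quoted above (the exact crop/scale behaviour can differ, so for real numbers pass the actual file to count_tokens):

import math

def estimate_image_tokens(width_px, height_px):
    # Gemini 2.0 rule quoted above: small images (both dimensions <= 384 px) cost a flat
    # 258 tokens; larger images are tiled into 768x768 crops, each tile counted as 258 tokens.
    if width_px <= 384 and height_px <= 384:
        return 258
    return math.ceil(width_px / 768) * math.ceil(height_px / 768) * 258

def estimate_av_tokens(video_seconds=0, audio_seconds=0):
    # Fixed rates quoted above: 263 tokens per second of video, 32 tokens per second of audio.
    return round(video_seconds * 263 + audio_seconds * 32)

print(estimate_image_tokens(1024, 768))      # 2 tiles -> 516 tokens
print(estimate_av_tokens(video_seconds=30))  # 30 * 263 = 7890 tokens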
To answer your questions:
#1: You should always estimate in tokens, not words.
#2: That figure is just the capacity; you are billed only for the tokens you actually use:
total_cost_per_call = (input_tokens / 1,000,000) * price_per_million_input_tokens + (output_tokens / 1,000,000) * price_per_million_output_tokens
Please check the pricing for your model here (Gemini Developer API Pricing | Gemini API | Google AI for Developers).
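As a worked example with the token counts from the metadata output above (the prices here are placeholders, not real rates; plug in the per-million-token prices from the pricing page for your model):

def cost_per_call(input_tokens, output_tokens, price_in_per_million, price_out_per_million):
    # Same formula as above, expressed in code.
    return (input_tokens / 1_000_000) * price_in_per_million + (output_tokens / 1_000_000) * price_out_per_million

# 11 input + 73 output tokens at placeholder prices of $0.10 / $0.40 per million tokens.
print(cost_per_call(11, 73, 0.10, 0.40))  # about $0.0000303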
#3: There are quota limits, implemented through a tier-based system with RPM (requests per minute), RPD (requests per day), TPM (tokens per minute) and TPD (tokens per day) caps. These differ per model and per tier.
Check the API rate limits here (Rate limits | Gemini API | Google AI for Developers).
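If you do hit one of these limits, a common pattern (a general retry sketch, not something specific from the Gemini docs) is to back off and retry:

import random
import time

def call_with_backoff(make_request, max_retries=5):
    # Illustrative only: retry a request rejected because an RPM/TPM quota was hit (HTTP 429),
    # waiting exponentially longer each time. Swap the generic Exception check for the
    # rate-limit error type raised by the SDK you are using.
    for attempt in range(max_retries):
        try:
            return make_request()
        except Exception as exc:
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt + random.random())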