Description of the bug:
First, I calculate the ratio between characters and tokens. This ratio depends on the language, so it matters whether the model is used in English or, as in my case, in Bulgarian (Cyrillic).
val response = generativeModel.generateContent(content)
// Characters-per-token ratio for this prompt (language-dependent).
val promptTokenCount = response.usageMetadata?.promptTokenCount
    ?: error("usageMetadata missing from response")
val ratio = promptText.length.toDouble() / promptTokenCount
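For reference, the ratio can also be measured up front with the SDK's countTokens call, which does not require a billable generateContent request. This is only a sketch: charsPerToken and compareRatios are helper names I am introducing, and the sample strings are placeholders.

import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

// Measures the characters-per-token ratio of a sample text via countTokens.
suspend fun charsPerToken(model: GenerativeModel, sampleText: String): Double {
    val tokens = model.countTokens(content { text(sampleText) }).totalTokens
    return sampleText.length.toDouble() / tokens
}

// Cyrillic text usually packs fewer characters into each token than English,
// which is why the ratio must be measured per language rather than assumed.
suspend fun compareRatios(model: GenerativeModel) {
    println("en: ${charsPerToken(model, "How much will this cost per month?")}")
    println("bg: ${charsPerToken(model, "Колко ще струва това на месец?")}")
}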
Although I have limited candidateCount to one, tokens appear to be counted for all candidates, as shown in the image below.
I calculated allCandidateCharsCount by counting both the characters in the response text and those in the functionCalls args values.
val responseTextLength = response.text?.length ?: 0
// Sum the lengths of all string argument values across the function calls.
val responseArgsSum = response.functionCalls.sumOf { call ->
    call.args.values.sumOf { value -> value?.length ?: 0 }
}
val expectCandidatesTokenCount = (responseTextLength + responseArgsSum) / ratio
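Putting the estimate next to the reported figure makes the discrepancy easy to spot. Below is a minimal sketch under my own assumptions: checkCandidateTokens and the 1.5x tolerance factor are mine, not part of the SDK.

import com.google.ai.client.generativeai.type.GenerateContentResponse

// Compares the candidatesTokenCount reported in usageMetadata against the
// count estimated from the characters actually present in the response.
fun checkCandidateTokens(response: GenerateContentResponse, ratio: Double) {
    val textChars = response.text?.length ?: 0
    val argChars = response.functionCalls.sumOf { call ->
        call.args.values.sumOf { value -> value?.length ?: 0 }
    }
    val expected = (textChars + argChars) / ratio
    val reported = response.usageMetadata?.candidatesTokenCount ?: return
    // Flag anything more than 1.5x the estimate (arbitrary tolerance).
    if (reported > expected * 1.5) {
        println("candidatesTokenCount=$reported, but only ~${expected.toInt()} expected")
    }
}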
I have used the following model configuration:
val generativeModel = GenerativeModel(
    modelName = "gemini-1.5-pro-latest",
    apiKey = BuildConfig.apiKey,
    generationConfig = generationConfig {
        temperature = 0.9f
        maxOutputTokens = 4096
        topP = 0.9f
        candidateCount = 1
    },
    tools = listOf(Tool(listOf(functionDeclaration))),
    toolConfig = ToolConfig(FunctionCallingConfig(FunctionCallingConfig.Mode.AUTO)),
)
Actual vs expected behavior:
Using large language models is an expensive process whose costs must be carefully optimized. Regardless of solutions like Context Caching, if the token accounting is not correct, it is a serious waste of money.
In my case, I expect to pay $100 per month, but as a result of the token calculation error I pay $600 for the same usage: a sixfold overcount in billed tokens translates directly into a sixfold bill.
Any other information you’d like to share?
It would be a good idea to provide a credit, as some other products do, so that users can see the real costs. On the free plan, the actual token consumption is not visible; it is not reported anywhere in the billing.