These days, I am using LLM APIs as programming functions.
Using structured output and renderable prompt templates, I define the function logic in the prompt and call it from my Python code to perform textual understanding or reasoning tasks. It is extremely helpful and much easier to develop than traditional NLP functions.
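For concreteness, here is a minimal sketch of the pattern I mean, assuming the google-genai SDK with a Pydantic output schema (the model name, schema, and task are illustrative placeholders, not my real ones):

```python
# Minimal "LLM as a function" sketch: structured output via a Pydantic schema.
# Illustrative only; assumes GEMINI_API_KEY is set in the environment.
from google import genai
from google.genai import types
from pydantic import BaseModel

client = genai.Client()

class Sentiment(BaseModel):
    label: str        # e.g. "positive" / "negative" / "neutral"
    confidence: float

def classify(text: str) -> Sentiment:
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=f"Classify the sentiment of this review:\n{text}",
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=Sentiment,
        ),
    )
    return response.parsed  # the SDK parses the JSON into the Pydantic model
```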
However, I haven’t found the best way to run such a function repeatedly. For example, if I have 1,000 inputs, what is the best solution in terms of cost efficiency and output quality?
1. Just repeat the function 1,000 times: I assume output quality will be the best, but it is not very cost-efficient.
2. Use context caching: This is the easiest way to reduce cost, with no degradation in quality. However, the maximum cost reduction is 4x (cached tokens are billed at 25% of the original price).
3. Batch input: Instead of sending one item per call, I can write the prompt so that the input is a list of items, instruct the model to repeat the task for each item, and define the output as a list of the desired output structure (see the sketch after this list). This is where the largest cost reduction is possible; in theory, I could put all 1,000 items into a single API call. I haven’t measured the performance degradation, but thanks to Gemini’s long context window, it could work well. Still, I am worried about degradation if I mistakenly put too many inputs in one call.
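Here is what I mean by #3, reusing the client and Sentiment schema from the sketch above; the ordering instruction and `response_schema=list[...]` are the key parts:

```python
# Batch-input sketch (#3): pack a chunk of items into one prompt and ask
# for a list of results. Reuses client / Sentiment from the sketch above.
def classify_batch(texts: list[str]) -> list[Sentiment]:
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(texts))
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=(
            "Classify the sentiment of each review below. "
            "Return exactly one result per review, in the same order.\n"
            + numbered
        ),
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=list[Sentiment],
        ),
    )
    return response.parsed
```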
For the best result, #2 and #3 should be combined. My final question is: how can I figure out the ideal number of inputs for a single API call?
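The only method I can come up with is an empirical sweep: run a small labeled sample at several batch sizes and keep the largest size whose accuracy stays close to the size-1 baseline. A rough sketch, assuming `classify_batch` from above and a list of (text, gold_label) pairs:

```python
# Empirical sweep: accuracy per batch size on a labeled sample.
# Hypothetical helper; the sizes to try would need tuning for the task.
def sweep_batch_sizes(
    samples: list[tuple[str, str]],             # (text, gold_label) pairs
    sizes: tuple[int, ...] = (1, 10, 50, 100, 250),
) -> dict[int, float]:
    accuracy = {}
    for size in sizes:
        correct, total = 0, 0
        for start in range(0, len(samples), size):
            chunk = samples[start:start + size]
            preds = classify_batch([text for text, _ in chunk])
            if len(preds) != len(chunk):
                # the model dropped or invented items: a failure mode in itself
                print(f"size={size}: length mismatch at offset {start}")
                continue
            correct += sum(p.label == gold for p, (_, gold) in zip(preds, chunk))
            total += len(chunk)
        accuracy[size] = correct / total if total else 0.0
        print(f"batch size {size}: accuracy {accuracy[size]:.3f}")
    return accuracy
```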
Or, do you have better ideas?