With the help of AI Studio Gemini (so great), I have sketched a working workflow that enables multiple function calls to happen from a single prompt. I prompt the model with multiple available FunctionCallingTool entries and it works great. I'm working on an agent-backed UI framework that integrates with my ibgib protocol (investments welcome!), and the Gemini model spins off multiple function calls surprisingly (and somewhat phenomenally!) well.
I'm now in the process of implementing a saga workflow where I feed some of those responses back in. For a single function this is straightforward using the multi-turn cURL example in the function calling tutorial. But the question then becomes: what is the recommended pattern for returning results to those multiple function calls that spun off?
In concurrent programming, we have things like critical sections, barriers, etc., but I'm not sure I'm looking for that kind of composability. I'm just wondering: is there a way to pass multiple functionResponse objects back in a single prompt? Basically I want to await all the results of those functions via Promise.all and then feed all of the results back in, associating each result with its originating function call (something like the sketch below). Since these are incredible language models, I'm already working towards just shoving everything into a single formatted text-prompt addendum keyed by saga IDs. But it seems like the API team may already have considered this situation.
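Here's a sketch of what I'm hoping is possible. The request shape and the `role: "function"` turn follow the tutorial's multi-turn cURL example; the multiple-functionResponse-parts-per-turn bit is exactly my open question, and `executeLocalFunction`, the model name, and the dispatcher are my own placeholders:

```ts
interface FunctionCall {
  name: string;
  args: Record<string, unknown>;
}

// Hypothetical local dispatcher: routes a requested call to my own code.
async function executeLocalFunction(
  call: FunctionCall,
): Promise<Record<string, unknown>> {
  switch (call.name) {
    // ...real implementations would go here...
    default:
      return { echoed: call.args };
  }
}

async function respondToParallelCalls(
  history: object[], // prior contents, including the model turn with the calls
  calls: FunctionCall[],
  apiKey: string,
): Promise<unknown> {
  // Await all the spawned functions concurrently.
  const results = await Promise.all(calls.map((c) => executeLocalFunction(c)));

  // One functionResponse part per original functionCall, matched by name.
  const parts = calls.map((call, i) => ({
    functionResponse: {
      name: call.name, // associates this result with its originating call
      response: { result: results[i] },
    },
  }));

  // Send every result back in a single turn ("gemini-pro" is a placeholder).
  const res = await fetch(
    `https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        contents: [...history, { role: "function", parts }],
      }),
    },
  );
  return res.json();
}
```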
I'm also wondering if there is existing documentation and/or discussion of this particular facet of function calling. I find it hard to believe I'm the first to think of it, but I'm having trouble finding any other talk on the subject.
Well, it's slightly disappointing not to hear anything more official on this, but I'll give you an update on my approach.
I call it "context window compression". My chat prompts are composed not just from raw text; I store each participant's text in a content-addressed record (via my ibgib protocol - open to investments!). So each entry has an address plus metadata, similar to a git commit + metadata addressing a diff, paired with the actual comment text, roughly like the sketch below.
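An illustrative version of that record shape (not the real ibgib implementation), where the address is just a SHA-256 digest over the metadata and text:

```ts
import { createHash } from "node:crypto";

interface ChatRecord {
  address: string; // content-derived address
  metadata: { author: string; timestampMs: number; kind: "comment" | "fcr" };
  text: string; // the actual comment text
}

function makeRecord(
  metadata: ChatRecord["metadata"],
  text: string,
): ChatRecord {
  // Hash metadata + text so the address is derived from the content,
  // much like a git commit hash addresses a commit object.
  const address = createHash("sha256")
    .update(JSON.stringify(metadata))
    .update(text)
    .digest("hex");
  return { address, metadata, text };
}
```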
So for "Function Call Requests" (FCRs; I don't know the official term for these), I create a record for each requested function call and then a single "FCR comment" record whose text contains a custom begin/end tag block holding only those addresses. I add this addresses-only FCR comment to the chat, i.e., the context window is "compressed" (assuming the addresses are shorter than each function call's raw data). Then, for a single turn (the very next prompt), the context window is prepended with the full data corresponding to each address, i.e., we prepend the "decompressed" text. A sketch of that round trip follows.
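The tag names and the store shape here are made up for illustration; the real protocol differs:

```ts
const FCR_BEGIN = "[[fcr]]";
const FCR_END = "[[/fcr]]";

// "Compression": the chat keeps only a comment listing record addresses.
function makeFcrComment(fcrAddresses: string[]): string {
  return `${FCR_BEGIN}\n${fcrAddresses.join("\n")}\n${FCR_END}`;
}

// "Decompression": for the very next prompt only, recover the full text
// behind each address. `store` maps address -> full function-call data.
function decompressForNextPrompt(
  fcrComment: string,
  store: Map<string, string>,
): string {
  const inner = fcrComment
    .replace(FCR_BEGIN, "")
    .replace(FCR_END, "")
    .trim();
  return inner
    .split("\n")
    .map((address) => store.get(address) ?? `<missing: ${address}>`)
    .join("\n\n");
}
```

The net saving obviously depends on each address (a 64-character digest here) being shorter than the raw function data it stands in for, which is the same assumption I made above.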
This way the model's next prompt has the context of the function call results, but future chat rounds won't be spammed with the full text. The addresses do remain part of the "permanent record" of the chat (though we can dynamically compose a chat that filters out the FCR comment). I'm almost done with this implementation, but I don't yet know how well the model will actually synthesize the information.
I would still be interested in others' approaches to this, as well as any thoughts on the overall idea of "context window compression".