Hi there,
I am exploring the function-calling capabilities of FunctionGemma and noticed the impressive BFCL (Berkeley Function Calling Leaderboard) benchmark results listed in the Model Card.
As highlighted in the documentation (Base Prompt Structure), FunctionGemma employs a unique formatting convention using specific control tokens (e.g., `<start_function_declaration>`, `<start_function_call>`, and `<escape>`) to delineate tool definitions and calls. To my knowledge, the standard BFCL evaluation framework does not natively support this prompt/output structure.
I would appreciate some clarification on the following:
- Reproduction of Results: To reproduce the benchmark results mentioned in the Model Card locally, is it necessary to implement a custom model handler within the BFCL framework to adapt to FunctionGemma’s special tokens?
- Evaluation Configuration: Were the reported results achieved using a specific prompt template or a modified version of the BFCL codebase?
- Reference Implementation: Is there an official or recommended version/fork of the BFCL repository (or a specific `model_handler` script) that includes the pre-configured logic for FunctionGemma?
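For context, here is roughly what I imagine the decode side of such a custom handler would look like. This is purely a sketch: the `<end_function_call>` closing token and the JSON payload shape are my assumptions for illustration, not FunctionGemma's documented output format.

```python
import json
import re

# Token names: <start_function_call> is from the Model Card;
# <end_function_call> is an assumed closing token for this sketch.
CALL_START = "<start_function_call>"
CALL_END = "<end_function_call>"

def extract_function_calls(model_output: str) -> list[dict]:
    """Pull function-call payloads out of a FunctionGemma-style completion.

    This is the kind of post-processing a custom BFCL model handler would
    need, since the default handlers expect plain JSON or Python-call syntax
    rather than control-token-delimited output.
    """
    pattern = re.escape(CALL_START) + r"(.*?)" + re.escape(CALL_END)
    return [json.loads(m) for m in re.findall(pattern, model_output, re.DOTALL)]

# Hypothetical completion, for illustration only:
completion = (
    "Sure, calling the tool now. "
    '<start_function_call>{"name": "get_weather", "args": {"city": "Paris"}}'
    "<end_function_call>"
)
calls = extract_function_calls(completion)
```

If something along these lines already exists in an official fork, a pointer to it would save everyone from re-implementing (and possibly mis-implementing) the parsing.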
Thank you for providing these insights and for the great work on FunctionGemma!