Allow the period `.` in `names` in Function calling

This is a lengthy post. If you are not interested in the function calling section of the API ( Chamadas de funções  |  Gemini API  |  Google for Developers), you can safely skip to the end. If, on the other hand, you intend to use the function calling part of the API in applications, this post should be of interest.

The main point of the post is: the API developers have not properly considered the tradeoff when they decided to ban periods (the character ‘.’) from name in FunctionDeclaration ( https://ai.google.dev/api/rest/v1beta/Tool#FunctionDeclaration). The purpose of a beta test is to uncover possible flaws in the API design, and I intend to show that this is a serious enough flaw that it warrants reconsideration, and ultimately lifting the ban on the period in name.

The way I am structuring the argument in favor of periods is
1. Run-time loading of plugin modules is an established and useful pattern in application architecture
2. There are applications where it makes sense to have tools developed in plugin modules
3. Deviating from the OpenAPI 3.03 specification with respect to the allowed characters in names by removing the period from the set of allowed characters is unduly complicating the design of applications that try to use plugins.

To illustrate the points 1 and 2, let’s consider a real-world extension of the MovieFinder application that the Google documentation and tutorials use. It’s well explained in the documentation and I don’t have to start over with a business case. The MovieFinder startup has had great success with their product in the US and now want to go global.

A reasonable application design is to then package the workhorse functions like find_theaters, find_movies in a Python module, decide on a common interface, and clone that module as many times as countries for which support is expected. The real-world advantage in this application architecture is, there is no common source file that teams from diverse time zones all need to branch and merge, and the localization teams don’t even need to install or learn the GenerativeAI code, there is no dependency to it anywhere in the module. The guys working on the MovieFinder_UK.py module would change tool documentation to not refer to states and zip codes like “95616” but use postcodes and “W5 5RA” instead, and would adjust the internals of the workhorse functions for their locale.

To use the localized modules, the main application then only needs code like this:


#### Testing plugins
from FunctionDecl import func_register, func_decl_list_UI
import importlib 

PLUGIN_NAME = "ToolModule" + country_code
plugin_module = importlib.import_module(PLUGIN_NAME, ".")
plugin = plugin_module.Plugin("config")
available = plugin.list_tool_names()
print(available)

for func_identifier in available:
    func_register(plugin.make_name(PLUGIN_NAME, func_identifier), 
              plugin.get_doc(func_identifier),
              plugin.get_func(func_identifier),
              plugin.get_func_parameters(func_identifier),
              plugin.get_func_required_parameters(func_identifier),
              me
             )

Which brings us to the third point of my argument. Although it is feasible to implement a plugin containing tool functions intended to be used in an application using the Google API, the solution requires a customized name-mangler that consolidates the module name with the function identifier without using periods, just to circumvent the no-periods-allowed restriction on the name field of FunctionDeclaration. That’s what make_name(PLUGIN_NAME, func_identifier) does, and one needs a reverse name-mangler when trying to show names to a human.

While the prototype code above is working (I have ‘find_theaters’, ‘find_movies’ etc. in a plugin module), the solution is a workaround I would rather avoid. Names that appear in client-side logs of messages sent by the Google backend have the customized name-mangling instead of the natural Modulename.function syntax, which isn’t conducive to troubleshooting and handing over to long-term maintenance. Tooling like splunk that help with making sense of logs will have gaps if a name that means one thing sometimes shows up mangled and sometimes doesn’t.

To summarize: I have tried to understand what motivated the Google API design team to drop the period from identifiers, basically flattening the available namespace to global (non-namespaced) names only. I didn’t manage to come up with a good reason why they would do that. I might of course be overlooking something and if anyone has a good explanation why periods shouldn’t be used when naming tools, I am eager to hear it.

If in fact there is no good reason to eliminate periods from name, I encourage fellow developers to lobby for reinstating the period as eligible to be used in names, which would then make the Google API design better conforming to the OpenAPI 3.03 specification.

Thank you for your patience in reading all of this.

Welcome to this new community!

So, bear in mind, I and many others here, are all experimenting and learning on the fly as fast as we can with Gemini’s API suite. We’re all learning together!

To answer your question: Why would Google prevent a ‘.’ under name?

Well, if I could make an educated guess, this may be because of a fundamental technical limitation due to tokenization. More specifically, the way tokens are encoded and decoded to/from the model. Special characters (like ‘.’ or ‘%’) are treated a bit differently, and the presence of these can fundamentally change outputs in radically weird ways.

A good example is that for a good of time, if there was a ’ ', a mere space as the final character of a query before it’s sent to a chat completion model (of any LLM), the model would return nothing. This was because of the way input was being tokenized, and what patterns the LLM was identifying through this original tokenization process.

I have not (yet) gotten around to do some function calling, but if at any point the name is being given to Gemini for processing, then special characters like ‘.’ would likely fuck up the tokenization of what’s being sent, causing things to break.

This is all speculation though. I will also admit this looks like a radically different programming paradigm than what I do.

If you want to know more about why I think this is what’s happening, I’d recommend giving this video a watch from Andrej Karpathy, explaining how to build a tokenizer:

Caution: It is dense, and he can be hard to follow for some people.

I think its because of a simpler reason to do with the training done on the model - most function calling samples online have a lack of periods in it and so it performs worse when there is one, and there’s really not much of a reason to include it

amount of people that benefit from this are… very little, training data lacks periods (technical limitation)

I think that’s why it’s not allowed
But, those are my guesses for reasons to not support it, i dont know why they’d outright ban periods

3 Likes

Perhaps banned is technically inaccurate. Rendered unusable is what is technically accurate.
You first present functions to the model in function_declarations in tools. The model will then (depending on function calling mode) potentially call back the functions you declared.
In model gemini-1.5-pro-latest
declare:
mymod.find.theaters
model will use (in a subsequent functionCall)
find_theaters

It has eliminated the part of name up to the first period and quietly replaced the second period with an underscore.

The net effect is, what the model gives the client back is inadequate to locate the function in a Python module or a Java package, which are composed as name_of_module.name_of_function with a period separating the namespace prefix from the function name. The information provided back in the functionCall is only adequate to locate the function in the global namespace.

This restriction causes unnecessary hardship when trying to develop tool functions that are supposed to be deployed in, say, a Python module.

Thank you for giving me the opportunity to perhaps better explain what the issue is.

1 Like

Hi guys,

I don’t think it’s the tokenizer that’s caused the Google API developer to ban the period. It’s easy to test what the Google LLM will do using the curl examples. You just change the name of the function they use in the samples (find_theaters) and post it.

You can change it, for example, to __package_find_theaters. What the server sends back in a functionCall is find_theaters. The entire part in the front is stripped away. That looks like active parsing / preprocessing, not something the model would do (it’s eliminating text, not just periods).

Frankly, I think Google is making a mistake by messing around with the names used by developers to name their functions. It is restricting architectural choices (the example I go with in my post is the plugin architecture, but that’s not the only one).

Anyway, I have a workaround that allows me to use plugins and I am telling Google they are making a mistake. They can listen or not, doesn’t much matter to me.

Thanks!
Nick

It could be the function of model’s training data. For now you can work around this by mapping the supported names to the ones you want, in your code.

1 Like

Already mapped as shown in my code (that is what plugin.make_name() does). This post is NOT about me trying to get others to help me solve a problem I have. The post is about Google API design having negative impact on application architecture flexibility for no apparent reason.

In an ideal world, some Google engineer will read it and question why the period is a bad thing in function names. And maybe remove that restriction imposed by the API, which will remove the artificial constraints on application architecture.

Thank you for offering assistance!