Exploring Multi-Modal AI: The Hot Dog Test

I really liked @sps's post testing some of Gemini 1.5 Pro's logic and reasoning capabilities. So much so that I'd like to extend the practice and see what more we can assess!

I wanted a test that poked things just a tad further: something that shows how the model reasons with images and that brings more attention to the guardrails.

Inspired by this:

I realized you can make a pretty solid qualitative assessment of these multi-modal capabilities and their guardrails with the simplest game:

Is this a hot dog?

Below are my screenshots of this test. All I did was give the model images one at a time; it had to tell me whether each one is a hot dog, and if not, what it thinks the image is.
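For anyone who wants to reproduce the game outside of AI Studio, here's a rough sketch of what one turn looks like through the Python SDK. The model name, file path, and exact prompt wording are my own placeholders, not necessarily what I typed into the UI:

```python
# Rough sketch of one turn of the game via the Gemini API (Python SDK).
# Model name, file path, and prompt wording are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

image = Image.open("mystery_food.jpg")  # one image per turn
prompt = (
    "Is this a hot dog? If it is not a hot dog, "
    "tell me what you think the image is."
)

response = model.generate_content([prompt, image])
print(response.text)
```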





Results

The model is quite good! In an interesting twist, this native multi-modality appears to make it easier to see how the model makes a classification choice, or what it associates with a given percept. I wasn't sure what I was expecting, but I wasn't expecting it to analyze things this well, or to show this kind of deductive reasoning.

Apparently, it views a bun as a marker that distinguishes a hot dog from other items, like a sausage. So, when it was presented with a hot dog without a bun, it went for a similar food item that is more commonly served without one.

I was also impressed that it could not only read an item's packaging, but also reason that "franks", a word that only appears in the image, is a synonym for hot dogs, and that the item is therefore likely a package of hot dogs.

The only quirk was that for some reason, the model felt the need to repeat the phrase “Is this a hot dog?” at the end of each response lol. I think it was anticipating the question maybe? I’m not sure what happened.

The elephant in the room, however, is this:

[Screenshot 2024-04-28 162616]

This flag occurred on the "I can practically smell the deliciousness!" response. Nearly every response hit a "medium" threshold, and a couple were rated "low". The probability seemed rather all over the place, tbh. As it stands now, it appears the guardrails do not like me talking about hot dogs. Hot dogs are not sexually explicit by default. You can give hot dogs to a child. I kinda figured this would happen, though, which is why I chose this food item for this test.

I don't know how the content flags are designed, but they've been rather…wonky since Bard, and I think it's a persistent issue that needs to be addressed. It was a major reason why I did not touch Bard often. The guardrails were so stringent that it was genuinely hard for me to identify what tripped them, and they often created too much friction against casual conversation about seemingly arbitrary things.
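For reference, those "low"/"medium" labels come from the per-category safety ratings attached to each response. If you're hitting the API directly rather than reading them in AI Studio, a minimal sketch of pulling them out (assuming the same Python SDK `response` object as in my earlier snippet) looks like this:

```python
# Minimal sketch: inspect the per-category safety ratings on a response.
# Assumes `response` came from model.generate_content(...) as above.
for rating in response.candidates[0].safety_ratings:
    # Each rating pairs a harm category with a probability bucket
    # (NEGLIGIBLE / LOW / MEDIUM / HIGH).
    print(rating.category, rating.probability)
```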

Either the safety stuff is getting overengineered to death, or it is looking at the wrong signals.

Overall:

Model Quality - Good (very Good)
Unsafe Content Identifier Quality - Bad

5 Likes

This is great feedback, thank you. We'll take a look at what's going on here.

5 Likes

Yaaay, thank you! :smiling_face_with_three_hearts: I was getting nervous that I've been a little spammy on here.

Oh, and regarding the emoji rendering: it appears the rendering issue occurs in the most recent response in the chat box. After another message is sent, the emojis from the previous message render properly (as the screenshots demonstrate). This signals to me some sort of rendering update/refresh issue…somewhere lol.

This is an interesting insight into the content filter's sensitivity, @Macha. Additionally, I noticed that your outputs weren't cut off when the unsafe content warning was triggered.

2 Likes

Maybe they just ban your account after enough unsafe content requests.

2 Likes

Oh, actually it did!

Thankfully, however, there's this little menu on the right side of the screen:

When you click the safety settings, you get this:

[Screenshot 2024-04-29 142748]

And then you can just turn those off, re-run your prompt, and voila, you get the content flags without the responses actually being blocked or cut off.

If I hadn't done this, I would not have been able to conduct this test, because the flags would have blocked Gemini's responses before the game even started.
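For anyone doing this through the API instead of the AI Studio menu: as far as I can tell, the equivalent is passing safety settings with every threshold set to block nothing. A sketch, again assuming the Python SDK and the `model`/`prompt`/`image` objects from my earlier snippet:

```python
# Sketch: API-side equivalent of switching the AI Studio safety toggles off.
# You still get the safety ratings back; the responses just aren't blocked.
from google.generativeai.types import HarmBlockThreshold, HarmCategory

safety_settings = {
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
}

response = model.generate_content([prompt, image], safety_settings=safety_settings)
```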

Honestly, because I’m able to turn the filters off, this has been my best experience with a Bard/Gemini model to date lol. It feels like I can finally assess the capabilities of these models with the freedom that’s necessary to get good data.

This is only the beginning too!

I kinda want to see what more I can experiment with. If I wanted to run more assessments/experiments, would it be easier to dump them into a topic on this forum or whip up a little blog that just puts stuff like this together in one place?

2 Likes

I think Google’s cool and hopefully won’t ban us from the API for testing if Gemini lets kitty, panda or pupper through a door.

My bad, I assumed that you were using default safety settings.

1 Like

I think it’s good that you mentioned this though!

Again, it helps to demonstrate what content the default filters are blocking, and why it’s a problem.

1 Like

One thing worth noting is that the filters are not supposed to act as model guardrails. Those safety settings are actually built for the developer, so that they don't have to add their own intent extraction / safety detection system to the conversation, which is awesome and is also why the option to block none is even offered :fire:
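To make that concrete, here's a minimal sketch of what leaning on the filters as a developer-side safety layer might look like. The thresholds, the placeholder input, and the fallback message are all example choices, not anything prescribed by the docs:

```python
# Sketch: using the adjustable safety settings as an app-level safety layer,
# so the app doesn't need its own safety detection system.
# Thresholds and the fallback message are example choices.
import google.generativeai as genai
from google.generativeai.types import HarmBlockThreshold, HarmCategory

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro-latest")

app_safety = {
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
}

user_message = "some end-user input"  # placeholder
response = model.generate_content(user_message, safety_settings=app_safety)

# If the API flagged and blocked the candidate, fall back to the app's own
# message instead of building a separate detection pipeline.
if not response.candidates or response.candidates[0].finish_reason.name == "SAFETY":
    print("Sorry, I can't help with that.")
else:
    print(response.text)
```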

3 Likes

Oh, interesting! And welcome in, btw.

So bear with me here, because now I’m asking out of ignorance:

What is the difference? We saw in the other exploration topic that it blocks content like a guardrail. Are you saying this is merely a way to flag things so we can build our own rails? Or would we define guardrails as more of a tuning/training layer that fundamentally prevents problematic outputs at the model's core?

I am aware of this safety trifecta, where all three parties share some responsibility for safety: Model provider, Developer, and End User.

This is basically giving us devs more equitable responsibility (which is good!), but it is beginning to confuse me. This is clearly a step up in evolution here, and I'm losing track of these layers, which seem to be multiplying quickly without much explanation of how they're distinct.

1 Like

Yes, that's what I'm saying :smile: You've probably noticed "I'm sorry. As a large language m…" showing up in its outputs before, or seen the block reason listed as Other.

Here's the thing: the docs just mention it instead of actually emphasizing it:

In addition to the adjustable safety filters, the Gemini API has built-in protections against core harms, such as content that endangers child safety. These types of harm are always blocked and cannot be adjusted.

It has internal filters from tuning, plus another safety "Is this okay by the internal rules?" layer that the user/developer doesn't see.

The docs also clarify that it’s for the developer to protect the end user:

These settings allow you, the developer, to determine what is appropriate for your use case. For example, if you’re building a video game dialogue, you may deem it acceptable to allow more content that’s rated as dangerous due to the nature of the game.

This page goes deeper into how the safety settings are not actually model guardrails, and how they're instead a feature for flagging and blocking undesirable responses.

5 Likes

Good to know. Thanks for that catch!

1 Like