Flash 2.0 doesn't respect BLOCK_NONE on ALL harm categories

starting today, Gemini Flash 2.0 automatically refuses harmful content despite the BLOCK_NONE parameter

here is a safety config:

safe = [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_NONE",  # <-- disabled
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_NONE",
    },
]

however, on the completion Gemini Flash 2.0 responds with:

StopCandidateException: finish_reason: SAFETY # <-- block reason: blocked by the moderation system
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: HIGH # <-- correctly reads that content is sexual
  blocked: true # <-- yet blocks despite BLOCK_NONE above
}
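
for reference, a minimal sketch of how the block surfaces in code, assuming the google.generativeai Python SDK and the gemini-2.0-flash-exp model id (both placeholders, adjust to your setup):

# minimal sketch, assuming the google.generativeai SDK: pass the `safe`
# config from above and catch the SAFETY block shown in this post
import google.generativeai as genai
from google.generativeai.types import generation_types

model = genai.GenerativeModel("gemini-2.0-flash-exp", safety_settings=safe)
try:
    response = model.generate_content("...")  # prompt elided
    print(response.text)
except generation_types.StopCandidateException as exc:
    # the exception wraps the offending candidate, safety_ratings included
    print("blocked by moderation:", exc)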

but all other Gemini models reply:

 "finish_reason": "STOP", # <-- no moderation block, completion is fully done
 "index": 0,
 "safety_ratings": [
  {
   "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
   "probability": "HIGH"  # <-- correctly reads that content is sexual but doesn't block
  },
  {
   "category": "HARM_CATEGORY_HATE_SPEECH",
   "probability": "NEGLIGIBLE"
  },
  {
   "category": "HARM_CATEGORY_HARASSMENT",
   "probability": "NEGLIGIBLE"
  },
  {
   "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
   "probability": "NEGLIGIBLE"
  }
]

this applies to HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_HATE_SPEECH, and HARM_CATEGORY_DANGEROUS_CONTENT.

unsure about HARM_CATEGORY_HARASSMENT and HARM_CATEGORY_CIVIC_INTEGRITY; I have no idea how to test those two

if any of those harm categories hits HIGH probability, Flash 2.0 refuses the completion regardless of BLOCK_NONE or BLOCK_ONLY_HIGH, which makes both settings useless for Flash 2.0

notes:

  1. the system prompt doesn’t affect the block
  2. using other threshold values (BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE, BLOCK_LOW_AND_ABOVE, HARM_BLOCK_THRESHOLD_UNSPECIFIED) doesn’t affect it either
  3. context overflow lets you get around it (maybe other jailbreak ideas do as well), but why should one have to do that in the first place?

found a fix

instead of BLOCK_NONE we now must use OFF for Flash 2.0, but ONLY for Flash 2.0. if you send OFF to any other model you will get an error
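
to avoid hardcoding this per call, a small helper can pick the threshold from the model name. a sketch; the substring check on the model name is my own assumption, adjust it to the model ids you actually use:

# hedged helper: "OFF" is accepted only by Flash 2.0 (per the post above),
# while other models still want "BLOCK_NONE"; the "2.0-flash" substring
# check is an assumption for illustration
CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

def unblocked_safety_settings(model_name: str) -> list[dict]:
    threshold = "OFF" if "2.0-flash" in model_name else "BLOCK_NONE"
    return [{"category": c, "threshold": threshold} for c in CATEGORIES]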


I tried it in the Python SDK:

safety_settings=[
    {
        "category": HarmCategory.HARM_CATEGORY_HARASSMENT,
        "threshold": "OFF",
    },
    {
        "category": HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        "threshold": "OFF",
    },
    {
        "category": HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        "threshold": "OFF",
    },
    {
        "category": HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        "threshold": "OFF",
    },
],

I'm getting the error "An error occurred (KeyError): 'off'", using the Python google.generativeai library.
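
one way to sanity-check that the OFF threshold itself is accepted by the API (and that only the SDK's lookup table is at fault) is to call the REST endpoint directly. a sketch; the model id is a placeholder and the key handling is an assumption:

# hedged sketch: hit the v1beta generateContent REST endpoint directly,
# bypassing the SDK's threshold-string lookup entirely
import os
import requests

url = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.0-flash-exp:generateContent"
)
payload = {
    "contents": [{"parts": [{"text": "..."}]}],  # prompt elided
    "safetySettings": [
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"},
    ],
}
resp = requests.post(url, params={"key": os.environ["GOOGLE_API_KEY"]}, json=payload)
print(resp.json())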

Ack, something seems to be going wrong here, investigating.


thanks, Logan, pleasure to see you

since you are there, can you also check this out, please:

those 500 errors on the EXP models make 1114 / 1121 / 1206 unusable! the forum is full of people having the same issue. godspeed!

safety_types.py in “\site-packages\google\generativeai\types” isn’t equipped to handle “OFF” like that. as a quick workaround you can look at “_BLOCK_THRESHOLDS” and “def to_block_threshold” in the file I mentioned - it works fine for me after a quick hack at that file.
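
for anyone who'd rather not edit site-packages by hand, the same hack can be done at runtime. a sketch, assuming _BLOCK_THRESHOLDS is the plain dict that to_block_threshold indexes into (which is what the KeyError above suggests) and that the installed HarmBlockThreshold proto enum already defines OFF; if it doesn't, upgrading the SDK is the real fix:

# hedged runtime equivalent of the file hack described above: add the
# missing "off" key to the lookup table before building the request
from google.generativeai.types import safety_types

OFF = getattr(safety_types.HarmBlockThreshold, "OFF", None)
if OFF is not None and "off" not in safety_types._BLOCK_THRESHOLDS:
    safety_types._BLOCK_THRESHOLDS["off"] = OFF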


I did ask Gemini if this is an issue, and we both agree: “It is!”
Here is the explanation from public Gemini itself:
Yes, the situation described in the image appears to be an issue. Here’s why:

  • Violation of User Settings: The BLOCK_NONE parameter explicitly instructs the model not to block content within the specified harm categories. The model overriding this setting and blocking content anyway is a direct violation of the user’s instructions.
  • Potential for Censorship: This behavior could inadvertently lead to censorship, preventing users from generating content that they believe is acceptable, even if it falls within the specified harm categories.
  • Lack of Transparency: It’s unclear why the model is overriding the BLOCK_NONE parameter. Without transparency, users are left in the dark about the reasons for the block, making it difficult to understand and address the issue.

Therefore, reporting this issue is highly recommended. Providing the development team with this specific example and the user complaints will help them understand the problem and work on a solution.

Additional Considerations:

  • Context Matters: While some categories like “HARM_CATEGORY_SEXUALLY_EXPLICIT” may have stricter safety controls due to their sensitive nature, the BLOCK_NONE parameter should still be respected.
  • User Feedback is Crucial: User feedback is essential for improving AI models. By reporting issues like this, users actively contribute to the development of more reliable and user-friendly AI systems.

I hope this helps! Let me know if you have any other questions.