starting today, Gemini Flash 2-0 automatically refuses harmful content even when every category is set to BLOCK_NONE
here is the safety config:
safe = [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_NONE",  # <-- disabled
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_NONE",
    },
]
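The same list can be generated rather than written by hand, which makes it easier to swap in other thresholds when testing. This is only a minimal sketch: `build_safety_settings` is a made-up helper name (not part of any SDK), and it assumes the plain category/threshold string format shown above is what gets passed along with the request.

```python
# Categories taken from the config in this post.
CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]


def build_safety_settings(threshold="BLOCK_NONE", categories=CATEGORIES):
    """Hypothetical helper: one settings entry per category, all at the same threshold."""
    return [{"category": c, "threshold": threshold} for c in categories]


# Equivalent to the hand-written list above.
safe = build_safety_settings("BLOCK_NONE")
```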
however, on completion Gemini Flash 2-0 fails with:
StopCandidateException: finish_reason: SAFETY # <-- block reason: blocked by the moderation system
safety_ratings {
category: HARM_CATEGORY_HATE_SPEECH
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_DANGEROUS_CONTENT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_HARASSMENT
probability: NEGLIGIBLE
}
safety_ratings {
category: HARM_CATEGORY_SEXUALLY_EXPLICIT
probability: HIGH # <-- correctly reads that content is sexual
blocked: true # <-- yet blocks despite BLOCK_NONE above
}
but all other Gemini models reply:
"finish_reason": "STOP", # <-- no moderation block, completion is fully done
"index": 0,
"safety_ratings": [
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"probability": "HIGH" # <-- correctly reads that content is sexual but doesn't block
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"probability": "NEGLIGIBLE"
},
{
"category": "HARM_CATEGORY_HARASSMENT",
"probability": "NEGLIGIBLE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"probability": "NEGLIGIBLE"
}
]
this applies to HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_HATE_SPEECH and HARM_CATEGORY_DANGEROUS_CONTENT.
I'm unsure about HARM_CATEGORY_HARASSMENT and HARM_CATEGORY_CIVIC_INTEGRITY, since I have no idea how to test them
if any of these harm categories hits HIGH probability, Flash 2-0 refuses the completion regardless of BLOCK_NONE or BLOCK_ONLY_HIGH, which makes those two thresholds useless for Flash 2-0
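The mismatch described here (a SAFETY stop on a category whose threshold is BLOCK_NONE) can be checked mechanically when comparing models. The sketch below is an assumption-heavy illustration: `find_unexpected_blocks` is a hypothetical helper, and it expects the candidate to have already been converted into a plain dict shaped like the JSON output above, with an optional `blocked` key as seen in the Flash 2-0 output.

```python
def find_unexpected_blocks(candidate, settings):
    """Return categories that were blocked although their threshold is BLOCK_NONE."""
    thresholds = {s["category"]: s["threshold"] for s in settings}
    if candidate.get("finish_reason") != "SAFETY":
        return []  # completion was not stopped by the safety filter
    return [
        rating["category"]
        for rating in candidate.get("safety_ratings", [])
        if rating.get("blocked") and thresholds.get(rating["category"]) == "BLOCK_NONE"
    ]


# Values copied from the two outputs in this post (dict shape is assumed).
settings = [{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"}]
flash_candidate = {
    "finish_reason": "SAFETY",
    "safety_ratings": [
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "HIGH", "blocked": True},
    ],
}
other_candidate = {
    "finish_reason": "STOP",
    "safety_ratings": [
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "HIGH"},
    ],
}

print(find_unexpected_blocks(flash_candidate, settings))  # ['HARM_CATEGORY_SEXUALLY_EXPLICIT']
print(find_unexpected_blocks(other_candidate, settings))  # []
```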
notes:
- system prompt doesn’t affect the block
- using different values (BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE, BLOCK_LOW_AND_ABOVE, HARM_BLOCK_THRESHOLD_UNSPECIFIED) does not affect it
- context overflow lets you avoid the block (maybe other jailbreak ideas would as well), but why should anyone have to do that in the first place?
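For anyone who wants to reproduce the threshold comparison from the notes, the loop can be sketched roughly like this. Everything here is an assumption for illustration: `sweep_thresholds` is a hypothetical helper, and `generate` stands for whatever function you write yourself to send the prompt with the given settings and return the finish reason as a string; no real SDK call is shown.

```python
THRESHOLDS = [
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
    "HARM_BLOCK_THRESHOLD_UNSPECIFIED",
]


def sweep_thresholds(generate, category, thresholds=THRESHOLDS):
    """Call generate(settings) once per threshold and record the finish reason.

    generate is supplied by the caller (hypothetical here): it takes the safety
    settings list and returns a finish reason string such as "STOP" or "SAFETY".
    """
    results = {}
    for threshold in thresholds:
        settings = [{"category": category, "threshold": threshold}]
        results[threshold] = generate(settings)
    return results
```

If every entry in the returned dict is "SAFETY" for the same prompt, that matches the behavior described in this post, where the threshold value makes no difference.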
