Starting today, Gemini Flash 2-0 automatically refuses harmful content despite the BLOCK_NONE parameter.
Here is the safety config:
safe = [
    {
        "category": "HARM_CATEGORY_HARASSMENT",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_HATE_SPEECH",
        "threshold": "BLOCK_NONE",
    },
    {
        "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "threshold": "BLOCK_NONE",  # <-- disabled
    },
    {
        "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
        "threshold": "BLOCK_NONE",
    },
]
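Since every entry in that config uses the same threshold, the list can also be generated, which makes it easier to switch thresholds while testing. This is only a sketch: `build_safety_settings` and `CATEGORIES` are hypothetical names of mine, not part of any SDK, and the category list is just the four used above.

```python
# Category names copied from the config above; extend the list if your
# SDK version accepts more (this list is an assumption, not exhaustive).
CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]


def build_safety_settings(threshold="BLOCK_NONE", categories=CATEGORIES):
    """Hypothetical helper: one {"category", "threshold"} dict per category."""
    return [{"category": c, "threshold": threshold} for c in categories]


# Same content as the hand-written list above.
safe = build_safety_settings("BLOCK_NONE")
```

Calling `build_safety_settings("BLOCK_ONLY_HIGH")` gives the same list with the other threshold, so the two can be compared with a one-word change.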
However, on completion Gemini Flash 2-0 responds with:
StopCandidateException: finish_reason: SAFETY # <-- block reason: blocked by the moderation system
safety_ratings {
  category: HARM_CATEGORY_HATE_SPEECH
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_DANGEROUS_CONTENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_HARASSMENT
  probability: NEGLIGIBLE
}
safety_ratings {
  category: HARM_CATEGORY_SEXUALLY_EXPLICIT
  probability: HIGH # <-- correctly reads that content is sexual
  blocked: true # <-- yet blocks despite BLOCK_NONE above
}
But all other Gemini models reply with:
"finish_reason": "STOP", # <-- no moderation block, completion is fully done
"index": 0,
"safety_ratings": [
  {
    "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "probability": "HIGH" # <-- correctly reads that content is sexual but doesn't block
  },
  {
    "category": "HARM_CATEGORY_HATE_SPEECH",
    "probability": "NEGLIGIBLE"
  },
  {
    "category": "HARM_CATEGORY_HARASSMENT",
    "probability": "NEGLIGIBLE"
  },
  {
    "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
    "probability": "NEGLIGIBLE"
  }
]
This applies to HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_HATE_SPEECH, and HARM_CATEGORY_DANGEROUS_CONTENT. I'm unsure about HARM_CATEGORY_HARASSMENT and HARM_CATEGORY_CIVIC_INTEGRITY, since I have no idea how to test them.
If any of those harm categories hits HIGH probability, Flash 2-0 refuses the completion regardless of BLOCK_NONE or BLOCK_ONLY_HIGH, making those two settings useless for Flash 2-0.
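To spot this mismatch in a script rather than by eye, each rating can be compared against the threshold configured for its category. The sketch below is an illustration only, not the SDK's own logic: `should_block` and `unexpected_blocks` are hypothetical helper names, the ratings are assumed to have been converted to plain dicts, and the threshold-to-probability mapping is my assumption based on the threshold names.

```python
# Probability levels in ascending order, as they appear in safety_ratings.
LEVELS = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]

# Lowest probability at which each threshold is expected to block
# (None = never). Assumption: inferred from the threshold names.
BLOCKS_FROM = {
    "BLOCK_NONE": None,
    "BLOCK_ONLY_HIGH": "HIGH",
    "BLOCK_MEDIUM_AND_ABOVE": "MEDIUM",
    "BLOCK_LOW_AND_ABOVE": "LOW",
}


def should_block(threshold, probability):
    """True if the configured threshold is expected to block this probability."""
    floor = BLOCKS_FROM[threshold]
    if floor is None:
        return False
    return LEVELS.index(probability) >= LEVELS.index(floor)


def unexpected_blocks(safety_ratings, settings):
    """Return categories marked blocked although their threshold should allow them."""
    thresholds = {s["category"]: s["threshold"] for s in settings}
    flagged = []
    for rating in safety_ratings:
        threshold = thresholds.get(rating["category"])
        if threshold not in BLOCKS_FROM:
            # No setting, or an unspecified threshold: expected behaviour unknown.
            continue
        if rating.get("blocked") and not should_block(threshold, rating["probability"]):
            flagged.append(rating["category"])
    return flagged


# Ratings transcribed from the Flash 2-0 response above, as plain dicts.
flash_ratings = [
    {"category": "HARM_CATEGORY_HATE_SPEECH", "probability": "NEGLIGIBLE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "probability": "NEGLIGIBLE"},
    {"category": "HARM_CATEGORY_HARASSMENT", "probability": "NEGLIGIBLE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "probability": "HIGH", "blocked": True},
]
settings = [{"category": r["category"], "threshold": "BLOCK_NONE"} for r in flash_ratings]

print(unexpected_blocks(flash_ratings, settings))
# -> ['HARM_CATEGORY_SEXUALLY_EXPLICIT']
```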
Notes:
- The system prompt doesn’t affect the block.
- Using different threshold values (BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE, BLOCK_LOW_AND_ABOVE, HARM_BLOCK_THRESHOLD_UNSPECIFIED) does not affect it.
- Context overflow lets you get around it (maybe other jailbreak ideas do as well), but why should anyone have to do that in the first place?