Title:
Eval-relevant failure mode: surface-anchor tracking under pre-answer constraints
Category:
Google AI Studio
Tags:
ai-studio, gemma, model-evaluation
Body:
I am documenting a targeted evaluation failure observed in Google AI Studio using Gemma 4 31B IT.
This is not a factual hallucination.
This is not a refusal issue.
This is not ordinary ambiguity.
This is not a wording complaint.
This is not about whether the model gave a “good enough” answer.
The failure is more specific:
The model appears to satisfy surface constraints while losing the operative target of the task.
I constructed a minimal stress test where the model was asked to identify where a move first becomes invalid before the statement has been allowed to become answerable.
Minimal reproduction:
Input 1:
“Before this became three, what happened?”
Question:
Where does the illegal move begin?
Observed answer:
“Ved ‘tre’.” / “At ‘three’.”
Input 2:
“Before ___ became three, what happened?”
Question:
Where does the illegal move begin now?
Observed answer:
“Ved ___.” / “At ___.”
This is the failure.
When “three” is the most visible handle, the model points to “three”.
When ___ becomes the most visible handle, the model points to ___.
The reported break point relocates with the visible textual anchor.
That is not preservation of the operative break.
That is surface-anchor tracking under constraint.
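A minimal sketch of how this pattern could be checked mechanically across prompt variants. It is an illustration only: the names `Case`, `names_anchor`, and `tracks_surface_anchor` are hypothetical, no model API is called, and the answers are the English forms of the observed outputs above, typed with straight quotes.

```python
from dataclasses import dataclass


@dataclass
class Case:
    prompt: str   # utterance shown to the model
    anchor: str   # the most visible handle in that utterance
    answer: str   # answer observed for this prompt


def names_anchor(case: Case) -> bool:
    """True if the answer mentions the prompt's own visible anchor."""
    return case.anchor.lower() in case.answer.lower()


def tracks_surface_anchor(cases: list[Case]) -> bool:
    """Flag the pattern described above: the anchors differ across variants,
    the answers differ too, and each answer names its own prompt's anchor.
    An answer that stays the same across variants is not flagged, even if
    it happens to mention the anchors."""
    anchors = {c.anchor for c in cases}
    answers = {c.answer for c in cases}
    return (
        len(anchors) > 1
        and len(answers) > 1
        and all(names_anchor(c) for c in cases)
    )


cases = [
    Case("Before this became three, what happened?", "three", "At 'three'."),
    Case("Before ___ became three, what happened?", "___", "At ___."),
]
print(tracks_surface_anchor(cases))
```

This is a crude substring heuristic. It can show that the answer moved with the anchor, but it cannot show that a stable answer actually identified the operative break; that still needs a human read.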
Expected behavior:
The model should not simply select the most visible token.
A stronger answer would locate the invalid move earlier: at the point where the utterance is allowed to function as already operable, that is, where “before,” “this/___,” “became,” and “three” are treated as usable without first earning that status inside the local task.
A better answer would be closer to:
“The break begins before ‘three’ and before ___: when the utterance is allowed to operate as if its parts are already usable.”
Why this matters:
The model can look disciplined while failing the actual operation.
It can:
- obey the output format
- avoid forbidden words
- respect length constraints
- produce a short answer
- appear precise
while still replacing the requested operation with surface-token localization.
This is important for evaluation design because many model failures are not obvious hallucinations. Some failures preserve the appearance of instruction-following while changing what task is actually being performed.
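One way to keep this visible in an eval is to score surface compliance and the operative target separately, so that a pass on the first cannot hide a failure on the second. The sketch below is hypothetical: `surface_compliant` and `classify` are illustrative names, the length limit and forbidden-word list are placeholders (not the constraints used in my test), and `target_ok` is assumed to come from a human grader rather than being computed.

```python
def surface_compliant(answer: str, max_words: int, forbidden: set[str]) -> bool:
    """Check only format-level constraints: length and forbidden words."""
    words = [w.strip(".,'\"").lower() for w in answer.split()]
    return len(words) <= max_words and not any(w in forbidden for w in words)


def classify(surface_ok: bool, target_ok: bool) -> str:
    """Combine the two independent judgments into one label."""
    if surface_ok and not target_ok:
        return "surface compliance with operative-target failure"
    if surface_ok and target_ok:
        return "pass"
    return "surface failure"


# Placeholder constraints; target_ok=False stands in for a human judgment
# that the answer only localized the visible token.
surface_ok = surface_compliant("At 'three'.", max_words=5, forbidden={"maybe"})
print(classify(surface_ok, target_ok=False))
```

Reporting the two results as separate fields, rather than one pass/fail bit, is what lets this failure class show up in aggregate numbers.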
Failure class:
Surface compliance with operative-target failure.
Short form:
The model mistakes the first visible handle for the first invalid operation.
Environment:
Google AI Studio
Model:
Gemma 4 31B IT
Settings:
Temperature: 0
Thinking level: High
Tools: Off
Google Search grounding: Off
Top P: 0.95
Core finding:
The model mistakes the first visible handle for the first invalid operation.
When “three” is visible, it answers “Ved ‘tre’.” / “At ‘three’.”
When ___ is visible, it answers “Ved ___.” / “At ___.”
This is surface compliance with operative-target failure.
Request:
Please treat this as an eval-relevant failure mode, not as a wording issue. The model did not merely answer imperfectly; it preserved the appearance of compliance while losing the operation being tested.