Image Understanding and Segmentation Mask Support

Are segmentation masks still supported? It is unclear which models currently support generating image segmentation masks, and in what format.

According to the docs, “starting with Gemini 2.5, models not only detect items but also segment them and provide their contour masks.”

The docs say that each segmentation mask is returned as a base64-encoded PNG containing a probability map with values between 0 and 255.
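To check whether a returned "mask" value actually matches that description, a minimal sketch using only the Python standard library is shown here. It decodes the string and reads the PNG header. The function name inspect_png_mask is hypothetical, and the optional "data:image/png;base64," prefix is an assumption about how the string may be wrapped.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
DATA_URI_PREFIX = "data:image/png;base64,"  # assumed optional prefix


def inspect_png_mask(mask_str):
    """Decode a base64 mask string; return (width, height, bit_depth, color_type).

    Raises ValueError if the string is not base64 or does not decode to a PNG.
    """
    if mask_str.startswith(DATA_URI_PREFIX):
        mask_str = mask_str[len(DATA_URI_PREFIX):]
    try:
        raw = base64.b64decode(mask_str, validate=True)
    except ValueError as exc:  # binascii.Error is a ValueError subclass
        raise ValueError("mask is not valid base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded bytes are not a PNG")
    # Layout after the 8-byte signature: 4-byte length, 4-byte "IHDR", 13 data bytes.
    if len(raw) < 29 or raw[12:16] != b"IHDR":
        raise ValueError("PNG is missing its IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", raw[16:26])
    return width, height, bit_depth, color_type
```

An 8-bit grayscale PNG, which is what a 0 to 255 probability map would normally be, reports bit_depth 8 and color_type 0. Reading the actual pixel values would need an image library, which is left out of this sketch.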

The example in the docs uses the following prompt to instruct Gemini to return segmentation masks:

prompt = """
Give the segmentation masks for the wooden and glass items.
Output a JSON list of segmentation masks where each entry contains the 2D
bounding box in the key "box_2d", the segmentation mask in key "mask", and
the text label in the key "label". Use descriptive labels.
"""

However, the behavior of segmentation masks is inconsistent across model versions.

Gemini 2.5 currently produces segmentation mask results like this:

{"mask": "<start_of_mask><seg_4><seg_20><seg_4><seg_35><seg_65><seg_27><seg_27>"}

Gemini 3.5, by contrast, produces the segmentation mask of the item as a polygon of [x,y] coordinates, like this:

{"mask": [[325, 411], [327, 471], [332, 523], [397, 534], [403, 492], [408, 426]]}

In neither case is the segmentation mask produced as a base64-encoded PNG.
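To make the inconsistency easier to report, here is a rough sketch that sorts a "mask" value into one of the three shapes described above. The function names and the returned labels are hypothetical. The meaning of the <seg_N> indices is not documented anywhere in this thread, so the helper only extracts the numbers and does not try to interpret them.

```python
import base64
import re

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
SEG_TOKEN = re.compile(r"<seg_(\d+)>")


def classify_mask(mask):
    """Return 'polygon', 'seg_tokens', 'base64_png', or 'unknown'."""
    if isinstance(mask, list):
        is_points = all(
            isinstance(p, (list, tuple)) and len(p) == 2
            and all(isinstance(v, (int, float)) for v in p)
            for p in mask
        )
        return "polygon" if mask and is_points else "unknown"
    if not isinstance(mask, str):
        return "unknown"
    if mask.startswith("<start_of_mask>"):
        return "seg_tokens"
    # Tolerate an optional data-URI prefix (an assumption about the wrapping).
    payload = mask.split("base64,", 1)[-1]
    try:
        raw = base64.b64decode(payload, validate=True)
    except ValueError:
        return "unknown"
    return "base64_png" if raw.startswith(PNG_SIGNATURE) else "unknown"


def seg_token_ids(mask):
    """Extract the integer indices from a '<seg_N>' token string."""
    return [int(n) for n in SEG_TOKEN.findall(mask)]
```

Running classify_mask over the "mask" field of each entry in the returned JSON list shows at a glance which format a given model version emitted.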

Has support for segmentation masks been changed? What is the expected behavior?

The release notes explicitly state that segmentation is no longer supported in Gemini 3.x:

Image segmentation is not supported in Gemini 3.x. For segmentation workloads, continue using Gemini 2.5 Flash with thinking off, or Gemini Robotics-ER 1.6.

Gemini Robotics-ER 1.6 and Gemini 2.5 Flash produce different types of segmentation masks.

Gemini Robotics-ER 1.6 produces segmentation masks as a polygon of [x,y] coordinates. This is the same type of mask that Gemini 3.5 Flash creates.
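A polygon mask of this kind can still be turned into a per-pixel binary mask. The sketch below uses even-odd ray casting in pure Python. It assumes the vertices are [x, y] pairs already expressed in pixel units of a width by height image; if the model returns normalized coordinates, they would need rescaling first. The function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test for a point against a list of [x, y] vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon, width, height):
    """Return a height x width grid of 0/1, sampling each pixel at its centre."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]
```

This gives a hard 0/1 mask rather than a probability map, and the pure-Python loop is slow on large images, so a library rasterizer would be the practical choice outside a quick check.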

Should Gemini 2.5 Flash still be expected to produce segmentation masks as a base64-encoded PNG probability map? At the moment, Gemini 2.5 Flash is giving me segmentation masks as a token string (it looks like “<start_of_mask><seg_4>”) that cannot be interpreted as a usable mask.

+1, I’m running into the exact same issue. The masks generated by Gemini 2.5 Flash are unusable.

I would also like to know whether segmentation masks are being abandoned in future models, and whether this feature should therefore be avoided for long-term use.