Question on image resolution scaling

Hi, in your technical report you note that the MedSigLIP model, while based on an 896x896 resolution encoder, is released at 448x448 by sharing the same model weights and using down-sampled positional embeddings. Could you elaborate on any challenges you faced when determining this scaling approach, or any limitations you observed with this method compared to, say, training a model natively at 448x448 from scratch without shared weights?

The vision encoder was primarily designed to support the MedGemma model, which operates at 896x896 resolution. Its weights can be found inside the MedGemma checkpoints.

This 448x448 release is primarily intended to make the encoder easier to fine-tune for vision tasks on fewer accelerator chips.

Training a separate model natively at 448x448, without shared weights, would likely work as well.
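For readers curious what "down-sampled positional embeddings" looks like in practice, here is a minimal sketch of resampling a ViT positional-embedding table to a smaller patch grid via bilinear interpolation. The function name, patch grid (56x56 to 28x28, assuming a hypothetical patch size of 16) and embedding width (1152) are illustrative assumptions, not the actual MedSigLIP values:

```python
import numpy as np
from scipy.ndimage import zoom

def resize_pos_embed(pos_embed, old_grid, new_grid):
    """Bilinearly resample a flattened (old_grid**2, dim) positional-embedding
    table to (new_grid**2, dim) so the same encoder weights can be reused
    at a different input resolution."""
    dim = pos_embed.shape[-1]
    grid = pos_embed.reshape(old_grid, old_grid, dim)
    factor = new_grid / old_grid
    # order=1 -> bilinear interpolation over the spatial grid only
    resized = zoom(grid, (factor, factor, 1), order=1)
    return resized.reshape(new_grid * new_grid, dim)

# Illustrative sizes: a 56x56 patch grid at 896px becomes 28x28 at 448px.
pe_896 = np.random.randn(56 * 56, 1152).astype(np.float32)
pe_448 = resize_pos_embed(pe_896, 56, 28)
print(pe_448.shape)  # (784, 1152)
```

Because only the positional-embedding table depends on the patch-grid size, all other transformer weights can be shared unchanged between the two resolutions.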