Criteria for Selecting High-Quality Medical Image Data

NicoleMcNally · November 11, 2025, 1:33pm

You mention in your report that additional CT and MRI 2D slices used for training were “curated based on mention of a specific slice associated with abnormal findings in the radiology report”. Could you provide more insight into the process and effort involved in this specific curation step? Were there any automated tools or particular criteria that proved most effective for identifying and selecting such high-quality, targeted medical image-text pairs?

Daniel_Golden · November 11, 2025, 11:44pm

Hi there,

Our technique for curating training data based on individual slices was fairly simplistic; we just performed regex matching for reports that mentioned specific slice numbers, then pulled those slices from the volumes and associated them with the relevant sections of the reports.

I hope that helps!

Dan
Engineering Manager on the HAI-DEF team
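
For readers who want to try something similar, here is a minimal sketch of the regex approach described above. It is not the HAI-DEF team's actual code: the pattern, the function names (find_slice_mentions, pair_slices_with_text), the sentence-level split of the report, and the assumption that report slice numbers are 1-based indices into the volume are all illustrative guesses.

```python
import re

# Hypothetical pattern: matches phrases such as "slice 45", "image 45",
# "slice #12", or "image number 7". Real reports will likely need more variants.
SLICE_RE = re.compile(r"\b(?:slice|image)\s*(?:number|#)?\s*(\d+)", re.IGNORECASE)


def find_slice_mentions(report):
    """Return (slice_number, sentence) pairs for sentences that cite a slice."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    pairs = []
    for sentence in sentences:
        for match in SLICE_RE.finditer(sentence):
            pairs.append((int(match.group(1)), sentence))
    return pairs


def pair_slices_with_text(volume, report, one_based=True):
    """Pair each cited slice of `volume` with the sentence that mentions it.

    `volume` is any sequence of 2D slices (for example a list, or a 3D array
    indexed along its first axis). Mentions outside the volume are skipped.
    """
    pairs = []
    for number, sentence in find_slice_mentions(report):
        index = number - 1 if one_based else number
        if 0 <= index < len(volume):
            pairs.append((volume[index], sentence))
    return pairs
```

In practice the number in a report may refer to a scanner-assigned image number within a particular series rather than a position in the loaded array, so the mapping from mention to slice would need to be checked against the image metadata before trusting the pairs.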

dermie · December 3, 2025, 3:01pm

I seriously appreciate the SCIN dataset that Google has made available! Is there any chance a similar dataset that contains dermatopathologic/histopathologic images is also available? Not sure where to look for one. Thanks!

Daniel_Golden · December 5, 2025, 12:34am

Hi, I’m really glad you found the SCIN dataset useful! We don’t have plans to release additional dermatology or histopathology datasets at this time, but you may find some helpful public datasets with a quick Google search (such as histopathology data available on The Cancer Imaging Archive). Good luck with your research!
