You mention in your report that additional CT and MRI 2D slices used for training were “curated based on mention of a specific slice associated with abnormal findings in the radiology report”. Could you provide more insight into the process and effort involved in this specific curation step? Were there any automated tools or particular criteria that proved most effective for identifying and selecting such high-quality, targeted medical image-text pairs?
Hi there,
Our technique for curating training data based on individual slices was fairly simple: we ran regex matching to find reports that mentioned specific slice numbers, pulled those slices from the volumes, and associated them with the relevant sections of the reports.
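For anyone who wants to try something similar, here is a minimal sketch of that kind of regex-based matching. It is an illustration only, not the actual pipeline code: the pattern, the function names (`find_slice_mentions`, `pair_slices_with_text`), the sentence-level pairing, and the assumption that reports use 1-based slice numbers are all hypothetical and would need adapting to how your reports are actually worded.

```python
import re

# Hypothetical pattern: matches phrases like "slice 12", "image 45", "img #7".
# Real reports vary a lot, so expect to tune this for your own data.
SLICE_REF = re.compile(r"\b(?:slice|image|img)\s*#?\s*(\d+)\b", re.IGNORECASE)


def find_slice_mentions(report):
    """Return (slice_number, sentence) pairs for every slice reference found."""
    pairs = []
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report.strip()):
        for match in SLICE_REF.finditer(sentence):
            pairs.append((int(match.group(1)), sentence))
    return pairs


def pair_slices_with_text(volume, report, one_indexed=True):
    """Pair each referenced 2D slice with the sentence that mentions it.

    `volume` is any sequence of 2D slices (for example a list of arrays).
    Assumes report numbering is 1-based unless one_indexed is False.
    References that fall outside the volume are skipped.
    """
    pairs = []
    for number, sentence in find_slice_mentions(report):
        index = number - 1 if one_indexed else number
        if 0 <= index < len(volume):
            pairs.append((volume[index], sentence))
    return pairs


if __name__ == "__main__":
    report = (
        "There is a 6 mm nodule in the right upper lobe (image 45). "
        "No pleural effusion. Liver lesion seen on slice 12."
    )
    # Stand-in volume: strings instead of pixel arrays, to keep the demo small.
    volume = [f"slice_{i}" for i in range(1, 61)]
    for image, text in pair_slices_with_text(volume, report):
        print(image, "<->", text)
```

Pairing at the sentence level is just one simple choice; pairing with a whole findings section, as described above, would need report-specific section parsing.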
I hope that helps!
Dan
Engineering Manager on the HAI-DEF team
I seriously appreciate the SCIN dataset that Google has made available! Is there any chance a similar dataset that contains dermatopathologic/histopathologic images is also available? Not sure where to look for one. Thanks!
Hi, I’m really glad you found the SCIN dataset useful! We don’t have plans to release additional dermatology or histopathology datasets at this time, but you may find some helpful public datasets with a quick Google search (such as histopathology data available on The Cancer Imaging Archive). Good luck with your research!