Could MedGemma be fine-tuned with Harvard's MedTok tokenizer?
MEDTOK is a multimodal tokenizer of medical codes that combines text descriptions of codes with graph-based representations of dependencies between codes derived from clinical ontologies and standard medical terminologies. MEDTOK is a general-purpose tokenizer that can be integrated into any transformer-based model or system that requires tokenization.
Fine-tuning with MedTok could improve MedGemma's ability to embed the fine distinctions between closely related ICD-10-CM codes. For example, it might better separate E11 (Type 2 diabetes mellitus) from E11.1 (Type 2 diabetes mellitus with ketoacidosis) or E11.10 (Type 2 diabetes mellitus with ketoacidosis without coma).
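
To make the E11 example concrete, here is a minimal Python sketch of the parent/child structure among those three codes, which is the kind of dependency a graph-aware tokenizer could exploit and a plain subword tokenizer would mostly see as near-identical strings. This is not MedTok's API or method: `parent`, `ancestors`, and `edges` are hypothetical names, and deriving the parent by trimming the last character is a simplifying assumption that ignores the chapters and blocks of the full ICD-10-CM ontology.

```python
from typing import List, Optional

# Toy subset of ICD-10-CM, taken from the example above.
DESCRIPTIONS = {
    "E11": "Type 2 diabetes mellitus",
    "E11.1": "Type 2 diabetes mellitus with ketoacidosis",
    "E11.10": "Type 2 diabetes mellitus with ketoacidosis without coma",
}

def parent(code: str) -> Optional[str]:
    """Return the parent code by trimming one character (assumed rule).

    Three-character categories such as "E11" are treated as roots.
    """
    if len(code) <= 3:
        return None
    # Drop the last character, then any dangling dot ("E11." -> "E11").
    return code[:-1].rstrip(".")

def ancestors(code: str) -> List[str]:
    """Walk up the hierarchy and collect every ancestor, nearest first."""
    chain = []
    current = parent(code)
    while current is not None:
        chain.append(current)
        current = parent(current)
    return chain

# (child, parent) edges: a tiny stand-in for an ontology graph.
edges = [(c, parent(c)) for c in DESCRIPTIONS if parent(c) is not None]

for child, par in edges:
    print(f"{child} -> {par}: {DESCRIPTIONS[child]}")
```

Whether representations built from such a graph would actually help MedGemma is an open question; the sketch only shows that the three codes sit on a single chain that differs by one character at each level.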