@deep-diver and I have been working on an MLOps project for the past couple of months. It shows how “Continuous Adaptation for ML System to Data Changes” can be achieved by building and interconnecting two separate pipelines (note: this project is built with TFX and various GCP services).
We have written a blog post about some of the internal implementation details, and it is published on the TensorFlow Blog. Please find it here:
We have also open-sourced all the materials needed to reproduce this project, including in-depth explanations in a set of Jupyter notebooks. You can find the repo here:
@Robert_Crowe, huge thanks for your help on this one.
Thank you for taking the time to read this; we hope it will be helpful.
Thanks for the suggestions! As you likely know, many SoTA approaches look far less impressive once they are exposed to real-world data, but we will investigate and dig deeper.
We acknowledge (as we also do in the post itself) that JS divergence (just one measurement) could also have been used to capture the drift, but we wanted to take a different path.
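For reference, the standard definition of the Jensen-Shannon divergence between two discrete distributions P and Q: it averages the KL divergence of each distribution from their mixture M. With base-2 logarithms it is symmetric and bounded in [0, 1], where 0 means the distributions are identical, which is what makes it convenient to threshold for drift.

```latex
M = \tfrac{1}{2}(P + Q), \qquad
\mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\, D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \,\|\, M),
\qquad
D_{\mathrm{KL}}(P \,\|\, M) = \sum_i p_i \log_2 \frac{p_i}{m_i}
```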
Yes. Even in cases less “extreme” than continual learning and open-set recognition, active learning is always around the corner, including with more “static” models fed by dynamic data pipelines:
The materials you shared will definitely expand our knowledge and help us think about the next project to work on.
Recently, we have been interested in two topics:
Monitoring data drift by comparing datasets directly (without model predictions), e.g., with JS divergence.
Combining the two CI/CD MLOps systems into a complete, open-source MLOps system. Note that this isn't meant to cover every use case, but to provide a complete kick-starter for one specific use case.
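To make the first topic more concrete, here is a minimal sketch of dataset-level drift detection without model predictions: bin one numeric feature from a baseline dataset and a current dataset on shared bin edges, compute the JS divergence (base 2) between the two histograms, and flag drift above a threshold. This is not code from our project; the function names, the 10-bin default, and the 0.1 threshold are hypothetical choices for illustration, and it uses only the Python standard library.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized histogram of `values` over bins defined by sorted `edges`.

    Values outside [edges[0], edges[-1]] are clipped into the first/last bin.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), n_bins - 1)] += 1  # clip to a valid bin
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence with log base 2, so the result is in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute 0; a_i > 0 implies m_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def detect_drift(
    baseline: Sequence[float],
    current: Sequence[float],
    num_bins: int = 10,      # hypothetical default
    threshold: float = 0.1,  # hypothetical drift threshold
) -> Tuple[float, bool]:
    """Return (JS divergence, drift flag) for one numeric feature."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    width = (hi - lo) / num_bins or 1.0  # avoid zero-width bins for constant data
    edges = [lo + i * width for i in range(num_bins + 1)]
    score = js_divergence(histogram(baseline, edges), histogram(current, edges))
    return score, score > threshold


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]        # spread over [0, 1)
    shifted = [0.5 + i / 200 for i in range(100)]   # concentrated in [0.5, 1)
    print(detect_drift(baseline, baseline))  # identical data: score 0.0, no drift
    print(detect_drift(baseline, shifted))   # shifted data: flagged as drift
```

Binning both datasets on shared edges is what makes the two histograms comparable; in practice, categorical features would use per-category frequencies instead of numeric bins, and the threshold would need tuning per feature.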