Hi all,
I’d like to share a new ML storage framework: google/space on GitHub, described there as a "Unified storage framework for the entire machine learning lifecycle". It brings database/lakehouse features such as data mutations, version management, SQL, and materialized views of processed results to ML datasets. TFDS, Ray, and HuggingFace datasets are currently in scope.
We have a TFDS example notebook, notebooks/tfds_coco_tutorial.ipynb, on the main branch of the google/space repository. Feedback and contributions are very welcome.
Thank you!