I am looking for a way to serialize and deserialize a tf.data.Dataset object that captures its state without computing the pipeline.
A straightforward approach would be to call the save method and then deserialize with load, but that is not really serialization. Calling save forces the pipeline to be computed, so every map/filter/etc. in it is executed and the results are written out. What I want instead is to store the state of the Dataset on disk so that another process can load it later and end up with exactly the state it had at the time of serialization.
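For reference, this is roughly the save/load round trip I mean. It is only a minimal sketch: the pipeline is a made-up example, the path is a temporary directory, and it assumes a TensorFlow version where save and load are methods on tf.data.Dataset (2.10 or later, as far as I know; older versions have them under tf.data.experimental).

```python
import os
import tempfile

import tensorflow as tf

# Made-up pipeline for illustration only.
dataset = tf.data.Dataset.range(5).map(lambda x: x * 2)

# Temporary location for the saved dataset (assumption, any writable path works).
save_dir = os.path.join(tempfile.mkdtemp(), "saved_dataset")

# save() iterates over the dataset, so the map above runs here and the
# resulting elements are what gets written to disk.
dataset.save(save_dir)

# load() gives back a dataset of the stored elements, not the original pipeline.
loaded = tf.data.Dataset.load(save_dir)
loaded_values = [int(v) for v in loaded.as_numpy_iterator()]
print(loaded_values)
```

So what ends up on disk is the output of the pipeline rather than its state, which is exactly what I am trying to avoid.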
Maybe iterator checkpointing is the approach I should try?
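To make that concrete, here is a minimal sketch of what I understand iterator checkpointing to look like, using tf.train.Checkpoint. The build_dataset helper and the checkpoint path are hypothetical names I made up for the example. As far as I can tell, the checkpoint only stores the iterator position, so the loading side has to rebuild the same pipeline in code before restoring.

```python
import os
import tempfile

import tensorflow as tf


def build_dataset():
    # Hypothetical pipeline; the restoring process must construct the same one.
    return tf.data.Dataset.range(10).map(lambda x: x * 2)


# Temporary checkpoint location (assumption, any writable path works).
ckpt_prefix = os.path.join(tempfile.mkdtemp(), "iterator_ckpt")

# Consume a few elements, then checkpoint the iterator.
iterator = iter(build_dataset())
consumed = [int(next(iterator).numpy()) for _ in range(3)]
checkpoint = tf.train.Checkpoint(iterator=iterator)
save_path = checkpoint.save(ckpt_prefix)

# Stand-in for "another process": rebuild the pipeline, then restore the position.
new_iterator = iter(build_dataset())
new_checkpoint = tf.train.Checkpoint(iterator=new_iterator)
new_checkpoint.restore(save_path)
resumed = [int(next(new_iterator).numpy()) for _ in range(3)]

print(consumed, resumed)
```

This resumes iteration where it left off, but it does not serialize the pipeline definition itself, so I am not sure it fully covers my use case.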
I found this GitHub issue on the topic, but it does not contain a solution to the original problem.
Thanks for any help.
Dennis