I want to speed up a multi-thread script, so I wanted to look to append multiple tf.train.Example in a single list and then write them to the serial TFRecord. How can I accomplish that?
Writing thousands of TFRecords with the conventional method seems a bad idea for my dataset, potentially slowing down data reading and creating a bottleneck.
Apologies for the delay in response.
To write TFRecords efficiently, I suggest to batch your tf.train.Example objects and write each batch to a TFRecord file which reduces computation overhead and balances memory usage and use tf.io.TFRecordWriter for serialization, adjusting the batch size for optimal performance.In addition, we can parallel process it with threading. Here’s a sample implementation gist for your reference.