How can I create sequences (windows) from data in a TFRecord?

Hi there, first post here. I will explain my problem as best I can.

I have around 1800 short videos, each around 30 seconds long. I trained a VAE to encode each frame into a latent vector of size 200.

Using this VAE, I then created a TFRecord with one entry per video. Each entry contains an array of shape 830x200 (830 is the number of frames, 200 the size of the latent vector), plus an array of just 4 elements (integers, some metadata).

For training, the TFRecord is read into a Dataset, and this Dataset is what I pass to model.fit. To do this I use TFRecordDataset, map my parsing function over it, and then shuffle, prefetch, and batch.
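For context, this is roughly what my pipeline looks like (the feature keys `latents` and `metadata` are placeholders; my real schema may differ):

```python
import tensorflow as tf

# Hypothetical feature keys -- adjust to match the actual TFRecord schema.
FEATURE_SPEC = {
    "latents": tf.io.FixedLenFeature([], tf.string),   # serialized 830x200 float tensor
    "metadata": tf.io.FixedLenFeature([4], tf.int64),  # 4 integer metadata values
}

def parse_example(serialized):
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    latents = tf.io.parse_tensor(parsed["latents"], out_type=tf.float32)
    latents = tf.reshape(latents, [830, 200])
    return latents, parsed["metadata"]

def make_dataset(path, batch_size=32):
    ds = tf.data.TFRecordDataset(path)
    ds = ds.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    # shuffle/batch/prefetch, as described above
    ds = ds.shuffle(100).batch(batch_size).prefetch(tf.data.AUTOTUNE)
    return ds
```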

I am now using a model similar to the miniature GPT example in the documentation to predict the next frame. The model works OK (ignoring the results, which are not good, but at least data flows in and out and the loss decreases). However, every epoch the model takes batches of full sequences and predicts the next frame for every element in the batch. So if you print the input in train_step, it shows a shape like (32, 830, 200), where 32 is the batch size, 830 the sequence length, and 200 the number of features.

What I would like is, instead of the model taking the full sequence of 830 frames, to split this sequence into small overlapping sequences.

First I tried to do this inside train_step, wrapping the usual code (GradientTape and all of that) in a for loop that sliced sub-sequences out of the input. But this was painfully slow; it took around 30 minutes before training even started.

Then I tried to do this in the function that creates the Dataset from the TFRecord. Datasets have a window function, which seems to be exactly what I need. So once the Dataset was loaded/created with TFRecordDataset, and after mapping my parsing function, I called dataset.window(…) and then batched as usual. But this did not work and training never started. I don't know whether the fact that each entry contains two arrays (the 830x200 one and the 4-element one) is a problem here.
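This is roughly the pattern I was attempting (a sketch, not my exact code; the window length 100 and shift 10 are just example values). As I understand it, window() yields a dataset of sub-datasets, so each window has to be turned back into a dense tensor with flat_map before batching, and the metadata array has to be carried along separately:

```python
import tensorflow as tf

WINDOW = 100  # window length
STRIDE = 10   # hop between window starts (example value; controls overlap)

def to_windows(latents, metadata):
    # Per-video dataset of frames -> dataset of overlapping windows.
    frames = tf.data.Dataset.from_tensor_slices(latents)
    windows = frames.window(WINDOW, shift=STRIDE, drop_remainder=True)
    # Each window is itself a small dataset; batch(WINDOW) makes it a
    # dense [WINDOW, 200] tensor, and flat_map flattens the nesting.
    windows = windows.flat_map(lambda w: w.batch(WINDOW))
    # Pair every window with the (repeated) metadata of its video.
    meta = tf.data.Dataset.from_tensors(metadata).repeat()
    return tf.data.Dataset.zip((windows, meta))

# After parsing, ds yields (latents [830,200], metadata [4]) pairs:
# ds = ds.flat_map(to_windows)      # -> ([100,200], [4]) elements
# ds = ds.shuffle(1000).batch(32)   # -> ([32,100,200], [32,4]) batches
```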

So basically, what I would like to do is split the 830x200 sequence into overlapping windows of size 100x200, batch them, and send them to train_step. Ideally, train_step should receive tensors of shape (32, 100, 200).

The only way I see to do this now is to write the TFRecords already windowed, but that is a waste of space since the windows overlap.
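For reference, the transformation I'm after could also be expressed with tf.signal.frame applied inside the pipeline instead of dataset.window, which would avoid writing the overlapping windows to disk. I don't know if this is the idiomatic way (window length and stride are again example values):

```python
import tensorflow as tf

WINDOW = 100
STRIDE = 10  # example step between window starts

def split_into_windows(latents, metadata):
    # [830, 200] -> [num_windows, 100, 200] overlapping windows along axis 0.
    windows = tf.signal.frame(latents, frame_length=WINDOW,
                              frame_step=STRIDE, axis=0)
    n = tf.shape(windows)[0]
    # Repeat the metadata so every window keeps its video's metadata.
    meta = tf.repeat(metadata[tf.newaxis, :], n, axis=0)
    return tf.data.Dataset.from_tensor_slices((windows, meta))

# After parsing: ds = ds.flat_map(split_into_windows).shuffle(1000).batch(32)
```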

Any help on how to approach this would be very welcome.

You can see the script I use to write/read the TFRecords here: ################################################################################ - Pastebin.com