Hi,
I have created a dataset from a generator like this:
ds_series = tf.data.Dataset.from_generator(
    trim_size, args=[data_input_tot_EqLen, trimmed_lbl, seq_len, max_len_per],
    output_types=(tf.float32, tf.int32),
    output_shapes=((5511, 101, 3), (1,)))
Then I shuffle the dataset and split it into training and testing sets:
ds_series = ds_series.shuffle(buffer_size=16)
ds_train = ds_series.take(train_smpls)
ds_valid = ds_series.skip(train_smpls)
I’d like to count the number of samples in each class, so I want to see which labels end up in the training and testing datasets.
I run the following command:
_, lbl_train = ds_train
This takes a long time (I understand why: the trim_size generator I defined above is pretty heavy), but my question is about the messages it shows:
I tensorflow/core/kernels/data/shuffle_dataset_op.cc:175] Filling up shuffle buffer (this may take a while): 1 of 16
So it counts up from 1 to 16 while filling the buffer. However, this does not seem to fit with what the documentation says about the shuffle buffer size:
As I understand it, shuffle is supposed to draw random samples from a 16-sample buffer, which means that the randomization process is not limited to 16 samples.
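To make my understanding concrete, here is a minimal pure-Python sketch of how I think a fixed-size shuffle buffer behaves. This is only my mental model, not the actual tf.data implementation; buffered_shuffle and its seed parameter are hypothetical names I made up for illustration.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in the order a fixed-size shuffle buffer might emit them.

    Hypothetical helper: a simplified model, not TensorFlow's real code.
    """
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Filling phase: nothing is emitted until the buffer is full
            # (this would correspond to the "1 of 16" style messages).
            buffer.append(item)
            continue
        # Buffer is full: emit one random element and put the
        # incoming item into the slot that was just freed.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


order = list(buffered_shuffle(range(100), buffer_size=16, seed=0))
print(order[:10])
```

In this model every sample is eventually emitted, but the k-th output can only come from the first 16 + k inputs, so the buffer size limits how far a sample can move forward rather than how many samples get shuffled.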
Am I wrong here?