Dataset map function returns wrong tensor shape

I have defined a map function that unpacks 6 32-bit integers into 192 (1/0) integers:

def unpack(x, y):
    unpacked_data = tf.TensorArray(tf.uint32, size=0, dynamic_size=True)
    for b in x:
        for _ in range(32):
            unpacked_data = unpacked_data.write(unpacked_data.size(), b & 1)
            b = tf.bitwise.right_shift(b, 1)

    return x, unpacked_data.stack(), y

# for training I would return unpacked_data.stack(), y

The map function works:

a=np.array([0, 0, 0, 16, 45, 57],dtype=np.uint32)
b=unpack(a, a)
print(b)

returns:

(array([ 0, 0, 0, 16, 45, 57], dtype=uint32), <tf.Tensor: shape=(192,), dtype=uint32, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)>, array([ 0, 0, 0, 16, 45, 57], dtype=uint32))

Now I want to apply this map function to the features of a batched dataset:

with tf.device("CPU"):
    train = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(BATCH_SIZE)
    # for training I would use 4 * BATCH_SIZE
    validate = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BATCH_SIZE)

x_train and y_train are NumPy arrays. But when I apply the map function to the 'train' dataset:

train = train.map(unpack)
list(train.as_numpy_iterator())

The shape of the bit-array is wrong: it is transposed, so not 192x1, but 32x6:

[(array([[ 0, 0, 0, 16, 45, 57]], dtype=uint32),
array([[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], dtype=uint32),
array([1.])),

Why?

Regards,
GW

Hi @gwiesenekker,

Below is my understanding:

The shape of the bit-array is wrong because the TensorArray you use to store it is dynamic-sized: its size is not known in advance, and it grows as needed to accommodate the data. When the map function is applied to the train dataset, a TensorArray is created for each element, and its eventual size depends on the input data for that element. This is why the shape of the bit-array comes out transposed when you iterate over the dataset.

To fix this, you can use a fixed-size TensorArray. That way the size of the TensorArray is the same for every element in the dataset, and the shape of the bit-array will be correct.

I hope this helps!

Thanks.

Hi,

Thank you for your reply. I have tried your suggestion by specifying the size:

def unpack(x, y):
    unpacked_data = tf.TensorArray(tf.uint32, size=192)
    i = 0
    for b in x:
        for _ in range(32):
            unpacked_data = unpacked_data.write(i, b & 1)
            b = tf.bitwise.right_shift(b, 1)
            i += 1

    return x, unpacked_data.stack(), y

but this still returns the wrong shape. Perhaps I have to specify the element_shape as well as the size, but if so, how? element_shape values like ([192]), ([1]) etc. raise the error:

Incompatible shape for value (()), expected …

Thanks,
GW

You can see what happens by applying the original map function to a 2d array:

a=np.array([[0, 0, 0, 16, 45, 57],[0, 0, 0, 16, 45, 57]],dtype=np.uint32)
b=unpack(a,a)
print(b)

The output is:

(array([[ 0, 0, 0, 16, 45, 57],
[ 0, 0, 0, 16, 45, 57]], dtype=uint32), <tf.Tensor: shape=(64, 6), dtype=uint32, numpy=
array([[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], dtype=uint32)>, array([[ 0, 0, 0, 16, 45, 57],
[ 0, 0, 0, 16, 45, 57]], dtype=uint32))

So when x is 2-D, the loop iterates over rows rather than scalars: the mask and the shift get applied elementwise to all 6 elements of each row at once, and that is repeated 32 times per row (64 writes in total), which (although unintended) produces the transposed result.
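For reference, this elementwise broadcasting can be reproduced in plain NumPy:

```python
import numpy as np

row = np.array([0, 0, 0, 16, 45, 57], dtype=np.uint32)

# `&` and `>>` act on all 6 elements at once, so each TensorArray
# write stores a 6-element vector instead of a single bit:
print(row & np.uint32(1))   # [0 0 0 0 1 1]
print(row >> np.uint32(1))  # [ 0  0  0  8 22 28]
```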

So how can I change the map function so that each row of 6 integers gets unpacked into 192 integers?

Regards,
GW
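One possible way to unpack each row of 6 integers into 192 bits is to vectorise the unpacking with broadcasting instead of a TensorArray. This is a sketch, not from the thread; it is shown in NumPy, but the same broadcasting pattern works inside the map function with tf.bitwise.right_shift and tf.bitwise.bitwise_and:

```python
import numpy as np

def unpack_rows(x):
    # x: (batch, 6) uint32. Broadcasting each element against the 32
    # bit positions turns every row into 192 bits, with the low bit
    # first within each integer (same order as the original loop).
    shifts = np.arange(32, dtype=np.uint32)          # shape (32,)
    bits = (x[:, :, None] >> shifts) & np.uint32(1)  # shape (batch, 6, 32)
    return bits.reshape(x.shape[0], -1)              # shape (batch, 192)

a = np.array([[0, 0, 0, 16, 45, 57]], dtype=np.uint32)
print(unpack_rows(a).shape)  # (1, 192)
```

Because every operation is elementwise over a fixed-shape array, the output shape is known statically, which avoids the TensorArray shape problems entirely.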