Dataset map function returns wrong tensor shape

I have defined a map function that unpacks 6 32-bit integers into 192 (1/0) integers:

def unpack(x, y):
    unpacked_data = tf.TensorArray(tf.uint32, size=0, dynamic_size=True)
    for b in x:
        for _ in range(32):
            unpacked_data = unpacked_data.write(unpacked_data.size(), b & 1)
            b = tf.bitwise.right_shift(b, 1)

    return x, unpacked_data.stack(), y

# for training I would return unpacked_data.stack(), y

The map function works:

a=np.array([0, 0, 0, 16, 45, 57],dtype=np.uint32)
b=unpack(a, a)
print(b)

returns:

(array([ 0, 0, 0, 16, 45, 57], dtype=uint32), <tf.Tensor: shape=(192,), dtype=uint32, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint32)>, array([ 0, 0, 0, 16, 45, 57], dtype=uint32))

Now I want to apply this map function to the features of a batched dataset:

with tf.device("CPU"):
    train = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(BATCH_SIZE)
    # for training I would use 4 * BATCH_SIZE
    validate = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(BATCH_SIZE)

x_train and y_train are NumPy arrays. But when I apply the map function to the 'train' dataset:

train = train.map(unpack)
list(train.as_numpy_iterator())

The shape of the bit-array is wrong: it is transposed, so not 192x1, but 32x6:

[(array([[ 0, 0, 0, 16, 45, 57]], dtype=uint32),
array([[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], dtype=uint32),
array([1.])),

Why?

Regards,
GW

Hi @gwiesenekker,

Below is my understanding:

The shape of the bit-array is wrong because the TensorArray you use to store it is dynamic-sized: its size is not known in advance, and it grows as needed to accommodate the data. When the map function is applied to the train dataset, a TensorArray is created for each element, and its eventual size depends on the input data for that element. This is why the shape of the bit-array comes out transposed when you iterate over the dataset.

To fix this, you can use a fixed-size TensorArray. That way the size of the TensorArray is the same for every element in the dataset, and the shape of the bit-array will be correct.

I hope this helps!

Thanks.

Hi,

Thank you for your reply. I have tried your suggestion by specifying the size:

def unpack(x, y):
    unpacked_data = tf.TensorArray(tf.uint32, size=192)
    i = 0
    for b in x:
        for _ in range(32):
            unpacked_data = unpacked_data.write(i, b & 1)
            b = tf.bitwise.right_shift(b, 1)
            i += 1

    return x, unpacked_data.stack(), y

but this still returns the wrong shape. Perhaps I have to specify the element_shape as well as the size, but if so, how? element_shape values like ([192]), ([1]) etc. raise the error:

Incompatible shape for value (()), expected …

Thanks,
GW

You can see what happens by applying the original map function to a 2d array:

a=np.array([[0, 0, 0, 16, 45, 57],[0, 0, 0, 16, 45, 57]],dtype=np.uint32)
b=unpack(a,a)
print(b)

The output is:

(array([[ 0, 0, 0, 16, 45, 57],
[ 0, 0, 0, 16, 45, 57]], dtype=uint32), <tf.Tensor: shape=(64, 6), dtype=uint32, numpy=
array([[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]], dtype=uint32)>, array([[ 0, 0, 0, 16, 45, 57],
[ 0, 0, 0, 16, 45, 57]], dtype=uint32))

So when x is 2-D, the loop iterates over rows rather than scalars: the mask and the shift get applied elementwise to all 6 elements of each row at once, and that is repeated 32 times per row (64 writes in total), which (although unintended) produces the transposed result.
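For reference, this elementwise broadcasting can be reproduced in plain NumPy:

```python
import numpy as np

row = np.array([0, 0, 0, 16, 45, 57], dtype=np.uint32)

# `&` and `>>` act on all 6 elements at once, so each TensorArray
# write stores a 6-element vector instead of a single bit:
print(row & np.uint32(1))   # [0 0 0 0 1 1]
print(row >> np.uint32(1))  # [ 0  0  0  8 22 28]
```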

So how can I change the map function so that each row of 6 integers gets unpacked into 192 integers?

Regards,
GW
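One possible way to unpack each row of 6 integers into 192 bits is to vectorise the unpacking with broadcasting instead of a TensorArray. This is a sketch, not from the thread; it is shown in NumPy, but the same broadcasting pattern works inside the map function with tf.bitwise.right_shift and tf.bitwise.bitwise_and:

```python
import numpy as np

def unpack_rows(x):
    # x: (batch, 6) uint32. Broadcasting each element against the 32
    # bit positions turns every row into 192 bits, with the low bit
    # first within each integer (same order as the original loop).
    shifts = np.arange(32, dtype=np.uint32)          # shape (32,)
    bits = (x[:, :, None] >> shifts) & np.uint32(1)  # shape (batch, 6, 32)
    return bits.reshape(x.shape[0], -1)              # shape (batch, 192)

a = np.array([[0, 0, 0, 16, 45, 57]], dtype=np.uint32)
print(unpack_rows(a).shape)  # (1, 192)
```

Because every operation is elementwise over a fixed-shape array, the output shape is known statically, which avoids the TensorArray shape problems entirely.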