Tensor dimension error in a local conda environment when following the "Image captioning with visual attention" tutorial

I would like to follow the Image captioning with visual attention tutorial on my local computer. I have a GPU and a WSL2 environment that was set up according to Ultimate Guide: Install TensorFlow GPU on WSL2 Ubuntu 24.04 | CUDA, cuDNN, TensorRT & PyTorch. However, no matter which configuration I use, the notebook always fails with an error at this cell:

for t in (0.0, 0.5, 1.0):
    result = model.simple_gen(image, temperature=t)
    print(result)

With error message:

ValueError: Exception encountered when calling CrossAttention.call().

The last dimension of query_shape and value_shape must be equal, but are 256, 576. Received: query_shape={query_shape}, value_shape={value_shape}

Interestingly, the same notebook runs in Google Colab without any issues. I have racked my brain trying to understand what in my conda environment is causing this problem, so I am asking for the community's help with tracing it and understanding what exactly needs to be fixed.

Hi @Max_Zaikin, thanks for reporting this issue. I hit the same error in WSL with Python 3.10.14, but with Python 3.10.12 it works fine. Could you please try Python 3.10.12 in your WSL environment and let us know whether the issue persists? If it does, please share the package versions present in your WSL environment. Thank you.
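
One possible way to gather the package versions mentioned above is the short script below. It is only a sketch: the helper name collect_versions and the package list are assumptions chosen for illustration, not something taken from the tutorial, so adjust the list to whatever your environment actually uses. It reads installed metadata instead of importing the packages, so it also works when an import itself is broken.

```python
import platform
import sys
from importlib import metadata

# Assumed list of relevant packages; edit to match your environment.
PACKAGES = ("tensorflow", "keras", "numpy")


def collect_versions(packages=PACKAGES):
    """Return interpreter and package versions without importing the packages."""
    versions = {
        "python": platform.python_version(),
        "executable": sys.executable,  # shows which conda env is active
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running the same script in Google Colab and in the WSL2 conda environment, then comparing the two outputs line by line, should show which versions differ between the working and the failing setup.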