I would like to follow the tutorial Image captioning with visual attention on my local computer. I have a GPU and a WSL2 environment that was set up according to Ultimate Guide: Install TensorFlow GPU on WSL2 Ubuntu 24.04 | CUDA, cuDNN, TensorRT & PyTorch. However, no matter what configuration I use, the notebook always fails on the same cell:
for t in (0.0, 0.5, 1.0):
    result = model.simple_gen(image, temperature=t)
    print(result)
The error message is:
ValueError: Exception encountered when calling CrossAttention.call().
The last dimension of query_shape and value_shape must be equal, but are 256, 576. Received: query_shape={query_shape}, value_shape={value_shape}
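One way to narrow this down might be to call the attention layer on its own, outside the notebook. The sketch below is only a guess at a minimal probe: it assumes the CrossAttention layer wraps tf.keras.layers.MultiHeadAttention, and the helper name probe_mha, the sequence lengths, and num_heads=2 are all made up for illustration. Only the 256 and 576 come from the error above. If it raises the same ValueError, the installed Keras version itself rejects the mismatched dimensions; if it prints "ok", the trigger is probably somewhere else.

```python
def probe_mha(query_dim=256, value_dim=576):
    """Call MultiHeadAttention with mismatched last dimensions and report the outcome."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow is not installed"

    # Sequence lengths (3 tokens, 49 image patches) are arbitrary assumptions;
    # only the last dimensions (256 and 576) are taken from the error message.
    query = tf.zeros((1, 3, query_dim))
    value = tf.zeros((1, 49, value_dim))

    # num_heads and key_dim are illustrative values, not the notebook's settings.
    mha = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=query_dim)
    try:
        output, _scores = mha(query=query, value=value, return_attention_scores=True)
    except ValueError as err:
        return f"ValueError: {err}"
    return f"ok: output shape {tuple(output.shape)}"


print(probe_mha())
```

Running the same snippet in Colab and in the local conda environment should show whether the two installations treat this call differently.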
Interestingly, the same notebook runs in Google Colab without any issues. I have racked my brain trying to understand what in my conda environment is causing this problem, so I am asking for the community's help on how to trace it and work out what exactly needs to be fixed.
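Since Colab works and the local environment does not, a first step could be to compare the installed package versions on both sides. The sketch below uses only the standard library; the helper name report_versions and the choice of packages to list are assumptions, not something the tutorial prescribes.

```python
import sys
from importlib import metadata


def report_versions(packages=("tensorflow", "keras", "numpy")):
    """Return a dict mapping each package name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, version in report_versions().items():
        print(name, version or "not installed")
```

Running it once in a Colab cell and once inside the conda environment gives two lists that can be compared line by line; any difference in the tensorflow or keras versions would be the first thing to look at.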