I am using a custom Transformer built with TensorFlow, following this machinelearningmastery tutorial: https://machinelearningmastery.com/building-transformer-models-with-attention-crash-course-build-a-neural-machine-translator-in-12-days/

With TensorFlow 2.17 I get a much lower val_masked_accuracy than with 2.15.0; under 2.15.0 it was almost twice as high. I have also started getting warnings about the mask being dropped:

C:\ProgramData\anaconda3\Lib\site-packages\keras\src\layers\layer.py:934: UserWarning: Layer 'enc0_att' (of type Functional) was passed an input with a mask attached to it. However, this layer does not support masking and will therefore destroy the mask information. Downstream layers will not see the mask.
  warnings.warn(

Is this normal behavior in 2.17, or is there a problem with how I am using the masks? Any help is very welcome.
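For reference, here is a minimal sketch of what I suspect is going on and a workaround I am considering. My assumption: TF 2.16+ bundles Keras 3, where a nested Functional sub-model does not declare mask support, so the padding mask from Embedding(mask_zero=True) is silently dropped before the masked loss/metrics. The MaskPassthrough wrapper and the toy sub-model below are my own names, not the tutorial's code:

```python
import numpy as np
from tensorflow import keras

class MaskPassthrough(keras.layers.Layer):
    """Wraps a sub-model and explicitly propagates the incoming padding mask.
    (My own workaround sketch; not from the tutorial.)"""
    def __init__(self, inner, **kwargs):
        super().__init__(**kwargs)
        self.inner = inner
        self.supports_masking = True  # declare that a mask may flow through

    def call(self, inputs):
        return self.inner(inputs)

    def compute_mask(self, inputs, mask=None):
        return mask  # pass the padding mask through unchanged

# Toy stand-in for a nested Functional block like 'enc0_att'
inp = keras.Input(shape=(None, 8))
sub_model = keras.Model(inp, keras.layers.Dense(8)(inp))

# Embedding with mask_zero=True is what creates the padding mask
emb = keras.layers.Embedding(10, 8, mask_zero=True)
tokens = np.array([[1, 2, 0, 0]])   # two real tokens, two padding zeros
features = emb(tokens)
mask = emb.compute_mask(tokens)

wrapped = MaskPassthrough(sub_model)
out = wrapped(features)
out_mask = wrapped.compute_mask(features, mask)
print(out_mask.numpy())  # the padding mask survives the wrapped sub-model
```

If the accuracy drop really comes from the mask being destroyed (so padding positions get counted in val_masked_accuracy), something like this wrapper around each Functional attention block might restore the 2.15 behavior, but I am not sure this is the right fix.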