Warning: Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn

I am training a CNN model on a GPU on Kaggle, and during training I receive the following warning:

2024-07-18 08:12:51.167214: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 5: 3.62197, expected 3.11286
2024-07-18 08:12:51.167224: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 7: 3.87954, expected 3.37043
2024-07-18 08:12:51.167232: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 8: 3.47842, expected 2.96931
2024-07-18 08:12:51.167239: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9: 3.88297, expected 3.37386
2024-07-18 08:12:51.167247: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 32: 3.63122, expected 3.12211
2024-07-18 08:12:51.167255: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 43: 3.30186, expected 2.79275
2024-07-18 08:12:51.167263: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 44: 2.66031, expected 2.1512
2024-07-18 08:12:51.167271: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 45: 3.67197, expected 3.16286
2024-07-18 08:12:51.167279: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 53: 3.71904, expected 3.20993
2024-07-18 08:12:51.167294: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:705] Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.
(f32[1,32,128,128]{3,2,1,0}, u8[0]{0}) custom-call(f32[1,3,128,128]{3,2,1,0}, f32[32,3,3,3]{3,2,1,0}, f32[32]{0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0} for eng20{k2=2,k4=1,k5=1,k6=0,k7=0} vs eng15{k5=1,k6=0,k7=1,k10=1}
2024-07-18 08:12:51.167302: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:270] Device: Tesla P100-PCIE-16GB
2024-07-18 08:12:51.167309: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:271] Platform: Compute Capability 6.0
2024-07-18 08:12:51.167316: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:272] Driver: 12040 (550.90.7)
2024-07-18 08:12:51.167323: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:273] Runtime: <undefined>
2024-07-18 08:12:51.167333: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:280] cudnn version: 8.9.0

Here is the code for my model:

    # Opening line reconstructed from the closing "])"; the name `model` is assumed.
    model = keras.Sequential([
        keras.layers.InputLayer(shape=(WIDTH, HEIGHT, 3)),
        keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')
    ])

What exactly is this warning about, and how do I get rid of it?

Hi @Preet_Sojitra, This warning may appear when the computation precision and the output precision are not the same, for example when the computation is performed in FP32 and the output is stored in FP16. Thank You.
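To judge how large the reported mismatch actually is, the "Difference at ..." lines can be parsed and the absolute and relative differences computed. The snippet below is only a sketch: it assumes the log format shown in the question, and `parse_differences` is a hypothetical helper, not part of TensorFlow or XLA.

```python
import re

# Matches the payload of the buffer_comparator lines, for example
# "Difference at 5: 3.62197, expected 3.11286".
DIFF_RE = re.compile(r"Difference at (\d+): (\S+), expected (\S+)")


def parse_differences(log_text):
    """Return (index, actual, expected, abs_diff, rel_diff) per reported mismatch."""
    rows = []
    for match in DIFF_RE.finditer(log_text):
        index = int(match.group(1))
        actual = float(match.group(2))
        expected = float(match.group(3))
        abs_diff = abs(actual - expected)
        # Guard against division by zero when the expected value is exactly 0.
        rel_diff = abs_diff / abs(expected) if expected != 0 else float("inf")
        rows.append((index, actual, expected, abs_diff, rel_diff))
    return rows


# Three lines copied from the log in the question.
sample_log = """\
2024-07-18 08:12:51.167214: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 5: 3.62197, expected 3.11286
2024-07-18 08:12:51.167224: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 7: 3.87954, expected 3.37043
2024-07-18 08:12:51.167263: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 44: 2.66031, expected 2.1512
"""

rows = parse_differences(sample_log)
for index, actual, expected, abs_diff, rel_diff in rows:
    print(f"index {index}: abs diff {abs_diff:.5f}, rel diff {rel_diff:.1%}")
```

For the log in the question, the absolute difference comes out at roughly 0.51 at every reported index, which is worth keeping in mind when deciding whether simply hiding the message is enough.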

Oh okay, thanks. But how do I get rid of it? Do I need to make the two precisions the same?

Hi @Preet_Sojitra, Could you please try running the code below at the very start of your notebook to suppress the warnings?

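import os
# Set the log level before importing TensorFlow; it may be ignored afterwards.
# '3' also hides ERROR-level lines (the "E" prefix in the log); '2' hides only INFO and WARNING.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf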

Thank You.
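In a notebook, TensorFlow may already have been imported by an earlier cell, in which case changing `TF_CPP_MIN_LOG_LEVEL` may have no effect until the kernel is restarted. A minimal sketch of a guard for that case follows; `set_tf_cpp_log_level` is a hypothetical helper name, not a TensorFlow function. The levels are 0 (show everything), 1 (hide INFO), 2 (hide INFO and WARNING) and 3 (hide INFO, WARNING and ERROR).

```python
import os
import sys


def set_tf_cpp_log_level(level="3"):
    """Set TF_CPP_MIN_LOG_LEVEL and report whether TensorFlow was already imported."""
    # If the module is already loaded, the C++ logger may have read the old value.
    already_imported = "tensorflow" in sys.modules
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(level)
    return already_imported


too_late = set_tf_cpp_log_level("3")
if too_late:
    print("TensorFlow was already imported; restart the kernel and run this cell first.")

# import tensorflow as tf  # import TensorFlow only after the variable is set
```

This only hides the messages. It does not change which convolution algorithm is selected or the numerical results.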