Warning: Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn

Preet_Sojitra · July 18, 2024, 8:19am

I am training a CNN model on a GPU on Kaggle, and during training I receive the following warning:

2024-07-18 08:12:51.167214: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 5: 3.62197, expected 3.11286
2024-07-18 08:12:51.167224: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 7: 3.87954, expected 3.37043
2024-07-18 08:12:51.167232: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 8: 3.47842, expected 2.96931
2024-07-18 08:12:51.167239: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 9: 3.88297, expected 3.37386
2024-07-18 08:12:51.167247: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 32: 3.63122, expected 3.12211
2024-07-18 08:12:51.167255: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 43: 3.30186, expected 2.79275
2024-07-18 08:12:51.167263: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 44: 2.66031, expected 2.1512
2024-07-18 08:12:51.167271: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 45: 3.67197, expected 3.16286
2024-07-18 08:12:51.167279: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 53: 3.71904, expected 3.20993
2024-07-18 08:12:51.167294: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:705] Results mismatch between different convolution algorithms. This is likely a bug/unexpected loss of precision in cudnn.
(f32[1,32,128,128]{3,2,1,0}, u8[0]{0}) custom-call(f32[1,3,128,128]{3,2,1,0}, f32[32,3,3,3]{3,2,1,0}, f32[32]{0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convBiasActivationForward", backend_config={"conv_result_scale":1,"activation_mode":"kRelu","side_input_scale":0,"leakyrelu_alpha":0} for eng20{k2=2,k4=1,k5=1,k6=0,k7=0} vs eng15{k5=1,k6=0,k7=1,k10=1}
2024-07-18 08:12:51.167302: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:270] Device: Tesla P100-PCIE-16GB
2024-07-18 08:12:51.167309: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:271] Platform: Compute Capability 6.0
2024-07-18 08:12:51.167316: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:272] Driver: 12040 (550.90.7)
2024-07-18 08:12:51.167323: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:273] Runtime: <undefined>
2024-07-18 08:12:51.167333: E external/local_xla/xla/service/gpu/conv_algorithm_picker.cc:280] cudnn version: 8.9.0

Here is the code for my model:

    model = keras.Sequential([
        keras.layers.InputLayer(shape=(WIDTH, HEIGHT, 3)),
        keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation='relu', padding="same"),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation='relu', padding="same"),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid')
    ])

What exactly is this warning about, and how can I get rid of it?

Kiran_Sai_Ramineni · July 19, 2024, 5:16am

Hi @Preet_Sojitra, this warning might occur because the computation precision and the output precision are not the same, for example when the computation is performed in FP32 and the output is in FP16. Thank you.
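For a sense of scale, the `Difference at` lines from the log in the question can be parsed to see how far apart the two algorithms' results are. The following is a minimal sketch, not part of XLA or TensorFlow: the helper name `parse_differences` and the regular expression are illustrative assumptions, and the sample lines are copied from the posted log.

```python
import re

# Three lines copied verbatim from the log in the question.
LOG = """\
2024-07-18 08:12:51.167214: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 5: 3.62197, expected 3.11286
2024-07-18 08:12:51.167263: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 44: 2.66031, expected 2.1512
2024-07-18 08:12:51.167279: E external/local_xla/xla/service/gpu/buffer_comparator.cc:1137] Difference at 53: 3.71904, expected 3.20993
"""

# Hypothetical pattern for the "Difference at N: actual, expected E" message.
PATTERN = re.compile(r"Difference at (\d+): ([\d.]+), expected ([\d.]+)")


def parse_differences(text):
    """Return (index, actual, expected, gap) for each 'Difference at' line."""
    rows = []
    for match in PATTERN.finditer(text):
        index = int(match.group(1))
        actual = float(match.group(2))
        expected = float(match.group(3))
        rows.append((index, actual, expected, actual - expected))
    return rows


rows = parse_differences(LOG)
for index, actual, expected, gap in rows:
    print(f"index {index}: actual={actual} expected={expected} gap={gap:.5f}")
```

For the lines posted in this thread, every gap works out to about 0.509, which is much larger than the rounding error one would normally expect between two FP32 results.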

Preet_Sojitra · July 20, 2024, 7:24am

Oh okay, thanks. But how do I get rid of this? Do I need to make the precisions the same?

Kiran_Sai_Ramineni · July 22, 2024, 4:06pm

Hi @Preet_Sojitra, could you please try running the code below at the very start of your program to suppress the warnings?

import os
# Set the log level before importing TensorFlow; otherwise it has no effect.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf

Thank You.
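One detail worth checking when applying this: `TF_CPP_MIN_LOG_LEVEL` is read when TensorFlow's native library loads, so it has to be set before the first `import tensorflow`. Level `2` filters INFO and WARNING messages, while level `3` also filters ERROR messages, and the lines in the question are tagged `E`. The sketch below assumes level `3` is what is wanted here; the helper name `quiet_tf_logs` is hypothetical. Hiding the messages does not change which convolution algorithm is used or what it computes.

```python
import os
import sys


def quiet_tf_logs(level="3"):
    """Set TF_CPP_MIN_LOG_LEVEL and report whether it was set in time.

    Returns False if TensorFlow had already been imported, in which case
    the new value is unlikely to affect the running process.
    """
    already_imported = "tensorflow" in sys.modules
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level
    return not already_imported


# Call this first; '3' (an assumption for this thread) also hides E-level lines.
effective = quiet_tf_logs("3")
if not effective:
    print("TensorFlow was imported earlier; restart the kernel and set the level first.")

# Only now import TensorFlow, for example:
# import tensorflow as tf
```

On Kaggle or another notebook environment, this would go in the first cell, before any cell that imports TensorFlow or Keras.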
