Hi everyone,
I'm working on model training using tensorflow = "2.14" with CUDA 11.8.
I tried both cuDNN version 8700, as mentioned here,
and cuDNN version 8902, as mentioned here,
and in both cases I get the error in the title.
Here are the error logs:
[08:15:16.063] [STDERR] 2024-03-12 08:15:16.063291: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8700
[08:15:16.815] [STDERR] 2024-03-12 08:15:16.815453: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55e7a07a33b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
[08:15:16.815] [STDERR] 2024-03-12 08:15:16.815478: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA A10G, Compute Capability 8.6
[08:15:16.822] [STDERR] 2024-03-12 08:15:16.822469: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
[08:15:16.919] [STDERR] 2024-03-12 08:15:16.919810: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
[08:15:25.136] [STDERR] 2024-03-12 08:15:25.136023: E tensorflow/compiler/xla/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
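
For anyone comparing against their own setup: the integer in the "Loaded cuDNN version" line can be decoded into the usual dotted form. This is a minimal sketch, assuming the cuDNN 8.x encoding (major*1000 + minor*100 + patch); the helper names `decode_cudnn_version` and `cudnn_version_from_log` are made up for illustration and are not part of TensorFlow or cuDNN.

```python
import re


def decode_cudnn_version(code: int) -> str:
    """Decode a cuDNN 8.x integer version into 'major.minor.patch'.

    Assumes the 8.x encoding: major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def cudnn_version_from_log(line: str):
    """Pull the version out of a 'Loaded cuDNN version NNNN' log line.

    Returns None when the line does not contain that message.
    """
    match = re.search(r"Loaded cuDNN version (\d+)", line)
    return decode_cudnn_version(int(match.group(1))) if match else None


if __name__ == "__main__":
    log_line = (
        "I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] "
        "Loaded cuDNN version 8700"
    )
    print(cudnn_version_from_log(log_line))
    print(decode_cudnn_version(8902))
```

So the two builds I tried should correspond to cuDNN 8.7.0 and 8.9.2, if the encoding assumption holds.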