I have a problem training models in TensorFlow

I’m using an RTX 3060 with CUDA 11.8.0, cuDNN 8.4.1, and TensorFlow 2.10.0. Every time I try to train a model, it crashes. I have tried both Jupyter Notebook and PyCharm. In PyCharm, training crashes within a few seconds with exit code -1073740791 (0xC0000409). In Jupyter Notebook, I get this message within a few seconds every time: “Kernel Restarting
The kernel test.ipynb appears to have died. It will restart automatically.”

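The exit code -1073740791 (0xC0000409) you’re encountering is the Windows status STATUS_STACK_BUFFER_OVERRUN. It means native code terminated the process abruptly (a fail-fast abort), so Python never gets the chance to raise an exception or print a traceback. With TensorFlow on a GPU, this often points to a problem in the CUDA/cuDNN layer, such as mismatched library versions, rather than in your Python code. Troubleshooting it may involve several steps. Here are some suggestions: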

1. Check Compatibility:

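  • Ensure that the versions of CUDA, cuDNN, and TensorFlow are compatible with each other. TensorFlow 2.10 was tested against CUDA 11.2 and cuDNN 8.1, so CUDA 11.8 with cuDNN 8.4.1 is outside the tested configuration. Refer to the official tested build configurations table for the versions you are using.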

2. Update TensorFlow:

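  • Upgrade TensorFlow to the latest version compatible with your CUDA and cuDNN versions. Note that TensorFlow 2.10 is the last release with GPU support on native Windows; later releases detect the GPU on Windows only when run under WSL2. Use the following command to update TensorFlow: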

pip install --upgrade tensorflow

3. Check GPU Drivers:

  • Verify that your GPU drivers are up to date. Outdated or incompatible GPU drivers can lead to crashes.

4. Memory Issues:

  • Monitor the GPU memory usage during training. It’s possible that your model or dataset is too large for the available GPU memory, leading to a crash. Consider reducing batch size or using a smaller model.

5. Environment Variables:

  • Ensure that your CUDA and cuDNN paths are correctly set in your environment variables. Incorrect paths can lead to compatibility issues.

6. TensorFlow GPU Installation:

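  • Make sure your TensorFlow installation has GPU support. For TensorFlow 2.x, the standard tensorflow package already includes it, and the separate tensorflow-gpu package is deprecated. You can install it using:

pip install tensorflow==2.10.0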

7. Check for Memory Leaks:

  • Use tools like nvidia-smi or other GPU monitoring tools to check for memory leaks during training.

8. PyCharm Configuration:

  • If the issue persists in PyCharm, try running your code outside of PyCharm, for example, in a terminal or command prompt, to see if the issue is related to the IDE.

9. Update PyCharm:

  • Ensure that you are using the latest version of PyCharm. IDE updates may include bug fixes that could resolve the issue.

10. Check for Known Issues:

  • Look for any known issues or bug reports related to the specific versions of TensorFlow, CUDA, and cuDNN you are using.

11. Jupyter Notebook Restarting:

  • If Jupyter Notebook is restarting, it could be due to a memory issue or an unhandled exception. Check the Jupyter logs for more details on the cause of the restart.

12. Try a Minimal Example:

  • Create a minimal example script with a simple model and dataset to see if the issue persists. This can help identify whether the problem is related to your code or the environment.
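
Suggestions 1 and 12 can be combined into a single script: print what TensorFlow was built against, list the GPUs it sees, then train a very small model on random data. The following is only a sketch; the function names `environment_report` and `tiny_training_run` are made up for illustration, and the model is deliberately trivial.

```python
import platform
import sys


def environment_report():
    """Collect version details that matter when debugging TensorFlow on a GPU."""
    report = {
        "python": sys.version.split()[0],
        "os": platform.system(),
        "tensorflow": None,
        "cuda_build": None,
        "cudnn_build": None,
        "gpus": [],
    }
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is missing or cannot be imported; report what we have.
        return report

    report["tensorflow"] = tf.__version__
    # get_build_info() describes what the wheel was BUILT against, not what
    # is installed locally. The CUDA keys are absent on CPU-only builds.
    build = tf.sysconfig.get_build_info()
    report["cuda_build"] = build.get("cuda_version")
    report["cudnn_build"] = build.get("cudnn_version")
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


def tiny_training_run():
    """Train a very small dense model on random data; return the final loss."""
    import numpy as np
    import tensorflow as tf

    x = np.random.rand(256, 8).astype("float32")
    y = np.random.rand(256, 1).astype("float32")
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    history = model.fit(x, y, epochs=1, batch_size=32, verbose=0)
    return float(history.history["loss"][-1])


if __name__ == "__main__":
    info = environment_report()
    for key, value in info.items():
        print(f"{key}: {value}")
    if info["tensorflow"] is not None:
        print("final loss:", tiny_training_run())
```

If this script runs from a plain terminal but your real model still crashes, the difference between the two is worth a closer look. For instance, convolutional layers rely on cuDNN, which the dense layers here do not exercise. A hard crash inside native code cannot be caught by the `try`/`except` above, so a process that dies during `import tensorflow` or `model.fit` is itself a useful data point.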

When I upgrade TensorFlow, the GPU is no longer detected. I have the latest GPU drivers installed. I tried reducing the batch size, but it still crashes. I also tried limiting GPU memory allocation, but the result is the same.
I also tried creating an Ubuntu environment with compatible TensorFlow, CUDA, and cuDNN versions, but the result is the same.
Do you think this is caused by a hardware fault?
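
For anyone following along, limiting GPU memory allocation is usually done in one of two ways: turning on memory growth, or capping the GPU at a fixed amount. The sketch below shows both and is not necessarily the exact code tried here. The helper name `configure_gpu_memory` and its `memory_limit_mb` parameter are illustrative only.

```python
def configure_gpu_memory(memory_limit_mb=None):
    """Apply a memory setting to every visible GPU; return how many were configured.

    Call this right after importing TensorFlow, before any op touches the GPU.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0

    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            if memory_limit_mb is None:
                # Allocate on demand instead of reserving most memory up front.
                tf.config.experimental.set_memory_growth(gpu, True)
            else:
                # Hard cap: expose one logical device limited to memory_limit_mb.
                tf.config.set_logical_device_configuration(
                    gpu,
                    [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
                )
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU has already been initialized.
            print(f"Could not configure {gpu.name}: {err}")
    return configured
```

Calling `configure_gpu_memory()` enables memory growth, while `configure_gpu_memory(4096)` caps each GPU at 4096 MB. If the crash is identical with either setting and with a small batch size, running out of GPU memory becomes a less likely explanation.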

Hi @MADHUKA_WEERAKOON, could you please provide standalone code to reproduce the issue? Thank you.

https://colab.research.google.com/drive/1whX8B_Yw7BEvCDArGvDOWOZzryyygF8C?usp=sharing

This is the code I tried.

Hi @MADHUKA_WEERAKOON, I executed the code in Colab on a GPU with the latest version of TensorFlow (2.14) and did not face any error. Could you please try creating a new environment and installing the CUDA and cuDNN versions listed in the tested build configurations? Thank you.

I also tried it in Colab and it ran without problems. The problem only occurs when I run it on my local GPU.
I’ll try again with a new environment. Can you suggest which TensorFlow, CUDA, and cuDNN versions I should use with an RTX 3060?
Thank you.

Hi @MADHUKA_WEERAKOON, the CUDA version depends on the TensorFlow version you are using. Please refer to this document for the CUDA and cuDNN versions supported by each TensorFlow version. Thank you.
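
As a rough illustration of how to use that table, the sketch below compares an installed combination against a small local excerpt of it. The two rows in `TESTED_GPU_BUILDS` are an assumed excerpt and should be confirmed against the official page; the helper names are made up for this example.

```python
# Assumed excerpt of TensorFlow's tested build configurations for GPU builds.
# Only two rows are listed; confirm them against the official table.
TESTED_GPU_BUILDS = {
    "2.10": {"cuda": "11.2", "cudnn": "8.1"},
    "2.14": {"cuda": "11.8", "cudnn": "8.7"},
}


def major_minor(version):
    """Reduce '11.8.0' to '11.8' so patch releases compare equal."""
    return ".".join(version.split(".")[:2])


def check_versions(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch descriptions; an empty list means no mismatch."""
    expected = TESTED_GPU_BUILDS.get(major_minor(tf_version))
    if expected is None:
        return [f"TensorFlow {tf_version} is not in the local table"]
    problems = []
    if major_minor(cuda_version) != expected["cuda"]:
        problems.append(f"CUDA {cuda_version} installed, {expected['cuda']} tested")
    if major_minor(cudnn_version) != expected["cudnn"]:
        problems.append(f"cuDNN {cudnn_version} installed, {expected['cudnn']} tested")
    return problems


if __name__ == "__main__":
    # Versions reported at the top of this thread.
    for line in check_versions("2.10.0", "11.8.0", "8.4.1"):
        print(line)
```

For the versions reported at the top of this thread, it prints `CUDA 11.8.0 installed, 11.2 tested` and `cuDNN 8.4.1 installed, 8.1 tested`. A mismatch does not prove the crash comes from there, but matching the tested configuration is the cheapest thing to rule out first.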

Thank you for your help. I’ll try it.
Thank you