Hello! I ran into a problem setting up TensorFlow with 4 GTX 1070 GPUs. I tried different operating systems (Ubuntu, Debian 10/11/12, Windows 10/11/Server 2022), WSL with Miniconda, and Docker; nothing helped, I got the same error everywhere (Linux, Windows WSL, Docker).
To make a long story short: if you get the error "E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY: out of memory" when calling tf.config.list_physical_devices(), try DISABLING the integrated Intel graphics (in my case Intel HD Graphics 630) in Device Manager on Windows, then disable the NVIDIA cards and re-enable them one by one in a different order, but do not re-enable the Intel adapter. This should help! Note that you have to repeat this after every reboot.
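As a quick sanity check after applying the workaround, a minimal sketch like this (assuming 4 GPUs, as in my setup) confirms that TensorFlow can see all the cards again:

import tensorflow as tf

# List the GPUs TensorFlow can see after the workaround.
gpus = tf.config.list_physical_devices("GPU")
print(gpus)

# I expect all 4 GTX 1070s here; adjust the count for your own machine.
assert len(gpus) == 4, f"Expected 4 GPUs, got {len(gpus)}"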
OS: Windows 11 (x64) 22H2 build 22621.2283 (updated September 2023) - WSL Version 2
Motherboard: Colorful Technology And Development Co.,LTD C.B250A-BTC PLUS
Chipset: Intel B250 (Kaby Lake)
CPU: Intel Core i5-7500 3.4 GHz, 4 cores, Kaby Lake-S, Socket H4 (LGA1151), virtualization enabled
RAM: DDR4 SODIMM 4 GB 2400 MHz
GPU:
1 x Intel HD Graphics 630 (Kaby Lake-S GT2) [Intel] PCIe v2.0 x0 (5.0 GT/s)
4 x Nvidia GTX1070 8GB Driver 522.06 CUDA 11.8:
1 x Nvidia GTX 1070 8 GB, PCIe v3.0 x16 (8.0 GT/s) @ x16 (2.5 GT/s)
3 x Nvidia GTX 1070 8 GB, PCIe v3.0 x16 (8.0 GT/s) @ x1 (2.5 GT/s)
Steps:
-Install fresh Windows 11 22H2
-Set the Windows page file to 40 GB
-Install CUDA Toolkit 11.8, reboot
-Allow access to GPU performance counters for all users in the Developer section of the NVIDIA Control Panel, reboot
-Check that nvidia-smi and nvcc are working
-Install WSL 2 with Ubuntu, reboot
-Add a .wslconfig with:
[wsl2]
memory=2GB
swap=40GB
-Install Docker Desktop
-Run: docker run -it --rm -p 8888:8888 --gpus all tensorflow/tensorflow:latest-gpu-jupyter
-Try:
import tensorflow as tf
tf.config.list_physical_devices()
-Get this error:
2023-09-21 17:26:00.060803: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-21 17:26:11.186982: E tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:268] failed call to cuInit: CUDA_ERROR_OUT_OF_MEMORY: out of memory
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
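In case it helps anyone debugging the same error: a minimal sketch like the one below only rules out a CPU-only TensorFlow wheel inside the container (it does not fix the cuInit failure by itself):

import tensorflow as tf

# Confirm the wheel was built with CUDA and which CUDA/cuDNN versions it targets.
print("Built with CUDA:", tf.test.is_built_with_cuda())
info = tf.sysconfig.get_build_info()
print("CUDA version:", info.get("cuda_version"))
print("cuDNN version:", info.get("cudnn_version"))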
Today I disabled the Intel HD adapter and 3 of the 4 GTX 1070s in Device Manager, then re-enabled the GTX 1070s, and got things working!
2023-09-21 18:29:33.655221: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-21 18:29:40.812272: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.812646: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:02:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.812926: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.813276: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.959611: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.959873: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:02:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.960084: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.960289: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.960490: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.960690: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:02:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.960888: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:03:00.0/numa_node
Your kernel may have been built without NUMA support.
2023-09-21 18:29:40.961097: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:981] could not open file to read NUMA node: /sys/bus/pci/devices/0000:05:00.0/numa_node
Your kernel may have been built without NUMA support.
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'),
PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'),
PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'),
PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU')]
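To double-check that all four cards are really usable (not just enumerated), a minimal sketch along these lines runs a small matmul on each GPU; enabling memory growth is optional and just something I assume is sensible for a multi-GPU rig like this one:

import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")

# Optional: grow GPU memory on demand instead of grabbing it all up front.
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Run a tiny matmul on each GPU to confirm it actually works.
for i, _ in enumerate(gpus):
    with tf.device(f"/GPU:{i}"):
        a = tf.random.normal((1024, 1024))
        b = tf.random.normal((1024, 1024))
        print(f"GPU:{i} matmul OK, result on:", tf.matmul(a, b).device)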