New to TensorFlow and Keras - Can't get GPU to work

I am having difficulty running a model I built with Keras. The model ran fine on the CPU, but it was extremely slow, so I decided to use the GPU. Whenever I try to use the GPU, I get this error:

2023-10-24 21:27:48.304916: I tensorflow/core/util/port.cc:111] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-10-24 21:27:48.330569: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-10-24 21:27:48.330597: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-10-24 21:27:48.330613: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-10-24 21:27:48.334602: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-24 21:27:49.051801: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2023-10-24 21:27:49.823168: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:49.848589: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:49.848753: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:49.850888: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:49.851057: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:49.851139: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:50.121065: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:50.121197: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:50.121270: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at linux/Documentation/ABI/testing/sysfs-bus-pci at v6.0 · torvalds/linux · GitHub
2023-10-24 21:27:50.121330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4223 MB memory: → device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6
2023-10-24 21:27:50.427549: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:521] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
/usr/local/cuda-11.8
/usr/local/cuda
/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/tensorflow/python/platform/…/…/…/nvidia/cuda_nvcc
/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/tensorflow/python/platform/…/…/…/…/nvidia/cuda_nvcc
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
Epoch 1/30
2023-10-24 21:27:51.930556: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5568e7cbead0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-24 21:27:51.930578: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3060 Laptop GPU, Compute Capability 8.6
2023-10-24 21:27:51.934001: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var MLIR_CRASH_REPRODUCER_DIRECTORY to enable.
2023-10-24 21:27:51.952257: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8902
2023-10-24 21:27:51.964519: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:559] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-10-24 21:27:51.965547: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:624 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-10-24 21:27:51.981620: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:559] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-10-24 21:27:51.982567: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:624 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-10-24 21:27:51.994306: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:559] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-10-24 21:27:51.995692: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:624 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-10-24 21:27:52.008449: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:559] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-10-24 21:27:52.009760: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:624 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-10-24 21:27:52.022965: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:559] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-10-24 21:27:52.024437: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:624 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-10-24 21:27:52.036904: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:559] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-10-24 21:27:52.038364: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at xla_ops.cc:624 : INTERNAL: libdevice not found at ./libdevice.10.bc
Traceback (most recent call last):
  File "/home/daniel/Projects/AI/Tensorflow-Keras/Nets/RNNs/NASA_RNN.py", line 65, in <module>
    train_test(model, train_x ,train_y, test_x, test_y, 0.2, 30, optimizer, lossfns, metrics)
  File "/home/daniel/Projects/AI/Tensorflow-Keras/Nets/RNNs/NASA_RNN.py", line 53, in train_test
    model.fit(x=Inputs_x, y=Inputs_y,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tensorflow.python.framework.errors_impl.InternalError: Graph execution error:

Detected at node RMSprop/StatefulPartitionedCall_4 defined at (most recent call last):
File "/home/daniel/Projects/AI/Tensorflow-Keras/Nets/RNNs/NASA_RNN.py", line 65, in <module>

File "/home/daniel/Projects/AI/Tensorflow-Keras/Nets/RNNs/NASA_RNN.py", line 53, in train_test

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 65, in error_handler

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/engine/training.py", line 1783, in fit

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/engine/training.py", line 1377, in train_function

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/engine/training.py", line 1360, in step_function

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/engine/training.py", line 1349, in run_step

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/engine/training.py", line 1130, in train_step

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 544, in minimize

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 1223, in apply_gradients

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 652, in apply_gradients

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 1253, in _internal_apply_gradients

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 1345, in _distributed_apply_gradients_fn

File "/home/daniel/.pyenv/versions/3.11.4/envs/myPythonenv/lib/python3.11/site-packages/keras/src/optimizers/optimizer.py", line 1340, in apply_grad_to_update_var

libdevice not found at ./libdevice.10.bc
[[{{node RMSprop/StatefulPartitionedCall_4}}]] [Op:__inference_train_function_1170]

I am on Ubuntu 23.04, TensorFlow and Keras are both version 2.14.0, and my CUDA version is 11.8.89.

Hi @Lubemaster, could you please provide standalone code to reproduce the issue?
Thank you.

import keras
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Scaler used to standardize the feature columns.
scale = StandardScaler()

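data = pd.read_csv("AI/Pytorch/archive/archive(1)/Stars.csv")

# Encode non-numeric (string) columns as integer category codes so they can be scaled.
# This uses pandas categorical codes rather than TextVectorization.
for column in data.columns:
    if not pd.api.types.is_numeric_dtype(data[column]):
        data[column] = data[column].astype("category").cat.codes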

# Use the first 20% of the rows for testing and the rest for training, shuffling each part.
split = int(len(data) * 0.2)
test_data = data.iloc[:split].sample(frac=1.0, random_state=1)
train_data = data.iloc[split:].sample(frac=1.0, random_state=1)

# Fit the scaler on the training features only, then reuse it for the test features.
train_x = scale.fit_transform(train_data.drop("Type", axis=1))
test_x = scale.transform(test_data.drop("Type", axis=1))
train_y, test_y = train_data["Type"], test_data["Type"]

def Neural_Network_Functional(hidden_size: int, Input_shape):
    """Build a dense classifier with `hidden_size` hidden layers."""
    Inputs = layers = keras.layers.Input(shape=Input_shape)

    for _ in range(hidden_size):
        layers = keras.layers.Dense(248, activation="relu")(layers)

    layers = keras.layers.Dropout(0.5)(layers)
    layers = keras.layers.Dropout(0.5)(layers)

    # Softmax yields a probability distribution over the 6 classes,
    # which is what sparse_categorical_crossentropy expects by default.
    Outputs = keras.layers.Dense(6, activation="softmax")(layers)

    model = keras.Model(Inputs, Outputs)
    return model

model = Neural_Network_Functional(2, (train_x.shape[1],))

def train_test(model, Inputs_x, Inputs_y, Test_x, Test_y, v_split, epochs, optimizer, loss, metrics):
    """Compile and fit the model, then report loss and metrics on the test set."""
    model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

    model.fit(x=Inputs_x, y=Inputs_y,
              callbacks=[keras.callbacks.EarlyStopping(patience=2)],
              epochs=epochs,
              validation_split=v_split,
              batch_size=16)

    # evaluate() returns the loss followed by the metric values, not accuracy alone.
    print(f"Test loss and metrics: {model.evaluate(x=Test_x, y=Test_y)}")

optimizer = "rmsprop"
lossfns = "sparse_categorical_crossentropy"
metrics = ["sparse_categorical_accuracy"]

train_test(model, train_x, train_y, test_x, test_y, 0.2, 30, optimizer, lossfns, metrics)
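
The warnings in the log suggest that XLA could not find libdevice.10.bc in any of the directories it searched, and the log itself recommends setting XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda. Below is a minimal sketch of one way to do that from Python, assuming the CUDA toolkit keeps the file under nvvm/libdevice/ as the first libdevice warning implies. The helper names find_cuda_data_dir and configure_xla and the CANDIDATES list are hypothetical. The two paths are just the system directories the log reports as searched, so a different installation may need other entries.

```python
import os
from pathlib import Path


def find_cuda_data_dir(candidates):
    """Return the first directory containing nvvm/libdevice/libdevice.10.bc, or None."""
    for candidate in candidates:
        root = Path(candidate)
        if (root / "nvvm" / "libdevice" / "libdevice.10.bc").is_file():
            return str(root)
    return None


def configure_xla(candidates):
    """Append --xla_gpu_cuda_data_dir to XLA_FLAGS if a suitable directory is found."""
    cuda_dir = find_cuda_data_dir(candidates)
    if cuda_dir is not None:
        existing = os.environ.get("XLA_FLAGS", "")
        os.environ["XLA_FLAGS"] = f"{existing} --xla_gpu_cuda_data_dir={cuda_dir}".strip()
    return cuda_dir


# Assumption: these are the system directories listed in the log; adjust as needed.
CANDIDATES = ["/usr/local/cuda-11.8", "/usr/local/cuda"]

if __name__ == "__main__":
    chosen = configure_xla(CANDIDATES)
    print("CUDA data dir:", chosen)
    # Import keras / tensorflow only after this point.
```

This is meant to run at the very top of the script, before keras or tensorflow is imported, since it seems safest to have the environment variable in place before TensorFlow starts. If it prints None, none of the candidates contained the file, which would point to a missing or incomplete CUDA toolkit install (or a path that needs to be added to the list).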