SLURM errors: failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error; GPU:0 unknown device

Robert_Kudyba · August 18, 2021, 4:22pm

We have a SLURM batch file that fails with TF2 and Keras, and the same script also fails when called directly on a node that has a GPU. Here are the contents of the Python script:

from datetime import date
import numpy as np
import matplotlib.pyplot as plt

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN
from keras.optimizers import adam
from keras.layers import Dropout
from tensorflow.keras.callbacks import Callback, EarlyStopping
from sklearn.preprocessing import StandardScaler
from datetime import datetime, timedelta
from sklearn.metrics import r2_score, mean_squared_error, accuracy_score
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM
from keras.models import load_model
from keras.callbacks import EarlyStopping, ModelCheckpoint
import warnings
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = "3"
warnings.filterwarnings('ignore')
import tensorflow as tf
import logging
logging.getLogger('tensorflow').setLevel(logging.FATAL)
delay = 252
window = 60
factor = 15
K = 8.4
sbo = 1.25
sso = 1.25
sbc = 0.75
ssc = 0.5
r = 0.02
tran_cost = 0.0002
leverage = 1.0
start_val = 100
bo = 1
so = -1
X_pd=pd.read_pickle('./data/X_pd.pkl')
X = pd.DataFrame(columns=range(0, window))
Y = []
for tag in X_pd.columns[:1]:
    # i=0 ....len(X_pd.index)-window
    for i in range(0, len(X_pd.index) - window):
        X_example = X_pd.loc[i:i + window - 1][tag].values

        X= X.append(pd.Series(X_example), ignore_index=True)
        Y.append(X_pd.loc[i + window][tag])
    print('done %s stocks' % (tag))
Y=pd.DataFrame(Y)
#normalization
SS = StandardScaler()
features = SS.fit_transform(X.values)
X=features
X=pd.DataFrame(X)
#LSTM model
def trainLSTMModel(layers, neurons, d):
    model = Sequential()

    model.add(LSTM(neurons[0], input_shape=(layers[1], layers[2]), return_sequences=False,activation='relu'))
    #model.add(Dropout(d))

    #model.add(LSTM(neurons[1], input_shape=(layers[1], layers[2]), return_sequences=False))
    #model.add(Dropout(d))

    #model.add(Dense(neurons[2], kernel_initializer="uniform", activation='relu'))
    model.add(Dense(neurons[3], kernel_initializer="uniform", activation='relu'))
    optimizer=adam(learning_rate=0.001)
    #adam = Adam(decay=0.2)
    # predict up and down
    # model.compile(optimizer="adam", loss="binary_crossentropy", metrics=['accuracy'])
    model.compile(loss='mse', optimizer=optimizer)
    model.summary()
    return model
length=X.shape[0]
X=np.array(X)
Y=np.array(Y)
time_step = 60
d = 0.3
output=1
shape = [length,time_step, output] # feature, window, output
neurons = [64, 64, 32, 1]
epochs = 100
batch_size=10000
model = trainLSTMModel(shape, neurons, d)
#shape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
gpu_no = 0
with tf.device('/gpu:' + str(gpu_no)):
#    sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
#    keras.backend.set_session(sess)

    print('model_manager: running tensorflow version: ' + tf.__version__)
    print('model_manager: will attempt to run on ' + '/gpu:' + str(gpu_no))
    model.fit(X, Y, epochs=epochs, verbose=2,batch_size=batch_size)

The log shows this:

Loading requirement: cuda10.1/toolkit/10.1.243
Loading cm-ml-python3deps/3.3.0
  Loading requirement: gcc5/5.5.0 python36
Loading tensorflow2-py36-cuda10.1-gcc/2.0.0
  Loading requirement: ml-pythondeps-py36-cuda10.1-gcc/3.3.0
    openblas/dynamic/0.2.20 hdf5_18/1.8.20 keras-py36-cuda10.1-gcc/2.3.1
    protobuf3-gcc/3.8.0 nccl2-cuda10.1-gcc/2.7.8
Loading openmpi/cuda/64/3.1.4
  Loading requirement: hpcx/2.4.0
2021-08-18 11:11:43.064175: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1

2021-08-18 11:18:08.026219: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-18 11:18:08.031771: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-08-18 11:18:08.031811: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: node001
2021-08-18 11:18:08.031819: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: node001
2021-08-18 11:18:08.031921: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.73.1
2021-08-18 11:18:08.031958: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.73.1
2021-08-18 11:18:08.031966: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.73.1
2021-08-18 11:18:08.032266: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
Using TensorFlow backend.
done A stocks
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm_1 (LSTM)                (None, 64)                16896
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65
=================================================================
Total params: 16,961
Trainable params: 16,961
Non-trainable params: 0
_________________________________________________________________
model_manager: running tensorflow version: 2.0.0
model_manager: will attempt to run on /gpu:0
Traceback (most recent call last):
  File "stocks.py", line 99, in <module>
    model.fit(X, Y, epochs=epochs, verbose=2,batch_size=batch_size)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/engine/training.py", line 1213, in fit
    self._make_train_function()
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/engine/training.py", line 316, in _make_train_function
    loss=self.total_loss)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 75, in symbolic_fn_wrapper
    return func(*args, **kwargs)
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/optimizers.py", line 519, in get_updates
    for (i, p) in enumerate(params)]
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/optimizers.py", line 519, in <listcomp>
    for (i, p) in enumerate(params)]
  File "/cm/shared/apps/keras-py36-cuda10.1-gcc/2.3.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 963, in zeros
    v = tf.zeros(shape=shape, dtype=dtype, name=name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 2349, in zeros
    output = _constant_if_small(zero, shape, dtype, name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 2307, in _constant_if_small
    return constant(value, shape=shape, dtype=dtype, name=name)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 227, in constant
    allow_broadcast=True)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 235, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/cm/shared/apps/tensorflow2-py36-cuda10.1-gcc/2.0.0/lib/python3.6/site-packages/tensorflow_core/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.

Why is the script not seeing the GPU?

Bhack · August 19, 2021, 2:43pm

Can you try to just list the visible devices?
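
A minimal sketch of such a check, assuming TensorFlow 2.x is importable in the job environment. The helper name `visible_gpus` is made up for illustration. `tf.config.list_physical_devices` is available from TF 2.1 onwards; TF 2.0 exposes the same call under `tf.config.experimental`, so the sketch tries both.

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # TF 2.1+ provides tf.config.list_physical_devices;
    # TF 2.0 only has the experimental alias.
    list_devices = getattr(tf.config, "list_physical_devices", None)
    if list_devices is None:
        list_devices = tf.config.experimental.list_physical_devices
    return list_devices("GPU")


if __name__ == "__main__":
    print("GPUs visible to TensorFlow:", visible_gpus())
```

An empty list means TensorFlow loaded but registered no GPU, which is what the `cuInit` failure in the log would lead to.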

Robert_Kudyba · August 19, 2021, 7:43pm

Part of the problem was the code requires TF > 2.0.

The only difference I see is that the user told me he got it to work by adjusting the comment tags as such:

#sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=True))
#keras.backend.set_session(sess)

Now the GPU works.

I also changed:
from keras.optimizers import adam
to
from keras.optimizers import adam_v2
and
optimizer=adam(learning_rate=0.001)
to
optimizer=adam_v2.Adam(learning_rate=0.001)

Before this the logfile blew up to 6 GB with entries like:

2021-08-19 05:08:41.796216: I tensorflow/core/framework/op_kernel.cc:1287] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.
2021-08-19 05:08:41.796223: I tensorflow/core/framework/op_kernel.cc:1287] No device-specific kernels found for NodeDef '{{node _SOURCE}}'Will fall back to a default kernel.
2021-08-19 05:08:41.796232: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node _SINK}} = NoOp[]()
2021-08-19 05:08:41.796238: I tensorflow/core/framework/op_kernel.cc:1287] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.
2021-08-19 05:08:41.796245: I tensorflow/core/framework/op_kernel.cc:1287] No device-specific kernels found for NodeDef '{{node _SINK}}'Will fall back to a default kernel.
2021-08-19 05:08:41.796255: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/mod}} = FloorMod[T=DT_INT32, _class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"]](training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/add, training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size)
2021-08-19 05:08:41.796283: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/add}} = AddV2[T=DT_INT32, _class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"]](loss/dense_1_loss/mean_squared_error/Mean/reduction_indices, training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size)
2021-08-19 05:08:41.796303: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size}} = Const[_class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"], dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: 2>]()
2021-08-19 05:08:41.796319: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node loss/dense_1_loss/mean_squared_error/Mean/reduction_indices}} = Const[dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: -1>]()
2021-08-19 05:08:41.796335: I tensorflow/core/framework/op_kernel.cc:1487] Instantiating kernel for node: {{node _send_training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/mod_0}} = _Send[T=DT_INT32, client_terminated=true, recv_device="/device:CPU:0", send_device="/device:CPU:0", send_device_incarnation=-6529568560417163830, tensor_name="training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/mod:0"](training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/mod)
2021-08-19 05:08:41.796357: I tensorflow/core/common_runtime/executor.cc:1717] Process node: 0 step -1 {{node _SOURCE}} = NoOp[]() device: /device:CPU:0
2021-08-19 05:08:41.796368: I tensorflow/core/common_runtime/executor.cc:1717] Process node: 4 step -1 {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size}} = Const[_class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"], dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: 2>]() device: /device:CPU:0
2021-08-19 05:08:41.796378: I tensorflow/core/common_runtime/executor.cc:1717] Process node: 5 step -1 {{node loss/dense_1_loss/mean_squared_error/Mean/reduction_indices}} = Const[dtype=DT_INT32, value=Tensor<type: int32 shape: [] values: -1>]() device: /device:CPU:0
2021-08-19 05:08:41.796390: I tensorflow/core/common_runtime/executor.cc:1717] Process node: 3 step -1 {{node training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/add}} = AddV2[T=DT_INT32, _class=["loc:@loss/dense_1_loss/mean_squared_error/Mean"]](loss/dense_1_loss/mean_squared_error/Mean/reduction_indices, training/Adam/gradients/loss/dense_1_loss/mean_squared_error/Mean_grad/Size) device: /device:CPU:0

Anyway, it seems to be good now. Perhaps this will help someone down the line.

Robert_Kudyba · August 20, 2021, 2:53pm

Well, in Slurm this still fails:

Loading cudnn7.6-cuda10.1/7.6.5.32
  Loading requirement: cuda10.1/toolkit/10.1.243
Loading cm-ml-python3deps/3.3.0
  Loading requirement: gcc5/5.5.0 python36
Loading tensorflow2-py37-cuda10.1-gcc/2.2.0
  Loading requirement: python37 ml-pythondeps-py37-cuda10.1-gcc/4.1.2
    openblas/dynamic/0.2.20 hdf5_18/1.8.20 keras-py37-cuda10.1-gcc/2.3.1
    protobuf3-gcc/3.8.0 nccl2-cuda10.1-gcc/2.7.8
Loading openmpi/cuda/64/3.1.4
  Loading requirement: hpcx/2.4.0
2021-08-20 10:36:18.057370: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Using TensorFlow backend.
Traceback (most recent call last):
  File "stocks.py", line 9, in <module>
    from keras.models import Sequential
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/__init__.py", line 3, in <module>
    from . import utils
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/utils/__init__.py", line 6, in <module>
    from . import conv_utils
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/utils/conv_utils.py", line 9, in <module>
    from .. import backend as K
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/backend/__init__.py", line 1, in <module>
    from .load_backend import epsilon
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/backend/load_backend.py", line 90, in <module>
    from .tensorflow_backend import *
  File "/cm/shared/apps/keras-py37-cuda10.1-gcc/2.3.1/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 5, in <module>
    import tensorflow as tf
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 64, in <module>
    from tensorflow.python.framework.framework_lib import *  # pylint: disable=redefined-builtin
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/framework/framework_lib.py", line 24, in <module>
    from tensorflow.python.framework.device import DeviceSpec
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/framework/device.py", line 24, in <module>
    from tensorflow.python.framework import device_spec
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/framework/device_spec.py", line 21, in <module>
    from tensorflow.python.util.tf_export import tf_export
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/util/tf_export.py", line 48, in <module>
    from tensorflow.python.util import tf_decorator
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/util/tf_decorator.py", line 64, in <module>
    from tensorflow.python.util import tf_stack
  File "/cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/util/tf_stack.py", line 28, in <module>
    from tensorflow.python import _tf_stack
ImportError: /cm/shared/apps/tensorflow2-py37-cuda10.1-gcc/2.2.0/lib/python3.7/site-packages/tensorflow/python/_tf_stack.so: undefined symbol: PyThread_tss_set

Is this a known issue with TF 2.2.0?
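
One thing worth ruling out first: `PyThread_tss_set` belongs to the thread-specific storage C API that was added in Python 3.7 (PEP 539), so this error is what one would expect if an older interpreter loads an extension built for 3.7. The module list above loads both `python36` and `python37`. The sketch below just reports which interpreter the job really runs; the helper name `interpreter_report` is hypothetical.

```python
import sys


def interpreter_report(min_version=(3, 7)):
    """Describe the running interpreter and whether it is at least min_version."""
    version = tuple(sys.version_info[:3])
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in version),
        "meets_minimum": version >= tuple(min_version),
    }


if __name__ == "__main__":
    report = interpreter_report()
    print(report)
    if not report["meets_minimum"]:
        print("Interpreter is older than 3.7; a py37 TensorFlow build will not import.")
```

Running this at the top of the batch job, before any TensorFlow import, shows whether the `python37` module actually took precedence on `PATH`.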

Bhack · August 20, 2021, 4:12pm

Does it work with TF 2.6.0?

Robert_Kudyba · August 20, 2021, 5:38pm

Yes. When I run this directly on a node that has Python 3.6 and TF 2.6, I get the expected results (output below). Is there a way to get the earlier TF/Keras to work with this?

done A stocks
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 64)                16896
_________________________________________________________________
dense (Dense)                (None, 1)                 65
=================================================================
Total params: 16,961
Trainable params: 16,961
Non-trainable params: 0
_________________________________________________________________
model_manager: running tensorflow version: 2.6.0
model_manager: will attempt to run on /gpu:0
Epoch 1/100
7/7 - 36s - loss: 38939.2383
Epoch 2/100
7/7 - 17s - loss: 38939.2383
Epoch 3/100

Bhack · August 20, 2021, 6:58pm

I don’t know, but in general our support policy for older versions covers patch releases only for security bugs.
So I suggest you use an up-to-date version of TF.

Robert_Kudyba · August 20, 2021, 7:20pm

Even with 2.6 I see this error:

  Loading requirement: hpcx/2.4.0
2021-08-20 14:23:09.943253: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cm/shared/apps/openmpi/cuda/64/3.1.4/lib:/cm/shared/apps/hpcx/2.4.0/sharp/lib:/cm/shared/apps/hpcx/2.4.0/hcoll/lib:/cm/shared/apps/hpcx/2.4.0/ucx/lib:/cm/shared/apps/cudnn7.6-cuda10.2/7.6.5.32/lib64:/cm/shared/apps/cuda10.2/toolkit/10.2.89/targets/x86_64-linux/lib:/cm/shared/apps/cuda10.1/toolkit/10.1.243/extras/CUPTI/lib64:/cm/local/apps/cuda/libs/current/lib64:/cm/shared/apps/cuda10.1/toolkit/10.1.243/targets/x86_64-linux/lib:/cm/local/apps/python3/lib:/cm/shared/apps/gcc5/5.5.0/lib64:/cm/shared/apps/gcc5/5.5.0/lib32:/cm/shared/apps/gcc5/5.5.0/lib:/cm/shared/apps/slurm/20.11.3/lib64/slurm:/cm/shared/apps/slurm/20.11.3/lib64:/cm/local/apps/gcc/8.2.0/lib:/cm/local/apps/gcc/8.2.0/lib64:/cm/shared/apps/openmpi/gcc/64/1.10.7/lib64
2021-08-20 14:23:09.943288: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-08-20 14:24:41.582692: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-08-20 14:24:41.582920: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: node001
2021-08-20 14:24:41.582935: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: node001
2021-08-20 14:24:41.583068: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.73.1
2021-08-20 14:24:41.583108: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.73.1
2021-08-20 14:24:41.583115: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.73.1
2021-08-20 14:24:41.583609: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-20 14:24:41.871823: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
WARNING: Logging before flag parsing goes to stderr.
W0820 14:24:42.032056 46912496384256 ag_logging.py:146] AutoGraph could not transform <function Model.make_train_function.<locals>.train_function at 0x2aab736d7f28> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

Bhack · August 20, 2021, 7:25pm

It is a problem with your env setup as TF doesn’t find CUDA libraries in your system paths:

W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
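
To see which CUDA runtime versions the loaded modules actually put on the search path, something like the following could help. It is a sketch, not part of TensorFlow: the function name `find_cudart` is invented here, and it only looks at `LD_LIBRARY_PATH`, not at the `ldconfig` cache or at RPATHs.

```python
import glob
import os


def find_cudart(search_path=None):
    """Return sorted libcudart.so.* files found in the directories of search_path."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = set()
    for directory in search_path.split(os.pathsep):
        if directory:  # skip empty entries such as a trailing ':'
            found.update(glob.glob(os.path.join(directory, "libcudart.so.*")))
    return sorted(found)


if __name__ == "__main__":
    for library in find_cudart():
        print(library)
```

The `LD_LIBRARY_PATH` in the log only lists CUDA 10.1 and 10.2 directories, so no `libcudart.so.11.0` would be expected in the output.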

Robert_Kudyba · August 20, 2021, 7:48pm

Sorry I should’ve posted more of the logs. The CUDA diagnostic does appear to find CUDA. Just not the GPU.

2021-08-20 15:21:38.393015: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2021-08-20 15:21:38.393070: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: node001
2021-08-20 15:21:38.393081: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: node001
2021-08-20 15:21:38.393208: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 460.73.1
2021-08-20 15:21:38.393248: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 460.73.1
2021-08-20 15:21:38.393256: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 460.73.1
2021-08-20 15:29:06.834136: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-20 15:29:07.343075: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
WARNING: Logging before flag parsing goes to stderr.
W0820 15:29:07.578475 46912496383040 ag_logging.py:146] AutoGraph could not transform <function Model.make_train_function.<locals>.train_function at 0x2aab74dc1840> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
done A stocks
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
lstm (LSTM)                  (None, 64)                16896
_________________________________________________________________
dense (Dense)                (None, 1)                 65

Is that error, "Cause: 'arguments' object has no attribute 'posonlyargs'", just a red herring?
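
For what it is worth, `posonlyargs` is a field that `ast.arguments` gained in Python 3.8 (positional-only parameters, PEP 570), so the standard `ast` nodes on a Python 3.6 node do not carry it. The AutoGraph warning is also logged after `cuInit` has already failed, which suggests it is a separate problem from GPU detection. Here is a small sketch to check the interpreter side. The helper name is made up, and the exact library mismatch behind the warning is not established here.

```python
import ast
import sys


def ast_has_posonlyargs():
    """True if this interpreter's ast.arguments node defines a posonlyargs field."""
    return "posonlyargs" in ast.arguments._fields


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("ast.arguments has posonlyargs:", ast_has_posonlyargs())
```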

Bhack · August 20, 2021, 10:50pm

I see that CUDA has failed to initialize. Your environment is not in good shape.
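
A few basics can be checked from inside the Slurm job before TensorFlow is imported. The sketch below is only an assumption about what is worth looking at, not an official diagnostic. It reports `CUDA_VISIBLE_DEVICES` (Slurm's GPU GRES plugin normally sets it when the job requests GPUs) and the NVIDIA device nodes under `/dev`, including whether the current user can open them. The function name `gpu_environment` is hypothetical.

```python
import glob
import os


def gpu_environment(dev_dir="/dev"):
    """Collect basic facts about GPU visibility for the current process."""
    nodes = sorted(glob.glob(os.path.join(dev_dir, "nvidia*")))
    return {
        # None means the variable is unset; "" means no GPU was granted.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "device_nodes": nodes,
        "has_uvm_node": os.path.join(dev_dir, "nvidia-uvm") in nodes,
        # Read/write access is needed to open the device nodes.
        "accessible": {node: os.access(node, os.R_OK | os.W_OK) for node in nodes},
    }


if __name__ == "__main__":
    for key, value in gpu_environment().items():
        print(key, "=", value)
```

If `/dev/nvidia-uvm` is missing or the nodes are not accessible to the job's user, `cuInit` can fail even though the driver versions match.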

We had many CUDA setup issues in the repo like:

github.com/tensorflow/tensorflow

"failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error" unless running with sudo

josei opened 04:18PM - 18 Sep 19 UTC, closed 01:16AM - 23 Sep 19 UTC

Labels: stat:awaiting tensorflower, type:build/install, type:support, comp:gpu, TF 1.14

**System information**
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04…): ClearLinux 31030
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: -
- TensorFlow installed from (source or binary): binary
- TensorFlow version: tensorflow-gpu 1.14.0
- Python version: 3.7.4
- Installed using virtualenv? pip? conda?: pyenv's pip
- Bazel version (if compiling from source): -
- GCC/Compiler version (if compiling from source): -
- CUDA/cuDNN version: cuda_10.0.130_410.48_linux, cudnn-10.0-linux-x64-v7.6.3.30
- GPU model and memory: Geforce RTX 2060, 6GB RAM

**Describe the problem**
The error "failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error" is thrown when initializing tensorflow-gpu, falling back to CPU instead of GPU. When running python with sudo, GPU is detected but libraries cannot be opened. A subsequent run without sudo works, enabling GPU being used. I don't understand why running with sudo is needed to enable future calls without sudo work.

The specific output is:

$ python -c "import tensorflow as tf; tf.Session(config=tf.ConfigProto(log_device_placement=True))"
2019-09-18 17:49:26.342297: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-18 17:49:26.364789: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-09-18 17:49:26.365673: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b4f7008c70 executing computations on platform Host. Devices:
2019-09-18 17:49:26.365685: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-09-18 17:49:26.387440: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-18 17:49:26.397897: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2019-09-18 17:49:26.397918: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: linux
2019-09-18 17:49:26.397923: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: linux
2019-09-18 17:49:26.397950: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.50.0
2019-09-18 17:49:26.397965: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.50.0
2019-09-18 17:49:26.397969: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.50.0
2019-09-18 17:49:26.399634: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

$ sudo python -c "import tensorflow as tf; tf.Session(config=tf.ConfigProto(log_device_placement=True))"
2019-09-18 17:49:33.476640: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-18 17:49:33.492804: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-09-18 17:49:33.493345: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b8b43539a0 executing computations on platform Host. Devices:
2019-09-18 17:49:33.493356: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-09-18 17:49:33.494037: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-18 17:49:33.525519: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-18 17:49:33.525827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.755
pciBusID: 0000:01:00.0
2019-09-18 17:49:33.525900: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-09-18 17:49:33.525942: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-09-18 17:49:33.525980: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-09-18 17:49:33.526017: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-09-18 17:49:33.526055: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-09-18 17:49:33.526092: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-09-18 17:49:33.526130: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2019-09-18 17:49:33.526135: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-09-18 17:49:33.614070: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-18 17:49:33.614090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-09-18 17:49:33.614095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-09-18 17:49:33.615437: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-18 17:49:33.615752: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55b8b71df500 executing computations on platform CUDA. Devices:
2019-09-18 17:49:33.615761: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
2019-09-18 17:49:33.616523: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
/job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device

$ python -c "import tensorflow as tf; tf.Session(config=tf.ConfigProto(log_device_placement=True))"
2019-09-18 17:49:38.343247: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-09-18 17:49:38.359840: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-09-18 17:49:38.360790: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55d522b77c70 executing computations on platform Host. Devices:
2019-09-18 17:49:38.360803: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-09-18 17:49:38.361478: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-09-18 17:49:38.377924: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-18 17:49:38.378224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2060 major: 7 minor: 5 memoryClockRate(GHz): 1.755
pciBusID: 0000:01:00.0
2019-09-18 17:49:38.382955: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-18 17:49:38.426461: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-09-18 17:49:38.452107: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-09-18 17:49:38.468904: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-09-18 17:49:38.517258: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-09-18 17:49:38.545852: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-09-18 17:49:38.660617: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-09-18 17:49:38.660684: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-18 17:49:38.661018: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-18 17:49:38.661283: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-09-18 17:49:38.661304: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-09-18 17:49:38.727136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-09-18 17:49:38.727157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-09-18 17:49:38.727164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-09-18 17:49:38.727262: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-18 17:49:38.727564: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-18 17:49:38.727848: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-09-18 17:49:38.728115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5451 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-09-18 17:49:38.729363: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55d525ddbe50 executing computations on platform CUDA.
Devices: 2019-09-18 17:49:38.729373: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5 2019-09-18 17:49:38.730199: I tensorflow/core/common_runtime/direct_session.cc:296] Device mapping: /job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5 /job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device Device mapping: /job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device /job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5 /job:localhost/replica:0/task:0/device:XLA_GPU:0 -> device: XLA_GPU device **Any other info / logs** CUDA and Nvidia drivers installed at /opt following [ClearLinux guide](https://docs.01.org/clearlinux/latest/tutorials/nvidia.html). $ echo $LD_LIBRARY_PATH /usr/local/cuda/lib64: $ nvcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2018 NVIDIA Corporation Built on Sat_Aug_25_21:08:01_CDT_2018 Cuda compilation tools, release 10.0, V10.0.130 $ nvidia-smi Wed Sep 18 17:58:36 2019 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. 
| |===============================+======================+======================| | 0 GeForce RTX 2060 Off | 00000000:01:00.0 On | N/A | | 0% 50C P8 9W / 170W | 152MiB / 5931MiB | 0% Default | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: GPU Memory | | GPU PID Type Process name Usage | |=============================================================================| | 0 651 G /usr/bin/X 39MiB | | 0 789 G /usr/bin/gnome-shell 111MiB | +-----------------------------------------------------------------------------+
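
The two runs in the log differ only in whether the CUDA 10.0 shared libraries could be opened with dlopen. As a rough sketch, the same check can be repeated from Python with only the standard library, without starting TensorFlow. The library names are copied from the `dso_loader` messages in the log; `probe_cuda_libs` is a hypothetical helper name, not part of TensorFlow.

```python
import ctypes
import os

# Sonames copied from the dso_loader messages in the log; adjust them for
# other CUDA/cuDNN versions (this list is an assumption for CUDA 10.0).
CUDA_LIBS = [
    "libcuda.so.1",
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def probe_cuda_libs(names=CUDA_LIBS):
    """Return {soname: bool}, True when the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # same search path the TensorFlow loader relies on
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    for name, ok in probe_cuda_libs().items():
        print("ok     " if ok else "MISSING", name)
```

Any library reported as missing here would be expected to produce the same "Could not dlopen library" message in TensorFlow, which then skips registering the GPU.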

Robert_Kudyba · August 21, 2021, 12:56am

Well, kind of. We use Bright Cluster with Slurm, so on our head node we use an “SBATCH” file (Slurm batch) that loads modules. TF 2.6 is not yet available in Bright’s packages, so I used pip to install TF 2.6 for Python 3 on a node. I now leave the TF module out of the SBATCH file and let the job pick up the TF 2.6 I installed. It looks like we also needed CUDA 11 or greater. For now it’s running, but without the GPU.
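
To narrow down why a pip-installed TensorFlow runs without the GPU, it can help to print, from inside the Slurm job, which CUDA version the wheel was built for and whether any GPU is visible. The following is only a sketch: `gpu_report` is a hypothetical helper, and it assumes Slurm exports `CUDA_VISIBLE_DEVICES` for jobs that were granted a GPU, which depends on how the cluster is configured.

```python
import os


def gpu_report():
    """Collect facts that help explain why TensorFlow runs without a GPU."""
    report = {
        # Assumption: Slurm sets this when the job was allocated a GPU.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH"),
    }
    try:
        import tensorflow as tf
    except ImportError:
        report["tensorflow"] = None  # TensorFlow is not importable in this job
        return report

    report["tensorflow"] = tf.__version__
    build = tf.sysconfig.get_build_info()  # build-time settings of the wheel
    report["built_for_cuda"] = build.get("cuda_version")    # None on CPU builds
    report["built_for_cudnn"] = build.get("cudnn_version")
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` reports a CUDA 11 release while the module loaded in the SBATCH file provides CUDA 10, the libraries will not match and an empty `gpus` list is the expected result.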
