Hello everyone,
I’m trying to train a Sequential model with an LSTM layer, but training fails with the cuDNN error shown below. Reducing the number of units did not help. The error does not occur with a standard RNN layer (SimpleRNN); it only occurs with the LSTM and GRU layers.
Does anyone know how to fix this?
PS: I’m using TensorFlow 2.16.1 and CUDA 12.0.
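In case it helps with diagnosis, here is a minimal sketch of how the versions could be confirmed from Python. It assumes a GPU build of TensorFlow, where `tf.sysconfig.get_build_info()` reports the CUDA and cuDNN versions the package was compiled against; the helper name `report_environment` is only illustrative and not part of my original code.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the helper load on a machine without TensorFlow
    tf = None


def report_environment():
    """Collect the TensorFlow version, build-time CUDA/cuDNN versions and visible GPUs."""
    if tf is None:
        return {"tensorflow": None}
    # Dict-like object; the CUDA/cuDNN keys are only present on GPU builds.
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "cuda_build": build.get("cuda_version"),
        "cudnn_build": build.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


for key, value in report_environment().items():
    print(f"{key}: {value}")
```

These are the versions TensorFlow was built against, which may differ from the CUDA toolkit and cuDNN library actually installed on the machine, so both are worth comparing.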
Thank you for your feedback.
Best regards,
# Assumed import; the original snippet did not show it
import keras

# Definition
sequence_length = 120
features_len = 6

# Model
model1 = keras.models.Sequential()
model1.add(keras.Input(shape=(sequence_length, features_len)))
model1.add(keras.layers.LSTM(32, return_sequences=False))
model1.add(keras.layers.Dense(120))
model1.compile(optimizer="rmsprop", loss="mse")
model1.summary()

# dataset_train and dataset_val are built elsewhere (not shown here)
history1 = model1.fit(
    dataset_train,
    epochs=10,
    validation_data=dataset_val,
)
Epoch 1/10
2024-05-13 15:01:12.015551: E external/local_xla/xla/stream_executor/dnn.cc:1158] <unknown cudnn status: 14>
in external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc(2394): 'cudnnRNNBackwardData_v8( cudnn.handle(), rnn_desc.handle(), reinterpret_cast<const int*>(seq_lengths_data.opaque()), output_desc.data_handle(), output_data.opaque(), output_backprop_data.opaque(), input_desc.data_handle(), input_backprop_data->opaque(), input_h_desc.handle(), input_h_data.opaque(), output_h_backprop_data.opaque(), input_h_backprop_data->opaque(), input_c_desc.handle(), input_c_data.opaque(), output_c_backprop_data.opaque(), input_c_backprop_data->opaque(), rnn_desc.ParamsSizeInBytes(), params.opaque(), workspace.size(), workspace.opaque(), reserve_space_data->size(), reserve_space_data->opaque())'
2024-05-13 15:01:12.015582: W tensorflow/core/framework/op_kernel.cc:1839] OP_REQUIRES failed at cudnn_rnn_ops.cc:2192 : INTERNAL: Failed to call DoRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 6, 32, 1, 120, 256, 32]
2024-05-13 15:01:12.015596: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: INTERNAL: Failed to call DoRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 6, 32, 1, 120, 256, 32]
[[{{function_node __inference_one_step_on_data_7252}}{{node gradient_tape/sequential_4_1/lstm_4_1/CudnnRNNBackpropV3}}]]
---------------------------------------------------------------------------
InternalError Traceback (most recent call last)
Cell In[24], line 1
----> 1 history1 = model1.fit(dataset_train,
2 epochs=10,
3 validation_data=dataset_val
4 )
File ~/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py:122, in filter_traceback.<locals>.error_handler(*args, **kwargs)
119 filtered_tb = _process_traceback_frames(e.__traceback__)
120 # To get the full stack trace, call:
121 # `keras.config.disable_traceback_filtering()`
--> 122 raise e.with_traceback(filtered_tb) from None
123 finally:
124 del filtered_tb
File ~/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/tensorflow/python/eager/execute.py:53, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
51 try:
52 ctx.ensure_initialized()
---> 53 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
54 inputs, attrs, num_outputs)
55 except core._NotOkStatusException as e:
56 if name is not None:
InternalError: Graph execution error:
Detected at node gradient_tape/sequential_4_1/lstm_4_1/CudnnRNNBackpropV3 defined at (most recent call last):
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/runpy.py", line 197, in _run_module_as_main
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/runpy.py", line 87, in _run_code
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/ipykernel_launcher.py", line 17, in <module>
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/traitlets/config/application.py", line 992, in launch_instance
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/ipykernel/kernelapp.py", line 701, in start
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 195, in start
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/asyncio/events.py", line 80, in _run
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 534, in dispatch_queue
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 523, in process_one
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 429, in dispatch_shell
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 767, in execute_request
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/ipykernel/ipkernel.py", line 429, in do_execute
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/ipykernel/zmqshell.py", line 549, in run_cell
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3024, in run_cell
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3079, in _run_cell
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3284, in run_cell_async
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3466, in run_ast_nodes
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3526, in run_code
File "/tmp/ipykernel_12737/1001880985.py", line 1, in <module>
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/keras/src/utils/traceback_utils.py", line 117, in error_handler
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/keras/src/backend/tensorflow/trainer.py", line 314, in fit
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/keras/src/backend/tensorflow/trainer.py", line 117, in one_step_on_iterator
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/keras/src/backend/tensorflow/trainer.py", line 104, in one_step_on_data
File "/home/otakagle/anaconda3/envs/tf1-gpu/lib/python3.9/site-packages/keras/src/backend/tensorflow/trainer.py", line 66, in train_step
Failed to call DoRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 6, 32, 1, 120, 256, 32]
[[{{node gradient_tape/sequential_4_1/lstm_4_1/CudnnRNNBackpropV3}}]] [Op:__inference_one_step_on_iterator_7283]
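To make the `model config` part of the error easier to read, the sketch below pairs each field name from the log with its value. The function name `parse_rnn_config` is hypothetical, and the `RNN_MODES` table is an assumption that TensorFlow numbers `rnn_mode` the same way as cuDNN's `cudnnRNNMode_t` enum.

```python
import re

# Assumed mapping, following cuDNN's cudnnRNNMode_t enum ordering.
RNN_MODES = {0: "RNN_RELU", 1: "RNN_TANH", 2: "LSTM", 3: "GRU"}


def parse_rnn_config(message):
    """Turn the 'model config' part of the error message into a {field: int} dict."""
    config = {}
    # Each group looks like "[name_a, name_b]: 1, 2" or "[name_a, name_b]: [1, 2]".
    for names, values in re.findall(r"\[([a-z_, ]+)\]:\s*\[?([\d, ]+)\]?", message):
        keys = [n.strip() for n in names.split(",")]
        nums = [int(v) for v in values.replace(" ", "").split(",") if v]
        config.update(zip(keys, nums))
    return config


# Message copied from the log above.
message = (
    "Failed to call DoRnnBackward with model config: "
    "[rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , "
    "[num_layers, input_size, num_units, dir_count, max_seq_length, "
    "batch_size, cell_num_units]: [1, 6, 32, 1, 120, 256, 32]"
)
config = parse_rnn_config(message)
for field, value in config.items():
    print(f"{field}: {value}")
print("mode name:", RNN_MODES.get(config["rnn_mode"], "unknown"))
```

Read this way, the values match the model: one layer, 6 input features, 32 units, a sequence length of 120, and a batch size of 256. So the shapes reaching the cuDNN call look consistent with what I defined.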