BiLSTM and LSTM Graph execution error

Hi, I want to use a BiLSTM model for a text classification task. I use a data generator to load already batched and embedded files, split into 64 files for training and 4 files each for test and validation. Each sample is a numpy array of shape (512, 768), which I have used before in a CNN task, where it works fine.
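For reference, the generator yields pre-embedded batches roughly like this (a minimal sketch; the file naming and label loading here are placeholders, only the per-sample shape of (512, 768) matches my real data):

import numpy as np

def data_generator(file_paths):
    # loop forever so Keras can draw steps_per_epoch batches every epoch
    while True:
        for path in file_paths:
            batch_x = np.load(path)                      # embedded batch, shape (batch, 512, 768)
            batch_y = np.load(path.replace("_x", "_y"))  # matching labels, shape (batch,)
            yield batch_x, batch_y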
However, when I try to use it in the BiLSTM model below

import tensorflow as tf
from tensorflow.keras import layers

BiLSTM_model = tf.keras.models.Sequential(name="BiLSTM_model")
# each sample is 512 tokens of 768-dimensional embeddings
BiLSTM_model.add(layers.Bidirectional(layers.LSTM(300), input_shape=(512, 768)))
BiLSTM_model.add(layers.Dense(300, activation='relu'))
BiLSTM_model.add(layers.Dropout(0.5))
BiLSTM_model.add(layers.Dense(300, activation='relu'))
BiLSTM_model.add(layers.Dropout(0.5))
BiLSTM_model.add(layers.Dense(1, activation='sigmoid'))
adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
BiLSTM_model.compile(optimizer=adam, loss='binary_crossentropy', metrics=["accuracy"])
BiLSTM_model.summary()

I get this error immediately when I run the fit call below; the first epoch never even starts:

history = BiLSTM_model.fit(train_data_gen, 
                        steps_per_epoch = 64, 
                        epochs = 20, 
                        validation_data = valid_data_gen,
                        validation_steps=4,
                        verbose = 1)
InternalError: Graph execution error:

Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 768, 300, 1, 512, 1840, 300] 
	 [[{{node CudnnRNN}}]]
	 [[BiLSTM_model/bidirectional/backward_lstm/PartitionedCall]] [Op:__inference_train_function_5912]

I thought the model might be too large, so I reduced it to a plain LSTM and tried again with a single step for training and validation. The reduced model:

# model based on the VulDeePecker model
BiLSTM_model = tf.keras.models.Sequential(name = "BiLSTM_model")
BiLSTM_model.add(layers.LSTM(128, input_shape = (512,768)))
BiLSTM_model.add(layers.Dense(1, activation='sigmoid'))
adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
BiLSTM_model.compile(optimizer=adam, loss='binary_crossentropy', metrics=["accuracy"])
BiLSTM_model.summary()

It is able to run one epoch, but it still crashes with this error:

InternalError: Graph execution error:

Epoch 1/20
1/1 [==============================] - ETA: 0s - loss: 0.6879 - accuracy: 0.9179
---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
Cell In[8], line 1
----> 1 history = BiLSTM_model.fit(train_data_gen, 
      2                         steps_per_epoch = 1, 
      3                         epochs = 20, 
      4                         validation_data = valid_data_gen,
      5                         validation_steps=1,
      6                         verbose = 1)
      7 CNN_model.save("./finish_model/w2v_BiLSTM_model.h5")
      8 with open("./history/w2v_BiLSTM_history.pkl", "wb") as file_pi:

File ~\AppData\Local\anaconda3\envs\python39\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~\AppData\Local\anaconda3\envs\python39\lib\site-packages\tensorflow\python\eager\execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     52 try:
     53   ctx.ensure_initialized()
---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     55                                       inputs, attrs, num_outputs)
     56 except core._NotOkStatusException as e:
     57   if name is not None:

InternalError: Graph execution error:

Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 768, 128, 1, 512, 3666, 128] 
	 [[{{node CudnnRNN}}]]
	 [[BiLSTM_model/lstm/PartitionedCall]] [Op:__inference_test_function_3644]

I use Windows 10 with an Nvidia GeForce RTX 3090 and CUDA 12.2, TensorFlow v2.10.1 and Keras v2.10.0, on Python 3.9.17 in a Jupyter notebook.

Any help would be appreciated!
Thank you in advance.

Hi @nekon

Welcome to the TensorFlow Forum!

It seems you have installed a version of CUDA that is incompatible with TensorFlow 2.10, which might be causing the above error. Please try again after installing the compatible versions for TensorFlow 2.10, namely cuDNN 8.1 and CUDA 11.2, as listed in the tested build configurations.
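As a quick sanity check after reinstalling, you can print the CUDA/cuDNN versions your TensorFlow wheel was built against and confirm the GPU is visible at runtime (this uses the standard tf.sysconfig and tf.config APIs; the expected values in the comments assume TensorFlow 2.10):

import tensorflow as tf

# versions this TensorFlow build expects
build = tf.sysconfig.get_build_info()
print("Built for CUDA:", build["cuda_version"])    # should correspond to CUDA 11.2 for TF 2.10
print("Built for cuDNN:", build["cudnn_version"])  # should correspond to cuDNN 8.1 for TF 2.10

# GPUs TensorFlow can actually see at runtime
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))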

Let us know if the issue still persists. Thank you.