I have created a virtual environment which contains TensorFlow 2.11. I am using this virtual environment's Python as the Spark driver and executor Python. The venv zip has been added as an archive in PySpark. I am reading TFRecord files from S3 and have created a tf dataset called `train_dataset`.
Calling `model.fit`:
history = model.fit(
    train_dataset,
    epochs=epochs,
    steps_per_epoch=train_steps,
    validation_data=val_dataset,
    validation_steps=val_steps,
    callbacks=callbacks,
)
throws the following error:
  File "/home/hadoop/environment/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/hadoop/environment/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.PermissionDeniedError: Graph execution error:
    [[{{node MultiDeviceIteratorGetNextFromShard}}]]
    [[RemoteCall]]
    [[IteratorGetNextAsOptional]] [Op:__inference_train_function_4621]
This is running on EMR Serverless, and I’m not sure why the permission error is occurring. Any suggestions or help would be appreciated.
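Since the failing nodes in the traceback are all iterator ops, one way to narrow this down might be to pull a single element from the dataset outside of `model.fit` and look at the full exception text. Below is a minimal sketch of that check. `probe_first_element` is a hypothetical helper written for this post, not a TensorFlow API, and it assumes the dataset can be iterated directly in Python, as `tf.data` datasets can under eager execution in TF 2.x.

```python
from typing import Any, Iterable, Optional, Tuple


def probe_first_element(dataset: Iterable[Any]) -> Tuple[bool, Optional[str]]:
    """Try to read one element from an iterable.

    Returns (True, None) on success, otherwise (False, description),
    where description holds the exception class name and its message.
    Hypothetical helper for debugging; not part of TensorFlow.
    """
    try:
        next(iter(dataset))
    except StopIteration:
        # The iterable produced no elements at all.
        return False, "dataset is empty"
    except Exception as exc:  # broad on purpose: we only report the error
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


# Assumed usage on the driver, before calling model.fit:
# print(probe_first_element(train_dataset))
```

If the probe fails with the same `PermissionDeniedError`, the problem would seem to lie in reading the input files rather than in the training step itself, and the unfiltered message may show which path was denied.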