TFlite model maker object_detector.create hang

Alvaro_Tester · September 11, 2021, 6:21am

First, some details on my software and hardware:
TF 2.5.0
TF model maker 0.3.2
GPU RTX 3080 16GB
32GB of DRAM
Ubuntu 18.04 LTS
Nvidia Driver Version: 471.68 CUDA Version: 11.4

I have been trying to train a custom model using the “efficientdet_lite1” spec, but training always hangs at a random point partway through the
model = object_detector.create() call,
and not always at the same epoch or step,
e.g.
109/176 [=================>…] - ETA: 46s - det_loss: 0.4381 - cls_loss: 0.2731…

I tried to debug by adding
tf.get_logger().setLevel('DEBUG')

but nothing is printed when the hang happens.
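
When the logger stays silent during a hang, one option is Python's standard `faulthandler` module, which can periodically dump the stack of every Python thread so you can see which call is stuck. This is only a minimal sketch: the helper names `watch_for_hang` and `stop_watching` and the 10-minute default are hypothetical, and it shows Python-level frames only, not what the GPU is doing.

```python
import faulthandler
import sys


def watch_for_hang(timeout_s=600.0, file=None):
    """Dump the tracebacks of all threads every `timeout_s` seconds.

    `file` must be a real file object with a file descriptor;
    it defaults to sys.stderr.
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=target)


def stop_watching():
    """Cancel the periodic dump once training has finished."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical usage around the call that hangs:
# watch_for_hang(600)
# model = object_detector.create(...)
# stop_watching()
```

If the dump keeps showing the same frame inside the training loop, that suggests the process is blocked below the Python layer.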

Any hints are welcome.

I have also tried TF 2.6.0, nightly builds, etc.

Alvaro_Tester · September 15, 2021, 11:53am

I think I may have found a fix.
After reading

I tried setting
CUDA_LAUNCH_BLOCKING=1

And my training completed without any issues! I hope this helps anyone having the same problem.
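
For anyone who prefers to set the variable from inside the training script rather than in the shell, here is a minimal sketch. It assumes the variable has to be in the environment before TensorFlow initializes CUDA, so it is set before the import; the helper name `enable_cuda_launch_blocking` is hypothetical. Note that this setting makes CUDA kernel launches synchronous, so training will likely run slower.

```python
import os


def enable_cuda_launch_blocking():
    """Set CUDA_LAUNCH_BLOCKING=1 for this process and return the value."""
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


# Call this at the very top of the script, before TensorFlow is imported.
enable_cuda_launch_blocking()

# Only import the GPU libraries afterwards (left commented out here so the
# sketch runs without them installed):
# import tensorflow as tf
# from tflite_model_maker import object_detector
```

Setting it in the shell instead, for the single command that launches training, has the same effect.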
