I am working on object detection with autonomous datasets. I want to train my model with 10,000 training images, 2,000 test images and 2,000 validation images, so I will use TensorFlow Lite Model Maker for object detection.
With this large dataset and a batch size of 32, training for 50 epochs takes 2 days (Step 3). I can't keep my computer on for two days. I am running the project in a Jupyter notebook.
How can I stop model training and resume it later? (e.g. stop at the 10th epoch and continue one day later)
I don't think you can do that. Model Maker, as of today, doesn't have a stop-and-resume option.
You have a couple of options:
Make sure you're using a GPU for training. This makes a huge difference in execution time.
Run the same notebook in the cloud (e.g. GCP) on a higher-spec machine. This way you can keep the machine running for the whole process. The drawback is that you have to pay for it.
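Before starting a multi-day run, it is worth confirming that TensorFlow actually sees a GPU in the notebook's environment. A minimal check, assuming TensorFlow 2.x is installed in the same kernel that runs Model Maker:

```python
import tensorflow as tf

# List the GPUs TensorFlow can use; an empty list means training will fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
print(f"GPUs visible to TensorFlow: {len(gpus)}")
for gpu in gpus:
    print(gpu.name)
```

If this prints 0, the GPU driver/CUDA setup (or the cloud machine type) is the first thing to fix, since the same 50 epochs on CPU will take far longer.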
Object detection is a complex task, and it's expected to take a long time to finish, even on top-spec hardware.
Do you have any plans to support resuming training from a model previously trained/created with TFLite Model Maker?
I often have situations where training data is acquired continuously from existing camera installations. It would be a great feature to be able to use a previously trained model as a baseline when continuing training with new data.
For example, an option to pass the path to an existing checkpoint when calling tflite_model_maker.object_detector.create()?
Is there any update on the ability of Model Maker, as featured in the EfficientDet tutorial, to resume from a checkpoint? I notice that the current version of EfficientDetLiteXSpec() takes a "model_dir" argument. When it is set, object_detector.create() dutifully records checkpoints as it trains. Is there any use for these checkpoints other than resuming from one?
I made a workaround to resume from a checkpoint saved in model_dir by manually calling load_weights(checkpoint_path) on the Keras model (tf.keras.Model.load_weights) before starting to train again.
The quickest way to try it is to install TFLite Model Maker from source with pip and add:
model.load_weights(checkpoint_path) in the train() function of object_detector_spec.py, just before the call to model.fit().
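The same mechanism can be tried outside Model Maker with plain Keras. This is a minimal sketch under clear assumptions: a tiny Sequential model and random data stand in for the EfficientDet detector and your dataset, and the ckpt-NN.weights.h5 file naming is chosen for this example only (it is not what Model Maker writes to model_dir). Session 1 saves weights after every epoch; session 2 rebuilds the model, calls load_weights() before fit(), and passes initial_epoch so the epoch count continues instead of restarting.

```python
import glob
import os
import re
import tempfile

import numpy as np
import tensorflow as tf


def build_model():
    # Hypothetical stand-in for the detector: a tiny regression model, NOT Model Maker's EfficientDet.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


rng = np.random.default_rng(0)
x = rng.random((64, 4), dtype=np.float32)
y = rng.random((64, 1), dtype=np.float32)
model_dir = tempfile.mkdtemp()  # plays the role of model_dir

# Session 1: train 2 epochs, saving weights at the end of each epoch (ckpt-01, ckpt-02).
# The ".weights.h5" suffix works with both Keras 2 and Keras 3 when save_weights_only=True.
ckpt_pattern = os.path.join(model_dir, "ckpt-{epoch:02d}.weights.h5")
save_cb = tf.keras.callbacks.ModelCheckpoint(ckpt_pattern, save_weights_only=True)
build_model().fit(x, y, epochs=2, batch_size=32, callbacks=[save_cb], verbose=0)

# Session 2 (e.g. the next day): find the newest checkpoint; zero-padded names sort correctly up to 99 epochs.
checkpoints = sorted(glob.glob(os.path.join(model_dir, "ckpt-*.weights.h5")))
latest = checkpoints[-1]
resumed_epoch = int(re.match(r"ckpt-(\d+)", os.path.basename(latest)).group(1))

model = build_model()
model.load_weights(latest)  # the same call the workaround inserts before model.fit()
# initial_epoch=2 with epochs=4 runs only the remaining epochs 3 and 4.
history = model.fit(x, y, epochs=4, initial_epoch=resumed_epoch, batch_size=32,
                    callbacks=[save_cb], verbose=0)
print(f"resumed from {os.path.basename(latest)}, ran {len(history.history['loss'])} more epochs")
```

Note that in this sketch only the layer weights are restored; optimizer state such as Adam's moment estimates starts fresh, so the first resumed epochs may behave slightly differently from an uninterrupted run.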