Hello,
I ran image_classifier.create(train_data, validation_data=validation_data, epochs=70)
I want to add more epochs and carry forward the learning gained from those 70.
How do I do that?
Did you try this?
image_classifier.create(train_data, validation_data=validation_data, model_dir='ckp_path', epochs=20)
@Kzyh don't I need to pass use_hub_library as False if I use 'model_dir'?
@Kzyh in addition to my previous comment... how do I create a model checkpoint directory to begin with?
You can use model.export with the right param.
Yes
@Bhack are you referring to ExportFormat.SAVED_MODEL?
It is not clear whether that is a 'checkpoint' or the finalized model, and whether it makes a difference.
I tried this, but it doesn't seem to work. The accuracy does not improve at all between the runs, whereas earlier it used to reach > 90% within 10 epochs.
model = image_classifier.create(train_data, validation_data=validation_data, epochs=1)
model.export(export_dir='.', export_format=ExportFormat.SAVED_MODEL)
model = image_classifier.create(train_data, validation_data=validation_data, epochs=1, model_dir='.', use_hub_library=False)
model.export(export_dir='.', export_format=ExportFormat.SAVED_MODEL)
model = image_classifier.create(train_data, validation_data=validation_data, epochs=1, model_dir='.', use_hub_library=False)
... (9 times in total)
Note that I tried both '.' and './saved_model' in the create() call, but neither worked.
@Bhack please help. I tried the default export format too, but that didn't work either.
It is not explicitly documented, but I suppose that model_dir is only used by the Keras callback to save the model, not to initialize the model weights.
I suppose this is a feature request for this API.
Doesn't it save a ckpt file every 20 epochs? You can check it like this:
model = image_classifier.create(train_data, validation_data=validation_data, epochs=30, model_dir='.', use_hub_library=False)
Then it should load this ckpt.
Or you can edit train_image_classifier_lib.py file to save after every epoch.
Yes, but he wants to load the checkpoint as the initial state, and that is a feature request for this API.
I think a simple PR could support this bootstrap.
You can open a ticket or a PR if you want to contribute to this feature.
@Kzyh I think you are talking about the notebook's auto-save? I don't see the model being saved. No additional folders or files are being created that I can see, unless they are in some other /var-like folder.
@Bhack is right. It won't load the checkpoint file.
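You can try adding this to the train_model method in train_image_classifier_lib.py:
# Find the newest checkpoint in model_dir, if there is one
checkpoint_path = tf.train.latest_checkpoint(hparams.model_dir)
if checkpoint_path:
    # Restore the saved weights into the model before training continues
    checkpoint = tf.train.Checkpoint(model)
    checkpoint.restore(checkpoint_path)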
Hello all,
I have exactly the same problem. I am trying to continue the training or to train the model with additional data, but I can't find a solution.
Has anyone found a solid solution, or can anyone say whether @Kzyh's solution works?
The search has already taken me days, and I am grateful for any help.
Kind regards, Daniel
I have tried this suggestion but it does not work.
Is there something missing or where exactly does the code need to be placed in the function?
I am grateful for any help!
I looked at the codebase in detail, and I don't think incremental training will work without major changes to the API. The best bet is to use regular (non-Lite) TensorFlow to build a regular model and then convert it to a TFLite model. I haven't tried it myself, so I don't know what hurdles await there. Overall, given my recent experience with TensorFlow Lite Model Maker, I think the API is quite poor (just like the non-Lite version), and the documentation is misleading or wrong in places. The configurations do not have reasonable defaults (shuffle is False, for example). Augmentation is not well documented, and from reading the code it seems to include random cropping, which may leave the subject out of the image and produce false examples. It seems that complaints on the forum do not lead to any changes by the API's authors at Google. Given this, I will probably switch to some other API.
Hi Marcus,
I guess for your use case (keeping a checkpoint and continuing training later) I'd go straight to the regular Keras API. Create a simple model and do the transfer learning.
For this use case, there's this tutorial: Retraining an Image Classifier | TensorFlow Hub
It even shows how to convert the model to TFLite later.
Hope it helps.
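For anyone wondering what "carry forward the learning" can look like with the regular Keras API, here is a minimal sketch, assuming TensorFlow 2.x. It is not Model Maker code: the tiny model, the random data, and the file name carry_forward.weights.h5 are placeholders. A first run saves its weights, and a later run loads them into a freshly built model and continues from epoch 3 via the initial_epoch argument of fit.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical location for the saved weights (a temp dir keeps the sketch tidy).
WEIGHTS_PATH = os.path.join(tempfile.mkdtemp(), "carry_forward.weights.h5")


def build_model():
    """Tiny stand-in classifier; a real one would be an image network."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model


# Random placeholder data standing in for the real training set.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4)).astype("float32")
y = rng.integers(0, 2, size=(64,))

# First run: train for 3 epochs, then save the learned weights.
first = build_model()
first.fit(x, y, epochs=3, verbose=0)
first.save_weights(WEIGHTS_PATH)

# Later run: rebuild the same architecture and load the saved weights.
second = build_model()
second.load_weights(WEIGHTS_PATH)
weights_match = all(
    np.allclose(a, b) for a, b in zip(first.get_weights(), second.get_weights())
)

# Continue training: epochs=5 with initial_epoch=3 runs epochs 3 and 4 only.
history = second.fit(x, y, epochs=5, initial_epoch=3, verbose=0)
```

Note that saving only the weights does not carry over the optimizer state. Saving and reloading the whole model keeps that too.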
Yes, that is essentially what I have been doing. I also made some tweaks to cope with Google Colab Pro dying on me. To help out @Daniel_Kuhn, here are some snippets:
# Mount the Google Drive of 'XXX@gmail.com' account so we can access 'TRAINING_DATA.zip'
from google.colab import drive
drive.mount('/content/drive')
# Extract the training data from Google Drive
!unzip /content/drive/MyDrive/TRAINING_DATA.zip
# Path on Google Drive where we will save the model
SAVED_MODEL_PATH = f"./drive/MyDrive/my_saved_model_{model_name}"
# 5 epochs at a time
for i in range(0, 5):
    hist = model.fit(
        train_ds,
        epochs=1, steps_per_epoch=steps_per_epoch,
        validation_data=val_ds,
        validation_steps=validation_steps).history
    # Save the model to Drive after every epoch, so that it can be reloaded to continue training, if the notebook dies
    model.save(SAVED_MODEL_PATH)
# If the Google Colab dies, remount the Google drive as before, reload the saved model and continue training
model = tf.keras.models.load_model(SAVED_MODEL_PATH)
Rest of the code is similar to what @lgusm pointed us to: Retraining an Image Classifier | TensorFlow Hub
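One thing the snippets above do not track is how many epochs were already finished when the notebook died. Here is a minimal sketch of one way to do that, using only the Python standard library. Every name in it (progress.json, train_resumable, train_one_epoch) is hypothetical. The train_one_epoch callable stands in for model.fit(..., epochs=1) followed by model.save(SAVED_MODEL_PATH).

```python
import json
import os
import tempfile


def load_epochs_done(state_path):
    """Return how many epochs were finished in earlier sessions (0 if none)."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return json.load(f)["epochs_done"]


def save_epochs_done(state_path, epochs_done):
    """Write the counter via a temp file so a crash cannot leave it half-written."""
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"epochs_done": epochs_done}, f)
    os.replace(tmp_path, state_path)


def train_resumable(total_epochs, state_path, train_one_epoch):
    """Run only the remaining epochs, recording progress after each one."""
    start = load_epochs_done(state_path)
    for epoch in range(start, total_epochs):
        train_one_epoch(epoch)  # stand-in for model.fit(epochs=1) + model.save
        save_epochs_done(state_path, epoch + 1)
    return max(0, total_epochs - start)  # epochs run in this session


# Usage: simulate a session that dies during epoch 2, then a resumed session.
state_path = os.path.join(tempfile.mkdtemp(), "progress.json")
completed = []


def crash_at_epoch_2(epoch):
    if epoch == 2:
        raise RuntimeError("simulated Colab disconnect")
    completed.append(epoch)


try:
    train_resumable(5, state_path, crash_at_epoch_2)
except RuntimeError:
    pass  # the "notebook" died; progress.json records 2 finished epochs

ran_after_resume = train_resumable(5, state_path, completed.append)
```

In a real notebook the state file would sit next to the saved model on Drive, so both survive a disconnect together.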
Colab free tier... A bit of a laugh that the platform comes back in the middle of training with... "Hey, are you there? Please confirm or the env is disconnected"...
I know it's free, but free easily ends up taking more time than there are hours in the day, so... I upgraded to Pro and now it's working for me... Free?? No...
Hi everyone, for people who want to keep training the model with more data when using Model Maker, take a look at this thread: https://tensorflow-prod.ospodiscourse.com/t/how-to-stop-and-resume-object-detector-training-object-detection-model-maker/3487/14?u=lgusm
There's a workaround and a PR to address this!