Unable to train Mask R-CNN - checkpoint version conflict

John_Phillip · June 12, 2023, 2:22am

Describe the bug
While using the TensorFlow Object Detection API, I’m experiencing an issue with the pre-trained Mask R-CNN Inception ResNet V2 1024x1024 model. When I attempt to fine-tune this model for my custom task, training fails while loading the checkpoint, even though the specified checkpoint seems to contain the appropriate parameters for this model.

To Reproduce
Steps to reproduce the behavior:

1. Download the pre-trained Mask R-CNN Inception ResNet V2 1024x1024 model from the TensorFlow Model Zoo.
2. Set up a custom training pipeline configuration, specifying the path to the downloaded checkpoint in the fine_tune_checkpoint field.
3. Run the model training script (model_main_tf2.py).
4. Training stops with the following error, raised while the fine-tune checkpoint is being loaded:

Traceback (most recent call last):
  File "/content/models/research/object_detection/model_main_tf2.py", line 114, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/content/models/research/object_detection/model_main_tf2.py", line 105, in main
    model_lib_v2.train_loop(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 605, in train_loop
    load_fine_tune_checkpoint(
  File "/usr/local/lib/python3.10/dist-packages/object_detection/model_lib_v2.py", line 398, in load_fine_tune_checkpoint
    raise ValueError('Checkpoint version should be V2')
ValueError: Checkpoint version should be V2
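
A note for anyone hitting the same traceback: as far as I can tell from the Object Detection API source, this ValueError is raised when the fine_tune_checkpoint_version field in train_config resolves to V1, which also seems to be the default when the field is left out. The sketch below is a plain-Python check of a pipeline config's text, not part of the API. The function names, the sample config, and its path are hypothetical, and the V1-default behavior is an assumption worth verifying against your installed version.

```python
import re


def find_checkpoint_version(config_text):
    """Return the fine_tune_checkpoint_version value, or None if the field is absent.

    Only matches lines where the field starts the line (ignoring whitespace),
    so commented-out entries are skipped.
    """
    match = re.search(
        r'^\s*fine_tune_checkpoint_version\s*:\s*"?(\w+)"?',
        config_text,
        flags=re.MULTILINE,
    )
    return match.group(1) if match else None


def diagnose(config_text):
    """Give a short, human-readable hint about the checkpoint version setting."""
    version = find_checkpoint_version(config_text)
    if version is None:
        # Assumption: an omitted field falls back to V1 in the training proto.
        return "fine_tune_checkpoint_version is not set; try adding 'fine_tune_checkpoint_version: V2'"
    if version != "V2":
        return f"fine_tune_checkpoint_version is {version}; the TF2 training loop expects V2"
    return "ok"


# Hypothetical config fragment; the path is a placeholder, not a real location.
SAMPLE_CONFIG = """
train_config {
  fine_tune_checkpoint: "/path/to/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
}
"""

if __name__ == "__main__":
    print(diagnose(SAMPLE_CONFIG))
```

If the check reports that the field is missing, adding fine_tune_checkpoint_version: V2 inside train_config is the first thing I would try before changing anything else.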

Expected behavior
I expect the model training to begin by loading weights from the specified pre-trained model. The error seems to suggest a mismatch between the model architecture defined in my pipeline and the architecture of the pre-trained model. Still, my pipeline configuration appears to be correctly set up for the Mask R-CNN Inception ResNet V2 1024x1024 model.

Desktop (please complete the following information):

OS: macOS 13.4 (22F66)
Browser: Safari
Version: 16.5 (18615.2.9.11.4)

N.B.: I am using Google Colab Pro.

Additional context

Upon inspecting the checkpoint file with inspect_checkpoint.py, it does appear to contain all the expected variables for a Mask R-CNN Inception ResNet V2 1024x1024 model. I also confirmed that the downloaded files include ckpt-0.index, ckpt-0.data-00000-of-00001, and checkpoint. Yet, the issue persists. Any guidance or solutions to this problem would be greatly appreciated.
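
Since the file listing above matters for how fine_tune_checkpoint is written, here is a small sketch for checking that a checkpoint prefix lines up with the files on disk. TensorFlow checkpoints are addressed by prefix (for example .../ckpt-0), not by the directory or by the .index file. The helper names are hypothetical and the code uses only the standard library; it does not confirm that this was the cause of the error above.

```python
import os


def prefix_matches_files(prefix, filenames):
    """Return True if `filenames` contains the index and data shards for `prefix`.

    `prefix` is the value you would put in fine_tune_checkpoint,
    e.g. ".../checkpoint/ckpt-0" (no ".index" or ".data-..." suffix).
    """
    base = os.path.basename(prefix)
    has_index = f"{base}.index" in filenames
    has_data = any(name.startswith(f"{base}.data-") for name in filenames)
    return has_index and has_data


def prefix_exists_on_disk(prefix):
    """Same check, but reads the file listing from the prefix's directory."""
    directory = os.path.dirname(prefix) or "."
    return prefix_matches_files(prefix, os.listdir(directory))


if __name__ == "__main__":
    # File names taken from the post; the directory is a placeholder.
    files = ["ckpt-0.index", "ckpt-0.data-00000-of-00001", "checkpoint"]
    print(prefix_matches_files("/path/to/checkpoint/ckpt-0", files))
    print(prefix_matches_files("/path/to/checkpoint", files))
```

With the three files listed in the post, only the ckpt-0 prefix passes; pointing the field at the folder or at ckpt-0.index does not.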

Laxma_Reddy_Patlolla · June 12, 2023, 9:03pm

Hi @John_Phillip ,

I can see that you are using a research model for your object detection training. Those models contain some deprecated code, and TensorFlow does not officially support research models.

I recommend using the official tensorflow/models for your use case. Please refer to the tutorial on instance segmentation with Model Garden, which uses the official models, and let us know if you run into any errors.

Thank You.
