Error in checkpoint.py in Tensorflow2 on custom dataset using pre-built FRCNN model

seekme_94 · January 17, 2023, 4:01am

I am using TF2 for teeth classification using panoramic X-ray images. The images have been reduced to 640x640 and the annotations scaled accordingly.

I tried the following sequence on Colab:

!python object_detection/builders/model_builder_tf2_test.py
!python generate_tfrecord.py --csv_input=images/train/annotation.txt --image_dir=images/train --output_path=train.record
!python generate_tfrecord.py --csv_input=images/test/annotation.txt --image_dir=images/test --output_path=test.record
model_name = 'faster_rcnn_resnet50'
!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/frcnn-resnet152-640
!tar -xf faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.tar.tg

The config file is generated with the following script:

import re

num_classes = 6
batch_size = 8
num_steps = 100
num_eval_steps = 100
min_dimension, max_dimension = WIDTH, WIDTH
first_stage_nms_iou_threshold = 0.4
model_name = 'frcnn-resnet152-640'
base_pipeline_file = 'faster_rcnn_resnet50_v1_640x640_coco17_tpu-8.config'

pipeline_filename = '/content/tensorflow-models/research/object_detection/configs/tf2/' + base_pipeline_file
fine_tune_checkpoint = '/content/tensorflow-models/research/object_detection/' + model_name + '/checkpoint/ckpt-0'

# Read a sample config
with open(pipeline_filename) as f:
	config = f.read()

with open('model_config.config', 'w') as f:
	# Set labelmap path
	config = re.sub('label_map_path: ".*?"', 
					'label_map_path: "labelmap.txt"', config)

	# Set checkpoint path
	config = re.sub('fine_tune_checkpoint: ".*?"',
					'fine_tune_checkpoint: "{}"'.format(fine_tune_checkpoint), config)  
	# Set fine-tune checkpoint type to detection
	#config = re.sub('fine_tune_checkpoint_type: "classification"', 
	#            'fine_tune_checkpoint_type: "{}"'.format('detection'), config)
	config = re.sub('fine_tune_checkpoint_type: "detection"', 
				'fine_tune_checkpoint_type: "{}"'.format('classification'), config)
	# Set train tf-record file path
	config = re.sub('(input_path: ".*?)(PATH_TO_BE_CONFIGURED/train)(.*?")', 
					'input_path: "train.record"', config)
	# Set test tf-record file path
	config = re.sub('(input_path: ".*?)(PATH_TO_BE_CONFIGURED/val)(.*?")', 
					'input_path: "test.record"', config)
	# Set number of classes.
	config = re.sub('num_classes: [0-9]+', 
					'num_classes: {}'.format(num_classes), config)
	# Set batch size
	config = re.sub('batch_size: [0-9]+',
					'batch_size: {}'.format(batch_size), config)
	# Set training steps
	config = re.sub('num_steps: [0-9]+',
					'num_steps: {}'.format(num_steps), config)  
	# Set image resizer dimensions
	config = re.sub('min_dimension: [0-9]+',
					'min_dimension: {}'.format(min_dimension), config)
	config = re.sub('max_dimension: [0-9]+',
					'max_dimension: {}'.format(max_dimension), config)

	# Set first-stage NMS IoU threshold
	config = re.sub(r'first_stage_nms_iou_threshold: [0-9]\.[0-9]+',
					'first_stage_nms_iou_threshold: {}'.format(first_stage_nms_iou_threshold), config)

	f.write(config)

Training command:

model_dir = 'training/'
pipeline_config_path = 'model_config.config'
!python /content/tensorflow-models/research/object_detection/model_main_tf2.py \
	--pipeline_config_path={pipeline_config_path} \
	--model_dir={model_dir} \
	--alsologtostderr \
	--num_train_steps={num_steps} \
	--sample_1_of_n_eval_examples=1 \
	--num_eval_steps={num_eval_steps}

When executing the training script, I get this error:

...
W0115 08:34:06.106962 139783846991616 deprecation.py:350] From /usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py:1176: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

Traceback (most recent call last):
  File "/content/tensorflow-models/research/object_detection/model_main_tf2.py", line 114, in <module>
	tf.compat.v1.app.run()
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/platform/app.py", line 36, in run
	_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 308, in run
	_run_main(main, args)
  File "/usr/local/lib/python3.8/dist-packages/absl/app.py", line 254, in _run_main
	sys.exit(main(argv))
  File "/content/tensorflow-models/research/object_detection/model_main_tf2.py", line 105, in main
	model_lib_v2.train_loop(
  File "/usr/local/lib/python3.8/dist-packages/object_detection/model_lib_v2.py", line 605, in train_loop
	load_fine_tune_checkpoint(
  File "/usr/local/lib/python3.8/dist-packages/object_detection/model_lib_v2.py", line 407, in load_fine_tune_checkpoint
	ckpt.restore(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/checkpoint/checkpoint.py", line 852, in assert_existing_objects_matched
	raise AssertionError(
AssertionError: Found 265 Python objects that were not bound to checkpointed values, likely due to changes in the Python program. Showing 10 of 265 unmatched objects: [SyncOnReadVariable:{
  0: <tf.Variable 'conv2_block1_3_bn/moving_variance:0' shape=(256,) dtype=float32, numpy=
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
	   1.], dtype=float32)>
}, MirroredVariable:{
  0: <tf.Variable 'conv4_block2_1_bn/gamma:0' shape=(256,) dtype=float32, numpy=
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
...

List of libraries and their versions:
hdfs-2.7.0; keras-2.11.0; pyyaml-5.4.1; tensorboard-2.11.2; tensorflow-2.11.0

I’m not sure what I’m missing here. Is it because I’m trying to load weights from an incompatible model?

If necessary, how can I train it from scratch, i.e. without loading a pre-trained model?

VidushiSIngh90 · March 26, 2024, 3:48pm

Try this solution:

In the pipeline config, change fine_tune_checkpoint_type from "classification" to "detection".
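
The same change can be made in the script that generates the config. The sketch below is only an illustration, not part of the Object Detection API: `set_checkpoint_type` and `drop_fine_tune_checkpoint` are hypothetical helper names, and `sample` is a made-up config fragment. The second helper relates to the train-from-scratch question and assumes that the training loop only restores weights when `fine_tune_checkpoint` is set. Check that against the installed `model_lib_v2.py` before relying on it.

```python
import re


def set_checkpoint_type(config_text, ckpt_type="detection"):
    """Set fine_tune_checkpoint_type to ckpt_type, whatever it was before."""
    return re.sub(r'fine_tune_checkpoint_type:\s*".*?"',
                  'fine_tune_checkpoint_type: "{}"'.format(ckpt_type),
                  config_text)


def drop_fine_tune_checkpoint(config_text):
    """Remove the fine_tune_checkpoint and fine_tune_checkpoint_type lines.

    Assumption: with no fine_tune_checkpoint set, training starts from
    randomly initialised weights instead of restoring a checkpoint.
    """
    return re.sub(r'^[ \t]*fine_tune_checkpoint(_type)?:.*\n', '',
                  config_text, flags=re.MULTILINE)


# Made-up fragment standing in for the train_config section of a pipeline file.
sample = (
    'train_config {\n'
    '  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"\n'
    '  fine_tune_checkpoint_type: "classification"\n'
    '  num_steps: 100\n'
    '}\n'
)

fixed = set_checkpoint_type(sample)            # checkpoint type forced to "detection"
scratch = drop_fine_tune_checkpoint(sample)    # no checkpoint referenced at all
print(fixed)
print(scratch)
```

In the script from the question, `set_checkpoint_type(config)` would stand in for the substitution that rewrites "detection" to "classification". Note also that the question's checkpoint path is built from 'frcnn-resnet152-640' while the pipeline file is the ResNet-50 one. If those are really different architectures, that alone could leave variables unmatched, so it is worth checking that both point at the same model.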
