Hi,
I'm trying to follow this Colab notebook with a different model:
Google Colab
However, I ran into some errors.
I'm using cascadercnn_spinenet_coco as the model, via the following code:
exp_config = exp_factory.get_exp_config('cascadercnn_spinenet_coco')
I then change the model configuration with the following lines:
batch_size = 16
num_classes = 1
HEIGHT, WIDTH = 512, 512
IMG_SIZE = [HEIGHT, WIDTH, 3]
# Backbone config.
exp_config.runtime.num_gpus = 1
exp_config.task.freeze_backbone = True
exp_config.task.annotation_file = ''
# Model config.
exp_config.task.model.input_size = IMG_SIZE
exp_config.task.model.num_classes = num_classes + 1
exp_config.task.model.detection_generator.max_classes_per_detection = exp_config.task.model.num_classes
# Training data config.
exp_config.task.train_data.input_path = train_data_input_path
exp_config.task.train_data.dtype = 'float32'
exp_config.task.train_data.global_batch_size = batch_size
exp_config.task.train_data.parser.aug_scale_max = 1.0
exp_config.task.train_data.parser.aug_scale_min = 1.0
# Validation data config.
exp_config.task.validation_data.input_path = valid_data_input_path
exp_config.task.validation_data.dtype = 'float32'
exp_config.task.validation_data.global_batch_size = batch_size
train_steps = 50000
exp_config.trainer.steps_per_loop = 100 # steps_per_loop = num_of_training_examples // train_batch_size
exp_config.trainer.summary_interval = 100
exp_config.trainer.checkpoint_interval = 100
exp_config.trainer.validation_interval = 100
exp_config.trainer.validation_steps = 100 # validation_steps = num_of_validation_examples // eval_batch_size
exp_config.trainer.train_steps = train_steps
exp_config.trainer.optimizer_config.warmup.linear.warmup_steps = 100
exp_config.trainer.optimizer_config.learning_rate.type = 'cosine'
exp_config.trainer.optimizer_config.learning_rate.cosine.decay_steps = train_steps
exp_config.trainer.optimizer_config.learning_rate.cosine.initial_learning_rate = 0.1
exp_config.trainer.optimizer_config.warmup.linear.warmup_learning_rate = 0.05
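The two commented formulas above are from the notebook; as a minimal sketch with hypothetical example counts (replace them with your dataset's real sizes), the arithmetic is:

num_train_examples = 1600   # hypothetical count
num_valid_examples = 400    # hypothetical count
steps_per_loop = num_train_examples // batch_size    # 1600 // 16 = 100
validation_steps = num_valid_examples // batch_size  # 400 // 16 = 25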
Configured Model:
{ 'runtime': { 'all_reduce_alg': None,
    'batchnorm_spatial_persistent': False,
    'dataset_num_private_threads': None,
    'default_shard_dim': -1,
    'distribution_strategy': 'mirrored',
    'enable_xla': False,
    'gpu_thread_mode': None,
    'loss_scale': None,
    'mixed_precision_dtype': 'bfloat16',
    'num_cores_per_replica': 1,
    'num_gpus': 1,
    'num_packs': 1,
    'per_gpu_thread_count': 0,
    'run_eagerly': False,
    'task_index': -1,
    'tpu': None,
    'tpu_enable_xla_dynamic_padder': None,
    'use_tpu_mp_strategy': False,
    'worker_hosts': None},
  'task': { 'allow_image_summary': False,
    'allowed_mask_class_ids': None,
    'annotation_file': '',
    'differential_privacy_config': None,
    'freeze_backbone': True,
    'init_checkpoint': None,
    'init_checkpoint_modules': 'all',
    'losses': { 'class_weights': None,
      'frcnn_box_weight': 1.0,
      'frcnn_class_loss_top_k_percent': 1.0,
      'frcnn_class_use_binary_cross_entropy': False,
      'frcnn_class_weight': 1.0,
      'frcnn_huber_loss_delta': 1.0,
      'l2_weight_decay': 4e-05,
      'loss_weight': 1.0,
      'mask_weight': 1.0,
      'rpn_box_weight': 1.0,
      'rpn_huber_loss_delta': 0.1111111111111111,
      'rpn_score_weight': 1.0},
    'model': { 'anchor': { 'anchor_size': 3,
        'aspect_ratios': [0.5, 1.0, 2.0],
        'num_scales': 1},
      'backbone': { 'spinenet': { 'max_level': 7,
          'min_level': 3,
          'model_id': '49',
          'stochastic_depth_drop_rate': 0.0},
        'type': 'spinenet'},
      'decoder': {'identity': {}, 'type': 'identity'},
      'detection_generator': { 'apply_nms': True,
        'max_classes_per_detection': 2,
        'max_num_detections': 100,
        'nms_iou_threshold': 0.5,
        'nms_version': 'v2',
        'pre_nms_score_threshold': 0.05,
        'pre_nms_top_k': 5000,
        'soft_nms_sigma': None,
        'use_cpu_nms': False,
        'use_sigmoid_probability': False},
      'detection_head': { 'cascade_class_ensemble': True,
        'class_agnostic_bbox_pred': True,
        'fc_dims': 1024,
        'num_convs': 4,
        'num_fcs': 1,
        'num_filters': 256,
        'use_separable_conv': False},
      'include_mask': True,
      'input_size': [512, 512, 3],
      'mask_head': { 'class_agnostic': False,
        'num_convs': 4,
        'num_filters': 256,
        'upsample_factor': 2,
        'use_separable_conv': False},
      'mask_roi_aligner': { 'crop_size': 14,
        'sample_offset': 0.5},
      'mask_sampler': {'num_sampled_masks': 128},
      'max_level': 7,
      'min_level': 3,
      'norm_activation': { 'activation': 'swish',
        'norm_epsilon': 0.001,
        'norm_momentum': 0.99,
        'use_sync_bn': True},
      'num_classes': 2,
      'outer_boxes_scale': 1.0,
      'roi_aligner': { 'crop_size': 7,
        'sample_offset': 0.5},
      'roi_generator': { 'nms_iou_threshold': 0.7,
        'num_proposals': 1000,
        'pre_nms_min_size_threshold': 0.0,
        'pre_nms_score_threshold': 0.0,
        'pre_nms_top_k': 2000,
        'test_nms_iou_threshold': 0.7,
        'test_num_proposals': 1000,
        'test_pre_nms_min_size_threshold': 0.0,
        'test_pre_nms_score_threshold': 0.0,
        'test_pre_nms_top_k': 1000,
        'use_batched_nms': False},
      'roi_sampler': { 'background_iou_high_threshold': 0.5,
        'background_iou_low_threshold': 0.0,
        'cascade_iou_thresholds': [0.6, 0.7],
        'foreground_fraction': 0.25,
        'foreground_iou_threshold': 0.5,
        'mix_gt_boxes': True,
        'num_sampled_rois': 512},
      'rpn_head': { 'num_convs': 1,
        'num_filters': 256,
        'use_separable_conv': False}},
    'name': None,
    'per_category_metrics': False,
    'train_data': { 'apply_tf_data_service_before_batching': False,
      'autotune_algorithm': None,
      'block_length': 1,
      'cache': False,
      'cycle_length': None,
      'decoder': { 'simple_decoder': { 'attribute_names': [],
          'mask_binarize_threshold': None,
          'regenerate_source_id': False},
        'type': 'simple_decoder'},
      'deterministic': None,
      'drop_remainder': True,
      'dtype': 'float32',
      'enable_shared_tf_data_service_between_parallel_trainers': False,
      'enable_tf_data_service': False,
      'file_type': 'tfrecord',
      'global_batch_size': 16,
      'input_path': './pothole_coco_tfrecords/train-00000-of-00001.tfrecord',
      'is_training': True,
      'num_examples': -1,
      'parser': { 'aug_rand_hflip': True,
        'aug_rand_vflip': False,
        'aug_scale_max': 1.0,
        'aug_scale_min': 1.0,
        'aug_type': None,
        'mask_crop_size': 112,
        'match_threshold': 0.5,
        'max_num_instances': 100,
        'num_channels': 3,
        'pad': True,
        'rpn_batch_size_per_im': 256,
        'rpn_fg_fraction': 0.5,
        'rpn_match_threshold': 0.7,
        'rpn_unmatched_threshold': 0.3,
        'skip_crowd_during_training': True,
        'unmatched_threshold': 0.5},
      'prefetch_buffer_size': None,
      'seed': None,
      'sharding': True,
      'shuffle_buffer_size': 10000,
      'tf_data_service_address': None,
      'tf_data_service_job_name': None,
      'tfds_as_supervised': False,
      'tfds_data_dir': '',
      'tfds_name': '',
      'tfds_skip_decoding_feature': '',
      'tfds_split': '',
      'trainer_id': None,
      'weights': None},
    'use_approx_instance_metrics': False,
    'use_coco_metrics': True,
    'use_wod_metrics': False,
    'validation_data': { 'apply_tf_data_service_before_batching': False,
      'autotune_algorithm': None,
      'block_length': 1,
      'cache': False,
      'cycle_length': None,
      'decoder': { 'simple_decoder': { 'attribute_names': [],
          'mask_binarize_threshold': None,
          'regenerate_source_id': False},
        'type': 'simple_decoder'},
      'deterministic': None,
      'drop_remainder': False,
      'dtype': 'float32',
      'enable_shared_tf_data_service_between_parallel_trainers': False,
      'enable_tf_data_service': False,
      'file_type': 'tfrecord',
      'global_batch_size': 16,
      'input_path': './pothole_coco_tfrecords/valid-00000-of-00001.tfrecord',
      'is_training': False,
      'num_examples': -1,
      'parser': { 'aug_rand_hflip': False,
        'aug_rand_vflip': False,
        'aug_scale_max': 1.0,
        'aug_scale_min': 1.0,
        'aug_type': None,
        'mask_crop_size': 112,
        'match_threshold': 0.5,
        'max_num_instances': 100,
        'num_channels': 3,
        'pad': True,
        'rpn_batch_size_per_im': 256,
        'rpn_fg_fraction': 0.5,
        'rpn_match_threshold': 0.7,
        'rpn_unmatched_threshold': 0.3,
        'skip_crowd_during_training': True,
        'unmatched_threshold': 0.5},
      'prefetch_buffer_size': None,
      'seed': None,
      'sharding': True,
      'shuffle_buffer_size': 10000,
      'tf_data_service_address': None,
      'tf_data_service_job_name': None,
      'tfds_as_supervised': False,
      'tfds_data_dir': '',
      'tfds_name': '',
      'tfds_skip_decoding_feature': '',
      'tfds_split': '',
      'trainer_id': None,
      'weights': None}},
  'trainer': { 'allow_tpu_summary': False,
    'best_checkpoint_eval_metric': '',
    'best_checkpoint_export_subdir': '',
    'best_checkpoint_metric_comp': 'higher',
    'checkpoint_interval': 100,
    'continuous_eval_timeout': 3600,
    'eval_tf_function': True,
    'eval_tf_while_loop': False,
    'loss_upper_bound': 1000000.0,
    'max_to_keep': 5,
    'optimizer_config': { 'ema': None,
      'learning_rate': { 'cosine': { 'alpha': 0.0,
          'decay_steps': 50000,
          'initial_learning_rate': 0.1,
          'name': 'CosineDecay',
          'offset': 0},
        'type': 'cosine'},
      'optimizer': { 'sgd': { 'clipnorm': None,
          'clipvalue': None,
          'decay': 0.0,
          'global_clipnorm': None,
          'momentum': 0.9,
          'name': 'SGD',
          'nesterov': False},
        'type': 'sgd'},
      'warmup': { 'linear': { 'name': 'linear',
          'warmup_learning_rate': 0.05,
          'warmup_steps': 100},
        'type': 'linear'}},
    'preemption_on_demand_checkpoint': True,
    'recovery_begin_steps': 0,
    'recovery_max_trials': 0,
    'steps_per_loop': 100,
    'summary_interval': 100,
    'train_steps': 50000,
    'train_tf_function': True,
    'train_tf_while_loop': True,
    'validation_interval': 100,
    'validation_steps': 100,
    'validation_summary_subdir': 'validation'}}
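For context, the distribution_strategy and task objects used in the cells below come from the notebook's setup cell; a minimal sketch of that part (assuming tensorflow_models is imported as tfm and model_dir is already defined):

import tensorflow as tf
import tensorflow_models as tfm

# Single-GPU training, matching runtime.num_gpus = 1 above.
distribution_strategy = tf.distribute.MirroredStrategy()
with distribution_strategy.scope():
    task = tfm.core.task_factory.get_task(exp_config.task, logging_dir=model_dir)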
The first error:
When I run the following code:
for images, labels in task.build_inputs(exp_config.task.train_data).take(1):
    print()
    print(f'images.shape: {str(images.shape):16} images.dtype: {images.dtype!r}')
    print(f'labels.keys: {labels.keys()}')
I get the following error:
InvalidArgumentError                      Traceback (most recent call last)
in <cell line: 1>()
----> 1 for images, labels in task.build_inputs(exp_config.task.train_data).take(1):
      2     print()
      3     print(f'images.shape: {str(images.shape):16} images.dtype: {images.dtype!r}')
      4     print(f'labels.keys: {labels.keys()}')

3 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   5881 def raise_from_not_ok_status(e, name) -> NoReturn:
   5882     e.message += (" name: " + str(name if name is not None else ""))
-> 5883     raise core._status_to_exception(e) from None  # pylint: disable=protected-access
   5884
   5885

InvalidArgumentError: {{function_node _wrapped__IteratorGetNext_output_types_21_device/job:localhost/replica:0/task:0/device:CPU:0}} indices[0] = 0 is not in [0, 0)
[[{{node GatherV2_2}}]] [Op:IteratorGetNext] name:
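To check whether the tfrecord itself is readable, one raw record can be inspected directly; a minimal sketch (assuming train_data_input_path points at the train tfrecord above):

import tensorflow as tf

raw_dataset = tf.data.TFRecordDataset(train_data_input_path)
for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    # Print the feature keys to check the COCO-style fields are present.
    print(sorted(example.features.feature.keys()))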
The second error:
When I run the training with the following code:
model, eval_logs = tfm.core.train_lib.run_experiment(
    distribution_strategy=distribution_strategy,
    task=task,
    mode='train_and_eval',
    params=exp_config,
    model_dir=model_dir,
    run_post_eval=True)
I get the following error:
WARNING:absl:SpineNet output level 2 out of range [min_level, max_level] = [3, 7] will not be used for further processing.
InvalidArgumentError                      Traceback (most recent call last)
in <cell line: 1>()
----> 1 model, eval_logs = tfm.core.train_lib.run_experiment(
      2     distribution_strategy=distribution_strategy,
      3     task=task,
      4     mode='train_and_eval',
      5     params=exp_config,

17 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   5881 def raise_from_not_ok_status(e, name) -> NoReturn:
   5882     e.message += (" name: " + str(name if name is not None else ""))
-> 5883     raise core._status_to_exception(e) from None  # pylint: disable=protected-access
   5884
   5885

InvalidArgumentError: {{function_node _wrapped__StridedSlice_device/job:localhost/replica:0/task:0/device:CPU:0}} slice index 0 of dimension 1 out of bounds. [Op:StridedSlice] name: strided_slice/
In call to configurable 'Trainer' (<class 'official.core.base_trainer.Trainer'>)
In call to configurable 'create_trainer' (<function create_trainer at 0x7e67d0270f70>)
I think both errors have the same underlying cause, which seems related to the batch size, but I haven't been able to figure out how to solve it.
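One way to test that hypothesis would be to count the examples in each tfrecord and compare the counts against the global batch size; a minimal sketch (assuming the two input paths above):

import tensorflow as tf

for name, path in [('train', train_data_input_path), ('valid', valid_data_input_path)]:
    num_examples = sum(1 for _ in tf.data.TFRecordDataset(path))
    print(f'{name}: {num_examples} examples, global_batch_size = {batch_size}')

Thank you for your interest.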