MobileNet SSD input image size

I am trying to implement an object detection algorithm with MobileNet SSD v1 FPN 640x640 using the TensorFlow Object Detection API. My input images are of size 1024x25, and I am getting errors regarding the image size.

I would like to know what constraints there are on the input image size when using MobileNet SSD, and whether there are alternative ways to implement the object detection algorithm on my images…

Hi @Mary_Joy, in the config file, could you please try changing the height and width in the image_resizer part? Thank You.


Thank you for your reply.
I tried changing the dimensions to 1024x25, but got an error saying the minimum height should be 33.

Then I changed the dimensions to 1024x33 and got the following error:

ValueError: Dimensions must be equal, but are 4 and 3 for '{{node ssd_mobile_net_v2_fpn_keras_feature_extractor/FeatureMaps/top_down/add}} = AddV2[T=DT_FLOAT](ssd_mobile_net_v2_fpn_keras_feature_extractor/FeatureMaps/top_down/nearest_neighbor_upsampling/nearest_neighbor_upsampling/Reshape_1, ssd_mobile_net_v2_fpn_keras_feature_extractor/FeatureMaps/top_down/projection_2/BiasAdd)' with input shapes: [16,4,64,128], [16,3,64,128].
        
        
        Call arguments received:
          • image_features=[("'layer_7'", 'tf.Tensor(shape=(16, 5, 128, 32), dtype=float32)'), ("'layer_14'", 'tf.Tensor(shape=(16, 3, 64, 96), dtype=float32)'), ("'layer_19'", 'tf.Tensor(shape=(16, 2, 32, 1280), dtype=float32)')]
    
    
    Call arguments received:
      • inputs=tf.Tensor(shape=(16, 33, 1024, 3), dtype=float32)
      • kwargs={'training': 'False'}

Given below is my pipeline.config file

# SSD with Mobilenet v2 FPN-lite (go/fpn-lite) feature extractor, shared box
# predictor and focal loss (a mobile version of Retinanet).
# Retinanet: see Lin et al, https://arxiv.org/abs/1708.02002
# Trained on COCO, initialized from Imagenet classification checkpoint
# Train on TPU-8
#
# Achieves 22.2 mAP on COCO17 Val

model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 1
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: [1.0, 2.0, 0.5]
        scales_per_octave: 2
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 33
        width: 1024
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        depth: 128
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            scale: true,
            decay: 0.997,
            epsilon: 0.001,
          }
        }
        num_layers_before_predictor: 4
        share_prediction_tower: true
        use_depthwise: true
        kernel_size: 3
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_fpn_keras'
      use_depthwise: true
      fpn {
        min_level: 3
        max_level: 7
        additional_layer_depth: 128
      }
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          random_normal_initializer {
            stddev: 0.01
            mean: 0.0
          }
        }
        batch_norm {
          scale: true,
          decay: 0.997,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.25
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "/content/models/mymodel/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  batch_size: 16
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 100
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .08
          total_steps: 50000
          warmup_learning_rate: .026666
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "/content/labelmap.pbtxt"
  tf_record_input_reader {
    input_path: "/content/train.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}

eval_input_reader: {
  label_map_path: "/content/labelmap.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/content/val.tfrecord"
  }
}

Is it possible to use these image dimensions for training this model, or do I have to make further changes to the config file?

Hi @Mary_Joy, I think that as the input is passed through the layers of the model, its spatial shape keeps being reduced. Could you please try with images that have some more height? Thank You.
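To make that concrete, the shape mismatch in the traceback can be reproduced with a quick back-of-the-envelope calculation. This is only an illustrative sketch, not code from the Object Detection API: with a height of 33, the backbone maps at strides 8/16/32 have heights 5, 3 and 2, and the FPN top-down pathway then tries to add a 2x-upsampled height of 4 to a height of 3.

```python
import math

# Heights of the backbone feature maps the FPN combines (layer_7, layer_14
# and layer_19 in the traceback above), produced at strides 8, 16 and 32
# with 'same' padding, i.e. ceil(height / stride).
def backbone_heights(input_height, strides=(8, 16, 32)):
    return [math.ceil(input_height / s) for s in strides]

for h in (33, 64):
    h8, h16, h32 = backbone_heights(h)
    # The FPN top-down pathway upsamples each coarser map by 2x and adds it
    # to the next finer map, so each height must be exactly double the next.
    compatible = (2 * h32 == h16) and (2 * h16 == h8)
    print(f'height {h}: stride-8/16/32 heights = {h8}, {h16}, {h32} '
          f'-> FPN top-down add works: {compatible}')
```

For a height of 33 this gives 5, 3, 2: upsampling 2 by a factor of 2 gives 4, which cannot be added to 3, exactly the "4 and 3" in the error. A height of 64 gives 8, 4, 2, which halves cleanly at every step.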


@Kiran_Sai_Ramineni Thank you for your reply.
It works with 1024x64.
But is it possible to use 1024x33 by modifying the layer architecture? What changes would I need to make for this?

Hi @Mary_Joy, yes, by changing the model architecture it is possible to pass 1024x33.

You have to go through the model layer by layer and find at which layer the dimension is being reduced. Thank You.


Can you provide some insight into where this model architecture is defined in the TensorFlow Object Detection API? Thank you.

Hi @Mary_Joy, you can use model.summary() to get the model architecture along with the input and output shapes of the layers present in the model. Thank You.
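If it helps, below is a rough sketch of building the model from the config and printing the shape of every feature map the FPN produces, which is a quick way to see where the height gets reduced. It assumes the TF2 Object Detection API is installed; the pipeline.config path is hypothetical, so replace it with your own.

```python
import tensorflow as tf
from object_detection.utils import config_util
from object_detection.builders import model_builder

# Hypothetical path; point this at your own pipeline.config.
configs = config_util.get_configs_from_pipeline_file('/content/models/mymodel/pipeline.config')
detection_model = model_builder.build(model_config=configs['model'], is_training=False)

# A dummy batch; preprocess() applies the fixed_shape_resizer from the config,
# so the actual input size passed here does not matter.
dummy = tf.zeros([1, 64, 1024, 3], dtype=tf.float32)
image, true_shapes = detection_model.preprocess(dummy)
prediction_dict = detection_model.predict(image, true_shapes)

# Spatial shape of each FPN feature map (one entry per level).
for i, fmap in enumerate(prediction_dict['feature_maps']):
    print(f'feature map {i}: {fmap.shape}')
```

With the resizer set to 33x1024 the predict() call should raise the same ValueError as above, so this is also a convenient way to test candidate image sizes before starting a full training run.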