Hi Folks.
I have some questions specific to the TF Object Detection API.
Basically, I am trying to fine-tune an SSD MobileNetV2 from the TF OD API model zoo to detect traffic signs. I first fine-tuned it on GTSDB (around 500 samples), but the resulting model wasn't very good.
Then I trained on an augmented version of GTSDB (around 4500 samples, created using rotation, translation, shearing, etc.). This time, after about 9k iterations, the validation loss starts increasing and never comes back down.
I assume this indicates overfitting, which as far as I can tell could be due to:
- Train and eval data being very different – I checked, and this isn't the case.
- Learning rate too high – I reduced it by a factor of 10, and the overfitting still occurs.
- Model might be too complex – the same model trained properly on the un-augmented GTSDB with only 500 training samples, so it shouldn't be too complex for the augmented GTSDB, which has around 4500 samples.
- Augmented dataset might not have been created properly – I rendered all the annotated images into a video and checked it manually, and the dataset seems fine.
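As a complement to the manual video check in the last point, a small programmatic check can flag boxes that geometric augmentation pushed outside the image or collapsed to a sliver. This is only a minimal sketch: `find_bad_boxes`, the `min_side` threshold, the pixel-coordinate box layout, and the 1360x800 image size are assumptions for illustration, not something taken from the actual dataset or from imgaug.

```python
from typing import List, Tuple

# Assumed box layout: (xmin, ymin, xmax, ymax) in pixel coordinates.
Box = Tuple[float, float, float, float]


def find_bad_boxes(boxes: List[Box], width: int, height: int,
                   min_side: float = 2.0) -> List[int]:
    """Return indices of boxes that leave the image or are degenerate."""
    bad = []
    for i, (xmin, ymin, xmax, ymax) in enumerate(boxes):
        # The box must lie fully inside the image with positive extent.
        inside = 0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height
        # Both sides must be at least `min_side` pixels (hypothetical threshold).
        big_enough = (xmax - xmin) >= min_side and (ymax - ymin) >= min_side
        if not (inside and big_enough):
            bad.append(i)
    return bad


# Hypothetical annotations for one augmented image (assumed 1360x800).
example = [
    (100.0, 200.0, 140.0, 240.0),  # fine
    (1350.0, 10.0, 1400.0, 60.0),  # sticks out past the right edge
    (500.0, 500.0, 500.5, 530.0),  # collapsed to a sliver by the transform
]
print(find_bad_boxes(example, width=1360, height=800))  # prints [1, 2]
```

Running something like this over every augmented image before writing the TFRecord would catch annotation problems that are easy to miss when watching a video.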
I am trying to think of other possible reasons and would appreciate any help in that regard.
Note: I used the imgaug library for data augmentation.
For reference, I have attached my loss curves and config file.
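To put the "about 9k iterations" figure in context, the sketch below finds the step with the lowest validation loss in a log and converts it into passes over the training set. The `eval_log` values are made up for illustration and are not the attached curve; `best_eval_step` and `steps_to_epochs` are hypothetical helper names, and the conversion assumes that `batch_size: 24` from the config is the effective global batch size.

```python
from typing import List, Tuple


def best_eval_step(eval_log: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(eval_log, key=lambda entry: entry[1])


def steps_to_epochs(step: int, num_samples: int, batch_size: int) -> float:
    """Convert a training step count into passes over the training set."""
    return step * batch_size / num_samples


# Made-up (step, validation loss) pairs, NOT the actual curve from this post.
eval_log = [(3000, 0.62), (6000, 0.48), (9000, 0.41), (12000, 0.47), (15000, 0.55)]

step, loss = best_eval_step(eval_log)
epochs = steps_to_epochs(step, num_samples=4500, batch_size=24)
print(step, loss, epochs)  # prints 9000 0.41 48.0
```

Under these assumptions, 9k steps correspond to roughly 48 passes over the 4500 augmented samples, which are themselves derived from about 500 original images.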
Config file:
model {
  ssd {
    inplace_batchnorm_update: true
    freeze_batchnorm: false
    num_classes: 9
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    encode_background_as_zeros: true
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        class_prediction_bias_init: -4.6
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            random_normal_initializer {
              stddev: 0.01
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.97,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v2_keras'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.97,
          epsilon: 0.001,
        }
      }
      override_base_feature_extractor_hyperparams: true
    }
    loss {
      classification_loss {
        weighted_sigmoid_focal {
          alpha: 0.75,
          gamma: 2.0
        }
      }
      localization_loss {
        weighted_smooth_l1 {
          delta: 1.0
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    normalize_loc_loss_by_codesize: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "./sprint1_ssd_mobilenetv2_try2/pretrained_model/mobilnetv2/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  batch_size: 24
  sync_replicas: true
  startup_delay_steps: 0
  replicas_to_aggregate: 8
  num_steps: 50000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: .0008
          total_steps: 50000
          warmup_learning_rate: 0.00013333
          warmup_steps: 1000
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
}

train_input_reader: {
  label_map_path: "./sprint1_ssd_mobilenetv2_try2/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "./sprint1_ssd_mobilenetv2_try2/gtsdb_stop_train_9.record"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}

eval_input_reader: {
  label_map_path: "./sprint1_ssd_mobilenetv2_try2/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "./sprint1_ssd_mobilenetv2_try2/gtsdb_stop_val_9.record"
  }
}
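Since the learning rate is one of the suspects, it may help to see what the `cosine_decay_learning_rate` block in the config works out to at a given step. The following is a plain-Python approximation of a linear warmup followed by cosine decay, using the four values from the config as defaults. It is a sketch of the usual formula, not the TF OD API implementation itself, and the function name is only illustrative.

```python
import math


def cosine_decay_with_warmup(step: int,
                             learning_rate_base: float = 0.0008,
                             total_steps: int = 50000,
                             warmup_learning_rate: float = 0.00013333,
                             warmup_steps: int = 1000) -> float:
    """Approximate learning rate at `step`: linear warmup, then cosine decay."""
    if step > total_steps:
        return 0.0
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


for step in (0, 1000, 9000, 25000, 50000):
    print(step, f"{cosine_decay_with_warmup(step):.6f}")
```

With these settings the rate at step 9000 is still above 90% of `learning_rate_base`, so at the point where the validation loss turns upward the schedule has barely started to decay.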