Applying Dropout

When you add a dropout layer to a model (like below), does the dropout only apply to the preceding layer or does it apply to all the hidden layers?

import tensorflow as tf

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(64, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation='softmax')
])

It really depends on the architecture.

For example, this was the prevailing view some years ago for convolutional layers:
http://mipal.snu.ac.kr/images/1/16/Dropout_ACCV2016.pdf

More generally, one interesting recent work co-authored by Google is AutoDropout, but I don't know why the code isn't available.


Thanks. I guess my question is more specific to tf.keras.layers.Dropout().

If I want to use dropout regularization throughout my model, do I need to add a second Dropout layer after tf.keras.layers.Dense(128, activation='relu')?

Generally that is fine, though this is also quite a small model.

When using the tf.keras.layers.Dropout layer, the Dropout operation is applied only to the preceding layer.
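To make that concrete, here is a minimal sketch (not from the original posts) that calls a Dropout layer directly on a tensor. The tensor of ones and the rate of 0.5 are assumptions chosen only so the effect is easy to read: the layer touches nothing but the tensor handed to it, and only in training mode.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the output of whatever layer precedes the Dropout layer.
x = tf.ones((1, 1000))

# rate=0.5 is an illustrative choice, not a recommendation.
drop = tf.keras.layers.Dropout(0.5)

y_train = drop(x, training=True).numpy()   # dropout is active
y_infer = drop(x, training=False).numpy()  # dropout is a no-op

# In training mode each value is either dropped (0) or kept and
# scaled by 1 / (1 - rate), which is 2 for rate=0.5.
print(np.unique(y_train))

# In inference mode the input passes through unchanged.
print(np.array_equal(y_infer, x.numpy()))
```

Layers earlier in the model never see this mask, so their activations are only regularized if they have a Dropout layer of their own.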


Yes. If you want dropout throughout the model, you need to add a separate Dropout layer after each layer you want to regularize. A Dropout layer can also be applied directly to the input. Moreover, "drop_level" in Dropout(drop_level) is a hyperparameter (the argument is named rate in tf.keras.layers.Dropout).
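Applied to the model from the question, that could look like the sketch below. Using the same rate everywhere, and adding dropout on the input at all, are assumptions for illustration; in practice each rate is tuned separately. tf.keras.Input is used here in place of the input_shape argument from the question.

```python
import tensorflow as tf

rate = 0.2  # illustrative value taken from the question

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(rate),                    # optional: dropout on the input
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(rate),                    # regularizes the 128-unit layer
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(rate),                    # regularizes the 64-unit layer
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.summary()
```

Each Dropout layer adds no parameters, so the parameter count is the same as in the original model.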


Some examples:

Residual Dropout We apply dropout [27] to the output of each sub-layer, before it is added to the
sub-layer input and normalized. In addition, we apply dropout to the sums of the embeddings and the
positional encodings in both the encoder and decoder stacks.

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 500, 1)]     0                                            
__________________________________________________________________________________________________
layer_normalization (LayerNorma (None, 500, 1)       2           input_1[0][0]                    
__________________________________________________________________________________________________
multi_head_attention (MultiHead (None, 500, 1)       7169        layer_normalization[0][0]        
                                                                 layer_normalization[0][0]        
__________________________________________________________________________________________________
dropout (Dropout)               (None, 500, 1)       0           multi_head_attention[0][0]       
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None, 500, 1)       0           dropout[0][0]                    
                                                                 input_1[0][0]                    
__________________________________________________________________________________________________
layer_normalization_1 (LayerNor (None, 500, 1)       2           tf.__operators__.add[0][0]       
__________________________________________________________________________________________________
conv1d (Conv1D)                 (None, 500, 4)       8           layer_normalization_1[0][0]      
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 500, 4)       0           conv1d[0][0]                     
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 500, 1)       5           dropout_1[0][0]                  
__________________________________________________________________________________________________
tf.__operators__.add_1 (TFOpLam (None, 500, 1)       0           conv1d_1[0][0]                   
                                                                 tf.__operators__.add[0][0]       
__________________________________________________________________________________________________
layer_normalization_2 (LayerNor (None, 500, 1)       2           tf.__operators__.add_1[0][0]     
__________________________________________________________________________________________________
multi_head_attention_1 (MultiHe (None, 500, 1)       7169        layer_normalization_2[0][0]      
                                                                 layer_normalization_2[0][0]      
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 500, 1)       0           multi_head_attention_1[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_2 (TFOpLam (None, 500, 1)       0           dropout_2[0][0]                  
                                                                 tf.__operators__.add_1[0][0]     
__________________________________________________________________________________________________
layer_normalization_3 (LayerNor (None, 500, 1)       2           tf.__operators__.add_2[0][0]     
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 500, 4)       8           layer_normalization_3[0][0]      
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 500, 4)       0           conv1d_2[0][0]                   
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, 500, 1)       5           dropout_3[0][0]                  
__________________________________________________________________________________________________
tf.__operators__.add_3 (TFOpLam (None, 500, 1)       0           conv1d_3[0][0]                   
                                                                 tf.__operators__.add_2[0][0]     
__________________________________________________________________________________________________
layer_normalization_4 (LayerNor (None, 500, 1)       2           tf.__operators__.add_3[0][0]     
__________________________________________________________________________________________________
multi_head_attention_2 (MultiHe (None, 500, 1)       7169        layer_normalization_4[0][0]      
                                                                 layer_normalization_4[0][0]      
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 500, 1)       0           multi_head_attention_2[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_4 (TFOpLam (None, 500, 1)       0           dropout_4[0][0]                  
                                                                 tf.__operators__.add_3[0][0]     
__________________________________________________________________________________________________
layer_normalization_5 (LayerNor (None, 500, 1)       2           tf.__operators__.add_4[0][0]     
__________________________________________________________________________________________________
conv1d_4 (Conv1D)               (None, 500, 4)       8           layer_normalization_5[0][0]      
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, 500, 4)       0           conv1d_4[0][0]                   
__________________________________________________________________________________________________
conv1d_5 (Conv1D)               (None, 500, 1)       5           dropout_5[0][0]                  
__________________________________________________________________________________________________
tf.__operators__.add_5 (TFOpLam (None, 500, 1)       0           conv1d_5[0][0]                   
                                                                 tf.__operators__.add_4[0][0]     
__________________________________________________________________________________________________
layer_normalization_6 (LayerNor (None, 500, 1)       2           tf.__operators__.add_5[0][0]     
__________________________________________________________________________________________________
multi_head_attention_3 (MultiHe (None, 500, 1)       7169        layer_normalization_6[0][0]      
                                                                 layer_normalization_6[0][0]      
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, 500, 1)       0           multi_head_attention_3[0][0]     
__________________________________________________________________________________________________
tf.__operators__.add_6 (TFOpLam (None, 500, 1)       0           dropout_6[0][0]                  
                                                                 tf.__operators__.add_5[0][0]     
__________________________________________________________________________________________________
layer_normalization_7 (LayerNor (None, 500, 1)       2           tf.__operators__.add_6[0][0]     
__________________________________________________________________________________________________
conv1d_6 (Conv1D)               (None, 500, 4)       8           layer_normalization_7[0][0]      
__________________________________________________________________________________________________
dropout_7 (Dropout)             (None, 500, 4)       0           conv1d_6[0][0]                   
__________________________________________________________________________________________________
conv1d_7 (Conv1D)               (None, 500, 1)       5           dropout_7[0][0]                  
__________________________________________________________________________________________________
tf.__operators__.add_7 (TFOpLam (None, 500, 1)       0           conv1d_7[0][0]                   
                                                                 tf.__operators__.add_6[0][0]     
__________________________________________________________________________________________________
global_average_pooling1d (Globa (None, 500)          0           tf.__operators__.add_7[0][0]     
__________________________________________________________________________________________________
dense (Dense)                   (None, 128)          64128       global_average_pooling1d[0][0]   
__________________________________________________________________________________________________
dropout_8 (Dropout)             (None, 128)          0           dense[0][0]                      
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 2)            258         dropout_8[0][0]                  
==================================================================================================
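One repeated block of a summary like the one above can be sketched as follows. This is an assumed reconstruction for illustration, not the code that produced the summary: the function name encoder_block and the values head_size=256, num_heads=4, ff_dim=4 and dropout=0.25 are hypothetical, and layers.Add is used where the summary shows the + operator.

```python
import tensorflow as tf

def encoder_block(inputs, head_size=256, num_heads=4, ff_dim=4, dropout=0.25):
    """One attention + feed-forward block with dropout before each residual add."""
    layers = tf.keras.layers

    # Attention sub-layer: normalize, attend, drop, then add the residual.
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(key_dim=head_size, num_heads=num_heads)(x, x)
    x = layers.Dropout(dropout)(x)
    res = layers.Add()([x, inputs])

    # Feed-forward sub-layer: dropout sits between the two 1x1 convolutions.
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return layers.Add()([x, res])

inputs = tf.keras.Input(shape=(500, 1))
outputs = encoder_block(inputs)
block_model = tf.keras.Model(inputs, outputs)
block_model.summary()
```

Each Dropout layer here acts only on the tensor coming out of the layer directly before it, which is why the block needs two of them.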

Hey, that's my answer there, on the Stack Overflow question "machine learning - Where Dropout should be inserted.? Fully Connected Layer.? Convolutional Layer.? or Both.?"


Dropout is generally used after a Dense or Convolutional layer. It affects the hidden neurons passed to the following layer.

A Convolutional layer outputs a set of feature maps. In 2D image processing, these feature maps are gray-scale images that correspond to common shapes in the image set: cat eyes vs. cat noses, for example. Applying Dropout after Convolutional layers does not do what you would expect, because the values in feature maps are strongly correlated: it is like putting a slice of Swiss cheese over a picture. You can still see the picture through the holes!
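One common alternative for feature maps is tf.keras.layers.SpatialDropout2D, which drops whole channels instead of individual pixels. It is not mentioned in the post above, so treat the comparison below as an illustrative sketch; the tensor of ones, its shape and the rate of 0.5 are assumptions.

```python
import numpy as np
import tensorflow as tf

# Fake feature maps in channels-last layout: (batch, height, width, channels).
feature_maps = tf.ones((1, 8, 8, 16))

# Element-wise dropout: punches independent holes in every feature map.
elementwise = tf.keras.layers.Dropout(0.5)(feature_maps, training=True).numpy()

# Spatial dropout: each channel is either kept whole or zeroed whole.
channelwise = tf.keras.layers.SpatialDropout2D(0.5)(feature_maps, training=True).numpy()

def values_per_channel(t):
    """Count the distinct values found in each channel of a channels-last array."""
    flat = t.reshape(-1, t.shape[-1])
    return [len(np.unique(flat[:, c])) for c in range(flat.shape[1])]

print(values_per_channel(elementwise))  # channels mix dropped and kept pixels
print(values_per_channel(channelwise))  # one distinct value per channel
```

With element-wise dropout, neighbouring pixels can still fill in for a dropped one. With spatial dropout, a dropped feature map is gone entirely for that training step.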

Convolutional layers, Dropout and BatchNormalization interact in complex ways. The best discussion that I have found on this topic is right here:
https://stackoverflow.com/questions/59634780/correct-order-for-spatialdropout2d-batchnormalization-and-activation-function