Applying Dropout

When you add a dropout layer to a model (like below), does the dropout only apply to the preceding layer or does it apply to all the hidden layers?

import tensorflow as tf

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(64, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

It really depends on the architecture.

For example, this was the prevailing view on convolutional layers some years ago:

More generally, one interesting recent work co-authored by Google is AutoDropout, but I don't know why the code isn't available:

Thanks. I guess my question is more specific to tf.keras.layers.Dropout().

If I want to use dropout regularization throughout my model, do I need to add a second Dropout layer after tf.keras.layers.Dense(128, activation='relu')?

Generally, adding it is fine, but this is also quite a small model.

When using the tf.keras.layers.Dropout layer, the dropout operation is applied only to the output of the preceding layer, that is, to the tensor the Dropout layer receives as its input.
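A minimal sketch to make this concrete, assuming TensorFlow 2.x (the rate and the tensor shape are arbitrary): calling a Dropout layer directly on a tensor that stands in for a Dense layer's output shows that it only transforms that one tensor, and only in training mode.

```python
import numpy as np
import tensorflow as tf

rate = 0.5
dropout = tf.keras.layers.Dropout(rate)

# Stand-in for the (batch, units) output of a preceding Dense layer.
activations = tf.ones((4, 8))

# Training mode: each value is zeroed with probability `rate`; survivors are
# scaled by 1 / (1 - rate) (here 2.0) so the expected sum stays the same.
train_out = dropout(activations, training=True).numpy()

# Inference mode: the layer passes its input through unchanged.
infer_out = dropout(activations, training=False).numpy()

print(np.unique(train_out))  # values drawn from {0.0, 2.0}
print(np.array_equal(infer_out, activations.numpy()))  # True
```

The layer has no trainable weights; all it does is mask and rescale whatever tensor it is called on.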


Yes. If you want dropout after several layers, you need to add a separate Dropout layer after each of them. A Dropout layer can also be applied to the inputs, right after the Input layer. Moreover, the dropout rate (the rate argument, as in Dropout(rate)) is a hyperparameter.
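Here is a minimal sketch of the question's model with dropout throughout, assuming you want one Dropout on the flattened inputs and one after each hidden Dense layer. The rates 0.2 and 0.5 are illustrative choices, not values recommended in this thread:

```python
import tensorflow as tf

# Illustrative rates; each Dropout layer takes its own rate, which you tune.
input_rate = 0.2
hidden_rate = 0.5

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    # Optional: dropout applied directly to the (flattened) inputs.
    tf.keras.layers.Dropout(input_rate),
    tf.keras.layers.Dense(128, activation='relu'),
    # Drops units of the Dense(128) output only.
    tf.keras.layers.Dropout(hidden_rate),
    tf.keras.layers.Dense(64, activation='relu'),
    # A separate layer is needed to drop units of the Dense(64) output.
    tf.keras.layers.Dropout(hidden_rate),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.summary()
```

Because Dropout has no weights, the parameter count is the same as without it, and Keras only drops units when the model runs in training mode (as during fit), not in predict or evaluate.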

Some examples:

Residual Dropout: We apply dropout [27] to the output of each sub-layer, before it is added to the sub-layer input and normalized. In addition, we apply dropout to the sums of the embeddings and the positional encodings in both the encoder and decoder stacks.
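The passage above is from "Attention Is All You Need" (Vaswani et al., 2017). Its arrangement, LayerNorm(x + Dropout(Sublayer(x))), plus dropout on the embedding sum, can be sketched in Keras roughly as follows. This is an illustration, not the paper's code: the sizes are arbitrary, the learned positional table stands in for the paper's sinusoidal encodings, and AddPositionalEmbedding and residual_sublayer are hypothetical helper names. The rate 0.1 is the one the paper reports for its base model.

```python
import tensorflow as tf


class AddPositionalEmbedding(tf.keras.layers.Layer):
    """Hypothetical helper: adds a learned (seq_len, d_model) position table."""

    def __init__(self, seq_len, d_model, **kwargs):
        super().__init__(**kwargs)
        self.seq_len = seq_len
        self.d_model = d_model

    def build(self, input_shape):
        self.pos_table = self.add_weight(
            name='pos_table', shape=(self.seq_len, self.d_model),
            initializer='random_normal')

    def call(self, x):
        return x + self.pos_table  # broadcasts over the batch axis


def residual_sublayer(x, sublayer, rate):
    """Hypothetical helper: LayerNorm(x + Dropout(sublayer(x)))."""
    y = sublayer(x)
    y = tf.keras.layers.Dropout(rate)(y)            # dropout on the sub-layer output,
    y = tf.keras.layers.Add()([x, y])               # before adding the sub-layer input
    return tf.keras.layers.LayerNormalization()(y)  # and normalizing


seq_len, d_model, vocab_size, rate = 16, 32, 1000, 0.1  # illustrative sizes

tokens = tf.keras.Input(shape=(seq_len,), dtype='int32')
x = tf.keras.layers.Embedding(vocab_size, d_model)(tokens)
x = AddPositionalEmbedding(seq_len, d_model)(x)
x = tf.keras.layers.Dropout(rate)(x)  # dropout on embeddings + positional encodings

# Self-attention sub-layer.
mha = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=d_model // 4)
x = residual_sublayer(x, lambda t: mha(t, t), rate)

# Position-wise feed-forward sub-layer (two Dense layers).
ffn_hidden = tf.keras.layers.Dense(4 * d_model, activation='relu')
ffn_out = tf.keras.layers.Dense(d_model)
x = residual_sublayer(x, lambda t: ffn_out(ffn_hidden(t)), rate)

encoder_layer = tf.keras.Model(tokens, x)
```

So one encoder layer already needs three separate Dropout layers: one on the embedding sum and one per sub-layer.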

Model: "model"
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 500, 1)]     0                                            
layer_normalization (LayerNorma (None, 500, 1)       2           input_1[0][0]                    
multi_head_attention (MultiHead (None, 500, 1)       7169        layer_normalization[0][0]        
dropout (Dropout)               (None, 500, 1)       0           multi_head_attention[0][0]       
tf.__operators__.add (TFOpLambd (None, 500, 1)       0           dropout[0][0]                    
layer_normalization_1 (LayerNor (None, 500, 1)       2           tf.__operators__.add[0][0]       
conv1d (Conv1D)                 (None, 500, 4)       8           layer_normalization_1[0][0]      
dropout_1 (Dropout)             (None, 500, 4)       0           conv1d[0][0]                     
conv1d_1 (Conv1D)               (None, 500, 1)       5           dropout_1[0][0]                  
tf.__operators__.add_1 (TFOpLam (None, 500, 1)       0           conv1d_1[0][0]                   
layer_normalization_2 (LayerNor (None, 500, 1)       2           tf.__operators__.add_1[0][0]     
multi_head_attention_1 (MultiHe (None, 500, 1)       7169        layer_normalization_2[0][0]      
dropout_2 (Dropout)             (None, 500, 1)       0           multi_head_attention_1[0][0]     
tf.__operators__.add_2 (TFOpLam (None, 500, 1)       0           dropout_2[0][0]                  
layer_normalization_3 (LayerNor (None, 500, 1)       2           tf.__operators__.add_2[0][0]     
conv1d_2 (Conv1D)               (None, 500, 4)       8           layer_normalization_3[0][0]      
dropout_3 (Dropout)             (None, 500, 4)       0           conv1d_2[0][0]                   
conv1d_3 (Conv1D)               (None, 500, 1)       5           dropout_3[0][0]                  
tf.__operators__.add_3 (TFOpLam (None, 500, 1)       0           conv1d_3[0][0]                   
layer_normalization_4 (LayerNor (None, 500, 1)       2           tf.__operators__.add_3[0][0]     
multi_head_attention_2 (MultiHe (None, 500, 1)       7169        layer_normalization_4[0][0]      
dropout_4 (Dropout)             (None, 500, 1)       0           multi_head_attention_2[0][0]     
tf.__operators__.add_4 (TFOpLam (None, 500, 1)       0           dropout_4[0][0]                  
layer_normalization_5 (LayerNor (None, 500, 1)       2           tf.__operators__.add_4[0][0]     
conv1d_4 (Conv1D)               (None, 500, 4)       8           layer_normalization_5[0][0]      
dropout_5 (Dropout)             (None, 500, 4)       0           conv1d_4[0][0]                   
conv1d_5 (Conv1D)               (None, 500, 1)       5           dropout_5[0][0]                  
tf.__operators__.add_5 (TFOpLam (None, 500, 1)       0           conv1d_5[0][0]                   
layer_normalization_6 (LayerNor (None, 500, 1)       2           tf.__operators__.add_5[0][0]     
multi_head_attention_3 (MultiHe (None, 500, 1)       7169        layer_normalization_6[0][0]      
dropout_6 (Dropout)             (None, 500, 1)       0           multi_head_attention_3[0][0]     
tf.__operators__.add_6 (TFOpLam (None, 500, 1)       0           dropout_6[0][0]                  
layer_normalization_7 (LayerNor (None, 500, 1)       2           tf.__operators__.add_6[0][0]     
conv1d_6 (Conv1D)               (None, 500, 4)       8           layer_normalization_7[0][0]      
dropout_7 (Dropout)             (None, 500, 4)       0           conv1d_6[0][0]                   
conv1d_7 (Conv1D)               (None, 500, 1)       5           dropout_7[0][0]                  
tf.__operators__.add_7 (TFOpLam (None, 500, 1)       0           conv1d_7[0][0]                   
global_average_pooling1d (Globa (None, 500)          0           tf.__operators__.add_7[0][0]     
dense (Dense)                   (None, 128)          64128       global_average_pooling1d[0][0]   
dropout_8 (Dropout)             (None, 128)          0           dense[0][0]                      
dense_1 (Dense)                 (None, 2)            258         dropout_8[0][0]                  
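Here is a sketch that should rebuild the summary above layer for layer. Everything beyond the printed shapes and counts is an assumption on my part: key_dim=256 with num_heads=4 is one choice consistent with the 7169 attention parameters (any pair with num_heads × key_dim = 1024 gives the same count), the ReLU activations are guesses, and the dropout rates are placeholders because a summary does not record them. Note also that each LayerNormalization here comes before its sub-layer (pre-norm), whereas the quoted passage normalizes after the residual add.

```python
import tensorflow as tf

layers = tf.keras.layers


def encoder_block(inputs, head_size=256, num_heads=4, ff_dim=4, rate=0.25):
    """One pre-norm block, matching one repeating group of rows in the summary."""
    # Attention sub-layer: LayerNorm -> MultiHeadAttention -> Dropout -> add.
    x = layers.LayerNormalization()(inputs)
    x = layers.MultiHeadAttention(key_dim=head_size, num_heads=num_heads)(x, x)
    x = layers.Dropout(rate)(x)
    res = layers.Add()([x, inputs])
    # Feed-forward sub-layer: LayerNorm -> Conv1D -> Dropout -> Conv1D -> add.
    x = layers.LayerNormalization()(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation='relu')(x)
    x = layers.Dropout(rate)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return layers.Add()([x, res])


inputs = tf.keras.Input(shape=(500, 1))
x = inputs
for _ in range(4):  # four blocks, as in the summary
    x = encoder_block(x)
# channels_first averages over the last axis: (None, 500, 1) -> (None, 500).
x = layers.GlobalAveragePooling1D(data_format='channels_first')(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.4)(x)  # corresponds to dropout_8 in the summary
outputs = layers.Dense(2, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.summary()
```

The Dropout layers land where the summary shows them: after each attention output, between the two Conv1D layers of every block, and after the Dense(128) head, nine in total.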

Hey, there's my answer there 😄 (Stack Overflow: "Where Dropout should be inserted.? Fully Connected Layer.? Convolutional Layer.? or Both.?")


Dropout is generally used after a Dense or Convolutional layer. During training, it randomly zeroes some of the hidden activations that layer passes on to the following layer.
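To make the "Dense, Convolutional, or both" placement concrete, here is a sketch of a small CNN with Dropout after a convolutional layer and after a Dense layer. The input shape, filter and unit counts, and rates are all hypothetical:

```python
import tensorflow as tf

layers = tf.keras.layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),        # hypothetical grayscale input
    layers.Conv2D(32, 3, activation='relu'),  # -> (26, 26, 32) feature maps
    layers.Dropout(0.1),                      # zeroes individual feature-map values
    layers.MaxPooling2D(),                    # -> (13, 13, 32)
    layers.Flatten(),                         # -> 5408 features
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),                      # zeroes hidden units fed to the output layer
    layers.Dense(10, activation='softmax'),
])

model.summary()
```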

A Convolutional layer outputs a set of feature maps. In 2D image processing, these feature maps are grayscale images that correspond to common shapes in the image set: cat eyes vs. cat noses, for example. Applying Dropout after Convolutional layers does not do what you would expect, because neighbouring values within a feature map are strongly correlated. It is like putting a slice of Swiss cheese over a picture: you can still see the picture through the holes!
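Keras also offers SpatialDropout2D, which zeroes entire feature maps rather than individual values; as far as I know, its documentation motivates it with this same correlation argument. A minimal sketch comparing it with plain Dropout on fake feature maps (the tensor shape and the rate are arbitrary):

```python
import numpy as np
import tensorflow as tf

# Fake conv output: (batch, height, width, channels), channels_last.
feature_maps = tf.ones((1, 4, 4, 8))

# Plain Dropout zeroes individual values, punching holes in every map.
pixel_drop = tf.keras.layers.Dropout(0.5)(feature_maps, training=True).numpy()

# SpatialDropout2D zeroes whole channels (entire feature maps) at once.
map_drop = tf.keras.layers.SpatialDropout2D(0.5)(feature_maps, training=True).numpy()

# Fraction of zeroed values per channel: typically fractional for Dropout,
# exactly 0.0 or 1.0 for SpatialDropout2D.
print((pixel_drop[0] == 0).mean(axis=(0, 1)))
print((map_drop[0] == 0).mean(axis=(0, 1)))

for c in range(map_drop.shape[-1]):
    channel = map_drop[0, :, :, c]
    # Each surviving map is scaled by 1 / (1 - 0.5) = 2.0.
    assert np.all(channel == 0.0) or np.all(channel == 2.0)
```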

Convolutional layers, Dropout and BatchNormalization interact in complex ways.