I was looking at the documentation of optimizers for TPUs (SGD optimizer).
One of the parameters is weight_decay_factor. According to the docs, it is the
"amount of weight decay to apply; None means that the weights are not decayed. Weights are decayed by multiplying the weight by this factor each step."
So if the value of the factor is 0.3, are the weights (w) updated as follows?
w = (1-0.3)*w
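
To make the question concrete, here is a small sketch of the two readings I can see. This is plain Python, not the TPU optimizer API; the function names decay_literal and decay_complement are made up for illustration, and I do not know which of the two (if either) matches what the optimizer actually does.

```python
def decay_literal(w, factor):
    """Literal reading of the docs: multiply the weight by the factor itself."""
    return factor * w


def decay_complement(w, factor):
    """The reading asked about above: w = (1 - factor) * w."""
    return (1.0 - factor) * w


if __name__ == "__main__":
    w = 2.0        # hypothetical weight value
    factor = 0.3   # the example value of weight_decay_factor from above
    print("literal reading:   ", decay_literal(w, factor))
    print("complement reading:", decay_complement(w, factor))
```

With a factor of 0.3 the two readings give quite different weights after a single step, which is why I would like to confirm which one applies.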
I want to understand how to set the value for this parameter. What are some standard ranges?
Thank you!