I have the following plot of learning rate finder results (following the principles of Smith (2015)), produced with this code example. The region where the loss drops is very narrow; what does that mean? The optimal learning rate found is 1.5655376e-05, and the batch size used in the executed code is 512.
I then made the same plot with Plotly to see it better and zoomed in on the section of interest, as follows. The optimal learning rate is not even within the section with the abrupt drop; it is located much earlier. Why is the chosen optimal learning rate so far from the abrupt drop?
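For reference, one common way a finder picks its suggested value is to take the learning rate at which the smoothed loss falls fastest with respect to log learning rate. Whether the linked code uses this rule is an assumption, not something confirmed here. The sketch below runs on synthetic data; `suggest_lr`, its parameters, and the loss curve are all hypothetical and only illustrate the heuristic.

```python
import numpy as np

def suggest_lr(lrs, losses, smooth=0.9, skip_start=5, skip_end=5):
    """Return the learning rate at the steepest descent of the smoothed loss.

    Hypothetical helper: a sketch of the steepest-gradient heuristic,
    not the implementation used by any particular library.
    """
    lrs = np.asarray(lrs, dtype=float)
    losses = np.asarray(losses, dtype=float)

    # Exponential moving average with bias correction to reduce noise.
    smoothed = np.empty_like(losses)
    avg = 0.0
    for i, loss in enumerate(losses):
        avg = smooth * avg + (1.0 - smooth) * loss
        smoothed[i] = avg / (1.0 - smooth ** (i + 1))

    # Slope of the smoothed loss with respect to log10(learning rate).
    grads = np.gradient(smoothed, np.log10(lrs))

    # Ignore the first and last few points, where the curve is unreliable.
    window = slice(skip_start, len(lrs) - skip_end)
    idx = skip_start + int(np.argmin(grads[window]))
    return lrs[idx]

# Synthetic range test (assumed shape): a flat start, a drop centred
# near 1e-3, then divergence at large learning rates.
lrs = np.logspace(-7, 0, 200)
log_lr = np.log10(lrs)
losses = 2.0 - 1.0 / (1.0 + np.exp(-8.0 * (log_lr + 3.0))) + np.exp(6.0 * (log_lr + 0.5))

suggested = suggest_lr(lrs, losses)
print(f"suggested learning rate: {suggested:.3e}")
```

With this rule the suggestion lands inside the drop. A finder that reports a value well before the drop may be applying a different rule, for example dividing the learning rate at the minimum loss by a fixed factor, or may be reacting to a small early dip in the curve, so it is worth checking which rule the code actually applies.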