I have read in a few places that it is important to have equal-sized sample sets. For example, if you were training an image-recognition network to differentiate between cats and dogs, what would be the effect of having, say, 1,000 cat samples but 10,000 dog samples?
Is there (at least an approximate) measure of how unbalanced sample sets affect a model?
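To make the question concrete, here is a rough sketch of the kind of experiment I have in mind for measuring this, using scikit-learn and a synthetic dataset (the imbalance ratios are made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Train the same classifier at increasing levels of class imbalance
# and look at per-class recall rather than overall accuracy.
for minority_frac in [0.5, 0.2, 0.05]:  # 50/50 down to 5/95 split
    X, y = make_classification(
        n_samples=20_000, n_features=20,
        weights=[1 - minority_frac, minority_frac],
        random_state=0,
    )
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, stratify=y, random_state=0
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # Recall on the minority class tends to degrade as the imbalance
    # grows, even while overall accuracy can stay high.
    rec = recall_score(y_te, clf.predict(X_te), average=None)
    print(f"minority={minority_frac:.0%}  per-class recall={rec}")
```

Is per-class recall (or something like it) the right way to quantify the damage, or is there a more standard measure?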
If one sample set had more internal variation (let's say there were 10x more species of dog than of cat), would you want to collect more dog samples, or would it be better to keep the counts equal?