Let N_train, N_val, and N_test be the numbers of examples in the training, validation, and test sets. As I understand it, these are typically chosen so that N_train >> N_val ~= N_test.
As I understand it, the loss and metrics are computed as averages over the whole training set. In that case, how can we fairly compare performance across sets of such different sizes?
Why isn’t the model’s performance instead evaluated on a subset of the training set whose size is comparable to that of the validation or test set?
One could argue that this would increase the computational cost, but we could at least draw randomly from the per-example losses already computed during training (see the sketch below).
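A minimal sketch of what I mean, using NumPy with made-up per-example losses (the sizes and distributions are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example losses: a large training set and a smaller validation set.
train_losses = rng.exponential(scale=0.5, size=50_000)  # N_train examples
val_losses = rng.exponential(scale=0.6, size=5_000)      # N_val examples

# Average loss over the full training set (the usual practice).
full_train_mean = train_losses.mean()

# Average loss over a random training subset sized like the validation set
# (the alternative I am asking about).
subset = rng.choice(train_losses, size=len(val_losses), replace=False)
subset_train_mean = subset.mean()

print(f"full-train mean loss:   {full_train_mean:.4f}")
print(f"subset-train mean loss: {subset_train_mean:.4f}")
print(f"validation mean loss:   {val_losses.mean():.4f}")
```

The subset mean fluctuates more from draw to draw, but on average it matches the full-set mean, which is exactly why I am unsure what the full-set evaluation buys us.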
Please let me know if there are reasons behind the standard approach that I am missing!