I looked into TFDF and it is really interesting. When I went through the examples, specifically the regression one, I noticed {'loss': 0.0, 'mse': 4.355661392211914}.
I'm wondering: if the loss is 0, shouldn't 'mse' also be zero? It also suggests the model memorized the training data.
I'm confused here. Would you be able to explain why the loss is 0.0 while the mse is 4.355661392211914?
A loss of 0.0 in TFDF with a non-zero MSE usually indicates overfitting. The model fit the training data perfectly (loss of 0) but isn’t generalizing well (high MSE). Try regularizing your model or using a validation set to catch this early.
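If you want to try both of those levers, here is a minimal sketch, not code from the original notebook: the dataset is synthetic and hypothetical, and the hyperparameter names (num_trees, max_depth, l2_regularization) are the ones I believe tfdf.keras.GradientBoostedTreesModel exposes, so double-check them against the TFDF docs. As I understand it, a GBT model also holds out part of the training data as a validation set by default, which is what gives you a validation loss at all.

import numpy as np
import pandas as pd
import tensorflow_decision_forests as tfdf

# Tiny synthetic regression dataset, for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=500), "x2": rng.normal(size=500)})
df["y"] = 3 * df["x1"] - 2 * df["x2"] + rng.normal(scale=0.5, size=500)
train_df, test_df = df.iloc[:400], df.iloc[400:]

train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="y", task=tfdf.keras.Task.REGRESSION)
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label="y", task=tfdf.keras.Task.REGRESSION)

# Constrain the trees so the model cannot simply memorize the training examples.
model = tfdf.keras.GradientBoostedTreesModel(
    task=tfdf.keras.Task.REGRESSION,
    num_trees=200,
    max_depth=4,
    l2_regularization=0.01,
)
model.fit(train_ds)

model.compile(metrics=["mse"])
print(model.evaluate(test_ds, return_dict=True))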
import math

# Evaluate the model on the test dataset.
model_7.compile(metrics=["mse"])
evaluation = model_7.evaluate(test_ds, return_dict=True)
print(evaluation)
print()
print(f"MSE: {evaluation['mse']}")
print(f"RMSE: {math.sqrt(evaluation['mse'])}")
Should I ignore that value?
Then how do I get the real training loss and validation loss values to understand model performance?
# `model` is the trained TFDF model (model_7 in the snippet above).
insp = model.make_inspector()
# Loss from the model's self-evaluation (the validation dataset for GBT models).
print(insp.evaluation().loss)
# The full sequence of validation losses is available in the training logs:
print([log.evaluation.loss for log in insp.training_logs()])
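If it helps to visualize that sequence, here is a small sketch (assuming matplotlib is installed and insp is the inspector created above) that plots the validation loss against the number of trees:

import matplotlib.pyplot as plt

logs = insp.training_logs()
plt.plot([log.num_trees for log in logs], [log.evaluation.loss for log in logs])
plt.xlabel("Number of trees")
plt.ylabel("Validation loss")
plt.show()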
For the training logs, the procedure is a bit different:
raw_logs = insp.specialized_header().training_logs.entries
# Prints a list of all training losses
print([log.training_loss for log in raw_logs])
Note that only GradientBoostedTrees models have validation losses and training losses (since only they use a validation dataset). For RandomForests, these losses are either unavailable or computed on the out-of-bag dataset.
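For completeness, a hedged sketch of the RandomForest case: the out-of-bag (OOB) evaluation is exposed through the same inspector. The compute_oob_performances hyperparameter and the rmse field of the evaluation are my recollection of the TFDF API, not something stated above, so verify them against the documentation.

import tensorflow_decision_forests as tfdf

rf_model = tfdf.keras.RandomForestModel(
    task=tfdf.keras.Task.REGRESSION,
    compute_oob_performances=True,  # believed to be the default; needed for OOB logs
)
rf_model.fit(train_ds)  # reuse a training dataset such as train_ds above

rf_insp = rf_model.make_inspector()
# Self-evaluation computed on the out-of-bag examples.
print(rf_insp.evaluation())

# OOB metrics as a function of the number of trees (rmse for regression).
print([(log.num_trees, log.evaluation.rmse) for log in rf_insp.training_logs()])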