Hi there,
Is there an equivalent to xgboost's colsample_by* parameters? The idea behind xgboost's colsample_bytree, colsample_bylevel, and colsample_bynode is to specify the fraction of feature columns to subsample at the tree, level, and node, respectively.
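For reference, this is roughly what I mean on the xgboost side (the parameter values are just illustrative, not what I actually use):

import xgboost as xgb

# Illustrative only: subsample 80% of the feature columns per tree,
# per depth level, and per split node.
model = xgb.XGBClassifier(
    colsample_bytree=0.8,
    colsample_bylevel=0.8,
    colsample_bynode=0.8,
)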
I find that my tfdf gradient boosted tree models become overly reliant on a handful of features, and I was wondering if there is a way to balance out the importances. Although performance on test data is good, I want to reduce the risk that one of those features going wrong in production severely impacts my predictions.
Below is the way I am currently calculating importances. Perhaps I am doing something wrong here:
feature_importances = {}
for feature, imp_score in model.make_inspector().variable_importances()["SUM_SCORE"]:
    # feature[0] is the column name of the feature.
    feature_importances[feature[0]] = imp_score
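I then just rank the features to see which ones dominate, along these lines:

# Print features sorted by importance, largest first.
for name, score in sorted(feature_importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")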
Thank you!