Balancing Dominant Feature Importances

Hi there,

Is there an equivalent to xgboost’s colsample_by* parameters? The idea behind xgboost’s colsample_by* parameters is to specify the fraction of feature columns to subsample per tree, per level, and per node.
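For reference, this is roughly how those parameters are set in xgboost (a minimal sketch; the 0.8 values are just placeholders):

import xgboost as xgb

# Each colsample_by* value is the fraction of feature columns sampled
# at that granularity: per tree, per level, and per node.
clf = xgb.XGBClassifier(
    colsample_bytree=0.8,
    colsample_bylevel=0.8,
    colsample_bynode=0.8,
)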

I find my tfdf gradient boosted tree models become obsessed with certain features, and I was wondering if there is a way to balance out the importances. Although the performance is good on test data, I am trying to reduce the risk of one of those features going wrong in production and severely impacting my predictions.

Below is the way I am currently calculating importances. Perhaps I am doing something wrong here:

feature_importances = {}
# Each entry is a (feature, importance) tuple; feature[0] is the feature's name.
for feature, imp_score in model.make_inspector().variable_importances()["SUM_SCORE"]:
    feature_importances[feature[0]] = imp_score
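For context, the resulting dict can then be sorted to see which features dominate (a minimal sketch using the feature_importances dict built above):

# Sort features by SUM_SCORE, most important first.
for name, score in sorted(feature_importances.items(), key=lambda kv: kv[1], reverse=True):
    print(name, score)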

Thank you!

Hi @shayan_sadeghieh

Xgboost’s colsample_by* parameters are not available in the tfdf GBT model. However, the subsample parameter of tfdf.keras.GradientBoostedTreesModel provides a related form of subsampling: setting a smaller value for subsample reduces the correlation between trees, which can help balance the feature importances. Alternatively, you can reduce reliance on dominant features by preprocessing the data with correlation- or mutual-information-based feature selection.
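For example, a minimal sketch of setting subsample (assuming train_ds is a tf.data.Dataset, e.g. built with tfdf.keras.pd_dataframe_to_tf_dataset):

import tensorflow_decision_forests as tfdf

# Lower subsample values train each tree on a smaller random sample of
# the training data, which reduces the correlation between trees.
# 0.5 is only a placeholder value to illustrate the parameter.
model = tfdf.keras.GradientBoostedTreesModel(subsample=0.5)
model.fit(train_ds)  # train_ds: assumed tf.data.Dataset of training examples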

Thank You
