Feature selection and tree size in TF-DF

Question on feature selection:

I was looking at figure 6 of

It mentions feature subset selection at, say, 95% of cumulative loss reduction.

This was also a common question from leads in launch reviews of Ranklab models, along the lines of: "Can a much smaller number of features get almost as much accuracy as the full model?"

Curious to hear your thoughts on this, and what support TFDF might have for it.

Or, if you feel there is an established industry approach for this that we can build on top of TFDF, that would be useful as well.

There is another dimension to this question, about the choice of number of trees in the model. For instance, section 7 of the aforementioned paper claims:

We have presented a tradeoff between the number of boosted decision trees and accuracy. It is advantageous to keep the number of trees small to keep computation and memory contained.

Curious to hear your thoughts about this as well.

Hello @Gaurav_Chakravorty

Following the TensorFlow tutorials, you can approach feature subset selection by calculating feature importances with common metrics such as mean decrease in impurity. TFDF also offers built-in methods for computing these importances, which you can use to select a much smaller feature subset while retaining most of the accuracy.
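As a minimal sketch, inspecting the built-in importances and retraining on a reduced feature set could look like this (assuming a pandas DataFrame `df` with a "label" column; the available importance keys, such as "SUM_SCORE" used here, depend on the model type and TF-DF version):

```python
import tensorflow_decision_forests as tfdf

train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="label")

# Train a full model and inspect per-feature importances.
model = tfdf.keras.GradientBoostedTreesModel()
model.fit(train_ds)

inspector = model.make_inspector()
# Dict of importance name -> list of (feature, value) pairs.
for name, importances in inspector.variable_importances().items():
    print(name, importances[:5])

# Retrain using only the top features (the choice of 10 is arbitrary here).
top_features = [f.name for f, _ in inspector.variable_importances()["SUM_SCORE"][:10]]
small_model = tfdf.keras.GradientBoostedTreesModel(
    features=[tfdf.keras.FeatureUsage(name=f) for f in top_features],
    exclude_non_specified_features=True,
)
small_model.fit(train_ds)
```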

Regarding the number of trees in the model, it is a trade-off: with more trees the model can learn more complex relationships, but the computation and memory cost grows and a boosted model can eventually start to overfit; with too few trees the model may underfit. So tuning the number of trees against accuracy is required.
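A rough way to see this trade-off is to sweep the `num_trees` hyper-parameter and compare accuracy (a sketch assuming the `train_ds` from above plus a held-out `test_ds`; the tree counts are arbitrary):

```python
import tensorflow_decision_forests as tfdf

for num_trees in [25, 50, 100, 300]:
    model = tfdf.keras.GradientBoostedTreesModel(num_trees=num_trees)
    model.fit(train_ds)
    model.compile(metrics=["accuracy"])
    evaluation = model.evaluate(test_ds, return_dict=True)
    print(f"num_trees={num_trees}: accuracy={evaluation['accuracy']:.4f}")
```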

In the current scenario, SHAP (Shapley) values can also be used to select important features from the dataset; there are other methods as well, such as the TCAV algorithm.
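As an illustration of SHAP-based feature ranking (the `shap` TreeExplainer does not read TF-DF models directly, so a scikit-learn gradient-boosted model stands in here; `X` and `y` are assumed to already exist):

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Fit a stand-in tree ensemble on existing features X and labels y.
model = GradientBoostingClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Rank features by mean absolute SHAP value and keep the top 10.
mean_abs = np.abs(shap_values).mean(axis=0)
top_idx = np.argsort(mean_abs)[::-1][:10]
print("Top features by mean |SHAP|:", top_idx)
```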