I’m using an H2O Gradient Boosted Trees model to predict a binary dummy target variable.
My independent variables are binary dummies as well.
I want to use the variable importance output as a tool for the model interpretation.
- Is that a valid approach?
- Can I distinguish whether an independent variable is considered important at the 1 level or at the 0 level?
Thank you so much in advance!
The variable importance reported by these nodes is based on whether a feature was selected for a split and on the decrease in overall error that the split achieved. The higher the value, the more important the variable. Note that this measure does not tell you the direction of the effect. For further reference, please see the H2O documentation: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/variable-importance.html#feature-importance-aka-variable-importance-plots
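To illustrate the mechanism (this is a scikit-learn sketch of the same split-based idea, not H2O's exact implementation; the data and feature setup are made up for the example): each feature's importance is the total error reduction accumulated over the splits that use it, normalized across features, so it says nothing about whether the 1 or the 0 level drives the prediction.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Binary dummy predictors, as in the question (three illustrative columns).
X = rng.integers(0, 2, size=(500, 3))
# Synthetic target driven mostly by the first dummy, with 10% label noise.
y = (X[:, 0] ^ (rng.random(500) < 0.1)).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Split-based importances: total error reduction from splits on each feature,
# normalized to sum to 1. The sign/direction of the effect is not captured.
print(model.feature_importances_)
```

On this synthetic data the first column dominates the importances, but the output alone cannot tell you whether a value of 1 raises or lowers the predicted probability.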
Thank you so much Marten.
I solved my problem by switching to the non-H2O Gradient Boosted Trees node, implementing permutation feature importance to extract the most important variables, and using a partial dependence plot to identify the direction of the relationship.