Variable Importance Clustering

Hi everyone, I wonder if you might consider adding a node to calculate variable (predictor) importance in clustering, which can be a valuable input for explaining cluster results. Such a test is documented in the algorithm guide of a commercial software package (link 50 MB).

The definition of predictor importance according to the documentation is:

“Predictor importance indicates how well the variable can differentiate different clusters. For both range (numeric) and discrete variables, the higher the importance measure, the less likely the variation for a variable between clusters is due to chance and more likely due to some underlying difference.”

And here is an excerpt of the documentation.

Best Regards

Mau

Hi @mauuuuu5 -

Looks like this is an application of some pretty standard chi-square tests to come up with the relevant importance metrics. We haven’t had a lot of requests for calculation of feature importance with respect to clustering, but this one seems simple enough. I’ll add it to the list and ask some folks internally about it.
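For anyone who wants to try the idea before a node exists, here is a rough Python sketch of how such a score could be computed outside the platform. It is not the commercial tool's documented algorithm: I am assuming a chi-square test of independence for discrete variables, a one-way ANOVA F-test for numeric ones, and -log10(p) as the importance score. The function name `cluster_importance`, the `columns` layout, and the toy data are made up for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway


def cluster_importance(columns, labels, eps=1e-300):
    """Return {name: -log10(p)} per variable, given cluster labels.

    columns maps a variable name to (values, kind), where kind is
    "discrete" or "numeric" (hypothetical layout, for illustration only).
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    scores = {}
    for name, (values, kind) in columns.items():
        values = np.asarray(values)
        if kind == "discrete":
            # Contingency table of category counts (rows) per cluster (columns).
            cats = np.unique(values)
            table = np.array(
                [[np.sum((values == c) & (labels == k)) for k in clusters]
                 for c in cats]
            )
            p = chi2_contingency(table)[1]
        else:
            # Assumption: one-way ANOVA across clusters for numeric variables.
            groups = [values[labels == k].astype(float) for k in clusters]
            p = f_oneway(*groups)[1]
        # Smaller p-value -> higher score; eps guards against log10(0).
        scores[name] = -np.log10(max(p, eps))
    return scores


# Toy data: two clusters of five rows each.
labels = [0] * 5 + [1] * 5
columns = {
    "income": ([1, 2, 3, 2, 1, 10, 11, 12, 11, 10], "numeric"),
    "noise": ([5, 6, 5, 6, 5, 6, 5, 6, 5, 6], "numeric"),
    "segment": (list("aaaabbbbba"), "discrete"),
}
scores = cluster_importance(columns, labels)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(name, round(float(score), 3))
```

With this toy data, the variable that separates the two clusters cleanly ranks first and the near-constant one ranks last. Keep in mind that the p-values are only a ranking aid here: the clusters were fitted on the same data, so they should not be read as formal significance tests.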

I know we implemented the Silhouette Coefficient node based on community feedback, so we’ll see what happens. Thanks for the link!

Hi Scott, thanks for looking at it. Highly appreciated.

Best regards

Mau
