Variable Importance Clustering

mauuuuu5 · December 17, 2020, 1:24am

Hi everyone, I wonder if you might consider adding a node to calculate variable (predictor) importance in clustering, which would be a valuable input for explaining cluster results. Such a test is documented in the algorithm guide of a commercial software package (link, 50 MB).

The definition of predictor importance according to the documentation is:

“Predictor importance indicates how well the variable can differentiate different clusters. For both range (numeric) and discrete variables, the higher the importance measure, the less likely the variation for a variable between clusters is due to chance and more likely due to some underlying difference.”
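As a rough illustration of the definition above for a range (numeric) variable, here is a sketch that uses a permutation test. It shuffles the cluster labels many times and counts how often the between-cluster sum of squares is at least as large as the observed one. This is not the commercial software's documented procedure. The function names, the 999-permutation default, and defining importance as `1 - p_value` are all hypothetical choices made for illustration. Because the total sum of squares does not change when labels are shuffled, ranking permutations by between-cluster sum of squares gives the same result as ranking them by the one-way ANOVA F statistic.

```python
import random
from collections import defaultdict


def between_cluster_ss(values, labels):
    """Between-cluster sum of squares of a numeric variable."""
    grand = sum(values) / len(values)
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    return sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())


def numeric_importance(values, labels, n_perm=999, seed=0):
    """Hypothetical helper: permutation p-value and importance = 1 - p (assumption)."""
    rng = random.Random(seed)
    observed = between_cluster_ss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between values and clusters
        if between_cluster_ss(values, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return p_value, 1.0 - p_value
```

A well-separated variable, such as `numeric_importance([1, 2, 3, 1, 2, 10, 11, 12, 10, 11], [0] * 5 + [1] * 5)`, gets a small p-value and an importance near 1. A constant variable gets an importance of 0.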

And here is an excerpt of the documentation.

Best Regards

Mau

ScottF · December 18, 2020, 9:38pm

Hi @mauuuuu5 -

Looks like this is an application of some pretty standard chi-square tests to come up with the relevant importance metrics. We haven’t had a lot of requests for calculation of feature importance with respect to clustering, but this one seems simple enough. I’ll add it to the list and ask some folks internally about it.
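Here is a minimal sketch of the chi-square approach for a discrete variable. It assumes a Pearson chi-square test of independence between the variable's categories and the cluster labels. The helper names are hypothetical. To keep the example to the standard library, the p-value comes from a hand-written regularized incomplete gamma function in the Numerical Recipes style. Treating `1 - p_value` as the importance score is an assumption, not the documented formula.

```python
import math
from collections import Counter


def _upper_gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x)."""
    if x <= 0:
        return 1.0
    log_prefix = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series expansion of P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefix)
    # Continued fraction for Q(a, x) via the modified Lentz method.
    tiny = 1e-300
    b = x + 1 - a
    c, d = 1 / tiny, 1 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefix) * h


def chi2_sf(stat, df):
    """P(X >= stat) for a chi-square variable with df degrees of freedom."""
    return _upper_gamma_q(df / 2, stat / 2)


def categorical_importance(values, clusters):
    """Hypothetical helper: chi-square test of a discrete variable vs. cluster labels.

    Returns (statistic, df, p_value, importance), where importance = 1 - p_value
    is an assumption for illustration.
    """
    n = len(values)
    cells = Counter(zip(values, clusters))  # observed contingency counts
    row, col = Counter(values), Counter(clusters)
    df = (len(row) - 1) * (len(col) - 1)
    if df == 0:  # a single category or a single cluster: nothing to test
        return 0.0, 0, 1.0, 0.0
    stat = 0.0
    for v in row:
        for k in col:
            expected = row[v] * col[k] / n
            stat += (cells[(v, k)] - expected) ** 2 / expected
    p_value = chi2_sf(stat, df)
    return stat, df, p_value, 1.0 - p_value
```

For a variable that lines up perfectly with two equal clusters, such as `categorical_importance(["a"] * 10 + ["b"] * 10, [0] * 10 + [1] * 10)`, the statistic is 20.0 with 1 degree of freedom. That gives a very small p-value and an importance close to 1.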

I know we implemented the Silhouette Coefficient node based on community feedback, so we’ll see what happens. Thanks for the link!

mauuuuu5 · December 19, 2020, 1:31am

Hi Scott, thanks for looking at it; it's highly appreciated.

Best regards

Mau

system · June 19, 2021, 1:32pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.