clustering nodes

es_aml_project · January 4, 2017, 6:15pm

Hi,

I am new to the KNIME world and I need some help :)

I am trying to use KNIME nodes for clustering. My data table has both numeric and nominal attributes. As I understand it, the clustering nodes only consider numeric attributes when computing the distance between examples.

Is there a way to use heterogeneous metrics for clustering?

(e.g. the distance for a categorical attribute would be 0 if two examples have the same value for that attribute, and 1 if their values differ)
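
To make the idea concrete, here is a minimal Python sketch of that 0/1 distance, averaged over several categorical attributes. The function names are hypothetical and only for illustration; they are not KNIME nodes or part of any KNIME API.

```python
def overlap_distance(a, b):
    """Return 0 if two categorical values are equal, 1 otherwise."""
    return 0 if a == b else 1


def categorical_distance(row_a, row_b):
    """Average the 0/1 overlap distance over all categorical attributes."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of attributes")
    if not row_a:
        return 0.0
    mismatches = sum(overlap_distance(a, b) for a, b in zip(row_a, row_b))
    return mismatches / len(row_a)


# Two examples that agree on colour but differ on size.
d = categorical_distance(["red", "small"], ["red", "large"])
```

Here `d` is the fraction of attributes on which the two examples disagree, so it always lies between 0 and 1.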

Thanks for your time!

Geo · January 5, 2017, 3:56pm

A few possibilities:

- Transform all attributes into numeric ones (dummy coding, with or without a PCA transformation) using One to Many and, optionally, PCA followed by Normalizer, then proceed as usual;
- Combine distance measures for each data type using Aggregated Distance. The open question is how much weight to give each distance measure. One option is to weight each measure by the number of attributes it covers;
- Look into the R integration, where heterogeneous metrics may already be implemented.
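
As a rough illustration of the second option, here is a plain Python sketch of combining a numeric and a nominal distance, weighting each by its number of attributes. It is not how the Aggregated Distance node is implemented; the function names, the range scaling, and the row layout (a pair of numeric values and nominal values) are all assumptions made for the example.

```python
import math


def numeric_distance(x, y, ranges):
    """Euclidean distance on range-scaled values, divided by sqrt(n).

    Assumes each value lies within its attribute's range, so the
    result stays in [0, 1].
    """
    if not x:
        return 0.0
    total = 0.0
    for xi, yi, r in zip(x, y, ranges):
        diff = (xi - yi) / r if r > 0 else 0.0
        total += diff * diff
    return math.sqrt(total / len(x))


def nominal_distance(a, b):
    """Fraction of nominal attributes on which the two rows differ."""
    if not a:
        return 0.0
    return sum(1 for ai, bi in zip(a, b) if ai != bi) / len(a)


def mixed_distance(row_a, row_b, ranges):
    """Combine both distances, weighting each by its attribute count."""
    num_a, nom_a = row_a
    num_b, nom_b = row_b
    n_num, n_nom = len(num_a), len(nom_a)
    total = n_num + n_nom
    if total == 0:
        return 0.0
    return (n_num * numeric_distance(num_a, num_b, ranges)
            + n_nom * nominal_distance(nom_a, nom_b)) / total


# Hypothetical rows: two numeric attributes and one nominal attribute.
row_a = ([1.0, 10.0], ["red"])
row_b = ([3.0, 10.0], ["blue"])
ranges = [4.0, 20.0]  # assumed max - min of each numeric attribute
d = mixed_distance(row_a, row_b, ranges)
```

Because both partial distances are scaled to [0, 1] before they are combined, neither data type dominates just because of its units. Other weightings are equally possible if some attributes matter more than others.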

es_aml_project · January 6, 2017, 8:14am

Thank you for your suggestions :)