Hello

I have categorical data. Does KNIME have nodes that handle such data for machine learning or clustering?

Best

Malik

Hi Malik,

If each nominal/categorical variable has only a few distinct values, I suggest converting them to dummy variables and then applying regular clustering methods to them.

For that you can use the “One to Many” node, and you may need to use the “Domain Calculator” node before it.
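To make the dummy-variable idea concrete outside KNIME, here is a minimal Python sketch, assuming the column is a plain list of category labels. `one_hot_encode` is a hypothetical helper, not a KNIME or library API; it only shows roughly what the “One to Many” node produces: one 0/1 column per distinct value, with exactly one 1 per row.

```python
# Minimal sketch of dummy (one-hot) encoding in plain Python.
# one_hot_encode is a hypothetical helper, not a KNIME API.

def one_hot_encode(values):
    """Return (categories, rows): one 0/1 dummy column per distinct category."""
    # The sorted distinct values play the role of the column's domain.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one dummy column is set per row
        rows.append(row)
    return categories, rows


if __name__ == "__main__":
    cats, dummies = one_hot_encode(["red", "green", "red", "blue"])
    print(cats)  # ['blue', 'green', 'red']
    for row in dummies:
        print(row)
```

The resulting 0/1 columns are ordinary numeric features, so any standard distance-based clustering method can use them.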

Armin

Hi @armingrudd

Thanks for your reply. Unfortunately, the nominal/categorical variable has about 100 values (I have it encoded as the numbers 0 to 99).

I have tried your approach, but I got “WARN One to Many 3:198 column: index#59 has no possible values”.

I need the classifier to know that these are categorical values, so that the distance between 3 and 7 is the same as the distance between 3 and 10.
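A small Python sketch of why dummy encoding gives exactly this property, assuming the codes 0 to 99 are one-hot encoded and compared with Euclidean distance (the helper names are hypothetical). As raw numbers, 3 is closer to 7 than to 10; as one-hot vectors, any two different codes are the same distance apart, √2.

```python
import math

def one_hot(code, n_categories):
    """One-hot vector for an integer category code (hypothetical helper)."""
    vec = [0.0] * n_categories
    vec[code] = 1.0
    return vec

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

N = 100  # codes 0..99, as in the question

# Treated as plain numbers, the codes have unequal distances.
print(abs(3 - 7), abs(3 - 10))  # 4 7

# After one-hot encoding, both pairs are equally far apart (sqrt(2)).
d_3_7 = euclidean(one_hot(3, N), one_hot(7, N))
d_3_10 = euclidean(one_hot(3, N), one_hot(10, N))
print(d_3_7, d_3_10)
```

Any two distinct one-hot vectors differ in exactly two positions, which is why the distance no longer depends on the numeric value of the codes.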

Malik

Dear @armingrudd

I just figured out that the H2O Random Forest node can deal with categorical features.

Best

Malik

Hi Malik,

Try the “Domain Calculator” node before the “One to Many” node. In the “Domain Calculator” node, change the value of “Restrict number of possible values” from 60 to 100 or more.

The problem is the default limit on the number of possible values in a categorical column, which is 60 for every categorical column. If your column contains more than 60 distinct values, the “Spec” (the column's list of possible values) will not be generated. If the “Spec” is missing, the “One to Many” node (and some other nodes) cannot work with that column. The only way to generate the missing “Spec” is the “Domain Calculator” node.
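The behaviour described above can be modelled in a few lines of Python. This is a hypothetical simulation based only on the description in this post, not KNIME code: `compute_possible_values` stands in for the domain (“Spec”) calculation with its default limit of 60, and `one_to_many` refuses to run when no possible values were recorded, much like the warning Malik got.

```python
# Hypothetical model of the behaviour described above; not KNIME code.
# compute_possible_values and one_to_many are illustrative names only.

DEFAULT_MAX_POSSIBLE_VALUES = 60  # default limit mentioned in the post

def compute_possible_values(values, max_possible_values=DEFAULT_MAX_POSSIBLE_VALUES):
    """Return the sorted distinct values, or None if there are more than the limit."""
    distinct = sorted(set(values))
    if len(distinct) > max_possible_values:
        return None  # no possible values recorded for this column
    return distinct

def one_to_many(values, possible_values):
    """Expand a column into 0/1 columns, one per possible value."""
    if possible_values is None:
        raise ValueError("column has no possible values")
    return [[1 if v == p else 0 for p in possible_values] for v in values]

codes = list(range(100))  # a column with 100 distinct codes, 0..99

print(compute_possible_values(codes))  # None, because 100 > 60
domain = compute_possible_values(codes, max_possible_values=100)
rows = one_to_many(codes, domain)
print(len(domain), len(rows[0]))  # 100 100
```

Raising the limit to at least the number of distinct values is what lets the expansion step proceed, which mirrors the suggested change from 60 to 100 or more.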

Internally, the algorithm also performs some kind of categorical → numeric encoding on categorical fields. You can configure the type of encoding within the node. However, similar encodings can be done with other KNIME nodes as well, in case you want to use other algorithms.
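For comparison, here is a plain-Python sketch of one other categorical → numeric encoding, binary encoding. It is an illustrative assumption, not a statement of which encodings the node offers or how it implements them. With 100 categories it needs only 7 columns instead of 100, but unlike one-hot encoding it does not keep all pairs of categories equally far apart.

```python
def binary_encode(code, n_categories):
    """Encode an integer category code as a fixed-width list of bits (MSB first)."""
    # Enough bits to represent the largest code, n_categories - 1.
    width = max(1, (n_categories - 1).bit_length())
    return [(code >> bit) & 1 for bit in reversed(range(width))]

def hamming(a, b):
    """Number of positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

print(binary_encode(99, 100))  # [1, 1, 0, 0, 0, 1, 1]

# Distances between categories now depend on their bit patterns:
print(hamming(binary_encode(3, 100), binary_encode(7, 100)))   # 1
print(hamming(binary_encode(3, 100), binary_encode(10, 100)))  # 2
```

The trade-off is compactness versus equal spacing: binary encoding keeps the feature count small, but if the requirement is that 3 vs 7 and 3 vs 10 be equally distant, one-hot encoding is the one that guarantees it.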
