Random Forest for categorical Data

Hello
I have categorical data. Does KNIME have nodes that deal with such data for machine learning or clustering?

Best
Malik

Hi Malik,

If each nominal/categorical variable has only a few distinct values, I suggest converting them to dummy variables and then using regular clustering methods on them.
For that you can use the “One to Many” node; you may need to run the “Domain Calculator” node before it.
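
To illustrate what converting to dummy variables means, here is a minimal sketch in plain Python. It is not what the “One to Many” node runs internally; the function name `one_hot` and the sample column are made up for the example.

```python
def one_hot(values):
    """Return (categories, rows); each row is a 0/1 indicator list."""
    # Collect the distinct categories once, sorted so the column order is stable.
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one indicator is set per input value
        rows.append(row)
    return categories, rows


# Hypothetical sample column with three distinct categories.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot(colors)

print(categories)
for original, row in zip(colors, encoded):
    print(original, row)
```

Each distinct value becomes its own 0/1 column, which is why this approach is most comfortable when a variable has only a few values.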

Armin

Hi @armingrudd
Thanks for your reply. Unfortunately, the nominal/categorical variable has about 100 distinct values (I have them as numbers 0 to 99).
I tried your approach and got “WARN One to Many 3:198 column: index#59 has no possible values”.
I need the classifier to know that these are categorical values, so that the distance between 3 and 7 is the same as the distance between 3 and 10.
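
The distance requirement can be checked with a small sketch, assuming Euclidean distance and a one-hot encoding of the 100 codes. The helper names are hypothetical, and this is plain Python rather than anything KNIME-specific.

```python
import math


def one_hot_vector(code, n_categories):
    """Indicator vector with a single 1.0 at position `code`."""
    vec = [0.0] * n_categories
    vec[code] = 1.0
    return vec


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


N = 100  # codes 0 to 99, as in the post above

# Treating the codes as plain numbers makes 3 closer to 7 than to 10.
raw_3_7 = abs(3 - 7)
raw_3_10 = abs(3 - 10)

# After one-hot encoding, every pair of different categories is equally far apart.
d_3_7 = euclidean(one_hot_vector(3, N), one_hot_vector(7, N))
d_3_10 = euclidean(one_hot_vector(3, N), one_hot_vector(10, N))

print(raw_3_7, raw_3_10)
print(d_3_7 == d_3_10)
```

With indicator columns, any two different categories differ in exactly two positions, so their distance no longer depends on the numeric codes.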

Malik

Maybe this topic helps you in this case.

Armin

Dear @armingrudd
I just figured out that the H2O Random Forest node can deal with categorical features.

Best
Malik

Hi Malik,
Try the “Domain Calculator” node before the “One to Many” node. In the “Domain Calculator” node, change the value of “Restrict number of possible values” from 60 to 100 or more.
The problem is the default limit on the number of possible values in a categorical column. The default is 60 for every categorical column. If your column contains more than 60 distinct values, the “Spec” is not generated. If the “Spec” is missing, the “One to Many” node (and some other nodes) cannot work with that column. The only way to generate the missing “Spec” is the “Domain Calculator” node.
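
As a rough mental model of the behaviour described above (a toy sketch, not KNIME's actual implementation; the function `possible_values` and its parameter are invented for the example), a capped list of possible values could look like this:

```python
def possible_values(column, max_values=60):
    """Return the sorted distinct values, or None once the cap is exceeded."""
    seen = set()
    for value in column:
        seen.add(value)
        if len(seen) > max_values:
            # Too many distinct values: give up, no list of possible values.
            return None
    return sorted(seen)


# Hypothetical column holding the codes 0..99, like the data in this thread.
column = [i % 100 for i in range(1000)]

default_domain = possible_values(column)                 # cap of 60
raised_domain = possible_values(column, max_values=100)  # cap raised to 100

print(default_domain)
print(len(raised_domain))
```

With the cap of 60, the 100 distinct codes yield no list of possible values, which matches the “has no possible values” warning; raising the cap to 100 lets the full list be built.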

Internally, the algorithm also performs some kind of categorical → numeric encoding on the categorical fields. You can configure the type of encoding within the node. However, similar encodings can be done with other KNIME nodes as well, in case you want to use other algorithms.
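
As one example of such an encoding, here is a hedged sketch of binary encoding, which is sometimes used for columns with many categories. Whether the node uses this particular scheme is not stated in this thread; the function name `binary_encode` is hypothetical.

```python
def binary_encode(code, n_categories):
    """Encode an integer category code as a fixed-width list of bits."""
    # Number of bits needed to represent the largest code (n_categories - 1).
    width = max(1, (n_categories - 1).bit_length())
    # Most significant bit first.
    return [(code >> shift) & 1 for shift in reversed(range(width))]


# 100 categories fit into 7 bit columns instead of 100 indicator columns.
print(len(binary_encode(0, 100)))
print(binary_encode(3, 100))
print(binary_encode(99, 100))
```

This is far more compact than one indicator column per value, but the categories are then no longer all equally far apart, so it suits tree-based learners better than distance-based clustering.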
