I am working on a data set whose categorical columns contain many categories, so before I perform one-hot encoding I thought I would bin the data using the rule set editor, as I didn’t find any other node that can apply a lambda function. In the rule set editor, however, we can only perform binning for one column at a time. Can someone suggest a better way to perform binning than using the rule set editor multiple times, once for each categorical variable?
Thanks in advance,
Try the “One to Many” node (select all categorical columns), then apply the “Column Aggregator”
node with an appropriate aggregation method such as “Maximum” to merge multiple values of a specific categorical
variable into a single bin. I hope it’s quicker than writing the rules manually.
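To make the idea concrete outside the workflow editor, here is a minimal plain-Python sketch of the same two steps: create one indicator column per category, then take the row-wise maximum over the indicators that belong to each bin. The sample data, the `bins` grouping, and the function names `one_to_many` and `aggregate_max` are all made up for illustration; they only mimic what the two nodes do and are not the nodes' actual implementation.

```python
# Hypothetical sample data: one categorical column with several values.
rows = [{"city": "Paris"}, {"city": "Lyon"}, {"city": "Berlin"}, {"city": "Tokyo"}]

# Hypothetical grouping: which original categories are merged into which bin.
bins = {"France": ["Paris", "Lyon"], "Other": ["Berlin", "Tokyo"]}


def one_to_many(rows, column):
    """Create one 0/1 indicator per distinct value of `column` (one-hot style)."""
    values = sorted({row[column] for row in rows})
    return [{value: int(row[column] == value) for value in values} for row in rows]


def aggregate_max(indicators, bins):
    """Merge indicator columns into one column per bin via the row-wise maximum."""
    return [
        {
            name: max(indicator.get(member, 0) for member in members)
            for name, members in bins.items()
        }
        for indicator in indicators
    ]


indicators = one_to_many(rows, "city")
binned = aggregate_max(indicators, bins)

for row in binned:
    print(row)
```

The maximum works here because a bin's indicator should be 1 as soon as any of its member categories is 1 for that row.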
Hi Martin, thanks for the reply. Even with the Column Aggregator node we won’t be able to perform binning for multiple categorical variables at the same time, right? We would need to use this node multiple times, once for each categorical variable. Is there a better way to do this, for example with a single node, rather than performing the same operation multiple times and consuming more memory?
Basically, what we call binning applies to numeric data.
Binning categorical data is not impossible, but IMHO it is necessary to map
each categorical column to a corresponding numeric column in such a way that the categorical bins correspond to numeric ranges.
Then you can use the “Binner (Dictionary)” node inside a loop; it expects a dictionary table with the defined numeric bins as its second input.
Unfortunately, I think it doesn’t take less effort than doing it with the Rule Engine node or the way I have described.
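As a rough illustration of the mapping idea, the plain-Python sketch below assigns a numeric code to every category, looks the code up in a per-column dictionary of numeric ranges, and repeats this for every column in one loop. Everything in it is an assumption made for the example: the sample columns, the `codes` and `bin_dictionary` tables, the inclusive range bounds, and the helper names `bin_value` and `bin_columns`. It does not reproduce how the KNIME node treats bounds or missing values.

```python
# Hypothetical sample data with two categorical columns.
rows = [
    {"size": "S", "grade": "A"},
    {"size": "XL", "grade": "C"},
    {"size": "M", "grade": "B"},
]

# Step 1 (assumed mapping): category -> numeric code, per column.
codes = {
    "size": {"S": 1, "M": 2, "L": 3, "XL": 4},
    "grade": {"A": 1, "B": 2, "C": 3},
}

# Step 2 (assumed dictionary): numeric bins per column as (lower, upper, label),
# with both bounds treated as inclusive in this sketch.
bin_dictionary = {
    "size": [(1, 2, "small"), (3, 4, "large")],
    "grade": [(1, 1, "top"), (2, 3, "rest")],
}


def bin_value(code, ranges):
    """Return the label of the first range containing `code`, or None."""
    if code is None:
        return None
    for lower, upper, label in ranges:
        if lower <= code <= upper:
            return label
    return None


def bin_columns(rows, codes, bin_dictionary):
    """Bin every column listed in `bin_dictionary` in a single pass over the rows."""
    result = []
    for row in rows:
        new_row = dict(row)
        for column, ranges in bin_dictionary.items():
            code = codes[column].get(row[column])  # unknown category -> None
            new_row[column] = bin_value(code, ranges)
        result.append(new_row)
    return result


binned = bin_columns(rows, codes, bin_dictionary)

for row in binned:
    print(row)
```

The effort is still in writing the two lookup tables for each column, which is why this is not really less work than writing the rules directly.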