I have array-values like this in my data:
[“promotionjob”, “studentenjob”, “gastrojob”]
[“messejob”, “studentenjob”, “hostessjob”]
[“promotionjob”, “eventjob”, “hostessjob”]
I want to transform the string values in the array into numeric category values so I can run a linear regression. So I think I need n separate columns, one per distinct string, each holding 1/0 depending on whether the array contains that value.
How can I transform/prepare the data this way?
Thanks for some help!
Hi @iparker !! Welcome to the forum. For one-hot encoding you can use the One to Many node:
Hi @iperez, thanks a lot for your reply!
I tried the One to Many node, but it seems that it’s not exactly what I’m looking for.
It seems that every complete cell value is turned into its own column, rather than one column per single tag that appears in the cell.
I mean: I don’t want one column [“promotionjob”, “studentenjob”, “gastrojob”] but three columns, one for each single tag value.
Hope you understand what I mean.
Based on the example that you gave in the first post, can you please draft your expected output? This makes it easier for people to jump in and help you out.
I see that your “array” is a flat string which changes things a bit.
This requires a little bit of data manipulation. The following workflow should work if you replace the Table Creator with your dataset:
Hot encoding tags.knwf (28.1 KB)
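For anyone who wants to see the underlying logic outside KNIME, here is a rough Python sketch of the idea: parse each flat string into its tags, collect the distinct tags, and write 1/0 per tag. This is only an illustration and not necessarily what the attached workflow does internally; the helper names `parse_tags` and `one_hot` are made up for this example, and it assumes tags consist only of letters, digits, and underscores.

```python
import re

# Example cells copied from the first post; each cell is a flat string, not a real list.
rows = [
    '[“promotionjob”, “studentenjob”, “gastrojob”]',
    '[“messejob”, “studentenjob”, “hostessjob”]',
    '[“promotionjob”, “eventjob”, “hostessjob”]',
]


def parse_tags(cell):
    """Extract the tag names from a flat string such as '[“a”, “b”]'.

    Assumption: tags contain only word characters, so brackets,
    commas, spaces, and quotes act as separators.
    """
    return re.findall(r"\w+", cell)


def one_hot(cells):
    """Return (column names, 1/0 rows), one column per distinct tag."""
    parsed = [parse_tags(c) for c in cells]
    # Sort the distinct tags so the column order is reproducible.
    columns = sorted({tag for tags in parsed for tag in tags})
    table = [[1 if col in tags else 0 for col in columns] for tags in parsed]
    return columns, table


columns, table = one_hot(rows)
print(columns)
for line in table:
    print(line)
```

Each input row becomes one row of 1/0 flags, which can then be used as numeric inputs for a regression.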
Thanks a lot for your reply and your workflow.
This works very well and I learned a lot from it!
Thanks a lot!
Here is a slightly cleaner, more advanced solution based on your data, which might be ready to use:
Thanks for your reply and your workflow. This works pretty well too!
Thanks a lot @iparker. I’d appreciate if you could mark the solution of your choice by clicking the solution icon in the bottom right of the corresponding post. Cheers, Mike