Missing Value dummy variable node

Hi,

Would it be possible to get a node (something like the One2Many node) that can operate on a number of columns at once and creates a new binary variable for each column, indicating whether or not there was a missing value?

Best regards,

Jay

Hi,

That is a good question. I would also like to use a node that makes dummy variables from a categorical variable. Any ideas?

 

Sándor

 

hi Sandor,

you need the one2many node for this.

best, iris

Hi Iris,

thank you!

Best, Sandor

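If you are using regression or any other correlation-based algorithm afterwards, take care to remove one of the dummies produced by One to Many for each categorical variable you have transformed. Otherwise the full set of dummies for a variable always sums to 1 and is perfectly collinear with the intercept.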
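For anyone who wants to see the effect outside KNIME, here is a minimal pandas sketch of the same idea. The `color` column and its values are made up for illustration, and `drop_first=True` is pandas' way of dropping one dummy per variable, not something the One to Many node does for you.

```python
import pandas as pd

# Hypothetical categorical column, purely for illustration.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# Full set of dummies: one column per category, each row sums to 1.
full = pd.get_dummies(df, columns=["color"], dtype=int)

# Reduced set: the first category (alphabetically) is dropped and
# becomes the reference level, which removes the exact collinearity.
reduced = pd.get_dummies(df, columns=["color"], drop_first=True, dtype=int)

print(full)
print(reduced)
```

In KNIME itself the equivalent step would be to filter out one of the generated columns per original variable after One to Many.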


To revive a very old thread (again), is there still no node to accomplish what the original poster wanted? One to Many seems to only work on nominal variables, but I’d like a way to create dummy indicators for a series of numeric variables. The new variables would each indicate whether the associated original numeric variable was missing.

I was trying to accomplish this in the Multi Column Math Formula node, but it can’t seem to handle missing values.

Hi there!

Have you tried the Column Expressions node? It has a few more functions…

If you can share a workflow or, even better, dummy input and the expected output, I can take a look :)

Br,
Ivan

Hi Ivan,

Thanks for the advice. I got what I needed by using the Math Formula Node combined with a Missing Value node. I don’t see an obvious way to do what I want within the Column Expressions node without a fair amount of manual typing. Essentially this is what I had:

Var1 Var2 Var3
1    3    2
2    3    1
3    ?    5
?    ?    1
3    2    ?

What I wanted was:

Var1 Var2 Var3 Var1_ind Var2_ind Var3_ind
1    3    2    0        0        0
2    3    1    0        0        0
3    ?    5    0        1        0
?    ?    1    1        1        0
3    2    ?    0        0        1

The Math Formula (Multi Column) node creates an indicator variable for each variable and changes all the values to 0 while leaving the missing values as missing. Then the Missing Value node changes the missing values to 1. It works fine, but it’s a bit roundabout.
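For comparison, the same input and expected output can be reproduced in a few lines of pandas. This is only a sketch of the target result, not a KNIME node configuration; it assumes the `?` cells are read in as NaN and reuses the `_ind` suffix from the table above.

```python
import numpy as np
import pandas as pd

# The example table from above, with "?" represented as NaN.
df = pd.DataFrame({
    "Var1": [1, 2, 3, np.nan, 3],
    "Var2": [3, 3, np.nan, np.nan, 2],
    "Var3": [2, 1, 5, 1, np.nan],
})

# isna() marks missing cells as True; casting to int gives 1/0 indicators.
indicators = df.isna().astype(int).add_suffix("_ind")

# Append the indicator columns to the original columns.
result = pd.concat([df, indicators], axis=1)
print(result)
```

Because the indicators are computed for the whole frame at once, this works for any number of columns without listing them by hand.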


Hi!

The Column Expressions node has an isMissing() function, so there is not a lot of typing ;)

Your approach seems just fine, but I have put together a couple of ways to deal with this using the Rule Engine node or the Column Expressions node (I would go with them rather than Math Formula), covering both a static and a variable number of columns. Take a look at the attached workflow.

2019_01_10_Detecting Missing Values.knwf (44.1 KB)

Br,
Ivan
