Handling missing values using Rule Engine vs Missing Value node

Hi, I recently learnt how to use the Rule Engine node and found that it too is able to handle missing values to an extent.

So it got me thinking about what advantages or disadvantages the Rule Engine has over the Missing Value Node when it comes to identifying and replacing missing values.

Hi @Leongwd,

interesting question 🙂

The Missing Value node has the advantage that you can replace missing values in multiple columns at the same time, by defining how the values should be replaced either per data type or per column. You don’t need to implement any custom logic; you simply pick from an extensive list of imputation options. One important limitation, though: each replacement is either a fixed value you enter or is computed from the column itself, e.g. the column mean or the most frequent value.

With the Rule Engine node you can only replace missing values in one column per node. Its advantage is that you can implement more complicated rules to impute your missing values. For example, suppose we want to replace the missing values in Column A with the fixed value x if Column B has the value 2, and with the fixed value y otherwise. Something like this is not possible with the Missing Value node, as it does not take the values of other columns into account.
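
As a rough sketch, the Column A / Column B example could be written in the Rule Engine roughly like this. It assumes the columns are literally named Column A and Column B, that Column A is a string column, and that "x" and "y" stand in for your actual replacement values. Rules are checked from top to bottom and the first matching rule wins, so the final TRUE rule keeps every non-missing value unchanged.

```
// Column A is missing and Column B is 2: use the placeholder value x
(MISSING $Column A$) AND ($Column B$ = 2) => "x"
// Column A is missing in every other case: use the placeholder value y
MISSING $Column A$ => "y"
// Column A is not missing: keep the original value
TRUE => $Column A$
```

To overwrite the original values, the node would be set to replace Column A rather than append a new column.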



Hi @Leongwd & @Kathrin

Thanks @Leongwd for asking this interesting question on the forum and thanks @Kathrin for your clear explanation.

In addition to what @Kathrin said, I would add that the -Missing Value- node is able to replace missing values with values that the -Rule Engine- node cannot calculate, e.g. Mean, Min, Max, Previous Value, etc. This is also a great advantage in itself.

I hope this additional information will also be useful.




@aworker @Kathrin Thank you both for the clarification, it was of great help.

