Missing value node has long running times for many columns

Dear KNIME users,

Yesterday I was looking for a way to filter out rows containing missing values. Most forum posts suggest the Missing Value node, so I tried it. However, with ~1000 columns and 3 rows, the node takes about 5 minutes to execute. As far as I can see, the main issue is the PMML generation (which is non-standard when I exclude the row instead of filling in a value): the node hangs at around 43% for most of those 5 minutes.
I benchmarked the timings for different numbers of columns (using 100 iterations of the benchmark nodes):
1 column: ~0.5s
~100 columns: ~40s
~500 columns: ~180s
~1100 columns: ~300s

(Using Win10, 16GB RAM, KNIME version 4.1.2)

Is there any way to skip the PMML generation, or a more efficient way to remove rows with missing values?
The final table will contain ~200,000 rows, so I guess transposing and filtering would not be very efficient either.

Thank you for your help!
Best,
Jennifer

Hi @jenniferh -

Thanks for the detailed writeup. I can confirm this same behavior with the Missing Value node in KNIME 4.2.1 as well. Your idea for an optional PMML generation switch is an interesting one - let me run this by the developers and see if I can get some feedback and/or a possible workaround. :slight_smile:

EDIT: went ahead and created a ticket for this. (Internal: AP-15071)
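
In the meantime, here is a rough sketch of the operation itself, in case a scripting node is an option for you. It is not what the Missing Value node does internally; it only illustrates that dropping rows with missing cells is a single pass over the table with no PMML involved. The function name `drop_rows_with_missing` and the sample table are made up for illustration, and missing cells are assumed to show up as `None` or NaN. If you have pandas available in a Python scripting node (an assumption about your setup), the equivalent call would be `DataFrame.dropna(axis=0, how="any")`.

```python
def drop_rows_with_missing(rows):
    """Keep only rows in which no cell is missing (None or float NaN).

    `rows` is a list of rows, each row a list of cell values.
    This is an illustrative helper, not a KNIME API.
    """
    def is_missing(cell):
        # NaN is the only float that is not equal to itself.
        return cell is None or (isinstance(cell, float) and cell != cell)

    return [row for row in rows if not any(is_missing(c) for c in row)]


# Hypothetical sample table: 3 rows, 3 columns.
table = [
    [1.0, "a", 3],
    [None, "b", 4],           # missing value in the first column
    [2.5, "c", float("nan")]  # NaN in the last column
]

clean = drop_rows_with_missing(table)
print(clean)  # [[1.0, 'a', 3]]
```

The work grows with rows times columns, so this should stay manageable for wide tables, though I have not timed it against your data.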
