How can i eliminate all the missing value from a data set?

Hi. I have a dataset which has few attributes with missing value. I used row filter to eliminate some row with missing value, but when i used row filter i had to define a column to test and so i have some rows that aren't be eliminated. How can I eliminate all the rows with at least one missing value?

Just use the Missing Value node for this task with the Remove Row option.

1 Like

Hi Aborg,

I tried several "Missing value" techniques but none is working for me.

I have the following two columns:

When there is an empty cell in Serialno, I would like to exclude the whole Row.

I tried "Missing value column filter", I put Serialno in Include box, run it, nothing changes. I tried out the wildcard and the type selection option. It doesnt exclude the empty cell.

I tried the "Missing value" and "Missing value (apply)" node together, as is indicated in the Knime examples. Selected "Remove row" for both columns..again nothing changes.

 

Whats wrong? how can I exclude the missing cells, and concequencly the whole row?

 

 

 

Hi atzitzi,

From my understanding, empty cells in 'serialno' aren't missing values, they are either null values or blank characters (missing values should appear as '?', not empty cells). I would give a try to the 'Rule-based Row Filter' node, using a single rule similar to:

$serialno$ MATCHES "[^\s]+" => TRUE

and configuring the node to include only matching rows ("Include TRUE matches" option).

Best Regards,

--

Jorge

 

 

 

 

 

2 Likes

Finally!!!!!

YES!

many thanks Jorge!

Try using Rule Based Row Filter and using expression to filter out the rows with missing values.. 

 

Eg :

MISSING $Col_NAME1$ => TRUE

MISSING $Col_NAME2$ => TRUE

Select Exclude True Matches... 

this should help you in filtering rows with missing values. 

 

1 Like

Hi Atziti,

This is very helpful. May I ask what [^\s]+ means?

Thanks!

-Jay

Hi Jay!

It is a regular expression if I’m not mistaken. Check bottom web page. You can write this expression and get some explanation :wink:

https://regexr.com/

Br,
Ivan

1 Like

I’ve been trying to remove rows if a certain column has a null as well. I’ve tried both of the recommendations above and neither seems to work. I’m not sure if it would make any difference but my original dataset is a csv. I’ve done this like a thousand time in Alteryx (very easy) and I’m surprised at how difficult it has been to remove nulls/blanks from my data set. Any other recommendations?

The missing value node should be able to handle individual columns.

“… These settings apply to all columns in the input table that are not explicitly mentioned in the second tab, labeled “Individual”. This second tab permits individual settings for each available column (thus, overriding the default). To make use of this second approach, select a column or a list of columns which needs extra handling, click “Add”, and set the parameters. Click on the label with the column name(s), will select all covered columns in the column list.”

Hi @zebing!

Welcome to KNIME community!

The original dataset being in a csv doesn’t matter. When you say null is it missing value (?), just empty cell or something third? If indeed you want to remove row where some column(s) has missing value Missing Value node should do the trick.

Considering you are coming from Alteryx check out this topics:

Br,
Ivan

1 Like

Missing Values Test.knwf (9.8 KB)

Thanks for the responses - I have a number of fields with nothing in them, whether we call that a null or an empty. Rather than try to explain, I’ve created a small sample workflow which takes a small dataset of 100 rows and tries all 3 of the different methods for removing empty fields that are mentioned in this post. For whatever reason, none of them are working for me. Can anybody jump in and see where I’m off?

Hi there!

Can’t see data if workflow in state reset. Do not reset while exporting.

Br,
Ivan

Missing Values Test.knwf (22.9 KB)

Trying again, hopefully this one will work

Hi I believe the issue here is that the “empty spaces” actually have white spaces in them. This is why the tables structure doesn’t treat them as “missing” or NULL which is usually denoted in KNIME as a “?”.

I have attached an example that might help.

Missing Values Test (mod_HH).knwf (14.0 KB)

PS: If your tables are very large it would help to make this more generic by using a single rule engine and looping the columns through.

2 Likes

Hi there!

This one did work @zebing. I don’t think there are white spaces in this cells @heshanhenry else above regex would work. And welcome to the forum btw :wink:

For some reason CSV Reader reads this cells in as empty strings ("") so using Missing Value node or missing functions within other nodes won’t work. You can use Rule-based Row Filter node with following rule:

$Sales$ MATCHES "" OR $Employees$ MATCHES "" => TRUE

I tried but couldn’t make CSV Reader to import empty strings into KNIME… Maybe someone will shed a bit light on this one :slight_smile:

Br,
Ivan

I can confirm that there are no white spaces in my cells, and I can also confirm that the above code does work. Thank you, Ivan.

It’s interesting because empty strings, being empty, were the same as nulls in my mind. Guess it’s not quite what’s expected in this case. Would be good to see if the reader can be made more robust for situations like this one.

Hi there!

Glad it works now. Empty string is not the same as null (?) in KNIME.

As said I couldn’t get empty strings from a CSV Reader with my csv example. Can you send me your csv to test it and to try to figure it out?

Br,
Ivan

Hi @zebing!

Just an info: I managed to get empty strings instead of missing values when reading data from csv file having multiple spaces (more than one) in my file. Guess that is why you are getting empty strings as well…

Br,
Ivan

Hi @zebing

I also come from using alteryx, hope this can help you and/or you can help update this tool mapping sheet: