Filter out rows w/ empty cells

#1

I try to use the Row Filter to filter out all rows with empty cells but it doesn’t work.
The input table has two columns, one is named “lastName”.
So I set the “Column to Test” to “lastName”, marked regular expression and entered ^$ (empty string).
Furthermore I marked “Exclude rows by attribute value”.

0 Likes

#2

Hi @dgoebel, did you try the option “only missing value match” ?

0 Likes

#3

Hi Hans, thanks for reply.
Yes I did but it had no effect.

0 Likes

#4

Then it’s not empty :thinking:. Can you upload a small example file with (dummy) data?

0 Likes

#5

Test.knwf (9.7 KB)

Dear Hans, thanks for help.
Regards
Dieter

0 Likes

#6

Hi @dgoebel,
we have a special node for this: Missing Value. Here you have more options regarding filtering and replacing missing values. In the options dialog’s second tab you can simply select the columns containing missing values on the left hand-side and then choose “Remove row” as missing value handling option on the right hand-side. Let me know if this works!
Kind regards
Alexander

2 Likes

#7

Dear Alexander, thanks for help.
I’ve also checked this node before w/ same result. As I mentioned inside the workflow as annotation the input file is created by the Table Writer node and the “empty” cells seem to have some hidden content (see created csv file).
Regards,
Dieter

0 Likes

#8

Hello Dieter,
I see, it is not really missing values. Sorry for the misunderstanding. Could you also share the file “Desktop.table”? Without it I cannot reproduce the problem.
Kind regards
Alexander

0 Likes

#9

Archive.zip (2.5 KB)
Sure

0 Likes

#10

Hello,
thank you! There seem to be indeed some strange characters in those cells. They are not matched by the whitespace regex ("\s") but when checking the length of the content with the string manipulation node, it shows that there are several characters. For the data you have, it might be possible to simply filter out rows where the column does not contain any alphanumeric characters or the minus sign. Using the expression “[^\w-]*” works. From your previous try I guess you know regex, but I will explain it here in case anyone else comes across this topic: the angular brackets define a set of possible characters and the leading ^ within the group is a negation, i.e. “anything but the following”. Then comes the special regex symbol for alphanumeric characters, which is just short for a-zA-Z0-9_ and I also include the minus sign. So in total, this expression means “Anything except alphanumeric characters or the minus sign, or an empty string”.
I hope this helps!
Kind regards
Alexander

0 Likes

#11

Hi @dgboebel

This is what I had in mind row_filter_missing.knwf (6.3 KB)

0 Likes

#12

Hi Alexander,
first of all “THANK YOU” - the regex works.
But what happens when writing the table using the node “Table Write”?
Regards,
Dieter

0 Likes

#13

Hi @dgoebel ,
what was the input to the table writer? When I simply load the attached CSV file and write it using a table writer, then load that file again using a table reader, I get proper empty cells.

Hi @HansS,
this won’t work because the Desktop.table file does not contain truly missing values, but string cells that appear empty but actually contain some strange non-printable characters.

Kind regards
Alexander

0 Likes

#14

Hi Alexander,
in this case the input to the table write comes from emails. The first step is to read eml files, then I filter the last email in order to omit all the forwarded and replied emails. Then I filter company name, last name and first name from the email address and pre-process the email content.
After that I create different kind of graphical reports like tag clouds, networks, etc.
Best regards,
Dieter

0 Likes

#15

Hi Dieter,
I think then those strange characters probably come from the eml file and end up in your table. Maybe you could filter them out with a String Manipulation node using the Regex I provided before passing the table to the Table Writer?
Kind regards
Alexander

0 Likes