Regex Filter Node problem

Hi Everybody,

The Regex Filter works when I want to eliminate all digits ("\d+"), but when I want to eliminate all non-digits it excludes the last row's digits.

I attached the example workflow.

Any ideas or suggestions are welcome.

Bora

I had a look at this and am struggling to work out why the numbers are not kept in the last row in the non-digit Regex filter you used.

Seems like a bug to me.

simon

Have you tried using the String Replacer node instead and choosing the RegEx option?

You can leave the replacement field blank, which gives the same effect as a filter.

simon.

Hi Simon,

You are the only one who always responds to my weird questions :))

Yeah, I tried your suggestion and it works fine. Thanks a lot, once again.

Regarding the Regex Filter, whom should I tell if it is a bug?

Bora

Someone from the KNIME team always reads these posts, so they'll spot it and take note.

simon.

Hi boraster,

This behaviour has to do with tokenization. The number in the last line belongs to the token "lulara10"; note that "10" is not a separate token. Regular expressions are matched against complete tokens, so the filter applies only when the whole token matches. That is not the case for the regex "\D+" on the string "lulara10", so the token is not filtered.
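The whole-token matching can be reproduced outside KNIME. The following is only a rough Python sketch, assuming the filter behaves like a full match of the pattern against each token; the helper name `matches_whole_token` and the token list are made up for illustration and are not part of any KNIME API.

```python
import re

def matches_whole_token(token, pattern):
    """Return True only if the pattern matches the entire token (hypothetical helper)."""
    return re.fullmatch(pattern, token) is not None

# Illustrative tokens; "lulara10" is the token from the last row.
tokens = ["lulara", "10", "lulara10"]

for token in tokens:
    print(token,
          "| whole-token \\D+:", matches_whole_token(token, r"\D+"),
          "| whole-token \\d+:", matches_whole_token(token, r"\d+"))

# A partial search does find the non-digit prefix, but a whole-token match does not.
partial = re.search(r"\D+", "lulara10").group()
```

Here "lulara10" matches neither "\D+" nor "\d+" as a whole, which is why a token-level filter leaves it alone even though part of it consists of non-digits.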

If you want to get rid of all non-digits in the text, you can apply a "String Replacer" before the "Strings to Document" node. The "String Replacer" works on plain strings rather than on tokenized strings.
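As a rough illustration of the string-level approach, here is a small Python sketch. It assumes that a regex replacement with an empty replacement string behaves like a substitute-all on the raw text; the function name `strip_non_digits` and the sample rows are invented for the example.

```python
import re

def strip_non_digits(text):
    """Remove every run of non-digit characters from a plain string (hypothetical helper)."""
    return re.sub(r"\D+", "", text)

# Illustrative rows, processed before any tokenization takes place.
rows = ["abc 123", "lulara10"]
cleaned = [strip_non_digits(row) for row in rows]
print(cleaned)
```

Because the replacement runs on the untokenized string, the digits in "lulara10" survive even though they are not a separate token.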

Attached is a workflow with an example for the "String Replacer" node. To check the tokenization, you can use the BoW creator node.

Cheers, Kilian

Thanks a lot, Kilian.
I will do as you have suggested.
Regards
Bora