Processing an INVERTED COMMA/QUOTATION MARK in a Rule-based filter

In my dataset there is a column called “SmokesPerDay”. I am looking to extract a number from this column, as there are many entries with additional text qualifications.

I am asking two questions:

  1. How do I identify an entry which starts with " like this one?

“changes on the day”

I tried using the following expression - $08b.SmokesPerDay$ MATCHES "["]{1,}.*" - but it greys out on error checking. I also tried an escape character - $08b.SmokesPerDay$ LIKE "\"*" - but that did not work either.

  2. As you can already tell, I am unskilled in Regex (as with many coding issues), but how do I select the entries where the first 2 characters are numbers? (and yes, I realise that if I can do this, my first question is irrelevant, but I’ll include it anyway.)

As always, help appreciated and humbling!

A

Hi @AAM ,

It could be that what you pasted here is not exactly what is in your data. If it is, though, it does not satisfy your condition simply because the quotes are different: your sample starts with a curly quote (“), while your expressions test for the straight quote (").

Here’s the sample data I worked with:
[image: sample data]

The first row (ID = 1) is what I copied from your post.

If I do the check via String Manipulation using this expression, I am able to get the expected results:


substr($column1$, 0, 1).equals("\"") ? "True" : "False"

(This expression is checking if the first character is a quote ", in which case it will return “True”, otherwise will return “False”)

[image: output of the String Manipulation expression]

As expected, only the row with ID = 2 matches, as it starts with the straight quote that we are looking for. Row ID = 1, which I copied and pasted from your post, has different (curly) quotes.
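To see why the two rows behave differently, it can help to look at the actual code points. Here is a minimal sketch in Python (not KNIME), assuming the pasted entry really starts with a curly quote; the `typed` value and both helper names are made up for illustration:

```python
import unicodedata

def describe_first_char(text):
    """Return (code point, Unicode name) of the first character of text."""
    ch = text[0]
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

def starts_with_straight_quote(text):
    # Same idea as substr($column1$, 0, 1).equals("\"") in String Manipulation
    return text[:1] == '"'

pasted = "\u201cchanges on the day\u201d"   # curly quotes, as pasted in the post
typed = '"changes on the day"'              # straight quotes (hypothetical row)

for value in (pasted, typed):
    print(describe_first_char(value), starts_with_straight_quote(value))
```

The curly opening quote is U+201C and the straight quote is U+0022, so a test for one will never match the other.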

Similarly, if I do a Row Filter, it works too:


[image: Row Filter]

With the Rule-based Row Filter, however, KNIME does not seem to accept LIKE "\"*". LIKE "\t*" is fine, for example, but LIKE "\"*" gives an error. It looks like we’d have to use a regex via MATCHES as a workaround.
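One possible shape for such a regex is to write the quote characters as escapes, so that no literal " appears inside the pattern. The sketch below tests the idea in Python rather than in KNIME. I have not checked whether the Rule-based Row Filter passes these escapes through to the regex engine unchanged, so treat that part as an assumption; the function name is made up.

```python
import re

# \x22 is the straight double quote; \u201C and \u201D are the curly ones.
# Writing them as escapes keeps a literal " out of the pattern text.
LEADING_QUOTE = re.compile(r"[\x22\u201C\u201D].*")

def starts_with_quote(value):
    # fullmatch: the whole string has to match, hence the trailing .*
    return LEADING_QUOTE.fullmatch(value) is not None

for value in ['"20 a day"', "\u201cchanges on the day\u201d", "20"]:
    print(repr(value), starts_with_quote(value))
```

Including the curly quotes in the character class covers entries like the one pasted in the question as well as straight-quoted ones.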

Any reason why you need to use the Rule-based Row Filter instead of the Row Filter node?

As for your 2nd question, you can indeed do this in the Rule-based Row Filter with this expression:
$column1$ MATCHES "[0-9][0-9].*" => TRUE
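If you want to try the pattern outside KNIME first, here is a small Python sketch of the same check, plus one way to pull out the leading number. The sample values and function names are invented for illustration, and I am assuming MATCHES has to cover the whole string, which is why the pattern ends in .*:

```python
import re

TWO_DIGITS_FIRST = re.compile(r"[0-9][0-9].*")   # same pattern as in the rule
LEADING_NUMBER = re.compile(r"[0-9]+")           # one or more digits at the start

def starts_with_two_digits(value):
    # fullmatch: the pattern must cover the entire string
    return TWO_DIGITS_FIRST.fullmatch(value) is not None

def leading_number(value):
    """Return the number at the start of value as an int, or None if there is none."""
    m = LEADING_NUMBER.match(value)
    return int(m.group()) if m else None

for value in ["20 on weekends", "5-10", "changes on the day"]:
    print(repr(value), starts_with_two_digits(value), leading_number(value))
```

Note that the two-digit pattern skips entries that start with a single digit, such as "5-10", so it only fits if those rows should be filtered out as well.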


Any reason why you need to use the Rule-based Row Filter instead of the Row Filter node?

Not particularly, except that I was trying to filter a whole lot of rubbish out, and because I suck at Regex, your nice final solution was beyond me.

Thanks for taking the time. I am very grateful and appreciative,

A
