Column Expression

I am not sure wrong with column expression - regexmatcher. I am trying to validate date field format. I have a column with 2 different date format
12/31/2020
2020-01-31

Here is my code
if (regexMatcher(column(“column1”), “[^\d]{2}-\d{2}-\d{4}”))
“true”;
else
“false”;

Hi @kamleshp,

I’m not quite sure what you are wanting the regex to match but since you have the ^ symbol inside the square bracket this acts to negate the \d, so the first part will be trying to match two non-digits. If you were wanting to use ^ to represent “start of line” then it needs to be before the square bracket. It isn’t necessary in this case.

The other problem you have is that the \ sign needs to be entered in Column Expression regex as a pair \\ This is because the javascript on which it is based treats the first \ as an escape, so you need to “escape the escape” so that it gets fed correctly as a single \ to the regex parser.

I’ve attached a sample with several different regex patterns as an example

// returns A if format is nn/nn/nnnn , allowing 1 or 2 digits for month and day
if (regexMatcher(column("column1"),"\\d{1,2}/\\d{1,2}/\\d{4}"))
{"A"}
else
    // returns B if format is nnnn-nn-nn, allowing 1 or 2 digits for month and day
    if (regexMatcher(column("column1"),"\\d{4}-\\d{1,2}-\\d{1,2}"))
    {"B"}
    else
        // returns C if format is nn-nn-nnnn, requiring 2 digits for month and day, else D
        if (regexMatcher(column("column1"),"\\d{2}-\\d{2}-\\d{4}"))
        {"C"}
        else
        {"D"}

image

Note that if this is all you need to do, it is more easily (and probably more efficiently) done with a Rule Engine as follows:

$column1$ MATCHES "\d{1,2}/\d{1,2}/\d{4}"=> "A"
$column1$ MATCHES "\d{4}-\d{1,2}-\d{1,2}"=> "B"
$column1$ MATCHES "\d{2}-\d{2}-\d{4}"=> "C"
TRUE => "D"

Note that with the Rule Engine, you DON’T escape the \, so that’s something to be wary of.

image

column expression date regex.knwf (8.4 KB)

4 Likes

That worked.
Thank you @takbb

2 Likes

Great post. Is there an overview of which nodes require the “\” ? I am used to work with single slash so I always forget that when using KNIME

1 Like

Hi @Daniel_Weikert , I don’t think there is a list but in general the nodes that use are based on actual programming language need to add the extra backslash.

e.g. Requiring double-backslash:
String Manipulation, Column Expressions, Java Snippet

Not requiring double-backslash:
Rule Engine
Row Based Rule Filter
Regex Split
Column Rename (Regex)

For the “scripting” nodes, there is a small visual clue as the colour of the string changes. If the String is not displayed as red when you enter a backslash, you need to enter a second one! This is because the script editor cannot determine that the string is being “closed” correctly, or something like that…

e.g.
String Manipulation:
image
image
image

Column Expressions:
image
image
image

Rule Engine (doesn’t use double-backslash):
image
image

Even the java snippet node does have this visual clue, although it’s not quite as obvious because the colour changes from blue to black. Although it will also underline the line, and when you move off it will highlight it.
image
image
image

I think there might be some cases though where you need to enter a third backslash. It depends on how many “levels” of interpretation there are before it actually gets to run the resultant code, but maybe that’s a story for another day… or when I can find an example… :wink:

The main thing I think is to be aware that double-backslash may be a possibility so if you hit problems you have an idea where to look. :slight_smile:

4 Likes

Not sure about third but there is definitely a case when you need a fourth one :slight_smile:

Br,
Ivan

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.