New column returns "?" value after using Rule Engine

minhngoc25a · February 20, 2022, 10:09am

This is my workflow file and the source data CSV file. Please note that it is compulsory to put all cleaned data in new columns. Therefore, I will always name the new column New + <Old name> and use the “Append Column” option in the Rule Engine node.

I’m performing data cleaning. I used the first Rule Engine node to filter the Airline ID column, with the requirement: “change all records with a negative value to 0”.

Next, I tried to use another Rule Engine node connected to the previous node to filter the Name column. The requirement is to take all records that start with a non-English character or a numerical number and change it to “Unknown.” This is where the problem begins. I tried to use the function MATCHES with regular expressions, but all record in the new column returns “?”. For example:

// $double column name$ > 5.0 => "large"
// $string column name$ LIKE "*blue*" => "small and blue"
// TRUE => "default outcome"
$Name$ MATCHES "^([A-Za-z])gm" => $Name$
FALSE => "Unknown"

I knew that something was wrong, however, I couldn’t find a way to fix the issue. This regular expression was generated with the help of https://regex101.com. I tried to execute the node with the standard syntax and the Java syntax of the regular expression. In case you don’t know, someone mentioned in another thread that KNINE uses Java syntax of the regex, so I clicked on “Java 8” under “FLAVOR” on regex101.com, and it generated the Java syntax. The example above uses the Java 8 syntax of regex. I’m also finding ways to illiterate this in a single node, so please enlighten me if you know a more efficient way.

Thank you!

aborg · February 20, 2022, 10:33am

Hello, Most probably you do not need the gm part in your regular expression, but something like this:

^[A-Za-z].*

So your rule for the names starting with English letters:

$Name$ MATCHES "^[A-Za-z].*" => $Name$

minhngoc25a · February 20, 2022, 10:48am

Thanks for helping! I knew that something was wrong with my syntax! In addition, how would I change the NULL value (denoted by “?” in table view) to Unknown? I tried this rule:
$Name$ MATCHES "^\0*" => "Unknown"
and this rule: $Name$ MATCHES "^(^[A-Za-z].*)" => "Unknown"
but KNIME says that it’s an invalid pattern.

aborg · February 20, 2022, 11:07am

For missing values (?) check, you can use the MISSING $Name$ expression. This will return true in case the value in Name is missing.

mlauber71 · February 20, 2022, 11:24am

@minhngoc25a I will see if I can take a look at your example later. In the meantime you could take a look at this example replacing special characters with Regex:

minhngoc25a · February 20, 2022, 11:57am

Sorry for not being specific. I’m trying to transform the missing values generated from the rule $Name$ MATCHES "^[A-Za-z].*" => $Name$ and already present missing value to “Unknown”. Is that possible in one node?