This is my workflow file and the source data CSV file. Please note that it is compulsory to put all cleaned data in new columns. Therefore, I will always name the new column New + <Old name>
and use the “Append Column” option in the Rule Engine node.
I’m performing data cleaning. I used the first Rule Engine node to filter the Airline ID column, with the requirement: “change all records with a negative value to 0”.
Next, I tried to use another Rule Engine node connected to the previous node to filter the Name column. The requirement is to take all records that start with a non-English character or a numerical number and change it to “Unknown.” This is where the problem begins. I tried to use the function MATCHES
with regular expressions, but all record in the new column returns “?”. For example:
// $double column name$ > 5.0 => "large"
// $string column name$ LIKE "*blue*" => "small and blue"
// TRUE => "default outcome"
$Name$ MATCHES "^([A-Za-z])gm" => $Name$
FALSE => "Unknown"
I knew that something was wrong, however, I couldn’t find a way to fix the issue. This regular expression was generated with the help of https://regex101.com. I tried to execute the node with the standard syntax and the Java syntax of the regular expression. In case you don’t know, someone mentioned in another thread that KNINE uses Java syntax of the regex, so I clicked on “Java 8” under “FLAVOR” on regex101.com, and it generated the Java syntax. The example above uses the Java 8 syntax of regex. I’m also finding ways to illiterate this in a single node, so please enlighten me if you know a more efficient way.
Thank you!