Hi,
I have a data column that either contains numbers (eg 3.14) or numbers with a modifier (e.g. >3.14) and I need to split the modifiers out into a separate column. The numbers aren’t fixed length and not every value has one. The modifiers possible are >, <, and ~. Can anyone help?
Thanks Alexander, that works like a charm. So I can understand how the regex query was constructed can you point me in the direction of a suitable tutorial?
you are right. Won’t work in that case. Alternatively you can use Rule-based Row Splitter to separate data based whether there is modifier or there is no modifier.
$column1$ LIKE "<*" OR $column1$ LIKE ">*" OR $column1$ LIKE "~*" => TRUE
and after splitting Concatenate tables. But maybe that is too much and regex is just fine
Otherwise it will only match numbers with a single digit before the decimal point. Additionally the regex I posted before could not identify negative numbers, which is why I added \-?, an optional minus sign.
A very useful tool for finding a good regex is regex101. Unfortunately I cannot recommend a specific tutorial, but if you google it I am sure you will find a suitable one. Let me explain quickly what the regex above does. In general, regular expressions define a certain string pattern that can be matched against an input. A group in angular brackets means “one of those”, so the part [<>~] matches one of your modifiers. The question mark after the group means “it can be there, but it does not have to be”. So [<>~]? is "one of <, >, or ~, or nothing. Wrapping it in parentheses like in the second regex makes this part of the pattern a group, which we can reference later using the $1 in the replacement. In the first regex we do not want this group, but the number, so that is why there we wrap the rest of the expression in parentheses. As mentioned before, \\-? is an optional minus sign (- is a special character in regex and must be escaped with \, but because that itself is an escape character in the String Manipulation node, we have to escape it itself with another \). \d stands for any digit, but again we have to escape the backslash itself, so it becomes \\d. The plus sign signifies that we have one or more digits. After this sequence of digits we could have a decimal point and some more digits, that is why there is (\\.\\d+)?. A dot in a regular expression normally stands for any character, so to get the literal dot we have to escape it again with \\. The whole thing is wrapped in parentheses and followed by a question mark to mark it as optional, because not every number has a decimal point.
Please be aware that this does not deal with numbers in scientific notation.