How to separate >, <, ~ signs from data columns

Hi,
I have a data column that either contains numbers (eg 3.14) or numbers with a modifier (e.g. >3.14) and I need to split the modifiers out into a separate column. The numbers aren’t fixed length and not every value has one. The modifiers possible are >, <, and ~. Can anyone help?

Nick

Hi @n_ray,
you can do that with a String Manipulation node and regex. Use the following expression for extracting only the number:

regexReplace($column1$, "[<>~]?(\\-?\\d+(\\.\\d+)?)", "$1")

and this for the modifier in another String Manipulation node:

regexReplace($column1$, "([<>~]?)\\-?\\d+(\\.\\d+)?", "$1")

Kind regards
Alexander

Edited: Added support for negative numbers and numbers with more than one digit before the decimal point.

4 Likes

Hi @n_ray,

if modifier is always first char Cell Splitter By Position can be option as well :wink:

Br,
Ivan

1 Like

Ivan, that won’t work if there’s no modifier though will it?
Nick

Thanks Alexander, that works like a charm. So I can understand how the regex query was constructed can you point me in the direction of a suitable tutorial?

Nick

Hi Nick,

you are right. Won’t work in that case. Alternatively you can use Rule-based Row Splitter to separate data based whether there is modifier or there is no modifier.

$column1$ LIKE "<*" OR $column1$ LIKE ">*" OR $column1$ LIKE "~*" => TRUE

and after splitting Concatenate tables. But maybe that is too much and regex is just fine :wink:

Br,
Ivan

1 Like

Hi @n_ray,
First of all: I noticed two mistakes in my regex. It should be:

regexReplace($column1$, "[<>~]?(\\-?\\d+(\\.\\d+)?)", "$1")

and

regexReplace($column1$, "([<>~]?)\\-?\\d+(\\.\\d+)?", "$1")

Otherwise it will only match numbers with a single digit before the decimal point. Additionally the regex I posted before could not identify negative numbers, which is why I added \-?, an optional minus sign.

A very useful tool for finding a good regex is regex101. Unfortunately I cannot recommend a specific tutorial, but if you google it I am sure you will find a suitable one. Let me explain quickly what the regex above does. In general, regular expressions define a certain string pattern that can be matched against an input. A group in angular brackets means “one of those”, so the part [<>~] matches one of your modifiers. The question mark after the group means “it can be there, but it does not have to be”. So [<>~]? is "one of <, >, or ~, or nothing. Wrapping it in parentheses like in the second regex makes this part of the pattern a group, which we can reference later using the $1 in the replacement. In the first regex we do not want this group, but the number, so that is why there we wrap the rest of the expression in parentheses. As mentioned before, \\-? is an optional minus sign (- is a special character in regex and must be escaped with \, but because that itself is an escape character in the String Manipulation node, we have to escape it itself with another \). \d stands for any digit, but again we have to escape the backslash itself, so it becomes \\d. The plus sign signifies that we have one or more digits. After this sequence of digits we could have a decimal point and some more digits, that is why there is (\\.\\d+)?. A dot in a regular expression normally stands for any character, so to get the literal dot we have to escape it again with \\. The whole thing is wrapped in parentheses and followed by a question mark to mark it as optional, because not every number has a decimal point.
Please be aware that this does not deal with numbers in scientific notation.

3 Likes

Thanks Alexander. For anyone following I did find a bunch of useful tutorials on the web; this cheat sheet was a good starting point.

1 Like

Hi @n_ray,
you could try with
Mastering Regular Expressions
Third Edition
Jeffrey E. F. Friedl

Regards

PB

Hi @n_ray,

How about using Regex Split with this pattern:

^([^\d]?)(.*)$

Or if you have to keep minus sign with numbers:

^([^\d-]?)(.*)$

Or if you just prefer to keep some certain characters:

^([<>~]?)(.*)$

Does it work as expected?

:blush:

3 Likes

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.