Extracting percentage value from string via regex seems not to work

Hi community,

I want to extract a percentage value from a string like:
“1234, Name of Entity, 29.5%, Germany”

The extracted value should be 29.5%.
The regex works on regex101; however, I cannot get the regexMatcher function in the String Manipulation node to work. I only receive false as output. Any clues?

My REGEX: \d+(.\d+)?%
Example of how it works: regex101: build, test, and debug regex

My String manipulation configuration:
image

Best regards,
Stiefel

Hi @Residentstiefel,

You can use the Column Expressions node (Column Expressions – KNIME Hub) this way:

column("column1").match(/[0-9]+\.[0-9]?%/)[0]
or
column("column1").match(/\d+\.\d?%/)[0]

You can get an explanation of this regex on regex101: build, test, and debug regex
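For readers who want to try the same "find the first match and return it" idea outside KNIME, here is a minimal Python sketch. It is not the Column Expressions API; the helper name `first_percentage` is made up for illustration, and the dot is escaped so that it only matches a literal decimal point.

```python
import re

# Hypothetical helper, not part of KNIME: returns the first percentage found, or None.
PERCENT = re.compile(r"[0-9]+(?:\.[0-9]+)?%")

def first_percentage(text):
    match = PERCENT.search(text)  # search() scans the string for the first match
    return match.group(0) if match else None

print(first_percentage("1234, Name of Entity, 29.5%, Germany"))  # 29.5%
```

Returning None when nothing matches avoids the error that indexing a missing match with [0] would otherwise cause.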

If you use String Manipulation – KNIME Hub, it is normal behavior for regexMatcher to return “True” or “False”; you could use regexReplace instead. However, you’ll have to invert your regex so that it matches the unwanted parts of the string and replaces them with nothing, I guess.
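One possible explanation for the false result, stated here as an assumption rather than confirmed KNIME behavior: if regexMatcher tests the whole string against the pattern (as Java's String.matches does), a pattern that describes only the percentage can never match the full row. The Python sketch below imitates both behaviors with `re.fullmatch` and `re.search`; it is an analogy, not KNIME code.

```python
import re

text = "1234, Name of Entity, 29.5%, Germany"
pattern = r"\d+(\.\d+)?%"

# Whole-string test: the pattern must cover the entire text, so this fails.
whole_string = re.fullmatch(pattern, text) is not None

# Substring test: the pattern may match anywhere, which is what regex101 shows.
anywhere = re.search(pattern, text) is not None

# Padding the pattern with .* lets a whole-string test succeed as well.
padded = re.fullmatch(r".*" + pattern + r".*", text) is not None

print(whole_string, anywhere, padded)  # False True True
```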

Regards, Samir

I’d use the Regex Extractor node for this.

I would use the Cell Splitter node and split on commas instead. Very simple.
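The comma-splitting idea can be sketched in Python as follows. This only illustrates the approach, not what the Cell Splitter node does internally, and it assumes that the percentage is always the third field and that no field contains a comma itself.

```python
row = "1234, Name of Entity, 29.5%, Germany"

# Split on commas and trim the surrounding whitespace of each field.
fields = [field.strip() for field in row.split(",")]

# Assumption: the percentage sits in the third column (index 2).
percentage = fields[2]
print(percentage)  # 29.5%
```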

Here are various examples of regex in KNIME.

Extraction is done a little differently in KNIME. Also, \d may require several “\” characters, which is why I usually use [0-9] instead. If you really want to use regex, I would recommend the String Manipulation node:

regexReplace($column1$, ".+\\s([0-9]+.[0-9]+%).+", "$1")
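The replace-with-a-capture-group technique looks like this in Python, shown as a rough analogy to the regexReplace call above rather than as KNIME code. Python raw strings avoid the doubled backslashes, and the group reference is written `\1` instead of `$1`. The dot is escaped here, which is an addition to the pattern from the post.

```python
import re

text = "1234, Name of Entity, 29.5%, Germany"

# Match the whole string, capture only the percentage, and keep just group 1.
pattern = r".+\s([0-9]+\.[0-9]+%).+"
extracted = re.sub(pattern, r"\1", text)
print(extracted)  # 29.5%

# If the pattern does not match, re.sub returns the input unchanged.
unchanged = re.sub(pattern, r"\1", "no percentage here")
```

Because a non-matching input comes back unchanged, it may be worth checking for a match first when rows without a percentage are possible.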

As mentioned by @elsamuel, with the Regex Extractor you get a live extraction preview similar to common regex tools, but directly in your workflow and on your own data.

You can even define several groups at once (optionally including names) and map them to separate output columns.

I used the following regex here: \d+(?:\.\d+)?%, which treats the decimal part as optional and would also match e.g. “29%”.
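To illustrate the optional decimal part and the idea of named groups, here is a small Python sketch. The group names `whole` and `fraction` are invented for this example and are not taken from the Regex Extractor. Python writes named groups as `(?P<name>...)`, which may differ from the syntax the node expects.

```python
import re

# Named groups: 'whole' and 'fraction' are hypothetical names for illustration.
PERCENT = re.compile(r"(?P<whole>\d+)(?:\.(?P<fraction>\d+))?%")

def extract_parts(text):
    match = PERCENT.search(text)
    # groupdict() maps each group name to its text (None if the group did not take part)
    return match.groupdict() if match else None

print(extract_parts("1234, Name of Entity, 29.5%, Germany"))
print(extract_parts("growth of 29% in Germany"))
```

Each key of the returned dictionary plays the role of one output column.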

The Regex Extractor is available as a free extension within the Palladian extensions, which are distributed exclusively through NodePit.

For any questions, feedback, etc. about the node in particular, or Palladian in general, don’t hesitate to reach out on the forums!

–Philipp

.*(\b\d+?.?\d+?%).*

Hi community,

Thanks for all the replies. I ended up using the Regex Extractor, which is amazing. Column Expressions also worked! What puzzles me, however, is that the same regex did not work in the String Manipulation node, which was my starting point. It is pretty frustrating to see it work on regex101 and not in the dedicated generic KNIME nodes… In the end, I gave up trying to make String Manipulation work (even taking the extra backslashes etc. into account) :confused: Still, there are so many other nodes that help out in these situations :wink:

Thanks and cheers,
Stiefel
