Extract specific numbers from string

I would like to extract specific numbers starting with “8” and they’re 8 digits.
But they sometimes are in the parentheses, and sometimes are not.

How do I extract these 8 digits numbers with String Manipulation?

Thank you :slight_smile:

Hi @NancyShen and welcome to the Knime Community.

You will have to use regular expressions for this. You can use the regexReplace() function from the String Manipulation for this, or there is a Regex Extrator node from Palladian that you can use.

Thanks! Yes, i’d like to use regexReplace, but I’m not familiar with regex code. Can you write for me?

Hi @NancyShen I’m not good at regex code either, but doing some research, and some innovative ideas, I managed to come up with the code.

Here’s what I used a sample data:

And this is the results of extracting numbers with 8 digits and starting with 8:

This is the code that I used:
replace(regexReplace($column1$,".*(8\\d{7}).*", "$1"), $column1$, "")

\d{8} would look for any numbers with 8 digits
8\d{7} would look for any numbers that start with 8, followed by 7 digits, which makes 8 digits in total
And because we’re passing it as a string, I have to escape the slash by adding another slash:

When you use String Manipulation, you are limited to only the Functions it provides, and there is no regex-related functions that allow you to extract any matching expressions.

So, I have to work with the regexReplace(), and then find a way to remove anything extra, hence the additional replace().

Here’s the workflow: Extract numbers with 8 digits starting with 8.knwf (6.8 KB)


@NancyShen There are some regex online editors or regex online testers out there. Just google for e.g. “regex101”. In these you can copy and paste some variants of the strings you have to deal with. And can check if your regex works somewhat properly. And can develop a working regex really fast. You wont get a super high sophisticated regex from start on. But it mostly is ok when it simply work. You will need more regex in future imo to 99.x% again :wink:


Hi @bruno29a
Thanks it works! I can successfully extract numbers w/ 8 digits and starting with 8.
However, there’s still a little help needed. For the cell which doesn’t fit this rule, it will be inserted with original content. But I would like to make them as null. How do I do?

Thank you for your kindly sharing!

Hi @NancyShen , did you use the full code I gave you, or only part of it?

The full code takes care of removing the original content, as you can see in my screenshot.

If you run only this part: regexReplace($column1$,".*(8\\d{7}).*", "$1"), you will get the results you are currently getting.

That’s why I do a replace() to replace the original content with nothing:
replace(regexReplace($column1$,".*(8\\d{7}).*", "$1"), $column1$, "")

1 Like

Ahhh my bad. I miss “replace()” part. Now it totally works!!! Thank you so much :slight_smile:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.