Extract a specfic number

Sorry what should I change here to correct it? This is the workflow shared by @mehrdad_bgh :slight_smile:

As I said, regex is not my strength, until someone comes with a better expression, one dirty way I can think of is to do it in 2 steps like this:

regexReplace(regexReplace($column1$,"(\\d{8}).*", "$1"), ".*(\\d{8}).*", "$1")

The first regexReplace will take anything from the left of the 8 digits along with the 8 digits, and the 2nd regexReplace eliminates what’s on the left of the 8 digits.

It’s dirty but it gives you the results that you want.

@AlyKnime Here’s the workflow: Extract 8 digit numbers from string 2.knwf (8.2 KB)

2 Likes

Hi @AlyKnime ,

Sorry for delay.
Write this line in string manipulation node:
toInt(regexReplace($column1$, "^.*?(\\d{8}).*","$1" ))
I used toInt function to maintain missing values, You can delete it.
The previous regex was not accurate, my bad.

Br,
Mehrdad

5 Likes

That elusive “?” in the regex pattern answered a question I had when I was trying this out, as the behaviour of the Palladian Regex Extractor is different to String Manipulation and Regex Split.

I was trying out the original @elsamuel’s original suggestion of \d{4,20} on the Regex Extractor and it worked just fine.

80310851 DWY FRSHPROTC APR FRSH 4/20.1Z => 80310851

However, trying to make it work with String Manipulation and Regex Split wasn’t working for me

With String Manipulation:
regexReplace($column1$,".*(\\d{4,20}).*" ,"$1" ) => 0851

With Regex Split:
both .*\d{4,20}.* and \d{4,20} returned the whole cell
and
.*(\d{4,20}).* returned 0851 just as String Manipulation did, as it was returning the minimum set of 4 digits.

With the insertion of the ? the behaviour of both those nodes was (mostly) fixed

String Manipulation:
regexReplace($column1$,".*?(\\d{4,20}).*" ,"$1" )
80310851 DWY FRSHPROTC APR FRSH 4/20.1Z => 80310851
DOWNY FRSHPRCT APRIL FRESH 4/14.8Z => returns entire cell

The returning of the entire cell where the digits weren’t found was “fixed” in this instance with the inclusion of the toInt() as in your example.

Regex Split
.*?(\d{4,20}).*
80310851 DWY FRSHPROTC APR FRSH 4/20.1Z => 80310851
DOWNY FRSHPRCT APRIL FRESH 4/14.8Z => returns ?

It was interesting to note the slight variations in behaviour between the different nodes, and for sure it shows that the Palladian node was easier to work with here, but useful I think to see the differences in outcome, which I wasn’t expecting.

Regex Split also resulted in a warning where the pattern wasn’t find which the others didn’t.

2 Likes

Hi Brain,

I’m agree with you, Regex Extractor is best node for issues like this.

And you can add Column Expressions to your list:

if(regexReplace(column("column1"),"^.*?(\\d{8}).*" ,"$1" )==column("column1")){
        null
    }
    else{
        regexReplace(column("column1"),"^.*?(\\d{8}).*" ,"$1")
    }

GL,
Mehrdad

2 Likes

Thank you so much @mehrdad_bgh @bruno29a @elsamuel @takbb for all your help! Really appreciate it! I was kinda hesitant at first to post a topic/ask a question afraid no one will notice or reply. But you just save my day! Again thank you so much! :slight_smile: :slight_smile: :slight_smile:

Alyssa

4 Likes

@AlyKnime I think you will see there is no need to be hesitant! I have worked with many pieces of software and belonged to many software-user-communities over the past 30 years or so, and in my view, without a doubt, I would say that with Knime, the community has to be the most active, responsive and helpful I have seen, even though I’ve only been a member here for only one month myself! I think Knime users are really passionate about this platform (and rightly so) and it shows! I am finding Knime addictive (in a good way!) :slight_smile:

4 Likes

@takbb I agree with you community is very helpful and responsive. I am also finding Knime addictive (in a good way) :smiley: to the point that I really want to learn everything!
Hopefully in the future I’ll be able to share knowledge, the same you guys did to me. Thanks again!

4 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.