Extract a specific number

Hi,
I am new to KNIME and I am having trouble extracting a number from a string. The number I need appears at different positions in the string. Samples are below:

81727904HG ASDHASD 275 EE
Resource - 70331103
Ma 922803694FF400g*24

The output I need is:
81727904
70331103
922803694

Sorry, the last item should be:

input: Ma 92280369FF400g*24
output: 92280369

Welcome to the forum @AlyKnime .

Do you have more examples of strings with numbers you’re trying to extract? From what you’ve provided, it looks like a regex that matches numbers with at least 4 digits would suffice.

I used the Regex Extractor node (part of the Palladian collection), with the string \d{4,20} which finds numbers with 4 to 20 digits. If you have more complicated data then this may need to be modified.
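For readers without the Palladian extension, the effect of that pattern can be checked outside KNIME. The following is a minimal Python sketch, not the Regex Extractor node itself; the name `CODE_PATTERN` is made up for illustration, and the sample strings are the ones posted above (with the corrected last item).

```python
import re

# Runs of 4 to 20 consecutive digits, as in the pattern quoted above.
CODE_PATTERN = re.compile(r"\d{4,20}")

samples = [
    "81727904HG ASDHASD 275 EE",
    "Resource - 70331103",
    "Ma 92280369FF400g*24",
]

for text in samples:
    # findall returns every non-overlapping match, scanning left to right.
    print(text, "->", CODE_PATTERN.findall(text))
```

The shorter runs (275, 400, 24) are skipped because they have fewer than 4 digits.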

image

Hi @elsamuel

Sample.xlsx (270.2 KB)

I have attached a sample of the strings. Actually, some rows do not contain the “code” that I need.
The file has two columns, a short text and a code. Some of the short text entries include the code.

BTW I am not really familiar with the Regex Extractor :neutral_face:

Hi @AlyKnime,

Here is an example for you. I extracted the codes with both the String Manipulation node and the Regex Extractor node; use whichever you prefer.
To install and use the Palladian nodes, follow these instructions: Palladian [Product] - NodePit

GL,
Mehrdad

@mehrdad_bgh , I think maybe you didn’t upload the example :wink:

Thanks Brian :smile: :grin:

KNIME_project270.knwf (13.9 KB)

Hi @AlyKnime , you should first confirm whether the logic proposed by @elsamuel, that is “finds numbers with 4 to 20 digits”, fits your data. The reason for requiring at least 4 digits is that without any restriction you would extract every number: from “81727904HG ASDHASD 275 EE” you would get 81727904 and 275, and from “Ma 92280369FF400g*24” you would get 92280369, 400 and 24.
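The unrestricted case described here can be sketched in a few lines of Python. This is only an illustration of the regex behaviour, not a KNIME node configuration, and `all_digit_runs` is a hypothetical helper name.

```python
import re

def all_digit_runs(text):
    # \d+ matches every run of one or more digits, with no length limit.
    return re.findall(r"\d+", text)

print(all_digit_runs("81727904HG ASDHASD 275 EE"))
print(all_digit_runs("Ma 92280369FF400g*24"))
```

Every digit run comes back, which is why some length restriction is needed to isolate the code.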

So, do the numbers you are targeting always have a specific range of digits (minimum and maximum)? Are they always between 4 and 20 digits?

Hi @bruno29a , all I need is the 8-digit number :slight_smile:

Hi @mehrdad_bgh ,

I did try this, but I got a wrong output for the string 81715738180ML HE CND BAMBOO: the result was 17157381 instead of 81715738.

Hi @AlyKnime , then based on @elsamuel’s regex, you only need \d{8}

Be advised, though, that if your data may contain two sets of 8 digits, then both sets would be returned. For example, “81727904HG ASDHASD 27527527 EE” would return 81727904 and 27527527.
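A small Python sketch of that caveat, assuming only the first 8-digit run is wanted when there are several. The helper name `first_eight_digits` is hypothetical, and returning `None` for rows without a code is an assumption about how such rows should be handled.

```python
import re

EIGHT_DIGITS = re.compile(r"\d{8}")

def first_eight_digits(text):
    # search stops at the leftmost match; None means no 8-digit run exists.
    match = EIGHT_DIGITS.search(text)
    return match.group(0) if match else None

# Two separate 8-digit runs: findall reports both of them.
print(EIGHT_DIGITS.findall("81727904HG ASDHASD 27527527 EE"))

# A run longer than 8 digits: the leftmost 8 digits are taken.
print(first_eight_digits("81715738180ML HE CND BAMBOO"))

# A row without any code.
print(first_eight_digits("Resource - none"))
```

Note that a run of 16 or more digits would also yield two matches with `findall`, since matching resumes right after the first 8 digits.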

Hi @AlyKnime , I just read this sample from you:

The \d{8} would still work for this sample.

Hi @bruno29a, is that with the Regex Extractor or with String Manipulation? I ask because I am using String Manipulation. Thanks!:slight_smile:

Hi @AlyKnime , I put something together.

It looks like this:
image

Input data is:
image

Output data is:
image

And here is the workflow: Extract 8 digit numbers from string.knwf (6.7 KB)

Would this also be possible with the String Manipulation approach used by @mehrdad_bgh ?

It’s done via Regex Extractor because it’s straightforward for what you need: extracting the numbers. You cannot do this in 1 operation with String Manipulation.

Is Regex Extractor an issue because you don’t have Palladian extensions?

EDIT: Correction: After viewing what @mehrdad_bgh has done, it looks like it can be done in 1 operation with String Manipulation using regexReplace. @AlyKnime , the regex probably needs to be adjusted. I’m not too strong with regex, so I can’t really help with the expression.

Yes, that’s right, I don’t have the Palladian extensions, and I am wondering whether I can use String Manipulation instead :frowning:

Hi @mehrdad_bgh , I just wanted to ask why you think it captures 17157381 instead of 81715738, when the “8” comes before the “1”? I am just confused about the logic :frowning: Sorry, and thanks for your help! :slight_smile:

Yes, it looks like it takes 8 digits from the right - if you change from 8 to 6 for example, you can see this behaviour for all of the input data.
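One common cause of "digits taken from the right" in a replace-style regex is a greedy `.*` in front of the capture group. The Python sketch below illustrates that general effect. Both patterns are hypothetical, since the expression used in the attached workflow is not shown in this thread, and the sketch does not try to reproduce the exact 17157381 output reported above.

```python
import re

# Hypothetical patterns, not the expression from the attached workflow.
GREEDY = r"^.*(\d{8}).*$"   # leading .* consumes as much as it can
LAZY = r"^.*?(\d{8}).*$"    # leading .*? consumes as little as it can

def extract(pattern, text):
    # Replace the whole string with capture group 1. A string without an
    # 8-digit run does not match and is returned unchanged.
    return re.sub(pattern, r"\1", text)

print(extract(GREEDY, "1234567890"))                 # 34567890
print(extract(LAZY, "1234567890"))                   # 12345678
print(extract(LAZY, "81715738180ML HE CND BAMBOO"))  # 81715738
print(extract(LAZY, "no code here"))                 # no code here
```

If the String Manipulation expression has this shape, making the leading quantifier lazy (`.*?`) may be the adjustment needed. In Java-style regex replacement, which KNIME is assumed to use here, the group reference would be written `$1` rather than `\1`; this has not been checked against the sample file.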
