Regex to extract data from a string

TigerCole · June 10, 2015, 8:09pm

I need to use a 'Regex Splitter' module to extract the BAY and DRIVER values from strings like the ones below into new columns.

BAY 4: Loading complete - Driver 68055178/768, OR:18637, TR: 32349, CO:5, PR:BC4-1-A1, Gross QTY:477.

BAY 3: Loading complete - Driver 68055180/117, OR:18690, TR: 36349, CO:3, PR:BC4-1-A1, Gross QTY:547.

To get the BAY value, I used /(?<=BAY )[0-9]/g in my regex test app and (?<=BAY )[0-9] in another app that I use for regex extractions. To get the DRIVER value, I used /(?<=DRIVER )[0-9]*\/[0-9]*/g and (?<=DRIVER )[0-9]*\/[0-9]* in the same two apps.

The first expression completes in KNIME with no errors but produces no results.

The second expression creates a new column, split_0, but generates this error message: 535 input string(s) did not match the pattern or contained more groups than expected.

If anyone can help I would really appreciate it.

Thanks,

tC/.
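
One possible explanation, sketched here in Python rather than in KNIME: the sample strings contain "Driver", not "DRIVER", so a case-sensitive lookbehind never matches. The error message also suggests that the node expects the pattern to match the whole input string, with capture groups marking the values to extract. The pattern below is only an illustration under those assumptions, not a confirmed KNIME configuration; PATTERN and extract are names made up for this sketch.

```python
import re

# Assumption: the pattern has to match the entire string, and each
# capture group becomes one output value (bay number, driver ID).
PATTERN = re.compile(r"BAY (\d+):.*Driver (\d+/\d+),.*")

def extract(line):
    """Return (bay, driver) if the whole line matches, else None."""
    m = PATTERN.fullmatch(line)
    return m.groups() if m else None

line = ("BAY 4: Loading complete - Driver 68055178/768, OR:18637, "
        "TR: 32349, CO:5, PR:BC4-1-A1, Gross QTY:477.")

print(extract(line))  # ('4', '68055178/768')

# The lookbehind from the question uses upper-case DRIVER, which does
# not occur in the sample string, so nothing is found.
print(re.findall(r"(?<=DRIVER )[0-9]*/[0-9]*", line))  # []
```

If the node's regex flavour supports it, a case-insensitive flag would be another way around the Driver/DRIVER mismatch.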

Docminus · June 11, 2015, 7:48am

Sounds like a similar problem that I had not too long ago, see if this helps (scroll to the answer posts :D )

https://tech.knime.org/node/48402/view

TigerCole · June 11, 2015, 9:32am

Thanks for the reference. I will give it a try.

tC/.

TigerCole · June 11, 2015, 11:41am

I tried as Richards99 suggested and put a .* before and after the expression but still got no results. I then followed aborg's link to http://regex101.com and tested the expressions online and they worked perfectly (I even managed to tune them a bit while I was there) but still no errors or results in KNIME.

I would really appreciate any other suggestions.

Thanks

tC/.

Macca · June 12, 2015, 11:47am

You could use the "Cell Splitter" node and enter a space as the delimiter. You would have your bay information in Arr[1] and your driver information in Arr[6]. You could then use the "Column Filter" node to get rid of the other columns, and the "String Manipulation" node to get rid of the colons and commas if needed.

Hope this helps
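
For readers who want to check the positions outside KNIME, the same idea can be sketched in Python. This is an illustration, not the Cell Splitter node itself, and split_bay_driver is a hypothetical helper name. Splitting on single spaces puts the word BAY at index 0, the bay number with its trailing colon at index 1, and the driver ID with its trailing comma at index 6.

```python
def split_bay_driver(line):
    """Split on spaces and pick out the bay number and driver ID."""
    tokens = line.split(" ")
    bay = tokens[1].rstrip(":")     # "3:" -> "3"
    driver = tokens[6].rstrip(",")  # "68055180/117," -> "68055180/117"
    return bay, driver

line = ("BAY 3: Loading complete - Driver 68055180/117, OR:18690, "
        "TR: 36349, CO:3, PR:BC4-1-A1, Gross QTY:547.")

print(split_bay_driver(line))  # ('3', '68055180/117')
```

This relies on every row having the same number of words before the driver ID, which holds for the two sample strings but is an assumption about the rest of the data.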

TigerCole · June 24, 2015, 4:24pm

Hi Macca,

I gave your suggestion a try and it worked well. Thanks.

tC/.

izaychik63 · February 24, 2017, 9:20pm

Hello,

I have a task to extract PDF file names and, as a second field, the document effective date.

Documents are in a folder. The date is a part of the text like below:

Policy Effective Date: May 18, 2015

I started from the PDF Parser. It generated document names from the documents.

I'd like to have the file name instead, and I do not know how to get the date from the text.

Please advise.

Thank you, Igor
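
As a rough illustration of the date part only, here is a Python sketch that assumes the PDF text has already been extracted into a string and that the date always follows the label "Policy Effective Date:" in the "May 18, 2015" style. DATE_PATTERN and effective_date are hypothetical names, and the month parsing assumes an English locale.

```python
import re
from datetime import datetime

# Assumption: the label is spelled exactly like this and is followed by
# a date written as "<Month name> <day>, <year>".
DATE_PATTERN = re.compile(
    r"Policy Effective Date:\s*([A-Za-z]+ \d{1,2}, \d{4})")

def effective_date(text):
    """Return the effective date found in text, or None if absent."""
    m = DATE_PATTERN.search(text)
    if m is None:
        return None
    # %B parses the full month name (locale dependent).
    return datetime.strptime(m.group(1), "%B %d, %Y").date()

sample = "Some header text\nPolicy Effective Date: May 18, 2015\nMore text"
print(effective_date(sample))  # 2015-05-18
```

The same regular expression, with its single capture group, could be a starting point for a regex-based node in a workflow, though that has not been verified here.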

asenkron · March 19, 2019, 1:18pm

Hi everyone,

I am a pretty new user and couldn't solve a problem akin to this topic.

I could solve it in a Jupyter notebook, so I will share the exact example:

import re
import pandas as pd

data = pd.DataFrame({
    'country': ['France', 'Japan', 'England'],
    'Inspection Interval': [
        'Residential 5 years commercial 2 years Mean stok age 30.1',
        'Residential 4 Commercial 1 year Mean age 25.2',
        'Residential proposed 10 years commercial adfs 15 year MSA 22.0',
    ],
})
# Collect each run of non-space characters that ends in a digit
# (in this data, the numbers) into a list per row.
data['Numbers'] = data['Inspection Interval'].apply(
    lambda text: re.findall(r'\d*\S*\d+', text))

The output is as intended.

How could I do the same with KNIME nodes?

Thanks a lot