REGEX SPLIT NODE IN KNIME

Hello,

Can someone help me extract the data below using the Regex Split node? I'd appreciate a regex-based solution rather than a substring or indexOf formula. Thank you.

Data | Need to get | Remarks
TESTING PURPOSES ONLY PT Invoice Number: ABCDER00195436 | ABCDER00195436 | Just want to get the Invoice Number only
Title Invoice No. : ABCDER00195436 - HAWB : 8928435563 - ABC | ABCDER00195436 | Just want to get the Invoice Number only
EGGHEAD, VEGAPUNK JINBEI Invoice Date: 18-03-2024 | 18-03-2024 | Just want to get the Invoice Date only
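For anyone who wants to test patterns outside KNIME, here is a minimal Python sketch of regexes that would pull out the three target values. The patterns are my own assumptions, not the ones in the attached workflow. They use only constructs that behave the same in Java's regex engine, which KNIME's regex nodes rely on. They are wrapped in a leading and trailing `.*` so they match the whole cell, on the assumption that Regex Split needs a full-string match and turns each capture group into a new column.

```python
import re

# Hypothetical patterns (not from the attached workflow). The leading and
# trailing ".*" make each pattern cover the whole cell, assuming the node
# requires a full-string match; group 1 is the value we want.
INVOICE_NO = re.compile(r".*Invoice (?:Number|No\.)\s*:\s*(\S+).*")
INVOICE_DATE = re.compile(r".*Invoice Date\s*:\s*(\d{2}-\d{2}-\d{4}).*")

rows = [
    "TESTING PURPOSES ONLY PT Invoice Number: ABCDER00195436",
    "Title Invoice No. : ABCDER00195436 - HAWB : 8928435563 - ABC",
    "EGGHEAD, VEGAPUNK JINBEI Invoice Date: 18-03-2024",
]


def extract(text):
    """Return group 1 of the first pattern that matches the whole text, else None."""
    for pattern in (INVOICE_NO, INVOICE_DATE):
        m = pattern.fullmatch(text)
        if m:
            return m.group(1)
    return None


for row in rows:
    print(extract(row))
```

In the KNIME dialog the same patterns would be typed with single backslashes, since there is no string escaping there.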

Try this. It uses the Regex Substring node from AF Utilities.
Regex Extract Invoice.knwf (95.2 KB)


I added a loop so all of the data can be parsed automatically. If the format of the data rows changes significantly, the loop will probably have to be modified, since it relies on a LIKE condition in the Rule Engine node to assign the correct regex.
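The loop-plus-Rule-Engine idea can be sketched in Python as follows. This is an illustration under assumptions, not the attached workflow: `fnmatchcase` stands in for the Rule Engine's LIKE operator (both treat `*` as any run of characters), the rule list and its patterns are hypothetical, and, as in the Rule Engine, the first matching rule wins.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical rules, loosely mirroring Rule Engine lines such as
#   $Data$ LIKE "*Invoice Date*" => "<date regex>"
# They are checked top to bottom; the first match decides the regex.
RULES = [
    ("*Invoice Date*", r"Invoice Date\s*:\s*(\d{2}-\d{2}-\d{4})"),
    ("*Invoice No.*", r"Invoice No\.\s*:\s*(\S+)"),
    ("*Invoice Number*", r"Invoice Number\s*:\s*(\S+)"),
]


def assign_regex(text):
    """Return the regex of the first wildcard rule that matches text, else None."""
    for wildcard, regex in RULES:
        if fnmatchcase(text, wildcard):
            return regex
    return None


def parse_all(rows):
    """Loop over the rows: pick a regex per row, then extract its first group."""
    results = []
    for text in rows:
        regex = assign_regex(text)
        m = re.search(regex, text) if regex else None
        results.append(m.group(1) if m else None)
    return results


rows = [
    "TESTING PURPOSES ONLY PT Invoice Number: ABCDER00195436",
    "Title Invoice No. : ABCDER00195436 - HAWB : 8928435563 - ABC",
    "EGGHEAD, VEGAPUNK JINBEI Invoice Date: 18-03-2024",
]
print(parse_all(rows))
```

Because the rules are tried in order, a new row format that happens to contain one of the existing wildcard phrases would be routed to the wrong regex, which is why a significant format change means revisiting the rules.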


Loop Output


@rfeigel Thank you so much! Can you share the workflow with the loop?

@rfeigel Sorry, I've got the workflow now. I just want to ask: why is there a space in front of the date?

Strange. I’m not getting a space. I’ll double-check the regex.


Here’s a table from the Table View node. It's OK too. @iCFO makes a good point: I had to copy the data from your post, and that may be the difference.


It is best to copy and paste the sample data into a Table Creator node and share the workflow with the forum, so that things like leading spaces or hidden characters are caught. Manually copying and pasting or converting forum text will miss things like that. Sharing the workflow also dramatically speeds up solutions, and more of us will have time to jump in and help quickly.
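To see the kind of thing a shared Table Creator node would preserve, it helps to list the invisible characters in a cell. This minimal Python sketch (the function names are hypothetical) reports every character outside printable ASCII, such as a non-breaking space, along with the number of whitespace characters at each end of the string.

```python
import unicodedata


def reveal(text):
    """Return (index, code point, Unicode name) for each character outside
    printable ASCII, e.g. non-breaking spaces, tabs or zero-width characters."""
    report = []
    for i, ch in enumerate(text):
        if not (ch.isascii() and ch.isprintable()):
            name = unicodedata.name(ch, "<unnamed>")
            report.append((i, f"U+{ord(ch):04X}", name))
    return report


def edge_whitespace(text):
    """Return how many whitespace characters sit at the start and at the end."""
    return len(text) - len(text.lstrip()), len(text) - len(text.rstrip())


# Made-up sample cell containing a non-breaking space and a trailing space.
sample = "Invoice Date:\u00a018-03-2024 "
print(reveal(sample))
print(edge_whitespace(sample))
```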


A quick fix would be to use a String Cleaner node after the Loop End node to remove the leading space. Alternatively, as @iCFO suggested, send the data in table format and I’ll see about fixing the regex to match it.
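One plausible cause of the space, assuming the workflow captured everything after the colon (the actual regex isn't shown in this thread), is that the capture group includes the space after `Invoice Date:`. The Python sketch below compares that with a pattern that consumes the whitespace before the group, and with stripping the result afterwards.

```python
import re

cell = "EGGHEAD, VEGAPUNK JINBEI Invoice Date: 18-03-2024"

# Hypothetical "loose" pattern: the group starts right after the colon,
# so the space that follows it ends up inside the captured value.
loose = re.search(r"Invoice Date:(.*)", cell).group(1)

# Putting \s* before the group consumes the whitespace instead of capturing it.
tight = re.search(r"Invoice Date:\s*(.*)", cell).group(1)

# After-the-fact fix, similar in spirit to strip() or a String Cleaner node.
cleaned = loose.strip()

print(repr(loose), repr(tight), repr(cleaned))
```

One caveat if the source text contains non-breaking spaces: in Java, which KNIME's regex nodes use, `\s` matches only `[ \t\n\x0B\f\r]` unless the UNICODE_CHARACTER_CLASS flag (inline `(?U)`) is set, whereas Python's `re` matches Unicode whitespace by default.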


Thank you so much, @rfeigel! I tried strip in the String Manipulation node and it worked.

For your reference, here’s the data:

Data
Title Invoice No. : ABCDER00195436 - HAWB : 8928435563 - ABC
Title Invoice No. : ABCDER00195436 - HAWB : 8928435563 - ABC
EGGHEAD, VEGAPUNK JINBEI Invoice Date: 18-03-2024
EGGHEAD, VEGAPUNK JINBEI Invoice Date: 18-03-2024
Title Invoice No. : ABCDER00195436 - HAWB : 8928435563 - ABC
Title Invoice No. : ABCDER00195436 - HAWB : 8928435563 - ABC
TESTING PURPOSES ONLY PT Invoice Number: ABCDER00195436

Glad it worked for you. If you want me to check the regex, you need to post the file you’re using, not a screenshot.

Hi, my name is Claudia. Which node did you choose to do that cleanup?