Manipulate data in REGEX

Hello Everyone, can someone help me the code for regex and get the below data:

FREIGHT LINKS EXPRESS ARCHIVERS SINGAPORE SG
GENERAL MOTORS OVERSEAS DIST. LLC 049315

These data can be captured and our identifier will be ‘always’ BNF= and ORG=. Can this be possible? Thank you so much!

Data:
/REF-6212195H0B53
/ENTRY-17 JUL POSTED=00:06
PYTR BOOK TRANSFER DEBIT
BBK=
BNF=FREIGHT LINKS EXPRESS ARCHIVERS SINGAPORE SG
ORG=GENERAL MOTORS OVERSEAS DIST. LLC 049315
OBI=15000460932023

CC: just incase you are not busy, @takbb :slight_smile:

1 Like

@trafalgarlaw

Is the text part of one cell or are they individual lines?

A convenient way is to use the Regex Extractor node (Regex Extractor — NodePit) and use ((?<=BNF=)[A-Z0-9 .]+(?=ORG))|((?<=ORG=)[A-Z0-9 .]+(?=OBI))

Putting it in a list gives the cleanest output. You can then split it again to have the two values in seperate columns.

6 Likes

Wow I liked it! @ArjenEX though I can’t find Regex Extractor extensions, can this be use in Regex Split node so I can try it? or the code can be used too in Column Expressions?

I can recommend to open the link I embedded in the post and follow the installation instruction which are referenced in the top right :wink:

Regex split requires a different way of writing. Column Expression can potentially be used but then you have to completely reverse the pattern because you need to regexReplace everything that is not equal to the original pattern

1 Like

Thank you @ArjenEX! Will try to install it but would you mind if you could provide the code if we will use column expressions node? But if it will take your time, no worries. :slight_smile:

It seems it is not allowing us to install the NodePit KNIME. I’ve tried it but keeps on asking enhanced access for us.

I think the node we can use is Regex Split or Column Expressions. Hope you can help :slight_smile: @ArjenEX

For CE you can then use

regexReplace(column("column1"),"(.*)(?=BNF=)|(?=ORG)(.*)","")
regexReplace(column("column1"),"(.*)(?=ORG=)|(?=OBI)(.*)","")

3 Likes

I will always try to help if I have the time (and any ideas!), but as you can see, the KNIME community forum generally has plenty of good people available, such as @ArjenEX, to assist! :slight_smile:

btw, I don’t think I have anything to add other than to note that the expressions that @ArjenEx gave for Column Expressions could also be ported to a pair of String Manipulation nodes, where the expressions would become:

regexReplace($column1$,"(.*)(?=ORG=)|(?=BNF)(.*)","")

and

regexReplace($column1$,"(.*)(?=BNF=)|(?=ORG)(.*)","")

2 Likes

Thank you so much @ArjenEX!!! Also @takbb for your inputs. KNIME Community is really awesome… I will try to learn more of this REGEX. :slight_smile:

Hello @ArjenEX, thank you so much for your help. Sorry just wondering what if the data will be as below in which ‘BNF=’ is not part of the data. Is there a way that it will return as blank if BNF= is missing or it is not possible? Appreciate your thoughs. Thanks again!!

Data:

/REF-6212187H0944075
/ENTRY-06 JUL POSTED=15:51
PYTR BOOK TRANSFER DEBIT
ORG=IRAS
OGB= DBS BANK LTD
OBI=INCOME TAX

Sure @trafalgarlaw

In that case use a RegexMatcher or contains function to first determine if the term is actually present. You can setup an if-else for this.

if (regexMatcher(column("column1"),"(.*)(BNF=)(.*)") == true) {
    regexReplace(column("column1"),"(.*)(?=BNF=)|(?=ORG)(.*)","")
} else {
    null
}
if (regexMatcher(column("column1"),"(.*)(ORG=)(.*)") == true) {
    regexReplace(column("column1"),"(.*)(?=ORG=)|(?=OBI)(.*)","")
} else {
    null
}

Tested this with 4 scenario’s : both, BNF only, ORG only or neither.

3 Likes

Thank you so much @ArjenEX!! I’ve tried it and it worked for BNF :slight_smile:

For ORG, it is getting also the ‘OGB= DBS BANK LTD’ data. Would you know what should I tweak in the code?

Data:

/REF-6212187H0944075
/ENTRY-06 JUL POSTED=15:51
PYTR BOOK TRANSFER DEBIT
ORG=IRAS
OGB= DBS BANK LTD
OBI=INCOME TAX

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.