I need to extract a string when it matches a given combination of character types

mpoppi · February 23, 2022, 4:19pm

Hello,

I would like to extract a string from a column that contains a description, but only when the string matches a given combination of character types. For example, from “dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030” I want to extract the string that corresponds to the combination “LETTER & LETTER & NUMBER & NUMBER & NUMBER & LETTER & LETTER”, that is, “BK540GG”.

I have no idea how to do this.

Thank you in advance if you can help!

bruno29a · February 23, 2022, 4:59pm

Hi @mpoppi and welcome to the KNIME Community.

You need a regex to do this, and the regex that matches your rule would be:
[a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2}
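To sanity-check the pattern outside KNIME, here is a minimal Python sketch. Python is used only for illustration (KNIME itself works with Java regular expressions), and the `PATTERN` name and the `extract_code` helper are hypothetical. It searches the sample description from the question for the first substring that fits the rule.

```python
import re

# Two letters, three digits, two letters (the pattern given above).
PATTERN = re.compile(r"[a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2}")

def extract_code(description):
    """Return the first substring that fits the pattern, or None if there is none."""
    match = PATTERN.search(description)
    return match.group(0) if match else None

sample = "dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030"
print(extract_code(sample))  # BK540GG
```

Note that the pattern is not anchored to word boundaries, so it would also pick “BK540GG” out of a longer token such as “ABK540GGX”. If the code must stand alone, wrapping the pattern in `\b...\b` is one option to consider.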

I’m not sure how to extract the value with the standard KNIME nodes. The Palladian extension has a Regex Extractor node, which can do this:

Configuration:

Result:

EDIT: You can get the Regex Extractor from here:

mpoppi · February 23, 2022, 6:35pm

Thank you Bruno

I’m trying the “Regex Split” node and it doesn’t work. The message displayed is the following: “Input strings did not match the pattern or contained more groups than expected”.
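One plausible cause, stated here as an assumption rather than a confirmed diagnosis: Regex Split appears to expect the pattern to match the entire cell and to return capture groups, whereas the bare pattern only describes the seven-character code. The Python sketch below is illustrative only (`CODE`, `bare`, and `wrapped` are hypothetical names) and mimics whole-string matching with `re.fullmatch`.

```python
import re

CODE = r"[a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2}"
sample = "dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030"

# The bare pattern covers only seven characters, so a whole-string match fails.
bare = re.fullmatch(CODE, sample)
print(bare)  # None

# Letting ".*" absorb the text on both sides and capturing the code in a group
# makes the whole string match.
wrapped = re.fullmatch(r".*(" + CODE + r").*", sample)
print(wrapped.group(1))  # BK540GG
```

If this assumption holds, a pattern of the form `.*([a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2}).*` might be worth trying in Regex Split. Rows that contain no such code would still fail to match.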

I’m going to try with the node you were sharing. I’ll let you know.

Thank you anyway

Edit: my source is an Excel Reader node. Maybe this could be a problem (?)

JanDuo · February 23, 2022, 6:50pm

@mpoppi it should be possible with the String Manipulation node (see NodePit).
I don’t have a KNIME environment available right now, so I cannot add a screenshot for you. This is from memory.

Use this function:
regexReplace($yourColumn$, "(.*)([a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2})(.*)", "$2")

You’ll recognise the second group as the regex given by @bruno29a. A group in a regex is anything between parentheses ( ). Whatever a group matches can be reused in the replacement as $1 for the first group, $2 for the second, and so on.
In the formula above, the first and third groups are simply left out of the replacement argument, leaving only the second part, which is the one you are looking for.
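The group-and-replace idea can be sketched in Python, purely for illustration: Python writes the back-reference as `\2` where the KNIME function above uses `$2`, and `keep_second_group` is a hypothetical helper. The sketch also shows two edge cases worth checking on real data: a row with no code comes back unchanged, and when a row holds several codes the greedy first group leaves only the last one.

```python
import re

PATTERN = re.compile(r"(.*)([a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2})(.*)")

def keep_second_group(text):
    """Replace the whole match with whatever group 2 captured."""
    return PATTERN.sub(r"\2", text)

# Normal case: only the code survives.
print(keep_second_group("dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030"))  # BK540GG

# No match: nothing is replaced, so the input is returned as-is.
print(keep_second_group("no code here"))  # no code here

# Two candidates: the greedy first group swallows the earlier one.
print(keep_second_group("AB123CD and XY789ZW"))  # XY789ZW
```

Because unmatched rows pass through unchanged, it may be worth filtering or flagging them separately rather than treating every output value as an extracted code.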

bruno29a · February 23, 2022, 10:57pm

Hi @JanDuo, it’s exactly what I was looking for. I’m not an expert with regex. In fact, I only started writing regex on my own a couple of months ago, so I’m not too familiar with what $1, $2, etc. are, but you pretty much explained them. It’ll take some time for me to get used to them.

I initially tried to use the negate sign (^), thinking I could remove anything NOT matching, but that did not work.
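For context, `^` negates only inside a character class such as `[^0-9]`, where it means “any single character except these”. Outside brackets it anchors the match to the start of the string, so it cannot express “everything that is not this seven-character pattern”. A short Python illustration (not KNIME-specific; the sample strings are made up):

```python
import re

# Inside brackets, ^ negates the class: remove every character that is not a digit.
print(re.sub(r"[^0-9]", "", "BK540GG"))  # 540

# Outside brackets, ^ anchors to the start: the pattern only matches at position 0.
print(re.search(r"^[a-zA-Z]{2}[0-9]{3}", "BK540GG") is not None)    # True
print(re.search(r"^[a-zA-Z]{2}[0-9]{3}", "x BK540GG") is not None)  # False
```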

Thank you for sharing.

JanDuo · February 24, 2022, 6:48am

A bit off-topic, but you can use the same regex replace logic when you want to rename columns using regex (see the Column Rename (Regex) node on NodePit).

mpoppi · February 24, 2022, 9:29am

Thank you all guys!

I’ve tried the solution suggested by @bruno29a and it worked fine, while the “Regex Split” node still doesn’t work and I don’t know why.

So thank you so much Bruno, you really helped me

Daniel_Weikert · February 24, 2022, 4:47pm

Does that mean it’s solved or not? If so, you could mark the solution for others.
br
