I need to extract a string when it matches a given combination of character types

Hello,

I would like to extract, from a column containing a description, a substring only when it matches a given combination of character types. For example, from “dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030” I want to extract the substring matching the combination “LETTER & LETTER & NUMBER & NUMBER & NUMBER & LETTER & LETTER”, that is “BK540GG”.

I have no idea how to do this.

Thank you in advance if you can help!

Hi @mpoppi and welcome to the KNIME Community.

You need a regular expression (regex) for this. The regex that matches your rule is:
[a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2}
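To check what this pattern picks out of the example description, here is a minimal sketch in Python. Python's `re` module is used only for illustration (KNIME uses Java's regex engine, which treats this particular pattern the same way), and `extract_code` is a hypothetical helper name.

```python
import re

# Two letters, three digits, two letters (the pattern from the reply above).
PATTERN = re.compile(r"[a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2}")

def extract_code(description):
    """Return the first substring matching PATTERN, or None if there is none."""
    match = PATTERN.search(description)
    return match.group(0) if match else None

print(extract_code("dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030"))  # BK540GG
```

Note that the pattern has no word boundaries, so inside a longer token such as `ABC123DEF` it would still match `BC123DE`; adding `\b` at both ends is one option if only whole tokens should count.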

I’m not sure how to extract the value with KNIME nodes, but the Palladian extension has a Regex Extractor node that can do this.

EDIT: You can get the Regex Extractor from here:


Thank you, Bruno 🙂

I tried the “Regex Split” node and it doesn’t work. It displays the following message: “Input strings did not match the pattern or contained more groups than expected”.
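One possible reason for this error, assuming the Regex Split node requires the pattern to match the whole cell rather than just part of it (as the error message suggests), is that the pattern above describes only the code, not the text around it. The Python sketch below contrasts the two behaviours using `re.fullmatch`; wrapping the group in `.*` on both sides is an untested idea to try in Regex Split, not a confirmed fix.

```python
import re

TEXT = "dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030"
CODE = r"([a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2})"

# Whole-string matching: the bare pattern does not cover the entire cell.
print(re.fullmatch(CODE, TEXT))  # None

# Allowing any text before and after lets the whole cell match,
# while the capturing group still holds only the code.
full = re.fullmatch(r".*" + CODE + r".*", TEXT)
print(full.group(1))  # BK540GG
```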

I’m going to try the node you shared and will let you know.

Thank you anyway 🙂

Edit: my source is an Excel Reader node. Could that be the problem?

@mpoppi it should be possible with the String Manipulation node (see its page on NodePit).
I don’t have a KNIME environment available right now, so I can’t add a screenshot for you; this is from memory.

Use this expression:
regexReplace($yourColumn$, "(.*)([a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2})(.*)", "$2")

You’ll recognise the second group as the regex given by @bruno29a. In a regex, a group is anything between parentheses ( ). Whatever a group matches can be reused in the replacement as $1 for the first group, $2 for the second, and so on.
In the formula above, the replacement uses only $2, so the text matched by the first and third groups is dropped, leaving just the part you are looking for.
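The same group-based replacement can be sketched in Python to show what it does outside KNIME. This is an illustration, not KNIME's implementation: Python writes the group reference as `\2` where the KNIME expression uses `$2`, and `keep_second_group` is a hypothetical helper name.

```python
import re

# Same three groups as the String Manipulation expression above.
PATTERN = r"(.*)([a-zA-Z]{2}[0-9]{3}[a-zA-Z]{2})(.*)"

def keep_second_group(value):
    # The single match spans the whole (single-line) string, so replacing it
    # with group 2 leaves only the code. If nothing matches, re.sub returns
    # the input unchanged.
    return re.sub(PATTERN, r"\2", value)

print(keep_second_group("dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030"))  # BK540GG
print(keep_second_group("no code here"))  # no code here
```

Two consequences worth checking in your data: rows without a code come back unchanged rather than empty, and because the first group `(.*)` is greedy, a description containing two codes keeps the last one (a lazy `(.*?)` would keep the first).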


Hi @JanDuo, this is exactly what I was looking for. I’m not an expert with regex. In fact, I only started writing regex on my own a couple of months ago, so I’m not too familiar with what $1, $2, etc. are, but you explained them well. It’ll take me some time to get used to using them.

I initially tried to use the negation character (^), thinking I could remove anything NOT matching, but that did not work.
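For context on why that approach fails: in regex syntax `^` negates only inside a character class such as `[^...]`, and there it applies to one character at a time, not to a whole sequence; at the start of a pattern it instead anchors the match to the beginning of the string. A short Python illustration (Java's regex engine, which KNIME uses, treats `^` the same way in both cases):

```python
import re

text = "dsjbjdieEUUBFB 284847 8328HF83 BK540GG N.1000030"

# Inside [...], ^ negates single characters: this deletes every character
# that is not a letter or digit, but it cannot "negate" a whole pattern.
print(re.sub(r"[^a-zA-Z0-9]", "", text))

# Outside [...], ^ anchors to the start of the string; it is not a negation.
print(re.findall(r"^[a-zA-Z]{2}", text))  # ['ds']
```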

Thank you for sharing.


A bit off-topic, but you can use the same regex replace logic to rename columns with the Column Rename (Regex) node (see its page on NodePit).


Thank you all!

I’ve tried the solution suggested by @bruno29a and it worked fine, while the “Regex Split” node still doesn’t work and I don’t know why.

So thank you so much, Bruno, you really helped me 🙂


Does that mean it’s solved? If so, you could mark the solution for others.
br

