Regex between first and second "-"

IMR2KA · June 1, 2023, 11:30am

Hello all,
I have a column “Data” from which I need only the text after the first “-” but before the second one; please see the table. My aim is to get what you see in the “Result” column.

For this I used regex:

Guys, how can I do the same in KNIME? I know it should be the String Manipulation node, but I have no clue what to write there.

Thank you all in advance!

gonhaddock · June 1, 2023, 12:28pm

Hello @IMR2KA
You can test the following code in a String Manipulation node:

regexReplace($text$, "(\\S+?[|-](.*?)[|-][|\\S]+|\\S+?[|-](.*))", "$2$3")

The regex101 code…
(\S+?[|-](.*?)[|-][|\S]+|\S+?[|-](.*))

BR

AlexanderFillbrunn · June 1, 2023, 12:35pm

Hi,
I think this can be simplified a little bit. You could use:

regexReplace($column1$, "^[^-]+-([^-]+)-.*", "$1")

[^-]+ means “one or more of any character except the dash”. So the pattern reads: a run of non-dash characters, followed by a dash, followed by another run of non-dash characters (captured in a group this time), followed by another dash and then the rest, which we do not care about (.*).
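To check this outside KNIME, here is a minimal Java sketch. It assumes KNIME's regexReplace behaves like Java's String.replaceAll (both use java.util.regex syntax); the class name and the sample rows are hypothetical, since the original table was not preserved.

```java
// Sketch: Alexander's first pattern applied with String.replaceAll.
// Assumption: KNIME's regexReplace behaves the same way. Class name is hypothetical.
final class FirstPatternDemo {
    // ^[^-]+  : one or more non-dash characters at the start
    // -       : the first dash
    // ([^-]+) : the second token, captured as group 1
    // -.*     : the second dash and everything after it
    static final String PATTERN = "^[^-]+-([^-]+)-.*";

    static String secondToken(String data) {
        // If the pattern does not match, replaceAll returns the input unchanged.
        return data.replaceAll(PATTERN, "$1");
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a-b-c", "qwe-tru", "qweq"}) {
            System.out.println(row + " -> " + secondToken(row));
        }
    }
}
```

Rows with a single dash, such as qwe-tru, do not match at all, so they come back unchanged.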
Kind regards,
Alexander

gonhaddock · June 1, 2023, 12:40pm

Hello @AlexanderFillbrunn
Your code does not work for the following row:

qwe-tru

BR

AlexanderFillbrunn · June 1, 2023, 12:49pm

Hi,
You are right! It needs to be adapted like this I think:

^[^-]+-([^-]+)[-$].*

gonhaddock · June 1, 2023, 12:54pm

Hello,
Still not. You can make it work by adding a pipe before the ‘after-group’:

^[^-]+-([^-]+)|[-$].*

However, it still fails for the last row (Row3, ‘qweq’), which should return a blank.
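Why `[-$]` did not help: inside a character class, `$` is a literal dollar sign, not an end-of-line anchor, so a second dash is still required. A minimal Java sketch comparing both adapted patterns, again assuming regexReplace behaves like String.replaceAll (hypothetical class name and sample rows):

```java
// Sketch comparing the two adapted patterns with String.replaceAll
// (assumed to mirror KNIME's regexReplace). Class name is hypothetical.
final class AdaptedPatternDemo {
    // [-$] is a character class: '$' is a literal dollar sign here, not an
    // end-of-input anchor, so a second '-' (or '$') is still required.
    static final String CHAR_CLASS = "^[^-]+-([^-]+)[-$].*";
    // The pipe makes "[-$].*" a separate alternative: the first branch keeps
    // the second token, and a later "-..." remainder is matched by the second
    // branch, where group 1 does not take part and is substituted as "".
    static final String WITH_PIPE = "^[^-]+-([^-]+)|[-$].*";

    static String apply(String pattern, String data) {
        return data.replaceAll(pattern, "$1");
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a-b-c", "qwe-tru", "qweq"}) {
            System.out.println(row + ": " + apply(CHAR_CLASS, row)
                    + " | " + apply(WITH_PIPE, row));
        }
    }
}
```

With the pipe, qwe-tru becomes tru, but qweq still passes through unchanged because neither branch matches it.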

Warning! I think my first code fails on the last row as well.

It would be easier to use a Cell Splitter with ‘-’ as the delimiter and keep the second array element…

This regex works for me:

(\S+?[|-](.*?)[|-][|\S]+|\S+?[|-](.*)|.*)
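The extra `|.*` branch is what handles rows without any separator. A Java sketch comparing the first and the final version (hypothetical class name and rows; assumes regexReplace works like String.replaceAll, where a group that did not take part in the match is substituted as an empty string):

```java
// Sketch comparing gonhaddock's first and final patterns with replaceAll.
// Assumption: KNIME's regexReplace behaves the same way. Class name is hypothetical.
final class AlternationDemo {
    // First version: both branches need at least one '-' or '|' separator.
    static final String FIRST = "(\\S+?[|-](.*?)[|-][|\\S]+|\\S+?[|-](.*))";
    // Final version: a third branch ".*" catches rows without a separator.
    // Groups 2 and 3 do not take part in that branch, so it becomes "".
    static final String FINAL = "(\\S+?[|-](.*?)[|-][|\\S]+|\\S+?[|-](.*)|.*)";

    static String apply(String pattern, String data) {
        // $2 comes from the two-separator branch, $3 from the one-separator branch.
        return data.replaceAll(pattern, "$2$3");
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a-b-c", "qwe-tru", "qweq"}) {
            System.out.println(row + ": '" + apply(FIRST, row)
                    + "' | '" + apply(FINAL, row) + "'");
        }
    }
}
```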

AlexanderFillbrunn · June 2, 2023, 11:25am

Hey,
Good point! Now I think I have found a one-node solution, but it is not very pretty. You can use this expression in the String Manipulation node to handle all 4 cases:

regexReplace($Data$, "^[^-]+-([^-]+).*", "$1") == $Data$ ? "" : regexReplace($Data$, "^[^-]+-([^-]+).*", "$1")

Using the Java ternary operator was inspired by this post. But of course it is a bit ugly that we need to call regexReplace twice.
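In plain Java the same "no match means blank" idea needs only one regex call if the result is kept in a local variable. A minimal sketch with hypothetical names; it compares with equals() because in Java == compares object references rather than string contents:

```java
// Sketch of the ternary idea in plain Java with a single regex call.
// Class and method names are hypothetical.
final class TernaryDemo {
    static final String PATTERN = "^[^-]+-([^-]+).*";

    static String secondTokenOrBlank(String data) {
        String replaced = data.replaceAll(PATTERN, "$1");
        // No match means replaceAll returned the original text, so the row has
        // no "x-y" structure and should become blank. equals() compares content.
        return replaced.equals(data) ? "" : replaced;
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a-b-c", "qwe-tru", "qweq"}) {
            System.out.println(row + " -> '" + secondTokenOrBlank(row) + "'");
        }
    }
}
```

A successful match always drops at least the first token and its dash, so the replaced text can never equal the input by accident.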
Alexander

takbb · June 2, 2023, 11:29am

lol… @AlexanderFillbrunn , sometimes there’s low-code, sometimes there’s no-code, and sometimes there’s no-no-no! code

denisfi · June 2, 2023, 11:40am

Hi guys… just use this to get only the second element:

regexReplace($column1$, "(.)-(.)(-.*)?", "$2")

I know that the first element runs up to the first “-” and the second follows the same rule, BUT there may or may not be something after it; the “?” is there for that purpose. The expression drops the first element and, if there is one, the third, and keeps only the second.

If you work with numbers, you can use “\d+” or “[0-9]+”.
If you work with letters, you can use “[a-zA-Z]+” (“\w+” also matches digits and underscores).

Use “-” as the separator between them and “()” to capture each part as a group.
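Note that \w and [a-zA-Z] are not interchangeable: in java.util.regex with default flags, \w is [a-zA-Z_0-9] and \d is [0-9]. A small Java check with hypothetical helper names:

```java
// Sketch contrasting predefined classes with explicit ranges (default flags).
// Class and method names are hypothetical.
final class CharClassDemo {
    // \w is shorthand for [a-zA-Z_0-9], so it also accepts digits and '_'.
    static boolean isWordChars(String s) { return s.matches("\\w+"); }

    // [a-zA-Z]+ accepts ASCII letters only.
    static boolean isLetters(String s) { return s.matches("[a-zA-Z]+"); }

    // \d is shorthand for [0-9].
    static boolean isDigits(String s) { return s.matches("\\d+"); }

    public static void main(String[] args) {
        System.out.println(isWordChars("ab_12")); // true
        System.out.println(isLetters("ab_12"));   // false
        System.out.println(isDigits("2023"));     // true
    }
}
```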

That's it…

Tks,

Denis

gonhaddock · June 2, 2023, 11:55am

It doesn’t seem to work properly @denisfi …

I still think that the best approach for this challenge is the ‘Occam’s razor’ one (the simplest):

BR

AlexanderFillbrunn · June 2, 2023, 12:00pm

Hi,
The Cell Splitter approach could cause problems when none of the rows contains a dash. In that case the Cell Splitter will not create the necessary column and the workflow will fail, so you will need to check whether the column exists with a Table Validator and handle that case.
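A rough Java analogue of the split approach, where a length check plays the role of the missing-column check. This is an assumption-based sketch, not how the Cell Splitter node is implemented; the class name and sample rows are hypothetical:

```java
// Rough analogue of splitting on '-' and keeping the second part.
// Not the Cell Splitter implementation; class name is hypothetical.
final class SplitDemo {
    static String secondToken(String data) {
        // limit -1 keeps trailing empty strings, e.g. "a-" -> ["a", ""].
        String[] parts = data.split("-", -1);
        // A row without a dash yields a single element, so there is no second
        // part to read: return a blank instead of failing.
        return parts.length > 1 ? parts[1] : "";
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a-b-c", "qwe-tru", "qweq"}) {
            System.out.println(row + " -> '" + secondToken(row) + "'");
        }
    }
}
```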
Alexander

denisfi · June 2, 2023, 12:02pm

In my previous post, the forum editor erased the “*” characters. It was:

regexReplace($column1$, "(.*)-(.*)(-.*)?", "$2")

Sorry, I didn’t notice after saving it. But as in the example, you can build a better pattern if you know the rules of your data (letters, numbers, symbols…).

Thanks again,

Denis

gonhaddock · June 2, 2023, 12:14pm

Thanks for the clarification @denisfi
Even with the ‘*’ added, it doesn’t seem to work, as it only handles the first and second use cases.
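The reason is greediness: the first (.*) grabs as much as it can and only gives back text up to the last dash that still lets the rest match. A Java sketch of Denis's corrected expression, assuming regexReplace behaves like String.replaceAll (hypothetical class name and rows):

```java
// Sketch showing why the greedy expression picks the wrong token.
// Assumption: KNIME's regexReplace behaves like replaceAll. Class name is hypothetical.
final class GreedyDemo {
    // The first (.*) is greedy, so it backtracks only as far as the LAST dash.
    static final String PATTERN = "(.*)-(.*)(-.*)?";

    static String apply(String data) {
        return data.replaceAll(PATTERN, "$2");
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a-b-c", "qwe-tru", "qweq"}) {
            System.out.println(row + " -> " + apply(row));
        }
    }
}
```

For a-b-c the first group captures a-b, so $2 is c instead of b, and qweq has no dash at all, so it is returned unchanged.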

Besides, you can format code in your posts with the dedicated button in the editor:

It improves readability and makes the posts look tidy and clear.

BR

elsamuel · June 2, 2023, 12:27pm

I’d use the Regex Extractor node with a positive lookbehind: (?<=^[^-]+-)[^-]+

Basically, the lookbehind first checks that, from the beginning of the line, there is a run of characters without a dash, followed by a dash. The match itself is then the following run of characters without a dash, which is what the node returns.
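A Java sketch of the same extraction idea. Many regex engines only allow bounded-length lookbehind, so this sketch swaps the lookbehind for an anchored prefix plus a capturing group and reads group 1 via Matcher.find(); the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extraction sketch: anchored prefix + capturing group instead of lookbehind.
// Class and method names are hypothetical.
final class ExtractDemo {
    // ^[^-]+- plays the role of (?<=^[^-]+-); group 1 is the second token.
    private static final Pattern SECOND = Pattern.compile("^[^-]+-([^-]+)");

    static String extractSecond(String data) {
        Matcher m = SECOND.matcher(data);
        // No match (e.g. no dash) gives a blank result.
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a-b-c", "qwe-tru", "qweq"}) {
            System.out.println(row + " -> '" + extractSecond(row) + "'");
        }
    }
}
```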

nan · June 2, 2023, 12:45pm

Nice challenge folks, let me have a try, too.
Pattern: [^-]*-?([^-]*).*
Replacement: $1

Does it work as you expect?
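A Java sketch of this pattern (hypothetical class name and rows; assumes regexReplace behaves like String.replaceAll). The pattern has a single capturing group, so the sketch uses $1 as the replacement:

```java
// Sketch of the all-optional pattern with String.replaceAll.
// Assumption: KNIME's regexReplace behaves the same way. Class name is hypothetical.
final class OptionalPartsDemo {
    // [^-]*   first token (may be empty)
    // -?      the first dash, if any
    // ([^-]*) second token, group 1 (empty when there is no dash)
    // .*      everything else
    static final String PATTERN = "[^-]*-?([^-]*).*";

    static String secondToken(String data) {
        // replaceAll also finds an empty match at the very end of the text;
        // group 1 is empty there, so it adds nothing to the result.
        return data.replaceAll(PATTERN, "$1");
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a-b-c", "qwe-tru", "qweq"}) {
            System.out.println(row + " -> '" + secondToken(row) + "'");
        }
    }
}
```

Because every part is optional, a row without a dash still matches and group 1 is simply empty, which gives the blank result for qweq.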

IMR2KA · June 13, 2023, 12:54pm

Thank you very much guys for your help!
I tried the smallest regex, provided by @nan, [^-]*-?([^-]*).* and it works.

Thank you one more time and have a good one!

Best,
Ram

system · June 20, 2023, 12:55pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.