Regex between first and second "-"

Hello all,
I have a column “Data” from which I need only the text after the first “-” but before the second one; please see the table. My aim is to get what you see in the column “Result”.
[image: table with the columns “Data” and “Result”]

For this I used regex:

Guys, how can I do the same logic in KNIME? I know that it should be the String Manipulation node, but I have no clue what to write there.

Thank you all in advance!

Hello @IMR2KA
You can test the following code in a String Manipulation node:

regexReplace($text$, "(\\S+?[|-](.*?)[|-][|\\S]+|\\S+?[|-](.*))", "$2$3")

The regex101 code…
(\S+?[|-](.*?)[|-][|\S]+|\S+?[|-](.*))

BR


Hi,
I think this can be simplified a little bit. You could use:

regexReplace($column1$, "^[^-]+-([^-]+)-.*", "$1")

[^-]+ means “one or more times everything but the dash character” and so you have everything but a dash character one or more times, followed by a dash, followed again by a string of everything but a dash (captured in a group this time), followed again by a dash and then the rest, which we do not care about (.*).
Kind regards,
Alexander
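
To see what this pattern does outside KNIME, here is a minimal sketch in Python. It assumes that regexReplace behaves like a regex substitute-all, and it uses Python's `\1` where the String Manipulation node uses `$1`. The function name `second_field` and the value "abc-def-ghi" are made up for illustration; "qwe-tru" is a row from this thread.

```python
import re

# Pattern from the post above. Python's re writes the group reference
# as \1, whereas KNIME (Java regex) writes it as $1.
PATTERN = re.compile(r"^[^-]+-([^-]+)-.*")

def second_field(value: str) -> str:
    # Replace the whole match with capture group 1.
    # If the pattern does not match, the input comes back unchanged.
    return PATTERN.sub(r"\1", value)

print(second_field("abc-def-ghi"))  # def (hypothetical three-part value)
print(second_field("qwe-tru"))      # qwe-tru (no second dash, so no match)
```

The second call shows the limitation: the pattern requires a second dash, so a value with only one dash is left untouched.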


Hello @AlexanderFillbrunn
Your code does not work for the following row:

qwe-tru

BR


Hi,
You are right! I think it needs to be adapted like this:

^[^-]+-([^-]+)[-$].*

Hello,
Still not. You can try to make it work by adding a pipe before the trailing group:

^[^-]+-([^-]+)|[-$].*

However, it still fails on the last row, Row3 ‘qweq’, as it should return a blank.

Warning! I think my first code fails on the last row as well :sweat_smile:

A Cell Splitter with the delimiter ‘-’, keeping the second element of the array, would be easier…

This Regex is working for me:

(\S+?[|-](.*?)[|-][|\S]+|\S+?[|-](.*)|.*)
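
The Cell Splitter idea mentioned above (split on ‘-’ and keep the second element) can be sketched in Python as follows. This is only an illustration of the logic, not of the node itself; the function name `second_by_split` and the value "abc-def-ghi" are hypothetical.

```python
def second_by_split(value: str, delimiter: str = "-") -> str:
    # Split on every delimiter and keep the second piece.
    parts = value.split(delimiter)
    # A value without any delimiter has only one piece, so return a blank.
    return parts[1] if len(parts) > 1 else ""

print(second_by_split("abc-def-ghi"))  # def
print(second_by_split("qwe-tru"))      # tru
print(second_by_split("qweq"))         # prints an empty line
```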


Hey,
Good point! Now I think I have found a one-node-solution, but it is not very pretty. You can use this expression in the String Manipulation node to handle all 4 cases:

regexReplace($Data$, "^[^-]+-([^-]+).*", "$1") == $Data$ ? "" : regexReplace($Data$, "^[^-]+-([^-]+).*", "$1")

Using the Java ternary operator was inspired by this post. But of course it is a bit ugly that we need the regexReplace twice.
Alexander
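
The same "replace, then blank out anything that did not change" logic can be sketched in Python. Outside the node the substitution only has to be computed once and stored in a variable. The function name `second_or_blank` and the value "abc-def-ghi" are made up for illustration.

```python
import re

# Same pattern as in the expression above; \1 is Python's form of $1.
PATTERN = re.compile(r"^[^-]+-([^-]+).*")

def second_or_blank(value: str) -> str:
    replaced = PATTERN.sub(r"\1", value)
    # An unchanged result means the pattern did not match (no dash),
    # so return a blank instead of the original value.
    return "" if replaced == value else replaced

print(second_or_blank("abc-def-ghi"))  # def
print(second_or_blank("qwe-tru"))      # tru
print(second_or_blank("qweq"))         # prints an empty line
```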


lol… @AlexanderFillbrunn , sometimes there’s low-code, sometimes there’s no-code, and sometimes there’s no-no-no! code :wink:


Hi guys… just use this to get only the second element:

regexReplace($column1$, "(.)-(.)(-.*)?", "$2")

The first element runs up to the first “-”, and the second follows the same rule, BUT there may or may not be something after it; the “?” is there for that purpose. I drop the first element and, IF there is a third one, that too, keeping only the second value.

If you work with numbers, you can use “\d+” or “[0-9]+”.
If you work with letters, you can use “\w+” (which also matches digits and underscores) or “[a-zA-Z]+”.

Use “-” as the separator between them and “()” to capture them as groups.

That's it…

Tks,

Denis


It doesn’t seem to work properly, @denisfi.

I still think that the best approach for this challenge is the ‘Occam’s razor’ one, the simplest:

BR


Hi,
The Cell Splitter approach could cause problems when none of the rows contains a dash. In that case the Cell Splitter will not create the necessary column and the workflow will fail, so you will need to check whether the column exists with a Table Validator and handle that case.
Alexander


In my previous post, the editor erased the “*” characters. It was:

[image: the expression with the “*” characters restored]

Sorry, I didn’t notice after saving it. As with the example, you can write a better expression if you know the rules (letters, numbers, symbols…)

Thanks again,

Denis


Thanks for the clarification, @denisfi.
Adding the ‘*’ doesn’t seem to work either, as it only works for the first and second use cases.

Besides, you can format code in your posts by using the dedicated function in the editor:

It improves readability, and the posts look tidy and clear.

BR


I’d use the Regex Extractor node with a positive lookbehind: (?<=^[^-]+-)[^-]+


Basically, it first looks at the beginning of the line and finds a string of characters that doesn’t contain a dash but is followed by a dash. Then it returns the subsequent string of characters that doesn’t contain a dash.
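
For anyone who wants to try the same idea in Python: its `re` module only accepts fixed-width lookbehinds, so the sketch below swaps the lookbehind for a capture group, which extracts the same text. This is a different technique from the one above, shown only as an equivalent under that assumption; the function name `extract_second` and the value "abc-def-ghi" are hypothetical.

```python
import re
from typing import Optional

# Capture-group stand-in for the lookbehind: match the first field and
# its dash, then capture the run of non-dash characters that follows.
PATTERN = re.compile(r"^[^-]+-([^-]+)")

def extract_second(value: str) -> Optional[str]:
    match = PATTERN.match(value)
    # No dash (or nothing after it) means there is nothing to extract.
    return match.group(1) if match else None

print(extract_second("abc-def-ghi"))  # def
print(extract_second("qwe-tru"))      # tru
print(extract_second("qweq"))         # None
```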


Nice challenge folks, let me have a try, too.
Pattern: [^-]*-?([^-]*).*
Replacement: $1

Does it work as you expect?
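
A quick way to check this pattern against the rows discussed in the thread is a small Python sketch. The pattern has a single capture group, so the sketch substitutes group 1 (`\1` in Python). It assumes the replacement in KNIME works like a regex substitute-all; the function name `second_field` and the value "abc-def-ghi" are made up for illustration.

```python
import re

# Every part of the pattern is optional, so it matches any value,
# including one without a dash (where group 1 is then empty).
PATTERN = re.compile(r"[^-]*-?([^-]*).*")

def second_field(value: str) -> str:
    # Replace each match with capture group 1.
    return PATTERN.sub(r"\1", value)

print(second_field("abc-def-ghi"))  # def
print(second_field("qwe-tru"))      # tru
print(second_field("qweq"))         # prints an empty line
```

Because the first `[^-]*` greedily consumes a value that has no dash at all, group 1 ends up empty for "qweq", which gives the blank result asked for earlier in the thread.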


Thank you very much guys for your help!
I tried the smallest regex, provided by @nan, [^-]*-?([^-]*).* and it works.

Thank you one more time and have a good one!

Best,
Ram

