Regex Split

Hi everybody. I have to split a column that contains strings but haven’t been able to solve it. So I need to separate the string by a specific set of words. I’ll add a table to make more visual.

INPUT DESIRED OUTPUT 1 DESIRED OUTPUT 2
2024 Campaign 1 BHV Shoppable Display 2024 Campaign 1 BHV Shoppable Display
Campaign 1 2024_Feb KW Shoppable Display Campaign 1 2024_Feb KW Shoppable Display
Program 1 Feruary_2024 Banner Display Program 1 Feruary_2024 Banner Display

so i was thinking I could split by these specific words such as “KW Shoppable Display” , “Banner Display” etc… since they are always the same words. However I dont have any specific numer of caracthers to split by, or any delimiter. Only I know its they are always the same words.

Is there any way I can split by these specific strings with regex? I tried cell splitter, string manipulation, and regex split but haven’t figured it out.

I’d appreciate any help on this.
Thank you

Hi,
You can use a Regex like this:

(.*) (BHV Shoppable Display|KW Shoppable Display|Banner Display)

There are 2 groups in this. The first is (.*) and captures the beginning, up to the point where one of your predefined strings start. The second group captures any occurrence of your predefined strings. They are separated by an OR, which is represented by the pipe character |. If you want to add more strings, just append them inside the second parentheses, always separated by |.
Kind regards,
Alexander

2 Likes

Hi @gretelruiz23,

The solution provided by @AlexanderFillbrunn looks to be perfect. I cannot comment on that because I am also still learning Regex. However, if I have to achieve this, I may prefer to use Rule Engine with the LIKE function.

$INPUT$ LIKE "*BHV Shoppable Display" => "BHV Shoppable Display"
$INPUT$ LIKE "*KW Shoppable Display" => "KW Shoppable Display"

One line for one check.

Best,

2 Likes

Wait a minute @gretelruiz23! I think you want separation of the two fields. However, with Rule Engine you get only the second half. You still need to work for the first half. The below workflow may help you.

Edited: I just realized that you don’t need table row to variable loop start node at all. It is of no use. So, you can drop that node completely.

Best,

1 Like

Thank you all,

the (.*) (BHV Shoppable Display|KW Shoppable Display|Banner Display) Regex Split, for some reason didn’t work because I had for example, “Banner Display”, “_Banner Display”, "*Banner Display) and it seems that when Regex found “Banner Display” it wouldn’t take into acocunt the other versions of the string.

So I went with the large process with the rule engine and String manipulation and it worked.

I appreciate the help as Im new in Knime :slight_smile:

1 Like

Happy to know that it worked for you. Thank you for marking the response as solution.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.