Split Cell at the furthest right character - "["

Hello,
i got the following problem.
The table i am trying to split contains a list of all our products (1 in each row). The product-string varies in lenght, depending on how deep the product lies in the taxonomy.
I want to extract the product ID, which is in all cases the last number in the string.

Example of 2 rows:
Kategorien/Kategorie/Haushalt [100162]/Bodenpflege [100171]/Staubsauger [3598]/Dyson Cyclone V10 Absolute nickel/kupfer [1317151648]/Übersicht

Kategorien/Kategorie/Unterhaltungselektronik [100000]/Heimkino & Video [100004]/Heimkino [100006]/Surround-Systeme [103961]/Bose Soundbar 700 + Bass Module 700 schwarz [1454353543]/Übersicht

So i tried to split the column at “[” on the furthest right with the Node “Regex Split”. As i’m not good with regex, i didn’t even come close to a solution…
I would be greatful for any ideas how to solve this.

-Oliver

@solute_ok, you can use String Manipulation node with combination of functions function indexOf(, , “b”) and substr() or Column Expression node with the same functions.

5 Likes

Worked like a charm! Thank you very much :slightly_smiling_face:

For follow-up:
I used the Column Expression Node and this:
substr(column(“Col0”), indexOf(column(“Col0”), “[”,“b” ) )

2 Likes

Hi @solute_ok

or do a String Manipulation node wirh
substr($column1$,lastIndexOfChar($column1$,’[’ )+1 ,(lastIndexOfChar($column1$,’]’ ) - lastIndexOfChar($column1$,’[’ ) -1 ) )

gr. Hans

3 Likes

And here’s one with the Regex Extractor from Palladian :slight_smile:

Regex:

.*                  # arbitrary characters
\[                  # opening square bracket
(?<productId>[\d]+) # the product ID
\]                  # closing square bracked
.[^\[]+             # arbitrary characters without [

Workflow:

5 Likes

Hi @solute_ok and welcome to the KNIME forum,

In addition to the solution already provided by @izaychik63, you can use this regex in the Regex Extractor node from Palladian 2 or the regexReplace() function in the String Manipulation node:

Regex Extractor:
\d+(?!.*\d+)
or to be more precise:
(?<=\[)\d+(?=\])(?!.*\[\d+\])

String Manipulation:
regexReplace($column1$, ".*\\[(\\d+)\\].*", "$1")

:blush:

P.S. It seems you have received so many solutions already…!

5 Likes

Thanks for all your replies!
I didn´t think i get an solution this quick :slight_smile:

4 Likes

Right! This is the most active forum I have ever seen. :heart_eyes:

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.