Split Cell at the furthest right character - "["

solute_ok · February 14, 2020, 3:38pm

Hello,
I have the following problem.
The table I am trying to split contains a list of all our products (one per row). The product string varies in length, depending on how deep the product lies in the taxonomy.
I want to extract the product ID, which is in all cases the last number in the string.

Example of 2 rows:
Kategorien/Kategorie/Haushalt [100162]/Bodenpflege [100171]/Staubsauger [3598]/Dyson Cyclone V10 Absolute nickel/kupfer [1317151648]/Übersicht

Kategorien/Kategorie/Unterhaltungselektronik [100000]/Heimkino & Video [100004]/Heimkino [100006]/Surround-Systeme [103961]/Bose Soundbar 700 + Bass Module 700 schwarz [1454353543]/Übersicht

So I tried to split the column at the right-most “[” with the “Regex Split” node. As I’m not good with regex, I didn’t even come close to a solution…
I would be grateful for any ideas on how to solve this.

-Oliver

izaychik63 · February 14, 2020, 3:49pm

@solute_ok, you can use the String Manipulation node with a combination of the functions indexOf(, , “b”) and substr(), or the Column Expression node with the same functions.

solute_ok · February 14, 2020, 4:10pm

Worked like a charm! Thank you very much

For anyone following up:
I used the Column Expression node with this expression:
substr(column("Col0"), indexOf(column("Col0"), "[", "b"))
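
For readers who want to check the logic outside KNIME, the same split can be sketched in Python. This is only an illustration: `split_at_last_bracket` is a hypothetical helper name, and the sketch assumes that the backward `indexOf` returns the position of the last "[" and that `substr` keeps everything from that position to the end of the string.

```python
from typing import Tuple


def split_at_last_bracket(text: str) -> Tuple[str, str]:
    """Split text at the right-most '[' (hypothetical helper).

    The bracket itself stays in the second part, mirroring the
    assumed behavior of substr(..., indexOf(..., "[", "b")).
    """
    idx = text.rfind("[")  # rfind returns -1 when there is no '['
    if idx == -1:
        return text, ""
    return text[:idx], text[idx:]


row = (
    "Kategorien/Kategorie/Haushalt [100162]/Bodenpflege [100171]/"
    "Staubsauger [3598]/Dyson Cyclone V10 Absolute nickel/kupfer "
    "[1317151648]/Übersicht"
)
head, tail = split_at_last_bracket(row)
print(tail)  # [1317151648]/Übersicht
```

Under these assumptions the second part still carries the brackets and the trailing "/Übersicht", so a further step is needed if only the bare number is wanted.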

HansS · February 14, 2020, 4:13pm

Hi @solute_ok

or use a String Manipulation node with
substr($column1$, lastIndexOfChar($column1$, '[') + 1, (lastIndexOfChar($column1$, ']') - lastIndexOfChar($column1$, '[') - 1))
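
The index arithmetic in that expression (start one character after the last "[", take as many characters as lie between the two brackets) can be checked with a small Python sketch. The function name `product_id_between_brackets` is hypothetical, and the fallback for rows without brackets is my own assumption, not something the KNIME expression does.

```python
def product_id_between_brackets(text: str) -> str:
    """Return the text between the last '[' and the last ']' (hypothetical helper)."""
    start = text.rfind("[")
    end = text.rfind("]")
    # Assumed fallback: return an empty string if the brackets are
    # missing or appear in the wrong order.
    if start == -1 or end < start:
        return ""
    # Same span as substr(s, start + 1, end - start - 1).
    return text[start + 1:end]


row = (
    "Kategorien/Kategorie/Unterhaltungselektronik [100000]/"
    "Heimkino & Video [100004]/Heimkino [100006]/Surround-Systeme [103961]/"
    "Bose Soundbar 700 + Bass Module 700 schwarz [1454353543]/Übersicht"
)
print(product_id_between_brackets(row))  # 1454353543
```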

gr. Hans

qqilihq · February 14, 2020, 4:16pm

And here’s one with the Regex Extractor from Palladian

Regex:

.*                  # arbitrary characters
\[                  # opening square bracket
(?<productId>[\d]+) # the product ID
\]                  # closing square bracket
.[^\[]+             # arbitrary characters without [
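
A rough Python equivalent of this pattern, for trying it outside KNIME. Two things differ from the original and are my assumptions: Python spells named groups `(?P<name>...)` instead of `(?<name>...)`, and I use `fullmatch` here because I do not know whether the Regex Extractor matches the whole cell or searches within it. `extract_product_id` is a hypothetical helper name.

```python
import re
from typing import Optional

# re.VERBOSE lets the pattern keep its inline comments.
PATTERN = re.compile(
    r"""
    .*                   # arbitrary characters (greedy)
    \[                   # opening square bracket
    (?P<productId>\d+)   # the product ID
    \]                   # closing square bracket
    .[^\[]+              # one character, then characters other than an opening bracket
    """,
    re.VERBOSE,
)


def extract_product_id(text: str) -> Optional[str]:
    """Return the captured product ID, or None if the row does not match."""
    match = PATTERN.fullmatch(text)
    return match.group("productId") if match else None


row = (
    "Kategorien/Kategorie/Haushalt [100162]/Bodenpflege [100171]/"
    "Staubsauger [3598]/Dyson Cyclone V10 Absolute nickel/kupfer "
    "[1317151648]/Übersicht"
)
print(extract_product_id(row))  # 1317151648
```

Because the tail of the pattern must reach the end of the row without crossing another "[", the match settles on the last bracketed number for rows shaped like the two examples.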

armingrudd · February 14, 2020, 4:27pm

Hi @solute_ok and welcome to the KNIME forum,

In addition to the solution already provided by @izaychik63, you can use this regex in the Regex Extractor node from Palladian 2 or the regexReplace() function in the String Manipulation node:

Regex Extractor:
\d+(?!.*\d+)
or to be more precise:
(?<=\[)\d+(?=\])(?!.*\[\d+\])

String Manipulation:
regexReplace($column1$, ".*\\[(\\d+)\\].*", "$1")
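
To compare the three variants side by side, here is a small Python sketch. The function names are hypothetical, and Python's `re.sub` uses `\1` where the KNIME/Java replacement uses `$1`; otherwise the patterns are taken as posted.

```python
import re


def last_number(text: str) -> str:
    """Last run of digits anywhere in the string."""
    return re.search(r"\d+(?!.*\d+)", text).group()


def last_bracketed_number(text: str) -> str:
    """Last run of digits that is enclosed in square brackets."""
    return re.search(r"(?<=\[)\d+(?=\])(?!.*\[\d+\])", text).group()


def replace_with_id(text: str) -> str:
    """Replace the whole string with the last bracketed number."""
    return re.sub(r".*\[(\d+)\].*", r"\1", text)


row = (
    "Kategorien/Kategorie/Unterhaltungselektronik [100000]/"
    "Heimkino & Video [100004]/Heimkino [100006]/Surround-Systeme [103961]/"
    "Bose Soundbar 700 + Bass Module 700 schwarz [1454353543]/Übersicht"
)
print(last_number(row))            # 1454353543
print(last_bracketed_number(row))  # 1454353543
print(replace_with_id(row))        # 1454353543
```

The difference between the first two shows up only if a number follows the last bracket pair: for a made-up string such as "X [123]/v2", `last_number` returns "2" while `last_bracketed_number` returns "123". Note also that `re.sub` returns the input unchanged when nothing matches, so rows without a bracketed number would pass through as they are in this sketch.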

P.S. It seems you have received so many solutions already…!

solute_ok · February 14, 2020, 4:50pm

Thanks for all your replies!
I didn't think I'd get a solution this quickly.

armingrudd · February 14, 2020, 4:52pm

Right! This is the most active forum I have ever seen.

system · February 21, 2020, 4:52pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.