I have relatively simple question that I’ve been banging my head on for a while.
I would like to use the Regex Split node to split text like the following:
gH2AX_694;53BP1_159
Ideally, I would like to have search and capture expression that searches for “gH2Axxxx” and “53BP1xxxx” splitting them into two columns.
I know how to do this with cell splitter but I would really like to understand how regex works in Knime. I’ve tried regex101 and the solutions I get there don’t seem to work in Knime.
Does anyone have a suggestion on the best way to go about this?
this is a perfect case for the brand new Regex Extractor node in Palladian 2.0 – especially if you’re used to more intuitive tools such as Regex101 you’ll feel right at home. See here for the announcement:
It uses the “named capture groups“ firstValue and secondValue which give the name of the output columns. Any way, when editing the expression you’ll always see a preview of the results as you’re used to from Regex101.
Any feedback welcome!
– Philipp
PS: An alternative approach could be to define a “tokenization expression”. This makes sense, if you have a variable number of items separated with a ;
(?:\w+|[^;]+)
It will basically create a match for each value between the semicolon: