Need help with column splitting, possibly a regex operation

pt2501 · August 24, 2022, 4:23am

Hi KNIME community,

I am working with column A and wish to split it into columns B and C below.

Two rules are at play here. Could someone help me implement them in KNIME?

If A contains brackets “[” or “]”, I want the string inside the brackets moved to C.
If A contains 2 underscores “_”, I want the string following the 2nd underscore moved to C.
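For illustration, here is a minimal Python sketch of the two rules as regular expressions. It uses Python rather than a KNIME node because no node settings are given here. The function name `split_a` and the sample values are hypothetical. The rules also leave two behaviours open, and the sketch handles both by assumption: for the underscore rule, B keeps the text before the 2nd underscore; and any value that matches neither rule stays in B with C empty.

```python
import re

# Rule 1: text before the first "[" -> B, text inside "[...]" -> C.
BRACKET = re.compile(r"^(.*?)\[(.*?)\]")
# Rule 2: text up to the 2nd underscore -> B, everything after it -> C.
TWO_UNDERSCORES = re.compile(r"^([^_]*_[^_]*)_(.*)$")


def split_a(a: str) -> tuple[str, str]:
    """Return (B, C) for one value of column A (hypothetical helper)."""
    m = BRACKET.match(a)
    if m:
        return m.group(1), m.group(2)
    m = TWO_UNDERSCORES.match(a)
    if m:
        return m.group(1), m.group(2)
    # Neither rule applies: keep the value in B, leave C empty (assumption).
    return a, ""


# Hypothetical sample values, not taken from the attached Excel file.
for value in ["MS3[1]", "MS4_1_2", "MS5_1", "Total"]:
    print(value, "->", split_a(value))
```

The lazy `.*?` stops the first capture at the first `[`. The `[^_]*` pieces force the second pattern to split at exactly the 2nd underscore, so a value with a single underscore does not match.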

I am guessing one would have to use a regex operation, and I am looking into it, but it’s a completely new area for me and I’d appreciate some assistance.

Thank you in advance!

bruno29a · August 24, 2022, 4:38am

Hi @pt2501 and welcome to the KNIME Community.

You can use regex, but you do not have to.

Regarding the rules, for the first one you said "If A contained brackets “[” or “]”". Does this mean either of the 2 brackets, or must it be both? And both brackets in that order, right? For example, what if you have MS3]1[? What should happen? And what if you have [MS3]? Would column B be empty?

And for the second rule, is it possible to have more than 2 underscores? Or just 1 underscore?

And can the data contain a mix of both rules? For example, MS4_1_[1]?

And please share some data that we can work with, so that we do not need to spend time typing in sample data and can instead spend that time building the solution for you. Please help us help you.

pt2501 · August 24, 2022, 5:07am

Hi @bruno29a

Thank you so much for your swift reply!
In answer to your questions:

The dataset will always report both brackets in open-closed form, like “MS3[1]” and never “MS3]1[”, so the 1st rule can kick in if either “[” or “]” appears in column A.
If “[MS3]” shows up, then column B should have “MS3” according to the 1st rule.
The dataset could have 1 underscore or 2 underscores.
I don’t expect any mixture of brackets and underscores like “MS4_1_[1]” in column A.

Please refer to the following Excel sheet for sample data. Column A contains values other than “MS~”, so simple splitting by position won’t work.
20220824.xlsx (11.0 KB)

bruno29a · August 24, 2022, 8:25pm

Hi @pt2501, thanks for the extra info and for the sample file.

I quickly put something together that does not use regex. This could also be done with the Column Expressions node, which would involve some coding, but I wanted to do it with just pure KNIME nodes.
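The same "no regex" idea can be expressed outside KNIME. Below is a minimal Python sketch that uses only plain string methods (`str.partition`, `str.split`) and follows the clarified rules. It is not a transcription of the attached workflow, whose node configuration isn't shown here. The function name is hypothetical, and so are the sample values in the loop. Two behaviours are assumptions, because the thread does not specify them: values with a single underscore are not split, and for “[MS3]” column C is left empty.

```python
def split_without_regex(a: str) -> tuple[str, str]:
    """Return (B, C) for one value of column A using plain string methods."""
    # Rule 1: either bracket triggers it (the data always has "[...]" in order).
    if "[" in a or "]" in a:
        head, _, rest = a.partition("[")  # "MS3[1]" -> "MS3", "1]"
        inner = rest.partition("]")[0]    # "1]" -> "1"
        if head == "":
            # "[MS3]": the poster expects "MS3" in column B.
            # What C should hold is not stated; left empty here (assumption).
            return inner, ""
        return head, inner
    # Rule 2: split off whatever follows the 2nd underscore.
    parts = a.split("_", 2)  # "MS4_1_2" -> ["MS4", "1", "2"]
    if len(parts) == 3:
        return f"{parts[0]}_{parts[1]}", parts[2]
    # One underscore or none: no split (assumption, not stated in the thread).
    return a, ""


for value in ["MS3[1]", "[MS3]", "MS4_1_2", "MS5_1", "Total"]:
    print(value, "->", split_without_regex(value))
```

Like a chain of pure nodes, each step here is a single, easily inspected string operation. The trade-off is that both rules are hard-coded as `if` branches rather than patterns.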

I used the data from your Excel file, but I also added these 2 cases at the end:

Results:

Here’s how it was done:

Here’s the workflow: Column splitting based on value.knwf (29.1 KB)

pt2501 · August 25, 2022, 2:28am

Hi @bruno29a,

Thank you so much! I have looked at the flow and I totally understand what’s going on.
I really appreciate you using pure KNIME nodes. It makes each transformation step a lot easier to understand.

I will try to condense this into a Column Expressions node with some coding.

Again, thanks a lot!

bruno29a · August 25, 2022, 3:05am

Hi @pt2501, no problem, happy to help.

Yes, one of the reasons I did it this way was that I was not sure how comfortable you were with coding. A lot, if not all, of this can be done with the Column Expressions node.

I kinda “cheated” a bit in this workflow. Everything is processed by Node 9, but the cheat happens between Nodes 5 and 7, where I transform the data into the same format as the top part of the split. (Node 4 is just a copy of the data, so that I can concatenate into a new column without affecting the original column, which I wanted to keep to match your original screenshot.)
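One possible reading of this "normalise, then split once" trick is sketched below in Python. It assumes that the "top part of the split" is the bracket branch, so underscore values are first rewritten into bracket form and then a single bracket split handles every row. The actual Nodes 5 to 7 may do something different. All function names are hypothetical. The handling of single-underscore values and of C for “[MS3]” are assumptions.

```python
def to_bracket_form(a: str) -> str:
    """Rewrite 'X_Y_Z' as 'X_Y[Z]' so a single bracket rule covers both cases."""
    if "[" in a or "]" in a:
        return a  # already in bracket form
    parts = a.split("_", 2)
    if len(parts) == 3:
        return f"{parts[0]}_{parts[1]}[{parts[2]}]"
    return a  # fewer than two underscores: nothing to move to C (assumption)


def split_bracket_form(a: str) -> tuple[str, str]:
    """The one remaining split: text before '[' -> B, text inside '[...]' -> C."""
    head, sep, rest = a.partition("[")
    if not sep:
        return a, ""  # no bracket: the value stays in B
    inner = rest.partition("]")[0]
    if head == "":
        return inner, ""  # "[MS3]": "MS3" goes to B; empty C is an assumption
    return head, inner


def normalise_then_split(a: str) -> tuple[str, str]:
    # Python strings are immutable, so the original value of A is never changed,
    # which plays the role of the copied column (Node 4) in the workflow.
    return split_bracket_form(to_bracket_form(a))


for value in ["MS3[1]", "MS4_1_2", "[MS3]", "Total"]:
    print(value, "->", normalise_then_split(value))
```

The benefit of this design is that the splitting logic exists only once. Adding a new input format means writing another normaliser rather than another splitting branch.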

The approach might be different in Column Expressions, but the same overall logic can be applied.

system · September 1, 2022, 3:05am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.