Regex Coding for Text to Column splitting

Dear Forum,

I wonder if somebody could help me figure out the appropriate expression for the Regex Split node for the attached data set.
What I am trying to do is simply split the numbers, which represent accounts, and the corresponding text into two columns.

I am a newbie when it comes to writing regular expressions.
If anybody has a link to explanatory material on regex, it would be highly appreciated.
Thanks a lot!
Br,
Tobias
Table to Split
Dataset.xlsx (11.1 KB)

Hi @Toby76,

You can use the following regular expression to split your column:

^\s*(\d+)\s*(.+)$

Technically, it searches for patterns that (1) start with zero or more whitespace characters (^\s*), (2) continue with one or more digits (\d+), (3) followed by zero or more whitespace characters (\s*), and (4) end with one or more arbitrary characters up to the end of the line (.+$). Because of the parentheses, (2) and (4) are so-called capture groups, which are selected for the output. You can play around with the expression using this tool: https://regex101.com/r/nRX8iE/1
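For readers who want to try the expression outside of KNIME, here is a minimal Python sketch of the same pattern. The sample rows are made up for illustration (the attached Dataset.xlsx is not reproduced here), and split_account is a hypothetical helper name, not part of any KNIME node.

```python
import re

# Same pattern as above: optional leading whitespace, digits (group 1),
# optional whitespace, then the rest of the line (group 2).
PATTERN = re.compile(r"^\s*(\d+)\s*(.+)$")


def split_account(line):
    """Return (account_number, text), or None if the line does not match."""
    match = PATTERN.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Hypothetical sample rows, not taken from the attached data set.
rows = ["  1000 Cash on hand", "2000Accounts payable", "No account number"]

for row in rows:
    print(repr(row), "->", split_account(row))
```

Note that the second group requires at least one character, so a row that contains only a number is split in a possibly surprising way: for "1000" the pattern backtracks and returns "100" and "0". If such rows can occur in your data, changing (.+) to (.*) is one way to keep the number intact.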

I added an example workflow to my NodePit Space that takes your data and uses the Regex Split and Regex Extractor nodes to split the column. You can use either, but I especially recommend the Regex Extractor, as it gives you visual feedback about the result of your regular expression, similar to Regex101. It is part of the free Palladian Nodes.

The example workflow can be found here:

Best regards,
Daniel

Hi Daniel,
Thanks a lot for providing the solution and the explanation, and for recommending a node for this task.
Really superb!
Highly appreciate your help!
Again, a big THANK YOU!

Cheers,
Tobias

Hello Daniel,
I installed NodePit. However, when I try to drag and drop the Regex Extractor node onto my canvas, I get the error below. Would you mind helping me out with this?

Hi Tobias,

You need to install the Palladian Nodes extension. The Regex Extractor is shipped as part of them (and not included in the NodePit for KNIME extension).

That said, you can use NodePit for KNIME to install the Palladian Nodes. If you have already installed NodePit for KNIME, start KNIME, go to the NodePit View, search for the Regex Extractor node or for Palladian, and click the download button in the Installation section (see screenshot).

Alternatively, you can install the Palladian Nodes the conventional way as follows: Start KNIME and go to Help → Install New Software…, add the update site

https://download.nodepit.com/palladian/4.2

to the dialog that opens, and install the Palladian for KNIME plugin from there.

Once done, the Regex Extractor will show up in the Node Repository and can be dragged and dropped from there. Let me know if you run into any problems. Happy to help!

Best regards,
Daniel

Hi Daniel,

Again, thanks a lot! Issue solved!

Be safe!

Cheers from Japan,
Tobias
