EXAMPLE SPA (Tss-Ex Esa-Diretta Hub Emilia Romagna)1321GIURELLI STEFANO S.A.
EXAMPLE TEAM s.r.l.8060493PET EXAMPLE DI EXAMPLE EXAMPLE
SER.IN EXAMPLE INFORMATI115340812EXAMPLE
I would like to find a smart way within KNIME to split each string into multiple strings, like this:
You could do that with a Regex Split
(^[^0-9]+)(\d{2,})(.*)
=> It looks like the split point is always the number. Here I used a run of two or more digits as the ‘identifier’. You would have to test many examples to see whether that always works; sometimes you have to add something to handle special cases, especially very short lines, or lines with nothing in front of the number where the number should still go to the second column. I have not tested all the possibilities.
Each pair of parentheses () defines a group, and each group is split into its own column.
first group :
(^[^0-9]+) => from the start of the string, one or more characters that are not digits (0-9)
second group :
(\d{2,}) => a run of two or more digits
third group :
(.*) => anything after the second group
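To try the pattern outside KNIME, here is a minimal sketch in Python. It assumes that Python's `re.match` behaves closely enough to the Java regex engine used by KNIME for this simple pattern; the helper name `split_line` is made up for illustration.

```python
import re

# The pattern from the post: non-digits, then a run of 2+ digits, then the rest.
PATTERN = re.compile(r"(^[^0-9]+)(\d{2,})(.*)")

def split_line(line):
    """Return the three captured groups, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.groups() if match else None

samples = [
    "EXAMPLE SPA (Tss-Ex Esa-Diretta Hub Emilia Romagna)1321GIURELLI STEFANO S.A.",
    "EXAMPLE TEAM s.r.l.8060493PET EXAMPLE DI EXAMPLE EXAMPLE",
    "SER.IN EXAMPLE INFORMATI115340812EXAMPLE",
]

for sample in samples:
    print(split_line(sample))
```

In this sketch, a line that starts with the number, or one that has a single stray digit before the number, does not match at all and `split_line` returns None. Those are the special cases that would need extra handling.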
Hello @mlauber71,
thank you for the great example and for the regex explanation.
Much appreciated!
What if I also have stranger cases? For example:
EXAMPLE Spa (TS2)1568220022EXAMPLE SRL
4WD EXAMPLE SRL1568330227EXAMPLE S.R.L.
EXAMPLE & 2EXAMPLE1568330227EXAMPLE S.R.L.
In these cases the first part can contain a digit, even though we assumed it would hold only non-digit characters. Is there a way, still using regex syntax, to tell it to skip over such a digit and not split at that point?
Maybe we should take length into account. For example, if a number appears within the first part, we could tell the regex to skip it when the digit sequence is only 1 or 2 characters long. I’m pretty sure there are better solutions than my suggestion.
Let me know what you think and thanks again for the help.
~gujo
I thought about this and came up with a solution: I changed the first group to something like the pattern below. You might have to experiment with it to see whether it matches all your cases. If you have more complicated cases, there may be no single or easy regex solution.
first group :
(^\d{0,1}\D{1,}+\d{0,1}\D{0,}+) => at the start of the string: zero or one digit, then one or more non-digit characters, then zero or one digit, then zero or more non-digit characters.
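As a rough check of this first group, here is a Python sketch that combines it with the second and third groups from the earlier pattern. Two things are assumptions and not from the thread: the possessive `+` suffixes are dropped so the code also runs on older Python versions, and the `LAZY` pattern is a hypothetical alternative that simply lets the first group stop at the first run of two or more digits.

```python
import re

# First group from the post, written with plain greedy quantifiers
# (assumption: possessive quantifiers dropped), plus the original groups 2 and 3.
REVISED = re.compile(r"(^\d?\D+\d?\D*)(\d{2,})(.*)")

# Hypothetical alternative, not from the thread: a lazy first group that
# ends at the first run of two or more digits.
LAZY = re.compile(r"^(.*?)(\d{2,})(.*)$")

def split_with(pattern, line):
    """Return the three captured groups, or None if the line does not match."""
    match = pattern.match(line)
    return match.groups() if match else None

strange = "EXAMPLE & 2EXAMPLE1568330227EXAMPLE S.R.L."
plain = "EXAMPLE TEAM s.r.l.8060493PET EXAMPLE DI EXAMPLE EXAMPLE"

for pattern in (REVISED, LAZY):
    print(split_with(pattern, strange))
    print(split_with(pattern, plain))
```

In this sketch both patterns split the line with the stray digit as intended. On the plain line, however, the optional digit in `REVISED` takes the first digit of the number, so the first column ends in `8` and the second column starts at `060493`. It is worth checking whether the same happens in KNIME. The lazy variant avoids that, but it would split too early if the first part ever contained a stray number with two or more digits.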