When you say you need the node to “discover the pattern”, do you mean that you have no idea what the pattern might be, and that it should inspect all the data and simply find one, even though the data could be in literally any format?
What is the basis by which it might “discover” a pattern? Would it always split on letter/number boundaries?
Why for example might the pattern not equally be…
Col1, Col2
A12, CD12
A13, CA12
and so on?
My approach to a task like this is to ask how I might achieve the task if I were performing it manually and then try to work on how to automate that same process. It feels at the moment like either I’m not aware of the “rules” or else there is a lot that is subjective here.
It is an interesting challenge but I think more discussion is needed before anybody would be able to embark on a generalised solution.
First, I would check how uniform the strings are by converting them to a general format with the String Manipulation node: upper-case letters become “A”, lower-case letters become “a” and digits become “9”, while all other characters are left as they are.
For example:
John Smith 36 Teacher → Aaaa Aaaaa 99 Aaaaaaa
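Outside KNIME, the same masking idea can be sketched in a few lines of Python. This is only an illustration of the conversion described above, not the String Manipulation expression itself; the function names `mask` and `pattern_counts` are made up for this example.

```python
from collections import Counter


def mask(text: str) -> str:
    """Map upper-case letters to 'A', lower-case to 'a', digits to '9'; keep the rest."""
    out = []
    for ch in text:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)  # spaces, punctuation etc. stay unchanged
    return "".join(out)


def pattern_counts(rows):
    """Count how often each masked pattern occurs (hypothetical helper)."""
    return Counter(mask(r) for r in rows)


print(mask("John Smith 36 Teacher"))
```

If `pattern_counts` returns only one or two distinct patterns, the rows are uniform enough for a single regex; many distinct patterns suggest the data needs a different approach.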
Then I would use Regex Split to create the separate columns. For the previous example, this regex works: ([A-Z][a-z]+)\s([A-Z][a-z]+)\s(\d+)\s([A-Z][a-z]+)
Output: four columns, one per capture group, containing John, Smith, 36 and Teacher.
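To try the regex outside KNIME, here is a minimal Python sketch. It assumes the whole string has to match the pattern (hence `fullmatch`); that is my assumption for the illustration, and `regex_split` is a hypothetical helper name, not a KNIME function.

```python
import re

# Same pattern as in the post: two capitalised words, a number, one more capitalised word.
PATTERN = re.compile(r"([A-Z][a-z]+)\s([A-Z][a-z]+)\s(\d+)\s([A-Z][a-z]+)")


def regex_split(text):
    """Return one value per capture group, or None if the row does not match."""
    match = PATTERN.fullmatch(text)
    return list(match.groups()) if match else None


print(regex_split("John Smith 36 Teacher"))
```

Rows that come back as `None` do not conform to the pattern and would need their own regex.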
Thanks! This looks quite tricky to me. As I wrote before, I printed 5/10 rows and split them manually, with a pen, and after that I used the Cell Splitter node.
I had assumed KNIME could do this automatically, but after reading both answers I think it is not possible.
Thanks a lot for the explanation and the time you spent on this! It looks really smart, and I’ll test it as soon as possible!
PS: If you are going to develop something, I suggest a visual model like Excel’s “split by position”, where I can draw the columns just by selecting their positions.
Actually, it would be very helpful if you could provide the desired output table as well, so that I know exactly how you need these values to be split. That would help me create a general approach.
If that’s not possible, no worries. I will work on it anyways.
The workflow I sent you is based on records from before 2019.
Since 2019 I have had the index for those records, so I know what each column is (and how long it is) for columns 1, 2, 3, … 80.
For the records before 2019 I have no documentation. That is why I created this thread, and why I cannot give you the desired output table.
I found a way to split your strings: 44032.knwf (102.7 KB)
The idea is to split strings character by character (I was inspired by your “split by position” idea). Then we find positions for which all rows are blank. We consider these positions as delimiters (multiple blank positions aggregate into one). And the rest is straightforward. Yet, I think there are some columns containing strings which can be split further. Take a look and let me know if this is what you were looking for.
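For readers who want to see the idea without opening the workflow, here is a rough Python sketch of the same logic. It is not a translation of the attached workflow; it assumes fixed-width rows where a blank means a space character, and the function name and sample rows are invented for illustration.

```python
def split_by_blank_columns(rows):
    """Split fixed-width rows at character positions that are blank in every row."""
    width = max((len(r) for r in rows), default=0)
    padded = [r.ljust(width) for r in rows]  # pad short rows with spaces

    # A position is a delimiter only if every row has a space there.
    blank = [all(r[i] == " " for r in padded) for i in range(width)]

    # Collect (start, end) spans of consecutive non-blank positions;
    # runs of blank positions collapse into a single delimiter.
    spans = []
    start = None
    for i, is_blank in enumerate(blank):
        if not is_blank and start is None:
            start = i
        elif is_blank and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, width))

    return [[r[a:b].strip() for a, b in spans] for r in padded]


rows = [
    "AB12  XY  2018",
    "C 3   Z   2017",
]
for cells in split_by_blank_columns(rows):
    print(cells)
```

In this sample the first column of the second row comes out as "C 3", because position 1 is not blank in every row. That is the same effect mentioned above: some resulting columns may still contain values that can be split further.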