My current problem is that each row is of different length, meaning I can’t just use a cell splitter using space as a delimiter or else it would turn out like this.
Like “Home” and “Office” shouldn’t be split or like especially with city names where each of them consists of multiple words like “New York City” consisting of 3 substrings and on the other side we have cities like “Seattle” only consisting of 1 substring.
Is it possible to specifically target each substring from each row individually like creating a substring for example with cities having both “New York City” and “Seattle” in the same column?
have you found a solution in the meantime?
You are unfortunately facing a common but non-trivial problem in my opinion.
Two possible solutions that you can explore:
Dive into the word of regular expressions and use a Regex Split node or
Since you can discern the domain of the data you are dealing with, a more basic solution might be creating a dictionary of the problematic words (e.g. New York City, United States etc…) and replacing them with their “hyphen” version (New-York-City, United-States…) using, for example, a String Replacer (Dictionary) node. After that, the Cell Splitter node will do the job correctly.
Let me know if this helps or if you found a more creative solution!
I unfortunately haven’t found a solution yet but your
suggestion with replacing problematic words with a hyphen version was exactly what I was looking for.
Thanks a lot you are a lifesaver.
I recommend approaching this in a multi-step approach. Looking at your specific table structure, I would focus on the fields that appear to be uniform and based on a restricted list. Identifying all of potential list of values for the following fields (Segment, Country, State, Region) and pinpointing them via Regex should allow you to break this up into columns cleanly without a complex paragraph long single Regex expression. Splitting before and after those possible values should do the bulk of the cleanup. After that, you should be able to split between Sales and Quantity columns using the space character.
Unless you get really unlucky and one of the customers happens to live in a town named “Home Office, Texas”.