So, I have a file format that is fixed width, but the field positions can vary based upon the first few characters of the row. I used Cell Splitter by Position to pull these characters out already, and what I’m looking to do it possibly do a lookup to a cached table (from either another file or from a database) to pull in the settings (split indices and column names) into flow variable inputs for another Cell Splitter by Position.
However, I’m somewhat stumped on how to accomplish this. I’m not getting the switch logic (i need to lookup some more tutorials there) and I’m not seeing how to set a flow variable based upon a column value… table column to row gives tons of variables and table row to column doesn’t look right either.
Hi @jtarbox Welcome to KNIME.
For this question I think using a String Manipulation node (extract substrings by position) is a better idea. See this workflow: split_by_position.knwf (13.7 KB)
Well, that is presuming the variable length is stored as a column, but it’s not. What is stored it a type field and then I need to extract many other fields based on the pre-determined values for the split positions of each field based upon the type. It’s a pain, lol.
At present, I’m thinking about working around this via filtering the data by each type and having a cell splitter by position setup for each and hopefully merge the output rows back together.
Do you have an example of (dummy) input and expected output? It may be helpful to find a solution.
So, the format is convoluted, but it’s a pricey standard so what do you expect. Basically, there is a 6 character identifier leading the line, which defines the many variable fields (between 20-50 fields) following it on that line. Each unique 6 character identifier has a different field layout. And while I can use a row filter and (cell splitter by position or string manipulation) to handle each identifier and layout, that’s a lot of work setting up node sets for each identifier.
If I can take the identifier, lookup/join to get the proper ‘split positions’ and ‘column names’ parameters via a cached table from a database, set those values into flow variables to input into cell splitter by position, followed by some normalization functionality (missing/default columns), I would think it would be a bit cleaner.
Or can a flow variable not have different values per row it’s processing?
Dummy example, brackets added as example of variable field lengths based upon identifier… as in the same identifier has all the same field lengths.
123456|7161234567| Jon Doe| Software Engine|
654321| 123 Makers Dr| Buffalo|NY| 14564|
I’m not concerned about the result of different columns because in my usage case i’m only pulling certain identifiers and they should mostly all have the same core fields I need, just in different locations.
can you format your sample data to give a better understanding about the expected results? I read and re-read the comments but do not get the slightest bit what you seek for
I.e. what is meant with “identifier” and “variable fields (20-50)”. From what I understand you’ve got data row which have variating data (sometimes values are not present). The unstructured data you’d like to structure (put correct values in corresponding columns). All somehow based on the identifier.
In other words, depending on the identifier at the beginning of the row, different data in a different format is present. Correct?
If that’s the case, you’ve got to:
- Identify / Set some rules for each identifier which can be used to process the data
- Loop over each row applying the rule
The aforementioned is kind of abstract, on purpose.
This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.