I have a legacy fixed-length .dat file whose data fields vary in length and position depending on the fixed-length label at the start of each line.
I have split the data into two columns, "Header" and "Data". I filter on the Header column and apply the "Cell Splitter By Position" node to process the data. This works great, but I have to process each set of records separately. I am trying to find a way to dynamically pass the "Cell Splitter By Position" split indices and column names via variables based on the Header label, but it is not working, and I have tried various approaches.
Just curious if anybody in the forum has faced a similar situation and has a better solution for this type of file.
Hi @aneesshahzad, welcome to the forum.
It would be helpful if you provided some example data, as well as your desired output. If you shared with us some of what you’ve tried, or even a workflow, this could save us a lot of time as well.
Thanks @elsamuel. I have attached a document showing what I am doing: Cell split.docx (45.1 KB)
That document doesn’t really help. It just repeats what you said in your original post.
I am trying to find a way to dynamically pass the “Cell Splitter By Position” split indices and column name with variable based on Header label but it is not working and I have tried various ways.
It would be best if you provided a data file that exemplified your specific context. It'd also be helpful if you told us what you tried. What is the logic behind the variable creation? How are the header labels, split indices, and column names related? Give us something to work with.
Thanks for looking into it. I appreciate that. I don't have a working solution, but the idea was to start a loop over a set of header values passed through a variable; based on the variable's value, the split indices and column labels would be determined.
Now, is this the best way to parse a fixed-length file that contains records of varying lengths, where each record type can be identified by the starting text, or is there a better approach? I am new to KNIME and thought maybe someone else has experience processing old "DAT" files.
REC1 123 ABC XYX
REC2 ABC 1213 ZZYY
REC3 A abcd zzzdD
I know what sort of data is at which position based on the starting text (REC1, REC2, REC3, etc.).
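Outside of KNIME, the idea of dispatching split positions on the record label can be sketched in a few lines of plain Python. This is only an illustration: the labels, slice positions, and column names below are hypothetical, inferred from the three example lines, not from your actual file layout.

```python
# Sketch: parse a fixed-length file whose field layout depends on the
# record label at the start of each line. LAYOUTS maps each label to
# hypothetical (column_name, start, end) slices matching the sample data.
LAYOUTS = {
    "REC1": [("num", 5, 8), ("code", 9, 12), ("ref", 13, 16)],
    "REC2": [("code", 5, 8), ("num", 9, 13), ("ref", 14, 18)],
    "REC3": [("flag", 5, 6), ("code", 7, 11), ("ref", 12, 17)],
}

def parse_line(line):
    """Look up the layout for this line's label and slice out its fields."""
    label = line[:4]
    fields = {name: line[start:end].strip()
              for name, start, end in LAYOUTS[label]}
    return label, fields

sample = [
    "REC1 123 ABC XYX",
    "REC2 ABC 1213 ZZYY",
    "REC3 A abcd zzzdD",
]
for line in sample:
    print(parse_line(line))
```

The same lookup-table idea is what a variable-driven "Cell Splitter By Position" configuration would express inside a KNIME loop: the header label selects the split indices and column names.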
Here’s a workflow based on the example data you provided:
Thanks for the solution. It works, but the problem is that I have over 3 million lines to process. When I executed the workflow, the loop was still running after more than 3 hours and I had to stop it.
Any suggestions on this?
Try using a Group Loop Start node instead of a Chunk Loop Start, grouping by the header column.
I tested this with about 1000 rows and the difference in speed was remarkable.
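The speed-up comes from doing one split per record type rather than one per row or chunk. A rough pandas sketch of that group-wise approach (with the same hypothetical layouts as before, not your real field positions):

```python
# Sketch: group rows by their header label, then apply the matching
# fixed-position split once per group instead of once per row/chunk.
import pandas as pd

LAYOUTS = {
    "REC1": [("num", 5, 8), ("code", 9, 12), ("ref", 13, 16)],
    "REC2": [("code", 5, 8), ("num", 9, 13), ("ref", 14, 18)],
}

df = pd.DataFrame({"Header": ["REC1", "REC2", "REC1"],
                   "Line": ["REC1 123 ABC XYX",
                            "REC2 ABC 1213 ZZYY",
                            "REC1 456 DEF UVW"]})

parts = []
for label, group in df.groupby("Header"):
    group = group.copy()  # avoid mutating a view of the original frame
    for name, start, end in LAYOUTS[label]:
        # vectorized slice over the whole group at once
        group[name] = group["Line"].str.slice(start, end).str.strip()
    parts.append(group)

result = pd.concat(parts).sort_index()
```

With millions of rows, the loop body runs only once per distinct header label (a handful of iterations), which mirrors what the Group Loop Start achieves in KNIME.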