How to solve this Joiner error in loop operation?

qianyi · June 4, 2019, 6:59am

Hello there,

I have two groups of files, each containing several hundred CSV files.
My task is simply to merge pairs of files, one from each group, and then export each result to a new file.

The solution would be to use a Joiner node to merge with a key column in a dual loop.
But I don’t know how to create and control a dual loop to do this operation.

As a test, I simply created two separate loops (not a dual loop) to read the files from each group, and then tried the Joiner.
This produces the following warning.

WARN Joiner 3:41 Unable to merge flow object stacks: Conflicting FlowObjects: <Loop Context (Head 3:40, Tail unassigned)> - iteration 0 vs. <Loop Context (Head 3:34, Tail unassigned)> - iteration 0 (loops/scopes not properly nested?)

The test workflow looks like this:
[Image: WS000000]

Please give me a tip on how to tackle it. Thanks!

morpheus · June 4, 2019, 7:27am

Hi there,
If I’m right, you have to extend your workflow with a Loop End node between each CSV Reader node and the Joiner node.

This can solve your problem.

Br
Hermann

qianyi · June 4, 2019, 8:45am

Hi @morpheus,

Thank you for your response.

You are right, the error should go away if I end each loop before the Joiner node.
But I have another thing to worry about.

Each group contains CSV files larger than 4 GB, so the total data size is huge.
If I end the loop right after the CSV Reader, I think all files will have to be held in memory at that point.
Will that work properly?
I’m not sure, so I’m currently looking for a way to do the merge inside the loop, file by file.
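
For readers who want to see the file-by-file idea outside of KNIME, here is a minimal Python sketch. It assumes the two groups live in two directories and that a pair consists of two files with the same file name; the thread does not say how pairs are chosen, so the helper `pair_files` and the directory layout are hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def pair_files(dir_a: Path, dir_b: Path) -> Iterator[Tuple[Path, Path]]:
    """Yield (file_a, file_b) pairs that share a file name, one pair at a time.

    Assumption (not from the thread): files are matched by identical name.
    Files in dir_a without a counterpart in dir_b are skipped.
    """
    for file_a in sorted(dir_a.glob("*.csv")):
        file_b = dir_b / file_a.name
        if file_b.is_file():
            yield file_a, file_b
```

Because this is a generator, a caller that merges and writes each pair before asking for the next one never needs more than one pair open at a time.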

morpheus · June 4, 2019, 9:04am

Hi @qianyi,

It depends on what you want to join. Do you want to join the content of the files, or only pair up files with the same filename regardless of their content?

qianyi · June 4, 2019, 9:36am

Hi @morpheus,

I need to join the content of the files.

The goal is to merge the columns from each file. The merge key is a Date&Time column that has been prepared in each file. I don’t know whether any node other than the Joiner can do this.

Maybe ending the loop for only one of the CSV Readers (in a dual loop) would work?
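
As an illustration of joining file content on a key without loading whole files, here is a rough Python sketch of a streaming sort-merge join. This is a swapped-in technique, not how the KNIME Joiner works. It assumes both files are already sorted ascending by the key column, that each key occurs at most once per file, and that the key strings sort chronologically (for example ISO 8601 timestamps); none of this is confirmed in the thread. The function name `merge_join_sorted` is hypothetical.

```python
import csv
from typing import Dict, Iterator


def merge_join_sorted(path_a: str, path_b: str, key: str) -> Iterator[Dict[str, str]]:
    """Inner-join two CSV files on `key`, reading both row by row.

    Assumptions: both files are sorted ascending by `key`, keys are unique
    within each file, and comparing key strings gives chronological order.
    """
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a = csv.DictReader(fa)
        rows_b = csv.DictReader(fb)
        row_a = next(rows_a, None)
        row_b = next(rows_b, None)
        while row_a is not None and row_b is not None:
            if row_a[key] == row_b[key]:
                # Matching key: emit the combined columns of both rows.
                # Columns with the same name take the value from file B.
                yield {**row_a, **row_b}
                row_a = next(rows_a, None)
                row_b = next(rows_b, None)
            elif row_a[key] < row_b[key]:
                # File A is behind; advance it.
                row_a = next(rows_a, None)
            else:
                # File B is behind; advance it.
                row_b = next(rows_b, None)
```

Only one row from each file is held at a time, so memory use does not grow with file size. If the files are not sorted, they would need an external sort first, which this sketch does not cover.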

morpheus · June 4, 2019, 11:16am

Hi @qianyi,

Maybe you can solve it with a workflow configuration like the attached image.

HansS · June 4, 2019, 5:01pm

Hi @morpheus and @qianyi,

I don’t know the structure of the two CSV files. But if the number of records in both files matches and the rows are in the same order on your key column, then you can use a Column Appender node instead of a Joiner node. This will speed up the matching part.
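
The difference between appending by position and joining by key can be sketched in plain Python. This is only an analogy and not the implementation of either KNIME node; the function names are hypothetical, and raising an error on unequal row counts is a choice made for this sketch.

```python
from typing import Dict, List

Row = Dict[str, str]


def append_columns(left: List[Row], right: List[Row]) -> List[Row]:
    """Positional combine: row i of `left` is paired with row i of `right`."""
    if len(left) != len(right):
        raise ValueError("positional append needs equal row counts")
    return [{**l, **r} for l, r in zip(left, right)]


def join_on_key(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Key-based inner join: rows are paired by the value in `key`."""
    # Build a lookup of the right table first, then probe it per left row.
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]
```

The positional version does no lookups at all, which is why it is cheaper, but it only gives a correct result when both inputs have the same rows in the same order.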

qianyi · June 5, 2019, 12:27am

Hi @morpheus,

Thanks for your reply.
I tried what you suggested, but it ultimately failed. Still, I think it might work if I make some changes to the workflow.

I will upload my workflow later together with part of the data (which has been normalized), to help explain the data and what the workflow does.

qianyi · June 5, 2019, 12:32am

Hi @HansS,

Thanks for your advice.
The record counts and the key column values differ between the files, so the Column Appender will not help with this task.
I will upload the workflow together with the data in my next reply.

qianyi · June 5, 2019, 6:02am

I realized this problem is not only a Joiner node problem; it is a large-file merge issue.
So I uploaded my workflow in a new topic here:
https://forum.knime.com/t/how-to-merge-large-files-without-memory-overflow-in-knime/15386

Please see more information and give your advice there.

Thanks again!
