Large temp files - cross join in recursive loop

I am struggling with a workflow that creates large temp files, which fill up my disk and stop the workflow.

I am running a recursive loop on name records read from a database. The loop consists of a cross join: it takes the first record and cross joins it with the rest of the table, checking for matches.

The matches are determined via a rule engine, which has 4 different result options. The rows are then split into match and no match.

The column names of the no-match rows are cleaned up, and the rows are then sent back to the beginning of the loop to be cross joined with the next row.
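To make the loop above concrete, here is a rough Python sketch of the same logic (not the actual KNIME workflow). The names `recursive_match`, `is_match` and `worst_case_rows` are made up for illustration, and `is_match` is only a placeholder for the rule engine. The sketch also counts how many cross-join rows are produced in total. Whether KNIME keeps each iteration's intermediate tables in temp storage until the loop is reset is an assumption here, but the row count shows why the volume grows much faster than the number of input rows.

```python
from typing import Callable, List, Tuple


def recursive_match(
    names: List[str], is_match: Callable[[str, str], bool]
) -> Tuple[List[Tuple[str, List[str]]], int]:
    """Mimic the loop: compare the first row with the rest, then recurse on the non-matches."""
    groups = []              # (head record, records that matched it)
    rows_materialized = 0    # total cross-join rows produced over all iterations
    remaining = list(names)
    while remaining:
        head, rest = remaining[0], remaining[1:]
        rows_materialized += len(rest)   # one cross-join row per remaining record
        matched = [r for r in rest if is_match(head, r)]
        unmatched = [r for r in rest if not is_match(head, r)]
        groups.append((head, matched))
        remaining = unmatched            # no-match rows go back to the loop start
    return groups, rows_materialized


def worst_case_rows(n: int) -> int:
    """Cross-join rows produced if nothing ever matches: (n-1) + (n-2) + ... + 0."""
    return n * (n - 1) // 2


# Placeholder rule (hypothetical): case-insensitive equality
groups, rows = recursive_match(
    ["ann", "Ann", "bob", "carl", "BOB"],
    lambda a, b: a.lower() == b.lower(),
)
print(groups, rows)
print(worst_case_rows(360_000))
```

With few matches, the total is close to n(n-1)/2 rows, which for 360k input rows is on the order of 6.5e10 cross-join rows.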

This loop appears to be what creates the large temp files, judging by the file sizes I see when I step through the process.

About 10k loops resulted in about a 5GB temp file. I believe it ran through about 40-50k loops before it maxed out my storage at about 200GB. If I reset the recursive loop start, the files are gone.

Ideally I want to be able to run this over 360k rows. What is the cause of these large files and can I get around it?

All suggestions appreciated! Thanks!

Hello @ksmith91,

Welcome to the KNIME Forum. :slight_smile:
I am not sure I completely understood your workflow, but I think you don’t need a Cross Joiner. The String Matcher might solve your problem, so you could get rid of the Recursive Loop and the Cross Joiner. Here is a small workflow snippet demonstrating the usage of the String Matcher node to do fuzzy address matching. I think this would optimise your workflow and reduce the size of your temp files. Please let me know if this helps.
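To illustrate the idea of fuzzy matching without building a cross-join table, here is a minimal Python sketch. It is not how the String Matcher node is implemented; it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and the function names and the `threshold` value of 0.85 are assumptions chosen for the example. Pairs are streamed one at a time and only the matches are kept, so no large intermediate table is stored.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (stand-in for a fuzzy string distance)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_pairs(names: List[str], threshold: float = 0.85) -> List[Tuple[str, str, float]]:
    """Return only the pairs whose similarity reaches the threshold."""
    pairs = []
    # combinations() yields each unordered pair once, lazily
    for a, b in combinations(names, 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


for a, b, score in fuzzy_pairs(["Jon Smith", "John Smith", "Jane Doe"]):
    print(f"{a!r} ~ {b!r} (similarity {score:.2f})")
```

This still compares every pair, so on 360k rows it would be slow as written; the point is only that the comparison itself does not require materializing every pair on disk.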
In case I misunderstood your problem and the String Matcher doesn’t help, could you share a minimal example of your workflow? Then I could have a closer look and maybe provide a better solution.

Kind regards,
Janina
