Hi,
To optimize the execution time of the Joiner node, how should I set the 'Maximum number of open files' parameter in the Performance Tuning section of the node? How does this number relate to the files being joined?
Nik
According to the node description:
Maximum number of open files: The maximum number of opened temporary files. Increase it for better performance.
I assume that in this case performance means speed, and that a higher value will cost more memory.
I don't think this value has much effect on execution time. In my experience the default of 200 temporary files performs the same as an extremely large value such as 1e1000. I have seen that a join on a large amount of data performs poorly, because it is carried out as a series of table merges. I think it is a good idea to split the input data to avoid running out of memory or getting the message "this may cause an endless loop". Another option is the Parallel Chunk Loop node, but it is only available in recent versions of KNIME.
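The idea of splitting the input can be sketched outside KNIME in a few lines of Python. This is only an illustration of why chunking is safe for an inner join, not how the Joiner or the chunk loop nodes work internally; the function names and the sample rows are made up for the example.

```python
from itertools import islice

def inner_join(left, right, key):
    """Plain inner join of two lists of dict rows on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def chunked_inner_join(left, right, key, chunk_size):
    """Join the left table chunk by chunk and concatenate the partial results."""
    it = iter(left)
    result = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        result.extend(inner_join(chunk, right, key))
    return result

# Hypothetical sample data, only for illustration.
left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
right = [{"id": 1, "b": 10}, {"id": 3, "b": 30}, {"id": 3, "b": 31}]

whole = inner_join(left, right, "id")
in_chunks = chunked_inner_join(left, right, "id", chunk_size=2)
```

Both calls return the same rows, so only one chunk of the left table has to be processed at a time. Note that this holds for an inner join; for outer joins the unmatched rows of the other table need extra care.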
OK, so should I assume that it does not make sense to increase the number above 200, and that my useful range is 1-200?
Hi Nik,
well, it is called "maximum number of open files", so I guess that the node itself decides how many files it actually uses, somewhere in the range 1-x, and you give it more freedom by increasing x. I would not expect the performance to improve significantly for values beyond 200, but I would expect it to drop for very small numbers.
The parameter could be there just to avoid issues with "too many open files" in some setups.
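To make that intuition concrete, here is a rough Python sketch. It assumes the join spills sorted runs to temporary files and merges at most `max_open_files` of them per pass, like a classic external merge. That is an assumption for illustration, not a description of the actual Joiner implementation, and `merge_passes` and the figure of 1000 runs are made up.

```python
import math

def merge_passes(num_runs: int, max_open_files: int) -> int:
    """Count the merge passes needed to combine num_runs temporary files
    when at most max_open_files can be read at the same time (assumed model)."""
    if max_open_files < 2:
        raise ValueError("need at least 2 open files to merge anything")
    passes = 0
    while num_runs > 1:
        # Each pass merges groups of up to max_open_files runs into one run.
        num_runs = math.ceil(num_runs / max_open_files)
        passes += 1
    return passes

# Hypothetical case: 1000 temporary runs, different limits.
for limit in (2, 10, 200, 1000, 100000):
    print(limit, merge_passes(1000, limit))
```

Under this model a limit of 2 needs 10 passes over the data, 200 needs 2, and anything from 1000 upwards needs a single pass, so raising the limit further changes nothing. That would match the observation that very small values hurt while very large ones bring no extra gain.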
If I remember correctly, the Joiner node had more "performance tuning" options in older versions of KNIME, and this parameter is one of the few left.
Nils
Hi Weskamp
Very clear explanation.
I understand now.
Thanks a lot.
Nik