Optimizing KNIME Workflows on a Smart PC: Tips for Heavy Data Processing

Hi KNIME community!

I’ve been exploring KNIME on my smart PC for processing large datasets, and while the platform is incredibly versatile, I’ve hit some bottlenecks with performance. For example, tasks like reading multiple large Excel sheets with the Excel Reader node sometimes lead to freezes or “Java Heap Space” errors.
Here’s what I’ve tried so far:

  1. Memory Management: Enabled the Java Heap monitor in KNIME preferences and increased the heap size, but the improvement is limited.
  2. Workflow Segmentation: Added garbage collection nodes and saved intermediate tables to disk to reduce memory load.
  3. Parallel Execution: Experimented with parallel loops and node configuration, but performance gains are inconsistent.

These tweaks have been somewhat helpful, but I believe there’s more I could do to make my KNIME workflows smarter and faster. For instance:

  • Are there specific settings or configurations in KNIME that make better use of a smart PC's multi-threading?

  • Any best practices for data preprocessing to reduce workflow complexity?

I’d love to hear tips from others who have optimized KNIME for heavy workflows, especially using a smart PC setup. Have you found particular nodes, extensions, or configurations that significantly improve performance for tasks like data wrangling or machine learning?

@harry_1 welcome to the KNIME forum.

For Excel files, reading them through Python with openpyxl has sometimes helped in the past:
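To give a rough idea of what the Python/openpyxl route can look like, here is a minimal sketch that streams a large sheet in read-only mode and writes it to CSV in chunks, so the whole sheet never has to sit in memory at once. The function names (`iter_chunks`, `iter_excel_rows`, `excel_to_csv`), the file paths and the chunk size are my own placeholders, not something from KNIME or from the original workflow, and it assumes openpyxl is installed in the Python environment you use.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator, List, Optional, Sequence


def iter_chunks(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows so only one chunk is held at a time."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def iter_excel_rows(path: str, sheet: Optional[str] = None) -> Iterator[tuple]:
    """Stream rows from one worksheet using openpyxl's read-only mode."""
    # Imported here so the chunking helper above works without openpyxl.
    from openpyxl import load_workbook

    wb = load_workbook(path, read_only=True, data_only=True)
    try:
        # Hypothetical default: take the first sheet if none is named.
        ws = wb[sheet] if sheet else wb.worksheets[0]
        yield from ws.iter_rows(values_only=True)
    finally:
        wb.close()  # read-only workbooks keep the file open until closed


def excel_to_csv(xlsx_path: str, csv_path: str, chunk_size: int = 10_000) -> int:
    """Copy one sheet to CSV chunk by chunk; returns the number of rows written."""
    written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        for chunk in iter_chunks(iter_excel_rows(xlsx_path), chunk_size):
            writer.writerows(chunk)
            written += len(chunk)
    return written


if __name__ == "__main__":
    # Placeholder paths, replace with your own files.
    print(excel_to_csv("big_input.xlsx", "big_input.csv"))
```

The resulting CSV can then be read with the CSV Reader node instead of the Excel Reader. Whether this is actually faster for your files is something you would have to measure.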

You can also try Apache Arrow, which is more ‘native’ to KNIME and might speed up the process.

In general, you should leave some memory for other processes besides KNIME. How much RAM do you have, and how large are your files?

Some further ideas about KNIME and performance are in this article:

And about the handling of very large files:
