How does KNIME write to disk?

During a workflow run (e.g., in a Row Filter) and when saving the workflow? Are the reads/writes sequential? I ask because I wonder whether one would benefit from PCIe 4.0 SSDs.

This would make a cool blog post: take some heavy workflow and compare different hardware setups.


Hi there,

Short answer: Yes, data is mostly written and read sequentially, though you won't always notice it, and there are exceptions.

Long answer: During workflow execution, KNIME Analytics Platform 4.0.2 writes data tables to disk sequentially (row by row), yet asynchronously (i.e., as long as memory is sufficient, downstream nodes won’t have to wait for data to be written to disk). When saving a workflow, KNIME waits until all tables have been written to disk before the save operation concludes successfully. When reading tables from disk, data is also read sequentially.
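The following is a minimal Python sketch of "sequential, yet asynchronous" writing. It is not KNIME's implementation (KNIME is written in Java). `AsyncTableWriter`, `max_buffered_rows` and the CSV file format are hypothetical choices made only for this illustration. A bounded in-memory queue stands in for "as long as memory is sufficient". A background thread appends rows to a single file in arrival order. `close()` plays the role of saving by waiting until every buffered row has been written.

```python
import csv
import os
import queue
import tempfile
import threading


class AsyncTableWriter:
    """Hypothetical sketch: rows are handed over in memory and a background
    thread appends them to one file, strictly in arrival order."""

    _DONE = object()  # sentinel marking the end of the table

    def __init__(self, path, max_buffered_rows=10_000):
        self.path = path
        # The bounded queue stands in for "as long as memory is sufficient":
        # add_row() only blocks once this many rows are waiting to be written.
        self._queue = queue.Queue(maxsize=max_buffered_rows)
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        # Append-only writing, i.e. purely sequential I/O.
        with open(self.path, "w", newline="") as f:
            writer = csv.writer(f)
            while True:
                row = self._queue.get()
                if row is self._DONE:
                    break
                writer.writerow(row)

    def add_row(self, row):
        # The producing node continues immediately unless the buffer is full.
        self._queue.put(row)

    def close(self):
        # Analogous to saving: wait until every buffered row has been
        # written and the file has been closed.
        self._queue.put(self._DONE)
        self._thread.join()


# Usage: a "node" produces five rows, then the table is finalized.
path = os.path.join(tempfile.mkdtemp(), "table.csv")
writer = AsyncTableWriter(path)
for i in range(5):
    writer.add_row([i, i * i])
writer.close()
```

The queue size is the knob that decides when a producer starts waiting on the disk. With a large buffer and a fast producer, disk speed mostly shows up at the final `close()`, much like the wait at save time described above.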

Two more things:

  1. Some nodes that only rearrange / add / remove columns (e.g. Column Filter, Column Appender) do not actually write their full output table to disk, since at least part of the data is already on disk, albeit arranged differently.
  2. If you install the extension “KNIME Column Storage (based on Apache Parquet)” and enable it in Preferences -> KNIME -> Data Storage, KNIME Analytics Platform will write data tables to disk using the columnar storage format Apache Parquet. Data will then still be written to disk sequentially, but does not have to be read fully from disk, i.e., nodes can choose to read only subsets of rows / columns.
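To picture the first point, here is a small Python sketch of a column-only node that records which existing columns to expose instead of copying any rows. `StoredTable`, `ColumnView`, `column_filter` and `column_append` are hypothetical names, and the in-memory tuples stand in for data already on disk. This is an illustration of the idea, not KNIME's internal design. It also assumes that appended tables have the same number of rows in the same order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StoredTable:
    """Stand-in for a table whose data has already been written to disk."""
    column_names: Tuple[str, ...]
    rows: Tuple[tuple, ...]


@dataclass(frozen=True)
class ColumnView:
    """Hypothetical output of a column-only node: it records which existing
    columns to expose, instead of copying their data into a new table."""
    sources: Tuple[StoredTable, ...]
    columns: Tuple[Tuple[int, int], ...]  # (source number, column index)

    @property
    def column_names(self) -> List[str]:
        return [self.sources[s].column_names[c] for s, c in self.columns]

    def iter_rows(self):
        # Values are fetched from the existing tables on demand.
        # Assumption: all sources have the same number of rows, in the same order.
        for parts in zip(*(src.rows for src in self.sources)):
            yield tuple(parts[s][c] for s, c in self.columns)


def column_filter(table: StoredTable, keep: List[str]) -> ColumnView:
    """Keep (and possibly reorder) columns by name; no rows are written."""
    return ColumnView((table,), tuple((0, table.column_names.index(n)) for n in keep))


def column_append(left: StoredTable, right: StoredTable) -> ColumnView:
    """Place the columns of `right` after those of `left`; no rows are written."""
    cols = tuple((0, i) for i in range(len(left.column_names)))
    cols += tuple((1, i) for i in range(len(right.column_names)))
    return ColumnView((left, right), cols)


scores = StoredTable(("id", "name", "score"), ((1, "a", 0.5), (2, "b", 0.7)))
flags = StoredTable(("flag",), ((True,), (False,)))

filtered = column_filter(scores, ["score", "id"])
print(filtered.column_names)       # ['score', 'id']
print(list(filtered.iter_rows()))  # [(0.5, 1), (0.7, 2)]

appended = column_append(scores, flags)
print(list(appended.iter_rows()))  # [(1, 'a', 0.5, True), (2, 'b', 0.7, False)]
```

The output of such a node costs almost no disk I/O, because it only stores a small mapping. The price is that later reads go back to the original tables.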
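For the second point, the sketch below shows why a columnar layout lets a reader skip data. Parquet groups rows into row groups, each containing one column chunk per column, and keeps metadata in the file footer. This Python sketch only mimics that idea: it writes one JSON file per (row group, column) chunk plus a hypothetical `meta.json`. It is not the Parquet on-disk format, and it is not how the KNIME extension stores data. `write_columnar` and `read_columnar` are illustrative names, and column names are assumed to be safe to use in file names.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


def write_columnar(directory: str, table: Dict[str, list], row_group_size: int = 2) -> None:
    """Write each (row group, column) chunk to its own file, front to back."""
    names = list(table)
    num_rows = len(table[names[0]]) if names else 0
    for g, start in enumerate(range(0, num_rows, row_group_size)):
        for name in names:
            chunk = table[name][start:start + row_group_size]
            with open(os.path.join(directory, f"rg{g}_{name}.json"), "w") as f:
                json.dump(chunk, f)
    meta = {"columns": names, "row_group_size": row_group_size, "num_rows": num_rows}
    with open(os.path.join(directory, "meta.json"), "w") as f:
        json.dump(meta, f)


def read_columnar(directory: str, columns: List[str], row_start: int = 0,
                  row_stop: Optional[int] = None):
    """Read only the requested columns and only the row groups that overlap
    [row_start, row_stop). Returns the data and the number of chunks opened."""
    with open(os.path.join(directory, "meta.json")) as f:
        meta = json.load(f)
    size, total = meta["row_group_size"], meta["num_rows"]
    stop = total if row_stop is None else min(row_stop, total)
    result = {name: [] for name in columns}
    chunks_read = 0
    for g in range(row_start // size, (stop + size - 1) // size):
        group_start = g * size
        for name in columns:
            with open(os.path.join(directory, f"rg{g}_{name}.json")) as f:
                chunk = json.load(f)
            chunks_read += 1
            # Trim the chunk to the part that lies inside the requested rows.
            result[name].extend(chunk[max(row_start - group_start, 0):stop - group_start])
    return result, chunks_read


d = tempfile.mkdtemp()
write_columnar(d, {"a": [1, 2, 3, 4, 5], "b": ["v", "w", "x", "y", "z"]})
data, chunks = read_columnar(d, ["b"], row_start=2, row_stop=4)
print(data, chunks)  # {'b': ['x', 'y']} 1
```

Writing is still strictly sequential, chunk after chunk. Reading column `b`, rows 2 to 3, opens only 1 of the 6 data chunks, which is the kind of saving the columnar backend allows.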

Marc

