What's the Cache Node for?!

So I read the description:


Caches all input data (rows) onto disk. This is particularly useful when connected to a preceding node, which performs a column transformation. For example, a filter column node, which "hides" most of the columns to the output. This node only caches the data that is actually contained in the input table. Iterating on it may be considerably faster than accessing the table of a wrapper node. 


Ok... what? What's the advantage? Why is this "particularly useful when connected to a preceding node"??

I have a feeling this could be useful, but as is I have no idea what it's used for...



I guess the last sentence of the node description is the important one. The Cache node basically does a full scan over the input table and copies the data row by row to a new(!) data table at the output. To understand the advantage of the node, we need to recap KNIME's smart data caching. A node that derives a new column, filters columns, or does something similar stores only the new data and always refers back to the original data tables for everything else. This can slow down data access, since the data may be spread across different branches and depths of the workflow. The Cache node gathers all of this information and builds one complete data table from scratch.
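
To make the idea concrete, here is a toy model in Python. It is only a sketch of the principle, not KNIME's actual implementation or API: the names `BaseTable`, `ColumnFilterView`, and `cache` are made up for illustration. The "view" stores no data and reads through to its parent on every pass, while `cache` does one full scan and produces a standalone copy.

```python
class BaseTable:
    """A fully stored table: every row is kept as a list of cell values."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]  # copies each row exactly once

    def __iter__(self):
        return iter(self.rows)


class ColumnFilterView:
    """A wrapper table (hypothetical): holds no data, only a reference
    to its parent plus the positions of the columns it exposes."""

    def __init__(self, parent, keep):
        self.parent = parent
        self.columns = list(keep)
        self._idx = [parent.columns.index(c) for c in keep]

    def __iter__(self):
        # Each pass walks the parent and touches the full rows,
        # including the columns that are "hidden" from the output.
        for row in self.parent:
            yield [row[i] for i in self._idx]


def cache(table):
    """Toy 'Cache node': one full scan that copies the visible data
    row by row into a new table with no link back to the input."""
    return BaseTable(table.columns, table)


wide = BaseTable(["id", "name", "score"], [[1, "a", 0.5], [2, "b", 0.9]])
narrow = ColumnFilterView(wide, ["score"])  # still depends on `wide`
cached = cache(narrow)                      # holds only the "score" column
```

Iterating over `cached` never touches `wide` again, which is the effect the node description hints at with "only caches the data that is actually contained in the input table".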
