Feature suggestion: Table Reader Chunk Loop Start

Dear Knimers,

One use case of the “Chunk Loop Start” is to process a large table without storing the intermediate results in their entirety (another way to achieve this would be streaming, but this is not always possible).

I suggest a “Table Reader Chunk Loop Start” node that would read chunks from a table in a file on disk instead of from an input port. This way one could avoid loading a giant table into KNIME in its entirety, and load and process it chunk by chunk instead.

Similarly, a “Table Writer Loop End” node could assemble a large table on disk without storing the table in its entirety in the workflow.
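
To make the idea concrete, here is a minimal sketch of the same read-chunk / process / append-to-disk pattern in plain Python. It is not KNIME code and does not use any KNIME API. It assumes the table is a CSV file with a header row, and the names `read_chunks`, `process_in_chunks` and `transform` are made up for illustration.

```python
import csv
from itertools import islice


def read_chunks(path, chunk_size):
    """Yield (header, rows) pairs; at most chunk_size rows are held in memory."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty file: nothing to yield
            return
        while True:
            rows = list(islice(reader, chunk_size))
            if not rows:
                break
            yield header, rows


def process_in_chunks(src, dst, chunk_size, transform):
    """Read src chunk by chunk, apply transform, and append each result to dst.

    Returns the number of chunks processed. The header is written once,
    together with the first chunk.
    """
    n_chunks = 0
    with open(dst, "w", newline="") as out:
        writer = csv.writer(out)
        for header, rows in read_chunks(src, chunk_size):
            if n_chunks == 0:
                writer.writerow(header)
            writer.writerows(transform(rows))
            n_chunks += 1
    return n_chunks
```

Here `read_chunks` plays the role of the proposed loop start and the appending writer plays the role of the proposed loop end; neither the full input nor the full output is ever held in memory at once.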

Best
Aswin

Hi Aswin,

I think this is a really interesting idea! However, I’m not quite sure about the implications (e.g. would we need a loop version of every local reader and writer?), so maybe we can narrow down the use cases: which cases are not covered by streaming? Streaming itself already comes with a chunk size.

The problem of large local files could maybe also be solved in a different way (cutting them down beforehand or importing them into a DB), but we can leave that aside for the time being.

Kind regards
Marvin

Dear Marvin,

I think it would make the most sense for the normal Table Reader/Writer, and perhaps the Line Reader. I don’t think it would be useful in most other cases, for example the PMML and Image readers/writers.

(Now that I think about it, it would be cool to have an Image Writer Loop End that writes individual frames of an animation, but that is a whole different story).

And I might be wrong, but it looks as if the normal Table Writer that KNIME has now cannot be streamed, so the output port of the node before it will always hold the full table. A Table Writer Loop End would enable the user to cut up this table into smaller pieces.

Merry Christmas!
Aswin
