Streaming with CSV Loading

I used to load CSV files with the CSV Reader node. Unfortunately, it does not support streaming, but the File Reader node does. So I switched to File Reader, wrapped it in a streamed (wrapped) metanode, and ran the workflow. The loading time was the same: about 10 minutes for roughly 11 million records. By comparison, the Actian CSV load was about 2 times faster.
Could you please explain why streaming did not help? Can the loading speed be increased?

Hi Izaychik63,

Streaming the File Reader node by itself cannot make the execution faster, because all of the data still has to be loaded into the platform. However, if you connect a Row Filter node to the File Reader node, this will speed up the execution, because then only the data that passes the filtering criteria is loaded downstream.
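
If it helps to picture the mechanism outside of KNIME, here is a minimal Python/pandas sketch of the same idea; the file name, chunk size, and the `status` column are just placeholder assumptions:

```python
import pandas as pd

matching_chunks = []
# stream the file in 100k-row chunks instead of loading it all at once;
# every row is still read from disk, which is why reading alone is not faster
for chunk in pd.read_csv("data.csv", chunksize=100_000):
    # keep only the rows that pass the filter criterion
    # (the "status" column and value are made up for this example)
    matching_chunks.append(chunk[chunk["status"] == "active"])

filtered = pd.concat(matching_chunks, ignore_index=True)
print(len(filtered), "rows passed the filter")
```

The read itself takes about as long either way; the gain comes from only the filtered rows being kept for the downstream steps.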

Hope it helps.

Best,
Daria


Thank you, Daria, for your advice. I added a Row Filter; it filters about 76K records out of the roughly 10 million original ones. As a result, I saved about 1 minute of the 10 minutes of data loading and filtering time.


Hi Izaychik63,

I'm sorry it only saved 1 minute.
I recently started using .orc or .parquet files when working with very large data tables; it significantly improves the performance of reading and writing the data.

You can get the ORC and Parquet Reader/Writer nodes by installing the KNIME Extension for Big Data File Formats.
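
Just to illustrate the idea outside of the KNIME nodes, here is a rough Python sketch of the convert-once, read-fast pattern (the file paths and column names are made up, and pyarrow is assumed to be installed):

```python
import pandas as pd

# one-off conversion: read the CSV once and write it out as Parquet
df = pd.read_csv("big_table.csv")
df.to_parquet("big_table.parquet", engine="pyarrow")

# later reads come from the compact, columnar file and can pull
# only the columns that are actually needed
subset = pd.read_parquet("big_table.parquet", columns=["id", "amount"])
print(subset.shape)
```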

Best,
Daria
