Reading Large Files Fast

Hi All,

I am trying to find a fast way to read in huge CSV files (60 million+ rows). Is there a node or custom coding option? So far, I have tried the following:

  1. Base KNIME nodes (CSV Reader / File Reader)
  2. Creating a custom KNIME node (keeping all data in memory, not writing to disk)
  3. R Snippet (data.table to knime.out)
  4. Increasing the RAM in the knime.ini file

The best results I have had are with the custom KNIME node. It is usually faster than the other methods, though there is some slowdown converting from a List&lt;String&gt; to a KNIME data container (basically looping through rows and cell values to populate the data table).
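For reference, the conversion step looks roughly like this. This is only a minimal sketch of an execute() body using the standard KNIME node API (BufferedDataContainer, DefaultRow, StringCell); the two-column all-String spec, the naive comma split, and the column names are placeholders, not the real node's logic. It reads line by line and appends rows directly to the container instead of collecting everything into a List&lt;String&gt; first.

```java
import java.io.BufferedReader;
import java.io.FileReader;

import org.knime.core.data.DataCell;
import org.knime.core.data.DataColumnSpecCreator;
import org.knime.core.data.DataTableSpec;
import org.knime.core.data.RowKey;
import org.knime.core.data.def.DefaultRow;
import org.knime.core.data.def.StringCell;
import org.knime.core.node.BufferedDataContainer;
import org.knime.core.node.BufferedDataTable;
import org.knime.core.node.ExecutionContext;

public final class CsvToTableSketch {

    /**
     * Reads the CSV file line by line and appends each row straight into the
     * data container, so no intermediate List<String> of all rows is kept.
     * Placeholder spec: two String columns; a real node would build the spec
     * from the file header and chosen types.
     */
    static BufferedDataTable read(final String path, final ExecutionContext exec)
            throws Exception {
        DataTableSpec spec = new DataTableSpec(
                new DataColumnSpecCreator("col0", StringCell.TYPE).createSpec(),
                new DataColumnSpecCreator("col1", StringCell.TYPE).createSpec());
        BufferedDataContainer container = exec.createDataContainer(spec);
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            long rowIdx = 0;
            while ((line = in.readLine()) != null) {
                // naive split for illustration: no quoting or escaping handled
                String[] fields = line.split(",", -1);
                DataCell[] cells = new DataCell[spec.getNumColumns()];
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = new StringCell(fields[i]);
                }
                container.addRowToTable(
                        new DefaultRow(RowKey.createRowKey(rowIdx++), cells));
            }
        }
        container.close();
        return container.getTable();
    }
}
```

Most of the remaining cost is the per-cell object creation in that inner loop, which is the slowdown mentioned above.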

Are there other solutions for reading in large CSV files?

Try using the File Reader instead, wrapping it in a metanode. The File Reader is streamable and lets you choose each field's data type.
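(For context, and assuming the KNIME Streaming Execution (Beta) extension is installed: streaming is enabled by placing the reader and the downstream nodes in a wrapped metanode and setting its job manager to "Simple Streaming", so rows flow through the nodes concurrently instead of each node finishing before the next starts.)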


Thanks izaychik62,

I think limiting/sampling the input until the flow is fully built, and then making it streamable so things run concurrently, will be the route we go. Thanks!

