Reading Large Files Fast


Hi All,

I am trying to find a solution for reading in huge CSV files (60mil+ rows) fast. Is there a node or custom coding option? So far, I have tried the following:

  1. Base KNIME nodes (CSV/File Reader)
  2. Creating a custom KNIME node (and keeping all data in memory - not writing to disk)
  3. R Snippet (data.table to knime.out)
  4. Increasing the RAM in the knime.ini file

The best results I have had are from creating a custom KNIME node. This is usually faster than the other methods, though I do see some slowdown converting from a List&lt;String&gt; to a KNIME data container (basically looping through rows and cell values to populate the data table).
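One way to reduce that conversion slowdown is to avoid materializing the whole file as a List&lt;String&gt; at all and instead hand each parsed row to the container as it is read. Here is a minimal Java sketch of that streaming pattern; the `rowHandler` callback is a hypothetical stand-in for whatever call adds a row to your KNIME data container, and the comma split is deliberately naive (a real parser must handle quoted fields):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

public class CsvStream {

    // Read the file line by line and pass each parsed row straight to
    // rowHandler, instead of buffering everything in a List<String> and
    // looping over it a second time. rowHandler is a placeholder for the
    // code that appends a row to the KNIME data container.
    public static long streamRows(Path csv, Consumer<String[]> rowHandler)
            throws IOException {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(csv)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Naive split for illustration only; quoted commas,
                // embedded newlines, etc. need a proper CSV parser.
                rowHandler.accept(line.split(",", -1));
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.write(tmp, java.util.List.of("a,b,c", "1,2,3"));
        long n = streamRows(tmp, row ->
                System.out.println(row.length + " cells"));
        System.out.println(n + " rows");
        Files.delete(tmp);
    }
}
```

With this shape, each line is parsed exactly once and never held in an intermediate list, so memory stays flat and you skip the second pass over the data.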

Are there other solutions to reading in large csv files?



Try using the File Reader instead, wrapping it in a metanode. The File Reader is streamable and allows you to choose the fields' data types.



Thanks izaychik62,

I think limiting/sampling the input until the flow is fully built, and then making it streamable so things run concurrently, will be the route we go. Thanks!