Speed Up Reading Data / Heap Space

Hi all,

Do you have any recommendations about how to avoid crashes and heap space exhaustion when reading input data, and how to read data in a faster way?

First off, I’ve already gone through the usual material about heap space: changing the .ini file, changing the node’s memory policy, and closing all other applications. Even so, here is an example: I have an Excel file that isn’t that big (30 MB, with 25 columns and 265,440 rows). My machine has 8 GB of RAM, and I’ve given 5 GB to KNIME in the .ini file. Reading this file with an Excel Reader takes about a minute, which is not the fastest (hence my question about how to save time here) but still OK. However, if I add a Read Excel Sheet Names node, I either exhaust the heap space or KNIME just crashes… and it’s just a 30 MB file!

Thanks a mill,
Gui

Hi there Gui,

Regarding the Excel nodes: their performance is limited by the underlying Apache POI library, so I don’t think that giving KNIME more GB or shutting down every other application on your computer will help. This is especially visible with the Read Excel Sheet Names node, as you have already experienced. And if I remember correctly, file size is not that important here at all…

As a workaround, streaming is available for the Excel Reader, so you can try that to speed things up. Or you can try reading the Excel file with the R Integration, for example. Here is a link for more info. And you don’t even need sheet names :wink:

Br,
Ivan
