Memory problems when processing a big dataset


I had a problem processing a dataset in KNIME. The dataset has ~900k rows and ~100 columns, and I tried to sort it by one of the columns. First I got an error related to "Java heap space". I changed the parameters in knime.ini to

- -XX:MaxPermSize=1g


(my PC has 8 GB of RAM in total), but the sorting node stopped at 50%.

What can I do to handle such big datasets effectively? How do I set the KNIME parameters properly? Does KNIME use only RAM for its operations, or does it also use the hard disk?

Thank you for your help.


Hi Paul,

   Increasing the max perm size is not necessary (it leaves less memory available for data) unless you get an error message specifically complaining about it. Increasing -Xmx is advisable, up to a value that is close to, but does not exceed, your free memory (on a 64-bit OS with 8 GB of RAM, I think 6-7 GB is the most you can use, since some memory is needed for the OS, other programs, and caches).
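For reference, the relevant part of knime.ini could look roughly like this (a sketch, not the full file; the 6g value is just an example for an 8 GB machine, and the launcher options that normally precede -vmargs are omitted):

```ini
-vmargs
-Xmx6g
```

Everything after -vmargs is passed to the Java VM, so the -Xmx line must appear below it.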

   KNIME uses the HDD/SSD for temporary files if the data does not fit in memory.

Cheers, gabor

Hi Paul,

I had the same issue previously, and changing the parameters did not solve my problem.

However, I solved it by adding the Parallel Chunk Start (and End) nodes between my processing nodes. They basically split the input data and execute the branches in parallel, with each branch processing one chunk (at least that is what the node description says).

It works for my big data, so it's worth trying. Good luck!

- LM

Thank you!

Both things helped.

I have one more question: does placing the workspace on an SSD improve KNIME's efficiency in general?



If your dataset does not fit in memory: yes, it does.
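If moving the whole workspace to the SSD is not an option, you can at least redirect temporary files there, since KNIME writes its temporary tables to the Java temp directory unless configured otherwise (I believe newer versions also expose a temporary-directory setting in the KNIME preferences). A hedged sketch of the knime.ini addition, where D:\knime-temp is just an example path to an existing directory on the SSD:

```ini
-vmargs
-Djava.io.tmpdir=D:\knime-temp
```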

I am new to KNIME. I downloaded the KNIME Analytics Platform for Windows (self-extracting archive). I unzipped the file, and whenever I want to run KNIME, I run the knime.exe file from that folder. I have not installed KNIME on my machine.

I am facing a problem whenever I try to load a 1.6 GB Excel file using the "Excel Reader" node. The following error messages are displayed:

"Potential deadlock in SWT Display thread detected. Full thread dump will follow as debug output." and
"Potential deadlock in AWT Event Queue detected. Full thread dump will follow as debug output."

I then split the Excel file into two parts and was able to load both of them. When I then tried to load a third Excel file, the following error message was displayed:

"Execute failed: java.lang.OutOfMemoryError: Java heap space".

I did not use the "File Reader" or "CSV Reader" nodes, as they do not display the data correctly.

I am working on a Windows 10 machine with 8 GB of RAM. I checked the memory usage in the Windows Task Manager, and it shows only 50 percent memory utilisation by KNIME.

The KNIME log file shows the following details:

# java.version=1.8.0_60
# java.vm.version=25.60-b23
# java.vendor=Oracle Corporation
# 10
# os.arch=amd64
# number of CPUs=4
# assertions=off
# max mem=1820MB
# application=org.knime.product.KNIME_APPLICATION

How do I resolve this problem? And although I have 8 GB of memory, why does the KNIME log show "# max mem=1820MB"?