KNIME Processing Big Data

Hi All, 

I'm using KNIME to process a very large chemical dataset. With the limited computational resources I have at the moment, there isn't much I can do to continue my research. I've tried the Parallel Chunk nodes, but I end up with "Java heap space" or "GC overhead limit exceeded" errors.

Has anyone had the same problem? Or is there an alternative way of processing big data that anyone has experience with?

I've found this article: "From the desktop to the grid: conversion of KNIME Workflows to gUSE". Has anyone used this?

Any replies would be really appreciated. Thank you.




Hi LM,

The generic KNIME nodes are available on GitHub. For more details on the paper, I would suggest you write Luis an email. And say hello to him from me :).

If you are using the Parallel Chunk Loop, all of the data is still processed at the same point in time, so the chunks compete for the same memory.

If whatever you do inside the loop reduces the data quite a lot, you could use our newest streaming extension.
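
To illustrate the general idea only (this is not KNIME code, and it says nothing about how the streaming extension is implemented), here is a minimal Python sketch of streaming versus batch processing. The names `read_rows`, `keep_row`, `stream_filter` and `batch_filter` are made up for the example, and the filter is just a stand-in for whatever reduces the data inside the loop.

```python
from typing import Iterable, Iterator, List


def read_rows(n: int) -> Iterator[int]:
    """Hypothetical data source: yields one row at a time instead of a full table."""
    for i in range(n):
        yield i


def keep_row(row: int) -> bool:
    """Hypothetical filter standing in for the step that reduces the data."""
    return row % 1000 == 0


def stream_filter(rows: Iterable[int]) -> Iterator[int]:
    """Streaming: rows pass through one by one; only surviving rows are kept."""
    for row in rows:
        if keep_row(row):
            yield row


def batch_filter(rows: Iterable[int]) -> List[int]:
    """Batch: the whole input is materialised first, so memory grows with input size."""
    table = list(rows)
    return [row for row in table if keep_row(row)]


if __name__ == "__main__":
    survivors = list(stream_filter(read_rows(1_000_000)))
    print(len(survivors))
```

Both functions return the same rows; the difference is that the streaming version never holds the full input in memory, which is why it pays off most when the step throws away most of the data.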

Best regards, Iris 


Your link goes "404"... :-(

Also, just to add from what I've previously posted on

The "Don't save" nodes from the Image mining extension can help reduce disk I/O.