I am doing research into using KNIME for a big data project. It looks like we will have to use Hive and Hadoop to store and analyse the data. My question: it seems that KNIME nodes force the data to go through the input and output ports, but our data is really big and it is not a good idea to have it pass through the UI. What is the best way to handle this situation? We just want to use KNIME for the data-flow design, and probably to monitor progress, errors, and configuration.
Thank you for your help.
There are specific plug-ins in KNIME for big data. Does this help?
The data does not go through the UI. In fact, KNIME's executor is designed to be as memory-efficient as possible, so that for most operations disk capacity and performance are the limiting factors, not memory or CPU power.
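The same principle applies when the data lives in Hive: push the aggregation into the data store with SQL so that only summary rows, not the full table, ever travel back to the client. A minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for a Hive connection (the table name and sample data are invented for illustration):

```python
import sqlite3

# Stand-in for a Hive/JDBC connection: the full table stays inside the
# database engine, and only the aggregated result crosses the connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, bytes INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 100), (1, 250), (2, 75), (2, 25), (3, 500)],
)

# The GROUP BY runs inside the engine; the client receives only the
# summary rows, however large the underlying table is.
rows = conn.execute(
    "SELECT user_id, SUM(bytes) AS total_bytes "
    "FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()

print(rows)  # three summary rows, not the full event table
conn.close()
```

A KNIME database node configured with an equivalent query behaves the same way: the workflow carries the query and the result set, not the raw data.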
As Simon mentions, Actian (formerly Pervasive) does have an extension, DataRush, that allows workflows to be streamed rather than cached, so for certain applications this approach can be much faster. In fact, DataRush even enables you to run KNIME workflows on a Hadoop file system, which can scale rather well indeed :)
From the link below it appears that the Hadoop plugin for KNIME is commercial and not open source.
Are there any open source KNIME plugins for Hadoop?
That is not currently on our roadmap, sorry!