I am doing research into using KNIME for a big data project. It looks like we will have to use Hive and Hadoop to store and analyse the data. My question: it seems that KNIME nodes force the data to go through the input and output ports, but our data is really big and it is not a good idea to have it pass through the UI. What is the best way to handle this situation? We just want to use KNIME for the data-flow design, and probably to monitor progress, errors, and configuration.
Thank you for your help.
There are specific plug-ins in KNIME for big data. Does this help?
The data does not go through the UI. In fact, KNIME's executor is designed to be as memory-efficient as possible, so that for most operations disk capacity and performance are the limiting factors, not memory or CPU power.
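The same principle applies when the data lives in Hive: push the aggregation into the data store with SQL so that only summary rows, not the full table, ever travel back to the client. A minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for a Hive connection (the table name and sample data are invented for illustration):

```python
import sqlite3

# Stand-in for a Hive/JDBC connection: the full table stays inside the
# database engine, and only the aggregated result crosses the connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, bytes INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 100), (1, 250), (2, 75), (2, 25), (3, 500)],
)

# The GROUP BY runs inside the engine; the client receives only the
# summary rows, however large the underlying table is.
rows = conn.execute(
    "SELECT user_id, SUM(bytes) AS total_bytes "
    "FROM events GROUP BY user_id ORDER BY user_id"
).fetchall()

print(rows)  # three summary rows, not the full event table
conn.close()
```

A KNIME database node configured with an equivalent query behaves the same way: the workflow carries the query and the result set, not the raw data.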
As Simon mentions, Actian (formerly Pervasive) does have an extension, DataRush, that allows workflows to be streamed rather than cached, so for certain applications this approach can be much faster. In fact, DataRush even enables you to run KNIME workflows on a Hadoop file system, which can scale rather well indeed :)
From the link below it appears that the Hadoop plugin for KNIME is commercial and not open source.
Are there any open source KNIME plugins for Hadoop?
That is not currently on our roadmap, sorry!