KNIME + H2O - how it works...

zebov · June 11, 2018, 4:46pm

I have tried KNIME with Spark on our cluster and saw that the nodes offer some configuration (address, port, etc.), but when trying the H2O nodes I don't see any node with a configuration dialog. How does it work with H2O in general? (I have no experience with H2O, by the way.) How does KNIME integrate with H2O, either on a cluster or locally? With Spark, for example, we have a configurable node plus the Spark Job Server, etc.

Thanks for answers.

ScottF · June 11, 2018, 5:44pm

Hi @zebov -

You initially invoke H2O using an H2O Local Context node, and then usually transfer data from KNIME into the H2O format using an H2O Table to Frame node to begin processing.

Last week Marten wrote an excellent post on our blog about how to tackle a Kaggle challenge using KNIME’s H2O integration. The example workflow would be very useful for you, I think. Check it out here: https://www.knime.com/blog/solving-a-kaggle-challenge-using-the-combined-power-of-knime-analytics-platform-h2o

zebov · June 11, 2018, 7:31pm

Thanks, I read it, but I don't have H2O installed and was still able to execute the node. That is what confuses me: no H2O is installed, yet the local context node executes fine. This is what I don't get…

christian.birkhold · June 11, 2018, 9:11pm

Hi @zebov,

In the case of the H2O Local Context, we ship the H2O installation with KNIME, so you don't need to install H2O separately. It's not yet possible to run H2O Sparkling Water, but we're on it.

I hope this helps,

Christian

zebov · June 12, 2018, 8:49am

OK, so if H2O is integrated within KNIME, how does it do “distributed” operations then? Or does it run only in memory on a local one-node cluster within KNIME? In general, how can we create an H2O cluster in KNIME? Is it possible?

christian.birkhold · June 14, 2018, 12:34pm

Right, at the moment H2O runs only on “a single cluster node”, which is your local machine. It still runs in parallel, etc. However, we designed the extension in such a way that we could theoretically add other “H2O Contexts” that are not local (e.g. running on a cluster), and the nodes would still run out of the box. One of these contexts will be “H2O Sparkling Water”. I can’t promise anything regarding a timeline, but we’re on it.
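
To illustrate the idea of interchangeable contexts, here is a minimal Python sketch. It is not the extension's actual code and does not call H2O at all: the names `H2OContext`, `LocalContext`, `ClusterContext`, and `row_count_node` are hypothetical, and the "frame" is just a plain dictionary. It only shows why a node written against a context interface would not need to change when a non-local context is added.

```python
from abc import ABC, abstractmethod


class H2OContext(ABC):
    """Hypothetical interface: hides where the H2O backend actually runs."""

    @abstractmethod
    def describe(self) -> str:
        """Return a short label for the backend."""

    @abstractmethod
    def to_frame(self, rows):
        """Convert table rows into a stand-in 'frame' owned by this backend."""


class LocalContext(H2OContext):
    """Stand-in for a context running on the local machine only."""

    def describe(self) -> str:
        return "local (single node)"

    def to_frame(self, rows):
        return {"backend": "local", "rows": list(rows)}


class ClusterContext(H2OContext):
    """Stand-in for a possible future remote context (assumed, not real)."""

    def __init__(self, address: str):
        self.address = address

    def describe(self) -> str:
        return f"cluster at {self.address}"

    def to_frame(self, rows):
        return {"backend": self.address, "rows": list(rows)}


def row_count_node(context: H2OContext, rows) -> str:
    """A 'node' that only talks to the interface, never to a concrete backend."""
    frame = context.to_frame(rows)
    return f"{len(frame['rows'])} rows on {context.describe()}"


if __name__ == "__main__":
    data = [(1, "a"), (2, "b"), (3, "c")]
    # The same node function works unchanged with either context.
    print(row_count_node(LocalContext(), data))
    print(row_count_node(ClusterContext("example-host:54321"), data))
```

In this sketch, swapping `LocalContext()` for `ClusterContext(...)` is the only change on the caller's side, which mirrors the "nodes will still run out of the box" idea described above.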