I have a problem with the R scripting nodes and the h2o package. I don’t know why, but the R node sends a “POST shutdown” command to the H2O cluster after successful execution. This is a problem because it kills my “cluster”. For example, running this in an “R Source (Workspace)” node:
h2o.init(ip = "localhost", port = 54321, startH2O = FALSE, strict_version_check = FALSE)
sends “POST shutdown”. The same script works when I run it with the “Eval script”/“Eval Selection” buttons “inside” the node (in the configure view). I can’t find any reason for this behavior.
OS: Windows Server 2016
H2O is started locally from a .bat file.
I will be grateful for any help.
Before we try to reproduce this: Have you had a look at the H2O nodes in KNIME? They allow you to use H2O without going through R. If you looked into the KNIME H2O integration already, is there some feature you are missing? We’re actively working on expanding our H2O integration, so any feedback from H2O users is highly appreciated!
Yes, I have. I often use the H2O nodes, but two important features are missing: connection to a remote server and grid search. The most important one is the connection to a remote server. I need to share resources (data frames, fitted models, etc.) between KNIME, H2O Flow and H2O Steam. The current H2O integration only allows using H2O in a local context, so I can’t send a prepared data frame to the cluster.
As an alternative to grid search I can (probably) use “Parameter Optimization” loops, but it is not comfortable.
Thanks for the feedback. Can you provide more details on what exactly is not comfortable in this setup? Maybe we can improve it in a future version.
PS: Do you need “REST” access or access to Sparkling Water?
Configuration takes a lot of time and lacks some important features. For H2O models, e.g. Gradient Boosted Trees, you can set about 20 parameters. To do that, I have to:
- Set the name, min value, max value and step value in the Parameter Optimization Loop Start node for each parameter.
- Configure flow variables in the learner node for each parameter.
In total, it takes a lot of time. It is really easy to make mistakes during configuration.
An additional problem is the lack of string parameters (in Gradient Boosted Trees, “Categorical encoding” and “Distribution” are both chosen from a list). Also, every parameter only supports a constant step; I can’t set a list/vector of values for a parameter.
Of course, I can use other loop nodes (maybe REST nodes?) as a “workaround”, but in H2O Flow or the R/Python interfaces it takes less time and effort. I don’t have to configure additional things like flow variables, and I can set a list of values for each model parameter.
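To make the “list of values for each parameter” point concrete, here is a minimal sketch in plain Python. It does not call H2O or KNIME. It only shows how a grid defined by per-parameter value lists, including string-valued ones, expands into individual model configurations. The helper name `expand_grid` and all values are made up for illustration; the keys are meant to resemble GBM options such as max depth, learning rate and distribution.

```python
from itertools import product


def expand_grid(hyper_params):
    """Return one dict per combination of the listed parameter values."""
    names = list(hyper_params)
    value_lists = [hyper_params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Illustrative values only (assumptions, not recommendations).
# Numeric parameters take arbitrary lists instead of min/max/step,
# and string parameters are just another list of choices.
hyper_params = {
    "max_depth": [3, 5, 9],
    "learn_rate": [0.01, 0.1],
    "distribution": ["gaussian", "laplace"],
}

grid = expand_grid(hyper_params)
print(len(grid))  # 3 * 2 * 2 = 12 combinations
for config in grid[:3]:
    print(config)
```

With min/max/step per parameter, values like 3, 5, 9 or the string choices above cannot be expressed directly, which is the gap described here.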
“Do you need “REST” access or access to Sparkling Water” -> REST access is enough for me.
First off, thanks for your feedback, you raise some very good points! Second, I was able to reproduce your problem regarding the shutdown. We’re not quite sure yet why this happens, so we need some time to investigate.