I am trying to use the R nodes with Knime 2.9 and R 3.0. I have already taken care of all errors (updating the Knime.ini file, preferences, installing package "Rserve", etc). All works well, except for two things, and both with the R Snippet node from Knime Labs.
First, when going into the configuration settings, the Send Input Table to R hangs at 96% but then ultimately completes.
Second, though, the node will not execute: it hangs at 30% indefinitely, and the running node can't be cancelled either. The only solution is to shut down Knime entirely and restart.
My data is about 125,000 lines, so not too large.
Hi Arne, hi Mark, we recently discovered a problem with the data handling in KNIME 2.10 which we believe we have fixed in 2.10.1. Can you please update KNIME and give it a try?
Hi Gabriel,
I've updated KNIME to the 2.10.1 version but I still have the same problem.
When the R Snippet node executes, it doesn't get further than 30%, but KNIME itself doesn't crash (I can still cancel).
Are you using the R Snippet from the Interactive category? What happens if you sample the data down to 1% and/or reduce the number of columns? Sorry, I am drawing a blank.
Yes, it's the R Snippet node from the R Interactive category.
But I just found out that nothing was going wrong. The calculations that I did in the R script in the R Snippet node were just too "heavy". That's probably why the progress of the execution of that node stayed at 30% for such a long time. Now I've split the calculations across a couple of R Snippet nodes and joined the outputs with the Joiner node, and I think it will work now. I'll keep you posted.
Like I said, I used multiple R Snippet nodes next to each other and then joined the outputs afterwards. This worked out fine for me. So I think there was no problem when all of the calculations were done in just one R Snippet node, except for me being impatient. :)
Just for future reference, we can't really monitor the execution progress of an R script so there are a couple of "magic" numbers in the progress indicator. The first 30% is the loading of the data into R, and the last 30% is reading it back into KNIME.
The upshot of this is that the progress indicator will hang at 30% while the R code is actually running and then jump to 70% once it completes. If you are seeing a lot of the execution time of the node in the 0-30 and 70-100 range of the progress indicator, you may be able to achieve significant performance increases if you split out and send to R only the columns you need, and then return from R only the needed data generated there.
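As a rough illustration of that last point, here is a minimal sketch of an R Snippet script that works on a narrow slice of the input and hands back only what it generated. It assumes the node exposes the input table as the data frame knime.in and takes its result from knime.out. The column names (id, value, comment) and the scaling step are hypothetical placeholders, and the stand-in table is there only so the script also runs outside KNIME.

```r
# Stand-in for the table KNIME passes into the R Snippet node as `knime.in`.
# Inside KNIME this data frame already exists; it is built here only so the
# sketch can run on its own. All column names are hypothetical.
if (!exists("knime.in")) {
  knime.in <- data.frame(
    id      = 1:5,
    value   = c(2, 4, 6, 8, 10),
    comment = c("a", "b", "c", "d", "e"),
    stringsAsFactors = FALSE
  )
}

# Work only on the columns the calculation actually needs.
needed <- knime.in[, c("id", "value"), drop = FALSE]

# Hypothetical calculation: scale `value` by its maximum.
needed$value_scaled <- needed$value / max(needed$value)

# Return only the key column and the newly generated column,
# not the whole input table.
knime.out <- needed[, c("id", "value_scaled"), drop = FALSE]
```

Selecting columns inside the script only shrinks what is read back into KNIME (the 70-100 range). To shrink the 0-30 range as well, the unused columns have to be dropped before the table reaches the node, for example with a Column Filter node upstream. The returned key column can then be used to reattach the result to the original table with a Joiner node.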