Status of using KNIME to work with R

KNIME workflows would seem like a great environment for working with R.  Given the popularity of R, I would expect there to be a lot of interest in such a capability.

However, there doesn't seem to be much activity in the KNIME forums related to R.  Before my post yesterday, the previous topic was from 5 weeks ago, and there have been only 3 new topics in the last 6 months.  Same with the forum on R Scripting (under Community Contributions) -- no posts in over 6 months.  This would seem to indicate that few people are actually using R in KNIME.

Anyone know why there aren't more posts in this Forum?  Possible explanations I can think of:

  • There's another forum out there, not at knime.org, with more traffic (e.g., a KNIME forum at an R-based site?)
  • There's actually little demand for the combination -- most people are content to either work completely in KNIME or completely in R.
  • People who want a workflow interface for working with R are using something other than KNIME.
  • There really is a great deal of demand for using R in KNIME, but users are finding the current solution to be too difficult to work with.

For me, one characteristic of the "R Snippet" nodes that come with KNIME that I don't think I can live with is that each node is a separate R session.  This means that any R object created in one node is unavailable in another node, except for the single R object passed via KNIME Ports.  And if that is a large object, the time added to write it out and read it back between nodes (several minutes) makes it practically unusable.

Does anyone else have any input on this topic?

-- Randy Kerber
Kavaii Analytics

Hi Randy, thanks for your comments on our R integration. Just to highlight the three different ways of using R in KNIME: a) the one you mentioned, which uses a local R installation and is based on CSV import/export for data exchange; b) the Remote one, which uses the Rserve package and can run on a server; c) R Scripting, implemented by the community, namely MPI Dresden. I guess the drawback of the first is clear, given its data exchange via flat files; the second is more robust in terms of R data types; and the last provides more functionality in the way of R templates and interactions. During the year we will improve our integration, aiming for a better, more interactive and user-friendly UI and for better data exchange between R/KNIME nodes. Similar to the R Learner and R Predictor from the local integration, we will focus on special R ports that allow working in the same workspace across different R nodes.

Gabriel, thanks for the summary of the 3 modes of R usage in KNIME.

I had experimented with the first two -- Local and Remote -- but for some reason wasn't finding the right R Scripting nodes in my setup.  That's been addressed, so I've also been able to experiment with R Scripting nodes.  That works much better, in terms of not changing the data types of the R objects.
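The type changes that come with a CSV-based exchange can be reproduced in plain R, outside KNIME. The following is only a rough sketch: it does not use KNIME's actual exchange code, and the data frame, column names and temp files are made up for illustration. It compares a flat-file round trip with R's native serialization.

```r
# Illustrative only: this is NOT KNIME's exchange code, just base R.
df <- data.frame(
  id   = 1:3,
  when = as.Date(c("2012-01-01", "2012-02-01", "2012-03-01")),
  grp  = factor(c("a", "b", "a"))
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

# Flat-file round trip: column classes are re-guessed on import,
# so the Date column does not come back as a Date.
write.csv(df, csv_file, row.names = FALSE)
from_csv <- read.csv(csv_file)

# Native serialization round trip: the object is restored as it was.
saveRDS(df, rds_file)
from_rds <- readRDS(rds_file)

print(sapply(df, class))        # classes before the round trip
print(sapply(from_csv, class))  # classes after the CSV round trip
print(identical(df, from_rds))  # whether the RDS round trip is lossless
```

Which class the CSV import assigns to the date and factor columns depends on the R version and its stringsAsFactors default, but in neither case is the Date class restored.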

Maybe the "Remote" option would also work in that regard.  When I was experimenting with the Remote versions, I did not notice any difference from Local in the behavior of the snippet variable named "R" that is used to hold the input and output data objects.  My main motivation for checking out the Remote version was the hope that it would let me create large R objects in one node and use them in other R nodes without rebuilding them in every node where they are used.  I assume that's what you mean by "work in the same workspace".

Since the Remote option wasn't giving me that behavior either, I decided I would need to implement my own R Node type for working with R, so I did that a few weeks ago.  I attempted to modify the KNIME implementation of Remote R Snippet.  I noticed that the Java code for working with Rserve supported the option of saving and restoring R sessions, and the KNIME code used a method called getRconnection() to get the R session to use to run the code in the R Snippet.  So if I could override the definition of getRconnection() I ought to be able to re-open R sessions.

That turned out to be rather difficult, because many of the things that I needed to use or override were declared private or final in the KNIME Java code.  I did eventually manage to implement a version that basically "worked".  It was ugly, but I was able to create an R object in one node and re-use it in other R nodes.  Unfortunately, since updating KNIME to version 2.6, I see that it's now broken and not compiling.

Are the new R + KNIME capabilities you mentioned close to being available?

-- Randy Kerber
    San Jose, California

Hi Randy,

In the R scripting integration we also have generic R nodes which allow you to push any kind of R object from node to node. If you have multiple objects of different types, you could simply transfer them as a list.

For example:

First snippet:

rOut <- list(a = c(1,2,3), b = data.frame(...))

Second snippet:

a <- kIn$a
b <- kIn$b

Then continue working with these objects.

Of course you need to push a KNIME table to the generic format first, and you can only retrieve data frames back into KNIME. Anyhow, it still transfers all data from/to the R server, and it won't keep other session state (like libraries - you will have to load them in each snippet/plot if you want to use them).
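The list hand-off and the per-snippet library loading described above can be tried out in plain R. This is a rough sketch under assumptions: first_snippet and second_snippet are hypothetical functions standing in for two generic R nodes, and saveRDS/readRDS stands in for whatever transfer the nodes really perform. Only the names rOut and kIn come from the example above.

```r
# Hypothetical stand-ins for two generic R snippet nodes. In KNIME the
# node framework, not the user, would hand rOut to the next node as kIn.
first_snippet <- function() {
  library(stats)  # every snippet loads the packages it needs
  fit <- lm(dist ~ speed, data = cars)
  # Bundle objects of different types into one list
  rOut <- list(a = c(1, 2, 3), b = head(cars), model = fit)
  rOut
}

second_snippet <- function(kIn) {
  library(stats)  # loaded again: session state is not carried over
  a <- kIn$a
  b <- kIn$b
  # Reuse the fitted model without rebuilding it
  predict(kIn$model, newdata = data.frame(speed = 10))
}

# Assumption: simulate the node-to-node transfer by serializing to a file.
tmp <- tempfile(fileext = ".rds")
saveRDS(first_snippet(), tmp)
pred <- second_snippet(readRDS(tmp))
print(pred)
```

The fitted model travels inside the list, so the second snippet does not refit it, but the whole list is still written and read on every hand-off.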

Hi Rake,

I do most of my R work in KNIME - it's great. I find I can modularise my R code and use it like Lego. I have a few posts related to R; try searching for my posts on the forum.

Cheers

Mark