execute multiple R processes

dougb · January 2, 2015, 10:27pm

I am using Knime and R to analyze a large dataset. I use Knime filter nodes to select a subset of my data, which I then pass to R. When I wanted to filter my data in a different manner and perform the same test in R, rather than create a separate Knime workflow I duplicated my connected nodes within the same workflow, so that I have two parallel branches, each with its own end node. If I execute both branches at the same time, they both begin processing. But if I use top to view the CPU and memory utilization of the R processes on my Linux machine, I see a single R process. I had expected to see two R processes, one for each group of connected nodes.

Is there a way to configure my workflow such that each group of nodes executes in a separate R process and thereby uses the resources of my computer more efficiently?

Thanks, Doug

wiswedel · March 13, 2015, 6:39pm

The R scripts themselves run concurrently, but the data transmission to and from R is tunneled through an R/Java library called JRI. This step (any moving of data between the processes, and anything you do in the configuration dialog) happens in this single-threaded library. The execution of the scripts themselves is then done in separate R process(es).
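
To make the behavior described above concrete, here is a rough toy model in Python. It is not KNIME or JRI code: the `Bridge` class, the lock, and the sleep durations are all hypothetical stand-ins. It only illustrates the general pattern of a single shared transfer channel combined with independently running scripts.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Bridge:
    """Hypothetical model: one shared transfer channel, independent script runs."""

    def __init__(self):
        # Stands in for the single-threaded data bridge (an assumption of this sketch).
        self.transfer_lock = threading.Lock()
        self._stats_lock = threading.Lock()
        self.active = {"transfer": 0, "script": 0}
        self.peak = {"transfer": 0, "script": 0}

    def _enter(self, phase):
        # Record that a branch entered a phase and track the highest overlap seen.
        with self._stats_lock:
            self.active[phase] += 1
            self.peak[phase] = max(self.peak[phase], self.active[phase])

    def _leave(self, phase):
        with self._stats_lock:
            self.active[phase] -= 1

    def transfer(self, seconds):
        # Only one branch at a time may move data across the bridge.
        with self.transfer_lock:
            self._enter("transfer")
            time.sleep(seconds)
            self._leave("transfer")

    def run_script(self, seconds):
        # Script execution takes no shared lock, so branches can overlap here.
        self._enter("script")
        time.sleep(seconds)
        self._leave("script")


def run_branch(bridge, transfer_s, script_s):
    bridge.transfer(transfer_s)   # push the input table to R
    bridge.run_script(script_s)   # the script itself executes
    bridge.transfer(transfer_s)   # pull the results back


def simulate(n_branches=2, transfer_s=0.05, script_s=0.2):
    """Run n_branches in parallel and return the peak overlap per phase."""
    bridge = Bridge()
    with ThreadPoolExecutor(max_workers=n_branches) as pool:
        futures = [
            pool.submit(run_branch, bridge, transfer_s, script_s)
            for _ in range(n_branches)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a branch
    return bridge.peak


if __name__ == "__main__":
    print(simulate())
```

In this model the transfer phases never overlap, while the script phases can. The more time a workflow spends moving data relative to running the script, the less benefit it gets from running branches in parallel.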

Btw, in previous versions we used files to exchange data between R and Java ... but that was super slow compared to the current implementation.

So to answer your question I don't see how to optimize this process (unless inventing something completely new)....
