Set up KNIME with R running on a server

Hi guys,

First of all, thanks for developing such a great tool. I have little experience with KNIME and have only just started working with it, but I can already see the potential this software brings.

Right now, I’m trying to get everything set up. I installed KNIME on an Ubuntu server and successfully connected the Windows client, so I can configure workflows in the client and run them on the server. I now want to add some R scripts to my workflow, so I first installed the required extension.

However, I had trouble setting the R path in the client. On our server we already have RStudio and the R package manager installed. The latter is used to install official packages and packages from our internal GitLab server; the former is used to develop and run R scripts.

When setting the R path in the KNIME client, it seems that we can only set a local path. With more than three developers who will be working with KNIME in the future, I want to avoid each of them building and maintaining their own R environment. Otherwise these environments are bound to drift apart.

Is there no way to directly link the R environment on the server with the client? I’m sure I’m not the first one with this problem - how do you deal with it?

Thanks!!

@ThoMi welcome to the KNIME forum. You could try to propagate an R version (and packages) through the Conda Environment Propagation node, which you might distill into a YAML file.
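A sketch of what such a YAML file might contain, written from a shell helper so it can be checked into version control. The helper name write_r_env, the environment name knime_r, and the pinned R version are placeholders for illustration, not values from this thread; r-rserve is listed on the assumption that the KNIME R nodes need the Rserve package.

```shell
# write_r_env is a hypothetical helper: it writes a minimal conda
# environment definition for R to the file given as the first argument.
write_r_env() {
    cat > "$1" <<'EOF'
name: knime_r
channels:
  - conda-forge
dependencies:
  - r-base=4.2
  - r-rserve
EOF
}

# Usage (not run here):
#   write_r_env environment.yml
#   conda env create -f environment.yml
```

Pinning versions in the file is what keeps several developers on the same environment; unpinned entries would resolve to whatever is current at install time.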

Hello @ThoMi ,

I’d like to add to what @mlauber71 said:

KNIME Server AMIs come with Python and conda installed and configured for the executor.
R is installed, but not configured for the executor.
If your server does not yet have R installed, see [0] for installation tips.

To configure the executor, first get the location of the R executable:

$ which R
/usr/bin/R

Then edit /srv/knime_server/config/client-profiles/executor/executor.epf and add:

/instance/org.knime.ext.r.bin/knime.r.home=<location>

e.g.

/instance/org.knime.ext.r.bin/knime.r.home=/usr/bin/R

This is what my executor.epf looks like now:

$ more /srv/knime_server/config/client-profiles/executor/executor.epf
file_export_version=3.0
\!/=
/instance/org.knime.workbench.core/database_timeout=120
/instance/org.knime.ext.r.bin/knime.r.home=/usr/bin/R
 
# Add a mount point for this server. This is useful for the new filehandling nodes.
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/active=true
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/address=${origin:KNIME-EJB-Address}
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/factoryID=com.knime.explorer.server
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/mountID=${origin:KNIME-Default-Mountpoint-ID}
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/mountpointNumber=1
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/restPath=${origin:KNIME-Context-Root}/rest
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/user=${sysprop:user.name}
/instance/org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/useRest=true
 
/instance/org.knime.python2/condaDirectoryPath=/opt/miniconda3
/instance/org.knime.python2/defaultPythonOption=python3
/instance/org.knime.python2/python3CondaEnvironmentName=py39_knime
/instance/org.knime.python2/python3Path=/opt/miniconda3/envs/py39_knime/bin/python3

Then restart the executor: sudo systemctl restart knime-executor
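The edit step can also be scripted so that it is safe to repeat. This is a minimal sketch: set_r_home is a hypothetical helper, not a KNIME tool, and it assumes the preference key shown in the listing above.

```shell
# Preference key as quoted in the executor.epf listing.
KEY='/instance/org.knime.ext.r.bin/knime.r.home'

# set_r_home FILE R_PATH (hypothetical helper)
# Removes any existing entry for KEY in FILE, then appends KEY=R_PATH,
# so running it twice does not produce duplicate lines.
set_r_home() {
    local epf="$1" r_path="$2" tmp
    tmp="$(mktemp)"
    # grep exits non-zero when nothing is left to print; that is fine here.
    grep -v -F -- "${KEY}=" "$epf" > "$tmp" || true
    printf '%s=%s\n' "$KEY" "$r_path" >> "$tmp"
    # Overwrite the content in place so ownership and permissions are kept.
    cat "$tmp" > "$epf"
    rm -f "$tmp"
}

# Usage (not run here):
#   set_r_home /srv/knime_server/config/client-profiles/executor/executor.epf "$(which R)"
#   sudo systemctl restart knime-executor
```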

I can verify that the executor correctly picked up this change by looking at the combined-preferences.epf:

knime@ip-10-0-1-5:/opt/knime/knime-executor/workspace/.metadata/.plugins/org.knime.product$ more combined-preferences.epf
#
#Thu Nov 03 17:55:18 UTC 2022
org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/user=knime
org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/mountpointNumber=1
org.knime.python2/condaDirectoryPath=/opt/miniconda3
org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/restPath=/knime/rest
org.knime.workbench.core/database_timeout=120
file_export_version=3.0
org.knime.python2/defaultPythonOption=python3
org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/factoryID=com.knime.explorer.server
org.knime.ext.r.bin/knime.r.home=/usr/bin/R
org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/useRest=true
org.knime.python2/python3CondaEnvironmentName=py39_knime
\!/=
org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/active=true
org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/mountID=knime-server
org.knime.workbench.explorer.view/mountpointNode/KNIME-Server/address=${origin\:KNIME-EJB-Address}
org.knime.python2/python3Path=/opt/miniconda3/envs/py39_knime/bin/python3
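Rather than reading the whole file, the check can be narrowed to the one key. A small sketch, where check_r_home is a hypothetical helper; it relies on the observation from the output above that combined-preferences.epf lists keys without the /instance/ prefix.

```shell
# check_r_home FILE (hypothetical helper)
# Prints the value of knime.r.home found in FILE, or fails with a
# message on stderr if the key is missing.
check_r_home() {
    local prefs="$1" value
    value="$(grep -E '^org\.knime\.ext\.r\.bin/knime\.r\.home=' "$prefs" | tail -n 1 | cut -d= -f2-)"
    if [ -z "$value" ]; then
        echo "knime.r.home is not set in $prefs" >&2
        return 1
    fi
    echo "$value"
}

# Usage (not run here):
#   check_r_home /opt/knime/knime-executor/workspace/.metadata/.plugins/org.knime.product/combined-preferences.epf
```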

Once the executor knows where to find R, you can use the Conda Environment Propagation node in your workflow to configure the environment in which subsequent R nodes run. [2]

After that, workflow-hack away!

Regards,
Nickolaus

[0] KNIME Interactive R Statistics Integration Installation Guide
[1] KNIME Interactive R Statistics Integration Installation Guide
[2] https://www.knime.com/blog/how-to-manage-python-environments-conda-and-knime

Thanks for your answers! But I must honestly confess that I am not yet completely satisfied with this process.

As I said, we already have an R environment on our server (where KNIME is also running) that is linked to our package manager. That allows us to easily install official packages and packages from our internal GitLab. With the suggested procedure, you would have to create a Conda environment locally and send it to the server. I would rather change the server R environment directly.

Is there no way to set a remote path for R_HOME?

Alternatively, is there a possibility to execute a bash script in the client that uses code that is located on the server? I mean we have configured the client to execute the nodes on the server on pressing “execute”.

After searching a little, I found the remote workflow editor plugin. Is this something I could use for this?

To elaborate on the question above, could something like "knime://knime.mountpoint/" be used to set the path to R_HOME on the server?

Hello @ThoMi ,

I am familiar with the Conda Environment Propagation (CEP) node and really like it, because it lets the workflow developer specify the environment in the workflow itself. That makes the workflow portable and not reliant on a server admin configuring a specific environment outside of KNIME. But I understand that you have packages coming from an internal GitLab. One thing I might look into is whether conda, on the KNIME Server host, could be configured to use your GitLab as a repo, so that when your CEP node asks for internal packages it looks there first and only falls back to the official repos if it can’t find them.

With regard to the Remote Workflow Editor (RWE): it is used to start a workflow as a job on the server and then interact with it while it runs on the executor, as if you were working in KNIME Analytics Platform. It’s a way to run the workflow in the KNIME Server/executor environment while still troubleshooting and stepping through it. But it is not going to solve your R_HOME/R environment questions.
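If an internal conda channel were available, the lookup order described above would come down to channel order in conda's configuration. This is only a sketch under that assumption: write_condarc is a hypothetical helper and the internal channel URL is a placeholder. Whether packages hosted in GitLab can be served as a conda channel at all is exactly the open question.

```shell
# write_condarc FILE INTERNAL_CHANNEL_URL (hypothetical helper)
# Writes a conda configuration in which the internal channel is listed
# first; conda gives earlier channels higher priority.
write_condarc() {
    local out="$1" internal_channel="$2"
    cat > "$out" <<EOF
channels:
  - ${internal_channel}
  - conda-forge
EOF
}

# Usage (not run here; the URL is a placeholder):
#   write_condarc "$HOME/.condarc" https://conda.example.internal/r-packages
```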

I can look into whether there is a way to set R_HOME, but will need to consult with internal resources and get back to you on that.

Regards,
Nickolaus