Passing multiple flow variables to R node

Hi,
I have an R script as follows:

As you can see there are many columns, and I don't want to write 10 ifelse statements (there are actually more columns; this is just an example). Is there a way to pass all the columns as variables to the R node?
The Table Column to Variable node takes only one column at a time, which would defeat the purpose.

I am OK with using regex or string matching. For example, the new column name would always be $columnname$_max,
and the columns of interest would all start with the name Device.

Would greatly appreciate any insights!
Thank You

Hello again @thentangler,

If I understand correctly, you can use the Extract Column Header node (KNIME Hub) before the Table Row to Variable node. This creates one variable per column, named Column 0, Column 1, … Column n, whose values are the actual column names.

If you like, you can also place the Column Combiner node (KNIME Hub) between the Extract Column Header and the Table Row to Variable nodes to get all of the column names into a single variable.
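
If the column names reach the R node as one combined flow variable, they can be split back into a vector there. This is only a minimal sketch: the variable's value, the comma delimiter, and the column names are assumptions for illustration, and inside an R node the string would be read from knime.flow.in rather than hard-coded.

```r
# Hypothetical value of the combined flow variable; the delimiter (",")
# and the column names are assumptions for illustration.
# In a KNIME R node this would be read from knime.flow.in instead,
# e.g. knime.flow.in[["<your variable name>"]].
combined <- "Device_A,Device_B,Device_C"

# Split the single string back into a character vector of column names
column_names <- strsplit(combined, ",", fixed = TRUE)[[1]]

# Names of the new columns, following the <columnname>_max pattern
max_names <- paste0(column_names, "_max")
print(max_names)
```

The resulting vector can then be looped over in the same way as the output of colnames().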

Regards,
Wali Khan

Hello @thentangler

It isn’t clear to me whether your question is about the KNIME workflow or about R scripting…

In case it helps, you can test this code in an ‘R Snippet’ node:

df <- as.data.frame(knime.in)
# --------------------------------------------

library(dplyr)

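# Capture the original column names before any new columns are added
column_names <- colnames(df)

# Start at 2 to skip the first column; for each remaining column, add a
# logical "<name>_max" column flagging the rows that hold the column maximum
for (i in 2:length(column_names)) {
  df$col_maxlogic <- df[, i] == max(df[, i])
  df <- df %>%
    rename(!!paste0(column_names[i], "_max") := col_maxlogic)
}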

# -----------------------------------------------
knime.out <- as.data.frame(df)
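
A variant of the loop above, as a sketch only: it restricts the work to columns whose names start with "Device", as described in the original question, and uses base R instead of dplyr. The sample data frame and its column names are made up so the snippet runs outside KNIME; in an R Snippet node df would come from knime.in.

```r
# Small stand-in for the KNIME input table; in an R Snippet node this
# would be as.data.frame(knime.in). Column names are made up.
df <- data.frame(
  id       = 1:4,
  Device_A = c(3, 9, 9, 1),
  Device_B = c(5, 2, 7, 7),
  Other    = c(0, 1, 0, 1)
)

# Pick only the columns whose names start with "Device"
device_cols <- grep("^Device", colnames(df), value = TRUE)

# For each selected column, add a logical <columnname>_max column that is
# TRUE on the rows holding that column's maximum
for (col in device_cols) {
  df[[paste0(col, "_max")]] <- df[[col]] == max(df[[col]])
}

print(df)
```

Selecting by name pattern avoids relying on column positions, so the loop keeps working if columns are reordered upstream.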

BR

Hi @wkhan
Thank you so much for this information. Now I know how to extract, combine and use the column names as variables. However, I guess I was looking for a way to combine all the columns into one variable so that I could loop through them. You did answer my question, so the solution goes to you!
However, I found it was easier to do it in R, as @gonhaddock suggested.
Thank You!

Hi @gonhaddock. Yes, I ended up using something similar to what you suggested. Doing it in R was much easier in the end, without having to create loop and variable nodes.
Thank you for your suggestion!

Glad it was of help! If you used @gonhaddock’s solution, I would mark that as the solution… especially because I don’t know any R :grinning:

Hi @thentangler

You are right: the question, as stated, was a KNIME question, and it was answered by @wkhan :tophat: . What wasn’t clear to me was that you didn’t need all columns as variables, but a single variable vector containing all the column names.

Indeed, it is much easier to do it in R once you have moved into a scripting node: R is concise enough that a little code can replace many KNIME nodes.

I intentionally separated the following line of code from the loop section, as by itself it replaces the 3 workflow nodes in the approach kindly suggested by @wkhan :
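
Presumably this refers to the colnames() call from the snippet earlier in the thread, which returns all column names as one character vector. A small sketch with a made-up data frame standing in for the KNIME input table:

```r
# Made-up data frame standing in for the KNIME input table
df <- data.frame(id = 1:3, Device_A = c(1, 5, 2), Device_B = c(4, 4, 0))

# One call returns every column name as a character vector that can be
# looped over or indexed
column_names <- colnames(df)
print(column_names)
```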

BR

You are exactly right! I needed a variable vector that contained all the column names so that I could cycle through them. And I agree! I would prefer to do most of the slicing and dicing using R and Python nodes.

Unfortunately, KNIME doesn’t play nice with Python (due to numerous environment issues), and if I use R libraries in my R node, I can’t execute my KNIME workflow in batch mode for automated executions :frowning:
This was the reason I was trying to do everything via KNIME. But I decided it was easier to just do it all in R and manually execute my workflow.

I really hope the KNIME developers can rectify this in future versions!

Have you seen the Conda Environment Propagation node? This should help with managing both R and Python environments if you use conda.

And there is also a KNIME + Jupyter integration that lets you call notebooks and workflows from each other if you use Jupyter. See the KNIME blog post "Cutting down implementation time by integrating Jupyter and KNIME".

Hi @wkhan
Yes, I have used the Conda Environment Propagation node. The main issue with Python and R is that I cannot execute my workflow in batch mode (via the command line).
Command-line execution only works if all my nodes are purely KNIME based, or if my R node is basic and does not use any libraries.
If I try to execute via the command line, I get an exit error 4. It has something to do with batch execution not being able to run the Python and R libraries.
