How to use flow variables in an R snippet?

Hi guys,

I'm using the R snippet node from the KNIME Labs – R (Interactive) repository. I need your help with something that should be easy in principle: using a flow variable's content as a variable name.

Let me explain. I have the following working code:

lm.fit=lm(housecost~.,data=knime.in)

This builds a linear model with the column “housecost” from knime.in as the response. Now I want to generalize the code by passing the response variable name (here “housecost”) in from outside, through a flow variable called “response”.

I've tried with this:

lm.fit=lm(knime.flow.in[["response"]]~.,data=knime.in)

but it gave me the following error:

Error in model.frame.default(formula = knime.flow.in[["response"]] ~ ., :
  variable lengths differ (found for 'X')

In the past a KNIME user suggested that I use this:

lm.fit=lm(knime.in[[knime.flow.in[["response"]]]]~.,data=knime.in)

With this I don't get any error. However, the linear model still does not treat "housecost" purely as the response: the column also appears among the predictors (the independent variables).

I have attached my workflow. Do you have any suggestions?

Gio

Hi Gio, 

I have been curious about this for a while now, and your question finally made me spend some time looking at it. The best solution I found was to create the formula for the call to lm manually, by constructing the string first and then converting it to a formula. I imagine there is a more elegant way to do it, but I could not discover what it is.

My solution:

target = knime.flow.in[["response"]]

formula = paste0(target, "~ .")

lm.fit=lm(formula, knime.in)
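
A minimal sketch of the same idea with the conversion step written out explicitly. It assumes a stand-in data frame (mtcars) in place of knime.in and a plain string in place of knime.flow.in[["response"]]; both stand-ins, and the names data_in, fml and fml2, are hypothetical. as.formula() and reformulate() come from R's stats package.

```r
# Stand-ins (assumptions): inside the KNIME R snippet these would be
# knime.in and knime.flow.in[["response"]].
data_in <- mtcars
target <- "mpg"

# Build the formula string, then convert it explicitly to a formula object.
fml <- as.formula(paste0(target, " ~ ."))

# reformulate() builds the same formula without pasting strings together.
fml2 <- reformulate(".", response = target)

lm.fit <- lm(fml, data = data_in)

# The target column acts only as the response, not as a predictor.
print(all.vars(fml)[1])
print(target %in% names(coef(lm.fit)))
```

Converting explicitly is optional for lm(), which accepts the string as shown above, but it makes the object's class unambiguous for functions that dispatch on the class of their first argument.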

 

Hi Aaron,

Your solution works perfectly. I really don't mind if it's not "elegant"; what's important is that it allows me to generalize my workflows!

:-)

Thank you so much for your help.

Hi again Aaron,

I'm sorry to come back to this issue, but I found that although your solution works perfectly with the R lm() function, it does NOT work with lda() or qda() (the R functions for Linear Discriminant Analysis and Quadratic Discriminant Analysis, respectively).

It gives me the following error:

Error in lda.default(formula, data = knime.in) : 'x' is not a matrix

Do you have any other suggestions?

Gio

I think the formula is still fine, but you need to recast your data frame as a matrix for this kind of model.

For example, you could add the following line above your QDA call:

knime.in = data.matrix(knime.in)

https://stat.ethz.ch/R-manual/R-devel/library/base/html/data.matrix.html
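
Another possibility, offered as an assumption rather than a confirmed diagnosis of the workflow above: lda() and qda() in the MASS package are S3 generics that dispatch on the class of their first argument, so a character string is routed to lda.default() instead of the formula method. That would be consistent with the error text mentioning lda.default. Under that assumption, converting the string with as.formula() before the call may be enough, with the data left as a data frame. The sketch below uses iris as a stand-in for knime.in and a plain string for the flow variable; data_in, target and fml are hypothetical names.

```r
library(MASS)  # provides lda() and qda()

# Stand-ins (assumptions): inside the KNIME R snippet these would be
# knime.in and knime.flow.in[["response"]].
data_in <- iris
target <- "Species"

# Convert the string to a formula so the formula methods are selected.
fml <- as.formula(paste0(target, " ~ ."))

lda.fit <- lda(fml, data = data_in)
qda.fit <- qda(fml, data = data_in)

# For comparison: passing the bare string raises an error, captured here
# as a message instead of stopping the script.
err <- tryCatch(lda(paste0(target, " ~ ."), data = data_in),
                error = function(e) conditionMessage(e))
print(err)
```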