How to build an R scatter plot using flow variables?

gcincilla · April 20, 2015, 1:12pm

Dear R lovers,

I'm trying to use the “R View (Table)” node to generate an X-Y scatter plot with the best-fit line drawn over the points. My scatter plot shows the response variable on the X axis and the predicted response on the Y axis.

As the names of the response and prediction columns can vary from run to run, I'm trying to pass them to the R plot through flow variables, like this:

x<-knime.flow.in[["response_name"]]
y<-knime.flow.in[["prediction_response_name"]]
plot(x,y)

Unfortunately, this does not seem to be the right approach, as I get the following error:

Error in plot.window(...) : need finite 'xlim' values

In addition: Warning messages:

1: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion

2: In xy.coords(x, y, xlabel, ylabel, log) : NAs introduced by coercion

Do you have any suggestions?

Gio

gabriel · May 15, 2015, 12:42pm

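x and y are nominal variables (strings): they hold the column names, not the column values. Passing them straight to plot() makes R try to coerce the strings to numbers, which yields NA and hence the "need finite 'xlim' values" error. Use them instead to access the columns of that name within the data frame knime.in; that is, your script would read plot(knime.in[,x], knime.in[,y]). Hope this helps.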
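For readers who want to try this outside KNIME, here is a minimal sketch of the same idea. It assumes stand-in objects for knime.in and knime.flow.in (KNIME provides the real ones inside the node), and the column names "activity" and "predicted" are made up for illustration.

```r
# Stand-ins for the objects KNIME provides inside an R node. They are
# assumptions so that this sketch runs in a plain R session.
knime.in <- data.frame(activity  = c(1.0, 2.1, 2.9, 4.2),
                       predicted = c(1.2, 1.9, 3.1, 3.9))
knime.flow.in <- list(response_name            = "activity",
                      prediction_response_name = "predicted")

# Each flow variable holds a column name, i.e. a character string ...
x_name <- knime.flow.in[["response_name"]]
y_name <- knime.flow.in[["prediction_response_name"]]
is.character(x_name)   # TRUE

# ... so it has to be used as an index into the input table to get the data.
x <- knime.in[, x_name]
y <- knime.in[, y_name]
is.numeric(x)          # TRUE

# Plot the numeric columns, labelling the axes with the column names.
plot(x, y, xlab = x_name, ylab = y_name)
```

Calling plot(x_name, y_name) on the strings themselves would reproduce the error from the first post.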

gcincilla · May 18, 2015, 5:31pm

Thank you Gabriel,

Your suggestion allowed me to build a generalized R scatter plot that includes the best-fit line and shows the R2 value as a legend. The only input of the “R View (Table)” node is a table containing the independent variable (whose column name is held in the flow variable "response_name") and the dependent variable (whose column name is held in "prediction_response_name").

I'm posting the code here in case anybody is interested.

Cheers,

Gio

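# Look up the columns whose names are stored in the flow variables
x<-knime.in[,knime.flow.in[["response_name"]]]
y<-knime.in[,knime.flow.in[["prediction_response_name"]]]

# Fit the best-fit line (linear regression of y on x)
mod1 = lm(y~x)
modsum = summary(mod1)

# Scatter plot, using the column names as axis labels
plot(x, y, pch = 20, type = 'p', las = 1,
        xlab=knime.flow.in[["response_name"]],
        ylab=knime.flow.in[["prediction_response_name"]])

# Overlay the fitted line
abline(mod1, col="red")

# Note: this is the adjusted R-squared; modsum$r.squared gives the plain value
r2 = modsum$adj.r.squared

# Show the value in the top-left corner, without a legend box
mylabel = bquote(italic(Rext)^2 == .(format(r2, digits = 3)))
legend('topleft', legend = mylabel, bty = 'n')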
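
Because the column names arrive through flow variables, a misspelled or non-numeric column only shows up as a cryptic plotting error. One possible guard is sketched below. The helper get_numeric_column is hypothetical (it is not part of KNIME or R), and the demo table is a stand-in for knime.in.

```r
# Hypothetical helper, not a KNIME or base R function: fetch a numeric column
# by name and fail with a readable message if that is not possible.
get_numeric_column <- function(tbl, col_name) {
  if (!is.character(col_name) || length(col_name) != 1) {
    stop("Column name must be a single string")
  }
  if (!(col_name %in% names(tbl))) {
    stop("Column '", col_name, "' not found in the input table")
  }
  col <- tbl[[col_name]]
  if (!is.numeric(col)) {
    stop("Column '", col_name, "' is not numeric")
  }
  col
}

# Stand-in for knime.in so the sketch runs in a plain R session.
demo <- data.frame(activity = c(1.0, 2.1, 2.9, 4.2),
                   label    = c("a", "b", "c", "d"))

x <- get_numeric_column(demo, "activity")   # returns the numeric column

# A wrong name is reported clearly; the message is captured here, not raised.
msg <- tryCatch(get_numeric_column(demo, "missing"),
                error = function(e) conditionMessage(e))
```

Inside the node, the same call would presumably read get_numeric_column(knime.in, knime.flow.in[["response_name"]]).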
