Complete noob here. I'm running linear regressions on a dataset but I want to know two measures that do not seem to be in the summary stats: the R-Squared and the p-values. 

Any help would be greatly appreciated!


You can use the Linear Correlation node on your measured and predicted columns. This gives the Pearson correlation coefficient r. Then use a Math Formula node after it to get R^2 by squaring that column.
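If you want to sanity-check the node output outside KNIME, here is a minimal Python sketch of the same calculation. The `measured` and `predicted` values are made up for illustration, and `pearson_r` is just a hypothetical helper name, not anything from KNIME.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up example values, only for illustration
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.1]

r = pearson_r(measured, predicted)
r_squared = r ** 2  # same squaring step as the Math Formula node
print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

Note that the helper will divide by zero if either column is constant, so it is only a sketch.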

Alternatively, install the Erlwood community extension and use the 2D/3D scatter plot, which displays R^2 automatically.

Unfortunately, I don't believe there is a single node that calculates the p-value for this type of analysis. If you want it, you can use another Math Formula node with the equation t = r * sqrt((n-2)/(1-r^2)), where r is the Pearson coefficient from earlier and n is the number of rows, which you can get from the row count in the Statistics node. The result is the t value, which you would then need to look up in a t-distribution table with n-2 degrees of freedom to get the p-value. Not ideal, I know.
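If looking things up in a table is too tedious, the whole calculation can be sketched in a few lines of Python using only the standard library. This is not a KNIME feature, just a rough illustration: the function names (`t_statistic`, `t_pdf`, `two_sided_p`) are hypothetical, the values of r and n are made up, and the p-value is approximated by integrating the t density numerically with Simpson's rule rather than by an exact formula.

```python
import math

def t_statistic(r, n):
    """t value for a Pearson coefficient r computed from n rows."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=10000):
    """Two-sided p-value via Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)

# Made-up example values, only for illustration
r, n = 0.6, 20
t = t_statistic(r, n)
p = two_sided_p(t, n - 2)  # n - 2 degrees of freedom
print(f"t = {t:.3f}, p = {p:.4f}")
```

The sketch breaks down when r is exactly 1 or -1 (division by zero), and the default step count is an arbitrary choice that should be plenty for typical t values.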


I hope the KNIME team is reading this and will implement it :-)


Hope this helps.


Was this ever implemented in a single node??

How can one analyse the residuals from any of the regressions?

In the last release we added a numeric scorer, which calculates the R^2, and the linear and logistic regression nodes now provide more detailed statistics. The polynomial regression node should have these coefficients come KNIME 2.10 (due this summer).