regression: dynamic filtering of columns with high p value

spider · April 17, 2015, 12:53pm

Hi,

if you make a (linear) regression, you have to check the p values of the coefficients and remove columns with p > 0.05, since for those columns the null hypothesis (that the coefficient is zero) cannot be rejected (http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients). So I do it this way:

1. I run the regression.
2. I open the table with the statistics data in the regression node and sort it by p value.
3. I open the regression node's configuration and manually remove the columns with p > 0.05.
4. I run the regression again, check whether any columns now have p > 0.05, and remove those as well.
5. When I have finished, I look at the coefficients.
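For reference, the p value in the statistics table is, assuming ordinary least squares with n rows, k predictor columns and an intercept, the two-sided t-test of whether a single coefficient is zero (F denotes the CDF of Student's t distribution with n − k − 1 degrees of freedom):

```latex
H_0:\ \beta_j = 0, \qquad
t_j = \frac{\hat\beta_j}{\widehat{\mathrm{SE}}(\hat\beta_j)}, \qquad
p_j = 2\left(1 - F_{t,\,n-k-1}\!\left(|t_j|\right)\right)
```

Because the standard error of each coefficient depends on which other columns are in the model, removing one column changes the remaining p values, which is why step 4 has to be repeated.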

I wish the regression node had an option to automatically remove columns whose p value is above a given threshold.

What do you think? It is a recurring task, and it gets annoying when you experiment with several columns and have to repeat it again and again.
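The manual loop described above is essentially backward elimination. Here is a minimal Python sketch of it, not a feature of the regression node: `fit_pvalues` is a hypothetical callback that refits the regression on the given predictor columns (intercept excluded) and returns each column's p value. Unlike the steps above, it drops only the single worst column per round, a common variant, because every refit changes the remaining p values.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    columns: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Refit repeatedly, dropping the predictor with the largest p value above alpha."""
    selected = list(columns)
    while selected:
        # Hypothetical callback: refit the regression on `selected`, return p values.
        pvalues = fit_pvalues(selected)
        worst = max(selected, key=lambda c: pvalues[c])
        if pvalues[worst] <= alpha:
            break  # every remaining column is significant
        selected.remove(worst)
    return selected


def fake_fit(cols: List[str]) -> Dict[str, float]:
    # Stand-in for a real regression: p values depend on which columns are present.
    base = {"x1": 0.001, "x2": 0.30, "x3": 0.04, "x4": 0.80}
    p = {c: base[c] for c in cols}
    if "x3" in cols and "x4" not in cols:
        p["x3"] = 0.07  # x3 loses significance once x4 is removed
    return p


print(backward_eliminate(["x1", "x2", "x3", "x4"], fake_fit))  # → ['x1']
```

The fake fit shows why step 4 matters: x3 looks significant in the first round and only drops out after x4 has been removed.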

Aaron_Hart · May 4, 2015, 7:06pm

Hi Spider,

I would try using a Row Filter on the p values (lower bound p = 0.05) to keep the insignificant variables, then a Reference Column Filter to remove those columns, and then re-run the regression with all surviving columns. Between the Row Filter and the second regression node you would likely need to use RowID to make the variable-name column the new RowID, and then Transpose, so that the table is in the format the Reference Column Filter expects.
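The same single pass can be sketched in plain Python to show what each node contributes. This is an illustration under assumptions, not KNIME code: the statistics table is modeled as a list of rows with the column names `Variable` and `P>|t|` and an `Intercept` row, all assumed here, and the data table as a list of dicts. The RowID/Transpose reshaping is unnecessary in this form because the variable names are already a plain list.

```python
from typing import Dict, List

Row = Dict[str, object]


def insignificant_variables(stats: List[Row], alpha: float = 0.05) -> List[str]:
    """Row Filter step: names of predictors whose p value is >= alpha (lower bound)."""
    return [
        str(r["Variable"])
        for r in stats
        if r["Variable"] != "Intercept" and float(r["P>|t|"]) >= alpha
    ]


def exclude_columns(data: List[Row], reference: List[str]) -> List[Row]:
    """Reference Column Filter step (exclude mode): drop every column named in reference."""
    drop = set(reference)
    return [{k: v for k, v in row.items() if k not in drop} for row in data]


stats = [
    {"Variable": "Intercept", "P>|t|": 0.20},
    {"Variable": "x1", "P>|t|": 0.01},
    {"Variable": "x2", "P>|t|": 0.40},
]
data = [{"x1": 1.0, "x2": 5.0, "y": 3.0}, {"x1": 2.0, "x2": 6.0, "y": 4.1}]

drop = insignificant_variables(stats)  # → ['x2']
reduced = exclude_columns(data, drop)  # columns x1 and y remain for the second regression
```

As with the manual procedure, this is one pass: after the second regression some p values may change, so the filter may need to be applied again.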

Clear as mud?

Aaron