Weighted Regression and other Analyses

sgchase · February 27, 2018, 4:12pm

I often work with survey data in which each record (observation) represents a different share of the overall population, quantified by a weight variable. The weight has to be taken into account in two ways. For simple statistics such as averages, it is applied to a single variable. For predictive modeling, it is applied to the entire observation. I've searched the community forums and haven't found a simple way to perform either kind of weighted analysis. For variable weighting, I saw a suggestion to use Erlwood's Desirability Ranking node, but I could not follow the explanation of how to set it up. For observation weighting, I saw a suggestion to use the one-to-many rows node to replicate each observation in proportion to its weight. Unfortunately, that makes some of my data sets enormous and the analysis impractical. Any help with either of these issues would be greatly appreciated.

agaunt · February 28, 2018, 10:01am

I think you need the R Snippet node here. For weighted linear regression, you can adapt the following code to your data:

fit <- lm([Your Target Variable] ~ [Your Covariate 1] + [Your Covariate 2] + [and so on], data = knime.in, weights = [Your Weight Column])
knime.out <- data.frame(knime.in, your_prediction = fitted(fit))
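For the weighted-average half of the question, and to show why passing weights can stand in for replicating rows, here is a minimal sketch in plain R. The data frame `survey` and its columns `income`, `age`, and `wt` are made up for illustration, and the weights are integers only so that the replicated table can be built for comparison.

```r
# Toy survey data (hypothetical columns; integer weights for the comparison).
survey <- data.frame(
  income = c(30, 45, 60, 80, 52),
  age    = c(25, 35, 45, 55, 40),
  wt     = c(3, 1, 2, 4, 2)
)

# Weighted average of a single variable.
w_avg <- weighted.mean(survey$income, survey$wt)

# Weighted regression: each row contributes in proportion to its weight.
fit_w <- lm(income ~ age, data = survey, weights = wt)

# The same model fitted on rows replicated according to their weight.
expanded <- survey[rep(seq_len(nrow(survey)), times = survey$wt), ]
fit_x <- lm(income ~ age, data = expanded)

# The coefficients agree, so the large expanded table is not needed for them.
print(all.equal(coef(fit_w), coef(fit_x)))
```

Only the coefficients match between the two fits. The standard errors differ, because the expanded table behaves as if it contained more independent observations than were actually collected.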

sgchase · March 1, 2018, 11:58pm

That's great! Thanks! I see your code creates data output with the prediction variable as the last column. I've experimented and found I can use the R Learner node to save the results of the model itself, which I then can combine with data in an R Predictor node to obtain the same dataset. That's pretty cool.
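The two-step idea can be sketched in plain R as follows. This is only an illustration with made-up tables `train` and `new_data`. Inside KNIME, the learner script would hand over the fitted object and the predictor script would receive it together with its input table. I believe those are exposed as `knime.model` and `knime.in`, but treat the names as an assumption and check the node descriptions.

```r
# Stand-ins for the two KNIME input tables (hypothetical data).
train <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1),
  x  = 1:5,
  wt = c(2, 1, 3, 1, 2)
)
new_data <- data.frame(x = c(6, 7))

# Learner step: fit the weighted model and keep the model object itself.
model <- lm(y ~ x, data = train, weights = wt)

# Predictor step: apply the stored model to a table and append
# the predictions as the last column.
scored <- data.frame(new_data, your_prediction = predict(model, newdata = new_data))
print(scored)
```

Keeping the model separate from the scoring step means the same fitted model can be applied to tables other than the one it was trained on.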

agaunt · March 2, 2018, 8:43am

Your solution is clearly more elegant! I'm glad I could help you find the way. ;-)