Hello, I’m a bit new to KNIME and I’m attempting to predict a value. I have an hourly dataset that contains a variable PM2.5, which is basically how polluted the air is, together with other variables such as pressure, temperature, and dew point. I want to predict these PM2.5 values, so I set up a Linear Regression Learner and a Numeric Scorer node to see how accurate my predictions are. In the regression learner, if I put in all my variables, my prediction score (R squared) is 40%; as I start removing variables, the score keeps dropping. Am I doing something wrong? How can I get a higher prediction score?
Hi @FiidisksDiduui -
Nice to see more air quality analyses using KNIME! (That’s what I started using it for originally myself.)
If you want to post your workflow and data, I’m sure folks would be happy to take a closer look.
Having said that, an R2 value of 0.4 doesn’t seem all that unreasonable. What that means practically is the features you’ve provided only explain about 40% of the variability in your PM2.5 result. Meteorological data alone is never going to be able to account for everything about PM2.5 in the atmosphere - you also need to consider sources of the pollutant and how they vary as well, whether that’s diesel engines, industrial emissions, or even natural sources like windblown dust.
Adding additional features - if they’re available - would increase R2. And to be clear, what you really want to be looking at is adjusted R2, since raw R2 never decreases when you add features, even uninformative ones.
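To illustrate the point about raw R2, here is a minimal sketch in plain Python (outside KNIME). Everything in it is an assumption for demonstration: the data is synthetic, the names `temp`, `noise_feature`, and `pm25` are made up, and the small least-squares fit is a stand-in for what a linear regression learner does, not KNIME's actual implementation. It fits the same target once with one feature and once with an extra column of pure random noise.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(features, y):
    """Fit ordinary least squares with an intercept and return raw R2 on the same data."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data (NOT real air quality measurements).
random.seed(0)
n = 200
temp = [random.gauss(15, 8) for _ in range(n)]
noise_feature = [random.gauss(0, 1) for _ in range(n)]  # unrelated to the target
pm25 = [80 - 1.5 * t + random.gauss(0, 20) for t in temp]

r2_one = r_squared([[t] for t in temp], pm25)
r2_two = r_squared([[t, z] for t, z in zip(temp, noise_feature)], pm25)
print("R2 with temp only:       ", round(r2_one, 4))
print("R2 with temp + pure noise:", round(r2_two, 4))
```

The second R2 comes out at least as large as the first even though the extra column carries no information about the target, which is why raw R2 on its own is a poor guide for deciding which variables to keep.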
Long story short, you may not be doing anything wrong from a stats perspective. Maybe you are just limited by the nature of your available data.
Hello, thanks for the response. How can I post my data if it’s in Excel? Also, what do you mean by looking at adjusted R2? I have also noticed my raw R2 increases when I add more variables.
You can attach your data in a reply using the upload icon.
Additionally see here how to share a workflow with data:
Here’s a quick primer on R2 vs adjusted R2. There’s rarely a reason not to use the adjusted form, although if I remember right Excel provides raw by default. KNIME provides both.
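For reference, the usual adjustment is 1 - (1 - R2) * (n - 1) / (n - p - 1), where n is the number of rows and p the number of features. A small sketch in Python, where the function name `adjusted_r2` and the row count of 1000 are assumptions for illustration only (the 0.40 and the three features echo the original question):

```python
def adjusted_r2(r2, n_rows, n_features):
    """Return adjusted R2 from raw R2, the number of rows, and the number of features."""
    dof = n_rows - n_features - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more rows than features + 1")
    return 1.0 - (1.0 - r2) * (n_rows - 1) / dof

# Hypothetical example: raw R2 of 0.40, 1000 hourly rows, 3 features
# (pressure, temperature, dew point).
example = adjusted_r2(0.40, 1000, 3)
print(round(example, 4))
```

With many rows and only a few features the adjustment is tiny. It matters more when the number of features is large relative to the number of rows, because each extra feature then has to earn its place.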