Hello, I'm relatively new to KNIME and would appreciate suggestions on what further findings I can draw from my dataset.
Dataset: hourly readings for Beijing from December 2010 to December 2011. It records PM2.5 concentration (fine particulate matter, a common measure of air pollution) along with several weather variables, listed below.
- Year: Year of data in the row
- Month: Month of data in the row
- Day: Day of data in the row
- Hour: Hour of data in this row
- Pm2.5: PM2.5 concentration (µg/m³)
- DEWP: Dew Point
- TEMP: Temperature
- PRES: Pressure (hPa)
- Cbwd: Combined wind direction
- Iws: Cumulated wind speed (m/s)
- Is: Cumulated hours of snow
- Ir: Cumulated hours of rain
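Several of the questions below (weekdays vs. weekends, hour of day, season, how often PM2.5 exceeds 35 µg/m³) come down to deriving calendar features from the Year/Month/Day/Hour columns and then averaging or counting PM2.5 per group. The following is a minimal plain-Python sketch of that idea, not a KNIME workflow. It assumes each row is a dict with hypothetical lowercase keys `year`, `month`, `day`, `hour`, `pm25`, that missing PM2.5 readings are stored as `None`, and that seasons follow the Northern Hemisphere meteorological convention (Dec-Feb winter, and so on). In KNIME, the same grouping and averaging is what the GroupBy node does.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumption: meteorological seasons for the Northern Hemisphere
# (Dec-Feb winter, Mar-May spring, Jun-Aug summer, Sep-Nov autumn).
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def calendar_features(year, month, day, hour):
    """Derive day-of-week, weekend flag and season from the Year/Month/Day/Hour columns."""
    dow = date(year, month, day).weekday()  # Monday == 0 ... Sunday == 6
    return {
        "day_of_week": dow,
        "is_weekend": dow >= 5,  # Saturday or Sunday
        "season": SEASONS[month],
        "hour": hour,
    }


def mean_pm25_by(rows, key):
    """Mean PM2.5 per value of `key`, skipping rows whose PM2.5 value is missing (None)."""
    groups = defaultdict(list)
    for r in rows:
        if r["pm25"] is None:
            continue
        feats = calendar_features(r["year"], r["month"], r["day"], r["hour"])
        groups[feats[key]].append(r["pm25"])
    return {k: mean(v) for k, v in groups.items()}


def exceedance_rate(rows, threshold=35.0):
    """Fraction of non-missing hourly readings strictly above `threshold` (ug/m3)."""
    values = [r["pm25"] for r in rows if r["pm25"] is not None]
    return sum(v > threshold for v in values) / len(values)
```

For example, `mean_pm25_by(rows, "is_weekend")` compares weekends with weekdays, `mean_pm25_by(rows, "hour")` gives the daily profile, and `mean_pm25_by(rows, "season")` the seasonal one.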
Below are the hypotheses/questions I have worked on in KNIME so far:
- PM2.5 levels are higher on weekdays than on weekends (Done)
- At which hour of the day are PM2.5 levels highest? (Done)
- Are PM2.5 levels positively or negatively correlated with the other variables? (Done)
- How often do PM2.5 levels exceed 35 µg/m³ (unhealthy)? (Done)
- How much does wind direction affect PM2.5? (Done)
- Which temperature range has the highest PM2.5 levels? (Done)
- Which season is the best time to visit Beijing (least polluted)? (Done)
- Setting up a linear regression model to predict PM2.5
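For the regression item, one point worth checking is how the model is evaluated: consecutive hourly readings are strongly correlated, so a random train/test split can make accuracy look better than it is. A common alternative is to train on earlier rows and test on later ones. The sketch below shows ordinary least squares (solved via the normal equations) with such a chronological split, in plain Python. It is an illustration under assumptions, not KNIME's Linear Regression Learner: the predictor columns (for instance DEWP, TEMP, PRES, Iws) are a hypothetical choice, rows are assumed to be sorted by time with missing values already removed, and a categorical column such as Cbwd would first need one-hot encoding.

```python
import math


def solve(M, v):
    """Solve M b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution on the upper-triangular system
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (aug[r][n] - sum(aug[r][c] * b[c] for c in range(r + 1, n))) / aug[r][r]
    return b


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.

    X is a list of feature rows; an intercept column of 1s is prepended,
    so the result is [intercept, coef_1, ..., coef_k].
    """
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    ata = [[sum(a[i] * a[j] for a in A) for j in range(k)] for i in range(k)]
    aty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(k)]
    return solve(ata, aty)


def predict(b, row):
    """Predicted PM2.5 for one feature row."""
    return b[0] + sum(w * x for w, x in zip(b[1:], row))


def rmse(b, X, y):
    """Root-mean-square error of the fitted model on (X, y)."""
    return math.sqrt(sum((predict(b, r) - t) ** 2 for r, t in zip(X, y)) / len(y))


def chronological_split(X, y, train_fraction=0.8):
    """Train on the earlier rows, test on the later ones (rows must be in time order)."""
    cut = int(len(y) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

Usage would be `X_tr, y_tr, X_te, y_te = chronological_split(X, y)`, then `b = fit_ols(X_tr, y_tr)` and `rmse(b, X_te, y_te)`. Comparing that test RMSE with the RMSE of a trivial baseline (for example, always predicting the training-set mean) shows whether the weather variables add predictive value.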
Could anyone suggest how I can analyze this dataset further and draw more insights from it? Any help is greatly appreciated.