wilfredkisku/LINEAR-REGRESSION-AIR-QUALITY

Air Quality Prediction (Linear Regression)

Problem Statement

The goal is to build a predictive learning model that estimates future values of pollutant concentrations such as O3, PM2.5 and SO2. The most important first step is to understand the dataset and extract the information the system can use for training. This makes it essential to preprocess the data and remove redundant information. The dataset must also be free of inconsistencies such as missing data points, duplicate data points, and mismatched data points between the AQS dataset and the MesoWest dataset.

Preprocessing

  • Two datasets had to be accessed and consolidated: the MesoWest dataset, which supplies the features, and the AQS dataset, which supplies the pollutant concentrations in the air.
  • The datasets contained missing and uncomputable values; 'NA', 'N/A', 'na', 'n/a', '--', '-' and Null were the most common.
  • The complete matrix contains the X matrix (D features plus 1 bias column) concatenated with the output vector Y, so it has dimension m × (D + 2), where m is the total number of samples (rows, i.e. data points).
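
The preprocessing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the project's code: it assumes each dataset has already been read into a dict keyed by timestamp (which also keeps a single record per timestamp, so duplicates are dropped), and the names `parse_value`, `build_matrix`, `altimeter` and `air_temp`, as well as the sample values, are hypothetical.

```python
# Missing-value tokens listed in the README; "null"/"Null"/"NULL" cover Null cells.
MISSING_TOKENS = {"NA", "N/A", "na", "n/a", "--", "-", "", "null", "Null", "NULL"}


def parse_value(raw):
    """Return the cell as a float, or None if it is missing or unparseable."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text in MISSING_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None  # any other non-numeric cell is also treated as missing


def build_matrix(mesowest, aqs, feature_names, target_name):
    """Join MesoWest features with an AQS target on a shared timestamp key.

    mesowest: dict mapping timestamp -> dict of feature columns (raw strings)
    aqs:      dict mapping timestamp -> dict of pollutant columns (raw strings)
    Returns rows [1.0, x_1, ..., x_D, y]: an m x (D + 2) matrix.
    """
    matrix = []
    # Keep only timestamps present in both datasets.
    for ts in sorted(mesowest.keys() & aqs.keys()):
        features = [parse_value(mesowest[ts].get(name)) for name in feature_names]
        target = parse_value(aqs[ts].get(target_name))
        if target is None or any(v is None for v in features):
            continue  # drop any row with a missing feature or target
        matrix.append([1.0] + features + [target])
    return matrix


# Illustrative toy records, not project data.
mesowest = {
    "2017-01-01T00:00": {"altimeter": "30.01", "air_temp": "12.5"},
    "2017-01-01T01:00": {"altimeter": "N/A", "air_temp": "11.9"},
    "2017-01-01T02:00": {"altimeter": "30.05", "air_temp": "11.2"},
}
aqs = {
    "2017-01-01T00:00": {"O3": "0.031"},
    "2017-01-01T01:00": {"O3": "0.029"},
    "2017-01-01T03:00": {"O3": "0.027"},
}
rows = build_matrix(mesowest, aqs, ["altimeter", "air_temp"], "O3")
print(rows)  # [[1.0, 30.01, 12.5, 0.031]]
```

Only the 00:00 record survives: the 01:00 record has a missing altimeter reading, and the other two timestamps appear in only one dataset. The resulting row has D + 2 = 4 columns for D = 2 features.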

Model Description

A linear hypothesis is chosen because the prediction is made over the real values ℝ. The hypothesis is h_w(x) = wᵀx, where x includes the bias entry. A cost function is computed to penalize the hypothesis, so that minimizing it yields the correct set of parameters w.
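
The README does not state which cost function is used. A minimal sketch, assuming the common mean-squared-error cost J(w) = (1/(2m)) * sum over i of (h_w(x_i) - y_i)^2, with each input row x_i starting with the bias entry 1; the function names `hypothesis` and `cost` are hypothetical:

```python
def hypothesis(w, x):
    """Linear hypothesis h_w(x) = w . x, where x[0] is the bias entry 1."""
    return sum(w_j * x_j for w_j, x_j in zip(w, x))


def cost(w, X, y):
    """Assumed mean-squared-error cost J(w) = 1/(2m) * sum_i (h_w(x_i) - y_i)^2."""
    m = len(X)
    return sum((hypothesis(w, x_i) - y_i) ** 2 for x_i, y_i in zip(X, y)) / (2 * m)


X = [[1.0, 2.0], [1.0, 4.0]]  # bias entry 1.0, then one feature
y = [5.0, 9.0]
print(cost([1.0, 2.0], X, y))  # 0.0, since y = 1 + 2x exactly
print(cost([0.0, 0.0], X, y))  # 26.5
```

The factor 1/2 is a convention that cancels the 2 produced by differentiating the square; it does not change which w minimizes the cost.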

The weights w are updated iteratively using gradient descent, which moves them in the direction that decreases the cost.
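
Assuming a mean-squared-error cost J(w) = (1/(2m)) * sum over i of (h_w(x_i) - y_i)^2, the batch gradient descent update for each weight is w_j := w_j - α * (1/m) * sum over i of (h_w(x_i) - y_i) * x_ij, applied to all j simultaneously with learning rate α. This is the standard form of the rule, not necessarily the exact expression used in the project. A plain-Python sketch; the function name, learning rate and iteration count are illustrative:

```python
def gradient_descent(X, y, alpha=0.01, iterations=1000):
    """Batch gradient descent for linear regression under an assumed MSE cost.

    X: list of rows, each starting with the bias entry 1.0
    y: list of target values
    Returns the learned weight vector w.
    """
    m, n = len(X), len(X[0])
    w = [0.0] * n
    for _ in range(iterations):
        # Residuals h_w(x_i) - y_i under the current weights.
        errors = [sum(w_j * x_j for w_j, x_j in zip(w, x_i)) - y_i
                  for x_i, y_i in zip(X, y)]
        # Simultaneous update: every w_j is computed from the same residuals.
        w = [w_j - alpha * sum(e * x_i[j] for e, x_i in zip(errors, X)) / m
             for j, w_j in enumerate(w)]
    return w


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]  # toy data with y = 1 + 2x
w = gradient_descent(X, y, alpha=0.1, iterations=5000)
print([round(w_j, 4) for w_j in w])  # [1.0, 2.0]
```

If α is too large the updates overshoot and diverge; if it is too small, many more iterations are needed, so in practice α is tuned by watching the cost decrease over iterations.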

Result

The plots show individual columns (altimeter, air temperature, etc.) against the pollutant concentrations. Because the model uses multiple features, it is multi-dimensional, which makes it hard to visualize all the features against the output pollutant concentration at once.

These plots show that it would be very difficult to find a linear relation between the concentration of a pollutant (O3 in the plot below) and a single meteorological variable such as pressure, temperature or humidity. Including multiple features makes it easier to relate the output concentrations to the inputs.
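
One way to quantify this observation is to compare the coefficient of determination R² of a least-squares fit on a single feature against a fit on all features. The sketch below is not part of the original project: it solves the normal equations (XᵀX)w = Xᵀy directly instead of using gradient descent, the helper names are hypothetical, and the data is a toy example in which the target depends on two features, so its numbers say nothing about the real MesoWest/AQS data.

```python
def solve_normal_equations(X, y):
    """Least-squares weights from (X^T X) w = X^T y via Gauss-Jordan elimination."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(n)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # A is now diagonal, so each weight is a single division.
    return [A[i][n] / A[i][i] for i in range(n)]


def r_squared(X, y, w):
    """Coefficient of determination of the predictions X w against targets y."""
    mean_y = sum(y) / len(y)
    preds = [sum(w_j * x_j for w_j, x_j in zip(w, row)) for row in X]
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def compare_fits(X, y, feature_index):
    """R^2 using one feature (plus bias) versus using every feature.

    X rows are [1.0, x_1, ..., x_D]; feature_index selects one x_j (j >= 1).
    """
    X_single = [[row[0], row[feature_index]] for row in X]
    r2_single = r_squared(X_single, y, solve_normal_equations(X_single, y))
    r2_all = r_squared(X, y, solve_normal_equations(X, y))
    return r2_single, r2_all


# Toy data with y = 1 + 2a - 3b, so y depends on both features.
X = [[1.0, a, b] for a, b in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]]
y = [1 + 2 * row[1] - 3 * row[2] for row in X]
r2_single, r2_all = compare_fits(X, y, feature_index=1)
print(round(r2_single, 3), round(r2_all, 3))  # 0.132 1.0
```

With only the first feature, the fit explains about 13% of the variance in this toy target; with both features it explains essentially all of it. Solving the normal equations is fine for a handful of features, while gradient descent scales better when D is large.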

Status: Completed

About

AQS with MesoWest for prediction of Air Quality
