Air Quality Prediction (Linear Regression)

Problem Statement

We need to build a predictive model that estimates future values of pollutants such as O₃, PM₂.₅ and SO₂. The most important step is to understand the dataset and extract the information that the system can use for training. This makes it imperative to pre-process the dataset and remove redundant information. The dataset should also be free from inconsistencies such as missing data points, duplicated data points, and data points that are incoherent between the AQS dataset and the MesoWest dataset.

Preprocessing

Two datasets had to be accessed and consolidated: the MesoWest dataset for the features and the AQS dataset for the particulate matter concentration in the air.
The datasets had missing and uncomputable values; 'NA', 'N/A', 'na', 'n/a', '--', '-' and Null were some of the most common placeholders for missing values.
The complete matrix contains the X matrix (D features + 1 bias) concatenated with the output vector Y, so its dimension is m × (D+2), where m is the total number of samples, i.e. rows (data points) in the matrix.
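The cleaning and matrix assembly described above can be sketched as follows. This is a minimal illustration, not the repository's actual scripts: the column names (`timestamp`, `air_temp`, `pressure`, `o3`), the sample values, and the placement of the bias column first are all assumptions made for the example.

```python
import io

import numpy as np
import pandas as pd

# Placeholder tokens named in the text that should be read as missing values.
MISSING_TOKENS = ["NA", "N/A", "na", "n/a", "--", "-", "Null"]

# Tiny made-up sample standing in for the consolidated data
# (hypothetical columns and values, for illustration only).
raw_csv = io.StringIO(
    "timestamp,air_temp,pressure,o3\n"
    "2018-01-01 00:00,1.5,1012.0,0.021\n"
    "2018-01-01 01:00,NA,1011.5,0.019\n"
    "2018-01-01 02:00,0.9,--,0.018\n"
    "2018-01-01 03:00,0.7,1010.8,0.020\n"
    "2018-01-01 03:00,0.7,1010.8,0.020\n"
)

# Map every placeholder token to NaN while parsing.
df = pd.read_csv(raw_csv, na_values=MISSING_TOKENS)

# Drop rows with missing values, then rows that repeat a timestamp.
df = df.dropna().drop_duplicates(subset="timestamp")

# Assemble [bias | D features | Y], which has shape m x (D + 2).
features = df[["air_temp", "pressure"]].to_numpy(dtype=float)
target = df[["o3"]].to_numpy(dtype=float)
bias = np.ones((features.shape[0], 1))
data = np.hstack([bias, features, target])

print(data.shape)
```

With D = 2 features and m = 2 surviving rows in this sample, the assembled matrix has D + 2 = 4 columns. Dropping incomplete rows is only one option; interpolating the gaps would be an equally reasonable choice depending on how much data is missing.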

Model Description

A linear hypothesis is chosen because the prediction is made over a range of real values, ℝ. The hypothesis is h_w(x) = wᵀx. A cost function J(w) penalizes the error of the hypothesis so that the correct set of parameters w can be found.

$J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_w(x^{(i)}) - y^{(i)}\right)^2$
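As a rough sketch, the cost above can be evaluated with NumPy in vectorized form. The function name `cost` and the toy data are assumptions for illustration; `X` is assumed to already include the bias column of ones.

```python
import numpy as np


def cost(X, y, w):
    """Mean squared error cost J(w) = 1/(2m) * sum((X @ w - y)^2)."""
    m = X.shape[0]
    residuals = X @ w - y  # h_w(x) - y for every sample
    return float(residuals @ residuals) / (2 * m)


# Toy data generated by y = 1 + 2 * x (first column is the bias term).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

print(cost(X, y, np.array([1.0, 2.0])))  # exact parameters, zero cost
print(cost(X, y, np.zeros(2)))           # all-zero parameters, (1 + 9 + 25) / 6
```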

The weights are updated using the gradient descent expression, applied simultaneously to every weight w_j with learning rate η:

$w_j := w_j - \eta \frac{1}{m} \sum_{i=1}^{m} \left(h_w(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$
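The update rule can be sketched as a batch gradient descent loop. This is a minimal illustration under assumed settings: the function name `gradient_descent`, the learning rate `eta=0.1`, the iteration count, and the toy data are not taken from the repository.

```python
import numpy as np


def gradient_descent(X, y, eta=0.1, n_iters=2000):
    """Batch gradient descent for linear regression.

    X is assumed to already contain the bias column of ones.
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        # Gradient of J(w): (1/m) * sum((h_w(x) - y) * x) over all samples.
        gradient = X.T @ (X @ w - y) / m
        # Update all weights simultaneously.
        w = w - eta * gradient
    return w


# Toy data generated by y = 1 + 2 * x (first column is the bias term).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

w = gradient_descent(X, y)
print(w)  # approaches [1, 2] for this toy data
```

On real meteorological features with very different scales, a fixed learning rate like this may converge slowly or diverge unless the features are normalized first.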

Result

The plots show individual column values (altimeter, air temperature, etc.) against the concentrations of the pollutants. Since the model includes multiple features, it is multi-dimensional in nature, which makes it hard to visualize all the features together in relation to the output concentration of the pollutants.

It can be seen that it would be very difficult to fit a linear relation between the concentration of a pollutant (shown here for O₃) and a single meteorological variable such as pressure, temperature or humidity. Including multiple features should therefore make it easier to relate the output concentrations to the features.

Status: Completed

