From 829451807a366599fcd02974a01d6ade071cf332 Mon Sep 17 00:00:00 2001
From: Aditya Sharma
Date: Sat, 30 Mar 2019 22:30:04 -0700
Subject: [PATCH] checking readme instructions

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 06e1727..4ad3f07 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# Adult Income Prediction using Flask app on Heroku
+# Adult Income Prediction
 
 Follow the steps provided below to reproduce the whole project.
 
@@ -25,7 +25,7 @@ Now you should have everything installed that we need.
 
 ### Data format before cleaning
 
-This information is directly copied from the [UCI datasets repository for adult dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names)
+This information is directly copied from the [UCI datasets repository for adult dataset](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names).
 
 - income: >50K, <=50K.
 - age: continuous.
@@ -79,7 +79,7 @@ Execute the `main.py` script which will train a model on the cleaned data and ex
 ```bash
 python3 incomePrediction/main.py
 ```
-I chose the `LogisticRegression` classifier from scikit-learn to get predictions (The test accuracy obtained is quite well ~ 85%). Cross-validation is done to choose the important hyperparameter (`C`) to control the degree of regularization. The script can be modified to use and tune any classifier available in `scikit-learn`. Both the training and test accuracies are comparable and hence, there seems to be no overfitting. I chose to go with Logistic Regression because it is a simple linear classifier whose results are interpretable and this is what I would expect from a model on such a dataset where the predictor-response relationship seems to be important in the analysis. I also tried building and tuning a RandomForest classifier and there was a 1 percent increase in the accuracies which is not much higher and therefore, a simpler model is a better choice.
+I chose the `LogisticRegression` classifier from scikit-learn to get predictions (The test accuracy obtained is quite well ~ 85%). Cross-validation is done to choose the important hyperparameter (`C`) to control the degree of regularization. The script can be modified to use and tune any classifier available in `scikit-learn`. Both the training and test accuracies are comparable and hence, there seems to be no overfitting. I chose to go with Logistic Regression because it is a simple linear classifier whose results are interpretable and this is what I would expect from a model on such a dataset where the predictor-response relationship seems to be important in the analysis. I also tried building and tuning a RandomForest classifier and there was a 1% increase in the accuracies which is not much higher and therefore, a simpler model is a better choice.
 
 ### Deploy the model on Heroku
 
@@ -103,4 +103,4 @@ I have used the `pytest` library to test the [Util class](https://github.com/adi
 pytest incomePrediction/tests/
 ```
 
-Due to shortage on time, I could not cover all kinds of tests but I did set up a basic test infrastructure which could be extended to test the remaining code (unit, integration and e2e tests).
+As of now, the tests section is not exhaustive but I did set up a basic test infrastructure.