From 90511c40c346bd75a8e8e17dcf04551722e106e8 Mon Sep 17 00:00:00 2001
From: Ashesh Kumar Singh <user501254@gmail.com>
Date: Mon, 1 Feb 2021 19:32:53 -0600
Subject: [PATCH] Update README

---
 README.md | 124 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 74 insertions(+), 50 deletions(-)

diff --git a/README.md b/README.md
index 1cab46c..2e42a20 100644
--- a/README.md
+++ b/README.md
@@ -4,9 +4,10 @@ Regression AES — Automated Essay Scoring as regression problem
 This project aims to automatically score student essays using NLP techniques.
 Here, the problem of automatic grading is approached as a regression problem.
 
-This project uses the ASAP-AES dataset (https://www.kaggle.com/c/asap-aes/data) and builds a model to predict the scores
-of essays written by Grade 7 to Grade 10 students. To do this, we'll provide the model with a description of many 
-essays having various attributes.
+This project uses the ASAP-AES dataset (https://www.kaggle.com/c/asap-aes/data) and builds a model to predict scores for
+essays written by Grade 7 to Grade 10 students. To do this, we train the model on many essays, each described by a set
+of features. The underlying objective is to observe the effect of essay features (taken separately and in
+combination) on the score.
 
 -----
 INDEX
@@ -15,9 +16,14 @@ INDEX
 1. [About the ASAP-AES Dataset](#1-about-the-asap-aes-dataset)
 2. [Some Important Files](#2-some-important-files)
 3. [Application Design](#3-application-design)
-4. [Setup Instructions](#4-setup-instructions)
-5. [Usage Details](#5-usage-details)
-6. [Visualization and Demo](#6-visualization-and-demo)
+4. [Results](#4-results)
+5. [Setup Instructions](#5-setup-instructions)
+6. [Usage Details](#6-usage-details)
+7. [Visualization](#7-visualization)
+    + [7.1 Scores Across Essay Sets](#71-scores-across-essay-sets)
+    + [7.2 Resolved Human Ratings V/s Meta Features.png](#72-resolved-human-ratings-vs-meta-featurespng)
+    + [7.3 Resolved Human Ratings V/s Extracted Features.png](#73-resolved-human-ratings-vs-extracted-featurespng)
+    + [7.4 Resolved Human Ratings V/s Readability Features.png](#74-resolved-human-ratings-vs-readability-featurespng)
 
 
 1\. About the ASAP-AES Dataset
@@ -26,22 +32,24 @@ INDEX
 The dataset was made available through a competition held by The William and Flora Hewlett Foundation (Hewlett).
 There are sevral available data formats including TSV and excel. Each file has a number of columns.
 The ones that are important to ur discussion of the project are:
-    essay_set: 1-8, an id for each set of essays
-    essay: The ascii text of a student's response
-    rater1_domain1: Rater 1's domain 1 score; all essays have this
-    rater2_domain1: Rater 2's domain 1 score; all essays have this
-    domain1_score: Resolved score between the raters; all essays have this
+- **essay_set:** 1-8, an id for each set of essays
+- **essay:** The ASCII text of a student's response
+- **rater1_domain1:** Rater 1's domain 1 score; all essays have this
+- **rater2_domain1:** Rater 2's domain 1 score; all essays have this
+- **domain1_score:** Resolved score between the raters; all essays have this
 
 The `essay_set` indicates which set the essay belongs to. each set has a different writing prompt and different range
 of scores in which the human grader grades.
-The `essay` column has all the tessay in text format with some named entities Anonimized as:
+
+The `essay` column has all the essays in text format with some named entities anonymized as:
+
     "PERSON", "ORGANIZATION", "LOCATION", "DATE", "TIME", "MONEY", "PERCENT"
-Besides the above score columns, ie. `rater1_domain1`, `rater2_domain1`, `domain1_score` there are scores for other 
-domains as well. However they are not present for each and every essay. These are ignored for simplicity. However, 
-incorperating these scores in the model building process post preprocessing may give better results.
 
+Besides the above score columns, i.e. `rater1_domain1`, `rater2_domain1`, and `domain1_score`, there are scores for other
+domains as well. However, they are not present for every essay, so they are ignored for simplicity. Incorporating
+these scores in the model building process after preprocessing may give better results.
 
-Table: ASAP-AES Dataset overview:
+**Table: ASAP-AES Dataset overview:**
 
 | essay_id | essay_set | essay | rater1_domain1                                    | rater2_domain1 | rater3_domain1 | domain1_score | rater1_domain2 | rater2_domain2 | domain2_score | ... | rater2_trait3 | rater2_trait4 | rater2_trait5 | rater2_trait6 | rater3_trait1 | rater3_trait2 | rater3_trait3 | rater3_trait4 | rater3_trait5 | rater3_trait6 |     | 
 |----------|-----------|-------|---------------------------------------------------|----------------|----------------|---------------|----------------|----------------|---------------|-----|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|---------------|-----| 
@@ -57,8 +65,7 @@ Table: ASAP-AES Dataset overview:
 | 12974    | 21630     | 8     | Trippin' on fen...                                | 20             | 20             | NaN           | 40             | NaN            | NaN           | NaN | ...           | 4.0           | 4.0           | 4.0           | 4.0           | NaN           | NaN           | NaN           | NaN           | NaN           | NaN | 
 | 12975    | 21633     | 8     | Many people believe that laughter can improve...  | 20             | 20             | NaN           | 40             | NaN            | NaN           | NaN | ...           | 4.0           | 4.0           | 4.0           | 4.0           | NaN           | NaN           | NaN           | NaN           | NaN           | NaN | 
 
-
-Table: Dataset Stats (set-wise):
+**Table: Dataset Stats (set-wise):**
 
 |                    | essay_set      | 1         | 2           | 3           | 4           | 5           | 6           | 7           | 8           |
 |--------------------|----------------|-----------|-------------|-------------|-------------|-------------|-------------|-------------|-------------|
@@ -91,19 +98,20 @@ Table: Dataset Stats (set-wise):
 2\. Some Important Files
 ------------------------
 
-The project repository (https://github.com/user501254/nn-aes) contains a couple of Jupyter notebooks and a single
-python script.
+This repository contains a couple of Jupyter notebooks and a single Python script.
 
-Jupyter notebooks:
+```
+Jupyter Notebooks:
     data_etl.ipynb                  to extract transform and load the ASAP-AES dataset
     regression_model.ipynb          for interactive model building
     
 Script:
-    main.py                         similar functionality to `regression_model` notebook but as a script
+    main.py                         similar to regression_model.ipynb notebook, see note below
 
 Directories:
     input/asap-aes                  for storing extracted ASAP-AES dataset
     output/training_set_rel3.pkl    a pre formed dataframe with additional features for quick modeling
+```
 
 The script file `main.py` makes use of loops for model training and evaluation across all possible feature combinations.
 This is different from the `regression_model` notebook, since there training is done on all features at once.
@@ -113,10 +121,10 @@ This is different from the `regression_model` notebook, since there training is
 ----------------------
 
 In a regression problem, we aim to predict the output of a continuous value, like in this case, the score for an essay.
-Altough the grading is done in discreete steps within the range of 0-60, we can solve this problem through regression.
-Other alternative approaches may include classifcation, Nural Nets and Deap Learning.
+Although the grading is done in discrete steps within the range of 0-60, we can solve this problem through regression.
+Alternative approaches may include classification, Neural Nets and Deep Learning.
 
-First, we load the data into a data frame and compute various sets of features. These can be roughly categoried as:
+First, we load the data into a data frame and compute various sets of features. These can be roughly categorized as:
 ```python
 meta_features = ['essay_length', 'avg_sentence_length', 'avg_word_length']
 grammar_features = ['sentiment', 'noun_phrases', 'syntax_errors']
@@ -127,55 +135,71 @@ redability_features = ['readability_index', 'difficult_words']
     1. Essay Length (number of words)
     2. Average Sentence Length
     3. Average Word Length
-
 2. Grammar Features:
     1. Sentiment (+ve/-ve) *
     2. Noun Phrases Count
     3. Syntax Errors Count
-
 3. Readability Features:
     1. Readability Index Score
     2. Difficult Words Count
 
-These features are computed using [TextBlob][1] package and other method and then fed into a Sequential model with two densely
-connected hidden layers, and an output layer that returns a single, continuous value. This model is trained for 1000 
-epochs, and record the training and validation accuracy. Callbacks are provided for early returns incase of no further
-improvement is observed.
+These features are computed using the [TextBlob][1] package and other methods, and then fed into a Sequential model with
+two densely connected hidden layers and an output layer that returns a single, continuous value. This model is trained
+for 1000 epochs, recording the training and validation accuracy. Callbacks are provided to stop training early if no
+further improvement is observed.
 
-The training happens across all possible feature combinations given and results can be compared (use `main.py` script).
-These combinations sum up to 511. 
+**The training happens across all possible feature combinations, and the results can be compared (use the `main.py` script).
+There are 511 such combinations.**
 
 [1]: https://textblob.readthedocs.io/en/dev/
 
 
-4\. Setup Instructions
-----------------------
+4\. Results
+-----------
 
-Please make sure that the dataset is downloaded and extracted to input/asap-aes folder.
+Please see the [report here](docs/report/cs421-report.pdf). A summary is provided below:
 
-1. Install and activate python3 virtual environment
-    see: https://docs.python.org/3/library/venv.html
+1. Including `essay_set` in the training feature set always improved the results. Without `essay_set`, QWK 24.
+2. The feature set (`sentiment`) performed worst, with QWK -0.00016; it was the only feature set with “chance” agreement.
+3. Considering only single-feature sets, (`essay_length`) performed best with QWK ~ 0.15, followed by
+   (`avg_sentence_length`), (`difficult_words`), (`noun_phrases`), (`syntax_errors`), (`readability_index`).
+4. Adding more features didn't always give better results (i.e. more accurate scores).
 
-2. Install required packages via `pip`
-    pip install -r requirements.txt
 
-3. Download additional library files (see notebooks for details, most likely this won't be an issue)
+5\. Setup Instructions
+----------------------
 
-4. Starting Jupyter Server
-    see: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html
+Please make sure that the dataset is downloaded and extracted to the `input/asap-aes` folder.
 
+1. Install and activate a Python 3 virtual environment,
+    see: [https://docs.python.org/3/library/venv.html][2]
+2. Install required packages via `pip install -r requirements.txt`
+3. Download additional library files (see notebooks for details, most likely this won't be an issue)
+4. Start the Jupyter server,
+   see: [https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html][3]
 5. Run `data_etl.ipynb` first, followed by  `regression_model.ipynb`
 
+[2]: https://docs.python.org/3/library/venv.html
+[3]: https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html
+
+6\. Usage Details
+-----------------
+
+As of now, the program has no command-line parameters that can be passed.
+Just use `python3 main.py` once `output/training_set_rel3.pkl` is created.
+
 
-5\. Usage Details
+7\. Visualization
 -----------------
 
-As of now the program has no command line parameters that could be passes.
-Just use `python3 main.py` once output/training_set_rel3.pkl is created.
+### 7.1 Scores Across Essay Sets
+![scores-across-essay-sets.png](docs/presentation/scores-across-essay-sets.png)
 
+### 7.2 Resolved Human Ratings V/s Meta Features.png
+![resolved-human-ratings-vs-meta-features.png](docs/presentation/resolved-human-ratings-vs-meta-features.png)
 
-6\. Visualization and Demo
---------------------------
+### 7.3 Resolved Human Ratings V/s Extracted Features.png
+![resolved-human-ratings-vs-extracted-features.png](docs/presentation/resolved-human-ratings-vs-extracted-features.png)
 
-TODO
-https://asing80.people.uic.edu/cs421/
+### 7.4 Resolved Human Ratings V/s Readability Features.png
+![resolved-human-ratings-vs-redability-features.png](docs/presentation/resolved-human-ratings-vs-redability-features.png)