
Commit

Merge pull request #5 from hadinh1306/master
Finished README
rq1995 authored Feb 11, 2018
2 parents 077df83 + 839ec7e commit 4a1ea0c

### Problem

Currently, in both the Python and R ecosystems, there is no single package that provides easy calculation of AIC, BIC, and Mallow's C_p together with a unified summary table of scores for all models that users can consult to compare models and choose the best one. Instead, users must resort to manual calculation, lengthy code, or several packages to accomplish a single task, which is inefficient.

### Solution

In the Python ecosystem, `sklearn.metrics` already provides regression metrics such as:

```
metrics.median_absolute_error(y_true, y_pred)
metrics.r2_score(y_true, y_pred[, …])
```

However, there are no functions for AIC, BIC, Mallow's C_p, or a comparison table output. Our package can therefore be a unified source for all popular regression model comparison metrics.
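
As a quick illustration (a sketch assuming scikit-learn is installed; the toy values are made up), the existing metrics can be called directly, but there is no counterpart for AIC, BIC, or Mallow's C_p:

```
# Existing sklearn.metrics functions cover error and R^2 measures...
from sklearn import metrics

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(metrics.median_absolute_error(y_true, y_pred))  # 0.5
print(metrics.r2_score(y_true, y_pred))               # ~0.95

# ...but sklearn.metrics has no AIC/BIC/Mallow's C_p function.
print(hasattr(metrics, "aic"))  # False
```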

## **Timeline**

**Phase I**: 02/05/2018 - 03/11/2018, develop functions to compute AIC, BIC, Mallow's C_p, and a table output that includes all scores for model comparison.

**Phase II**\*\*: from late March, develop other functions to finish the package.

\*\* Tentative; will be updated later.

## **Function Description**

Here, we describe the functions planned for *Phase I*. Documentation for all functions will be added later.

### AIC

#### Function

```
aic(x, y, n, k, model = 'linear')
```

**Parameters:**
* **x**: ndarray or scipy.sparse matrix, (n_samples, n_features)
* Predictive variable(s)

* **y**: ndarray, shape (n_samples), or (n_samples, n_targets)
* Target variable(s)

* **n**: int
* Number of observations

* **k**: int
* Number of predictive variable(s) used in the model

* **model**: 'linear' (default) | 'logistic' | 'ridge' | 'lasso' | 'elasticnet'
* Regression method used to fit the model

**Return:**
* AIC score of the model: float
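
As a point of reference only, here is a minimal sketch of how the score might be computed for the default `'linear'` model, assuming the common Gaussian-likelihood form AIC = n * ln(SSE / n) + 2k; the name `aic_sketch` and the toy data are illustrative, not part of the package.

```
# Hedged sketch: AIC of an ordinary least squares fit, assuming
# AIC = n * ln(SSE / n) + 2 * k (Gaussian likelihood, constants dropped).
import numpy as np
from sklearn.linear_model import LinearRegression

def aic_sketch(x, y, n, k):
    fit = LinearRegression().fit(x, y)
    sse = np.sum((y - fit.predict(x)) ** 2)  # residual sum of squares
    return n * np.log(sse / n) + 2 * k

# toy usage
rng = np.random.RandomState(0)
x = rng.rand(50, 3)
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
print(aic_sketch(x, y, n=50, k=3))
```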


### BIC

#### Introduction
#### Function

```
bic(x, y, n, k, model = 'linear')
```
**Parameters:**
* **x**: ndarray or scipy.sparse matrix, (n_samples, n_features)
* Predictive variable(s)

* **y**: ndarray, shape (n_samples), or (n_samples, n_targets)
* Target variable(s)

* **n**: int
* Number of observations

* **k**: int
* Number of predictive variable(s) used in the model

* **model**: 'linear' (default) | 'logistic' | 'ridge' | 'lasso' | 'elasticnet'
* Regression method used to fit the model

**Return:**
* BIC score of the model: float
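
Analogously to the AIC sketch above, a minimal sketch under the same Gaussian-likelihood assumption, using BIC = n * ln(SSE / n) + k * ln(n); again illustrative, not the package implementation.

```
# Hedged sketch: BIC of an ordinary least squares fit, assuming
# BIC = n * ln(SSE / n) + k * ln(n).
import numpy as np
from sklearn.linear_model import LinearRegression

def bic_sketch(x, y, n, k):
    fit = LinearRegression().fit(x, y)
    sse = np.sum((y - fit.predict(x)) ** 2)  # residual sum of squares
    return n * np.log(sse / n) + k * np.log(n)
```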

### Mallow's C_p

#### Introduction

Mallow's C_p is named for Colin Lingwood Mallows. It is used to assess the fit of a regression model and to find the best model that uses a subset of the available predictive variables to predict the outcome.

```
C_p = (SSE_p/MSE) - (n - 2p)
```

where:
- SSE_p: residual sum of squares for the subset model containing `p` explanatory variables, counting the intercept
- MSE: mean squared error for the full model (the model containing all `k` explanatory variables of interest)
- n: number of observations
- p: number of explanatory variables in the subset model

#### Function

```
mallow(X, x_subset, y, n, p, k, model = 'linear')
```

**Parameters:**
* **X**: ndarray or scipy.sparse matrix, (n_samples, n_features)
* Predictive variable(s)

* **x_subset**: ndarray or scipy.sparse matrix, (n_samples, n_features)
* Predictive variable(s) in the subset model

* **y**: ndarray, shape (n_samples), or (n_samples, n_targets)
* Target variable(s)

* **n**: int
* Number of observations

* **p**: int
* Number of predictive variable(s) used in the subset model

* **k**: int
* Number of predictive variable(s) used in the model

* **model**: 'linear' (default) | 'logistic' | 'ridge' | 'lasso' | 'elasticnet'
* Regression method used to fit the model

**Return:**
* Mallow's C_p score of the subset model: float
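
A minimal sketch of this computation for linear models, following the formula above; it assumes `k` counts the intercept so that MSE = SSE_full / (n - k), and the name `mallow_sketch` is illustrative only.

```
# Hedged sketch: Mallow's C_p = SSE_p / MSE_full - (n - 2p) for linear models.
import numpy as np
from sklearn.linear_model import LinearRegression

def mallow_sketch(X, x_subset, y, n, p, k):
    full = LinearRegression().fit(X, y)
    sub = LinearRegression().fit(x_subset, y)
    sse_p = np.sum((y - sub.predict(x_subset)) ** 2)         # subset residual sum of squares
    mse_full = np.sum((y - full.predict(X)) ** 2) / (n - k)  # full-model MSE (k counts the intercept)
    return sse_p / mse_full - (n - 2 * p)
```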


### Table of comparison

#### Function

```
comparison_model(model)
```

**Parameters:**
* **model**: str
* Models to compare, separated by `,`

**Return:**
* A table with model names and their scores. Demo:

| Model | AIC | BIC | Mallow's C_p |
|--------|-----|-----|--------------|
| Model1 | 123 | 145 | 156 |
| Model2 | 145 | 134 | 167 |
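
A minimal sketch of how such a table could be assembled with pandas, reusing the demo values above; `comparison_sketch` and its dict input are assumptions for illustration, not the package API.

```
# Hedged sketch: build a comparison table from per-model scores.
import pandas as pd

def comparison_sketch(scores):
    """scores: dict mapping model name -> (AIC, BIC, Mallow's C_p)."""
    rows = [(name, aic, bic, cp) for name, (aic, bic, cp) in scores.items()]
    return pd.DataFrame(rows, columns=["Model", "AIC", "BIC", "Mallow's C_p"])

# usage with the demo values from the table above
print(comparison_sketch({"Model1": (123, 145, 156), "Model2": (145, 134, 167)}))
```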

In the R ecosystem, a related function is `finish_glance` from the broom package:
https://www.rdocumentation.org/packages/broom/versions/0.4.2/topics/finish_glance
