
Commit

Merge pull request #5 from hadinh1306/master
Finished README
rq1995 authored Feb 11, 2018
2 parents 077df83 + 839ec7e commit 4a1ea0c

### Problem

Currently, in both the Python and R ecosystems, there is no single package that provides easy calculation of AIC, BIC, and Mallow's C_p together with a unified summary table of scores for all models that users can consult to compare models and choose the best one. Instead, users must resort to manual calculation, lengthy code, or several packages to accomplish a single task, which is inefficient.

### Solution

In the Python ecosystem, `sklearn.metrics` already provides regression metrics such as:

```
metrics.median_absolute_error(y_true, y_pred)
metrics.r2_score(y_true, y_pred[, …])
```

However, there are no functions for AIC, BIC, Mallow's C_p, or a comparison table output. Our package can therefore be a unified source for all popular regression model comparison metrics.
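
As a quick illustration (a sketch assuming scikit-learn is installed; the toy values are made up), the existing metrics can be called directly, but there is no counterpart for AIC, BIC, or Mallow's C_p:

```
# Existing sklearn.metrics functions cover error and R^2 measures...
from sklearn import metrics

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(metrics.median_absolute_error(y_true, y_pred))  # 0.5
print(metrics.r2_score(y_true, y_pred))               # ~0.95

# ...but sklearn.metrics has no AIC/BIC/Mallow's C_p function.
print(hasattr(metrics, "aic"))  # False
```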

## **Timeline**

**Phase I**: 02/05/2018 - 03/11/2018, develop functions to compute AIC, BIC, Mallow's C_p, and a table output that includes all scores for model comparison.

**Phase II**\*\*: from late March, develop other functions to finish the package.

\*\* Tentative; will be updated later.

## **Function Description**

Here, we describe the functions planned for *Phase I*. Documentation for all functions will be added later.

### AIC

#### Function

```
aic(x, y, n, k, model = 'linear')
```

**Parameters:**
* **x**: ndarray or scipy.sparse matrix, (n_samples, n_features)
* Predictive variable(s)

* **y**: ndarray, shape (n_samples), or (n_samples, n_targets)
* Target variable(s)

* **n**: int
* Number of observations

* **k**: int
* Number of predictive variable(s) used in the model

* **model**: 'linear' (default) | 'logistic' | 'ridge' | 'lasso' | 'elasticnet'
* Regression method used to fit the model

**Return:**
* AIC score of the model: float
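
As a point of reference only, here is a minimal sketch of how the score might be computed for the default `'linear'` model, assuming the common Gaussian-likelihood form AIC = n * ln(SSE / n) + 2k; the name `aic_sketch` and the toy data are illustrative, not part of the package.

```
# Hedged sketch: AIC of an ordinary least squares fit, assuming
# AIC = n * ln(SSE / n) + 2 * k (Gaussian likelihood, constants dropped).
import numpy as np
from sklearn.linear_model import LinearRegression

def aic_sketch(x, y, n, k):
    fit = LinearRegression().fit(x, y)
    sse = np.sum((y - fit.predict(x)) ** 2)  # residual sum of squares
    return n * np.log(sse / n) + 2 * k

# toy usage
rng = np.random.RandomState(0)
x = rng.rand(50, 3)
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
print(aic_sketch(x, y, n=50, k=3))
```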


### BIC

#### Introduction
#### Function

```
bic(x, y, n, k, model = 'linear')
```
**Parameters:**
* **x**: ndarray or scipy.sparse matrix, (n_samples, n_features)
* Predictive variable(s)

* **y**: ndarray, shape (n_samples), or (n_samples, n_targets)
* Target variable(s)

* **n**: int
* Number of observations

* **k**: int
* Number of predictive variable(s) used in the model

* **model**: 'linear' (default) | 'logistic' | 'ridge' | 'lasso' | 'elasticnet'
* Regression method used to fit the model

**Return:**
* BIC score of the model: float
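
Analogously to the AIC sketch above, a minimal sketch under the same Gaussian-likelihood assumption, using BIC = n * ln(SSE / n) + k * ln(n); again illustrative, not the package implementation.

```
# Hedged sketch: BIC of an ordinary least squares fit, assuming
# BIC = n * ln(SSE / n) + k * ln(n).
import numpy as np
from sklearn.linear_model import LinearRegression

def bic_sketch(x, y, n, k):
    fit = LinearRegression().fit(x, y)
    sse = np.sum((y - fit.predict(x)) ** 2)  # residual sum of squares
    return n * np.log(sse / n) + k * np.log(n)
```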

### Mallow's C_p

#### Introduction

Mallow's C_p is named for Colin Lingwood Mallows. It is used to assess the fit of a regression model and to find the best model that uses a subset of the available predictive variables to predict the outcome.

```
C_p = (SSE_p/MSE) - (n - 2p)
```

where:
- SSE_p: residual sum of squares for the subset model containing `p` explanatory variables, counting the intercept
- MSE: mean squared error for the full model (the model containing all `k` explanatory variables of interest)
- n: number of observations
- p: number of explanatory variables in the subset model

#### Function

```
mallow(X, x_subset, y, n, p, k, model = 'linear')
```

**Parameters:**
* **X**: ndarray or scipy.sparse matrix, (n_samples, n_features)
* Predictive variable(s)

* **x_subset**: ndarray or scipy.sparse matrix, (n_samples, n_features)
* Predictive variable(s) in the subset model

* **y**: ndarray, shape (n_samples), or (n_samples, n_targets)
* Target variable(s)

* **n**: int
* Number of observations

* **p**: int
* Number of predictive variable(s) used in the subset model

* **k**: int
* Number of predictive variable(s) used in the model

* **model**: 'linear' (default) | 'logistic' | 'ridge' | 'lasso' | 'elasticnet'
* Regression method used to fit the model

**Return:**
* Mallow's C_p score of the subset model: float
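
A minimal sketch of this computation for linear models, following the formula above; it assumes `k` counts the intercept so that MSE = SSE_full / (n - k), and the name `mallow_sketch` is illustrative only.

```
# Hedged sketch: Mallow's C_p = SSE_p / MSE_full - (n - 2p) for linear models.
import numpy as np
from sklearn.linear_model import LinearRegression

def mallow_sketch(X, x_subset, y, n, p, k):
    full = LinearRegression().fit(X, y)
    sub = LinearRegression().fit(x_subset, y)
    sse_p = np.sum((y - sub.predict(x_subset)) ** 2)         # subset residual sum of squares
    mse_full = np.sum((y - full.predict(X)) ** 2) / (n - k)  # full-model MSE (k counts the intercept)
    return sse_p / mse_full - (n - 2 * p)
```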


### Table of comparison

#### Function

```
comparison_model(model)
```

**Parameters:**
* **model**: str
* Models to compare, separated by `,`

**Return:**
* A table with model names and their scores. Demo:

| Model | AIC | BIC | Mallow's C_p |
|--------|-----|-----|--------------|
| Model1 | 123 | 145 | 156 |
| Model2 | 145 | 134 | 167 |
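
A minimal sketch of how such a table could be assembled with pandas, reusing the demo values above; `comparison_sketch` and its dict input are assumptions for illustration, not the package API.

```
# Hedged sketch: build a comparison table from per-model scores.
import pandas as pd

def comparison_sketch(scores):
    """scores: dict mapping model name -> (AIC, BIC, Mallow's C_p)."""
    rows = [(name, aic, bic, cp) for name, (aic, bic, cp) in scores.items()]
    return pd.DataFrame(rows, columns=["Model", "AIC", "BIC", "Mallow's C_p"])

# usage with the demo values from the table above
print(comparison_sketch({"Model1": (123, 145, 156), "Model2": (145, 134, 167)}))
```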

In the R ecosystem, a related function is `finish_glance` from the broom package:
https://www.rdocumentation.org/packages/broom/versions/0.4.2/topics/finish_glance
