Commit c6904eb

Jake Teo authored and committed
initial commit
0 parents  commit c6904eb

82 files changed

+18446
-0
lines changed

.DS_Store

6 KB
Binary file not shown.

Makefile

+20
@@ -0,0 +1,20 @@
1+
# Minimal makefile for Sphinx documentation
2+
#
3+
4+
# You can set these variables from the command line.
5+
SPHINXOPTS =
6+
SPHINXBUILD = sphinx-build
7+
SPHINXPROJ = DataScience
8+
SOURCEDIR = .
9+
BUILDDIR = _build
10+
11+
# Put it first so that "make" without argument is like "make help".
12+
help:
13+
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
14+
15+
.PHONY: help Makefile
16+
17+
# Catch-all target: route all unknown targets to Sphinx using the new
18+
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
19+
%: Makefile
20+
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

_build/.DS_Store

6 KB
Binary file not shown.

_build/doctrees/association.doctree

4.28 KB
Binary file not shown.

_build/doctrees/decomposition.doctree

2.43 KB
Binary file not shown.

_build/doctrees/difference.doctree

4.49 KB
Binary file not shown.

_build/doctrees/environment.pickle

19.7 KB
Binary file not shown.

_build/doctrees/forecasting.doctree

2.38 KB
Binary file not shown.

_build/doctrees/general.doctree

20.6 KB
Binary file not shown.

_build/doctrees/index.doctree

3.86 KB
Binary file not shown.

_build/doctrees/supervised.doctree

5.99 KB
Binary file not shown.

_build/doctrees/unsupervised.doctree

2.85 KB
Binary file not shown.

_build/html/.buildinfo

+4
@@ -0,0 +1,4 @@
1+
# Sphinx build info version 1
2+
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3+
config: 05ff58d1be54252f0a6748edeeab048c
4+
tags: 645f666f9bcd5a90fca523b33c5a78b7

_build/html/.nojekyll

Whitespace-only changes.

_build/html/_images/bias-variance.png

531 KB
+16
@@ -0,0 +1,16 @@
1+
Tests of Association
2+
=====================
3+
4+
Pearson's Correlation
5+
---------------------
6+
7+
X, Explanatory: ``Continuous``
8+
Y, Response: ``Continuous``
9+
Type: ``Parametric``
10+
11+
12+
Spearman's Rank Correlation
13+
---------------------------
14+
X, Explanatory: ``Continuous``
15+
Y, Response: ``Continuous``
16+
Type: ``Non-Parametric``
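
The two coefficients can be compared on the same data. The sketch below assumes SciPy is installed and uses made-up values where ``y`` rises monotonically but not linearly with ``x``; the data and variable names are illustrative only. Pearson's correlation measures linear association, while Spearman's works on ranks and so only needs a monotonic relationship.

```python
from scipy import stats

# Illustrative data (an assumption, not from the notes):
# y increases monotonically with x, but along a curve rather than a line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [v ** 3 for v in x]

# Pearson: strength of the *linear* relationship between x and y.
pearson_r, pearson_p = stats.pearsonr(x, y)

# Spearman: Pearson's correlation computed on the ranks of x and y.
spearman_rho, spearman_p = stats.spearmanr(x, y)

print("Pearson r:    {:.3f}".format(pearson_r))
print("Spearman rho: {:.3f}".format(spearman_rho))
```

Because the ranks of ``x`` and ``y`` agree perfectly here, Spearman's coefficient is 1, while Pearson's falls below 1 since the points do not lie on a straight line.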
@@ -0,0 +1,2 @@
1+
Time Series Decomposition
2+
=========================
+20
@@ -0,0 +1,20 @@
1+
Tests of Difference
2+
===================
3+
4+
Chi-Square Test
5+
---------------
6+
X, Explanatory: ``Categorical``
7+
Y, Response: ``Categorical``
8+
Type: ``Non-Parametric``
9+
10+
11+
Student's T-Test
12+
----------------
13+
Type: ``Parametric``
14+
15+
16+
ANOVA
17+
-----
18+
Type: ``Parametric``
19+
20+
Analysis of Variance (ANOVA).
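
A minimal sketch of the three tests above, assuming SciPy is installed. All counts and measurements are made-up illustrative values, and the variable names are not from the original notes.

```python
from scipy import stats

# Chi-square test of independence: categorical X vs categorical Y,
# given as a contingency table of observed counts (illustrative values).
observed = [[30, 10],
            [20, 40]]
chi2, chi2_p, dof, expected = stats.chi2_contingency(observed)

# Student's t-test: compares the means of two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]
group_b = [6.9, 7.1, 6.5, 7.4, 6.8, 7.0]
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# One-way ANOVA: compares the means of three or more groups.
group_c = [8.0, 8.3, 7.9, 8.6, 8.1, 8.4]
f_stat, anova_p = stats.f_oneway(group_a, group_b, group_c)

print("Chi-square: chi2={:.2f}, p={:.4f}, dof={}".format(chi2, chi2_p, dof))
print("T-test:     t={:.2f}, p={:.4f}".format(t_stat, t_p))
print("ANOVA:      F={:.2f}, p={:.4f}".format(f_stat, anova_p))
```

In each case a small p-value (conventionally below 0.05) suggests the observed difference is unlikely under the null hypothesis of no difference.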
+2
@@ -0,0 +1,2 @@
1+
Forecasting
2+
===========

_build/html/_sources/general.rst.txt

+95
@@ -0,0 +1,95 @@
1+
General Notes
2+
=============
3+
4+
Variables
5+
---------
6+
``x`` = independent variable = explanatory = predictor
7+
8+
``y`` = dependent variable = response = target
9+
10+
11+
Data Types
12+
----------
13+
The type of data is essential as it determines what kind of tests can be applied to it.
14+
15+
``Continuous:`` Also known as quantitative. Unlimited number of values
16+
17+
``Categorical:`` Also known as qualitative. Fixed number of values or *categories*
18+
19+
20+
Bias-Variance Tradeoff
21+
-----------------------
22+
The best predictive algorithm is one that has good *Generalization Ability*.
23+
With that, it can give accurate predictions on new, previously unseen data.
24+
25+
*High Bias* results in *Underfitting*. It usually comes from erroneous or overly simple assumptions, which cause the model to be too general.
26+
27+
*High Variance* results in *Overfitting*: the model predicts the training dataset very accurately, but not new, unseen datasets.
28+
This is because it fits even the slightest noise in the training dataset.
29+
30+
The model with the highest accuracy on unseen data is the middle ground between the two.
31+
32+
.. figure:: ./images/bias-variance.png
33+
:scale: 25 %
34+
:align: center
35+
36+
from Andrew Ng's lecture
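
One way to see the tradeoff is to vary model complexity and compare training accuracy with test accuracy. The sketch below assumes scikit-learn and its bundled iris dataset, and uses the ``max_depth`` of a decision tree as the complexity setting. The dataset, depths and ``random_state`` are illustrative choices, not from the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# max_depth=1 is a very simple model (prone to high bias);
# max_depth=None grows the tree fully (prone to high variance).
scores = {}
for depth in (1, 3, None):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    clf.fit(X_train, y_train)
    train_acc = clf.score(X_train, y_train)
    test_acc = clf.score(X_test, y_test)
    scores[depth] = (train_acc, test_acc)
    print("max_depth={}: train={:.3f}, test={:.3f}".format(
        depth, train_acc, test_acc))
```

A depth-1 tree can separate only two of the three classes, so it underfits even the training data. A large gap between training and test accuracy at greater depths would point to overfitting, although on a dataset this small and clean the gap may be slight.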
37+
38+
Steps to Build a Predictive Model
39+
--------------------------------------------
40+
Train Test Split
41+
*****************
42+
Split the dataset into *Train* and *Test* datasets.
43+
By default, sklearn's ``train_test_split`` randomly assigns 75% of the rows to train and 25% to test.
44+
45+
.. code:: Python
46+
47+
train_predictor, test_predictor, train_target, test_target = train_test_split(
48+
predictor, target, test_size=0.25)
49+
50+
Create Model
51+
************
52+
Choose model and set model parameters (if any).
53+
54+
.. code:: Python
55+
56+
clf = DecisionTreeClassifier()
57+
58+
59+
Fit Model
60+
************
61+
Fit the model using the training dataset.
62+
63+
.. code:: Python
64+
65+
model = clf.fit(train_predictor, train_target)
66+
67+
>>> print(model)
68+
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
69+
max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
70+
min_samples_split=2, min_weight_fraction_leaf=0.0,
71+
presort=False, random_state=None, splitter='best')
72+
73+
Test Model
74+
**********
75+
Test the model by predicting the classes of unseen data in the testing dataset.
76+
77+
.. code:: Python
78+
79+
predictions = model.predict(test_predictor)
80+
81+
82+
Score Model
83+
***********
84+
Use a confusion matrix to see which classes are predicted correctly and which are confused with one another.
85+
86+
>>> print(sklearn.metrics.confusion_matrix(test_target, predictions))
87+
[[14 0 0]
88+
[ 0 13 0]
89+
[ 0 1 10]]
90+
91+
Use the accuracy score, as a percentage, to obtain the predictive accuracy.
92+
93+
>>> print('{} %'.format(sklearn.metrics.accuracy_score(test_target, predictions) * 100))
94+
97.3684210526 %
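
The five steps above can be put together in one self-contained script. This is a sketch under assumptions: the notes do not say which dataset was used, so scikit-learn's bundled iris dataset stands in for ``predictor`` and ``target``, and ``random_state=42`` is added only to make the split repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed stand-in data: 150 iris flowers, 4 features, 3 classes.
predictor, target = load_iris(return_X_y=True)

# 1. Train test split (75% train, 25% test).
train_predictor, test_predictor, train_target, test_target = train_test_split(
    predictor, target, test_size=0.25, random_state=42)

# 2. Create model.
clf = DecisionTreeClassifier(random_state=42)

# 3. Fit model on the training dataset.
model = clf.fit(train_predictor, train_target)

# 4. Test model on the unseen testing dataset.
predictions = model.predict(test_predictor)

# 5. Score model.
cm = confusion_matrix(test_target, predictions)
accuracy = accuracy_score(test_target, predictions)
print(cm)
print("{:.1f} %".format(accuracy * 100))
```

Rows of the confusion matrix are the true classes and columns are the predicted classes, so the diagonal counts the correct predictions and the accuracy equals the diagonal sum divided by the total.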
95+

_build/html/_sources/index.rst.txt

+24
@@ -0,0 +1,24 @@
1+
.. Data Science documentation master file, created by
2+
sphinx-quickstart on Tue Jun 27 22:55:47 2017.
3+
You can adapt this file completely to your liking, but it should at least
4+
contain the root `toctree` directive.
5+
6+
Data Science in Python
7+
========================================
8+
This documentation summarises various statistics and machine learning techniques in Python.
9+
10+
11+
12+
13+
.. toctree::
14+
:maxdepth: 2
15+
:caption: Contents
16+
:numbered:
17+
18+
general
19+
difference
20+
association
21+
supervised
22+
unsupervised
23+
decomposition
24+
forecasting
+36
@@ -0,0 +1,36 @@
1+
Supervised Learning
2+
===================
3+
4+
Classification
5+
--------------
6+
7+
K Nearest Neighbours (KNN)
8+
**************************
9+
10+
Decision Tree
11+
**************************
12+
13+
Random Forest
14+
**************************
15+
16+
Logistic Regression
17+
**************************
18+
19+
Support Vector Machine
20+
***********************
21+
22+
23+
Regression
24+
----------
25+
26+
Ordinary Least Squares (OLS) Regression
27+
***************************************
28+
The best-fit line ``ŷ = a + bx`` is drawn using the ordinary least squares method, i.e., ``a`` and ``b`` are chosen to minimise the sum of squared vertical distances (residuals) from each x,y point to the regression line.
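
For a single predictor, the least squares slope and intercept have a closed form: ``b = Sxy / Sxx`` and ``a = mean(y) - b * mean(x)``. The sketch below assumes NumPy and made-up data points. It computes the coefficients directly rather than through a library regression class.

```python
import numpy as np

# Illustrative data (an assumption, not from the notes).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope: covariation of x and y divided by the variation of x.
b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: the fitted line passes through the point of means.
a = y.mean() - b * x.mean()

# Fitted values and the quantity that OLS minimises.
y_hat = a + b * x
sum_squared_residuals = np.sum((y - y_hat) ** 2)

print("a = {:.3f}, b = {:.3f}".format(a, b))
print("sum of squared residuals = {:.4f}".format(sum_squared_residuals))
```

Any other choice of ``a`` and ``b`` would give a larger sum of squared residuals on the same points.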
29+
30+
31+
Ridge Regression
32+
****************
33+
34+
Lasso Regression
35+
****************
36+
@@ -0,0 +1,8 @@
1+
Unsupervised Learning
2+
=====================
3+
4+
Clustering
5+
----------
6+
7+
K-Means
8+
**************************

_build/html/_static/ajax-loader.gif

673 Bytes
