
Commit 2a926e3

Author: Christian Barra (committed)
first commit
0 parents  commit 2a926e3

17 files changed: +1256 -0 lines changed

Diff for: .gitignore

+1
@@ -0,0 +1 @@
.DS_Store

Diff for: Amazon_Lambda.ipynb

+404
Large diffs are not rendered by default.

Diff for: From_0_to_lambda.ipynb

+131
@@ -0,0 +1,131 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from IPython.display import Image"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# From 0 to Lambda\n",
    "# Machine Learning with Cloud systems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hello there!\n",
    "\n",
    "## My name is Christian"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Requirements!!!\n",
    "\n",
    "* You must have Docker installed on your computer\n",
    "* You must have a valid AWS user account\n",
    "* You must have the AWS CLI installed\n",
    "* You must have a computer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Today's workshop covers:\n",
    "\n",
    "* An introduction to Machine Learning\n",
    "* Amazon Web Services (AWS) Lambda\n",
    "* Deploying a Machine Learning model in the Cloud"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# First Part: Amazon Lambda\n",
    "\n",
    "* What is AWS Lambda?\n",
    "* Pros of AWS Lambda\n",
    "* Cons of AWS Lambda\n",
    "* Prices\n",
    "* Deployment packages\n",
    "* Other serverless systems\n",
    "* Create our first Lambda function\n",
    "* Deploy it\n",
    "* Goals of this lecture"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Second Part: NLP with Python\n",
    "\n",
    "* What is NLP?\n",
    "* NLP with Python\n",
    "* Supervised sentiment analysis\n",
    "* Load our data\n",
    "* Vectorize your text\n",
    "* Train a model\n",
    "* Predict\n",
    "* Score\n",
    "* Twitter\n",
    "* Predict sentiment"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Third Part: Putting everything together\n",
    "\n",
    "* Create an AWS Lambda function to download tweets and save them in S3\n",
    "* Create an AWS Lambda function to load the tweets, load the model, and save the result in S3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
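The first part above ends with creating and deploying a first Lambda function. As a reference point, here is a minimal, generic handler sketch; the file name `hello.py` and the `name` event field are placeholders, not taken from the workshop material.

```python
# hello.py -- a minimal AWS Lambda handler (generic placeholder, not the workshop's code)


def handler(event, context):
    """Return a greeting, echoing the optional "name" field of the test event."""
    name = event.get("name", "PyCon PL")
    return {"message": "Hello %s, from AWS Lambda!" % name}
```

Zipped on its own, a file like this can be uploaded with `aws lambda create-function` (or through the console) using `hello.handler` as the handler name, and then tested with `aws lambda invoke`.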

Diff for: Putting_everything_together.ipynb

+162
@@ -0,0 +1,162 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Putting everything together"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Build our deployment package\n",
    "\n",
    "* We need to start an EC2 instance from the Amazon Linux AMI (with at least 2 GB of memory)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "# Connect to your instance\n",
    "\n",
    "sudo yum -y update\n",
    "sudo yum -y install gcc-c++ python27-devel atlas-sse3-devel lapack-devel\n",
    "\n",
    "sudo /usr/local/bin/pip install -U pip\n",
    "sudo /usr/local/bin/pip install tweepy numpy scipy scikit-learn\n",
    "\n",
    "cd /usr/local/lib/python2.7/site-packages/\n",
    "zip -r9 ~/MyLambdaPackage.zip *\n",
    "cd /usr/local/lib64/python2.7/site-packages/\n",
    "zip -r9 ~/MyLambdaPackage.zip *\n",
    "\n",
    "# You can download your package now and put it on S3\n",
    "# And then strip everything....."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "https://github.com/ryansb/sklearn-build-lambda"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1) Create predict.py\n",
    "\n",
    "This function is triggered by an S3 event: it loads the tweets from the uploaded file, predicts their sentiment with the pickled model, and saves the result as a JSON file back to S3 (a sketch follows after this diff)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2) Create reader.py\n",
    "\n",
    "This function is called every minute by AWS CloudWatch Events and saves a list of 50 tweets to S3 (a sketch follows after this diff)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3) Create the buckets\n",
    "You will need 3 buckets:\n",
    " * twitter.reader\n",
    " * twitter.sentiment\n",
    " * twitter.deploypack"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4) Create the deployment package\n",
    "\n",
    "* cp deployment_pack.zip deployment_pack_with_code.zip\n",
    "\n",
    "* zip -9r deployment_pack_with_code.zip reader.py predict.py models/clf.pkl models/vocabulary.pkl"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5) Upload everything\n",
    "* Upload the deployment package to twitter.deploypack\n",
    "* Create a Lambda function (call it \"predict\") with S3 as the trigger, using twitter.reader as the bucket to watch\n",
    "* Create a Lambda function (call it \"reader\") triggered by a CloudWatch Events schedule\n",
    "* For both functions set 256 MB of memory and a 60-second timeout\n",
    "* Watch your twitter.sentiment bucket!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Error using pickle with the vectorizer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Probably related to pickling the sparse matrix: a likely cause is a version mismatch between the scipy used to pickle the vectorizer and the scipy bundled in the deployment package."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "format not found: AttributeError\n",
    "Traceback (most recent call last):\n",
    "File \"/var/task/predict.py\", line 32, in handler\n",
    "tweets_vectors = vectorizer.transform([\"Hillary doesn't got what it takes...Trump neither but Hillary my God! https://t.co/7OOeEO4Epi\"])\n",
    "File \"/var/task/sklearn/feature_extraction/text.py\", line 1334, in transform\n",
    "return self._tfidf.transform(X, copy=False)\n",
    "File \"/var/task/sklearn/feature_extraction/text.py\", line 1037, in transform\n",
    "X = X * self._idf_diag\n",
    "File \"/var/task/scipy/sparse/base.py\", line 319, in __mul__\n",
    "return self._mul_sparse_matrix(other)\n",
    "File \"/var/task/scipy/sparse/compressed.py\", line 478, in _mul_sparse_matrix\n",
    "other = self.__class__(other) # convert to this format\n",
    "File \"/var/task/scipy/sparse/compressed.py\", line 28, in __init__\n",
    "if arg1.format == self.format and copy:\n",
    "File \"/var/task/scipy/sparse/base.py\", line 525, in __getattr__\n",
    "raise AttributeError(attr + \" not found\")\n",
    "AttributeError: format not found"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
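Step 1 above describes `predict.py` only in prose. The sketch below shows one possible shape for it, assuming the pickled classifier and vectorizer are shipped inside the package under `models/` (as in step 4), that `models/vocabulary.pkl` unpickles to the fitted vectorizer, and that each file uploaded to `twitter.reader` is a JSON list of tweet texts; those assumptions and the output key naming are illustrative, not taken from the workshop code.

```python
# predict.py -- hypothetical sketch of the S3-triggered sentiment function
import json
import pickle

import boto3

s3 = boto3.client("s3")

# Model artifacts shipped inside the deployment package (see step 4).
with open("models/clf.pkl", "rb") as f:
    clf = pickle.load(f)
with open("models/vocabulary.pkl", "rb") as f:
    vectorizer = pickle.load(f)  # assumed to be the fitted vectorizer

RESULT_BUCKET = "twitter.sentiment"


def handler(event, context):
    # The S3 trigger passes the bucket and key of the newly uploaded tweet file.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    tweets = json.loads(body)  # assumed format: a JSON list of tweet texts

    # Vectorize the tweets and predict a sentiment label for each one.
    predictions = clf.predict(vectorizer.transform(tweets))
    result = [{"tweet": t, "sentiment": str(p)} for t, p in zip(tweets, predictions)]

    # Write the predictions to the result bucket under the same key.
    s3.put_object(Bucket=RESULT_BUCKET, Key=key, Body=json.dumps(result))
    return {"processed": len(result)}
```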
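Step 2's `reader.py` is likewise only described. Here is a sketch under similar caveats: the Twitter credentials, the search query, and the key naming are placeholders, and it uses tweepy's classic `api.search` call as available in the tweepy releases of that era.

```python
# reader.py -- hypothetical sketch of the CloudWatch-triggered tweet reader
import json
import time

import boto3
import tweepy

# Placeholder credentials -- replace with your own Twitter API keys.
CONSUMER_KEY = "..."
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_TOKEN_SECRET = "..."

BUCKET = "twitter.reader"  # the bucket watched by the "predict" function
QUERY = "python"           # assumed search term, not from the workshop

s3 = boto3.client("s3")


def handler(event, context):
    # Authenticate against the Twitter API.
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    api = tweepy.API(auth)

    # Fetch 50 tweets and keep only their text.
    tweets = [status.text for status in api.search(q=QUERY, count=50)]

    # Save them as a JSON list; uploading to this bucket triggers "predict".
    key = "tweets_%d.json" % int(time.time())
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(tweets))
    return {"saved": key, "count": len(tweets)}
```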

Diff for: README.md

+1
@@ -0,0 +1 @@
From 0 to Lambda - PyCon PL 2016 Workshop
