Commit d93b02b: hashiconf 2017
1 parent 783a744

21 files changed, +435 -0 lines changed

Diff for: 2017_hashiconf/Dockerfile

+40
@@ -0,0 +1,40 @@
FROM centos:7

# Install the RPM to get the GPG key securely, then overwrite the epel repo config
RUN rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

RUN yum -y install \
    ImageMagick \
    git \
    graphviz \
    java-1.8.0-openjdk-headless \
    make \
    psmisc \
    supervisor \
    wget \
    && yum clean all

ENV goversion 1.9
ENV gofile go${goversion}.linux-amd64.tar.gz
ENV gourl https://storage.googleapis.com/golang/${gofile}
ENV GOPATH /go

RUN wget -O /root/plantuml.jar 'https://downloads.sourceforge.net/project/plantuml/plantuml.jar'

RUN wget -q -O /usr/local/${gofile} ${gourl} \
    && mkdir /usr/local/go \
    && tar -xzf /usr/local/${gofile} -C /usr/local/go --strip 1

# supervisor configs
ADD supervisord.conf /etc/supervisord.conf
ADD *.ini /etc/supervisord.d/
ADD run.sh /
RUN chmod 0755 /run.sh

EXPOSE 3999
ENV PATH /usr/local/go/bin:${GOPATH}/bin:${PATH}

RUN go get golang.org/x/tools/cmd/present
RUN go get github.com/jmhodges/justrun

CMD /run.sh

Diff for: 2017_hashiconf/Makefile

+34
@@ -0,0 +1,34 @@
.PHONY: graphs
docker_image=2017_hashiconf
container_name=hashiconf
pwd=$(shell pwd)

container:
	docker build --no-cache -t $(docker_image) ./

present: graphs
	$(GOPATH)/bin/present -http 0.0.0.0:3999 -notes ./hashiconf.slide

down:
	@docker rm -f $(container_name) 2>/dev/null || true

# Brings up the dev instance of mf2.
up: down
	@chmod 0755 run.sh
	docker run -d \
		-v $(pwd):/mnt/build \
		-v $(pwd)/run.sh:/run.sh \
		-p 3999:3999 \
		--name $(container_name) \
		$(docker_image)
	docker logs -f $(container_name)

console:
	docker exec -t -i $(container_name) /bin/bash

graphs:
	mkdir -p images/{graphs,sequences,diagrams}
	java -jar ~/plantuml.jar -o $(pwd)/images/diagrams/ graphs/original.uml
	java -jar ~/plantuml.jar -o $(pwd)/images/diagrams/ graphs/original_singlenode.uml
	java -jar ~/plantuml.jar -o $(pwd)/images/diagrams/ graphs/with_nomad.uml
	convert images/diagrams/original_singlenode.png images/diagrams/original_singlenode.png images/diagrams/original_singlenode.png -append images/diagrams/multi_customers.png

Diff for: 2017_hashiconf/README.md

+3
@@ -0,0 +1,3 @@
A presentation I gave at HashiConf 2017.

The slides can be viewed at: http://go-talks.appspot.com/github.com/jasonhancock/presentations/2017_hashiconf/hashiconf.slide

Diff for: 2017_hashiconf/graphs/original.uml

+25
@@ -0,0 +1,25 @@
@startuml
actor user
node API
node s3
node backend1 [
backend worker
]
node persist [
persistence
]
database database
user->API : process s3 bucket
queue stage [
files to process
]
queue results
API->stage
API-up->s3
API-down->database
s3->API
stage->backend1
backend1->results
results->persist
persist->API
@enduml

Diff for: 2017_hashiconf/graphs/original_singlenode.uml

+31
@@ -0,0 +1,31 @@
@startuml
rectangle "API Server" {
rectangle API [
API services + DB
]
queue queues
}

rectangle "Backend Server 1" {
rectangle backend1 [
backend worker
]
}

rectangle "Backend Server 2" {
rectangle backend2 [
backend worker
]
}

rectangle "Backend Server N" {
rectangle backendn [
backend worker
]
}

queues<->backend1
queues<->backend2
queues<->backendn

@enduml

Diff for: 2017_hashiconf/graphs/with_nomad.uml

+23
@@ -0,0 +1,23 @@
@startuml
actor user
node API
node s3
node scheduler
node nomad
node backend1 [
backend worker
]
database database
user->API : process s3 bucket
queue stage [
files to process
]
API->stage
API-up->s3
API-down->database
s3->API
stage->scheduler
scheduler->nomad
nomad->backend1
backend1->API
@enduml

Diff for: 2017_hashiconf/hashiconf.slide

+200
@@ -0,0 +1,200 @@
Backend Batch Processing With Nomad
http://bit.ly/jhancock_nomad
19 Sep 2017

Jason Hancock
Operations Engineer, GrayMeta
https://jasonhancock.com
@jsnby

* Ops Engineer at GrayMeta

.image images/GrayMeta.png
.caption [[https://www.graymeta.com][graymeta.com]]

* What we do...

We index large filesystems, analyze content, extract metadata, etc.

* Our Workload Composition

Lots of backend jobs, essentially one job per file.

Some jobs are small and fast; others are large and can take multiple hours to complete.

: example: harvesting a 20kB jpeg vs. a 3-hour 4K-resolution video

* Our original architecture

.image images/diagrams/original.png

: A user tells our API to process some storage, like an s3 bucket.
: We walk that filesystem, queuing up jobs to process.
: Our backend worker reads from the queue, processes jobs, and pushes results into the results queue.
: A persistence process reads jobs from the results queue and pushes them into the database through our API.

* Scaling the original architecture

The original architecture scales horizontally.

.image images/diagrams/original_singlenode.png

* Things were good, but we had operational issues

Some customers have really large files, and our scheduling algorithm was "dump it into a queue and hope for the best."

: A job could get launched on a box that didn't have enough disk, and it would just fail.

* To counter our lack of intelligent bin-packing

We over-provisioned all our nodes on disk.

.image images/diagrams/overprovisioned.png

* Our CFO wasn't pleased

.image images/money_in_wallet.jpg
.caption [[https://www.flickr.com/photos/68751915@N05/6722569541/][Money In Wallet]] by [[http://401kcalculator.org/][401kcalculator.org]], CC BY-SA 2.0

* Other problems: security

We didn't have a defense against processing malicious content that exploited 3rd-party vulnerabilities like [[https://imagetragick.com/][ImageTragick]].

Our processing nodes had too much sensitive configuration data and could access the storage directly.

* Other problems: maintenance

- We could add nodes to the cluster easily
- We didn't have a way to enable "drain" mode to pull machines out of rotation without killing any running jobs

* Other problems: multi-tenancy

Our architecture solved the problem for one customer. We scaled by adding dedicated resources for each customer:

.image images/diagrams/multi_customers.png

We were now scaling multiple compute clusters up and down on demand.

* Other problems: multi-tenancy, idle machines

Each customer required at least one worker, but oftentimes those workers sat idle.

We needed a way to share these backend processing nodes across multiple customers.

: needed a way to scale the processing cluster up or down based on overall customer demand (1 cluster to manage instead of 30 clusters to manage)

* Coming around to a solution

- Wanted to do the processing in a container with limited privileges/credentials
- Wanted to solve as many of the problems/pain points of our current architecture as possible
- Wanted to avoid writing our own scheduling algorithms
- Try to keep it simple

* Started shopping for a container orchestrator

.image images/shopping_cart.jpg
.caption [[https://www.flickr.com/photos/drb62/4273903651/][Shopping Cart]] by Daniel R. Blume, CC BY-SA 2.0

* Requirements

- Easy to operate
- Easy-to-use API
- Can schedule jobs based on CPU, memory, and disk resources

: easy to operate...we were going to be deploying this everywhere (cloud, on-prem, etc.) and operational ease was a primary concern. We couldn't waltz into a new customer's infrastructure and expect them to deploy and manage a complicated solution
: easy to use API - wanted to make the development experience as easy as possible
: ability to schedule based on several dimensions; most important to us at the time was disk space

* Nomad to the Rescue

.image images/nomad.svg _ 400

* What Nomad solved for us out-of-the-box

- Easy to deploy/operate
- Maintenance (removing/draining nodes)
- Bin-packing
.image images/diagrams/heterogeneous_nodes.png

* Easy to use API

.code launch.go /START 1OMIT/,/END 1OMIT/

: Pretty easy to define a job using the Go client

* Launch the job

.code launch.go /START 2OMIT/,/END 2OMIT/

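The launch.go file that the .code directives above pull from is not included in this diff. The sketch below is only a rough idea of what defining and registering a batch job through the Nomad Go API client (github.com/hashicorp/nomad/api) can look like; the job ID, image name, and resource figures are illustrative placeholders rather than values from the talk, and field shapes can differ slightly between client versions.

    package main

    import (
        "log"

        "github.com/hashicorp/nomad/api"
    )

    func intPtr(i int) *int { return &i }

    func main() {
        // Connect to the local Nomad agent (honors NOMAD_ADDR if set).
        client, err := api.NewClient(api.DefaultConfig())
        if err != nil {
            log.Fatal(err)
        }

        // Describe one docker task that runs the worker; image name and
        // resource numbers below are placeholders, not values from the talk.
        task := api.NewTask("worker", "docker")
        task.SetConfig("image", "example/worker:latest")
        task.Require(&api.Resources{
            CPU:      intPtr(500), // MHz
            MemoryMB: intPtr(256),
        })

        group := api.NewTaskGroup("workers", 1)
        group.AddTask(task)

        // A batch (run-to-completion) job, one per file to process.
        job := api.NewBatchJob("process-file", "process-file", "global", 50)
        job.AddTaskGroup(group)

        // Submit the job; Nomad bin-packs it onto a node with enough resources.
        if _, _, err := client.Jobs().Register(job, nil); err != nil {
            log.Fatal(err)
        }
    }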
* Nomad written in Go

If the docs weren't clear, there was a full client implementation (the Nomad CLI) to borrow from.

* Re-architecting to use Nomad

.image images/diagrams/with_nomad.png

We decided that jobs running in containers wouldn't touch the source storage directly.

* Accessing the API

- Container is given a narrowly scoped OAuth token (scoped to a single source file)
- Allows the job in the container to download the file and store the results for only that file

* Multi-tenancy

We inject configuration into the container via environment variables:

- URLs of API endpoints
- Auth token

Doing this allows us to configure jobs for any customer on a shared Nomad cluster.

: use the same container image for all customers

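A hedged sketch of that environment-variable injection using the same Nomad Go client; the helper name and the variable names (API_URL, AUTH_TOKEN) are assumptions for illustration, not taken from the slides.

    package main

    import "github.com/hashicorp/nomad/api"

    // configureTask attaches per-customer settings to the job rather than to
    // the node, so one shared Nomad cluster and one container image can serve
    // every customer. The variable names here are hypothetical.
    func configureTask(task *api.Task, apiURL, token string) {
        task.Env = map[string]string{
            "API_URL":    apiURL, // endpoint the worker reports results to
            "AUTH_TOKEN": token,  // narrowly scoped OAuth token for a single source file
        }
    }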

* Solved a lot of our problems

- Security (never a solved problem) - better isolation
- Maintenance
- Reduced operational overhead (managing one cluster vs. multiple single-tenant clusters)
- Heterogeneous nodes, more realistically provisioned
- Better utilization of resources

: We built an "unplaceable job" alert. We monitor for any jobs in Nomad's "pending" state that can't ever be placed due to the resources requested by the job vs. the resources available in the cluster.

* Flexibility - Other Container Schedulers?

- Launch jobs in AWS ECS
- Launch jobs in k8s?

: Our code's written in Go.
: We built our scheduler as an interface, so there's a Nomad implementation, an ECS implementation, and we'll likely build a k8s implementation next.
: allows us to swap implementations based on the customer's needs

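The speaker notes above describe the scheduler as a Go interface with one implementation per backend. A minimal sketch of that shape follows; the interface, method, and Job type names are assumptions, not code from the talk.

    package scheduler

    import "context"

    // Job is a hypothetical description of one unit of work: the object to
    // process plus the per-customer configuration to inject into the task.
    type Job struct {
        File string
        Env  map[string]string
    }

    // Scheduler abstracts "run this batch job somewhere". Concrete types would
    // wrap the Nomad API client, the AWS ECS SDK, and eventually a k8s client.
    type Scheduler interface {
        Launch(ctx context.Context, job Job) (id string, err error)
    }

Picking a backend for a given customer then only means constructing a different concrete type behind the same interface.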

* Flexibility - dev environment

- Big advocate of local development environments
- Full stack
- As close to production as possible

* Dev env

- We use a monolithic Docker image that contains our full stack
- Runs the same versions of Nomad and Consul as production
- We use the Nomad "raw_exec" driver

: unless we're testing a new version of Nomad and/or Consul

* Why raw_exec?

- Avoids having to do something creative like Docker-in-Docker
- Avoids having to bundle our processing binary into a container image every time it's recompiled
- Allows rapid iteration without faking anything

: confidence

* Summary - Moving our batch processing to Nomad...

- Required architectural changes that unlocked additional flexibility in our platform
- Saved us from having to implement a scheduler
- Helped us better utilize compute resources

Diff for: 2017_hashiconf/images/GrayMeta.png
25.4 KB
20.8 KB

Diff for: 2017_hashiconf/images/diagrams/multi_customers.png
41.8 KB

Diff for: 2017_hashiconf/images/diagrams/original.png
18.4 KB
11.7 KB

Diff for: 2017_hashiconf/images/diagrams/overprovisioned.png
19 KB

Diff for: 2017_hashiconf/images/diagrams/with_nomad.png
18.2 KB

Diff for: 2017_hashiconf/images/money_in_wallet.jpg
221 KB

Diff for: 2017_hashiconf/images/nomad.svg
+1

Diff for: 2017_hashiconf/images/shopping_cart.jpg
178 KB
