This hands-on session is based on materials from the Analyzing Next-Generation Sequencing (ANGUS) course.
- run a simple Ubuntu-based container
# host
docker run ubuntu:14.04
- list Docker containers (running, then all) and images
# host
docker ps
docker ps -a
docker images
- the container was created, but it had nothing to do, so it shut down
- we can attach to the container (like ssh to a remote machine)
  -i keep STDIN open
  -t allocate pseudo-tty
# host
docker run -it ubuntu:14.04
- use a second terminal window to list containers
# host
docker ps -a
- exit with
# in the container
exit
- if you run the same command again, a new container is created from the ubuntu base image
- make a new container, create a file and exit. Restart the container (docker start [container ID], docker attach [container ID]). Are your changes still there?
- you have to delete containers by hand; they stack up very quickly
- you can use docker run with the --rm flag to delete the container once it exits
# host
docker run --rm ubuntu:14.04
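Containers started without --rm stay around after they exit. One way to clean them up in bulk is sketched below. The helper name remove_exited is hypothetical; it only wraps docker ps (with the -a, -q and -f status=exited options) and docker rm.

```shell
# host
# Sketch: remove every container whose status is "exited".
# remove_exited is a hypothetical helper name, not a Docker command.
remove_exited() {
    # -a: include stopped containers, -q: print IDs only,
    # -f status=exited: keep only containers that have exited
    exited_ids=$(docker ps -a -q -f status=exited)
    if [ -z "$exited_ids" ]; then
        echo "no exited containers"
        return 0
    fi
    # $exited_ids is left unquoted on purpose so that each ID
    # becomes a separate argument to docker rm
    docker rm $exited_ids
}
```

Call remove_exited on the host whenever docker ps -a gets crowded. Running containers are not touched, because the filter only selects exited ones.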
- we will build a Docker image for the MEGAHIT short read assembler (http://angus.readthedocs.org/en/2015/assembling-ecoli.html)
- start a new container
# host
docker run -it ubuntu:14.04
- install necessary dependencies (remember, you're already root)
# in the container
apt-get update && apt-get install -y g++ make git zlib1g-dev python
- clone and build megahit
# in the container
git clone https://github.com/voutcn/megahit.git /home/megahit
cd /home/megahit && make
- we don't want to repeat these steps, so we save the container as an image (use your own container ID from docker ps -a in place of e82c6007f7a4)
# host
docker commit -m "build megahit" e82c6007f7a4 megahit
docker images
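The container ID passed to docker commit differs on every machine. As a rough sketch, the ID of the most recently created container can be looked up with docker ps -l -q instead of being copied by hand. The helper name commit_latest is hypothetical, and the sketch assumes that the megahit build container is the last one you created.

```shell
# host
# Sketch: commit the most recently created container as an image.
# commit_latest is a hypothetical helper; its first argument is the image name.
commit_latest() {
    image_name=$1
    # -l: show only the latest created container, -q: print its ID only
    container_id=$(docker ps -l -q)
    if [ -z "$container_id" ]; then
        echo "no container found" >&2
        return 1
    fi
    docker commit -m "build $image_name" "$container_id" "$image_name"
}
```

For this tutorial the call would be commit_latest megahit. If you started another container in the meantime, pass the ID to docker commit by hand as shown above.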
- we can now run it and use megahit
# host
docker run -it megahit
# in the container
/home/megahit/megahit
- later we'll put it on Docker Hub so that no one ever has to do it again
- how do we get the data for analysis to the container?
- get data locally
# host
mkdir $HOME/data
cd $HOME/data
curl -O http://public.ged.msu.edu.s3.amazonaws.com/ecoli_ref-5m-trim.se.fq.gz
curl -O http://public.ged.msu.edu.s3.amazonaws.com/ecoli_ref-5m-trim.pe.fq.gz
- run a container and mount the local data directory at /data
# host
docker run -v $HOME/data:/data -it megahit
# in the container
ls /data
- let's run the assembly
# in the container
/home/megahit/megahit --12 /data/*.pe.fq.gz \
-r /data/*.se.fq.gz \
-o /data/ecoli -t 4
- exit and look at analysis data
# in the container
exit
# host
ls $HOME/data
ls $HOME/data/ecoli
- we can run the megahit command without entering the container like this (first remove the previous output with rm -rf $HOME/data/ecoli, so that megahit can create its output directory afresh)
# host
docker run -v $HOME/data:/data \
-it megahit \
sh -c '/home/megahit/megahit --12 /data/*.pe.fq.gz \
-r /data/*.se.fq.gz \
-o /data/ecoli -t 4'
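- we could also put the command in a script (on the host or in the container) and run the script; the Dockerfile further down expects it at /data/do-assemble.sh, so save it as do-assemble.sh in $HOME/data on the host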
#! /bin/bash
rm -fr /data/ecoli
/home/megahit/megahit --12 /data/*.pe.fq.gz \
-r /data/*.se.fq.gz \
-o /data/ecoli -t 4
# host
chmod +x do-assemble.sh
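The script above fails late if the read files are not where it expects them. A slightly more defensive variant is sketched below as a shell function. The function name assemble and its two optional arguments (data directory and megahit path) are hypothetical additions; the defaults are the paths used in this tutorial.

```shell
# in the container
# Sketch: same assembly as do-assemble.sh, but check the inputs first.
# assemble, data_dir and megahit_bin are hypothetical names.
assemble() {
    data_dir=${1:-/data}
    megahit_bin=${2:-/home/megahit/megahit}
    # an unmatched glob stays a literal string, so the -e test catches it
    for reads in "$data_dir"/*.pe.fq.gz "$data_dir"/*.se.fq.gz; do
        if [ ! -e "$reads" ]; then
            echo "missing input: $reads" >&2
            return 1
        fi
    done
    # remove output from a previous run before starting
    rm -rf "$data_dir/ecoli"
    "$megahit_bin" --12 "$data_dir"/*.pe.fq.gz \
        -r "$data_dir"/*.se.fq.gz \
        -o "$data_dir/ecoli" -t 4
}
```

Calling assemble with no arguments behaves like the script above; passing a different directory makes it easy to try the same steps on another data set.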
- create a Dockerfile
# host
FROM ubuntu:14.04
# update and install in one layer so that a cached, stale package index is never reused
RUN apt-get update && apt-get install -y g++ make git zlib1g-dev python
RUN git clone https://github.com/voutcn/megahit.git /home/megahit
RUN cd /home/megahit && make
CMD /data/do-assemble.sh
- we will now build an image based on the Dockerfile
# host
docker build -t megahit .
- and run a container
# host
docker run -v $HOME/data/:/data -it megahit
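Because the image's CMD points at /data/do-assemble.sh, the run only works if that script sits in the mounted directory and is executable. A minimal sketch of a wrapper that checks this first and cleans up the container afterwards with --rm follows. The helper name run_assembly and its optional data-directory argument are hypothetical.

```shell
# host
# Sketch: run the megahit image against a data directory.
# run_assembly is a hypothetical helper; the default directory is $HOME/data.
run_assembly() {
    data_dir=${1:-$HOME/data}
    # the image's CMD is /data/do-assemble.sh, so the script must be mounted
    if [ ! -x "$data_dir/do-assemble.sh" ]; then
        echo "do-assemble.sh is missing or not executable in $data_dir" >&2
        return 1
    fi
    # --rm deletes the container once the assembly has finished
    docker run --rm -v "$data_dir":/data megahit
}
```

The -it flags are left out here because the assembly needs no keyboard input; add them back if you want to interrupt the run with Ctrl-C.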