Skip to content

Latest commit

 

History

History
172 lines (121 loc) · 5.61 KB

6.cellranger.md

File metadata and controls

172 lines (121 loc) · 5.61 KB

Building and running a Cellranger container

<< Episode 5 | TOC | Episode 7 >>


Objectives

This episode will introduce Dockerfiles.


Dockerfiles

A Dockerfile is essentially a recipe for creating a container. In cases where you can't find a pre-built container with the software you need, you will need to create your own. Just about any command you would normally use to install software is a valid command in a Dockerfile. The only difference is that it needs to be prefaced with a Docker instruction. The general format looks like this:

# Comment
INSTRUCTION arguments

There are lot of INSTRUCTIONS available from Docker (see their documentation), but we will only focus on a few.

The very fist instruction you need to use is FROM:

FROM ubuntu:latest

FROM centos:7

FROM rocker/tidyverse:3.5

FROM bskjerven/cellranger

You can only use one FROM command in a Dockerfile...I'm just showing several examples here. The FROM command tells Docker what base image to use. You can use standard Linux distributions (like Ubuntu and CentOS) and install everything you need. There are also common, prebuilt containers for popular applications, like a tidyverse RStudio.

DockerHub is the most popular source for images, but there are other repositories out there, especially for bioinformatics software. More repos are listed at the end of this example.

The most common instruction you'll use is RUN; this will execute whatever command follows. For example:

RUN apt-get install python

RUN conda install matplotlib

RUN RScript -e "source('http://bioconductor.org/biocLite.R')" \
   -e "biocLite('DropletUtils')" \
   -e "biocLite('GSEABase')" \
   -e "biocLite('GSVA')"

Note that in the last example, I installed several R packages in a single line. The reason for this is that each RUN command generates a layer in your final image. It's good practice to try and minimise the number of layers in your image to reduce the size and complexity of it.


Building Cellranger

In the episode6_cellranger_build directory is a Dockerfile for building Cellranger and bcl2fastq. It's broken up into several sections:

  • Install build tools with apt (git, wget, unzip, etc.)
  • Download and install Cellranger
  • Download and install bcl2fastq

There are some other Docker instructions:

  • WORKDIR automatically creates a directory (and cds to it) where we can download temporary files
  • ARG allows a user to pass in options for building (e.g. package version numbers)
  • ENV lets us set environment variables for when the container runs

Before we build the image, make sure you're in the same directory as the Dockerfile:

cd bio-workshop-18/episode6_cellranger_build

To build the image, we simply run:

docker build -t bskjerven/cellranger .

Sending build context to Docker daemon  3.584kB
Step 1/11 : FROM centos:7
 ---> 5182e96772bf
Step 2/11 : RUN yum -y update     && yum -y install          file          git          sssd-client          which          wget          unzip     && yum clean all
 ---> Running in 246cae4fe147
Loaded plugins: fastestmirror, ovl
Determining fastest mirrors

....

After a few minutes you should have a built Cellranger image:

docker images bskjerven/cellranger
REPOSITORY             TAG                 IMAGE ID            CREATED             SIZE
bskjerven/cellranger   latest              1c4b8ff2526c        2 hours ago         1.99GB

Running our Cellranger container

We can now run our Cellranger container and test it. We'll run the mkfastq and count commands with some sample data. You can refer to the scripts in the directoy if you like, but here are the full Docker commands:

docker run \
	--rm \
   -v $(pwd):/data -w /data \
   -p 8080:8080 \
   bskjerven/cellranger cellranger mkfastq \
   --id=tiny-bc \
   --run=cellranger-tiny-bcl-1.2.0 \
   --csv=cellranger-tiny-bcl-samplesheet-1.2.0.csv \
   --uiport=8080

Note that the Docker run options have just been split up over mulitple lines with \ to improve readability.

And we should see some output:

/opt/cellranger-2.2.0/cellranger-cs/2.2.0/bin
cellranger mkfastq (2.2.0)
Copyright (c) 2018 10x Genomics, Inc.  All rights reserved.
-------------------------------------------------------------------------------

Martian Runtime - '2.2.0-v2.3.3'
Serving UI at http://3d1a670df38b:46839?auth=g5hMEGOSBEAqDo2S3_qgEFXFw8X1-6ykaAnVkDFN8lY

Running preflight checks (please wait)...
Checking run folder...
Checking RunInfo.xml...
Checking system environment...
Checking barcode whitelist...
Checking read specification...
...

Once that's finished, we can also run a count:

docker run \
	--rm \
	-v $(pwd):/data -w /data \
	-p 8080:8080 \
	bskjerven/cellranger cellranger count \
	--id=test \
	--transcriptome=refdata-cellranger-ercc92-1.2.0/ \
	--fastqs=tiny-bcl/outs/fastq_path \
	--uiport=8080

Conclusion

Dockerfiles are the recipes that permit to build containers from scratch. They have been used to build the containers you pull from Web repositories. If you cannot find a container with the package you need online, you can build your own by writing a new Dockerfile.


Best practices


<< Episode 5 | TOC | Episode 7 >>