Development: docker (and podman) #356

Open
wants to merge 51 commits into
base: master
Changes from 3 commits
697aae4
docker: docker build
falkmielke Jan 28, 2025
cc71aaf
docker: extend rocker/rstudio for inbo
falkmielke Jan 28, 2025
f5adca7
docker: tested/run modified rstudio rocker
falkmielke Jan 28, 2025
2a5c0fc
docker: podman
falkmielke Jan 29, 2025
8bf932d
docker: rename folder
falkmielke Jan 29, 2025
db7d15a
docker: the windows experience
falkmielke Jan 29, 2025
346f6de
docker: screenshots and wrapup V1
falkmielke Jan 29, 2025
e349246
docker: minor adjustments and additions
falkmielke Jan 29, 2025
2297c10
docker: installing private packages
falkmielke Jan 30, 2025
bcacce9
docker: minor adjustments
falkmielke Feb 4, 2025
1e1b862
docker: render quarto -> md
falkmielke Feb 4, 2025
0767adf
docker: image references to static
falkmielke Feb 4, 2025
e09a382
docker: pasting in CSS for callout boxes
falkmielke Feb 4, 2025
3cd97b3
docker: hugo shortcode for callouts
falkmielke Feb 4, 2025
1367a8d
docker (wip): review comments by @florisvdh
falkmielke Feb 10, 2025
2ec9d36
docker: review comments incorporated
falkmielke Feb 11, 2025
4a94915
docker: review changes to markdown
falkmielke Feb 11, 2025
581ef3a
docker: html anchors
falkmielke Feb 11, 2025
7d14644
docker: html anchors, attempt 2
falkmielke Feb 11, 2025
68568c1
docker: html anchors, attempt 3
falkmielke Feb 11, 2025
29b5ed8
docker: html anchors, attempt n+1
falkmielke Feb 11, 2025
260045f
docker: html anchors, attempt n+1
falkmielke Feb 11, 2025
8a0be6b
docker: html anchors working.
falkmielke Feb 11, 2025
ef46b97
docker: adding more references
falkmielke Feb 13, 2025
d0efa99
docker: incorporate review by @ThierryO
falkmielke Feb 14, 2025
81d9927
docker: review suggestions @florisvdh, pt.1
falkmielke Feb 20, 2025
2bab9d3
docker: paragraph rework on images
falkmielke Feb 20, 2025
8db7230
docker: ready for next review round*
falkmielke Feb 20, 2025
b8fd9f0
docker: pre-split changes (review suggestion)
falkmielke Feb 21, 2025
4e5f527
docker: docker_run spin-off
falkmielke Feb 21, 2025
950f02d
docker: docker_build spin-off
falkmielke Feb 21, 2025
52e9a22
docker: containers_podman spin-off
falkmielke Feb 21, 2025
990a04f
docker: central node tutorial
falkmielke Feb 21, 2025
d4451e7
docker: rename for generalization
falkmielke Feb 21, 2025
8997b9a
docker: link preparation and testing
falkmielke Feb 21, 2025
32e0fef
docker: conversion to `.md`
falkmielke Feb 21, 2025
580fb3a
docker: tutorial cluster interactions
falkmielke Feb 21, 2025
db380e3
docker: cherry - a catchy image for the overview
falkmielke Feb 21, 2025
4a40f6e
docker: how to place a caption footnote
falkmielke Feb 21, 2025
28bc562
docker: how to put colon in yaml title
falkmielke Feb 21, 2025
04821ee
docker: that footnote again...
falkmielke Feb 21, 2025
b64f85a
docker: direct link to installation
falkmielke Feb 21, 2025
b862c61
docker: that footnote again :/
falkmielke Feb 21, 2025
65e8d31
docker: still no footnote
falkmielke Feb 21, 2025
1057eea
docker: footnote (n'th attempt)
falkmielke Feb 21, 2025
5b142cc
docker: footnote hack in html
falkmielke Feb 21, 2025
6626c0e
docker: adding a podman image
falkmielke Feb 21, 2025
73cbcea
docker: changing notebook order (rename)
falkmielke Feb 21, 2025
9c3fcb6
docker: crosslinks corrected
falkmielke Feb 21, 2025
4c2cdcf
docker: markdown blocks / hugo clash
falkmielke Feb 21, 2025
df237da
docker: typo performacne drop
falkmielke Mar 6, 2025
106 changes: 76 additions & 30 deletions content/tutorials/development_docker/index.md
@@ -1,10 +1,10 @@
---
title: Building Containers with Docker and Podman
title: Containers with Docker and Podman
description: Introduction to containerization and the practical use of Docker-like tools.
date: "2025-02-11"
date: "2025-02-20"
authors: [falkmielke]
categories: ["development", "open science"]
tags: ["development", "open science"]
tags: ["development", "open science", "docker", "containers"]
number-sections: false
params:
math: true
@@ -96,11 +96,14 @@ This is why the rest of this tutorial will focus on terminal access.
On the Windows terminal or Linux shell, you can install `docker` as a terminal tool.

{{% callout note %}}
On Windows, this comes bundled with the App[^1]; the steps below are not necessary.
However, note that you need to run a terminal *as administrator*.
On Windows, this comes bundled with the App; the steps below are not necessary.
There might be ways to get around the Desktop App and facilitate installation, either via WSL2 or using [a Windows package manager called Chocolatey](https://en.wikipedia.org/wiki/Chocolatey).

Either way, note that you need to run the docker app or docker in a terminal *as administrator*.

{{% /callout %}}

More info on the debian installation [can be found here](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository).
More info about the installation on Debian-based or Ubuntu Linux systems [can be found here](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository).
The procedure requires you to add an extra repository, so [some caution is warranted](https://wiki.debian.org/DontBreakDebian).

``` sh
@@ -124,7 +127,7 @@ For this change to take effect, log off and log in again and restart the Docker
Containers are managed by a system task ("service" and "socket") which needs to be started.
Most likely, your Linux uses `systemd`.
Your system can start and stop that service automatically, by using `systemctl enable <...>`.
However, due to [diverse](https://docs.docker.com/engine/security) [security](https://github.com/moby/moby/issues/9976) [pitfalls](https://snyk.io/blog/top-ten-most-popular-docker-images-each-contain-at-least-30-vulnerabilities), it is good practice to **not keep it enabled** permanently on your system.
However, due to [diverse](https://docs.docker.com/engine/security) [security](https://github.com/moby/moby/issues/9976) [pitfalls](https://snyk.io/blog/top-ten-most-popular-docker-images-each-contain-at-least-30-vulnerabilities), it is good practice to **not keep it enabled** permanently on your system (unless, of course, you use it all the time).

On a `systemd` system, you can start and stop Docker on demand via the following commands (these will ask you for `sudo` authentication if necessary).

@@ -155,7 +158,14 @@ Docker is about assembling and working in containers.
"Living" in containers.
Or, rather, you can think of this as living in a ["tiny home", or "mobile home"](https://parametric-architecture.com/tiny-house-movement).
Let's call it a fancy caravan.
The good thing is that you get to pick a general design and to choose all details of the interior.
The good thing is that at least you get to pick a general design and to choose all details of the interior.

<figure>
<img src="../../images/tutorials/development_docker/docker_metaphor_tiny_space.jpg" alt="Black/white image of a tiny home as a metaphor for software containerization." />
<figcaption aria-hidden="true">A tiny home close to "Gare Maritime", Brussels, February 2025.</figcaption>
</figure>



The best thing: if you feel like you do not have the cash, time, or talent to build your own home, you can *of course* use someone else's.
There are a gazillion **Docker images available for you** on [Docker Hub](https://hub.docker.com).
@@ -182,7 +192,7 @@ If it does not find the resources locally, Docker will download and extract the
docker run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD rocker/rstudio
```

- The `--rm` flag makes the Docker image non-permanent, i.e. disk space will be freed after you close the container (<a href="#sec-permanence" class="quarto-xref">Section 2.5</a>).
- The `--rm` flag makes the Docker container non-permanent, i.e. disk space will be freed after you close the container (<a href="#sec-permanence" class="quarto-xref">Section 2.5</a>).
- The port specified at `-p` is the one you use to access this local container server (the `-p` actually maps host- and container ports). You have to specify it explicitly, otherwise the host system will not let you pass (`:gandalf-meme:`).
- The `-e` flag allows you to specify environment variables, in this case used to set a password for the RStudio server. But if you do not specify one, a random password will be generated and displayed upon startup (read the terminal output).
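
To make the port mapping more concrete, here is a minimal sketch that wraps the command in a shell function. The function name `run_rstudio`, the variable `CONTAINER_TOOL` and the host port `8080` are assumptions chosen for illustration; only the `run` flags themselves are taken from the command above.

```sh
# Hypothetical helper (not part of Docker): start RStudio on a chosen host port.
# CONTAINER_TOOL is an assumed variable, so that `podman` can stand in for `docker`.
run_rstudio() {
    host_port="${1:-8787}"            # port on your machine (left side of -p)
    password="${2:-YOURNEWPASSWORD}"  # handed to the container via -e
    # The right side of -p stays 8787: the port used inside the container.
    "${CONTAINER_TOOL:-docker}" run --rm \
        -p "${host_port}:8787" \
        -e PASSWORD="${password}" \
        rocker/rstudio
}

# Example call (commented out, because it starts a server in the foreground):
# run_rstudio 8080 'a-better-password'
```

With the example call, RStudio should be reachable at `localhost:8080` on the host, while the server inside the container still listens on port 8787.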

@@ -231,12 +241,13 @@ This is a simple and quick way to run R and RStudio in a container.

However, there are limitations:

{{% callout emphasize %}}
{{% callout note %}}
- You have to live with the R packages provided in the container, or otherwise install them each time you access it...
- ... unless you make your container permanent by omitting the `--rm` option. Note that this will cost considerable disk space, will not transfer to other computers (the original purpose of Docker), and demand occasional updates (<a href="#sec-permanence" class="quarto-xref">Section 2.5</a>).
- You could alternatively add `--pull always` to `docker run`, which will check and pull new versions.
- Speaking of updates: it is good practice to keep software up to date. Occasionally update or simply re-install your Docker image and R packages to get the latest versions.
- You should make sure that the containers are configured correctly and securely. This is especially important with server components which expose your machine to the internet.
- Because most containers contain a Linux system, user permissions are taken seriously, and the consequences might be confusing. There are guides online ([e.g. here](https://labex.io/tutorials/docker-how-to-handle-permissions-in-docker-415866)); there are example repositories (like the author's own struggle [here](https://github.com/inbo/containbo?tab=readme-ov-file#understanding-volumes) and [here](https://github.com/inbo/containbo/tree/main/emacs)); base images are well set up and one can normally get by with default users.
- There is a performance penalty from using containers: in inaccurate layman's terms, they emulate (parts of a) "computer" inside your computer.
{{% /callout %}}
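
The update advice in the list above could be scripted roughly as follows. This is only a sketch: the function name `refresh_image` and the variable `CONTAINER_TOOL` are hypothetical, and whether you want to prune automatically depends on your setup.

```sh
# Hypothetical helper: update an image and clean up superseded layers.
# CONTAINER_TOOL is an assumed variable, so that `podman` can stand in for `docker`.
refresh_image() {
    image="${1:-rocker/rstudio}"
    tool="${CONTAINER_TOOL:-docker}"
    # Download the current version of the image from the registry.
    "$tool" pull "$image"
    # Remove dangling (untagged) images left behind by the update; -f skips the prompt.
    "$tool" image prune -f
}

# Example call (commented out, because it downloads data):
# refresh_image rocker/rstudio
```

Pulling explicitly before a session has a similar effect to adding `--pull always` to every `docker run` call, but leaves you in control of when the download happens.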

@@ -309,10 +320,11 @@ cat ~/test.txt
```

will return:
\> cat: /root/test.txt: No such file or directory

> cat: /root/test.txt: No such file or directory

This behavior is desired (in the second workflow above): if you start up a fresh environment each time you work in Docker, you **assure that your work pipeline is independent of prior changes on the system**.
Whether this makes sense as a workflow has to be evaluated with respect to with hard drive space requirement, updates, the option to build upon a customized Dockerfile, reproducibility potential.
Whether this makes sense as a workflow has to be evaluated with respect to hard drive space requirements, updates, the option to build upon a customized Dockerfile, and reproducibility potential.

You can "link in" folders for working files (note how you have to specify the full path to `new_home`, and that this container uses the root user by default):

@@ -338,7 +350,7 @@ But it also pays off in complicated server setups and distributed computing.

A standardized container from [Docker Hub](https://hub.docker.com) is a good start.
However, you will probably require personalization.
As a use case, imagine you would like to have an RStudio server which comes with relevant inbo packages pre-installed (e.g. [`inbodb`](https://inbo.github.io/inbodb), [`watina`](https://inbo.github.io/watina); *cf.* [contaINBO](https://github.com/inbo/contaINBO)).
As a use case, imagine you would like to have an RStudio server which comes with relevant inbo packages pre-installed (e.g. [`inbodb`](https://inbo.github.io/inbodb), [`watina`](https://inbo.github.io/watina); *cf.* [the containbo repository](https://github.com/inbo/containbo)).

I will return to this use case below.
To explore the general workings of `docker build`, let us turn to more web-directed tasks for a change.
@@ -452,46 +464,81 @@ We have used an existing image and added `flask` on top of it.
This works via writing a Dockerfile and building an image.
{{% /callout %}}

## Multiple Images: `compose` Versus `build`
## Multiple Images: `compose` *versus* `build`

The above works fine for most cases.
However, if you want to assemble and combine multiple images, or build on base images from multiple sources, you need a level up.

In that case `docker compose` is [the way to go](https://docs.docker.com/compose/gettingstarted).
On Debian, this extra functionality comes with the `docker-compose-plugin`.
On Debian or Ubuntu, this extra functionality comes with the `docker-compose-plugin`.
I have not needed to try this out yet, but will return here if that changes.

## Confusion with Version Control and Version Management
## Relation to Version Control and Version Management

Back to the initial paradigm of reproducibility:
*What exactly is the Open Science aspect of containerization?*

This question might have led to some confusion, and I would like to throw in a paragraph of clarification.
A crucial distinction lies in the preparation of *Dockerfiles* (i.e. build instructions) and the preservation of *images* (i.e. end products of a build process).


One purpose of a container may be that you document the exact components of your system environment.
You might start at a base image (e.g. a `rocker`) and add all necessary software via a Dockerfile.
One purpose of a Dockerfile may be that you document the exact components of your system environment.
You start at a base image (e.g. a `rocker`) and add additional software via Dockerfile layers.
This is good practice, and encouraged: if you publish an analysis, provide a tested container recipe with it.

However, this does not solve the problem of version conflicts.
Documenting the versions of packages you used is an extra step, for which [other tools are available](https://doi.org/10.1038/d41586-023-01469-0).
However, this alone does not solve the problem of version conflicts and deprecation.
Documenting the versions of packages you used is an extra step, for which [other tools are available](https://doi.org/10.1038/d41586-023-01469-0):

- Version control such as `git` will track the changes within your own scripts and texts.
- It is good practice to report the exact versions of the software used upon publication ([see here, for example](https://arca-dpss.github.io/manual-open-science/requirements-chapter.html)).
- Version control such as `git` will track the changes within your own texts, scripts, even version snapshots and Dockerfiles.
- Finally, docker images can serve as a snapshot of a (virtual) machine on which your code would run.

The first point, **version control**, is a fantastic tool to enable open science, and avoid personal trouble.
{{% callout emphasize %}}
The simple rule of thumb is: use all three methods, ideally all the time.

Virtual environments.
Version control.
Snapshots.

Get used to them.
They are easy.
They will save you time and trouble almost immediately.
{{% /callout %}}


But unless you use them already, you might require some starting points and directions: here we go.
The second point, **version control**, is a fantastic tool to enable open science, and avoid personal trouble.
You will [find starting points and help in other tutorials on this website](https://tutorials.inbo.be/tags/git).
The second point, version documentation, is ideally handled by **virtual environments**.
It might have a steep learning curve, yet [there](https://rstudio.github.io/cheatsheets/git-github.pdf) [are](https://www.sourcetreeapp.com) [fantastic](https://magit.vc) [tools](https://www.sublimemerge.com) to get you started.
The other point, version documentation, is trivially achieved by manual storage of currently installed versions via `sessionInfo()` in R, or `pip freeze > versions.txt` for Python.
A small step towards somewhat more professionalism is the use of **virtual environments**.
Those exist for R ([renv](https://rstudio.github.io/renv/articles/renv.html)) or Python ([venv](https://docs.python.org/3/library/venv.html)).
The `pak` library in R can [handle lock files conveniently](https://pak.r-lib.org/reference/lockfile_install.html) with `pak::lockfile_install()`.
Then there is the integration of R, Python and system packages in `conda`-like tools ([e.g. micromamba](https://mamba.readthedocs.io/en/latest)).
There are even system level tools, for example [`nix` and `rix`](https://docs.ropensci.org/rix).
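
As a minimal sketch of the Python route, the commands below create a virtual environment and store the installed versions in a file. The folder name `.venv` is an assumption (any name works), and the paths apply to Linux and macOS; on Windows, the interpreter sits in a different subfolder of the environment.

```sh
# Create an isolated Python environment in the folder .venv (assumed name).
python3 -m venv .venv
# Call the environment's own interpreter directly, so no activation step is needed,
# and write the exact versions of all installed packages to a file.
.venv/bin/python -m pip freeze > versions.txt
# Keep versions.txt under version control next to your scripts.
# To restore the same versions elsewhere:
#   .venv/bin/python -m pip install -r versions.txt
```

In a fresh environment the file will be almost empty; it fills up as you install packages into `.venv` and run the `pip freeze` line again.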

The methods are not mutually exclusive:
all Dockerfiles, build recipes and scripts to establish virtual environments should generally be subject to version control.


However, documenting the exact tools and versions used in a project does not guarantee that these versions will be accessible to future investigators (like oneself, trying to reproduce an analysis five years later).
This is where **Docker images** come in.
Docker images are the actual containers which you create from the Dockerfile blueprints by the process of building.
In the "tiny home" metaphor: your "image" is the physical (small, but real, DIY-achievement) home to live in, built from step-by-step instructions.
Think of a Docker image as a virtual copy of your computer which you store for later re-activation.
For example, a collection of images for specific analysis pipelines at INBO are preserved at [Docker Hub/inbobmk](https://hub.docker.com/u/inbobmk).
We consider these "stable" versions because they could be re-activated no matter what crazy future updates will shatter the R community, which enables us to return to all details of previous analyses.


Some confusion might arise from the fact that managing these image snapshots is achieved with the same vocabulary as version control; for example, you would ["commit"](https://docs.docker.com/reference/cli/docker/container/commit) updated versions and ["push"](https://docs.docker.com/reference/cli/docker/image/push) them to a container repository.
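
The sketch below illustrates this "commit" and "push" vocabulary. The function name `snapshot_container`, the variable `CONTAINER_TOOL`, and the container and image names in the example call are hypothetical; pushing also assumes that you are logged in to a registry where you have write access.

```sh
# Hypothetical helper: store the current state of a container as an image
# and upload that image to a container repository.
# CONTAINER_TOOL is an assumed variable, so that `podman` can stand in for `docker`.
snapshot_container() {
    if [ "$#" -ne 2 ]; then
        echo "usage: snapshot_container CONTAINER IMAGE:TAG" >&2
        return 1
    fi
    container="$1"   # name or ID of an existing container
    image_tag="$2"   # target image, e.g. youraccount/my-analysis:2025-02
    tool="${CONTAINER_TOOL:-docker}"
    # "commit" freezes the container's file system into a new image;
    # "push" uploads that image, and only runs if the commit succeeded.
    "$tool" commit "$container" "$image_tag" &&
        "$tool" push "$image_tag"
}

# Example call (commented out; all names are placeholders):
# snapshot_container my_analysis_container youraccount/my-analysis:2025-02
```

Unlike a `git commit`, this stores the complete file system state of the container rather than a change to tracked text files, which is why the Dockerfile itself still belongs under version control.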

Even more confusion might arise from the fact that you also find ready-made images online, e.g. on [Docker Hub](https://hub.docker.com), or [Quay](https://quay.io), or elsewhere.
These provide images of (recent) versions of working environments, supposed to stand in as starting points for derived containers.
Hence, be aware of the dual use case of images: (i) the dynamic, universal base image which improves efficiency and (ii) the static, derived, bespoke image which you created for your analysis (shared with the world for reproducibility).

A simple, less effective basic solution to version reproducibility is the manual storage of currently installed versions via `sessionInfo()` in R, or `pip freeze > versions.txt` for Python.

You can find Docker images of (recent) older versions of working environments on Docker Hub.
You might think that this is how Docker supports version reproducibility.
However, those will fail to build once the binary dependencies get removed.
Furthermore, Docker itself does not fix the versions of installed system components by default.
Ideally, you want to implement **version control and virtual environments within the container**, to be a "full stack open science developer".
And, once more, those images are not a "holy grail" solution: they are not entirely system independent (e.g. processor architecture), and they might occupy a considerable amount of hard disk space (Dockerfile optimization is warranted).
Ideally, to be a "full stack open science developer", you want to implement **a mixed strategy** consisting of virtual environments and containers, wrapped in version control and stored in a backup image.


<a id="sec-rootless"></a>
@@ -813,7 +860,6 @@ Your head might be twisting in a swirl of containers by now.
I hope you find this overview useful, nevertheless.
Thank you for reading!

[^1]: I saw several ways online to get around the Desktop App, either via WSL2 or using [a windows package manager called Chocolatey](https://en.wikipedia.org/wiki/Chocolatey).

[^2]: I mostly follow [this tutorial](https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html).
