Development: docker (and podman) #356

Open
wants to merge 51 commits into
base: master
697aae4
docker: docker build
falkmielke Jan 28, 2025
cc71aaf
docker: extend rocker/rstudio for inbo
falkmielke Jan 28, 2025
f5adca7
docker: tested/run modified rstudio rocker
falkmielke Jan 28, 2025
2a5c0fc
docker: podman
falkmielke Jan 29, 2025
8bf932d
docker: rename folder
falkmielke Jan 29, 2025
db7d15a
docker: the windows experience
falkmielke Jan 29, 2025
346f6de
docker: screenshots and wrapup V1
falkmielke Jan 29, 2025
e349246
docker: minor adjustments and additions
falkmielke Jan 29, 2025
2297c10
docker: installing private packages
falkmielke Jan 30, 2025
bcacce9
docker: minor adjustments
falkmielke Feb 4, 2025
1e1b862
docker: render quarto -> md
falkmielke Feb 4, 2025
0767adf
docker: image references to static
falkmielke Feb 4, 2025
e09a382
docker: pasting in CSS for callout boxes
falkmielke Feb 4, 2025
3cd97b3
docker: hugo shortcode for callouts
falkmielke Feb 4, 2025
1367a8d
docker (wip): review comments by @florisvdh
falkmielke Feb 10, 2025
2ec9d36
docker: review comments incorporated
falkmielke Feb 11, 2025
4a94915
docker: review changes to markdown
falkmielke Feb 11, 2025
581ef3a
docker: html anchors
falkmielke Feb 11, 2025
7d14644
docker: html anchors, attempt 2
falkmielke Feb 11, 2025
68568c1
docker: html anchors, attempt 3
falkmielke Feb 11, 2025
29b5ed8
docker: html anchors, attempt n+1
falkmielke Feb 11, 2025
260045f
docker: html anchors, attempt n+1
falkmielke Feb 11, 2025
8a0be6b
docker: html anchors working.
falkmielke Feb 11, 2025
ef46b97
docker: adding more references
falkmielke Feb 13, 2025
d0efa99
docker: incorporate review by @ThierryO
falkmielke Feb 14, 2025
81d9927
docker: review suggestions @florisvdh, pt.1
falkmielke Feb 20, 2025
2bab9d3
docker: paragraph rework on images
falkmielke Feb 20, 2025
8db7230
docker: ready for next review round*
falkmielke Feb 20, 2025
b8fd9f0
docker: pre-split changes (review suggestion)
falkmielke Feb 21, 2025
4e5f527
docker: docker_run spin-off
falkmielke Feb 21, 2025
950f02d
docker: docker_build spin-off
falkmielke Feb 21, 2025
52e9a22
docker: containers_podman spin-off
falkmielke Feb 21, 2025
990a04f
docker: central node tutorial
falkmielke Feb 21, 2025
d4451e7
docker: rename for generalization
falkmielke Feb 21, 2025
8997b9a
docker: link preparation and testing
falkmielke Feb 21, 2025
32e0fef
docker: conversion to `.md`
falkmielke Feb 21, 2025
580fb3a
docker: tutorial cluster interactions
falkmielke Feb 21, 2025
db380e3
docker: cherry - a catchy image for the overview
falkmielke Feb 21, 2025
4a40f6e
docker: how to place a caption footnote
falkmielke Feb 21, 2025
28bc562
docker: how to put colon in yaml title
falkmielke Feb 21, 2025
04821ee
docker: that footnote again...
falkmielke Feb 21, 2025
b64f85a
docker: direct link to installation
falkmielke Feb 21, 2025
b862c61
docker: that footnote again :/
falkmielke Feb 21, 2025
65e8d31
docker: still no footnote
falkmielke Feb 21, 2025
1057eea
docker: footnote (n'th attempt)
falkmielke Feb 21, 2025
5b142cc
docker: footnote hack in html
falkmielke Feb 21, 2025
6626c0e
docker: adding a podman image
falkmielke Feb 21, 2025
73cbcea
docker: changing notebook order (rename)
falkmielke Feb 21, 2025
9c3fcb6
docker: crosslinks corrected
falkmielke Feb 21, 2025
4c2cdcf
docker: markdown blocks / hugo clash
falkmielke Feb 21, 2025
df237da
docker: typo performacne drop
falkmielke Mar 6, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -66,3 +66,5 @@ data_gisclub/
# png images for the brms tutorial
/content/tutorials/r_brms/brms_eng/*.png
/content/tutorials/r_brms/brms_nl/*.png

/.quarto/
874 changes: 874 additions & 0 deletions content/tutorials/development_containers1/index.bak

Large diffs are not rendered by default.

373 changes: 373 additions & 0 deletions content/tutorials/development_containers1/index.md

Large diffs are not rendered by default.

407 changes: 407 additions & 0 deletions content/tutorials/development_containers1/index.qmd

Large diffs are not rendered by default.

43 changes: 43 additions & 0 deletions content/tutorials/development_containers1/notes_qmd.txt
@@ -0,0 +1,43 @@

steps to get a qmd to hugo markdown:

+ export hugo-md:
quarto render <file>.qmd --to hugo-md

+ include yaml
preserve_yaml: true
but double-check the yaml header: it does not always copy correctly (author, date, categories, tags).
also check data and description

+ remove TOC
(usually unnecessary for short texts)

+ callouts: https://rossabaker.com/configs/website/shortcodes/callout/
{{% callout note %}}
{{% /callout %}}


+ section crosslinks:
<a id="sec-section"></a>
## Section

+ figure captions
<img
src="path/to/figure.png"
id="fig-label"
alt="Figure 1: Caption text." />
<figcaption>Figure 1: Caption text.</figcaption><br>

+ equations
in yaml header:
params:
math: true
replace $s$ -> \\(s\\), $$\ldots$$ -> \\[\ldots\\]
cf. math https://gohugo.io/content-management/mathematics/
eqn with \\(\\) and \\[\\]

+ preview procedure:
rm tutorials -rf
unzip <zip>
python -m http.server 8887

240 changes: 240 additions & 0 deletions content/tutorials/development_containers2_run/index.md
@@ -0,0 +1,240 @@
---
title: Running Existing Containers
description: Pulling and running containers from an online container repository.
date: "2025-02-21"
authors: [falkmielke]
categories: ["development", "open science"]
tags: ["development", "open science", "docker", "containers"]
number-sections: false
params:
  math: true
format:
  html:
    toc: true
    html-math-method: katex
  hugo-md:
    toc: true
    preserve_yaml: true
    html-math-method: katex
output:
  hugo-md:
    preserve_yaml: true
    variant: gfm+footnotes
  html:
    variant: gfm+footnotes
---


Docker is about assembling and working in containers.
"Living" in containers.
Or, rather, you can think of this as living in a ["tiny home", or "mobile home"](https://parametric-architecture.com/tiny-house-movement).
(Let's call it a fancy caravan.)
In the simple but comfortable case, you do not get to pick a general design or choose all details of the interior: you just take that wheeled cabin "as is" from a tiny home reseller.

<figure>
<img src="../../images/tutorials/development_docker/docker_metaphor_tiny_space.jpg" alt="Black/white image of a tiny home as a metaphor for software containerization." />
<figcaption aria-hidden="true">A tiny home close to "Gare Maritime", Brussels, February 2025.</figcaption>
</figure>

Just as a tiny home is a mini-version of an immobile house, a container can be thought of as a miniature computer which can be transferred to other computing environments.
The good news:
there are a gazillion **Docker images available** on repositories like [Docker Hub](https://hub.docker.com) or [Quay](https://quay.io).

This tutorial will show you how to use such "containers-to-go", thereby demonstrating some basic principles and vocabulary about containerization.
I assume that you have [installed docker](../../tutorials/development_containers1#sec-installation).
This tutorial will stay on the more involved route of running Docker in the terminal (the Docker Desktop "app" is rather self-explanatory, and you can maneuver it easily once you know the terminal vocabulary).
Once you master these first steps, you can proceed to [customize your container images](../../tutorials/development_containers3_build).
You might also [consider Podman as a Docker alternative](../../tutorials/development_containers4_podman).

## Example

Because of the useful idea of bringing your computer environment along (think of benefits for distributed computing), container images of all kinds are abundant on the container repositories mentioned above.
For example[^1], there are Docker images with [rstudio server](https://posit.co/download/rstudio-server) pre-installed:

- <https://hub.docker.com/r/rocker/rstudio>

{{% callout note %}}
If you control containers via the desktop app, simply search, pull, and run it.
{{% /callout %}}


<figure>
<img src="../../images/tutorials/development_docker/docker_desktop2.jpg" alt="desktop app: run" />
<figcaption aria-hidden="true">Desktop App: run.</figcaption>
</figure>

If you are comfortable using the terminal, execute the following command (*Windows*: use an administrator terminal).
If it does not find the resources locally, Docker will download and extract the image from Docker Hub[^2].

``` sh
docker run --rm -p 8787:8787 -e PASSWORD=YOURNEWPASSWORD rocker/rstudio
```

- The `run` command will automatically `pull`, i.e. download an existing image; though you could also `pull` without running.
- The `--rm` flag makes the Docker container non-permanent, i.e. disk space will be freed after you close the container (see <a href="#sec-permanence" class="quarto-xref">Container Permanence</a>).
- The port specified at `-p` is the one you use to access this local container server (the `-p` actually maps host- and container ports). You have to specify it explicitly, otherwise the host system will not let you pass (`:gandalf-meme:`).
- The `-e` flag allows you to specify environment variables, in this case used to set a password for the RStudio server. But if you do not specify one, a random password will be generated and displayed upon startup (read the terminal output).

<figure>
<img src="../../images/tutorials/development_docker/docker_run.jpg" alt="run" />
<figcaption aria-hidden="true">Docker run, on the terminal.</figcaption>
</figure>

You are now running (`run`) a `rocker/rstudio` server instance on your `localhost`, i.e. your computer.
You can access it via a browser, going to <http://localhost:8787>, with the username `rstudio` and your chosen password.

You can shut down the container with the keyboard shortcut `[ctrl]+[C]` (probably `[ctrl]+[Z] [Return]` on Windows).
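
If you prefer to keep the terminal free, the same container can also run in the background.
The following is a minimal sketch and not part of the workflow described here: it relies on the `-d` (detached) and `--name` options of `docker run`, and the container name `rstudio_demo` as well as both function names are assumptions chosen for illustration.

``` sh
# Hypothetical helper functions; the container name "rstudio_demo" is an assumption.

# Start the RStudio server detached (-d), so that the terminal stays usable.
# Usage: start_rstudio YOURNEWPASSWORD
start_rstudio() {
    docker run -d --rm --name rstudio_demo \
        -p 8787:8787 -e PASSWORD="$1" rocker/rstudio
}

# Stop the container by name; because of --rm, it is removed afterwards.
stop_rstudio() {
    docker stop rstudio_demo
}
```

After defining the functions (for example by pasting them into the terminal), `start_rstudio YOURNEWPASSWORD` launches the server and `stop_rstudio` shuts it down again; `docker ps` lists the running containers in between.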


<a id="sec-mounting"></a>
## File Access

The downside of this is that your container is isolated (well... at least to a certain degree).

Images can take up considerable storage space.
Storing files locally, i.e. on the host machine, without storing an unnecessarily filled container, might be a good strategy.
This can be achieved by mapping a virtual path on the container to a local drive on your computer.
(Linux users will be familiar with the concept of "mounting" and "linking" storage locations.)
Note that the technique is equally relevant when running the container locally, hence not exclusive to remote hosts.

Docker `run` brings the `-v` flag for mounting volumes.
Suppose you have an R project you would like to work on, stored, for example, in this path:

- `/data/git/coding-club`

Then you can link this to your container's home folder via the following command.

``` sh
# Windows syntax, mapping on `D:\data`
docker run --rm -p 8787:8787 -v //d/data/git/coding-club:/home/rstudio/coding-club rocker/rstudio

# Linux syntax
docker run --rm -p 8787:8787 -v /data/git/coding-club:/home/rstudio/coding-club rocker/rstudio
```

Again, navigate to <http://localhost:8787>, *et voilà*, you can access your project and store files back in your regular folders.
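
To avoid typing the full host path each time, the mount argument can be derived from the current working directory.
This is a minimal sketch, assuming a POSIX shell on Linux or macOS; the function name `run_rstudio_here` is hypothetical, and the project is mounted under its own folder name inside `/home/rstudio`.

``` sh
# Hypothetical helper: mount the current directory into the RStudio container.
run_rstudio_here() {
    host_dir="$(pwd)"                     # absolute path on the host
    project="$(basename "$host_dir")"     # folder name, reused in the container
    docker run --rm -p 8787:8787 \
        -v "$host_dir:/home/rstudio/$project" rocker/rstudio
}
```

Called from within `/data/git/coding-club`, this issues the same command as the Linux example above.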

## Limitations

This is a simple and quick way to run R and RStudio in a container.

However, there are limitations:


{{% callout note %}}

- You have to live with the R packages provided in the container, or otherwise install them each time you access it...
- ... unless you make your container permanent by omitting the `--rm` option. Note that this will cost considerable disk space, will not transfer to other computers (the original purpose of Docker), and demands occasional updates (see <a href="#sec-permanence" class="quarto-xref">Container Permanence</a>).
- You could alternatively add `--pull always` to `docker run`, which will check and pull new versions.
- Speaking of updates: it is good practice to keep software up to date. Occasionally update or simply re-install your Docker image and R packages to get the latest versions.
- You should make sure that the containers are configured correctly and securely. This is especially important with server components which expose your machine to the internet.
- Because most containers contain a Linux system, user permissions are taken seriously, and the consequences might be confusing. There are guides online ([e.g. here](https://labex.io/tutorials/docker-how-to-handle-permissions-in-docker-415866)); there are example repositories (like the author's own struggle [here](https://github.com/inbo/containbo?tab=readme-ov-file#understanding-volumes) and [here](https://github.com/inbo/containbo/tree/main/emacs)); base images are well set up and one can normally get by with default users.
- There is a performance penalty from using containers: in inaccurate layman's terms, they emulate (parts of) a "computer" inside your computer.
{{% /callout %}}
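
The `--pull always` option from the list above can be combined with the earlier `run` command.
This is only a sketch: the function name `run_rstudio_latest` is an assumption, and the `--pull` option requires a reasonably recent Docker version.

``` sh
# Hypothetical helper: check the registry for a newer image before each start.
# Usage: run_rstudio_latest YOURNEWPASSWORD
run_rstudio_latest() {
    docker run --pull always --rm \
        -p 8787:8787 -e PASSWORD="$1" rocker/rstudio
}
```

The trade-off is a short registry check at every start, in exchange for not having to remember manual updates of the image.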

On the performance issue: I attempted this on my local laptop with matrix multiplication.

``` r
# https://cran.r-project.org/web/packages/rbenchmark/rbenchmark.pdf
# install.packages("rbenchmark")
library(rbenchmark)  # provides benchmark()

test <- function() {
  # test from https://prdm0.github.io/ropenblas/#installation
  m <- 1e4; n <- 1e3; k <- 3e2
  X <- matrix(rnorm(m * k), nrow = m)
  Y <- matrix(rnorm(n * k), ncol = n)
  X %*% Y
}

# by default, benchmark() runs 100 replications
benchmark(test())
```

In the terminal:

        test replications elapsed relative user.self sys.self user.child sys.child
    1 test()          100  22.391        1    83.961   65.291          0         0

In the container:

        test replications elapsed relative user.self sys.self user.child sys.child
    1 test()          100  26.076        1   102.494   153.89          0         0

Now, the *good news* is that the difference is not by orders of magnitude.
This indicates that the chosen rocker image integrated the more performant `blas` variant which is [recommended](https://pbs-assess.github.io/sdmTMB/index.html#installation) [elsewhere](https://prdm0.github.io/ropenblas/#installation) (`blas-openblas`).

The *bad news* is that the elapsed time still increased by about 16% in the container (26.1 s versus 22.4 s in the terminal), which is considerable.

This is just a single snapshot on a laptop, and putatively `blas`-confounded.
Feel free to systematically and scientifically repeat the tests on your own machine.
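
When repeating the test, the relative slowdown can be computed directly from the two `elapsed` values.
A small sketch, using `awk` for the floating-point arithmetic; the function name `slowdown_percent` is made up for illustration.

``` sh
# Relative increase in elapsed time, in percent.
# Usage: slowdown_percent <elapsed_host> <elapsed_container>
slowdown_percent() {
    awk -v host="$1" -v cont="$2" \
        'BEGIN { printf "%.1f\n", (cont / host - 1) * 100 }'
}
```

With the numbers from the snapshot above, `slowdown_percent 22.391 26.076` prints `16.5`.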


<a id="sec-permanence"></a>
## Container Permanence: The `--rm` Option

As briefly touched upon above, `docker run` comes with the `--rm` option.
This basically enables two separate workflows, i.e. usage paradigms.

The first option, which is the default, is that your container is stored on the system permanently.
This applies to the upstream images, which are downloaded upon first invocation of a container.
In addition, changes you apply while working in the container remain stored after it stops, using hard drive space of the host.
Images may still be removed by manually running `docker rmi [...]` ([*cf.* "useful commands" in the overview tutorial](../../tutorials/development_containers1#sec-commands)).
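
A few standard Docker commands help to inspect and reuse such persistent containers.
The sketch below wraps them in helper functions; the function names are assumptions for illustration, and the container name is whatever you passed via `--name` (or the random name Docker assigned).

``` sh
# List all containers, including stopped ones (-a).
list_containers() {
    docker ps -a
}

# Re-enter a stopped container with its earlier changes intact
# (-a attaches the output, -i keeps the input open).
# Usage: resume_container <name>
resume_container() {
    docker start -a -i "$1"
}

# Delete a stopped container to free its disk space; the image is kept.
# Usage: remove_container <name>
remove_container() {
    docker rm "$1"
}
```

Note that calling `docker run` again creates a new container rather than reopening the old one; `docker start` is what returns you to the stored state.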

In contrast, with the second option, `docker run --rm [...]`, ad-hoc changes in the container are removed when the container exits.
Unless, of course, you mount a local volume with `docker run --rm -v [...]` (see <a href="#sec-mounting" class="quarto-xref">File Access</a>).
However, contrary to a rather general intuition, starting a container with `--rm` will not require downloading the image a second time.

You might want to test this for yourself.
Consider the following series of commands to create a test file in the Docker home directory:

``` sh
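# start a shell in the container (without `bash`, this image opens an R session)
docker run --name testing_permanence --rm -it docker.io/rocker/r-base bash
# the following lines are typed inside the container
echo "testing permanence." > ~/test.txt
cat ~/test.txt
exit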
```

Re-connecting is instantaneous.
However,

``` sh
docker run --name testing_permanence --rm -it docker.io/rocker/r-base bash
cat ~/test.txt
```

will return:

> cat: /root/test.txt: No such file or directory

This behavior is desired (in the second workflow above): if you start up a fresh environment each time you work in Docker, you **ensure that your work pipeline is independent of prior changes on the system**.
Whether this makes sense as a workflow has to be evaluated with respect to hard drive space requirements, updates, the option to build upon a customized Dockerfile, and reproducibility potential.

You can "link in" folders for working files (note how you have to specify the full path to `new_home`, and that this container uses the root user by default):

``` sh
mkdir new_home
docker run --name testing_permanence -v /data/containers/new_home:/root --rm -it docker.io/rocker/r-base bash
echo "testing permanence." > ~/test.txt
```

Using `--rm` might not be desirable in every case.
However, it is a valuable option for testing, good to have when disk space is scarce, or as a final check before publishing.
Generally, I would consider it good practice to treat containers as volatile, thereby keeping them as independent of the host machine as possible.

## Summary

Docker images are built from Dockerfile blueprints; a container is what you get when you start (`run`) such an image.
In the "tiny home" metaphor: your "image" is the physical (small, but real, DIY-achievement) home to live in, built from step-by-step instructions.
Think of a Docker image as a virtual copy of your computer which you store for later re-activation.

Luckily, other people have prepared images for you.
For example, a collection of images for specific analysis pipelines at INBO are preserved at [Docker Hub/inbobmk](https://hub.docker.com/u/inbobmk).
We consider these "stable" versions because they could be re-activated no matter what crazy future updates will shatter the R community, which enables us to return to all details of previous analyses.

This tutorial provided introductory details on how to run such images.
If you would like to take this further and customize your containers, proceed with [the next tutorial about the `build` command](../../tutorials/development_containers3_build).
Those commands are practically identical [in Docker and Podman](../../tutorials/development_containers4_podman).

An overview on the topic is [available here](../../tutorials/development_containers1).

[^1]: I mostly follow [this tutorial](https://jsta.github.io/r-docker-tutorial/02-Launching-Docker.html).

[^2]: Just like "GitHub" is a service to host git repositories, guess what: "Docker Hub" is a hosting service to store Docker images.