Compute worker installation with Podman
Here is the specification for installing the compute worker with Podman.
First, I present the installation guide for the compute worker with Podman for two cases: a CPU compute worker and a GPU compute worker. This should help you understand the requirements that follow.
Second, I list the packages and configuration needed for the VM, the changes needed in the project where “docker” is hard-coded, and the requirements for the images.
Finally, I list the other places where the term “docker” appears in the code. They do not affect correct functioning, but they could confuse a developer/admin after switching to Podman.
- For CPU compute worker
- For GPU compute worker
- For VM
- For the code
- For the container images
We need to install Podman on the VM. We use a Debian-based OS such as Ubuntu; Ubuntu is recommended because it has better Nvidia driver support.
sudo apt install podman
Then configure Podman to download images from Docker Hub by adding this line to /etc/containers/registries.conf:
unqualified-search-registries = ["docker.io"]
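To confirm that the registry configuration is picked up, you can pull an image by its short name; with the line above, Podman should resolve it against docker.io without prompting (the image name is just an example):
```bash
# "hello-world" is resolved to docker.io/library/hello-world
# thanks to the unqualified-search-registries setting.
podman pull hello-world
```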
Create the .env file in order to add the compute worker to a queue (we use the default queue; if you use a particular queue, fill in the BROKER_URL generated when creating the new queue):
BROKER_URL=pyamqp://<login>:<password>@www.codabench.org:5672
HOST_DIRECTORY=/codabench/storage
BROKER_USE_SSL=true
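As an optional sanity check, you can verify that Podman passes these variables into a container before starting the real worker; this sketch assumes a small alpine image is acceptable for a throwaway run:
```bash
# Print the worker-related variables as seen inside a disposable container.
podman run --rm --env-file .env docker.io/library/alpine \
  env | grep -E 'BROKER|HOST_DIRECTORY'
```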
Run the compute worker (the second volume mounts the Podman socket of the user running the worker; this example uses user id 1001):
podman run -d \
-v /codabench/storage:/codabench \
-v /run/user/1001/podman/podman.sock:/run/podman/podman.sock \
--env-file .env \
--name compute_worker \
--restart unless-stopped \
--log-opt max-size=50m \
--log-opt max-file=3 \
codalab/competitions-v2-compute-worker:latest
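Once the container is up, you can check that the worker started and connected to the queue by looking at its status and logs:
```bash
# The compute_worker container should be listed as running.
podman ps --filter name=compute_worker

# Follow the worker logs; they should show a successful connection to the broker.
podman logs -f compute_worker
```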
For the GPU compute worker, you additionally need to install the nvidia packages that support Podman, as well as the nvidia driver:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-container.list
sudo apt update
sudo apt install nvidia-container-runtime nvidia-container-toolkit nvidia-driver-<version>
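If you are not sure which nvidia-driver-<version> package to choose, you can list the candidates first; this assumes an Ubuntu system with the ubuntu-drivers-common package available:
```bash
# Show the driver packages Ubuntu recommends for the detected GPU.
ubuntu-drivers devices

# Alternatively, list the nvidia-driver packages known to apt.
apt-cache search '^nvidia-driver-'
```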
Check that the nvidia driver is working by executing:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 27%   26C    P8    20W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The result should show your GPU card information.
We need to configure the OCI hook for nvidia. Create the file /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json if it does not exist:
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-toolkit",
    "args": ["nvidia-container-toolkit", "prestart"],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ]
  },
  "when": {
    "always": true,
    "commands": [".*"]
  },
  "stages": ["prestart"]
}
Validate that everything is working by running a test container:
podman run --rm -it \
--security-opt="label=disable" \
--hooks-dir=/usr/share/containers/oci/hooks.d/ \
nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
The result should be the same as the output of the nvidia-smi command above.
Run the GPU compute worker:
podman run -d \
--hooks-dir=/usr/share/containers/oci/hooks.d/ \
-v /run/user/1001/podman/podman.sock:/run/podman/podman.sock \
--security-opt="label=disable" \
codalab/competitions-v2-compute-worker:nvidia
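The command above only shows the GPU-specific options. In practice you will most likely want the same volume, env file, name, restart and log options as for the CPU worker; a possible combined invocation, under that assumption, is:
```bash
podman run -d \
  --hooks-dir=/usr/share/containers/oci/hooks.d/ \
  --security-opt="label=disable" \
  -v /codabench/storage:/codabench \
  -v /run/user/1001/podman/podman.sock:/run/podman/podman.sock \
  --env-file .env \
  --name compute_worker \
  --restart unless-stopped \
  --log-opt max-size=50m \
  --log-opt max-file=3 \
  codalab/competitions-v2-compute-worker:nvidia
```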
In summary, the requirements for the VM are:
- OS: Debian 11+ or Ubuntu 20+
- podman installed
- docker.io configured in /etc/containers/registries.conf
- Only for the GPU compute worker: in addition, the nvidia-container-runtime, nvidia-container-toolkit and nvidia-driver-<version> packages, plus /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json
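A small helper script can check these prerequisites in one go; this is only a sketch (the GPU checks are skipped when nvidia-smi is absent):
```bash
#!/usr/bin/env bash
# Sanity-check the VM prerequisites for a Podman-based compute worker.
set -u

# Podman must be installed.
command -v podman >/dev/null && echo "OK: $(podman --version)" || echo "MISSING: podman"

# docker.io must be configured as an unqualified-search registry.
grep -q 'docker.io' /etc/containers/registries.conf \
  && echo "OK: docker.io registry configured" \
  || echo "MISSING: docker.io in /etc/containers/registries.conf"

# GPU-only checks: nvidia driver and the OCI hook file.
if command -v nvidia-smi >/dev/null; then
  nvidia-smi >/dev/null && echo "OK: nvidia driver responding"
  [ -f /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json ] \
    && echo "OK: oci-nvidia-hook.json present" \
    || echo "MISSING: oci-nvidia-hook.json"
else
  echo "SKIP: no nvidia-smi found, assuming CPU-only worker"
fi
```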
There are some scripts that need to be adapted to podman: [docker/compute_worker/compute_worker.py](https://github.com/codalab/codabench/blob/232490ddf2682b89feedc2f6b907e88110828077/docker/compute_worker/compute_worker.py#L498)
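To get an overview of the places in that script where the container engine is hard-coded, a simple search is enough (run from the repository root; the pattern is only an illustration):
```bash
# List the lines of the compute worker script that mention "docker",
# e.g. hard-coded binary names, socket paths or image defaults.
grep -n -i docker docker/compute_worker/compute_worker.py
```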
Just as for the VM, the container image itself has to have podman and/or the nvidia packages installed. Therefore, we need to build images supporting podman, with nvidia in the GPU case. The Dockerfile files below need to be updated in order to build the new images.
/Dockerfile.compute_worker_gpu
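Once the Dockerfile has been updated, the image can be rebuilt with podman itself; the tag below is only a placeholder, use your own registry/tag:
```bash
# Build the GPU compute worker image from the repository root.
podman build -f Dockerfile.compute_worker_gpu -t localhost/compute-worker:nvidia .
```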
Some hard-coded default container images also need to be adapted for podman:
/src/apps/competitions/models.py
src/apps/competitions/migrations/0001_initial.py
src/apps/competitions/unpackers/v2.py
src/apps/competitions/unpackers/v1.py
The files below have docker in the names of some variables. After adding the podman "feature", developers/admins could be confused to see docker variables while running podman. It is better to rename these variables, even though this is less important. The "docker" directory in the repository should also be renamed.
src/apps/competitions/unpackers/v1.py
docker/compute_worker/compute_worker.py
src/static/riot/competitions/editor/_competition_details.tag
src/apps/api/serializers/competitions.py
src/tests/functional/test_competitions.py
src/static/riot/submissions/submission_management.tag
src/apps/competitions/tests/unpacker_test_data.py