Compute worker installation with Podman
This page specifies how to install a compute worker using Podman.
First, it gives an installation guide for two cases: a CPU compute worker and a GPU compute worker. Reading it first makes the requirements that follow easier to understand.
Second, it lists the packages and configuration needed on the VM, the changes needed in the project where “docker” is hard-coded, and the requirements for the container images.
Finally, it lists other places where the term “docker” appears in the code. These do not affect correct functioning, but they could confuse developers and admins after the switch to Podman.
- For CPU compute worker
- For GPU compute worker
- For VM
- For the code
- For the container images
Podman must be installed on the VM. We use a Debian-based OS such as Ubuntu. Ubuntu is recommended because it has better NVIDIA driver support.
sudo apt install podman
Then configure Podman to pull images from Docker Hub by adding this line to /etc/containers/registries.conf:
unqualified-search-registries = ["docker.io"]
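The registry setting can also be applied from a script. Below is a minimal sketch, assuming a hypothetical helper named `ensure_search_registry` (not part of Podman or Codabench) that appends the line only when no `unqualified-search-registries` entry exists yet. It is demonstrated on a scratch file rather than on the real configuration file:

```shell
# Hypothetical helper: append the docker.io search registry to a
# registries.conf-style file unless the key is already set.
ensure_search_registry() {
    conf_file="$1"
    if grep -q '^unqualified-search-registries' "$conf_file" 2>/dev/null; then
        echo "already configured: $conf_file"
    else
        echo 'unqualified-search-registries = ["docker.io"]' >> "$conf_file"
        echo "updated: $conf_file"
    fi
}

# Demonstration on a scratch file; calling it twice adds the line only once.
demo_conf="$(mktemp)"
ensure_search_registry "$demo_conf"
ensure_search_registry "$demo_conf"
cat "$demo_conf"
```

To change the real file, the same function could be run as root against /etc/containers/registries.conf.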
Create the .env file to attach the compute worker to a queue (this example uses the default queue):
BROKER_URL=pyamqp://<login>:<password>@www.codabench.org:5672
HOST_DIRECTORY=/codabench/storage
BROKER_USE_SSL=true
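Before starting the worker, it may help to confirm that the .env file defines all three variables shown above. The following is a sketch only; `check_env_file` is a hypothetical helper, and the demonstration writes the placeholder values from this guide to a scratch file:

```shell
# Hypothetical helper: verify that an env file defines every variable
# used in this guide's example. Returns 0 if all are present, 1 otherwise.
check_env_file() {
    env_file="$1"
    missing=0
    for key in BROKER_URL HOST_DIRECTORY BROKER_USE_SSL; do
        if ! grep -q "^${key}=" "$env_file"; then
            echo "missing: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration with placeholder credentials in a scratch file.
demo_env="$(mktemp)"
cat > "$demo_env" <<'EOF'
BROKER_URL=pyamqp://<login>:<password>@www.codabench.org:5672
HOST_DIRECTORY=/codabench/storage
BROKER_USE_SSL=true
EOF
check_env_file "$demo_env" && echo "env file looks complete"
```

The check only looks for the variable names; it does not validate the broker credentials themselves.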
Run the compute worker. The second -v option mounts the user's Podman socket; this example uses user ID 1001:
podman run -d \
-v /codabench/storage:/codabench \
-v /run/user/1001/podman/podman.sock:/run/podman/podman.sock \
--env-file .env \
--name compute_worker \
--restart unless-stopped \
--log-opt max-size=50m \
--log-opt max-file=3 \
codalab/competitions-v2-compute-worker:latest
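The socket path in the command above depends on the ID of the user running rootless Podman, so 1001 will rarely be right on another machine. A small sketch for deriving it, assuming a hypothetical helper named `podman_socket_path`:

```shell
# Hypothetical helper: print the rootless Podman socket path for a user ID
# (defaults to the current user).
podman_socket_path() {
    uid="${1:-$(id -u)}"
    echo "/run/user/${uid}/podman/podman.sock"
}

sock="$(podman_socket_path)"
echo "socket path for this user: $sock"

# Assumption: on systemd hosts the socket is provided by the user unit
# podman.socket (systemctl --user enable --now podman.socket).
if [ -S "$sock" ]; then
    echo "socket is present"
else
    echo "socket not found; the Podman user socket may not be enabled"
fi
```

The printed path can then be substituted into the left-hand side of the socket -v option.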
For the GPU compute worker, also install the NVIDIA driver and the NVIDIA packages that support Podman:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
&& curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-container.list
sudo apt update
sudo apt install nvidia-container-runtime nvidia-container-toolkit nvidia-driver-<version>
Check that the NVIDIA driver is working by running:
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| 27%   26C    P8    20W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The output should list the GPU card, as in the example above.
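In a provisioning script, the same driver check can be made non-interactive by looking only at the exit status. This is a rough sketch; `check_tool` is a hypothetical helper that reports whether a command exists and exits successfully when run without arguments:

```shell
# Hypothetical helper: return 0 if the given command exists and exits
# successfully when run without arguments, 1 otherwise.
check_tool() {
    tool="$1"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found" >&2
        return 1
    fi
    if ! "$tool" >/dev/null 2>&1; then
        echo "$tool: found but failed" >&2
        return 1
    fi
    echo "$tool: OK"
}

# nvidia-smi exits with a non-zero status when it cannot talk to the driver.
check_tool nvidia-smi || echo "GPU driver check did not pass"
```

A passing check says only that the driver responds; it does not confirm that containers can see the GPU, which is what the test container further down verifies.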
Next, configure the OCI hook for NVIDIA. Create the file /usr/share/containers/oci/hooks.d/oci-nvidia-hook.json if it does not already exist:
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-toolkit",
    "args": ["nvidia-container-toolkit", "prestart"],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ]
  },
  "when": {
    "always": true,
    "commands": [".*"]
  },
  "stages": ["prestart"]
}
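Creating the hook file only when it is missing can be scripted as well. A minimal sketch, assuming a hypothetical helper named `install_nvidia_hook`; the demonstration targets a scratch directory instead of /usr/share/containers/oci/hooks.d:

```shell
# Hypothetical helper: write the hook definition from this guide into the
# given directory, but only if the file is not there yet.
install_nvidia_hook() {
    hook_dir="$1"
    hook_file="$hook_dir/oci-nvidia-hook.json"
    if [ -e "$hook_file" ]; then
        echo "kept existing $hook_file"
        return 0
    fi
    mkdir -p "$hook_dir"
    cat > "$hook_file" <<'EOF'
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/bin/nvidia-container-toolkit",
    "args": ["nvidia-container-toolkit", "prestart"],
    "env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ]
  },
  "when": {
    "always": true,
    "commands": [".*"]
  },
  "stages": ["prestart"]
}
EOF
    echo "created $hook_file"
}

# Demonstration: the second call leaves the existing file untouched.
demo_dir="$(mktemp -d)"
install_nvidia_hook "$demo_dir"
install_nvidia_hook "$demo_dir"
```

Writing to the real hooks directory requires root privileges.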
Validate the setup by running a test container:
podman run --rm -it \
--security-opt="label=disable" \
--hooks-dir=/usr/share/containers/oci/hooks.d/ \
nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
The output should match that of the nvidia-smi command run on the host above.
Run the GPU compute worker:
podman run -d \
--hooks-dir=/usr/share/containers/oci/hooks.d/ \
-v /run/user/1001/podman/podman.sock:/run/podman/podman.sock \
-e DOCKER_HOST=unix:///run/podman/podman.sock \
--security-opt="label=disable" \
codalab/competitions-v2-compute-worker:nvidia