Features/1716 update docker image #1823

Draft
wants to merge 9 commits into
base: main
from
30 changes: 30 additions & 0 deletions docker/AMD/Dockerfile.release
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
################################################################################################
# Development Environment
# Pulls Pytorch image and installs heat + dependencies
################################################################################################

# Build Arguments
ARG DEFAULT=latest
ARG HEAT_VERSION=1.5.x
ARG ROCM_VERSION=6.2
ARG AMDGPU_VERSION=6.2

ARG VERSION_STRING=rocm6.3_ubuntu22.04_py3.10_pytorch_release_2.3.0


# This is the base image for the release image
FROM rocm/pytorch:${DEFAULT} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
RUN debconf-set-selections /tmp/tzdata.seed

# Starts a new stage for the release image
FROM base AS release-install
# RUN /usr/bin/python3 -m pip install --upgrade pip
RUN pip install --upgrade pip
RUN pip install mpi4py --no-binary :all:
# RUN echo ${HEAT_VERSION}
# RUN if [[ ${HEAT_VERSION} =~ ^([1-9]\d*|0)(\.(([1-9]\d*)|0)){2}$ ]]; then \
# pip install heat[hdf5,netcdf]==${HEAT_VERSION}; \
# else \
# pip install heat[hdf5,netcdf]; \
# fi
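Note on the commented-out version check above: `RUN` executes its command with `/bin/sh -c` by default, while `[[ ... =~ ... ]]` is a bash feature and `\d` is not part of POSIX extended regular expressions. In addition, an `ARG` declared before the first `FROM` has to be declared again inside the stage before `${HEAT_VERSION}` is visible there. The following is only a sketch of the same decision in portable shell; the function names `is_release_version` and `heat_requirement` are hypothetical and not part of this PR.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if $1 looks like MAJOR.MINOR.PATCH
# (same pattern as the commented-out check, with \d written as [0-9]).
is_release_version() {
    printf '%s\n' "$1" | grep -Eq '^(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*)){2}$'
}

# Hypothetical helper: prints the pip requirement for the given version.
# A full release version is pinned; anything else (e.g. "latest", "1.5.x")
# falls back to the unpinned package, as in the original else-branch.
heat_requirement() {
    if is_release_version "$1"; then
        printf 'heat[hdf5,netcdf]==%s\n' "$1"
    else
        printf 'heat[hdf5,netcdf]\n'
    fi
}

# Possible use inside a RUN step (assumes ARG HEAT_VERSION is re-declared in the stage):
#   pip install "$(heat_requirement "$HEAT_VERSION")"
```

With the default `HEAT_VERSION=1.5.x` from this file, the sketch selects the unpinned requirement, because `1.5.x` is not a full release version.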
18 changes: 18 additions & 0 deletions docker/AMD/Dockerfile.source
@@ -0,0 +1,18 @@
########################################################################################
# Development Environment
# Pulls Pytorch image and installs heat + dependencies
########################################################################################

ARG PYTORCH_IMG=24.10-py3
ARG HEAT_BRANCH=main

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
RUN debconf-set-selections /tmp/tzdata.seed
RUN apt update && DEBIAN_FRONTEND=noninteractive apt install -y build-essential openssh-client python3-dev git && apt clean && rm -rf /var/lib/apt/lists/*

FROM base AS source-install
ARG HEAT_BRANCH
RUN pip install --upgrade pip
RUN git clone -b ${HEAT_BRANCH} https://github.com/helmholtz-analytics/heat.git
RUN pip install mpi4py --no-binary :all: && pushd heat && pip install .[hdf5,netcdf] && popd && rm -rf heat
19 changes: 19 additions & 0 deletions docker/AMD/README.md
@@ -0,0 +1,19 @@
# Not functional yet


### AMD/ROCM Specific Information

Template found in: [ROCm Docker GitHub Link](https://github.com/ROCm/ROCm-docker/blob/master/rocm-terminal/Dockerfile)

Check out the following [installation guide](https://github.com/ROCm/ROCm-docker/blob/master/quick-start.md)

### AMD Docker file sources
#### Basic Image
[Radeon Repository](https://repo.radeon.com/rocm/manylinux/)

#### Pytorch Image
[ROCm docker hub](https://hub.docker.com/r/rocm/pytorch)


#### General things I noticed
Building the PyTorch container may fail with a "no space left on device" error. Docker Desktop limits the disk space available to images and containers, and this limit can be raised in its settings. Since the containers can take up to 100 GB, make sure that much free storage is available.
7 changes: 7 additions & 0 deletions docker/AMD/compose.yml
@@ -0,0 +1,7 @@
# version: '3.8'

services:
  rocm-dev:
    image: rocm/dev-ubuntu-24.04
    container_name: rocm_dev_container
    restart: unless-stopped
2 changes: 2 additions & 0 deletions docker/AMD/tzdata.seed
@@ -0,0 +1,2 @@
tzdata tzdata/Areas select Europe
tzdata tzdata/Zones/Europe select Berlin
11 changes: 9 additions & 2 deletions docker/Dockerfile.release
@@ -1,5 +1,12 @@
ARG HEAT_VERSION=latest
ARG PYTORCH_IMG=23.05-py3
####################################################################################################
# NVIDIA Docker Release File
# Check out available tags at: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags
# Check out the release notes: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-25-02.html
####################################################################################################

#ARG PYTORCH_IMG=24.10-py3
ARG PYTORCH_IMG=25.02-py3
ARG HEAT_VERSION=1.5.1

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
COPY ./tzdata.seed /tmp/tzdata.seed
7 changes: 6 additions & 1 deletion docker/Dockerfile.source
@@ -1,4 +1,9 @@
ARG PYTORCH_IMG=23.05-py3
####################################################################################################
# NVIDIA Docker Source File
# Development Environment
####################################################################################################

ARG PYTORCH_IMG=24.10-py3
ARG HEAT_BRANCH=main

FROM nvcr.io/nvidia/pytorch:${PYTORCH_IMG} AS base
41 changes: 41 additions & 0 deletions docker/README.md
@@ -97,3 +97,44 @@ srun --mpi="pmi2" apptainer exec --nv heat_1.2.0_torch.11_cuda11.5_py3.9.sif bas
## Scripts

The scripts folder has a small collection of helper scripts to automate certain tasks, primarily meant for heat developers. Explanations are given at the top of each script.

## Useful commands for first-time docker users

> docker run -it [docker image name or id]
Starts a new container from the image in interactive mode (opens a terminal)

> docker ps -a
Lists all containers, including stopped ones (without `-a`, only running containers are listed)

> docker system prune
Removes stopped containers, unused networks, dangling images, and unused build cache; running containers are not affected

> docker system prune --all --force
Same as above, but also removes every image not used by a container (not only dangling ones) and skips the confirmation prompt

> docker images
Lists all docker images

> docker tag <old_image_name>:<old_image_tag> <new_image_name>:<new_image_tag>
Adds a new name and tag to an existing image; the old name stays valid (needed to upload)

## How to download a pre-built image from the container registry
The GitHub container registry (ghcr.io) hosts pre-built heat images for different combinations of heat, PyTorch, CUDA, and ROCm versions.

> docker pull ghcr.io/NAMESPACE/IMAGE_NAME

For further info refer to the [GitHub documentation on package registries](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry)

## How to push a new image to ghcr.io

1. Make sure you have a GitHub personal access token and are logged in to ghcr.io from the CLI
> [Authenticating to the container registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry#authenticating-to-the-container-registry)

> echo "$GITHUB_PERSONAL_ACCESS_TOKEN" | docker login ghcr.io -u USERNAME --password-stdin

2. Rename the local image in the following format:
> docker tag current:name ghcr.io/helmholtz-analytics/heat:1.X.X-torchX.X_cudaXX.X_py3.XX
3. Upload the image via:
> docker push ghcr.io/helmholtz-analytics/heat:1.X.X-torchX.X_cudaXX.X_py3.XX
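
The tag format used in steps 2 and 3 can be assembled from its parts. The snippet below is a minimal sketch rather than an official helper: the function name `heat_image_ref` and the version numbers in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: builds the image reference in the format
# ghcr.io/helmholtz-analytics/heat:<heat>-torch<torch>_cuda<cuda>_py<python>
# Arguments: $1 heat version, $2 torch version, $3 CUDA version, $4 Python version
heat_image_ref() {
    printf 'ghcr.io/helmholtz-analytics/heat:%s-torch%s_cuda%s_py%s\n' "$1" "$2" "$3" "$4"
}

# Possible use (version numbers are placeholders, not a released combination):
#   ref="$(heat_image_ref 1.5.1 2.6 12.8 3.12)"
#   docker tag current:name "$ref"
#   docker push "$ref"
```

Using the same variable for `docker tag` and `docker push` avoids typos between the two commands.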

For further info refer to the [GitHub documentation on package registries](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry)