Modify Installation Paths:
Begin by setting up the installation paths for EasyBuild. Follow the preparation instructions in the LUMI EasyBuild documentation to specify where to save the container and module files.
Load Necessary Modules:
Load the required modules for EasyBuild and Singularity:
module load LUMI partition/container EasyBuild-user
Install the Container:
eb PyTorch-2.2.0-rocm-5.6.1-python-3.10-singularity-20240315.eb
Additional container options with different PyTorch, ROCm, or Python versions are listed in the LUMI EasyBuild documentation: https://lumi-supercomputer.github.io/LUMI-EasyBuild-docs/p/PyTorch/#singularity-containers-with-modules-for-binding-and-extras
For our setup, the extra Python libraries are listed in extra_requirements.txt (some of them might be omitted).
Three scripts are used:
- a .slurm script to submit the job (run_imagenet_training.slurm)
- a .sh script to set up the environment (train_imagenet.sh)
- a .py script, the entry point for model training (pl_eval_navit.py)
PyTorch Lightning typically manages the distributed environment setup automatically. On LUMI, however, it is necessary to include the following line in your script for proper initialization (possibly due to the SLURM configuration):
torch.distributed.init_process_group(backend="nccl")
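A minimal sketch of where this call can sit in a Lightning entrypoint (this is not the actual pl_eval_navit.py; the Trainer arguments, the assumption of a full LUMI-G node with 8 visible GPUs, and the lightning.pytorch import name are assumptions):

import os

import torch.distributed as dist
import lightning.pytorch as pl  # older setups: import pytorch_lightning as pl

# Initialize the process group explicitly before the Trainer is built.
# RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are assumed to be exported
# by the launch script (e.g. train_imagenet.sh).
if dist.is_available() and not dist.is_initialized():
    dist.init_process_group(backend="nccl")

trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,                                         # GCDs visible on one LUMI-G node (assumption)
    num_nodes=int(os.environ.get("SLURM_NNODES", 1)),
    strategy="ddp",
)
# trainer.fit(model, datamodule=...) follows as usual.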
Create a helper function that prints only on rank 0, so log messages are not duplicated once per process:
def print_rank0(*msg):
    # Only the process with global rank 0 prints; all other ranks return early.
    if rank != 0:
        return
    print(*msg)
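The rank variable is assumed to be defined earlier in the script; one possible way to obtain it (an assumption, adapt to the actual setup) is:

import os

import torch.distributed as dist

# Use the initialized process group when available; otherwise fall back to
# SLURM's process index, and to 0 for plain single-process runs.
if dist.is_available() and dist.is_initialized():
    rank = dist.get_rank()
else:
    rank = int(os.environ.get("SLURM_PROCID", 0))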
Dataset sharding example repository: https://github.com/YuTian8328/dataset-sharding
On LUMI (and, in general, on any cluster with a distributed file system), transferring a large number of small files to the compute nodes is very slow. To optimize performance, it is preferable to use larger files, which reduces transfer overhead and improves read speeds. However, working with a single large file also has its challenges, so sharding the dataset into larger chunks can provide a balance. Sharding, i.e. splitting a massive dataset into large subsets (typically 1-10 GB per shard), offers several advantages:
- Increased I/O Efficiency: With sharded data, multiple processors can read different shards simultaneously, significantly improving read speeds and reducing I/O bottlenecks.
- Enhanced Dataset Randomization: Reading from randomly selected shards helps to avoid always presenting the same data first, thus achieving a level of dataset randomization. Although this is not true randomization, it is far better than sequentially reading from a single, large file.
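For illustration, shards in this size range can be written with the webdataset library's ShardWriter; this sketch is not taken from the repo above, and the shard name pattern, sample keys, size cap, and the iterate_samples() helper are placeholders:

import webdataset as wds

# Write samples into numbered .tar shards, starting a new shard at roughly 2 GB.
with wds.ShardWriter("imagenet-train-%06d.tar", maxsize=2 * 1024**3) as sink:
    for idx, (jpeg_bytes, label) in enumerate(iterate_samples()):  # placeholder iterator
        sink.write({
            "__key__": f"sample{idx:08d}",   # unique key shared by all fields of this sample
            "jpg": jpeg_bytes,               # raw JPEG bytes
            "cls": str(label).encode(),      # class label stored as text
        })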
webdataset offers a convenient way to load large, sharded datasets into PyTorch. It implements PyTorch's IterableDataset interface and can thus be used like any other PyTorch dataset. A simple example can be found in the dataset-sharding repo linked above.
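A minimal reading sketch under the same assumptions (shard URL pattern, shuffle buffer size, and DataLoader settings are illustrative):

import webdataset as wds
from torch.utils.data import DataLoader

# Brace expansion enumerates the shard files produced in the writing step.
urls = "imagenet-train-{000000..000146}.tar"

dataset = (
    wds.WebDataset(urls, shardshuffle=True)  # shuffle the order of shards
    .shuffle(1000)                           # in-memory shuffle buffer for samples
    .decode("pil")                           # decode .jpg entries into PIL images
    .to_tuple("jpg", "cls")                  # yield (image, label) pairs
)

# WebDataset implements IterableDataset, so a regular DataLoader works unchanged.
loader = DataLoader(dataset, batch_size=64, num_workers=4)

For multi-GPU training, webdataset also provides shard-splitting helpers (such as split_by_node) so that each rank reads a different subset of shards.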