# LUMI software stacks (technical)

*[[back: The Cray Programming Environment]](1_03_CPE.md)*

---

The user-facing documentation on how to use the LUMI software stacks is
available in [the LUMI documentation](https://docs.lumi-supercomputer.eu/computing/softwarestacks/).
This page focuses on the technical implementation behind them.

---

## An overview of LUMI

LUMI has different node types providing compute resources:

- LUMI has 16 login nodes, though many of those are reserved for special purposes and not
  available to all users. These login nodes have zen2 CPUs and a Slingshot 10
  interconnect.
- There are 1536 regular CPU compute nodes in a partition denoted as LUMI-C. These
  compute nodes have zen3 CPUs and run a reduced version of SUSE Linux optimised
  by Cray to reduce OS jitter. These nodes will in the future be equipped with a
  Slingshot 11 interconnect card.
- There are 2560 GPU compute nodes in a partition denoted as LUMI-G. These nodes have
  a single zen3-based CPU with an optimised I/O die linked to 4 AMD MI250X GPUs. Each node
  has 4 Slingshot 11 interconnect cards, one attached to each GPU.
- The interactive data analytics and visualisation partition is really two different partitions
  from the software point of view:
  - 8 nodes are CPU-only but differ considerably from the regular compute nodes,
    not only in the amount of memory. These nodes are equipped with zen2 CPUs
    and are in that sense comparable to the login nodes. They also have local SSDs
    and are equipped with Slingshot 10 interconnect cards (2 each, to be confirmed).
  - 8 nodes have zen2 CPUs and 8 NVIDIA A40 GPUs each, and have 2 Slingshot 10
    interconnect cards each.
- The early access platform (EAP) has 14 nodes, each equipped with a single 64-core
  zen2 CPU and 4 AMD MI100 GPUs. Each node has a single Slingshot 10 interconnect
  card and also local SSDs.

Slingshot 10 and Slingshot 11 differ software-wise. Slingshot 10 uses a
Mellanox ConnectX-5 NIC that supports both OFI and UCX, and hence can also use the
UCX version of Cray MPICH. Slingshot 11 uses a NIC code-named Cassini that
supports only OFI, with an OFI provider specific to the Cassini NIC. However,
given that the nodes equipped with Slingshot 10 cards are not meant
to be used for big MPI jobs, we build our software stack solely on top of
libfabric and Cray MPICH.
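
A quick way to see which libfabric providers a node actually offers is the
``fi_info`` utility that ships with libfabric (a minimal sketch, assuming the
libfabric command-line tools are installed and on the PATH of the node in
question, which may not be the case on every node type):

```bash
# List the unique libfabric providers available on this node.
# On a Slingshot 11 node one would expect the Cassini-specific 'cxi'
# provider; on a Slingshot 10 node the Mellanox ConnectX-5 NIC is
# typically served by the 'verbs' provider.
fi_info | grep '^provider:' | sort -u
```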


---

## CrayEnv and LUMI modules

On LUMI, two types of software stacks are currently offered:

 - ``CrayEnv`` (module name) offers the Cray PE and enables one to use
   it completely in the way intended by HPE-Cray. The environment also offers a
   limited selection of additional tools, often in more recent versions than
   those provided by SUSE Linux, the basis of the Cray Linux environment. Those tools
   are installed and managed via EasyBuild. However, EasyBuild itself is not
   available in this environment.

   The module also rectifies a problem caused by the fact that there is only one
   configuration file for the Cray PE on LUMI, so that starting a login shell
   does not produce an optimal set of target modules for all node types.
   The ``CrayEnv`` module detects on which node type it is running, and
   (re-)loading it triggers a reload of the recommended set of target
   modules for that node.

 - ``LUMI`` is an extensible software stack that is mostly managed through
   [EasyBuild][easybuild]. Each version of the LUMI software stack is based on
   the version of the Cray Programming Environment with the same version
   number.

   A deliberate choice was made to offer only a limited number of software
   packages in the globally installed stack. The redundancy setup of LUMI
   makes it difficult to update the stack in a way that is guaranteed not to
   affect running jobs, and a large central stack is also hard to manage,
   especially as we expect frequent updates to the OS and compiler
   infrastructure in the first years of operation.
   However, the EasyBuild setup is such that users can easily install
   additional software in their home or project directory using EasyBuild build
   recipes that we provide or that they develop themselves (see the sketch
   after this list), and that software fully integrates in the central stack
   (even the corresponding modules are made available automatically).

   Each ``LUMI`` module will also automatically activate a set of application
   modules tuned to the architecture on which the module load is executed. To
   that purpose, the ``LUMI`` module will automatically load the ``partition``
   module that is the best fit for the node. After loading a version of the
   ``LUMI`` module, users can always load a different version of the ``partition``
   module.
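
A user installation in one's own space could look roughly as follows. This is
a hedged sketch along the lines of the LUMI documentation: the ``EasyBuild-user``
module and the ``EBU_USER_PREFIX`` variable are the documented LUMI conventions,
but the stack version, the project path and the build recipe name below are
purely hypothetical examples:

```bash
# Choose where user installations should go (a project directory is
# recommended for installations shared within a project; the path here
# is a hypothetical example).
export EBU_USER_PREFIX=/project/project_465000000/EasyBuild

# Load a version of the software stack, a partition and the user
# EasyBuild setup (the version number is only an example).
module load LUMI/21.12 partition/C
module load EasyBuild-user

# Install a package from a build recipe, resolving dependencies;
# the recipe name is hypothetical.
eb my-package-1.0-cpeGNU-21.12.eb -r
```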

Note that the ``partition`` modules are only used by the ``LUMI`` module. In the
``CrayEnv`` environment, users should override the configuration by loading their
own set of target modules after loading the ``CrayEnv`` module.
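
In practice that could look as follows (a minimal sketch; the target modules
shown here are those from the table in the next section, and which set is
appropriate depends on the node type one compiles for):

```bash
# Let CrayEnv pick the recommended target modules for this node type...
module load CrayEnv

# ...or override them afterwards, e.g., to cross-compile for the zen3
# CPU compute nodes from a zen2 login node.
module load craype-x86-milan craype-accel-host craype-network-ofi
```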


---

## The ``partition`` module

The ``LUMI`` module currently supports five ``partition`` modules, but that number may
be reduced in the future:

| Partition         | CPU target            | Accelerator                 |
|:------------------|:----------------------|:----------------------------|
| ``partition/L``   | ``craype-x86-rome``   | ``craype-accel-host``       |
| ``partition/C``   | ``craype-x86-milan``  | ``craype-accel-host``       |
| ``partition/G``   | ``craype-x86-trento`` | ``craype-accel-amd-gfx90a`` |
| ``partition/D``   | ``craype-x86-rome``   | ``craype-accel-nvidia80``   |
| ``partition/EAP`` | ``craype-x86-rome``   | ``craype-accel-amd-gfx908`` |

All ``partition`` modules also load ``craype-network-ofi``.
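
Loading a stack version therefore results in a target environment such as the
following (a sketch; the stack version number is only an example, and the
exact ``module list`` output will differ):

```bash
# On a LUMI-G node the LUMI module would pick partition/G itself;
# specifying it explicitly also allows cross-installing for LUMI-G
# from another node type.
module load LUMI/21.12 partition/G

# Expect craype-x86-trento, craype-accel-amd-gfx90a and
# craype-network-ofi among the loaded modules.
module list
```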

``partition/D`` may be dropped in the future, as it seems we have no working CUDA setup
and can only use the GPU nodes in the LUMI-D partition for visualisation, not for CUDA
computing.

Furthermore, if it turns out that there is no advantage in optimising for Milan
specifically, or that there are no problems at all in running Milan binaries on Rome
generation CPUs, ``partition/L`` and ``partition/C`` might also be merged into a single
partition.


---

*[[next: Terminology]](1_05_terminology.md)*