Merge branch 'develop' into feature/decorators_simplify_examples
jlnav committed Jan 11, 2024
2 parents 9975f51 + 50bcefb commit a9c164d
Showing 51 changed files with 464 additions and 563 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/basic.yml
@@ -98,7 +98,7 @@ jobs:
pip install -r install/testing_requirements.txt
pip install -r install/misc_feature_requirements.txt
git clone --recurse-submodules -b refactor/pounders_API https://github.com/POptUS/IBCDFO.git
git clone --recurse-submodules -b develop https://github.com/POptUS/IBCDFO.git
pushd IBCDFO/minq/py/minq5/
export PYTHONPATH="$PYTHONPATH:$(pwd)"
echo "PYTHONPATH=$PYTHONPATH" >> $GITHUB_ENV
@@ -167,4 +167,4 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: crate-ci/typos@v1.16.25
- uses: crate-ci/typos@v1.17.0
4 changes: 2 additions & 2 deletions .github/workflows/extra.yml
@@ -166,7 +166,7 @@ jobs:
sed -i -e "s/pyzmq>=22.1.0,<23.0.0/pyzmq>=23.0.0,<24.0.0/" ./balsam/setup.cfg
cd balsam; pip install -e .; cd ..
git clone --recurse-submodules -b refactor/pounders_API https://github.com/POptUS/IBCDFO.git
git clone --recurse-submodules -b develop https://github.com/POptUS/IBCDFO.git
pushd IBCDFO/minq/py/minq5/
export PYTHONPATH="$PYTHONPATH:$(pwd)"
echo "PYTHONPATH=$PYTHONPATH" >> $GITHUB_ENV
@@ -250,4 +250,4 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: crate-ci/typos@v1.16.25
- uses: crate-ci/typos@v1.17.0
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
BSD 3-Clause License

Copyright (c) 2018-2023, UChicago Argonne, LLC and the libEnsemble Development Team
Copyright (c) 2018-2024, UChicago Argonne, LLC and the libEnsemble Development Team
All Rights Reserved.

Redistribution and use in source and binary forms, with or without
4 changes: 4 additions & 0 deletions README.rst
@@ -29,6 +29,10 @@
:target: https://github.com/psf/black
:alt: Code style: black

.. image:: https://joss.theoj.org/papers/10.21105/joss.06031/status.svg
:target: https://doi.org/10.21105/joss.06031
:alt: JOSS Status

|
.. after_badges_rst_tag
2 changes: 1 addition & 1 deletion docs/advanced_installation.rst
@@ -118,7 +118,7 @@ Further recommendations for selected HPC systems are given in the

On some platforms you may wish to run libEnsemble without ``mpi4py``,
using a serial PETSc build. This is often preferable if running on
the launch nodes of a three-tier system (e.g., Theta/Summit)::
the launch nodes of a three-tier system (e.g., Summit)::

spack install py-libensemble +scipy +mpmath +petsc4py ^py-petsc4py~mpi ^petsc~mpi~hdf5~hypre~superlu-dist

4 changes: 4 additions & 0 deletions docs/data_structures/libE_specs.rst
@@ -240,6 +240,10 @@ libEnsemble is primarily customized by setting options within a ``LibeSpecs`` cl
the equivalent ``persis_info`` settings, generators will be allocated this
many GPUs.

**use_tiles_as_gpus** [bool] = ``False``:
If ``True``, each GPU tile is treated as one GPU, assuming
``tiles_per_GPU`` is provided in ``platform_specs`` or detected.
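
A minimal sketch of enabling this option in a calling script (illustrative
only; any other settings are omitted)::

    from libensemble.specs import LibeSpecs

    libE_specs = LibeSpecs(use_tiles_as_gpus=True)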

**enforce_worker_core_bounds** [bool] = ``False``:
Permit submission of tasks with a
higher processor count than the CPUs available to the worker.
Expand Down
2 changes: 0 additions & 2 deletions docs/function_guides/simulator.rst
@@ -43,7 +43,6 @@ Writing a Simulator
return Output, persis_info
Most ``sim_f`` function definitions written by users resemble::

def my_simulation(Input, persis_info, sim_specs, libE_info):
@@ -85,7 +84,6 @@ If ``sim_specs`` was initially defined:
user={"batch_size": 128},
)
Then user parameters and a *local* array of outputs may be obtained/initialized like::

batch_size = sim_specs["user"]["batch_size"]
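
Putting these pieces together, a minimal complete ``sim_f`` might resemble the
sketch below. This is illustrative only; it assumes the input array carries a
field ``"x"`` and that ``sim_specs["out"]`` defines a float field ``"f"``::

    import numpy as np


    def my_simulation(Input, persis_info, sim_specs, libE_info):
        # User parameters supplied via sim_specs (read here for illustration)
        batch_size = sim_specs["user"]["batch_size"]

        # Local output array, one row per input point
        Output = np.zeros(len(Input), dtype=sim_specs["out"])

        # Placeholder computation: sum of squares of each input point
        Output["f"] = np.sum(Input["x"] ** 2, axis=1)

        return Output, persis_info
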
1 change: 0 additions & 1 deletion docs/introduction_latex.rst
@@ -52,7 +52,6 @@
.. _SWIG: http://swig.org/
.. _tarball: https://github.com/Libensemble/libensemble/releases/latest
.. _Tasmanian: https://tasmanian.ornl.gov/
.. _Theta: https://www.alcf.anl.gov/alcf-resources/theta
.. _tomli: https://pypi.org/project/tomli/
.. _tqdm: https://tqdm.github.io/
.. _user guide: https://libensemble.readthedocs.io/en/latest/programming_libE.html
1 change: 1 addition & 0 deletions docs/nitpicky
@@ -36,6 +36,7 @@ py:class <class 'int'>
py:class +ScalarType

# Internal paths that are verified importable but Sphinx can't find
py:class libensemble.resources.platforms.Aurora
py:class libensemble.resources.platforms.GenericROCm
py:class libensemble.resources.platforms.Crusher
py:class libensemble.resources.platforms.Frontier
112 changes: 112 additions & 0 deletions docs/platforms/aurora.rst
@@ -0,0 +1,112 @@
======
Aurora
======

Aurora_ is an Intel/HPE EX supercomputer located in the ALCF_ at Argonne
National Laboratory. Each compute node contains two Intel (Sapphire Rapids)
Xeon CPUs and six Intel X\ :sup:`e` GPUs (Ponte Vecchio) each with two tiles.

The PBS scheduler is used to submit jobs from login nodes to run on the
compute nodes.

Configuring Python and Installation
-----------------------------------

To obtain Python use::

module use /soft/modulefiles
module load frameworks

To obtain libEnsemble::

pip install libensemble

See :doc:`here<../advanced_installation>` for more information on advanced
options for installing libEnsemble, including using Spack.

Example
-------

This example runs the :doc:`forces_gpu<../tutorials/forces_gpu_tutorial>`
tutorial on Aurora.

To obtain the example, you can clone the libEnsemble repository, although only
the forces subdirectory is needed::

git clone https://github.com/Libensemble/libensemble
cd libensemble/libensemble/tests/scaling_tests/forces/forces_app

To compile forces (a C application with OpenMP target offload)::

mpicc -DGPU -O3 -fiopenmp -fopenmp-targets=spir64 -o forces.x forces.c

Now go to the ``forces_gpu`` directory::

cd ../forces_gpu

To make use of all available GPUs, open ``run_libe_forces.py`` and adjust
``exit_criteria`` to run more simulations. The following will run two
simulations for each worker::

# Instruct libEnsemble to exit after this many simulations
ensemble.exit_criteria = ExitCriteria(sim_max=nsim_workers*2)

Now grab an interactive session on two nodes (or use the batch script at
``../submission_scripts/submit_pbs_aurora.sh``)::

qsub -A <myproject> -l select=2 -l walltime=15:00 -lfilesystems=home -q EarlyAppAccess -I

Once in the interactive session, you may need to reload the frameworks module::

cd $PBS_O_WORKDIR
module use /soft/modulefiles
module load frameworks

Then in the session run::

python run_libe_forces.py --comms local --nworkers 13

This provides twelve workers for running simulations (one for each GPU across
two nodes). An extra worker is added to run the persistent generator. The
GPU settings for each worker simulation are printed.

Looking at ``libE_stats.txt`` will provide a summary of the runs.

Using tiles as GPUs
-------------------

If you wish to treat each tile as its own GPU, then add the *libE_specs*
option ``use_tiles_as_gpus=True``, so the *libE_specs* block of
``run_libe_forces.py`` becomes:

.. code-block:: python

    ensemble.libE_specs = LibeSpecs(
        num_resource_sets=nsim_workers,
        sim_dirs_make=True,
        use_tiles_as_gpus=True,
    )

Now you can run again with twice as many workers for running simulations (each
will use one GPU tile)::

python run_libe_forces.py --comms local --nworkers 25

Note that the *forces* example will automatically use the GPUs available to
each worker (with one MPI rank per GPU), so if fewer workers are provided,
more than one GPU will be used per simulation.
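
For reference, a sketch of how a simulation function can hand GPU assignment to
the libEnsemble executor is shown below. This is an illustration rather than the
exact *forces* code; it assumes an application registered under the name
``forces``, an input field ``"x"``, and an output array matching
``sim_specs["out"]``:

.. code-block:: python

    import numpy as np


    def run_forces(H, persis_info, sim_specs, libE_info):
        particles = str(int(H["x"][0][0]))

        exctr = libE_info["executor"]  # executor provided to the sim function
        task = exctr.submit(
            app_name="forces",
            app_args=particles,
            auto_assign_gpus=True,  # use the GPUs detected for this worker
            match_procs_to_gpus=True,  # one MPI rank per assigned GPU
        )
        task.wait()

        # Results would be read from the task's output files into this array
        Output = np.zeros(1, dtype=sim_specs["out"])
        return Output, persis_info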

Also see the ``forces_gpu_var_resources`` and ``forces_multi_app`` examples for
cases that use varying processor/GPU counts per simulation.

Demonstration
-------------

Note that a video demonstration_ of the *forces_gpu* example on *Frontier*
is also available. The workflow is identical when running on Aurora, except
for the compiler options and the number of workers (because the number of
GPUs on a node differs).

.. _ALCF: https://www.alcf.anl.gov/
.. _Aurora: https://www.alcf.anl.gov/support-center/aurorasunspot/getting-started-aurora
.. _demonstration: https://youtu.be/H2fmbZ6DnVc
12 changes: 6 additions & 6 deletions docs/platforms/example_scripts.rst
@@ -33,14 +33,14 @@ for submitting workflows to almost any system or scheduler.
:caption: /examples/libE_submission_scripts/bebop_submit_slurm_distrib.sh
:language: bash

.. dropdown:: Theta - On MOM Node with Multiprocessing

.. literalinclude:: ../../examples/libE_submission_scripts/theta_submit_mproc.sh
:caption: /examples/libE_submission_scripts/theta_submit_mproc.sh
:language: bash

.. dropdown:: Summit - On Launch Nodes with Multiprocessing

.. literalinclude:: ../../examples/libE_submission_scripts/summit_submit_mproc.sh
:caption: /examples/libE_submission_scripts/summit_submit_mproc.sh
:language: bash

.. dropdown:: Cobalt - Intermediate node with Multiprocessing

.. literalinclude:: ../../examples/libE_submission_scripts/cobalt_submit_mproc.sh
:caption: /examples/libE_submission_scripts/cobalt_submit_mproc.sh
:language: bash
4 changes: 2 additions & 2 deletions docs/platforms/platforms_index.rst
@@ -87,7 +87,7 @@ Some large systems have a 3-tier node setup. That is, they have a separate set o
(known as MOM nodes on Cray Systems). User batch jobs or interactive sessions run on a launch node.
Most such systems supply a special MPI runner that has some application-level scheduling
capability (e.g., ``aprun``, ``jsrun``). MPI applications can only be submitted from these nodes. Examples
of these systems include: Summit, Sierra, and Theta.
of these systems include Summit and Sierra.

There are two ways of running libEnsemble on these kinds of systems. The first, and simplest,
is to run libEnsemble on the launch nodes. This is often sufficient if the worker's simulation
@@ -209,13 +209,13 @@ libEnsemble on specific HPC systems.
:maxdepth: 2
:titlesonly:

aurora
bebop
frontier
perlmutter
polaris
spock_crusher
summit
theta
srun
example_scripts

1 change: 0 additions & 1 deletion docs/platforms/summit.rst
@@ -120,7 +120,6 @@ to execute on the launch nodes.

It is recommended to run libEnsemble on the launch nodes (assuming workers are
submitting MPI applications) using the ``local`` communications mode (multiprocessing).
In the future, Balsam may be used to run libEnsemble on compute nodes.

Interactive Runs
^^^^^^^^^^^^^^^^