Commit 9653f18

Merge pull request #843 from mlcommons/dev
Dev -> main
2 parents bf61255 + 5c4c07d commit 9653f18

223 files changed: +2108 -1300 lines changed

.github/workflows/CI.yml

Lines changed: 25 additions & 25 deletions
@@ -7,10 +7,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -25,10 +25,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -42,10 +42,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -59,10 +59,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -77,10 +77,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -96,10 +96,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -113,10 +113,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -130,10 +130,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -148,10 +148,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -166,10 +166,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install Modules and Run
@@ -184,10 +184,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install pytest
@@ -199,7 +199,7 @@ jobs:
 pip install .[pytorch_cpu]
 - name: Run pytest tests
 run: |
-pytest -vx tests/version_test.py
+pytest -vx tests/test_version.py
 pytest -vx tests/test_num_params.py
 pytest -vx tests/test_param_shapes.py
 pytest -vx tests/test_param_types.py
@@ -208,10 +208,10 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v3
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v4
 with:
-python-version: 3.9
+python-version: 3.11.10
 cache: 'pip' # Cache pip dependencies\.
 cache-dependency-path: '**/setup.py'
 - name: Install pytest

.github/workflows/linting.yml

Lines changed: 9 additions & 9 deletions
@@ -7,17 +7,17 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v2
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v2
 with:
-python-version: 3.9
+python-version: 3.11.10
 - name: Install pylint
 run: |
 python -m pip install --upgrade pip
 pip install pylint==2.16.1
 - name: Run pylint
 run: |
-pylint algorithmic_efficiency
+pylint algoperf
 pylint reference_algorithms
 pylint prize_qualification_baselines
 pylint submission_runner.py
@@ -27,14 +27,14 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v2
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v2
 with:
-python-version: 3.9
+python-version: 3.11.10
 - name: Install isort
 run: |
 python -m pip install --upgrade pip
-pip install isort
+pip install isort==5.12.0
 - name: Run isort
 run: |
 isort . --check --diff
@@ -43,14 +43,14 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v2
-- name: Set up Python 3.9
+- name: Set up Python 3.11.10
 uses: actions/setup-python@v2
 with:
-python-version: 3.9
+python-version: 3.11.10
 - name: Install yapf
 run: |
 python -m pip install --upgrade pip
-pip install yapf==0.32
+pip install yapf==0.32 toml
 - name: Run yapf
 run: |
 yapf . --diff --recursive
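
Review note: the switch from `pylint algorithmic_efficiency` to `pylint algoperf` reflects the package rename that runs through this commit, so downstream code that still imports the old name would presumably need the same update. The following is a minimal sketch of how such imports could be rewritten mechanically. The helper name `rewrite_imports` is hypothetical and not part of the repository, and the sketch only handles `import`/`from` statements at the start of a line.

```python
import re

# Assumption: the package was renamed from `algorithmic_efficiency` to
# `algoperf`, as the diffs in this commit suggest.
OLD_NAME = "algorithmic_efficiency"
NEW_NAME = "algoperf"

# Match the old name only when it is the first component of an import target,
# so identifiers such as `algorithmic_efficiency_extra` are left untouched.
_IMPORT_RE = re.compile(
    rf"^([ \t]*(?:from|import)[ \t]+){re.escape(OLD_NAME)}(?=[\s.]|$)",
    flags=re.MULTILINE,
)


def rewrite_imports(source: str) -> str:
    """Return `source` with top-level imports of the old package renamed."""
    return _IMPORT_RE.sub(rf"\g<1>{NEW_NAME}", source)


if __name__ == "__main__":
    example = (
        "from algorithmic_efficiency import spec\n"
        "from algorithmic_efficiency.pytorch_utils import pytorch_setup\n"
    )
    print(rewrite_imports(example))
```

String references to the old name (for example in `.gitignore` paths or shell commands) are out of scope for this sketch and would need a separate search.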

.github/workflows/regression_tests_variants.yml

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ jobs:
 run: |
 docker pull us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_${{ github.head_ref || github.ref_name }}
 docker run -v $HOME/data/:/data/ -v $HOME/experiment_runs/:/experiment_runs -v $HOME/experiment_runs/logs:/logs --gpus all --ipc=host us-central1-docker.pkg.dev/training-algorithms-external/mlcommons-docker-repo/algoperf_pytorch_${{ github.head_ref || github.ref_name }} -d criteo1tb -f pytorch -s reference_algorithms/paper_baselines/adamw/pytorch/submission.py -w criteo1tb_resnet -t reference_algorithms/paper_baselines/adamw/tuning_search_space.json -e tests/regression_tests/adamw -m 10 -c False -o True -r false
-criteo_resnet_pytorch:
+criteo_embed_init_pytorch:
 runs-on: self-hosted
 needs: build_and_push_pytorch_docker_image
 steps:

.gitignore

Lines changed: 5 additions & 3 deletions
@@ -12,8 +12,8 @@ makefile
 *.swp
 */data/
 *events.out.tfevents*
-algorithmic_efficiency/workloads/librispeech_conformer/data_dir
-algorithmic_efficiency/workloads/librispeech_conformer/work_dir
+algoperf/workloads/librispeech_conformer/data_dir
+algoperf/workloads/librispeech_conformer/work_dir
 *.flac
 *.npy
 *.csv
@@ -23,4 +23,6 @@ wandb/
 scoring/plots/
 
 !scoring/test_data/experiment_dir/study_0/mnist_jax/trial_0/eval_measurements.csv
-!scoring/test_data/experiment_dir/study_0/mnist_jax/trial_1/eval_measurements.csv
+!scoring/test_data/experiment_dir/study_0/mnist_jax/trial_1/eval_measurements.csv
+
+algoperf/_version.py

README.md

Lines changed: 36 additions & 22 deletions
@@ -6,12 +6,12 @@
 </p>
 
 <p align="center">
-<a href="https://arxiv.org/abs/2306.07179" target="_blank">Paper (arXiv)</a> •
-<a href="/CALL_FOR_SUBMISSIONS.md">Call for Submissions</a> •
-<a href="/GETTING_STARTED.md">Getting Started</a> •
-<a href="/COMPETITION_RULES.md">Competition Rules</a> •
-<a href="/DOCUMENTATION.md">Documentation</a> •
-<a href="/CONTRIBUTING.md">Contributing</a>
+<a href="https://github.com/mlcommons/submissions_algorithms">Leaderboard</a> •
+<a href="/docs/GETTING_STARTED.md">Getting Started</a> •
+<a href="https://github.com/mlcommons/submissions_algorithms">Submit</a> •
+<a href="/docs/DOCUMENTATION.md">Documentation</a> •
+<a href="/docs/CONTRIBUTING.md">Contributing</a> •
+<a href="https://arxiv.org/abs/2306.07179" target="_blank">Benchmark</a>/<a href="https://openreview.net/forum?id=CtM5xjRSfm" target="_blank">Results</a> Paper
 </p>
 
 [![CI](https://github.com/mlcommons/algorithmic-efficiency/actions/workflows/CI.yml/badge.svg)](https://github.com/mlcommons/algorithmic-efficiency/actions/workflows/CI.yml)
@@ -22,19 +22,21 @@
 
 ---
 
-> *AlgoPerf* is a suite of benchmarks and competitions to measure neural network training speedups due to algorithmic improvements in both training algorithms and models. This is the repository for the *AlgoPerf: Training Algorithms benchmark* and its associated competition. It is developed by the [MLCommons Algorithms Working Group](https://mlcommons.org/en/groups/research-algorithms/). This repository holds the [**competition rules**](/COMPETITION_RULES.md), the [**technical documentation**](/DOCUMENTATION.md) of the benchmark, [**getting started guides**](/GETTING_STARTED.md), and the benchmark code. For a detailed description of the benchmark design, see our [**paper**](https://arxiv.org/abs/2306.07179).
-
+> This is the repository for the *AlgoPerf: Training Algorithms benchmark* measuring neural network training speedups due to algorithmic improvements.
+> It is developed by the [MLCommons Algorithms Working Group](https://mlcommons.org/en/groups/research-algorithms/).
+> This repository holds the benchmark code, the benchmark's [**technical documentation**](/docs/DOCUMENTATION.md) and [**getting started guides**](/docs/GETTING_STARTED.md). For a detailed description of the benchmark design, see our [**introductory paper**](https://arxiv.org/abs/2306.07179), for the results of the inaugural competition see our [**results paper**](https://openreview.net/forum?id=CtM5xjRSfm).
+>
+> **See our [AlgoPerf Leaderboard](https://github.com/mlcommons/submissions_algorithms) for the latest results of the benchmark and to submit your algorithm.**
 ---
 
 > [!IMPORTANT]
-> The results of the inaugural AlgoPerf: Training Algorithms benchmark competition have been announced. See the [MLCommons blog post](https://mlcommons.org/2024/08/mlc-algoperf-benchmark-competition/) for an overview and the [results page](https://mlcommons.org/benchmarks/algorithms/) for more details on the results. We are currently preparing an in-depth analysis of the results in the form of a paper and plan the next iteration of the benchmark competition.
+> For future iterations of the AlgoPerf: Training Algorithms benchmark competition, we are switching to a rolling leaderboard, making a few changes to the competition rules, and also run all selected submissions on our hardware. **To submit your algorithm to the next iteration of the benchmark, please see our [How to Submit](#how-to-submit) section and the [submission repository](https://github.com/mlcommons/submissions_algorithms) which hosts the up to date AlgoPerf leaderboard.**
 
 ## Table of Contents <!-- omit from toc -->
 
 - [Installation](#installation)
 - [Getting Started](#getting-started)
-- [Call for Submissions](#call-for-submissions)
-- [Competition Rules](#competition-rules)
+- [How to Submit](#how-to-submit)
 - [Technical Documentation of the Benchmark \& FAQs](#technical-documentation-of-the-benchmark--faqs)
 - [Contributing](#contributing)
 - [License](#license)
@@ -45,9 +47,9 @@
 > [!TIP]
 > **If you have any questions about the benchmark competition or you run into any issues, please feel free to contact us.** Either [file an issue](https://github.com/mlcommons/algorithmic-efficiency/issues), ask a question on [our Discord](https://discord.gg/5FPXK7SMt6) or [join our weekly meetings](https://mlcommons.org/en/groups/research-algorithms/).
 
-You can install this package and dependencies in a [Python virtual environment](/GETTING_STARTED.md#python-virtual-environment) or use a [Docker/Singularity/Apptainer container](/GETTING_STARTED.md#docker) (recommended).
+You can install this package and dependencies in a [Python virtual environment](/docs/GETTING_STARTED.md#python-virtual-environment) or use a [Docker/Singularity/Apptainer container](/docs/GETTING_STARTED.md#docker) (recommended).
 We recommend using a Docker container (or alternatively, a Singularity/Apptainer container) to ensure a similar environment to our scoring and testing environments.
-Both options are described in detail in the [**Getting Started**](/GETTING_STARTED.md) document.
+Both options are described in detail in the [**Getting Started**](/docs/GETTING_STARTED.md) document.
 
 *TL;DR to install the Jax version for GPU run:*
 
@@ -67,7 +69,7 @@ pip3 install -e '.[full]'
 
 ## Getting Started
 
-For detailed instructions on developing and scoring your own algorithm in the benchmark see the [Getting Started](/GETTING_STARTED.md) document.
+For detailed instructions on developing your own algorithm in the benchmark see the [Getting Started](/docs/GETTING_STARTED.md) document.
 
 *TL;DR running a JAX workload:*
 
@@ -93,23 +95,19 @@ python3 submission_runner.py \
 --tuning_search_space=reference_algorithms/paper_baselines/adamw/tuning_search_space.json
 ```
 
-## Call for Submissions
-
-The [Call for Submissions](/CALL_FOR_SUBMISSIONS.md) announces the first iteration of the AlgoPerf: Training Algorithms competition based on the benchmark by the same name. This document also contains the schedule and key dates for the competition.
-
-### Competition Rules
+## How to Submit
 
-The competition rules for the *AlgoPerf: Training Algorithms* competition can be found in the separate [**Competition Rules**](/COMPETITION_RULES.md) document.
+Once you have developed your training algorithm, you can submit it to the benchmark by creating a pull request to the [submission repository](https://github.com/mlcommons/submissions_algorithms), which hosts the AlgoPerf leaderboard. The AlgoPerf working group will review your PR. Based on our available resources and the perceived potential of the method, it will be selected for a free evaluation. If selected, we will run your algorithm on our hardware and update the leaderboard with the results.
 
 ### Technical Documentation of the Benchmark & FAQs
 
-We provide additional technical documentation of the benchmark and answer frequently asked questions in a separate [**Documentation**](/DOCUMENTATION.md) page. Suggestions, clarifications and questions can be raised via pull requests, creating an issue, or by sending an email to the [working group](mailto:[email protected]).
+We provide a technical documentation of the benchmark and answer frequently asked questions in a separate [**Documentation**](/docs/DOCUMENTATION.md) page. This includes which types of submissions are allowed. Please ensure that your submission is compliant with these rules before submitting. Suggestions, clarifications and questions can be raised via pull requests, creating an issue, or by sending an email to the [working group](mailto:[email protected]).
 
 ## Contributing
 
 We invite everyone to look through our rules, documentation, and codebase and submit issues and pull requests, e.g. for rules changes, clarifications, or any bugs you might encounter. If you are interested in contributing to the work of the working group and influence the benchmark's design decisions, please [join the weekly meetings](https://mlcommons.org/en/groups/research-algorithms/) and consider becoming a member of the working group.
 
-Our [**Contributing**](/CONTRIBUTING.md) document provides further MLCommons contributing guidelines and additional setup and workflow instructions.
+Our [**Contributing**](/docs/CONTRIBUTING.md) document provides further MLCommons contributing guidelines and additional setup and workflow instructions.
 
 ## License
 
@@ -134,3 +132,19 @@ If you are using the *AlgoPerf benchmark*, its codebase, baselines, or workloads
 eprint = {2306.07179},
 }
 ```
+
+If you use the results from the first *AlgoPerf competition*, please consider citing the results paper, as well as the relevant submissions:
+
+> [Kasimbeg, Schneider, Eschenhagen, et al.<br/>
+> **Accelerating neural network training: An analysis of the AlgoPerf competition**<br/>
+> ICLR 2025](https://openreview.net/forum?id=CtM5xjRSfm)
+
+```bibtex
+@inproceedings{Kasimbeg2025AlgoPerfResults,
+title = {Accelerating neural network training: An analysis of the {AlgoPerf} competition},
+author = {Kasimbeg, Priya and Schneider, Frank and Eschenhagen, Runa and Bae, Juhan and Sastry, Chandramouli Shama and Saroufim, Mark and Boyuan, Feng and Wright, Less and Yang, Edward Z. and Nado, Zachary and Medapati, Sourabh and Hennig, Philipp and Rabbat, Michael and Dahl, George E.},
+booktitle = {The Thirteenth International Conference on Learning Representations},
+year = {2025},
+url = {https://openreview.net/forum?id=CtM5xjRSfm}
+}
+```

algoperf/__init__.py

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+"""Algorithmic Efficiency."""
+
+from ._version import version as __version__
+
+__all__ = ["__version__"]
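
Review note: `algoperf/_version.py` is imported here and also added to `.gitignore` in this commit, which suggests the file is generated at build or install time rather than checked in. That is an inference from the diff, not something the commit states. In a checkout where the file has not been generated, the import above would fail, so tooling that only needs the version string could fall back to the installed package metadata. The sketch below assumes the distribution is named `algoperf`, and the helper `get_version` is hypothetical.

```python
from importlib import metadata


def get_version(dist_name: str = "algoperf", default: str = "0+unknown") -> str:
    """Return the installed version of `dist_name`, or `default` if absent.

    Assumption: the distribution name matches the package name `algoperf`.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return default


if __name__ == "__main__":
    print(get_version())
```

This reads the version from the installed distribution's metadata, so it does not depend on `_version.py` being present on disk.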

algorithmic_efficiency/checkpoint_utils.py renamed to algoperf/checkpoint_utils.py

Lines changed: 3 additions & 3 deletions
@@ -16,8 +16,8 @@
 from tensorflow.io import gfile # pytype: disable=import-error
 import torch
 
-from algorithmic_efficiency import spec
-from algorithmic_efficiency.pytorch_utils import pytorch_setup
+from algoperf import spec
+from algoperf.pytorch_utils import pytorch_setup
 
 _, _, DEVICE, _ = pytorch_setup()
 CheckpointReturn = Tuple[spec.OptimizerState,
@@ -231,7 +231,7 @@ def save_checkpoint(framework: str,
 target=checkpoint_state,
 step=global_step,
 overwrite=True,
-keep=np.Inf if save_intermediate_checkpoints else 1)
+keep=np.inf if save_intermediate_checkpoints else 1)
 else:
 if not save_intermediate_checkpoints:
 checkpoint_files = gfile.glob(
0 commit comments

Comments
 (0)