
Commit 3960ea0 (1 parent: d32671c)

Update the course for 2024, add week01 materials

File tree: 169 files changed (+1893, -18767 lines)


.pre-commit-config.yaml (-18)

This file was deleted.

README.md (+12, -25)
@@ -7,29 +7,15 @@ __This branch corresponds to the ongoing 2023 course. If you want to see full ma
 - [__Week 1:__](./week01_intro) __Introduction__
   - Lecture: Course overview and organizational details. Core concepts of the GPU architecture and CUDA API.
   - Seminar: CUDA operations in PyTorch. Introduction to benchmarking.
-- [__Week 2:__](./week02_management_and_testing) __Experiment tracking, model and data versioning, testing DL code in Python__
-  - Lecture: Experiment management basics and pipeline versioning. Configuring Python applications. Intro to regular and property-based testing.
-  - Seminar: Example DVC+W&B project walkthrough. Intro to testing with pytest.
-- [__Week 3:__](./week03_fast_pipelines) __Training optimizations, profiling DL code__
-  - Lecture: Mixed-precision training. Data storage and loading optimizations. Tools for profiling deep learning workloads.
-  - Seminar: Automatic Mixed Precision in PyTorch. Dynamic padding for sequence data and JPEG decoding benchmarks. Basics of PyTorch Profiler, PyTorch TensorBoard Profiler and cProfile.
-- [__Week 4:__](./week04_distributed) __Basics of distributed ML__
-  - Lecture: Introduction to distributed training. Process-based communication. Parameter Server architecture.
-  - Seminar: Multiprocessing basics. Parallel GloVe training.
-- [__Week 5:__](./week05_data_parallel) __Data-parallel training and All-Reduce__
-  - Lecture: Data-parallel training of neural networks. All-Reduce and its efficient implementations.
-  - Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
-- [__Week 6:__](./week06_large_models) __Training large models__
-  - Lecture: Model parallelism, gradient checkpointing, offloading, sharding.
-  - Seminar: Gradient checkpointing and tensor parallelism in practice.
-- [__Week 7:__](./week07_application_deployment) __Python web application deployment__
-  - Lecture/Seminar: Building and deployment of production-ready web services. App & web servers, Docker, Prometheus, API via HTTP and gRPC.
-- [__Week 8:__](./week08_inference_software) __Software for serving neural networks__
-  - Lecture/Seminar: Different formats for packing NN: ONNX, TorchScript, IR. Inference servers: OpenVINO, Triton. ML on client devices: TfJS, ML Kit, Core ML.
-- [__Week 9:__](./week09_compression) __Efficient model inference__
-  - Lecture: Efficient Architectures, Knowledge distillation, Pruning, Quantization, Matrices decompositions.
-  - Seminar: Quantization in practice.
-- [__Week 10:__](./week10_invited) __Optimizing the BLOOM inference API__ by [Nicolas Patry](https://github.com/Narsil), Hugging Face
+- __Week 2:__ __Experiment tracking, model and data versioning, testing DL code in Python__
+- __Week 3:__ __Training optimizations, profiling DL code__
+- __Week 4:__ __Basics of distributed ML__
+- __Week 5:__ __Data-parallel training and All-Reduce__
+- __Week 6:__ __Training large models__
+- __Week 7:__ __Python web application deployment__
+- __Week 8:__ __Software for serving neural networks__
+- __Week 9:__ __Efficient model inference__
+- __Week 10:__ __Guest lecture__

 ## Grading
 There will be several home assignments (spread over multiple weeks) on the following topics:
@@ -44,9 +30,10 @@ Please refer to the course page of your institution for details.
 - [Max Ryabinin](https://github.com/mryab)
 - [Just Heuristic](https://github.com/justheuristic)
 - [Alexander Markovich](https://github.com/markovka17)
-- [Alexey Kosmachev](https://github.com/ADKosm)
-- [Anton Semenkin](https://github.com/topshik/)
+- [Anton Chigin](https://github.com/achigin)
+- [Ruslan Khaidurov]()

 # Past versions
+- [2023](https://github.com/mryab/efficient-dl-systems/tree/2023)
 - [2022](https://github.com/mryab/efficient-dl-systems/tree/2022)
 - [2021](https://github.com/yandexdataschool/dlatscale_draft)
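The retained week 1 seminar line above covers CUDA operations in PyTorch and an introduction to benchmarking. As a minimal sketch of the pitfall that topic revolves around (not taken from the course materials; the shapes are arbitrary): CUDA kernels launch asynchronously, so naive host-side timing without synchronization mostly measures launch overhead, whereas `torch.utils.benchmark.Timer` inserts the needed synchronization and warm-up for you.

```python
import time

import torch
from torch.utils import benchmark

# Arbitrary shapes, chosen only for illustration.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

# Manual timing: the matmul is launched asynchronously, so without the
# explicit synchronize() we would only measure the kernel launch.
torch.cuda.synchronize()
start = time.perf_counter()
a @ b
torch.cuda.synchronize()
print(f"single run: {(time.perf_counter() - start) * 1e3:.3f} ms")

# torch.utils.benchmark.Timer handles warm-up and synchronization itself.
timer = benchmark.Timer(stmt="a @ b", globals={"a": a, "b": b})
print(timer.timeit(100))  # aggregated statistics over 100 runs
```

The same pattern applies to any CUDA operation: synchronize around the region you time, or let the benchmark utility do it.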

week01_intro/README.md (+3)
@@ -4,6 +4,7 @@
 * Seminar + bonus home assignment: [link](./seminar.ipynb)

 ## Further reading
+* [CUDA MODE reading group Resource Stream](https://github.com/cuda-mode/resource-stream)
 * [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) and [CUDA C++ Best Practices Guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
 * [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM)
 * [PyTorch Performance Tuning Guide](https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html)
@@ -13,3 +14,5 @@
 * [PyTorch Benchmark tutorial](https://pytorch.org/tutorials/recipes/recipes/benchmark.html)
 * Links on floating point precision in different libraries and environments: [1](https://discuss.pytorch.org/t/big-difference-between-torch-matmul-and-a-batch-of-torch-mm/101192) [2](https://github.com/pytorch/pytorch/issues/17678)
 * [On threading in PyTorch](https://github.com/pytorch/pytorch/issues/19001)
+* [Getting started with CUDA Graphs](https://developer.nvidia.com/blog/cuda-graphs/)
+* [Accelerating PyTorch with CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)
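The two CUDA Graphs links added above describe capturing a sequence of kernel launches once and replaying it with a single call, which eliminates per-kernel launch overhead. Below is a minimal inference-only sketch using PyTorch's `torch.cuda.CUDAGraph` API; the model, shapes, and warm-up count are placeholder assumptions, not course code.

```python
import torch

# Placeholder model and batch shape, for illustration only.
model = torch.nn.Linear(1024, 1024).cuda().eval()
static_input = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so one-time initializations (cuBLAS handles,
# memory pools) happen before capture, as the PyTorch docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture a single forward pass into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g), torch.no_grad():
    static_output = model(static_input)

# Replay: refill the captured input buffer in place, then relaunch the
# entire recorded kernel sequence with one call.
static_input.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()
print(static_output.sum().item())  # static_output now holds the new result
```

Graphs pay off when launch overhead dominates, e.g. many short kernels on small batches; the trade-off is that input and output buffers must stay at fixed addresses and shapes.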

week01_intro/lecture.pdf (78.8 KB)

Binary file not shown.
