
Commit eb7c257 (parent e5f860d)

Update the course for 2025, add week01 materials

File tree: 135 files changed, +237 −16569 lines


README.md

Lines changed: 16 additions & 27 deletions

```diff
@@ -1,36 +1,21 @@
 # Efficient Deep Learning Systems
 This repository contains materials for the Efficient Deep Learning Systems course taught at the [Faculty of Computer Science](https://cs.hse.ru/en/) of [HSE University](https://www.hse.ru/en/) and [Yandex School of Data Analysis](https://academy.yandex.com/dataschool/).
 
-__This branch corresponds to the ongoing 2024 course. If you want to see full materials of past years, see the ["Past versions"](#past-versions) section.__
+__This branch corresponds to the ongoing 2025 course. If you want to see full materials of past years, see the ["Past versions"](#past-versions) section.__
 
 # Syllabus
 - [__Week 1:__](./week01_intro) __Introduction__
   - Lecture: Course overview and organizational details. Core concepts of the GPU architecture and CUDA API.
   - Seminar: CUDA operations in PyTorch. Introduction to benchmarking.
-- [__Week 2:__](./week02_management_and_testing) __Experiment tracking, model and data versioning, testing DL code in Python__
-  - Lecture: Experiment management basics and pipeline versioning. Configuring Python applications. Intro to regular and property-based testing.
-  - Seminar: Example DVC+Weights & Biases project walkthrough. Intro to testing with pytest.
-- [__Week 3:__](./week03_fast_pipelines) __Training optimizations, profiling DL code__
-  - Lecture: Mixed-precision training. Data storage and loading optimizations. Tools for profiling deep learning workloads.
-  - Seminar: Automatic Mixed Precision in PyTorch. Dynamic padding for sequence data and JPEG decoding benchmarks. Basics of profiling with py-spy, PyTorch Profiler, PyTorch TensorBoard Profiler, nvprof and Nsight Systems.
-- [__Week 4:__](./week04_distributed) __Basics of distributed ML__
-  - Lecture: Introduction to distributed training. Process-based communication. Parameter Server architecture.
-  - Seminar: Multiprocessing basics. Parallel GloVe training.
-- [__Week 5:__](./week05_data_parallel) __Data-parallel training and All-Reduce__
-  - Lecture: Data-parallel training of neural networks. All-Reduce and its efficient implementations.
-  - Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
-- [__Week 6:__](./week06_large_models) __Training large models__
-  - Lecture: Model parallelism, gradient checkpointing, offloading, sharding.
-  - Seminar: Gradient checkpointing and tensor parallelism in practice.
-- [__Week 7:__](./week07_application_deployment) __Python web application deployment__
-  - Lecture/Seminar: Building and deployment of production-ready web services. App & web servers, Docker, Prometheus, API via HTTP and gRPC.
-- [__Week 8:__](./week08_inference_software) __LLM inference optimizations and software__
-  - Lecture: Inference speed metrics. KV caching, batch inference, continuous batching. FlashAttention with its modifications and PagedAttention. Overview of popular LLM serving frameworks.
-  - Seminar: Basics of the Triton language. Layer fusion in PyTorch and Triton. Implementation of KV caching, FlashAttention in practice.
-- [__Week 9:__](./week09_compression) __Efficient model inference__
-  - Lecture: Hardware utilization metrics for deep learning. Knowledge distillation, quantization, LLM.int8(), SmoothQuant, GPTQ. Efficient model architectures. Speculative decoding.
-  - Seminar: Measuring Memory Bandwidth Utilization in practice. Data-free quantization, GPTQ, and SmoothQuant in PyTorch.
-- [__Week 10:__](./week10_invited) __MLOps, k8s, GitOps and other acronyms__ by [Gleb Vazhenin](https://github.com/punkerpunker), Bumble
+- __Week 2:__ __Experiment tracking, model and data versioning, testing DL code in Python__
+- __Week 3:__ __Training optimizations, profiling DL code__
+- __Week 4:__ __Data-parallel training and All-Reduce__
+- __Week 5:__ __Sharded data-parallel training, distributed training optimizations__
+- __Week 6:__ __Training large models__
+- __Week 7:__ __Python web application deployment__
+- __Week 8:__ __LLM inference optimizations and software__
+- __Week 9:__ __Efficient model inference__
+- __Week 10:__ Guest lecture
 
 ## Grading
 There will be several home assignments (spread over multiple weeks) on the following topics:
@@ -44,11 +29,15 @@ Please refer to the course page of your institution for details.
 # Staff
 - [Max Ryabinin](https://github.com/mryab)
 - [Just Heuristic](https://github.com/justheuristic)
-- [Alexander Markovich](https://github.com/markovka17)
+- [Yaroslav Zolotarev](https://github.com/Q-c7)
+- [Gregory Leleytner](https://github.com/RunFMe)
+- [Antony Frolov](https://github.com/antony-frolov)
 - [Anton Chigin](https://github.com/achigin)
-- [Ruslan Khaidurov](https://github.com/newokaerinasai)
+- [Alexander Markovich](https://github.com/markovka17)
+- [Roman Gorb](https://github.com/rvg77)
 
 # Past versions
+- [2024](https://github.com/mryab/efficient-dl-systems/tree/2024)
 - [2023](https://github.com/mryab/efficient-dl-systems/tree/2023)
 - [2022](https://github.com/mryab/efficient-dl-systems/tree/2022)
 - [2021](https://github.com/yandexdataschool/dlatscale_draft)
```
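The Week 1 seminar covers "Introduction to benchmarking". A minimal sketch of the core idea (warmup runs, repeated measurements, a robust summary statistic) in plain Python is below; the `benchmark` helper and the toy workload are illustrative, not taken from the seminar notebook. For GPU code one would additionally synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock, since CUDA kernels launch asynchronously.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=20):
    """Return the median wall-clock time of fn() over repeated runs."""
    for _ in range(warmup):
        fn()  # warmup: fill caches, trigger lazy initialization
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)  # median resists outlier runs


# Toy CPU workload standing in for a CUDA op
median_s = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median: {median_s * 1e6:.1f} us")
```

A single timed run is easily skewed by one-off costs (allocator warmup, frequency scaling), which is why the warmup loop and the median are used here.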

week01_intro/README.md

Lines changed: 3 additions & 1 deletion

```diff
@@ -1,12 +1,14 @@
 # Week 1: Introduction
 
 * Lecture: [link](./lecture.pdf)
-* Seminar + bonus home assignment: [link](./seminar.ipynb)
+* Seminar: [link](./seminar.ipynb)
 
 ## Further reading
 * [CUDA MODE reading group Resource Stream](https://github.com/cuda-mode/resource-stream)
 * [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) and [CUDA C++ Best Practices Guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
+* [Modal GPU Glossary](https://modal.com/gpu-glossary)
 * [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM)
+* [GPU Puzzles](https://github.com/srush/GPU-Puzzles)
 * [PyTorch Performance Tuning Guide](https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html)
 * [Earlier version of this guide from NVIDIA](https://tigress-web.princeton.edu/~jdh4/PyTorchPerformanceTuningGuide_GTC2021.pdf)
 * [Docs for caching memory allocation in PyTorch](https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management)
```

week01_intro/lecture.pdf

420 KB (new binary file)
