
Commit 3960ea0 (1 parent: d32671c)

Update the course for 2024, add week01 materials

File tree: 169 files changed (+1893, -18767 lines)


.pre-commit-config.yaml (-18)

This file was deleted.

README.md (+12, -25)
@@ -7,29 +7,15 @@ __This branch corresponds to the ongoing 2023 course. If you want to see full ma
 - [__Week 1:__](./week01_intro) __Introduction__
   - Lecture: Course overview and organizational details. Core concepts of the GPU architecture and CUDA API.
   - Seminar: CUDA operations in PyTorch. Introduction to benchmarking.
-- [__Week 2:__](./week02_management_and_testing) __Experiment tracking, model and data versioning, testing DL code in Python__
-  - Lecture: Experiment management basics and pipeline versioning. Configuring Python applications. Intro to regular and property-based testing.
-  - Seminar: Example DVC+W&B project walkthrough. Intro to testing with pytest.
-- [__Week 3:__](./week03_fast_pipelines) __Training optimizations, profiling DL code__
-  - Lecture: Mixed-precision training. Data storage and loading optimizations. Tools for profiling deep learning workloads.
-  - Seminar: Automatic Mixed Precision in PyTorch. Dynamic padding for sequence data and JPEG decoding benchmarks. Basics of PyTorch Profiler, PyTorch TensorBoard Profiler and cProfile.
-- [__Week 4:__](./week04_distributed) __Basics of distributed ML__
-  - Lecture: Introduction to distributed training. Process-based communication. Parameter Server architecture.
-  - Seminar: Multiprocessing basics. Parallel GloVe training.
-- [__Week 5:__](./week05_data_parallel) __Data-parallel training and All-Reduce__
-  - Lecture: Data-parallel training of neural networks. All-Reduce and its efficient implementations.
-  - Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
-- [__Week 6:__](./week06_large_models) __Training large models__
-  - Lecture: Model parallelism, gradient checkpointing, offloading, sharding.
-  - Seminar: Gradient checkpointing and tensor parallelism in practice.
-- [__Week 7:__](./week07_application_deployment) __Python web application deployment__
-  - Lecture/Seminar: Building and deployment of production-ready web services. App & web servers, Docker, Prometheus, API via HTTP and gRPC.
-- [__Week 8:__](./week08_inference_software) __Software for serving neural networks__
-  - Lecture/Seminar: Different formats for packing NN: ONNX, TorchScript, IR. Inference servers: OpenVINO, Triton. ML on client devices: TfJS, ML Kit, Core ML.
-- [__Week 9:__](./week09_compression) __Efficient model inference__
-  - Lecture: Efficient Architectures, Knowledge distillation, Pruning, Quantization, Matrices decompositions.
-  - Seminar: Quantization in practice.
-- [__Week 10:__](./week10_invited) __Optimizing the BLOOM inference API__ by [Nicolas Patry](https://github.com/Narsil), Hugging Face
+- __Week 2:__ __Experiment tracking, model and data versioning, testing DL code in Python__
+- __Week 3:__ __Training optimizations, profiling DL code__
+- __Week 4:__ __Basics of distributed ML__
+- __Week 5:__ __Data-parallel training and All-Reduce__
+- __Week 6:__ __Training large models__
+- __Week 7:__ __Python web application deployment__
+- __Week 8:__ __Software for serving neural networks__
+- __Week 9:__ __Efficient model inference__
+- __Week 10:__ __Guest lecture__

 ## Grading
 There will be several home assignments (spread over multiple weeks) on the following topics:
@@ -44,9 +30,10 @@ Please refer to the course page of your institution for details.
 - [Max Ryabinin](https://github.com/mryab)
 - [Just Heuristic](https://github.com/justheuristic)
 - [Alexander Markovich](https://github.com/markovka17)
-- [Alexey Kosmachev](https://github.com/ADKosm)
-- [Anton Semenkin](https://github.com/topshik/)
+- [Anton Chigin](https://github.com/achigin)
+- [Ruslan Khaidurov]()

 # Past versions
+- [2023](https://github.com/mryab/efficient-dl-systems/tree/2023)
 - [2022](https://github.com/mryab/efficient-dl-systems/tree/2022)
 - [2021](https://github.com/yandexdataschool/dlatscale_draft)
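The retained week 1 seminar line above covers CUDA operations in PyTorch and an introduction to benchmarking. As a minimal sketch of the pitfall that topic revolves around (not taken from the course materials; the shapes are arbitrary): CUDA kernels launch asynchronously, so naive host-side timing without synchronization mostly measures launch overhead, whereas `torch.utils.benchmark.Timer` inserts the needed synchronization and warm-up for you.

```python
import time

import torch
from torch.utils import benchmark

# Arbitrary shapes, chosen only for illustration.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

# Manual timing: the matmul is launched asynchronously, so without the
# explicit synchronize() we would only measure the kernel launch.
torch.cuda.synchronize()
start = time.perf_counter()
a @ b
torch.cuda.synchronize()
print(f"single run: {(time.perf_counter() - start) * 1e3:.3f} ms")

# torch.utils.benchmark.Timer handles warm-up and synchronization itself.
timer = benchmark.Timer(stmt="a @ b", globals={"a": a, "b": b})
print(timer.timeit(100))  # aggregated statistics over 100 runs
```

The same pattern applies to any CUDA operation: synchronize around the region you time, or let the benchmark utility do it.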

week01_intro/README.md (+3)
@@ -4,6 +4,7 @@
 * Seminar + bonus home assignment: [link](./seminar.ipynb)

 ## Further reading
+* [CUDA MODE reading group Resource Stream](https://github.com/cuda-mode/resource-stream)
 * [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) and [CUDA C++ Best Practices Guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
 * [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM)
 * [PyTorch Performance Tuning Guide](https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html)
@@ -13,3 +14,5 @@
 * [PyTorch Benchmark tutorial](https://pytorch.org/tutorials/recipes/recipes/benchmark.html)
 * Links on floating point precision in different libraries and environments: [1](https://discuss.pytorch.org/t/big-difference-between-torch-matmul-and-a-batch-of-torch-mm/101192) [2](https://github.com/pytorch/pytorch/issues/17678)
 * [On threading in PyTorch](https://github.com/pytorch/pytorch/issues/19001)
+* [Getting started with CUDA Graphs](https://developer.nvidia.com/blog/cuda-graphs/)
+* [Accelerating PyTorch with CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)
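The two CUDA Graphs links added above describe capturing a sequence of kernel launches once and replaying it with a single call, which eliminates per-kernel launch overhead. Below is a minimal inference-only sketch using PyTorch's `torch.cuda.CUDAGraph` API; the model, shapes, and warm-up count are placeholder assumptions, not course code.

```python
import torch

# Placeholder model and batch shape, for illustration only.
model = torch.nn.Linear(1024, 1024).cuda().eval()
static_input = torch.randn(8, 1024, device="cuda")

# Warm up on a side stream so one-time initializations (cuBLAS handles,
# memory pools) happen before capture, as the PyTorch docs recommend.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture a single forward pass into a graph.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g), torch.no_grad():
    static_output = model(static_input)

# Replay: refill the captured input buffer in place, then relaunch the
# entire recorded kernel sequence with one call.
static_input.copy_(torch.randn(8, 1024, device="cuda"))
g.replay()
print(static_output.sum().item())  # static_output now holds the new result
```

Graphs pay off when launch overhead dominates, e.g. many short kernels on small batches; the trade-off is that input and output buffers must stay at fixed addresses and shapes.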

week01_intro/lecture.pdf (78.8 KB)

Binary file not shown.
