  - Lecture: Mixed-precision training. Data storage and loading optimizations. Tools for profiling deep learning workloads.
  - Seminar: Automatic Mixed Precision in PyTorch. Dynamic padding for sequence data and JPEG decoding benchmarks. Basics of PyTorch Profiler, PyTorch TensorBoard Profiler and cProfile.
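The AMP pattern covered in the seminar can be sketched as follows; the model, data, and hyperparameters here are toy stand-ins, not the course's materials. The `GradScaler` guards fp16 gradients against underflow on CUDA and degrades to a no-op on CPU:

```python
import torch
import torch.nn as nn

# Toy stand-in model and batch (assumptions, not from the course materials).
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# GradScaler is only active on CUDA; on CPU it is a pass-through.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)
y = torch.randn(8, 4, device=device)

for _ in range(3):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in reduced precision where it is safe to do so.
    with torch.autocast(device_type=device,
                        dtype=torch.float16 if device == "cuda" else torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), y)
    # Scale the loss to avoid fp16 gradient underflow, then unscale on step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

print(loss.item())
```

Note that the optimizer step itself stays in fp32; only the forward pass (and the ops autocast deems safe) runs in reduced precision.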
- [__Week 4:__](./week04_distributed) __Basics of distributed ML__
  - Lecture: Introduction to distributed training. Process-based communication. Parameter Server architecture.
- [__Week 5:__](./week05_data_parallel) __Data-parallel training and All-Reduce__
  - Lecture: Data-parallel training of neural networks. All-Reduce and its efficient implementations.
  - Seminar: Introduction to PyTorch Distributed. Data-parallel training primitives.
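The ring variant of All-Reduce mentioned in the lecture can be simulated in plain Python to show the data flow; real backends (NCCL, Gloo) run these exchanges in parallel over an actual ring of devices:

```python
def ring_allreduce(vectors):
    """Simulate ring all-reduce (sum): one vector per rank, split into n chunks."""
    n = len(vectors)
    m = len(vectors[0]) // n
    chunks = [[v[i * m:(i + 1) * m] for i in range(n)] for v in vectors]

    # Phase 1: reduce-scatter. At step s, rank r sends chunk (r - s) % n to its
    # right neighbour, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, chunks[r][(r - s) % n]) for r in range(n)]
        for r, c, data in sends:
            dst = (r + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], data)]
    # Now rank r owns the fully reduced chunk (r + 1) % n.

    # Phase 2: all-gather. Each reduced chunk circulates once around the ring.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, chunks[r][(r + 1 - s) % n]) for r in range(n)]
        for r, c, data in sends:
            chunks[(r + 1) % n][c] = list(data)

    return [sum(chunks[r], []) for r in range(n)]  # flatten back per rank

ranks = [[1.0] * 4, [2.0] * 4, [3.0] * 4, [4.0] * 4]
out = ring_allreduce(ranks)
print(out[0])  # every rank ends with the elementwise sum [10.0, 10.0, 10.0, 10.0]
```

Each rank sends and receives only `2 * (n - 1) / n` of the vector in total, which is why ring All-Reduce is bandwidth-optimal regardless of the number of workers.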
- [__Week 6:__](./week06_large_models) __Training large models__
  - Lecture: Model parallelism, gradient checkpointing, offloading, sharding.
  - Seminar: Gradient checkpointing and tensor parallelism in practice.
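Gradient checkpointing as used in the seminar can be sketched with `torch.utils.checkpoint`; the deep MLP below is a toy stand-in. Only segment boundaries are stored during the forward pass, and the remaining activations are recomputed during backward, trading compute for memory:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A stand-in deep model (assumption); the course's seminar models will differ.
torch.manual_seed(0)
model = nn.Sequential(*[nn.Sequential(nn.Linear(32, 32), nn.ReLU())
                        for _ in range(8)])
x = torch.randn(4, 32, requires_grad=True)

# Regular forward/backward keeps every intermediate activation alive.
loss_ref = model(x).sum()
loss_ref.backward()
grad_ref = x.grad.clone()

# Checkpointed run: split the stack into 2 segments; inner activations are
# recomputed on the fly during backward instead of being stored.
x.grad = None
loss_ck = checkpoint_sequential(model, 2, x, use_reentrant=False).sum()
loss_ck.backward()

print(torch.allclose(grad_ref, x.grad))  # recomputation is exact
```

Because the recomputed forward uses the same weights and inputs, the gradients are bitwise-reproducible for deterministic ops; the cost is roughly one extra forward pass.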
- [__Week 7:__](./week07_application_deployment) __Python web application deployment__
  - Lecture/Seminar: Building and deployment of production-ready web services. App & web servers, Docker, Prometheus, API via HTTP and gRPC.
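The HTTP contract of such a service can be shown with a stdlib-only sketch; a production deployment from the seminar would instead sit behind an app server (e.g. gunicorn) inside Docker, and the "model" here is a hypothetical placeholder:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Stand-in "model": sum the features. A real service would run
        # inference with a loaded model here.
        body = json.dumps({"prediction": sum(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

req = Request(f"http://127.0.0.1:{server.server_port}/predict",
              data=json.dumps({"features": [1, 2, 3]}).encode(),
              headers={"Content-Type": "application/json"})
with urlopen(req) as resp:
    answer = json.loads(resp.read())
server.shutdown()
print(answer)  # {'prediction': 6}
```

The single-threaded stdlib server is fine for illustrating the request/response shape, but it handles one request at a time, which is exactly the problem app and web servers solve.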
- [__Week 8:__](./week08_inference_software) __Software for serving neural networks__
  - Lecture/Seminar: Different formats for packing NN: ONNX, TorchScript, IR. Inference servers: OpenVINO, Triton. ML on client devices: TfJS, ML Kit, Core ML.
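Packing a model into TorchScript, one of the formats listed above, can be sketched like this; the toy network is an assumption, not the course's model:

```python
import os
import tempfile

import torch
import torch.nn as nn

# A toy model standing in for a real network.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2)).eval()

# torch.jit.trace records the ops for one example input, producing a
# self-contained TorchScript module that can be loaded without the
# original Python class definitions (e.g. from C++ or an inference server).
example = torch.randn(1, 8)
scripted = torch.jit.trace(model, example)

path = os.path.join(tempfile.mkdtemp(), "model.pt")
scripted.save(path)

reloaded = torch.jit.load(path)
with torch.no_grad():
    same = torch.allclose(model(example), reloaded(example))
print(same)
```

ONNX export follows the same trace-with-an-example pattern via `torch.onnx.export`, producing a graph that OpenVINO or Triton can serve.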
- [__Week 9:__](./week09_compression) __Efficient model inference__
---

`week01_intro/README.md`:
* Seminar + bonus home assignment: [link](./seminar.ipynb)

## Further reading

* [CUDA MODE reading group Resource Stream](https://github.com/cuda-mode/resource-stream)
* [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html) and [CUDA C++ Best Practices Guide](https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html)
* [How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog](https://siboehm.com/articles/22/CUDA-MMM)
* Links on floating point precision in different libraries and environments: [1](https://discuss.pytorch.org/t/big-difference-between-torch-matmul-and-a-batch-of-torch-mm/101192), [2](https://github.com/pytorch/pytorch/issues/17678)
* [On threading in PyTorch](https://github.com/pytorch/pytorch/issues/19001)
* [Getting started with CUDA Graphs](https://developer.nvidia.com/blog/cuda-graphs/)
* [Accelerating PyTorch with CUDA Graphs](https://pytorch.org/blog/accelerating-pytorch-with-cuda-graphs/)