deformable-attention-illustration.png
flash-attention-benchmark.png
flash-attention-tiling.png
learnable-weights-in-attention.png
mixed-precision-trainer.png
multi-scale-deformable-attention.png
positional-encoding-detail.png
scaled-dot-production-detail.png
transformer-architecture.png
transformer-qkv-attention-detail.png
Competition-Level-Code-Generation-with-AlphaCode.md
Masked-AutoEncoders-Are-Scalable-Vision-Learners.md
Self-Attention-Does-Not-Need-Memory.md
attention-is-all-you-need.md
blockwise-parallel-transformer.md
contrastive-representation-distillation.md
deformable-detr_deformable-transformers.md
efficient-large-scale-language-model-training.md
vit-an-image-is-worth-16x16-words.md
8-bit-floating-point-numbers.md
ActNN-Theorem-Prove-HaotianHe.pdf
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.md
PyTorch Distributed-Data Parallel Training.md
Rammer-Enabling-Holistic-DL-Optimizations-with-rTasks.md
characterizing-deep-learning-training-workloads-on-alibaba-pai.md
deformable-convolutional-networks.md
designing-a-profiling-and-visualization-tool-for-scalable-and-in-depth-analysis-of-high-performance-gpu-clusters.md
fp8-formats-for-deep-learning.md
mixed-precision-training.md