deformable-attention-illustration.png
flash-attention-benchmark.png
flash-attention-tiling.png
learnable-weights-in-attention.png
mixed-precision-trainer.png
multi-scale-deformable-attention.png
positional-encoding-detail.png
scaled-dot-production-detail.png
transformer-architecture.png
transformer-qkv-attention-detail.png
Competition-Level-Code-Generation-with-AlphaCode.md
Masked-AutoEncoders-Are-Scalable-Vision-Learners.md
Self-Attention-Does-Not-Need-Memory.md
attention-is-all-you-need.md
blockwise-parallel-transformer.md
contrastive-representation-distillation.md
deformable-detr_deformable-transformers.md
efficient-large-scale-language-model-training.md
vit-an-image-is-worth-16x16-words.md
8-bit-floating-point-numbers.md
ActNN-Theorem-Prove-HaotianHe.pdf
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.md
PyTorch Distributed-Data Parallel Training.md
Rammer-Enabling-Holistic-DL-Optimizations-with-rTasks.md
characterizing-deep-learning-training-workloads-on-alibaba-pai.md
deformable-convolutional-networks.md
designing-a-profiling-and-visualization-tool-for-scalable-and-in-depth-analysis-of-high-performance-gpu-clusters.md
fp8-formats-for-deep-learning.md
mixed-precision-training.md