TED-topo-non-expert-blocks.png
TPU-communication-performance.png
a100-dgx-intra-network-topo
dgx-2-network-topology.png
fwd-pass-of-an-moe-layer.png
mixture-of-experts-layer.png
moe_different_parallelism.png
se-moe-overall-training-design.png
st-differentiable-load-balancing-loss.png
st-token-routing-dynamics.png
switch-transformer-encoder-block.png
throughput-different-dispatch-layout.png
token-routing-dynamics.png
moe-meets-instruction-tuning.md
outrageously-large-neural-networks-the-sparsely-gated-mixture-of-experts-layer.md
scalable-and-efficient-moe-training.md
8-bit-floating-point-numbers.md
ActNN-Theorem-Prove-HaotianHe.pdf
GShard:Scaling Giant Models with Conditional Computation and Automatic Sharding.md
PyTorch Distributed-Data Parallel Training.md
Rammer-Enabling-Holistic-DL-Optimizations-with-rTasks.md
characterizing-deep-learning-training-workloads-on-alibaba-pai.md
deformable-convolutional-networks.md
designing-a-profiling-and-visualization-tool-for-scalable-and-in-depth-analysis-of-high-performance-gpu-clusters.md
fp8-formats-for-deep-learning.md
mixed-precision-training.md