Description
Motivation
TorchBench is a collection of open-source benchmarks used to evaluate PyTorch performance. It provides a standardized API for benchmark drivers, covering both evaluation (eager/jit) and training. TorchBench includes many popular models, making it convenient for users to debug and profile them.
To standardize performance evaluation and increase coverage, TorchBench can be enhanced in the following three aspects on CPU:
- Fit for typical user scenarios
- Well integrate new features of PyTorch
- Increase benchmark coverage
Detailed proposal
Fit for typical user scenarios (especially in userbenchmark)
Add a new userbenchmark with CPU runtime configuration options, and enable those configurations in test.py/run.py as well, for sanity checks and debugging
- Add a core binding option, possibly leveraging the torch launcher
- Add a GNU OpenMP (gomp) / Intel OpenMP (iomp) option
- Add memory allocator option
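A minimal sketch of how such runtime configuration could be wired up. The flag handling, library paths, and `numactl`-based core binding below are illustrative assumptions, not the actual userbenchmark API:

```python
import os
import subprocess
import sys

def build_cpu_env(num_threads=None, use_iomp=False, allocator=None):
    """Build an environment for a CPU benchmark subprocess.

    The library paths below are assumptions; adjust for your system.
    """
    env = dict(os.environ)
    if num_threads is not None:
        env["OMP_NUM_THREADS"] = str(num_threads)
    preload = []
    if use_iomp:
        # Assumed install location of Intel OpenMP.
        preload.append("/opt/intel/lib/libiomp5.so")
        env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
    if allocator == "jemalloc":
        # Assumed install location of jemalloc.
        preload.append("/usr/lib/x86_64-linux-gnu/libjemalloc.so")
    if preload:
        env["LD_PRELOAD"] = ":".join(preload)
    return env

def run_benchmark(script, cores="0-3", **env_kwargs):
    """Launch `script` pinned to `cores` via numactl (assumed available)."""
    cmd = ["numactl", f"--physcpubind={cores}", sys.executable, script]
    return subprocess.run(cmd, env=build_cpu_env(**env_kwargs))
```

In practice the torch launcher automates much of this (core binding, OpenMP runtime, and allocator selection) behind a single command line.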
Support performance metrics in the new CPU userbenchmark
- Add throughput: samples / total time
- Add latency: total time / samples
- Add an fps-style report
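The two metrics are simple inverses of each other; a minimal sketch of how they could be derived from a timed run (function and key names are assumptions):

```python
import time

def measure(fn, num_samples, warmup=3):
    """Time `fn` over `num_samples` iterations and derive both metrics."""
    for _ in range(warmup):
        fn()  # warm caches / JIT before measuring
    start = time.perf_counter()
    for _ in range(num_samples):
        fn()
    total = time.perf_counter() - start
    return {
        "throughput": num_samples / total,  # samples per second
        "latency": total / num_samples,     # seconds per sample
    }
```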
Well integrate new features of PyTorch
- Enable bf16 datatype support for both inference and training
- Fully support channels_last for both inference and training
- Extend the compiler option to support Dynamo
- Support JIT tracing and cover more models with JIT support
- Enable quantization support
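A sketch of what exercising two of these features (bf16 and channels_last) might look like for inference. The autocast and memory-format APIs are standard PyTorch; the model and shapes are placeholders standing in for a TorchBench workload:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for a TorchBench workload.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1).eval()
x = torch.randn(1, 3, 32, 32)

# channels_last: convert both model weights and input to NHWC layout.
model = model.to(memory_format=torch.channels_last)
x = x.to(memory_format=torch.channels_last)

# bf16 inference via CPU autocast; convolutions run in bfloat16.
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)
```

Training would follow the same pattern, with autocast wrapping the forward pass and loss computation.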
Increase benchmark coverage
Increase model coverage
- Add popular models from the community (e.g., RNN-T)
- Add models from real customers (Multi-Band MelGAN, ViT and Wav2vec)
- Fix models not yet implemented on CPU (e.g., DALLE2_pytorch, moco, pytorch_struct, tacotron2, timm_efficientdet, vision_maskrcnn)
- Add typical GNN workloads
Port OpBench to TorchBench
- Increase OpBench coverage
- Complete support for dtypes, memory formats, and in-place variants of ops
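A sweep over those configuration axes could be driven by a small harness like the following (a pure-Python sketch; the axes mirror the list above, and all names are assumptions, not the OpBench API):

```python
import itertools
import timeit

def sweep(op_factory, dtypes, memory_formats, inplace_opts, number=100):
    """Benchmark an op across dtype x memory-format x inplace configurations.

    `op_factory(dtype, memory_format, inplace)` should return a zero-arg
    callable that runs the op under that configuration.
    """
    results = {}
    for dtype, fmt, inplace in itertools.product(
        dtypes, memory_formats, inplace_opts
    ):
        fn = op_factory(dtype, fmt, inplace)
        # Record mean time per call for this configuration.
        results[(dtype, fmt, inplace)] = timeit.timeit(fn, number=number) / number
    return results
```

With PyTorch available, `op_factory` would allocate tensors of the given dtype and memory format and return a closure calling either the out-of-place op or its in-place variant (e.g. `relu` vs. `relu_`).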