Mlperf inference results scc24 pku #79

Merged
1 change: 1 addition & 0 deletions open/peking/code/stable-diffusion-xl/README.md
@@ -0,0 +1 @@
TBD
@@ -0,0 +1,3 @@
| Model | Scenario | Accuracy | Throughput | Latency (in ms) |
|---------------------|------------|----------------------|--------------|-------------------|
| stable-diffusion-xl | offline | (14.02827, 84.33062) | 7.644 | - |
@@ -0,0 +1,60 @@
This experiment is generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).

*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-5.14.0-427.33.1.el9_4.x86_64-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53) [GCC 9.4.0]
* MLCommons CM version: 3.4.1

## CM Run Command

See [CM installation guide](https://docs.mlcommons.org/inference/install/).

```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@cm4mlops --checkout=852b297c18a90edb8a9c975dd7ee7cf731e1e347

cm run script \
--tags=run-mlperf,inference,_r4.1-dev,_scc24-main \
--model=sdxl \
--implementation=nvidia \
--max_query_count=5000 \
--min_query_count=576 \
--framework=tensorrt \
--category=datacenter \
--scenario=Offline \
--execution_mode=test \
--device=cuda \
--max_batchsize=8 \
--quiet \
--rerun
```
*Note: to use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts),
reload mlcommons@cm4mlops without the checkout and clean the CM cache as follows:*

```bash
cm rm repo mlcommons@cm4mlops
cm pull repo mlcommons@cm4mlops
cm rm cache -f
```

## Results

Platform: mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main

Model Precision: int8

### Accuracy Results
* `CLIP_SCORE`: `14.02827` (closed division requires `>= 31.68632` and `<= 31.81332`)
* `FID_SCORE`: `84.33062` (closed division requires `>= 23.01086` and `<= 23.95008`)
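The bounds above can be checked mechanically. A minimal sketch, using the values quoted in this report (the `within` helper is illustrative and not part of the CM tooling):

```python
# Hedged sketch: compare measured scores against the closed-division
# acceptance ranges quoted in this README.
def within(value, lo, hi):
    """Return True if value falls inside the inclusive [lo, hi] range."""
    return lo <= value <= hi

# Measured results from this run
clip_score = 14.02827
fid_score = 84.33062

# Closed-division acceptance ranges from this README
clip_ok = within(clip_score, 31.68632, 31.81332)
fid_ok = within(fid_score, 23.01086, 23.95008)

print(clip_ok, fid_ok)  # False False: outside the closed-division ranges
```

Both scores fall outside the closed-division ranges, consistent with this being an open-division (`open/peking`) submission.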

### Performance Results
`Samples per second`: `7.64431`
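As a rough sanity check, a throughput in the same ballpark can be derived from the accuracy-run window in the harness log below (test start 17:24:33, all 500 samples received 17:25:40). A minimal sketch; note the reported `7.64431` comes from the performance measurement, so only approximate agreement is expected:

```python
# Cross-check: samples per second implied by the accuracy-run window
# (timestamps copied from the harness log in this report).
from datetime import datetime

start = datetime.fromisoformat("2024-11-18 17:24:33")
end = datetime.fromisoformat("2024-11-18 17:25:40")
samples = 500  # total samples received by the server

implied_qps = samples / (end - start).total_seconds()
print(round(implied_qps, 2))  # 7.46, close to the reported 7.64431
```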
@@ -0,0 +1,83 @@
[2024-11-18 17:23:54,543 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
[2024-11-18 17:23:54,621 main.py:229 INFO] Detected system ID: KnownSystem.sc1
[2024-11-18 17:23:56,362 generate_conf_files.py:107 INFO] Generated measurements/ entries for sc1_TRT/stable-diffusion-xl/Offline
[2024-11-18 17:23:56,363 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/home/lry/CM/repos/local/cache/6c0ba4746fa74e77/test_results/mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=8 --mlperf_conf_path="/home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=true --user_conf_path="/home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/593e2d3b46e640629d0ab5c1e4ff088a.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-11-18 17:23:56,363 __init__.py:53 INFO] Overriding Environment
[2024-11-18 17:23:58,943 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
2024-11-18 17:24:00,676 INFO worker.py:1567 -- Connecting to existing Ray cluster at address: 10.0.0.1:6379...
2024-11-18 17:24:00,683 INFO worker.py:1743 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265 
[2024-11-18 17:24:00,875 harness.py:207 INFO] Start Warm Up!
(SDXLCore pid=107737) [2024-11-18 17:24:03,968 backend.py:428 INFO] initialized
(SDXLCore pid=107737) [2024-11-18 17:24:04,150 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=107737) [2024-11-18 17:24:04,323 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=107737) [2024-11-18 17:24:07,326 backend.py:97 INFO] Enabling cuda graphs for unet
(SDXLCore pid=107737) [2024-11-18 17:24:07,767 backend.py:155 INFO] captured graph for BS=1
(SDXLCore pid=107737) [2024-11-18 17:24:08,559 backend.py:155 INFO] captured graph for BS=2
(SDXLCore pid=9090, ip=10.0.0.3) [2024-11-18 17:24:01,717 backend.py:428 INFO] initialized [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(SDXLCore pid=107744) [2024-11-18 17:24:06,876 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 34x across cluster]
(SDXLCore pid=107744) [2024-11-18 17:24:08,645 backend.py:97 INFO] Enabling cuda graphs for unet [repeated 8x across cluster]
(SDXLCore pid=9089, ip=10.0.0.3) [2024-11-18 17:24:09,783 backend.py:155 INFO] captured graph for BS=7 [repeated 51x across cluster]
[2024-11-18 17:24:33,625 harness.py:209 INFO] Warm Up Done!
[2024-11-18 17:24:33,625 harness.py:211 INFO] Start Test!
[2024-11-18 17:24:33,731 backend.py:852 INFO] 500
(SDXLCore pid=107737) [2024-11-18 17:24:33,733 backend.py:630 INFO] generate_images
(SDXLCore pid=107741) [2024-11-18 17:24:14,698 backend.py:155 INFO] captured graph for BS=8 [repeated 19x across cluster]
(SDXLCore pid=9088, ip=10.0.0.3) [2024-11-18 17:24:38,439 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=9088, ip=10.0.0.3) [2024-11-18 17:24:46,259 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=9089, ip=10.0.0.3) [2024-11-18 17:24:54,006 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=107737) [2024-11-18 17:25:02,027 backend.py:630 INFO] generate_images [repeated 8x across cluster]
(SDXLCore pid=107738) [2024-11-18 17:25:10,624 backend.py:630 INFO] generate_images [repeated 5x across cluster]
(SDXLCore pid=107738) [2024-11-18 17:25:19,815 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=107738) [2024-11-18 17:25:29,041 backend.py:630 INFO] generate_images [repeated 9x across cluster]
[2024-11-18 17:25:40,770 backend.py:901 INFO] [Server] Received 500 total samples
[2024-11-18 17:25:40,773 backend.py:911 INFO] [Device 0] Reported 56 samples
[2024-11-18 17:25:40,774 backend.py:911 INFO] [Device 1] Reported 56 samples
[2024-11-18 17:25:40,775 backend.py:911 INFO] [Device 2] Reported 56 samples
[2024-11-18 17:25:40,777 backend.py:911 INFO] [Device 3] Reported 56 samples
[2024-11-18 17:25:40,778 backend.py:911 INFO] [Device 4] Reported 56 samples
[2024-11-18 17:25:40,780 backend.py:911 INFO] [Device 5] Reported 55 samples
[2024-11-18 17:25:40,782 backend.py:911 INFO] [Device 6] Reported 55 samples
[2024-11-18 17:25:40,783 backend.py:911 INFO] [Device 7] Reported 55 samples
[2024-11-18 17:25:40,784 backend.py:911 INFO] [Device 8] Reported 55 samples
[2024-11-18 17:25:40,784 harness.py:214 INFO] Test Done!
[2024-11-18 17:25:40,784 harness.py:216 INFO] Destroying SUT...
[2024-11-18 17:25:40,784 harness.py:219 INFO] Destroying QSL...
(SDXLCore pid=107737) [2024-11-18 17:25:30,624 backend.py:630 INFO] generate_images [repeated 4x across cluster]
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/data
gpu_batch_size : 8
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/lry/CM/repos/local/cache/3443882dd9374096/repo/closed/NVIDIA/build/logs/2024.11.18-17.23.50
mlperf_conf_path : /home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf
model_path : /home/lry/CM/repos/local/cache/d2b9079c1073417b/models/SDXL/
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9684X 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=791.59486, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=791594860000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 80GB HBM3', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=700.0, pci_id='0x233010DE', compute_sm=90): 5})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='sc1')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : True
user_conf_path : /home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/593e2d3b46e640629d0ab5c1e4ff088a.conf
system_id : sc1
config_name : sc1_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
(SDXLCore pid=107737) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=107737) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=107744) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan [repeated 34x across cluster]
[2024-11-18 17:25:42,171 run_harness.py:166 INFO] Result: Accuracy run detected.

======================== Result summaries: ========================
