Commit a326088

Results on system peking 1
1 parent 333b0b1 commit a326088

35 files changed: +40708 -0 lines changed
@@ -0,0 +1 @@
TBD
@@ -0,0 +1,3 @@
| Model               | Scenario | Accuracy (CLIP_SCORE, FID_SCORE) | Throughput (samples/s) | Latency (in ms) |
|---------------------|----------|----------------------------------|------------------------|-----------------|
| stable-diffusion-xl | offline  | (14.02827, 84.33062)             | 7.47                   | -               |
@@ -0,0 +1,60 @@
This experiment was generated using the [MLCommons Collective Mind automation framework (CM)](https://github.com/mlcommons/cm4mlops).

*Check [CM MLPerf docs](https://docs.mlcommons.org/inference) for more details.*

## Host platform

* OS version: Linux-5.14.0-427.33.1.el9_4.x86_64-x86_64-with-glibc2.29
* CPU version: x86_64
* Python version: 3.8.10 (default, Sep 11 2024, 16:02:53) [GCC 9.4.0]
* MLCommons CM version: 3.4.1

## CM Run Command

See [CM installation guide](https://docs.mlcommons.org/inference/install/).

```bash
pip install -U cmind

cm rm cache -f

cm pull repo mlcommons@cm4mlops --checkout=852b297c18a90edb8a9c975dd7ee7cf731e1e347

cm run script \
    --tags=run-mlperf,inference,_r4.1-dev,_scc24-main \
    --model=sdxl \
    --implementation=nvidia \
    --max_query_count=5000 \
    --min_query_count=576 \
    --framework=tensorrt \
    --category=datacenter \
    --scenario=Offline \
    --execution_mode=test \
    --device=cuda \
    --max_batchsize=8 \
    --quiet \
    --rerun
```
*To use the [latest automation recipes](https://docs.mlcommons.org/inference) for MLPerf (CM scripts), reload mlcommons@cm4mlops without a pinned checkout and clean the CM cache as follows:*

```bash
cm rm repo mlcommons@cm4mlops
cm pull repo mlcommons@cm4mlops
cm rm cache -f
```

## Results

Platform: mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main

Model Precision: int8

### Accuracy Results

* `CLIP_SCORE`: `14.02827`, Required accuracy for closed division `>= 31.68632` and `<= 31.81332`
* `FID_SCORE`: `84.33062`, Required accuracy for closed division `>= 23.01086` and `<= 23.95008`
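
Both reported scores fall outside the closed-division windows quoted above: `CLIP_SCORE` is below its lower bound and `FID_SCORE` is above its upper bound. The following is a minimal sketch of that comparison, with the measured values and bounds copied from this section. The helper name `check_accuracy` is hypothetical and is not part of CM or the MLPerf tooling.

```python
# Closed-division accuracy windows and measured scores, copied from the
# "Accuracy Results" section. check_accuracy is a hypothetical helper.
REQUIRED = {
    "CLIP_SCORE": (31.68632, 31.81332),
    "FID_SCORE": (23.01086, 23.95008),
}
MEASURED = {"CLIP_SCORE": 14.02827, "FID_SCORE": 84.33062}


def check_accuracy(measured, required):
    """Return {metric: bool}: True when the score lies in its inclusive window."""
    return {
        name: low <= measured[name] <= high
        for name, (low, high) in required.items()
    }


if __name__ == "__main__":
    for name, ok in check_accuracy(MEASURED, REQUIRED).items():
        low, high = REQUIRED[name]
        status = "within" if ok else "outside"
        print(f"{name}: {MEASURED[name]} is {status} [{low}, {high}]")
```

A check like this only compares numbers; it does not explain why the scores miss the window.
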

### Performance Results

`Samples per second`: `7.47046`
@@ -0,0 +1,81 @@
[2024-11-18 18:30:27,924 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
[2024-11-18 18:30:27,994 main.py:229 INFO] Detected system ID: KnownSystem.sc1
[2024-11-18 18:30:29,787 generate_conf_files.py:107 INFO] Generated measurements/ entries for sc1_TRT/stable-diffusion-xl/Offline
[2024-11-18 18:30:29,787 __init__.py:46 INFO] Running command: python3 -m code.stable-diffusion-xl.tensorrt.harness --logfile_outdir="/home/lry/CM/repos/local/cache/6c0ba4746fa74e77/test_results/mlperf_inference_lry_40-nvidia_original-gpu-tensorrt-vdefault-scc24-main/stable-diffusion-xl/offline/accuracy" --logfile_prefix="mlperf_log_" --performance_sample_count=5000 --test_mode="AccuracyOnly" --gpu_batch_size=8 --mlperf_conf_path="/home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf" --tensor_path="build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/" --use_graphs=true --user_conf_path="/home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/a671757f4ec449bfa975e23c1729cbdb.conf" --gpu_inference_streams=1 --gpu_copy_streams=1 --gpu_engines="./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-UNetXL-Offline-gpu-b8-int8.custom_k_99_MaxP.plan,./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan" --scenario Offline --model stable-diffusion-xl
[2024-11-18 18:30:29,787 __init__.py:53 INFO] Overriding Environment
[2024-11-18 18:30:32,291 systems.py:197 INFO] Found unknown device in GPU connection topology: NIC0. Skipping.
2024-11-18 18:30:34,026 INFO worker.py:1567 -- Connecting to existing Ray cluster at address: 10.0.0.1:6379...
2024-11-18 18:30:34,032 INFO worker.py:1743 -- Connected to Ray cluster. View the dashboard at http://127.0.0.1:8265
[2024-11-18 18:30:34,174 harness.py:207 INFO] Start Warm Up!
(SDXLCore pid=141164) [2024-11-18 18:30:37,140 backend.py:428 INFO] initialized
(SDXLCore pid=141164) [2024-11-18 18:30:37,232 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan.
(SDXLCore pid=141169) [2024-11-18 18:30:41,052 backend.py:97 INFO] Enabling cuda graphs for unet
(SDXLCore pid=141169) [2024-11-18 18:30:41,496 backend.py:155 INFO] captured graph for BS=1
(SDXLCore pid=15294, ip=10.0.0.3) [2024-11-18 18:30:35,166 backend.py:428 INFO] initialized [repeated 8x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/ray-logging.html#log-deduplication for more options.)
(SDXLCore pid=15294, ip=10.0.0.3) [2024-11-18 18:30:37,007 backend.py:72 INFO] Loading TensorRT engine: ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan. [repeated 35x across cluster]
(SDXLCore pid=15294, ip=10.0.0.3) [2024-11-18 18:30:38,892 backend.py:97 INFO] Enabling cuda graphs for unet [repeated 8x across cluster]
(SDXLCore pid=141171) [2024-11-18 18:30:46,656 backend.py:155 INFO] captured graph for BS=7 [repeated 59x across cluster]
[2024-11-18 18:31:07,951 harness.py:209 INFO] Warm Up Done!
[2024-11-18 18:31:07,951 harness.py:211 INFO] Start Test!
[2024-11-18 18:31:08,050 backend.py:852 INFO] 500
(SDXLCore pid=141165) [2024-11-18 18:31:08,053 backend.py:630 INFO] generate_images
(SDXLCore pid=141164) [2024-11-18 18:30:47,810 backend.py:155 INFO] captured graph for BS=8 [repeated 12x across cluster]
(SDXLCore pid=15293, ip=10.0.0.3) [2024-11-18 18:31:12,613 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=15293, ip=10.0.0.3) [2024-11-18 18:31:20,272 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=15293, ip=10.0.0.3) [2024-11-18 18:31:27,954 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=141165) [2024-11-18 18:31:37,115 backend.py:630 INFO] generate_images [repeated 4x across cluster]
(SDXLCore pid=15293, ip=10.0.0.3) [2024-11-18 18:31:43,377 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=15293, ip=10.0.0.3) [2024-11-18 18:31:51,102 backend.py:630 INFO] generate_images [repeated 9x across cluster]
(SDXLCore pid=141165) [2024-11-18 18:32:06,343 backend.py:630 INFO] generate_images [repeated 9x across cluster]
[2024-11-18 18:32:18,343 backend.py:901 INFO] [Server] Received 500 total samples
[2024-11-18 18:32:18,345 backend.py:911 INFO] [Device 0] Reported 56 samples
[2024-11-18 18:32:18,346 backend.py:911 INFO] [Device 1] Reported 56 samples
[2024-11-18 18:32:18,347 backend.py:911 INFO] [Device 2] Reported 56 samples
[2024-11-18 18:32:18,349 backend.py:911 INFO] [Device 3] Reported 56 samples
[2024-11-18 18:32:18,350 backend.py:911 INFO] [Device 4] Reported 56 samples
[2024-11-18 18:32:18,352 backend.py:911 INFO] [Device 5] Reported 55 samples
[2024-11-18 18:32:18,353 backend.py:911 INFO] [Device 6] Reported 55 samples
[2024-11-18 18:32:18,355 backend.py:911 INFO] [Device 7] Reported 55 samples
[2024-11-18 18:32:18,356 backend.py:911 INFO] [Device 8] Reported 55 samples
[2024-11-18 18:32:18,356 harness.py:214 INFO] Test Done!
[2024-11-18 18:32:18,356 harness.py:216 INFO] Destroying SUT...
[2024-11-18 18:32:18,356 harness.py:219 INFO] Destroying QSL...
(SDXLCore pid=141164) [2024-11-18 18:32:07,812 backend.py:630 INFO] generate_images [repeated 4x across cluster]
benchmark : Benchmark.SDXL
buffer_manager_thread_count : 0
data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/data
gpu_batch_size : 8
gpu_copy_streams : 1
gpu_inference_streams : 1
input_dtype : int32
input_format : linear
log_dir : /home/lry/CM/repos/local/cache/3443882dd9374096/repo/closed/NVIDIA/build/logs/2024.11.18-18.30.24
mlperf_conf_path : /home/lry/CM/repos/local/cache/3e2d12440d5a4a93/inference/mlperf.conf
model_path : /home/lry/CM/repos/local/cache/d2b9079c1073417b/models/SDXL/
offline_expected_qps : 0.0
precision : int8
preprocessed_data_dir : /home/lry/CM/repos/local/cache/d2b9079c1073417b/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9684X 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=96, threads_per_core=2): 2}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=791.59486, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=791594860000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA H100 80GB HBM3', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=79.6474609375, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=85520809984), max_power_limit=700.0, pci_id='0x233010DE', compute_sm=90): 5})), numa_conf=NUMAConfiguration(numa_nodes={}, num_numa_nodes=2), system_id='sc1')
tensor_path : build/preprocessed_data/coco2014-tokenized-sdxl/5k_dataset_final/
test_mode : AccuracyOnly
use_graphs : True
user_conf_path : /home/lry/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/a671757f4ec449bfa975e23c1729cbdb.conf
system_id : sc1
config_name : sc1_stable-diffusion-xl_Offline
workload_setting : WorkloadSetting(HarnessType.Custom, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : custom_k_99_MaxP
accuracy_level : 99%
inference_server : custom
skip_file_checks : False
power_limit : None
cpu_freq : None
(SDXLCore pid=141164) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIP-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=141164) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-CLIPWithProj-Offline-gpu-b8-fp16.custom_k_99_MaxP.plan
(SDXLCore pid=15294, ip=10.0.0.3) [I] Loading bytes from ./build/engines/sc1/stable-diffusion-xl/Offline/stable-diffusion-xl-VAE-Offline-gpu-b8-fp32.custom_k_99_MaxP.plan [repeated 34x across cluster]
[2024-11-18 18:32:19,645 run_harness.py:166 INFO] Result: Accuracy run detected.

======================== Result summaries: ========================
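
The per-device sample counts in the log above can be cross-checked against the server total of 500. The following is a rough sketch that parses an excerpt copied from this log. The function name `tally_samples` and the regular expressions are illustrative assumptions based only on the line format shown here, not part of the NVIDIA harness.

```python
import re

# Excerpt copied verbatim from the harness log above.
LOG_EXCERPT = """\
[2024-11-18 18:32:18,343 backend.py:901 INFO] [Server] Received 500 total samples
[2024-11-18 18:32:18,345 backend.py:911 INFO] [Device 0] Reported 56 samples
[2024-11-18 18:32:18,346 backend.py:911 INFO] [Device 1] Reported 56 samples
[2024-11-18 18:32:18,347 backend.py:911 INFO] [Device 2] Reported 56 samples
[2024-11-18 18:32:18,349 backend.py:911 INFO] [Device 3] Reported 56 samples
[2024-11-18 18:32:18,350 backend.py:911 INFO] [Device 4] Reported 56 samples
[2024-11-18 18:32:18,352 backend.py:911 INFO] [Device 5] Reported 55 samples
[2024-11-18 18:32:18,353 backend.py:911 INFO] [Device 6] Reported 55 samples
[2024-11-18 18:32:18,355 backend.py:911 INFO] [Device 7] Reported 55 samples
[2024-11-18 18:32:18,356 backend.py:911 INFO] [Device 8] Reported 55 samples
"""

# Patterns assumed from the line format seen in this log only.
SERVER_RE = re.compile(r"\[Server\] Received (\d+) total samples")
DEVICE_RE = re.compile(r"\[Device (\d+)\] Reported (\d+) samples")


def tally_samples(log_text):
    """Return (server_total, {device_id: reported_count}) parsed from log text."""
    server_match = SERVER_RE.search(log_text)
    server_total = int(server_match.group(1)) if server_match else None
    per_device = {int(dev): int(n) for dev, n in DEVICE_RE.findall(log_text)}
    return server_total, per_device


if __name__ == "__main__":
    total, per_device = tally_samples(LOG_EXCERPT)
    print(f"server total: {total}, device sum: {sum(per_device.values())}")
```

For this excerpt, the nine per-device counts (five of 56 and four of 55) add up to the 500 samples the server reports.
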
