
ACES 2.0 optimization

Doug Walker edited this page Mar 10, 2025 · 13 revisions

This page collects results from our ongoing effort to implement the ACES 2.0 Output Transforms efficiently.

RESOURCES

Test config: a reorganized version of the latest ACES 2 draft config, with the ACES 1.3 view transforms added.

CPU benchmarks using OCIO Perf

Use the following commands with the test config above to benchmark CPU performance with the ocioperf command-line tool. For consistency, I propose we all use the D65 sRGB Output Transform.

export OCIO=<PATH_TO_CONFIG>

ocioperf --verbose --iter 10 --test 0 --view ACES2065-1 "sRGB - Display" "ACES 1.0 - SDR Video"

ocioperf --verbose --iter 10 --test 0 --view ACES2065-1 "sRGB - Display" "ACES 2.0 - SDR 100 nits (Rec.709)"
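To run both commands back to back, for example on a new machine before adding a row to the table below, a small Python driver can help. This is a minimal sketch, assuming `ocioperf` is on the PATH. `ocioperf_args` and `run_all` are hypothetical helpers, not part of OCIO. The script only captures the raw output and does not parse it, because the exact output format is not reproduced on this page.

```python
import os
import subprocess

# Display and the two views compared on this page, copied from the commands above.
DISPLAY = "sRGB - Display"
VIEWS = ["ACES 1.0 - SDR Video", "ACES 2.0 - SDR 100 nits (Rec.709)"]


def ocioperf_args(view, iterations=10):
    """Build the argv for one ocioperf run (hypothetical helper).

    Passing a list keeps names containing spaces, such as "sRGB - Display",
    as single arguments, so no shell quoting is needed.
    """
    return [
        "ocioperf", "--verbose",
        "--iter", str(iterations),
        "--test", "0",
        "--view", "ACES2065-1", DISPLAY, view,
    ]


def run_all(config_path, iterations=10):
    """Run both benchmarks with OCIO pointing at config_path (hypothetical helper).

    Returns a dict mapping each view name to ocioperf's combined stdout/stderr text.
    """
    env = dict(os.environ, OCIO=config_path)  # same effect as `export OCIO=...`
    outputs = {}
    for view in VIEWS:
        result = subprocess.run(
            ocioperf_args(view, iterations),
            env=env,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            check=True,  # raise if ocioperf exits with an error
        )
        outputs[view] = result.stdout
    return outputs
```

Usage: call `run_all("<PATH_TO_CONFIG>")` and read the "Process the complete image (in place)" result in each captured output.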

The ocioperf tool outputs three durations:

  • the first iteration duration
  • the average duration excluding the first iteration
  • the overall average duration

For consistency, I propose we report the second (middle) value from the "Process the complete image (in place)" result.
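The three values can be reproduced from per-iteration timings as shown below. This is a minimal Python sketch, assuming each value is a plain arithmetic mean; the page does not describe ocioperf's internals. `summarize` is a hypothetical helper. The example also shows why the middle value is steadier. A slow first iteration, which typically absorbs one-time costs such as cache warm-up, inflates the overall average but not the average that excludes it.

```python
def summarize(durations_ms):
    """Return (first, average excluding first, overall average) for per-iteration timings.

    Assumes plain arithmetic means; ocioperf's exact computation may differ.
    """
    if len(durations_ms) < 2:
        raise ValueError("need at least two iterations to exclude the first")
    first = durations_ms[0]
    rest = durations_ms[1:]
    avg_excluding_first = sum(rest) / len(rest)  # the value proposed for this page
    overall_avg = sum(durations_ms) / len(durations_ms)
    return first, avg_excluding_first, overall_avg


# Hypothetical timings: a slow first iteration followed by steady runs.
first, steady, overall = summarize([900, 500, 510, 490])
print(first, steady, overall)  # prints 900 500.0 600.0
```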

| Branch | Hardware | Compiler | ACES 2 / ACES 1 | ACES 2 (ms) | ACES 1 (ms) | Submitted by | Date |
|---|---|---|---|---|---|---|---|
| 2.4.1 | M1 Pro, ARM build | clang 16 | 7.8 | 4027 | 515 | Doug | 2025-01-18 |
| 2.4.1 | M1 Pro, x86 build (Rosetta) | clang 16 | 8.3 | 5935 | 714 | Doug | 2025-01-18 |
| 2.4.1 | Intel i9 MacBook Pro | clang 16 | 5.2 | 4760 | 921 | Remi | 2025-01-24 |
| 2025-01-22 Kevin 4eaa012 | Intel i9 MacBook Pro | clang 16 | 4.5 | 4111 | 919 | Remi | 2025-01-24 |
| 2.4.1 | AMD EPYC, Linux | gcc 11.2 | 4.2 | 3833 | 908 | Remi | 2025-01-24 |
| 2025-01-22 Kevin 4eaa012 | AMD EPYC, Linux | gcc 11.2 | 3.7 | 3079 | 822 | Remi | 2025-01-24 |
| 2.4.1 | Xeon E5, Windows | VS 2022 | 1.8 | 8565 | 4704 | Remi | 2025-01-24 |
| 2025-01-22 Kevin 4eaa012 | Xeon E5, Windows | VS 2022 | 1.5 | 7064 | 4717 | Remi | 2025-01-24 |
| 2.4.1 | M1 Ultra, ARM build | clang 14 | 8.0 | 4059 | 509 | Nick | 2025-01-27 |
| 2025-01-27 Kevin 874ba32 | M1 Ultra, ARM build | clang 14 | 6.7 | 3458 | 520 | Nick | 2025-01-27 |
| 2.4.1 | M2 Pro, ARM build | clang 14 | 7.9 | 3741 | 472 | Nick | 2025-01-27 |
| 2025-01-27 Kevin 874ba32 | M2 Pro, ARM build | clang 14 | 6.5 | 3110 | 477 | Nick | 2025-01-27 |
| commit 54ddec8 | M1 Pro, ARM build | clang 16 | 5.8 | 2972 | 511 | Doug | 2025-03-08 |
| commit 54ddec8 | M1 Pro, x86 build (Rosetta) | clang 16 | 6.5 | 4593 | 721 | Doug | 2025-03-08 |
| commit 54ddec8 | AMD EPYC 7V12, Windows | VS 2022 | 3.0 | 4871 | 1226 | Doug | 2025-03-08 |
| commit 54ddec8 | AMD EPYC 7V12, Ubuntu | gcc 11.4 | 3.6 | 3925 | 1099 | Doug | 2025-03-08 |
| commit b75747 | Xeon E5, Windows | VS 2022 | 3.7 | 5852 | 1566 | Remi | 2025-03-10 |
| main | Xeon E5, Windows | VS 2022 | 5.0 | 7740 | 1546 | Remi | 2025-03-10 |
| commit b75747 | Intel i9 MacBook Pro | clang 16 | 3.9 | 4312 | 1103 | Remi | 2025-03-10 |
| main | Intel i9 MacBook Pro | clang 16 | 4.5 | 5031 | 1106 | Remi | 2025-03-10 |
| commit b75747 | AMD EPYC, Linux | gcc 11.2 | 3.6 | 2910 | 795 | Remi | 2025-03-10 |
| main | AMD EPYC, Linux | gcc 11.2 | 4.7 | 3806 | 811 | Remi | 2025-03-10 |
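The ACES2/ACES1 column is the ACES 2 time divided by the ACES 1 time, rounded to one decimal place. The following minimal Python sketch recomputes that ratio for three rows of the table. It also expresses a branch-to-branch improvement as the fraction of time saved. `aces2_to_aces1` and `time_saved` are hypothetical helper names, and "time saved" is only one of several ways to state a speed-up.

```python
# (label, ACES 2 ms, ACES 1 ms), copied from three rows of the CPU table.
ROWS = [
    ("2.4.1, M1 Pro ARM build", 4027, 515),
    ("commit 54ddec8, M1 Pro ARM build", 2972, 511),
    ("main, AMD EPYC Linux", 3806, 811),
]


def aces2_to_aces1(aces2_ms, aces1_ms):
    """Ratio column: how many times longer the ACES 2 transform takes than ACES 1."""
    return round(aces2_ms / aces1_ms, 1)


def time_saved(before_ms, after_ms):
    """Fraction of processing time saved when going from before_ms to after_ms."""
    return 1.0 - after_ms / before_ms


for label, aces2, aces1 in ROWS:
    print(f"{label}: {aces2_to_aces1(aces2, aces1)}")

# ACES 2 on the M1 Pro ARM build: release 2.4.1 vs commit 54ddec8.
print(f"time saved: {time_saved(4027, 2972):.1%}")
```

The three rows reproduce the table's 7.8, 5.8 and 4.7. On the M1 Pro ARM build, moving from 2.4.1 to commit 54ddec8 cuts the ACES 2 time from 4027 ms to 2972 ms, which saves about 26%.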

GPU benchmarks

GPU performance is harder to profile and requires tools such as NVIDIA Nsight or Apple's equivalent tools in Xcode.

The following results compare the release candidate with main (as of March 10, 2025):

| OS | Shading language | Speed-up | Submitted by | Date |
|---|---|---|---|---|
| Windows | HLSL | 26% | Éric Renaud-Houde | 2025-03-04 |
| macOS | MSL | 35-40% | Cuneyt Ozdas | 2025-03-08 |
| Windows | HLSL | 22% | Remi Achard | 2025-03-10 |
| macOS | MSL | 23% | Remi Achard | 2025-03-10 |
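The page does not say how the GPU speed-up percentages are defined. The two common conventions give different numbers for the same measurement. The following minimal Python sketch contrasts them using hypothetical frame times. The function names are illustrative only, and neither convention is confirmed as the one the submitters used.

```python
def speedup_time_saved(old_ms, new_ms):
    """Speed-up as the fraction of GPU time saved: (old - new) / old."""
    return (old_ms - new_ms) / old_ms


def speedup_throughput(old_ms, new_ms):
    """Speed-up as the throughput gain: old / new - 1."""
    return old_ms / new_ms - 1.0


def time_saved_to_throughput(saved):
    """Convert a time-saved fraction into the equivalent throughput gain."""
    return 1.0 / (1.0 - saved) - 1.0


# Hypothetical frame times for main (10 ms) and the release candidate (8 ms).
print(speedup_time_saved(10.0, 8.0))  # 0.2
print(speedup_throughput(10.0, 8.0))  # 0.25
```

Suppose the 26% HLSL figure is a reduction in GPU time. It would then correspond to roughly a 35% throughput gain (1 / 0.74 - 1 ≈ 0.35). Rows from different submitters are therefore only directly comparable if they used the same convention.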