ACES 2.0 optimization
This page is to collect various results from our ongoing effort to implement the ACES 2.0 output transforms in an efficient manner.
Test config: a reorganized version of the most recent ACES 2 draft config, with the ACES 1.3 viewing transforms added.
The following commands may be used with the above config to benchmark CPU performance using the ocioperf command-line tool. I propose we use the D65 sRGB Output Transform, for consistency.
export OCIO=<PATH_TO_CONFIG>
ocioperf --verbose --iter 10 --test 0 --view ACES2065-1 "sRGB - Display" "ACES 1.0 - SDR Video"
ocioperf --verbose --iter 10 --test 0 --view ACES2065-1 "sRGB - Display" "ACES 2.0 - SDR 100 nits (Rec.709)"
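For repeated runs, the two commands above could be wrapped in a small script. This is only a minimal Python sketch, not part of OCIO: the helper names `build_ocioperf_cmd` and `run_benchmark` are hypothetical, it assumes `ocioperf` is on the `PATH`, and it uses only the flags and view names shown above.

```python
import os
import subprocess

# View names copied from the commands above.
ACES1_VIEW = "ACES 1.0 - SDR Video"
ACES2_VIEW = "ACES 2.0 - SDR 100 nits (Rec.709)"


def build_ocioperf_cmd(view, iterations=10):
    """Return the ocioperf argument list for one view (hypothetical helper)."""
    return [
        "ocioperf", "--verbose",
        "--iter", str(iterations),
        "--test", "0",
        "--view", "ACES2065-1", "sRGB - Display", view,
    ]


def run_benchmark(config_path, view):
    """Run ocioperf with OCIO set to config_path and return its raw text output."""
    env = dict(os.environ, OCIO=config_path)  # same effect as `export OCIO=...`
    result = subprocess.run(
        build_ocioperf_cmd(view), env=env,
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example usage, assuming config_path holds the path to the test config:
#   for view in (ACES1_VIEW, ACES2_VIEW):
#       print(run_benchmark(config_path, view))
```

The sketch returns the raw output rather than parsing it, since the duration still has to be read from the "Process the complete image (in place)" result by hand.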
The ocioperf tool outputs three durations:
- the first iteration duration
- the average duration excluding the first iteration
- the overall average duration
I propose we use the second (middle) value from the result "Process the complete image (in place)", for consistency.
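To make the difference between the three values concrete, here is a small Python sketch that derives them from a list of per-iteration timings. It is an illustration of the arithmetic only, not ocioperf's actual implementation; `summarize_durations` is a hypothetical helper and the timings are made up. The second value is presumably the most stable of the three because it leaves out any one-time warm-up cost in the first iteration.

```python
from statistics import mean


def summarize_durations(durations_ms):
    """Return (first, average excluding first, overall average) in ms."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two iterations")
    first = durations_ms[0]
    steady = mean(durations_ms[1:])   # the value proposed for the table
    overall = mean(durations_ms)
    return first, steady, overall


# Made-up timings for illustration only, not measured results.
first, steady, overall = summarize_durations([120.0, 100.0, 100.0, 100.0])
print(first, steady, overall)
```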
Branch or commit | Hardware and compiler | ACES 2 / ACES 1 ratio | ACES 2 time | ACES 1 time | Submitted by | Date |
---|---|---|---|---|---|---|
2.4.1 | M1 Pro, ARM build clang 16 | 7.8 | 4027 ms | 515 ms | Doug | 2025-01-18 |
2.4.1 | M1 Pro, x86 build (Rosetta) clang 16 | 8.3 | 5935 ms | 714 ms | Doug | 2025-01-18 |
2.4.1 | Intel i9 MacBook Pro clang 16 | 5.2 | 4760 ms | 921 ms | Remi | 2025-01-24 |
2025-01-22 Kevin 4eaa012 | Intel i9 MacBook Pro clang 16 | 4.5 | 4111 ms | 919 ms | Remi | 2025-01-24 |
2.4.1 | AMD EPYC Linux gcc 11.2 | 4.2 | 3833 ms | 908 ms | Remi | 2025-01-24 |
2025-01-22 Kevin 4eaa012 | AMD EPYC Linux gcc 11.2 | 3.7 | 3079 ms | 822 ms | Remi | 2025-01-24 |
2.4.1 | Xeon E5 Windows VS 2022 | 1.8 | 8565 ms | 4704 ms | Remi | 2025-01-24 |
2025-01-22 Kevin 4eaa012 | Xeon E5 Windows VS 2022 | 1.5 | 7064 ms | 4717 ms | Remi | 2025-01-24 |
2.4.1 | M1 Ultra ARM build clang 14 | 8.0 | 4059 ms | 509 ms | Nick | 2025-01-27 |
2025-01-27 Kevin 874ba32 | M1 Ultra ARM build clang 14 | 6.7 | 3458 ms | 520 ms | Nick | 2025-01-27 |
2.4.1 | M2 Pro ARM build clang 14 | 7.9 | 3741 ms | 472 ms | Nick | 2025-01-27 |
2025-01-27 Kevin 874ba32 | M2 Pro ARM build clang 14 | 6.5 | 3110 ms | 477 ms | Nick | 2025-01-27 |
commit 54ddec8 | M1 Pro, ARM build clang 16 | 5.8 | 2972 ms | 511 ms | Doug | 2025-03-08 |
commit 54ddec8 | M1 Pro, x86 build (Rosetta) clang 16 | 6.4 | 4593 ms | 721 ms | Doug | 2025-03-08 |
commit 54ddec8 | AMD EPYC 7V12, Windows VS 2022 | 4.0 | 4871 ms | 1226 ms | Doug | 2025-03-08 |
commit 54ddec8 | AMD EPYC 7V12, Ubuntu gcc 11.4 | 3.6 | 3925 ms | 1099 ms | Doug | 2025-03-08 |
commit b75747 | Xeon E5 Windows VS 2022 | 3.7 | 5852 ms | 1566 ms | Remi | 2025-03-10 |
main | Xeon E5 Windows VS 2022 | 5 | 7740 ms | 1546 ms | Remi | 2025-03-10 |
commit b75747 | Intel i9 MacBook Pro clang 16 | 3.9 | 4312 ms | 1103 ms | Remi | 2025-03-10 |
main | Intel i9 MacBook Pro clang 16 | 4.5 | 5031 ms | 1106 ms | Remi | 2025-03-10 |
commit b75747 | AMD EPYC Linux gcc 11.2 | 3.6 | 2910 ms | 795 ms | Remi | 2025-03-10 |
main | AMD EPYC Linux gcc 11.2 | 4.7 | 3806 ms | 811 ms | Remi | 2025-03-10 |
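
The ratio column is the ACES 2 time divided by the ACES 1 time, rounded to one decimal place. The Python sketch below reproduces it for the first row of the table and also shows one way to compare two rows measured on the same hardware. The function names are hypothetical, and the "time reduction" figure is my own derived illustration, not a value reported on this page.

```python
def aces2_to_aces1_ratio(aces2_ms, aces1_ms):
    """Ratio of ACES 2 to ACES 1 processing time, rounded as in the table."""
    return round(aces2_ms / aces1_ms, 1)


def time_reduction_percent(before_ms, after_ms):
    """Percentage by which processing time dropped between two builds."""
    return round(100.0 * (1.0 - after_ms / before_ms), 1)


# First table row: 2.4.1 on M1 Pro, ARM build (4027 ms vs 515 ms).
ratio = aces2_to_aces1_ratio(4027, 515)

# ACES 2 time on the same M1 Pro ARM machine: 2.4.1 vs commit 54ddec8.
reduction = time_reduction_percent(4027, 2972)
print(ratio, reduction)
```

When adding a row, recomputing the ratio this way keeps the three numeric columns consistent with each other.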
GPU performance is harder to profile and requires tools such as NVIDIA Nsight or Apple's equivalent GPU tools in Xcode.
Here are some results comparing the release candidate to main (as of March 10, 2025):
OS | Shading language | Speed-up | Submitted by | Date |
---|---|---|---|---|
Windows | HLSL | 26% | Éric Renaud-Houde | 2025-03-04 |
macOS | MSL | 35-40% | Cuneyt Ozdas | 2025-03-08 |
Windows | HLSL | 22% | Remi Achard | 2025-03-10 |
macOS | MSL | 23% | Remi Achard | 2025-03-10 |
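
The table does not state how the speed-up percentages were derived, so figures from different submitters may not be directly comparable. As a hedged illustration, the Python sketch below shows two common conventions that give different numbers for the same pair of measurements. The function names and the frame times are made up for the example and are not taken from any of the results above.

```python
def time_reduction_percent(old_ms, new_ms):
    """Speed-up expressed as the fraction of time saved."""
    return 100.0 * (old_ms - new_ms) / old_ms


def throughput_gain_percent(old_ms, new_ms):
    """Speed-up expressed as the increase in work done per unit time."""
    return 100.0 * (old_ms / new_ms - 1.0)


# Hypothetical GPU timings: 10 ms per frame before, 8 ms after.
saved = time_reduction_percent(10.0, 8.0)    # 20% less time per frame
gained = throughput_gain_percent(10.0, 8.0)  # 25% more frames per second
print(saved, gained)
```

Noting which convention was used alongside each submitted figure would remove the ambiguity.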