
Add BCn encoders/decoders with RDO support #1167

Draft
walcht wants to merge 40 commits into KhronosGroup:main from walcht:add-BCn-decoder

Conversation

@walcht

@walcht walcht commented May 7, 2026


As discussed in #1159: ktxTexture2_CompressBCn and ktxTexture2_DecodeBCn are introduced in this PR to allow libktx users/consumers to encode/decode BCn textures from/to raw decompressed formats.

https://github.com/richgel999/bc7enc_rdo supports neither BC6HU/BC6HS encoding/decoding nor BC2 (BC2 is essentially dead since BC3 replaces it).

Please feel free to give feedback, edit, and nitpick as much as possible.

Some context: I am adding KTX2 support to OIIO (PR: AcademySoftwareFoundation/OpenImageIO#5185) and having libktx encode/decode BCn formats significantly simplifies things (also ETC encoding/decoding which I can also open a PR for - if approved).

I haven't updated the KTX-Software-CTS with the added BCn test files.

Once this is finalized, this will fix #587.

Current TODOs:

  • add CTS test files from a primary platform. As far as I understand, I shouldn't create/transcode KTX2 test files on non-primary platforms (non arm64). I did do the testing on my local machine (see https://github.com/walcht/KTX-Software-CTS). => added CTS tests + golden files (still not on a primary platform).
  • add RDO post processing step
    • expose RDO parameters with detailed description (some parameters are still missing - I haven't understood them yet)
    • BC1 mode RDO
    • [x] BC2 mode RDO: BC2 is rarely used (I don't see any reason to use it instead of BC3). Postponed or not planned to be implemented at all.
    • BC3 mode RDO
    • BC4 mode RDO
    • BC5 mode RDO
    • BC6H mode RDO (HDR). I don't know if this is possible. If not, then I will not adjust the ert::reduce_entropy function for the moment. Postponed (too much work/overload to include in this PR).
    • BC7 mode RDO
    • ultra smooth mode (see: https://richg42.blogspot.com/2021/02/updated-bc7encrdo-with-improved-smooth.html). => trivial to implement, just have to remove dependency on image_u8
  • add multithreading for encoder part (up to 9X decrease in time for my 6-core system)
  • add multithreading for RDO (bc7enc_rdo makes use of OpenMP so we need to do our own MT same as in encoder - this is trivial) => 5.50X to 7.00X time decrease compared to single-threaded mode.
  • add BCn encoding support for ktx encode command
  • add BCn decoding to PNG support for ktx extract command
  • add BCn encoding support for ktx create command
  • agree on which parameters to expose in ktxBCnParams struct (apparently there are many and I am not qualified to know which subset to expose or to expose them all). For the moment I will just expose them all.
    We agreed on using the same parameters as UASTC RDO and adding additional ones if they make sense (I haven't benchmarked the skip-zero-MSE option, so I might remove it if it is useless).
  • fix bc7enc_rdo compiler warnings (very simple to do; before merging, git diff against the original files so that we are 100% sure that we haven't changed anything).
  • finalize test cases (not CLI tests but rather libktx tests - e.g., texturetests, etc.).
  • enable SIMD acceleration for BC7 encoder (using ISPC - see https://github.com/ispc/ispc) (this should be straightforward and should be enabled by default via a CMake flag; e.g., BC7_SIMD). => since we are planning to use bc7f, we plan to switch to bc7g once it is released (this is the SIMD equivalent of bc7f).
  • test LIBKTX_FEATURE_BCN_DECODER CMake flag option
  • use bc7f for BC7 encoding instead of bc7enc_rdo's encoder (significantly faster + maintained).
  • add single image decoders used by GLUpload/VkUpload (should be exposed in the library API but must be independent of the ktxTexture* classes. Software reading a KTX file incrementally will find them useful).
  • add BC6H decoder (copied from MIT Licensed https://github.com/iOrange/bcdec from basisu)
  • add BC6H encoder from basist::astc_6x6_hdr::fast_encode_bc6h
    • fix issues with signed formats (the same way that UASTC HDR handles it) => if we are given signed input, which the encoder can't handle, asserts fail, etc. (obviously, since BC6HU is the only one supported). Needs a pre-processing step that clamps negative and above-maximum-encodable values...
  • add BC2 decoder (unpack BC1 followed by unpack sharp alpha).
  • [x] add BC2 encoder postponed/aborted (rarely used).
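
The pre-processing step mentioned above for BC6HU input could look roughly like the sketch below. It assumes the encoder wants finite, non-negative values no larger than the largest finite half-float (65504). The function names and the choice of mapping NaN to 0 are illustrative assumptions, not what this PR implements.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Assumption: the largest finite IEEE 754 half-precision value is the upper limit.
constexpr float kMaxHalf = 65504.0f;

// Hypothetical helper: sanitize one float channel for an unsigned HDR encoder.
inline float clampForUnsignedHdr(float v) {
    if (std::isnan(v)) return 0.0f;                // NaN has no sensible encoding
    return std::min(std::max(v, 0.0f), kMaxHalf);  // also handles +/- infinity
}

// Apply the clamp in place to `count` interleaved float channel values.
inline void sanitizeHdrPixels(float* pixels, std::size_t count) {
    assert(pixels != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i)
        pixels[i] = clampForUnsignedHdr(pixels[i]);
}
```

Clamping silently changes out-of-range input, so a real implementation might also want to report to the caller that values were modified.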

Note1: no LLMs/AI coding tools were used in any capacity whatsoever in writing or aiding in the writing of this PR.
Note2: I am an individual contributor (main reason I am contributing here is to add support for KTX2 in Blender).

Edit: TODO list edits

walcht and others added 4 commits May 5, 2026 17:01
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
*Add encode/decode tests (that use both CompressBCn/DecodeBCn)
*Add BCn ktx2 test files (transcoded from
tests/resources/ktx2/color_grid_uastc_zstd_5.ktx2)
*Cleanup BCn test fixtures
*Remove `std::cout` statement

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@walcht walcht marked this pull request as draft May 7, 2026 14:11
@walcht

walcht commented May 7, 2026

Author

There are also a lot of compiler warnings from bc7enc_rdo dependency. These should be straightforward to address directly in copied files from bc7enc_rdo.

walcht added 2 commits May 8, 2026 09:13
*Add BC1, BC3, BC4, BC5, and BC7 encoding support to "ktx encode"
command.
*Cleanup ktxBCnParams and add BC1/BC3 quality and mode params.
*Add docstrings/documentation to newly added enums/structs in ktx.h.

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@CLAassistant

CLAassistant commented May 8, 2026


CLA assistant check
All committers have signed the CLA.

@MarkCallow

MarkCallow commented May 9, 2026

Collaborator

@walcht, Thank you for this. Can you view the build logs?

One issue I notice immediately is that there are no changes to command_create.cpp. This too needs updating to support BCn encoding.

Regarding RDO and multi-threading, the UASTC encoder also has separate multi-threading options for the encoder and for the RDO step. You can follow the same model.

If we need to fix warnings in the encoder I suggest forking the encoder then incorporating the fork here by way of git subrepo. That will make it easier to potentially contribute fixes back upstream. If you are agreeable I can make the fork and provide instructions for how to incorporate it in your workarea.

Re. SIMD and ISPC, be careful how you support this. We need to support building and running on arm64 processors. Also compile flags to enable SSE or other SIMD options are not compatible with straightforward use of universal build tool chains, which is why this project does not do universal builds. Better is use of compiler pre-defined macros and run-time queries to discover what the software is being compiled for and running on. However since we aren't doing universal builds there is no need to obsess over this last detail.
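
As a rough illustration of the "compiler pre-defined macros" approach, the sketch below identifies the compile target without any build-system SIMD flags. The function names are hypothetical. The macros are the ones GCC, Clang, and MSVC commonly pre-define, and a run-time CPU feature query would still be needed on top of this for optional instruction sets.

```cpp
#include <cassert>
#include <cstring>

// Returns a label for the architecture this translation unit was compiled for,
// using only compiler pre-defined macros.
inline const char* compiledTargetArch() {
#if defined(__aarch64__) || defined(_M_ARM64)
    return "arm64";
#elif defined(__x86_64__) || defined(_M_X64)
    return "x86_64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86";
#else
    return "unknown";
#endif
}

// True when the compiler reports a baseline SIMD instruction set for this
// target (NEON on arm64, SSE2 on x86_64).
inline bool hasCompileTimeSimd() {
#if defined(__ARM_NEON) || defined(__SSE2__) || defined(_M_X64) || defined(_M_ARM64)
    return true;
#else
    return false;
#endif
}
```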

@walcht

walcht commented May 10, 2026

Author

@MarkCallow - Concerning the command_create.cpp, I am adding BCn encoding for it currently (somehow forgot it) - will also check if I have missed any other commands.

Concerning the build/CI logs: they are mostly failing because of bc7enc_rdo compiler warnings which should be suppressed. They will also fail because I haven't re-generated the golden files yet for ktx CTS (e.g., ktx encode --help output).

If you are agreeable I can make the fork and provide instructions for how to incorporate it in your workarea

Please do so (as far as I understood, this will be forked under the KhronosGroup and any updates here will be pushed there via the git subrepo command). Until that is done, I will keep editing the bc7enc_rdo sub-folder until the CI passes.

Concerning SIMD and ISPC: I think it makes sense to leave this for another PR, do you agree? (reasoning is this: I have to get the basics working properly, add proper testsuite that covers all supported BCn formats, etc. Once that is done, I can open another PR for SIMD performance improvements). Or I will leave this to very end (last in TODO list above).

walcht added 2 commits May 10, 2026 05:44
*Suppress bc7enc_rdo warnings like unused-variables, memset'ing a
non-trivial class (in this case the class is obviously trivial hence a
void* cast is used to suppress this warning).

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
*Some CIs report further unused variables/functions that are not
reported when building locally.

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@walcht

walcht commented May 10, 2026

Author

Updates about CI jobs:

  • reuse lint: I don't know why it is not ignoring external/bc7enc_rdo but I have added the SPDX comments in these files.
  • MinGW CI: failing because of some linkage error with bc7enc_rdo (undefined reference to rgbx::). I have to test this on my Windows machine to see what I am doing wrong/missing.
  • macOS (arm64): compiling bc7enc_rdo fails. I have no idea why (I don't have a macOS machine so I will be testing this CI on my fork).
  • Windows CI: probably some NSIS issue (I don't think I have introduced anything to make it fail)
  • Other failures are still due to unused-variables/functions that I don't get locally (have addressed all that were mentioned in the CI logs).

@MarkCallow

MarkCallow commented May 10, 2026

Collaborator

@walcht,

You will need to rebase to or merge current main to get the fix for the NSIS issue on Windows.

I have created a fork of https://github.com/richgel999/bc7enc_rdo. To incorporate it in your add-BCn-decoder branch do the following in the repo root directory:

git subrepo clone https://github.com/KhronosGroup/bc7enc_rdo.git external/bc7enc_rdo -b changes_for_ktx

You can make changes in this subdirectory, as you are now, and when everything is working I can push the changes to the fork.

Concerning SIMD and ISPC: I think it makes sense to leave this for another PR

I agree. I wanted to make you aware that arm64 is a build target.

Re. reuse, you have to add an entry to REUSE.toml to get external/bc7enc_rdo ignored. It is better to do that than add SPDX comments to all the files. The entry in REUSE.toml will have to mention a license. Use the MIT license option.

Here are a few high level points.

  • I want to use the bc7f encoder once it is integrated into the bc7enc_rdo repo.
  • Re. decoding, currently the ETC and ASTC decoders have different APIs. The former has a non-public unpack function for single blocks. This is called from the GL and Vulkan upload functions, if the device does not support ETC, to convert the data before it is uploaded. The latter has a call that decodes and converts an entire ktxTexture2 object. I want to have both for all decoders: ASTC, BCn and ETC and the transcoders as both have their uses. It was a mistake not to have provided an unpack function and updated the upload functions when the ASTC decoder was added.
  • I would like BC6H support to be included in this or a later PR.
  • I will be away for 5 days from May 15th and not be able to approve workflow runs during that time. Please plan accordingly.
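
To make the decoding point above concrete, here is a minimal sketch of the two API levels using BC4 (unsigned) as the simplest BCn format: a single-block unpack function that knows nothing about texture objects, and a whole-image wrapper built on top of it. The function names are hypothetical and are not the libktx API. The truncating integer interpolation is one common choice, and other decoders may round differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode one 8-byte BC4 (unsigned) block into 16 values, row-major 4x4.
// Independent of any texture container class.
inline void unpackBc4Block(const std::uint8_t block[8], std::uint8_t out[16]) {
    const unsigned a0 = block[0], a1 = block[1];
    std::uint8_t pal[8];
    pal[0] = static_cast<std::uint8_t>(a0);
    pal[1] = static_cast<std::uint8_t>(a1);
    if (a0 > a1) {  // 8-value mode: six interpolated values
        for (unsigned i = 2; i < 8; ++i)
            pal[i] = static_cast<std::uint8_t>(((8 - i) * a0 + (i - 1) * a1) / 7);
    } else {        // 6-value mode: four interpolated values plus 0 and 255
        for (unsigned i = 2; i < 6; ++i)
            pal[i] = static_cast<std::uint8_t>(((6 - i) * a0 + (i - 1) * a1) / 5);
        pal[6] = 0;
        pal[7] = 255;
    }
    // 48 bits of 3-bit selectors, little-endian, texel 0 in the lowest bits.
    std::uint64_t bits = 0;
    for (int i = 0; i < 6; ++i)
        bits |= static_cast<std::uint64_t>(block[2 + i]) << (8 * i);
    for (int t = 0; t < 16; ++t)
        out[t] = pal[(bits >> (3 * t)) & 7u];
}

// Whole-image wrapper: decodes every block and crops partial edge blocks.
inline std::vector<std::uint8_t> decodeBc4Image(const std::uint8_t* blocks,
                                                std::size_t width,
                                                std::size_t height) {
    assert(blocks != nullptr && width > 0 && height > 0);
    const std::size_t bw = (width + 3) / 4, bh = (height + 3) / 4;
    std::vector<std::uint8_t> img(width * height);
    for (std::size_t by = 0; by < bh; ++by)
        for (std::size_t bx = 0; bx < bw; ++bx) {
            std::uint8_t texels[16];
            unpackBc4Block(blocks + (by * bw + bx) * 8, texels);
            for (std::size_t y = 0; y < 4 && by * 4 + y < height; ++y)
                for (std::size_t x = 0; x < 4 && bx * 4 + x < width; ++x)
                    img[(by * 4 + y) * width + bx * 4 + x] = texels[y * 4 + x];
        }
    return img;
}
```

An upload path that needs to convert data for a device without BCn support could call the block function directly, while a texture-level decode would use the wrapper.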

@MarkCallow

Collaborator

Re the macOS build failure, because the output from the Xcode build is so voluminous it is run through a script, xcpretty, to prettify it. On a past CI service, without this, the logs exceeded the maximum allowed. Recently, for reasons I have yet to investigate, it has started swallowing compile errors. You can turn it off by editing scripts/install_macos.sh and commenting out the line that installs xcpretty.

walcht added 2 commits May 10, 2026 12:31
*Add BCn encoder support for `ktx create` command.
*Add missing tests for `ktx encode`, `ktx extract`, and `ktx create`.
*Expose BC1/BC3 approximation mode option to ktx CLIs
*Misc cleanups (still early-stage PR)

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@walcht

walcht commented May 10, 2026

Author

@MarkCallow

I want to use the bc7f encoder once it is integrated into the bc7enc_rdo repo.

Agree. I wasn't initially aware of this. Apparently bc7f is significantly faster than the bc7 encoder used here (with the added benefit of being continuously maintained). I will integrate this right now since it seems to be straightforward (bc7enc_rdo also seems to be no longer actively maintained, so I think it's better to just integrate it now rather than waiting for it to be integrated into the bc7enc_rdo repo).

Concerning the decoding API: I added BCn decoders for VkUpload/GLUpload as a TODO (will follow same API as in etcunpack). Might also open a PR to add it for ASTC since I have already spent some time getting familiar with this code base.

I would like BC6H support to be included in this or a later PR.

Will add it in this PR. Will add it at the very end though since I have to finalize current formats.

I will be away for 5 days ...

I will try to address CI issues now (is it fine if I commit incrementally here to see whether certain jobs pass?).

walcht and others added 6 commits May 10, 2026 14:30
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
…Group/bc7enc_rdo.git external/bc7enc_rdo

subrepo:
  subdir:   "external/bc7enc_rdo"
  merged:   "dbe416d2"
upstream:
  origin:   "https://github.com/KhronosGroup/bc7enc_rdo.git"
  branch:   "changes_for_ktx"
  commit:   "dbe416d2"
git-subrepo:
  version:  "0.4.9"
  origin:   "https://github.com/ingydotnet/git-subrepo"
  commit:   "5e0f401"
*Before this commit, bc7enc_rdo dependency was manually copied to
external/bc7enc_rdo directory (only needed files were copied). This was
not ideal for a lot of reasons (mainly that we are introducing changes
that may be streamed back to the original repo and having a
subrepo/submodule is better suited for that than manually copying
dependency files).
*Add bc7enc_rdo to REUSE.toml with MIT license.
*Git ignore compile_commands.json file (used by clangd LSP)

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@walcht

walcht commented May 10, 2026

Author

The integrated basisu_transcoder is a bit outdated (actually, significantly) and doesn't contain bc7f. So for the moment I will just use bc7enc_rdo's BC7 encoder until the fork at (https://github.com/KhronosGroup/basis_universal/tree/fixes_for_ktx_v5_0) is updated to a commit that includes the bc7f namespace.

@MarkCallow

MarkCallow commented May 11, 2026

Collaborator

I was not aware that bc7f is included in basisu_transcoder. By pure coincidence I have just completed integration of Basis Universal release 2.1.0. See the update_basisu_to_2_1_0 branch. We have the v5.0.0 release in flight. I've made a v5.0.0-rc1 release while we wait for some external (to KTX-Software) pieces to be put in place. My plan was to wait until we made the v5.0.0 release before merging this branch. If you retarget this PR and your branch to update_basisu_to_2_1_0 you can start working on bc7f now.

Neither the KTX-Software code nor our golden files required any updates for our extensive test suite to pass with BU 2.1.0. I am therefore amenable to merging it now but will have to discuss within the Khronos WG and can't make any promises.

The single image decoders used by GLUpload/VkUpload should be exposed in the library API but must be independent of the ktxTexture* classes. Software reading a KTX file incrementally will find them useful.

Regarding the current ETC decoder, please note that it does not have a recognized open source license so might not be suitable for your OIIO work.

*Add initial RDO post processing step for BC1, BC3, and BC7 but without
ultrasmooth blocks support see:
https://richg42.blogspot.com/2021/02/updated-bc7encrdo-with-improved-smooth.html

*Fix encoder in case input texture does not have multiple-of-4
dimensions. This was reading beyond std::vector size before this commit.

*Add initial RDO params to BCnParams with verbose explanation/description
comments.

*Misc refactoring/adjustments.

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@walcht

walcht commented May 14, 2026

Author

@MarkCallow - please don't approve the CI workflow yet (it will fail because I haven't updated the golden test files yet).

  • I have added initial RDO postprocessing step support for BC1, BC3, and BC7 (only relying on ert.h/cpp without having to use bc7enc_rdo utils.h/.cpp and rdo_bc_encoder.h/cpp because these used OpenMP and it's just better that we only depend on the actual core RDO function in ert.h/cpp).
  • Also fixed the encoder in case input texture is not multiple of 4 (still have to absolutely verify this in all cases - this was reading beyond std::vector's size before this commit...).
  • No ultrasmooth block support yet (added to TODO list).
  • Added initial RDO params for BCnParams in ktx.h (some detailed description for fields I understand).
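
For the non-multiple-of-4 fix mentioned above, one common way to avoid reading past the end of the source buffer is to clamp texel coordinates to the image edge while gathering each 4x4 block. The sketch below assumes tightly packed RGBA8 input and clamp-to-edge padding. The function name is hypothetical and the PR may pad differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy the 4x4 RGBA8 block at block coordinates (bx, by) out of a tightly
// packed image. Texels past the right/bottom edge repeat the nearest valid
// texel, so no read ever goes beyond the source buffer.
inline void extractBlockClamped(const std::vector<std::uint8_t>& rgba,
                                std::size_t width, std::size_t height,
                                std::size_t bx, std::size_t by,
                                std::uint8_t out[64]) {
    assert(width > 0 && height > 0 && rgba.size() >= width * height * 4);
    assert(bx * 4 < width && by * 4 < height);
    for (std::size_t y = 0; y < 4; ++y) {
        const std::size_t sy = std::min(by * 4 + y, height - 1);
        for (std::size_t x = 0; x < 4; ++x) {
            const std::size_t sx = std::min(bx * 4 + x, width - 1);
            const std::uint8_t* src = &rgba[(sy * width + sx) * 4];
            std::copy(src, src + 4, out + (y * 4 + x) * 4);
        }
    }
}
```

Repeating edge texels keeps partial blocks from pulling in unrelated colors, which tends to help the encoder's endpoint selection compared with zero padding.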

Will add bc7f once I finish RDO post processing.

@MarkCallow

Collaborator

I have just merged PR #1170 so you will now find bc7f in main. Please remove external/bc7enc_rdo and adapt this to bc7f. I hope that not too much of your work to date will have been wasted.

One other thing re. bcn_codec.cpp, we need to be able to build with just the decoder when building libktx_read. astc_codec.cpp is one file because of pieces needed by both encoder and decoder so it has ifdefs to allow building only the decoder parts. Unless there is substantial common code you can consider making separate source files for encode and decode. If you do not, then add similar ifdefs.
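
A minimal sketch of the ifdef layout described above, for the single-file case: shared helpers and the decoder are always compiled, and the encoder is guarded. The macro name `SKETCH_BCN_DECODER_ONLY` and the tiny RGB565 helpers are purely illustrative assumptions, not the names used in astc_codec.cpp or in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: a read-only library build would define it.
// #define SKETCH_BCN_DECODER_ONLY 1

// Shared by encoder and decoder: expand a 5-bit channel to 8 bits.
inline std::uint8_t expand5(std::uint8_t v) {
    return static_cast<std::uint8_t>((v << 3) | (v >> 2));
}

// Decoder part: always compiled.
inline std::uint8_t decodeRed565(std::uint16_t c) {
    return expand5(static_cast<std::uint8_t>(c >> 11));
}

#if !defined(SKETCH_BCN_DECODER_ONLY)
// Encoder part: compiled out of the decoder-only build.
inline std::uint16_t encodeRed565(std::uint8_t r) {
    return static_cast<std::uint16_t>((r >> 3) << 11);
}
#endif
```

If the shared part turns out to be this small, splitting into separate encode and decode source files, as suggested above, avoids the guards entirely.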

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@walcht

walcht commented May 26, 2026

Author

I decided to expose the BCn options/args as follows (this is the output from ktx encode --help)
(there are some spelling mistakes like "GPU's". Will fix those):

 Encode BCn options:
      --bc1-mode <mode>         BC1 (subsequently BC3) approximation mode (for both: encoding and 
                                decoding). Default is 'ideal'. If you encode textures for a 
                                specific vendor's GPU's, beware that using that texture data on 
                                other GPU's may result in ugly artifacts. Set to 'ideal' unless you 
                                know the texture data will only be deployed or used on a specific 
                                vendor's GPU. Can be set to one of the following:

                                    Mode       |  Description                                   
                                    ---------- | -----------------------------------------------
                                    ideal      | The default mode. No rounding for 4-color      
                                               | colors 2,3. This matches the D3D10 docs on BC1.
                                    - - - - -  | - - - - - - - - - - - - - - - - - - - - - - - -
                                    nvidia     | NVidia GPU mode. May produce artifacts on      
                                               | non-NVidia GPUs.                               
                                    - - - - -  | - - - - - - - - - - - - - - - - - - - - - - - -
                                    amd        | AMD GPU mode. May produce artifacts on non-AMD 
                                               | GPUs.                                          
                                    - - - - -  | - - - - - - - - - - - - - - - - - - - - - - - -
                                    ideal4     | Matches AMD Compressonator's output. Rounds    
                                               | 4-color colors 2,3 (not 3-color color 2). This 
                                               | matches the D3D9 docs on DXT1.                 
                                
      --bc1-quality <level>     The quality level configures the quality-performance tradeoff for 
                                BC1 and, subsequently, BC3 encoders. The quality level can be set 
                                between 'fastest' (0) and most 'exhaustive' (19). Default is 
                                'thorough' (15). Can also be set via the following aliases:

                                    Level      |  Quality
                                    ---------- | ---------------------------- 
                                    fastest    | (equivalent to quality =  0) 
                                    fast       | (equivalent to quality =  5) 
                                    medium     | (equivalent to quality = 10) 
                                    thorough   | (equivalent to quality = 15) 
                                    exhaustive | (equivalent to quality = 19) 

                                Note on BC1 vs. BC3 vs. BC7: apart from lower VRAM consumption 
                                (4bpp vs. 8bpp) and better GPU texture cache efficiency, there's 
                                little need to use BC1 now. BC3 still has an advantage vs. BC7, 
                                because it very strongly separates how RGB is encoded from the 
                                alpha channel, in a predictable way.
                                
      --bc7-quality <level>     The quality level configures the quality-performance tradeoff for 
                                BC7 encoder. Default is 'medium'. The quality level can be set 
                                between fastest and exhaustive via the following fixed quality 
                                presets where each preset is an OR'ed set of flags:

                                    Level      |  OR'ed flags                 
                                    ---------- | ---------------------------- 
                                    fastest    | (equivalent to flags =  128) 
                                    faster     | (equivalent to flags =  176) 
                                    fast       | (equivalent to flags =  179) 
                                    medium     | (equivalent to flags =  255) 
                                    thorough   | (equivalent to flags = 1023) 
                                    exhaustive | (equivalent to flags = 3967) 
                                
      --rdo                     Enable Rate Distortion Optimization (RDO) post-processing step on 
                                BCn-encoded blocks to reduce entropy for Deflate-based compressors. 
                                This is primarily used to reduce size on disk when a further 
                                compression is applied (Zlib/ZSTD supercompressions). RDO 
                                parameters are only used if this is set. Setting this might result 
                                in significantly slower encoding time at the benefit of potentially 
                                significantly lower bit rate for Deflate-based compressors (i.e., 
                                number of bits per encoded texel). Default is false.
                                
      --rdo-lambda <arg>        RDO quality scalar (lambda). Controls rate vs. distortion tradeoff. 
                                Lower values yield higher quality/larger LZ compressed files, 
                                higher values yield lower quality/smaller LZ compressed files. A 
                                good range to try is [0.25,8]. Full range is [0.1,50.0]. Default is 
                                0.5. The post-processor tries to minimize: 
                                distortion*smooth_block_scale + rate*lambda (rate is approximate LZ 
                                bits and distortion is scaled MSE multiplied against the smooth 
                                block MSE weighting factor). Larger values push the post-processor 
                                towards optimizing more for lower rate, and smaller values more for 
                                distortion. 0=minimal distortion.
                                
      --rdo-max-smooth-block-mse-scale <arg>
                                RDO max MSE scaling factor for blocks considered to be smooth/flat. 
                                A value of 1.0 means no smooth block error scaling which may cause 
                                very noticeable artifacts for smooth/flat blocks (e.g., kodim23 
                                test image). This value can be automatically computed based on the 
                                set RDO lambda by setting 'rdo-auto-mse-scale'. 
                                'rdo-max-smooth-block-std-dev' is used to compute, for a given 
                                block, the MSE scale factor in the range: 1.0 (i.e., not a smooth 
                                block) up to this max MSE scale factor. As to why an MSE factor has 
                                to be applied to smooth/flat blocks, the MSE for these blocks is 
                                too low relative to the visual impact they have when they get 
                                distorted. The solution implemented here is to compute the max std 
                                dev. of any component and use a linear function of that to scale 
                                block/trial MSE. Range is [1,300]. Default is to automatically 
                                compute this.
                                
      --rdo-max-smooth-block-std-dev <arg>
                                RDO max smooth/flat block standard deviation. If the standard 
                                deviation of a block exceeds this value, then it won't be 
                                considered as a smooth block (i.e., the smooth block MSE scale 
                                factor will be set to 1 for this block). The smaller the ratio of 
                                the standard deviation of this block to this value the more the 
                                smooth block MSE scale factor approaches rdoMaxSmoothBlockMseScale. 
                                Range is [.01,65536.0]. Larger values expand the range of blocks 
                                considered smooth. Default is 10.0.
                                
      --rdo-ultrasmooth-blocks  Detect extremely smooth blocks and encode them with a significantly 
                                higher MSE scale factor. When enabled, a per-block mask image is 
                                computed, filtered, then an array of per-block MSE scale factors is 
                                supplied to the ERT. The end result is much less significant 
                                artifacts on regions containing very smooth blocks (e.g., 
                                gradients). This does hurt rate-distortion performance.
                                
      --rdo-max-rms-ratio <arg>
                                How much the RMS error of a block is allowed to increase before a 
                                trial is rejected. 1.0=no increase allowed, 1.05=5% increase 
                                allowed, etc. Range is [1.001, 100.0]. Default is 10.0.
                                
      --rdo-window-loopback-size <arg>
                                The number of bytes the encoder can look back from each block to 
                                find matches. The larger this value, the slower the encoder but the 
                                higher the quality per LZ compressed bit. You don't need a huge 
                                window to get large gains. Even 64-512 byte windows can be fine. 
                                Range is [64,65536]. Default is 128.
                                
      --rdo-try-one-match       Inject up to 1 match into each block instead of up-to-two matches. 
                                Results in slightly faster, but lower compression.
                                
      --rdo-skip-zero-mse       Skip blocks that have zero mean-squared error (MSE). Might result 
                                in faster compression speed but potentially lower compression.
                                
      --rdo-allow-relative-movement
                                TODO: no idea still what this does.
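
The cost function from the --rdo-lambda description and the smooth-block scaling from the two smooth-block options can be sketched as follows. This is only an illustration of the documented formula `distortion*smooth_block_scale + rate*lambda`. All names are hypothetical, the interpolation is assumed to be purely linear as the help text states, and the actual post-processor may differ in detail.

```cpp
#include <algorithm>
#include <cassert>

// Scale factor for a block's MSE: maxScale for a perfectly flat block
// (std dev 0), falling linearly to 1.0 at or above maxStdDev.
inline float smoothBlockMseScale(float blockStdDev, float maxStdDev, float maxScale) {
    assert(maxStdDev > 0.0f && maxScale >= 1.0f);
    const float t = std::min(std::max(blockStdDev / maxStdDev, 0.0f), 1.0f);
    return maxScale + (1.0f - maxScale) * t;
}

// Cost the post-processor tries to minimize for one candidate encoding:
// scaled distortion plus lambda-weighted approximate LZ bits.
inline float trialCost(float mse, float rateBits, float mseScale, float lambda) {
    return mse * mseScale + rateBits * lambda;
}

// A trial replaces the current block encoding only if it lowers the cost.
inline bool acceptTrial(float curMse, float curBits,
                        float trialMse, float trialBits,
                        float mseScale, float lambda) {
    return trialCost(trialMse, trialBits, mseScale, lambda) <
           trialCost(curMse, curBits, mseScale, lambda);
}
```

With lambda 0.5, a trial that doubles the MSE from 1 to 2 while saving 40 bits is accepted when the scale is 1 but rejected when the scale is 30, which is how a large smooth-block scale protects flat regions from visible artifacts.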

@MarkCallow

Collaborator

I removed the subrepo (via git rm -rf external/bc7enc_rdo) and included the files separately. I haven't added a CMakeLists.txt yet because I want to fix CI/CD issues first while keeping the setup as simple as possible (when CI/CD passes then I will include it).

If this means you are including the files in the sources for libktx that is likely to complicate things. It is quite likely that some warnings will have to be ignored when compiling these files requiring further changes to lib/CMakeLists.txt which will then have to be removed. Given the small number of files remaining, it should not take long to make a CMakeLists.txt to build a static library target with them.

I would have preferred you kept the subrepo and deleted the unwanted files with git rm. The goal is to create a branch in the bc7enc fork I created that has just the files we will use. That way we maintain a connection back to upstream.

I have added BC1, BC3, BC4, BC5, and BC7 KTX2 test files such as the ones below (they are mostly 600 bytes each): rgba8_unorm_bc7, rgba8_unorm_bc3, rgba8_srgb_bc7, rgba8_srgb_bc3, rgb16_sfloat_bc6hu, etc.

Only files used by texturetests, unittests, etc. need to be in tests/resources. Those for `ktx create` and `ktx encode`, along with the tests, should be in the KTX-Software-CTS repo.

I have added BC2 decoder (haven't tested it yet because there are no BC2-compressed KTX2 files apparently...). Should I add BC2 encoder? (it is very rarely used and BC3 replaces it, right?). If yes, it is straightforward to implement.

There is tests/resources/ktx2/pattern_02_bc2.ktx2
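
For reference, a BC2 block decoder along the lines described in the PR ("unpack BC1 followed by unpack sharp alpha") can be sketched as below. The names are hypothetical and this is not the PR's code. It assumes the non-rounding "ideal" interpolation for the two intermediate colors and relies on BC2 color data always using 4-color mode.

```cpp
#include <cassert>
#include <cstdint>

struct Rgba8 { std::uint8_t r, g, b, a; };

// Expand an RGB565 value to 8 bits per channel by bit replication.
inline Rgba8 expand565(std::uint16_t c) {
    const unsigned r = (c >> 11) & 31u, g = (c >> 5) & 63u, b = c & 31u;
    return { static_cast<std::uint8_t>((r << 3) | (r >> 2)),
             static_cast<std::uint8_t>((g << 2) | (g >> 4)),
             static_cast<std::uint8_t>((b << 3) | (b >> 2)), 255 };
}

// Decode one 16-byte BC2 block into 16 RGBA texels (row-major 4x4).
inline void unpackBc2Block(const std::uint8_t block[16], Rgba8 out[16]) {
    // Bytes 8..15: BC1-style color data (two little-endian RGB565 endpoints,
    // then 2-bit selectors). BC2 always uses the 4-color mode.
    const std::uint16_t c0 = static_cast<std::uint16_t>(block[8] | (block[9] << 8));
    const std::uint16_t c1 = static_cast<std::uint16_t>(block[10] | (block[11] << 8));
    Rgba8 pal[4] = { expand565(c0), expand565(c1), {}, {} };
    // Assumption: "ideal" mode, i.e. thirds without rounding.
    auto third = [](unsigned a, unsigned b) {
        return static_cast<std::uint8_t>((2 * a + b) / 3);
    };
    pal[2] = { third(pal[0].r, pal[1].r), third(pal[0].g, pal[1].g),
               third(pal[0].b, pal[1].b), 255 };
    pal[3] = { third(pal[1].r, pal[0].r), third(pal[1].g, pal[0].g),
               third(pal[1].b, pal[0].b), 255 };
    for (int t = 0; t < 16; ++t) {
        const unsigned sel = (block[12 + t / 4] >> (2 * (t % 4))) & 3u;
        // Bytes 0..7: explicit 4-bit alpha per texel, expanded to 8 bits.
        const unsigned a4 = (block[t / 2] >> (4 * (t % 2))) & 15u;
        out[t] = pal[sel];
        out[t].a = static_cast<std::uint8_t>(a4 * 17u);
    }
}
```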

I am now finalizing BC6HS encoder and adding proper tests to it.

Sounds good.

This PR also needs to update ktx extract to use the decoders when processing a BCn file so it can output .png or .exr (for BC6H) as it does now for ASTC encoded files.

@MarkCallow

Collaborator

I decided to expose the BCn options/args as follows (this is the output from ktx encode --help)
(there are some spelling mistakes like "GPU's". Will fix those):

UASTC RDO has many of the same options. Please use the same names, just changing the prefix to something like bc or bc_ldr. Regrettably the lambda option, uastc_rdo_l has a different range, as possibly do others, so we cannot use the same options for both.

I also want to include the high level --quality and --effort options which will be exposed for the other encoders in the next release. See external/basis_universal/cmd_help/cmd_help.txt for details. If these are specified, the low-level options are ignored. Some of the mapping from these options to the low-level options is done in external/basis_universal/basisu_tool.cpp but most is done in the basisu_encoder. See basis_compressor_params::set_format_mode_and_quality_effort in external/basis_universal/encoder/basisu_comp.cpp as a starting point.

@walcht

walcht commented May 27, 2026

Author

I would have preferred you kept the subrepo and deleted the unwanted files with git rm

Oops. I misunderstood and removed the subrepo. Nothing is lost; I will add it again as a subrepo with the unused files removed.

This PR also needs to update ktx extract to use the decoders when processing a BCn file so it can output .png or .exr (for BC6H)

This is already done (with EXR output for BC6HU and PNG for others).

*Through testing, RDO ultrasmooth block handling significantly reduces
block artifacts for smooth/very smooth blocks (e.g., gradients, sky,
blurred background, etc.). Ideally, this should be enabled by default.

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@walcht

walcht commented May 27, 2026

Author

I have finally integrated ultra-smooth block handling, and after testing on some images, the result is significantly less blocky (see below; if you get an error when clicking on an image, refresh the GitHub tab and try again: this happens because GitHub rotates image links every now and then):

Top image command:

ktx create --format BC7_SRGB_BLOCK --assign-tf SRGB --bc7-quality exhaustive --rdo --rdo-lambda 1.0 --rdo-window-loopback-size 2048 --threads 12 kodim23.png bc7.ktx2
ktx extract bc7.ktx2 bc7_ultrasmooth.png

Bottom image command:

ktx create --format BC7_SRGB_BLOCK --assign-tf SRGB --bc7-quality exhaustive --rdo --rdo-lambda 1.0 --rdo-window-loopback-size 2048 --rdo-ultrasmooth-blocks --threads 12 kodim23.png bc7.ktx2
ktx extract bc7.ktx2 bc7_ultrasmooth.png
[Images: decoded results of the top and bottom commands]

@MarkCallow

Collaborator

I am happy you have extract working.

As for ultrasmooth, the lower image definitely looks better at a glance but when zooming in, I find it difficult to see precisely why. I don't see any hugely noticeable blockiness in either image.

Have you got metrics working? Metrics relies on extracting an image and comparing it against the original input. That will enable quantitative comparison.

@walcht

walcht commented May 28, 2026

Author

find it difficult to see precisely why

If you zoom in at the top right in the background you will probably notice significant blockiness (the focus/bird themselves are the same).

I think these images illustrate the difference better (top is RDO without ultra-smooth block handling; middle is RDO + ultra-smooth block handling; bottom is the original):
[Images: bc7, bc7_ultrasmooth, delorean]

Either way, since there are a lot of parameters and setting this correctly is difficult, I am taking some detailed notes on what each parameter does + some metrics (e.g., bit rate vs. window size, etc.) and I will send a link to this README.md guide when it's finished.

Have you got metrics working?

Didn't know KTX provides this.

@MarkCallow

Collaborator

If you zoom in at the top right in the background you will probably notice significant blockiness (the focus/bird themselves are the same).

Now you have pointed it out, I can see it.

I will send a link to this README.md guide when it's finished.

I look forward to it. It will likely be a useful guide for UASTC RDO too.

Have you got metrics working?

Didn't know KTX provides this.

See --compare-psnr and --compare-ssim in ktx help encode.
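
For readers unfamiliar with what such a comparison computes, here is a minimal sketch of the mean-squared error that PSNR is derived from, for 8-bit samples. The function name `mse_u8` is hypothetical and this is not the ktx tool's implementation; it only illustrates the idea of comparing the decoded image against the original input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libktx or the ktx tools: mean-squared
 * error between two 8-bit sample buffers of equal length (for example the
 * original image and the image decoded from the BCn texture).
 * Returns 0.0 for empty input. */
static double mse_u8(const uint8_t *ref, const uint8_t *test, size_t count)
{
    if (count == 0)
        return 0.0;

    double sum_sq = 0.0;
    for (size_t i = 0; i < count; ++i) {
        double d = (double)ref[i] - (double)test[i];
        sum_sq += d * d;
    }
    return sum_sq / (double)count;
}
```

For 8-bit data, PSNR in dB is then 10 * log10(255^2 / MSE), so a lower MSE means a higher PSNR, and identical images have an unbounded PSNR.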

@walcht

walcht commented Jun 1, 2026

Author

This is the link to the guide/detailed notes about different RDO parameters:
https://walcht.github.io/walcht/guides/ktx-bcn-playground.html

Hopefully by next commit (in which all your points should be addressed) I will finally mark this as ready to review.

@MarkCallow

Collaborator

This is the link to the guide/detailed notes about different RDO parameters:
https://walcht.github.io/walcht/guides/ktx-bcn-playground.html

I had a quick read. It is very thorough and interesting. It needs some editing - to remove duplication, aid clarity and fix some typos. Providing the feedback will take more time than I have today. When I have time, should I open issues at https://github.com/walcht/walcht/issues? I can probably provide PRs for some simple things but I wouldn't want to try for others.

@walcht

walcht commented Jun 2, 2026

Author

I had a quick read. It is very thorough and interesting. It needs some editing - to remove duplication, aid clarity and fix some typos. Providing the feedback will take more time than I have today. When I have time, should I open issues at https://github.com/walcht/walcht/issues? I can probably provide PRs for some simple things but I wouldn't want to try for others.

Yes, please (that would be really nice). I will add more details + benchmarks at some point.

@walcht

walcht commented Jun 2, 2026

Author

@MarkCallow - Some minor updates:

UASTC RDO has many of the same options. Please use the same names, just changing the prefix to something like bc or bc_ldr. Regrettably the lambda option, uastc_rdo_l has a different range, as possibly do others, so we cannot use the same options for both.

Done. I chose to use the exact same names as ktxBasisParams but with a bcn prefix (I think bcn is more consistent, but it's fine if you want me to change it to just bc).
I also used the same defaults as the UASTC RDO params (at least for the common ones) and adjusted some ranges to match the UASTC RDO ranges. This is my somewhat final version of the BCn options:

 Encode BCn options:
      --bc1-mode <mode>         BC1 (subsequently BC3) approximation mode (for both: encoding and 
                                decoding). Default is 'ideal'. If you encode textures for a 
                                specific vendor's GPU, beware that using that texture data on other 
                                GPUs may result in ugly artifacts. Set to 'ideal' unless you know 
                                the texture data will only be deployed or used on a specific 
                                vendor's GPU. Can be set to one of the following:

                                    Mode       |  Description                                   
                                    ---------- | -----------------------------------------------
                                    ideal      | The default mode. No rounding for 4-color      
                                               | colors 2,3. This matches the D3D10 docs on BC1.
                                    - - - - -  | - - - - - - - - - - - - - - - - - - - - - - - -
                                    nvidia     | NVidia GPU mode. May produce artifacts on      
                                               | non-NVidia GPUs.                               
                                    - - - - -  | - - - - - - - - - - - - - - - - - - - - - - - -
                                    amd        | AMD GPU mode. May produce artifacts on non-AMD 
                                               | GPUs.                                          
                                    - - - - -  | - - - - - - - - - - - - - - - - - - - - - - - -
                                    ideal4     | Matches AMD Compressonator's output. Rounds    
                                               | 4-color colors 2,3 (not 3-color color 2). This 
                                               | matches the D3D9 docs on DXT1.                 
      --bc1-quality <level>     The quality level configures the quality-performance tradeoff for 
                                BC1 and, subsequently, BC3 encoders. The quality level can be set 
                                in the range [0, 19] with (0) being the 'fastest' and (19) the 
                                slowest but most 'exhaustive'. Default is (15) 'thorough'. Can also 
                                be set via the following aliases:

                                    Level      |  Quality
                                    ---------- | ---------------------------- 
                                    fastest    | (equivalent to quality =  0) 
                                    fast       | (equivalent to quality =  5) 
                                    medium     | (equivalent to quality = 10) 
                                    thorough   | (equivalent to quality = 15) 
                                    exhaustive | (equivalent to quality = 19) 

                                Note on BC1 vs. BC3 vs. BC7: apart from lower VRAM consumption 
                                (4bpp vs. 8bpp) and better GPU texture cache efficiency, there's 
                                little need to use BC1 now. BC3 still has an advantage vs. BC7, 
                                because it very strongly separates how RGB is encoded from the 
                                alpha channel, in a predictable way.
      --bc7-quality <level>     The quality level configures the quality-performance tradeoff for 
                                BC7 encoder. Default is 'medium'. The quality level can be set 
                                between fastest and exhaustive via the following fixed quality 
                                presets where each preset is an OR'ed set of flags:

                                    Level      |  OR'ed flags                 
                                    ---------- | ---------------------------- 
                                    fastest    | (equivalent to flags =  128) 
                                    faster     | (equivalent to flags =  176) 
                                    fast       | (equivalent to flags =  179) 
                                    medium     | (equivalent to flags =  255) 
                                    thorough   | (equivalent to flags = 1023) 
                                    exhaustive | (equivalent to flags = 3967)
      --bcn-rdo                 Enable BCn LDR RDO post-processing. HDR formats (BC6HU/BC6HS) are 
                                currently not supported.
      --bcn-rdo-l <lambda>      Set BCn RDO quality scalar to the specified value. Lower values 
                                yield higher quality/larger supercompressed files, higher values 
                                yield lower quality/smaller supercompressed files. A good range to 
                                try is [.25,10]. For normal maps a good range is [.25,.75]. The 
                                full range is [.001,10.0]. Default is 1.0.
      --bcn-rdo-d <dictsize>    Set BCn RDO dictionary size in bytes. Default is 4096. Lower 
                                values=faster, but give less compression. Range is [64,65536].
      --bcn-rdo-b <scale>       Set BCn RDO max smooth block error scale. Range is [1.0,300.0]. 
                                Default is to automatically compute this. 1.0 is disabled. Larger 
                                values suppress more artifacts (and allocate more bits) on smooth 
                                blocks.
      --bcn-rdo-s <deviation>   Set BCn RDO max smooth block standard deviation. Range is 
                                [.01,65536.0]. Default is 18.0. Larger values expand the range of 
                                blocks considered smooth.
      --bcn-rdo-r <ratio>       How much the RMS error of a block is allowed to increase before a 
                                trial is rejected. 1.0=no increase allowed, 1.05=5% increase 
                                allowed, etc. Range is [1.001, 100.0]. Default is 10.0.
      --bcn-rdo-no-ultrasmooth  Disable encoding of extremely smooth blocks with a significantly 
                                higher MSE scale factor. Results in significantly more artifacts on 
                                regions containing very smooth blocks (e.g., gradients, skies, 
                                etc.). This does improve rate-distortion performance, though.
      --bcn-rdo-try-one-match   Inject up to 1 match into each block instead of up to 2 matches. 
                                Slightly faster, but gives lower compression.
      --bcn-rdo-skip-zero-mse   Skip blocks that have zero mean-squared error (MSE). Might result 
                                in faster compression speed but potentially lower compression.
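
To make the alias table for --bc1-quality concrete, here is a minimal sketch of how an argument could be mapped to a numeric level in [0, 19]. The function `parse_bc1_quality` is hypothetical and is not the option parser used in this PR; only the alias names and values are taken from the help text above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser, not the ktx tool's code: maps a --bc1-quality
 * argument (an alias or a decimal number) to a level in [0, 19].
 * Returns -1 if the argument is neither a known alias nor in range. */
static int parse_bc1_quality(const char *arg)
{
    static const struct { const char *alias; int level; } aliases[] = {
        {"fastest", 0}, {"fast", 5}, {"medium", 10},
        {"thorough", 15}, {"exhaustive", 19},
    };

    if (arg == NULL || *arg == '\0')
        return -1;

    /* Try the named presets first. */
    for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; ++i) {
        if (strcmp(arg, aliases[i].alias) == 0)
            return aliases[i].level;
    }

    /* Otherwise accept a plain integer within the documented range. */
    char *end = NULL;
    long value = strtol(arg, &end, 10);
    if (*end != '\0' || value < 0 || value > 19)
        return -1;
    return (int)value;
}
```

With this sketch, "thorough" and "15" both resolve to level 15, while "20" or an unknown alias is rejected.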

I have also removed the --rdo-allow-relative-movement option and its associated code in ERT. In my benchmarks, it just makes encoding extremely slow with no noticeable benefits. This is also stated in the original bc7enc_rdo repo:
-zm BC1-7: Allow byte sequences to be moved inside blocks (significantly slower, not worth it in benchmarking, will likely be removed).

I also want to include the high level --quality and --effort options which will be exposed for the other encoders in the next release.

I looked into some details in basisu about this. Should I do this now (in this PR) or wait for the PR that will add --quality and --effort to all options? (I think the latter makes more sense).

@walcht

walcht commented Jun 2, 2026

Author

And this is the single image BCn decoder that you requested me to include. It is independent of KtxTexture* C classes. I included this in ktx.h so that it is exposed in libktx API:

/**
 * @ingroup reader
 * @brief Decodes a provided BCn-encoded image. All BCn formats are supported
 *        (BC1, BC2, BC3, BC4, BC5, BC6HU, BC6HS, or BC7).
 *
 *        Decoding into non-multiple-of-4 texture dimensions is also supported
 *        (decoded blocks that fall out of the image's dimensions are simply
 *        discarded).
 *
 * @param [in] src_blocks   pointer to the BCn-encoded blocks.
 * @param [out] dst         pointer to where to write the decoded image. Must
 *                          be large enough to hold the image in the
 *                          corresponding decompressed vkFormat.
 * @param [in] width        current image's width.
 * @param [in] height       current image's height.
 * @param [in] bcn          which BCn compression kind the provided image is
 *                          encoded in.
 * @param [in] params       pointer to BC1, and subsequently BC3, decoder
 *                          parameters.
 *
 * @return                  KTX_SUCCESS on success, other KTX_* enum values on
 *                          error.
 *
 * @exception KTX_INVALID_VALUE
 *                          @p params is NULL but @p bcn selects BC1 or BC3.
 * @exception KTX_INVALID_OPERATION
 *                          Decoder/Unpacker returned an error exit code or a
 *                          non-success return flag. Only occurs for BC1, BC2,
 *                          BC3, and BC7 (BC2 and BC3 are based on BC1).
 */
KTX_API KTX_error_code KTX_APIENTRY
ktxUnpackBCn(const ktx_uint8_t* src_blocks, ktx_uint8_t* dst, ktx_uint32_t width,
             ktx_uint32_t height, ktx_bcn_compression_e bcn, ktxBC1UnpackParams* params);

Now all that remains are CTS tests.
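
As an illustration of the non-multiple-of-4 handling mentioned in the doc comment above, here is a minimal sketch of the usual block arithmetic: the encoded size rounds each dimension up to whole 4x4 blocks, and texels of a decoded block that fall outside the image are skipped. Both helpers are hypothetical and are not the code in this PR; they assume a tightly packed, row-major destination image.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bytes of BCn block data for one width x height image.
 * bytes_per_block is 8 for BC1 and BC4, and 16 for BC2, BC3, BC5, BC6H, BC7. */
static size_t bcn_encoded_size(uint32_t width, uint32_t height,
                               size_t bytes_per_block)
{
    size_t blocks_x = ((size_t)width + 3) / 4;  /* round up to whole blocks */
    size_t blocks_y = ((size_t)height + 3) / 4;
    return blocks_x * blocks_y * bytes_per_block;
}

/* Hypothetical helper: copies one decoded 4x4 block (16 tightly packed
 * texels of texel_size bytes each) into block position (bx, by) of a
 * row-major image, discarding texels outside width x height. */
static void store_block_clipped(const uint8_t *block, uint8_t *dst,
                                uint32_t width, uint32_t height,
                                uint32_t bx, uint32_t by, size_t texel_size)
{
    for (uint32_t y = 0; y < 4; ++y) {
        uint32_t py = by * 4 + y;
        if (py >= height)
            break;                      /* remaining rows are out of bounds */
        for (uint32_t x = 0; x < 4; ++x) {
            uint32_t px = bx * 4 + x;
            if (px >= width)
                break;                  /* remaining columns are out of bounds */
            memcpy(dst + ((size_t)py * width + px) * texel_size,
                   block + ((size_t)y * 4 + x) * texel_size,
                   texel_size);
        }
    }
}
```

For a 5x5 image this gives 2x2 blocks, and the bottom-right block contributes only its first texel to the output. Computing the destination offset from the image width, rather than from the padded block width, is what keeps the writes inside the buffer.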

… code

*Remove --rdo-allow-relative-movement since it makes encoding significantly
slower with no noticeable improvements.
*Use same RDO option names as UASTC RDO (also use same defaults and same
ranges).
*Add single-image BCn decoder that is independent of KtxTexture* C classes and
which can be used by Vulkan/OpenGL loaders to upload a single image/mip level.
*Refactor BCn decoding code to use that function.
*Fix a pointer arithmetic issue with encoder when given non-multiple-of-4 input
images/textures.

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@MarkCallow

Collaborator

Please make a PR for the CTS tests so I can review them. Also update the CTS ref here to resolve the conflict, which will also allow me to run CI on this PR. Note, I am not sure if having a CTS ref to a commit in the branch that is the source for your PR will work. Let's try it and see.

walcht added 2 commits June 8, 2026 16:34
*Basisu's BC6HU encoder fails if it encounters NaNs, infinities, negative, or
larger than basist::ASTC_HDR_MAX_VAL half floats. This commit addresses that by
adding an HDR cleanup pre-processing step (basis_compressor::clean_hdr_image).

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
walcht and others added 2 commits June 8, 2026 17:01
*Point to original CTS submodule commit (not the PR containing the
additional BCn encoder/decoder tests).

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@walcht walcht marked this pull request as ready for review June 8, 2026 15:03
@walcht

walcht commented Jun 8, 2026

Author

Please make a PR for the CTS tests so I can review them.

KhronosGroup/KTX-Software-CTS#74 (I haven't finalized these yet).

Also update the CTS ref here to resolve the conflict which will also allow me to run CI on this PR. Note, I am not sure if having a CTS ref to a commit in the branch that is source for your PR will work. Let's try it and see.

I reverted to the same CTS submodule commit as main (I just keep getting a merge conflict even after updating the CTS PR - have no idea why).

@MarkCallow

Collaborator

I reverted to the same CTS submodule commit as main (I just keep getting a merge conflict even after updating the CTS PR - have no idea why).

I think you need to merge the latest main from KTX-Software-CTS into your branch.

@walcht

walcht commented Jun 8, 2026

Author

I think you need to merge the latest main from KTX-Software-CTS into your branch.

I thought I did that already (Merge branch 'main' of github.com:walcht/KTX-Software-CTS into HEAD). I will double check. => Now it points to the CTS commit from the PR. This is really weird: the GitHub UI previously complained that there was a merge conflict, and now there no longer is one (I probably missed something).

For the moment, I am fixing Windows build CI issues.

*size_t were used all over the place and silently converted to uint32_t
which throws warnings (which are treated as errors) for Windows
platforms on Visual Studio 2022 17. Now uint32_t is used consistently
and size_t is only used very sparingly.

*CTS tests now point to the PR corresponding to BCn encoders/decoders PR.

*multithreading.h/cpp was wrongly refactored. static keyword is removed
and functions are properly declared and defined in header and cpp files,
respectively.

*REUSE complained about missing license of bc6hu file. Added Khronos
Group license to this file (it was created by me from scratch).

Signed-off-by: Walid Chtioui <walid.chtioui.main@gmail.com>
@MarkCallow

Collaborator

This is really weird: the GitHub UI previously complained that there was a merge conflict, and now there no longer is one (I probably missed something).

The conflict may have been because the selected action for this PR was "Create a merge commit". If so, that was selected because it was the action I used on the most recent PR that I merged. It now shows "Squash and merge", which vastly reduces the possibility of a conflict. With "Create a merge commit", each commit can potentially cause a conflict.

I have been caught out by this before. I hate that there is no way to set a default action.

@walcht walcht marked this pull request as draft June 10, 2026 04:16

Development

Successfully merging this pull request may close these issues.

Add BC7 encoder with RDO

3 participants