@ziliangzl
Contributor

For masked loads where other is not specified, the Triton CUDA backend fills the masked-off elements with 0 by default. This change makes this backend do the same, so that its results match those of the Triton CUDA backend.

A reference test case named basic_load in Triton can be found here: test/Conversion/tritongpu_to_llvm.mlir
This test case specifies an other value. Its generated LLVM IR includes:

(module attributes {...}
  llvm.func @basic_load(...) {
    ...
    %10 = llvm.inline_asm has_side_effects ... "mov.u32 $0, $1;\0A\09@$3 ld.global.b32 { $0 }, [ $2 + 0 ];", ...
    ...
  }
)

Another test case, vecadd_masked_vec1, does not specify an other value and generates LLVM IR like:

(module attributes {...}
  llvm.func @vecadd_masked_vec1(...) {
    ...
    %72 = llvm.inline_asm has_side_effects asm_dialect = att operand_attrs = [] "mov.u32 $0, 0x0;\0A\09@$2 ld.global.b32 { $0 }, [ $1 + 0 ];", "=r,l,b" %70, %71 : (!llvm.ptr<1>, i1) -> i32
    ...
  }
)

Here the destination register is first set to 0x0 and is only overwritten when the predicate is true, so the masked-off elements are always 0 when other is not specified.
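
The two instruction sequences above differ only in the value moved into the destination register before the predicated load. As a rough illustration, that pattern can be emulated in plain Python. This is a sketch under stated assumptions, not Triton or backend code: masked_load, memory, and offsets are hypothetical names introduced here, and a Python list stands in for global memory.

```python
from typing import List, Optional, Sequence


def masked_load(
    memory: Sequence[int],
    offsets: Sequence[int],
    mask: Sequence[bool],
    other: Optional[int] = None,
) -> List[int]:
    """Emulate the mov-then-predicated-load pattern, one lane at a time.

    Hypothetical helper for illustration only; it is not part of Triton.
    """
    # When `other` is omitted, the fill value is 0, mirroring `mov.u32 $0, 0x0`.
    fill = 0 if other is None else other
    result = []
    for offset, active in zip(offsets, mask):
        value = fill                # mov.u32 $0, <fill>
        if active:                  # @mask ld.global.b32 { $0 }, [ ptr + 0 ]
            value = memory[offset]  # memory is only touched on active lanes
        result.append(value)
    return result


memory = [10, 20, 30, 40]
offsets = [0, 1, 2, 5]              # offset 5 is out of bounds but masked off
mask = [True, True, False, False]

without_other = masked_load(memory, offsets, mask)
with_other = masked_load(memory, offsets, mask, other=-1)
print(without_other)  # [10, 20, 0, 0]
print(with_other)     # [10, 20, -1, -1]
```

Masked-off lanes never read memory, so the out-of-bounds offset in the last lane is harmless; they simply keep the fill value.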

@ziliangzl
Contributor Author

The tests for this PR have not been updated yet. It currently conflicts with #355, and the tests will be updated once #355 is resolved.

@ziliangzl
Contributor Author

> The tests for this PR have not been updated yet. It currently conflicts with #355, and the tests will be updated once #355 is resolved.

I have updated the tests; this PR is ready now.

@bmyerz0
Contributor

bmyerz0 commented Oct 3, 2025

While we often treat the CUDA backend as the language spec, this particular behavior is not listed in the documentation for load (https://triton-lang.org/main/python-api/generated/triton.language.load.html#triton.language.load). Forcing backends to initialize masked-off elements seems like a performance cost and a loss of programmer control.

Do you have any references to discussions in Triton establishing that other=0 should be the default behavior?
