-
Notifications
You must be signed in to change notification settings - Fork 156
[CIR][ThroughMLIR] Lower WhileOp with break #1735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR adds the flag `-emit-mlir-llvm` to allow emitting of MLIR in the LLVM dialect (cc @xlauko who asked me to do this). I'm not sure if the naming of the flag is the best and maybe someone will have a better idea. Another solution would be to make the `-emit-mlir` flag have a value, that specifies the target dialect (CIR/MLIR std dialects/LLVM Dialect).
GCC, unlike clang, issues a warning when one virtual function is overridden in a derived class but one or more other virtual functions with the same name and different signature from a base class are not overridden. This leads to many warnings in the MLIR and ClangIR code when using the OpenConversionPattern<>::matchAndRewrite() function in the ordinary way. The "hiding" behavior is what we want.
This resolve issue llvm#1442 .
Lower vcagtd_f64
Lower neon vaddlvq_s32
Lower neon vaddlv_u32
This patch introduces support for pointer TBAA, which can be enabled using the `-fpointer-tbaa` flag. By default, this feature is now enabled. To ensure test compatibility, the tests (`tbaa-enum.cpp`, `tbaa-enum.c`, and `tbaa-struct.cpp`) have been updated to include the `-fno-pointer-tbaa` flag. Related Pull Requests of OG: - llvm/llvm-project#76612 - llvm/llvm-project#116991
Based on https://github.com/llvm/clangir/blob/7f66a204c4ba1f674cfe0e16e2c9c6b65ca70bc8/clang/lib/Basic/Targets/NVPTX.h#L27, the current address space values are incorrect. This PR fixes these values.
Lower neon vcages_f32
This implements the missing feature `cir::setTargetAttributes`. Although other targets might also need attributes, this PR focuses on the CUDA-specific ones. For CUDA kernels (on device side, not stubs), they must have a calling convention of `ptx_kernel`. It is added here. CUDA kernels, as well as global variables, also involves lots of NVVM metadata, which is intended to be dealt with at the same place. It's marked with a new missing feature here.
Lower neon vmaxv_f32
This is part 2 of CUDA lowering. Still more to come! This PR generates `__cuda_register_globals` for functions only, without touching variables. It also fixes two discrepancies mentioned in Part 1, namely: - Now CIR will not generate registration code if there's nothing to register; - `__cuda_fatbin_wrapper` now becomes a constant.
This PR deals with several issues currently present in CUDA CodeGen. Each of them requires only a few lines to fix, so they're combined in a single PR. **Bug 1.** Suppose we write ```cpp __global__ void kernel(int a, int b); ``` Then when we call this kernel with `cudaLaunchKernel`, the 4th argument to that function is something of the form `void *kernel_args[2] = {&a, &b}`. OG allocates the space of it with `alloca ptr, i32 2`, but that doesn't seem to be feasible in CIR, so we allocated `alloca [2 x ptr], i32 1`. This means there must be an extra GEP as compared to OG. In CIR, it means we must add an `array_to_ptrdecay` cast before trying to accessing the array elements. I missed that out in llvm#1332 . **Bug 2.** We missed a load instruction for 6th argument to `cudaLaunchKernel`. It's added back in this PR. **Bug 3.** When we launch a kernel, we first retrieve the return value of `__cudaPopCallConfiguration`. If it's zero, then the call succeeds and we should proceed to call the device stub. In llvm#1348 we did exactly the opposite, calling the device stub only if it's not zero. It's fixed here. **Issue 4.** CallConvLowering is required to make `cudaLaunchKernel` correct. The codepath is unblocked by adding a `getIndirectResult` at the same place as OG does -- the function is already implemented so we can just call it. After this (and other pending PRs), CIR is now able to compile real CUDA programs. There are still missing features, which will be followed up later.
Lower neon vaddlv_s32
This is Part 3 of registration function generation. This generates `__cuda_module_dtor`. It cannot be placed in global dtors list, as treating it as a normal destructor will result in double-free in recent CUDA versions (see comments in OG). Rather, the function is passed as callback of `atexit`, which is called at the end of `__cuda_module_ctor`.
Traditional clang implementation: https://github.com/llvm/clangir/blob/a1ab6bf6cd3b83d0982c16f29e8c98958f69c024/clang/lib/CodeGen/CGBuiltin.cpp#L3618-L3632 The problem here is that `__builtin_clz` allows undefined result, while `__lzcnt` doesn't. As a result, I have to create a new CIR for `__lzcnt`. Since the return type of those two builtin differs, I decided to change return type of current `CIR_BitOp` to allow new `CIR_LzcntOp` to inherit from it. I would like to hear your suggestions. C.c. @Lancern
This PR adds support for compiling builtin variables like `threadIdx` down to the appropriate intrinsic. --------- Co-authored-by: Aidan Wong <[email protected]> Co-authored-by: anominos <[email protected]>
I have now fixed the test. Earlier I made some commits with other changes because we were testing something on my fork. This should be resolved now
CIR is currently ignoring the `signext` and `zeroext` for function arguments and return types produced by CallConvLowering. This PR lowers them to LLVM IR.
I realized I committed a new file with CRLF before. Really sorry about that >_< Related: llvm#1404
The choice of adding a separate file imitates that of OG.
Backporting the VecShuffleOp folder
…llvm#1716) We lower `cir::ForOp` into `cir::WhileOp` (rather than `scf::WhileOp`) when it contains break and continue. This is to reuse the rewriting functions already implemented for while loops. Co-authored-by: Yue Huang <[email protected]>
Support codgen for GenericSelectionExpr
Implement PackIndexingExpr for ScalarExpr
…lvm#1720) Fix the verifier error messages from `unexpected error: 'cir.complex.imag' op cir.complex.imag result type does not match operand type` to `unexpected error: 'cir.complex.imag' op : result type does not match operand type`
Backporting the VecShuffleDynamicOp folder
…ing (llvm#1724) This will in future allow to use builtin integer types within cir operations
- Add common CIR_ prefix - Simplify printing/parsing - Make it use IntTypeInterface
- Adds CIR_ prefix to the definition - Removes redundant builder and cleans up attribute creations
This will allow to use Attributes and Types together in tablegen without inducing cyclic dependency.
…1717) Big question mark here: When lowering target specific vector types: (`__m256i`, `__m128i`, `__m64`), I was hitting an unreachable statement which I removed and were preventing these types from being lowered. Not too familiar with it but it's related to the attribute `"min-legal-vector-width"="N"` which is not implemented for `cir::VectorType` as compared to OG. Is that a blocker for these intrinsics as of now? or is that something we wanna target before we merge x86 vector specific intrinsics?.
Backporting the VecCmpOp folder
… of the same size (llvm#1728) The `cir::CastOp::verify` method was overly conservative, and would fail on any `bitcast` from vector to scalar or scalar to vector. Change List: - Extends the `cir::CastOp::verify` method to check if the source and result types are the same size using the `mlir::DataLayout` of the current scope, and succeeds if the sizes match. - Extends the CodeGen vectype tests with vector to scalar, scalar to vector and vector to vector conversions. - Extends the IR invalid tests with vector to scalar and scalar to vector conversions with different source and result sizes.
…back due to deprecation" This reverts commit 1bbf343.
Remove code after return statement
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) - `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) - `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) - `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency - `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…lvm#1722) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore <dependency name> major version` will close this group update PR and stop Dependabot creating any more for the specific dependency's major version (unless you unignore this specific dependency's major version or upgrade to it yourself) - `@dependabot ignore <dependency name> minor version` will close this group update PR and stop Dependabot creating any more for the specific dependency's minor version (unless you unignore this specific dependency's minor version or upgrade to it yourself) - `@dependabot ignore <dependency name>` will close this group update PR and stop Dependabot creating any more for the specific dependency (unless you unignore this specific dependency or upgrade to it yourself) - `@dependabot unignore <dependency name>` will remove all of the ignore conditions of the specified dependency - `@dependabot unignore <dependency name> <ignore condition>` will remove the ignore condition of the specified dependency and ignore conditions </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Couple of things I have questions about: 1. I duplicated function `getIntValueFromConstOp` from `CIRGenBuiltinAArch64.cpp`. I was wondering if that's correct or if there's a place where we can avoid that duplication. 2. For the tests related to `mm_prefetch` im not sure if it'd be correct to define them in a file eg: `sse-builtins.c` like it's currently done in the codegen lib. 3. I'm also aware we can emit a call for a `PreFetchOp` would that be required in this case? related: llvm#1414, llvm#1404 (A PR was previously opened but It was not resolved)
…re (llvm#1732) just a few improvements to mirror og test cases in x86 for better reference.
…tribute (llvm#1733) - Remove redundant custom printer and parser for AddressSpace, relying instead on MLIR's default EnumAttr handling. - Leverage AddressSpace::Default to omit the attribute from the assembly form when not needed. Therefore, an empty attribute is no longer needed to represent the default address space. - Update PointerType to use the AddressSpace enum directly, instead of a boxed attribute.
} | ||
|
||
// Operations after this BreakOp has to be removed. | ||
for (mlir::Operation *runner = breakOp->getNextNode(); runner;) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why you need to remove this? Looks like we should just split the block at this point and create an unrecheable block (which should get later DCE'd by canonicalizer)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know whether it's allowed to delete a non-empty loop (as I'm deleting the loop below). I'll change it to splitting the block if it doesn't matter or we can find a better way below.
} | ||
|
||
// Blocks after this BreakOp also has to be removed. | ||
for (mlir::Block *block = breakOp->getBlock()->getNextNode(); block;) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ties back to the other comment
|
||
// We know this BreakOp isn't nested in any IfOp. | ||
// Therefore, the loop is executed only once. | ||
// We pull everything out of the loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like your are optimizing while you are lowering, why isn't this a separate pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to erase the BreakOp because scf doesn't support it, so we have to rewrite the loop in some way to preserve semantics. Deleting the loop is the most straightforward way I can think of. Could you suggest better ways of rewriting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point! This is good for now.
Thinking more about this it seems like these things should actually be handled in a pass before the actual SCF lowering (something like a CoreDialectPrepare kinda thing) - just like we have a CFGFlatten pre LLVM we could have something that massage CIR loops into CIR loops suitable for more direct SCF translation. This is more food for thought for the future, where you will probably have gathered more examples (cir.continue
is slightly different but suffers from a similar issue?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I think I'll do the cir.switch
handling (which I'm working on now) in a new pass, and move these break/continue lowering to there afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once the remaining comment is addressed (perhaps deleting the empty loop is what's gonna work). Just gave you extra permission to merge the PR once its ready
This only deals with the case in which the
break
is directly under the loop, not nested in anyif
s.In this case we simply erase the loop as well as unreachable code after that
break
.