Skip to content

Dev/wwolff/llvm cleanup 20240710 2 #102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 8 commits into
base: upmem_release_120
Choose a base branch
from

Conversation

wwolff42
Copy link

No description provided.

@wwolff42 wwolff42 self-assigned this Jul 31, 2024
@wwolff42 wwolff42 changed the base branch from upmem_release_120 to dev/wwolff/remove_builtin_dpu_seqread_get July 31, 2024 15:08
@wwolff42 wwolff42 force-pushed the dev/wwolff/remove_builtin_dpu_seqread_get branch from 278c7d5 to 8467b26 Compare August 19, 2024 07:01
@wwolff42 wwolff42 force-pushed the dev/wwolff/llvm_cleanup_20240710_2 branch 8 times, most recently from bfcda1d to 2233588 Compare August 19, 2024 15:52
Base automatically changed from dev/wwolff/remove_builtin_dpu_seqread_get to upmem_release_120 August 20, 2024 15:28
In those use, last parameter of BuildMI is the DestReg.
It basically declare this regsiter as a definition internally.

SD variant represents store instruction, it doesn't define or allocate to a register.
CALL variant define the first register, usualy R23 as link register in our ABI.
SUB variant define the first register as well.

This patch fix those errors:

```
*** Bad machine code: Explicit operand marked as def ***
- function:    __divdf3
- basic block: %bb.0 entry (0x55d88649e508)
- instruction: $r22 = SDrir 88, $d22
- operand 0:   $r22

*** Bad machine code: Explicit definition marked as use ***
- function:    __divdf3
- basic block: %bb.35 if.else141 (0x560777d636f8)
- instruction: CALLri $r23, &__muldi3, debug-location !307; work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/syslib/divdf3.c:166:52
- operand 0:   $r23

*** Bad machine code: Explicit definition marked as use ***
- function:    process_inputs_all_tasklets
- basic block: %bb.5  (0x563bd908a390)
- instruction: CALLrr $r23, killed $r3, debug-location !219; work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/syslib/wramfifo.c:108:21
- operand 0:   $r23

*** Bad machine code: Explicit definition marked as use ***
- function:    printf
- basic block: %bb.0  (0x5595f29ecd90)
- instruction: SUBrrif $r0, $r22, 56, 9, debug-location !103; work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/stdlib/stdio.c:110:5
- operand 0:   $r0
```
Those function is heavily used all along the LLVM backend pipeline, to inspect
and optimize the CFG.

UPMEM DPU ISA provides arithmetic+comp+branch in one instruction.
We introduce them as early as DPUTargetLowering::EmitInstrWithCustomInserter,
and when optimizer move around the CFG, it was losing correctness.

This patch fix issues like:
```
*** Bad machine code: Explicit definition marked as use ***
- function:    test
- basic block: %bb.0 entry (0x56544fed73d8)
- instruction: CLZ_Urrci $d0, $r2, 33, %bb.2, debug-location !31; work/simple_examples/minimal.c:24:12
- operand 0:   $d0

*** Bad machine code: Using an undefined physical register ***
- function:    test
- basic block: %bb.0 entry (0x56544fed73d8)
- instruction: CLZ_Urrci $d0, $r2, 33, %bb.2, debug-location !31; work/simple_examples/minimal.c:24:12
- operand 0:   $d0
```

or:
```
*** Bad machine code: Explicit definition marked as use ***
- function:    __muldf3
- basic block: %bb.35 cleanup98.i (0x55f824732cb8)
- instruction: LSRXrrrci $r6, $r2, $r4, 40, %bb.25, debug-location !310; work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/syslib/fp_lib.h:268:48 @[ work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/syslib/fp_mul_impl.inc:99:9 @[ work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/syslib/muldf3.c:21:12 ] ]
- operand 0:   $r6

*** Bad machine code: Using an undefined physical register ***
- function:    __muldf3
- basic block: %bb.35 cleanup98.i (0x55f824732cb8)
- instruction: LSRXrrrci $r6, $r2, $r4, 40, %bb.25, debug-location !310; work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/syslib/fp_lib.h:268:48 @[ work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/syslib/fp_mul_impl.inc:99:9 @[ work/dpu_tools_llvm_cleanup_20240710_2/dpu-rt/src/syslib/muldf3.c:21:12 ] ]
- operand 0:   $r6
```
CALL variant instruction defines implicitely r0, or d0 (r0-r1) for 64-bit value,
don't lose this information.
Add properly register used in new MBBs when lowering Jcc familly of pseudo instruction.
Fix machine CFG topology.

This is a WIP, it will be rearranged/rebased later.
@wwolff42 wwolff42 force-pushed the dev/wwolff/llvm_cleanup_20240710_2 branch from 2233588 to c804830 Compare August 20, 2024 15:31
wwolff42 and others added 3 commits August 23, 2024 15:30
…familly

This is a partial WIP fix.
I'm saving it now to keep it.

The idea is to emit simple but correct code first,
and work out for fuse them back together later in the pipeline.

When in presence of known post RA fusable set of instructions, we try
to keep them together during pre RA:
- DPUInstrInfo::shouldSink return false
- DPUMacroFussion::shouldScheduleAdjacent return true

TODO:
- the other instruction
- a new CFGOptimizer to cleanup what split-critical-edge did when it make sense
-- some critical-edge are broken down to new MBB with a simple JUMPi
-- and are never optimized/cleaned up out
-- this would cause bigger code footprint, and performance regression
-- note that this situation is already present without this big fix ...
Pointers in non-zero address spaces need to be address space
casted before appending to the used list.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D101363
@wwolff42 wwolff42 force-pushed the dev/wwolff/llvm_cleanup_20240710_2 branch from c804830 to e802fe6 Compare August 23, 2024 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants