[mlir] Inconsistent output when executing MLIR program with --scf-parallel-loop-tiling --canonicalize #119825

@Emilyaxe

git version: b2b267e

system: Ubuntu 18.04.6 LTS

Description:

I get inconsistent results when executing the same MLIR program with and without --scf-parallel-loop-tiling --canonicalize.
The output is correct when either of these two options is removed, so I am unsure which pass contains the bug.

Steps to Reproduce:

1. MLIR program (a.mlir):

module {
  func.func private @printMemrefI32(tensor<*xi32>)
  func.func private @printMemrefF32(tensor<*xf32>)
  func.func @main() {
    %7 = "tosa.const"() <{values = dense<6220> : tensor<1x6x6xi32>}> : () -> tensor<1x6x6xi32>
    %9 = "tosa.const"() <{values = dense<-298> : tensor<1x6x6xi32>}> : () -> tensor<1x6x6xi32>
    %51 = tosa.bitwise_or %7, %9 : (tensor<1x6x6xi32>, tensor<1x6x6xi32>) -> tensor<1x6x6xi32>
    %cast = tensor.cast %51 : tensor<1x6x6xi32> to tensor<*xi32>
    call @printMemrefI32(%cast) : (tensor<*xi32>) -> ()
    return
  }
}


2. Command to run without optimizations:

/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt a.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg))" | /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt  \
 -tosa-to-arith  -one-shot-bufferize="bufferize-function-boundaries"    -convert-linalg-to-parallel-loops   \
 -convert-index-to-llvm      -convert-arith-to-llvm  -convert-scf-to-cf       -convert-arith-to-llvm    \
  -convert-cf-to-llvm  -finalize-memref-to-llvm   -convert-func-to-llvm  -lower-affine  -convert-arith-to-llvm \
  -reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main \
-entry-point-result=void --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so  \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so

3. Output without optimizations:

[[[-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290]]]
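
As a quick cross-check of which output is the correct one, the bitwise OR of the two constants can be computed outside MLIR. The sketch below is plain Python and is not part of the original report; Python integers are arbitrary-precision, so a 32-bit mask is applied explicitly to mirror i32. The helper name `or_i32` is hypothetical.

```python
def or_i32(a, b):
    """Bitwise OR of two values interpreted as signed 32-bit integers."""
    raw = (a | b) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the 32-bit pattern as a signed value.
    return raw - (1 << 32) if raw & 0x80000000 else raw


expected = or_i32(6220, -298)

# Every element of the 1x6x6 result tensor should hold this value.
expected_tensor = [[[expected] * 6 for _ in range(6)]]

print(expected)
```

Under these assumptions the unoptimized output above, with the same value in all 36 elements, is the reference result.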

4. Command to run with optimizations:

/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt a.mlir -pass-pipeline="builtin.module(func.func(tosa-to-linalg))" | /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt \
-tosa-to-arith  -one-shot-bufferize="bufferize-function-boundaries" \
-convert-linalg-to-parallel-loops            -convert-index-to-llvm      -convert-arith-to-llvm    \
--scf-parallel-loop-tiling="parallel-loop-tile-sizes=1,4 no-min-max-bounds=true"    \
--canonicalize  --scf-parallel-loop-tiling="parallel-loop-tile-sizes=1,4 no-min-max-bounds=true"  \
 -convert-scf-to-cf       -convert-arith-to-llvm       -convert-cf-to-llvm  -finalize-memref-to-llvm  \
 -convert-func-to-llvm  -lower-affine  -convert-arith-to-llvm  \
-reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void  \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
 --shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so  \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_async_runtime.so

5. Output with optimizations:

[[[-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [0,    0,    0,    32767,    -1553697280,    22067], 
  [-1553697280,    22067,    -1553697264,    22067,    0,    0]]]
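
To make the difference explicit, the two outputs can be compared row by row. This is a small illustrative Python sketch, not part of the original report; the data is copied from the output above and the helper name `corrupted_rows` is hypothetical.

```python
GOOD_ROW = [-290] * 6  # every row of the output without optimizations

# Output of the pipeline with tiling, copied from the report.
with_tiling = [
    [-290] * 6,
    [-290] * 6,
    [-290] * 6,
    [-290] * 6,
    [0, 0, 0, 32767, -1553697280, 22067],
    [-1553697280, 22067, -1553697264, 22067, 0, 0],
]


def corrupted_rows(matrix, good_row):
    """Return the indices of rows that differ from the expected row."""
    return [i for i, row in enumerate(matrix) if row != good_row]


bad = corrupted_rows(with_tiling, GOOD_ROW)
print(bad)
```

Only the last two rows differ, which is what one would see if the tail of the tiled dimension were left unwritten; that interpretation is an assumption, not something the output alone proves.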

Activity

Emilyaxe (Author) commented on Feb 27, 2025

I have analyzed the issue and found that the root cause may be related to the --scf-parallel-loop-tiling="parallel-loop-tile-sizes=1,4 no-min-max-bounds=true" option.
The input IR (after running --canonicalize), input.txt, correctly prints the index %10 (values 0 to 5). However, the output IR (after running --scf-parallel-loop-tiling="parallel-loop-tile-sizes=1,4 no-min-max-bounds=true"), output.txt,
prints only the values 0 to 3 for the index %46 and never 4 or 5, which leads to random values in the last two rows of the final result. I am not sure whether the use of no-min-max-bounds=true is appropriate in this case.
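
For reference, here is a small Python model of the iteration space one would expect when a single loop dimension is tiled with a fixed inner bound plus an in-bounds guard, which is how I understand the no-min-max-bounds option. It is only a sketch of the expected index coverage, not the pass's actual implementation; the function name `tiled_indices` and its parameters are hypothetical.

```python
def tiled_indices(lower, upper, step, tile):
    """Indices visited when [lower, upper) is tiled by `tile` iterations.

    The inner loop always runs `tile` iterations (fixed bound), and a guard
    skips the iterations that fall outside the original range.
    """
    visited = []
    for outer in range(lower, upper, step * tile):  # one iteration per tile
        for inner in range(0, step * tile, step):   # fixed trip count
            idx = outer + inner
            if idx < upper:                         # in-bounds check
                visited.append(idx)
    return visited


# The tiled dimension has extent 6 and the tile size is 4.
rows = tiled_indices(lower=0, upper=6, step=1, tile=4)
print(rows)
```

In this model the second tile still visits indices 4 and 5, so a correct tiling of the 6-element dimension with tile size 4 should cover all six indices.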

Running input.txt

[Image: input.txt]

with the commands

/data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-opt-953838d input.txt \
-convert-scf-to-cf       -convert-arith-to-llvm       -convert-cf-to-llvm -convert-vector-to-llvm -finalize-memref-to-llvm  \
-convert-func-to-llvm  -lower-affine  -convert-arith-to-llvm  \
-reconcile-unrealized-casts | timeout 10 /data/szy/MLIR/llvm-release/llvm-project/build/bin/mlir-cpu-runner -e main -entry-point-result=void  \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_c_runner_utils.so \
--shared-libs=/data/szy/MLIR/llvm-release/llvm-project/build/lib/libmlir_runner_utils.so  

we get the following results:

0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
4
5
4
5
4
5
4
5
4
5
4
5
Unranked Memref base@ = 0x56475a1f8100 rank = 3 offset = 0 sizes = [1, 6, 6] strides = [36, 6, 1] data = 
[[[-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290]]]

However, running output.txt with the same commands as for input.txt

[Image: output.txt]

gives the following results:

0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
Unranked Memref base@ = 0x5631d802b440 rank = 3 offset = 0 sizes = [1, 6, 6] strides = [36, 6, 1] data = 
[[[-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [-290,    -290,    -290,    -290,    -290,    -290], 
  [2,    0,    0,    0,    0,    0], 
  [0,    0,    0,    1879426263,    852879496,    0]]]
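
The two index traces can be compared mechanically. The Python sketch below is not part of the original report; it rebuilds both traces as printed above and lists the indices that never appear. It assumes the printed index is the one used for the tiled dimension of the store, and the helper name `missing_indices` is hypothetical.

```python
EXTENT = 6  # extent of the tiled dimension in tensor<1x6x6xi32>

# Index traces as printed in the report.
trace_input = [0, 1, 2, 3] * 6 + [4, 5] * 6  # run of input.txt
trace_output = [0, 1, 2, 3] * 6              # run of output.txt


def missing_indices(trace, extent):
    """Indices in [0, extent) that never appear in the trace."""
    return sorted(set(range(extent)) - set(trace))


missing_in = missing_indices(trace_input, EXTENT)
missing_out = missing_indices(trace_output, EXTENT)

print(missing_in, len(trace_input))
print(missing_out, len(trace_output))
```

The input.txt run prints 36 indices covering the whole range, while the output.txt run prints only 24 and never reaches 4 or 5, which lines up with the two rows that contain random values.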

This case is somewhat complex, and I have not yet been able to pinpoint the exact invalid IR in output.txt. If further analysis is needed, I will continue investigating.
Hi @linearhit @sherhut @bondhugula,
Sorry to disturb you, but could you please take a moment to look at this problem? Thank you!
