[ingress][pytorch] Basic KernelBench to MLIR conversion #5
Conversation
Basic-as-can-be torch-mlir converter for the level1 and level2 KernelBench kernels. The `convert-kernel-bench-to-mlir.py` script does the conversion and dumps the results in the `cache/level1` and `cache/level2` folders. It relies on pre-packaged MLIR and torch-mlir wheels, as this PR considers versioning and packaging an orthogonal matter to getting ingress up and running. About 55 of the 200 kernels are filtered out, as they either crash torch-mlir or yield very big .mlir files. This `ignore_list` is meant to be amended as these issues get addressed, e.g. by altering `init_inputs` on a per-kernel basis. The conversion script sticks to outputting just linalg for now. As it does this, it does some basic post-processing of torch-mlir's output, namely running the `-linalg-specialize-generic-ops` pass.
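As a rough sketch of the shape of such a conversion loop (the function name, paths, and ignore handling here are hypothetical stand-ins for illustration, not the actual script; the real per-kernel work would invoke torch-mlir and the specialize pass):

```python
from pathlib import Path

# Hypothetical ignore list; the real one filters ~55 kernels that crash
# torch-mlir or produce very large .mlir files.
IGNORE_LIST = {"12_Matmul_with_diagonal_matrices_.py"}

def convert_all(kernels_root: Path, cache_root: Path) -> list[str]:
    """Walk level1/level2, skipping ignored kernels, writing one .mlir per kernel."""
    converted = []
    for level in ("level1", "level2"):
        out_dir = cache_root / level
        out_dir.mkdir(parents=True, exist_ok=True)
        for kernel in sorted((kernels_root / level).glob("*.py")):
            if kernel.name in IGNORE_LIST:
                print(f"Skipping: {level}/{kernel.name}")
                continue
            print(f"Processing: {level}/{kernel.name}")
            # Real script: import the kernel module, convert via torch-mlir to
            # linalg, run -linalg-specialize-generic-ops, then write the result.
            (out_dir / kernel.with_suffix(".mlir").name).write_text("// mlir here\n")
            converted.append(kernel.name)
    return converted
```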
I thought to leave the following here:
$ time uv run convert-kernel-bench-to-mlir.py
Processing: level1/100_HingeLoss.py
Processing: level1/10_3D_tensor_matrix_multiplication.py
Processing: level1/11_4D_tensor_matrix_multiplication.py
Skipping: level1/12_Matmul_with_diagonal_matrices_.py
...
Processing: level2/96_ConvTranspose3d_Multiply_Max_GlobalAvgPool_Clamp.py
Skipping: level2/97_Matmul_BatchNorm_BiasAdd_Divide_Swish.py
Skipping: level2/98_Matmul_AvgPool_GELU_Scale_Max.py
Skipping: level2/99_Matmul_GELU_Softmax.py
Skipping: level2/9_Matmul_Subtract_Multiply_ReLU.py
real 6m15.501s
user 5m29.552s
sys 1m24.632s
$ ls -l cache/* | grep .mlir | wc -l
144

That is, even with the worst offenders filtered out, using vanilla torch-mlir to convert these 144 simple NNs is still terribly slow. I expect this is in no small part due to the huge
EDIT: For completeness, here's how long it takes when not doing the deserialization and clean-up pass afterwards, i.e. completely vanilla
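As an aside, a `find`-based count is a bit more robust than grepping `ls -l` output (it won't miscount on filenames that merely contain "mlir"). A self-contained sketch with a fake `cache/` layout; the real invocation would be `find cache -name '*.mlir' | wc -l`:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/cache/level1" "$tmp/cache/level2"
touch "$tmp/cache/level1/a.mlir" "$tmp/cache/level2/b.mlir" "$tmp/cache/level2/notes.txt"
# Only the two .mlir files are counted, not notes.txt.
find "$tmp/cache" -name '*.mlir' | wc -l
```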
Is there any option for
    if not (kernels_as_pytorch_folder.exists() and kernels_as_pytorch_folder.is_dir()):
        print(
            "ERROR: KernelBench repo not found.\n"
            "NOTE: Pull in dependency with: git submodule update "
nit: It probably needs `update --init` when the dir is not present at all.
Tried it yesterday without any directory there. That is, starting out clean, I ran the script, got the error, then copied the (`--init`-less) command and ran that. After that I could run the script without error.
Might depend on the git version though. If someone knows of, or encounters, a case where this command isn't sufficient, please let me know and I will amend.
It looks like the error only occurs the first time, before overall submodule initialization.
Now, if I only remove the cloned KernelBench dir, then a simple `git submodule update` seems sufficient.
Might be down to some git caching?
However, I'm able to reproduce the error in a freshly cloned repo.
Running `git submodule update ingress/KernelBench/KernelBench` returns:
    Submodule path 'ingress/KernelBench/KernelBench' not initialized
    Maybe you want to use 'update --init'?
Not sure about best practices here, but it's not necessarily a blocker as it's easily fixable by following git's error message.
I always use --init on a fresh repo. We should not rely on caches or user configuration.
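The behavior discussed above can be reproduced end to end without touching this repo. The sketch below builds a throwaway superproject with a local submodule, clones it fresh, and shows that a plain `git submodule update` does nothing for an uninitialized submodule while `update --init` works (the repo names are made up; `protocol.file.allow=always` is needed on recent git versions because the submodule URL is a local path):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
# A local "remote" to act as the submodule.
git init -q sub
git -C sub -c user.email=a@b -c user.name=a commit -q --allow-empty -m init
# Superproject referencing it.
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add ../sub sub
git -c user.email=a@b -c user.name=a commit -qm "add submodule"
# Fresh clone: submodule is registered but not initialized.
cd "$tmp"
git clone -q super clone
cd clone
git submodule update sub || true   # errors: "Submodule path 'sub' not initialized"
git -c protocol.file.allow=always submodule update --init sub
test -e sub/.git && echo "initialized"
```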
I tried to find it yesterday but had no luck. @Groverkss would you have some pointers?
Thanks for working on this! A bit tangential, but shouldn't we set up some CI to make sure that the code uploaded to Lighthouse does work? That would be my preference and I am keen to help, just a bit blocked ATM 😅 (e.g. llvm/torch-mlir-release#22).
Not tangential at all, we definitely should, at the very least to make sure we pull the right deps, etc. @adam-smnk, it should be trivial to create a workflow file that runs this example. Then, when @rolfmorel merges the KernelBench, we can add that to the CI, too. It will be a bit loose in the beginning, but then we can start creating some test scripts later. @banach-space, it would be great if you could check it works on Arm, too, after your PR to
You can control how mutable tensors are lowered using FXImporterHooks: https://github.com/llvm/torch-mlir/blob/main/python/torch_mlir/extras/fx_importer.py#L470
See how IREE does it for parameters: https://github.com/iree-org/iree-turbine/blob/5e50318baf6d436f48413788e5c7ea11425d130f/iree/turbine/aot/support/procedural/exported_program.py#L268
I would ideally recommend against promoting constants to input arguments, because that's not how these things work in the real world, and instead do something like what IREE does with "util.global_load". Maybe you can use "ml_program.global_load". Promoting to constant arguments is an easy short-term solution if you are only compiling but not running things. It's a fake solution. This is also why I recommended against using
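For illustration, a minimal sketch of what global-based parameter handling could look like at the IR level, as opposed to passing the weight in as an extra function argument (the global name, shape, and op are made up for this example, not what the importer emits):

```mlir
// A weight stored as a module-level global instead of an extra function argument.
ml_program.global private @weight(dense<1.0> : tensor<4xf32>) : tensor<4xf32>

func.func @forward(%arg0: tensor<4xf32>) -> tensor<4xf32> {
  // Load the parameter from module scope rather than receiving it as input.
  %w = ml_program.global_load @weight : tensor<4xf32>
  %0 = arith.addf %arg0, %w : tensor<4xf32>
  return %0 : tensor<4xf32>
}
```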
Note, this is required for both Aarch64 and X86: |
rengolin left a comment:
As discussed in #23, since this script covers the "whole process" without using a (still non-existent) API, it should be in examples, not ingress.
    [submodule "ingress/KernelBench/KernelBench"]
        path = ingress/KernelBench/KernelBench
Usually submodules go into third_party, so that it's easier to initialize and know what's external. But in the case where we only want to pull some submodules, not all, this may be the better option.
I don't mind either way, just want to make sure we're all on the same page here.
@banach-space @adam-smnk
No strong preference but I agree that keeping all submodules in a third_party dir seems more standard.
We just need to make sure we avoid `--recursive` for `git submodule update` in instructions and scripts.
+1 to third_party, as per the principle of least surprise.
Basic as can be torch-mlir converter for the level1 and level2 KernelBench kernels. The `convert-kernel-bench-to-mlir.py` script does the conversion and dumps the results in the `cache/level1` and `cache/level2` folders alongside the script. 56 of the 200 kernels are filtered out as they either crash torch-mlir or yield very big .mlir files. This `ignore_list` is meant to be amended as these issues get addressed, e.g. by altering `init_inputs` on a per-kernel basis.
The conversion script sticks to outputting just linalg for now. As it does this, it does some basic post-processing of torch-mlir's output, namely it runs the `-linalg-specialize-generic-ops` pass.