[OnnxToTorch] Fix Resize op when ONNX exports dynamic spatial dims as 0 #4294
Conversation
ONNX encodes dynamic spatial dimensions in Resize as 0, and the conversion pass currently passes these spatial dimensions through as is. For the interpolate/resize op, a [0, 0] spatial dimension is not a valid proposed size. This change replaces such 0 values with the corresponding runtime dimension from the input tensor, ensuring correct shape propagation in Torch-MLIR and preventing invalid 0-sized dimensions. Signed-off-by: Prashanth Pujar <[email protected]>
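As an illustration of the intended behavior (not the actual C++ OnnxToTorch lowering in this PR), here is a minimal Python sketch of the substitution the description refers to, assuming NCHW layout and a sizes operand that covers only the spatial dims:

def fix_proposed_sizes(proposed_sizes, input_shape):
    """Replace 0-valued proposed spatial sizes with the input's runtime spatial dims.

    Illustrative only: `proposed_sizes` stands in for the spatial part of the
    ONNX Resize `sizes` operand, `input_shape` for the runtime shape of the
    Resize input, assumed NCHW as in the examples in this thread.
    """
    spatial_dims = input_shape[2:]
    return [
        runtime if proposed == 0 else proposed
        for proposed, runtime in zip(proposed_sizes, spatial_dims)
    ]

# A [0, 0] sizes operand falls back to the input's runtime spatial dims:
print(fix_proposed_sizes([0, 0], [1, 3, 224, 224]))  # -> [224, 224]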
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Can you add a test?
@HalfBloodPrince010 I'm not sure I understand this change. The example you gave in the PR comment generates IR like:
module {
func.func @main_graph(%arg0: !torch.vtensor<[?,3,?,?],f32>) -> !torch.vtensor<[?,3,?,?],f32> attributes {torch.onnx_meta.ir_version = 6 : si64, torch.onnx_meta.opset_version = 11 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.7.1"} {
%none = torch.constant.none
%0 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<_> : tensor<4xf32>} : () -> !torch.vtensor<[4],f32>
%1 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<__1> : tensor<0xf32>} : () -> !torch.vtensor<[0],f32>
%2 = torch.operator "onnx.Resize"(%arg0, %1, %0) {torch.onnx.coordinate_transformation_mode = "half_pixel", torch.onnx.cubic_coeff_a = -7.500000e-01 : f32, torch.onnx.mode = "cubic", torch.onnx.nearest_mode = "floor"} : (!torch.vtensor<[?,3,?,?],f32>, !torch.vtensor<[0],f32>, !torch.vtensor<[4],f32>) -> !torch.vtensor<[?,3,?,?],f32>
return %2 : !torch.vtensor<[?,3,?,?],f32>
}
}
{-#
dialect_resources: {
builtin: {
_: "0x080000000000803F0000803F0000003F0000003F",
__1: "0x08000000"
}
}
#-}
Which converts to:
module {
func.func @main_graph(%arg0: !torch.vtensor<[?,3,?,?],f32>) -> !torch.vtensor<[?,3,?,?],f32> attributes {torch.onnx_meta.ir_version = 6 : si64, torch.onnx_meta.opset_version = 11 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "2.7.1"} {
%float5.000000e-01 = torch.constant.float 5.000000e-01
%none = torch.constant.none
%false = torch.constant.bool false
%str = torch.constant.str "cubic"
%0 = torch.prim.ListConstruct %float5.000000e-01, %float5.000000e-01 : (!torch.float, !torch.float) -> !torch.list<float>
%1 = torch.aten.__interpolate.size_list_scale_list %arg0, %none, %0, %str, %false, %none, %false : !torch.vtensor<[?,3,?,?],f32>, !torch.none, !torch.list<float>, !torch.str, !torch.bool, !torch.none, !torch.bool -> !torch.vtensor<[?,3,?,?],f32>
return %1 : !torch.vtensor<[?,3,?,?],f32>
}
}
Which uses the scales and not the sizes (as it should, since the sizes arg was not provided in the original model). Calling
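For context, a minimal sketch of the kind of export that yields scales-only Resize IR like the above. This is a reconstruction, not the original example from the PR comment; the bicubic mode, scale_factor=0.5, and dynamic spatial axes are assumptions that match the Resize attributes and scales constant shown in the IR:

import torch
import torch.nn.functional as F

class Upsample(torch.nn.Module):
    def forward(self, x):
        # cubic mode + half_pixel coordinate transform, matching the Resize attrs above
        return F.interpolate(x, scale_factor=0.5, mode="bicubic")

x = torch.randn(1, 3, 64, 64)  # sample input with 4D NCHW shape
torch.onnx.export(
    Upsample(), (x,), "resize_scales_only.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch", 2: "height", 3: "width"},
                  "output": {0: "batch", 2: "height", 3: "width"}},
    opset_version=11,
)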
Hi @zjgarvey, thank you for the response. The example I gave was meant to illustrate how ONNX sometimes exports dynamic dims as 0. In the test case from iree-org/iree#19501, when I visualized the ONNX exported model produced by the test, I observed the following
This serves as the sizes operand for the Resize op. Below is the IR from the issue linked above.
I now understand from your comment that in the ONNX protobuf, dim.dim_value = 0 is just a placeholder and dim.dim_param carries the true dynamic information. In that case, is there a way to ensure that %303, which produces the sizes, doesn't end up as [0, 0]?
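For reference, a small sketch showing how the two fields appear in the protobuf; "model.onnx" is a placeholder path, not a file from this PR:

import onnx

# Inspect how each input dimension is encoded in the ONNX protobuf.
model = onnx.load("model.onnx")
for inp in model.graph.input:
    for i, dim in enumerate(inp.type.tensor_type.shape.dim):
        if dim.HasField("dim_param"):
            # Symbolic (dynamic) dimension; the name is the source of truth.
            print(f"{inp.name}[{i}]: dynamic, dim_param={dim.dim_param!r}")
        else:
            # Static dimension; an unset dim_value reads back as 0.
            print(f"{inp.name}[{i}]: static, dim_value={dim.dim_value}")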
I'd need a bit more context to help pin down the real crux of the model issue, but if the onnx model is doing 'onnx.Shape' on a dynamic tensor, we should definitely not be getting zeros. The fact that there are literal constant zeros being passed to Resize seems like an export bug, unless I'm misunderstanding the 'onnx.Resize' op functionality.
Understood, you are saying that if the spatial dimensions of the proposed sizes here are literal zeros, that points to an export problem. But, based on the above IR, the proposed sizes (i.e. %303) are constructed from the input pixel_values.
So, if the input spatial dims are 0, is there a way to enforce that the Resize op's proposed sizes are not 0? More context: https://discord.com/channels/689900678990135345/1398431144210333818. Let me know if I have missed anything.
What I'm saying is that the IR literally concatenates zeros for the proposed sizes. It is evidently not getting them from an 'onnx.Shape' op + gathers + divs or anything like that.
Thanks @zjgarvey. Any pointers on where this shape propagation or export could be going wrong, or which passes would be useful for tracing it further, would be super helpful. I thought we got the 0s because ONNX exports dynamic axes as 0, which then got propagated.
I'm not exactly sure. I'd look at exporting a submodule for the problematic IR to get a smaller reproducer to start. Look at the model code in pytorch and the ONNX graph that torch exports. If that looks bad, it's likely an issue in torch or the model code. If the torch-exported ONNX model looks good, but the mlir is bad, having a smaller e2e torch->onnx->torch-mlir reproducer to address will be helpful, since the original IR posted in the issue isn't enough to debug the full picture. Some other questions: are we applying onnxruntime optimizations before export in this particular model? If something is going wrong there, that might be an indicator. Are the sample model inputs a reasonable shape? If the sizes end up being <16 for the sample input, is something folding dim//16 to zero?
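One way to get such a smaller reproducer is onnx's graph extraction utility; the file and tensor names below are placeholders for whatever actually feeds the problematic Resize in the full model:

import onnx

# Extract just the subgraph between the named tensors into a standalone model
# that can be re-imported and inspected on its own.
onnx.utils.extract_model(
    "full_model.onnx",          # the original exported model (placeholder name)
    "resize_submodule.onnx",    # smaller reproducer containing only the slice of interest
    input_names=["pixel_values"],
    output_names=["resize_output"],
)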
ONNX encodes dynamic spatial dimensions in Resize as 0, and the conversion pass currently passes these spatial dimensions through as is to the next steps. For the interpolate/resize op, a [0, 0] spatial dimension is not a valid proposed size. This change replaces such 0 values with the corresponding runtime dimension from the input tensor, ensuring correct shape propagation in Torch-MLIR and preventing invalid 0-sized dimensions.
This fixes the issue: iree-org/iree#19501
Example for ONNX Model Export
In IREE, one of the tests using a Hugging Face model is failing because the interpolate/resize dimensions are [0, 0].
But the input pixel_values dimensions are all 0, i.e. [0, 0, 0, 0].