🐛 Describe the bug
Certain ops in PyTorch can internally convert their inputs to contiguous format; linear, for example, appears to do this. These dim order conversions are reflected in the node metadata (and thus in the ExecuTorch dim order), but not in the graph itself. Our kernels don't perform the input dim order conversion and therefore fail at runtime.
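For context, a minimal eager-mode sketch of the implicit conversion, assuming the behavior described above (I haven't traced the ATen kernel source): a channels_last input to linear comes back in the default contiguous dim order, with the conversion happening inside the op rather than as a visible graph node.

import torch

# Channels-last (non-default dim order) input.
x = torch.randn(1, 2, 3, 10).to(memory_format=torch.channels_last)
w = torch.randn(10, 10)

# Per the description above, linear internally copies the input to contiguous
# before the matmul, so the output comes back in the default dim order.
y = torch.nn.functional.linear(x, w)

print(x.is_contiguous(memory_format=torch.channels_last))  # True
print(y.is_contiguous())                                   # True
print(y.is_contiguous(memory_format=torch.channels_last))  # False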
A slightly contrived end-to-end repro is as follows, though I expect this is reproducible with any op whose ATen kernel includes an implicit conversion to contiguous, of which there are quite a few.
import torch


class Module(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 10)

    def forward(self, x):
        return self.linear(x)


# Channels-last input, so the program records a non-default dim order.
inputs = (torch.randn(1, 2, 3, 10).to(memory_format=torch.channels_last),)
module = Module()
ep = torch.export.export(module, inputs)

from executorch.exir import to_edge_transform_and_lower

et_program = to_edge_transform_and_lower(ep).to_executorch()

from executorch.extension.pybindings.portable_lib import _load_for_executorch_from_buffer

et_model = _load_for_executorch_from_buffer(et_program.buffer)
# Running the program triggers the dim order check failure shown below.
et_model.forward(inputs)
Output:
[tensor_util_portable.cpp:132] Check failed (all_contiguous || all_channels_last): 2 input tensors have different dim orders
[op_expand_copy.cpp:89] Check failed (tensors_have_same_dim_order(self, out)):
[method.cpp:1322] KernelCall failed at instruction 0:1 in operator aten::expand_copy.out: 0x12
[method.cpp:1328] arg 0 with type id 1
[method.cpp:1328] arg 1 with type id 8
[method.cpp:1328] arg 2 with type id 5
[method.cpp:1328] arg 3 with type id 1
[method.cpp:1328] arg 4 with type id 1
Note that the failure actually surfaces in aten::expand_copy rather than in linear itself, due to the somewhat unusual decomposition in this case.
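One way to see the mismatch described above (a sketch only; it simply dumps whatever metadata torch.export recorded on the `ep` from the repro) is to print each node's output strides from its meta["val"]: the metadata reflects the contiguous conversion, but no node in the graph actually performs it.

import torch

# Sketch: print each op node's output shape/strides as recorded in its metadata.
for node in ep.graph.nodes:
    if node.op != "call_function":
        continue
    val = node.meta.get("val")
    if isinstance(val, torch.Tensor):
        print(node.target, tuple(val.shape), val.stride())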
Versions
N/A