
Update opset imports in version_converter #2295


Closed
shubhambhokare1 wants to merge 4 commits from the sbhokare/opset-import branch

Conversation

@shubhambhokare1 (Contributor) commented May 12, 2025

  • Update the version-converter logic to up-convert only if all nodes in the graph can be successfully up-converted to the target opset version.
  • Assign the opset import to the highest opset version to which all nodes were successfully converted (see the sketch below).
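Roughly, the second bullet amounts to something like the sketch below. The PR does add an `_update_opset_imports` helper (it appears in the diff later in this thread), but this body is only a guess at its shape, with hypothetical attribute access:

```python
def _update_opset_imports(model) -> None:
    # Hypothetical sketch: each node records the opset version it was
    # successfully converted to, so the highest version shared by ALL
    # nodes is the minimum of the per-node versions.
    versions = [node.version for node in model.graph if node.version is not None]
    if versions:
        model.opset_imports[""] = min(versions)
```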

codecov bot commented May 12, 2025

❌ 8 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
| --------------- | ------ | ------ | ------- |
| 14516           | 8      | 14508  | 1880    |
View the top 3 failed test(s) by shortest run time
onnxscript.backend.onnx_export_test.TestOnnxBackEnd::test_export2python_produces_correct_onnx_script_model_0525_test_layer_normalization_4d_axis0
Stack Traces | 0.004s run time
onnxscript\backend\onnx_export_test.py:137: in extract_functions
    mod = importlib.import_module(import_name)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.onnx_backend_test_code.test_layer_normalization_4d_axis0'

The above exception was the direct cause of the following exception:
.nox\test\Lib\site-packages\parameterized\parameterized.py:620: in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
onnxscript\backend\onnx_export_test.py:271: in test_export2python_produces_correct_onnx_script_model
    functions = extract_functions(backend_test.name, code, self.test_folder)
onnxscript\backend\onnx_export_test.py:139: in extract_functions
    raise AssertionError(
E   AssertionError: Unable to import 'tests.onnx_backend_test_code.test_layer_normalization_4d_axis0' (e=No module named 'tests.onnx_backend_test_code.test_layer_normalization_4d_axis0') (file: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_layer_normalization_4d_axis0.py', absolute path: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_layer_normalization_4d_axis0.py', current folder: D:\a\onnxscript\onnxscript
E   ---- CONTENT --
E   import numpy
E   from onnx import TensorProto
E   from onnx.helper import make_tensor
E   from onnxscript import script, external_tensor
E   from onnxscript.values import Opset
E   from onnxscript.onnx_types import FLOAT
E   from onnxscript.onnx_opset import opset17
E   
E   @script()
E   def bck_test_layer_normalization_4d_axis0(X: FLOAT[2,3,4,5], W: FLOAT[2,3,4,5], B: FLOAT[2,3,4,5]) -> (FLOAT[2,3,4,5], FLOAT[1,1,1,1], FLOAT[1,1,1,1]):
E       Y, Mean, InvStdDev = opset17.LayerNormalization(X, W, B, axis=0)
E       return Y, Mean, InvStdDev
onnxscript.backend.onnx_export_test.TestOnnxBackEnd::test_export2python_produces_correct_onnx_script_model_0847_test_reduce_log_sum_exp_default_axes_keepdims_random
Stack Traces | 0.004s run time
onnxscript\backend\onnx_export_test.py:137: in extract_functions
    mod = importlib.import_module(import_name)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.onnx_backend_test_code.test_reduce_log_sum_exp_default_axes_keepdims_random'

The above exception was the direct cause of the following exception:
.nox\test\Lib\site-packages\parameterized\parameterized.py:620: in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
onnxscript\backend\onnx_export_test.py:271: in test_export2python_produces_correct_onnx_script_model
    functions = extract_functions(backend_test.name, code, self.test_folder)
onnxscript\backend\onnx_export_test.py:139: in extract_functions
    raise AssertionError(
E   AssertionError: Unable to import 'tests.onnx_backend_test_code.test_reduce_log_sum_exp_default_axes_keepdims_random' (e=No module named 'tests.onnx_backend_test_code.test_reduce_log_sum_exp_default_axes_keepdims_random') (file: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_reduce_log_sum_exp_default_axes_keepdims_random.py', absolute path: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_reduce_log_sum_exp_default_axes_keepdims_random.py', current folder: D:\a\onnxscript\onnxscript
E   ---- CONTENT --
E   import numpy
E   from onnx import TensorProto
E   from onnx.helper import make_tensor
E   from onnxscript import script, external_tensor
E   from onnxscript.values import Opset
E   from onnxscript.onnx_types import DOUBLE, INT64
E   from onnxscript.onnx_opset import opset18
E   
E   @script()
E   def bck_test_reduce_log_sum_exp_default_axes_keepdims_random(data: DOUBLE[3,2,2], axes: INT64[0]) -> (DOUBLE[1,1,1]):
E       reduced = opset18.ReduceLogSumExp(data, axes, keepdims=1)
E       return reduced
onnxscript.backend.onnx_export_test.TestOnnxBackEnd::test_export2python_produces_correct_onnx_script_model_0918_test_reduce_sum_square_do_not_keepdims_example
Stack Traces | 0.004s run time
onnxscript\backend\onnx_export_test.py:137: in extract_functions
    mod = importlib.import_module(import_name)
C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'tests.onnx_backend_test_code.test_reduce_sum_square_do_not_keepdims_example'

The above exception was the direct cause of the following exception:
.nox\test\Lib\site-packages\parameterized\parameterized.py:620: in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
onnxscript\backend\onnx_export_test.py:271: in test_export2python_produces_correct_onnx_script_model
    functions = extract_functions(backend_test.name, code, self.test_folder)
onnxscript\backend\onnx_export_test.py:139: in extract_functions
    raise AssertionError(
E   AssertionError: Unable to import 'tests.onnx_backend_test_code.test_reduce_sum_square_do_not_keepdims_example' (e=No module named 'tests.onnx_backend_test_code.test_reduce_sum_square_do_not_keepdims_example') (file: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_reduce_sum_square_do_not_keepdims_example.py', absolute path: 'D:\\a\\onnxscript\\onnxscript\\tests\\onnx_backend_test_code\\test_reduce_sum_square_do_not_keepdims_example.py', current folder: D:\a\onnxscript\onnxscript
E   ---- CONTENT --
E   import numpy
E   from onnx import TensorProto
E   from onnx.helper import make_tensor
E   from onnxscript import script, external_tensor
E   from onnxscript.values import Opset
E   from onnxscript.onnx_types import FLOAT, INT64
E   from onnxscript.onnx_opset import opset18
E   
E   @script()
E   def bck_test_reduce_sum_square_do_not_keepdims_example(data: FLOAT[3,2,2], axes: INT64[1]) -> (FLOAT[3,2]):
E       reduced = opset18.ReduceSumSquare(data, axes, keepdims=0)
E       return reduced


@shubhambhokare1 marked this pull request as ready for review May 12, 2025 19:04
Comment on lines 230 to 235
if domain == "":
model_or_function.opset_imports[domain] = max(version, current_version)
return
elif domain == "ai.onnx":
model_or_function.opset_imports[domain] = max(version, current_version)
return
@justinchuby (Collaborator) May 12, 2025

since "" and "ai.onnx" is the same, they should share the same opset import. In practice it may be good to assert that we don't have any "ai.onnx" nodes if handling both domains over complicates the logic

@shubhambhokare1 (Contributor, Author) May 12, 2025

#2283 should handle this pre-conversion? Then it makes sense to check only for "" opset imports.

Collaborator

Hmmm ... I guess #2283 does NOT do it for opset-imports, that remains a TODO, is that right?

Collaborator

Yeah we need to create a new data structure for opset imports to do this, which is a todo.

@@ -320,6 +352,8 @@ def visit_model(self, model: ir.Model) -> None:
            return None
        self.model_version = model_version
        self.visit_graph(model.graph)
        # Finally, update the opset imports for the model
        self._update_opset_imports(model)
@justinchuby (Collaborator) May 12, 2025

With this logic, if a user requests opset 22 but all nodes were only updated to opset 21 at most, is it still possible for the model to be set at 22? Would it make sense to set the opset import field directly to the target opset, if the conversion succeeds?

@shubhambhokare1 (Contributor, Author) May 12, 2025

Currently, all nodes are up-converted to target_version. If no adapter is registered for a particular node, only node.version is updated to target_version.

The real issue arises when up-conversion for a node is not possible. Take GroupNorm with no bias: if no bias was provided, the up-conversion from 20->21 is not possible, so we would end up with a model where GroupNorm is version 20 but all other ops are 21.

I think we need to rethink the version-conversion logic. Instead of iterating (each node, then up-converting verA->verB->verC), we would have to switch the loop order to (verA->verB, iterate each node; verB->verC, iterate each node; and so on), maintaining a copy of the pre-conversion graph during each up-conversion from one opset to the next. If any conversion fails, we fall back to the graph from before that up-conversion, so all nodes stay at a single opset. Once this process is done, we assign the model imports to whatever the last successful up-conversion was. A sketch of this loop order follows below.

Marking the PR as draft while I flesh out and try to implement this new logic.
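A minimal sketch of the proposed loop order, assuming a copyable graph and a hypothetical per-node `upconvert_node` adapter (neither matches the PR's actual code, and as noted further down the thread, the IR cannot copy graphs yet):

```python
import copy

class UnsupportedConversionError(Exception):
    """Raised when a node cannot be up-converted by one opset step."""

def convert_graph(graph, current_version: int, target_version: int):
    converted_version = current_version
    while converted_version < target_version:
        snapshot = copy.deepcopy(graph)  # assumes the graph is copyable
        try:
            for node in graph:
                # Hypothetical per-node adapter call for one opset step.
                upconvert_node(node, converted_version + 1)
        except UnsupportedConversionError:
            graph = snapshot  # fall back to the last fully converted graph
            break
        converted_version += 1
    # The opset import becomes the last fully successful version.
    return graph, converted_version
```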

@shubhambhokare1 (Contributor, Author)

Updated logic.

Collaborator

Thanks for the update. I think it will also help to verify/document the global strategy. I think some of the earlier complications stem from how onnxscript functions are registered for torchlib ops. Suppose I register an onnxscript function for torch.gelu targeting ONNX opset 20, and the user requests ONNX opset 21. If we use the above-mentioned onnxscript function, will it generate the decomposition with calls to opset 20 or 21? I assume they will be 20, and will need to be version-converted to 21?

So, in general, the assumption is that the model we start with (before version conversion) could contain calls to several different ONNX opsets (say both 20 and 21), which will typically be <= the target version.

What is self.model_version's initial value in your loop? How is it determined?

@shubhambhokare1 marked this pull request as draft May 12, 2025 19:17
@shubhambhokare1 force-pushed the sbhokare/opset-import branch from 1ca1bd2 to 13e7f05 May 12, 2025 20:09
@shubhambhokare1 marked this pull request as ready for review May 12, 2025 20:15
@shubhambhokare1 force-pushed the sbhokare/opset-import branch from 890a6de to 3950910 May 12, 2025 20:32
# Return non-converted graph if any node fails to convert.
for node in graph:
    up_conversion = True
    if node.version is None:
Collaborator

We should check whether it is an ONNX-domain op first; if not, skip it and don't do anything (not even updating node.version).
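A sketch of the suggested guard (hypothetical helper; the real check would sit at the top of the node loop):

```python
def is_onnx_domain_op(node) -> bool:
    # Only standard-domain ops take part in version conversion; ops in
    # any other domain are skipped without even touching node.version.
    return node.domain in ("", "ai.onnx")
```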


# TODO(shubhambhokare1): Support down-conversion
while self.model_version < self.target_version:
    pre_conversion_graph = copy.copy(graph)
Collaborator

@justinchuby : how expensive is copy? Specifically, what does it do for large tensors?

@justinchuby (Collaborator) May 12, 2025

We can't copy graphs yet. This won't work (sorry)

I intend to implement graph.duplicate(). But I have not done it. cc @titaiwangms who has a similar need.

@shubhambhokare1 (Contributor, Author)

@justinchuby What would be the correct way to store pre-conversion state graphs?

Collaborator

Maybe implement a proper graph.duplicate()? Before that maybe aborting on failures is acceptable?

@gramalingam (Collaborator)

My understanding is this:

  • The (python) IR allows different nodes in the graph to call different opset versions (of the standard domain)
  • The models generated by torchlib may actually have such nodes (with different opset versions). (We may be able to change this, but that would require some changes in torchlib and would have some disadvantages.)

Hence:

  • The version-converter should support input models where different nodes may use different opset versions.
  • It cannot guarantee that it will produce a model where all nodes use the same opset version.
  • I suggest settling for a weaker guarantee: we migrate each node to the opset closest to the target opset that we can reach, which might not be the target opset, or even the same as for other nodes. We return a boolean or status indicating where we were unsuccessful. (Ideally, we should map the error back to a torchlib-registered function, which likely needs a new version for the failing opset version.) The user can also try conversion with other opset versions if they want.
  • This will not require creating a copy of the graph.

The goal of producing a model with the same opset version throughout will require extra work anyway.
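A minimal sketch of this weaker, per-node guarantee; `has_adapter` and `apply_adapter` are hypothetical names, and defaulting a missing node.version to 1 is an assumption:

```python
def migrate_nodes_best_effort(graph, target_version: int) -> bool:
    """Migrate each standard-domain node as close to target_version as
    adapters allow; return False if any node stopped short of it."""
    fully_converted = True
    for node in graph:
        if node.domain not in ("", "ai.onnx"):
            continue  # leave non-standard domains untouched
        version = node.version if node.version is not None else 1
        while version < target_version and has_adapter(node, version + 1):
            apply_adapter(node, version + 1)
            version += 1
        node.version = version
        if version < target_version:
            fully_converted = False  # this node could not reach the target
    return fully_converted
```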

@justinchuby : does this sound reasonable?

@justinchuby (Collaborator)

That sounds reasonable to me!

gramalingam added a commit that referenced this pull request May 21, 2025
Redo of PR #2295 as discussed there.

* Ensure opset_imports is updated when version converter is applied

TODO (in a separate PR):
* Cleanup error status API (and return value)

---------

Signed-off-by: Ganesan Ramalingam <[email protected]>
@titaiwangms (Contributor)

Replaced by #2318
