ericspod
Member

@ericspod ericspod commented Apr 25, 2025

Description

This will update MONAI to be compatible with PyTorch 2.7.1. There appear to be few code changes with this release, so hopefully this will simply be a matter of updating versions. The versions tested in the actions are now fixed to explicit version strings rather than including "latest" as a PyTorch version; this avoids new breaking releases rendering PRs unmergeable.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Summary by CodeRabbit

  • Bug Fixes

    • Implemented a platform-specific workaround to address issues with PyTorch's grid_sample on Windows with float64 CPU tensors, ensuring correct behavior in spatial transforms and resampling.
    • Suppressed unnecessary error messages during expected test failures to improve test output clarity.
    • Adjusted test tolerances on Windows to account for platform-specific numerical differences.
  • Chores

    • Updated dependency requirements to allow newer versions of PyTorch, with special handling for Windows.
    • Restricted the pytype version range for development environments.
    • Improved test configuration to include the latest PyTorch version in continuous integration.
    • Refined dependency version constraints to support PyTorch 2.7.1 and beyond.

Signed-off-by: Eric Kerfoot <[email protected]>
@ericspod ericspod requested review from Copilot, Nic-Ma and KumoLiu April 25, 2025 11:38

@Copilot Copilot AI left a comment


Pull Request Overview

This PR updates MONAI’s torch dependency to newer versions, aiming for improved compatibility with recent PyTorch releases. The changes include:

  • Increasing the minimum torch version in pyproject.toml from 2.3.0 to 2.4.1.
  • Updating the installation command in the GitHub Actions workflow (pythonapp.yml) accordingly.
  • Modifying the torch version matrix in the minimal workflow (pythonapp-min.yml).

Reviewed Changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 3 comments.

Files reviewed:
  • pyproject.toml: updated minimum torch dependency and Black target versions
  • .github/workflows/pythonapp.yml: updated torch installation command to the new dependency
  • .github/workflows/pythonapp-min.yml: revised torch version matrix for testing
Files not reviewed (2)
  • docs/requirements.txt: Language not supported
  • setup.cfg: Language not supported

Signed-off-by: Eric Kerfoot <[email protected]>
Signed-off-by: Eric Kerfoot <[email protected]>
@ericspod
Member Author

It's possible the CPU provided by the Windows runner is too old for PyTorch 2.7, which may now require instructions it doesn't have.

@ericspod
Member Author

The issue with Windows appears to be related to float64 calculations, specifically with RandRotate in tests\integration\test_pad_collation.py. It doesn't appear to be pad-collation related and goes away if float32 is used as the dtype. I'm investigating further.

@ericspod
Member Author

I think I've traced the issue to what appears to be a bug in grid_sample; I've raised an issue on the PyTorch repo.
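
For reference, a minimal check along these lines (a rough, hypothetical sketch, not the actual reproduction in the PyTorch issue) compares float64 and float32 resampling of the same image with an identity grid; on an affected Windows CPU build the float64 result reportedly diverges:

import torch
import torch.nn.functional as F

# Hypothetical repro sketch: resample a small image with an identity affine grid
# in float64 and in float32, then compare. Outside the affected Windows/CPU
# configuration the two results should agree to within rounding error.
img = torch.arange(16, dtype=torch.float64).reshape(1, 1, 4, 4)
theta = torch.eye(2, 3, dtype=torch.float64).unsqueeze(0)  # 2D identity affine
grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)

out64 = F.grid_sample(img, grid, align_corners=False)
out32 = F.grid_sample(img.float(), grid.float(), align_corners=False).double()
print(torch.max(torch.abs(out64 - out32)))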

@KumoLiu
Contributor

KumoLiu commented Apr 29, 2025

I think I've traced the issue to what appears to be a bug in grid_sample; I've raised an issue on the PyTorch repo.

Thank you for looking into this! Instead of waiting for a fix from PyTorch, do you think it's possible to implement a workaround by altering the dtype for Windows operating systems?

@ericspod
Member Author

do you think it's possible to implement a workaround by altering the dtype for Windows operating systems?

I'm looking into that now and will hopefully have something soon. We may have to convert to float32 and back in places, so we may have knock-on precision issues.

@KumoLiu
Contributor

KumoLiu commented Apr 30, 2025

We may need to wait for a torch-tensorrt release that supports PyTorch 2.7.
https://pypi.org/project/torch-tensorrt/#history

@ericspod
Member Author

ericspod commented May 1, 2025

Hi @KumoLiu, this gets through the Windows tests now. I raised the issue with PyTorch, so hopefully version 2.7.1 will resolve it; in the meantime we can run the blossom tests and discuss whether to merge this.

Signed-off-by: Eric Kerfoot <[email protected]>
@KumoLiu
Contributor

KumoLiu commented May 2, 2025

/build

@KumoLiu
Contributor

KumoLiu commented May 2, 2025

Error log:

[2025-05-02T05:07:25.738Z]   Attempting uninstall: setuptools
[2025-05-02T05:07:25.738Z]     Found existing installation: setuptools 45.2.0
[2025-05-02T05:07:25.738Z]     Uninstalling setuptools-45.2.0:
[2025-05-02T05:07:25.738Z]       Successfully uninstalled setuptools-45.2.0
[2025-05-02T05:08:04.452Z] ERROR: Exception:
[2025-05-02T05:08:04.452Z] Traceback (most recent call last):
[2025-05-02T05:08:04.452Z]   File "/usr/lib/python3.9/py_compile.py", line 144, in compile
[2025-05-02T05:08:04.452Z]     code = loader.source_to_code(source_bytes, dfile or file,
[2025-05-02T05:08:04.452Z]   File "<frozen importlib._bootstrap_external>", line 918, in source_to_code
[2025-05-02T05:08:04.452Z]   File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pytype/tools/merge_pyi/test_data/parse_error.py", line 2
[2025-05-02T05:08:04.452Z]     def f(*): pass
[2025-05-02T05:08:04.452Z]            ^
[2025-05-02T05:08:04.452Z] SyntaxError: named arguments must follow bare *
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] During handling of the above exception, another exception occurred:
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] Traceback (most recent call last):
[2025-05-02T05:08:04.452Z]   File "/usr/lib/python3.9/compileall.py", line 238, in compile_file
[2025-05-02T05:08:04.452Z]     ok = py_compile.compile(fullname, cfile, dfile, True,
[2025-05-02T05:08:04.452Z]   File "/usr/lib/python3.9/py_compile.py", line 150, in compile
[2025-05-02T05:08:04.452Z]     raise py_exc
[2025-05-02T05:08:04.452Z] py_compile.PyCompileError:   File "/usr/local/lib/python3.9/dist-packages/pytype/tools/merge_pyi/test_data/parse_error.py", line 2
[2025-05-02T05:08:04.452Z]     def f(*): pass
[2025-05-02T05:08:04.452Z]            ^
[2025-05-02T05:08:04.452Z] SyntaxError: named arguments must follow bare *
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] During handling of the above exception, another exception occurred:
[2025-05-02T05:08:04.452Z] 
[2025-05-02T05:08:04.452Z] Traceback (most recent call last):
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/cli/base_command.py", line 105, in _run_wrapper
[2025-05-02T05:08:04.452Z]     status = _inner_run()
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/cli/base_command.py", line 96, in _inner_run
[2025-05-02T05:08:04.452Z]     return self.run(options, args)
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/cli/req_command.py", line 68, in wrapper
[2025-05-02T05:08:04.452Z]     return func(self, options, args)
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/commands/install.py", line 459, in run
[2025-05-02T05:08:04.452Z]     installed = install_given_reqs(
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/req/__init__.py", line 83, in install_given_reqs
[2025-05-02T05:08:04.452Z]     requirement.install(
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/req/req_install.py", line 867, in install
[2025-05-02T05:08:04.452Z]     install_wheel(
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/operations/install/wheel.py", line 728, in install_wheel
[2025-05-02T05:08:04.452Z]     _install_wheel(
[2025-05-02T05:08:04.452Z]   File "/usr/local/lib/python3.9/dist-packages/pip/_internal/operations/install/wheel.py", line 614, in _install_wheel
[2025-05-02T05:08:04.452Z]     success = compileall.compile_file(path, force=True, quiet=True)
[2025-05-02T05:08:04.452Z]   File "/usr/lib/python3.9/compileall.py", line 255, in compile_file
[2025-05-02T05:08:04.452Z]     msg = err.msg.encode(sys.stdout.encoding,
[2025-05-02T05:08:04.452Z] TypeError: encode() argument 'encoding' must be str, not None
[2025-05-02T05:08:04.452Z] 

@KumoLiu
Contributor

KumoLiu commented May 2, 2025

/build

Signed-off-by: Eric Kerfoot <[email protected]>
@ericspod
Member Author

ericspod commented May 2, 2025

The blossom issue is related to the current pytype version, so I've added <=2024.4.11 to the requirements-dev.txt file for it.

@KumoLiu
Contributor

KumoLiu commented May 2, 2025

/build

@KumoLiu
Contributor

KumoLiu commented May 2, 2025

The blossom issue is related to the current pytype version, so I've added <=2024.4.11 to the requirements-dev.txt file for it.

Seems related to the new version of pip: https://pypi.org/project/pip/#history
I tried downgrading it to 25.0.1, and then it works.

Raised an issue here: google/pytype#1909

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
.github/workflows/pythonapp-min.yml (1)

154-158: pip install torch ignores the Windows exclusion and the CPU-only wheel source

In the “min-dep-pytorch” job we fall back to a bare pip install torch when the matrix value is latest.
Two problems:

  1. The wheel resolver may pick up a CUDA build, defeating the “CPU-only” objective used elsewhere (--index-url …/cpu).
  2. Although this job runs on Linux, the same pattern appears in the min-dep-os job for Windows (line 57) and will happily install 2.7.0, bypassing the !=2.7.0 guard you just added in requirements.txt.
-  python -m pip install torch
+  # CPU-only latest build, respecting platform pins
+  python -m pip install "torch!=2.7.0" --index-url https://download.pytorch.org/whl/cpu

Apply the same logic to the Windows path (line 57) to honour the exclusion.

♻️ Duplicate comments (2)
pyproject.toml (1)

5-5: Consider raising minimum version to align with PR objectives.

The upper bound removal enables PyTorch 2.7 compatibility as intended. However, the previous review comment raises a valid point about potentially updating the minimum version to 2.7.0 since this PR specifically targets PyTorch 2.7 updates.

Please confirm whether the minimum version should remain at 2.4.1 for backward compatibility or be raised to 2.7.0 to match the PR's focus on PyTorch 2.7 updates.

.github/workflows/pythonapp-min.yml (1)

127-127: Still missing explicit 2.7.0 in the test matrix

The PR title claims PyTorch 2.7 support, yet the matrix lists the pinned versions up to 2.6.0 plus latest. Relying on latest to pick up 2.7.0 is brittle (as soon as 2.7.1 lands, 2.7.0 will silently drop out of CI).

Add an explicit 2.7.0 entry and keep latest for forward-looking coverage.

🧹 Nitpick comments (1)
requirements.txt (1)

1-2: Two overlapping torch lines – consolidate for clarity

Having both a global torch>=2.4.1 and a Windows-specific exclusion works, but it is easy to mis-read and invites drift. Consider collapsing into a single, self-documenting marker:

-torch>=2.4.1
-torch>=2.4.1, !=2.7.0; platform_system == "Windows"
+torch>=2.4.1; platform_system != "Windows"
+torch>=2.4.1,!=2.7.0; platform_system == "Windows"

Same behaviour, clearer intent.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e499362 and e2ec553.

📒 Files selected for processing (10)
  • .github/workflows/pythonapp-min.yml (1 hunks)
  • docs/requirements.txt (1 hunks)
  • monai/networks/layers/spatial_transforms.py (3 hunks)
  • monai/transforms/spatial/array.py (2 hunks)
  • pyproject.toml (1 hunks)
  • requirements-dev.txt (1 hunks)
  • requirements.txt (1 hunks)
  • setup.cfg (1 hunks)
  • tests/integration/test_pad_collation.py (3 hunks)
  • tests/lazy_transforms_utils.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/integration/test_pad_collation.py (1)
monai/utils/misc.py (1)
  • set_determinism (335-393)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (19)
  • GitHub Check: min-dep-pytorch (2.5.1)
  • GitHub Check: min-dep-pytorch (2.4.1)
  • GitHub Check: min-dep-pytorch (latest)
  • GitHub Check: min-dep-os (windows-latest)
  • GitHub Check: min-dep-pytorch (2.6.0)
  • GitHub Check: min-dep-os (macOS-latest)
  • GitHub Check: min-dep-py3 (3.9)
  • GitHub Check: min-dep-os (ubuntu-latest)
  • GitHub Check: min-dep-py3 (3.10)
  • GitHub Check: min-dep-py3 (3.12)
  • GitHub Check: min-dep-py3 (3.11)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: build-docs
  • GitHub Check: packaging
  • GitHub Check: flake8-py3 (codeformat)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: flake8-py3 (mypy)
  • GitHub Check: quick-py3 (windows-latest)
🔇 Additional comments (12)
tests/integration/test_pad_collation.py (2)

14-14: LGTM: Import changes are well-justified.

The new imports (os, redirect_stderr, first) are all properly used in the test method to suppress error output during expected failures.

Also applies to: 17-17, 40-40


102-104: LGTM: Improved test clarity with proper error suppression.

The change to use first(loader_fail) with stderr suppression is a good improvement. It makes the test more explicit while preventing PyTorch error messages from cluttering the test output during expected failures.
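
For illustration, a standalone hypothetical sketch of that pattern (not the test under review), using a mismatched-shape dataset to force the collation failure:

import os
from contextlib import redirect_stderr

import torch
from torch.utils.data import DataLoader

from monai.utils import first

# Two samples with different shapes make the default collate_fn raise; any
# noise PyTorch prints during the expected failure is routed to os.devnull so
# it does not clutter the test output.
data = [torch.zeros(3, 3), torch.zeros(4, 4)]
loader_fail = DataLoader(data, batch_size=2)

with open(os.devnull, "w") as devnull, redirect_stderr(devnull):
    try:
        first(loader_fail)  # expected to fail while stacking mismatched shapes
    except RuntimeError:
        pass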

tests/lazy_transforms_utils.py (2)

14-14: LGTM: Import is properly used for platform detection.

The sys import is correctly used for Windows platform detection in the tolerance adjustment logic.


67-71: LGTM: Reasonable workaround for PyTorch Windows compatibility.

The platform-specific tolerance adjustment is a sensible temporary fix for the known PyTorch numerical precision issues on Windows with float64 CPU tensors. The increased tolerances (1e-4) are still reasonable for testing purposes.

Please monitor the referenced GitHub issue and remove this workaround once PyTorch fixes the underlying grid_sample issue.
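
A rough sketch of the tolerance pattern being described (illustrative values, assuming the usual sys.platform check):

import sys

import torch

# Relax comparison tolerances on Windows, where float64 CPU results can differ
# slightly; keep tighter defaults on other platforms. Values are illustrative.
rtol, atol = 1e-5, 1e-7
if sys.platform == "win32":
    rtol = atol = 1e-4

expected = torch.ones(4, dtype=torch.float64)
actual = expected.clone()
torch.testing.assert_close(actual, expected, rtol=rtol, atol=atol)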

monai/transforms/spatial/array.py (2)

17-17: LGTM - Sys import added for platform detection

The sys import is correctly added to support the Windows-specific workaround implementation.


2111-2134: Well-implemented workaround for Windows PyTorch grid_sample bug

The implementation correctly addresses the known PyTorch issue with float64 tensors on Windows CPU by:

  1. Targeted detection: Only applies the workaround when all conditions are met (Windows platform, float64 dtype, CPU device)
  2. Minimal scope: Conversions are applied only to the specific tensors passed to grid_sample
  3. Proper restoration: Converts the output back to float64 to maintain expected data types
  4. Clean implementation: The workaround is clearly documented and doesn't affect other code paths

The logic follows the pattern described in the PR objectives and should resolve the Windows test failures without impacting other platforms or data types.
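
A hypothetical sketch of that pattern (not the exact MONAI code) might look like:

import sys

import torch
import torch.nn.functional as F

def sample_with_workaround(img: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
    # Only Windows + float64 + CPU hits the affected grid_sample path; in that
    # case run the call in float32 and cast the result back to float64 so the
    # caller still sees the expected dtype. All other cases are untouched.
    needs_cast = (
        sys.platform == "win32"
        and img.dtype == torch.float64
        and img.device.type == "cpu"
    )
    if needs_cast:
        out = F.grid_sample(img.float(), grid.float(), align_corners=False)
        return out.to(torch.float64)
    return F.grid_sample(img, grid, align_corners=False)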

monai/networks/layers/spatial_transforms.py (3)

14-14: LGTM: Import added for platform detection.

The sys import is correctly placed and necessary for the Windows-specific workaround implementation.


530-535: LGTM: Well-implemented platform-specific workaround.

The workaround logic correctly identifies the specific conditions (Windows + float64 + CPU) that trigger the PyTorch grid_sample issue. The inline comment provides good context with a reference to the related PR.


591-603: LGTM: Proper workaround implementation with precision considerations.

The implementation correctly handles the dtype conversion to work around the PyTorch grid_sample issue on Windows. The conversion back to float64 maintains API consistency, though there may be minor precision implications as noted in the PR discussion.

This is a temporary workaround for PyTorch issue #152385. Consider monitoring for PyTorch 2.7.1 release to remove this workaround when the upstream issue is fixed.

requirements-dev.txt (1)

23-23: LGTM: Version constraint addresses build failures.

The upper bound constraint on pytype (<=2024.4.11) appropriately addresses the build failures mentioned in the PR discussion. The platform-specific exclusion for Windows is also correct.

docs/requirements.txt (1)

2-2: LGTM: Consistent torch version constraint relaxation.

The removal of the upper bound constraint aligns with the PR's objective to enable PyTorch 2.7 compatibility for documentation builds.

setup.cfg (1)

45-45: LGTM: Consistent torch version constraint relaxation.

The removal of the upper bound constraint (<2.7.0) enables PyTorch 2.7 compatibility and is consistent with similar changes across other configuration files.

@ericspod
Member Author

Hi @KumoLiu, this should pass tests now, so we can run blossom and merge. Thanks!

@KumoLiu
Contributor

KumoLiu commented Jul 21, 2025

/build

@ericspod
Member Author

Hi @KumoLiu, something didn't work with blossom; could you share the logs, please?

@Project-MONAI Project-MONAI deleted a comment from coderabbitai bot Jul 22, 2025
@KumoLiu
Contributor

KumoLiu commented Jul 22, 2025

/build

@KumoLiu
Contributor

KumoLiu commented Jul 22, 2025

Hi @ericspod, it looks like a timeout issue, so I'll just retrigger the tests.

@KumoLiu
Contributor

KumoLiu commented Jul 23, 2025

/build

@KumoLiu KumoLiu enabled auto-merge (squash) July 23, 2025 22:25
@KumoLiu KumoLiu merged commit 8ee3f89 into Project-MONAI:dev Jul 23, 2025
27 checks passed
@ericspod ericspod deleted the torch27 branch July 24, 2025 11:00
@ogencoglu

I get this from my uv sync:

  × No solution found when resolving dependencies for split (python_full_version == '3.12.10' and sys_platform == 'darwin'):
  ╰─▶ Because monai==1.5.0 depends on torch>=2.4.1,<2.7.0 and your project depends on monai==1.5.0, we can conclude that your project depends on torch>=2.4.1,<2.7.0.
      And because your project depends on torch==2.7.1, we can conclude that your project's requirements are unsatisfiable.

Am I missing something?

@ericspod
Member Author

I get this from my uv sync:

  × No solution found when resolving dependencies for split (python_full_version == '3.12.10' and sys_platform == 'darwin'):
  ╰─▶ Because monai==1.5.0 depends on torch>=2.4.1,<2.7.0 and your project depends on monai==1.5.0, we can conclude that your project depends on torch>=2.4.1,<2.7.0.
      And because your project depends on torch==2.7.1, we can conclude that your project's requirements are unsatisfiable.

Am I missing something?

These don't look like the requirements updated by this PR. MONAI 1.5.0 doesn't include this PR, so you will have to install MONAI from source. For Darwin the requirement is only torch>=2.4.1.

@MMelQin

MMelQin commented Sep 15, 2025

I get this from my uv sync:

  × No solution found when resolving dependencies for split (python_full_version == '3.12.10' and sys_platform == 'darwin'):
  ╰─▶ Because monai==1.5.0 depends on torch>=2.4.1,<2.7.0 and your project depends on monai==1.5.0, we can conclude that your project depends on torch>=2.4.1,<2.7.0.
      And because your project depends on torch==2.7.1, we can conclude that your project's requirements are unsatisfiable.

Am I missing something?

This doesn't look like the updated requirements that this PR has changed. MONAI 1.5 won't have this PR integrated so you will have to install MONAI from source. For Darwin the requirement is only torch>=2.4.1.

Hi @ericspod, do we have a target date for releasing a v1.5 update or v1.6?

Asking because the torch v2.6 installed as part of pip install monai bundles the CUDA runtime at v12.4.127, while my app needs CUDA runtime >=12.6 but finds the lower-version CUDA runtime lib for torch at <venv>/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12.

I have a few ways to work around this, but would love to see a new version of MONAI with your updates soon:

  • pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126 then install monai, or
  • remove the cuda runtime lib.so from the site packages folder post monai installation so that torch will use the system's cuda runtime lib, or
  • pip install --upgrade nvidia-cuda-runtime-cu12 post monai installation but this raises an error torch 2.6.0 requires nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-runtime-cu12 12.9.79 which is incompatible.

Thanks,
Ming

@ericspod
Member Author

Hi @ericspod do we have a target date for releasing a v1.5 update or the v1.6?

Asking this because the torch v2.6 that's installed as part of pip install monai has the default bundled cuda runtime at v12.4.127 while my app needs cuda runtime >=12.6 but it finds the lower version of the cuda runtime lib.so for torch at <venv>/lib/python3.10/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12.

I have a few ways to work around, but would love to see a new version of MONAI with your updates soon:

* `pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126` then install monai, or

* remove the cuda runtime lib.so from the site packages folder post monai installation so that `torch` will use the system's cuda runtime lib, or

* `pip install --upgrade nvidia-cuda-runtime-cu12` post monai installation but this raises an error `torch 2.6.0 requires nvidia-cuda-runtime-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-runtime-cu12 12.9.79 which is incompatible.`

Thanks, Ming

Hi @MMelQin, we're working on a 1.5.1 release that will include support for PyTorch 2.8. This will be out shortly.
