@nitsanluke
Contributor

✨ Description

This PR updates the base image to pytorch:25.02-py3 (torch 2.7 and CUDA 12.8).

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. Change A
  2. Change B

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes.
  • 📝 I have updated the documentation if needed.
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

  • 🐋 I have updated the Docker configuration or dependencies, if applicable.
  • 🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes.
  • 🚦 I have tested these changes on GPUs and verified training stability.
  • 🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

  • 📊 I have run benchmarks where applicable to evaluate the performance impact.
  • ✅ The benchmarks show no performance regression.
  • 🚀 The benchmarks indicate a potential performance improvement.
  • ⚠️ The benchmarks indicate a potential performance degradation.
  • 📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

flash-attn is not fully implemented so we won't be getting the best speedups yet.

@nitsanluke changed the title from "Updating base image to pytorch:25.02-py3 to support B200 GPUs" to "WIP: Updating base image to pytorch:25.02-py3 to support B200 GPUs" on Jun 10, 2025
CORE =
# Available through the nvidia base image
-torch>=2.5.0
+torch>=2.7.0
Collaborator

This won't work, 2.7.0a0+6c54963f75 is not >=2.7.0
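
The comment above follows from how PEP 440 orders versions: a pre-release such as `a0` sorts before the final release with the same number, and the `+local` label is ignored by a `>=` comparison. Below is a minimal sketch of that ordering, assuming only plain releases with an optional `a`/`b`/`rc` tag. The helpers `simple_key` and `satisfies_minimum` are hypothetical names made up for illustration and are not part of this repository; in practice the `packaging` library implements the full rules.

```python
import re

# Pre-release tags rank below a final release, as PEP 440 orders them.
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL = (3, 0)


def simple_key(version: str):
    """Hypothetical helper: ordering key for '2.7.0' or '2.7.0a0+abc' style strings.

    Simplified on purpose: epochs, .post and .dev segments are not handled.
    """
    public = version.split("+", 1)[0]  # drop the local label, e.g. '+6c54963f75'
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?", public)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release = [int(part) for part in match.group(1).split(".")]
    while len(release) > 1 and release[-1] == 0:
        release.pop()  # treat 2.7 and 2.7.0 as the same release
    tag, number = match.group(2), match.group(3)
    pre = (_PRE_RANK[tag], int(number)) if tag else _FINAL
    return (tuple(release), pre)


def satisfies_minimum(installed: str, minimum: str) -> bool:
    """Return True if `installed` would satisfy a '>=minimum' requirement."""
    return simple_key(installed) >= simple_key(minimum)


container_torch = "2.7.0a0+6c54963f75"  # version string quoted in the review comment
print(satisfies_minimum(container_torch, "2.7.0"))    # False: a0 sorts before the final 2.7.0
print(satisfies_minimum(container_torch, "2.5.0"))    # True: the previous pin is satisfied
print(satisfies_minimum(container_torch, "2.7.0a0"))  # True: a bound that names the pre-release
```

The last check only illustrates the ordering; it is not a fix proposed in this thread.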

@@ -1,5 +1,5 @@
# syntax=docker/dockerfile:1.7-labs
-FROM nvcr.io/nvidia/pytorch:24.11-py3
+FROM nvcr.io/nvidia/pytorch:25.02-py3
Collaborator

Let's do 25.05

@jlamypoirier
Collaborator

jlamypoirier commented Jun 11, 2025

See #295. I got past the pytest version issue and got mamba to compile, but I'm still getting other build issues.

@nitsanluke closed this on Jun 11, 2025