[BUG] Invalidate trace cache @ step 975: expected module 1003, but got module 1003 #5033

Closed
cbiras opened this issue Jan 30, 2024 · 1 comment
Labels: bug, training

Comments

cbiras commented Jan 30, 2024

Describe the bug
I am using PyTorch Lightning with DeepSpeed ZeRO-3 with offload. At every training step I receive the following warning: "Invalidate trace cache @ step 975: expected module 1003, but got module 1003"
I have seen other issues reporting this warning, but none where the expected module and the actual module are the same.
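
For readers trying to reproduce the setup, here is a minimal sketch of a ZeRO stage 3 configuration with CPU offload. The reporter's actual configuration is not shown in this thread, so the offload targets below are assumptions, and the helper name `zero3_offload_config` is hypothetical.

```python
import json


def zero3_offload_config() -> dict:
    """Build a minimal DeepSpeed config dict: ZeRO stage 3 with CPU offload.

    Assumption: both optimizer state and parameters are offloaded to CPU.
    The original report only says "zero3, with offload".
    """
    return {
        "zero_optimization": {
            "stage": 3,
            "offload_optimizer": {"device": "cpu"},
            "offload_param": {"device": "cpu"},
        },
    }


if __name__ == "__main__":
    # Print the config as JSON, the format DeepSpeed config files use.
    print(json.dumps(zero3_offload_config(), indent=2))
```

With PyTorch Lightning, a dict like this can typically be passed as `DeepSpeedStrategy(config=...)`, or the built-in `"deepspeed_stage_3_offload"` strategy alias can be used instead.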

cbiras added the bug and training labels Jan 30, 2024
github-merge-queue bot pushed a commit that referenced this issue Feb 18, 2025
Make trace cache warnings configurable, and disabled by default. 

Fix #6985, #4081, #5033, #5006, #5662

---------

Signed-off-by: Olatunji Ruwase <[email protected]>
@tjruwase
Contributor

Fixed by #7039
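
Since the fix makes the warning configurable and off by default, the sketch below shows how the setting could be toggled in a DeepSpeed config dict. The key name `log_trace_cache_warnings` under `zero_optimization` is not stated in this thread; it is my understanding of the option and should be checked against the DeepSpeed configuration documentation for your version. The helper `with_trace_cache_warnings` is hypothetical.

```python
import copy


def with_trace_cache_warnings(config: dict, enabled: bool) -> dict:
    """Return a copy of a DeepSpeed config with trace cache warnings toggled.

    Assumption: the option lives at zero_optimization.log_trace_cache_warnings.
    The input dict is not modified.
    """
    updated = copy.deepcopy(config)
    zero_section = updated.setdefault("zero_optimization", {})
    zero_section["log_trace_cache_warnings"] = enabled
    return updated


if __name__ == "__main__":
    base = {"zero_optimization": {"stage": 3}}
    # Opt back in to the warnings, e.g. while debugging module ordering.
    print(with_trace_cache_warnings(base, enabled=True))
```

On DeepSpeed versions that predate the fix, the warning is always emitted and this key would have no effect.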

Commits with the same message that reference this issue were also pushed by Yejing-Lai (to Yejing-Lai/DeepSpeed, Feb 24, 2025), gyou2021 (to gyou2021/DeepSpeed, Feb 28, 2025), tohtana (Feb 28, 2025), and ys950902 (to ys950902/DeepSpeed, Mar 6, 2025).