[BUG] 'Invalidate trace cache' with Seq2SeqTrainer+predict_with_generate+Zero3 #5662
Likely fixed by #7039

tohtana pushed a commit that referenced this issue on Feb 28, 2025:
Make trace cache warnings configurable, and disabled by default. Fix #6985, #4081, #5033, #5006, #5662
Describe the bug
Evaluating transformers Seq2SeqTrainer with 'predict_with_generate=True' results in 'Invalidate trace cache' warnings.
The warnings appear inside Seq2SeqTrainer.prediction_step, twice during each prediction step:
Here:
generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
and here:
outputs = model(**inputs)
The error messages:
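(The exact messages were attached as a screenshot, see below. DeepSpeed's ZeRO-3 parameter coordinator prints warnings of roughly this form, with the step and module ids varying between runs:)

```
Invalidate trace cache @ step 0: expected module 6, but got module 0
```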
Call stack:
MyTrainer.evaluate -> Trainer.evaluate -> Trainer.evaluation_loop -> Seq2SeqTrainer.prediction_step -> 'Invalidate trace cache'
To Reproduce
I built a simple script to reproduce the error. A little bit of background first:
Seq2SeqTrainer.prediction_step has a small check at the beginning:
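The snippet did not survive in this copy of the issue; for reference, in transformers 4.41 the beginning of the method looks roughly like this (paraphrased from the library source, not an exact copy):

```python
# transformers ~4.41, Seq2SeqTrainer.prediction_step: fall back to the plain
# Trainer.prediction_step unless we are both allowed and asked to generate.
if not self.args.predict_with_generate or prediction_loss_only:
    return super().prediction_step(
        model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
    )
```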
This means that (1) predict_with_generate=True needs to be in the training args, but also (2) prediction_loss_only needs to be None or False; otherwise we wouldn't actually predict with generate. prediction_loss_only is automatically set to True by trainer.evaluate when no compute_metrics is provided, which is why compute_metrics is included in the script. Note: we could also subclass Seq2SeqTrainer.prediction_step and pass prediction_loss_only=False on to the superclass for testing purposes.
I run the script like this:
deepspeed --include localhost:1 main_trainer_simple.py
main_trainer_simple.py:
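The script itself was not preserved in this copy of the issue, so here is a minimal sketch of what it needs to contain (the model name, toy dataset, and dummy metric are my assumptions; the essential ingredients are predict_with_generate=True, a compute_metrics function so prediction_loss_only stays False, and the ZeRO-3 config shown after it):

```python
# Minimal sketch of main_trainer_simple.py (assumptions: t5-small, toy data).
from datasets import Dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "t5-small"  # assumption: any T5 checkpoint shows the warnings
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# A tiny toy dataset is enough to drive the evaluation loop.
raw = Dataset.from_dict({
    "src": ["translate English to German: Hello world."] * 8,
    "tgt": ["Hallo Welt."] * 8,
})

def preprocess(batch):
    model_inputs = tokenizer(batch["src"], truncation=True)
    labels = tokenizer(text_target=batch["tgt"], truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

eval_ds = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

# compute_metrics must exist: without it trainer.evaluate() forces
# prediction_loss_only=True and generate() is never reached.
def compute_metrics(eval_pred):
    return {"dummy": 0.0}

args = Seq2SeqTrainingArguments(
    output_dir="out",
    per_device_eval_batch_size=2,
    predict_with_generate=True,
    deepspeed="ds_config3_simple.json",
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    eval_dataset=eval_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

print(trainer.evaluate())
```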
ds_config3_simple.json:
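Likewise a sketch rather than the author's exact file: a minimal ZeRO stage 3 config for the HF Trainer integration (the "auto" values are resolved by transformers from the training arguments):

```json
{
    "zero_optimization": {
        "stage": 3
    },
    "bf16": {
        "enabled": "auto"
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto"
}
```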
Expected behavior
I expected no warnings. The problem also slows down execution; it is currently faster not to use DeepSpeed.
ds_report output
[2024-06-14 16:46:59,976] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] please install triton==1.0.0 if you want to use sparse attention
DeepSpeed C++/CUDA extension op report
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
ninja .................. [OKAY]
op name ................ installed .. compatible
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fp_quantizer ........... [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.3
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/XXX/miniconda3/envs/seq2seqnew2/lib/python3.12/site-packages/torch']
torch version .................... 2.3.1+cu121
deepspeed install path ........... ['/XXX/miniconda3/envs/seq2seqnew2/lib/python3.12/site-packages/deepspeed']
deepspeed info ................... 0.14.3, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.5
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.5
shared memory (/dev/shm) size .... 503.86 GB
Screenshots
[Screenshot of the repeated 'Invalidate trace cache' warnings]
System info (please complete the following information):
OS: Ubuntu 22.04.4 LTS
GPU: 3x RTX A6000 (no difference between single or multi-gpu)
Python version: 3.12.2 | packaged by conda-forge
Transformers version: 4.41.2
Datasets version: 2.20.0
Numpy version: 1.26.4
DeepSpeed version: 0.14.3
Torch version: 2.3.1+cu121
-> All packages are installed within a conda env
Additional context
I also read about DeepSpeed-FastGen/MII, but there is no support for T5 (the model I'm currently using) yet.