
enable cpu offloading of new pipelines on XPU & use device agnostic empty to make pipelines work on XPU #11671


Open
wants to merge 16 commits into base: main

Conversation

yao-matrix
Contributor

No description provided.

@@ -193,7 +193,7 @@ def __init__(
     def enable_xformers_memory_efficient_attention(self, attention_op: Optional[Callable] = None):
         self.decoder_pipe.enable_xformers_memory_efficient_attention(attention_op)

-    def enable_sequential_cpu_offload(self, gpu_id: Optional[int] = None, device: Union[torch.device, str] = "cuda"):
+    def enable_sequential_cpu_offload(self, gpu_id: Optional[int] = None, device: Union[torch.device, str] = None):
Contributor Author

Per the discussion in PR #11288, we change the default to None so that CPU offloading can work on other accelerators like XPU without application code changes.
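For illustration only, a minimal sketch of how a None default can be resolved to whichever accelerator is available (the helper name _resolve_offload_device is hypothetical, not the actual diffusers implementation):

from typing import Optional, Union

import torch


def _resolve_offload_device(device: Optional[Union[torch.device, str]] = None) -> torch.device:
    # Hypothetical helper: when no device is passed, fall back to whichever
    # accelerator backend is available instead of hard-coding "cuda".
    if device is not None:
        return torch.device(device)
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.xpu is the Intel GPU backend in recent PyTorch releases
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.device("xpu")
    return torch.device("cpu")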

Contributor Author

With these changes, cases like tests/pipelines/wuerstchen/test_wuerstchen_combined.py::WuerstchenCombinedPipelineFastTests::test_cpu_offload_forward_pass_twice and tests/pipelines/kandinsky2_2/test_kandinsky_combined.py::KandinskyV22PipelineImg2ImgCombinedFastTests::test_cpu_offload_forward_pass_twice now pass on XPU.
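As a usage sketch (the checkpoint and prompt are illustrative), an application running on XPU no longer needs to pass a device argument at all:

import torch
from diffusers import AutoPipelineForText2Image

# Illustrative checkpoint; any pipeline that supports sequential CPU offload behaves the same.
pipe = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16)

# With the default now None, the available accelerator (CUDA or XPU) is picked up
# automatically; no device argument is required.
pipe.enable_sequential_cpu_offload()

image = pipe(prompt="an astronaut riding a horse").images[0]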

@yao-matrix yao-matrix marked this pull request as draft June 6, 2025 08:00
@yao-matrix yao-matrix marked this pull request as ready for review June 8, 2025 23:55
@yao-matrix
Contributor Author

@a-r-r-o-w @DN6, please help review, thanks very much.

@DN6
Collaborator

DN6 commented Jun 11, 2025

@bot /style

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

-        device_mod = getattr(torch, self.device.type, None)
-        if hasattr(device_mod, "empty_cache") and device_mod.is_available():
-            device_mod.empty_cache()  # otherwise we don't see the memory savings (but they probably exist)
+        empty_device_cache(orig_device_type)

Contributor Author

yao-matrix commented Jun 13, 2025

@DN6, the original code has a bug: empty_cache is never actually executed. The flow is as follows: the device type is checked via self.device.type, which at that point is "cuda" or "xpu", so execution enters the if scope and the module is moved to CPU with .to(). After the move, self.device.type is "cpu", so device_mod resolves to torch.cpu, which has no empty_cache; the subsequent check therefore fails, empty_cache is never called, and the cache is never freed.

I changed the code so that it empties the cache of the original device; please review in case that is not what you intended.

PS: I've linked the PR that introduced the current code here: #4191
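To make the intended ordering concrete, here is a simplified sketch (the empty_device_cache helper below is a stand-in for the utility used in the PR, not its exact implementation): the accelerator's device type is captured before the module is moved to CPU, and that device's cache is then emptied.

import torch


def empty_device_cache(device_type: str) -> None:
    # Simplified stand-in: call empty_cache() on the matching torch backend, if any.
    device_mod = getattr(torch, device_type, None)
    if device_mod is not None and hasattr(device_mod, "empty_cache") and device_mod.is_available():
        device_mod.empty_cache()


def offload_to_cpu(module: torch.nn.Module) -> None:
    # Capture the device type *before* moving the module; after .to("cpu") the module's
    # device is "cpu", so looking it up afterwards would never find a cache to empty
    # (the bug described above).
    orig_device_type = next(module.parameters()).device.type
    module.to("cpu")
    if orig_device_type in ("cuda", "xpu"):
        empty_device_cache(orig_device_type)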
