
on-demand loading of text encoders #1509

Draft

dxqb wants to merge 8 commits into Nerogar:master from dxqb:ondemand-base

dxqb (Collaborator) commented Jun 6, 2026

Summary

Text encoders normally sit in RAM and are moved to VRAM only for caching and sampling.
This PR introduces a mechanism that skips loading a text encoder up front and instead loads it from disk directly onto the GPU whenever it is needed.
The Lens model needs this, because the quantized GPT-OSS encoder apparently cannot be moved between CPU and GPU: microsoft/Lens#11

It could also be useful for other models, to save RAM, but this PR does not enable it for any model other than Lens.
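The mechanism above can be pictured as a persistent proxy: callers keep one long-lived object, the real encoder is built directly on the target device only when needed, and it is dropped afterwards instead of being parked in RAM. The following is a minimal, PyTorch-free sketch in the spirit of the OnDemandModule named in this PR's commit messages, not the PR's implementation. The `loader` callable (standing in for "load weights from disk onto this device"), the `load()`/`unload()` method names, and `FakeEncoder` are all hypothetical.

```python
from typing import Any, Callable, List, Optional


class OnDemandProxy:
    """Hypothetical delegating proxy; the wrapped module exists only while loaded."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # loader(device) is assumed to build the module directly on `device`.
        self._loader = loader
        self._module: Optional[Any] = None

    @property
    def is_loaded(self) -> bool:
        return self._module is not None

    def load(self, device: str) -> "OnDemandProxy":
        # Build the module only if it is not already present.
        if self._module is None:
            self._module = self._loader(device)
        return self

    def unload(self) -> None:
        # Drop the only strong reference so the weights can be freed,
        # instead of moving them to a CPU temp device.
        self._module = None

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes the proxy itself does not have.
        module = self.__dict__.get("_module")
        if module is None:
            raise RuntimeError(f"module not loaded; cannot access {name!r}")
        return getattr(module, name)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._module is None:
            raise RuntimeError("module not loaded")
        return self._module(*args, **kwargs)


class FakeEncoder:
    """Stand-in for a text encoder; records the device it was built on."""

    def __init__(self, device: str) -> None:
        self.device = device

    def __call__(self, text: str) -> str:
        return f"encoded({text}) on {self.device}"


loads: List[str] = []


def load_encoder(device: str) -> FakeEncoder:
    loads.append(device)  # count how often we "read from disk"
    return FakeEncoder(device)


te = OnDemandProxy(load_encoder)
te.load("cuda")
print(te("a cat"))   # encoded(a cat) on cuda
print(te.device)     # cuda (delegated to the wrapped encoder)
te.unload()
print(te.is_loaded)  # False
```

Keeping the proxy object alive while the module itself comes and goes lets the rest of the code hold a stable reference, which is presumably why the commit describes the proxy as persistent.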

Includes #1476.

Test plan

  • pre-commit run --all-files passes
  • Launched the affected UI or script and exercised the change
  • Tested with at least one real preset / config when relevant (note which: Lens)

AI assistance

  • Early AI prototype: opened for discussion, not ready for review

dxqb and others added 4 commits May 25, 2026 18:11
…model composition in ModelType

- Gradient checkpointing and layer offloading are now configured per component
  (text encoder, transformer, VAE) rather than globally
- ModelType centralizes model composition and training method associations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
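Moving from global to per-component settings means each component carries its own gradient checkpointing and offloading options. A minimal sketch, assuming a simple dataclass layout: `offload_fraction` borrows the per-component field name mentioned in a later merge commit in this PR, while `ComponentConfig`, the `TrainConfig` fields shown, and `component()` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentConfig:
    """Hypothetical per-component settings."""
    gradient_checkpointing: bool = False
    offload_fraction: float = 0.0  # share of layers offloaded from the GPU


@dataclass
class TrainConfig:
    """Hypothetical config holding one ComponentConfig per model component."""
    text_encoder: ComponentConfig = field(default_factory=ComponentConfig)
    transformer: ComponentConfig = field(default_factory=ComponentConfig)
    vae: ComponentConfig = field(default_factory=ComponentConfig)

    def component(self, name: str) -> ComponentConfig:
        # Look up a component's settings by attribute name.
        return getattr(self, name)


# Checkpoint and half-offload only the transformer; other components keep defaults.
config = TrainConfig(
    transformer=ComponentConfig(gradient_checkpointing=True, offload_fraction=0.5)
)
for name in ("text_encoder", "transformer", "vae"):
    c = config.component(name)
    print(name, c.gradient_checkpointing, c.offload_fraction)
```

With a single global switch, the text encoder and VAE would be forced into the same checkpointing/offloading regime as the transformer; per-component settings avoid that.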
Introduces OnDemandModule, a persistent delegating proxy for text encoders
that must be loaded on demand and freed after use rather than parked on the
CPU temp device. Adds load_on_demand per-component config and four
text_encoder_N_on_demand() resolvers in TrainConfig.

BaseModel.to(device) is removed as an abstract method; release() is now
the sole abstract method for parking a model. Each concrete model reads
self.train_config.temp_device directly. Call sites in modelSetup,
dataLoader, trainer, and SampleWindow are updated to model.release().

Co-Authored-By: dxqb <183307934+dxqb@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
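The commit above combines three ideas: a per-component `load_on_demand` flag, `text_encoder_N_on_demand()` resolvers on `TrainConfig`, and `release()` as the single abstract parking hook that reads `temp_device` from the config. A sketch of how they might fit together, assuming trivial resolvers that just read the flag (the real ones may apply more logic) and a model that tracks component placement as strings rather than moving real tensors. `ComponentConfig`, the `text_encoders` list, `ExampleModel`, and `placement` are hypothetical; only `load_on_demand`, `temp_device`, `release()`, and the resolver naming pattern come from the commit message.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ComponentConfig:
    load_on_demand: bool = False  # per-component flag named in the commit


@dataclass
class TrainConfig:
    temp_device: str = "cpu"
    # Hypothetical: four text encoder slots, each with its own flag.
    text_encoders: List[ComponentConfig] = field(
        default_factory=lambda: [ComponentConfig() for _ in range(4)]
    )

    # Hypothetical resolvers following the text_encoder_N_on_demand() pattern.
    def text_encoder_1_on_demand(self) -> bool:
        return self.text_encoders[0].load_on_demand

    def text_encoder_2_on_demand(self) -> bool:
        return self.text_encoders[1].load_on_demand

    def text_encoder_3_on_demand(self) -> bool:
        return self.text_encoders[2].load_on_demand

    def text_encoder_4_on_demand(self) -> bool:
        return self.text_encoders[3].load_on_demand


class BaseModel(ABC):
    """release() is the only abstract hook for parking a model."""

    def __init__(self, train_config: TrainConfig) -> None:
        self.train_config = train_config

    @abstractmethod
    def release(self) -> None:
        """Park or free every component that is not needed right now."""


class ExampleModel(BaseModel):
    """Hypothetical model tracking where each component lives."""

    def __init__(self, train_config: TrainConfig) -> None:
        super().__init__(train_config)
        # None means "not loaded at all".
        self.placement: Dict[str, Optional[str]] = {
            "text_encoder": "cuda",
            "transformer": "cuda",
        }

    def release(self) -> None:
        temp_device = self.train_config.temp_device  # read directly from config
        for name in self.placement:
            if name == "text_encoder" and self.train_config.text_encoder_1_on_demand():
                self.placement[name] = None  # freed; reloaded from disk on next use
            else:
                self.placement[name] = temp_device


config = TrainConfig()
config.text_encoders[0].load_on_demand = True
model = ExampleModel(config)
model.release()  # call sites use this where they used to call model.to(temp_device)
print(model.placement)  # {'text_encoder': None, 'transformer': 'cpu'}
```

Making `release()` the only abstract method lets each model decide per component whether "parking" means moving to the temp device or freeing entirely, which a generic `to(device)` cannot express.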
dxqb mentioned this pull request Jun 6, 2026
dxqb added the preview label (merged in the preview branch) Jun 13, 2026

dxqb: This comment was marked as resolved.

dxqb: This comment was marked as resolved.

dxqb and others added 2 commits June 17, 2026 20:48
Several models' release() forwarded self.train_config.temp_device (a str)
directly to *_to() methods typed as device: torch.device. This crashes
inside LayerOffloadConductor.to() when layer/block-swap offloading is
enabled, because it accesses device.type. nn.Module.to() accepts a str,
so the bug stayed latent in runs without offloading enabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
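The bug above is a type mismatch that only surfaces on code paths that read `device.type`. A PyTorch-free sketch of the failure and of normalizing at the boundary: `Device` is a stand-in for `torch.device` (in real code the conversion would presumably be `torch.device(temp_device)`, which accepts strings such as `"cpu"` or `"cuda:0"`), and `OffloadConductorSketch` and `as_device()` are hypothetical, not OneTrainer's code.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Device:
    """Stand-in for torch.device so the sketch runs without PyTorch."""
    type: str
    index: Optional[int] = None

    @classmethod
    def parse(cls, spec: str) -> "Device":
        # "cuda:1" -> Device("cuda", 1); "cpu" -> Device("cpu", None)
        kind, _, idx = spec.partition(":")
        return cls(kind, int(idx) if idx else None)


class OffloadConductorSketch:
    """Hypothetical stand-in for LayerOffloadConductor: it reads device.type."""

    def to(self, device: Device) -> str:
        return f"moving offloaded layers to {device.type}"


def as_device(device: Union[str, Device]) -> Device:
    # Normalize once, since temp_device is stored as a str in the config.
    return Device.parse(device) if isinstance(device, str) else device


conductor = OffloadConductorSketch()
temp_device = "cpu"  # like self.train_config.temp_device

try:
    conductor.to(temp_device)  # buggy path: a str has no .type attribute
except AttributeError as err:
    print("crash:", err)

print(conductor.to(as_device(temp_device)))  # moving offloaded layers to cpu
```

Converting at the call site in `release()` keeps the `device: torch.device` annotation on the `*_to()` methods truthful, rather than making every downstream consumer tolerate strings.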
Resolves conflict in the Flux2 LoRA 8GB preset: keeps this branch's
per-component offload_fraction scheme and drops the superseded
top-level gradient_checkpointing/layer_offload_fraction fields, while
picking up master's dynamic_timestep_shifting addition.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dxqb and others added 2 commits June 18, 2026 00:24
…) rename

The rename to release() in this PR accidentally dropped the eval() call
that used to follow to(temp_device) before caching and before sampling.
Without it, the model stays in train() mode during in-training sampling,
which breaks models whose forward pass branches on self.training (e.g.
HiDream's unpatchify).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
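Why the dropped `eval()` matters: parking or freeing a model does not change its training flag, so a forward pass that branches on `self.training` keeps taking the training path during sampling. A minimal sketch with a stand-in for `torch.nn.Module`'s `train()`/`eval()` flag handling (the real methods also recurse into child modules); `BranchingModel` and `before_sampling()` are hypothetical.

```python
class ModuleSketch:
    """Minimal stand-in for torch.nn.Module's train()/eval() flag."""

    def __init__(self) -> None:
        self.training = True

    def train(self, mode: bool = True) -> "ModuleSketch":
        self.training = mode
        return self

    def eval(self) -> "ModuleSketch":
        return self.train(False)


class BranchingModel(ModuleSketch):
    """Hypothetical model whose forward pass branches on self.training."""

    def forward(self) -> str:
        return "training path" if self.training else "inference path"

    def release(self) -> None:
        # Parks or frees components; it does not touch self.training.
        pass


def before_sampling(model: BranchingModel) -> None:
    model.release()
    model.eval()  # without this call, sampling would use the training path


model = BranchingModel()
before_sampling(model)
print(model.forward())  # inference path
model.train()  # switch back before training resumes
```

Keeping `eval()` as a separate call next to `release()`, rather than hiding it inside `release()`, makes the mode switch explicit at the two places that need it: caching and sampling.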
# Conflicts:
#	modules/modelSetup/BaseErnieSetup.py
#	modules/modelSetup/BaseWuerstchenSetup.py
#	modules/util/checkpointing_util.py

Labels

preview: merged in the preview branch

1 participant