Make gradient checkpointing and offloading per-component by dxqb · Pull Request #1476 · Nerogar/OneTrainer

dxqb · 2026-05-25T16:14:31Z

fixes [Bug]: Activations offloading depends on layer offload fraction #1136 and [Feat]: Separate offload settings for text encoder #980
includes fix: Flux2 UI — sequence length and caption dropout labels no longer overlap #1460 because it was in the way
required for Microsoft Lens, because it has a text encoder that cannot be (easily) offloaded
some general testing
testing of SD VAE training because this PR touches it and it's rarely ever tested

dxqb · 2026-05-25T16:41:29Z

Claude:

activation_offloading should default to False — it was always default-True but only took effect with layer offloading; now it works standalone, so a fresh fine-tune offloads activations out of the box.
"Layer Offload Fraction" is shown for CLIP encoders but is a no-op — the CLIP setup discards its conductor, so the value never drives anything; don't render the field for CLIP (or wire it up).
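To make the first point concrete, here is a minimal sketch of the behaviour change. PartConfig and both helper functions are hypothetical names invented for illustration; only activation_offloading and offload_fraction come from the discussion, and the actual OneTrainer code is likely structured differently.

```python
from dataclasses import dataclass


@dataclass
class PartConfig:
    # Hypothetical per-component settings (illustrative, not the real TrainConfig).
    offload_fraction: float = 0.0
    activation_offloading: bool = False  # the default proposed in the review


def offloads_activations_before(part: PartConfig) -> bool:
    # Assumed old behaviour: the flag only took effect together with layer offloading.
    return part.activation_offloading and part.offload_fraction > 0


def offloads_activations_after(part: PartConfig) -> bool:
    # Assumed new behaviour: the flag works on its own.
    return part.activation_offloading


# A config that keeps the old default (True) and uses no layer offloading.
legacy_default = PartConfig(activation_offloading=True)
print(offloads_activations_before(legacy_default))  # False
print(offloads_activations_after(legacy_default))   # True
```

Under these assumptions, a config that never touched the setting would start offloading activations after the change, which is why the review suggests flipping the default.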

- BaseAnimaSetup: per-component checkpointing_or_offloading_enabled(), remove weight_list from create_autocast_context / disable_fp16_autocast_context
- AnimaFineTune/LoRASetup: latent_caching → image_caching / text_caching
- ModelType: add ANIMA to _MODEL_PARTS and supported_training_methods

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


In the upstream TrainingTab.py (PR #1476), config is stored as self.train_config on the view. In preview's Base*/controller pattern it lives on controller.config. One call site in __setup_stable_diffusion_ui was translated incorrectly during the bec207a merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


dxqb · 2026-06-14T18:30:03Z

Claude: Found while testing the preview branch — the caching_threads > 1 + layer-offloading guard in create_data_loader (introduced here, with a TODO: narrow this to the cached components only) is now overly broad:

if config.caching_threads > 1 and any(part.offload_fraction > 0 for part in config.model_part_configs()):
    raise RuntimeError('layer offloading can not be activated if "caching_threads" > 1')

This rejects the config if any part has layer offloading enabled — including transformer/unet/prior/unconditional_transformer, none of which run inside the caching dataloader's worker threads (only the components that actually produce a cache do: text encoder(s) for text caching, VAE for image/latent caching).

In practice, after the per-component split:

- Only text_encoder / text_encoder_2 / text_encoder_3 / text_encoder_4 (depending on model_type.model_parts()) expose an "Offload" UI control (__create_offloading_widgets in BaseTrainingTabView.py). vae does not (__create_vae_frame has no offloading widgets), so vae.offload_fraction is always 0 in practice.
- So the check should really be: does any text encoder part that's actually used (and being cached) have offload_fraction > 0?

This means the current check blocks perfectly valid configs — e.g. layer offloading the transformer with caching_threads > 1 — even though that combination is fine, since the transformer's conductor never runs in a caching worker thread.

Suggested narrowing: only check text-encoder parts, e.g.

if config.caching_threads > 1 and any(
    getattr(config, name).offload_fraction > 0
    for name in config.model_type.model_parts()
    if name.startswith("text_encoder")
):
    raise RuntimeError('layer offloading can not be activated for a text encoder if "caching_threads" > 1')
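The narrowed condition can be exercised in isolation. This is a sketch under stated assumptions: Part, the SimpleNamespace config, and caching_guard_violated are stand-ins invented for illustration, and the list of part names is passed in explicitly rather than read from config.model_type.model_parts().

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Part:
    # Hypothetical stand-in for a per-component config section.
    offload_fraction: float = 0.0


def caching_guard_violated(config, part_names) -> bool:
    # Same condition as the suggested narrowing: only text-encoder parts count.
    return config.caching_threads > 1 and any(
        getattr(config, name).offload_fraction > 0
        for name in part_names
        if name.startswith("text_encoder")
    )


parts = ["text_encoder", "vae", "transformer"]

# Transformer offloading with several caching threads: accepted by the narrowed check.
config = SimpleNamespace(
    caching_threads=4,
    text_encoder=Part(0.0),
    vae=Part(0.0),
    transformer=Part(0.5),
)
print(caching_guard_violated(config, parts))  # False

# Text-encoder offloading with several caching threads: still rejected.
config.text_encoder = Part(0.3)
print(caching_guard_violated(config, parts))  # True
```

The broader check from create_data_loader would have rejected the first config as well, which is the case the comment describes as valid.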


Anima, Lens, and Ideogram setup files used the pre-#1476/#1462 4-arg create_autocast_context/disable_fp16_autocast_context (with a weight-dtype list), the old config.gradient_checkpointing.enabled() global check, and the renamed config.latent_caching field. Update them to the current 3-arg autocast helpers, per-part checkpointing via enable_checkpointing_for_*, and config.image_caching/config.text_caching.


Add a _MODEL_PARTS table + ModelType.model_parts() as the single source of truth for which components each model type has, keyed by TrainConfig field names, and a ModelType.supported_training_methods() that enumerates every type explicitly, raising on an unknown type rather than defaulting. Collapse ModelTab's per-type __setup_*_ui methods into one __setup_ui that derives the has_* widget flags from model_parts(), and collapse TopBar's per-type training-method dispatch to build its dropdown from supported_training_methods().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
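The pattern this commit message describes might look roughly like the sketch below. The enum members, the part tuples, TrainingMethod, and widget_flags are all hypothetical placeholders; only the names _MODEL_PARTS, model_parts(), supported_training_methods(), and the has_* flag convention come from the commit message.

```python
from enum import Enum, auto


class TrainingMethod(Enum):
    FINE_TUNE = auto()
    LORA = auto()


class ModelType(Enum):
    # Hypothetical members; the real enum lists actual model families.
    SINGLE_ENCODER = auto()
    DUAL_ENCODER = auto()
    NOT_YET_LISTED = auto()

    def model_parts(self) -> tuple[str, ...]:
        # Single source of truth: looked up in the table defined below.
        return _MODEL_PARTS[self]

    def supported_training_methods(self) -> list[TrainingMethod]:
        # Every type is enumerated explicitly; unknown types raise instead of defaulting.
        if self is ModelType.SINGLE_ENCODER:
            return [TrainingMethod.FINE_TUNE, TrainingMethod.LORA]
        if self is ModelType.DUAL_ENCODER:
            return [TrainingMethod.LORA]
        raise ValueError(f"no training methods registered for {self.name}")


# Keyed by (assumed) TrainConfig field names.
_MODEL_PARTS = {
    ModelType.SINGLE_ENCODER: ("text_encoder", "unet", "vae"),
    ModelType.DUAL_ENCODER: ("text_encoder", "text_encoder_2", "transformer", "vae"),
}


def widget_flags(model_type: ModelType) -> dict[str, bool]:
    # Derive has_* flags from model_parts() instead of one setup method per type.
    all_parts = sorted({p for parts in _MODEL_PARTS.values() for p in parts})
    present = model_type.model_parts()
    return {f"has_{name}": name in present for name in all_parts}


print(widget_flags(ModelType.SINGLE_ENCODER)["has_text_encoder_2"])  # False
print(widget_flags(ModelType.DUAL_ENCODER)["has_text_encoder_2"])    # True
```

Raising on an unlisted type means a newly added model family fails loudly until it is registered, rather than silently inheriting a default set of training methods.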

Rebased onto centralize-model-type: this is the offloading-only part of split-offload, with the model-composition centralization (ModelType, ModelTab, TopBar) excluded since it already landed separately.

dxqb linked an issue May 25, 2026 that may be closed by this pull request

[Feat]: Separate offload settings for text encoder #980

Open

dxqb added the "preview" label (merged in the preview branch) May 29, 2026

dxqb added a commit to TheForgotten69/OneTrainer that referenced this pull request Jun 3, 2026

Merge PR Nerogar#1476 (Make gradient checkpointing and offloading per…

bec207a

…-component) into preview

dxqb added a commit that referenced this pull request Jun 4, 2026

Merge PR #1476 (Make gradient checkpointing and offloading per-compon…

79c68f5

…ent) into preview

dxqb added a commit that referenced this pull request Jun 4, 2026

Remove OffloadingWindow: superseded by per-component offloading from #…

fc363ff

…1476

dxqb mentioned this pull request Jun 6, 2026

on-demand loading of text encoders #1509

Draft

3 tasks

dxqb added a commit that referenced this pull request Jun 14, 2026

Merge PR #1476 (Make gradient checkpointing and offloading per-compon…

bd84842

…ent) into preview # Conflicts: # modules/ui/ModelTab.py # modules/ui/TopBar.py # modules/ui/TrainUI.py # modules/ui/TrainingTab.py

dxqb added a commit that referenced this pull request Jun 19, 2026

Merge PR #1476 (Make gradient checkpointing and offloading per-compon…

804d838

…ent) into preview

dxqb force-pushed the split-offload branch from 2a891b7 to f4b7d50 Compare June 20, 2026 15:46

Add per-component offloading/checkpointing from PR Nerogar#1476

f4b7d50


dxqb wants to merge 2 commits into Nerogar:master from dxqb:split-offload
