Make gradient checkpointing and offloading per-component #1476
Claude:
- BaseAnimaSetup: per-component checkpointing_or_offloading_enabled(), remove weight_list from create_autocast_context / disable_fp16_autocast_context
- AnimaFineTune/LoRASetup: latent_caching → image_caching / text_caching
- ModelType: add ANIMA to _MODEL_PARTS and supported_training_methods

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-component) into preview
In the upstream TrainingTab.py (PR #1476), config is stored as self.train_config on the view. In preview's Base*/controller pattern it lives on controller.config. One call site in __setup_stable_diffusion_ui was translated incorrectly during the bec207a merge.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude: Found while testing the …

    if config.caching_threads > 1 and any(part.offload_fraction > 0 for part in config.model_part_configs()):
        raise RuntimeError('layer offloading can not be activated if "caching_threads" > 1')

This rejects the config if any part has layer offloading enabled, including …

In practice, after the per-component split: …

This means the current check blocks perfectly valid configs, e.g. layer offloading the transformer with …

Suggested narrowing: only check text-encoder parts, e.g.

    if config.caching_threads > 1 and any(
        getattr(config, name).offload_fraction > 0
        for name in config.model_type.model_parts()
        if name.startswith("text_encoder")
    ):
        raise RuntimeError('layer offloading can not be activated for a text encoder if "caching_threads" > 1')
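For reference, here is a minimal, self-contained sketch of how the narrowed check could behave. `PartConfig`, `Config`, `part_names`, and `validate_offloading` are hypothetical stand-ins invented for this example (the real code reads part names from `config.model_type.model_parts()`); only `caching_threads`, `offload_fraction`, and the `text_encoder` name prefix come from the comment above.

```python
from dataclasses import dataclass, field


@dataclass
class PartConfig:
    # Hypothetical per-component settings; only offload_fraction matters here.
    offload_fraction: float = 0.0


@dataclass
class Config:
    # Hypothetical stand-in for the training config.
    caching_threads: int = 1
    text_encoder: PartConfig = field(default_factory=PartConfig)
    transformer: PartConfig = field(default_factory=PartConfig)
    # Stand-in for config.model_type.model_parts(): field names of the parts.
    part_names: tuple = ("text_encoder", "transformer")


def validate_offloading(config: Config) -> None:
    """Reject multi-threaded caching only when a text encoder is layer-offloaded."""
    offloaded_text_encoders = [
        name
        for name in config.part_names
        if name.startswith("text_encoder")
        and getattr(config, name).offload_fraction > 0
    ]
    if config.caching_threads > 1 and offloaded_text_encoders:
        raise RuntimeError(
            'layer offloading can not be activated for a text encoder '
            'if "caching_threads" > 1'
        )


# Offloading only the transformer is accepted with several caching threads.
validate_offloading(Config(caching_threads=4, transformer=PartConfig(0.5)))

# Offloading the text encoder with several caching threads is rejected.
try:
    validate_offloading(Config(caching_threads=4, text_encoder=PartConfig(0.5)))
    rejected = False
except RuntimeError:
    rejected = True
```

Under these assumptions the transformer-only config passes and the text-encoder config raises, which is the behavior the suggestion is aiming for.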
…ent) into preview

# Conflicts:
#	modules/ui/ModelTab.py
#	modules/ui/TopBar.py
#	modules/ui/TrainUI.py
#	modules/ui/TrainingTab.py
Anima, Lens, and Ideogram setup files used the pre-#1476/#1462 4-arg create_autocast_context/disable_fp16_autocast_context (with a weight-dtype list), the old config.gradient_checkpointing.enabled() global check, and the renamed config.latent_caching field. Update them to the current 3-arg autocast helpers, per-part checkpointing via enable_checkpointing_for_*, and config.image_caching/config.text_caching.
Add a _MODEL_PARTS table + ModelType.model_parts() as the single source of truth for which components each model type has, keyed by TrainConfig field names, and a ModelType.supported_training_methods() that enumerates every type explicitly, raising on an unknown type rather than defaulting.

Collapse ModelTab's per-type __setup_*_ui methods into one __setup_ui that derives the has_* widget flags from model_parts(), and collapse TopBar's per-type training-method dispatch to build its dropdown from supported_training_methods().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
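A rough sketch of the shape this commit describes, not the actual implementation. The enum members, part names, and training-method strings below are placeholders invented for illustration; only `_MODEL_PARTS`, `model_parts()`, `supported_training_methods()`, and the `has_*` flag idea come from the commit message.

```python
from enum import Enum


class ModelType(Enum):
    # Placeholder members; the real enum lists the actual model types.
    EXAMPLE_UNET = "EXAMPLE_UNET"
    EXAMPLE_TRANSFORMER = "EXAMPLE_TRANSFORMER"
    EXAMPLE_UNREGISTERED = "EXAMPLE_UNREGISTERED"

    def model_parts(self) -> tuple:
        # Single source of truth: config field names of this type's components.
        return _MODEL_PARTS[self]

    def supported_training_methods(self) -> list:
        # Every type is enumerated explicitly; there is no silent default.
        if self is ModelType.EXAMPLE_UNET:
            return ["FINE_TUNE", "LORA", "FINE_TUNE_VAE"]
        if self is ModelType.EXAMPLE_TRANSFORMER:
            return ["FINE_TUNE", "LORA"]
        raise NotImplementedError(f"no training methods registered for {self}")


# Looked up at call time, so defining the table after the class is fine.
_MODEL_PARTS = {
    ModelType.EXAMPLE_UNET: ("text_encoder", "unet", "vae"),
    ModelType.EXAMPLE_TRANSFORMER: ("text_encoder", "transformer", "vae"),
}


def widget_flags(model_type: ModelType) -> dict:
    """Derive has_* flags from model_parts() instead of per-type UI methods."""
    all_parts = sorted({p for parts in _MODEL_PARTS.values() for p in parts})
    present = model_type.model_parts()
    return {f"has_{part}": part in present for part in all_parts}


flags = widget_flags(ModelType.EXAMPLE_TRANSFORMER)
methods = ModelType.EXAMPLE_TRANSFORMER.supported_training_methods()
```

With a table like this, a single UI setup routine can read `flags` and a single dropdown builder can read `methods`, and a type that has not been registered fails loudly instead of inheriting a default.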
Rebased onto centralize-model-type: this is the offloading-only part of split-offload, with the model-composition centralization (ModelType, ModelTab, TopBar) excluded since it already landed separately.
fixes [Bug]: Activations offloading depends on layer offload fraction #1136 and [Feat]: Separate offload settings for text encoder #980
includes fix: Flux2 UI — sequence length and caption dropout labels no longer overlap #1460 because it was in the way
required for Microsoft Lens, because it has a text encoder that cannot be (easily) offloaded
some general testing
testing of SD VAE training because this PR touches it and it's rarely ever tested