
Update fork #1

Open · wants to merge 107 commits into base: main

Conversation

@AleHD commented Feb 26, 2025

Opening PR to keep track of upstream changes

Narsil and others added 30 commits February 26, 2025 14:11
* add recommendations for Ascend NPU using flash_attn

* update recommend_message_npu

Co-authored-by: Marc Sun <[email protected]>

---------

Co-authored-by: Marc Sun <[email protected]>
…36395)

* fix: prevent model access error during Optuna hyperparameter tuning

The `transformers.integrations.integration_utils.run_hp_search_optuna` function releases model memory and sets `trainer.model` to `None` after each trial. This causes an `AttributeError` when subsequent `Trainer.train` calls attempt to access the model before reinitialization. The issue only appears when the `fp16_full_eval` or `bf16_full_eval` flags are enabled (a hedged sketch of the guard follows this commit message).

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <[email protected]>

---------

Co-authored-by: Marc Sun <[email protected]>
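
For context, a minimal sketch of the kind of guard this fix describes, assuming standard `Trainer`/`TrainingArguments` attribute names; the actual change lives in `src/transformers/trainer.py` and may differ:

```python
import torch

def maybe_cast_model_for_full_eval(trainer, args):
    """Hypothetical helper: run_hp_search_optuna sets trainer.model to None
    between trials, so only cast the model when it actually exists."""
    if (args.fp16_full_eval or args.bf16_full_eval) and trainer.model is not None:
        dtype = torch.float16 if args.fp16_full_eval else torch.bfloat16
        trainer.model = trainer.model.to(dtype=dtype)
    return trainer.model
```
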
* move `TestAssistedCandidateGeneratorDifferentTokenizers` into a new testing file

* refactor

* NOTHING. add space to rerun github actions tests

* remove it...

* `UniversalSpeculativeDecodingGenerator`

* Use `UniversalSpeculativeDecodingGenerator` when `generation_config.do_sample=True`

* assistant tokenizes only the target's new suffix

* formatting

* fix code

* fix code

* formatting

* add `TestGenerateWithDifferentModels`

* `TestGenerateWithDifferentModels` parameterize on `do_sample`

* `AssistantVocabMapping` & `AssistantVocabMappingCache`

* formatting

* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`

* improve `_get_assistant_to_target_input_ids` & formatting

* renaming

* WIP: debugging `min_new_tokens`

* fix get_target_ids

* `UniversalSpeculativeDecodingGenerator`

* assistant tokenizes only the target's new suffix

* formatting

* fix code

* fix code

* formatting

* `TestGenerateWithDifferentModels` parameterize on `do_sample`

* `AssistantVocabMapping` & `AssistantVocabMappingCache`

* formatting

* `AssistantToTargetTranslator`: `get_target_input_ids` & `get_target_logits`

* improve `_get_assistant_to_target_input_ids` & formatting

* renaming

* WIP: debugging `min_new_tokens`

* fix get_target_ids

* fix device issue

* fix get_assistant_input_ids

* add `TestAssistedCandidateGeneratorDifferentTokenizers`

* formatting

* `AssistantVocabTranslatorCache` refactor & tests

* revert changes in `src/transformers/generation/logits_process.py`

* refactor `AssistedCandidateGenerator`

* refactor `AssistedCandidateGeneratorDifferentTokenizers`

* formatting

* refactor `UniversalSpeculativeDecodingGenerator`

* fix negative value for max_new_tokens

* fix generation length target + attention_mask vs. assistant + attent

* fix device

* fix negative max_new_tokens bug

* fix UAG

* minor

* formatting

* `AssistedCandidateGeneratorDifferentTokenizers` `lookbehind`s init

* resolve conflict & formatting

* rerun CI tests

* remove space...

* remove old code

* fix candidate_input_ids device

* minor

* formatting

* Fix prepare + apply (#7)

* fix prepare + apply

* move to cpu

* simplify suppress_tokens

* fix bugs and refactoring

* device move

* handle self.config.vocab_size > len(target_tokenizer.get_vocab())

* no need to normalize in candidate_generator

* address Nadav's comments + minor

* optimize device move + SuppressTokensLogitsProcessor

* AssistantToTargetTranslator, SuppressTokensLogitsProcessor and tokenizers mapping improvements

* padding size

* padding improvement

* fix and simplify get_target_logits

* renaming in get_target_logits

* minor

* add filter_value and suppress_tokens_id

* style + rename

* remove TODO

* restore original SelectTokensLogitsProcessor with modification

* fix style

* fix _update_past_and_masks and optimize code

* remove assistant_vocab_size arg

* fix attention_mask

* call _prepare_attention_mask also if not has_past_key_values

* handling attention mask for first generation

* comment

* restore test

* remove SelectTokensLogitsProcessor

* _update_past_and_masks implementation for USD

* Add unittests for Universal Assisted generation

* fix style

* update tests

* Remove unused import and fix `test_speculation_depth` test

* exclude special and reserved tokens from tokenizer for UAG

* mv `test_universal_assisted_generation.py` to `generation/test_candidate_generator.py`

* Remove unused imports and fix style using `make style` (#9)

* formatting

* Swap gated `meta-llama/llama-3.2` with `allenai/llama` (#10)

* Fix space sign disagreement (#12)

* default values for AssistantToTargetTranslator fields

* fix space sign

* minor

* fix test + style

* Default values for some fields of assistant to target translator (#11)

* default values for AssistantToTargetTranslator fields

* fix

* add support to empty logit_processors

* Update candidate_generator.py (#15)

fix typo

* BUG fix in _prepare_assistant_input_ids (#14)

* fix _prepare_assistant_input_ids

* target_to_assistant_input_ids

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Nadav Timor <[email protected]>

---------

Co-authored-by: Nadav Timor <[email protected]>

* typo (`target_to_assistant_input_ids`)

* formatting

* merge upstream/main

* Fix minor review comments (#16)

* Fix: `token_ids.to(torch.int64)` (#18)

* tok ids to `torch.int64` (reference: https://huggingface.co/docs/transformers.js/en/api/tokenizers)

* `LongTensor`

* fix dtype

* `assistant_input_ids.to(dtype=torch.long)`

* Remove unused import from test_candidate_generator.py

* Remove unused import from test_candidate_generator.py

* Remove `numpy` import
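
A hypothetical one-liner illustrating the cast referenced in this commit group; the tensor contents below are made up:

```python
import torch

# Token-id tensors are kept as int64 (torch.long), which embedding lookups expect.
assistant_input_ids = torch.tensor([[101, 2023, 2003]])
assistant_input_ids = assistant_input_ids.to(dtype=torch.long)
```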

* resolve pr comments (#19)

* `AssistantToTargetTranslator` docstring

* (per gante's comment) `filter_value` and `suppress_tokens_id` to class constants

* update `AssistantToTargetTranslator` docstring

* (gante's comment) replace `match-case`

* formatting

* Fix Joao's comments (#21)

* remove threading

* fix logits_processor

* fix test device

* fix style (#23)

* Move atm (#24)

* move AssistantToTargetTranslator

* fixup

* fix logit_processor

* add atm_translator test

* refactor test

* remove threading from test

* add require_torch in tests

* move AssistantVocabTranslatorCache + add tests

* ruff fix

---------

Co-authored-by: jmamou <[email protected]>
Co-authored-by: Gaurav <[email protected]>
Co-authored-by: Gaurav Jain <[email protected]>
Co-authored-by: gauravjain14 <[email protected]>
* fix config

* update

---------

Co-authored-by: Marc Sun <[email protected]>
* clean code

* oups

* fix merge

* yups

* fix if

* now you can play

* fix shape issue

* try non blocking

* fix

* updates

* up

* updates

* fix most of the tests

* update

* update

* small updates

* up

* fix the remaining bug?

* update

* rename when you read from the file

* buffer issues

* current status

* cleanup

* properly allocate dumb memory

* update a small bug

* fix colwise rep issue

* fix keep in float 32 that was keeping everything in float 32

* typo

* more fixes with keep_in_fp32_modules as we used to search on it

* fix ROPE dtype for TP

* remove what's breaking the tests

* updates

* update and fixes

* small cleanup after merging

* allocate 2x to be safe

* style, auto

* update

* yup nit

* fix

* remove slow as fuck torch api :(

* work

* fixup

* update

* bringing the fix back

* fix and update

* fixes

Co-authored-by: Marc Sun <[email protected]>

* updates because some suggestions were wrong 👀

* update?

* fuck this bloated function

* typo

* fix the dumb prefix thing once and for all

* fixes here and there

* updates

* remove prints

* fix strict cases

* style

* properly fix keys on load!

* update

* fix base model prefix issue

* style

* update

* fix all?

* remove 1 print

* fix the final tests

* fixup

* last nits

* fix the detach issue which caused a 2x slowdown

* fixup

* small fixes

* ultra nit

* fix

* fix

---------

Co-authored-by: Marc Sun <[email protected]>
* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

* draft

---------

Co-authored-by: ydshieh <[email protected]>
fix permission

Co-authored-by: ydshieh <[email protected]>
* fix permission

* fix permission

---------

Co-authored-by: ydshieh <[email protected]>
fix permission

Co-authored-by: ydshieh <[email protected]>
* Skip collecting duplicated weight

* format
* test

* docstring

* prepare distributed cache data

* fix cat dim

* test mvp

* add test checks

* like this?

* working test and solution

* nit

* nit

* add shape info
* Lazy import libraries in `src/transformers/image_utils.py`

* `make fixup`

Signed-off-by: Harry Mellor <[email protected]>

* Protect imports

Signed-off-by: Harry Mellor <[email protected]>

---------

Signed-off-by: Harry Mellor <[email protected]>
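
An illustrative sketch of the "protect imports" pattern these commits refer to, assuming the usual transformers availability helper; this is not the exact diff:

```python
from transformers.utils import is_vision_available

# Optional dependencies are imported lazily, so importing image_utils does not
# fail in environments where they are missing.
if is_vision_available():
    import PIL.Image
```
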
* cry

* trigger

---------

Co-authored-by: ydshieh <[email protected]>
* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher

* More updates

* More updates

* fixed: GroundingDinoLoss

* fixed: failing tests

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: amyeroberts <[email protected]>

* Addressed comments

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <[email protected]>

* add: cardinality loss and make box loss as copy from

* change: default for reduction loss is sum

* fix: vectorized generate fake box

* fix copies

* Addressed comments

* addressed comments

* addressed one-hot

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: Sangbum Daniel Choi <[email protected]>

* Addressed comments

* fixed test

* Update src/transformers/models/grounding_dino/modeling_grounding_dino.py

* Update tests/models/grounding_dino/test_modeling_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <[email protected]>

* Starting to fix GroundingDinoLoss and GroundingDinoHungarianMatcher

* More updates

* More updates

* fixed: GroundingDinoLoss

* add: cardinality loss and make box loss as copy from

* fix copies

* Revert "Update tests/models/grounding_dino/test_modeling_grounding_dino.py"

This reverts commit aa74c4c57c430e54cc74c414d6269edb65c73e83.

* [run-slow] groundigdino

* remove nestedtensor

* [run-slow] groundig_dino

* [run-slow] grounding_dino

* [run-slow] grounding_dino

* [run-slow] grounding_dino

* check

* check

* add: encoder intermediate outputs to ImageLoss forward

* add: GroundingDinoForObjectDetectionLoss in the loss directory

* make style

* fix the loss function

* remove class_reduction since sum is the default

* remove class_reduction

* Update src/transformers/loss/loss_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <[email protected]>

* simple fix

* Update src/transformers/loss/loss_grounding_dino.py

Co-authored-by: Pavel Iakubovskii <[email protected]>

* minor fix

* Update src/transformers/loss/loss_for_object_detection.py

---------

Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: Sangbum Daniel Choi <[email protected]>
Co-authored-by: Pavel Iakubovskii <[email protected]>
Co-authored-by: sangbumchoi <[email protected]>
Co-authored-by: ydshieh <[email protected]>
* Fix loading model with mismatched sizes

* trigger tests
* refactor image processor slow got ocr

* add working image processor fast

* fix fast image processor, update doc

* use one big loop for processing patches
* fix

* style

* better allocation

* fix

* fix

* style

* revert disk

* exit

* style

* return if nothing to cache

* dtensor guard

* fix regression

* fix regression

* fix

* fix
* Fix _load_state_dict_into_meta_model with device_map=None

* Update src/transformers/modeling_utils.py
* Check if fixes

* Fix zero3 loading

* Quality

* Fix marc nit

* Add fast tests

* Migrate to integrations.deepspeed rather than modeling_utils

* Style
* fix

* repush

---------

Co-authored-by: ydshieh <[email protected]>
transformers/image_processing_utils.py:41: UserWarning: The following named arguments are not valid for `SamImageProcessor.preprocess` and were ignored: 'point_pad_value'
* fix regression

* fix param

* fix load_state_dict

* style

* better fix for module

* fix tests

* quick fix for now

* rm print
chore: fix message descriptions in arguments and comments
* Fix pipeline-peft interaction

* once again you have committed a debug breakpoint

* Remove extra testing line

* Add a test to check adapter loading

* Correct adapter path

* make fixup

* Remove unnecessary check

* Make check a little more stringent
* Fix edge case for continue_final_message

* lstrip() correctly

* Add regression test

* Add a clearer error message when the final message is not present

* Add a clearer error message when the final message is not present

* Fix massive bug!
Cyrilvallez and others added 30 commits March 12, 2025 13:39
* squash everything together
start to simplify inner logic

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

continue refactor

fix

small fixes

add type hints/docstring

Update modeling_utils.py

remove _fast_init

keep improving

Update modeling_utils.py

Update modeling_utils.py

new first tp loading version

style

fix weird in-place op

trigger CIs

Update modeling_utils.py

much clearer renaming of keys

fix

update

Update test_modeling_common.py

trigger CIs

update

update

style

Update modeling_utils.py

Update modeling_utils.py

Update modeling_utils.py

fix

fast download first prototype

remove old function

remove old functions

Remove unused function and move back _get_tp_registry

fix tp plan registry

simplify

CIs

Update hub.py

Update modeling_utils.py

simplify

simplify renaming logic

remove unused check

add sanity check back (a test depends on it)

Update modeling_utils.py

finalize sound renaming logic

style

add forgotten check

Update modeling_utils.py

add key_mapping keyword

style

Update modeling_utils.py

add comment

minor updates

minor change for clarity

fix small prefix issue and simplify

style

trigger CIs

typo fix

Post rebase fix

post rebase cleanup

simplify tp

typo

oupsi

typo

correctly escape

improvements based on Marc's review

finalize Marc's review comments

 squash everything

* improve

* Update modeling_utils.py

* Update modeling_utils.py

* fix

* Update modeling_utils.py

* Update modeling_utils.py

* style

* Update modeling_utils.py

* simplify

* style

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix dtype issue

* Update modeling_utils.py

* style

* remove test that does not make sense

* style

* small fixes

* style

* fix

* cleanup after rebase

* style

* typo

* escape

* tp for task specific top modules

* Update modeling_utils.py

* Update modeling_utils.py

* fix allocation

* CIs

* CIs

* CIs

* improve docstring

* CIs

* Update modeling_utils.py

* fix
* Don't accidentally mutate the base_model_tp_plan

* Co-authored-by: Joao Gante <[email protected]>

* Trigger tests

* Marking grad accum test as slow

* Add a flaky decorator

* Add a flaky decorator

* Use cyril's codeblock

* Don't copy() when it's None

* Use cyril's new codeblock

* make fixup
…fast ones (#36266)

* Add fast image processor class to processors supporting them

* fix test kosmos2
…processors (#36186)

* Remove differences between init and preprocess kwargs in fast image processors

* make modifs got_ocr2

* update gemma3
* refactor siglip2 fast image processor, add unused_kwargs in base fast image processor

* nits

* change unused_kwargs default to None

* update siglip2 fast image proc
* fix fused rescale normalize inconsistencies

* fix siglip2 fast image processor

* refactor kwargs validation and fused normalize rescale

* cleanup kwargs handling in preprocess

* update new procs after refactor
* fix

* switch to ellipsis instead

* Add co-author
Co-authored-by: fxmarty-amd <[email protected]>

* Add co-author second try
Co-authored-by: fxmarty-amd <[email protected]>
* fix wandb hp search unable to resume from sweep_id

* format styles

---------

Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
* update

* small update

* no spqr quant

* testing

* testing

* test nightly

* gptqmodel

* flute

* fix hadamard

* running tests

* new docker

* fix docker

* run tests

* testing new docker

* new docker

* run tests

* new docker

* run tests

* final test

* update

* update

* run tests

* new docker

* launch tests

* test_docker

* running tests

* add comments

* fixing yml

* revert
…e kwargs (#36207)

Change qwen2VL image processors to have init and call accept the same kwargs

Corrects the type annotation to match actual usage. The variable was typed as
Dict[str, Dict[str, Callable]] but is actually used as Dict[str, Callable]
where keys are attention mechanism names and values are the corresponding
attention functions directly. This change makes the type annotation consistent
with how the dictionary is used in the codebase.
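
A hedged sketch of the corrected annotation; the dictionary name and contents below are placeholders for illustration, not the exact source:

```python
from typing import Callable, Dict

# Each attention mechanism name maps directly to its implementation, so the
# value type is Callable rather than a nested Dict[str, Callable].
ATTENTION_FUNCTIONS: Dict[str, Callable] = {
    "eager": lambda *args, **kwargs: None,  # placeholder
    "sdpa": lambda *args, **kwargs: None,   # placeholder
}
```
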
* Update tensor_parallel.py

* CIs
* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module

* chore: fix typos in utils module
* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* trigger CIs

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* Update test_modeling_utils.py

* better error messages

* Update test_modeling_utils.py

* Update test_modeling_utils.py
* add gguf support to t5encoder

Signed-off-by: Isotr0py <[email protected]>

* fix

Signed-off-by: Isotr0py <[email protected]>

* remove gguf from model_kwargs

Signed-off-by: Isotr0py <[email protected]>

---------

Signed-off-by: Isotr0py <[email protected]>
* make fixup

* make fixup

* Correct skip decorator

* Add TODOs

* add is_flaky() parentheses
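
For reference, how the flaky marker mentioned above is typically applied in the test suite; `is_flaky` is a decorator factory, hence the parentheses this commit adds:

```python
from transformers.testing_utils import is_flaky

@is_flaky()  # retries the test a few times before reporting a failure
def test_sometimes_nondeterministic():
    assert True
```
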
* add support for fast image processors in add-new-model-like

* fix header not found add-fast-image-processor-cli

* Encourage adding fast image processor

* nit

* start improve doc

* update docs

* make requested modifs
* fix typo when  is on

* tiny

* add test and remove 'text_crops'

* lint
* Make the flaky list a little more general

* Trigger tests

* Make the flaky list a little more general
* Cleanup the regex used for doc preprocessing

* Run tests
* don't gc collect if 1 shard is used

* delete state dict anyways
* Set best_model_checkpoint only when ckpt exists.

Rather than setting it explicitly without checking whether the checkpoint directory even exists, the setting logic now lives inside `_save_checkpoint` and only assigns `best_model_checkpoint` when the checkpoint is actually on disk (see the sketch after this commit's list of changes).

* Added best_global_step to TrainerState.

* Added tests for best_model_checkpoint.

* Fixed hard-coded values in test to prevent failures.

* Added helper func and removed hard-coded best_step.

* Added side effect patch generator for _eval.

* Added evaluate side effect func.

* Removed erroneous patching.

* Fixed minor bug.

* Applied Ruff.

* Fixed Ruff problem in make style.

* Used Trainer.set_initial_training_values.
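
An illustrative-only sketch of the behaviour described above, assuming `TrainerState`-style attribute names; this is a simplification, not the real `_save_checkpoint`:

```python
import os

def _save_checkpoint(trainer, output_dir: str) -> None:
    checkpoint_dir = os.path.join(output_dir, f"checkpoint-{trainer.state.global_step}")
    # ... checkpoint files are written to checkpoint_dir here ...
    # Only record best_model_checkpoint when the directory actually exists on disk.
    if trainer.state.best_metric is not None and os.path.isdir(checkpoint_dir):
        trainer.state.best_model_checkpoint = checkpoint_dir
        trainer.state.best_global_step = trainer.state.global_step
```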