[FEAT] Integrate BD-LoRA into PEFT #2895

base: main

Conversation
Thanks for your work on integrating block-diagonal LoRA into PEFT. I haven't checked the details yet, but wanted to give a first, high-level feedback.

1. Parallelism

The main advantage of BD-LoRA compared to LoRA would be the better parallelism story. Did you verify that it works with the proposed implementation? I think having a small example would be really helpful. AFAICT, the included notebook doesn't really cover that.

2. Implementation approach

You implemented BD-LoRA as a subclass of normal LoRA. The idea is probably to leverage the existing code as much as possible. However, it can also be problematic, as BD-LoRA may or may not always be a substitute for LoRA, so some existing checks may not carry over.

I wonder if you have considered implementing BD-LoRA as a LoRA variant instead. E.g. DoRA is implemented as a LoRA variant. The advantage would be that we have a defined interface for BD-LoRA. From a user's point of view, it could look like this:

```python
from peft import LoraConfig, BdLoraConfig

bd_lora_config = BdLoraConfig(
    lora_a_is_blockdiagonal=["o_proj", "down_proj"],
    lora_b_is_blockdiagonal=["q_proj", "v_proj", "k_proj", "up_proj", "gate_proj"],
    nblocks=2,
)
lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj", "k_proj", "up_proj", "gate_proj", "o_proj", "down_proj"],
    use_bdlora=bd_lora_config,
)
```

Maybe this idea won't quite work because LoRA variants are not the right abstraction, but I wanted to discuss this first before proceeding.
Hey @BenjaminBossan, thank you very much for your thoughtful comments.
If all adapters can be kept in GPU memory:
If no adapter can be kept in GPU memory:

I agree that adding some information on how to integrate BD-LoRA into vLLM would be a good idea. I would wait until the PR in vLLM is accepted (or at least ready to merge), and then add further information to the notebook.
@BenjaminBossan Is there something else that we need to move forward / that I can work on?
BenjaminBossan left a comment
Thanks a lot for refactoring the PR to implement BD-LoRA as a LoRA variant. I think this is indeed the much nicer approach, as it integrates better and is more future proof.
This would involve adding the target_name parameter in the create_and_replace method in lora/model.py to record the current key, as BD-LoRA needs this parameter to determine whether a module is row- or column-sharded.
We can do that but I hope that we can find a way to avoid it. I started a discussion about the choice of LoRA A vs B, depending on the outcome there, we may be able to streamline the implementation.
I agree that adding some information on how to integrate BD-LoRA vLLM would be a good idea. I would wait until the PR in vLLM is accepted (or at least ready to merge), and then add further information to the notebook.
Sounds good. Apart from vLLM, could we also expect potential speedups from transformers TP with this technique (at least for those models that support it)? If yes, it would be great if we could include an example.
Is there something else that we need to move forward / that I can work on?
Sorry for the delay, we had our hands full with the last PEFT release and updates required for the upcoming transformers v5.
Thanks for providing the results. Note that for the final PR, we need to remove this file, as we will run the experiment on our own hardware to keep results comparable.
Regarding the settings, do you have an idea why the end result, the test accuracy, is relatively low at 36.8%? For comparison vanilla LoRA has 48.2% test accuracy. And that is with 9M trainable parameters, compared to 49M trainable parameters used here. From the paper results, I would have expected a very close match in performance.
Thanks for providing the results. Note that for the final PR, we need to remove this file, as we will run the experiment on our own hardware to keep results comparable.
Okay, that makes a lot of sense.
Regarding the settings, do you have an idea why the end result, the test accuracy, is relatively low at 36.8%? For comparison vanilla LoRA has 48.2% test accuracy. And that is with 9M trainable parameters, compared to 49M trainable parameters used here. From the paper results, I would have expected a very close match in performance.
At the moment, BD-LoRA uses many more parameters, as I set the LoRA adapters on every linear module. I just noticed that in the LoRA result we only apply the adapters to the v and q projections. Applying them to all layers might already cause overfitting/require different learning rates, which explains BD-LoRA's comparatively worse performance.
Only adding LoRA adapters to v_proj and q_proj might not be the best test scenario for BD-LoRA, as then we only test performance for modules where the LoRA B adapter is block-diagonal. Do you have any suggestions?
We have a vanilla LoRA experiment that targets the 'MLP' part of the model. You could do the same, which would allow a mix of A and B being BD. Let's try to match the rank to achieve a similar number of trainable parameters (9-10M). The test accuracy for this experiment is 0.526.
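For illustration, a rough sketch of how such a config could look, assuming the MLP experiment targets the usual Llama-style gate/up/down projections and that `BdLoraConfig`/`use_bdlora` end up exposed as in the example earlier in this thread; the rank is only a knob to land near the ~9-10M trainable parameters of the LoRA baseline:

```python
from peft import LoraConfig, BdLoraConfig

# down_proj is row-sharded under tensor parallelism -> LoRA A block-diagonal;
# gate_proj/up_proj are column-sharded -> LoRA B block-diagonal.
bd_config = BdLoraConfig(
    target_modules_bd_a=["down_proj"],
    target_modules_bd_b=["gate_proj", "up_proj"],
    nblocks=2,
)
lora_config = LoraConfig(
    r=14,  # pick r so that the trainable parameter count matches the vanilla LoRA baseline
    target_modules=["gate_proj", "up_proj", "down_proj"],
    use_bdlora=bd_config,
)
```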
src/peft/tuners/lora/config.py (Outdated)

```python
        nblocks: Number of blocks in block-diagonal matrices
    """

    lora_a_is_blockdiagonal: Optional[list[str]] = field(
```
The name lora_a_is_blockdiagonal would suggest the argument to be a bool. How about something like target_modules_blockdiagonal_a or even target_modules_bd_a (same for b)?
As to the meaning of these arguments: How would I, as a user, make the correct choice for which of A or B should be block-diagonal? I think it would be great if this could be automated (at least, as an option), for instance based on the base weight. Do you think that would be possible? If not, then at the very least the doc needs to be improved so that the users know what they should set here.
Sounds good, I have updated the attribute.
Regarding the meaning: Yes, it would be great to automate it. However, I am not sure if this can be done in a robust manner.
As a general guideline: To use BD-LoRA, we want to set the QKV and up projections as B-block-diagonal, and the down/out projections as A-block-diagonal. The exact sharding scheme, however, might depend on the implementation in the downstream inference engine. As an example, vLLM re-implements each supported model architecture and explicitly encodes which layers are ColumnParallel (needs LoRA B block-diagonal) and which ones are RowParallel (needs LoRA A block-diagonal). Here is Llama as an example: https://github.com/vllm-project/vllm/blob/0af3d4f0df360decc2115f43f5e4bc732342e7e4/vllm/model_executor/models/llama.py
If you have some ideas on how to determine this automatically, I would be very glad.
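To make the guideline concrete, here is a small hypothetical helper (not part of this PR) that splits a target-module list according to vLLM's Llama layout; the two sets would need to be adapted per model and inference engine:

```python
# ColumnParallel layers need LoRA B block-diagonal, RowParallel layers need LoRA A block-diagonal.
# These sets mirror vLLM's Llama implementation linked above; for other models they are an assumption.
COLUMN_PARALLEL = {"q_proj", "k_proj", "v_proj", "gate_proj", "up_proj"}  # -> B block-diagonal
ROW_PARALLEL = {"o_proj", "down_proj"}                                    # -> A block-diagonal


def split_bd_targets(target_modules: list[str]) -> tuple[list[str], list[str]]:
    """Return (target_modules_bd_a, target_modules_bd_b) for the given target modules."""
    bd_a = [m for m in target_modules if m in ROW_PARALLEL]
    bd_b = [m for m in target_modules if m in COLUMN_PARALLEL]
    return bd_a, bd_b
```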
src/peft/tuners/lora/variants.py (Outdated)

```python
        self.weight = nn.Parameter(torch.empty(out_features, in_features // nblocks))
        torch.nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if bias:
            raise NotImplementedError
```
Let's raise with a proper error message.
Done!
src/peft/tuners/lora/variants.py (Outdated)

```python
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        first_dims = x.shape[:-1]
        if not len(x.shape) == 2:
```
A bit clearer IMO:
```diff
-        if not len(x.shape) == 2:
+        if x.dim() != 2:
```
Done!
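For readers following along, here is a minimal, self-contained sketch of what a block-diagonal linear factor could look like; the shapes follow the diff hunks quoted in this thread, but this is an illustration under those assumptions, not the exact implementation in the PR:

```python
import math

import torch
import torch.nn as nn


class BlockDiagonalLinear(nn.Module):
    """Sketch: a linear map whose (out_features x in_features) weight is block-diagonal with nblocks blocks."""

    def __init__(self, in_features: int, out_features: int, nblocks: int, bias: bool = False):
        super().__init__()
        if in_features % nblocks or out_features % nblocks:
            raise ValueError(
                f"in_features={in_features} or out_features={out_features} not divisible by nblocks={nblocks}"
            )
        if bias:
            raise ValueError("Bias is not implemented for BlockDiagonalLinear layer.")
        self.in_features, self.out_features, self.nblocks = in_features, out_features, nblocks
        # All blocks stored in one tensor; block i maps in_features // nblocks -> out_features // nblocks.
        self.weight = nn.Parameter(torch.empty(out_features, in_features // nblocks))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        first_dims = x.shape[:-1]
        if x.dim() != 2:
            x = x.reshape(-1, x.shape[-1])
        xb = x.reshape(x.shape[0], self.nblocks, -1)           # (N, nblocks, in // nblocks)
        wb = self.weight.view(self.nblocks, -1, xb.shape[-1])  # (nblocks, out // nblocks, in // nblocks)
        out = torch.einsum("nbi,boi->nbo", xb, wb)             # independent per-block matmuls
        return out.reshape(*first_dims, self.out_features)
```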
src/peft/tuners/lora/variants.py (Outdated)

```python
    @staticmethod
    def merge_safe(module: Linear, active_adapter: str, orig_weight: torch.Tensor) -> torch.Tensor:
        return orig_weight + module.get_delta_weight(active_adapter)
```
Hmm, is it expected that the use of block diagonal LoRA does not affect how merging should be performed?
This was an oversight, thank you. I have added the appropriate method for getting the delta weight in the case of BD-LoRA adapters.
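As a rough illustration of why merging needs special handling: the compactly stored block-diagonal factor has to be expanded to its dense form before the usual B @ A product is formed. The helper below is hypothetical and assumes the (out_features, in_features // nblocks) storage layout from the hunks above; which of A or B is block-diagonal depends on the module, as discussed earlier:

```python
import torch


def expand_block_diagonal(weight: torch.Tensor, nblocks: int) -> torch.Tensor:
    """Expand a compact (out_features, in_features // nblocks) tensor into a dense
    (out_features, in_features) block-diagonal matrix."""
    blocks = weight.view(nblocks, weight.shape[0] // nblocks, weight.shape[1])
    return torch.block_diag(*blocks)


# Example: delta_W = B_dense @ A, where B is the block-diagonal factor.
r, hidden, nblocks = 8, 16, 2
lora_A = torch.randn(r, hidden)                     # dense LoRA A: (r, in_features)
lora_B_compact = torch.randn(hidden, r // nblocks)  # compact block-diagonal LoRA B: (out_features, r // nblocks)
delta_W = expand_block_diagonal(lora_B_compact, nblocks) @ lora_A
assert delta_W.shape == (hidden, hidden)
```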
src/peft/utils/peft_types.py (Outdated)

```
    - PREFIX_TUNING
    - LORA
    - ADALORA
    - BDLORA
```
Can be removed, since we're going with LoRA variants.
Done!
src/peft/utils/peft_types.py (Outdated)

```python
    PREFIX_TUNING = "PREFIX_TUNING"
    LORA = "LORA"
    ADALORA = "ADALORA"
    BDLORA = "BDLORA"
```
Can be removed, since we're going with LoRA variants.
Done!
tests/test_bdlora.py (Outdated)

```python
    return X


class TestBdLora:
```
Thanks for thinking about adding tests. From skimming those, I think most if not all can be covered by extending existing tests. To do so, check e.g. how the test matrix is expanded for DoRA:
peft/tests/test_custom_models.py, lines 91 to 134 in 9d3e0e5:

```python
    ("Vanilla MLP 7 LoRA with DoRA", "MLP", LoraConfig, {"target_modules": ["lin0"], "use_dora": True}),
    ("Vanilla MLP 8 LoRA with DoRA", "MLP", LoraConfig, {"target_modules": ["lin0", "lin1"], "use_dora": True}),
    (
        "Vanilla MLP 9 LoRA with DoRA",
        "MLP",
        LoraConfig,
        {"target_modules": "lin1", "use_dora": True, "lora_alpha": 32},
    ),
    ("Embedding + transformers Conv1D 1 LoRA", "EmbConv1D", LoraConfig, {"target_modules": ["conv1d"]}),
    ("Embedding + transformers Conv1D 2 LoRA", "EmbConv1D", LoraConfig, {"target_modules": ["emb"]}),
    ("Embedding + transformers Conv1D 3 LoRA", "EmbConv1D", LoraConfig, {"target_modules": ["emb", "conv1d"]}),
    (
        "Embedding + transformers Conv1D 1 DoRA",
        "EmbConv1D",
        LoraConfig,
        {"target_modules": ["conv1d"], "use_dora": True},
    ),
    ("Embedding + transformers Conv1D 2 DoRA", "EmbConv1D", LoraConfig, {"target_modules": ["emb"], "use_dora": True}),
    (
        "Embedding + transformers Conv1D 3 DoRA",
        "EmbConv1D",
        LoraConfig,
        {"target_modules": ["emb", "conv1d"], "use_dora": True},
    ),
    (
        "Embedding + transformers Conv1D 1 LoRA trainable_tokens",
        "EmbConv1D",
        LoraConfig,
        {"target_modules": ["conv1d"], "trainable_token_indices": {"emb": [0, 10]}},
    ),
    ("Conv1d LoRA", "Conv1d", LoraConfig, {"target_modules": ["conv1d"]}),
    ("Conv1d LoRA with DoRA", "Conv1d", LoraConfig, {"target_modules": ["conv1d"], "use_dora": True}),
    ("Conv2d 1 LoRA", "Conv2d", LoraConfig, {"target_modules": ["conv2d"]}),
    ("Conv2d 2 LoRA", "Conv2d", LoraConfig, {"target_modules": ["conv2d", "lin0"]}),
    ("Conv2d 1 LoRA with DoRA", "Conv2d", LoraConfig, {"target_modules": ["conv2d"], "use_dora": True}),
    ("Conv2d 2 LoRA with DoRA", "Conv2d", LoraConfig, {"target_modules": ["conv2d", "lin0"], "use_dora": True}),
    ("Conv2d Groups LoRA", "Conv2dGroups", LoraConfig, {"target_modules": ["conv2d"]}),
    ("Conv2d Groups2 LoRA", "Conv2dGroups2", LoraConfig, {"target_modules": ["conv2d"]}),
    ("Conv2d Groups LoRA with DoRA", "Conv2dGroups", LoraConfig, {"target_modules": ["conv2d"], "use_dora": True}),
    ("Conv2d Groups2 LoRA with DoRA", "Conv2dGroups2", LoraConfig, {"target_modules": ["conv2d"], "use_dora": True}),
    ("Conv3d 1 LoRA", "Conv3d", LoraConfig, {"target_modules": ["conv3d"]}),
    ("Conv3d 2 LoRA", "Conv3d", LoraConfig, {"target_modules": ["conv3d", "lin0"]}),
    ("Conv3d 1 LoRA with DoRA", "Conv3d", LoraConfig, {"target_modules": ["conv3d"], "use_dora": True}),
    ("Conv3d 2 LoRA with DoRA", "Conv3d", LoraConfig, {"target_modules": ["conv3d", "lin0"], "use_dora": True}),
```
If you add similar settings for BD-LoRA there (of course only for linear layers), it should automatically cover lots of use cases of BD-LoRA. Then we can check the test coverage for gaps and add more BD-LoRA specific tests if needed.
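For instance, entries along these lines could be added to the matrix (a sketch mirroring the DoRA pattern above; the entries the PR actually ended up with are shown further below):

```python
(
    "Vanilla MLP BD-LoRA",
    "MLP",
    LoraConfig,
    {
        "target_modules": ["lin0", "lin1"],
        "use_bdlora": BdLoraConfig(target_modules_bd_a=["lin0"], target_modules_bd_b=["lin1"], nblocks=2),
    },
),
```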
That is true, thank you! I have moved the tests to the test_custom_models.py file. The only test that could not be converted is the block-diagonal structure test, but that one is maybe a bit too implementation-dependent anyway, so I have removed it.
Related to type casting & initialization.
@BenjaminBossan Thank you very much for your thorough review! We can do that but I hope that we can find a way to avoid it. I started a discussion about the choice of LoRA A vs B, depending on the outcome there, we may be able to streamline the implementation.
Transformers TP would also need a PR that adds BD-LoRA compatibility, similar to vLLM. I am not too familiar with Transformers TP, but if Multi-LoRA serving with Megatron-style sharding is implemented already, it should not be much work.
No worries, thanks for your time. I hope to have addressed all your concerns. By moving the tests to the LoRA variant test matrix, I have found some more minor issues (related to dtype and multi-adapter settings) which I have also fixed.
BenjaminBossan left a comment
Thanks for the updates, the PR is getting close to the finish line. I did another review, please check.
Regarding testing: There are a few checks being performed at initialization time, e.g. for BD target module overlap or for % nblocks. Could you please add tests for those to test_initialization.py? I think the existing tests there should give you a good idea how to implement the tests, if not LMK.
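For example, a minimal sketch of such an initialization test, assuming the overlap check raises a ValueError when the config is created (the exact error message and location may differ):

```python
import pytest

from peft import LoraConfig, BdLoraConfig


def test_bdlora_overlapping_bd_a_and_bd_b_raises():
    # The same module must not be requested as both A- and B-block-diagonal.
    with pytest.raises(ValueError, match="overlap"):
        LoraConfig(
            target_modules=["q_proj", "o_proj"],
            use_bdlora=BdLoraConfig(
                target_modules_bd_a=["q_proj"],
                target_modules_bd_b=["q_proj"],
                nblocks=2,
            ),
        )
```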
Transformers TP would also need a PR that adds BD-LoRA compatibility, similar to vLLM. I am not too familiar with Transformers TP, but if Multi-LoRA serving with Megatron-style sharding is implemented already, it should not be much work.
Okay, let's leave that aside then. It would be nice to have but is not required for this PR.
As a general guideline: To use BD-LoRA, we want to set QKV and up projections as B-block-diagonal, and down,out projections as A-block-diagonal. The exact sharding scheme however might depend on the implementation in the downstream inference engine. As an example, vLLM re-implements each supported model architecture and explicitly encodes which layers are ColumnParallel (needs LoRA B block-diagonal) and which ones are RowParallel (needs LoRA A block-diagonal): Here is Llama as an example: https://github.com/vllm-project/vllm/blob/0af3d4f0df360decc2115f43f5e4bc732342e7e4/vllm/model_executor/models/llama.py
If you have some ideas on how to determine this automatically, I would be very glad.
I see, thanks for explaining. In this case, I think it's hard to impossible to automate this step. Could you put this additional information into the docstring of BdLoraConfig too, e.g. by extending it here?
One thing we could do is to determine some defaults for popular models with support in vLLM and, if the user leaves target_modules_bd_a and target_modules_bd_b as None, check if the used model matches one of the defaults. This is, however, a bit of extra work, especially as these defaults can quickly get out of date. Up to you if you think it's worth the effort.
Is passing the target_name to the LoRA adapters okay or is there a more clean approach we can do?
I thought a bit about that and I think it's okay as is. Another idea that I had would be to provide two variants, one for BD-A and one for BD-B, and resolve which one to use based on the target name, but I think overall that would be equally complex, so we can leave the logic as is.
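For illustration, that alternative could be sketched as a small resolver keyed on the target name (hypothetical helper; the PR instead keeps a single variant and passes target_name to it):

```python
from typing import Optional


def resolve_bd_side(target_name: str, bd_a: Optional[list[str]], bd_b: Optional[list[str]]) -> str:
    """Return which LoRA factor should be block-diagonal for this module, based on its name."""
    if any(target_name == m or target_name.endswith(f".{m}") for m in (bd_a or [])):
        return "A"
    if any(target_name == m or target_name.endswith(f".{m}") for m in (bd_b or [])):
        return "B"
    return "none"  # vanilla LoRA, or an error when match_strict=True
```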
One more thing: could you try to limit lines to 120 characters? There are some very long strings, for instance in config.py. Calling make style should help with some of those.
src/peft/tuners/lora/config.py (Outdated)

```python
        # check for overlap
        if self.target_modules_bd_a is None or self.target_modules_bd_b is None:
            return  # one is empty so no overlap possible
        for module in self.target_modules_bd_a:
```
How about:
```python
overlap = set(self.target_modules_bd_a or []) & set(self.target_modules_bd_b or [])
if overlap:
    raise ValueError(f"Found overlapping ...")
```
src/peft/tuners/lora/variants.py (Outdated)

```python
            torch.nn.init.zeros_(self.weight)
        else:
            torch.nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if bias:
```
Let's move this check up, directly below the check for % nblocks. Also, let's raise a ValueError instead.
src/peft/tuners/lora/variants.py (Outdated)

```python
    @staticmethod
    def init(module: Linear, adapter_name: str, **kwargs) -> None:
        use_bdlora = kwargs.get("use_bdlora")
        if not use_bdlora:
```
Can this ever happen? I don't think so, since the LoRA variant is only added if use_bdlora is given.
Yeah, that is true. I have converted it to an assert as a sanity check (also to make the type checker aware that this is not None).
Removing some unused checks and clarifying others
This raises an error if neither target_modules_bd_a or ..._b matches.
Updated experiments to include new results
Hey @BenjaminBossan, thanks again for your time.
Great, I added a match_strict argument that is set to True by default and raises an error if a targeted module matches neither the A nor the B block-diagonal list. I think this is a good solution.
This is great. I have added a similar experiment for BD-LoRA at rank 14 (to match the number of parameters) and get 0.51857 test accuracy on my device. I have also repeated the rank 44 experiment with the correct target modules (before, I had erroneously targeted all linear layers), this time achieving 0.479 accuracy.

Additionally, I have added an example for vLLM integration. I had to add two more script files, one for starting the server and one for querying it. At the moment, I have not added a real demo of the speed-up (just a demo of how to start the vLLM server with a BD-LoRA adapter and how to send requests to it). I think a speed-up demo would involve larger models and a more sophisticated platform for sending the requests (I used a modified LLMPerf fork for my experiments in the initial comments). If you think this is a priority, however, I can try to add this information.
I added information into the docstring in a previous commit, just above the line you linked. Is that enough or should I add more?
I think this could be a good idea further down the road. The vLLM maintainers have requested to keep the pull request in a draft state for now and want to include it in the codebase if community demand is high enough, so maybe keeping it mainly documented for Llama is okay at the moment. I think I am only missing the remaining initialization tests now; I will add them today or tomorrow morning. Thanks for your support! EDIT: Done!
BenjaminBossan left a comment
Thanks for the updates, there is not much left. I did another review, please check my comments.
This is great. I have added a similar experiment for BD-LoRA at rank 14 (to match the number of parameters) and have 0.51857 test accuracy on my device. Additionally, I have repeated the rank 44 experiment with the correct target modules (before, I had erroneously targeted all linear layers), this time achieving 0.479 accuracy.
Let's only keep the one with rank 14, as it matches the param count and is giving the best results among those.
I added information into the docstring in a previous commit, just above the line you linked. Is that enough or should I add more?
Looks good, thanks.
The vLLM maintainers have requested to keep the pull request in a draft state at the moment, and want to include it into the codebase if the community demand is high enough, so maybe keeping it mainly documented for LLama is okay at the moment.
I see, too bad, but I'm sure they have good reasons. So essentially, to benefit from the speedup, users would have to checkout your branch? Let's document this, otherwise it's too hard to figure out.
There is also a merge conflict now, could you please merge with/rebase on the latest main and resolve it? When all is finished, don't forget to run make style.
src/peft/tuners/lora/config.py (Outdated)

```python
        default=True,
        metadata={
            "help": "If set to true, requires each target_module to have either a block-diagonal LoRA-A or LoRA-B, "
            "and raises an error otherwise. You can set this to False this to mix LoRA and BD-LoRA training, "
```
```diff
-            "and raises an error otherwise. You can set this to False this to mix LoRA and BD-LoRA training, "
+            "and raises an error otherwise. You can set this to False to mix LoRA and BD-LoRA training, "
```
src/peft/tuners/lora/variants.py (Outdated)

```python
    def init(module: Linear, adapter_name: str, **kwargs) -> None:
        use_bdlora = kwargs.get("use_bdlora")
        target_name = kwargs.get("target_name", "")
        assert use_bdlora is not None
```
IMO, this can be fully removed. If you really want to keep it, please raise a proper error (in this case I'd say RuntimeError("Something went wrong with BD-LoRA initialization, please open an issue at https://github.com/huggingface/peft/issues")).
src/peft/tuners/lora/variants.py (Outdated)

```python
        # Handle case where use_bdlora is a dict (from saved config) instead of BdLoraConfig object
        if isinstance(use_bdlora, dict):
            from peft.tuners.lora.config import BdLoraConfig
```
Can the import be moved to the root? Also, it is a bit strange to go from BdLoraConfig to dict back to BdLoraConfig. Can we not keep it as a BdLoraConfig?
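If the round trip through a plain dict has to stay (e.g. because that is how the sub-config is serialized in the saved adapter config), one option is a module-level import plus a small re-hydration helper; this is a sketch based on the hunk above, assuming BdLoraConfig is a dataclass whose fields match the dict keys:

```python
# Assumed import path; this PR adds BdLoraConfig to src/peft/tuners/lora/config.py.
from peft.tuners.lora.config import BdLoraConfig


def ensure_bdlora_config(use_bdlora) -> BdLoraConfig:
    """Re-hydrate the sub-config if it was round-tripped through a plain dict."""
    if isinstance(use_bdlora, dict):
        return BdLoraConfig(**use_bdlora)
    return use_bdlora
```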
tests/test_custom_models.py (Outdated)

```python
            "modules_to_save": ["lin1"],
            "use_bdlora": BdLoraConfig(target_modules_bd_a=["lin0"], nblocks=2),
        },
    ),
```
Can we get an example with match_strict=False and one vanilla LoRA layer? On the other hand, no need for the modules_to_save example, as I don't think there is any interaction there with BD-LoRA.
This should be covered by the two tests above, right?
```python
(
    "BD-LoRA A only",
    "MLP",
    LoraConfig,
    {
        "target_modules": ["lin0", "lin1"],
        "use_bdlora": BdLoraConfig(target_modules_bd_a=["lin0"], nblocks=2, match_strict=False),
    },
),
(
    "BD-LoRA B only",
    "MLP",
    LoraConfig,
    {
        "target_modules": ["lin0", "lin1"],
        "use_bdlora": BdLoraConfig(target_modules_bd_b=["lin1"], nblocks=2, match_strict=False),
    },
),
```
I removed the modules_to_save test :)
tests/test_custom_models.py (Outdated)

```python
##########
# BD-LoRA #
##########
```
```diff
-##########
-# BD-LoRA #
-##########
+###########
+# BD-LoRA #
+###########
```
;-)
tests/test_initialization.py (Outdated)

```python
        with pytest.raises(ValueError, match="DoRA does not support megatron_core"):
            LoraConfig(target_modules=["linear"], use_dora=True, megatron_config=megatron_config)

    def test_bdlora_lora_a_block_diagonal(self, data):
```
I think this test and the next one can be scrapped, this should be covered pretty much by the test_custom_models.py tests. Also, they are not really about initialization.
tests/test_initialization.py (Outdated)

```python
        with pytest.raises(ValueError, match="matches neither A nor B block-diagonal patterns"):
            get_peft_model(model, config)

    def test_bdlora_non_strict_matching_works(self, data):
```
Can be removed, see my suggestion above to add a non-strict BD-LoRA example to test_custom_models.py.
src/peft/tuners/lora/variants.py (Outdated)

```python
            raise ValueError(
                f"self.in_features={self.in_features} or self.out_features={self.out_features} not divisble by {self.nblocks}"
            )
        if bias:
            raise ValueError("Bias is not implemented for BlockDiagonalLinear layer.")
```
Let's add init tests for these two errors as well.
I have added a test for the divisibility error, but I actually cannot get the bias error to raise. For example, I had a test like this:
```python
def test_bdlora_bias_raises(self):
    model = self.get_model()
    bdlora_config = {
        "target_modules_bd_a": [],
        "target_modules_bd_b": ["linear"],
        "nblocks": 2,
        "match_strict": True,
    }
    config = LoraConfig(target_modules=["linear"], use_bdlora=bdlora_config, bias='lora_only')
    with pytest.raises(ValueError, match="Bias is not implemented for BdLora."):
        get_peft_model(model, config)
```
And a check at the BdLoraLinearVariant like this:
```python
elif has_lora_b_blockdiagonal:
    r = module.lora_B[adapter_name].in_features
    print(module.lora_B[adapter_name].bias)
    if module.lora_B[adapter_name].bias:
        raise ValueError("Bias is not implemented for BdLora.")
    base_layer = module.get_base_layer()
```
I thought that this should raise the ValueError for the bias. At the moment, I have just removed bias from the BlockDiagonalLinear (it is probably not the right place to catch the error anyways).
Hey Benjamin, thanks again for your thorough review. I hope I have caught everything this time. Please let me know especially if the example could use more work. Additionally, I ran into some unexpected behaviour with the bias; I would be glad if you could check it out.
BenjaminBossan left a comment
Thanks for the updates, especially the description of the example.
This should be covered by the two tests above, right?
Yes!
I actually cannot get the bias error to raise.
I think there is a misunderstanding here. LoraConfig(..., bias='lora_only') means that for this LoRA adapter, the bias of the targeted modules should be made trainable. I don't think that this has a bearing on BD-LoRA, does it?
There is a small merge conflict in the test file, could you please take care?
As mentioned earlier, let's keep only the rank 14 experiment.
src/peft/tuners/lora/model.py (Outdated)

```python
                use_rslora=lora_config.use_rslora,
                use_dora=lora_config.use_dora,
                lora_bias=lora_config.lora_bias,
                target_name=current_key,
```
Let's put the new arguments to the end.
Co-authored-by: Benjamin Bossan <[email protected]>
Ah perfect, then this does not interfere with BD-LoRA. I guess it is fine to just initialize BlockDiagonalLinear with bias=False and not perform any checks.
Done!
[Performance and speed comparison plots]
Block-Diagonal LoRA (BD-LoRA) (NeurIPS 2025) is a LoRA variant in which some LoRA factors are constrained to be block-diagonal. This allows faster serving (e.g. 10-40% with Llama 3.1-8B) by eliminating communication overheads when running inference on multiple GPUs. Despite the block-diagonal constraint, BD-LoRA is as performant as vanilla LoRA at similar parameter counts. For performance and speed examples, see the plots above.
BD-LoRA is designed to be used with tensor parallelism, i.e. sharding the weights of a model among multiple GPUs. I am currently also working on a pull request to vLLM which will enable vLLM to use the adapters from this BD-LoRA implementation for GPU speed-ups. I will release the pull request to vLLM in the next 1-2 days and post a link to it.
I have included in this PR:
- src/peft/tuners/bdlora, following other LoRA variants
- method_comparison
- examples/bdlora_finetuning
- tests/test_bdlora.py
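To illustrate the core idea, here is a tiny numerical sketch (plain PyTorch, not the PEFT API) of a LoRA update in which the B factor is constrained to be block-diagonal; the shapes are made up for the example:

```python
import torch

in_features, out_features, r, nblocks = 16, 16, 8, 2

lora_A = torch.randn(r, in_features)                                                   # dense, as in vanilla LoRA
B_blocks = [torch.randn(out_features // nblocks, r // nblocks) for _ in range(nblocks)]
lora_B = torch.block_diag(*B_blocks)                                                   # (out_features, r), block-diagonal

delta_W = lora_B @ lora_A                                                              # same update rule as vanilla LoRA
print(delta_W.shape)                                                                   # torch.Size([16, 16])

# Under tensor parallelism, each GPU can hold one block of lora_B together with its shard of the
# base weight, so the LoRA path adds no extra cross-GPU communication for this factor.
```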