FIX IA3 _ConvNd.unmerge bias uses torch.mul instead of torch.div #3149
Open
Chessing234 wants to merge 1 commit into huggingface:main from
Conversation
`_ConvNd.unmerge` in `src/peft/tuners/ia3/layer.py` is supposed to undo the corresponding `_ConvNd.merge` by inverting the (IA)^3 scaling applied to the base layer's weight **and** bias. The weight path correctly divides by the IA3 scaling (line 278):

```python
base_layer.weight.data = torch.div(
    base_layer.weight.data, ia3_scaling + 1e-8
).to(orig_dtype)
```

but the bias path right below it accidentally uses `torch.mul` again, the same op `merge` uses:

```python
# merge (line 256)
base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
...
# unmerge (line 283)  <-- BUG: should be torch.div
base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
```
So a merge followed by an unmerge leaves the bias scaled by `ia3_l ** 2` instead of being restored to its original value. For any non-feedforward IA3 Conv layer (Conv2d / Conv3d) whose base layer has `bias is not None`, `merge() -> unmerge()` permanently corrupts the bias, and each subsequent `disable_adapters`-triggered unmerge compounds the error.
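A minimal end-to-end reproduction sketch (not part of the PR; the toy model, the `"default"` adapter name, and the attribute path to the wrapped layer are assumptions about how peft wires up custom models):

```python
import torch
from torch import nn
from peft import IA3Config, get_peft_model

# Hypothetical toy model; "conv" is only a name chosen for targeting.
base = nn.Sequential()
base.add_module("conv", nn.Conv2d(3, 8, kernel_size=3, bias=True))
orig_bias = base.conv.bias.data.clone()

# Non-feedforward IA3 on the conv layer (feedforward_modules left empty).
config = IA3Config(target_modules=["conv"], feedforward_modules=[])
model = get_peft_model(base, config)

# ia3_l is initialized to ones by default, which would mask the bug,
# so perturb it to a non-trivial scaling first.
model.base_model.model.conv.ia3_l["default"].data.fill_(2.0)

model.merge_adapter()
model.unmerge_adapter()

restored = model.base_model.model.conv.base_layer.bias.data
# Before the fix this prints False: the bias comes back scaled by ia3_l ** 2.
print(torch.allclose(orig_bias, restored, atol=1e-6))
```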
The sibling `Linear.unmerge` in the same file (line 155) already uses the correct pattern:

```python
base_layer.bias.data = torch.div(
    base_layer.bias.data, scaling.data + 1e-8
).to(orig_dtype)
```
Mirror that pattern in `_ConvNd.unmerge`: divide by `scaling.data + 1e-8` so the bias unmerge is the inverse of the bias merge; the `+ 1e-8` tolerance guards against division by any IA3 scaling entry that happens to be zero.
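To illustrate that guard concretely (a standalone sketch, not patch code): with an exact zero in the scaling, plain division produces `nan`, while the `+ 1e-8` keeps the unmerge finite.

```python
import torch

bias = torch.tensor([1.0, 2.0])
scaling = torch.tensor([0.0, 0.5])  # first entry is an exact zero

merged = torch.mul(bias, scaling)         # merge zeroes the first bias entry
print(torch.div(merged, scaling))         # tensor([nan, 2.0]) -- 0/0
print(torch.div(merged, scaling + 1e-8))  # tensor([0., 2.0]) -- finite
# Note: a zero scaling destroys the original value during merge either way;
# the guard only prevents nan/inf from propagating on unmerge.
```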
githubnemo reviewed Apr 16, 2026
Collaborator
Hey, nice find.
Can you also reduce the tolerance in test_custom_models.py for test_merge? (to verify that the fix works)
```diff
--- a/tests/testing_common.py
+++ b/tests/testing_common.py
@@ -608,7 +608,7 @@ class PeftCommonTester:
         atol, rtol = 1e-3, 1e-3
         if (config.peft_type in {"IA3", "LORA", "OFT"}) and (model_id in conv_ids):
             # for some reason, the Conv introduces a larger error
-            atol, rtol = 0.3, 0.01
+            atol, rtol = 1e-2, 0.01
         if (config.peft_type == "OFT") and not is_decoder:
             atol, rtol = 0.3, 0.01
         if model_id == "trl-internal-testing/tiny-Llama4ForCausalLM":
```

It would also be nice if you'd found out why test_safe_merge doesn't detect this issue.
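A likely reason `test_safe_merge` misses this (an observation about the implementation, not established in this thread): `safe_merge` only validates that the merged tensors are finite (a NaN/Inf check on a copy), so a bias that is wrong but finite still passes. A parameter-level round-trip check would catch the regression directly. A sketch, with a hypothetical helper name and `merge_adapter`/`unmerge_adapter` assumed available on the model:

```python
import torch

def assert_merge_roundtrip(model, atol=1e-6):
    # Snapshot every parameter, merge and unmerge, then require a near-exact
    # restore. A bias left scaled by ia3_l ** 2 fails here even when the
    # loose output-level tolerance (atol=0.3) in test_merge hides it.
    before = {name: p.detach().clone() for name, p in model.named_parameters()}
    model.merge_adapter()
    model.unmerge_adapter()
    for name, p in model.named_parameters():
        assert torch.allclose(p, before[name], atol=atol), f"{name} not restored"
```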
Bug
`_ConvNd.unmerge` in `src/peft/tuners/ia3/layer.py` does not undo the bias scaling that `_ConvNd.merge` applied. After a `merge() -> unmerge()` round-trip on an IA3 non-feedforward Conv2d/Conv3d layer whose base layer has `bias is not None`, the base layer's bias ends up multiplied by `ia3_l ** 2` instead of being restored to its pre-merge value, and each subsequent `disable_adapters`-triggered unmerge compounds the error.

Root cause
In `_ConvNd.merge` (lines 254–256) the bias is scaled with `torch.mul(base_layer.bias.data, scaling.data)`, and in `_ConvNd.unmerge` (lines 280–283) the bias is "un-scaled" by exactly the same `torch.mul` call (both lines are quoted in the description above). So `bias_merged = bias * ia3_l` and then `bias_unmerged = bias_merged * ia3_l = bias * ia3_l ** 2`, rather than `bias`.

The corresponding weight path in the same function (line 278) correctly uses `torch.div(base_layer.weight.data, ia3_scaling + 1e-8)`, and the sibling `Linear.unmerge` bias path at lines 152–155 also divides, via `torch.div(base_layer.bias.data, scaling.data + 1e-8)`; so the intended shape of the fix is unambiguous: `_ConvNd.unmerge` was copy-pasted from `merge` for the bias branch and never flipped to a div.

Worked example
Start with `bias = [1.0, 2.0]` and IA3 `scaling = [3.0, 0.5]`.

| Step | buggy `torch.mul` unmerge | correct `torch.div` unmerge |
| --- | --- | --- |
| after `merge` | `[3.0, 1.0]` | `[3.0, 1.0]` |
| after `unmerge` | `[9.0, 0.5]` | `[1.0, 2.0]` |
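The same numbers in runnable form (standalone torch, mirroring the table above):

```python
import torch

bias = torch.tensor([1.0, 2.0])
scaling = torch.tensor([3.0, 0.5])

merged = torch.mul(bias, scaling)          # after merge:        [3.0, 1.0]
buggy = torch.mul(merged, scaling)         # buggy mul unmerge:  [9.0, 0.5]
fixed = torch.div(merged, scaling + 1e-8)  # fixed div unmerge: ~[1.0, 2.0]

print(buggy)  # tensor([9.0000, 0.5000]) == bias * scaling**2
print(fixed)  # tensor([1.0000, 2.0000])
```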
Fix

Mirror the `Linear.unmerge` pattern: divide by `scaling.data + 1e-8` (the `+ 1e-8` guards against any IA3 scaling entry that happens to be zero; it is the same tolerance used on the weight line just above and in `Linear.unmerge`).

```diff
 if not self.is_feedforward and (base_layer.bias is not None):
     scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
     orig_dtype = base_layer.bias.data.dtype
-    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
+    base_layer.bias.data = torch.div(base_layer.bias.data, scaling.data + 1e-8).to(orig_dtype)
```

One-line change, symmetric with the weight path right above and with `Linear.unmerge` in the same file. No effect on the feedforward or `bias is None` branches.