
FIX IA3 _ConvNd.unmerge bias uses torch.mul instead of torch.div#3149

Open
Chessing234 wants to merge 1 commit into huggingface:main from Chessing234:fix/ia3-convnd-unmerge-bias-div

Conversation

@Chessing234 (Contributor) commented Apr 11, 2026

Bug

_ConvNd.unmerge in src/peft/tuners/ia3/layer.py does not undo the bias scaling that _ConvNd.merge applied. After a merge() -> unmerge() round-trip on an IA3 non-feedforward Conv2d/Conv3d layer whose base layer's bias is not None, the bias ends up multiplied by ia3_l ** 2 instead of being restored to its pre-merge value, and each subsequent disable_adapters-triggered unmerge compounds the error.

Root cause

In _ConvNd.merge (line 254–256) the bias is scaled with torch.mul:

if not self.is_feedforward and (base_layer.bias is not None):
    scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)

and in _ConvNd.unmerge (line 280–283) the bias is "un-scaled" by also using torch.mul — exactly the same op:

if not self.is_feedforward and (base_layer.bias is not None):
    scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
    orig_dtype = base_layer.bias.data.dtype
    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)

So bias_merged = bias * ia3_l and then bias_unmerged = bias_merged * ia3_l = bias * ia3_l ** 2, rather than bias.

The corresponding weight path in the same function (line 278) uses torch.div correctly:

base_layer.weight.data = torch.div(base_layer.weight.data, ia3_scaling + 1e-8).to(orig_dtype)

and the sibling Linear.unmerge bias path at line 152–155 also does it correctly:

if not self.is_feedforward and (base_layer.bias is not None):
    scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
    orig_dtype = base_layer.bias.data.dtype
    base_layer.bias.data = torch.div(base_layer.bias.data, scaling.data + 1e-8).to(orig_dtype)

so the intended shape of the fix is unambiguous — _ConvNd.unmerge was copy-pasted from merge for the bias branch and never flipped to a div.

Worked example

Start with bias = [1.0, 2.0] and IA3 scaling = [3.0, 0.5].

step            current code   expected
after merge     [3.0, 1.0]     [3.0, 1.0]
after unmerge   [9.0, 0.5]     [1.0, 2.0]
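The arithmetic above can be checked directly. This is a plain-Python sketch of the element-wise scaling math (lists standing in for the tensors, `*` and `/` standing in for torch.mul and torch.div):

```python
bias = [1.0, 2.0]
scaling = [3.0, 0.5]

# merge(): bias * scaling
merged = [b * s for b, s in zip(bias, scaling)]
assert merged == [3.0, 1.0]

# buggy unmerge(): multiplies again -> bias * scaling ** 2
buggy = [b * s for b, s in zip(merged, scaling)]
assert buggy == [9.0, 0.5]

# fixed unmerge(): divides (with the 1e-8 guard) -> original bias
fixed = [b / (s + 1e-8) for b, s in zip(merged, scaling)]
assert all(abs(f - o) < 1e-6 for f, o in zip(fixed, bias))
```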

Fix

Mirror the Linear.unmerge pattern: divide by scaling.data + 1e-8 (the + 1e-8 guards against any IA3 scaling entry that happens to be zero — same tolerance used on the weight line just above and in Linear.unmerge).

                 if not self.is_feedforward and (base_layer.bias is not None):
                     scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape)
                     orig_dtype = base_layer.bias.data.dtype
-                    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
+                    base_layer.bias.data = torch.div(base_layer.bias.data, scaling.data + 1e-8).to(orig_dtype)

One-line change, symmetric with the weight path right above and with Linear.unmerge in the same file. No effect on the feedforward or bias is None branches.
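On the `+ 1e-8` guard specifically: it matters whenever a learned ia3_l entry is exactly zero, since a plain division would blow up while the guarded one stays finite. A scalar sketch of that edge case (plain Python floats, not the actual tensor code; torch would return inf rather than raise):

```python
scaling = 0.0  # an IA3 scaling entry that happens to be zero
merged_bias = 1.0 * scaling  # merge() zeroes out this bias entry

# Unguarded division fails on Python floats (torch.div would yield inf/nan).
try:
    merged_bias / scaling
    raised = False
except ZeroDivisionError:
    raised = True
assert raised

# The guarded division used in the fix stays finite.
# (The original bias value is unrecoverable here, since merge()
# already multiplied it by zero, but the math no longer diverges.)
restored = merged_bias / (scaling + 1e-8)
assert restored == 0.0
```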

`_ConvNd.unmerge` in `src/peft/tuners/ia3/layer.py` is supposed to
undo the corresponding `_ConvNd.merge` by inverting the (IA)^3
scaling applied to the base layer's weight **and** bias. The weight
path correctly divides by the IA3 scaling (line 278):

    base_layer.weight.data = torch.div(
        base_layer.weight.data, ia3_scaling + 1e-8
    ).to(orig_dtype)

but the bias path right below it accidentally uses `torch.mul`
again, the same op `merge` uses:

    # merge (line 256)
    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)
    ...
    # unmerge (line 283)  <-- BUG: should be torch.div
    base_layer.bias.data = torch.mul(base_layer.bias.data, scaling.data).to(orig_dtype)

So a merge followed by an unmerge leaves the bias scaled by
`ia3_l ** 2` instead of being restored to its original value. For
any non-feedforward IA3 Conv layer (Conv2d / Conv3d) whose base
layer has `bias is not None`, `merge() -> unmerge()` permanently
corrupts the bias — and subsequent `disable_adapters`-triggered
unmerges compound the error.

The sibling `Linear.unmerge` in the same file (line 155) already
uses the correct pattern:

    base_layer.bias.data = torch.div(
        base_layer.bias.data, scaling.data + 1e-8
    ).to(orig_dtype)

Mirror that pattern in `_ConvNd.unmerge`: divide by
`scaling.data + 1e-8` so the bias unmerge is the inverse of the
bias merge, and the `+ 1e-8` tolerance guards against division by
any IA3 scaling entry that happens to be zero.
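The inverse property the fix restores can be stated as a small round-trip check. This is a hypothetical regression sketch, with `merge_bias` / `unmerge_bias` helpers that mirror the merged and patched logic element-wise (not PEFT's actual API):

```python
import random

def merge_bias(bias, scaling):
    # mirrors _ConvNd.merge: bias * ia3_l
    return [b * s for b, s in zip(bias, scaling)]

def unmerge_bias(bias, scaling):
    # mirrors the patched _ConvNd.unmerge: bias / (ia3_l + 1e-8)
    return [b / (s + 1e-8) for b, s in zip(bias, scaling)]

random.seed(0)
for _ in range(100):
    bias = [random.uniform(-5, 5) for _ in range(4)]
    scaling = [random.uniform(0.1, 3.0) for _ in range(4)]
    restored = unmerge_bias(merge_bias(bias, scaling), scaling)
    # merge() -> unmerge() should be (near-)identity on the bias
    assert all(abs(r - b) < 1e-5 for r, b in zip(restored, bias))
```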
@githubnemo (Collaborator) left a comment
Hey, nice find.

Can you also reduce the tolerance in test_custom_models.py for test_merge? (to verify that the fix works)

--- a/tests/testing_common.py
+++ b/tests/testing_common.py
@@ -608,7 +608,7 @@ class PeftCommonTester:
                 atol, rtol = 1e-3, 1e-3
             if (config.peft_type in {"IA3", "LORA", "OFT"}) and (model_id in conv_ids):
                 # for some reason, the Conv introduces a larger error
-                atol, rtol = 0.3, 0.01
+                atol, rtol = 1e-2, 0.01
             if (config.peft_type == "OFT") and not is_decoder:
                 atol, rtol = 0.3, 0.01
             if model_id == "trl-internal-testing/tiny-Llama4ForCausalLM":

It would also be nice if you could find out why test_safe_merge doesn't detect this issue.
