Qwen3.5 DeltaNet breaks sample independence when using sequence packing #2131

@HayrapetyanZhirayr

Describe the bug

Qwen3.5 models that use Qwen3_5GatedDeltaNet (DeltaNet / FLA backend) produce incorrect training behavior when using packed sequences (e.g., NEAT packing).

Although packing is explicitly demonstrated in the official config (qwen3_5_4b_neat_packing.yaml), the underlying DeltaNet implementation is not segment-aware and does not respect sample boundaries inside packed sequences.

As a result, information leaks across packed samples via recurrent state, leading to incorrect training dynamics.
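To make the leak concrete, here is a minimal sketch using a toy scalar recurrence. It is not the Qwen3_5GatedDeltaNet or FLA code; `gated_scan` and the `decay` value are hypothetical stand-ins for any layer that carries a running state from token to token without resetting it at sample boundaries.

```python
def gated_scan(xs, decay=0.9):
    """Toy gated linear recurrence (hypothetical stand-in for a DeltaNet-style layer).

    state_t = decay * state_{t-1} + x_t, and the output at step t is state_t.
    """
    state = 0.0
    outputs = []
    for x in xs:
        state = decay * state + x
        outputs.append(state)
    return outputs


sample_1 = [1.0, 2.0]
sample_2 = [3.0, 4.0]

# sample_2 processed on its own: the state starts from zero.
alone = gated_scan(sample_2)

# sample_2 processed right after sample_1 in one packed sequence:
# the state left over from sample_1 is still there when sample_2 begins.
packed = gated_scan(sample_1 + sample_2)
packed_sample_2 = packed[len(sample_1):]

leaked = alone != packed_sample_2
print("sample_2 alone: ", alone)
print("sample_2 packed:", packed_sample_2)
print("outputs differ: ", leaked)
```

In this toy setup the first sample is unaffected by packing, while every sample after it inherits state from its predecessors.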

Steps/Code to reproduce bug

  1. Prepare a dataset with two independent samples:

    • sample_1
    • sample_2
  2. Run the recipe (with Qwen3_5GatedDeltaNet) on each sample individually and log the output of the first DeltaNet layer. Packing may stay enabled, as long as the two samples go through separate runs or batches.

  3. Run the model on both samples packed sequentially into one sequence (packing on) and log the same layer's output.

  4. Split the output of step 3 back into per-sample segments (sample_1, sample_2).

  5. Compare the segments with the outputs from step 2. They differ.
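The comparison in the steps above can be sketched as a small harness. This is an illustration under stated assumptions, not the recipe's code: `toy_layer` is a hypothetical stand-in for the first DeltaNet layer, and `max_abs_diff_per_sample` is a hypothetical helper name. In a real run, the layer output would be captured from the model (for example with a forward hook) instead.

```python
def toy_layer(tokens):
    """Hypothetical stand-in for the layer under test: a stateful scan with no resets."""
    state = 0.0
    out = []
    for x in tokens:
        state = 0.5 * state + x
        out.append(state)
    return out


def max_abs_diff_per_sample(layer, samples):
    """Run `layer` per sample and on the packed sequence, then compare segment by segment."""
    # Step 2: each sample on its own.
    individual = [layer(list(s)) for s in samples]

    # Step 3: all samples concatenated into one packed sequence.
    packed_input = [x for s in samples for x in s]
    packed_out = layer(packed_input)

    # Steps 4 and 5: split the packed output back and compare with the references.
    diffs = []
    offset = 0
    for s, ref in zip(samples, individual):
        segment = packed_out[offset:offset + len(s)]
        offset += len(s)
        diffs.append(max(abs(a - b) for a, b in zip(segment, ref)))
    return diffs


diffs = max_abs_diff_per_sample(toy_layer, [[1.0, 2.0], [4.0, 8.0]])
print(diffs)
```

A nonzero entry for any sample after the first indicates that state crossed a sample boundary. With real floating-point kernels, a small tolerance is a more realistic check than exact equality.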

Expected behavior

In step 5 of the reproduction, the per-sample outputs should match: sample_1 and sample_2 should each produce the same output whether they are processed separately or packed together.
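One way to picture the expected behavior is a scan that resets its state at every sample boundary. The sketch below is a toy illustration only; `scan_with_resets` is a hypothetical function, and the `cu_seqlens` argument (cumulative boundary offsets) follows a naming convention common in variable-length kernels, assumed here rather than taken from the model's code.

```python
def scan_with_resets(tokens, cu_seqlens, decay=0.5):
    """Toy segment-aware recurrence (hypothetical): state is zeroed at each sample start.

    cu_seqlens holds cumulative boundaries, e.g. [0, 2, 4] for two samples of length 2.
    """
    starts = set(cu_seqlens[:-1])
    state = 0.0
    out = []
    for i, x in enumerate(tokens):
        if i in starts:
            state = 0.0  # drop everything carried over from the previous sample
        state = decay * state + x
        out.append(state)
    return out


sample_1 = [1.0, 2.0]
sample_2 = [4.0, 8.0]

# Packed run with boundaries supplied.
packed_out = scan_with_resets(sample_1 + sample_2, cu_seqlens=[0, 2, 4])

# Separate runs, concatenated only for comparison.
separate_out = (
    scan_with_resets(sample_1, cu_seqlens=[0, 2])
    + scan_with_resets(sample_2, cu_seqlens=[0, 2])
)

matches = packed_out == separate_out
print(packed_out, separate_out, matches)
```

With the reset in place, the packed and separate outputs agree for both samples, which is the property the issue asks for.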
