Qwen3.5 DeltaNet breaks sample independence when using sequence packing

**Describe the bug**

Qwen3.5 models that use Qwen3_5GatedDeltaNet (DeltaNet / FLA backend) produce incorrect training behavior when using packed sequences (e.g., NEAT packing).

Although packing is explicitly demonstrated in the official config (qwen3_5_4b_neat_packing.yaml), the underlying DeltaNet implementation is not segment-aware and does not respect sample boundaries inside packed sequences.

As a result, information leaks across packed samples via recurrent state, leading to incorrect training dynamics.
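To illustrate the mechanism, here is a minimal sketch using a toy scalar gated recurrence. It is not the actual `Qwen3_5GatedDeltaNet` kernel. The function `gated_scan`, the `gate` value, and the use of a `cu_seqlens`-style list of cumulative boundaries are all assumptions made for illustration. Whether and how the real DeltaNet path accepts such boundaries is not established here.

```python
from itertools import accumulate


def gated_scan(xs, gate=0.9, cu_seqlens=None):
    """Toy gated linear recurrence: h_t = gate * h_{t-1} + x_t.

    Hypothetical stand-in for a recurrent layer. If cu_seqlens is given
    (cumulative segment boundaries, e.g. [0, 3, 5]), the state is reset to
    zero at the start of every segment; otherwise it is carried across the
    whole packed sequence.
    """
    starts = set(cu_seqlens[:-1]) if cu_seqlens else {0}
    h, out = 0.0, []
    for t, x in enumerate(xs):
        if t in starts:
            h = 0.0  # segment-aware: drop state from the previous sample
        h = gate * h + x
        out.append(h)
    return out


sample_1 = [1.0, 2.0, 3.0]
sample_2 = [0.5, -1.0]
packed = sample_1 + sample_2
cu_seqlens = [0, *accumulate([len(sample_1), len(sample_2)])]

# sample_2 processed on its own (reference).
alone_2 = gated_scan(sample_2)
# sample_2 taken from the packed run, with no boundary information.
leaky_2 = gated_scan(packed)[len(sample_1):]
# sample_2 taken from the packed run, with state reset at boundaries.
aware_2 = gated_scan(packed, cu_seqlens=cu_seqlens)[len(sample_1):]

leak_detected = leaky_2 != alone_2        # state from sample_1 leaked in
boundary_respected = aware_2 == alone_2   # reset restores independence
```

In this toy, the first segment is unaffected by packing. Only later segments see the state carried over from their predecessors.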

**Steps/Code to reproduce bug**

1. Prepare a dataset with two independent samples:
    * sample_1
    * sample_2
   
2. Run the recipe (with Qwen3_5GatedDeltaNet) on each sample individually and log the output of the first DeltaNet layer. Packing may stay enabled, as long as the two samples are processed in separate runs or separate batches.

3. Run the model on both samples packed sequentially into one sequence (packing on) and log the output of the same layer.

4. Split the output of step 3 back into segments (sample_1, sample_2).

5. Compare the segments with the outputs from step 2 (they will differ).
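Steps 4 and 5 can be sketched as a small comparison helper. This is a minimal sketch that assumes the logged layer outputs have been flattened to plain lists of floats, one value per token. The names `split_by_lengths` and `compare_packed` and the tolerance `atol` are hypothetical and not part of the recipe.

```python
def split_by_lengths(packed_out, lengths):
    """Cut a packed per-token output back into per-sample segments."""
    segments, start = [], 0
    for n in lengths:
        segments.append(packed_out[start:start + n])
        start += n
    return segments


def compare_packed(packed_out, per_sample_outs, atol=1e-5):
    """Return (sample_index, max_abs_diff, matches) for every sample.

    per_sample_outs holds the outputs logged when each sample was run on
    its own (step 2); packed_out is the output of the packed run (step 3).
    """
    lengths = [len(o) for o in per_sample_outs]
    if sum(lengths) != len(packed_out):
        raise ValueError("packed output length does not match sample lengths")
    report = []
    for i, (seg, ref) in enumerate(
        zip(split_by_lengths(packed_out, lengths), per_sample_outs)
    ):
        diff = max((abs(a - b) for a, b in zip(seg, ref)), default=0.0)
        report.append((i, diff, diff <= atol))
    return report


# Made-up numbers, only to show the shape of the report.
report = compare_packed([1.0, 2.0, 3.0, 4.5], [[1.0, 2.0], [3.0, 4.0]])
```

A sample whose `matches` flag is false has a packed output that differs from its standalone output by more than `atol`.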


**Expected behavior**

In step 5 of the reproduction, the outputs for sample_1 and sample_2 should be the same whether the samples are run without packing or packed together.

Qwen3.5 DeltaNet breaks sample independence when using sequence packing #2131
