Fix yarn rope factor and mask_token_id in HF conversion#529

Open
jlamypoirier wants to merge 3 commits into main from jlp_diffusion_yarn_rope

@jlamypoirier
Collaborator

Fixes two real bugs in the HF config converter, surfaced while investigating the broken diffusion conversions.

Fixes

Yarn rope factor. The yarn branch of the Llama rope config converter omitted the factor key (Fast-LLM's YarnRotaryConfig.scale_factor), unlike the llama3 branch right above it. transformers' yarn rope validation requires it, so exporting any yarn model produced an HF config that failed to instantiate (Missing required keys in rope_parameters for 'rope_type'='yarn': {'factor'}). Added the symmetric factor <-> scale_factor mapping on export and import.
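
A minimal sketch of the symmetric mapping, assuming plain dicts in place of the real config classes. The function names `export_yarn_rope` and `import_yarn_rope` and the Fast-LLM-side `type` key are hypothetical; only `rope_type`, `factor`, and `scale_factor` come from the description above, and the actual converter code may be structured differently.

```python
# Hypothetical illustration: plain dicts stand in for Fast-LLM's
# YarnRotaryConfig and for the HF rope parameters dict.

def export_yarn_rope(fast_llm_rotary: dict) -> dict:
    """Fast-LLM -> HF. `factor` is the key the yarn branch used to omit."""
    return {
        "rope_type": "yarn",
        "factor": fast_llm_rotary["scale_factor"],
    }


def import_yarn_rope(hf_rope: dict) -> dict:
    """HF -> Fast-LLM. Mirrors the export so a round trip is lossless."""
    if "factor" not in hf_rope:
        # Same condition the HF-side validation complains about.
        raise KeyError("yarn rope parameters are missing 'factor'")
    return {
        "type": "yarn",  # assumed Fast-LLM-side discriminator name
        "scale_factor": hf_rope["factor"],
    }


original = {"type": "yarn", "scale_factor": 8.0}
round_tripped = import_yarn_rope(export_yarn_rope(original))
```

Here `round_tripped` equals `original`, which is the property the config round-trip in test_conversion relies on.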

mask_token_id allowlist. Diffusion configs (Dream, DiffusionLlama) carry a mask_token_id default the inherited Llama/Qwen2 converters don't consume. It's a generation/inference token id Fast-LLM doesn't store, in the same category as the bos/eos/pad ids already allowlisted; added it to _HF_METADATA_ALLOWLIST.
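
As a rough sketch of what the allowlist does, assuming the converter rejects HF config keys that nothing consumed: the helper `unconsumed_keys` and the set below are hypothetical stand-ins, not the real contents of `_HF_METADATA_ALLOWLIST`.

```python
# Hypothetical stand-in for _HF_METADATA_ALLOWLIST: token ids that matter for
# generation/inference but that Fast-LLM does not store.
HF_METADATA_ALLOWLIST = frozenset(
    {"bos_token_id", "eos_token_id", "pad_token_id", "mask_token_id"}
)


def unconsumed_keys(hf_config: dict, consumed: set) -> set:
    """Return HF config keys that no converter read and that are not allowlisted."""
    return set(hf_config) - set(consumed) - HF_METADATA_ALLOWLIST


# Illustrative values only.
hf_config = {"hidden_size": 256, "mask_token_id": 7, "eos_token_id": 2}
leftover = unconsumed_keys(hf_config, consumed={"hidden_size"})
```

With `mask_token_id` in the allowlist, `leftover` is empty; without it, the key would be reported as unconsumed.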

With these, test_conversion (config + weight round-trip) now passes for both diffusion_llama and dream.

Verification (cluster, transformers 4.57.5)

  • test_converters.py walker: 37 passed (no regression).
  • test_checkpoint.py --models llama diffusion_llama dream --run-extra-slow: 106 passed, 2 failed. llama fully green (no regression); the only failures are test_huggingface_model[diffusion_llama] and [dream] — see below.

Remaining issues (NOT fixed here; formats stay convert: broken)

The diffusion formats still fail test_huggingface_model, which loads the exported model through its custom modeling code and compares the forward pass against Fast-LLM. These are not converter bugs:

  1. Bidirectional forward divergence (dream, confirmed). Dream is a bidirectional diffusion LM; the HF model's forward diverges structurally from Fast-LLM's causal run (completely different logits/hidden states, not a tolerance miss). Matching it needs bidirectional-attention modeling in Fast-LLM, not a converter change.
  2. Missing generation_config.json (diffusion_llama, confirmed). The custom modeling from_pretrained requires a generation_config.json that Fast-LLM doesn't export (dream ships one in its external dir; diffusion_llama doesn't). Loading fails before the forward, so its forward divergence (likely the same bidirectional issue) is unverified.
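
To illustrate why item 1 is a structural mismatch rather than a tolerance issue, here is a toy single-head attention in pure Python. It is not Dream's or Fast-LLM's attention code; the `attention` function and the inputs are made up for illustration.

```python
import math


def attention(q, k, v, causal):
    """Toy single-head scaled dot-product attention over lists of vectors."""
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        # Causal: position i sees keys 0..i. Bidirectional: every key.
        visible = range(i + 1) if causal else range(n)
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in visible
        ]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(
            [
                sum(w / total * v[j][c] for w, j in zip(weights, visible))
                for c in range(len(v[0]))
            ]
        )
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
causal_out = attention(x, x, x, causal=True)
bidir_out = attention(x, x, x, causal=False)
```

In this single layer, only the last position agrees between the two runs, since it sees every key either way; earlier positions differ because the bidirectional run also attends to later tokens. In a multi-layer model those differences feed into every position of the next layer, which is consistent with the completely different logits and hidden states described above.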

Both are modeling / model-load concerns, separate and larger than this PR. The fixture comments in tests/utils/model_configs.py now state this in place of the misleading "Conversion is broken" TODO.

Note: the yarn factor fix isn't exercised by normal CI today — no non-diffusion fixture uses yarn, and the diffusion convert group stays broken on the forward test.

🤖 Generated with Claude Code

jlamypoirier and others added 3 commits June 1, 2026 12:10
The yarn branch of the rope config converter omitted the `factor` key (Fast-LLM's
`YarnRotaryConfig.scale_factor`), unlike the llama3 branch right above it. transformers' yarn rope
validation requires it, so exporting a yarn config produced an HF config that failed to instantiate
(`Missing required keys in rope_parameters for 'rope_type'='yarn': {'factor'}`) — the diffusion_llama
conversion failure. Add the symmetric factor <-> scale_factor mapping on both export and import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Diffusion configs (Dream, DiffusionLlama) carry a mask_token_id default that the inherited
Llama/Qwen2 converters do not consume; it is a generation/inference token id Fast-LLM does not store,
in the same category as the bos/eos/pad ids already allowlisted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Conversion (config + weights) now works for diffusion_llama and dream; the misleading
"Conversion is broken" TODO is replaced with the actual reason the convert group stays `broken`:
test_huggingface_model fails because these are bidirectional diffusion LMs whose HF forward diverges
from Fast-LLM's causal run (and diffusion_llama additionally lacks an exported generation_config.json).
Both are modeling/model-load concerns, not converter bugs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>