Fix yarn rope factor and mask_token_id in HF conversion by jlamypoirier · Pull Request #529 · ServiceNow/Fast-LLM

jlamypoirier · 2026-06-01T16:49:22Z

Fixes two real bugs in the HF config converter, surfaced while investigating the broken diffusion conversions.

Fixes

Yarn rope factor. The yarn branch of the Llama rope config converter omitted the factor key (Fast-LLM's YarnRotaryConfig.scale_factor), unlike the llama3 branch right above it. transformers' yarn rope validation requires it, so exporting any yarn model produced an HF config that failed to instantiate (Missing required keys in rope_parameters for 'rope_type'='yarn': {'factor'}). Added the symmetric factor ↔ scale_factor mapping on export and import.
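For illustration, the symmetric mapping can be sketched as below. This is a minimal sketch, not the actual converter code: the function names and the Fast-LLM-side `type` key are hypothetical, and only the `rope_type`, `factor` and `scale_factor` names come from the description above.

```python
# Hypothetical sketch: function names and the "type" key are illustrative,
# not Fast-LLM's real converter API.

def export_yarn_rope(fast_llm_rotary: dict) -> dict:
    """Map a Fast-LLM-style yarn rotary config to HF-style rope parameters."""
    return {
        "rope_type": "yarn",
        # The key whose absence made the exported HF config fail validation.
        "factor": fast_llm_rotary["scale_factor"],
    }


def import_yarn_rope(hf_rope: dict) -> dict:
    """Map HF-style yarn rope parameters back to a Fast-LLM-style config."""
    if "factor" not in hf_rope:
        raise KeyError("yarn rope parameters require 'factor'")
    return {"type": "yarn", "scale_factor": hf_rope["factor"]}


original = {"type": "yarn", "scale_factor": 4.0}
exported = export_yarn_rope(original)
round_tripped = import_yarn_rope(exported)
```

Because export and import are inverses on this key, a config round-trip leaves `scale_factor` unchanged.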

mask_token_id allowlist. Diffusion configs (Dream, DiffusionLlama) carry a mask_token_id default the inherited Llama/Qwen2 converters don't consume. It's a generation/inference token id Fast-LLM doesn't store, in the same category as the bos/eos/pad ids already allowlisted; added it to _HF_METADATA_ALLOWLIST.
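The effect of the allowlist can be sketched as follows. This is a hedged illustration only: the helper name, the set name and the exact key spellings (`bos_token_id`, `eos_token_id`, `pad_token_id`) are assumptions, and the real `_HF_METADATA_ALLOWLIST` may hold other entries.

```python
# Hypothetical sketch of an "unconsumed key" check with a metadata allowlist.
# The set contents are assumed from the description, not copied from Fast-LLM.
METADATA_ALLOWLIST_SKETCH = {
    "bos_token_id",
    "eos_token_id",
    "pad_token_id",
    "mask_token_id",  # the entry this change adds
}


def unconsumed_keys(hf_config: dict, consumed: set) -> set:
    """Return HF config keys that no converter consumed and that are not allowlisted."""
    return set(hf_config) - consumed - METADATA_ALLOWLIST_SKETCH


hf_config = {"hidden_size": 64, "eos_token_id": 2, "mask_token_id": 5}
leftover = unconsumed_keys(hf_config, consumed={"hidden_size"})
```

With `mask_token_id` allowlisted, `leftover` is empty; without that entry the key would be reported as unconsumed.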

With these, test_conversion (config + weight round-trip) now passes for both diffusion_llama and dream.

Verification (cluster, transformers 4.57.5)

test_converters.py walker: 37 passed (no regression).
test_checkpoint.py --models llama diffusion_llama dream --run-extra-slow: 106 passed, 2 failed. llama is fully green (no regression); the only failures are test_huggingface_model[diffusion_llama] and test_huggingface_model[dream], discussed under "Remaining issues".

Remaining issues (NOT fixed here; formats stay `convert: broken`)

The diffusion formats still fail test_huggingface_model, which loads the exported model through its custom modeling code and compares the forward pass against Fast-LLM. These are not converter bugs:

Bidirectional forward divergence (dream, confirmed). Dream is a bidirectional diffusion LM; the HF model's forward diverges structurally from Fast-LLM's causal run (completely different logits/hidden states, not a tolerance miss). Matching it needs bidirectional-attention modeling in Fast-LLM, not a converter change.
Missing generation_config.json (diffusion_llama, confirmed). The custom modeling from_pretrained requires a generation_config.json that Fast-LLM doesn't export (dream ships one in its external dir; diffusion_llama doesn't). Loading fails before the forward, so its forward divergence (likely the same bidirectional issue) is unverified.
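To illustrate why a bidirectional forward cannot match a causal one within any tolerance, here is a toy single-head attention over scalar values. It is a simplified sketch with made-up inputs and a hypothetical `attention` helper, not the Dream or Fast-LLM modeling code.

```python
import math


def attention(scores, values, causal):
    """Toy single-head attention over scalar values, with an optional causal mask."""
    n = len(scores)
    out = []
    for i in range(n):
        # Causal: position i attends to 0..i. Bidirectional: to every position.
        visible = range(i + 1) if causal else range(n)
        weights = [math.exp(scores[i][j]) for j in visible]
        total = sum(weights)
        out.append(sum(w * values[j] for w, j in zip(weights, visible)) / total)
    return out


scores = [[0.0, 0.0, 0.0] for _ in range(3)]  # uniform attention scores
values = [1.0, 2.0, 3.0]
causal_out = attention(scores, values, causal=True)
bidirectional_out = attention(scores, values, causal=False)
```

In this toy case only the last position agrees between the two runs; every earlier position sees different context, so the outputs differ structurally rather than by rounding.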

Both are modeling / model-load concerns, separate and larger than this PR. The fixture comments in tests/utils/model_configs.py now state this in place of the misleading "Conversion is broken" TODO.

Note: the yarn factor fix isn't exercised by normal CI today — no non-diffusion fixture uses yarn, and the diffusion convert group stays broken on the forward test.

🤖 Generated with Claude Code

The yarn branch of the rope config converter omitted the `factor` key (Fast-LLM's `YarnRotaryConfig.scale_factor`), unlike the llama3 branch right above it. transformers' yarn rope validation requires it, so exporting a yarn config produced an HF config that failed to instantiate (`Missing required keys in rope_parameters for 'rope_type'='yarn': {'factor'}`). This was the diffusion_llama conversion failure. Add the symmetric factor <-> scale_factor mapping on both export and import.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Diffusion configs (Dream, DiffusionLlama) carry a mask_token_id default that the inherited Llama/Qwen2 converters do not consume; it is a generation/inference token id Fast-LLM does not store, in the same category as the bos/eos/pad ids already allowlisted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Conversion (config + weights) now works for diffusion_llama and dream; the misleading "Conversion is broken" TODO is replaced with the actual reason the convert group stays `broken`: test_huggingface_model fails because these are bidirectional diffusion LMs whose HF forward diverges from Fast-LLM's causal run (and diffusion_llama additionally lacks an exported generation_config.json). Both are modeling/model-load concerns, not converter bugs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jlamypoirier and others added 3 commits June 1, 2026 12:10

jlamypoirier mentioned this pull request Jun 1, 2026

[bug] Conversion broken for diffusion models #320

Open

jlamypoirier wants to merge 3 commits into main from jlp_diffusion_yarn_rope
