Fix yarn rope factor and mask_token_id in HF conversion#529
Open
jlamypoirier wants to merge 3 commits into
Open
Fix yarn rope factor and mask_token_id in HF conversion#529jlamypoirier wants to merge 3 commits into
jlamypoirier wants to merge 3 commits into
Conversation
The yarn branch of the rope config converter omitted the `factor` key (Fast-LLM's
`YarnRotaryConfig.scale_factor`), unlike the llama3 branch right above it. transformers' yarn rope
validation requires it, so exporting a yarn config produced an HF config that failed to instantiate
(`Missing required keys in rope_parameters for 'rope_type'='yarn': {'factor'}`) — the diffusion_llama
conversion failure. Add the symmetric factor <-> scale_factor mapping on both export and import.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diffusion configs (Dream, DiffusionLlama) carry a mask_token_id default that the inherited Llama/Qwen2 converters do not consume; it is a generation/inference token id Fast-LLM does not store, in the same category as the bos/eos/pad ids already allowlisted. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Conversion (config + weights) now works for diffusion_llama and dream; the misleading "Conversion is broken" TODO is replaced with the actual reason the convert group stays `broken`: test_huggingface_model fails because these are bidirectional diffusion LMs whose HF forward diverges from Fast-LLM's causal run (and diffusion_llama additionally lacks an exported generation_config.json). Both are modeling/model-load concerns, not converter bugs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes two real bugs in the HF config converter, surfaced while investigating the broken diffusion conversions.
Fixes
Yarn rope
factor. The yarn branch of the Llama rope config converter omitted thefactorkey (Fast-LLM'sYarnRotaryConfig.scale_factor), unlike the llama3 branch right above it. transformers' yarn rope validation requires it, so exporting any yarn model produced an HF config that failed to instantiate (Missing required keys in rope_parameters for 'rope_type'='yarn': {'factor'}). Added the symmetricfactor↔scale_factormapping on export and import.mask_token_idallowlist. Diffusion configs (Dream, DiffusionLlama) carry amask_token_iddefault the inherited Llama/Qwen2 converters don't consume. It's a generation/inference token id Fast-LLM doesn't store, in the same category as the bos/eos/pad ids already allowlisted; added it to_HF_METADATA_ALLOWLIST.With these,
test_conversion(config + weight round-trip) now passes for bothdiffusion_llamaanddream.Verification (cluster, transformers 4.57.5)
test_converters.pywalker: 37 passed (no regression).test_checkpoint.py --models llama diffusion_llama dream --run-extra-slow: 106 passed, 2 failed. llama fully green (no regression); the only failures aretest_huggingface_model[diffusion_llama]and[dream]— see below.Remaining issues (NOT fixed here; formats stay
convert: broken)The diffusion formats still fail
test_huggingface_model, which loads the exported model through its custom modeling code and compares the forward pass against Fast-LLM. These are not converter bugs:dream, confirmed). Dream is a bidirectional diffusion LM; the HF model's forward diverges structurally from Fast-LLM's causal run (completely different logits/hidden states, not a tolerance miss). Matching it needs bidirectional-attention modeling in Fast-LLM, not a converter change.generation_config.json(diffusion_llama, confirmed). The custom modelingfrom_pretrainedrequires ageneration_config.jsonthat Fast-LLM doesn't export (dream ships one in its external dir; diffusion_llama doesn't). Loading fails before the forward, so its forward divergence (likely the same bidirectional issue) is unverified.Both are modeling / model-load concerns, separate and larger than this PR. The fixture comments in
tests/utils/model_configs.pynow state this in place of the misleading "Conversion is broken" TODO.Note: the yarn
factorfix isn't exercised by normal CI today — no non-diffusion fixture uses yarn, and the diffusion convert group staysbrokenon the forward test.🤖 Generated with Claude Code