[MoRI short term patch] add qwen3.5-fp4 MI355X SGLang PD-disaggregation#1579
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 2 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 5845a91. Configure here.
|
|
||
| - config-keys: | ||
| - qwen3.5-fp4-mi355x-sglang-disagg | ||
| description: | ||
| - "Add Qwen3.5-397B-A17B-MXFP4 MI355X SGLang PD-disaggregation" | ||
| - "Bump image to lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523, 1P1D TP8/EP1, dp-attn false, conc [8..512]" | ||
| - "MoRI conn.py overlay (48e459bd) via job.slurm; launcher qwen3.5_fp4_mi355x_sglang-disagg.sh" | ||
| pr-link: XXX |
There was a problem hiding this comment.
🔴 The new perf-changelog entry at the bottom of perf-changelog.yaml has pr-link: XXX (a literal placeholder; every other entry uses a real https://github.com/SemiAnalysisAI/InferenceX/pull/N URL), and its config-keys lists only qwen3.5-fp4-mi355x-sglang-disagg even though the same PR also adds two new top-level keys to amd-master.yaml: qwen3.5-fp8-mi355x-sglang-disagg and glm5-fp8-mi355x-sglang-disagg. Please replace XXX with this PR's URL and either add the two missing keys to config-keys (with their own description bullets) or split them into their own changelog entries.
Extended reasoning...
What's wrong
The new changelog entry added at the bottom of perf-changelog.yaml (lines 2310–2317) has two issues:
-
Placeholder
pr-link: XXX. Every other entry in this file uses a real GitHub PR URL (e.g.pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1300on line 2309, and similarly at lines 2285, 2293, 2301).XXXis unique to this entry — grep confirms it appears on exactly one line in the file. It's clearly a stale placeholder the author forgot to fill in before pushing. -
config-keysis missing two of the three new configs. Theamd-master.yamldiff in this same PR adds three new top-level keys:qwen3.5-fp4-mi355x-sglang-disagg(line 351 of the new file)qwen3.5-fp8-mi355x-sglang-disagg(line 384)glm5-fp8-mi355x-sglang-disagg(line 537)
Plus the corresponding
models.yamlentries (Qwen3.5-397B-A17B-MXFP4,Qwen3.5-397B-A17B-FP8,GLM-5-FP8) and launcher scripts (qwen3.5_fp4_…,qwen3.5_fp8_…,glm5_fp8_…). All three are clearly intended to ship in this PR. The changelog block only lists the FP4 one underconfig-keysand the description bullets only mention the FP4 launcher.
Why it matters
pr-link: XXX is not a valid URL — any downstream tooling that parses pr-link as a hyperlink (changelog renderers, release-note generators, dashboards that link back to the PR) will at best render a broken link and at worst error out. The pydantic ChangelogEntry schema in utils/matrix_logic/validation.py only types pr-link as str, so it won't fail schema validation — making this exactly the kind of issue that slips past CI and surfaces in published artifacts.
The missing config-keys entries mean the FP8 Qwen3.5 and GLM-5 disagg configurations are introduced silently: anyone looking at the changelog to understand what changed for those configs will see nothing, even though the PR is clearly shipping them (full amd-master.yaml scenarios, models.yaml model entries with model-specific flags and tuning, dedicated launcher scripts, and GLM-5-specific env vars in env.sh).
Step-by-step verification
grep -n 'pr-link: XXX' perf-changelog.yaml→ matches exactly one line (2317).grep -n 'pr-link:' perf-changelog.yaml | head→ every other entry has a realhttps://github.com/...URL.- In the
amd-master.yamlhunk of this PR's diff, count the new top-level keys ending in-sglang-disaggadded by this PR (not present before): three keys, as listed above. - The new entry's
config-keys:list contains exactly one item —qwen3.5-fp4-mi355x-sglang-disagg. The other two are absent.
How to fix
Two one-line edits in perf-changelog.yaml:
- config-keys:
- qwen3.5-fp4-mi355x-sglang-disagg
- qwen3.5-fp8-mi355x-sglang-disagg
- glm5-fp8-mi355x-sglang-disagg
description:
- "Add Qwen3.5-397B-A17B-MXFP4 MI355X SGLang PD-disaggregation"
- "Add Qwen3.5-397B-A17B-FP8 MI355X SGLang PD-disaggregation (dp-attn=false; see amd-master.yaml comment for the upstream MoRI/is_deepep_class_backend issue)"
- "Add GLM-5-FP8 MI355X SGLang PD-disaggregation (NSA attention; sets SGLANG_ROCM_FUSED_DECODE_MLA=0 in env.sh, installs transformers glm_moe_dsa via setup_deps.sh)"
- "Bump image to lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523, 1P1D TP8/EP1, conc [8..512]"
- "MoRI conn.py overlay (48e459bd) via job.slurm; launchers qwen3.5_fp4_/qwen3.5_fp8_/glm5_fp8_mi355x_sglang-disagg.sh"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1579Or alternatively, split into three separate changelog entries (one per new config key) if you prefer to keep each entry narrowly scoped — that's also consistent with how other multi-config PRs in this file are structured.
Severity
Documentation/changelog hygiene only — no runtime impact, no risk of breaking benchmarks. But pr-link: XXX is a known-bad placeholder that should always be replaced before merging, and the omitted config-keys cause real misattribution in any automation that consumes this file. Easy pre-merge fix.
| $ISL $OSL "${CONC_LIST// /x}" inf \ | ||
| ${PREFILL_ENABLE_EP} ${PREFILL_ENABLE_DP} \ | ||
| ${DECODE_ENABLE_EP} ${DECODE_ENABLE_DP} \ | ||
| ${PREFILL_TP} ${DECODE_TP} \ | ||
| ${RANDOM_RANGE_RATIO}) | ||
|
|
||
| if [[ $? -ne 0 ]]; then |
There was a problem hiding this comment.
🔴 The glm5-fp8 disagg launcher (added in this PR) invokes submit.sh with only 15 positional args, omitting the 16th ${NODE_LIST:-} arg that the two sibling qwen3.5 disagg launchers in the same PR pass. Since submit.sh reads NODE_LIST=${16} to populate --nodelist for sbatch, any caller exporting NODE_LIST to pin glm5 disagg jobs to specific hostnames will be silently ignored — only for glm5. Append ${NODE_LIST:-} to match the qwen launchers (one-line fix).
Extended reasoning...
What the bug is
This PR adds three sibling launcher scripts for the new sglang-disagg configs:
benchmarks/multi_node/qwen3.5_fp4_mi355x_sglang-disagg.shbenchmarks/multi_node/qwen3.5_fp8_mi355x_sglang-disagg.shbenchmarks/multi_node/glm5_fp8_mi355x_sglang-disagg.sh
All three call amd_utils/submit.sh with the same positional-argument schema. The two qwen launchers pass 16 positional args, terminating with ${RANDOM_RANGE_RATIO} \\\n ${NODE_LIST:-}. The glm5 launcher stops at ${RANDOM_RANGE_RATIO} — only 15 positional args.
Code path that triggers it
benchmarks/multi_node/amd_utils/submit.sh line 76 reads:
NODE_LIST=${16}This is a positional-arg read, not an env-var read. Bash does not auto-fill positional args from the environment, so when the glm5 launcher omits the 16th arg, ${16} expands to the empty string regardless of whether the caller exported NODE_LIST=... before invoking the launcher.
Further down (around submit.sh line 130–145), NODE_LIST is parsed as comma-separated hostnames and used to build an sbatch --nodelist <csv> flag. So with the glm5 launcher, --nodelist is never set; SLURM picks any matching idle nodes.
Why existing code doesn't prevent it
The schema is purely positional (no getopts/named args), so there is no validation that the launcher and submit.sh agree on the trailing args. The two qwen scripts and the glm5 script are otherwise identical templates, so the divergence is almost certainly a copy-paste omission rather than intentional.
Impact
Asymmetric, surprising behavior between three sibling launchers added in the same PR:
NODE_LIST=node07,node08 bash qwen3.5_fp8_mi355x_sglang-disagg.sh→ sbatch is invoked with--nodelist node07,node08(pinned).NODE_LIST=node07,node08 bash glm5_fp8_mi355x_sglang-disagg.sh→ sbatch is invoked without--nodelist(any nodes; the env var is silently dropped).
Today nothing in the in-tree workflow plumbing sets NODE_LIST, so the feature is dormant for all three. But the moment a caller (CI step, runner script, or a human running locally) tries to pin GLM-5 disagg jobs using the documented escape hatch that already works for qwen, they'll get a silent failure with no error message — the job will land on whatever nodes SLURM picks, which on a shared cluster may be the wrong hardware class entirely.
Fix
One-line change in benchmarks/multi_node/glm5_fp8_mi355x_sglang-disagg.sh — append ${NODE_LIST:-} to the submit.sh call to match the two qwen siblings:
JOB_ID=$(bash ./submit.sh $PREFILL_NODES \
$PREFILL_NUM_WORKERS \
$DECODE_NODES \
$DECODE_NUM_WORKERS \
$ISL $OSL "${CONC_LIST// /x}" inf \
${PREFILL_ENABLE_EP} ${PREFILL_ENABLE_DP} \
${DECODE_ENABLE_EP} ${DECODE_ENABLE_DP} \
${PREFILL_TP} ${DECODE_TP} \
${RANDOM_RANGE_RATIO} \
${NODE_LIST:-})Step-by-step proof
- User exports
NODE_LIST=smci355-ccs-aus-07,smci355-ccs-aus-08and runsbash benchmarks/multi_node/glm5_fp8_mi355x_sglang-disagg.sh. - The launcher executes
bash ./submit.sh $1 ... ${RANDOM_RANGE_RATIO}— 15 positional args. The exportedNODE_LISTis in the child shell's env but not in$@. - Inside submit.sh,
NODE_LIST=${16}runs.$16is unset (only 15 args were passed), soNODE_LISTis set to the empty string. The shell-level assignment masks any env-inheritedNODE_LISTbecauseNODE_LIST=${16}is a plain assignment (notNODE_LIST=${16:-$NODE_LIST}). - The
if [[ -n "${NODE_LIST//[[:space:]]/}" ]]guard around line 130 evaluates false (empty after stripping whitespace), soNODELIST_OPTstays empty. sbatchis invoked without--nodelist; SLURM picks any matching idle nodes from the partition.
Same five-step recipe with qwen3.5_fp8_mi355x_sglang-disagg.sh substituted at step 1: step 2 passes 16 args, step 3 sets NODE_LIST=smci355-ccs-aus-07,smci355-ccs-aus-08, step 4 builds --nodelist smci355-ccs-aus-07,smci355-ccs-aus-08, and step 5 pins sbatch to those exact nodes. The only difference between the two paths is the missing 16th argument in the glm5 launcher.
| PREFILL_DECODE_DIFFERENT_TP="" | ||
| if [[ "$PREFILL_ENABLE_DP" != "$DECODE_ENABLE_DP" ]]; then | ||
| if is_truthy "$ENABLE_DISAGG_DECODE_PARALLELISM_FLAGS"; then | ||
| if [[ "$DECODE_ENABLE_DP" == "true" ]]; then | ||
| PREFILL_DECODE_DIFFERENT_TP="--disaggregation-decode-tp ${DECODE_TP_SIZE} --disaggregation-decode-dp ${DECODE_TP_SIZE}" | ||
| else | ||
| PREFILL_DECODE_DIFFERENT_TP="--disaggregation-decode-tp ${DECODE_TP_SIZE} --disaggregation-decode-dp 1" | ||
| fi | ||
| else | ||
| echo "[INFO] Skipping --disaggregation-decode-tp/--disaggregation-decode-dp; set ENABLE_DISAGG_DECODE_PARALLELISM_FLAGS=true to enable them." | ||
| fi | ||
| fi |
There was a problem hiding this comment.
🔴 PREFILL_DECODE_DIFFERENT_TP is populated at server.sh:211-217 but never appended to PREFILL_MODE_FLAGS, PREFILL_SERVER_CONFIG, or the PREFILL_CMD heredoc, so the new ENABLE_DISAGG_DECODE_PARALLELISM_FLAGS opt-in plumbed through submit.sh → job.slurm → server.sh is dead code: --disaggregation-decode-tp / --disaggregation-decode-dp never reach sglang.launch_server. The fix is to append $PREFILL_DECODE_DIFFERENT_TP to the prefill flags (e.g. into PREFILL_MODE_FLAGS). The configs added in this PR all use matching prefill/decode dp-attn so the outer guard doesn't trigger today, but the documented feature is non-functional as-merged.
Extended reasoning...
What the bug is. The new block at benchmarks/multi_node/amd_utils/server.sh:208-222 introduces ENABLE_DISAGG_DECODE_PARALLELISM_FLAGS, an opt-in switch threaded all the way from submit.sh (line 101) → job.slurm (lines 92, 306, 436) → server.sh (line 27). When the user opts in and PREFILL_ENABLE_DP != DECODE_ENABLE_DP, the block computes a flag string into PREFILL_DECODE_DIFFERENT_TP:
PREFILL_DECODE_DIFFERENT_TP="--disaggregation-decode-tp ${DECODE_TP_SIZE} --disaggregation-decode-dp ${DECODE_TP_SIZE}"
# or
PREFILL_DECODE_DIFFERENT_TP="--disaggregation-decode-tp ${DECODE_TP_SIZE} --disaggregation-decode-dp 1"But that variable is never read again. PREFILL_MODE_FLAGS (line 227), build_server_config (lines ~336-405), PREFILL_SERVER_CONFIG (line ~409), and the PREFILL_CMD heredoc that actually invokes sglang.launch_server (lines ~474, ~620) all proceed without it. Grepping the entire repo confirms only three references to PREFILL_DECODE_DIFFERENT_TP exist (the init and the two assignments inside the if/else), and --disaggregation-decode-tp / --disaggregation-decode-dp appear nowhere else in the repo.
Why existing code doesn't prevent it. The [INFO] Skipping ... log in the else branch is suppressed when the user opts in, which makes the bug worse: the user is told implicitly that the flags were enabled, but they were silently dropped before reaching sglang.
Step-by-step proof. Suppose a user picks a heterogeneous P/D config and sets ENABLE_DISAGG_DECODE_PARALLELISM_FLAGS=true, PREFILL_ENABLE_DP=false, DECODE_ENABLE_DP=true, DECODE_TP_SIZE=8:
submit.sh:101exportsENABLE_DISAGG_DECODE_PARALLELISM_FLAGS=trueinto the env.job.slurm:92reads it,:306re-exports it, and:436injects it into the docker-eblock.- Inside the container,
server.sh:27re-reads it; the block at L208 triggers becausefalse != true;is_truthyreturns 0; theDECODE_ENABLE_DP=truebranch fires, soPREFILL_DECODE_DIFFERENT_TP="--disaggregation-decode-tp 8 --disaggregation-decode-dp 8". - Execution falls through to L227 where
PREFILL_MODE_FLAGSis constructed without$PREFILL_DECODE_DIFFERENT_TP. build_server_config "prefill" ...(called at the line just belowDECODE_SERVER_CONFIG=...) concatenatesparallel_args + MODEL_BASE_FLAGS + MODEL_DP_FLAGS + PREFILL_MODE_FLAGS— none of which contain the variable.PREFILL_CMD="... python3 -m sglang.launch_server ... ${PREFILL_SERVER_CONFIG}"launches sglang without the disaggregation-decode flags.
Impact. The opt-in feature is silently a no-op. The configs added by this PR (qwen3.5-fp4/fp8 + glm5-fp8 sglang-disagg) all use dp-attn=false on both prefill and decode, so the outer if [[ "$PREFILL_ENABLE_DP" != "$DECODE_ENABLE_DP" ]] guard never fires for the in-tree YAMLs, meaning this PR's sweeps are unaffected. However, the entire new plumbing was added explicitly to enable heterogeneous P/D configurations — and every future user who flips the switch will silently get a mis-launched sglang. Worse, the user will not see the [INFO] Skipping ... log that would otherwise tip them off, so they will believe the flags took effect.
How to fix. Append $PREFILL_DECODE_DIFFERENT_TP to the prefill flag bundle. Minimal patch:
PREFILL_MODE_FLAGS="--mem-fraction-static ${PREFILL_MEM_FRACTION_STATIC} ... --cuda-graph-bs ${prefill_cuda_graph_bs[*]} ${PREFILL_DECODE_DIFFERENT_TP} "Or thread it through build_server_config and the specific_config branch for prefill. Either way, the variable needs to actually reach PREFILL_CMD.
5845a91 to
5053106
Compare
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26580860737 |
Add qwen3.5-fp4-mi355x-sglang-disagg recipe; MXFP4 model; MoRI overlay.
5053106 to
9e26be6
Compare
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26588184290 |
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=26590332882 |
|
/reuse-sweep-run |
Resolve perf-changelog.yaml append conflict by keeping all three new entries: main's #1579 (qwen3.5-fp4-mi355x-sglang-disagg) plus this branch's #1574 re-trigger and the AMD multinode measured-power entry. Append-only file (process_changelog rejects deletions); no lines removed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add Qwen3.5-397B-A17B-MXFP4 MI355X SGLang PD-disaggregation
Note
Low Risk
Benchmark and CI configuration only; no application runtime or security-sensitive code paths.
Overview
Adds MI355X prefill–decode disaggregation benchmarks for Qwen3.5-397B-A17B MXFP4 on SGLang, parallel to the existing FP8 disagg entry.
A new
qwen3.5-fp4-mi355x-sglang-disaggmatrix entry useslmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523, modelamd/Qwen3.5-397B-A17B-MXFP4, 1P1D with prefill/decode TP8/EP1, dp-attn false, and fixed-seq sweeps at 1k1k and 8k1k with conc 8–512 (no MTP).Supporting changes add
Qwen3.5-397B-A17B-MXFP4server flags inbenchmarks/multi_node/amd_utils/models.yaml(MoRI transfer, aiter attention, prefill/decode tuning) and launcherqwen3.5_fp4_mi355x_sglang-disagg.sh, which maps CI env vars and callsamd_utils/submit.shlike the FP8 disagg script.perf-changelog.yamldocuments the config and MoRIconn.pyoverlay viajob.slurm.Reviewed by Cursor Bugbot for commit 9e26be6. Bugbot is set up for automated code reviews on this repo. Configure here.