[MoRI short term temp patch] GLM-5 FP8 MI355X SGLang disaggregated #1572
Merged: functionstackx merged 4 commits into main from chang/glm5-fp8-mi355x-sglang-disagg-pr-2 (May 28, 2026)
a953663
Add GLM-5 FP8 MI355X SGLang disaggregated benchmark (PR-2).
f7bac55
Update benchmarks/multi_node/amd_utils/setup_deps.sh
ChangLiu0709 f3dcff0
fix: add FRAMEWORK to check_env_vars, fix NODELIST variable name
github-actions[bot] a8722db
[Klaud Cold] GLM-5 disagg: port MoRI conn.py overlay to fix PD startu…
functionstackx
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # In-tree sglang patches for the MoRI PD-disagg path | ||
|
|
||
| This directory carries small Python overlays that get bind-mounted over | ||
| the upstream sglang source inside the docker container at runtime. | ||
| They are needed because some sglang releases ship known bugs in the | ||
| MoRI disaggregation backend that block our benchmark + accuracy | ||
| configs. | ||
|
|
||
| The mount is wired through the `EXTRA_DOCKER_MOUNTS` env var that | ||
| `job.slurm` consumes (an opt-in `${EXTRA_DOCKER_MOUNTS:-}` expansion after | ||
| the existing `-v` block). The local-test driver scripts under | ||
| `scripts/sglang_disagg/` pre-set this env var to a `-v` bind-mount | ||
| argument for the relevant overlay; CI runners that need the patch can | ||
| do the same. | ||
|
|
||
| ## `mori_conn.py` | ||
|
|
||
| Overlays | ||
| `/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py`. | ||
|
|
||
| Source: forked from the file shipped in | ||
| `lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523` | ||
| (sglang [v0.5.12.post1](https://github.com/sgl-project/sglang/tree/v0.5.12.post1)). | ||
| Four logical edits, all confined to `MoriKVReceiver.send_state`, | ||
| `MoriKVReceiver._register_kv_args`, and | ||
| `MoriKVReceiver._send_swa_dsa_state`: | ||
|
|
||
| 1. **Sender flatten** — handle the framework's nested | ||
| `state_item_lens: List[List[int]]` instead of crashing in the | ||
| naked `struct.pack("I", item_len)` (the legacy `List[int]` | ||
| assumption). A no-op for legacy callers that already pass a flat list. | ||
| 2. **`state_type` legacy fallback** — when the legacy singular | ||
| `kv_args.state_type` is `'none'` but `state_mem_descs` is non-empty, | ||
| read `kv_args.state_types[0]` (the modern plural API that Mooncake | ||
| and NIXL already use). Routes `MAMBA → _send_mamba_state` and | ||
| `DSA/SWA → _send_swa_dsa_state` correctly. | ||
| 3. **Consumer normalization** — flatten `state_item_lens` and | ||
| `state_dim_per_tensor` to flat `List[int]` once at the entry of | ||
| `send_state`, so the existing per-tensor index arithmetic | ||
| (`state_item_lens[i]`) and length checks | ||
| (`len(state_item_lens) == len(state_mem_descs)`) keep working. | ||
| 4. **DSA index rank+length normalization** — inside | ||
| `_send_swa_dsa_state`, before the `group_concurrent_contiguous` | ||
| call, ravel both `src_state_indices` and `dst_state_indices` to 1-D | ||
| and re-truncate them to their common length. Upstream's existing truncation | ||
| only slices the outer axis, leaving 2-D `(1, N)` arrays unchanged | ||
| and triggering an `np.diff` broadcasting error | ||
| (`shapes (1,12) (0,)`) for GLM-5 (single-DSA-component) prefill | ||
| traffic. See | ||
| `scripts/sglang_disagg/docs_glm5/01-bug-analysis.md` for the full | ||
| write-up. | ||
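Edits 1 to 3 above can be illustrated with a minimal, standalone sketch. It is not the code in `mori_conn.py`: the helper names `flatten_item_lens`, `pack_item_lens`, and `resolve_state_type` are hypothetical, and state types are modelled as plain strings rather than sglang's own enum. Only the stdlib `struct` module is used.

```python
import struct
from typing import List, Sequence, Union

# Either the legacy flat List[int] or the nested List[List[int]] shape.
NestedLens = Union[Sequence[int], Sequence[Sequence[int]]]


def flatten_item_lens(item_lens: NestedLens) -> List[int]:
    """Flatten List[List[int]] to List[int]; flat input comes back unchanged."""
    flat: List[int] = []
    for entry in item_lens:
        if isinstance(entry, (list, tuple)):
            flat.extend(int(x) for x in entry)
        else:
            flat.append(int(entry))
    return flat


def pack_item_lens(item_lens: NestedLens) -> bytes:
    """Pack every length as a native unsigned int, one struct.pack("I", n) each."""
    return b"".join(struct.pack("I", n) for n in flatten_item_lens(item_lens))


def resolve_state_type(state_type: str, state_types: Sequence[str], has_state: bool) -> str:
    """Use state_types[0] when the singular field is 'none' but state exists."""
    if state_type == "none" and has_state and state_types:
        return state_types[0]
    return state_type
```

Passing a nested entry straight to `struct.pack("I", ...)` raises `struct.error`, which is the failure mode edit 1 avoids; flattening once up front also keeps per-tensor indexing and length checks valid, as in edit 3.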
|
|
||
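The rank and length normalization described in item 4 can be sketched with NumPy as follows. The function name `normalize_state_indices` is hypothetical and the arrays are made-up stand-ins for real prefill traffic; the sketch only shows why slicing the outer axis is not enough and what raveling before truncation changes.

```python
import numpy as np


def normalize_state_indices(src, dst):
    """Ravel both index arrays to 1-D, then truncate to the shorter length."""
    src = np.asarray(src).ravel()
    dst = np.asarray(dst).ravel()
    n = min(src.size, dst.size)
    return src[:n], dst[:n]


# A (1, N) array, as described for a single-DSA-component model.
src_2d = np.arange(12).reshape(1, 12)
dst_1d = np.arange(100, 112)

# Slicing only the outer axis leaves the array 2-D.
outer_only = src_2d[: len(dst_1d)]
assert outer_only.shape == (1, 12)

# After normalization both arrays are 1-D with equal length,
# so element-wise operations on np.diff(src) and np.diff(dst) line up.
src, dst = normalize_state_indices(src_2d, dst_1d)
assert np.diff(src).shape == np.diff(dst).shape
```

Raveling first means the subsequent truncation always acts on element count rather than on the number of rows.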
| Verified: GSM8K = 0.978 ± 0.004 on Qwen3.5-397B-A17B-FP8 (1P+1D, | ||
| TP=8, dp-attn=false), consistent with, and slightly above, the 0.970 | ||
| GSM8K that upstream | ||
| [PR #22665](https://github.com/sgl-project/sglang/pull/22665) | ||
| reports for the bf16 baseline. GLM-5 (DSA) verification is in progress; see | ||
| `scripts/sglang_disagg/docs_glm5/02-fix-and-verification.md`. | ||
|
|
||
| This is a stop-gap. The proper upstream fix is to migrate MoRI to the | ||
| plural `state_types: List[StateType]` API (full design + diff in | ||
| `scripts/sglang_disagg/docs/03-upstream-pr-proposal.md`). | ||
|
|
||
| ## How to enable | ||
|
|
||
| ```bash | ||
| export EXTRA_DOCKER_MOUNTS="-v $DI_REPO_DIR/benchmarks/multi_node/amd_utils/patches/mori_conn.py:/sgl-workspace/sglang/python/sglang/srt/disaggregation/mori/conn.py:ro" | ||
| ``` | ||
|
|
||
| `$DI_REPO_DIR` is the InferenceX checkout root that `job.slurm` | ||
| already mounts into the container at `/workspace`. | ||
|
|
||
| When this env var is unset (CI default for runs that don't need the | ||
| patch), `${EXTRA_DOCKER_MOUNTS:-}` expands to the empty string and | ||
| container behavior is byte-identical to the unpatched path. | ||
|
|
||
| ## When to use which patch | ||
|
|
||
| | Image / version | Need `mori_conn.py` overlay? | | ||
| |---|---| | ||
| | `lmsysorg/sglang-rocm:v0.5.12.post1-rocm720-mi35x-20260523` | yes (Qwen3.5-MoE-FP8, GLM-5, any hybrid model on this image) | | ||
| | `lmsysorg/sglang-rocm:v0.5.10.post1-rocm720-mi35x-*` (used by `dsr1-fp4-*-disagg`) | not validated; same code path likely affected — try with the overlay if you hit the same `struct.error` | | ||
| | `rocm/sgl-dev:sglang-0.5.9-rocm720-mi35x-mori-*` (used by `dsr1-fp8-*-disagg`, `glm5-*-disagg`) | predates [PR #22665](https://github.com/sgl-project/sglang/pull/22665); different code paths; **do not** apply this overlay | | ||
|
|
||
| When upstream merges the proper fix (see | ||
| `scripts/sglang_disagg/docs/03-upstream-pr-proposal.md`) and that | ||
| fix lands in a published image, retire this overlay. The | ||
| `EXTRA_DOCKER_MOUNTS` knob can stay, since it remains useful for future patches. |