Skip to content

[NV] Update B300 DSV4 SGLang Pareto sweep#1575

Open
Ankur-singh wants to merge 3 commits into
mainfrom
dsv4-b300-sglang-html-frontier
Open

[NV] Update B300 DSV4 SGLang Pareto sweep#1575
Ankur-singh wants to merge 3 commits into
mainfrom
dsv4-b300-sglang-html-frontier

Conversation

@Ankur-singh
Copy link
Copy Markdown
Collaborator

@Ankur-singh Ankur-singh commented May 27, 2026

Rebased copy of #1552 with origin/main merged in and the perf-changelog.yaml conflict resolved. Original PR: #1552.

Summary

  • update dsv4-fp4-b300-sglang to lmsysorg/sglang:nightly-dev-cu13-20260522-7cf193fe
  • align the B300 DSV4 non-MTP SGLang sweep with the 2026-05-19 8k/1k submission frontier:
    • TP8 no DP attention: c1, c2, c4, c8, c16, c32, c64
    • DEP8 with DP attention: c512, c768, c1024, c1536, c2048
  • map the single-node launch script to the TP-only flashinfer_mxfp4 recipe vs the DEP8 mixed-chunk DeepEP/DeepGEMM recipe

Testing

  • bash -n benchmarks/single_node/dsv4_fp4_b300_sglang.sh
  • git diff --check
  • uv run --python 3.13 --with pydantic --with pyyaml python utils/matrix_logic/generate_sweep_configs.py test-config --config-keys dsv4-fp4-b300-sglang --config-files .github/configs/nvidia-master.yaml --no-evals
  • uv run --python 3.13 --with pydantic --with pyyaml python utils/matrix_logic/generate_sweep_configs.py test-config --config-keys dsv4-fp4-b300-sglang --config-files .github/configs/nvidia-master.yaml --evals-only
  • uv run --python 3.13 --with pydantic --with pyyaml python utils/process_changelog.py --base-ref origin/main --head-ref HEAD --changelog-file perf-changelog.yaml --trim-conc

Note

Low Risk
Changes only benchmark matrix YAML, a single-node launch script, and perf changelog—no production serving or auth paths.

Overview
Aligns dsv4-fp4-b300-sglang with the 2026-05-19 8k/1k submission Pareto: switches to lmsysorg/sglang:nightly-dev-cu13-20260522-7cf193fe, documents the two-line frontier, and replaces the old mixed TP/EP/concurrency grid with explicit dp-attn search spaces—TP8/EP1 conc-list 1–64 and TP8/EP8 DP-attention 512–2048 for both 1024/1024 and 8192/1024 scenarios.

benchmarks/single_node/dsv4_fp4_b300_sglang.sh is collapsed from many per-CONC branches into two launch paths: DP_ATTENTION=false → flashinfer_mxfp4 TP-only (chunked prefill 8192, fixed max-running-requests 512); DP_ATTENTION=true → DEP8 mixed-chunk megamoe throughput (MegaMoE/NCCL env, static mem/SWA tuning). Non-eval runs now pass --context-length 16384; DeepGEMM precompile is toggled per branch.

perf-changelog.yaml records the config and image change for dsv4-fp4-b300-sglang.

Reviewed by Cursor Bugbot for commit deee4cc. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit deee4cc. Configure here.

- { tp: 8, ep: 8, dp-attn: true, conc-start: 2048, conc-end: 2048 }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 4096, conc-end: 4096 }
- { tp: 8, ep: 1, dp-attn: false, conc-list: [1, 2, 4, 8, 16, 32, 64] }
- { tp: 8, ep: 8, dp-attn: true, conc-list: [512, 768, 1024, 1536, 2048] }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single-node full-sweep crashes on conc-list configs

Medium Severity

The new dsv4-fp4-b300-sglang config uses conc-list in single-node fixed-seq-len search-space entries, but the generate_full_sweep() function's single-node code path unconditionally accesses conc-start and conc-end without first checking for conc-list. Running full-sweep over the nvidia master config will crash with a KeyError. The test-config path (used by the PR's CI and process_changelog.py) handles conc-list correctly, which is why the PR's own tests pass — but the full-sweep command (documented in the README and available via e2e-tests.yml) is broken for this config.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit deee4cc. Configure here.

@github-actions
Copy link
Copy Markdown
Contributor

Comment thread perf-changelog.yaml
Comment on lines +3175 to +3181
- config-keys:
- dsv4-fp4-b300-sglang
description:
- "Update DeepSeek-V4-Pro FP4 B300 SGLang non-MTP sweep to the 2026-05-19 8k/1k submission frontier: TP8 no-DP-attention c1-c64 and DEP8 DP-attention c512/c768/c1024/c1536/c2048"
- "Use lmsysorg/sglang:nightly-dev-cu13-20260522-7cf193fe to pick up the merged SGLang warmup path"
- "Map dp-attn=false to TP8 flashinfer_mxfp4 with chunked-prefill 8192; map dp-attn=true to DEP8 mixed-chunk MegaMoE throughput settings"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1552
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new perf-changelog.yaml entry's pr-link points to PR #1552 (the closed/superseded original), but the merging PR is #1575. This breaks the file's convention (every neighboring entry links to its own merging PR) and will trip utils/merge_with_reuse.sh: its conflict-resolution helper skips entries whose link doesn't end with /pull/<current-pr>, and the post-merge assert last["pr-link"].endswith("/$PR") would fail. Change the link to /pull/1575 (or use the XXX placeholder).

Extended reasoning...

What the bug is

The new entry appended at perf-changelog.yaml:3175-3181 sets:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1552

…but the PR being merged is #1575 (the rebased successor — the PR description itself notes "Rebased copy of #1552 with origin/main merged in"). PR #1552 is closed and will not contain the merge commit for these changes.

Why this is a real issue, not just cosmetic

Every other recent entry in perf-changelog.yaml links to its own merging PR (e.g. the immediately preceding dsv4-fp4-mi355x-sglang entry → /pull/1568, others → /pull/1555, /pull/1558, /pull/1516, /pull/1354). AGENTS.md documents this convention. More importantly, the repo's canonical merge tool utils/merge_with_reuse.sh (referenced from .claude/commands/merge-prs.md) enforces it programmatically:

  • At ~line 136, when resolving perf-changelog.yaml conflicts it filters incoming entries with: if "XXX" not in link and not link.endswith(f"/pull/{pr}"): continue. With pr=1575 and link=…/pull/1552, this entry would be silently skipped during automated conflict resolution, producing "No PR contributions found".
  • At ~line 172, after the merge it runs assert last["pr-link"].endswith("/$PR"). With $PR=1575 and the last entry pointing at /pull/1552, this assertion would fail.

Step-by-step proof

  1. Maintainer runs utils/merge_with_reuse.sh 1575.
  2. The script fetches origin/main, attempts the merge, and hits a conflict in perf-changelog.yaml (likely, given how frequently this file changes).
  3. The embedded Python helper walks the incoming entries from the PR branch. For each new entry it checks:
    if "XXX" not in link and not link.endswith(f"/pull/{pr}"): continue
    With pr = "1575" and link = "https://github.com/SemiAnalysisAI/InferenceX/pull/1552", the condition is true → the entry is dropped from the merged result.
  4. Even if no conflict arises (so the helper isn't invoked), the post-merge sanity check runs:
    assert last["pr-link"].endswith(f"/{pr}")
    Since the last entry is the new one and ends with /1552, this assertion raises and the merge tool aborts.
  5. Downstream readers using pr-link to find the diff/merge commit land on a closed, superseded PR instead of the one that actually merged.

Fix

Update line 3181 to:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1575

or, equivalently, use the XXX placeholder that merge_with_reuse.sh rewrites at merge time:

  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

The historical reference to #1552 (which the PR description already provides) can stay in the PR description; the changelog entry should point at the PR that actually lands the change, both for convention and for tooling correctness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants