Skip to content

refactor(mtp): extract BaseMTPModel mixin shared by existing MTP draft models#1337

Merged
sufubao merged 1 commit into
ModelTC:mainfrom
sufubao:qw35_mtp_refactor
Jun 9, 2026
Merged

refactor(mtp): extract BaseMTPModel mixin shared by existing MTP draft models#1337
sufubao merged 1 commit into
ModelTC:mainfrom
sufubao:qw35_mtp_refactor

Conversation

@sufubao

@sufubao sufubao commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

What

Extracts a small BaseMTPModel mixin that consolidates the draft-model wiring duplicated across the existing MTP models — deepseek_mtp, mistral_mtp, glm4_moe_lite_mtp, and qwen3_moe_mtp.

Each of those draft models repeated the same setup: reuse the main model's req/mem managers and rope caches, pop the main_model / mtp_previous_draft_models kwargs before the base __init__, and mark themselves as draft models. This moves that shared logic into one place.

Why

A behaviour-preserving cleanup (+8/−99 across the four models) that removes duplication and gives a single home for the is_mtp_draft_model marker. It is a preparatory refactor split out of a larger Qwen3.5 / Qwen3.5-MoE MTP change so the follow-up feature PR stays focused on the new capability.

Changes

  • New lightllm/models/base_mtp_model.py — the BaseMTPModel mixin, plus a unit test for its wiring.
  • deepseek_mtp / mistral_mtp / glm4_moe_lite_mtp / qwen3_moe_mtp model classes now mix in BaseMTPModel (mixed in before the concrete base model so the shared overrides win via MRO).

Testing

  • pre-commit (black 21.12b0 + flake8 6.1.0) clean.
  • unit_tests/models/test_base_mtp_model_mixin.py covers the mixin's kwargs-popping, manager/rope sharing, and the is_mtp_draft_model marker.

…t models

Consolidate the duplicated draft-model wiring across the deepseek / mistral / glm4_moe_lite / qwen3_moe MTP models into a single BaseMTPModel mixin: draft models reuse the main model's req/mem managers and rope caches, pop the main_model / previous-draft-models kwargs before the base __init__, and carry the is_mtp_draft_model marker. Behaviour-preserving (+8/-99 across the four models); no new dependencies. Includes a unit test for the mixin wiring.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new BaseMTPModel mixin class to consolidate duplicated initialization logic, manager sharing, and ROPE cache reuse across multiple MTP draft models (DeepSeek, GLM4, Mistral, and Qwen3). The review feedback correctly identifies a critical issue with the cooperative multiple inheritance (MRO) chain in BaseMTPModel._init_custom, which overrides the method without calling super()._init_custom(). Suggestions are provided to call super()._init_custom() and to update the unit tests using a dummy subclass to properly test this cooperative inheritance behavior.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@ModelTC ModelTC deleted a comment from gemini-code-assist Bot Jun 9, 2026
@ModelTC ModelTC deleted a comment from gemini-code-assist Bot Jun 9, 2026
@sufubao sufubao merged commit e03ef9a into ModelTC:main Jun 9, 2026
1 check passed
@sufubao sufubao deleted the qw35_mtp_refactor branch June 9, 2026 01:01
sufubao added a commit that referenced this pull request Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant