T-303: sector heuristic on HSJ/XFORM — guard verification + regression test #247

Merged
ms609 merged 6 commits into cpp-search from feature/t303-sector-hsj-guard on Jun 15, 2026

ms609 (Owner) commented Jun 12, 2026

T-303 (P2): Sector heuristic degrades silently on HSJ/XFORM datasets

build_reduced_dataset() in src/ts_sector.cpp does not copy hierarchy_blocks, tip_labels, n_orig_chars, hsj_alpha, or sankoff_* into the reduced dataset. Since rd.data.scoring_mode is copied, an unguarded sector would dispatch hsj_score()/Sankoff against empty hierarchy/Sankoff data and silently degrade to Fitch-only — a wrong internal accept/reject heuristic (missed improvements, accept-then-revert churn). Final acceptance scores are unaffected (recomputed on the full dataset). EW, IW, PROFILE are fine.
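
The failure mode can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDataset`, `toy_build_reduced()` and `toy_score()` are hypothetical stand-ins, not the real structures or functions in src/ts_sector.cpp, and the additive "hierarchy cost" is a placeholder for whatever hsj_score() actually computes. The point is that copying the scoring mode without the data it depends on produces no error, only a score that matches the Fitch-only component.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified scoring modes named after those in the PR text.
enum class ScoringMode { EW, IW, PROFILE, HSJ, XFORM };

// Toy dataset: NOT the real layout used by the package.
struct ToyDataset {
    ScoringMode scoring_mode = ScoringMode::EW;
    std::vector<int> fitch_steps;       // toy per-character Fitch step counts
    std::vector<int> hierarchy_blocks;  // toy per-block hierarchy cost
};

// Mimics the omission described above: the mode is copied,
// but the hierarchy data it depends on is left empty.
ToyDataset toy_build_reduced(const ToyDataset& full) {
    ToyDataset rd;
    rd.scoring_mode = full.scoring_mode;
    rd.fitch_steps = full.fitch_steps;
    return rd;
}

// Toy scorer: the hierarchy term contributes nothing when the
// block list is empty, so the result silently equals the Fitch sum.
int toy_score(const ToyDataset& ds) {
    int score = 0;
    for (int s : ds.fitch_steps) score += s;
    if (ds.scoring_mode == ScoringMode::HSJ) {
        for (int b : ds.hierarchy_blocks) score += b;
    }
    return score;
}
```

In this toy, scoring the reduced copy raises no error and simply drops the hierarchy term, which is why the degradation described above is silent.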

What this PR does

  • Guards already present. The approach-(a) fallback guards in rss_search and xss_search (the only two routines that call build_reduced_dataset) are already on cpp-search (commit e5ff2942), mirroring the existing T-275 guard. No change needed there.
  • css_search documented as safe. It is the third sector routine but is not affected: it never builds a reduced dataset — it runs tbr_search() with a sector_mask against the full ds, so score_tree() dispatches hsj_score()/Sankoff with complete data and its internal heuristic is correct for every scoring mode. Added an in-code comment so it is not re-flagged. (Guarding it would wrongly disable a working HSJ/XFORM path.)
  • Regression test. New Tier-2 test drives the full HSJ + sectorial pipeline (rss/xss guarded, css on full ds), asserting it completes, returns valid trees with a finite positive HSJ score, and is deterministic across identical-seed runs.
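
The division of labour between the three sector routines can be sketched as a small decision function. This is a minimal illustration, not the code on cpp-search: `SectorRoutine`, `SectorAction`, `reduced_dataset_is_faithful()` and `plan_sector()` are hypothetical names, the mapping of XFORM to the Sankoff path is inferred from the description above, and what the real fallback does once a guard triggers is not specified here.

```cpp
#include <cassert>

// Hypothetical enums for illustration only.
enum class ScoringMode { EW, IW, PROFILE, HSJ, XFORM };
enum class SectorRoutine { RSS, XSS, CSS };
enum class SectorAction { UseReducedDataset, FallBack, UseFullDatasetWithMask };

// Assumption: only HSJ and XFORM need fields that the reduced
// dataset does not carry; EW, IW and PROFILE are unaffected.
bool reduced_dataset_is_faithful(ScoringMode mode) {
    return mode != ScoringMode::HSJ && mode != ScoringMode::XFORM;
}

// rss/xss build a reduced dataset, so they are guarded.
// css works on the full dataset with a sector mask, so it is never guarded.
SectorAction plan_sector(SectorRoutine routine, ScoringMode mode) {
    if (routine == SectorRoutine::CSS) {
        return SectorAction::UseFullDatasetWithMask;
    }
    return reduced_dataset_is_faithful(mode) ? SectorAction::UseReducedDataset
                                             : SectorAction::FallBack;
}
```

Keeping the CSS branch first makes the design choice explicit: guarding it on scoring mode would disable a path that already scores against complete data.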

Why not approach (b) (copy the fields)

Not tractable here. The HTU pseudo-tip is a Fitch from_above state-set with no valid HSJ tip_labels (original-character tokens) or Sankoff tip_costs representation, so the reduced dataset cannot be made correct for those modes without new from-above machinery in both scoring kernels. Approach (a) is the conservative, T-275-consistent fix.

Note on testability

T-303 is silent by construction: final scores are always recomputed on the full dataset, so a guard regression cannot be caught by an absolute-score assertion. The test therefore locks in pipeline stability/determinism (it would catch a crash or score desync introduced by the guarded sector path).
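
The shape of such a stability/determinism check can be sketched as follows. This is a hedged illustration rather than the Tier-2 test itself: `toy_search()` is a hypothetical stand-in for the full HSJ + sectorial pipeline, and the real test's harness, dataset and tolerances are not shown in this PR description. The sketch asserts only what the note above says is assertable: the run completes, the result is valid with a finite positive score, and two runs with the same seed agree.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical result type; the real search returns trees and scores.
struct ToyResult {
    double score;
    int n_trees;
};

// Stand-in for a seeded stochastic search. std::mt19937 produces the
// same sequence for the same seed, so repeated runs are reproducible.
ToyResult toy_search(std::uint32_t seed) {
    std::mt19937 rng(seed);
    double best = 100.0;
    for (int i = 0; i < 50; ++i) {
        double candidate = 50.0 + static_cast<double>(rng() % 1000) / 10.0;
        if (candidate < best) best = candidate;
    }
    return {best, 1};
}

// No absolute-score assertion: only validity and run-to-run agreement.
bool stable_and_deterministic(std::uint32_t seed) {
    ToyResult a = toy_search(seed);
    ToyResult b = toy_search(seed);
    bool valid = a.n_trees > 0 && std::isfinite(a.score) && a.score > 0.0;
    return valid && a.score == b.score && a.n_trees == b.n_trees;
}
```

A check of this form cannot tell whether the guard is present, but it would fail on a crash, an invalid result, or a score that differs between identical-seed runs.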

Dispatched agent t303. GHA agent-check: run 27398557784. Found by /red-team area 5 (2026-05-26). PROFILE+IW are fine.

🤖 Generated with Claude Code

…sion test

build_reduced_dataset() omits hierarchy_blocks/tip_labels/n_orig_chars/
hsj_alpha/sankoff_* fields, so rss_search/xss_search are already guarded to
fall back under HSJ/XFORM (commit e5ff294, same class as the T-275 guard).

css_search needs no guard: it never builds a reduced dataset — it runs
tbr_search() with a sector_mask against the full ds, so score_tree() dispatches
hsj_score()/Sankoff with complete data and the sector-internal heuristic is
correct for every scoring mode. Documented this with an in-code comment so it
is not re-flagged.

Approach (b) (copy the fields into the reduced dataset) is not tractable: the
HTU pseudo-tip is a Fitch from_above state-set with no valid HSJ tip_labels or
Sankoff tip_costs representation, so the reduced dataset cannot be made correct
for those modes without new from-above machinery in both scoring kernels.

Adds a Tier-2 regression test driving the full HSJ + sectorial pipeline
(rss/xss guarded, css on full ds) and asserting it completes, stays
self-consistent, and is deterministic across identical-seed runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ms609 added a commit that referenced this pull request Jun 12, 2026
@ms609 ms609 merged commit 627a62c into cpp-search Jun 15, 2026
0 of 10 checks passed
@ms609 ms609 deleted the feature/t303-sector-hsj-guard branch June 15, 2026 08:57