Skip to content

Score MorphyLength via native kernel for inapplicable data (Stage 1 of MorphyLib removal)#251

Open
ms609 wants to merge 7 commits into
cpp-searchfrom
feature/native-morphy-helpers
Open

Score MorphyLength via native kernel for inapplicable data (Stage 1 of MorphyLib removal)#251
ms609 wants to merge 7 commits into
cpp-searchfrom
feature/native-morphy-helpers

Conversation

@ms609

@ms609 ms609 commented Jun 15, 2026

Copy link
Copy Markdown
Owner

What & why

Routes MorphyLength() / MorphyTreeLength() for the default inapplicable gap mode through the native Fitch kernel (FastCharacterLength) instead of vendored MorphyLib.

This fixes a known MorphyLib bug scoring {1-} (ambiguous-with-inapplicable) tokens — e.g. Lobo pattern 93 scores 2 (correct BGS) vs MorphyLib 3. Legacy custom-search scorers (TreeSearch, Ratchet, Jackknife, the custom-criteria vignette) now match the already-correct MaximizeParsimony (native kernel).

Per maintainer decision: not aiming for MorphyLib bit-equivalence — native scores are the correct v2.0 behaviour.

How

  • PhyDat2Morphy()/SingleCharMorphy() attach precomputed native scoring data (.NativeMorphyData/.SingleCharNative). The Morphy pointer is retained, so Morphy()s C-loop and MorphyLib introspection are untouched.
  • Rare ambiguous/extra gap modes keep the MorphyLib path for now (helper returns NULL -> falls through).

Validation

MorphyTreeLength == round(TreeLength) and per-character SingleCharMorphy sums == TreeLength, verified across Lobo + congreveLamsdell datasets on pectinate & balanced trees (new test-native-morphy-equivalence.R). Lobo pectinate = 273 (was 274 under the MorphyLib bug). NB CI may surface legacy tests that assert the old MorphyLib scores; those should be updated to the native-correct values.

Staging (drop MorphyLib entirely from 2.0)

  1. (this PR) native scoring for the inapplicable MorphyLength path.
  2. migrate Morphy()s preorder_morphy/morphy_iw/morphy_profile loop + ambiguous/extra modes to native.
  3. delete vendored MorphyLib sources, mpl_* bindings, introspection helpers; update DESCRIPTION/Makevars.

🤖 Generated with Claude Code

ms609 and others added 5 commits June 15, 2026 17:19
…f MorphyLib removal)

MorphyLength()/MorphyTreeLength() route the default inapplicable gap mode
through the native Fitch kernel (FastCharacterLength) instead of vendored
MorphyLib. Fixes a known MorphyLib bug scoring `{1-}` ambiguous-with-
inapplicable tokens (Lobo pattern 93: native 2 vs MorphyLib 3), aligning the
legacy custom-search scorers (TreeSearch, Ratchet, Jackknife, custom.Rmd)
with the already-correct MaximizeParsimony.

- PhyDat2Morphy()/SingleCharMorphy() attach precomputed native scoring data
  (.NativeMorphyData/.SingleCharNative); the Morphy pointer is retained so
  Morphy()'s C-loop and introspection are untouched.
- Rare ambiguous/extra gap modes keep the MorphyLib path for now.
- Not aiming for MorphyLib-equivalence: native scores are the correct v2.0
  behaviour. Verified MorphyTreeLength == TreeLength across Lobo + CL
  datasets and per character via SingleCharMorphy.

Stage 1 of dropping MorphyLib entirely from 2.0; later stages migrate
Morphy()'s preorder_morphy/morphy_iw/morphy_profile loop and delete the
vendored sources.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Captures the migration of Morphy()'s preorder_morphy/morphy_iw/morphy_profile
loop and the ambiguous/extra gap modes to native (Stage 2), and the deletion of
vendored MorphyLib sources/bindings/introspection (Stage 3). Flags the Morphy()
approach (swap closures vs redirect to MaximizeParsimony) as needing maintainer
review, since both change superseded-path behaviour.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Morphy() was the redundant MorphyLib R-loop search (EW/IW/profile + ratchet +
constraints), never released on CRAN (not in 1.8.0), sitting between native
MaximizeParsimony() and the custom-criteria TreeSearch()/Ratchet(). Everything
it did is covered natively, so it is deleted rather than migrated.

- Delete Morphy() + its private helpers (.EWScore/.IWScore/.ProfileScore/
  .TBRSearch/.CombineResults/.ReplaceResults/.Time/.DateTime) from R/Morphy.R.
- Replace MaximizeParsimony()'s legacy-parameter delegation (do.call(Morphy,..))
  with an informative error.
- Drop export(Morphy); delete man/Morphy.Rd + man/dot-CombineResults.Rd; fix
  dangling [Morphy()] doc links (WhenFirstHit, EasyTrees, MaximizeParsimony).
- Delete test-Morphy.R (+ snapshots); drop Morphy() calls from test-CustomSearch.

Leaves morphy_iw/morphy_profile dead (Morphy() was their only caller), to be
removed in Stage 3. Full suite green: FAIL 0, PASS 11201.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the vendored Morphy Phylogenetic Library entirely.  All parsimony
scoring now runs through the native C++ kernel, which implements the
Brazeau-Guillerme-Smith inapplicable algorithm correctly (including the
`{1-}` token MorphyLib mis-scored).  The native engine's own Wagner
builder (ts_wagner.cpp) is independent of MorphyLib, so the library
deletes wholesale - no MPLstate extraction needed.

Custom-search resampling (Jackknife/MorphyBootstrap) now resamples
character weights functionally and scores them with the native kernel,
replacing the in-place mpl_set_charac_weight mutation that the Stage-1
native MorphyLength path silently ignored (a latent regression).  The
custom TreeScorer/EdgeSwapper API is unchanged; a new regression test
asserts resampled weights affect the score.

- PhyDat2Morphy/SingleCharMorphy native-only; UnloadMorphy a no-op;
  RandomTreeScore native (kept RandomMorphyTree + pure C RANDOM_TREE)
- Removed mpl_* bindings, MorphyWeights/SetMorphyWeights/GapHandler/
  MorphyErrorCheck/summary.morphyPtr, GetMorphyLength/C_MorphyLength,
  MORPHYLENGTH/RANDOM_TREE_SCORE, morphy_iw/morphy_profile/preorder_morphy
- Deleted 17 vendored MorphyLib C/C++ sources; trimmed TreeSearch-init.c
  and build_postorder.h; regenerated RcppExports/NAMESPACE/man

Only the default `inapplicable` gap treatment is supported natively.
Full suite green; the only failures are pre-existing WideSample/MaxMin
API drift, unrelated to this change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Synthesise the best of PR #251 (this branch) and PR #216 (T-200, Agent B),
which independently decoupled the R-loop search from MorphyLib.

Adopted from #216:
- Efficient native scorer: PrepareData() pre-extracts the scoring matrices
  once and EdgeListScore() calls ts_fitch_score() directly, instead of
  rebuilding a phylo and re-extracting tip data on every score.
- `concavity` parameter on TreeSearch()/Ratchet()/Jackknife(): Inf = equal
  weights (default), finite = implied weights, "profile" = profile parsimony.
  Custom InitializeData hooks still receive the dataset unchanged.
- custom.Rmd implied-weights section rewritten from ~150 lines of hand-rolled
  scaffolding to a few lines using `concavity` directly.

Kept from #251 (and fixing #216's staleness):
- MorphyLib stays fully deleted (#216 retained it as a fallback).
- Score weights use .ScaleWeight() (fractional-weight aware) and Inf for the
  equal-weights concavity, matching current TreeLength(); #216 used
  as.integer(weight) and -1.0, predating those changes.
- Functional resampling + a regression test (doubling weights doubles score).

Naming: the layer is renamed to purpose-based names rather than #216's
`*Native*` prefix, because once MorphyLib is gone "Native" no longer contrasts
with anything.  PhyDat2Morphy->PrepareData, UnloadMorphy->ReleaseData,
is.morphyPtr->is.ParsimonyData, SingleCharMorphy->SingleCharData,
MorphyLength->EdgeListScore, MorphyTreeLength->TreeScore,
MorphyBootstrap->BootstrapTree, RandomMorphyTree->RandomPostorderTree.
The old names remain as .Deprecated() aliases.

TreeScore() == TreeLength() verified across equal weights, implied weights and
profile parsimony; full suite green (only the pre-existing WideSample/MaxMin
API-drift failures remain, unrelated to this change).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ms609 and others added 2 commits June 16, 2026 16:08
MorphyLib C/C++ code was fully removed in 33dc949; the Copyright field
is now factually wrong. Drop it. Brazeau and Guillerme contributed the
BGS inapplicable-data algorithm, not a vendored library, so both are
recredited as ctb with an accurate comment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant