Add coalescent species tree pipeline as step 4astral (ASTER/ASTRAL-III) #83

Open
Alexdami17 wants to merge 14 commits into DessimozLab:combined from Alexdami17:custom-scripts

Summary

This PR adds an optional fourth step to the read2tree pipeline that produces a coalescent-based species tree using the ASTER suite. It is designed as a post-processing step after --step 3combine and does not change any existing behaviour.

  • New step: --step 4astral filters per-OG alignments from step 3, infers individual gene trees with IQ-TREE in parallel, and passes them to ASTER to produce a coalescent species tree.
  • Filtering: --min_samples (minimum taxon occupancy) and --max_gap (maximum gap fraction per sequence) control which OGs pass to gene tree inference.
  • Optional trimming: --trim runs ClipKIT on each filtered alignment before IQ-TREE.
  • ASTER binary selection: auto-detects the first available binary among astral3, astral-pro3, astral-pro2, wastral, and astral4, or accepts an explicit --astral_binary argument. wASTRAL and ASTRAL-IV are supported alongside the default ASTRAL-III.
  • Pass-through options: --iqtree_model, --iqtree_args, and --astral_args let users override defaults or pass arbitrary flags without modifying the pipeline code.
  • --no_fast: disables the -fast flag so that IQ-TREE runs a full ML search per gene. Required when passing bootstrap options such as -B via --iqtree_args.
  • --dna: switches step 4 to use DNA alignments from 06_align_merge_dna instead of amino acid alignments. The default model becomes GTR+G.
  • Resume support: if a run is interrupted, re-invoking --step 4astral skips gene trees already written to 08_gene_trees_aa/ or 08_gene_trees_dna/.
  • No existing code changed: TreeInference.py, the --tree option, and all existing steps are untouched.
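The filtering rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, and it assumes that gappy sequences are dropped first and occupancy is checked afterwards, that only `-` counts as a gap, and that both thresholds are inclusive.

```python
def gap_fraction(seq, gap_chars="-"):
    """Fraction of positions in seq that are gap characters."""
    if not seq:
        return 1.0  # treat an empty sequence as all gaps
    return sum(seq.count(c) for c in gap_chars) / len(seq)


def filter_alignment(records, min_samples, max_gap):
    """Apply --max_gap per sequence, then --min_samples per OG.

    records: dict mapping taxon name -> aligned sequence string.
    Returns the filtered dict, or None if too few taxa remain.
    (Assumed order of operations; the PR may count occupancy differently.)
    """
    kept = {name: seq for name, seq in records.items()
            if gap_fraction(seq) <= max_gap}
    return kept if len(kept) >= min_samples else None
```

An OG that returns None would simply be excluded from gene tree inference.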

New dependencies (bioconda)

  • aster (provides astral3, wastral, astral4, etc.)
  • clipkit (optional; only needed when --trim is used)

Alexdami17 added 14 commits May 31, 2026 22:05
Introduces an optional fourth step that builds a coalescent-based species
tree from the per-OG alignments produced by step 3combine, complementing
the existing supermatrix approach. This addresses the limitation that
concatenation ignores differing gene-tree histories across orthogroups.

New step: read2tree --step 4astral
- Filters per-OG alignments by taxon occupancy (--min_samples) and gap
  fraction (--max_gap), converting phylip-relaxed .fa files to clean FASTA
- Optionally trims filtered alignments with ClipKIT (--trim flag)
- Runs IQ-TREE per gene in parallel via multiprocessing.Pool using the
  LG+F+G model with SH-aLRT branch support (-alrt 1000) in fast mode
- Collects gene trees and runs ASTER (astral3) to estimate the coalescent
  species tree, writing astral_tree_merge.nwk to the output directory

New wrappers: Clipkit (wrappers/aligners/clipkit.py) and Aster
(wrappers/treebuilders/aster.py) following existing wrapper conventions.
New helper get_gene_tree_options() added to iqtree.py.
New dependencies: aster and clipkit added to environment.yml.
README updated with step 4 usage, output files, and installation notes.

…EE runs

- get_gene_tree_options() now includes --abayes so aBayes posterior supports
  are annotated on gene trees alongside SH-aLRT, providing the best weighting
  signal for wASTRAL hybrid mode
- AsterCLI auto-detection extended to wastral and astral4 as final fallbacks;
  astral3 remains the default
- README updated with a comparison table of all ASTER binaries and usage
  examples for wastral and astral4, with guidance on when to choose each
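One way to picture the auto-detection order is a lookup on PATH with `shutil.which`. The sketch below is an assumption about how such a lookup could work, not the AsterCLI code itself; `find_aster_binary` and its `which` parameter are hypothetical names, and the candidate order is taken from the PR summary.

```python
import shutil

# Search order as listed in the PR summary; wastral and astral4 come last.
CANDIDATES = ("astral3", "astral-pro3", "astral-pro2", "wastral", "astral4")


def find_aster_binary(explicit=None, which=shutil.which):
    """Return the path of the ASTER binary to use.

    explicit: value of --astral_binary, if the user gave one.
    which: lookup function, injectable so the logic can be tested.
    """
    if explicit:
        path = which(explicit)
        if path is None:
            raise FileNotFoundError(f"ASTER binary not found: {explicit}")
        return path
    for name in CANDIDATES:
        path = which(name)
        if path:
            return path
    raise FileNotFoundError(
        "No ASTER binary found; tried: " + ", ".join(CANDIDATES))
```

Because astral3 is tried first, it stays the default whenever it is installed.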

Update docstring and error message in aster.py to reflect that multiple
ASTER algorithms (astral3, wastral, astral4, etc.) are supported, not
just ASTRAL-III. Add --abayes to the step 4 IQ-TREE invocation summary
in the README.

…p 4astral

Expose three optional arguments scoped to step 4astral that let users
override the built-in IQ-TREE substitution model and append arbitrary
flags to both IQ-TREE and ASTER invocations. Defaults are unchanged.
README documents the new options with example invocations.

If a .treefile already exists and is non-empty in 08_gene_trees/, the
worker skips the IQ-TREE call and returns the existing tree directly.
This lets a user re-invoke --step 4astral after an interruption without
re-running gene tree inference from scratch.
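The resume check can be sketched roughly as below. The helper names are hypothetical, `run_iqtree` stands in for whatever actually launches IQ-TREE, and the sketch assumes each gene tree is written as `<OG name>.treefile` in the gene tree directory.

```python
import os


def existing_tree(treefile):
    """Return the Newick string if treefile exists and is non-empty, else None."""
    if not os.path.isfile(treefile) or os.path.getsize(treefile) == 0:
        return None
    with open(treefile) as fh:
        return fh.read().strip() or None  # whitespace-only counts as empty


def infer_or_reuse(og_name, tree_dir, run_iqtree):
    """Reuse a finished gene tree, otherwise call run_iqtree(og_name, treefile)."""
    treefile = os.path.join(tree_dir, og_name + ".treefile")
    tree = existing_tree(treefile)
    if tree is not None:
        return tree  # skip the IQ-TREE call entirely
    return run_iqtree(og_name, treefile)
```

Checking for a non-empty file rather than mere existence guards against a zero-byte file left behind by an interrupted run.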

Replace -nt with -T in get_gene_tree_options() as IQ-TREE 2/3 uses -T
for thread specification. Add --no_fast to allow disabling the -fast
flag for a full ML search, which is required when combining with
bootstrap options such as -B via --iqtree_args.
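To make the flag handling concrete, here is a sketch of how a per-gene IQ-TREE command line might be assembled. It is not the body of get_gene_tree_options(); the function name, the `iqtree2` executable name, and the use of `-pre` for the output prefix are assumptions made for illustration.

```python
import shlex


def gene_tree_command(alignment, prefix, model="LG+F+G", threads=1,
                      fast=True, extra_args=""):
    """Build an IQ-TREE argument list for one gene alignment.

    fast=False corresponds to --no_fast; extra_args to --iqtree_args.
    """
    cmd = [
        "iqtree2",            # assumed executable name
        "-s", alignment,
        "-m", model,
        "-alrt", "1000",      # SH-aLRT branch support
        "--abayes",           # aBayes posterior support
        "-T", str(threads),   # -T rather than the older -nt
        "-pre", prefix,
    ]
    if fast:
        cmd.append("-fast")
    cmd += shlex.split(extra_args)  # user flags go last
    return cmd
```

For example, `gene_tree_command("OG1.fa", "out/OG1", fast=False, extra_args="-B 1000")` yields a command without -fast and with the bootstrap flags appended at the end.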

Adds --dna to switch step 4astral from amino acid to DNA alignments
(06_align_merge_dna). Default IQ-TREE model is GTR+G for DNA runs.
All output folders and files now carry an _aa or _dna suffix to
distinguish runs: 08_gene_trees_aa/dna, gene_trees_merge_aa/dna.nwk,
astral_tree_merge_aa/dna.nwk.
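The suffix scheme can be illustrated with a small helper. The function name and the dictionary keys are hypothetical, and placing the merged tree files directly in the output directory is an assumption; only the folder and file names themselves come from the commit message above.

```python
import os


def step4_paths(output_dir, dna=False):
    """Return step 4 output locations for an amino acid or DNA run."""
    suffix = "dna" if dna else "aa"
    return {
        "gene_tree_dir": os.path.join(output_dir, f"08_gene_trees_{suffix}"),
        "gene_trees": os.path.join(output_dir, f"gene_trees_merge_{suffix}.nwk"),
        "species_tree": os.path.join(output_dir, f"astral_tree_merge_{suffix}.nwk"),
        # Default IQ-TREE models named in this PR.
        "default_model": "GTR+G" if dna else "LG+F+G",
    }
```

Keeping the two runs in separately suffixed locations means an amino acid run and a DNA run can share one output directory without overwriting each other.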