Add coalescent species tree pipeline as step 4astral (ASTER/ASTRAL-III) #83
Open
Alexdami17 wants to merge 14 commits into
Introduces an optional fourth step that builds a coalescent-based species tree from the per-OG alignments produced by step 3combine, complementing the existing supermatrix approach. This addresses the limitation that concatenation ignores differing gene-tree histories across orthogroups.

New step: read2tree --step 4astral
- Filters per-OG alignments by taxon occupancy (--min_samples) and gap fraction (--max_gap), converting phylip-relaxed .fa files to clean FASTA
- Optionally trims filtered alignments with ClipKIT (--trim flag)
- Runs IQ-TREE per gene in parallel via multiprocessing.Pool using the LG+F+G model with SH-aLRT branch support (-alrt 1000) in fast mode
- Collects gene trees and runs ASTER (astral3) to estimate the coalescent species tree, writing astral_tree_merge.nwk to the output directory

New wrappers: Clipkit (wrappers/aligners/clipkit.py) and Aster (wrappers/treebuilders/aster.py), following existing wrapper conventions.
New helper: get_gene_tree_options() added to iqtree.py.
New dependencies: aster and clipkit added to environment.yml.
README updated with step 4 usage, output files, and installation notes.
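The occupancy and gap filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names (gap_fraction, filter_alignment, to_fasta) are hypothetical, and it assumes that --max_gap is applied per sequence and that --min_samples is then checked against the sequences that remain.

```python
from typing import Dict, Optional


def gap_fraction(seq: str) -> float:
    """Fraction of positions in an aligned sequence that are gaps ('-')."""
    if not seq:
        return 1.0  # treat an empty sequence as all gaps
    return seq.count("-") / len(seq)


def filter_alignment(
    seqs: Dict[str, str], min_samples: int, max_gap: float
) -> Optional[Dict[str, str]]:
    """Drop gappy sequences, then reject the OG if too few taxa remain.

    Assumption: a sequence is kept when its gap fraction is <= max_gap,
    and the OG is kept when at least min_samples sequences survive.
    """
    kept = {name: s for name, s in seqs.items() if gap_fraction(s) <= max_gap}
    return kept if len(kept) >= min_samples else None


def to_fasta(seqs: Dict[str, str]) -> str:
    """Serialise the surviving sequences as plain FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())
```

With this sketch, an OG whose surviving sequences fall below min_samples returns None and would simply be skipped before gene tree inference.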
…l3/astral-pro3/astral-pro2
…EE runs
- get_gene_tree_options() now includes --abayes so that aBayes posterior supports are annotated on gene trees alongside SH-aLRT, providing the best weighting signal for wASTRAL hybrid mode
- AsterCLI auto-detection extended to wastral and astral4 as final fallbacks; astral3 remains the default
- README updated with a comparison table of all ASTER binaries and usage examples for wastral and astral4, with guidance on when to choose each
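One way to picture the auto-detection with fallbacks is the sketch below. It is an illustration only: find_aster_binary and ASTER_CANDIDATES are hypothetical names, not the AsterCLI code, and the candidate order is assumed from the PR description (astral3 first, wastral and astral4 last). The lookup function is injectable so the logic can be exercised without ASTER installed.

```python
import shutil
from typing import Callable, Optional, Sequence

# Assumed search order: default first, newer fallbacks last.
ASTER_CANDIDATES = ("astral3", "astral-pro3", "astral-pro2", "wastral", "astral4")


def find_aster_binary(
    explicit: Optional[str] = None,
    candidates: Sequence[str] = ASTER_CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the path of the ASTER executable to use.

    An explicitly requested binary wins; otherwise the first candidate
    found on PATH is returned.
    """
    if explicit is not None:
        path = which(explicit)
        if path is None:
            raise FileNotFoundError(f"ASTER binary not found: {explicit}")
        return path
    for name in candidates:
        path = which(name)
        if path is not None:
            return path
    raise FileNotFoundError(
        "No ASTER binary found on PATH; tried: " + ", ".join(candidates)
    )
```

Raising a clear error that lists every name tried makes a missing installation easier to diagnose than a failure deep inside a subprocess call.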
Update docstring and error message in aster.py to reflect that multiple ASTER algorithms (astral3, wastral, astral4, etc.) are supported, not just ASTRAL-III. Add --abayes to the step 4 IQ-TREE invocation summary in the README.
…p 4astral
Expose three optional arguments scoped to step 4astral (--iqtree_model, --iqtree_args, and --astral_args) that let users override the built-in IQ-TREE substitution model and append arbitrary flags to both IQ-TREE and ASTER invocations. Defaults are unchanged. README documents the new options with example invocations.
If a .treefile already exists and is non-empty in 08_gene_trees/, the worker skips the IQ-TREE call and returns the existing tree directly. This lets a user re-invoke --step 4astral after an interruption without re-running gene tree inference from scratch.
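The resume check can be sketched as below. This is a simplified stand-in for the worker in this PR: infer_or_reuse is a hypothetical name, and the IQ-TREE call is passed in as a plain callable so the skip logic can be shown on its own.

```python
from pathlib import Path
from typing import Callable


def infer_or_reuse(treefile: Path, run_iqtree: Callable[[], None]) -> str:
    """Return the Newick string for one gene, running IQ-TREE only if needed.

    A tree file that exists and is non-empty is treated as a finished run
    and reused; a missing or zero-byte file triggers inference.
    """
    if treefile.is_file() and treefile.stat().st_size > 0:
        return treefile.read_text().strip()

    run_iqtree()  # expected to write `treefile`

    if not treefile.is_file() or treefile.stat().st_size == 0:
        raise RuntimeError(f"IQ-TREE produced no tree at {treefile}")
    return treefile.read_text().strip()
```

Checking for a non-empty file rather than mere existence matters because an interrupted run can leave a zero-byte file behind.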
Replace -nt with -T in get_gene_tree_options() as IQ-TREE 2/3 uses -T for thread specification. Add --no_fast to allow disabling the -fast flag for a full ML search, which is required when combining with bootstrap options such as -B via --iqtree_args.
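A rough sketch of how the per-gene IQ-TREE argument list could be assembled is shown below. The function build_gene_tree_args is hypothetical and is not the signature of get_gene_tree_options(); it assumes that extra flags arrive as a single shell-style string and that output is named with IQ-TREE's --prefix option.

```python
import shlex
from typing import List, Optional


def build_gene_tree_args(
    alignment: str,
    prefix: str,
    threads: int = 1,
    model: str = "LG+F+G",
    fast: bool = True,
    extra_args: Optional[str] = None,
) -> List[str]:
    """Assemble IQ-TREE arguments for one gene tree (executable name excluded)."""
    args = [
        "-s", alignment,
        "--prefix", prefix,
        "-m", model,
        "-T", str(threads),   # IQ-TREE 2/3 thread flag (replaces -nt)
        "-alrt", "1000",      # SH-aLRT branch support
        "--abayes",           # aBayes posterior support
    ]
    if fast:
        args.append("-fast")  # omit for a full ML search, e.g. with -B
    if extra_args:
        args.extend(shlex.split(extra_args))
    return args
```

For example, build_gene_tree_args("OG1.fa", "out/OG1", fast=False, extra_args="-B 1000") yields a full-search invocation with ultrafast bootstrap appended at the end.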
Adds --dna to switch step 4astral from amino acid to DNA alignments (06_align_merge_dna). Default IQ-TREE model is GTR+G for DNA runs. All output folders and files now carry an _aa or _dna suffix to distinguish runs: 08_gene_trees_aa/dna, gene_trees_merge_aa/dna.nwk, astral_tree_merge_aa/dna.nwk.
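The suffix scheme can be illustrated with a small helper. This is a sketch under assumptions: step4_outputs is a hypothetical function, and it assumes that all three outputs sit directly under the output directory and that LG+F+G remains the amino acid default.

```python
from pathlib import Path
from typing import Dict, Union


def step4_outputs(output_dir: Union[str, Path], dna: bool = False) -> Dict[str, object]:
    """Map a run mode (amino acid or DNA) to its suffixed outputs and default model."""
    suffix = "dna" if dna else "aa"
    root = Path(output_dir)
    return {
        "gene_tree_dir": root / f"08_gene_trees_{suffix}",
        "gene_trees": root / f"gene_trees_merge_{suffix}.nwk",
        "species_tree": root / f"astral_tree_merge_{suffix}.nwk",
        # Assumed defaults: LG+F+G for amino acids, GTR+G for DNA.
        "default_model": "GTR+G" if dna else "LG+F+G",
    }
```

Keeping the suffix in every path means an amino acid run and a DNA run can share one output directory without overwriting each other.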
Summary
This PR adds an optional fourth step to the read2tree pipeline that produces a coalescent-based species tree using the ASTER suite. It is designed as a post-processing step after --step 3combine and does not change any existing behaviour.

- --step 4astral filters per-OG alignments from step 3, infers individual gene trees with IQ-TREE in parallel, and passes them to ASTER to produce a coalescent species tree.
- --min_samples (minimum taxon occupancy) and --max_gap (maximum gap fraction per sequence) control which OGs pass to gene tree inference.
- --trim runs ClipKIT on each filtered alignment before IQ-TREE.
- The ASTER binary is auto-detected from astral3, astral-pro3, astral-pro2, wastral, and astral4, or set with an explicit --astral_binary argument. wASTRAL and ASTRAL-IV are supported alongside the default ASTRAL-III.
- --iqtree_model, --iqtree_args, and --astral_args let users override defaults or pass arbitrary flags without modifying the pipeline code.
- --no_fast: disables the -fast flag for a full ML search per gene. Required when combining with bootstrap options via --iqtree_args.
- --dna: switches step 4 to use DNA alignments from 06_align_merge_dna instead of amino acids. Default model switches to GTR+G.
- Re-running --step 4astral skips gene trees already written to 08_gene_trees_aa/ or 08_gene_trees_dna/.
- TreeInference.py, the --tree option, and all existing steps are untouched.

New dependencies (bioconda)
- aster (provides astral3, wastral, astral4, etc.)
- clipkit (optional; only needed when --trim is used)