Tutorial on ASTRAL - Tandy Warnowtandy.cs.illinois.edu/astral-tutorial.pdf · 2020. 1. 3. ·...
Embed Size (px)
Transcript of Tutorial on ASTRAL - Tandy Warnowtandy.cs.illinois.edu/astral-tutorial.pdf · 2020. 1. 3. ·...
-
Tutorial on ASTRAL
Siavash Mirarab Electrical and Computer Engineering
University of California, San Diego
1
Download: https://github.com/smirarab/astral
Tutorial: https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
https://github.com/smirarab/astralhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
-
Unrooted quartets under MSC model
For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)
2
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
θ2=15% θ3=15%θ1=70%Gorilla
Orang.
Chimp
Human
d=0.8
-
Unrooted quartets under MSC model
For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)
2
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
θ2=15% θ3=15%θ1=70%Gorilla
Orang.
Chimp
Human
d=0.8
The most frequent gene tree = The most likely species tree
-
3
Orang.Gorilla ChimpHuman Rhesus
More than 4 speciesFor 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)
-
3
Orang.Gorilla ChimpHuman Rhesus
1. Break gene trees into quartets of species
2. Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees
• Statistically consistent under the multi-species coalescent model with error-free input
(n4)
More than 4 speciesFor 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)
-
ASTRAL versions• ASTRAL-I (
-
ASTRAL versions• ASTRAL-I (
-
ASTRAL versions• ASTRAL-I (
-
Input/Output• Input: Unrooted gene trees
• Can have missing data
• Can have polytomies
• Can have multiple alleles
• Output: The estimated unrooted species tree
• Will have branch lengths in coalescent units on internal branches (not super accurate)
• Will have a measure of support called localPP
5
-
Caveats
• Assumptions for statistical consistency: a randomly distributed sample of recombination-free, reticulation-free, error-free, orthologous gene trees
• Practically: reduced accuracy with low accuracy gene trees (will see more)
6
-
Start of tutorial
7
-
To cover …• Show installation
• Start at https://github.com/smirarab/astral
• A note on ASTRAL-MP
• Tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
• Do a simple test run
• Show help
8
https://github.com/smirarab/astralhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
-
Notes on input
9
-
bestML versus MLBS
10
• Multi-locus Bootstrapping (MLBS) is also possible
• MLBS not suggested as it seems to degrade accuracy compared to simply using ML gene trees. Bioinformatics, 2014, Mirarab et al.
-
●
●
●
● ●●
●
●
●
●
●
●●
●
●
●●
●
1000 200 50
low medium high low medium high low medium high
5%
10%
15%
gene tree error
Spec
ies
tree
topo
logi
cal e
rror (
FN)
● ●ASTRAL−II CA−ML
Comparison to concatenation: depends on the level of gene tree error
11
1000 gene trees 200 gene trees 50 gene trees
Simulations, 200 species, deep medium
level ILS [Mirarab and
Warnow, 2016]
-
●
●
●
● ●●
●
●
●
●
●
●●
●
●
●●
●
1000 200 50
low medium high low medium high low medium high
5%
10%
15%
gene tree error
Spec
ies
tree
topo
logi
cal e
rror (
FN)
● ●ASTRAL−II CA−ML
Comparison to concatenation: depends on the level of gene tree error
11
1000 gene trees 200 gene trees 50 gene trees
Simulations, 200 species, deep medium
level ILS [Mirarab and
Warnow, 2016]
-
Low support branches• Gene tree support matters
• Does it help to contract branches with low support?
12
Simulations: 100 taxa, simphy, ILS: around 46% true discordance FastTree, support from bootstrapping
genes
All replicates
BMC Bioinformatics, 2018, Zhang et al.
-
Low support branches• Gene tree support matters
• Does it help to contract branches with low support?
• Yes, but only for very low support branches
12
Simulations: 100 taxa, simphy, ILS: around 46% true discordance FastTree, support from bootstrapping
genes
All replicates
BMC Bioinformatics, 2018, Zhang et al.
-
Low support branches• Gene tree support matters
• Does it help to contract branches with low support?
• Yes, but only for very low support branches
• Mostly helps in the presence of low support gene trees
12
Simulations: 100 taxa, simphy, ILS: around 46% true discordance FastTree, support from bootstrapping
genes
All replicates
BMC Bioinformatics, 2018, Zhang et al.
-
ASTRAL-III on all 14,446 unbinned gene trees
13
Zhang, Chao, et al. BMC Bioinformatics 19, no. S6 (2018): 153. https://doi.org/10.1186/s12859-018-2129-y
https://doi.org/10.1186/s12859-018-2129-y
-
Using aLRT tests
PhyML_mac -m GTR -o n -d nt -i ...phylip -u somffile -b -2 -n 1000
14
-
• Back to tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
15
nw_ed 1KP-genetrees.tre 'i & b
-
Multiple individuals
• What if we sample multiple individuals from each species?
• In recently diverged species individuals can be non-monophyletic in gene trees
• Sampling multiple individuals may provide extra signal
16
-
Multiple individuals helpful?
17
0.0
0.1
0.2
0.3
Variable effort500 genes; 200 species
Fixed effortgenes x inds = 1000; 200 species
Fixed effort500 genes; species x inds = 200
Spec
ies
tree
erro
r (R
F di
stan
ce) 1 ind.
2 ind.
5 ind.
Yes, it marginally helps accuracy
-
Multiple individuals helpful?
17
0.0
0.1
0.2
0.3
Variable effort500 genes; 200 species
Fixed effortgenes x inds = 1000; 200 species
Fixed effort500 genes; species x inds = 200
Spec
ies
tree
erro
r (R
F di
stan
ce) 1 ind.
2 ind.
5 ind.
Yes, it marginally helps accuracy
But not if sequencing effort is kept fixed
-
Multiple individuals helpful?
17
0.0
0.1
0.2
0.3
Variable effort500 genes; 200 species
Fixed effortgenes x inds = 1000; 200 species
Fixed effort500 genes; species x inds = 200
Spec
ies
tree
erro
r (R
F di
stan
ce) 1 ind.
2 ind.
5 ind.
Yes, it marginally helps accuracy
But not if sequencing effort is kept fixed
-
• Back to tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
18
java -jar astral.5.7.3.jar -i test_data/easy-multi.tre -a test_data/mapping-10.txt
Sorry, the test files are not on the website right now!
https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
-
Should you filter?• Filtering genes based on missing species from specific genes?
• Generally not beneficial (see Molloy and Warnow, Systematic Biology 2018)
• Filtering genes based on gene tree estimation error?
• Depends on conditions (see Molloy and Warnow, Systematic Biology 2018)
• Filtering fragmentary sequences from genes while keeping the gene?
• Often beneficial (see Sayyari, Whitfield, and Mirarab, MBE 2018)
19
-
Filtering fragments
20
• Added fragmentation to simulated data with patterns similar to the Misof. et. al. insect (transcriptome data)
• Filtering simply removes fragmentary data from genes but keeps the gene
-
TreeShrink
• Removes super long branches from gene trees
• Is automatic
• Improves gene trees
21
-
Notes on output
22
-
• Back to tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
• Open the tree
• Comment on rooting
• Examine log information
23
https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#the-astral-log-information
-
Local posterior probability• Recall quartet frequencies follow a multinomial distribution
24
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
m2 = 63 m3 = 57m1 = 80
θ2 θ3θ1
-
Local posterior probability• Recall quartet frequencies follow a multinomial distribution
• P (gene tree seen m1/m times = species tree) = P(θ1 > 1/3)
• Can be solved analytically. The resulting measure is called “the local posterior probability” (localPP)
• Handle n>4 by averaging quartet scores
24
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
m2 = 63 m3 = 57m1 = 80
θ2 θ3θ1
-
Quartet support vs. localPP
25
!3Background: How many targets are enough? The number of loci required for confidently resolving a phylogenetic relationship depends on its “hardness”. Discordant gene trees due to incomplete lineage sorting (ILS) can make branches of a species tree very difficult to resolve. ILS is a function of the branch length and population size. This means that shorter branches or larger population sizes increase ILS, and can make species tree reconstruction more difficult. The co-PI has recently developed a new approach for estimating the support of a branch in a coalescent-based framework based on the proportion of gene trees that support quartet topologies in gene trees[31]. This new approach is analogous to finding the probability that a three-sided die is loaded towards each facet by observing outcomes of many tosses. We can also ask the reverse question: if we have an estimate of the degree of bias of a die, how many tosses are needed to confidently find its loaded side. Similarly, this approach[31] sheds light on the relationship between the number of genes required to achieve a level of support for a branch of a certain length in coalescent units (Fig. 3). While the co-PI’s method of calculating support, which is implemented in ASTRAL [32, 33], gives the basic mathematical framework, more method development is needed (see below).
Research Approach Specimens Suitable specimens of more than 80 species of Sabellidae and >120 Terebelliformia species have been collected from localities around the world in the last 15 years by the PI. Further collecting will be undertaken for some key taxa for a few more transcriptomes and for targeted DNA capture. We also have agreements from other experts in Sabellidae and Terebelliformia to provide other needed ethanol-preserved specimens for targeted capture. We will obtain ~ 50% of the known sabellid diversity and ~25% of terebelliforms for targeted capture sequencing (~500 species total). Our plan is to sample all genera, particularly focusing on their type species. This should allow for the generation of robust phylogenies, from which we will revise the taxonomy. Specimens will be vouchered at the SIO Benthic Invertebrate Collection and biodiversity information curated on the Encyclopedia of Life. Sequencing (transcriptomes and targeted capture of DNA) A targeted capture approach will be used with an appropriate number of loci (see below) to generate robust phylogenies of Sabellidae and Terebelliformia. Two different sets of targets will be designed from transcriptomes, across Sabellidae and Terebelliformia repectively. We have already generated new transcriptomes for nine sabellids, a serpulid and a fabriciid (bold terminals in Fig. 4), and nine Terebelliformia (bold terminals in Fig. 5) and combined this data with the few publicly available transcriptomes. Based on direct sequencing data already obtained for several genes for Sabellidae and Terebelliformia [Rouse in prep.; Stiller et al. in prep.], our sampling arguably spans the extant diversity of each clade. Several more transcriptomes will be needed to ensure the targeted capture methods will be robust. Methods for generating these transcriptomes will follow those previously used by us[6, 34]. How many targets are needed? A targeted capture pipeline typically starts from a large number of loci sequenced from a small number of species using genome-wide approaches (e.g., transcriptomics)[7, 8, 35]. This initial dataset is then used to select a smaller subset of loci for the targeted capture phase, which involves a larger set of species. To reduce the cost and effort, one would want to minimize the number of loci in the second phase, as long as sufficient loci are selected to confidently resolve relationships of interest. To date the number of loci has been determined in an ad hoc manner, either by the ultra conserved elements discovered[7, 8] or limitations of the targeted capture technology[13]. Initial analyses of our two datasets demonstrates how the number of
Fig. 3. Impact of number of genes (colors) and amount of ILS (x-axis) on support (posterior probability) for a branch (y-axis). Under the coalescent model, for four species separated with an unrooted branch of length d, the probability of a gene tree being identical to the species tree is 1-2/3e-d (we use this as a measure of ILS). We measure support using local pp [31] for various numbers of gene trees (assuming no gene tree estimation error). More ILS (ie., low 1-2/3e-d) requires more genes to get high support.
quartet frequency
Increased number of genes (k ) ⇒ increased support
Decreased discordance ⇒ increased support
-
26
localPP is more accurate than bootstrapping
250 500
1000 1500
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00
False Positive Rate
Recall
MLBS Local PP
250 500
1000 1500
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00
False Positive Rate
Recall
MLBS Local PP
250 500
1000 1500
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00
False Positive Rate
Recall
MLBS Local PP
250 500
1000 1500
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00
False Positive Rate
Recall
MLBS Local PP
Avian simulated dataset (48 taxa, 1000 genes)
the ROC curves show (fig. 5), for the same number offalse positives branches, local posterior probabilities result inbetter recall than MLBS. This pattern is more pronouncedfor shorter alignments, which have increased gene treeerror. For example, for the 250 bp model condition, if wechoose a support threshold that results in 0.01 FPR, with localposterior values, we still recover 84% of correct branches,whereas with MLBS, the same FPR results in retaining70% of correct branches. Thus, for a desired level of precision,better recall can be obtained using local posteriorprobabilities.
Branch LengthBranch length accuracy on the avian dataset was a function ofgene tree estimation error whether ASTRAL or MP-EST wasused (table 4). With true gene trees, branch length log error
FIG. 4. Branch length accuracy on the A-200 dataset with Medium ILS. See supplementary figures S7 and S8, Supplementary Material online forlow and high ILS. The estimated branch length is plotted against the true branch length in log scale (base 10). Blue line: a fitted generalizedadditive model with smoothing (Wood 2011).
Table 3. Branch Length Accuracy on the A-200 Dataset.
Dataset n Log Err RMSE
True gt Est. gt True gt Est. gt
Low ILS 1,000 0.10 0.42 5.57 6.75Low ILS 200 0.16 0.44 6.22 6.99Low ILS 50 0.25 0.48 6.84 7.29Med ILS 1,000 0.03 0.20 0.22 0.86Med ILS 200 0.07 0.22 0.44 0.91Med ILS 50 0.13 0.26 0.74 1.05High ILS 1,000 0.06 0.15 0.03 0.13High ILS 200 0.11 0.18 0.07 0.15High ILS 50 0.18 0.24 0.14 0.19
NOTE.—Logarithmic error (Log Err) and RMSE are shown for true species treesscored with true gene trees or estimated gene trees (Est. gt).
250 500
1000 1500
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00False Positive Rate
Rec
all
MLBS Local PP
FIG. 5. ROC curve for the avian dataset based on MLBS and local PPPPsupport values. Boxes show different numbers of sites per gene (con-trolling gene tree estimation error).
Fast Coalescent-Based Support Computation . doi:10.1093/molbev/msw079 MBE
9
by guest on May 28, 2016
http://mbe.oxfordjournals.org/
Dow
nloaded from
[Sayyari and Mirarab, MBE, 2016]
100X faster
-
Branch Length
• The level of discordance is a function of coalescent unit branch length
27
[Sayyari and Mirarab, MBE, 2016]
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
θ2=15% θ3=15%θ1=70%Gorilla
Orang.
Chimp
Human
d=0.8
✓1 = 1�2
3e�d
-
Branch Length
• The level of discordance is a function of coalescent unit branch length
• A single quartet (n=4): just reverse the discordance formula to get the ML estimate
27
[Sayyari and Mirarab, MBE, 2016]
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
θ2=15% θ3=15%θ1=70%Gorilla
Orang.
Chimp
Human
d=0.8
✓1 = 1�2
3e�d
-
Branch Length
• The level of discordance is a function of coalescent unit branch length
• A single quartet (n=4): just reverse the discordance formula to get the ML estimate
• n>4: average frequencies around a branch
27
[Sayyari and Mirarab, MBE, 2016]
Orang.
Gorilla Chimp
HumanOrang.
GorillaChimp
Human
Orang.
Gorilla
Chimp
Human
θ2=15% θ3=15%θ1=70%Gorilla
Orang.
Chimp
Human
d=0.8
✓1 = 1�2
3e�d
-
Branch lengths accuracy
28
True gene trees
1500 1000
500 250
−7.5
−5.0
−2.5
0.0
2.5
−7.5
−5.0
−2.5
0.0
2.5
−7.5 −5.0 −2.5 0.0 2.5 −7.5 −5.0 −2.5 0.0 2.5true branch length (log scale)
estim
ated
bra
nch
leng
th (l
og s
cale
)
−7.5
−5.0
−2.5
0.0
2.5
−7.5 −5.0 −2.5 0.0 2.5true branch length (log scale)
estim
ated
bra
nch
leng
th (l
og s
cale
)
With true gene trees, ASTRAL correctly estimates BL
[Sayyari and Mirarab, MBE, 2016]
-
Branch lengths accuracy
28
True gene trees
1500 1000
500 250
−7.5
−5.0
−2.5
0.0
2.5
−7.5
−5.0
−2.5
0.0
2.5
−7.5 −5.0 −2.5 0.0 2.5 −7.5 −5.0 −2.5 0.0 2.5true branch length (log scale)
estim
ated
bra
nch
leng
th (l
og s
cale
)
−7.5
−5.0
−2.5
0.0
2.5
−7.5 −5.0 −2.5 0.0 2.5true branch length (log scale)
estim
ated
bra
nch
leng
th (l
og s
cale
)
was only 0.06, corresponding to branches that are about 14%shorter or longer than the true branch. As gene tree estima-tion error increases with reduced number of sites (see table 1for gene tree error statistics), the branch length error also
increases. Thus, while 1,500 bp genes give 0.17 log error,250 bp genes result in 0.59 error, which corresponds tobranches that are on average 3.9 times too short or long.Moreover, unlike true gene trees, the error in branch lengthsestimated based on estimated gene trees is biased towardunderestimation (fig. 6), a pattern that increases in intensitywith shorter alignments.
ASTRAL and MP-EST have similar branch length accuracymeasured by log error for highly accurate gene trees, butASTRAL has an advantage with increased gene tree error(supplementary fig. S10, Supplementary Material online andtable 4). Error measured by root mean squared error (RMSE)(which emphasizes the accuracy of long branches) is compa-rable for the two methods, but MP-EST has a slight advantagegiven accurate gene trees.
Biological DatasetsFor each biological dataset, we show MLBS support and thelocal posterior probabilities, computed based on RAxML genetrees available from respective publications. We also collapsegene tree branches with
-
Caveats• Branch length:
• Only for internal branches unless you have multiple individuals for a species
• In coalescent units, so the *true* value is still a function of population size and generation time in addition to actual time
• Local PP:
• Empirically better than BS support but based on many assumptions
29
-
30
• Back to tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md
• Score an existing tree
• Extensive branch annotation
• Polytomy test (-t 10)
• Sayyari, Erfan, and Siavash Mirarab. “Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies.” Genes 9, no. 3 (February 28, 2018): 132. https://doi.org/10.3390/genes9030132 .
https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#scoring-existing-treeshttps://doi.org/10.3390/genes9030132
-
• https://github.com/esayyari/DiscoVista • Sayyari, et. al. “DiscoVista: Interpretable Visualizations of
Gene Tree Discordance.” MPE 122 (2018): 110–15.
31
Discovista: visualizing discordance
https://github.com/esayyari/DiscoVistahttps://doi.org/10.1016/j.ympev.2018.01.019
-
• https://github.com/esayyari/DiscoVista • Sayyari, et. al. “DiscoVista: Interpretable Visualizations of
Gene Tree Discordance.” MPE 122 (2018): 110–15.
31
Discovista: visualizing discordance
https://github.com/esayyari/DiscoVistahttps://doi.org/10.1016/j.ympev.2018.01.019
-
ASTRAL-MP• A separate branch on GitHub from the main branch
https://github.com/smirarab/ASTRAL/tree/MP
• Try Installation: https://github.com/smirarab/ASTRAL/tree/MP#installation
• Compiling from the code may or may not be needed. Make sure you do the test below (AVX2)
• Explore -C and -T options32
java -D"java.library.path=lib/" -jar native_library_tester.jar
https://github.com/smirarab/ASTRAL/tree/MPhttps://github.com/smirarab/ASTRAL/tree/MP#installation
-
Some ASTRAL-related papers• ASTRAL performs well under ILS and HGT (Davidson, Vachaspati, Mirarab, and
Warnow). BMC Genomics (2015).
• It can handle multiple alleles per species (Rabiee, Sayyari, and Mirarab). MPE (2019)
• It computes branch length and branch support and can test for polytomies using quartet frequencies (Sayyari and Mirarab). MBE (2016) and Genes (2018)
• Visualizing quartet discordance using DiscoVista (Sayyari, Whitfield, and Mirarab) MPE (2018)
• Fragmentary sequences can negatively impact ASTRAL trees (Sayyari, Whitfield, and Mirarab). MBE (2017)
• Filtering loci is not beneficial! (Molloy and Warnow), Systematic Biology (2018)
• ASTRAL consistent under models of missing data (Nute, Molloy, Chou, and Warnow). BMC Genomics (2018)
• How many genes does ASTRAL need? (Shekhar, Roch, and Mirarab). Transactions on Computational Biology and Bioinformatics (2017)
• Using ASTRAL as a supertree method (Vachaspati and Warnow). Bioinformatics (2017)
33
-
I’ll stop here
34
Time left?
— A-Pro (https://github.com/chaoszhang/A-pro)
— MLBS (https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrapping)
https://github.com/chaoszhang/A-prohttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrappinghttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrappinghttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrapping