of 48 /48
Tutorial on ASTRAL Siavash Mirarab Electrical and Computer Engineering University of California, San Diego 1 Download: https://github.com/smirarab/astral Tutorial: https://github.com/smirarab/ASTRAL/ blob/master/astral-tutorial.md
• Author

others
• Category

## Documents

• view

38

1

Embed Size (px)

### Transcript of Tutorial on ASTRAL - Tandy Warnowtandy.cs.illinois.edu/astral-tutorial.pdf · 2020. 1. 3. ·...

• Tutorial on ASTRAL

Siavash Mirarab Electrical and Computer Engineering

University of California, San Diego

1

Tutorial: https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

https://github.com/smirarab/astralhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

• Unrooted quartets under MSC model

For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)

2

Orang.

Gorilla Chimp

HumanOrang.

GorillaChimp

Human

Orang.

Gorilla

Chimp

Human

θ2=15% θ3=15%θ1=70%Gorilla

Orang.

Chimp

Human

d=0.8

• Unrooted quartets under MSC model

For a quartet (4 species), the most probable unrooted quartet tree (among the gene trees) is the unrooted species tree topology (Allman, et al. 2010)

2

Orang.

Gorilla Chimp

HumanOrang.

GorillaChimp

Human

Orang.

Gorilla

Chimp

Human

θ2=15% θ3=15%θ1=70%Gorilla

Orang.

Chimp

Human

d=0.8

The most frequent gene tree   =  The most likely species tree

• 3

Orang.Gorilla ChimpHuman Rhesus

More than 4 speciesFor 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)

• 3

Orang.Gorilla ChimpHuman Rhesus

1. Break gene trees into quartets of species

2. Find the species tree with the maximum number of induced quartet trees shared with the collection of input gene trees

• Statistically consistent under the multi-species coalescent model with error-free input

(n4)

More than 4 speciesFor 5 or more species, the unrooted species tree topology can be different from the most probable gene tree (called “anomaly zone”) (Degnan, 2013)

• ASTRAL versions• ASTRAL-I (

• ASTRAL versions• ASTRAL-I (

• ASTRAL versions• ASTRAL-I (

• Input/Output• Input: Unrooted gene trees

• Can have missing data

• Can have polytomies

• Can have multiple alleles

• Output: The estimated unrooted species tree

• Will have branch lengths in coalescent units on internal branches (not super accurate)

• Will have a measure of support called localPP

5

• Caveats

• Assumptions for statistical consistency: a randomly distributed sample of recombination-free, reticulation-free, error-free, orthologous gene trees

• Practically: reduced accuracy with low accuracy gene trees (will see more)

6

• Start of tutorial

7

• To cover …• Show installation

• Start at https://github.com/smirarab/astral

• A note on ASTRAL-MP

• Tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

• Do a simple test run

• Show help

8

https://github.com/smirarab/astralhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

• Notes on input

9

• bestML versus MLBS

10

• Multi-locus Bootstrapping (MLBS) is also possible

• MLBS not suggested as it seems to degrade accuracy compared to simply using ML gene trees. Bioinformatics, 2014, Mirarab et al.

• ● ●●

●●

●●

1000 200 50

low medium high low medium high low medium high

5%

10%

15%

gene tree error

Spec

ies

tree

topo

logi

cal e

rror (

FN)

● ●ASTRAL−II CA−ML

Comparison to concatenation: depends on the level of gene tree error

11

1000 gene trees 200 gene trees 50 gene trees

Simulations, 200 species, deep medium

level ILS [Mirarab and

Warnow, 2016]

• ● ●●

●●

●●

1000 200 50

low medium high low medium high low medium high

5%

10%

15%

gene tree error

Spec

ies

tree

topo

logi

cal e

rror (

FN)

● ●ASTRAL−II CA−ML

Comparison to concatenation: depends on the level of gene tree error

11

1000 gene trees 200 gene trees 50 gene trees

Simulations, 200 species, deep medium

level ILS [Mirarab and

Warnow, 2016]

• Low support branches• Gene tree support matters

• Does it help to contract branches with low support?

12

Simulations: 100 taxa, simphy,  ILS: around 46% true discordance FastTree, support from bootstrapping

genes

All replicates

BMC Bioinformatics, 2018, Zhang et al.

• Low support branches• Gene tree support matters

• Does it help to contract branches with low support?

• Yes, but only for very low  support branches

12

Simulations: 100 taxa, simphy,  ILS: around 46% true discordance FastTree, support from bootstrapping

genes

All replicates

BMC Bioinformatics, 2018, Zhang et al.

• Low support branches• Gene tree support matters

• Does it help to contract branches with low support?

• Yes, but only for very low  support branches

• Mostly helps in the presence of low support gene trees

12

Simulations: 100 taxa, simphy,  ILS: around 46% true discordance FastTree, support from bootstrapping

genes

All replicates

BMC Bioinformatics, 2018, Zhang et al.

• ASTRAL-III on all 14,446 unbinned gene trees

13

Zhang, Chao, et al. BMC Bioinformatics 19, no. S6 (2018): 153. https://doi.org/10.1186/s12859-018-2129-y

https://doi.org/10.1186/s12859-018-2129-y

• Using aLRT tests

PhyML_mac -m GTR -o n -d nt -i ...phylip -u somffile -b -2 -n 1000

14

• • Back to tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

15

nw_ed 1KP-genetrees.tre 'i & b

• Multiple individuals

• What if we sample multiple individuals from each species?

• In recently diverged species individuals can be non-monophyletic in gene trees

• Sampling multiple individuals may provide extra signal

16

17

0.0

0.1

0.2

0.3

Variable effort500 genes; 200 species

Fixed effortgenes x inds = 1000; 200 species

Fixed effort500 genes; species x inds = 200

Spec

ies

tree

erro

r (R

F di

stan

ce) 1 ind.

2 ind.

5 ind.

Yes, it marginally helps accuracy

17

0.0

0.1

0.2

0.3

Variable effort500 genes; 200 species

Fixed effortgenes x inds = 1000; 200 species

Fixed effort500 genes; species x inds = 200

Spec

ies

tree

erro

r (R

F di

stan

ce) 1 ind.

2 ind.

5 ind.

Yes, it marginally helps accuracy

But not if sequencing effort is kept fixed

17

0.0

0.1

0.2

0.3

Variable effort500 genes; 200 species

Fixed effortgenes x inds = 1000; 200 species

Fixed effort500 genes; species x inds = 200

Spec

ies

tree

erro

r (R

F di

stan

ce) 1 ind.

2 ind.

5 ind.

Yes, it marginally helps accuracy

But not if sequencing effort is kept fixed

• • Back to tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

18

java -jar astral.5.7.3.jar -i test_data/easy-multi.tre -a test_data/mapping-10.txt

Sorry, the test files are not on the website right now!

https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

• Should you filter?• Filtering genes based on missing species from specific genes?

• Generally not beneficial (see Molloy and Warnow, Systematic Biology 2018)

• Filtering genes based on gene tree estimation error?

• Depends on conditions (see Molloy and Warnow, Systematic Biology 2018)

• Filtering fragmentary sequences from genes while keeping the gene?

• Often beneficial (see Sayyari, Whitfield, and Mirarab, MBE 2018)

19

• Filtering fragments

20

• Added fragmentation to simulated data with patterns similar to the Misof. et. al. insect (transcriptome data)

• Filtering simply removes fragmentary data from genes but keeps the gene

• TreeShrink

• Removes super long branches from gene trees

• Is automatic

• Improves gene trees

21

• Notes on output

22

• • Back to tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

• Open the tree

• Comment on rooting

23

https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#the-astral-log-information

• Local posterior probability• Recall quartet frequencies follow a multinomial distribution

24

Orang.

Gorilla Chimp

HumanOrang.

GorillaChimp

Human

Orang.

Gorilla

Chimp

Human

m2 = 63 m3 = 57m1 = 80

θ2 θ3θ1

• Local posterior probability• Recall quartet frequencies follow a multinomial distribution

• P (gene tree seen m1/m times = species tree) = P(θ1 > 1/3)

• Can be solved analytically. The resulting measure is called “the local posterior probability” (localPP)

• Handle n>4 by averaging quartet scores

24

Orang.

Gorilla Chimp

HumanOrang.

GorillaChimp

Human

Orang.

Gorilla

Chimp

Human

m2 = 63 m3 = 57m1 = 80

θ2 θ3θ1

• Quartet support vs. localPP

25

!3Background: How many targets are enough? The number of loci required for confidently resolving a phylogenetic relationship depends on its “hardness”. Discordant gene trees due to incomplete lineage sorting (ILS) can make branches of a species tree very difficult to resolve. ILS is a function of the branch length and population size. This means that shorter branches or larger population sizes increase ILS, and can make species tree reconstruction more difficult. The co-PI has recently developed a new approach for estimating the support of a branch in a coalescent-based framework based on the proportion of gene trees that support quartet topologies in gene trees[31]. This new approach is analogous to finding the probability that a three-sided die is loaded towards each facet by observing outcomes of many tosses. We can also ask the reverse question: if we have an estimate of the degree of bias of a die, how many tosses are needed to confidently find its loaded side. Similarly, this approach[31] sheds light on the relationship between the number of genes required to achieve a level of support for a branch of a certain length in coalescent units (Fig. 3). While the co-PI’s method of calculating support, which is implemented in ASTRAL [32, 33], gives the basic mathematical framework, more method development is needed (see below).

Research Approach Specimens Suitable specimens of more than 80 species of Sabellidae and >120 Terebelliformia species have been collected from localities around the world in the last 15 years by the PI. Further collecting will be undertaken for some key taxa for a few more transcriptomes and for targeted DNA capture. We also have agreements from other experts in Sabellidae and Terebelliformia to provide other needed ethanol-preserved specimens for targeted capture. We will obtain ~ 50% of the known sabellid diversity and ~25% of terebelliforms for targeted capture sequencing (~500 species total). Our plan is to sample all genera, particularly focusing on their type species. This should allow for the generation of robust phylogenies, from which we will revise the taxonomy. Specimens will be vouchered at the SIO Benthic Invertebrate Collection and biodiversity information curated on the Encyclopedia of Life. Sequencing (transcriptomes and targeted capture of DNA) A targeted capture approach will be used with an appropriate number of loci (see below) to generate robust phylogenies of Sabellidae and Terebelliformia. Two different sets of targets will be designed from transcriptomes, across Sabellidae and Terebelliformia repectively. We have already generated new transcriptomes for nine sabellids, a serpulid and a fabriciid (bold terminals in Fig. 4), and nine Terebelliformia (bold terminals in Fig. 5) and combined this data with the few publicly available transcriptomes. Based on direct sequencing data already obtained for several genes for Sabellidae and Terebelliformia [Rouse in prep.; Stiller et al. in prep.], our sampling arguably spans the extant diversity of each clade. Several more transcriptomes will be needed to ensure the targeted capture methods will be robust. Methods for generating these transcriptomes will follow those previously used by us[6, 34]. How many targets are needed? A targeted capture pipeline typically starts from a large number of loci sequenced from a small number of species using genome-wide approaches (e.g., transcriptomics)[7, 8, 35]. This initial dataset is then used to select a smaller subset of loci for the targeted capture phase, which involves a larger set of species. To reduce the cost and effort, one would want to minimize the number of loci in the second phase, as long as sufficient loci are selected to confidently resolve relationships of interest. To date the number of loci has been determined in an ad hoc manner, either by the ultra conserved elements discovered[7, 8] or limitations of the targeted capture technology[13]. Initial analyses of our two datasets demonstrates how the number of

Fig. 3. Impact of number of genes (colors) and amount of ILS (x-axis) on support (posterior probability) for a branch (y-axis). Under the coalescent model, for four species separated with an unrooted branch of length d, the probability of a gene tree being identical to the species tree is 1-2/3e-d (we use this as a measure of ILS). We measure support using local pp [31] for various numbers of gene trees (assuming no gene tree estimation error). More ILS (ie., low 1-2/3e-d) requires more genes to get high support.

quartet frequency

Increased number of genes (k ) ⇒ increased support

Decreased discordance ⇒ increased support

• 26

localPP is more accurate than bootstrapping

250 500

1000 1500

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00

False Positive Rate

Recall

MLBS Local PP

250 500

1000 1500

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00

False Positive Rate

Recall

MLBS Local PP

250 500

1000 1500

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00

False Positive Rate

Recall

MLBS Local PP

250 500

1000 1500

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00

False Positive Rate

Recall

MLBS Local PP

Avian simulated dataset (48 taxa, 1000 genes)

the ROC curves show (fig. 5), for the same number offalse positives branches, local posterior probabilities result inbetter recall than MLBS. This pattern is more pronouncedfor shorter alignments, which have increased gene treeerror. For example, for the 250 bp model condition, if wechoose a support threshold that results in 0.01 FPR, with localposterior values, we still recover 84% of correct branches,whereas with MLBS, the same FPR results in retaining70% of correct branches. Thus, for a desired level of precision,better recall can be obtained using local posteriorprobabilities.

Branch LengthBranch length accuracy on the avian dataset was a function ofgene tree estimation error whether ASTRAL or MP-EST wasused (table 4). With true gene trees, branch length log error

FIG. 4. Branch length accuracy on the A-200 dataset with Medium ILS. See supplementary figures S7 and S8, Supplementary Material online forlow and high ILS. The estimated branch length is plotted against the true branch length in log scale (base 10). Blue line: a fitted generalizedadditive model with smoothing (Wood 2011).

Table 3. Branch Length Accuracy on the A-200 Dataset.

Dataset n Log Err RMSE

True gt Est. gt True gt Est. gt

Low ILS 1,000 0.10 0.42 5.57 6.75Low ILS 200 0.16 0.44 6.22 6.99Low ILS 50 0.25 0.48 6.84 7.29Med ILS 1,000 0.03 0.20 0.22 0.86Med ILS 200 0.07 0.22 0.44 0.91Med ILS 50 0.13 0.26 0.74 1.05High ILS 1,000 0.06 0.15 0.03 0.13High ILS 200 0.11 0.18 0.07 0.15High ILS 50 0.18 0.24 0.14 0.19

NOTE.—Logarithmic error (Log Err) and RMSE are shown for true species treesscored with true gene trees or estimated gene trees (Est. gt).

250 500

1000 1500

0.00

0.25

0.50

0.75

1.00

0.00

0.25

0.50

0.75

1.00

0.00 0.25 0.50 0.75 1.000.00 0.25 0.50 0.75 1.00False Positive Rate

Rec

all

MLBS Local PP

FIG. 5. ROC curve for the avian dataset based on MLBS and local PPPPsupport values. Boxes show different numbers of sites per gene (con-trolling gene tree estimation error).

Fast Coalescent-Based Support Computation . doi:10.1093/molbev/msw079 MBE

9

by guest on May 28, 2016

http://mbe.oxfordjournals.org/

Dow

[Sayyari and Mirarab, MBE, 2016]

100X faster

• Branch Length

• The level of discordance is a function of coalescent unit branch length

27

[Sayyari and Mirarab, MBE, 2016]

Orang.

Gorilla Chimp

HumanOrang.

GorillaChimp

Human

Orang.

Gorilla

Chimp

Human

θ2=15% θ3=15%θ1=70%Gorilla

Orang.

Chimp

Human

d=0.8

✓1 = 1�2

3e�d

• Branch Length

• The level of discordance is a function of coalescent unit branch length

• A single quartet (n=4): just reverse the discordance formula to get the ML estimate

27

[Sayyari and Mirarab, MBE, 2016]

Orang.

Gorilla Chimp

HumanOrang.

GorillaChimp

Human

Orang.

Gorilla

Chimp

Human

θ2=15% θ3=15%θ1=70%Gorilla

Orang.

Chimp

Human

d=0.8

✓1 = 1�2

3e�d

• Branch Length

• The level of discordance is a function of coalescent unit branch length

• A single quartet (n=4): just reverse the discordance formula to get the ML estimate

• n>4: average frequencies around a branch

27

[Sayyari and Mirarab, MBE, 2016]

Orang.

Gorilla Chimp

HumanOrang.

GorillaChimp

Human

Orang.

Gorilla

Chimp

Human

θ2=15% θ3=15%θ1=70%Gorilla

Orang.

Chimp

Human

d=0.8

✓1 = 1�2

3e�d

• Branch lengths accuracy

28

True gene trees

1500 1000

500 250

−7.5

−5.0

−2.5

0.0

2.5

−7.5

−5.0

−2.5

0.0

2.5

−7.5 −5.0 −2.5 0.0 2.5 −7.5 −5.0 −2.5 0.0 2.5true branch length (log scale)

estim

ated

bra

nch

leng

th (l

og s

cale

)

−7.5

−5.0

−2.5

0.0

2.5

−7.5 −5.0 −2.5 0.0 2.5true branch length (log scale)

estim

ated

bra

nch

leng

th (l

og s

cale

)

With true gene trees, ASTRAL correctly estimates BL

[Sayyari and Mirarab, MBE, 2016]

• Branch lengths accuracy

28

True gene trees

1500 1000

500 250

−7.5

−5.0

−2.5

0.0

2.5

−7.5

−5.0

−2.5

0.0

2.5

−7.5 −5.0 −2.5 0.0 2.5 −7.5 −5.0 −2.5 0.0 2.5true branch length (log scale)

estim

ated

bra

nch

leng

th (l

og s

cale

)

−7.5

−5.0

−2.5

0.0

2.5

−7.5 −5.0 −2.5 0.0 2.5true branch length (log scale)

estim

ated

bra

nch

leng

th (l

og s

cale

)

was only 0.06, corresponding to branches that are about 14%shorter or longer than the true branch. As gene tree estima-tion error increases with reduced number of sites (see table 1for gene tree error statistics), the branch length error also

increases. Thus, while 1,500 bp genes give 0.17 log error,250 bp genes result in 0.59 error, which corresponds tobranches that are on average 3.9 times too short or long.Moreover, unlike true gene trees, the error in branch lengthsestimated based on estimated gene trees is biased towardunderestimation (fig. 6), a pattern that increases in intensitywith shorter alignments.

ASTRAL and MP-EST have similar branch length accuracymeasured by log error for highly accurate gene trees, butASTRAL has an advantage with increased gene tree error(supplementary fig. S10, Supplementary Material online andtable 4). Error measured by root mean squared error (RMSE)(which emphasizes the accuracy of long branches) is compa-rable for the two methods, but MP-EST has a slight advantagegiven accurate gene trees.

Biological DatasetsFor each biological dataset, we show MLBS support and thelocal posterior probabilities, computed based on RAxML genetrees available from respective publications. We also collapsegene tree branches with

• Caveats• Branch length:

• Only for internal branches unless you have multiple individuals for a species

• In coalescent units, so the *true* value is still a function of population size and generation time in addition to actual time

• Local PP:

• Empirically better than BS support but based on many assumptions

29

• 30

• Back to tutorial on https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md

• Score an existing tree

• Extensive branch annotation

• Polytomy test (-t 10)

• Sayyari, Erfan, and Siavash Mirarab. “Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies.” Genes 9, no. 3 (February 28, 2018): 132. https://doi.org/10.3390/genes9030132 .

https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.mdhttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#scoring-existing-treeshttps://doi.org/10.3390/genes9030132

• • https://github.com/esayyari/DiscoVista • Sayyari, et. al. “DiscoVista: Interpretable Visualizations of

Gene Tree Discordance.” MPE 122 (2018): 110–15.

31

Discovista: visualizing discordance

https://github.com/esayyari/DiscoVistahttps://doi.org/10.1016/j.ympev.2018.01.019

• • https://github.com/esayyari/DiscoVista • Sayyari, et. al. “DiscoVista: Interpretable Visualizations of

Gene Tree Discordance.” MPE 122 (2018): 110–15.

31

Discovista: visualizing discordance

https://github.com/esayyari/DiscoVistahttps://doi.org/10.1016/j.ympev.2018.01.019

• ASTRAL-MP• A separate branch on GitHub from the main branch

https://github.com/smirarab/ASTRAL/tree/MP

• Try Installation:  https://github.com/smirarab/ASTRAL/tree/MP#installation

• Compiling from the code may or may not be needed. Make sure you do the test below (AVX2)

• Explore -C and -T options32

java -D"java.library.path=lib/" -jar native_library_tester.jar

https://github.com/smirarab/ASTRAL/tree/MPhttps://github.com/smirarab/ASTRAL/tree/MP#installation

• Some ASTRAL-related papers• ASTRAL performs well under ILS and HGT (Davidson, Vachaspati, Mirarab, and

Warnow). BMC Genomics (2015).

• It can handle multiple alleles per species (Rabiee, Sayyari, and Mirarab). MPE (2019)

• It computes branch length and branch support and can test for polytomies using quartet frequencies (Sayyari and Mirarab). MBE (2016) and Genes (2018)

• Visualizing quartet discordance using DiscoVista (Sayyari, Whitfield, and Mirarab) MPE (2018)

• Fragmentary sequences can negatively impact ASTRAL trees (Sayyari, Whitfield, and Mirarab). MBE (2017)

• Filtering loci is not beneficial! (Molloy and Warnow), Systematic Biology (2018)

• ASTRAL consistent under models of missing data (Nute, Molloy, Chou, and Warnow). BMC Genomics (2018)

• How many genes does ASTRAL need? (Shekhar, Roch, and Mirarab). Transactions on Computational Biology and Bioinformatics (2017)

• Using ASTRAL as a supertree method (Vachaspati and Warnow). Bioinformatics (2017)

33

• I’ll stop here

34

Time left?

— A-Pro (https://github.com/chaoszhang/A-pro)

— MLBS (https://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrapping)

https://github.com/chaoszhang/A-prohttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrappinghttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrappinghttps://github.com/smirarab/ASTRAL/blob/master/astral-tutorial.md#multi-locus-bootstrapping