Copyedited by: AV MANUSCRIPT CATEGORY: Systematic Biology


[18:22 24/9/2019 Sysbio-OP-SYSB190063.tex] Page: 1 1–19

AUTHOR QUERIES

AQ1: Please check whether the edit made to the article title is appropriate.

AQ2: Please check that all names have been spelled correctly and appear in the correct order. Please also check that all initials are present. Please check that the author surnames (family name) have been correctly identified by a pink background. If this is incorrect, please identify the full surname of the relevant authors. Occasionally, the distinction between surnames and forenames can be ambiguous, and this is to ensure that the authors’ full surnames and forenames are tagged correctly, for accurate indexing online. Please also check all author affiliations.

AQ3: Please provide department name (if any) for affiliations ‘2, 4–8, 15, 16’ and also provide full road and district address and Zip or postal code for affiliations ‘3, 4, 6, 8, 9, 11, 15’.

AQ4: Please clarify whether this is “Reeder et al. 2015a” or “Reeder et al. 2015b” throughout the article.

AQ5: Please check that the text is complete and that all figures, tables and their legends are included.

AQ6: Please check that special characters, equations, dosages and units, if applicable, have been reproduced accurately.

AQ7: Permission to reproduce any third party material in your paper should have been obtained prior to acceptance. If your paper contains figures or text that require permission to reproduce, please confirm that you have obtained all relevant permissions and that the correct permission text has been used as required by the copyright holders. Please contact [email protected] if you have any questions regarding permissions.

AQ8: Please check whether the section hierarchy is OK as set.

AQ9: Please spell out qPCR, MCMC, GTR, UCE (if necessary).

AQ10: Please note that the unit ‘My’ has been changed to ‘Ma’ throughout the article. Please confirm.

AQ11: Please note that the Refs. [Greene 1997; Pyron 2017; and Anisimova et al. 2011] are not listed in the references list. Please add them to the list or delete the citations.

AQ12: Please confirm whether the citations of Supplementary material are OK as set.

AQ13: If you have submitted files to Dryad, please can you check the Dryad URL to make sure it contains the correct doi and is linked to your data package? Please confirm in the proof if it is correct or if it needs to be changed.

AQ14: Please confirm whether the “Funding” section is OK as set.

AQ15: Please update the missing volume number and page numbers for Ref. [Burbrink et al. 2019].

AQ16: Please confirm whether Ref. [Estes et al. 1988] is OK as set.

AQ17: There is no mention of Ref. [Fry 2005] in the text. Please insert a citation in the text or delete the reference as appropriate.

AQ18: Please provide complete details for Refs. [Gao and Norell 1998; Kuhn 2008; Rhodin et al. 2015].

AQ19: Please provide the missing volume number for Refs. [Minh et al. 2018; Robinson and Foulds 1981].

AQ20: Please provide the missing publisher name and publisher location for Ref. [R Core Team 2015].

AQ21: Please provide the missing publisher location for Ref. [Zhang 2010].

AQ22: If applicable, figures have been placed as close as possible to their first citation. Please check that they are complete and that the correct figure legend is present. Figures in the proof are low resolution versions that will be replaced with high resolution versions when the journal is printed.

AQ23: This figure is currently intended to appear in color online and black and white in print. Please check the black and white versions at the end of the proof, and if necessary, re-word the text and legend to avoid references to color.

AQ24: We noticed Figure 6 (B) has 5 parts and the received source figure PDF ‘Fig_6.Var_Imp_NEW’ has only 4 parts. Kindly note, we have included the 5th part in Fig. 6 (B), please confirm if this is fine or advise if the 5th part needs to be deleted.


MAKING CORRECTIONS TO YOUR PROOF

These instructions show you how to mark changes or add notes to your proofs using Adobe Acrobat Professional versions 7 and onwards, or Adobe Reader DC. To check what version you are using go to Help then About. The latest version of Adobe Reader is available for free from get.adobe.com/reader.

DISPLAYING THE TOOLBARS

Adobe Reader DC
In Adobe Reader DC, the Comment toolbar can be found by

the right-hand side of the page (shown below).

The toolbar shown below will then display along the top.

Acrobat Professional 7, 8, and 9
In Adobe Professional, the Comment toolbar can be found by

(s) oolbar, and then clicking (shown below).

The toolbar shown below will then be displayed along the top.

USING TEXT EDITS AND COMMENTS IN ACROBAT
This is the quickest, simplest and easiest method both to make corrections, and for your corrections to be transferred and checked.

1. Click Text Edits.
2. Select the text to be annotated or place your cursor at the insertion point and start typing.
3. Click the Text Edits drop down arrow and select the required action.

You can also right click on selected text for a range of commenting options, or add sticky notes.

SAVING COMMENTS
In order to save your comments and notes, you need to save the file (File, Save) when you close the document.

USING COMMENTING TOOLS IN ADOBE READER
All commenting tools are displayed in the toolbar. You cannot use text edits; however, you can still use highlighter, sticky notes, and a variety of insert/replace text options.

POP-UP NOTES
In both Reader and Acrobat, when you insert or edit text a pop-up box will appear. In Acrobat it looks like this:

In Reader it looks like this, and will appear in the right-hand pane:

DO NOT MAKE ANY EDITS DIRECTLY INTO THE TEXT, USE COMMENTING TOOLS ONLY.


Journal: Systematic Biology
Article Doi: 10.1093/sysbio/syz062

Article Title: Interrogating Genomic-Scale Data for Squamata (Lizards, Snakes, and Amphisbaenians) Shows No Support for Key Traditional Morphological Relationships

First Author: Frank T. Burbrink
Corr. Author: Hussam Zaher

INSTRUCTIONS
We encourage you to use Adobe’s editing tools (please see the next page for instructions). If this is not possible, please list clearly in an e-mail to [email protected]. Please do not send corrections as track changed Word documents.

Changes should be corrections of typographical errors only. Changes that contradict journal style will not be made.

These proofs are for checking purposes only. They should not be considered as final publication format. The proof must not be used for any other purpose. In particular we request that you do not post them on your personal/institutional web site, and do not print and distribute multiple copies. Neither excerpts nor all of the article should be included in other publications written or edited by yourself until the final version has been published and the full citation details are available. You will be sent these when the article is published, along with an author PDF of the final article.

1. License to Publish: If you have not already done so, please visit the link in your Welcome email and complete your License to Publish online.

2. Author groups: Please check that all names have been spelled correctly and appear in the correct order. Please also check that all initials are present. Please check that the author surnames (family name) have been correctly identified by a pink background. If this is incorrect, please identify the full surname of the relevant authors. Occasionally, the distinction between surnames and forenames can be ambiguous, and this is to ensure that the authors’ full surnames and forenames are tagged correctly, for accurate indexing online. Please also check all author affiliations.

3. Figures: Figures have been placed as close as possible to their first citation. Please check that they have no missing sections and that the correct figure legend is present.

4. Conflict of interest: All authors must make a formal statement indicating any potential conflict of interest that might constitute an embarrassment to any of the authors if it were not to be declared and were to emerge after publication. Such conflicts might include, but are not limited to, shareholding in or receipt of a grant or consultancy fee from a company whose product features in the submitted manuscript or which manufactures a competing product. The following statement has been added to your proof: ‘Conflict of Interest: none declared.’ If this is incorrect please supply the necessary text to identify the conflict of interest.

5. Permissions: Permission to reproduce any third party material in your paper should have been obtained prior to acceptance. If your paper contains figures or text that require permission to reproduce, please email [email protected] as soon as possible.


Syst. Biol. 0(0):1–19, 2019
© The Author(s) 2019. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For permissions, please email: [email protected]
doi:10.1093/sysbio/syz062

Interrogating Genomic-Scale Data for Squamata (Lizards, Snakes, and Amphisbaenians) Shows No Support for Key Traditional Morphological Relationships

FRANK T. BURBRINK1, FELIPE G. GRAZZIOTIN2, R. ALEXANDER PYRON3, DAVID CUNDALL4, STEVE DONNELLAN5,6, FRANCES IRISH7, J. SCOTT KEOGH8, FRED KRAUS9, ROBERT W. MURPHY10, BRICE NOONAN11, CHRISTOPHER J. RAXWORTHY1, SARA RUANE12, ALAN R. LEMMON13, EMILY MORIARTY LEMMON14, AND HUSSAM ZAHER15,16,∗

1Department of Herpetology, The American Museum of Natural History, 79th Street at Central Park West, New York, NY 10024, USA; 2Laboratório de Coleções Zoológicas, Instituto Butantan, Av. Vital Brasil, 1500—Butantã, São Paulo—SP 05503-900, Brazil; 3Department of Biological Sciences, The George Washington University, Washington, DC 20052, USA; 4Biological Sciences, W. Packer Avenue, Lehigh University, Bethlehem, PA 18015, USA; 5South Australian Museum, North Terrace, Adelaide SA 5000, Australia; 6School of Biological Sciences, University of Adelaide, SA 5005, Australia; 7Biological Sciences, Moravian College, 1200 Main St, Bethlehem, PA 18018, USA; 8Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT 2601, Australia; 9Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA; 10Department of Natural History, Royal Ontario Museum, 100 Queens Park, Toronto, ON M5S 2C6, Canada; 11Department of Biology, University of Mississippi, Oxford, MS 38677, USA; 12Department of Biological Sciences, 206 Boyden Hall, Rutgers University, 195 University Avenue, Newark, NJ 07102, USA; 13Department of Scientific Computing, Florida State University, Dirac Science Library, Tallahassee, FL 32306-4102, USA; 14Department of Biological Science, Florida State University, 319 Stadium Drive, Tallahassee, FL 32306-4295, USA; 15Museu de Zoologia da Universidade de São Paulo, São Paulo, CEP 04263-000, Brazil; and 16Centre de Recherche sur la Paléobiodiversité et les Paléoenvironnements (CR2P), UMR 7207 CNRS/MNHN/Sorbonne Université, Muséum national d’Histoire naturelle, 8 rue Buffon, CP 38, 75005 Paris, France

∗Correspondence to be sent to: Museu de Zoologia da Universidade de São Paulo, São Paulo, CEP 04263-000, Brazil; E-mail: [email protected]

Received XX XXXX XXXX; reviews returned XX XXXX XXXX; accepted XX XXXX XXXX
Associate Editor: Robert Thomson

Abstract.—Genomics is narrowing uncertainty in the phylogenetic structure of many amniote groups. For one of the most diverse and species-rich groups, the squamate reptiles (lizards, snakes, and amphisbaenians), an inverse correlation between the number of taxa and the number of loci sampled still persists across all publications using DNA sequence data, and reaching a consensus on the relationships among them has been highly problematic. In this study, we use high-throughput sequence data from 289 samples covering 75 families of squamates to address phylogenetic affinities, estimate divergence times, and characterize residual topological uncertainty in the presence of genome-scale data. Importantly, we address genomic support for the traditional taxonomic groupings Scleroglossa and Macrostomata using novel machine-learning techniques. We interrogate genes using various metrics inherent to these loci, including parsimony-informative sites (PIS), phylogenetic informativeness, length, gaps, number of substitutions, and site concordance, to understand why certain loci fail to recover previously well-supported molecular clades and how they fail to support species-tree estimates. We show that both incomplete lineage sorting and poor gene-tree estimation (due to a few undesirable gene properties, such as an insufficient number of PIS) may account for most discordance between gene trees and the species tree. We find overwhelming signal for Toxicofera, and also show that none of the loci included in this study supports Scleroglossa or Macrostomata. We comment on the origins and diversification of Squamata throughout the Mesozoic and underscore remaining uncertainties that persist both in deeper parts of the tree (e.g., relationships among Dibamia, Gekkota, and remaining squamates, and among the three toxicoferan clades Iguania, Serpentes, and Anguiformes) and within specific clades (e.g., affinities among gekkotan, pleurodont iguanian, and colubroid families).
[Neural network; gene interrogation; lizards; snakes; genomics; phylogeny.]

Well-supported phylogenies inferred using both thorough taxon sampling and genome-scale sequence data are paramount for understanding phylogenetic structure and settling debates about higher-level taxonomy. Phylogenomic analyses can provide reliable trees for downstream use in comparative biology (Garland et al. 2005; Wortley et al. 2005; Heath et al. 2008; Ruane et al. 2015; Burbrink et al. 2019) and help unravel evolutionary complexity (Philippe et al. 2011), such as deep-time phylogenetic reticulation (Burbrink and Gehara 2018). In recent years, well-resolved phylogenies of birds and mammals have used large numbers of both genes and living taxa (Prum et al. 2015; Liu et al. 2017). Unfortunately, among amniotes, squamates have fallen behind, and all comparative and taxonomic studies still rely on phylogenetic structure estimated from a handful of genes sampled for a large number of species (Pyron et al. 2013), from phylogenomic data for a small number of lineages (Streicher and Wiens 2017), or from a few data sets of intermediate size with ∼50 loci and 161 taxa (Wiens et al. 2012; Reeder et al. 2015). Furthermore, phylogenetic thinking about such groups often reflects historical morphological hypotheses that are weakly congruent or incongruent with recent phylogenomic estimates (Conrad 2008; Gauthier et al. 2012; Losos et al. 2012).

The order Squamata comprises almost 10,800 extant lizards, snakes, and amphisbaenians (Uetz et al. 2018), which have diversified continuously since the Jurassic, with many groups surviving the Cretaceous/Tertiary mass extinction (Evans 2003; Evans and Jones 2010; Jones et al. 2013). Extant squamates occur in nearly all habitats globally and show enormous variation in body size, body shape, limb types (including repeated complete limb loss), reproductive modes (oviparity and viviparity), complex venoms, and extremely varied diets that include plants, invertebrates, and vertebrates (Vitt and Pianka 2005; Vitt and Caldwell 2009; Colston et al. 2010; Pyron and Burbrink 2014; Zaher et al. 2014; Fry 2015). Squamates are important research organisms in the fields of behavior, ecology, and macroevolution, and major studies on their speciation, biogeography, and latitudinal richness gradients have contributed to the basic understanding of how diversity accumulates across the globe (O’Connor and Shine 2004; Ricklefs et al. 2007; Pyron and Burbrink 2012; Pyron 2014; Burbrink et al. 2015; Esquerré and Scott Keogh 2016; Esquerré et al. 2017).

Attempts over the last 250 years to understand relationships among squamates reflect input from hundreds of researchers using morphological, small-scale molecular, and now phylogenomic data sets, and have been marked by several key milestones (Oppel 1811; Camp 1923; Underwood 1967; Estes et al. 1988; Townsend et al. 2004; Vidal and Hedges 2005, 2009; Conrad 2008; Wiens et al. 2010, 2012; Gauthier et al. 2012; Pyron et al. 2013, 2014; Reeder et al. 2015; Streicher and Wiens 2016). Although morphological and molecular studies have converged toward similar content for most family-level groups, relationships among these groups and, more intriguingly, the deepest divisions within squamates remain highly contentious (Losos et al. 2012). Squamate relationships inferred from molecular sequence data that are fundamentally at odds with historical interpretations of morphological data include: 1) monophyly of Toxicofera (snakes, iguanians, and anguiforms; Vidal and Hedges 2005), 2) polyphyly of Anilioidea (pipe-snakes) and Macrostomata (large-gaped snakes including Acrochordidae, Boidea, Bolyeriidae, Colubriformes, Pythonidae, Tropidophiidae, and Ungaliophiidae), 3) paraphyly of Scolecophidia (worm-snakes), 4) phylogenetic affinities of Dibamia and Amphisbaenia within squamates (Hallermann 1998; Rieppel and Zaher 2000; Conrad 2008; Gauthier et al. 2012), and 5) phylogenetic affinities of Heloderma and Shinisaurus within Anguiformes (Gao and Norell 1998).

In brief, all cladistic morphological studies of extant and extinct taxa since the landmark analysis of Estes et al. (1988), which itself tested the original divisions of Camp (1923), have supported a basal split between Iguania, represented by Pleurodonta and Acrodonta, and Scleroglossa, which includes gekkotans and all remaining (“autarchoglossan”) squamate groups (see Conrad [2008] and Gauthier et al. [2012] for reviews). This arrangement at the root of crown-Squamata is supported by a suite of morphological characters, with 2 to 10 unambiguous synapomorphies uniting Scleroglossa as the sister group of Iguania (Conrad 2008; Gauthier et al. 2012). In contrast, all molecular phylogenetic studies since Saint et al. (1998), Townsend et al. (2004), and Vidal and Hedges (2005) have demonstrated that snakes, anguimorphs, and iguanians share a more recent common ancestor to the exclusion of other squamates; this clade is collectively referred to as Toxicofera (Vidal and Hedges 2005). Importantly, reanalysis of combined molecular and morphological data has found quantitatively similar hidden morphological support for Toxicofera and for Scleroglossa (Reeder et al. 2015), which suggests that most traits supporting a basal Iguania/Scleroglossa split are the result of convergent ecological adaptations. More recently, Simões et al. (2018) rejected the basal Iguania/Scleroglossa split using a new morphological data matrix containing characters that support the molecular phylogeny (at least in part). This suggests that some of these morphological traits may historically have been coded in error or may reflect homoplasy.

Many morphological studies have recovered a single origin for large-gaped snakes, referred to as Macrostomata (Cundall et al. 1993; Rieppel et al. 2003; Conrad 2008; Wilson et al. 2010; Gauthier et al. 2012; Zaher and Scanferla 2012; Simões et al. 2018). This group has been defined by a large number of skull features, all involving the dentigerous upper and lower jaws, palatal, and suspensorium bones, which contribute to an increase in gape size (Rieppel 1988; Cundall and Irish 2008). However, Macrostomata was found to be paraphyletic based on fossil taxa and morphological data alone (Lee and Scanlon 2002; Rieppel et al. 2003; Scanlon 2006; Rieppel 2012). Similarly, molecular studies using mtDNA and/or single-copy nuclear genes demonstrate paraphyly of Macrostomata, in which Aniliidae (non-macrostomatan) and Tropidophiidae (macrostomatan) are sister taxa to the exclusion of other macrostomatan families (Vidal and Hedges 2004; Pyron and Burbrink 2012; Wiens et al. 2012; Streicher and Wiens 2016); this result has also been reinforced by some unconventional morphological traits (Siegel et al. 2011). Finally, a recent study combining morphological and molecular data from extant and fossil species using tip-dating methods showed that macrostomatan morphological features evolved early in snakes and subsequently reversed multiple times (Harrington and Reeder 2017).

Some authors have argued that because morphological data are subjectively coded and demonstrably susceptible to convergence among key traits, they should be considered biased against estimating correct relationships when compared with more voluminous and objectively coded molecular data (Wiens et al. 2010; Reeder et al. 2015). By this argument, molecular data should be less influenced than morphological traits by ontogeny, environment, convergence, or user-coding bias. However, a few examples of molecular convergence exist, both within mtDNA in squamates (Castoe et al. 2009) and among particular genes in other organisms (Parker et al. 2013; Projecto-Garcia et al. 2013; Zou and Zhang 2016). Although instances of genome-wide convergence occur (Foote et al. 2015), it nevertheless seems unlikely that convergence across the nuclear and mitochondrial genomes would yield identical topologies with respect to groups like Toxicofera and Alethinophidia.

Although genome-scale data should thus be optimal for resolving squamate relationships, most molecular studies have been limited by sampling either many taxa for few genes (Pyron et al. 2013) or few taxa for many genes (Streicher and Wiens 2017). For most genomic-scale data sets, it is currently unknown how many genes support the species tree and what signal exists for alternative arrangements. Thus, a study combining genome-scale molecular data with dense sampling of extant lineages is needed to confirm the robustness of the molecular signal, particularly using methods that can interrogate phylogenetic signal across loci relative to the species tree (e.g., Arcila et al. 2017).

Here, we sample 289 species from nearly all squamate families for 394 anchored phylogenomic loci (anchored hybrid enrichment [AHE]; Lemmon et al. 2012) to investigate species-tree relationships within Squamata, assess support for all nodes, and estimate divergence dates. We then examine the influence of each locus on the overall topology. Specifically, we use machine-learning techniques (Lek et al. 1996; Zhang 2010) to quantify and understand why particular genes do not support the standard molecular relationships and to determine whether there is hidden support for Scleroglossa and Macrostomata. Given past morphological evidence for these relationships, we expected that at least some loci would support them in any phylogenetic context (concatenated or species trees). Thus, the genomic distribution of congruence and incongruence should highlight the underlying biological mechanisms for these topological disputes and potentially provide resolution. Results from our research solidify relationships among most extant squamates, further clarify their taxonomy, and highlight remaining problems.

METHODS AND MATERIALS

Data set

Using anchored phylogenomics (Lemmon et al. 2012), we generated a genomic data set for 289 species representing 75 families of squamates and one rhynchocephalian (see Supplementary material available on Dryad at http://dx.doi.org/10.5061/dryad.6392n3s). Our taxonomic arrangement followed Vidal and Hedges (2005, 2009), Conrad (2008), and Vidal et al. (2010) for higher-level squamate clades; Pyron et al. (2013) and Barker et al. (2015) for familial and generic levels within Booidea and Pythonoidea; and Zaher et al. (2009) and Kelly et al. (2009), including the nomenclatural suggestions of Savage (2015) and Rhodin et al. (2015), for higher and familial caenophidian clades. Whole genomic DNA was extracted from tissue using the Qiagen DNeasy kit following the manufacturer’s protocol at the Center for Anchored Phylogenomics at Florida State University, and data were assembled using Anchored Phylogenomics (www.anchoredphylogeny.com). Following Qubit fluorometer quantification using a dsDNA HS Assay kit (Invitrogen™), we sonicated up to 1 μg of the extracted DNA to a size range of 150–400 bp using a Covaris E220 Focused-ultrasonicator. We then prepared libraries following Lemmon et al. (2012) using a Beckman-Coulter

FXp liquid-handling robot. After ligating 8-bp indexes, we pooled libraries in groups of 16 and enriched the library pools using the AHE probes developed for squamates by Ruane et al. (2015) and Tucker et al. (2016). After verifying the quality and quantity of the enriched libraries by bioanalysis and quantitative PCR (qPCR), we sequenced the libraries at the Florida State University translational laboratory on eight lanes of an Illumina HiSeq 2500 using a paired-end 150-bp protocol (∼395 Gb of total data).

Following sequencing, we demultiplexed reads passing the Casava high-chastity filter, allowing no index mismatches. Next, we merged overlapping read pairs to remove library adapters and correct sequencing errors (Rokyta et al. 2012). We then assembled the reads using the quasi-de novo approach described by Ruane et al. (2015) and Prum et al. (2015), with Calamaria pavimentata and Anolis carolinensis as references. To avoid sequences derived from low-level contamination or misindexing, we removed assembly clusters containing fewer than 500 reads. We also verified the identity of some tissue samples by comparing mitochondrial genes with previously published mitochondrial sequences (mainly cytb and the rRNA genes). Mitochondrial sequences were assembled with a map-to-reference approach using the Geneious mapper in Geneious R9 (Biomatters Ltd.; Kearse et al. 2012). We established orthology by clustering consensus sequences using pairwise distances, as described by Hamilton et al. (2016). For some loci, aberrant sequences resulting from low-level contamination or weakly divergent paralogs passed our primary filters; we removed these using a modified version of the orthology assessment method of Hamilton et al. (2016), in which we iteratively clustered the sequences using pairwise distances at each nested taxonomic level. We then aligned orthologous sequences using MAFFT v7.023b (Katoh and Standley 2013). We trimmed and masked the alignments following Hamilton et al. (2016), but with MINGOODSITES = 13, MINPROPSAME = 0.3, and MISSINGALLOWED = 82, and then inspected the alignments manually in Geneious R9 to identify and remove remaining aberrant sequences.
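The read-count filter and the distance-based clustering described above can be illustrated with a short Python sketch. Only the 500-read cutoff comes from the text; the uncorrected p-distance (ignoring gaps and Ns), the single-linkage grouping rule, the threshold value, and the toy sequences are assumptions for illustration, and this is not the orthology pipeline of Hamilton et al. (2016).

```python
from itertools import combinations

MIN_READS = 500  # assembly-cluster cutoff stated in the text


def filter_clusters(read_counts, min_reads=MIN_READS):
    """Drop assembly clusters supported by fewer than `min_reads` reads."""
    return {name: n for name, n in read_counts.items() if n >= min_reads}


def p_distance(a, b, skip="-N?"):
    """Uncorrected p-distance, ignoring sites with gaps or ambiguous bases."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in skip or y in skip:
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 1.0


def single_linkage(seqs, threshold):
    """Group sequences linked by p-distance <= threshold (hypothetical rule, union-find)."""
    parent = {name: name for name in seqs}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy example data (hypothetical)
counts = {"locus1_a": 1200, "locus1_b": 35, "locus2": 800}
print(filter_clusters(counts))  # {'locus1_a': 1200, 'locus2': 800}

seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAT", "s3": "TTGAACCTGA"}
print(single_linkage(seqs, threshold=0.2))  # [['s1', 's2'], ['s3']]
```

Single linkage is used here only because it is the simplest rule to show; a stricter criterion (e.g., requiring all members of a group to fall under the threshold) would be an equally plausible reading of "clustering using pairwise distances".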

Phylogeny

We estimated substitution models for each locus using ModelFinder (Chernomor et al. 2016; Kalyaanamoorthy et al. 2017), which uses maximum likelihood to fit 22 substitution models including up to 6 free-rate gamma categories. We first estimated the phylogeny and tree support from the partitioned concatenated data set using the ultrafast nonparametric bootstrap approximation (n = 1000; Minh et al. 2013). Support was also estimated using the Shimodaira–Hasegawa-like approximate likelihood ratio test (SH; Shimodaira and Hasegawa 1999; Anisimova et al. 2011). Using the locus partitions and substitution models, we generated gene trees with support for each locus in IQ-TREE v1.6.6 (Nguyen et al. 2015). We then generated
a species tree using ASTRAL III, with the IQ-TREE gene trees as input and support estimated as local posterior probabilities on quadripartitions under default hyperparameters (a Yule prior for branch lengths and the species tree, with its parameter set to 0.5; Mirarab and Warnow, 2015). We also ran ASTRAL on 1000 bootstrapped IQ-TREE gene trees using its multilocus bootstrapping feature. We compared the concatenated and species trees using Robinson–Foulds (RF) distances (Robinson and Foulds, 1981), which measure topological dissimilarity, and then determined whether all measures of support were correlated among shared branches using Spearman rank correlation (Supplementary material available on Dryad). The Squamata tree was rooted with its closest living relative, the rhynchocephalian Sphenodon punctatus; the outgroup status of this taxon has been discussed in other recent molecular and morphological phylogenies (Gauthier et al., 2012; Jones et al., 2013; Chen et al., 2015; Harrington et al., 2016).
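To make the RF comparison concrete, here is a minimal Python sketch that computes RF distances from the bipartitions (splits) of two unrooted trees with identical tip sets. Trees are written as nested tuples, an assumption made for brevity; the study itself used R (a phangorn-based script), and real analyses would parse Newick files.

```python
from __future__ import annotations

from typing import Union

Tree = Union[str, tuple]  # a tip label, or a tuple of child subtrees


def leaves(t: Tree) -> frozenset:
    """All tip labels below a node."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaves(c) for c in t))


def splits(t: Tree) -> set[frozenset]:
    """Non-trivial bipartitions of the tree, read as unrooted. Each split is stored
    as the side that excludes a fixed reference tip, so both sides share one key."""
    taxa = leaves(t)
    ref = min(taxa)
    found: set[frozenset] = set()

    def walk(node: Tree) -> None:
        if isinstance(node, str):
            return
        for child in node:
            walk(child)
        side = leaves(node)
        if ref in side:
            side = taxa - side
        if 1 < len(side) < len(taxa) - 1:  # skip trivial (single-tip) splits
            found.add(frozenset(side))

    walk(t)
    return found


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Robinson–Foulds distance: number of splits found in only one of the trees."""
    return len(splits(t1) ^ splits(t2))


def normalized_rf(t1: Tree, t2: Tree) -> float:
    """RF divided by its maximum, 2(n - 3), for binary unrooted trees on n tips."""
    n = len(leaves(t1))
    return rf_distance(t1, t2) / (2 * (n - 3))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
# AB|CDE and AC|BDE each occur in only one tree; DE|ABC is shared, so RF = 2.
```

Storing each split by the side lacking a reference tip is one simple way to make the two sides of a bipartition compare equal without rerooting.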

Because bootstraps, SH tests, and posterior probabilities do not provide comprehensive measures of the underlying agreement or disagreement among sites and genes for any topological arrangement, we examined site and gene concordance factors (sCF and gCF, respectively). Here, gCF is the percentage of gene trees containing a particular branch of the species tree (Ané et al., 2007; Baum, 2007), whereas sCF is the percentage of informative sites supporting that branch. Values of sCF have a lower bound of 33% given the three possible quartets for each node, whereas gCF values are calculated from full gene trees that may not resolve a particular node, yielding a lower bound of 0%. We estimated gCF and sCF values across all nodes of the ASTRAL species tree and the concatenated IQ-TREE trees using IQ-TREE v1.6.6 (Nguyen et al., 2015; Minh et al., 2018) and associated R code (http://www.robertlanfear.com/blog/files/concordance_factors.html).
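The two concordance factors can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: gCF is computed as the share of gene trees containing a species-tree split (assuming every gene tree samples all taxa, whereas IQ-TREE counts only gene trees that are decisive for the branch), and sCF is computed for a single quartet of sequences, whereas IQ-TREE samples many quartets around each branch and averages them. Splits are represented as frozensets of tip labels; the function names are hypothetical.

```python
from __future__ import annotations


def gcf(split: frozenset, gene_tree_splits: list[set]) -> float:
    """Percentage of gene trees containing the species-tree split.
    Assumes every gene tree samples all taxa."""
    hits = sum(split in s for s in gene_tree_splits)
    return 100 * hits / len(gene_tree_splits)


def scf_quartet(a: str, b: str, c: str, d: str) -> tuple[float, ...]:
    """Site concordance for one quartet: percentages of informative sites
    supporting ab|cd, ac|bd and ad|bc. A site is counted only if all four
    states are unambiguous bases forming two pairs of different states."""
    counts = [0, 0, 0]
    for w, x, y, z in zip(a, b, c, d):
        if any(s not in "ACGT" for s in (w, x, y, z)):
            continue
        if w == x and y == z and w != y:
            counts[0] += 1  # ab|cd
        elif w == y and x == z and w != x:
            counts[1] += 1  # ac|bd
        elif w == z and x == y and w != x:
            counts[2] += 1  # ad|bc
    total = sum(counts)
    if total == 0:
        return (0.0, 0.0, 0.0)
    return tuple(100 * n / total for n in counts)
```

For example, `scf_quartet("AATA", "ACTA", "GACA", "GCCA")` finds two sites supporting ab|cd, one supporting ac|bd and one uninformative site, giving roughly (66.7, 33.3, 0.0).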

Because discordance between gene trees and the species tree, as estimated here, may be caused by either incomplete lineage sorting (ILS) or poorly estimated gene trees, we attempted to distinguish these causes across all nodes. If ILS drives the differences between gene trees and the species tree, then the two discordant topologies at a particular node (i.e., the two resolutions other than the species-tree topology) should be equally frequent for both gCF and sCF, although sites may be linked within single loci and therefore provide unreliable estimates. Using a χ² test for genes and sites separately, we summed the number of genes and the number of sites supporting each of the two discordant topologies at each node to determine whether they deviated significantly from equal representation. If they did not, it is reasonable to infer that discordance among gene trees and among sites is due to ILS (Huson et al., 2005; Green et al., 2010; Martin et al., 2015). We performed the χ² tests for genes and sites using the script provided at http://www.robertlanfear.com/blog/files/concordance_factors.html.
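The test for equal frequencies of the two discordant topologies can be sketched as a one-degree-of-freedom χ² goodness-of-fit test. The Python function below is a hypothetical stand-alone equivalent written for illustration; the analyses themselves used the R script linked above.

```python
import math


def ils_equal_discordance_test(n_alt1: int, n_alt2: int) -> tuple:
    """Chi-square goodness-of-fit test (1 df) that the two discordant resolutions
    of a node are equally frequent, as expected under ILS alone.
    Returns (chi-square statistic, P value)."""
    total = n_alt1 + n_alt2
    if total == 0:
        return 0.0, 1.0
    expected = total / 2
    chi2 = ((n_alt1 - expected) ** 2 + (n_alt2 - expected) ** 2) / expected
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = ils_equal_discordance_test(40, 60)
```

With 40 versus 60 gene trees, χ² = 4.0 and P ≈ 0.046, so at α = 0.05 that node's discordance would not be attributed to ILS alone; equal counts give χ² = 0 and P = 1.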

Divergence Dates

We estimated divergence dates (with error) using the penalized-likelihood approach in TreePL (Smith and O’Meara, 2012) with three phylogenetic data sets. A full Markov chain Monte Carlo (MCMC)-based relaxed phylogenetics method was not computationally feasible for a data set of this size. The first data set used for dating comprised 1000 concatenated bootstrapped (pseudoreplicated) data sets generated with the IQ-TREE ultrafast bootstrap function (UFBoot). Second, because tree space may not have been widely explored by this method of generating bootstrapped trees (Smith et al., 2018), we also generated 100 pseudoreplicates in RAxML 1.6.7 (Stamatakis, 2014) and estimated the phylogeny for each in IQ-TREE. Third, we generated a test tree by fitting the concatenated data to the ASTRAL topology, producing a species-tree topology with branch lengths in substitutions per site. We then re-estimated the phylogeny and generated dated trees using TreePL; this procedure produced dates that account for tree uncertainty from the two sets of bootstrap replicates and the ASTRAL tree. For TreePL, we used a cross-validation approach to choose the smoothing parameter that best balanced rates across the tree being clock-like versus completely saturated. This approach sequentially removed terminal taxa, estimated these branches from the remaining data given each candidate smoothing parameter, and then selected an appropriate smoothing parameter from the fit between the real and the pruned branches (Sanderson, 2002). We iterated this 14 times over a range of smoothing parameters from 1 × 10⁻⁷ to 1 × 10⁴ and used the thorough option to ensure that each run iterated until convergence. To calibrate these trees, we followed Jones et al. (2013), Alencar et al. (2016), and Zaher et al. (2018), and added other taxa, using 26 fossils and their placements on the phylogeny to estimate divergence dates (see Supplementary material available on Dryad).
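The quantity that TreePL's smoothing parameter controls can be sketched as follows. This is a simplified penalized-likelihood score in the spirit of Sanderson (2002), written in Python for illustration: substitutions on each branch are treated as Poisson with mean rate × duration, and a roughness penalty sums squared rate changes between parent and child branches. The data structures and the 12-value grid are hypothetical assumptions, and TreePL's actual objective, optimizer, and cross-validation procedure are more involved.

```python
import math


def penalized_log_likelihood(branches, rates, durations, smoothing):
    """Simplified penalized-likelihood score in the spirit of Sanderson (2002).

    branches:  dict branch_id -> (parent branch_id or None, substitutions x_k)
    rates:     dict branch_id -> substitution rate r_k
    durations: dict branch_id -> time duration t_k
    smoothing: the smoothing parameter (lambda)

    Substitutions on branch k are modelled as Poisson with mean r_k * t_k. The
    penalty sums squared rate changes between each branch and its parent branch;
    Sanderson's formulation also treats branches attached to the root with an
    additional term, which this sketch omits.
    """
    loglik = 0.0
    penalty = 0.0
    for k, (parent, x) in branches.items():
        mu = rates[k] * durations[k]
        loglik += x * math.log(mu) - mu - math.lgamma(x + 1)  # Poisson log-density
        if parent is not None:
            penalty += (rates[k] - rates[parent]) ** 2
    return loglik - smoothing * penalty


# Hypothetical log-spaced grid of candidate smoothing values, 1e-7 ... 1e4.
smoothing_grid = [10.0 ** k for k in range(-7, 5)]
```

Large smoothing values push the optimum toward equal rates (a clock), while values near zero let each branch take its own rate; cross-validation picks the value between these extremes that best predicts pruned branches.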

Locus Support for Morphological Topology

We examined how often individual loci recovered the traditional Scleroglossa/Iguania division and a monophyletic Macrostomata (including Acrochordidae, Boidae, Bolyeriidae, Calabariidae, Candoiidae, Charinidae, Erycidae, Loxocemidae, Pythonidae, Sanziniidae, Tropidophiidae, Ungaliophiidae, Xenopeltidae, and all 17 families of Colubroides recognized herein). For Scleroglossa, we tested for monophyly of all squamates excluding Iguania. For Macrostomata, we tested for monophyly of Amerophidia (Aniliidae + Tropidophiidae), a group that rejects Macrostomata because aniliids are canonical non-macrostomatans and tropidophiids are canonical macrostomatans (Vidal and Hedges, 2004). This grouping in particular exemplifies conflict among molecular, osteological, and soft-tissue characters (see Siegel et al., 2011; Hsiang et al., 2015). To test a more


2019 BURBRINK ET AL.—GENOMIC RELATIONSHIPS OF SQUAMATES 5

difficult phylogenetic relationship, we also examined the placement of Dibamia, here inferred in the species tree as sister to Gekkota.

We first estimated the probability of monophyly of Toxicofera and Amerophidia, respectively, using the measures of support in the species tree (quadripartition support and bootstrapped IQ-TREE gene trees) and asked how many loci also recovered these relationships. For the remaining loci that did not show these relationships, we asked how often they recovered the basic Scleroglossa/Iguania split and monophyly of Macrostomata, and whether these outlier loci together recovered these traditional relationships using species-tree methods. Similarly, we examined key placements of Dibamia as the sister group to 1) all other squamates, 2) Gekkota, and 3) all other squamates excluding Gekkota.
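Counting how many loci recover a given clade reduces to a monophyly check on each gene tree. The Python sketch below assumes gene trees are supplied as nested tuples (a simplification of parsed Newick trees) and treats them as unrooted, so a clade counts as monophyletic if its sampled members form one side of a bipartition; loci sampling fewer than two members or fewer than two non-members are treated as uninformative. The function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def leaf_set(t) -> frozenset:
    """Tip labels below a node of a nested-tuple tree."""
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaf_set(c) for c in t))


def is_monophyletic(tree, clade):
    """True/False if the sampled members of `clade` do/do not form one side of a
    bipartition of the (unrooted) gene tree; None if the locus is uninformative."""
    taxa = leaf_set(tree)
    target = frozenset(clade) & taxa  # ignore clade members missing from this locus
    if len(target) < 2 or len(taxa - target) < 2:
        return None
    stack = [tree]
    while stack:
        node = stack.pop()
        side = leaf_set(node)
        if side == target or taxa - side == target:
            return True
        if not isinstance(node, str):
            stack.extend(node)
    return False


def tally(gene_trees, clade):
    """(loci recovering the clade, loci rejecting it, uninformative loci)."""
    counts = Counter(is_monophyletic(t, clade) for t in gene_trees)
    return counts[True], counts[False], counts[None]
```

The same check applies to each alternative Dibamia placement by testing the corresponding clade (for example, Dibamia plus Gekkota).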

Genomic Interrogation Using Neural Networks

Using a custom R script (R Core Team, 2015; script provided as Supplementary material available on Dryad), we examined the frequency of genes that support each node of the dated species tree, similar to Smith et al. (2018). We assessed this support over all dated nodes to understand where and when particular phylogenetic relationships were poorly supported, as indicated by a high density of gene trees disagreeing with the species tree. We then investigated why certain loci did not recover the same relationships as the species tree.

Using the RF distance (Robinson and Foulds, 1981) between each gene tree and the species tree (RFgtst) as the response variable, we chose characteristics of genes known to affect resolution, support, or topological accuracy, such as gene length, base-pair composition, alignment gappiness, and variable sites (Rosenberg and Kumar, 2003; Felsenstein, 2004; Wortley et al., 2005; Nagy et al., 2012; López-Giráldez et al., 2013; Som, 2015; Duchêne et al., 2017). These included the following parameters as a standard set of predictor variables: 1) number of sites, 2) base-pair content, 3) number of parsimony-informative sites (PIS), 4) number of gaps, 5) maximum gap length, 6) mean gap length, 7) number of segregating sites with gaps, 8) standard deviation of gap length, 9) sites with more than one substitution, 10) number of bases observed at each site (1–4), 11) maximum or minimum phylogenetic informativeness, and 12) gene and site concordance (gCF and sCF). Metrics 1–10 were generated with a custom script using the ape package in R (Paradis et al., 2004). RF distances were estimated using a custom script based on the package phangorn. Metric 11 was estimated using a custom script based on the package phyloinformR (Dornburg et al., 2016), which estimates rates per site from the web server PhyDesign (López-Giráldez and Townsend, 2011), and metric 12 was generated in IQ-TREE v1.6.6 (Nguyen et al., 2015).
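Several of the alignment-based predictors can be computed directly from an alignment. The Python sketch below covers a subset (number of sites, GC content as one possible base-composition summary, PIS, gap counts and gap-run lengths, and the number of distinct bases per site). It assumes equal-length aligned sequences with `-` for gaps and simply ignores ambiguity codes; it is an illustration, not the authors' ape-based R script.

```python
from statistics import mean, pstdev

GAP = "-"


def gap_runs(seq: str) -> list:
    """Lengths of consecutive gap runs in one aligned sequence."""
    runs, current = [], 0
    for ch in seq:
        if ch == GAP:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def alignment_metrics(aln: list) -> dict:
    n_sites = len(aln[0])
    columns = ["".join(s[i] for s in aln).upper() for i in range(n_sites)]
    bases = [ch for s in aln for ch in s.upper() if ch in "ACGT"]
    pis = 0
    states_per_site = []
    for col in columns:
        counts = {b: col.count(b) for b in "ACGT"}
        states_per_site.append(sum(n > 0 for n in counts.values()))
        # Parsimony-informative: at least two states, each in >= 2 sequences.
        if sum(n >= 2 for n in counts.values()) >= 2:
            pis += 1
    runs = [r for s in aln for r in gap_runs(s)]
    return {
        "n_sites": n_sites,
        "gc_content": sum(b in "GC" for b in bases) / len(bases),
        "pis": pis,
        "n_gaps": sum(s.count(GAP) for s in aln),
        "max_gap_len": max(runs, default=0),
        "mean_gap_len": mean(runs) if runs else 0.0,
        "sd_gap_len": pstdev(runs) if runs else 0.0,
        "sites_with_k_bases": {k: states_per_site.count(k) for k in range(1, 5)},
    }


aln = ["ACGT-A", "ACGTTA", "AGGA--", "AGGATA"]
metrics = alignment_metrics(aln)  # two PIS (columns 2 and 4), gap runs of 1 and 2
```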

To determine whether these variables can predict RFgtst, we used artificial neural network (NN) regressions in caret (Kuhn, 2008). Artificial NNs are a type of deep-learning method used as an alternative to likelihood techniques to address complex questions with many predictor variables in population genomics (Libbrecht and Noble, 2015; Sheehan et al., 2016). As such, they are a powerful tool for modeling non-linear interactions among numerous parameters to predict responses, yielding a range of simple to complex models regardless of the statistical distributions of those variables or the relationships among them (Lek et al., 1996; Zhang, 2010). These methods have been used successfully in previous phylogenetic and comparative tree-based tests (Burbrink et al., 2017; Burbrink and Gehara, 2018).

In brief, the typical multilayer feed-forward back-propagation NN used here begins with the genetic input variables (input neurons), which are joined by weighted synapses to hidden neurons, and ends in an output neuron. Every node is connected to every node in the previous layer, and each node is assigned an activation value, defined as the difference between the weighted sum of all inputs and a bias (threshold) parameter; the connected hidden neurons activate an output node whose value is compared with a known (dependent) value.
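The forward pass and back-propagation step described above can be sketched with a single hidden layer. This pure-Python toy (sigmoid hidden units, a linear output, squared-error loss, plain stochastic gradient descent) is an illustrative assumption: it is not the network that caret fits, and the class and method names are hypothetical. Note the sign convention: adding a bias b is equivalent to subtracting a threshold θ = −b.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and a linear output neuron, trained by
    back-propagation of squared error with stochastic gradient descent."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)
        # Weighted "synapses": input -> hidden (w1) and hidden -> output (w2).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Each hidden activation: weighted sum of inputs plus bias, through a sigmoid.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, y

    def train_step(self, x, target: float, lr: float = 0.1) -> float:
        h, y = self.forward(x)
        err = y - target  # derivative of 0.5 * err**2 with respect to y
        for j, hj in enumerate(h):
            # Error signal reaching hidden unit j (uses w2[j] before it is updated).
            delta = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi
            self.b1[j] -= lr * delta
        self.b2 -= lr * err
        return 0.5 * err ** 2


def mse(model: TinyMLP, data) -> float:
    return sum((model.forward(x)[1] - t) ** 2 for x, t in data) / len(data)
```

Training repeatedly calls `train_step` over (inputs, response) pairs; in the study the response would be RFgtst and the inputs the scaled gene properties.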

Specific to our NN, we scaled all predictor variables by the minimum and maximum of each data category. We separated the data into a standard 70% training set and 30% test set. We also tested accuracy at other training/test percentages: 50/50, 60/40, 70/30, 80/20, and 90/10. Each of these was run with a maximum of 1000 iterations, which ensured convergence. We resampled the data using the default 25 bootstrap replicates, tuning the weight decay and evaluating models with the root mean squared error (RMSE), r², and mean absolute error (MAE). We examined the power of these variables and models to predict the response (RFgtst) by recomposing the test and training sets over 100 iterations and comparing the resulting test statistics (RMSE, r², and MAE) with those from randomized response variables for each of these 100 estimated models. Over these replicates, we also identified the five most important model variables (Supplementary Fig. S1 available on Dryad).
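The preprocessing and evaluation steps can be sketched in plain Python. The helpers below (hypothetical names) show min–max scaling, a shuffled 70/30 split, the three test statistics, and a randomized response for the null comparison. Here r² is computed as the squared Pearson correlation between observed and predicted values, which we believe matches caret's default `Rsquared`, but treat that as an assumption. The NN fit itself is omitted.

```python
import math
import random


def min_max_scale(column):
    """Rescale a predictor to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def train_test_split(rows, train_frac=0.7, seed=0):
    """Shuffle rows and split them into training and test sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * len(rows)))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def regression_metrics(observed, predicted):
    """RMSE, r^2 (squared Pearson correlation) and MAE on a test set."""
    n = len(observed)
    resid = [o - p for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r * r for r in resid) / n)
    mae = sum(abs(r) for r in resid) / n
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    vo = sum((o - mo) ** 2 for o in observed)
    vp = sum((p - mp) ** 2 for p in predicted)
    r2 = cov * cov / (vo * vp) if vo and vp else 0.0
    return {"RMSE": rmse, "r2": r2, "MAE": mae}


def randomized_response(y, seed=0):
    """Shuffled copy of the response; refitting on it gives the null baseline."""
    y = list(y)
    random.Random(seed).shuffle(y)
    return y
```

Repeating the split with different seeds, refitting, and comparing the real metrics with those from `randomized_response` mirrors the 100-iteration comparison described above.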

Multicollinearity among variables may be problematic for constructing model inferences if there is a linear dependence among these independent variables (De Veaux and Ungar, 1994; Hastie et al., 2009; Dormann et al., 2013). Although overparameterization is often not a problem for NNs, given that they are used to predict the system and not necessarily to interpret it, previous studies have demonstrated that machine-learning techniques in general may be sensitive to changes in variables over collinear data (Shan et al., 2006; Dormann et al., 2013). Our NN models contain 26 variables that mostly describe properties of genes; therefore, we used variance inflation factors (VIF; Montgomery et al., 2012) to filter collinear variables with the R package faraway (Faraway, 2002). Here, we used the standard threshold of VIF > 10 and removed all but one of the variables exceeding this value. We
then repeated the NN analyses as described above for the full set of variables.
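Variance inflation factors can be computed by regressing each predictor on all the others, VIF_j = 1/(1 − R²_j). The pure-Python sketch below (hypothetical helper names; the study used the R package faraway) pairs this with one common filtering rule, repeatedly dropping the variable with the largest VIF until none exceeds 10. It assumes no exactly collinear columns, since the normal equations would then be singular.

```python
def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y, xs):
    """R^2 of an OLS regression of y on the columns in xs (with intercept)."""
    n = len(y)
    design = [[1.0] + [x[i] for x in xs] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(design[r][i] * design[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(design[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * d for b, d in zip(beta, row)) for row in design]
    my = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def vif(columns):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on all other columns."""
    out = {}
    for name, y in columns.items():
        others = [v for k, v in columns.items() if k != name]
        r2 = r_squared(y, others)
        out[name] = float("inf") if r2 >= 1 else 1 / (1 - r2)
    return out


def drop_collinear(columns, threshold=10.0):
    """Repeatedly drop the variable with the largest VIF until all are <= threshold."""
    cols = dict(columns)
    while len(cols) > 1:
        scores = vif(cols)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del cols[worst]
    return cols
```

With two nearly identical predictors and one unrelated predictor, one of the near-duplicates is dropped and the remaining two are kept.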

Similarly, we used NNs to determine whether the same variables used to describe genes, now also including RFgtst, could be used to classify why loci do or do not support the monophyly of Toxicofera, Amerophidia (which implies lack of support for Macrostomata), and the sister relationship of Gekkota and Dibamia (GD). Our binary predictions were classified as “1” if the locus supported the relationship and “0” if it did not. We again used the “train” function with the same setup as above but replaced the tuning and evaluation parameters with weight decay and model accuracy. This was replicated 500 times by recomposing the test and training data sets, estimating accuracy, and then testing this against the accuracy of randomized responses. For each of these replicates, we also assessed performance using a confusion matrix (Townsend, 1971; Kuhn, 2008) comparing predicted with actual classifications from the training data.
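The classification summaries can be sketched as follows, with hypothetical helper names: a 2 × 2 confusion matrix, accuracy, and the area under the ROC curve in its Mann–Whitney form (the probability that a randomly chosen supporting locus receives a higher predicted score than a randomly chosen non-supporting one, with ties counted as one half). These are generic definitions written in Python, not the caret implementation.

```python
def confusion_matrix(actual, predicted):
    """2 x 2 counts for binary labels (1 = locus supports the clade)."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(a == 1 and p == 1 for a, p in pairs),
        "TN": sum(a == 0 and p == 0 for a, p in pairs),
        "FP": sum(a == 0 and p == 1 for a, p in pairs),
        "FN": sum(a == 1 and p == 0 for a, p in pairs),
    }


def accuracy(actual, predicted):
    cm = confusion_matrix(actual, predicted)
    return (cm["TP"] + cm["TN"]) / len(actual)


def roc_auc(actual, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC near 0.5 indicates predictions no better than chance, which is the reference point for the AUC values of 0.60–0.70 reported in the Results.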

RESULTS

Assemblies

We recovered 99.0% (SD = 2.3%) of the 394 target anchored hybrid enrichment (AHE) loci (Supplementary material available on Dryad). An average of 23.5% (SD = 10.0%) of the reads mapped to the target regions. The resulting consensus sequences averaged 1775 bp (SD = 180 bp). We obtained an average of 1.7 consensus sequences per locus (assembly clusters; SD = 0.67), indicating that the loci were low-copy, although not all were single-copy. The average coverage of these consensus sequences was 218× (SD = 83).

Data set

On average, loci retained 92.4% (SD = 10.8%) of all taxa represented in the species tree (n = 289). Mean length and mean number of PIS were 1302 bp (range: 170–2075; SD = 282.72) and 749 (range: 61–1455; SD = 282.72), respectively. Eighteen substitution models provided the best fit across loci: the TVM (transversion model; AG = CT, unequal base frequencies) was fitted to 27% of loci and the general time-reversible (GTR) model to 18.1% of loci, with either five or six free-rate (R) categories describing rate heterogeneity.

Phylogeny

Tree estimates using either concatenated or species-tree methods produced similar topologies (Figs. 1 and 2), with an RF distance of 44 between these trees, only 7.7% of the maximum RF distance. All methods showed strong nodal support. Across the four measures of support (ASTRAL with local posterior probabilities, ASTRAL with 1000 ultrafast-bootstrap IQ-TREE gene trees, concatenated partitioned IQ-TREE analyses with 1000 bootstraps, and concatenated partitioned IQ-TREE analyses with SH-like tests), 90%, 92%, 93%, and 95% of nodes, respectively, were supported above 0.95 (Supplementary material available on Dryad). For shared nodes, support was significantly correlated among methods (ρ = 0.55–0.66; P < 2.2 × 10⁻¹⁶).
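The 7.7% figure follows from the maximum RF distance for two unrooted, fully bifurcating trees on the same n tips, which is 2(n − 3) because each tree has n − 3 non-trivial bipartitions. Assuming both trees include all n = 289 taxa of the species tree:

$$
\mathrm{RF}_{\max} = 2(n-3) = 2(289-3) = 572, \qquad \frac{\mathrm{RF}}{\mathrm{RF}_{\max}} = \frac{44}{572} \approx 0.077 .
$$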

All phylogenies generally showed relationships among groups that have been commonly recovered in previous genomic or combined molecular-morphological studies (Figs. 1 and 2; Vidal and Hedges, 2005; Wiens et al., 2012; Pyron et al., 2013; Reeder et al., 2015a; Streicher and Wiens, 2017). We found unambiguous support (i.e., support values of 1.0) for all main groups, including Unidentata, Episquamata, Toxicofera, Gekkota, Scincomorpha, Laterata, Anguiformes, Iguania, and Serpentes (Fig. 1). The tree supported an early division between Gekkota/Dibamia and the remainder of Squamata, followed by a sister relationship between Scincomorpha and Episquamata and, within Episquamata, a sister relationship between Laterata and Toxicofera. Resolution and support within each of these groups were generally unambiguous (Supplementary material available on Dryad). For instance, most of the nodes within Colubroides received probability support values of 1.0, except for the sister relationships of Colubridae + Grayiidae and of Lamprophiidae + Pseudoxyrhophiidae, and the placements of Natricidae and Elapidae within Colubroidea and Elapoidea, respectively. Likewise, the remainder of Serpentes was well supported except for the placements of Candoiidae and Bolyeriidae, and the sister relationship between the two clades containing the most recent common ancestors (MRCA) of 1) Pythonidae and Bolyeriidae and 2) Calabariidae and Boidae. We also did not find strong support for relationships among some families in Pleurodonta (Iguania) and Gekkota, for the placement of Dibamia as sister to Gekkota, or for the placement of Anguiformes or Iguania (or their MRCA) as sister to Serpentes within Toxicofera (Fig. 2).

Tree Support

Our estimates of sCF and gCF were correlated across the species tree of squamates (Fig. 3), but both of these measures fell well below standard measures of posterior probability (PP) support. For many of the standard squamate relationships, including Toxicofera and Amerophidia, gCF values were above 50% even though sCF remained low. This suggests that estimating support from sites alone, as in bootstraps, may not provide credible estimates of support with genomic-scale data. We also found significant relationships between branch length and both gCF and sCF (Fig. 3).

Most of the discordance between gene trees and the species tree is consistent with ILS: for gCF, 82.6% of nodes showed no significant difference between the two discordant topologies. Although likely less reliable given non-independence among sites, sCF showed no significant difference between the two discordant topologies at 39.7% of nodes.


FIGURE 1. Dated species tree for Squamata with all major taxonomic categories indicated. Dark circles indicate nodes with local posterior probabilities on quadripartitions <95%. Black stars represent the locations of fossils. Tip labels and dating error estimates are available in the Supplementary material available on Dryad.


FIGURE 2. Reduced, dated circular phylogeny showing family-level and higher relationships. Low support (<95%) from local posterior probabilities on quadripartitions, bootstraps, and SH tests is indicated according to the legend.

Finally, logistic regression of the presence or absence of Toxicofera and Amerophidia on sCF was significant (P = 0.012 and P = 3.99 × 10⁻⁵, respectively), whereas the presence of the Dibamia/Gekkota node showed no relationship with sCF (P = 0.45).

For each node of the species tree, more genes recovered that node than did not. This does not indicate, however, that the majority of gene topologies matched the species tree (Fig. 4); 72% of nodes were supported by the majority of gene trees. Most of the main groups showed high agreement between the species tree and individual loci, though we note that at 90–95 Ma, relationships among Pleurodonta families (iguanians) and the placements of Candoiidae and Bolyeriidae within Booidea and Pythonoidea, respectively, showed strong discordance, with many gene trees not recovering the species-tree relationships. Most loci showed strong phylogenetic informativeness across a broad range of substitution rates, suitable for inferring the origin and diversification of Squamata throughout the history sampled in our dated trees (Supplementary material available on Dryad).

Neither concatenated analyses nor species trees supported the traditional Scleroglossa or Macrostomata, and no individual locus recovered those groups either. In contrast, Toxicofera and Amerophidia were well supported (100%) in concatenated and species-tree estimates (Supplementary material available on Dryad). Among individual genes, 75% and 69% of loci recovered monophyletic Toxicofera and Amerophidia, respectively. The sister relationship between Gekkota and Dibamia (GD) was inferred in all concatenated and species-tree estimates, but with low support (recovered by only 42.5% of individual loci).

We also generated species trees using ASTRAL III from the loci not supporting Toxicofera, Amerophidia, and GD. For the loci not supporting Toxicofera, the resulting species trees did not support Scleroglossa, but rather placed Laterata sister to Serpentes, followed by Iguania and then Anguimorpha (Supplementary material available on Dryad). When forcing a sister relationship between Laterata and Serpentes, estimating the best-supported IQ-TREE tree under this constraint, and


FIGURE 3. Plots showing the relationship between site and gene concordance factors (sCF and gCF) and quadripartition support from ASTRAL (top), and concordance factors regressed against branch length in coalescent units (bottom).

calculating gCF and sCF for that nodal constraint, we found very little support, essentially at random levels, for this arrangement: gCF = 4.32 and sCF = 33.1. For the loci not supporting Amerophidia, we found Aniliidae as sister to Alethinophidia, which still does not support Macrostomata given the inclusion of Uropeltoidea in Alethinophidia (Supplementary material available on Dryad). Finally, we found that 73% of the loci did not recover the GD relationship but instead resolved Dibamia either as sister to all other squamates (43% of loci) or as sister to the remaining Squamata excluding Gekkota (30% of loci; Supplementary material available on Dryad).

Divergence Dating

Our divergence dates were largely congruent with recent studies focused on estimating divergence times in Squamata (e.g., Mulcahy et al., 2012; Jones et al., 2013; Pyron, 2016). Our analysis, though, provided unprecedented taxonomic and genomic coverage at lower hierarchical levels of the squamate tree, which enabled new estimates of divergence dates among extant families. Between the different sets of bootstrapped phylogenies, the mean absolute difference for dated nodes was only 0.668 Ma, and dates at all shared nodes were correlated (correlation coefficient = 0.925, P < 2.2 × 10⁻¹⁶). Our estimates of divergence dates (Figs. 1 and 2; Supplementary material available on Dryad) placed the origin of crown Squamata in the Early Jurassic (190 Ma), although recent fossil evidence suggests that it may have occurred earlier, in the Late Triassic (206 Ma; Simões et al., 2018). Deep divergences within Squamata occurred between the Early and Middle Jurassic, with the divergences of Gekkota–Dibamia,
Scincomorpha–Episquamata, Laterata–Toxicofera, Lacertibaenia–Teiioidea, and Iguania–Anguiformes occurring between 190 and 155 Ma.

A large number of modern groups diverged within the Cretaceous, including the extant gekkotan, cordyloid, teiioid, anguiform, acrodontan, pleurodontan, and typhlopoid families. Crown pleurodont iguanian families diverged within a short interval during the early Late Cretaceous, between ∼98 and 79 Ma, following the end of the opening of the South Atlantic and during a period of isolation of the western Gondwanan landmasses (McLoughlin, 2001). Similarly, the extant families of their sister group Acrodonta, Chamaeleonidae and Agamidae, diverged at ∼100 Ma. The stem lineages of the afrophidian Uropeltoidea, Pythonoidea, Booidea, and Caenophidia also diverged within the Late Cretaceous, although most of their extant families diverged after the K/Pg boundary. Notable exceptions are the Cylindrophiidae, Bolyeriidae, Xenopeltidae, Calabariidae, Acrochordidae, and Xenodermidae, which diverged in the Late Cretaceous. All extant families of the most diverse group of squamates, the Colubriformes (>3600 extant species), diverged within the Paleogene (Pareidae, Viperidae, Homalopsidae) or throughout the Eocene (all remaining families).

Gene Interrogation via NN: Scleroglossa and Macrostomata

To understand discordance between gene trees and species trees beyond ILS, we used two NN analyses. For the first, we used a regression approach and modeled predictions of the RFgtst discordance. All NN analyses converged, and accuracy did not vary among training data sets ranging in size from 0.5 to 0.9 (mean r² = 0.87, SD = 0.015). We estimated an average RMSE, r², and MAE (SD in parentheses) of 0.06 (0.013), 0.86 (0.07), and 0.05 (0.009), respectively. These results strongly suggest that the NN was accurate, particularly relative to random responses (test between real and random accuracy metrics; P < 2.2 × 10⁻¹⁶), which averaged 0.24 (RMSE), 0.01 (r²), and 0.17 (MAE; Fig. 5). The five most important variables for predicting RFgtst for each locus were the mean and SD of sCF, PIS, phylogenetic informativeness, the frequency of sites with three observed alternate bases, and the frequency of sites with four observed alternate bases (Fig. 6). Individually, Bayesian correlations (BayesianFirstAid; https://github.com/rasmusab/bayesian_first_aid) between each of these variables and RFgtst were strongly negative (Fig. 6). We also tested the efficacy of the NN approach by filtering for multicollinearity using VIF, which removed all but five variables (mean and SD of sCF, PIS, number of tips, and number of gaps). This produced essentially the same predictions as using all 26 variables (RMSE = 0.06, r² = 0.87, MAE = 0.047), with variable importance ranked at 1.0 for all uncorrelated variables: PIS, number of tips, number of gaps, and the SD and mean of sCF.

For the second machine-learning approach, we used NN classification to determine whether any of the properties of these loci, along with RFgtst, could predict why particular genes failed to recover Toxicofera, Amerophidia, and Dibamia/Gekkota. Accuracy (scaled between 0 and 1) when running these models over the training data set showed only a modest increase over randomized responses (Fig. 5; increase in accuracy for Toxicofera = 0.08, Amerophidia = 0.08, Dibamia/Gekkota = 0.13), suggesting that it is difficult to determine why certain loci fail to recover these three relationships. Similarly, the area under the receiver operating characteristic (ROC) curve was low (0.60–0.70), and confusion matrices were not significant (P = 0.16–0.44). The top-ranked variables by importance were RFgtst (0.96–1.0) and phylogenetic informativeness (0.92–0.96) for Toxicofera and Dibamia/Gekkota. For Amerophidia, RFgtst (0.94) and the frequency of adenine (0.75) ranked highest.

DISCUSSION

Using a genome-scale data set with thorough and diverse taxon sampling, we corroborate nearly all recent molecular studies in finding strong support for several fundamental groupings in Squamata: Unidentata, Scincomorpha, Episquamata, Laterata, Toxicofera, and Amerophidia. Using gene-interrogation techniques, we find no support in the genome for either the traditional morphology-based Scleroglossa/Iguania division or the monophyly of large-gaped snakes, Macrostomata. Although the former has mostly fallen out of use in the literature since the rise of DNA sequence-based phylogenies of snakes, the latter has remained in use given analyses of morphological data that strongly support it (Conrad, 2008; Gauthier et al., 2012; Zaher and Scanferla, 2012; Hsiang et al., 2015).

Artificial NNs

Artificial NNs show that loci yielding discordance with the species tree, as measured by scaled RF distance, can be characterized by a few general properties. Although traditional support remains high across most parts of this phylogeny, gCF and sCF provide another view, showing that in several cases loci fail to infer key nodes (Figs. 3 and 4). We developed this NN approach to better understand whether particular properties of genes or simple ILS can account for discordance (Figs. 5 and 6). Overall, most nodes reveal a pattern consistent with ILS. When using NNs filtered for multicollinearity to predict the degree of discordance between gene trees and the species tree, we found that the following properties of genes and samples were important for resolving species-tree phylogenies: the number of PIS, mean site concordance across all nodes in the tree, the number of terminals sampled, and, importantly, the variance in site concordance across all nodes of each gene tree.

[Figure 4 plot: x-axis, node date (Ma); y-axis, density of loci that support or do not support the species tree; labeled nodes include Squamata, Unidentata, Episquamata, Scincomorpha, Toxicofera, Laterata, Iguania, Anguiformes, Serpentes, Amerophidia, Colubroides, Dibamia/Gekkota, Bolyeriidae(Xenopeltidae(Loxocemidae, Pythonidae)), Candoiidae + Boidae, and relationships within Pleurodonta.]

FIGURE 4. Density of loci supporting (red) or not supporting (blue) nodes in the species tree, plotted against node dates (Ma). Major taxonomic groupings are labeled at their corresponding nodes.

As expected, genes with higher mean sCF across all nodes are better at producing trees concordant with the species tree. This gene property is correlated with the number of PIS (ρ = 0.400, P < 1.1 × 10⁻¹⁵). In turn, our tests of multicollinearity show that PIS is correlated with many other properties, including phylogenetic informativeness over time (related to substitution rate over time), gene length, base-pair frequencies, and the number of segregating sites. Interestingly, high variance in sCF predicts greater discordance between gene-tree and species-tree topologies, as measured by RF. This suggests that, for the main topology estimated using species-tree techniques, genes with high sCF variance are concordant with some nodes but not others, again likely associated with the number of PIS per locus. In addition, both gCF and sCF are correlated with branch lengths, which indicates that the areas of a phylogeny most difficult to infer with credible support are likely those where the time between divergences was short. This pattern is well known from the phylogenetic literature for both concatenated and multispecies coalescent-based methods (Philippe et al., 1994; Xu and Yang, 2016).
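The multicollinearity screen can be illustrated with a simple greedy filter on pairwise Pearson correlations, similar in spirit to the correlation filter (`findCorrelation`) in the caret package; it is not claimed to be the exact procedure used here. The 0.75 cutoff, the rule for choosing which variable of a correlated pair to drop, and the per-locus values are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_collinear(features, cutoff=0.75):
    """Greedily remove variables until no remaining pair has |r| > cutoff.

    Within the most strongly correlated remaining pair, drop the variable with
    the larger mean |r| against all other remaining variables.
    """
    r = {}
    for a, b in combinations(features, 2):
        r[a, b] = r[b, a] = abs(pearson(features[a], features[b]))
    kept = set(features)

    def mean_r(v):
        others = [w for w in kept if w != v]
        return sum(r[v, w] for w in others) / len(others)

    while True:
        pairs = [(r[a, b], a, b) for a, b in combinations(sorted(kept), 2)
                 if r[a, b] > cutoff]
        if not pairs:
            return sorted(kept)
        _, a, b = max(pairs)
        kept.discard(a if mean_r(a) >= mean_r(b) else b)


loci_features = {
    "pis": [10, 20, 30, 40, 50],                # parsimony-informative sites
    "length": [100, 210, 290, 405, 500],        # alignment length (bp)
    "freq_a": [0.30, 0.20, 0.35, 0.25, 0.30],   # frequency of base A
}
print(drop_collinear(loci_features))
```

In this toy input PIS and gene length are almost perfectly correlated, so one of them is removed, echoing the correlation between PIS and gene length noted above.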

In summary, discordance between gene trees and the species tree arises from a mix of ILS, which is readily accommodated by coalescent-based phylogenetic inference; properties of the genes, such as site concordance and PIS; and short times between divergences, which may be difficult to resolve with any amount of data. This NN method could likely serve to filter loci prior to final species-tree estimation. Care should be taken, however, when a majority of loci have poor properties for tree inference, such as low PIS; preliminary species trees would then likely be poorly estimated and supported, yielding unusable estimates of sCF and gCF.
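One way such filtering might look is sketched below. Everything here is hypothetical: `predict_rf` stands in for a trained NN regressor, `toy_predictor` is an illustrative placeholder, and the cutoffs on predicted scaled RF and on PIS are arbitrary values, not ones derived from these analyses. The second return value flags the situation cautioned against above, in which most loci fail the filter.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Locus:
    name: str
    pis: int        # number of parsimony-informative sites
    sd_scf: float   # SD of site concordance factors across species-tree nodes


def filter_loci(
    loci: List[Locus],
    predict_rf: Callable[[Locus], float],  # stand-in for a trained NN regressor
    max_rf: float = 0.5,                   # hypothetical cutoff on predicted scaled RF
    min_pis: int = 50,                     # hypothetical minimum number of PIS
) -> Tuple[List[Locus], bool]:
    """Keep loci predicted to be concordant with the species tree.

    The boolean is False when fewer than half of the loci pass, in which case
    the preliminary species tree (and the sCF/gCF values computed on it) may
    itself be unreliable.
    """
    kept = [locus for locus in loci
            if locus.pis >= min_pis and predict_rf(locus) <= max_rf]
    return kept, 2 * len(kept) >= len(loci)


def toy_predictor(locus: Locus) -> float:
    """Illustrative only: predicted discordance rises with sCF variance."""
    return min(1.0, 0.2 + locus.sd_scf)


loci = [Locus("locus_a", 120, 0.1), Locus("locus_b", 30, 0.1),
        Locus("locus_c", 200, 0.5)]
kept, majority_ok = filter_loci(loci, toy_predictor)
print([locus.name for locus in kept], majority_ok)  # ['locus_a'] False
```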

Squamate Phylogenomics

Some authors have suggested that molecular convergence, such as that found in previous phylogenetic studies (e.g., Castoe et al., 2009), may account for an erroneous estimation of Toxicofera (see Losos et al., 2012). However, this suggestion, even when based on a handful of loci, seems improbable given the overwhelming support for the group estimated across many individual markers (including nuclear, mitochondrial, and structural [SINE] loci) and from both concatenated and species trees. In addition, there appears to be very little unambiguous evidence for Scleroglossa in most morphological data sets (Reeder et al., 2015). Molecular convergence has been found within loci, for instance within burrowing squamates in mtDNA (Castoe et al., 2009), and even across the genome in some limited examples (Foote et al., 2015). However, convergence across most or all heritable markers, including UCEs (Streicher and Wiens, 2017), AHE loci, and mtDNA, is unlikely given the independence and functions of these loci and the extremely low probability that taxa would randomly show the same relationships across these genes.

[Figure 5 plot: four density panels of accuracy (x-axis) comparing Random and Real responses for RFst-gt (regression) and for Toxicofera, Amerophidia, and Dibamia/Gekkota (classification).]

FIGURE 5. Accuracy of NNs. The top graph shows accuracy in predicting the RF distances between gene and species trees (RFstgt) using a neural-net regression approach. The bottom three graphs show results from neural-network classification analyses designed to determine why particular loci fail to recover the three indicated phylogenetic groups. The graphs show the density of accuracy from known (real) classifications (0, does not support the group; 1, supports the group) and from randomly shuffled classifications.

We do not find a single locus supporting Scleroglossa, whereas a majority of the loci containing all taxa recover Toxicofera (Figs. 3 and 4). Moreover, both concatenated and species-tree estimates support Toxicofera at 100%. The loci that do not recover Toxicofera are also problematic for most relationships, yielding higher gene-to-species-tree RF values. Interestingly, species-tree estimates from these loci also do not support Scleroglossa and still show some support for Toxicofera (Supplementary material available on Dryad). Previous research with a smaller data set also failed to find gene support for Scleroglossa and supported Toxicofera (Reeder et al., 2015). Given that characters supporting Scleroglossa (Conrad, 2008; Gauthier et al., 2012) may be problematic due to potential convergence arising from independent adaptation to burrowing (Reeder et al., 2015), we discourage any further use of Scleroglossa as a nomen outside of a historical context. On the other hand, sister-group relationships within Toxicofera remain unresolved, given that the clade formed by Iguania and Anguiformes received low support values in one method (quadripartition support), despite the large number of loci used in our study. This relationship has been inferred previously with greater support using genomic data but fewer taxa (Streicher and Wiens, 2017).

Therefore, given independent support from UCE and AHE data, the node subtending Iguania/Anguiformes is probably correct.
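The locus counts behind statements such as "a majority of the loci containing all taxa recover Toxicofera" amount to asking, for each gene tree that samples every taxon, whether it contains the bipartition separating a clade from all other taxa. The sketch below does this for trees written as nested Python tuples. It is a simplified analog of a gene concordance factor, not the gCF implementation used in these analyses, and the five-taxon trees are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional, Set, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def clusters(tree: Tree) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Return (all leaves, leaf sets below every non-root node)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            below = frozenset([node])
        else:
            below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    root = walk(tree)
    found.discard(root)
    return root, found


def support_fraction(gene_trees: Iterable[Tree], clade: Iterable[str],
                     all_taxa: Iterable[str]) -> Optional[float]:
    """Fraction of complete gene trees containing the split clade | rest."""
    clade, all_taxa = frozenset(clade), frozenset(all_taxa)
    complement = all_taxa - clade
    complete = hits = 0
    for tree in gene_trees:
        taxa, found = clusters(tree)
        if taxa != all_taxa:
            continue  # only loci sampling every taxon are scored
        complete += 1
        # The split is present if either side is the leaf set below some
        # non-root node, whichever way the tree happens to be rooted.
        hits += clade in found or complement in found
    return hits / complete if complete else None


gene_trees = [
    ((("A", "B"), "C"), ("D", "E")),   # contains A+B
    (("A", "C"), ("B", ("D", "E"))),   # does not
    ("C", ("D", ("E", ("A", "B")))),   # contains A+B
    (("A", "B"), "C"),                 # missing D and E: not scored
    ("A", ("B", ("C", ("D", "E")))),   # A+B | C+D+E, rooted on A
]
print(support_fraction(gene_trees, {"A", "B"}, "ABCDE"))  # 0.75
```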

[Figure 6 plot: (a) frequency of variable importance for predictors including ML difference between ST and GT, parsimony-informative sites, phylogenetic informativeness, mean and SD of site concordance factors, number of taxa, number of sites, sites with 1 to 4 bases, sites with >0 substitutions, gap statistics (number of gaps, sites with gaps, mean, maximum, and SD of gap size), and frequencies of bases A and T; (b) RF distance (GT–ST) plotted against parsimony-informative sites, phylogenetic informativeness, sites with three bases, sites with four bases, and SD of site concordance factors (N = 370 loci per panel), each with the posterior distribution of ρ (median and 95% HDI).]

FIGURE 6. A) Variable importance (top five from each of 100 replicates) from a regression NN analysis designed to predict the scaled response of RF distances between each gene tree (GT) and the species tree (ST). B) Bayesian correlations between RF distances and each of the four most important predictor variables. The distribution of correlations (ρ) for each variable is shown above the corresponding panel.

It is important to note that in our study Serpentes shows a long unbranched interval of almost 50 million years separating it from the remaining two toxicoferan clades (Fig. 1). The lack of representation of several key extinct lineages that could resolve the placement of snakes by filling this geochronological gap may also confound phylogenetic inference. These remaining two toxicoferan clades are also subtended by an unusually short branch (<1.4 million years), whereas 98% of branching times in this tree exceed this length. Unresolved relationships within Toxicofera and other difficult regions in this tree (e.g., relationships within Pleurodonta) may therefore be due to excessively short internodes, which may require data from the entire genome to estimate with high support.
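The internode comparisons made here (the share of branching times that exceed a given length, or where a short internode falls among all internodes) reduce to simple summaries of the internal branch durations of a dated tree. A minimal sketch, assuming node ages (Ma) and a child-to-parent map are available; the function names and the four-node toy tree are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List


def internode_lengths(parent: Dict[str, str], age: Dict[str, float]) -> List[float]:
    """Durations of internal branches: age(parent) - age(child) for every
    internal node that has a parent (the root has none; tips are skipped)."""
    internal = set(parent.values())
    return [age[parent[node]] - age[node] for node in internal if node in parent]


def share_longer(lengths: List[float], cutoff: float) -> float:
    """Fraction of internodes strictly longer than `cutoff`."""
    return sum(1 for x in lengths if x > cutoff) / len(lengths)


def percentile_rank(lengths: List[float], value: float) -> float:
    """Percentage of internodes strictly shorter than `value`."""
    ordered = sorted(lengths)
    return 100.0 * bisect_left(ordered, value) / len(ordered)


# Hypothetical node ages (Ma) and child -> parent map for a tiny dated tree.
age = {"root": 100, "x": 60, "y": 59, "z": 20}
parent = {"x": "root", "y": "x", "z": "y",
          "t1": "z", "t2": "z", "t3": "y", "t4": "x", "t5": "root"}
lengths = internode_lengths(parent, age)
print(sorted(lengths))                         # [1, 39, 40]
print(round(percentile_rank(lengths, 39), 1))  # 33.3
```

On a real dated tree the same two summaries, applied to several hundred internodes, express how unusual a particular short branch is.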

Similarly, we find strong support for a deep sister relationship between Aniliidae (“Microstomata”) and Tropidophiidae (Macrostomata) using both concatenated and species-tree methods (bootstrap and PP support = 100%) and among loci (present in 69% of gene trees), indicating that the taxon Macrostomata, defined by snakes with expanded gapes (Zaher, 1998; Conrad, 2008; Wilson et al., 2010; Gauthier et al., 2012; Hsiang et al., 2015), is invalid. This suggests that large gapes have either evolved or been lost multiple times, consistent with Harrington and Reeder (2017), and it may indicate that previous hypotheses on the origin of a “macrostomatan” diet in Serpentes require reanalysis (e.g., Greene, 1997; Rodriguez-Robles et al., 1999). Unfortunately, the phylogenetic affinities of a number of extinct alethinophidian lineages with key macrostomatan features, such as Pachyrhachis, Haasiophis, Eupodophis, Yurlunggur, Wonambi, and Sanajeh, remain in dispute (Conrad, 2008; Gauthier et al., 2012; Zaher and Scanferla, 2012; Reeder et al., 2015). A more accurate placement of these fossils at the base of the alethinophidian tree might help clarify higher-level affinities between amerophidian and afrophidian lineages, including the positions of uropeltids and tropidophiids.

Within Anguiformes, both concatenated and species-tree estimates support Shinisaurus as the sister taxon to varanids and Helodermatidae as the sister taxon to Anguioidea (Diploglossidae, Anniellidae, Anguidae, Xenosauridae). This topology conflicts with that suggested by morphological data sets, including those with expanded fossil sampling, in which helodermatids and Shinisaurus are traditionally recognized as the sister groups of varanids and Xenosaurus, respectively (McDowell and Bogert, 1954; Gao and Norell, 1998; Gauthier, 1998; Gauthier et al., 2012). However, Conrad (2008) showed that Shinisaurus and Xenosaurus are not closely related, the former being the sister group of varanoids (including Helodermatidae) and the latter the sister group of Anguidae. Conrad’s (2008) morphological tree closely matches molecular estimates of anguiform affinities, including ours, except for the phylogenetic position of Helodermatidae. The conflicting, strongly supported placements of Helodermatidae within Anguiformes inferred from morphological and molecular data are another point of conflict that still awaits resolution.

Finally, the placement of Dibamia within Squamata has been difficult to resolve; molecular studies place Dibamia as the sister to Gekkota, the sister to the remaining Squamata, or the sister to Unidentata (Townsend et al., 2004; Vidal and Hedges, 2005; Pyron et al., 2013; Reeder et al., 2015; Streicher and Wiens, 2017). Most morphological studies place Dibamia with other limbless, burrowing taxa such as Amphisbaenia or Serpentes (Evans and Barbadillo, 1998, 1999; Hallermann, 1998; Lee, 1998; Rieppel and Zaher, 2000; Evans et al., 2005; Conrad, 2008; Gauthier et al., 2012), though authors typically acknowledge that many characters supporting this relationship may reflect convergence due to shared ecology. Our analyses inferred Dibamia as the sister to Gekkota, but with poor support among methods and loci: both gCF and sCF were split evenly between the primary and discordant topologies. The next most probable placement is as the sister to all other Squamata, which was also found using UCEs by Streicher and Wiens (2017), followed by a placement as the sister to Episquamata. Importantly, our results never place Dibamia close to other limbless or burrowing taxa. In general, other methods characterizing genomic data sets have also had difficulty confidently estimating deep relationships, for which particular genes may bias results (Brown and Thomson, 2016).

Phylogenetic Structure and the Origin of Squamates

The structure of our phylogeny is extremely similar between concatenated and species-tree methods, with an RF distance of only 7.7% of the maximum RF. Support is high across nodes for all measures, with >90% of nodes having 95% support or higher. Although we infer a tree similar to those published previously (Pyron et al., 2013; Reeder et al., 2015; Streicher and Wiens, 2017), largely structured as (Unidentata(Episquamata(Toxicofera))), we note that several key nodes remain poorly supported (Figs. 1 and 2). For example, several deep relationships within Toxicofera, such as the sister group of Serpentes (Anguiformes, Iguania, or both), remain uncertain.

Similarly, support is low for relationships among pleurodont families (the primarily New World iguanians), consistent with previous molecular and morphological attempts to understand these relationships (Etheridge and Frost, 1989; Townsend et al., 2004; Reeder et al., 2015; Streicher et al., 2015). Comparable with Streicher and Wiens (2016), we found excessively short branch lengths subtending relationships among iguanian families, here ranging from 0.3 to 1.6 million years, which fall within the shortest 0.4–1.7% of internodes on our dated tree. Given the rapid divergence of these lineages in the Upper Cretaceous, genomic signal sufficient to estimate this area of the tree confidently may remain difficult to extract (Rokas and Carroll, 2008). We also find poor support for the placement of Candoiidae within Booidea and of Bolyeriidae with respect to Booidea and Pythonoidea, relationships that also date to the Upper Cretaceous. However, dense taxon sampling within these relictual families is not possible. Finally, support for relationships among some families of the rapidly diverging and diverse Colubroidea and Elapoidea is lacking, mirroring numerous previous studies (Lawson et al., 2005; Zaher et al., 2009; Pyron et al., 2011). We are presently sampling Colubroidea and Elapoidea more densely to better estimate these interfamilial relationships. As with all regions of the Squamate Tree of Life with poorly supported, short nodes, we hope that forthcoming whole-genome studies will increase support in these regions, though strong signal for a particular arrangement may always remain elusive.

Our results provide a solid foundation for understanding the origins of all groupings of Squamata, with major pulses of diversification occurring in the Jurassic, Cretaceous, and Paleogene (Figs. 1 and 2). These results generally agree with previous studies using genetic and morphological data (Mulcahy et al., 2012; Pyron and Burbrink, 2014; Harrington and Reeder, 2017), though differences in locus and taxon sampling make direct comparisons among studies difficult. For example, a recent study estimated an older root age for Squamata, with crown Squamata originating in the Late Triassic (∼206 Ma; Simões et al., 2018). However, Simões et al. (2018) expanded the concept of Squamata by including Megachirella and Marmoretta from the Middle Triassic and Early Jurassic, respectively; these two poorly preserved taxa have previously been identified as stem lepidosaurs (Evans and Jones, 2010). Our results are concordant with the divergence dates given by Pyron (2017) and Jones et al. (2013) for the origin of Squamata and for the major divisions giving rise to higher-level groups within squamates, but we note that many of our calibrations were taken from those studies. Unidentata, Episquamata, and Toxicofera all arose in the Early to Middle Jurassic. In addition, diversification within these groups producing the primary taxonomic divisions of Scincomorpha, Laterata, Iguania, Cordylioidea, Pleurodonta, Acrodonta, and the root of Dibamia and Gekkota also occurred in the Jurassic.

Similar to recent combined fossil, morphological, and molecular studies (Jones et al., 2013; Pyron, 2016; Simões et al., 2018), we find that a large number of diverse groups subsequently originated in the Cretaceous, including crown Serpentes and their major divisions into Typhlopoidea, Amerophidia, Alethinophidia, Afrophidia, and Caenophidia. Within these divisions, well-known and widely distributed groups such as Pythonoidea, Booidea, and Uropeltoidea also diversified. Within lizards, we see the origin and diversification of Gekkota, Teiioidea, Pleurodonta, and Acrodonta. Although understanding how the K/Pg boundary affected rates of diversification within Squamata requires additional information from the fossil record, it is clear that the groups originating in the Mesozoic rapidly diversified into all major families during the Cenozoic, ultimately producing the 10,800 currently known extant species. Most extant squamate families diversified extensively throughout the Paleogene and Neogene, including the many hyperdiverse families of Colubriformes.

CONCLUSION

We provide a robust, dated phylogenomic estimate of relationships among squamates, sampling widely across almost all major lizard and snake groups.

This research provides a solid framework for understanding the relationships and dates of origin of all extant squamates, showing that most major families diversified prior to the K/Pg boundary. We also provide a novel framework for interrogating genes using artificial-intelligence techniques to understand how particular loci differ from species-tree estimates. Importantly, we show that both ILS and poor tree estimation, given properties of genes associated with the number of informative sites, may produce significant discordance between gene trees and species trees. All analyses fail to support the two traditional groupings of squamates, Scleroglossa and Macrostomata, and instead show a consistent pattern grouping squamates into Unidentata, Scincomorpha, Episquamata, Laterata, and Toxicofera. We also highlight areas of topological uncertainty among particular groups, such as relationships among pleurodont families and deep toxicoferan relationships, which represent potential avenues of novel research using whole genomes and denser taxon sampling.

SUPPLEMENTARY MATERIAL

Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.6392n3s.

ACKNOWLEDGMENTS

We thank the following curators, who kindly provided tissue samples for our study: G. Schneider (UMMZ), L. Densmore, L. Grismer, A. Bauer, T. Jackman, R. Brown and C. K. Onn (KU), C. Austin, R. Brumfield and D. Dittman (LSUMNS), J. Rosado (MCZ), M. Hagemann (BPBM), A. Wynn (USNM), J. Vindum (CAS), J. McGuire and C. Spencer (MVZ), D. Kizirian (AMNH), A. Resetar (FMNH), K. Krysko and T. Lott (UF), and D. Dittman (LSUMZ).

FUNDING

This research was supported by Fundação de Amparo à Pesquisa do Estado de São Paulo [grant number BIOTA-FAPESP 2011/50206-9 to H.Z.], National Science Foundation [grant numbers DEB-1257926 to F.T.B., DEB-1441719 to R.A.P., DEB-1257610 to C.J.R.], and Australian Research Council Discovery [grant number DP120104146 to J.S.K. and S.C.D.].

REFERENCES

Alencar L.R.V., Quental T.B., Grazziotin F.G., Alfaro M.L., Martins M., Venzon M., Zaher H. 2016. Diversification in vipers: phylogenetic relationships, time of divergence and shifts in speciation rates. Mol. Phylogenet. Evol. 105:50–62.

Ané C., Larget B., Baum D.A., Smith S.D., Rokas A. 2007. Bayesian estimation of concordance among gene trees. Mol. Biol. Evol. 24:412–426.


Arcila D., Ortí G., Vari R., Armbruster J.W., Stiassny M.L.J., Ko K.D., Sabaj M.H., Lundberg J., Revell L.J., Betancur-R R. 2017. Genome-wide interrogation advances resolution of recalcitrant groups in the tree of life. Nat. Ecol. Evol. 1:20.

Barker D.G., Barker T.M., Davis M.A., Schuett G.W. 2015. A review of the systematics and taxonomy of Pythonidae: an ancient serpent lineage. Zool. J. Linn. Soc. 175:1–19.

Baum D.A. 2007. Concordance trees, concordance factors, and the exploration of reticulate genealogy. Taxon 56:417–426.

Brown J.M., Thomson R.C. 2016. Bayes factors unmask highly variable information content, bias, and extreme influence in phylogenomic analyses. Syst. Biol. 66:517–530.

Burbrink F.T., Gehara M. 2018. The biogeography of deep time phylogenetic reticulation. Syst. Biol. 67:743–755.

Burbrink F.T., Lorch J.M., Lips K.R. 2017. Host susceptibility to snake fungal disease is highly dispersed across phylogenetic and functional trait space. Sci. Adv. 3:e1701387.

Burbrink F.T., McKelvy A.D., Pyron R.A., Myers E.A. 2015. Predicting community structure in snakes on Eastern Nearctic islands using ecological neutral theory and phylogenetic methods. Proc. R. Soc. B Biol. Sci. 282:20151700.

Burbrink F.T., Ruane S., Kuhn A., Rabibisoa N., Randriamahatantsoa B., Raselimanana A.P., Andrianarimalala M.S.M., Cadle J.E., Lemmon A.R., Lemmon E.M., Nussbaum R.A., Jones L., Pearson R.G., Raxworthy C.J. 2019. The origins and diversification of the exceptionally rich gemsnakes (Colubroidea: Lamprophiidae: Pseudoxyrhophiinae) in Madagascar. Syst. Biol. In press.

Camp C.L. 1923. Classification of the lizards. Bull. Am. Mus. Nat. Hist. 48:289–480.

Castoe T.A., de Koning A.P.J., Kim H.M., Gu W.J., Noonan B.P., Naylor G., Jiang Z.J., Parkinson C.L., Pollock D.D. 2009. Evidence for an ancient adaptive episode of convergent molecular evolution. Proc. Natl. Acad. Sci. USA 106:8986–8991.

Chen M.-Y., Liang D., Zhang P. 2015. Selecting question-specific genes to reduce incongruence in phylogenomics: a case study of jawed vertebrate backbone phylogeny. Syst. Biol. 64:1104–1120.

Chernomor O., von Haeseler A., Minh B.Q. 2016. Terrace aware data structure for phylogenomic inference from supermatrices. Syst. Biol. 65:997–1008.

Colston T.J., Costa G.C., Vitt L.J. 2010. Snake diets and the deep history hypothesis. Biol. J. Linn. Soc. 101:476–486.

Conrad J.L. 2008. Phylogeny and systematics of Squamata (Reptilia) based on morphology. Bull. Am. Mus. Nat. Hist. 310:1–182.

Cundall D., Irish F.J. 2008. The snake skull. In: Gans C., Gaunt A.S., Adler K., editors. Biology of the Reptilia. Ithaca: Cornell University Press. p. 349–692.

Cundall D., Wallach V., Rossman D.A. 1993. The systematic relationships of the snake genus Anomochilus. Zool. J. Linn. Soc. 109:275–299.

De Veaux R.D., Ungar L.H. 1994. Multicollinearity: a tale of two nonparametric regressions. New York (NY): Springer. p. 393–402.

Dormann C.F., Elith J., Bacher S., Buchmann C., Carl G., Carré G., Marquéz J.R.G., Gruber B., Lafourcade B., Leitão P.J., Münkemüller T., McClean C., Osborne P.E., Reineking B., Schröder B., Skidmore A.K., Zurell D., Lautenbach S. 2013. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography 36:27–46.

Dornburg A., Fisk J.N., Tamagnan J., Townsend J.P. 2016. PhyInformR: phylogenetic experimental design and phylogenomic data exploration in R. BMC Evol. Biol. 16:262.

Duchêne D.A., Duchêne S., Ho S.Y.W. 2017. New statistical criteria detect phylogenetic bias caused by compositional heterogeneity. Mol. Biol. Evol. 34:1529–1534.

Esquerré D., Keogh J.S. 2016. Parallel selective pressures drive convergent diversification of phenotypes in pythons and boas. Ecol. Lett. 19:800–809.

Esquerré D., Sherratt E., Keogh J.S. 2017. Evolution of extreme ontogenetic allometric diversity and heterochrony in pythons, a clade of giant and dwarf snakes. Evolution 71:2829–2844.

Estes R., Pregill G.K., Camp C.L., Charles L. 1988. Phylogenetic relationships of the lizard families: essays commemorating Charles L. Camp. In: Estes R., Pregill G.K., editors. Phylogenetic relationships of the lizard families: essays commemorating Charles L. Camp. Stanford: Stanford University Press. p. 119–282. Camp Memorial Symposium on the Phylogenetic Relationships of the Lizard Families 1982: Knoxville T.

Etheridge R.E., Frost D.R. 1989. A phylogenetic analysis and taxonomy of iguanian lizards (Reptilia, Squamata). Univ. Kansas Mus. Nat. Hist. Misc. Publ. 81:1–76.

Evans S., Wang Y., Li C. 2005. The early Cretaceous Chinese lizard, Yabeinosaurus: resolving an enigma. J. Syst. Palaeontol. 3:319–335.

Evans S.E. 2003. At the feet of the dinosaurs: the early history and radiation of lizards. Biol. Rev. Camb. Philos. Soc. 78:513–551.

Evans S.E., Barbadillo L.J. 1998. An unusual lizard (Reptilia: Squamata) from the Early Cretaceous of Las Hoyas, Spain. Zool. J. Linn. Soc. 124:235–265.

Evans S.E., Barbadillo L.J. 1999. A short-limbed lizard from the Lower Cretaceous of Spain. Syst. Paleontol. 60:73–85.

Evans S.E., Jones M.E.H. 2010. The origin, early history and diversification of Lepidosauromorph reptiles. Berlin, Heidelberg: Springer. p. 27–44.

Faraway J.J. 2002. Practical regression and anova using R. Bath: University of Bath.

Felsenstein J. 2004. Inferring phylogenies. Sunderland (MA): Sinauer Associates.

Foote A.D., Liu Y., Thomas G.W.C., Vinav T., Alföldi J., Deng J., Dugan S., van Elk C.E., Hunter M.E., Joshi V., Khan Z., Kovar C., Lee S.L., Lindblad-Toh K., Mancia A., Nielsen R., Qin X., Qu J., Raney B.J., Vijay N., Wolf J.B.W., Hahn M.W., Muzny D.M., Worley K.C., Gilbert M.T.P., Gibbs R.A. 2015. Convergent evolution of the genomes of marine mammals. Nat. Genet. 47:272–275.

Fry B.G. 2005. From genome to “venome”: molecular origin and evolution of the snake venom proteome inferred from phylogenetic analysis of toxin sequences and related body proteins. Genome Res. 15:403–420.

Fry B.G. 2015. Venomous reptiles and their toxins: evolution, pathophysiology, and biodiscovery. Oxford: Oxford University Press.

Gao K., Norell M. 1998. Taxonomic revision of Carusia (Reptilia, Squamata) from the late Cretaceous of the Gobi Desert and phylogenetic relationships of anguimorphan lizards. Am. Mus. Novit. 1–51.

Garland T., Bennett A.F., Rezende E.L. 2005. Phylogenetic approaches in comparative physiology. J. Exp. Biol. 208:3015–3035.

Gauthier J.A. 1998. Fossil xenosaurid and anguid lizards from the Early Eocene Wasatch Formation, southeast Wyoming, and a revision of the Anguioidea. Rocky Mt. Geol. 21:7–54.

Gauthier J.A., Kearney M., Maisano J.A., Rieppel O., Behlke A.D.B. 2012. Assembling the squamate tree of life: perspectives from the phenotype and the fossil record. Bull. Peabody Mus. Nat. Hist. 53:3–308.

Green R.E., Krause J., Briggs A.W., Maricic T., Stenzel U., Kircher M., Patterson N., Li H., Zhai W., Fritz M.H.-Y., Hansen N.F., Durand E.Y., Malaspinas A.-S., Jensen J.D., Marques-Bonet T., Alkan C., Prüfer K., Meyer M., Burbano H.A., Good J.M., Schultz R., Aximu-Petri A., Butthof A., Höber B., Höffner B., Siegemund M., Weihmann A., Nusbaum C., Lander E.S., Russ C., Novod N., Affourtit J., Egholm M., Verna C., Rudan P., Brajkovic D., Kucan Ž., Gušic I., Doronichev V.B., Golovanova L.V., Lalueza-Fox C., de la Rasilla M., Fortea J., Rosas A., Schmitz R.W., Johnson P.L.F., Eichler E.E., Falush D., Birney E., Mullikin J.C., Slatkin M., Nielsen R., Kelso J., Lachmann M., Reich D., Pääbo S. 2010. A draft sequence of the Neandertal genome. Science 328:710–722.

Hallermann J. 1998. The ethmoidal region of Dibamus taylori (Squamata: Dibamidae), with a phylogenetic hypothesis on dibamid relationships within Squamata. Zool. J. Linn. Soc. 122:385–426.

Hamilton C.A., Lemmon A.R., Lemmon E.M., Bond J.E. 2016. Expanding anchored hybrid enrichment to resolve both deep and shallow relationships within the spider tree of life. BMC Evol. Biol. 16:212.

Harrington S.M., Leavitt D.H., Reeder T.W. 2016. Squamate phylogenetics, molecular branch lengths, and molecular apomorphies: a response to McMahan et al. Copeia 104:702–707.


Harrington S.M., Reeder T.W. 2017. Phylogenetic inference and diver-gence dating of snakes using molecules, morphology and fossils:new insights into convergent evolution of feeding morphology andlimb reduction. Biol. J. Linn. Soc. 121:379–394.

Hastie T., Tibshirani R., Friedman J. 2009. The elements of statisticallearning. New York (NY): Springer New York.

Heath T.A., Hedtke S.M., Hillis D.M. 2008. Taxon sampling and theaccuracy of phylogenetic analyses. J. Syst. Evol. 46:239–257.

Hsiang A.Y., Field D.J., Webster T.H., Behlke A.D., Davis M.B., RacicotR.A., Gauthier J.A. 2015. The origin of snakes: revealing the ecology,behavior, and evolutionary history of early snakes using genomics,phenomics, and the fossil record. BMC Evol. Biol. 15:87.

Huson D.H., Klöpper T., Lockhart P.J., Steel M.A. 2005. Reconstructionof reticulate networks from gene trees. Springer, Berlin, Heidelberg.p. 233–249.

Jones M.E., Anderson C., Hipsley C.A., Müller J., Evans S.E., SchochR.R. 2013. Integration of molecules and new fossils supports aTriassic origin for Lepidosauria (lizards, snakes, and tuatara). BMCEvol. Biol. 13:208.

Kalyaanamoorthy S., Minh B.Q., Wong T.K.F., von Haeseler A.,Jermiin L.S. 2017. ModelFinder: fast model selection for accuratephylogenetic estimates. Nat. Methods. 14:587–589.

Katoh K., Standley D.M. 2013. MAFFT multiple sequence alignmentsoftware version 7: improvements in performance and usability.Mol. Biol. Evol. 30:772–80.

Kearse M., Moir R., Wilson A., Stones-Havas S., Cheung M., SturrockS., Buxton S., Cooper A., Markowitz S., Duran C., Thierer T., AshtonB., Meintjes P., Drummond A. 2012. Geneious basic: an integratedand extendable desktop software platform for the organization andanalysis of sequence data. Bioinformatics 28:1647–1649.

Kelly C.M.R., Barker N.P., Villet M.H., Broadley D.G. 2009. Phylogeny,biogeography and classification of the snake superfamily Elapoidea:a rapid radiation in the late Eocene. Cladistics Int. J. Willi HennigSoc. 25:38–63.

Kuhn M. 2008. Caret:Classification and regression training package. Rpackage version 6.0-77. 28.

Lawson R., Slowinski J.B., Crother B.I., Burbrink F.T. 2005. Phylogenyof the Colubroidea (Serpentes): new evidence from mitochondrialand nuclear genes. Mol. Phylogenet. Evol. 37:581–601.

Lee M.S.Y. 1998. Convergent evolution and character correlation inburrowing reptiles: towards a resolution of squamate relationships.Biol. J. Linn. Soc. 65:369–453.

Lee M.S.Y., Scanlon J.D. 2002. Snake phylogeny based on osteology,soft anatomy and ecology. Biol. Rev. 77:333–401.

Lek S., Delacoste M., Baran P., Dimopoulos I., Lauga J., AulagnierS. 1996. Application of neural networks to modelling nonlinearrelationships in ecology. Ecol. Modell. 90:39–52.

Lemmon A.R., Emme S.A., Lemmon E.M. 2012. Anchored hybrid enrichment for massively high-throughput phylogenomics. Syst. Biol. 61:727–744.

Libbrecht M.W., Noble W.S. 2015. Machine learning applications in genetics and genomics. Nat. Rev. Genet. 16:321–332.

Liu L., Zhang J., Rheindt F.E., Lei F., Qu Y., Wang Y., Zhang Y., Sullivan C., Nie W., Wang J., Yang F., Chen J., Edwards S.V., Meng J., Wu S. 2017. Genomic evidence reveals a radiation of placental mammals uninterrupted by the KPg boundary. Proc. Natl. Acad. Sci. USA 114:E7282–E7290.

López-Giráldez F., Moeller A.H., Townsend J.P. 2013. Evaluating phylogenetic informativeness as a predictor of phylogenetic signal for metazoan, fungal, and mammalian phylogenomic data sets. Biomed. Res. Int. 2013:621604.

López-Giráldez F., Townsend J.P. 2011. PhyDesign: a webapp for profiling phylogenetic informativeness. BMC Evol. Biol. 11:152.

Losos J., Hillis D., Greene H. 2012. Evolution. Who speaks with a forked tongue? Science 338:1428–1429.

Martin S.H., Davey J.W., Jiggins C.D. 2015. Evaluating the use of ABBA–BABA statistics to locate introgressed loci. Mol. Biol. Evol. 32:244–257.

McDowell S.B., Bogert C.M. 1954. The systematic position of Lanthanotus and the affinities of the anguinomorphan lizards. Bull. Am. Mus. Nat. Hist. 105:1–142.

McLoughlin S. 2001. The breakup history of Gondwana and its impact on pre-Cenozoic floristic provincialism. Aust. J. Bot. 49:271.

Minh B.Q., Hahn M.W., Lanfear R. 2018. New methods to calculate concordance factors for phylogenomic datasets. bioRxiv 487801.

Minh B.Q., Nguyen M.A.T., von Haeseler A. 2013. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30:1188–1195.

Mirarab S., Warnow T. 2015. ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31:i44–i52.

Montgomery D.C., Peck E.A., Vining G.G. 2012. Introduction to linear regression analysis. Hoboken (NJ): Wiley.

Mulcahy D.G., Noonan B.P., Moss T., Townsend T.M., Reeder T.W., Sites J.W., Wiens J.J. 2012. Estimating divergence dates and evaluating dating methods using phylogenomic and mitochondrial data in squamate reptiles. Mol. Phylogenet. Evol. 65:974–991.

Nagy L.G., Kocsubé S., Csanádi Z., Kovács G.M., Petkovits T., Vágvölgyi C., Papp T. 2012. Re-mind the gap! Insertion–deletion data reveal neglected phylogenetic potential of the nuclear ribosomal internal transcribed spacer (ITS) of fungi. PLoS One 7:e49794.

Nguyen L.-T., Schmidt H.A., von Haeseler A., Minh B.Q. 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32:268–274.

O’Connor D.E., Shine R. 2004. Parental care protects against infanticide in the lizard Egernia saxatilis (Scincidae). Anim. Behav. 68:1361–1369.

Oppel M. 1811. Die Ordnungen, Familien, und Gattungen der Reptilien, als Prodrom einer Naturgeschichte derselben. Munich: Joseph Lindauer.

Paradis E., Claude J., Strimmer K. 2004. APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290.

Parker J., Tsagkogeorga G., Cotton J.A., Liu Y., Provero P., Stupka E., Rossiter S.J. 2013. Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502:228–231.

Philippe H., Brinkmann H., Lavrov D.V., Littlewood D.T.J., Manuel M., Wörheide G., Baurain D. 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 9:e1000602.

Philippe H., Chenuil A., Adoutte A. 1994. Can the Cambrian explosion be inferred through molecular phylogeny? Development 1994:15–25.

Projecto-Garcia J., Natarajan C., Moriyama H., Weber R.E., Fago A., Cheviron Z.A., Dudley R., McGuire J.A., Witt C.C., Storz J.F. 2013. Repeated elevational transitions in hemoglobin function during the evolution of Andean hummingbirds. Proc. Natl. Acad. Sci. USA 110:20669–20674.

Prum R.O., Berv J.S., Dornburg A., Field D.J., Townsend J.P., Lemmon E.M., Lemmon A.R. 2015. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526:569–573.

Pyron R.A. 2014. Temperate extinction in squamate reptiles and the roots of latitudinal diversity gradients. Glob. Ecol. Biogeogr. 23:1126–1134.

Pyron R.A. 2016. Novel approaches for phylogenetic inference from morphological data and total-evidence dating in squamate reptiles (lizards, snakes, and amphisbaenians). Syst. Biol. 66:syw068.

Pyron R.A., Burbrink F.T. 2012. Extinction, ecological opportunity, and the origins of global snake diversity. Evolution 66:163–178.

Pyron R.A., Burbrink F.T. 2014. Early origin of viviparity and multiple reversions to oviparity in squamate reptiles. Ecol. Lett. 17:13–21.

Pyron R.A., Burbrink F.T., Colli G.R., Montes de Oca A.N., Vitt L.J., Kuczynski C.A., Wiens J.J. 2011. The phylogeny of advanced snakes (Colubroidea), with discovery of a new subfamily and comparison of support methods for likelihood trees. Mol. Phylogenet. Evol. 58:329–342.

Pyron R.A., Burbrink F.T., Wiens J.J. 2013. A phylogeny and revised classification of Squamata, including 4161 species of lizards and snakes. BMC Evol. Biol. 13:93.

Pyron R.A., Hendry C.R., Chou V.M., Lemmon E.M., Lemmon A.R., Burbrink F.T. 2014. Effectiveness of phylogenomic data and coalescent species-tree methods for resolving difficult nodes in the phylogeny of advanced snakes (Serpentes: Caenophidia). Mol. Phylogenet. Evol. 81:221–231.

Pyron R.A., Hsieh F.W., Lemmon A.R., Lemmon E.M., Hendry C.R. 2016. Integrating phylogenomic and morphological data to assess candidate species-delimitation models in brown and red-bellied snakes (Storeria). Zool. J. Linn. Soc. 177:937–949.

R Core Team. 2015. R: a language and environment for statistical computing.

Reeder T.W., Townsend T.M., Mulcahy D.G., Noonan B.P., Wood P.L., Sites J.W., Wiens J.J. 2015a. Integrated analyses resolve conflicts over squamate reptile phylogeny and reveal unexpected placements for fossil taxa. PLoS One 10:e0118199.

Rhodin A.G.J., Kaiser H., van Dijk P.P., Wüster W., O’Shea M., Archer M., Auliya M., Boitani L., Bour R., Clausnitzer V., Contreras-MacBeath T., Crother B.I., Daza J.M., Driscoll C.A., Flores-Villela O., Frazier J., Fritz U., Gardner A.L., Gascon C., Georges A., Glaw F., Grazziotin F.G., Groves C.P., Haszprunar G., Hava P., Hero J.-M., Hoffmann M., Hoogmoed M.S., Horne B.D., Iverson J.B., Jäch M., Jenkins C.L., Jenkins R.K.B., Kiester A.R., Keogh J.S., Lacher Jr. T.E., Lovich J.E., Luiselli L., Mahler D.L., Mallon D.P., Mast R., McDiarmid R.W., Measey J., Mittermeier R.A., Molur S., Mosbrugger V., Murphy R.W., Naish D., Niekisch M., Ota H., Parham J.F., Parr M.J., Pilcher N.J., Pine R.H., Rylands A.B., Sanderson J.G., Savage J.M., Schleip W., Scrocchi G.J., Shaffer H.B., Smith E.N., Sprackland R., Stuart S.N., Vetter H., Vitt L.J., Waller T., Webb G., Wilson E.O., Zaher H., Thomson S. 2015b. Comment on Spracklandus Hoser, 2009 (Reptilia, Serpentes, ELAPIDAE): request for confirmation of availability of the generic name and for the nomenclatural validation of the journal in which it was published (Case 3601; BZN 70: 234–237; 71: 30–38, 133–135, 181–182, 252–253).

Ricklefs R.E., Losos J.B., Townsend T.M. 2007. Evolutionary diversification of clades of squamate reptiles. J. Evol. Biol. 20:1751–1762.

Rieppel O. 1988. A review of the origin of snakes. Evolutionary biology. Boston (MA): Springer US. p. 37–130.

Rieppel O. 2012. “Regressed” macrostomatan snakes. Fieldiana Life Earth Sci. 5:99–103.

Rieppel O., Zaher H. 2000. The intramandibular joint in squamates, and the phylogenetic relationships of the fossil snake Pachyrhachis problematicus Haas. Fieldiana Geol. 43:1–69.

Rieppel O., Zaher H., Tchernov E., Polcyn M.J. 2003. The anatomy and relationships of Haasiophis terrasanctus, a fossil snake with well-developed hind limbs from the Mid-Cretaceous of the Middle East. J. Paleontol. 77:536–558.

Robinson D.F., Foulds L.R. 1981. Comparison of phylogenetic trees. Math. Biosci. 131–147.

Rodriguez-Robles J.A., Bell C.J., Greene H.W. 1999. Gape size and evolution of diet in snakes: feeding ecology of erycine boas. J. Zool. 248:49–58.

Rokas A., Carroll S.B. 2008. Frequent and widespread parallel evolution of protein sequences. Mol. Biol. Evol. 25:1943–1953.

Rokyta D.R., Lemmon A.R., Margres M.J., Aronow K. 2012. The venom-gland transcriptome of the eastern diamondback rattlesnake (Crotalus adamanteus). BMC Genomics 13:312.

Rosenberg M.S., Kumar S. 2003. Heterogeneity of nucleotide frequencies among evolutionary lineages and phylogenetic inference. Mol. Biol. Evol. 20:610–621.

Ruane S., Raxworthy C.J., Lemmon A.R., Lemmon E.M., Burbrink F.T. 2015. Comparing species tree estimation with large anchored phylogenomic and small Sanger-sequenced molecular datasets: an empirical study on Malagasy pseudoxyrhophiine snakes. BMC Evol. Biol. 15:221.

Saint K.M., Austin C.C., Donnellan S.C., Hutchinson M.N. 1998. C-mos, a nuclear marker useful for squamate phylogenetic analysis. Mol. Phylogenet. Evol. 10:259–263.

Sanderson M.J. 2002. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol. Biol. Evol. 19:101–109.

Savage J.M. 2015. What are the correct family names for the taxa that include the snake genera Xenodermus, Pareas, and Calamaria? Herpetol. Rev. 46:664–665.

Scanlon J.D. 2006. Skull of the large non-macrostomatan snake Yurlunggur from the Australian Oligo-Miocene. Nature 439:839–842.

Shan Y., Paull D., McKay R.I. 2006. Machine learning of poorly predictable ecological data. Ecol. Modell. 195:129–138.

Sheehan S., Song Y.S., Buzbas E., Petrov D., Boyko A., Auton A. 2016. Deep learning for population genetic inference. PLOS Comput. Biol. 12:e1004845.

Shimodaira H., Hasegawa M. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.

Siegel D.S., Miralles A., Aldridge R.D. 2011. Controversial snake relationships supported by reproductive anatomy. J. Anat. 218:342–348.

Simões T.R., Caldwell M.W., Tałanda M., Bernardi M., Palci A., Vernygora O., Bernardini F., Mancini L., Nydam R.L. 2018. The origin of squamates revealed by a Middle Triassic lizard from the Italian Alps. Nature 557:706–709.

Smith S.A., Brown J.W., Yang Y., Bruenn R., Drummond C.P., Brockington S.F., Walker J.F., Last N., Douglas N.A., Moore M.J. 2018. Disparity, diversity, and duplications in the Caryophyllales. New Phytol. 217:836–854.

Smith S.A., O’Meara B.C. 2012. treePL: divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics 28:2689–2690.

Som A. 2015. Causes, consequences and solutions of phylogenetic incongruence. Brief. Bioinform. 16:536–548.

Stamatakis A. 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.

Streicher J.W., Schulte J.A., Wiens J.J. 2015. How should genes and taxa be sampled for phylogenomic analyses with missing data? An empirical study in iguanian lizards. Syst. Biol. 65:128–145.

Streicher J.W., Wiens J.J. 2016. Phylogenomic analyses reveal novel relationships among snake families. Mol. Phylogenet. Evol. 100:160–169.

Streicher J.W., Wiens J.J. 2017. Phylogenomic analyses of more than 4000 nuclear loci resolve the origin of snakes among lizard families. Biol. Lett. 13:20170393.

Townsend J.T. 1971. Theoretical analysis of an alphabetic confusion matrix. Percept. Psychophys. 9:40–50.

Townsend T.M., Larson A., Louis E., Macey J.R. 2004. Molecular phylogenetics of Squamata: the position of snakes, amphisbaenians, and dibamids, and the root of the squamate tree. Syst. Biol. 53:735–757.

Tucker D.B., Colli G.R., Giugliano L.G., Hedges S.B., Hendry C.R., Lemmon E.M., Lemmon A.R., Sites J.W., Pyron R.A. 2016. Methodological congruence in phylogenomic analyses with morphological support for teiid lizards (Sauria: Teiidae). Mol. Phylogenet. Evol. 103:75–84.

Uetz P., Freed P., Hošek J. 2009. The reptile database. Available from: http://www.reptile-database.org.

Underwood G. 1967. A contribution to the classification of snakes. London: British Museum.

Vidal N., Hedges S.B. 2004. Molecular evidence for a terrestrial origin of snakes. Proc. R. Soc. Lond. B Biol. Sci. 271:S226–S229.

Vidal N., Hedges S.B. 2005. The phylogeny of squamate reptiles (lizards, snakes, and amphisbaenians) inferred from nine nuclear protein-coding genes. C. R. Biol. 328:1000–1008.

Vidal N., Hedges S.B. 2009. The molecular evolutionary tree of lizards, snakes, and amphisbaenians. C. R. Biol. 332:129–139.

Vidal N., Marin J., Morini M., Donnellan S., Branch W.R., Thomas R., Vences M., Wynn A., Cruaud C., Hedges S.B. 2010. Blindsnake evolutionary tree reveals long history on Gondwana. Biol. Lett. 6:558–561.

Vitt L.J., Caldwell J.P. 2009. Herpetology. Burlington (MA): Elsevier.

Vitt L.J., Pianka E.R. 2005. Deep history impacts present-day ecology and biodiversity. Proc. Natl. Acad. Sci. USA 102:7877–7881.

Wiens J.J., Hutter C.R., Mulcahy D.G., Noonan B.P., Townsend T.M., Sites J.W., Reeder T.W. 2012. Resolving the phylogeny of lizards and snakes (Squamata) with extensive sampling of genes and species. Biol. Lett. 8:1043–1046.

Wiens J.J., Kuczynski C.A., Townsend T., Reeder T.W., Mulcahy D.G., Sites J.W. 2010. Combining phylogenomics and fossils in higher-level squamate reptile phylogeny: molecular data change the placement of fossil taxa. Syst. Biol. 59:674–688.

Wilson J.A., Mohabey D.M., Peters S.E., Head J.J. 2010. Predation upon hatchling dinosaurs by a new snake from the Late Cretaceous of India. PLoS Biol. 8:e1000322.

Wortley A.H., Rudall P.J., Harris D.J., Scotland R.W. 2005. How much data are needed to resolve a difficult phylogeny? Case study in Lamiales. Syst. Biol. 54:697–709.


Xu B., Yang Z. 2016. Challenges in species tree estimation under the multispecies coalescent model. Genetics 204:1353–1368.

Zaher H. 1998. The phylogenetic position of Pachyrhachis within snakes (Squamata, Lepidosauria). J. Vertebr. Paleontol. 18:1–3.

Zaher H., de Oliveira L., Grazziotin F.G., Campagner M., Jared C., Antoniazzi M.M., Prudente A.L. 2014. Consuming viscous prey: a novel protein-secreting delivery system in neotropical snail-eating snakes. BMC Evol. Biol. 14:58.

Zaher H., Grazziotin F.G., Cadle J.E., Murphy R.W., de Moura J.C., Bonatto S.L. 2009. Molecular phylogeny of advanced snakes (Serpentes, Caenophidia) with an emphasis on South American xenodontines: a revised classification and descriptions of new taxa. Pap. Avulsos Zool. 49:115–153.

Zaher H., Scanferla C.A. 2012. The skull of the Upper Cretaceous snake Dinilysia patagonica Smith-Woodward, 1901, and its phylogenetic position revisited. Zool. J. Linn. Soc. 164:194–238.

Zaher H., Yánez-Muñoz M.H., Rodrigues M.T., Graboski R., Machado F.A., Altamirano-Benavides M., Bonatto S.L., Grazziotin F.G. 2018. Origin and hidden diversity within the poorly known Galápagos snake radiation (Serpentes: Dipsadidae). Syst. Biodivers. 16:614–642.

Zhang W. 2010. Computational ecology: artificial neural networks and their applications. World Scientific.

Zou Z., Zhang J. 2016. Morphological and molecular convergences in mammalian phylogenetics. Nat. Commun. 7:12758.

[Figure: phylogeny of squamate families, with Sphenodontidae (Rhynchocephalia), plotted against geological time (Triassic, Jurassic, Cretaceous, Paleogene, Neogene, Quaternary); labeled higher-level clades include Squamata, Dibamia, Gekkota, Unidentata, Scincomorpha, Episquamata, Laterata, Toxicofera, Iguania, Anguiformes, Serpentes, Alethinophidia, Afrophidia, Amerophidia, Caenophidia, Colubroidea, and Elapoidea.]

[Figure: phylogeny of squamate families and Sphenodontidae, with major clades labeled (Squamata, Dibamia, Gekkota, Unidentata, Scincomorpha, Episquamata, Laterata, Toxicofera, Iguania, Anguiformes, Serpentes); legend: PP support &lt;0.95.]

[Figure: quadripartition support and gene and site concordance factors for key nodes (Amerophidia, Toxicofera, Serpentes, Iguania, Anguiformes, Laterata, Scincomorpha, Gekkota, Gekkota/Dibamia, Unidentata, Episquamata); axes include branch length (coalescent units).]

[Figure: "Supports Species Tree" vs. "Does Not Support Species Tree" in relation to node date for selected nodes: Squamata, Unidentata, Episquamata, Scincomorpha, Toxicofera, Laterata, Iguania, Anguiformes, Serpentes, Amerophidia, Colubroides, Dibamia/Gekkota, Bolyeriidae(Xenopeltidae(Loxocemidae, Pythonidae)), Candoiidae + Boidae, and Pleurodonta relationships.]

[Figure: density of accuracy (0–1) for Dibamia/Gekkota, Toxicofera, Amerophidia, and RFst-gt, comparing random and real data.]

[Figure (panels a, b): correlations (ρ, with median and 95% HDI; N = 370) between the Robinson–Foulds distance of gene trees to the species tree (RF distance GT–ST) and locus properties, including parsimony-informative sites, phylogenetic informativeness, sites with 3 bases, sites with 4 bases, and SD of site concordance factors; and variable importance (frequency) of locus properties: number of sites with gaps, number of gaps, gap size SD, maximum gap size, number of taxa, frequency of base A, number of sites, SD and mean of site concordance factors, sites with 1, 2, 3, and 4 bases, ML difference between ST and GT, parsimony-informative sites, phylogenetic informativeness, sites with &gt;0 substitutions, mean gap size, and frequency of base T.]