PoSeiDon - rna.uni-jena.de



www.rna.uni-jena.de

Contact

[Fig. 2 labels: Infection; Mx1-specific binding; No binding; Species 1-4; Mx1; Virus]

Fig. 2 Example of an 'arms race' between the host Mx1 gene and a virus, which results in high selection pressure on the host to evolve a defence against the pathogen. The virus, in turn, establishes countermeasures to evade the host immune system [2]. The hypervariable loop region of Mx1 is shown schematically.

When positive selection has occurred, the ratio between the non-synonymous (dN) and synonymous (dS) substitution rates is shifted: certain amino acid changes are favored if they increase the host's fitness, for example against an infection.

The dN/dS ratio (ω) may then reach values greater than 1, and we call such sites positively selected [1].

The detection of such sites allows researchers to gain insights into the evolution of a gene and might also help to develop countermeasures against pathogens (Fig. 2).
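To make the ω threshold concrete, the following is a minimal sketch, not part of PoSeiDon itself. The function name `classify_omega` and the tolerance parameter are illustrative assumptions. The pipeline estimates ω by maximum likelihood under codon models (CODEML), whereas this sketch simply divides two given rates.

```python
def classify_omega(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Interpret a dN/dS ratio (omega) using the conventional thresholds.

    dn, ds: non-synonymous and synonymous substitution rates.
    tol:    hypothetical tolerance for treating omega as equal to 1.
    """
    if ds <= 0:
        # omega is not defined without synonymous substitutions
        return "undefined"
    omega = dn / ds
    if omega > 1 + tol:
        return "positive selection"
    if omega < 1 - tol:
        return "purifying selection"
    return "neutral"


# Example with made-up rates: omega = 3, i.e. an excess of
# non-synonymous changes
print(classify_omega(0.6, 0.2))
```

A ratio above 1 alone is not a statistical test. In the pipeline, significance is assessed by comparing nested site models.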

Positive Selection

Example: Positively Selected Sites in Bat Mx1

Mx1, F3x4, 13 bat species

| Region | M7 vs M8, -2(ln λ) | M7 vs M8, p-value | % sites with ω > 1 | avg(ω) | M8 BEB (PP 0.95/0.99) |
|---|---|---|---|---|---|
| full (aa 1-649) | 101.39 | <0.001 | 6.26 | 3.45 | 205; 209; 361; 439; 443; 494; 549; 562; 569; 570; 572; 573; 574; 575; 578; 581 |
| aa 1-90 | 24.05 | <0.001 | 21.3 | 2.76 | 16; 17; 19; 22; 23; 25; 26; 27; 31; 38; 40; 44; 46 |
| aa 91-183 | 0.19 | 0.908 | NA | NA | none |
| aa 184-649 | 112.69 | <0.001 | 6.59 | 3.83 | 205; 209; 361; 436; 439; 443; 494; 549; 562; 569; 570; 572; 573; 574; 575; 578; 581 |

Tab. 1 Results of the evolutionary analyses for positively selected sites in bat Mx1, shown as an example for codon frequency F3x4 and the paired NS site models M7 and M8, which disallow and allow positive selection, respectively.

Using PoSeiDon, we were able to identify the loop L4 of Mx1 as a hot spot for positive selection in bats [2], as previously also shown for primates [4]. By splitting the alignment at possible recombination events identified by the pipeline, we also found strong evidence of positive selection in the N-terminal region of bat Mx1 (Table 1, fragment aa 1-90).
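The M7 vs M8 comparison in Table 1 is a likelihood ratio test (LRT) evaluated against a chi-squared distribution. The sketch below shows how such a p-value can be derived from the test statistic. It assumes 2 degrees of freedom, the usual choice for M7 vs M8, which the poster does not state. The function names are illustrative. For 2 degrees of freedom the chi-squared survival function reduces to exp(-x/2), so only the standard library is needed.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return -2 ln(lambda) = 2 * (lnL_alternative - lnL_null)."""
    return 2.0 * (lnl_alt - lnl_null)


def lrt_pvalue_df2(stat: float) -> float:
    """p-value of an LRT statistic under chi-squared with df = 2 (assumed).

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    if stat < 0:
        raise ValueError("LRT statistic must be non-negative")
    return math.exp(-stat / 2.0)


# Test statistics as reported in Table 1 (M7 vs M8, F3x4)
table1_stats = {
    "full (aa 1-649)": 101.39,
    "aa 1-90": 24.05,
    "aa 91-183": 0.19,
    "aa 184-649": 112.69,
}

for region, stat in table1_stats.items():
    print(f"{region}: p = {lrt_pvalue_df2(stat):.3g}")
```

Under this assumption, the statistic of 0.19 for fragment aa 91-183 gives a p-value of about 0.91, close to the 0.908 reported in the table (the difference is compatible with rounding of the statistic). The other three regions give p-values far below 0.001.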

References

[1] Yang, Ziheng. PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24.8 (2007): 1586-1591.

[2] Fuchs, Jonas, et al. Evolution and antiviral specificity of interferon-induced Mx proteins of bats against Ebola-, Influenza-, and other RNA viruses. Submitted.

[3] Pond, Sergei L. Kosakovsky, et al. GARD: a genetic algorithm for recombination detection. Bioinformatics 22.24 (2006): 3096-3098.

[4] Mitchell, Patrick S., et al. Evolution-guided identification of antiviral specificity determinants in the broadly acting interferon-induced innate immunity factor MxA. Cell Host & Microbe 12.4 (2012): 598-604.

PoSeiDon combines the following software:

TranslatorX (v1.1), Abascal et al. (2010); PMID 20435676
Muscle (v3.8.31), Edgar (2004); PMID 15034147
RAxML (v8.0.25), Stamatakis (2014); PMID 24451623
Newick Utilities (v1.6), Junier and Zdobnov (2010); PMID 20472542
MODELTEST, Posada and Crandall (1998); PMID 9918953
HyPhy (v2.2), Pond et al. (2005); PMID 15509596
GARD, Pond et al. (2006); PMID 17110367
PAML4/CodeML (v4.8), Yang (2007); PMID 17483113
Inkscape (v0.48.5)
Ruby (v2.3.1)

Fig. 4 Workflow of the PoSeiDon pipeline and example output. The PoSeiDon pipeline comprises in-frame alignment of homologous protein-coding sequences, detection of putative recombination events and evolutionary breakpoints, phylogenetic reconstructions and detection of positively selected sites in the full alignment and all possible fragments. Finally, all results are combined and visualized in a user-friendly and clear HTML web page. The resulting alignment fragments are indicated with colored bars in the HTML output.

[Fig. 4 panel labels. Stages: User Input (FASTA, e.g. seq1 GTTATGAAG..., seq2 GTACTGAAA...), Alignment, Recombination, Tree, Positive Selection, Output. Tools and tests: TranslatorX, Muscle (NT ALN, AA ALN), GARD, KH test, Model test, RAxML, Newick Utilities, CODEML. NS site models: M0, M1a, M2a, M7, M8a, M8. Codon frequencies: F61, F1x4, F3x4. dN/dS (ω) ratios; LRT with chi-squared test for M1a vs M2a, M7 vs M8 and M8a vs M8; BEB. Example output: codon alignment of positions 558-582 across Fragment 1 and Fragment 2 (resulting fragments), with rows of per-site values labelled F3x4, F1x4 and F61.]

PoSeiDon

Here we present PoSeiDon, an easy-to-use web service (Fig. 1) to detect positively selected sites and recombination events in an alignment of coding sequences. Sites that undergo positive selection can give you insights into the evolutionary history of your sequences, for example by revealing important mutation hot spots that accumulated as a result of virus-host 'arms races' during evolution (Fig. 2).

PoSeiDon is easy to use: just provide your nucleotide coding sequences as one multi-FASTA file and enter your e-mail address (Fig. 1). The outcome is a user-friendly web page that provides all intermediate results and data files and graphically displays recombination events and positively selected sites (Fig. 4).
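Before uploading, it can help to check that a multi-FASTA file really contains in-frame coding sequences. The sketch below is one way to do this locally. The helper names and the specific checks (length divisible by three, only A/C/G/T/N characters) are assumptions for illustration and are not documented requirements of the PoSeiDon server.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse multi-FASTA text into {sequence id: upper-case sequence}."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # fall back to a generated id if the header line is empty
            name = header[0] if header else f"seq{len(records) + 1}"
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def coding_problems(records: Dict[str, str]) -> List[str]:
    """List problems that suggest a sequence is not an in-frame CDS."""
    problems: List[str] = []
    for name, seq in records.items():
        if len(seq) % 3 != 0:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
        unexpected = set(seq) - set("ACGTN")
        if unexpected:
            problems.append(f"{name}: unexpected characters {sorted(unexpected)}")
    return problems


example = ">seq1\nGTTATGAAG\n>seq2 truncated example\nGTACTGAA\n"
for problem in coding_problems(parse_fasta(example)):
    print(problem)
```

Here `seq2` is flagged because its eight nucleotides cannot be split into complete codons.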

Detection of Recombination

Since recombination can have a profound impact on evolutionary processes and can adversely affect phylogenetic reconstruction and the accurate detection of positive selection, screening for it should be a default step in every comparative evolutionary study.

Within PoSeiDon, we use GARD [3] to detect possible breakpoints within an alignment.

The resulting fragments are then screened for positive selection independently (Fig. 4).
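The relation between nucleotide breakpoints and the amino acid fragments analysed afterwards can be sketched as follows. This is an illustration rather than PoSeiDon's implementation. The function name is hypothetical, and it assumes that each breakpoint marks the last nucleotide of the left-hand fragment and falls on a codon boundary.

```python
from typing import List, Tuple


def fragments_from_breakpoints(
    aln_len_nt: int, breakpoints_nt: List[int]
) -> List[Tuple[int, int]]:
    """Convert nucleotide breakpoints into 1-based amino acid ranges.

    Assumption: a breakpoint is the last nucleotide of the fragment to
    its left and coincides with the end of a codon.
    """
    if aln_len_nt % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    fragments: List[Tuple[int, int]] = []
    start_nt = 1
    for end_nt in sorted(breakpoints_nt) + [aln_len_nt]:
        if end_nt % 3 != 0:
            raise ValueError(f"breakpoint {end_nt} is not on a codon boundary")
        if end_nt < start_nt:
            raise ValueError(f"breakpoint {end_nt} is duplicated or out of range")
        first_aa = (start_nt - 1) // 3 + 1
        last_aa = end_nt // 3
        fragments.append((first_aa, last_aa))
        start_nt = end_nt + 1
    return fragments


# Values from the bat Mx1 example: alignment 1-1947, breakpoints 270 and 549
print(fragments_from_breakpoints(1947, [270, 549]))
```

With the values from the bat Mx1 example, this reproduces the three regions listed in Table 1: aa 1-90, aa 91-183 and aa 184-649.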

[Fig. 3 plot: x-axis 'Breakpoint location' (0 to 2000); y-axis 'Model averaged support' (0 to 0.9); labels: 1-1947; 270; 549.]

The Web Server

Fig. 1 Web interface of PoSeiDon. The Web server is freely available at http://www.rna.uni-jena.de/poseidon.

Fig. 3 In this example we detected two possible breakpoints in an alignment of 13 Mx1 sequences from bats [2].

Many thanks to Prof. Dr. Manja Marz, Prof. Dr. Georg Kochs and Jonas Fuchs, as well as to the RNA bioinformatics group in Jena and the DFG for funding (SPP-1596).

PoSeiDon - a Web Server for the Detection of Evolutionary Recombination Events and Positive Selection

Martin Hölzer (1,2) and Manja Marz (1-6)

(1) Faculty of Mathematics and Computer Science, Friedrich Schiller University Jena, Germany
(2) European Virus Bioinformatics Center (EVBC), Friedrich Schiller University Jena, Germany
(3) FLI Leibniz Institute for Age Research, Jena, Germany
(4) Michael Stifel Center Jena, Germany
(5) Aging Research Center (ARC), Jena, Germany
(6) German Center for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Germany