Multiple alignment using sequence family profiles
Eric Nawrocki
Sean Eddy’s Lab
Howard Hughes Medical InstituteJanelia Farm Research Campus
Small subunit ribosomal RNA and the tree of life
• 1977 - Carl Woese decided to classify allliving things phylogenetically
• needed “a molecule of appropriatelybroad distribution” for comparativeanalysis
• SSU rRNA was chosen
– universally distributed
– highly conserved
– large enough to provide sufficientdata (1500-1800 nt)
– readily isolated
Pace, Science:276; 1997
**
*
Universal conservation of SSU rRNA
Escherichia coli Methanococcus vannielii Zea mays
Secondary structure diagrams from:URL:http://www.rna.ccbb.utexas.edu/
Environmental surveys target SSU
• mid 1980s - Norman Pace develops methodology fordetermination of SSU sequences without cultivation
• many different environments have been surveyed
• known biodiversity has been greatly expanded:
– recognized bacterial phyla:11 in 1987, 36 in 1998, 52 in 2003, 67 in 2006...
• SSU databases contain millions of sequences:
name # seqs # citationsSilva 3.2M 1125RDP 2.6M 1170Greengenes 1.0M 1012
Silva: Pruesse et al., 2007 NAR 35.21:7188-96RDP: Cole et al., 2009 NAR 37:D141-45Greengenes: DeSantis et al., 2006 AEM 72:5069-72
PCR
rRNA/rDNAsequences
genomicsequence
genomiclibrary
sequencing
cloning
extraction
sampleenvironmental
rRNA/rDNA
rRNA/rDNAclones
community
shotgunsequencing
treesphylogenetic
adapted from: Hugenholtz,
Genome Biology:2002 3(2)
bulk RNA/DNA
cloning
analysiscomparative
The comparative analysis step:
Alignment and Phylogenetic Inference
Goals of the alignment program:- accurate: because alignment errors confound phylogenetic inference- scalable to handle up to millions of seqs and fast:
Information that can help:- many known sequences- manually curated SSU alignments- known secondary structure of SSU
SSU tree
SSU alignment
SSU sequences
alignmentprogram
tree buildingprogram
tree buildingprogram
De novo multiple sequence alignment methods assume very little
progressive alignment
aligned sequences
alignment polishing
hierarchical clustering(minimum spanning tree,UPGMA)
guidetree
Drawbacks for SSU alignment:1. Won’t scale to millions of seqs (O(N^2))2. Ignores previous knowledge of SSU
N^2 comparisons(pairwise alignment,K-mer counting)
N^2 distancematrix
input sequences:
Assumption:These sequences are homologous.
A trusted (probably manually curated) reference alignment can help
3. Add to new alignment
trusted alignment:
"nearest neighbor" alignment:
2. pairwise alignmentto nearest neighbor
new sequences:
1. Find nearest neighbor
?
?
?
?} N seqs
Mseqs{
Advantages over de novo:1. More scalable (O(MN))2. Uses existing, trusted SSU alignment
Drawbacks:1. Still slow if M is large (~5000 for Greengenes)2. Pairwise alignment ignores varying conservation across alignment
Profile-based alignment is O(N)
profile
alignment
new sequences:
trusted alignment (”seed”):
Advantages over de novo and nearest-neighbor:1. Scalable (O(N))2. Uses existing, trusted SSU alignment3. Uses position-specific scores
Drawbacks:1. Only consensus positions are aligned, other nucleotides are inserted2. Ignorant of phylogeny
Profiles have position-specific scores
(substitutions, gap open, gap extend)
1 8764321 876432
yeast GUCaUUCGGC...ACfly GCC.UU-GGA...GCcow GCA.UUCGUC...-Cmouse GCA.UU-GAU...GChuman GCGaUUCGCU...GCchicken GUA.UUCGUA...ACsnake GUGaUUCGCG...ACcroc GUU.UU-GAG...AC
GUCACGU U55UCG
11C
1010GAACGU
99
ACGU
worm G-G.UUCGCGccaACstarfish G-U.UUCGAU...-C
sequenceprofile
Profiles HMMs are probabilistic profiles built from alignments
M
I
D
M
I
D
UC
2
One HMM node peralignment column
P(C) = 0.5P(U) = 0.5
transitions
3 states per node:(M) Match: emits residues (I) Insert: inserts extra residues(D) Delete: deletes residues
HMMs generate homologoussequences.
Node for column 2:
transitions
Given a sequence, the most likelypath that could have generatedthat sequence can be computed.This path implies an alignment.
G UCACGU U U C G CG
AACGU
ACGUM
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
MB
II
D
EM
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
MB
II
D
EM
I
D
1 5 11109876432 118
1 8764321 876432
yeast GUCaUUCGGC...ACfly GCC.UU-GGA...GCcow GCA.UUCGUC...-Cmouse GCA.UU-GAU...GChuman GCGaUUCGCU...GCchicken GUA.UUCGUA...ACsnake GUGaUUCGCG...ACcroc GUU.UU-GAG...AC
GUCACGU U55UCG
11C
1010GAACGU
99
ACGU
worm G-G.UUCGCGccaACstarfish G-U.UUCGAU...-C
Sequences are aligned to profiles HMMs using dynamic programming
algorithms very similar to Smith-Waterman
M
I
D
M
I
D
UC
2
One HMM node peralignment column
P(C) = 0.5P(U) = 0.5
transitions
3 states per node:(M) Match: emits residues (I) Insert: inserts extra residues(D) Delete: deletes residues
HMMs generate homologoussequences.
Node for column 2:
transitions
Given a sequence, the most likelypath that could have generatedthat sequence can be computed.This path implies an alignment.
G UCACGU U U C G CG
AACGU
ACGUM
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
MB
II
D
EM
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
M
I
D
MB
II
D
EM
I
D
1 5 11109876432 118
MMMMMMMMMB EM
urchin
MMMMMMMMMB EM
G U U U U C CAA A
urchin GUU.UUC-AA...AC
DD
1 8764321 876432
yeast GUCaUUCGGC...ACfly GCC.UU-GGA...GCcow GCA.UUCGUC...-Cmouse GCA.UU-GAU...GChuman GCGaUUCGCU...GCchicken GUA.UUCGUA...ACsnake GUGaUUCGCG...ACcroc GUU.UU-GAG...AC
GUCACGU U55UCG
11C
1010GAACGU
99
ACGU
worm G-G.UUCGCGccaACstarfish G-U.UUCGAU...-C
Profile alignment can differ from pairwise alignment
Pairwise alignmentto nearest neighbor
starfish GUUUCGAUCurchin GUUUUCAAAC
starfish G-UUUCGAUCurchin GUUUUCAAAC * **** * *
1 8764321 876432
yeast GUCaUUCGGC...ACfly GCC.UU-GGA...GCcow GCA.UUCGUC...-Cmouse GCA.UU-GAU...GChuman GCGaUUCGCU...GCchicken GUA.UUCGUA...ACsnake GUGaUUCGCG...ACcroc GUU.UU-GAG...AC
GUCACGU U55UCG
11C
1010GAACGU
99
ACGU
worm G-G.UUCGCGccaACstarfish G-U.UUCGAU...-Curchin GUU.UUC-AA...AC
yeast GUCaUUCGGC...ACfly GCC.UU-GGA...GCcow GCA.UUCGUC...-Cmouse GCA.UU-GAU...GChuman GCGaUUCGCU...GCchicken GUA.UUCGUA...ACsnake GUGaUUCGCG...ACcroc GUU.UU-GAG...ACworm G-G.UUCGCGccaACstarfish G-U.UUCGAU...-Curchin GUU.UUCAAA...-C
1 8764321 876432GUCACGU U
55UCG
11C
1010GAACGU
99
ACGU
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Sequence conservation per position blue: highly conserved ...... red: highly variable
created by the SSU-ALIGN packagestructure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/)
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Frequency of insertions after each positiongrey: zero to very few inserts ...... teal: 1-2% inserts
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Sequence conservation per position blue: highly conserved ...... red: highly variable
created by the SSU-ALIGN packagestructure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/)
Secondary structure (mutual) information per positionblue: low information ...... red: high information
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Covariance models (CMs) are built
from structure-annotated alignments
32
4 14
5 1312
6 117 10
8 9
15
16 27
17 2618
19 25
21 2322
guide tree:
AA
G C
A UA
C GU G
U C
U
G C
G CC
G C
A AC
input multiple alignment:
AUA
:AAG
:GCG
<AAU
<CCC
<UUU
:UUU
:CCC
:GG-
:GGG
>AAC
:UUA
>CGC
>U-G
:GCG
<GAG
<CCC
:GCA
<AAC
:CAC
:AAA
:CGU
>CUU
>CGC
>humanmouse
orc
[structure]
g...
c...
.a..
.a..
1 5 10 15 20 25 28
example structure(human):
states for 3 guide tree nodes:A C
B
16
17
18
ROOT
MATL2
MATL3
BIF
4 14
5 13
12
6 11
7
8
9
10
BEGL
MATP
MATP
MATR
MATP
MATL
MATL
MATL
MATL
END
BEGR
MATL15
MATP16 27
MATP17 26
MATL18
MATP19 25
MATL21
MATL22
MATL23
END
MATP
MATP
MATL
MP ML MR D
IL IR
MP ML MR D
IL IR
ML D
IL
62 63
50 51 52 53
54 55
56 57 58 59
60 61
64
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Benchmark of SSU alignment
• How accurate are profile-based alignments?
• ’Gold standard’ testing dataset
– structural alignment of 152 bacterial SSU sequences from Robin Gutell’s database
– this is the CRW bacterial seed alignment filtered to 92% identity
– determined by ’manual’ comparative analysis
51 test sequences:
profile(CM or HMM)
alignment
procedure
Test: How similar are these alignments?
no seed/test sequencepair is > 80% identical
80% id
sequenceidentity
tree
trusted “gold standard’alignment (152 sequences):
seed alignment(101 sequences):
test alignment(51 sequences)
Profiles produce accurate SSU alignments
alignment timeaccuracy (sec/seq)
Muscle-3.8.31 (de novo) 95.4% 0.49
HMMER3 (HMMs) 96.8% 0.04
Infernal 1.1 (CMs) 98.1% 0.50
Muscle: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
HMMER: hmmer.janelia.org
Infernal: infernal.janelia.org
Probabilistic models allow direction calculation of useful quantities
• Posterior decoding algorithm computes the posterior probability that each nucleotide iscorrectly aligned given the model
– allows HMM banding for CM alignment (O(N3log(N)) reduced to close to O(N2)
– useful for identifying and removing (masking) columns that are not reliably alignedprior to phylogenetic inference
SSU tree
SSU alignment
SSU sequences
alignmentprogram
tree buildingprogram
tree buildingprogram
Mask columns
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Phil Hugenholtz’s manually created mask imposed on archaeal SSUblack: included in alignment (1257) pink: excluded from alignment (251)
created by the SSU-ALIGN packagestructure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/)
Secondary Structure: small subunit ribosomal RNA
Archaeoglobus fulgidus(X05567,AE000965)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Archaeoglobales 5.Archaeoglobaceae6.Archaeoglobus February 2001
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
AUUC U G G U
U GAUCCUGCCAGAG
GC
CG
CU
GC
U
AUC
CG
GC
UG
G
GAC
U AAG
CC AUGC
G
A
G U CA A
G G G G C U U G U A U C C C UUC
GGGGAUGCAAGCACCGGC
GG
ACGGC
UC
AGU
AACA
CGUGGAC A
ACC
UG
CCCUCGGG
U G G G GG A U A
A C C C C G G GAA
ACUGGGGCU
AAUCCCCC
AU
AG
GGGAUGGG
UACUG
GAAUGUCCCAUCUC
CG
AAAGCGC
UU A G C G
CCCGAGG
AU
GG
GUCUGCGGC
GGA
UU
AG
GUU
GUU
GG
UG
GGG
UAACG
G CC
CA
CC
AA
GC
CGAA G A
UC
CG
UA
CGGGCCAUG
AG A
GU GGG A
GC C C G G
AG
AUGGACC
CUGAGACA C G
G G U C C AGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UCCG
CAAUGCGGG
AA
A C C G C G A C G G G GUC
AG
CC
GG
AGUGCUCGCGCAUC G C G C G G G C
UG
UC
GG
GG
UG
CCUAA A A
AG
CA
CC
CC
ACA
GC
AA GGGCCGGGC
AA
G GCCGG
UGGCA
GCCGC C
GC G
GUAA
UA
CCGG
CGGCCCGA
GU
GG
CG
GC
CA
CUUUU
AU
UG G
GC
CUA
AA
GC
GUCCGUA
GCCGGGCUGGUA
AGUCCUCCGGGA
AAUCUGGCGGCU
UA A C C G U C A G
A C UG C C G G A G G A
U AC U G C C A G C C
UAG
GGACCGGGA
GA
GGCCGGG
GGUAUUCCCGGAGUA
GGGGUGA
A A U C CU
GU A
AUC C C G G G A G G A C C
AC C
UG
UGG C G
AA
GGCGCCCGGCUG
GAACGGGUCCGAC
GGUGAG
GGA C GA
AG
GCCAGGG
GA G
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUGGC U G
UAAA
CG
AUG C G G A C U A G
GU
GU
CA
CC
GA
AG
CU A C
GAG
CU
UC
GG
UG
GU
GC
CGGAG
GGA
AGC
CGUUA
AGUCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACU
A C A A CGGGU
GGAGCCUGCGGU
UUAAU
UG
GA
UUCAAC G C
CG
GGAA
G C U U ACCGGGGGAG
ACAGCG
GGAUG
AAGGUCGGGC
UGA
A GACCUUA CC A
G
A
C UAGCUG A GA
GG
UGGUGC
A UG
GCCGCCGUCA
GUUCGUACUG
UGAAGC
A
U
CCUG
UU
A AGU
CAGGC
AA C G A G C
GA G A
CC C G C G C C C C C A G U U G C C
A GC G G U U C C C U
UC
GGGGAAG
CCGGGCACA
CUGGGGG
G
A
CUGCCGGCG
CUAAGCCGGAGG
AAGGUGCGGG
CAACGGCAGGU C
CGU
AUGCCCC G
AA
UCCCCCGG
GC
UAC
ACGCGGGCUAC A A U
GG
CC
GG
GAC
A A U GG
GU
A CCG
A C CC
C G AAAG
GGG
UAG
GU
AAU
CC
CC
UAAAC
CC
GG
UCUAA
CC
UGGGAUCGAGGGC
UGC
AACUCGCCCUCGU
GAACC
UG
GAAUCCG
UAGUAAUCGCGCC U
CA
AAAUG
GC
GC
GG
UGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
AAGCCACCC
GAGUGGGCCAGGGGCGAGGGGGUGGCCC
UAGGCCACCUUCGAGCCCAGGGUCCGCGA
GGGGGGCU
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
AUCUGCGGCUGGAUCACCUCCUN
Secondary Structure: small subunit ribosomal RNA
Aeropyrum pernix (str. K1)(AP000062)1.cellular organisms 2.Archaea 3.Crenarchaeota 4.Desulfurococcales 5.Desulfurococcaceae 6.Aeropyrum 7.Aeropyrum pernix February 2001
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
ACUC C G G U
U GAUCCUGCCGGAC
CC
GA
CC
GC
U
AUC
GG
GG
UA
G
GGC
U AAG
CC AUGG
G
A
G U CG A
A C G C C C C G C CGC
UGCGGGGCGUGGC
GG
ACGGC
UG
AGU
AACA
CGUGGCU A
ACC
UA
CCCUCGGG
A C G G GG A U A
A C C CC U
C G G GAA
ACUGGGGCU
AAUCCCCG
AU
AG
GCGGGGGGUGCCU
GGA
AGGGUCCUCCGC
CG
AAAG
GGGCCGCGGGGGG
UUA
U C GC C C G C G G U C C G
CCCGAGG
AU
GG
GGCUGCGGC
CCA
UCA
CGG
UA
GUU
GG
CG
GGG
UAAUG
G CC
CG
CC
AA
GC
CUAC G A
CG
GG
UA
GGGGCCGUG
AG A
GC GGG A
GC C C C C
AG
AUGGGCA
CUGAGACA A G
G G C C C AGG
CC
CUA
C GG
GG
CG C A C
CAGGC
G
CGAAAAC
UCCG
CAAUGCGGG
CA
A C C G U G A C G G G GCU
AC
CC
CG
AGUGCCCCCCUUG G G G G G G C
UU
UU
CC
CC
GC
UGUAA G U
AG
GC
GG
GG
GA
AU
AA GCGGGGGGC
AA
G A CUGG
UGUCA
GCCGC C
GC G
GUAA
UA
CCAG
CCCCGCGA
GC
GG
UC
GG
GG
UGCUU
AC
UG G
GC
CUA
AA
GC
GCCCGUA
GCCGGCUCGACA
AGUCCCCGCCUA
AAGCCCCGGGCU
CA A C C C G G G G
A GU G G C G G G G A
U AC U G U C G G G C
UAG
GGGGCGGGA
GA
GGCCGAG
GGUACUCCCGGGGUA
GGGGCGA
A A U C CA
UU A
AUC C C G G G A G G A C C
AC C
AG
UGG C G
AA
GGCGCUCGGCUG
GAACGC GCCCGAC
GGUGAG
GGGC GA
AA
GCCGGGG
GA G
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCCGGC U G
UAAA
CG
AUG C G G G C U A G
GU
GU
UG
GG
CG
GG
CG U G
CG
AGC
CC
GC
CC
AG
UG
CCG
CAG
GGA
AGC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACC
A C A A GGGGU
GGAGCCUGCGGC
UUAAU
UG
GA
GUCAAC G C
CG
GGAA
CC U C C
ACCGGGGGCG
AC
AG
CAGGAU
GA
CG
GC
CA
GGC
UGA
A GGC
CU
UG
CC C
G
A
C GCG
CU
G A GGGG
AGAUGC
A UG
GCCGUCGCCA
GCUCGUGCCG
UGAGGU
G
U
CCUG
UU
A AGU
CAGGC
AA C G A G C
GA G A
CC C C C G C C C C U A G U U G C C
A U CC C C G G G G C
GACCCGGGGGG
CACACUAGGGG
G
A
CUGCCGCCG
CUAAGGCGGAGG
AAGGAGGGGG
CAACGGCAGGU C
AGC
AUGCCCC U
AA
ACCCCCGG
GC
UGC
ACGCGGGCUAC A A U
GG
CG
GG
GAC
A G C GG
GUG C
CG
A C CC
C G AAAG
GGG
GAG
GC
AAU
CC
CU
CAAAC
CC
CG
CCUCA
GU
UGGGAUCGAGGGC
UGC
AACUCGCCCUCGU
GAACG
UG
GAAUCCC
UAGUAACCGCGCGU
CA
UCAUC
GC
GC
GG
UGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACUGCCCG
UC
GCUCCACCC
GAGGGGGGAGGGGGUGAGGCCCUCUCCG
C CGGGAGGGGGUCGAACCCCCUCCCCCCGA
GGGGGGAG
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
ACCUGCGGCUGGAUCACCUCNNN
Secondary Structure: small subunit ribosomal RNA
Halobacterium sp.(AE004437)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Halobacteriales 5.Halobacteriaceae 6.HalobacteriumMarch 2001
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
AUUC C G G U
U GAUCCUGCCGGAG
GC
CA
UU
GC
U
AUC
GG
AG
UC
C
GAU
U UAG
CC AUGC
U
A
G U UG U
G C G G G U UUA
GACCCGCAGC
GG
AAAGC
UC
AGU
AACA
CGUGGCC A
AGC
UA
CCCUGUGG
A C G G GA A U A
C U C U C G G GAA
ACUGAGGCU
AAUCCCCG
AU
AA
CGCUUUGC
UCCUG
GAAGGGGCAAAGCC
GG
AAACGCU
CC
G G C GCCACAGG
AU
GC
GGCUGCGGU
CGA
UU
AG
GUA
GAC
GG
UG
GGG
UAACG
G CC
CA
CC
GU
GC
CCAU A A
UC
GG
UA
CGGGUUGUG
AG A
GC A AG A
GC C C G G
AG
ACGGAAU
CUGAGACA A G
A U U C C GGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UUUA
CACUGUACG
AA
A G U G C G A U A A G GGG
AC
UC
CG
AGUGUGAAGGCAUAGA G C C U U C A C
UU
UU
GU
AC
AC
CGUAA G G
UG
GU
GC
AC
GA
AU
AA GGACUGGGC
AA
G A CCGG
UGCCA
GCCGC C
GC G
GUAA
UA
CCGG
CAGUCCGA
GU
GA
UG
GC
CG
AUCUU
AU
UG G
GC
CUA
AA
GC
GUCCGUA
GCUGGCUGAACA
AGUCCGUUGGGA
AAUCUGUCCGCU
UA A C G G G C A G
G C GU C C A G C G G A
A AC U G U U C A G C
UUG
GGACCGGAA
GA
CCUGAGG
GGUACGUCUGGGGUA
GGAGUGA
A A U C CU
GU A
AUC C U G G A C G G A C C
GC C
GG
UGG C G
AA
AGCGCCUCAGGA
GAACGGAUCCGAC
A GUGAG
GGA C GA
AA
GCUAGGG
UC U
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUAGC U G
UAAA
CG
AUG U C C G C U A G
GU
GU
GG
CG
CA
GG
CU A C
GAG
CC
UG
CG
CU
GU
GC
CGUAG
GGA
AGC
CGAGA
AGCGGAC
CGCCU
G G GAA G U A
CG U C U G
CA
AGGAUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACU
A C A A CCGGA
GGAGCCUGCGGU
UUAAU
UG
GA
CUCAAC G C
CG
GACA
U C U C ACCAGCCCCG
ACAGUAGUAAU
GA
CG
GU
CA
GGU
UGA
U GA
CC
UU
AC
C C GA
G
GCUA CUGA GAGG
AGGUGC
A UG
GCCGCCGUCA
GCUCGUACCG
UGAGGC
G
U
CCUG
UU
A AGU
CAGGC
AA C G A G C
GA G A
CC C G C A C U C C U A A U U G C C
A GC A G U A C C C U
UUGGGUA
GCUGGG
UACAUUAGGUG
G
A
CUGCCGCUG
CCAAAGCGGAGG
AAGGAACGGG
CAACGGUAGGU C
AGU
AUGCCCC G
AA
UGGGCUGG
GC
AAC
ACGCGGGCUAC A A U
GG
UC
GA
GAC
A A U GG
GAA G
CC
A C UC
C G AGAG
GAG
GCG
CU
AAU
CU
CCU
AAAC
UC
GA
UCGUA
GU
UCGGAUUGAGGGC
UGA
AACUCGCCCUCAU
GAAGC
UG
GAUUCGG
UAGUAAUCGCGUGU
CA
GCAGC
GC
GC
GG
UGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
AAAUCACCC
GAGUGGGGUUCGGAUGAGGCCGGCA
U GCGCUGGUCAAAUCUGGGCUCCGCAA
GGGGGAUU
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
AUCUGCGGCUGGAUCACCUCCU
Secondary Structure: small subunit ribosomal RNA
Methanococcus vannielii(M36507)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Methanococcales 5.Methanococcaceae 6.Methanococcus February 2001
5’
3’
Citation and related information available at http://www.rna.ccbb.utexas.edu
AUUC C G G U
U GAUCCCGCCGGAG
GC
UA
CU
GC
U
AUU
GG
GG
UU
C
GAC
U AAG
CC AUGC
G
A
G U CU
A U G G U UUC
GGCCAUGGC
GG
ACGGC
UC
AUU
AACA
CGUGGUU A
ACU
UA
ACCUCAGG
U G G A GC A U A
A C C U U G G GAA
ACUGAGGAU
AAUUCUCC
AU
AAGAAAAGC
AG
UCUG
GAACGA
UUCUUUUC
UG
AAAGCA
UA
U G C GCCCGAGG
AU
AG
GACUGCGCU
CGA
UU
AG
GUA
GUU
GG
UG
GGG
UAAUG
G CC
CA
CC
AA
GC
CUAC G A
UC
GA
UA
CGGGCCUUG
AG A
GA GGG A
GC C C G G
AG
AUGGGGA
CUGAGACA C G
G C C C C AGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UCCG
CAAUGCACG
AA
A G U G C G A C G G G GGG
AC
CC
CA
AGUGCUCAUGCACA G C A U G G G C
UU
UU
AU
CA
AG
UGUAA A C
AG
CU
UG
AG
GA
AU
AA GGGCUGGGC
AA
G UUCGG
UGCCA
GCAGC C
GC G
GUAA
UA
CCGA
CGGCCCGA
GU
GG
UA
GC
CA
CUCUU
AU
UG G
GC
CUA
AA
GC
GUCCGUA
GCCGGUCCAGUA
AGUCCCUGUUUA
AAUUCUCUGGCU
UA A C C A G A G G
A CU G G C A G G G A
U AC U G C U G G A C
UUG
GGACCGGGA
GA
GGACAAG
GGUACUCCAGGGGUA
GCGGUGA
A A U G UG
UU G
AUC C U U G G A G G A C C
AC C
UA
UGG C G
AA
GGCACUUGUCUG
GAACGGGUCCGAC
GGUGAG
GGA C GA
AA
GCCAGGG
GC G
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUGGC C G
UAAA
CU
CUG C G A A C U A G
GU
GU
CA
CC
UG
GG
CC U C
GAG
CC
CA
GG
UG
GU
GC
CGAAG
GGA
AGC
CGUUA
AGUUCGC
CGCCU
G G GGA G U A
CG G U C G
CA
AGACUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACC
A C A A CGGGU
GGAGCCUGCGGU
UUAAU
UG
GA
UUCAAC G C
CG
GGCA
U C U C ACCAGGAGCG
ACAGCA
UGAUG
ACGGCCAGGU
UGA
C GACCUUGCC U
G
A
A GCGCUGA GA
GG
UGGUGC
A UG
GCCAUCGUCA
GCUCGUACCG
CGAGGC
G
U
CCUG
UU
A AGU
CAGGU
AA C G A G C
GA G A
CC C G U G C C C U A U G U U G C G
A CU A C U U U C U
CC
GGAAGGUAAGCACU
CAUAGGG
G
A
CCGCUAGCG
CUAAGCUAGAGG
AAGGAGCGGG
CAACGAUAGGU C
CGC
AUGCCCC G
AA
UCUCCUGG
GC
UAC
ACGCGGGCUAC A A U
GG
CU
AG
GAC
A A U GG
GCU G
CU
A C CC
U G AAAA
GGG
ACG
CG
AAU
CU
CC
GAAAC
CU
AG
UCGUA
GU
UCGGAUCGUGGGC
UGU
AACUCGCCCACGU
GAAGC
UG
GAAUCCG
UAGUAAUCGCAGU U
CA
UAAUA
CU
GC
GG
UGA
AU
GU
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
ACACCACCC
GAGUUGGGUUCAGGUGAGGCCUUGGCCU U U
GGCUAGGGUCGAACCUGGGCUCAGCGA
GGGGGGUG
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
ACCUGCGGCUGGAUCACCUCCNN
Secondary Structure: small subunit ribosomal RNA
Natronorubrum bangense(Y14028)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Halobacteriales 5.Halobacteriaceae 6.Natronorubrum May 2001
5’
3’
Citation and related information available at http://www.rna.ccbb.utexas.edu
AUUC C G G U
U GAUCCUGCCGGAG
GU
CA
UU
GC
U
AUU
GG
AG
UC
C
GAU
U UAG
CC AUGC
U
A
G U UG U
A C G A G U UUG
UACUCGUAGC
AG
AUAGC
UC
AGU
AACA
CGUGGCC A
AAC
CA
CCCUAUGG
A U C C GG A U A
A C C U C G G GAA
ACUGAGGCC
AAUCCGGA
AU
AC
GAUUCCC
AA
GC
UG
GAACU
GC
UGGGAAUC
CG
AAACGCU
CA
G G C GCCAUAGG
AU
GU
GGCUGCGGC
CGA
UU
AG
GUA
GAC
GG
UG
GGG
UAACG
G CC
CA
CC
GU
GC
CAAU A A
UC
GG
UA
CGGGUUGUG
AG A
GC A AG A
GC C C G G
AG
ACGGUAU
CUGAGACA A G
A U A C C GGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UUUA
CACUGCACG
CC
A G U G C G A U A A G GGG
AC
UC
CG
AGUGCGAGGGCAUAU A G U C C U C G C
UU
UU
CA
CC
AC
CGUAA G G
AG
GU
GG
UA
GA
AU
AA GUGCUGGGC
AA
G A CCGG
UGCCA
GCCGC C
GC G
GUAA
UA
CCGG
CAGCACGA
GU
GA
UG
AC
CG
CUAUU
AU
UG G
GC
CUA
AA
GC
GUCCGUA
GCUGGCCGUGCA
AGUCUAUCGGGA
AAUCUGUGUGCC
UA A C A C A C A G
G C GU C C G G U G G A
A AC U G C U C G G C
UUG
GGACCGGAA
GA
CCAGAGG
GGUACGUCUGGGGUA
GGAGUGA
A A U C CC
GU A
AUC C U G G A C G G A C C
AC C
GG
UGG C G
AA
AGCGCCUCUGGA
AGACGGAUCCGAC
GGUGAG
GGA C GA
AA
GCUCGGG
UC A
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCGAGC U G
UAAA
CG
AUG U C U G C U A G
GU
GU
GA
CA
CA
AA
CU A C
GAG
UU
UG
UG
UU
GU
GC
CGUAG
GGA
AGC
CGUGA
AGCAGAC
CGCCU
G G GAA G U A
CG U C C G
CA
AGGAUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACU
A C A A CCGGA
GGAGCCUGCGGU
UUAAU
UG
GA
CUCAAC G C
CG
GACA
U C U C ACCAGCAUCG
ACAAUGU
GCAGUGA
CG
GU
CA
GUG
UGA
U GAG
CU
UA
CC U
GA
GCCA UUGA GAGG
AGGUGC
A UG
GCCGCCGUCA
GCUCGUACCG
UGAGGC
G
U
CCUG
UU
A AGU
CAGGC
AA C G A G C
GA G A
CC C G C A U G C C U A A U U G C C
A GC A G U A C C C U
UGUGGUA
GCUGGG
UACAUUAGGUA
G
A
CUGCCAGUG
CCAAACUGGAGG
AAGGAACGGG
CAACGGUAGGU C
AGU
AUGCCCC G
AA
UGUGCUGG
GC
GAC
ACGCGGGCUAC A A U
GG
CC
AC
GAC
A G U GG
GAU G
CA
A C CC
C G AAAG
GGG
ACG
CU
AAU
CU
CCG
AAAC
GU
GG
UCGUA
GU
UCGGAUUGAGGGC
UGA
AACUCGCCCUCAU
GAAGC
UG
GAUUCGG
UAGUAAUCGCGCC U
CA
GAAGG
GC
GC
GG
UGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
AAAGCACCC
GAGUGAGGUCCGGAUGAGGCCGCCGC A A
CGGCGGUCGAAUCUGGGCUUCGCAA
GGGGGCUU
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
AUCUGCGGCUGGAUCACCUCCUN
Secondary Structure: small subunit ribosomal RNA
Nanoarchaeum equitans(AJ318041)1.cellular organisms 2.Archaea 3.Nanoarchaeota 4.Nanoarchaeum July 2004
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
ACUC C C G U
U GAUCCUGCGGGAG
GC
CA
CC
GC
U
AUC
UC
CG
UC
C
GGC
U AAC
CC AUGG
A
A
G G CGA
G G G U C C C C G GGU
AAGGGGGCCCGCC
GC
ACGGC
UG
AGU
AACA
CGUCGGU A
ACC
UA
CCCUCGGG
A C G G GG
A U AA C C C C G G G
AA
ACUGGGGCU
AAUCCCCG
AU
AG
GGGAUGGGUGCUG
GAAGGCCCCAUCCC
CG
AGAGGGGCUAGCGGUAC
UUC C CC C G C U A G C C C G
CCCGAGG
AU
GG
GCCGGCGGC
CCA
UC
AG
GUA
GUU
GG
CG
GGG
UAAUG
G CC
CG
CC
AA
GC
CGAA G A
CG
GG
UA
GGGGCCGUG
AG A
GC GGGA
GC C C C C
AG
AUCGGCACUG
AGA
C A A GG G C C G A
GG
CC
CUA
C GG
GG
CG C A C
CAGGG
G
CGAAACC
UCCG
CAAUGCGGG
AA
A C C G U G A C G G G GGG
AC
GG
AG
AGUGCCGGAGGGCGUUA U G C U C U C C G G C
UU
UU
GG
GG
AG
UGUAA G U
AG
CU
CC
CC
GA
AU
AA GCGGUGGGC
AA
G A GGGG
UGGCA
GCCGC C
GC G
GGAA
CA
CCCC
CACCGCGA
GC
GG
UG
GC
CG
UGAUU
AU
UG G
GC
CUA
AA
GG
GGCCGUA
GCCGGGCCGGUG
UGGCUCCGGUGAAA
UCCUCGGGCU
CA A C C C G A G G
G C GC G C C G G A G C
U AC U A C C G G C C
UAG
GGACCGGGA
GG
GGCCGAC
CGUACUCCCGGGGGA
GCGGUGA
A A U G CU
GU A
AUC C C G G G A G G A C G
AC C
CG
UGG C G
AA
AGCGGUCGGCCA
GAACGGGUCCGAC
GGUGAG
GGCC GA
AG
GCCGGGG
GC U
AGAACGGG
AUU
A G AGAC
CCCG
GUA
UU
CCCGGC U G
UCAA
CG
CUG C G G G C U A C
CU
GC
UG
GG
CG
GG
CU A C
GAG
CC
CG
CC
CA
GU
GG
GGUAG
GGA
AGC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U A G G C GG
G G G A G C AC
A C A A GA GGU
GGGGUGCGCGGU
UUAAU
UG
GA
UUCGAC G C
CG
GGAA
C C U C ACCGGGGCUG
ACAGCA
CAAUGA
UG
GU
CG
GCC
UGA
A GGG
CC
UA
CC G
GA
G GCGCUGA GA
GG
AGGUGC
A UG
GCCGCCGUCA
GCCUGUGCCG
UGAGGU
G
C
CCUG
UU
A AGU
CAGGA
AA C A G G C
GA G A
CC C G C G C C C G C A G
U U GC G
A CG G C C G
AA
AGGCCGGCACA
CUGCGGG
G
A
CUGCCGGGG
AAACCCGGAGG
AAGGUGCGGG
CGACGGCAGGU A
UGC
AUGCCCC G
AA
UGCCCCGG
GC
UAC
ACGCGCGCAUC A A U
GG
GC
GG
GAC
A G GG
GGC
C GCG
A CC
CC
G AAA
GG
GGG
AGCAA
AUC
CCC
AAAC
CC
GC
UCUCA
GU
CCAGAUCGAGGGC
UGC
AACUCGCCCUCGU
GACGG
CG
GAAUCUC
UAGUAGUCGGACGU
CA
CCAGC
GU
CC
GG
CGA
AU
AC
GUCCC
UGCUCCUUGCA
CUCACCGCCCG
UC
AAGCCACCC
GAGCUGGGGCCUAGC
GAGGCCGUGGGGGGU
U CGCCCCCCACGGUCGA
GCUAGGCCCCGGCGA
GGGGGGCU
AAG
UCGA
CAC
A AG
G
U A G C C G U A G G GGA
ACCUGCGGCUGGAUCACCUCCUN
Secondary Structure: small subunit ribosomal RNA
Natronobacterium innermongoliae(AF009601)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Halobacteriales 5.Halobacteriaceae 6.Natronobacterium February 2001
5’
3’
Citation and related information available at http://www.rna.ccbb.utexas.edu
AUUC C G G U
U GAUCCUGCCGGAG
GU
CA
UU
GC
U
AUU
GG
AG
UC
C
GAU
U UAG
CC AUGC
U
A
G U UG U
A C G A G U UUA
GACUCGUAGC
AG
AUAGC
UC
AGU
AACA
CGUGGCC A
AAC
UA
CCCUAUGG
A U C C AG A C A
A C C U C G G GAA
ACUGAGGCU
AAUCUGGA
AU
AC
GACUCUC
AU
CCUG
GAGUGG
AGAGAGUC
CG
AAACGCU
CC
G G C GCCAUAGG
AU
GU
GGCUGCGGC
CGA
UU
AG
GUA
GAC
GG
UG
GGG
UAACG
G CC
CA
CC
GU
GC
CAGU A A
UC
GG
UA
CGGGUUGUG
AG A
GC A AG A
GC C C G G
AG
ACGGUAU
CUGAGACA A G
A U A C C GGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UUUA
CACUGCACG
CC
A G U G C G A U A A G GGG
AC
UC
CA
AGUGCGAGGGCAUAU A G U C C U C G C
UU
UU
UG
CG
AC
CGUAA G G
UG
GU
CG
CG
GA
AU
AA GUGCUGGGC
AA
G A CCGG
UGCCA
GCCGC C
GC G
GUAA
UA
CCGG
CAGCACGA
GU
GA
UG
AC
CG
CUGUU
AU
UG G
GC
CUA
AA
GC
GUCCGUA
GCUGGCCGCGCA
AGUCUAUCGGGA
AAUCUCUUCGCU
UA A C G A A G A G
G C GU C C G G U G G A
A AC U G U G U G G C
UUG
GGACCGGAA
GA
CCAGAGG
GGUACGUCUGGGGUA
GGAGUGA
A A U C CC
GU A
AUC C U G G A C G G A C C
AC C
GG
UGG C G
AA
AGCGCCUCUGGA
AGACGGAUCCGAC
GGUGAG
GGA C GA
AA
GCUCGGG
UC A
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCGAGC U G
UAAA
CG
AUG U C U G C U A G
GU
GU
GA
CA
CA
GG
CU A C
GAG
CC
UG
UG
UU
GU
GC
CGUAG
GGA
AGC
CGUGA
AGCAGAC
CGCCU
G G GAA G U A
CG U C C G
CA
AGGAUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACU
A C A A CCGGA
GGAGCCUGCGGU
UUAAU
UG
GA
CUCAAC G C
CG
GACA
U C U C ACCAGCAUCG
ACAAUGU
GCAGUGA
CG
GU
CA
GUG
UGA
U GAG
CU
UA
CC U
GA
GCCA UUGA GAGG
AGGUGC
A UG
GCCGCCGUCA
GCUCGUACCG
UGAGGC
G
U
CCUG
UU
A AGU
CAGGC
AA C G A G C
GA G A
CC C G C A C U C C U A A U U G C C
A GC A A C A C C C U
UGUGGUG
GUUGGG
UACAUUAGGAG
G
A
CUGCCAGUG
CCAAACUGGAGG
AAGGAACGGG
CAACGGUAGGU C
AGU
AUGCCCC G
AA
UGUGCUGG
GC
GAC
ACGCGGGCUAC A A U
GG
CC
AC
GAC
A G U GG
GAU G
CA
A C GC
C G AAAG
GCG
ACG
CU
AAU
CU
CC
GAAAC
GU
GG
UCGUA
GU
UCGGAUUGAGGGC
UGA
AACUCGCCCUCAU
GAAGC
UG
GAUUCGG
UAGUAAUCGCGCC U
CA
GAAGG
GC
GC
GG
UGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
AAAGCACCC
GAGUGGGGUCCGGAUGAGGCCCGGU U U
CCGGGUCGAAUCUGGGCUCCGCAA
GGGGGCUU
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
AUCUGCGGCUGGAUCACCUCCUN
Secondary Structure: small subunit ribosomal RNA
Pyrococcus abyssi(AJ248283)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Thermococcales 5.Thermococcaceae 6.Pyrococcus April 2001
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
CGUACUCCCUUAAUUC C G G U
U GAUCCUGCCGGAG
GC
CA
CU
GC
U
AUG
GG
GG
UC
C
GAC
U AAG
CC AUGC
G
A
G U CA A
G G G G G C G U C C C UUC
UGGGACGCCACCGGC
GG
ACGGC
UC
AGU
AACA
CGUCGGU A
ACC
UA
CCCUCGGG
A G G G GG A U A
A C C C C G G GAA
ACUGGGGCU
AAUCCCCC
AU
AG
GCCUGGGG
UACUG
GAAGGUCCCCAGGC
CG
AAAGGGAGCCG
UA
A G G C U C C GCCCGAGG
AU
GG
GCCGGCGGC
CGA
UU
AG
GUA
GUU
GG
UG
GGG
UAACG
G CC
CA
CC
AA
GC
CGAA G A
UC
GG
UA
CGGGCCGUG
AG A
GC GGG A
GC C C G G
AG
AUGGACA
CUGAGACA C G
G G U C C AGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UCCG
CAAUGCGGG
AA
A C C G C G A C G G G GGG
AC
CC
CC
AGUGCCGUGCCUCU G G C A C G G C
UU
UU
CC
GG
AG
UGUAA A A
AG
CU
CC
GG
GA
AU
AA GGGCUGGGC
AA
G GCCGG
UGGCA
GCCGC C
GC G
GUAA
UA
CCGG
CGGCCCGA
GU
GG
UG
GC
CA
CUAUU
AU
UG G
GC
CUA
AA
GC
GGCCGUA
GCCGGGCCCGUA
AGUCCCUGGCGA
AAUCCCACGGCU
CA A C C G U G G G
G C UC G C U G G G G A
U AC U G C G G G C C
UUG
GGACCGGGA
GA
GGCCGGG
GGUACCCCCGGGGUA
GGGGUGA
A A U C CU
AU A
AUC C C G G G G G G A C C
GC C
AG
UGG C G
AA
GGCGCCCGGCUG
GAACGGGUCCGAC
GGUGAG
GGCC GA
AG
GCCAGGG
GA G
CGAA
CCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUGGC U G
UAAA
GG
AUG C G G G C U A G
GU
GU
CG
GG
CG
AG
CU U C
GAG
CU
CG
CC
CG
GU
GC
CGUAG
GGA
AGC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACU
A C A A GGGGU
GGAGCGUGCGGUUUAAU
UG
GA
UUCAAC
G CC
GGGAA
C C U C ACCGGGGGCG
ACGGCA
GGAUGA
AG
GC
CA
GGC
UGA
A GGU
CU
UG
CC G
GA
CGCGCCGA GA
GG
AGGUGC
A UG
GCCGCCGUCA
GCUCGUACCG
UGAGGC
G
U
CCAC
UU
A AGU
GUGGU
AA C G A G C
GA G A
CC C G C G C C C C C A G U U G C C
A GU C C C U C C C G
CU
CGGGAGGGAGGCACU
CUGGGGG
G
A
CUGCCGGCG
AUAAGCCGGAGG
AAGGGGCGGG
CGACGGUAGGU C
AGU
AUGCCCC G
AA
ACCCCCGG
GC
UACACGCGCGCUAC A A U
GG
GC
GG
GAC
A A U GG
GUG C
CG
A C CC
C G AAAG
GGG
GAG
GU
AAU
CC
CCU
AAAC
CC
GC
CCUCA
GU
UCGGAUCGCGGGC
UGC
AACUCGCCCGCGU
GAAGC
UG
GAAUCCC
UAGUACCCGCGCGU
CA
UCAUC
GC
GC
GG
CGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
ACUCCACCC
GAGCGGGGCCUAGGUGAGGCCCGAUCUCCU
U CGGGAGGUCGGGUCGAGCCUAGGCUCCGUGA
GGGGGGAG
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
ACCUACGGCUCGAUCACCUCCUAUC
Secondary Structure: small subunit ribosomal RNA
Pyrococcus furiosus(U20163)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Thermococcales 5.Thermococcaceae 6.Pyrococcus September 2000
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
AUUC C G G U
U GAUCCUGCCGGAG
GC
CA
CU
GC
U
AUG
GG
GG
UC
C
GAC
U AAG
CC AUGC
G
A
G U CA A
G G G G G C G U C C C UUC
UGGGACGCCACCGGC
GG
ACGGC
UC
AGU
AACA
CGUCGGU A
ACC
UA
CCCUCGGG
A G G G GG A U A
A C C C C G G GAA
ACUGGGGCU
AAUCCCCC
AU
AGGCCUGGGG
UACUG
GAAGGUCCCCAGGCC
GA
AAGGGAGCCG
UA
A G G C C CG
CCCGAGG
AU
GC
GGCGGCGGC
CGA
UU
AG
GUA
GUU
GG
UG
GGG
UAACG
G CC
CA
CC
AA
GC
CGAA G A
UC
GG
UA
CGGGCCGUG
AG A
GC GGG A
GC C C G G
AG
AUGGACA
CUGAGACA C G
G G U C C AGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UCCG
CAAUGCGGG
AA
A C C G C G A C G G G GGG
AC
CC
CC
AGUGCCGUGCCUCU G G C A C G G C
UU
UU
CC
GG
AG
UGUAA A A
AG
CU
CC
GG
GA
AU
AA GGGCUGGGC
AA
G GCCGG
UGGC
AGC
CGC C
GC
GGUAA
UA
CCGG
CGGCCCGA
GU
GG
UG
GC
CA
CUAUU
AU
UG G
GC
CUA
AA
GC
GGCCGUA
GCCGGGCCCGUA
AGUCCCUGGCGA
AAUCCCACGGCU
CA A C C G U G G G
G C UC G C U G G G G A
U AC U G C G G G C C
UUG
GGACCGGGA
GA
GGCCGGG
GGUACCCCCGGGGUA
GGGGUGA
A A U C CU
AU A
AUC C C G G G G G G A C C
GC C
AG
UGG C G
AA
GGCGCCCGGCUG
GAACGGGUCCGAC
GGUGAG
CGCGGA
AG
GCCAGGG
GA G
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUGGC U G
UAAA
GG
AUG C G G G C U A G
GU
GU
CG
GG
CG
AG
CU U C
GAG
CU
CG
CC
CG
GU
GC
CGUAG
GGA
AGC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACU
A C A A GGGGU
GGAGCGUGCGGU
UUAAU
UG
GA
UUCAAC G C
CG
GGAA
C C U C ACCGGGGGCG
ACGGCAGGA
UG
AA
GG
CC
AG
GCU
GA
A GGU
CU
UG
CC G G
AC
GCGCCGA GA
GG
AGGUGC
A UG
GCCGCCGUCA
GCUCGUACCG
UGAGGC
G
U
CCAC
UU
A AGU
GUGGU
AA C G A G C
GA G A
CC C G C G C C C C C A G U U G C C
A GU C C C U C C C G
CU
CGGGAGGAGGCACU
CUGGGGG
G
A
CUGCCGGCG
AUAAGCCGGAGG
AAGGGGCGGG
CGACGGUAGGU C
AGU
AUGCCCC G
AA
ACCCCCGG
GC
UAC
ACGCGCGCUAC A A U
GG
GC
GG
GAC
A A U GG
GAC C
CG
A C CU
C G AAAG
GGG
AAG
GG
AAU
CC
CC
UAAAC
CC
GC
CCUCA
GU
UCGGAUCGCGGGC
UGC
AACUCGCCCGCGU
GAAGC
UG
GAAUCCC
UAGUACCCGCGUGU
CA
UCAUC
GC
GC
GG
CGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
ACUCCACCC
GAGCGGAGUCCGGGUGAGGCCUGAUCUCCU
U CGGGAGGUCAGGUCGAGCCCGGGCUCCGUGA
GGGGGGAG
AAG
UCGU
AAC
A AG
G
U A A C C G U A G G GGA
ACCUACGGCUCGAUCACCUCCUN
Secondary Structure: small subunit ribosomal RNA
Pyrococcus horikoshii(AP000001)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Thermococcales 5.Thermococcaceae 6.Pyrococcus March 2001
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
AUUC C G G U
U GAUCCUGCCGGAG
GC
CA
CU
GC
U
AUG
GG
GG
UC
C
GAC
U AAG
CC AUGC
G
A
G U CA A
G G G G G C G U C C C UUC
UGGGACGCCACCGGC
GG
ACGGC
UC
AGU
AACA
CGUCGGU A
ACC
UA
CCCUCGGG
A G G G GG A U A
A C C C C G G GAA
ACUGGGGCU
AAUCCCCC
AU
AG
GCCUGGG
GU
ACUG
GAAGGU
CCCCAGGC
CG
AAAGGGAGCCG
UA
A G G C U C C GCCCGAGG
AU
GG
GCCGGCGGC
CGA
UU
AG
GUA
GUU
GG
UG
GGG
UAACG
G CC
CA
CC
AA
GC
CGAA G A
UC
GG
UA
CGGGCCGUG
AG A
GC GGG A
GC C C G G
AG
AUGGACA
CUGAGACA C G
G G U C C AGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UCCG
CAAUGCGGG
AA
A C C G C G A C G G G GGG
AC
CC
CC
AGUGCCGUGCCUCU G G C A C G G C
UU
UU
CC
GG
AG
UGUAA A A
AG
CU
CC
GG
GA
AU
AA GGGCUGGGC
AA
G GCCGG
UGGCA
GCCGC C
GC G
GUAA
UA
CCGG
CGGCCCGA
GU
GG
UG
GC
CA
CUAUU
AU
UG G
GC
CUA
AA
GC
GGCCGUA
GCCGGGCCCGUA
AGUCCCUGGCGA
AAUCCCACGGCU
CA A C C G U G G G
G C UC G C U G G G G A
U AC U G C G G G C C
UUG
GGACCGGGA
GA
GGCCGGG
GGUACCCCCGGGGUA
GGGGUGA
A A U C CU
AU A
AUC C C G G G G G G A C C
GC C
AG
UGG C G
AA
GGCGCCCGGCUG
GAACGGGUCCGAC
GGUGAG
GGCC GA
AG
GCCAGGG
GAG
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUGGC U G
UAAA
GG
AUG C G G G C U A G
GU
GU
CG
GG
CG
AG
CU U C
GAG
CU
CG
CC
CG
GU
GC
CGUAG
GGA
AGC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACU
A C A A GGGGU
GGAGCGUGCGGU
UUAAU
UG
GA
UUCAAC G C
CG
GGAA
C C U C ACCGGGGGCG
ACGGCA
GGAUG
AAG
GC
CA
GGC
UGA
A GGU
CU
UG
CC G
GA
C GCGCCGA GA
GG
AGGUGC
A UG
GCCGCCGUCA
GCUCGUACCG
UGAGGC
G
U
CCAC
UU
A AGU
GUGGU
AA C G A G C
GA G A
CC C G C G C C C C C A G U U G C C
A GU C C C U C C C G
CU
CGGGAGGGAGGCACU
CUGGGGG
G
A
CUGCCGGCG
AUAAGCCGGAGG
AAGGGGCGGG
CGACGGUAGGU C
AGU
AUGCCCC G
AA
ACCCCCGG
GC
UAC
ACGCGCGCUAC A A U
GG
GC
GG
GAC
A A U GG
GUG C
CG
A C CC
C G AAAG
GGG
GAG
GU
AAU
CC
CC
UAAAC
CC
GC
CCUCA
GU
UCGGAUCGCGGGC
UGC
AACUCGCCCGCGU
GAAGC
UG
GAAUCCC
UAGUACCCGCGCGU
CA
UCAUC
GC
GC
GG
CGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
ACUCCACCC
GAGCGGGGCCUAGGUGAGGCCCGAUCUCCU
U CGGGAGGUCGGGUCGAGCCUAGGCUCCGUGA
GGGGGGAG
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
ACCUACGGCUCGAUCACCUCCUAUC
Secondary Structure: small subunit ribosomal RNA
Pyrodictium occultum(M21087)1.cellular organisms 2.Archaea 3.Crenarchaeota 4.Desulfurococcales 5.Pyrodictiaceae 6.Pyrodictium February 2001
5’
3’
Citation and related information available at http://www.rna.ccbb.utexas.edu
ACUC C G G U
U GAUCCUGCCGGGC
CC
GA
CC
GC
U
AUC
GG
GG
UG
G
GAC
U AAG
CC AUGG
G
A
G U CG U
G C G C C C C C G GAC
GCGGGGGCGCGGC
GG
ACGGC
UG
AGU
AACA
CGUGGCC A
ACC
UA
CCCUCGGG
A C G G GG A U A
A C C C C G G GAA
ACUGGGGCU
AAUCCCCG
AU
AG
GCGAGGG
GG
CC
UG
GAACG
GG
UCCCUCGC
CG
AAAGGGCGCGCGGAG
CCU
C C C C G C G C G C C GCCCGAGG
AU
GG
GGCUGCGGC
CCA
UC
AG
GUA
GUU
GG
CG
GGG
UAACG
G CC
CG
CC
AA
GC
CGAU A A
CG
GG
UA
GGGGCCGUG
AG A
GC GGG A
GN C C C C
AG
AUGGGCA
CUGAGACA A G
G G C C C AGG
CC
CUA
C GG
GG
CG C A C
CAGGC
G
CGAAACC
UCCG
CAAUGCGGG
CA
A C C G U G A C G G G GUC
AC
CC
CG
AGUGCCGCCGAUA A G G C G G C
UG
UU
CC
CC
GC
UGUAG G A
AG
GC
GG
GG
GA
GU
AA GCGGGGGGC
AA
G UCUGG
UGUCA
GCCGC C
GC G
GUAA
UA
CCAG
CCCCGCGA
GC
GG
UC
GG
GA
UGAUU
AC
UG G
GC
CUA
AA
GC
GCCCGUA
GCCGGCCCGGUA
AGUCCCCCCCUA
AAGCCCCGGGCU
CA A C C C G G G G
A GU G G G G G G G A
U AC U G C C G G G C
UAG
GGGGCGGGA
GA
GGCCGAG
GGUACUCCCGGGGUA
GGGGCGA
A A U C CG
AU A
AUC C C G G G A G G A C C
AC C
AG
UGG C G
AA
GGCGCUCGGCUG
GAACGC GCCCGAC
GGUGAG
GGGC GA
AA
GCCGGGG
GA G
CAAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCCGGC U G
UAAA
CG
AUG C G G G C U A G
GU
GU
UG
GG
CG
GG
CU U G
GAG
CC
CG
CC
CA
GU
GC
CGCAG
GGA
AGC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G N A G C ACC
A C A A GGGGU
GGAGCCUGCGGC
UUAAU
UG
GA
GUCAAC G C
CG
GGAA
U C U U ACCGGGGGCG
ACAGCA
GGAUG
ACG
GC
CA
GGC
UGA
C GAC
CU
UG
CC C
GA
CA
CGCUGA GA
GG
AGGUGC
A UG
GCCGUCGCCA
GCUCGUGCCG
UGAGGU
G
U
CCGG
UU
A AGU
CCGGC
AA C G A G C
GA G A
CC C C C A C C C C U A G U U G C U
A CC C C G G G G C
CACCCCGGGGG
CACACUAGGGG
G
A
CUGCCGNCG
UCC
AAGGCGGAGG
AAGGAGGGGG
CCACGGCAGGU C
AGC
AUGCCCC G
AA
UCCCCCGG
GC
UGC
ACGCGGGCUAC A A U
GG
CG
GG
GAC
A G C GG
GAU C
CG
A C CC
C G AAAG
GGG
GAG
GC
AAU
CC
CUU
AAAC
CC
CG
CCGCA
GU
UGGGAUCGAGGGC
UGC
AACUCGCCCUCGU
GAACG
CG
GAAUCCC
UAGUAACCGCGCGU
UA
GCAUC
GC
GC
GG
UGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
GCUCCACCC
GAGCGGGGGAGGGGUGAGGCCCCGUCC
CU
C G CCA
GGGCGGGGUCGAACCUCUCCCCCGCGA
GGGGGGAG
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
ACCUGCGGCUGGAUCACCUCCUN
Secondary Structure: small subunit ribosomal RNA
Picrophilus torridus DSM 9790(AE017261)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Thermoplasmata 5.Thermoplasmatales 6.Picrophilaceae 7.Picrophilus 8.Picrophilus torridus July 2004
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
ACUC C G G U
U GAUCCUGCCGGCG
GC
CA
CU
GC
U
AUC
AA
GU
UC
C
GAC
U AAG
CC AUGC
G
A
G U CAA
G G G G C C GUA
AGGCACCGGC
GU
ACCGC
UC
AGU
AACA
CGCGGAU A
AUC
UA
CCCUCGGG
A A G G GC
A UAA C C U C G G G
AA
ACUGAGGCU
AAUU
CCCUAU
AG
CCAUUCA
GAACUG
GAAUGUUUGGAUGG
UG
AAGGCU
CC
G G C GCCCGGGG
AU
GA
GUCUGCGGC
CUA
UC
AG
GUA
GUA
GG
UG
GUG
UAACG
G AC
CA
CC
UA
GC
CUAA G A
CG
GG
UA
CGGGCCCUG
GG A
GGGGU A
GC C C G G
AG
AUGGACUCUG
AGA
C A C UA G U C C A
GG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UGUG
CAAUGCGCG
CA
A G C G C G A C A C G GGG
AG
CU
UG
AGUGUCUUGGCAAA
A G C C A A G A CU
UU
UC
UU
AU
GCC
UAA A A
AG
CA
UA
AG
GA
AU
AA GGGCUGGGU
AA
G A CGGG
UGCCA
GCCGC C
GC G
GUAA
CA
CCCG
CAGCUCAA
GU
GG
UG
GU
CA
CUUUU
AC
UG A
GC
CUA
AA
GC
GUUCGUA
GCCGGCUUUGUA
AAUCUCCAGGUA
AAUUCUAGCGCU
UA A C G U U A G A
U CU C C U G G A G A
G AC U G C A A A G C
UUG
GGACCGGGU
GG
GGUUGAA
CGUACUUUCAGGGUA
GGGGUAA
A A U C CU
GU A
AUC C U G G A A G G A C G
AC C
GG
UAG C G
AA
GGCGUUCAACUA
GAACGGAUCCGAC
GGUGAG
GA A C GA
AG
GCUAGGG
GA G
CAAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUAGC U G
UAAA
CU
CUG C C C A C U U G
GU
GU
UG
CC
UU
UC
CG U U
GAG
GG
GA
GG
CA
GU
GC
CGGAG
CGA
AGG
UGUUA
AGUGGGC
CGCUU
G G GGA G U A
UG G U C G
CA
AGACUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACC
G C A A CGGGA
GGAAUGUGCGGU
UUAAU
UG
GA
UUCAAC G C
CG
GAAA
A C U C ACCGGGAGCG
ACCUGU
GGAUGA
GA
GU
CA
ACC
UGA
C GAG
UU
UA
CU C
GA
U AGCA GGA GA
GG
UGGUGC
A UG
GCCGUCGUCA
GCUCGUACCG
UAGGGC
G
U
UCAC
UU
A AGU
GUGA U
AA C G A G C
GA G A
CC C C C A U C U C C A A U U G C A
A UU A C C U A C G
CG
GGUAGGUAAGCACU
UUGGAGA
G
A
CCGCCAGCG
CUAAGCUGGAGG
AAGGAGGGGU
CGACGGCAGGUCA
GUACGCCCC G
AA
UCUCCCGG
GC
UAC
ACGCGCAUUAC A A A
GG
GC
AG
GAC
A A U GC
GU
U GCA
A CC
UC
G AAA
GG
GGAAG
CU
AAU
CG
CC
AAAC
CU
GU
CCGUA
GU
UAGGAUUGAGGGC
UGU
AACUCGCCCUCAU
GAAUC
UG
GAUUCCG
UAGUAAUCGCGGGU
CA
ACAAC
CC
GC
GG
UGA
AU
AU
GCCCC
UGCUCCUUGCA
CACACCGCCUG
UC
AAACCAUCC
GAGCUGGUGUUGGAU
GAGGGUUAGCUCG
A GAGGGUUAGCUCAA
AUCUGAUGUCAGUGA
GGAGGGUU
AAG
UCA U
AAC
A AG
G
U A U C U G U A G G GGA
ACCUGCAGAUGNNNNNNNNNNNN
Secondary Structure: small subunit ribosomal RNA
Sulfolobus solfataricus(X03235)1.cellular organisms 2.Archaea 3.Crenarchaeota 4.Sulfolobales 5.Sulfolobaceae 6.Sulfolobus February 2001
5’
3’
Citation and related information available at http://www.rna.ccbb.utexas.edu
AUUC C G G U
U GAUCCUGCCGGAC
CC
GA
CC
GC
U
AUC
GG
GG
UA
G
GGA
U AAG
CC AUGG
G
A
G U CU U
A C A C U C C C G GGU
AAGGGAGUGUGGC
GG
ACGGC
UG
AGU
AACA
CGUGGCU A
ACC
UA
CCCUCGGG
A C G G GG A U A
A C C C C G G GAA
ACUGGGGAU
AAUCCCCG
AU
AG
GGAAGGA
GU
CCUG
GAAUGG
UUCCUUCC
CU
AAAGGGCUAUAGGC
UAUUU
C C CG U U U G U A G C C G
CCCGAGG
AU
GG
GGCUACGGC
CCA
UC
AG
GCU
GUC
GG
UG
GGG
UAAAG
G CC
CA
CC
GA
AC
CUAU A A
CG
GG
UA
GGGGCCGUG
GA A
GC GGG A
GC C U C C
AG
UUGGGCA
CUGAGACA A G
G G C C C AGG
CC
CUA
C GG
GG
CG C A C
CAGGC
G
CGAAACG
UCCC
CAAUGCGCG
AA
A G C G U G A G G G C GCU
AC
CC
CG
AGUGCCUCCGCA
A G G A G G CU
UU
UC
CC
CG
CUC
UAA A A
AG
GC
GG
GG
GA
AU
AA GCGGGGGGC
AA
G UCUGG
UGUCA
GCCGC C
GC G
GUAA
UA
CCAG
CUCCGCGA
GU
GG
UC
GG
GG
UGAUU
AC
UG G
GC
CUA
AA
GC
GCCUGUA
GCCGGCCCACCA
AGUCGCCCCUUA
AAGUCCCCGGCU
CA A C C G G G G A
A CU G G G G G C G A
U AC U G G U G G G C
UAG
GGGGCGGGA
GA
GGCGGGG
GGUACUCCCGGAGUA
GGGGCGA
A A U C CU
UA G
AUA C C G G G A G G A C C
AC C
AG
UGG C G
GA
AGCGCCCCGCUA
GAACGC GCCCGAC
GGUGAG
A GGC GA
AA
GCCGGGG
CA G
CAAACGGG
AUU
A G AUAC
CCCG
GUA
GU
CCCGGC U G
UAAA
CG
AUG C G G G C U A G
GU
GU
CG
AG
UA
GG
CU U A
GAG
CC
UA
CU
CG
GU
GC
CGCAG
GGA
AGC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G U C G
CA
AGACUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACC
A C A A GGGGU
GGAACCUGCGGC
UCAAU
UG
GA
GUCAAC G C
CU
GGAA
U C U U ACCGGGGGAG
ACCGCA
GUAUGA
CG
GC
CA
GGC
UAA
C GAC
CU
UG
CC U
GA
C UCGCGGA GA
GG
AGGUGC
A UG
GCCGUCGCCA
GCUCGUGUUG
UGAAAU
G
U
CCGG
UU
A AGU
CCGGC
AA C G A G C
GA G A
CC C C C A C C C C U A G U U G G U
A UU C U G G A C U
CC
GGUCCAGAACCACA
CUAGGGG
G
A
CUGCCGGCG
UAAGCCGGAGG
AAGGAGGGGG
CCACGGCAGGU C
AGC
AUGCCCC G
AA
ACUCCCGG
GC
CGC
ACGCGGGUUAC A A U
GG
CA
GG
GAC
A A C GG
GAU G
CU
A C CU
C G AAAG
GGG
GAG
CC
AAU
CCU
UAAAC
CC
UG
CCGCA
GU
UGGGAUCGAGGGC
UGA
AACCCGCCCUCGU
GAACG
AG
GAAUCCC
UAGUAACCGCGGGU
CA
ACAAC
CC
GC
GG
UGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
GCUCCACCC
GAGCGCGAAAGGGGUGAGGUCCCUUGC
GA U A
AGUGGGGGAUCGAACUCCUUUCCCGCGA
GGGGGGAG
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
ACCUGCGGCUGGAUCACCUCNNN
Secondary Structure: small subunit ribosomal RNA
Thermoplasma acidophilum (AL445067, M38637)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Thermoplasmales 5.Thermoplasmaceae 6.Thermoplasma July 2004
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
ACUC C G G U
C GAUCCUGCCGGCG
GU
CA
CU
GC
U
AUC
AG
GU
UC
C
GAC
U AAG
CC AUGC
A
A
G U CA U
G G G G C C GUA
AGGCACCGGC
GA
ACAGC
UC
AGU
AACA
CGUGGAU A
AUU
UA
CCCUCAGG
C G G G GC A U A
A C C U C G G GAA
ACUGAGGCU
AAUUCCCC
AU
AG
UCAUUAC
AAAC
UG
GAAUG
GUUGUAAUGA
UG
AAAGCU
UC
U G C ACCUGAGG
AU
AA
GUCUGCGGC
CUA
UC
AG
GUA
GUA
GG
UG
GUG
UAAAG
G AC
CA
CC
UA
GC
CUAA G A
CG
GG
UA
CGGGCCCUG
AA A
GGGGG A
GC C C G G
AG
AUGGACUCUG
AGA
C A A CA G U C C A
GG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAAAC
UGUG
CAAUGCGCG
AA
A G C G C G A C A C G GGG
AG
CC
UG
AGUGCCUUGACUUUU U G U C A A G G C
UU
UU
CU
GA
UG
CCUAA A A
AG
CA
UC
AG
GA
AU
AA GGGCUGGGC
AA
G A CGGG
UGCCA
GCCGC C
GC G
GUAA
CA
CCCG
CAGCUCGA
GU
GG
UG
AU
CA
CUUUU
AU
UG A
GU
CUA
AA
GC
GUUCGUA
ACCGGUCUUAUA
AAUCUUCAGAUA
AAUUCUUCCGCU
UA A C G G A A G A
A CU U C U G A A G A
G AC U G U A A G A C
UUG
GGACCGGGU
GA
GGUUGAA
UGUACUUUCAGGGUA
GGGGUAA
A A U C CU
GU A
AUC C U G A A A G G A C G
AC C
GG
UGG C G
AA
AGCGUUCAACUA
GAACGGAUCCGAC
GGUGAG
GGA C GA
AG
GCUAGGG
GA G
CAAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUAGC U G
UAAA
CG
CUG C C C A C U U G
GU
GU
UG
CU
UC
UC
CG U U
GAG
GG
GG
GG
CA
GU
GC
CGUAG
CGA
AGG
UGUUA
AGUGGGU
CACUU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACC
G C A A CGGGA
GGAGCGUGCGGU
UUAAU
UG
GA
UUCAAC G C
CG
GAAA
A C U C ACCGGGAGCG
ACCUUC
GGAUGA
GG
GU
CA
ACC
UGA
C GAA
UU
UA
CC C
GA
U AGA A GGA GA
GG
UGGUGC
A UG
GCCGUCGUCA
GCUCGUACCG
UAGGGC
G
U
UCAC
UU
A AGU
GUGA U
AA C G A G C
AA G A
CC C C C A U C U C U A A U U G C C
A GA C U G U C U U
UG
CGGGCAGAGGCACU
UUAGAGG
G
A
CCGCCAGCG
CUAAGCUGGAGG
AAGGAGGGGU
CGACGGCAGGUCA
GUACGCCCC G
AA
UCUCCCGG
GC
UAC
ACGCGCGCUAC A A A
GG
AC
GG
GAC
A A U GA
GUU G
CA
A CC
UC
G AAAG
GG
GA
AGCU
AAC
CU
CG
AAAC
CC
GU
UCGUA
GU
CAGGACUGAGGGC
UGU
AACUCGCCCUCAC
GAAUG
UG
GAUUCCG
UAGUAAUCGUAGGU
CA
ACAGC
CU
AC
GG
UGA
AU
AU
GCCCC
UGCUCCUUGCA
CACACCGCCCG
UC
AAACCAUCC
GAGCUGGUGUUGGAU
GAGGGUCCGUCCU C U
GGAUGGAUUCGA
AUCUGAUGUCAGUGA
GGAGGGUU
AAG
UCGU
AAC
A AG
G
U A U C C G U A G G GGA
ACCUGCGGAUGGAUCACCUCCNN
Secondary Structure: small subunit ribosomal RNA
Thermococcus celer(M21529)1.cellular organisms 2.Archaea 3.Euryarchaeota 4.Thermococcales 5.Thermococcaceae 6.Thermococcus July 2004
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
AUUC C G G U
U GAUCCUGCCGGAG
GC
CA
CU
GC
U
AUG
GG
GG
UC
C
GAC
U AAG
CC AUGC
G
A
G U CA U
G G G G C G C C UU
GCGCGCACCGGC
GG
ACGGC
UC
AGU
AACA
CGUCGGU A
ACC
UA
CCCUCGGG
A G G G GG A U A
A C C C C G G GAA
ACUGGGGCU
AAUCCCCC
AU
AG
GCCUGAGG
UACUG
GAAGGUCCUCAGGC
CG
AAAGGGGCU
CU G C C C G
CCCGAGG
AU
GG
GCCGGCGGC
CGA
UU
AG
GUA
GUU
GG
UG
GGG
UAACG
G CC
CA
CC
AA
GC
CGAA G A
UC
GG
UA
CGGGCCAUG
AG A
GU GGG A
GC C C G G
AG
AUGGACA
CUGAGACA C G
G G U C C AGG
CC
CUA
C GG
GG
CG C A G
CAGGC
G
CGAAACC
UCCG
CAAUGCGGG
CA
A C C G C G A C G G G GGG
AC
CC
CC
AGUGCCGUGGCAAC G C C A C G G C
UU
UU
CC
GG
AG
UGUAA A A
AG
CU
CC
GG
GA
AU
AA GGGCUGGGC
AA
G GCCGG
UGGCA
GCCGC C
GC G
GUAA
UA
CCGG
CGGCCCGA
GU
GG
UG
GC
CG
CUAUU
AU
UG G
GC
CUA
AA
GC
GUCCGUA
GCCGGGCCCGUA
AGUCCCUGGCGA
AAUCCCACGGCU
CA A C C G U G G G
G C UU G C U G G G G A
U AC U G C G G G C C
UUG
GGACCGGGA
GA
GGCCGGG
GGUACCCCUGGGGUA
GGGGUGA
A A U C CU
AU A
AUC C C A G G G G G A C C
GC C
AG
UGG C G
AA
GGCGCCCGGCUG
GAACGGGUCCGAC
GGUGAG
GGA C GA
AG
GCCAGGG
GA G
CGAACCGG
AUU
A G AUAC
CCGG
GUA
GU
CCUGGC U G
UAAA
GG
AUG C G G G C U A G
GU
GU
CG
GG
CG
AG
CU U C
GAG
CU
CG
CC
CG
GU
GC
CGAAG
GGA
AGC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G G A G C ACU
A C A A GGGGU
GGAGCGUGCGGU
UUAAU
UG
GA
UUCAAC G C
CG
GGAA
C C U C ACCGGGGGCG
ACGGCA
GGAUGA
AG
GC
CA
GGC
UGA
A GGU
CU
UG
CC G
GA
CACGCCGA GA
GG
AGGUGC
A UG
GCCGCCGUCA
GCUCGUACCG
UGAGGC
G
U
CCAC
UU
A AGU
GUGGU
AA C G A G C
GA G A
CC C G C G C C C C C A G U U G C C
A GU C C U U C C C G
CUGGGGAGGAGG
CACUCUGGGGG
G
A
CCGCCGGCG
AUAAGCCGGAGG
AAGGAGCGGG
CGACGGUAGGU C
AGU
AUGCCCC G
AA
ACCCCCGG
GC
UAC
ACGCGCGCUAC A A U
GG
GC
GG
GAC
A A U GG
GAU C
CG
A CC
CC
G AAA
GG
GGA
AGGG
AAU
CC
CC
UAAAC
CC
GC
CCCCA
GU
UCGGAUCGCGGGC
UGC
AACUCGCCCGCGU
GAAGC
UG
GAAUCCC
UAGUACCCGCGUGU
CA
UCAUC
GC
GC
GG
CGA
AU
AC
GUCCC
UGCUCCUUGCA
CACACCGCCCG
UC
ACUCCACCC
GAGCGGGGUCCGGAU
GAGGCCUGAUCUCCCU
U CGGGGAGGUCGGGUCGA
GUCUGGGCUCCGUGA
GGGGGGAG
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
ACCUACGGCUCGAUCACCUCCUN
Secondary Structure: small subunit ribosomal RNA
Thermoproteus tenax(M35966)1.cellular organisms 2.Archaea 3.Crenarchaeota 4.Thermoproteales 5.Thermoproteaceae 6.Thermoproteus February 2001
5’
3’
Citation and related information available at http://www.rna.icmb.utexas.edu
AAAC C G G U
U GAUCCUGCCGGAC
CU
GA
CC
GC
U
AUC
GG
GG
UG
G
GGC
U AAG
CC AUGC
G
A
G U CG C
G C G C C C G G GGC
GCCGGGCGCGGC
GC
ACGGC
UC
AGU
AACA
CGUACCC A
ACC
UA
ACCUCGGG
A G G G GG A C A
A C C C C G G GAA
ACUGGGGCU
GAUCCCCC
AU
AG
GGGAAGG
GC
GCUG
GAAGGC
CCCUUCCU
CC
AAAGGGAUCGCGGGCG
AU
C U C C C G C G G U C C GCCCGAGG
GU
GG
GGGUACGGC
CCA
UC
AG
GUU
GUU
GG
CG
GGG
UAACG
G CC
CG
CC
AA
GC
CGAA G A
CG
GG
UA
GGGGCGGUG
AG A
GC C GU GA
GC C C C G
AG
AUGGGCA
CUGAGACA A G
G G C C C AGG
CC
CUA
C GG
GG
UG C A G
CAGGC
G
CGAAUAC
UCCG
CAAUGCGGG
CA
A C C G C G A C G G G GCC
AC
CC
CG
AGUGCCGGGCGAAGA G C C C G G C
UU
UU
GC
CC
GG
UGUAA G G
AG
CC
GG
GC
GA
AU
AA GCGGGGGGU
AA
G UCUGG
UGUCA
GCCGC C
GC G
GUAA
UA
CCAG
CCCCGCGA
GU
GG
UC
AG
GG
UGAUU
AC
UG G
GC
UUA
AA
GC
GCCCGUA
GCCGGCCCGGCA
AGUCGCUCCUGA
AAUCCCCAGGCU
CA A C C U G G G G
G CA G G G G G C G A
U AC U G C C G G G C
UAG
GGGGCGGGA
GA
GGCCGCC
GGUACUCCGGGGGUA
GGGGCGA
A A U C CU
AU A
AUC C C C G G A G G A C C
AC C
AG
UGG C G
AA
AGCGGGCGGCCA
GAACGC GCCCGAC
GGUGAG
GGGC GA
AA
GCCGGGG
GA G
CAAAGGGG
AUU
A G AUAC
CCCU
GUA
GU
CCCGGC C G
UAAA
CG
AUG C G G G C U A G
CU
GU
CG
GC
CG
GG
CU U A
GGG
CC
CG
GC
CG
GU
GG
CGUAG
GGA
AAC
CGUUA
AGCCCGC
CGCCU
G G GGA G U A
CG G C C G
CA
AGGCUGAAA
CUUAA
A G G A A U U G G C GG
G G G G G C ACC
A C A A GGGGU
GAAGCUUGCGGC
UUAAU
UG
GA
GUCAAC G C
CG
GAAA
C C U U ACCCGGGGCG
ACAGCA
GGAUGA
AG
GC
CA
GGC
UAA
C GAC
CU
UG
CC G
GA
C GAGCUGA GA
GG
AGGUGC
A UG
GCCGUCGUCA
GCUCGUGCCG
UGAGGU
G
U
CCGG
UU
A AGU
CCGGC
AA C G A G C
GA G A
CC C C C A C C C C U A G U U G C U
A CC C C G C U C U
UC
GGGGCGGGGGCACA
CUAGGGG
G
A
CUGCCGGCG
UAAGCCGGAGG
AAGGAGGGGG
CGACGGCAGGU C
AGU
AUGCCCC G
AA
ACCCCGGG
GC
UGC
ACGCGAGCUGC A A U
GG
CG
GG
GAC
A G C GG
GAU C
CG
A C CC
C G AAAG
GGG
GAG
GC
AAU
CC
CG
UAAAC
CC
CG
CCCCA
GU
AGGGAUCGAGGGC
UGC
AACUCGCCCUCGU
GAACG
UG
GAAUCCC
UAGUAACCGCGUGU
CA
CCAAC
GC
GC
GG
UGA
AU
AC
GUCCC
UGCCCCUUGCA
CACACCGCCCG
UC
GCACCACCC
GAGGGAGUUCUCUGCGAGGCCCCUCGCUUGGGG
C AACCCAGGUGGGGGGACGAGCAGAGAACUCCCGA
GGGGGGUG
AAG
UCGU
AAC
A AG
G
U A G C C G U A G G GGA
ACCUGCGGCUGGAUCACCUCCNN
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Phil Hugenholtz’s manually created mask imposed on archaeal SSUblack: included in alignment (1257) pink: excluded from alignment (251)
created by the SSU-ALIGN packagestructure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/)
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Posterior probability based archaeal SSU maskblack: included in alignment (1376)blue: excluded from alignment (132)
created by the SSU-ALIGN packagestructure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/)
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Phil Hugenholtz’s manually created mask imposed on archaeal SSUblack: included in alignment (1257) pink: excluded from alignment (251)
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Posterior probability based archaeal SSU maskblack: included in alignment (1376)blue: excluded from alignment (132)
created by the SSU-ALIGN packagestructure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/)
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Comparison of manual and posterior-probability-based masksblack: included in both alignment (1216)
pink: excluded only from manual mask (160)blue: excluded only from PP mask (41)grey: excluded from both masks (91)
created by the SSU-ALIGN packagestructure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/)
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
Posterior probability based archaeal SSU maskblack: included in alignment (1376)blue: excluded from alignment (132)
Probability-based mask colored by deletion frequencygrey: zero to very few deletions (1370/1376)blue: few deletions ...... red: many deletions
circles indicate excluded positions
5ʼ
3ʼ
100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
created by the SSU-ALIGN packagestructure diagram derived from CRW database (http://www.rna.ccbb.utexas.edu/)
Automated masking removes the majority of alignment errors
alignment timeaccuracy (sec/seq)
Muscle-3.8.31∗ 95.4% 0.49
HMMER3 (HMMs) 96.8% 0.04
Infernal 1.1 (CMs) 98.1% 0.50
Infernal 1.1 (CMs)posterior probability masked 99.5% 0.50
(1302/1530 columns)
Infernal produces alignment that are
very similar to manually refined alignments.
SSU-ALIGN: structural alignment of SSU rRNAs using CMs
known tree plusnew sequences
new sequences aligned
new SSU sequences
SSU-ALIGN
tree buildingprogram
tree buildingprogram
existing, known referenceSSU alignment and tree- accurate: because alignment errors confound
Goals of the alignment program:- accurate: because alignment errors confound phylogenetic inference- scalable to handle up to millions of seqs and fast:
SSU-ALIGNIncludes Infernal CMs for archaeal, bacterialand eukaryotic SSU rRNA accurate:- structural alignment of sequences- probabilistic masking of ambiguous columns
scalable and fast:- can generate alignment of millions of seqs- speed is about 1 second/full length sequence- easily parallelized on clusters SSU-ALIGN tutorial tomorrow at 10:30
Acknowledgements
Sean EddyElena RivasTravis WheelerTom JonesDiana KolbeSeolkyoung JungRob FinnJody ClementsFred DavisLee HenryMichael Farrar
Top Related