Next Generation Sequencing Technologies in Microbial Ecology
Transcript of Next Generation Sequencing Technologies in Microbial Ecology
1
Next Generation Sequencing Technologies in
Microbial Ecology
Frank Oliver Glöckner
2
Max Planck Institute for Marine Microbiology
Founded 1992 in Bremen, Germany
Investigation of the role, diversity and features of microorganisms
Interactions with physical and chemical processes in marine and other aquatic habitats
3
Marine Microbiology at MPI – a Holistic Approach
Who is out there and
How much of which kind?
What are they doing and
Under which conditions are they doing what?
4
Promise of NGS: Much Denser Network of Data
Phylogenetic diversity• Qualitative data
• Quantitative data
Functional diversity• Functional inventory
• Operon structures
• Expression profiles
Environmental descriptors
-> Integrated datasets
Organisms
Genes
Environment
x, y, z, t
5
Data Integration
www.megx.net Kottmann et al., NAR, submitted
6
Ecological Genomics – The Vision
Key parametersStatistics
Modelling Predictions
Ecosystems Biology
7
Ribosomal RNA as a universal marker gene
sample
extracted nucleic acidsDNA rRNA
rDNA clonesnucleic acid probe
rDNAsequences
rDNAdataset
comparative analysis
hybridization phylogeny
Full cycle rRNA-approach
Pyrosequencing
Amann, 1995
8
Diversity Analysis
Clone lib
PCR
100-500
High diversity
2-3 month
Sample
Pedros-Alio, Trends in Microbiology, 2006, vol. 12, issue 6, page 257
9
Diversity Analysis
Clone lib
PCR
Tags
100-500
10,000-50,000
High diversity
2-3 month
1 week
Sample
10
Problems
Processing the data
Accuracy/Quantitative?
• DNA/RNA extraction
• Multiple Operons
• Technical replicates
• Noise (sequencing errors…)
11
SILVA Databases Specifications
Comprehensive & Aligned
• Bacteria, Archaea, Eukarya
• SSU, LSU
• Regularly updated
Quality first
• Quality management
• Transparent process documentation
Integrative
• Nomenclature
• Taxonomy
• Cultured, Typestrains
• Habitat (r100)
12
Growth of rRNA databases (RDP & SILVA)Growth of SSU ribosomal RNA databases (RDP II & SILVA)
www.arb-silva.de
473 1379 2251 2849 2849 4332 6205 7322 16277 1627760274
83960 101781
194696
286257
504295
756668
995747
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
SILVA 100
rRN
A S
eque
nces
Pruesse et al. NAR 2007, vol. 35, no 21, page 7188
Comprehensive ribosomal RNA databasesComprehensive ribosomal RNA databases
www.arb-silva.de
13
SILVA SSURef 100: Fully classified guide tree
www.arb-silva.de
14
ARB Software Suite
A Software Environment for Sequence Data
www.arb-home.de
Ludwig et al. NAR, 2004
ARB 5.0, 64 bit version released on 04. September 2009
15
Problems
Processing the data
Accuracy/Quantitative?
• DNA/RNA extraction
• Multiple Operons
• Noise (sequencing errors…)
• Technical replicates
16
Accuracy
The ‘rare biosphere’: a reality checkReeder and Knight, 2009, Nature Methods vol. 6, no. 9, p. 636
17
SILVA SSUParc 100, Sequence Length Distributionwww.arb-silva.de
0
10000
20000
30000
40000
50000
60000
70000
80000
90000
100000
0 100
200
300
400
500
600
700
800
900
1000
1100
1200
1300
1400
1500
1600
1700
1800
1900
2000
Length (bases)
rRN
A S
eque
nces
Comprehensive ribosomal RNA databasesComprehensive ribosomal RNA databases
18
Technical Replicates I
Helgoland Sample 11.02.2009, 454 Ti 1/2 PTP
Gomez-Alvarez et al., 2009, ISME Journal, pages 1-4
19
Technical Replicates II
Helgoland Sample 14.04.2009, 454 Ti 1/2 PTP
20
Technical Replicates III - Dereplication
0,0%
5,0%
10,0%
15,0%
20,0%
25,0%
Acidobacteria
Actinobacteria
Am
oeba
Bacteroidetes
BD1-5
Betaproteobacteria
Candidate division B
RC
1
Candidate division O
D1
Candidate division O
P11
Candidate division O
P3
Candidate division O
P8
Candidate division S
R1
Candidate division TM
6
Candidate division W
S1
Candidate division W
S3
Candidate division W
S6
Chlorobi
Chloroflexi
Crenarchaeota
Cyanobacteria
Deferribacteres
Deltaproteobacteria
Epsilonproteobacteria
Euglenozoa
Euryarchaeota
Firmicutes
Fungi
Fusobacteria
Gam
maproteobacteria
Gam
maproteobacteria_1
Gam
maproteobacteria_2
Gam
maproteobacteria_3
Gam
maproteobacteria_4
Gem
matim
onadetes
Lentisphaerae
Metazoa
ML635J-21
Planctomycetes
Rhodophyta et al.
SH
A-109
Spirochaetes
TA06
Viridiplantae
Unclassified
SSU Dereplicated
21
A Bioinformatic Workbench for Ecological Genomics
Organisms
Genes
Environment
A Software Environment for Sequence Data
Comprehensive ribosomal RNA databases
22
Functional Diversity Analysis
Fosmids
40,000
2-3 month
Random sequencing
End sequencing
10-100 ~ 400-4000 ORFs
20,000 ‘ORFs’
NGS1000
40,000 ORFs
DNA
High diversity
Sample
23
Fosmid Sequencing
Fosmids: 42PTP: 1/4, 454 Timin.: 101 bpmax.: 48,265 bpmean: 8,085 bpCosts: 3000 Euro
48 contigs > 10 kb
24
Functional Diversity Analysis
High diversity
SampleTags
500,000 - 1 Mio1 week
StatisticsBLASTCOGs
ClassifyAssembly?
DNA
25
Assembly
26
Are we prepared for the Data Flood?
27
Computing Infrastructure
Storage
- 8 TByte RAID file server
- 4 TByte RAID database server
Computers
- 43 cluster nodes
- several larger servers
- 400 CPU cores
Cooling
- Liquid coolingwith 5,000 L/h at 8°C
Power
- 25 - 30 kW at peak
- installed: 40 kVA
28
Moore’s Law - Outcompeted
29
Take Home Message for the Next Generation Biologists
Three languages!
• Mother tongue
• English
• Perl, Phyton…
• Data managementGarbage in -> garbage out!
• Standardisation
www.gensc.org
30
The Group http://www.microbial-genomics.de
Thanks for your attention