Next Generation Sequencing Technologies in Microbial Ecology

Page 1: Next Generation Sequencing Technologies in Microbial Ecology

Next Generation Sequencing Technologies in Microbial Ecology

Frank Oliver Glöckner

Page 2: Next Generation Sequencing Technologies in Microbial Ecology

Max Planck Institute for Marine Microbiology

Founded 1992 in Bremen, Germany

Investigation of the role, diversity and features of microorganisms

Interactions with physical and chemical processes in marine and other aquatic habitats

Page 3: Next Generation Sequencing Technologies in Microbial Ecology

Marine Microbiology at MPI – a Holistic Approach

Who is out there, and how much of which kind?

What are they doing, and under which conditions are they doing what?

Page 4: Next Generation Sequencing Technologies in Microbial Ecology

Promise of NGS: Much Denser Network of Data

Phylogenetic diversity

• Qualitative data

• Quantitative data

Functional diversity

• Functional inventory

• Operon structures

• Expression profiles

Environmental descriptors

-> Integrated datasets

Organisms

Genes

Environment

x, y, z, t

Page 5: Next Generation Sequencing Technologies in Microbial Ecology

Data Integration

www.megx.net

Kottmann et al., NAR, submitted

Page 6: Next Generation Sequencing Technologies in Microbial Ecology

Ecological Genomics – The Vision

Key parameters

Statistics

Modelling

Predictions

Ecosystems Biology

Page 7: Next Generation Sequencing Technologies in Microbial Ecology

Ribosomal RNA as a universal marker gene

Full cycle rRNA approach (Amann, 1995)

Diagram labels:

sample

extracted nucleic acids (DNA, rRNA)

rDNA clones

rDNA sequences

rDNA dataset

comparative analysis

phylogeny

nucleic acid probe

hybridization

Pyrosequencing

Page 8: Next Generation Sequencing Technologies in Microbial Ecology

Diversity Analysis

Sample (high diversity)

PCR

Clone library: 100-500, 2-3 months

Pedrós-Alió, Trends in Microbiology, 2006, vol. 14, issue 6, page 257

Page 9: Next Generation Sequencing Technologies in Microbial Ecology

Diversity Analysis

Sample (high diversity)

PCR

Clone library: 100-500, 2-3 months

Tags: 10,000-50,000, 1 week

Page 10: Next Generation Sequencing Technologies in Microbial Ecology

Problems

Processing the data

Accuracy/Quantitative?

• DNA/RNA extraction

• Multiple Operons

• Technical replicates

• Noise (sequencing errors…)
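
The "Multiple Operons" point can be illustrated with a small sketch: taxa that carry several rRNA operons per genome contribute proportionally more tags, so raw tag fractions overstate them. The example below divides tag counts by an assumed operon copy number before normalising. The taxon names, tag counts, and copy numbers are hypothetical and not taken from the talk, and the function name is introduced here only for illustration.

```python
def copy_number_corrected(tag_counts, copies_per_genome):
    """Return relative abundances after dividing tag counts by rRNA operon copy number."""
    # One genome with k operons yields roughly k tags, so divide by k.
    corrected = {taxon: tag_counts[taxon] / copies_per_genome[taxon]
                 for taxon in tag_counts}
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example data (not from the talk).
tag_counts = {"taxon_A": 700, "taxon_B": 300}
copies_per_genome = {"taxon_A": 7, "taxon_B": 1}

raw = {t: n / sum(tag_counts.values()) for t, n in tag_counts.items()}
result = copy_number_corrected(tag_counts, copies_per_genome)

print("raw fractions:      ", raw)
print("corrected fractions:", result)
```

With these made-up numbers, taxon_A dominates the raw tag fractions but is the minority once its seven assumed operons are accounted for. In practice the copy number is often unknown for environmental taxa, which is part of why quantification remains an open question.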

Page 11: Next Generation Sequencing Technologies in Microbial Ecology

SILVA Databases Specifications

Comprehensive & Aligned

• Bacteria, Archaea, Eukarya

• SSU, LSU

• Regularly updated

Quality first

• Quality management

• Transparent process documentation

Integrative

• Nomenclature

• Taxonomy

• Cultured, type strains

• Habitat (r100)

Page 12: Next Generation Sequencing Technologies in Microbial Ecology

Growth of SSU ribosomal RNA databases (RDP II & SILVA)

www.arb-silva.de

[Figure: bar chart of the number of rRNA sequences per year, x-axis 1992 to 2008 followed by SILVA 100, y-axis "rRNA sequences" from 0 to 1,000,000. Bar values as extracted, in order: 473, 1379, 2251, 2849, 2849, 4332, 6205, 7322, 16277, 16277, 60274, 83960, 101781, 194696, 286257, 504295, 756668, 995747.]

Pruesse et al., NAR 2007, vol. 35, no. 21, page 7188

Page 13: Next Generation Sequencing Technologies in Microbial Ecology

SILVA SSURef 100: Fully classified guide tree

www.arb-silva.de

Page 14: Next Generation Sequencing Technologies in Microbial Ecology

ARB Software Suite

A Software Environment for Sequence Data

www.arb-home.de

Ludwig et al. NAR, 2004

ARB 5.0, 64-bit version, released on 4 September 2009

Page 15: Next Generation Sequencing Technologies in Microbial Ecology

Problems

Processing the data

Accuracy/Quantitative?

• DNA/RNA extraction

• Multiple Operons

• Noise (sequencing errors…)

• Technical replicates

Page 16: Next Generation Sequencing Technologies in Microbial Ecology

Accuracy

The ‘rare biosphere’: a reality check

Reeder and Knight, 2009, Nature Methods, vol. 6, no. 9, p. 636

Page 17: Next Generation Sequencing Technologies in Microbial Ecology

SILVA SSUParc 100, Sequence Length Distribution

www.arb-silva.de

[Figure: histogram of sequence lengths; x-axis "Length (bases)" from 0 to 2000, y-axis "rRNA sequences" from 0 to 100,000.]
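
A length distribution of this kind can be computed by binning sequence lengths. The following is a minimal sketch, assuming the lengths are already available as a list of integers; the bin width of 100 bases, the function name, and the example lengths are assumptions made for illustration and are not SILVA data.

```python
from collections import Counter


def length_histogram(lengths, bin_width=100):
    """Count sequences per length bin; keys are the lower edge of each bin."""
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(bins.items()))


# Hypothetical sequence lengths in bases (not SILVA data).
lengths = [312, 455, 480, 1450, 1499, 1502]
hist = length_histogram(lengths)

for lower_edge, count in hist.items():
    print(f"{lower_edge}-{lower_edge + 99} bases: {count}")
```

Short pyrosequencing tags and near-full-length sequences would show up as separate peaks in such a histogram, which is one way to check what a dataset actually contains before analysis.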

Page 18: Next Generation Sequencing Technologies in Microbial Ecology

Technical Replicates I

Helgoland Sample 11.02.2009, 454 Ti 1/2 PTP

Gomez-Alvarez et al., 2009, ISME Journal, pages 1-4

Page 19: Next Generation Sequencing Technologies in Microbial Ecology

Technical Replicates II

Helgoland Sample 14.04.2009, 454 Ti 1/2 PTP

Page 20: Next Generation Sequencing Technologies in Microbial Ecology

Technical Replicates III - Dereplication

[Figure: bar chart "SSU Dereplicated", percentage per taxonomic group, y-axis 0.0% to 25.0%. Groups: Acidobacteria, Actinobacteria, Amoeba, Bacteroidetes, BD1-5, Betaproteobacteria, Candidate division BRC1, Candidate division OD1, Candidate division OP11, Candidate division OP3, Candidate division OP8, Candidate division SR1, Candidate division TM6, Candidate division WS1, Candidate division WS3, Candidate division WS6, Chlorobi, Chloroflexi, Crenarchaeota, Cyanobacteria, Deferribacteres, Deltaproteobacteria, Epsilonproteobacteria, Euglenozoa, Euryarchaeota, Firmicutes, Fungi, Fusobacteria, Gammaproteobacteria, Gammaproteobacteria_1, Gammaproteobacteria_2, Gammaproteobacteria_3, Gammaproteobacteria_4, Gemmatimonadetes, Lentisphaerae, Metazoa, ML635J-21, Planctomycetes, Rhodophyta et al., SHA-109, Spirochaetes, TA06, Viridiplantae, Unclassified.]
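
As a rough illustration of dereplication, the sketch below collapses reads with identical sequences into one representative with a count. This is the simplest possible variant (exact, case-insensitive matches only) and may differ from the procedure actually used for the slide; the function name and the example reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse exact duplicate reads into (sequence, count) pairs, most abundant first."""
    counts = Counter(seq.upper() for seq in reads)
    return counts.most_common()


# Hypothetical reads (not from the Helgoland samples).
reads = ["ACGTAC", "acgtac", "ACGTAC", "TTGACA", "TTGACA", "GGCATC"]
unique = dereplicate(reads)

for seq, n in unique:
    print(seq, n)
```

Keeping the count next to each unique sequence makes it possible to compare community profiles before and after dereplication, as the technical replicate slides do.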

Page 21: Next Generation Sequencing Technologies in Microbial Ecology

A Bioinformatic Workbench for Ecological Genomics

Organisms

Genes

Environment

A Software Environment for Sequence Data

Comprehensive ribosomal RNA databases

Page 22: Next Generation Sequencing Technologies in Microbial Ecology

Functional Diversity Analysis

Fosmids

40,000

2-3 months

Random sequencing

End sequencing

10-100 ~ 400-4000 ORFs

20,000 ‘ORFs’

NGS

1000

40,000 ORFs

DNA

High diversity

Sample

Page 23: Next Generation Sequencing Technologies in Microbial Ecology

Fosmid Sequencing

Fosmids: 42

PTP: 1/4, 454 Ti

min.: 101 bp

max.: 48,265 bp

mean: 8,085 bp

Costs: 3000 Euro

48 contigs > 10 kb
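
Summary figures like the minimum, maximum, and mean contig length and the number of contigs above 10 kb can be derived from a list of contig lengths. The sketch below shows one way to do this; the function name and the example lengths are hypothetical and do not reproduce the assembly from the slide.

```python
from statistics import mean


def contig_summary(lengths, threshold=10_000):
    """Summarise contig lengths (in bp) and count contigs longer than the threshold."""
    return {
        "n": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "n_above_threshold": sum(1 for x in lengths if x > threshold),
    }


# Hypothetical contig lengths in bp (not the real assembly).
lengths = [101, 950, 4_200, 12_500, 48_265]
summary = contig_summary(lengths)

for key, value in summary.items():
    print(f"{key}: {value}")
```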

Page 24: Next Generation Sequencing Technologies in Microbial Ecology

Functional Diversity Analysis

High diversity

Sample

Tags: 500,000 - 1 million, 1 week

Statistics

BLAST

COGs

Classify

Assembly?

DNA

Page 25: Next Generation Sequencing Technologies in Microbial Ecology

Assembly

Page 26: Next Generation Sequencing Technologies in Microbial Ecology

Are we prepared for the Data Flood?

Page 27: Next Generation Sequencing Technologies in Microbial Ecology

Computing Infrastructure

Storage

- 8 TByte RAID file server

- 4 TByte RAID database server

Computers

- 43 cluster nodes

- several larger servers

- 400 CPU cores

Cooling

- Liquid cooling with 5,000 L/h at 8°C

Power

- 25 - 30 kW at peak

- installed: 40 kVA

Page 28: Next Generation Sequencing Technologies in Microbial Ecology

Moore’s Law - Outcompeted

Page 29: Next Generation Sequencing Technologies in Microbial Ecology

Take Home Message for the Next Generation Biologists

Three languages!

• Mother tongue

• English

• Perl, Python…

• Data management

Garbage in -> garbage out!

• Standardisation

www.gensc.org

Page 30: Next Generation Sequencing Technologies in Microbial Ecology

The Group http://www.microbial-genomics.de

Thanks for your attention