04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

MISAEL FERNANDEZ MENTOR: GIRI NARASIMHAN A Study of the Lung Microbiome in Chronic Obstructive Pulmonary Disease (COPD) Using Metagenomics

Page 1: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

MISAEL FERNANDEZ
MENTOR: GIRI NARASIMHAN

A Study of the Lung Microbiome in Chronic Obstructive Pulmonary Disease (COPD) Using Metagenomics

Page 2: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Microbial Communities

Page 3: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Metagenomics Is Like Solving a Puzzle

Page 4: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

A Modular Analytical Workflow

Data Preprocessing
• Screen for Quality
• Contamination Removal

Classification
• Assign Taxonomies
• Group Sequences

Single-Sample Analysis
• Estimate Richness
• Estimate Diversity

Multiple-Sample Analysis
• Compare Samples
• Additional Statistics

OVER 30 STEPS
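
The modular structure above can be illustrated with a toy sketch in which each stage is a small function and the stages are chained per sample. This is not the pipeline used in the study: the function names, the length threshold, and the dictionary-based `lookup` classifier are all hypothetical stand-ins chosen only to show how the modules connect.

```python
from collections import Counter


def preprocess(reads, min_length=200):
    """Data preprocessing: drop short reads and reads with ambiguous bases.
    The threshold is an illustrative assumption, not a value from the slides."""
    return [r for r in reads if len(r) >= min_length and "N" not in r]


def classify(reads, lookup):
    """Classification: assign a genus label to each read.
    `lookup` is a hypothetical read-to-genus dictionary standing in for a real classifier."""
    return [lookup.get(r, "unclassified") for r in reads]


def single_sample_summary(labels):
    """Single-sample analysis: per-genus read counts and observed richness."""
    counts = Counter(labels)
    return {"richness": len(counts), "counts": counts}


def run_workflow(samples, lookup, min_length=200):
    """Run the stages per sample; the resulting summaries feed multiple-sample comparison."""
    return {
        name: single_sample_summary(classify(preprocess(reads, min_length), lookup))
        for name, reads in samples.items()
    }
```

Because each stage only consumes the previous stage's output, any one module can be swapped without touching the others.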

Page 5: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Richness vs. Diversity

Low Diversity High Diversity

Equal Richness
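
A small numerical example may make the distinction concrete. The sketch below uses two made-up communities with the same number of genera but different evenness, and computes the inverse Simpson index from simple proportions. The counts are invented for illustration, and the proportion-based formula is an assumption; the study may have used a different variant.

```python
def richness(counts):
    """Number of genera observed at least once."""
    return sum(1 for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index, 1 / sum(p_i^2), from per-genus read counts."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


# Two illustrative communities with equal richness (4 genera each).
even_community = [25, 25, 25, 25]    # reads spread evenly: high diversity
skewed_community = [97, 1, 1, 1]     # one dominant genus: low diversity

for name, counts in [("even", even_community), ("skewed", skewed_community)]:
    print(name, richness(counts), round(inverse_simpson(counts), 2))
```

Both communities report the same richness, while the index is much lower for the community dominated by a single genus.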

Page 6: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Classification Accuracy

Each cell is Mean (St. Dev.) at the given substitution rate.

Kingdom
Length    0% Subst.          5% Subst.          10% Subst.         15% Subst.         20% Subst.         25% Subst.
400 bp    100.00% (0.00%)    99.99% (0.01%)     99.57% (0.09%)     95.09% (0.28%)     75.07% (4.64%)     43.68% (12.88%)
300 bp    100.00% (0.00%)    99.97% (0.01%)     99.16% (0.13%)     91.06% (4.66%)     66.47% (17.82%)    39.57% (26.71%)
200 bp    100.00% (0.00%)    99.84% (0.11%)     96.51% (3.07%)     80.63% (17.16%)    55.46% (34.64%)    37.50% (39.21%)
100 bp    99.91% (0.10%)     96.96% (3.71%)     81.04% (22.63%)    59.84% (43.62%)    46.06% (51.46%)    38.65% (50.81%)

Genus
Length    0% Subst.          5% Subst.          10% Subst.         15% Subst.         20% Subst.         25% Subst.
400 bp    92.65% (19.55%)    81.99% (28.57%)    49.03% (38.96%)    15.99% (23.13%)    2.05% (6.25%)      0.08% (0.74%)
300 bp    88.84% (22.60%)    74.29% (30.31%)    36.45% (32.66%)    8.62% (14.84%)     0.94% (3.50%)      0.04% (0.53%)
200 bp    82.06% (26.21%)    56.87% (30.29%)    19.91% (21.47%)    3.65% (7.11%)      0.30% (1.40%)      0.01% (0.21%)
100 bp    56.21% (29.54%)    20.82% (16.77%)    4.24% (5.83%)      0.51% (1.53%)      0.06% (0.50%)      0.00% (0.03%)
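
The slides do not say how substitutions were introduced into the test sequences. As a minimal sketch, assuming each base is replaced independently with the stated probability by a different base chosen uniformly, a read could be perturbed as follows. The function name and the uniform-replacement model are assumptions for illustration.

```python
import random

BASES = "ACGT"


def substitute(seq, rate, rng):
    """Return a copy of `seq` in which each base is replaced, with probability
    `rate`, by a different base drawn uniformly from the other three."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(0)      # fixed seed so the sketch is repeatable
read = "ACGT" * 100         # illustrative 400 bp read
mutated = substitute(read, 0.10, rng)
observed_rate = sum(a != b for a, b in zip(read, mutated)) / len(read)
print(f"observed substitution rate: {observed_rate:.3f}")
```

Classifying many such perturbed reads at each rate and length, then averaging the fraction assigned correctly, would yield a table with the same shape as the one above.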

Page 7: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Chao Richness Estimate - Genus

[Figure: bar chart of Chao richness estimates. x-axis: Datasets; y-axis: Estimated Number of Genera (0 to 120).]
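
For reference, a Chao-type richness estimate can be computed from per-genus read counts as sketched below. This uses the bias-corrected Chao1 form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of genera seen exactly once and exactly twice. Whether the study used this exact variant is an assumption, and the example counts are invented.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-genus read counts."""
    s_obs = sum(1 for c in counts if c > 0)       # genera actually observed
    f1 = sum(1 for c in counts if c == 1)         # singletons
    f2 = sum(1 for c in counts if c == 2)         # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Illustrative counts: 6 observed genera, 3 singletons, 1 doubleton.
example_counts = [10, 5, 1, 1, 1, 2]
print(chao1(example_counts))
```

Many singletons relative to doubletons push the estimate above the observed count, reflecting genera that were probably present but not sampled.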

Page 8: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

COPD Is a Leading Cause of Death

Page 9: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

A Highly Interdisciplinary Study

Page 10: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Study Participants Came from Three Groups

56 SUBJECTS

Page 11: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

A Large Amount of Data Was Analyzed

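[Figure: pie chart of read categories. Legend as extracted: Low Quality, Chimeras, Contaminants, Unclassified Genera, Classified. Slice labels as extracted: 3%, 3%, 13%, 27%, 53%.]

1,038,517 TOTAL READS

425,075,393 BASES

554,907 CLASSIFIED READS

270,607 UNCLASSIFIED READS

559 GENERA DISTINGUISHED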

Page 12: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Richness & Diversity Distributions

[Figure: two histograms.
Richness Distribution: x-axis Estimated Genera (bins 20 to 300 in steps of 40, then "More"); y-axis Frequency (0 to 14).
Diversity Distribution: x-axis Inverse Simpson Diversity Index (bins 1 to 26.5 in steps of 1.7, then "More"); y-axis Frequency (0 to 9).]

Page 13: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Richness & Diversity Estimates

[Figure: per-sample Richness and Diversity series. Axes: Estimated Number of Genera (0 to 350) and Diversity Index (0 to 30). Samples on the x-axis: 07_MJ, 44_MJ, 64_MJ, 22_MJ, 67_MJ, 39_MJ, 62_MJ, 33_MJ, 10_MJ, 16_MJ, 37_MJ, 66_MJ, 23_MJ, 03_MJ, 50_MJ, 40_MJ, 31_MJ, 57_MJ, 42_MJ, 26_MJ, 09_MJ, 54_MJ, 63_MJ, 59_MJ, 14_MJ, 17_MJ, 27_MJ, 15_MJ.]

Page 14: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Differences in Richness and Diversity Exist

[Figure: per-sample richness and diversity, with samples marked by group (COPD, Smoker, Never Smoker). Axes: Estimated Number of Genera (0 to 350) and Diversity Index (0 to 30). Samples on the x-axis: 07_MJ, 44_MJ, 64_MJ, 22_MJ, 67_MJ, 39_MJ, 62_MJ, 33_MJ, 10_MJ, 16_MJ, 37_MJ, 66_MJ, 23_MJ, 03_MJ, 50_MJ, 40_MJ, 31_MJ, 57_MJ, 42_MJ, 26_MJ, 09_MJ, 54_MJ, 63_MJ, 59_MJ, 14_MJ, 17_MJ, 27_MJ, 15_MJ.]

Page 15: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Most Abundant Genera

[Figure: stacked bar chart of reads per genus for each sample. y-axis: Number of Reads (0 to 20,000). Samples on the x-axis: 59_MJ, 17_MJ, 28_MJ, 10_MJ, 45_MJ, 07_MJ, 20_MJ, 67_MJ, 33_MJ, 58_MJ, 22_MJ, 19_MJ, 55_MJ, 25_MJ, 32_MJ, 64_MJ, 63_MJ, 66_MJ, 62_MJ, 36_MJ, 65_MJ, 16_MJ, 57_MJ, 42_MJ, 21_MJ, 53_MJ, 24_MJ, 54_MJ, 30_MJ, 31_MJ.
Legend: Oribacterium, Campylobacter, unclassified14, unclassified13, unclassified12, unclassified11, Granulicatella, unclassified10, Gemella, Parvimonas, unclassified09, Stenotrophomonas, unclassified08, Staphylococcus, unclassified07, Gp2, Burkholderia, Corynebacterium, Actinomyces, unclassified06, Porphyromonas, unclassified05, Neisseria, Veillonella, Fusobacterium, Delftia, unclassified04, unclassified03, Propionibacterium.]

Page 16: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Differences in Genera - COPD vs. Never Smokers

More Abundant in COPD:
Propionibacterium, unclassified04, unclassified22, unclassified30, Sulfuricurvum, unclassified28, Serpens, Tropheryma, unclassified14, Azospira, Escherichia_Shigella, Brevundimonas, Brevibacterium, Simonsiella, Parvibaculum, Hyphomonas, Massilia

More Abundant in Never Smokers:
Streptococcus, Rothia, Phocoenobacter, Paludibacter, Simkania, unclassified78, Iamia, Thermomonas, unclassified63, Solirubrobacter, unclassified99, Caulobacter, unclassified81, unclassified90, Pediococcus, Chelativorans, unclassified106, Cedecea

Page 17: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

RONALD E. MCNAIR SCHOLARS PROGRAM

MBRS-RISE (NIH GRANT #R5GM061347)

FLORIDA DEPT. OF HEALTH

DR. DEETTA KAY MILLS

DR. WALTER GOLDBERG

Thank You

Page 18: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

DR. KALAI MATHEE

LISA SCHNEPER, JONATHAN SEGAL, EUGENIA SILVA-HERZOG

MICHAEL CAMPOS, JOEL FISHMAN, MATHIAS SALATHE, ADAM WANNER, JUAN INFANTE

MELITA JARIC

DR. GIRI NARASIMHAN

Thank You

Page 20: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Differentially Significant Genera

[Figure: stacked bar chart of reads per differentially significant genus for each sample, with samples labelled COPD or Never Smoker. y-axis: Number of Reads (0 to 3,500). Samples on the x-axis: 59_MJ, 17_MJ, 28_MJ, 10_MJ, 45_MJ, 07_MJ, 20_MJ, 67_MJ, 33_MJ, 58_MJ, 22_MJ, 19_MJ, 55_MJ, 25_MJ, 32_MJ, 64_MJ, 63_MJ, 66_MJ, 62_MJ, 36_MJ, 65_MJ, 16_MJ, 57_MJ, 42_MJ, 21_MJ, 53_MJ, 24_MJ, 54_MJ, 30_MJ, 31_MJ.
Legend: Streptococcus, Rothia, Phocoenobacter, Paludibacter, Simkania, unclassified78, Iamia, Thermomonas, unclassified106, unclassified63, Solirubrobacter, unclassified99, Caulobacter, unclassified81, unclassified90, Pediococcus, Chelativorans, Cedecea, Massilia, Hyphomonas, Parvibaculum, Simonsiella, Brevibacterium, Brevundimonas, Escherichia_Shigella, Azospira, unclassified14, Tropheryma, Serpens, unclassified28, Sulfuricurvum, unclassified30, unclassified22, unclassified04, Propionibacterium.]

Page 21: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

PCA and Clustering

[Figure: three cluster panels (Cluster 1, Cluster 2, Cluster 3), each listing its members by group label: COPD, Smoker, or NS.]

Summary (number of subjects per cluster)

            NS    Smoker    COPD
Group 1      4         7       9
Group 2      2        10      12
Group 3      3         8       0
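
The summary counts can be turned into per-cluster proportions to make the group mix easier to compare. The sketch below only re-expresses the counts shown on the slide; the function names are illustrative.

```python
# Subjects per cluster, copied from the summary on this slide.
clusters = {
    "Group 1": {"NS": 4, "Smoker": 7, "COPD": 9},
    "Group 2": {"NS": 2, "Smoker": 10, "COPD": 12},
    "Group 3": {"NS": 3, "Smoker": 8, "COPD": 0},
}


def composition(row):
    """Fraction of a cluster's members that belong to each study group."""
    total = sum(row.values())
    return {group: count / total for group, count in row.items()}


def group_totals(table):
    """Total number of subjects in each study group across all clusters."""
    totals = {}
    for row in table.values():
        for group, count in row.items():
            totals[group] = totals.get(group, 0) + count
    return totals


for name, row in clusters.items():
    shares = ", ".join(f"{g} {share:.0%}" for g, share in composition(row).items())
    print(f"{name}: {shares}")
print(group_totals(clusters))
```

Viewed this way, the third cluster stands out because it contains no COPD subjects.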

Page 22: 04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd

Differentially Significant Genera

More abundant in COPD

Name                  Mean (COPD)  Var. (COPD)  Mean (Never Smoker)  Var. (Never Smoker)  p-value  Mean Difference
Propionibacterium     8.2754%      2.69E-03     4.8787%              7.29E-04             0.03996  3.3967%
unclassified04        0.7459%      1.23E-05     0.5007%              7.76E-06             0.04895  0.2452%
unclassified22        0.4113%      1.82E-06     0.2491%              1.58E-06             0.00500  0.1622%
unclassified30        0.1755%      2.77E-06     0.0624%              2.28E-07             0.00599  0.1131%
Sulfuricurvum         0.1203%      3.79E-06     0.0225%              9.85E-08             0.01598  0.0978%
unclassified28        0.1749%      1.73E-06     0.0785%              4.08E-07             0.01499  0.0964%
Serpens               0.1660%      9.75E-07     0.0806%              5.16E-07             0.02098  0.0853%
Tropheryma            0.0738%      7.19E-06     0.0000%              0.00E+00             0.00100  0.0738%
unclassified14        0.1499%      9.39E-07     0.0905%              1.94E-07             0.04795  0.0594%
Azospira              0.0357%      7.34E-07     0.0000%              0.00E+00             0.00500  0.0357%
Escherichia_Shigella  0.0406%      2.51E-07     0.0135%              3.88E-08             0.04595  0.0271%
Brevundimonas         0.0253%      1.40E-07     0.0016%              2.26E-09             0.00899  0.0238%
Brevibacterium        0.0094%      4.27E-08     0.0000%              0.00E+00             0.01245  0.0094%
Simonsiella           0.0085%      5.60E-08     0.0000%              0.00E+00             0.01989  0.0085%
Parvibaculum          0.0079%      6.97E-08     0.0000%              0.00E+00             0.03317  0.0079%
Hyphomonas            0.0062%      7.25E-08     0.0000%              0.00E+00             0.03184  0.0062%
Massilia              0.0061%      3.78E-08     0.0040%              1.43E-08             0.00446  0.0021%

More abundant in Never Smokers

Name                  Mean (COPD)  Var. (COPD)  Mean (Never Smoker)  Var. (Never Smoker)  p-value  Mean Difference
Cedecea               0.0000%      0.00E+00     0.0013%              1.59E-09             0.04600  -0.0013%
Chelativorans         0.0000%      0.00E+00     0.0013%              1.59E-09             0.04600  -0.0013%
Pediococcus           0.0007%      8.98E-10     0.0020%              3.57E-09             0.03312  -0.0013%
unclassified90        0.0003%      2.01E-10     0.0020%              3.57E-09             0.03312  -0.0017%
unclassified81        0.0014%      9.60E-10     0.0033%              9.93E-09             0.02623  -0.0019%
Caulobacter           0.0020%      3.91E-09     0.0047%              1.95E-08             0.00135  -0.0027%
unclassified99        0.0002%      7.46E-11     0.0033%              9.93E-09             0.00224  -0.0031%
Solirubrobacter       0.0015%      3.24E-09     0.0053%              2.54E-08             0.00013  -0.0038%
unclassified63        0.0016%      3.56E-09     0.0071%              2.37E-08             0.02623  -0.0055%
unclassified106       0.0004%      3.40E-10     0.0068%              2.26E-08             0.00877  -0.0064%
Thermomonas           0.0000%      0.00E+00     0.0069%              4.32E-08             0.00212  -0.0069%
Iamia                 0.0023%      4.16E-09     0.0104%              9.72E-08             0.02703  -0.0081%
unclassified78        0.0004%      4.18E-10     0.0103%              5.57E-08             0.03312  -0.0098%
Simkania              0.0000%      0.00E+00     0.0120%              1.31E-07             0.00045  -0.0120%
Paludibacter          0.0039%      2.04E-08     0.0206%              2.23E-07             0.00446  -0.0166%
Phocoenobacter        0.0031%      7.05E-09     0.0442%              1.25E-06             0.01704  -0.0411%
Rothia                0.1105%      1.19E-06     0.5286%              2.54E-05             0.02298  -0.4181%
Streptococcus         1.8356%      7.99E-05     4.8217%              1.70E-03             0.04795  -2.9861%
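
The "Mean Difference" column appears to be the COPD mean minus the Never Smoker mean, so a positive value marks a genus that is more abundant in COPD. The sketch below recomputes it for four rows copied from the table; the function names are illustrative, and the test that produced the p-values is not stated on the slide.

```python
# (name, mean % in COPD, mean % in Never Smokers, p-value), copied from the table.
rows = [
    ("Propionibacterium", 8.2754, 4.8787, 0.03996),
    ("Tropheryma", 0.0738, 0.0000, 0.00100),
    ("Rothia", 0.1105, 0.5286, 0.02298),
    ("Streptococcus", 1.8356, 4.8217, 0.04795),
]


def mean_difference(copd_mean, never_mean):
    """COPD mean minus Never Smoker mean, in percentage points."""
    return round(copd_mean - never_mean, 4)


def direction(diff):
    """Name the group in which the genus is more abundant."""
    return "COPD" if diff > 0 else "Never Smokers"


for name, copd, never, p_value in rows:
    diff = mean_difference(copd, never)
    print(f"{name}: {diff:+.4f} points, more abundant in {direction(diff)} (p = {p_value})")
```

Because the tabulated means are rounded to four decimals, recomputed differences for some other rows can differ from the printed column in the last digit.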