04.19.2013.an.analytical.workflow.for.metagenomic.data.and.its.application.to.the.study.of.copd
MISAEL FERNANDEZ
MENTOR: GIRI NARASIMHAN
A Study of the Lung Microbiome in Chronic Obstructive Pulmonary
Disease (COPD) Using Metagenomics
2
Microbial Communities
3
Metagenomics Is Like Solving a Puzzle
4
A Modular Analytical Workflow
Data Preprocessing
• Screen for Quality
• Contamination Removal
Classification
• Assign Taxonomies
• Group Sequences
Single-Sample Analysis
• Estimate Richness
• Estimate Diversity
Multiple-Sample Analysis
• Compare Samples
• Additional Statistics
OVER 30 STEPS
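A modular workflow of this kind can be sketched as a chain of small, independent stages. The following is only an illustration: the function names, the minimum-length threshold, and the exact-match contaminant rule are hypothetical and are not taken from the slides.

```python
from functools import partial
from typing import Callable, List

Reads = List[str]
Stage = Callable[[Reads], Reads]


def screen_quality(reads: Reads, min_length: int = 50) -> Reads:
    """Drop reads that are too short or contain an ambiguous base (N)."""
    return [r for r in reads if len(r) >= min_length and "N" not in r.upper()]


def remove_contaminants(reads: Reads, contaminants: frozenset = frozenset()) -> Reads:
    """Drop reads that exactly match a known contaminant (toy rule)."""
    return [r for r in reads if r not in contaminants]


def run_workflow(reads: Reads, stages: List[Stage]) -> Reads:
    """Apply each stage in order; any stage can be swapped or removed."""
    for stage in stages:
        reads = stage(reads)
    return reads


# Hypothetical input: one good read, one with ambiguous bases, one too short.
good_read = "ACGT" * 20
example_reads = [good_read, "ACGN" * 20, "ACGT"]
preprocessed = run_workflow(
    example_reads,
    [screen_quality, partial(remove_contaminants, contaminants=frozenset())],
)
```

Because every stage takes and returns a list of reads, later stages such as classification could be appended to the same list without changing the earlier ones.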
5
Richness vs. Diversity
Low Diversity High Diversity
Equal Richness
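The distinction can be made concrete with two toy communities that contain the same number of genera but differ in evenness. This is a minimal sketch: the counts are invented, and the proportion-based inverse Simpson formula used here is an assumption (tools may apply a finite-sample variant).

```python
def richness(counts):
    """Number of genera observed at least once."""
    return sum(1 for c in counts if c > 0)


def inverse_simpson(counts):
    """Inverse Simpson index, 1 / sum(p_i^2), from raw genus counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one read")
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


# Invented counts: both communities have four genera.
even_community = [25, 25, 25, 25]    # reads spread evenly
uneven_community = [97, 1, 1, 1]     # one genus dominates

even_scores = (richness(even_community), inverse_simpson(even_community))
uneven_scores = (richness(uneven_community), inverse_simpson(uneven_community))
```

Both communities have a richness of 4, but the evenly spread one scores the maximum diversity for four genera while the dominated one scores close to 1.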
6
Classification Accuracy
Mean (St. Dev.) by read length and substitution rate

Kingdom
Length   0% Substitution   5% Substitution   10% Substitution  15% Substitution  20% Substitution  25% Substitution
400 bp   100.00% (0.00%)   99.99% (0.01%)    99.57% (0.09%)    95.09% (0.28%)    75.07% (4.64%)    43.68% (12.88%)
300 bp   100.00% (0.00%)   99.97% (0.01%)    99.16% (0.13%)    91.06% (4.66%)    66.47% (17.82%)   39.57% (26.71%)
200 bp   100.00% (0.00%)   99.84% (0.11%)    96.51% (3.07%)    80.63% (17.16%)   55.46% (34.64%)   37.50% (39.21%)
100 bp   99.91% (0.10%)    96.96% (3.71%)    81.04% (22.63%)   59.84% (43.62%)   46.06% (51.46%)   38.65% (50.81%)

Genus
Length   0% Substitution   5% Substitution   10% Substitution  15% Substitution  20% Substitution  25% Substitution
400 bp   92.65% (19.55%)   81.99% (28.57%)   49.03% (38.96%)   15.99% (23.13%)   2.05% (6.25%)     0.08% (0.74%)
300 bp   88.84% (22.60%)   74.29% (30.31%)   36.45% (32.66%)   8.62% (14.84%)    0.94% (3.50%)     0.04% (0.53%)
200 bp   82.06% (26.21%)   56.87% (30.29%)   19.91% (21.47%)   3.65% (7.11%)     0.30% (1.40%)     0.01% (0.21%)
100 bp   56.21% (29.54%)   20.82% (16.77%)   4.24% (5.83%)     0.51% (1.53%)     0.06% (0.50%)     0.00% (0.03%)
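The substitution rates in this table refer to the fraction of bases changed in each test read. The slides do not say how the reads were mutated, so the following is only a sketch of one way to introduce substitutions at a chosen rate; the function names and the uniform choice among the other three bases are assumptions.

```python
import random

BASES = "ACGT"


def substitute(seq, rate, rng):
    """Return a copy of seq in which each base is replaced, with probability
    `rate`, by a different base chosen uniformly at random."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mismatch_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical 400 bp read mutated at a 10% substitution rate.
read = "ACGT" * 100
mutated = substitute(read, 0.10, random.Random(0))
observed_rate = mismatch_fraction(read, mutated)
```

Classifying many such mutated reads and comparing the assigned taxonomy with the known one would give per-rank accuracy figures of the kind tabulated above.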
7
Chao Richness Estimate - Genus
[Figure: bar chart of the Chao estimate for each dataset. X-axis: Datasets. Y-axis: Estimated Number of Genera (0 to 120).]
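For reference, a Chao-type richness estimate can be computed from the number of genera seen exactly once and exactly twice. This is a minimal sketch that assumes the bias-corrected Chao1 form; the slides do not state which variant was used, and the counts below are invented.

```python
def chao1(counts):
    """Bias-corrected Chao1 estimate: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of genera seen once and twice."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Invented read counts per genus: 7 observed, 3 singletons, 2 doubletons.
example_counts = [10, 5, 2, 2, 1, 1, 1]
estimate = chao1(example_counts)  # 7 + (3 * 2) / (2 * 3) = 8.0
```

The estimate exceeds the observed count whenever there are at least two singletons, reflecting genera that were likely present but not sampled.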
8
COPD Is a Leading Cause of Death
9
A Highly Interdisciplinary Study
10
Study Participants Came from Three Groups
56 SUBJECTS
11
A Large Amount of Data Was Analyzed
[Figure: pie chart of read fate. Legend: Low Quality, Chimeras, Contaminants, Unclassified Genera, Classified. Slice labels: 3%, 3%, 13%, 27%, 53%.]
1,038,517 TOTAL READS
425,075,393 BASES
554,907 CLASSIFIED READS
270,607 UNCLASSIFIED READS
559 GENERA DISTINGUISHED
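The read counts on this slide can be turned into percentages with simple arithmetic. In this sketch the three counts come from the slide; treating the remainder as reads removed during preprocessing is an assumption.

```python
TOTAL_READS = 1_038_517
CLASSIFIED_READS = 554_907
UNCLASSIFIED_READS = 270_607


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: reads that are neither classified nor unclassified were
# removed earlier (low quality, chimeras, contaminants).
removed_reads = TOTAL_READS - CLASSIFIED_READS - UNCLASSIFIED_READS

classified_pct = percent(CLASSIFIED_READS, TOTAL_READS)
unclassified_pct = percent(UNCLASSIFIED_READS, TOTAL_READS)
removed_pct = percent(removed_reads, TOTAL_READS)
```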
12
Richness & Diversity Distributions
[Figure: Richness Distribution histogram. X-axis: Estimated Genera (bin labels 20, 60, 100, ..., 300, More). Y-axis: Frequency (0 to 14).]
[Figure: Diversity Distribution histogram. X-axis: Inverse Simpson Diversity Index (bin labels 1, 2.7, 4.4, ..., 26.5, More). Y-axis: Frequency (0 to 9).]
13
Richness & Diversity Estimates
[Figure: richness and diversity per sample. Left axis: Estimated Number of Genera (0 to 350). Right axis: Diversity Index (0 to 30). Series: Richness, Diversity. Sample labels: 07_MJ, 44_MJ, 64_MJ, 22_MJ, 67_MJ, 39_MJ, 62_MJ, 33_MJ, 10_MJ, 16_MJ, 37_MJ, 66_MJ, 23_MJ, 03_MJ, 50_MJ, 40_MJ, 31_MJ, 57_MJ, 42_MJ, 26_MJ, 09_MJ, 54_MJ, 63_MJ, 59_MJ, 14_MJ, 17_MJ, 27_MJ, 15_MJ.]
14
Differences in Richness and Diversity Exist
[Figure: richness and diversity per sample, grouped by COPD, Smoker, and Never Smoker. Left axis: Estimated Number of Genera (0 to 350). Right axis: Diversity Index (0 to 30). Sample labels: 07_MJ, 44_MJ, 64_MJ, 22_MJ, 67_MJ, 39_MJ, 62_MJ, 33_MJ, 10_MJ, 16_MJ, 37_MJ, 66_MJ, 23_MJ, 03_MJ, 50_MJ, 40_MJ, 31_MJ, 57_MJ, 42_MJ, 26_MJ, 09_MJ, 54_MJ, 63_MJ, 59_MJ, 14_MJ, 17_MJ, 27_MJ, 15_MJ.]
15
Most Abundant Genera
[Figure: stacked bar chart of the most abundant genera per sample. Y-axis: Number of Reads (0 to 20,000). Sample labels: 59_MJ, 17_MJ, 28_MJ, 10_MJ, 45_MJ, 07_MJ, 20_MJ, 67_MJ, 33_MJ, 58_MJ, 22_MJ, 19_MJ, 55_MJ, 25_MJ, 32_MJ, 64_MJ, 63_MJ, 66_MJ, 62_MJ, 36_MJ, 65_MJ, 16_MJ, 57_MJ, 42_MJ, 21_MJ, 53_MJ, 24_MJ, 54_MJ, 30_MJ, 31_MJ.]
Legend: Oribacterium, Campylobacter, unclassified14, unclassified13, unclassified12, unclassified11, Granulicatella, unclassified10, Gemella, Parvimonas, unclassified09, Stenotrophomonas, unclassified08, Staphylococcus, unclassified07, Gp2, Burkholderia, Corynebacterium, Actinomyces, unclassified06, Porphyromonas, unclassified05, Neisseria, Veillonella, Fusobacterium, Delftia, unclassified04, unclassified03, Propionibacterium
16
Differences in Genera - COPD vs. Never Smokers
More Abundant in COPD
Propionibacterium, unclassified04, unclassified22, unclassified30, Sulfuricurvum, unclassified28, Serpens, Tropheryma, unclassified14, Azospira, Escherichia_Shigella, Brevundimonas, Brevibacterium, Simonsiella, Parvibaculum, Hyphomonas, Massilia
More Abundant in Never Smokers
Streptococcus, Rothia, Phocoenobacter, Paludibacter, Simkania, unclassified78, Iamia, Thermomonas, unclassified106, unclassified63, Solirubrobacter, unclassified99, Caulobacter, unclassified81, unclassified90, Pediococcus, Chelativorans, Cedecea
Thank You
RONALD E. MCNAIR SCHOLARS PROGRAM
MBRS-RISE (NIH GRANT #R5GM061347)
FLORIDA DEPT. OF HEALTH
DR. DEETTA KAY MILLS
DR. WALTER GOLDBERG
DR. KALAI MATHEE
LISA SCHNEPER, JONATHAN SEGAL, EUGENIA SILVA-HERZOG
MICHAEL CAMPOS, JOEL FISHMAN, MATHIAS SALATHE, ADAM WANNER, JUAN INFANTE
MELITA JARIC
DR. GIRI NARASIMHAN
20
Differentially Significant Genera
[Figure: stacked bar chart of the differentially significant genera per sample, comparing COPD and Never Smoker. Y-axis: Number of Reads (0 to 3,500). Sample labels: 59_MJ, 17_MJ, 28_MJ, 10_MJ, 45_MJ, 07_MJ, 20_MJ, 67_MJ, 33_MJ, 58_MJ, 22_MJ, 19_MJ, 55_MJ, 25_MJ, 32_MJ, 64_MJ, 63_MJ, 66_MJ, 62_MJ, 36_MJ, 65_MJ, 16_MJ, 57_MJ, 42_MJ, 21_MJ, 53_MJ, 24_MJ, 54_MJ, 30_MJ, 31_MJ.]
Legend: Streptococcus, Rothia, Phocoenobacter, Paludibacter, Simkania, unclassified78, Iamia, Thermomonas, unclassified106, unclassified63, Solirubrobacter, unclassified99, Caulobacter, unclassified81, unclassified90, Pediococcus, Chelativorans, Cedecea, Massilia, Hyphomonas, Parvibaculum, Simonsiella, Brevibacterium, Brevundimonas, Escherichia_Shigella, Azospira, unclassified14, Tropheryma, Serpens, unclassified28, Sulfuricurvum, unclassified30, unclassified22, unclassified04, Propionibacterium
21
PCA and Clustering
Cluster 1: COPD, Smoker, NS, COPD, Smoker, NS, COPD, Smoker, NS, COPD, Smoker, NS, COPD, Smoker, Smoker, COPD, COPD, Smoker, COPD, COPD
Cluster 2: COPD, Smoker, NS, COPD, Smoker, NS, COPD, Smoker, Smoker, COPD, Smoker, Smoker, COPD, Smoker, Smoker, COPD, Smoker, Smoker, COPD, COPD, COPD, COPD, COPD, COPD
Cluster 3: Smoker, NS, Smoker, Smoker, NS, Smoker, Smoker, NS, Smoker, Smoker, Smoker

Summary   NS   Smoker   COPD
Group 1   4    7        9
Group 2   2    10       12
Group 3   3    8        0
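The summary counts follow directly from tallying the group labels in each cluster. The sketch below reproduces that tally; the label sequences are transcribed from the slide, while the function and variable names are illustrative only.

```python
from collections import Counter

# Group labels per cluster, transcribed from the slide.
clusters = {
    "Group 1": ("COPD Smoker NS COPD Smoker NS COPD Smoker NS COPD Smoker NS "
                "COPD Smoker Smoker COPD COPD Smoker COPD COPD").split(),
    "Group 2": ("COPD Smoker NS COPD Smoker NS COPD Smoker Smoker "
                "COPD Smoker Smoker COPD Smoker Smoker COPD Smoker Smoker "
                "COPD COPD COPD COPD COPD COPD").split(),
    "Group 3": ("Smoker NS Smoker Smoker NS Smoker Smoker NS Smoker "
                "Smoker Smoker").split(),
}


def summarize(members, labels=("NS", "Smoker", "COPD")):
    """Count how many members of one cluster carry each group label."""
    tally = Counter(members)
    return {label: tally[label] for label in labels}


summary = {name: summarize(members) for name, members in clusters.items()}
```

The third cluster contains no COPD subjects, which is the feature that separates it from the other two.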
22
Differentially Significant Genera
Name                  Mean (COPD)  Var. (COPD)  Mean (Never Smoker)  Var. (Never Smoker)  p-value  Mean Difference
Propionibacterium     8.2754%      2.69E-03     4.8787%              7.29E-04             0.03996  3.3967%
unclassified04        0.7459%      1.23E-05     0.5007%              7.76E-06             0.04895  0.2452%
unclassified22        0.4113%      1.82E-06     0.2491%              1.58E-06             0.00500  0.1622%
unclassified30        0.1755%      2.77E-06     0.0624%              2.28E-07             0.00599  0.1131%
Sulfuricurvum         0.1203%      3.79E-06     0.0225%              9.85E-08             0.01598  0.0978%
unclassified28        0.1749%      1.73E-06     0.0785%              4.08E-07             0.01499  0.0964%
Serpens               0.1660%      9.75E-07     0.0806%              5.16E-07             0.02098  0.0853%
Tropheryma            0.0738%      7.19E-06     0.0000%              0.00E+00             0.00100  0.0738%
unclassified14        0.1499%      9.39E-07     0.0905%              1.94E-07             0.04795  0.0594%
Azospira              0.0357%      7.34E-07     0.0000%              0.00E+00             0.00500  0.0357%
Escherichia_Shigella  0.0406%      2.51E-07     0.0135%              3.88E-08             0.04595  0.0271%
Brevundimonas         0.0253%      1.40E-07     0.0016%              2.26E-09             0.00899  0.0238%
Brevibacterium        0.0094%      4.27E-08     0.0000%              0.00E+00             0.01245  0.0094%
Simonsiella           0.0085%      5.60E-08     0.0000%              0.00E+00             0.01989  0.0085%
Parvibaculum          0.0079%      6.97E-08     0.0000%              0.00E+00             0.03317  0.0079%
Hyphomonas            0.0062%      7.25E-08     0.0000%              0.00E+00             0.03184  0.0062%
Massilia              0.0061%      3.78E-08     0.0040%              1.43E-08             0.00446  0.0021%
Cedecea               0.0000%      0.00E+00     0.0013%              1.59E-09             0.04600  -0.0013%
Chelativorans         0.0000%      0.00E+00     0.0013%              1.59E-09             0.04600  -0.0013%
Pediococcus           0.0007%      8.98E-10     0.0020%              3.57E-09             0.03312  -0.0013%
unclassified90        0.0003%      2.01E-10     0.0020%              3.57E-09             0.03312  -0.0017%
unclassified81        0.0014%      9.60E-10     0.0033%              9.93E-09             0.02623  -0.0019%
Caulobacter           0.0020%      3.91E-09     0.0047%              1.95E-08             0.00135  -0.0027%
unclassified99        0.0002%      7.46E-11     0.0033%              9.93E-09             0.00224  -0.0031%
Solirubrobacter       0.0015%      3.24E-09     0.0053%              2.54E-08             0.00013  -0.0038%
unclassified63        0.0016%      3.56E-09     0.0071%              2.37E-08             0.02623  -0.0055%
unclassified106       0.0004%      3.40E-10     0.0068%              2.26E-08             0.00877  -0.0064%
Thermomonas           0.0000%      0.00E+00     0.0069%              4.32E-08             0.00212  -0.0069%
Iamia                 0.0023%      4.16E-09     0.0104%              9.72E-08             0.02703  -0.0081%
unclassified78        0.0004%      4.18E-10     0.0103%              5.57E-08             0.03312  -0.0098%
Simkania              0.0000%      0.00E+00     0.0120%              1.31E-07             0.00045  -0.0120%
Paludibacter          0.0039%      2.04E-08     0.0206%              2.23E-07             0.00446  -0.0166%
Phocoenobacter        0.0031%      7.05E-09     0.0442%              1.25E-06             0.01704  -0.0411%
Rothia                0.1105%      1.19E-06     0.5286%              2.54E-05             0.02298  -0.4181%
Streptococcus         1.8356%      7.99E-05     4.8217%              1.70E-03             0.04795  -2.9861%
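The slides do not name the statistical test behind these p-values. As an illustration only, a two-sided permutation test on the difference in group means is one common way to compare a genus's relative abundance between two groups; the function name, the per-subject abundances, and the number of permutations below are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=1000, seed=0):
    """Two-sided permutation test on the absolute difference in means.
    Returns (k + 1) / (n_permutations + 1), where k is the number of
    shuffled splits at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Invented relative abundances (fractions of reads) for one genus.
copd = [0.08, 0.09, 0.10, 0.11, 0.12, 0.10, 0.09, 0.11]
never_smoker = [0.01, 0.02, 0.015, 0.02, 0.01, 0.03, 0.02, 0.01]
p_value = permutation_p_value(copd, never_smoker)
```

With many genera tested at once, some p-values below 0.05 are expected by chance, so a multiple-testing correction would normally accompany a table like this.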