Metagenomic ANalysis
![Page 1: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/1.jpg)
SHAMAN : SHiny Application for Metagenomic ANalysis
Stevenn Volant, Amine Ghozlane
Bioinformatics and Biostatistics Hub – C3BI, USR 3756 IP CNRS
Biomics – CITECH
![Page 2: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/2.jpg)
Image from Alberts, Molecular Biology of the Cell, 5th edition
Ribosome
ITS (1) : located between 18S and 5.8S rRNA genes
2 Stevenn Volant, Amine Ghozlane • SHAMAN : Shiny Application for Metagenomic ANalysis 29/01/2016
![Page 3: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/3.jpg)
Quantitative metagenomics pipeline
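[Figure: 16S/18S/ITS reads go through merging, clustering and mapping, giving a count matrix of OTUs (OTU 1, OTU 2, OTU 3) across samples 1 to N, with example counts 500, 1000 and 300.]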
![Page 4: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/4.jpg)
HUB – 16S/18S/ITS pipeline
Read processing
● Input: amplicon sequences from each sample
● FastQC
● AlienTrimmer (1)
● FLASH (2)
● Filter human and PhiX reads
OTU building
● Tool: Vsearch (3)
● Dereplication
● Chimera filtering (Gold database)
● Clustering
● Mapping
● OTU count matrix
Taxonomic annotation
● Silva SSU 123, RDP, Greengenes 13,4, Findley, UNITE
Phylogeny
● Mafft/Fastree
Timings shown on the slide: 26 min; 100 samples, 50 pairs, 30 min; 150 min
1. Genomics, 102(5), 500-506. 2. Bioinformatics Vol. 27 no. 21 2011, p2957-2963. 3. https://github.com/torognes/vsearch
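Dereplication, one of the OTU-building steps listed on this slide, collapses identical amplicon sequences into one representative and keeps its abundance. The sketch below only illustrates the idea; it is not how Vsearch implements it, and the function name `dereplicate`, the `min_size` parameter and the toy reads are hypothetical.

```python
from collections import Counter


def dereplicate(reads, min_size=1):
    """Collapse identical sequences (hypothetical helper, not the Vsearch code).

    Returns (sequence, abundance) pairs sorted by decreasing abundance,
    with ties broken alphabetically so the output is deterministic.
    """
    # Case is ignored so that "acgt" and "ACGT" count as the same sequence.
    counts = Counter(seq.upper() for seq in reads)
    kept = [(seq, n) for seq, n in counts.items() if n >= min_size]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


# Toy reads, invented for illustration only.
reads = ["ACGT", "acgt", "ACGA", "ACGT", "TTGA", "ACGA"]
unique = dereplicate(reads)
no_singletons = dereplicate(reads, min_size=2)
```

Raising `min_size` to 2 drops singletons, which are often discarded before clustering because many of them are sequencing errors.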
![Page 5: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/5.jpg)
Uparse/Vsearch
Integrated in QIIME, MOTHUR and LotuS
Nature methods, 10(10), 996-998.
![Page 6: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/6.jpg)
« to be enlightened » (Tungus, Siberia)
SHAMAN
![Page 7: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/7.jpg)
SHAMAN : shaman.c3bi.pasteur.fr
« There is no disputing the importance of statistical analysis in biological research, but too often it is considered only after an experiment is completed, when it may be too late. »
![Page 8: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/8.jpg)
SHAMAN : shaman.c3bi.pasteur.fr
Counts Annotation
![Page 9: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/9.jpg)
![Page 10: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/10.jpg)
Metagenomic vs RNA-seq
| | Metagenomic | RNA-seq |
|---|---|---|
| Distribution | Overdispersed counts → Negative binomial | Overdispersed counts → Negative binomial |
| Constraints | Highly abundant species | Highly expressed genes |
| Goal | Find differentially abundant features (species, family, …): OTU distributions and abundances vary between conditions | Find differentially expressed genes: distributions and expression vary between conditions |

The DESeq2 approach is usually used for RNA-seq datasets
Metagenomic data are similar to RNA-seq data
![Page 11: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/11.jpg)
Data normalization
How?
✗ Fitting the distributions (Total Read Count, Upper Quartile, Median, Full Quantile)
✗ Accounting for the feature length (RPKM)
✗ Concept of « effective number of reads » (TMM, DESeq2)
Why?
✗ To correct technical biases and make samples comparable.
Remarks
✗ Some methods normalize the counts, others the library sizes
✗ Some are designed for differential analysis
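As an illustration of the simplest family above, here is a minimal sketch of Total Read Count scaling: each sample is rescaled so that all library sizes equal the mean library size. The function name `total_count_normalize` and the toy matrix are hypothetical, and actual tools may pick a different target size.

```python
def total_count_normalize(counts):
    """Total Read Count scaling (illustrative sketch).

    counts: list of rows (features), each row holding one count per sample.
    Returns the per-sample scaling factors and the normalized matrix.
    """
    n_samples = len(counts[0])
    # Library size of a sample = sum of its column.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # Assumption: the common target is the mean library size.
    target = sum(lib_sizes) / n_samples
    factors = [size / target for size in lib_sizes]
    normalized = [[row[j] / factors[j] for j in range(n_samples)] for row in counts]
    return factors, normalized


# Toy matrix: the second sample was sequenced twice as deeply as the first.
counts = [[10, 20], [30, 60], [60, 120]]
factors, normalized = total_count_normalize(counts)
```

After scaling, both columns sum to the same library size, so the two samples become directly comparable.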
![Page 12: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/12.jpg)
DESeq2 normalization (OTU level)
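Assumption
✗ Most of the OTUs have the « same » abundance between 2 conditions
Normalization factor:
$$s_j = \underset{i}{\operatorname{median}} \; \frac{X_{ij}}{\left(\prod_{k=1}^{n} X_{ik}\right)^{1/n}}$$
where
$X_{ij}$: number of mapped reads of OTU $i$ in sample $j$
$n$: number of samples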
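The DESeq2 normalization factor of a sample is the median, over OTUs, of the ratio between the OTU count in that sample and the geometric mean of that OTU across all samples. The sketch below follows that definition in plain Python; the function name `size_factors` and the toy counts are hypothetical, and it is not the DESeq2 code itself.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of rows (OTUs), each row holding one count per sample.
    OTUs with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio is undefined.
    """
    n = len(counts[0])
    log_ratios = [[] for _ in range(n)]
    for row in counts:
        if any(x <= 0 for x in row):
            continue
        # Log of the geometric mean across the n samples.
        log_geo_mean = sum(math.log(x) for x in row) / n
        for j, x in enumerate(row):
            log_ratios[j].append(math.log(x) - log_geo_mean)
    # median() raises StatisticsError if every OTU contained a zero.
    return [math.exp(median(r)) for r in log_ratios]


# Toy matrix: sample 2 has twice the depth of sample 1; the last OTU has a zero.
counts = [[10, 20], [30, 60], [60, 120], [0, 5]]
factors = size_factors(counts)
```

Dividing each column by its factor puts the samples on a common scale; here the two factors differ by a factor of 2, which matches the doubled sequencing depth.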
![Page 13: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/13.jpg)
Comparison with RPKM (1/3)
● RPKM : Reads Per Kilobase per Million mapped reads
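Assumption
✗ Counts are proportional to the abundance, the feature length and the sequencing depth.
● Method
$$\text{Normalized counts}_{ij} = \frac{X_{ij} \times 10^{6} \times 10^{3}}{\text{Length}_i \times N_j}$$
where $X_{ij}$ is the count of feature $i$ in sample $j$, $10^{6}$ is the « per Million » factor, $10^{3}$ the « Per Kilobase » factor, $\text{Length}_i$ the length of feature $i$ and $N_j$ the number of reads of sample $j$.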
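The RPKM computation can be sketched in a few lines. This is a minimal example under one assumption: the number of reads of a sample is taken as the sum of its column, that is, all mapped reads fall on the listed features. The function name `rpkm` and the toy values are hypothetical.

```python
def rpkm(counts, lengths):
    """Reads Per Kilobase per Million mapped reads (illustrative sketch).

    counts:  list of rows (features), each row holding one count per sample.
    lengths: feature lengths in bases, one per row.
    """
    n_samples = len(counts[0])
    # Assumption: reads of sample j = column sum of the count matrix.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        # 1e9 = 1e6 (per million reads) * 1e3 (per kilobase).
        [row[j] * 1e9 / (length * lib_sizes[j]) for j in range(n_samples)]
        for row, length in zip(counts, lengths)
    ]


# Toy data: sample 2 is sample 1 sequenced twice as deeply.
counts = [[100, 200], [900, 1800]]
lengths = [1000, 2000]
values = rpkm(counts, lengths)
```

Both samples get identical RPKM values in this toy case, since the only difference between them is sequencing depth.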
![Page 14: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/14.jpg)
Comparison with RPKM (2/3)
Comparison of 7 normalization methods
![Page 15: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/15.jpg)
Comparison with RPKM (3/3)
Results on real data (7 samples) FDR and Power
Dillies M. et al., Briefings in Bioinformatics 2013
![Page 16: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/16.jpg)
To sum up
DESeq2 normalization provides better results than RPKM in this comparison
We recommend using DESeq2 for differential abundance analysis
![Page 17: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/17.jpg)
Statistical model of DESeq2
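Generalized Linear Model
$$X_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad \mu_{ij} = s_j \, q_{ij}, \qquad \log_2 q_{ij} = \sum_{r} x_{jr} \, \beta_{ir}$$
where $\mu_{ij}$ is the mean, $\alpha_i$ the dispersion, $s_j$ the size factor and $\beta_{ir}$ the log2 fold changes; $x_{jr}$ denotes the design matrix entries for sample $j$.
Advantages
✗ Allows complex experimental designs.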
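To make the roles of the size factor, the log2 fold change and the dispersion concrete, the sketch below computes the mean and variance that a negative binomial GLM with a log2 link assigns to one OTU in one sample. The function name `nb_mean_variance` and all numeric values are hypothetical; in DESeq2 the coefficients and the dispersion are estimated from the data rather than set by hand.

```python
def nb_mean_variance(size_factor, design_row, beta, dispersion):
    """Mean and variance of a negative binomial count (illustrative sketch).

    mean     = size_factor * 2 ** (design_row . beta)
    variance = mean + dispersion * mean ** 2
    """
    log2_q = sum(x * b for x, b in zip(design_row, beta))
    mu = size_factor * 2.0 ** log2_q
    variance = mu + dispersion * mu ** 2
    return mu, variance


# Hypothetical coefficients: intercept (log2 of 32) and a log2 fold change of 1.
beta = [5.0, 1.0]
control = nb_mean_variance(size_factor=1.0, design_row=[1, 0], beta=beta, dispersion=0.1)
treated = nb_mean_variance(size_factor=1.5, design_row=[1, 1], beta=beta, dispersion=0.1)
```

The variance grows with the square of the mean, which is the overdispersion that a Poisson model would miss.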
![Page 18: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/18.jpg)
Dispersion estimation
Problem
✗ Getting a good estimate of the dispersion with a small number of samples.
Modelling of the dispersion:
Function of the mean of normalized counts
Local parametric regression
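The idea of modelling the dispersion as a function of the mean can be sketched as follows. This sketch makes two assumptions that are not taken from the slide: per-OTU dispersions come from a method-of-moments estimate, and the trend has the parametric form alpha = a0 + a1 / mean, fitted here by ordinary least squares for brevity. DESeq2's own fitting procedure is more elaborate; the function names and toy values are hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(normalized_row):
    """Method-of-moments dispersion: (variance - mean) / mean^2, floored at 0."""
    m = mean(normalized_row)
    v = variance(normalized_row)  # sample variance (n - 1 denominator)
    return m, max((v - m) / m ** 2, 0.0)


def fit_trend(means, dispersions):
    """Least-squares fit of alpha = a0 + a1 / mean (simplified stand-in)."""
    xs = [1.0 / m for m in means]
    x_bar, y_bar = mean(xs), mean(dispersions)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, dispersions))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


# One OTU with normalized counts over four samples (toy values).
otu_mean, otu_alpha = moment_dispersion([5, 15, 10, 30])

# Toy per-OTU estimates lying exactly on alpha = 0.05 + 2 / mean.
means = [10.0, 20.0, 50.0, 100.0]
dispersions = [0.05 + 2.0 / m for m in means]
a0, a1 = fit_trend(means, dispersions)
```

With few samples the per-OTU estimates are noisy, so the fitted trend lets information be shared across OTUs with similar means.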
![Page 19: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/19.jpg)
Contrasts (comparisons)
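Aim
✗ Testing a specific effect without having to re-fit the model.
Advantages
✗ Parameters are estimated with all samples.
Test statistic:
$$W = \frac{c^{T} \beta}{\sqrt{c^{T} \Sigma \, c}}$$
where $\beta$: coefficients, $c$: contrast vector, $\Sigma$: covariance matrix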
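A contrast combines the fitted coefficients, a contrast vector and the covariance matrix of the coefficients into one Wald-type test, so any comparison can be tested from a single model fit. The sketch below is a minimal illustration with a normal approximation for the p-value; the function name `wald_contrast` and all numeric values are hypothetical.

```python
import math


def wald_contrast(beta, cov, contrast):
    """Wald test of a linear contrast (illustrative sketch).

    estimate = c . beta
    se       = sqrt(c^T * cov * c)
    z        = estimate / se, with a two-sided normal p-value
    """
    k = len(contrast)
    estimate = sum(c * b for c, b in zip(contrast, beta))
    var = sum(contrast[r] * cov[r][s] * contrast[s] for r in range(k) for s in range(k))
    se = math.sqrt(var)
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, z, p_value


# Hypothetical fit with three conditions A, B, C (A is the reference):
# coefficients = intercept, B vs A, C vs A (log2 scale).
beta = [5.0, 1.0, 2.5]
cov = [
    [0.04, -0.04, -0.04],
    [-0.04, 0.08, 0.04],
    [-0.04, 0.04, 0.08],
]
# C vs B is obtained without re-fitting: (C vs A) - (B vs A).
estimate, se, z, p_value = wald_contrast(beta, cov, contrast=[0, -1, 1])
```

The off-diagonal covariance terms matter: ignoring them would overstate the standard error of the C vs B comparison in this example.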
![Page 20: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/20.jpg)
Conclusions
SHAMAN
16S/18S/ITS analysis
Strong statistical approach
Several visualizations available
Access : http://shaman.c3bi.pasteur.fr
Upcoming features
WGS analysis
New visualizations (Taxonomy plot, Krona, continuous data)
Compatibility with FROGS
![Page 21: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/21.jpg)
CIB – FROGS 16S/18S – GALAXY Pasteur
Galaxy team : Mathieu Valade, Fabien Mareuil, Emmanuel Quevillon, Eric Deveaud
Vsearch, swarm
![Page 22: Metagenomic ANalysis](https://reader034.fdocuments.net/reader034/viewer/2022042713/6266f1aa070a2f0a1e29c376/html5/thumbnails/22.jpg)
Acknowledgements
CiTech Biomics Pole
Sean KENNEDY, Béatrice REGNAULT
C3BI Bioinformatics Biostatistics Hub
Olivier GASCUEL, Marie-Agnès DILLIES, Christophe MALABAT, Hugo VARET, Nicolas MAILLET, Pierre LECHAT, Anna ZHUKOVA, Rachel TORCHET