Non-negative Matrix Factorization with Sparseness Constraints
![Page 1: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/1.jpg)
Compositionality and Sparseness in 16S rRNA data
Anthony Fodor, Associate Professor, Bioinformatics and Genomics, UNC Charlotte
![Page 2: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/2.jpg)
Can we fairly compare high and low biomass samples?
VS.
![Page 3: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/3.jpg)
Low abundance samples are inherently challenging to survey
Less abundant
Hanna et al. - Comparison of culture and molecular techniques for microbial community characterization in infected necrotizing pancreatitis - J. Surgical Research - 2014
![Page 4: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/4.jpg)
Clearly, the sequencing of negative controls should be part of all of our pipelines.
![Page 5: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/5.jpg)
Can we fairly compare samples with different numbers of sequences?
VS.
![Page 6: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/6.jpg)
16S rRNA experiments are always compositional and often sparse
Compositional: because different samples have different numbers of sequences
Sparse: because there are many zeros in the spreadsheet
SAMPLES
OTUs
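Both properties can be read directly off a count table. The following is a minimal sketch, assuming a small invented table with OTUs as rows and samples as columns; the table values and the helper names `sequencing_depth` and `fraction_zeros` are hypothetical and do not come from the talk.

```python
# Hypothetical OTU count table: each row is an OTU, each column a sample.
# All numbers are invented for illustration.
otu_table = {
    "OTU_1": [120, 15, 0],
    "OTU_2": [0, 3, 0],
    "OTU_3": [30, 0, 950],
    "OTU_4": [0, 0, 50],
}
sample_names = ["S1", "S2", "S3"]


def sequencing_depth(table):
    """Total number of sequences in each sample (column sums)."""
    columns = zip(*table.values())
    return [sum(column) for column in columns]


def fraction_zeros(table):
    """Share of cells in the table that are exactly zero."""
    cells = [count for row in table.values() for count in row]
    return sum(1 for count in cells if count == 0) / len(cells)


depths = sequencing_depth(otu_table)
sparsity = fraction_zeros(otu_table)

print(dict(zip(sample_names, depths)))
print(f"fraction of zero cells: {sparsity:.2f}")
```

Unequal column sums are what make the table compositional in the sense used here, and the fraction of zero cells is one simple measure of sparseness.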
![Page 7: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/7.jpg)
Compositionality is a well-studied problem in statistics, but remains challenging
![Page 8: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/8.jpg)
Compositionality can introduce subtle artifacts into our dataset
Relative abundance
Problems include:
Inference may report a change in A and B even though biologically A and B have not changed.
The estimate of A and B is dependent on C. If C is a contaminant (or rRNA in an RNA-seq experiment), the values of A and B might not be appropriate.
A and B will appear correlated, but this is a statistical artifact.
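The first two problems can be illustrated with a small sketch. The counts are assumptions chosen for illustration (A = 5, B = 10, and C rising from 100 to 1000), and `relative_abundance` is a hypothetical helper name.

```python
def relative_abundance(counts):
    """Divide each taxon's count by the sample total."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Assumed counts: A and B are identical in both samples; only C changes.
before = {"A": 5, "B": 10, "C": 100}
after = {"A": 5, "B": 10, "C": 1000}

rel_before = relative_abundance(before)
rel_after = relative_abundance(after)

for taxon in ("A", "B"):
    print(f"{taxon}: {rel_before[taxon]:.4f} -> {rel_after[taxon]:.4f}")
```

The relative abundances of A and B both drop even though their counts are unchanged, and they drop together, which is how an apparent correlation between A and B can arise purely from a change in C.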
![Page 9: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/9.jpg)
The correlation issue has been considered by multiple groups…
![Page 10: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/10.jpg)
The compositional nature of 16S rRNA data has led to controversies over analysis pipelines…
![Page 11: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/11.jpg)
Notice that in all the above examples, the ratio of B/A is always 2, irrespective of what happens with taxon C.

$$\frac{10}{5} = \frac{10/115}{5/115} = \frac{10/1015}{5/1015} = 2$$
Normalization schemes can take advantage of working in ratio space
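As a sketch of what working in ratio space can look like, the code below checks that the B/A ratio of relative abundances does not depend on C, and that the same holds for a difference of centered log-ratio (CLR) values. The CLR transform is used here as one example of a ratio-based scheme and is an assumption, not necessarily the scheme the talk has in mind; the counts and function names are hypothetical.

```python
import math


def b_over_a(counts):
    """Ratio of the relative abundances of B and A."""
    total = sum(counts.values())
    return (counts["B"] / total) / (counts["A"] / total)


def clr(counts):
    """Centered log-ratio: log count minus the mean log count.

    Requires strictly positive counts, so zeros in a sparse table
    would need separate handling before this step.
    """
    logs = {taxon: math.log(n) for taxon, n in counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    return {taxon: value - mean_log for taxon, value in logs.items()}


def clr_difference(counts):
    """CLR(B) - CLR(A), which equals log(B / A)."""
    transformed = clr(counts)
    return transformed["B"] - transformed["A"]


for c in (100, 1000):
    counts = {"A": 5, "B": 10, "C": c}  # assumed counts; only C varies
    print(c, round(b_over_a(counts), 6), round(clr_difference(counts), 6))
```

Because the sample total and the mean log count cancel in each ratio, both quantities stay fixed as C changes. The zero-handling caveat in the docstring matters for sparse 16S tables.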
Relative abundance
![Page 12: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/12.jpg)
Cells in the spreadsheet with few counts are largely structured by sequencing depth
Source: Gevers et al. - The Treatment-Naive Microbiome in New-Onset Crohn’s Disease - Cell Host Microbe 2014
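One way to see why low-count cells track sequencing depth is a simplified sampling model, sketched below. It assumes each read is drawn independently and comes from a given taxon with a fixed probability equal to that taxon's true relative abundance; this is an illustrative assumption, not the analysis behind the figure, and `detection_probability` is a hypothetical helper.

```python
def detection_probability(relative_abundance, depth):
    """Chance that at least one of `depth` independent reads hits the taxon."""
    return 1.0 - (1.0 - relative_abundance) ** depth


rare_taxon = 1e-4  # assumed true relative abundance of a rare taxon

for depth in (1_000, 10_000, 100_000):
    p = detection_probability(rare_taxon, depth)
    print(f"depth {depth:>7}: P(count > 0) = {p:.3f}")
```

Under this model, whether a rare taxon shows up as a zero or a small positive count is driven mostly by how deeply the sample was sequenced.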
![Page 13: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/13.jpg)
Ordination without normalization leads to a dependency on sequencing depth…
[Figure: axes labeled Log10 (number of sequences) and Bray-Curtis distance]
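The effect on raw counts can be sketched with two invented samples that have identical composition but a 100-fold difference in depth. The Bray-Curtis formula used is the standard one, the sum of absolute differences divided by the sum of all counts; the sample values and helper names are assumptions.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two count (or proportion) vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator


def to_proportions(counts):
    """Scale a count vector so that it sums to one."""
    total = sum(counts)
    return [c / total for c in counts]


# Invented samples: same 50/30/20 composition, very different depth.
shallow = [50, 30, 20]        # 100 sequences
deep = [5000, 3000, 2000]     # 10,000 sequences

raw_distance = bray_curtis(shallow, deep)
normalized_distance = bray_curtis(to_proportions(shallow), to_proportions(deep))

print(f"raw counts:  {raw_distance:.3f}")
print(f"proportions: {normalized_distance:.3f}")
```

On raw counts the two samples look almost maximally different purely because of depth. Converting to proportions removes the difference only in this idealized case, which has no sampling noise and no depth-dependent zeros.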
![Page 14: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/14.jpg)
No normalization scheme eliminates the dependency on sequencing depth
![Page 15: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/15.jpg)
No normalization scheme eliminates compositional dependencies
Bioinformatics pipelines for 16S rRNA might consider explicitly tracking the number of sequences per sample as a potential confounder…
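A minimal sketch of such tracking is to carry log10 sequencing depth alongside each sample and check how strongly it correlates with a quantity of interest, such as the first ordination axis. The depths and axis coordinates below are invented, and `pearson` is a hand-written helper rather than a library call.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented per-sample values: sequencing depth and first ordination axis.
depths = [1200, 5400, 800, 22000, 15000, 3100]
axis1 = [-0.8, 0.1, -1.1, 0.9, 0.7, -0.2]

log_depths = [math.log10(d) for d in depths]
r = pearson(log_depths, axis1)

print(f"correlation between log10(depth) and axis 1: {r:.2f}")
```

A strong correlation like the one in this invented example would suggest that depth, rather than biology, may be driving the ordination, so depth should be reported and considered as a covariate in downstream tests.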
![Page 16: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/16.jpg)
Sequencing depth can be correlated with input variables of interest…
[Figure: axes labeled Log10 (number of sequences), NMDS 1, Theta YC distance, and Difference in number of sequences]
Source: Baxter et al. - Structure of the gut microbiome following colonization with human feces determines colonic tumor burden - Microbiome 2014
![Page 17: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/17.jpg)
[Figure: seven sets of panels, with axes labeled Log10 (number of sequences), NMDS 1, Theta YC distance, and Difference in number of sequences]
Different normalization schemes can have very different consequences for inference.
![Page 18: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/18.jpg)
Conclusions
No normalization scheme eliminates compositional dependencies (although some do better than others!).
Bioinformatics pipelines for 16S rRNA should explicitly track the number of sequences per sample as a potential confounding variable.
Just as no single statistical test is appropriate for every inference problem, there is likely no single normalization scheme that will be appropriate for all datasets.
![Page 19: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/19.jpg)
Raad Z. Gharaibeh
(We thank Dirk Gevers for providing a parsable OTU table for the Risk data)
![Page 20: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/20.jpg)
Cells in the spreadsheet with few counts are largely structured by sequencing depth
Source: Gevers et al. - The Treatment-Naive Microbiome in New-Onset Crohn’s Disease - Cell Host Microbe 2014
![Page 21: Compositionality and Sparseness in 16S rRNA data Anthony Fodor Associate Professor Bioinformatics and Genomics UNC Charlotte.](https://reader036.fdocuments.net/reader036/viewer/2022062805/5697c0111a28abf838ccbb37/html5/thumbnails/21.jpg)
In any experiment, confounding variables can complicate inference.