Metagenomic data Latent Block Model Application to plant-microbial communities interactions in the rhizosphere
Latent Block Model for Metagenomic data
J. Aubert, S. Schbath, S. Robin
Recent Computational Advances in Metagenomics 2016, The Hague
Plant-microbial communities interactions in the rhizosphere
Rhizosphere: the region of soil directly influenced by root secretions and associated soil microorganisms.
Metagenomics: study of genetic material recovered directly from environmental samples → Who is there?
             S1     S2     S3     ...   Sj               ...   Sm
Bact. 1      0      0      0      ...   y1j              ...   3
Bact. 2      59     17     43     ...   y2j              ...   3
...          ...    ...    ...    ...   ...              ...   ...
Bact. i      yi1    yi2    yi3    ...   yij              ...   yim
...          ...    ...    ...    ...   ...              ...   ...
Bact. n      90     120    123    ...   ynj              ...   95
Seq. depth   4738   5157   6010   ...   Σ_{i=1..n} yij   ...   5916
yij = number of sequences from sample j assigned to bacterium i.
Count data depend on the sequencing effort.
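To make the dependence on sequencing effort concrete, the sketch below computes the sequencing depth of each sample as a column sum, as in the last row of the table, and rescales counts to proportions. The small matrix and the function names are illustrative assumptions, not data or code from the study.

```python
# Toy count matrix: rows are bacteria, columns are samples (illustrative values).
counts = [
    [0, 0, 0],
    [59, 17, 43],
    [90, 120, 123],
]


def sequencing_depth(matrix):
    """Return the column sums: total number of sequences per sample."""
    return [sum(column) for column in zip(*matrix)]


def relative_abundance(matrix):
    """Divide each count by the sequencing depth of its sample (column)."""
    depth = sequencing_depth(matrix)
    return [[y / d for y, d in zip(row, depth)] for row in matrix]


depth = sequencing_depth(counts)
props = relative_abundance(counts)
```

Raw counts from samples with different depths are not directly comparable, which is the reason the model includes a per-sample scale factor.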
Model-based biclustering
Latent Block Model (Govaert and Nadif, 2003)
Assumptions
(Zi) ∼ M(1, π = (π1, ..., πK)) and (Wj) ∼ M(1, ρ = (ρ1, ..., ρG))
Latent variables (Zi) and (Wj) are independent.
The random variables Yij are independent conditionally on the labels.
Model
(Yij), i = 1, ..., n and j = 1, ..., m: matrix of observations.
Yij | (Zik = 1, Wjg = 1) ∼ f(.; µi νj αkg)
• µi: mean level of presence of bacterium i
• νj: scale factor correcting for sampling effort (around 1)
• αkg: interaction level between bacteria and sampling units within the (k, g) group
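The generative reading of the model can be sketched as a small simulation. This is a minimal sketch that assumes f is a Poisson distribution with mean µi νj αkg; the parameter values, the function names `sample_poisson` and `simulate_lbm`, and the use of Knuth's sampler are illustrative choices, not part of the original work.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate means."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_lbm(mu, nu, alpha, pi, rho, seed=0):
    """Draw row labels z, column labels w, then counts y given the labels."""
    rng = random.Random(seed)
    n, m = len(mu), len(nu)
    # Row and column labels are drawn independently from multinomials.
    z = rng.choices(range(len(pi)), weights=pi, k=n)
    w = rng.choices(range(len(rho)), weights=rho, k=m)
    # Conditionally on the labels, the counts are independent.
    y = [
        [sample_poisson(mu[i] * nu[j] * alpha[z[i]][w[j]], rng) for j in range(m)]
        for i in range(n)
    ]
    return z, w, y


mu = [5.0, 20.0, 1.0, 10.0]        # row effects (assumed values)
nu = [1.0, 0.9, 1.1]               # column scale factors, around 1
alpha = [[0.5, 2.0], [1.5, 0.7]]   # K x G interaction terms
pi = [0.5, 0.5]                    # row-group proportions
rho = [0.4, 0.6]                   # column-group proportions
z, w, y = simulate_lbm(mu, nu, alpha, pi, rho, seed=1)
```

Rows sharing a label z and columns sharing a label w form a block whose counts share the same interaction term αkg.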
Graphical representation
Conditional distribution of (Z, W)
• Zi and Wj: unobserved labels of row i and column j
• Yij depends on the labels.
• Zi and Wj are not independent conditionally on Yij → p(Z, W | Y) is intractable, so the regular EM algorithm cannot be used.
Poisson-Gamma Latent Block Model
Negative Binomial Distribution - Poisson-Gamma mixture
Yijr | (Zik = 1, Wjg = 1) ∼ P(µi νjr Uijr αkg)
with Uijr ∼ Gamma(a, a)
[Figure: graphical models of the joint distribution p(Z, W, U, Y) and of the conditional distribution p(Z, W, U | Y), with nodes W1, W2, Z1, Z2, Uijr and Yijr]
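Integrating U ∼ Gamma(a, a) out of a Poisson with mean λU gives a negative binomial with mean λ and variance λ + λ²/a, which is how the model accommodates overdispersion. The sketch below checks these two moments numerically from the closed-form probability mass function. The value λ = 10 is an arbitrary assumption, a = 7.6 is the value reported later for the application, and the function name `nb_pmf` is hypothetical.

```python
import math


def nb_pmf(y, mean, a):
    """Negative binomial pmf obtained by integrating U ~ Gamma(a, a)
    out of Y | U ~ Poisson(mean * U)."""
    log_p = (
        math.lgamma(y + a) - math.lgamma(a) - math.lgamma(y + 1)
        + a * math.log(a / (a + mean))
        + y * math.log(mean / (a + mean))
    )
    return math.exp(log_p)


mean, a = 10.0, 7.6
support = range(500)  # the tail beyond 500 is negligible for these values
probs = [nb_pmf(y, mean, a) for y in support]

total = sum(probs)
first_moment = sum(y * p for y, p in zip(support, probs))
variance = sum((y - first_moment) ** 2 * p for y, p in zip(support, probs))
expected_variance = mean + mean ** 2 / a  # Poisson variance plus overdispersion
```

As a grows, the term λ²/a vanishes and the distribution approaches a plain Poisson.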
Inference and challenges
Based on the observed data Y = (Yij), we want to infer:
• the parameters θ = (µ, ν, α, π, ρ)
• the hidden states
Classical algorithm: Expectation-Maximization (EM) algorithm
1. Expectation step: calculate p(H|Y) → sometimes impossible
2. Maximization step: maximize E[log p(Y, H; θ) | Y] with respect to θ → generally similar to standard MLE.
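The two steps can be illustrated on a model where p(H|Y) is tractable. The sketch below runs EM on a two-component Poisson mixture, which is a deliberately simpler model than the LBM; the data, the starting values and the function names are assumptions made for illustration.

```python
import math


def poisson_logpmf(y, lam):
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def em_poisson_mixture(data, lam, weights, n_iter=50):
    """EM for a finite Poisson mixture; returns parameters and log-likelihood trace."""
    lam, weights = list(lam), list(weights)
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp, loglik = [], 0.0
        for y in data:
            joint = [w * math.exp(poisson_logpmf(y, l)) for w, l in zip(weights, lam)]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([j / total for j in joint])
        trace.append(loglik)
        # M-step: weighted maximum-likelihood updates of proportions and means.
        for k in range(len(lam)):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            lam[k] = sum(r[k] * y for r, y in zip(resp, data)) / nk
    return lam, weights, trace


data = [0, 1, 2, 1, 0, 3, 2, 20, 25, 18, 22, 30, 19]
lam_hat, weights_hat, trace = em_poisson_mixture(data, lam=[1.0, 10.0], weights=[0.5, 0.5])
```

The log-likelihood recorded in `trace` never decreases from one iteration to the next, which is the usual sanity check for an EM implementation.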
Expectation-Maximization algorithm: a variational point of view
Neal and Hinton (1999), Wainwright and Jordan (2008)
Notation: H = (Z, W, U)
Lower bound of the log-likelihood
For any distribution q(H):
$\log p(Y) \ge \log p(Y) - KL[q(H) \,\|\, p(H|Y)] = E_q[\log p(Y,H)] + \mathcal{H}(q(H))$
where $\mathcal{H}$ denotes the entropy.
Link with EM: taking q(H) = p(H|Y) makes the bound tight,
$\log p(Y) = E[\log p(Y,H) \mid Y] + \mathcal{H}(p(H|Y))$
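The bound can be verified numerically on a toy model with a single discrete hidden variable. In the sketch below, the joint probabilities and the trial distribution q are arbitrary illustrative values; the check is that the gap between log p(Y) and the lower bound equals the KL divergence between q and the posterior.

```python
import math

# p(Y = y_obs, H = h) for h = 0, 1, 2 (illustrative values).
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                      # p(Y = y_obs)
posterior = [p / evidence for p in joint]  # p(H | Y = y_obs)


def lower_bound(q):
    """E_q[log p(Y, H)] plus the entropy of q."""
    return sum(qh * (math.log(ph) - math.log(qh)) for qh, ph in zip(q, joint) if qh > 0)


def kl(q, p):
    """Kullback-Leibler divergence KL[q || p] for discrete distributions."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, p) if qh > 0)


q = [0.5, 0.3, 0.2]
gap = math.log(evidence) - lower_bound(q)
tight_gap = math.log(evidence) - lower_bound(posterior)
```

Here `gap` matches `kl(q, posterior)` and `tight_gap` is zero up to rounding, which is the link with the exact E-step.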
Poisson-Gamma LBM - variational distributions
Mean-field approximation
Q = set of factorisable distributions:
$\mathcal{Q} = \{q : q(H) = \prod_{\ell} q_\ell(H_\ell) = q_1(Z)\, q_2(W)\, q_3(U)\}$
Optimal variational distributions
The minimizer $q_\ell^*$ of $KL[q(H) \,\|\, p(H|Y)]$ with respect to $q_\ell$, the other factors $q_{-\ell}$ being fixed, satisfies
$q_\ell^*(H_\ell) \propto \exp E_{q_{-\ell}}[\log p(Y,H)]$ for all $\ell$.
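The fixed-point property suggests a coordinate-wise scheme: update one factor at a time while holding the others fixed. The sketch below applies it to a toy joint table over two binary hidden variables; the table values and the function names are illustrative assumptions, and the real model has three factors (Z, W, U) rather than two.

```python
import math


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def update_factor(log_joint, other, axis):
    """Return q_l proportional to exp(E_{q_-l}[log p(Y, H)]) for a 2-factor model."""
    if axis == 0:
        scores = [sum(o * lj for o, lj in zip(other, row)) for row in log_joint]
    else:
        n_cols = len(log_joint[0])
        scores = [sum(o * log_joint[i][j] for i, o in enumerate(other)) for j in range(n_cols)]
    top = max(scores)  # subtract the maximum for numerical stability
    return normalize([math.exp(s - top) for s in scores])


def lower_bound(log_joint, q1, q2):
    """E_q[log p(Y, H)] plus the entropy of the factorised q."""
    return sum(
        a * b * (log_joint[i][j] - math.log(a) - math.log(b))
        for i, a in enumerate(q1)
        for j, b in enumerate(q2)
    )


# p(Y = y_obs, H1 = i, H2 = j), illustrative values.
joint = [[0.30, 0.05], [0.10, 0.15]]
log_joint = [[math.log(p) for p in row] for row in joint]
log_evidence = math.log(sum(sum(row) for row in joint))

q1, q2 = [0.5, 0.5], [0.5, 0.5]
trace = [lower_bound(log_joint, q1, q2)]
for _ in range(20):
    q1 = update_factor(log_joint, q2, axis=0)
    q2 = update_factor(log_joint, q1, axis=1)
    trace.append(lower_bound(log_joint, q1, q2))
```

Each update can only increase the lower bound, which stays below the log-evidence because a factorised q cannot in general match a dependent posterior.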
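Poisson-Gamma LBM - inference
E-step
$E_q(Z_{ik}) \propto \pi_k \exp\Big(\sum_{j,r,g} E_q(W_{jg})\, A_{ijr}^{kg}\Big)$
$E_q(W_{jg}) \propto \rho_g \exp\Big(\sum_{i,r,k} E_q(Z_{ik})\, A_{ijr}^{kg}\Big)$
$E_q(U_{ijr}) = a_{ijr} / b_{ijr}$
$E_q(\log U_{ijr}) = \psi(a_{ijr}) - \log b_{ijr}$
with
$A_{ijr}^{kg} = -\mu_i \nu_{jr} \alpha_{kg}\, E_q(U_{ijr}) + y_{ijr} \big[\log(\mu_i \nu_{jr} \alpha_{kg}) + E_q(\log U_{ijr})\big]$
$a_{ijr} = a + \sum_{k,g} E_q(Z_{ik})\, E_q(W_{jg})\, y_{ijr} = a + y_{ijr}$
$b_{ijr} = a + \sum_{k,g} E_q(Z_{ik})\, E_q(W_{jg})\, \mu_i \nu_{jr} \alpha_{kg}$
and ψ the digamma function.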
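The update of the Gamma factor for a single Uijr can be sketched as follows. It assumes the Gamma(a, a) prior and the Poisson emission stated earlier, so that conjugacy gives a Gamma factor with shape a + yijr and rate a plus the expected Poisson mean under the current label probabilities. The function names, the numerical values and the series-based `digamma` helper are illustrative assumptions, not the authors' implementation.

```python
import math


def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # log(x) - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 * inv - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def update_q_u(y, a, tau_i, eta_j, mu_i, nu_jr, alpha):
    """Gamma variational factor for one U_ijr: (shape, rate, E[U], E[log U]).

    tau_i[k] stands for E_q(Z_ik) and eta_j[g] for E_q(W_jg).
    """
    expected_mean = sum(
        tau_i[k] * eta_j[g] * mu_i * nu_jr * alpha[k][g]
        for k in range(len(tau_i))
        for g in range(len(eta_j))
    )
    shape = a + y
    rate = a + expected_mean
    return shape, rate, shape / rate, digamma(shape) - math.log(rate)


shape, rate, e_u, e_log_u = update_q_u(
    y=12, a=7.6, tau_i=[0.9, 0.1], eta_j=[0.2, 0.8],
    mu_i=10.0, nu_jr=1.0, alpha=[[0.5, 2.0], [1.5, 0.7]],
)
```

When the observed count is below the expected mean, E[U] falls below 1, so the latent Uijr absorbs part of the discrepancy between the count and the block mean.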
Application to plant-microbial communities interactions
In collaboration with C. Mougel, IGEPP Rennes
Hypothesis: plant genotype modifies the structure of the bacterial community in the rhizosphere (the region of soil directly influenced by root secretions and associated soil microorganisms).
Aim: better understand these interactions.
Long-term perspective → reduce the nitrogen inputs responsible for various kinds of pollution (sustainable agriculture) by controlling the soil.
Medicago truncatula
Small annual legume: model organism in genomic research. Forms symbioses with nitrogen-fixing rhizobia and arbuscular mycorrhizal fungi.
M. truncatula genotypes corresponding to the core collection used in genome-wide association studies.
MiSeq high-throughput amplicon sequencing of the Medicago truncatula rhizosphere (2-3 replicates × 172 genotypes)
Application to Medicago truncatula rhizosphere
Results at genus level (288 genera)
Poisson-Gamma LBM with K=16 and G=12 (a = 7.6)
[Figure: plot of Value (y-axis) against Yfit (x-axis), both axes ranging from 0 to 4000]
Application to Medicago truncatula rhizosphere
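[Figure, left panel: log(µi) ∼ Zi. y-axis log(Mu), from about −2.5 to 7.5; x-axis Z, row groups in the order 1, 5, 7, 2, 4, 9, 13, 6, 8, 10, 3, 15, 12, 16, 14, 11]
[Figure, right panel: log(νj) ∼ Wj. y-axis log(nu), from about −0.2 to 0.1; x-axis W, column groups 1 to 12]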
Application to Medicago truncatula rhizosphere
Interpretation of the αkg
log(α16,10) = −0.32: negative interaction; 2 bacteria are less abundant in group 10 of 25 samples
log(α16,7) = 0.16: positive interaction
• Genotype groups: nothing obvious regarding the ecophysiology of the plant
• The replicates of a genotype do not always have the same rhizosphere bacterial community
• The bacteria are not well known, so the clusters take a long time to interpret
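Since the interaction term enters the Poisson mean multiplicatively, a log-interaction value can be read as a fold change relative to the baseline µi νj. The short sketch below converts the two values quoted above; the dictionary layout and variable names are illustrative.

```python
import math

# Log-interaction values quoted on the slide, keyed by (row group, column group).
log_alpha = {(16, 10): -0.32, (16, 7): 0.16}

# Multiplicative effect on the expected count relative to mu_i * nu_j.
fold_change = {block: math.exp(value) for block, value in log_alpha.items()}

for block, ratio in fold_change.items():
    print(f"block {block}: expected counts multiplied by {ratio:.2f}")
```

Under this reading, the block (16, 10) has expected counts roughly 27% below the baseline, and the block (16, 7) roughly 17% above it.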
Conclusions
• LBM for overdispersed count data
• A parsimonious yet complex model that enables us to reduce data dimension
• ICL criterion to select the number of groups
• Parameters are biologically interpretable
Acknowledgments
For experiments and biological expertise: C. Mougel and the other partners of the MetaRhizo project
S. Robin, S. Schbath
[Figure: a dedicated bioinformatic workflow for microbial communities diversity analysis: the GnS-PIPE. Boxes shown: raw reads (454, Illumina, etc.); demultiplexing; trimming; filtering (singleton, chimera); clean reads; alignment; clustering (k-mer based filtering algorithms); taxonomy affiliation using international databases; statistics, phylogeny, ...]