Metagenomic data Latent Block Model Application to plant-microbial communities interactions in the rhizosphere

Latent Block Model for Metagenomic data

J. Aubert, S. Schbath, S. Robin

Recent Computational Advances in Metagenomics 2016, The Hague


Plant-microbial communities interactions in the rhizosphere

Rhizosphere: the region of soil directly influenced by root secretions and associated soil microorganisms.

Metagenomics: study of genetic material recovered directly from environmental samples → Who is there?

             S1      S2      S3      ...   Sj                    ...   Sm
Bact. 1      0       0       0       ...   y_{1j}                ...   3
Bact. 2      59      17      43      ...   y_{2j}                ...   3
...          ...     ...     ...     ...   ...                   ...   ...
Bact. i      y_{i1}  y_{i2}  y_{i3}  ...   y_{ij}                ...   y_{im}
...          ...     ...     ...     ...   ...                   ...   ...
Bact. n      90      120     123     ...   y_{nj}                ...   95
Seq. depth   4738    5157    6010    ...   \sum_{i=1}^{n} y_{ij}  ...   5916

$y_{ij}$ = number of sequences from sample $j$ assigned to bacteria $i$.
Count data depend on the sequencing effort.
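As a minimal illustration of the count table above, the sketch below builds a small abundance matrix (rows are bacteria, columns are samples) and computes the sequencing depth of each sample as a column sum. The numbers and the variable names `counts`, `seq_depth` and `rel_abundance` are made up for the example. Dividing by the depth is only one simple way to account for sequencing effort, not the approach taken in these slides.

```python
import numpy as np

# Hypothetical toy count matrix: rows are bacteria, columns are samples.
counts = np.array([
    [0, 0, 0, 3],
    [59, 17, 43, 3],
    [90, 120, 123, 95],
])

# Sequencing depth of sample j = sum over bacteria i of y_ij.
seq_depth = counts.sum(axis=0)

# Relative abundances: each column is divided by its own depth.
rel_abundance = counts / seq_depth

print(seq_depth)
```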

Model-based biclustering


Latent Block Model (Govaert and Nadif, 2003)

Assumptions
• $(Z_i) \sim \mathcal{M}(1, \pi = (\pi_1, \dots, \pi_K))$ and $(W_j) \sim \mathcal{M}(1, \rho = (\rho_1, \dots, \rho_G))$
• Latent variables $(Z_i)$ and $(W_j)$ are independent
• The random variables $Y_{ij}$ are independent conditionally on the labels

Model
$(Y_{ij})$, $i = 1, \dots, n$ and $j = 1, \dots, m$: matrix of observations.

$$Y_{ij} \mid (Z_{ik} = 1, W_{jg} = 1) \sim f(\,\cdot\,; \mu_i \nu_j \alpha_{kg})$$

• $\mu_i$: mean level of presence of one particular bacteria

• $\nu_j$: scale factor correcting for sampling effort (around 1)

• $\alpha_{kg}$: interaction level between bacteria and sampling units within the $(k, g)$ group
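To make the generative model concrete, here is a small simulation sketch. It assumes that the emission distribution f is Poisson with mean $\mu_i \nu_j \alpha_{kg}$. The slide leaves f unspecified at this point, so this choice, the function name `simulate_lbm` and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_lbm(n, m, pi, rho, alpha, mu, nu, seed=0):
    """Draw a count matrix from a Poisson latent block model (illustrative)."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(pi), size=n, p=pi)    # row labels Z_i
    w = rng.choice(len(rho), size=m, p=rho)  # column labels W_j
    # Mean of Y_ij is mu_i * nu_j * alpha[z_i, w_j].
    mean = mu[:, None] * nu[None, :] * alpha[z][:, w]
    y = rng.poisson(mean)
    return y, z, w

# Hypothetical parameters: K = 2 row groups, G = 2 column groups.
pi = np.array([0.5, 0.5])
rho = np.array([0.3, 0.7])
alpha = np.array([[0.5, 2.0], [1.5, 1.0]])
mu = np.full(6, 20.0)   # mean level of each of the 6 bacteria
nu = np.ones(4)         # sampling-effort factor of each of the 4 samples

y, z, w = simulate_lbm(6, 4, pi, rho, alpha, mu, nu)
```

Rows sharing a label in `z` and columns sharing a label in `w` share the same interaction term, which is what produces the block structure in `y`.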


Graphical representation

Conditional distribution of (Z,W)

• $Z_i$ and $W_j$: unobserved labels of row $i$ and column $j$

• $Y_{ij}$ depends on the labels.

• $Z_i$ and $W_j$ are not independent conditionally on $Y_{ij}$ → $p(Z, W \mid Y)$ is intractable, so the regular EM algorithm cannot be used


Poisson-Gamma Latent Block Model

Negative Binomial Distribution - Poisson-Gamma mixture

$$Y_{ijr} \mid (Z_{ik} = 1, W_{jg} = 1) \sim \mathcal{P}(\mu_i \nu_{jr} U_{ijr} \alpha_{kg}) \quad \text{with} \quad U_{ijr} \sim \mathrm{Gamma}(a, a)$$

[Figure: graphical representations of $p(Z, W, U, Y)$ and $p(Z, W, U \mid Y)$, drawn for rows $Z_1, Z_2$, columns $W_1, W_2$ and three replicates, with nodes $U_{ijr}$ and $Y_{ijr}$]
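The Poisson-Gamma mixture can be checked numerically. The sketch below assumes that Gamma(a, a) denotes shape a and rate a, so that U has mean 1 and variance 1/a. Under that assumption, a Poisson count with mean $\lambda U$ is marginally negative binomial with mean $\lambda$ and variance $\lambda + \lambda^2 / a$. The value of `lam` is hypothetical and stands for the product $\mu_i \nu_{jr} \alpha_{kg}$.

```python
import numpy as np

rng = np.random.default_rng(1)
a = 7.6          # dispersion value reported later in the slides
lam = 50.0       # hypothetical value of mu_i * nu_jr * alpha_kg
size = 200_000

# NumPy's gamma is parameterised by (shape, scale); rate a means scale 1/a.
u = rng.gamma(shape=a, scale=1.0 / a, size=size)
y = rng.poisson(lam * u)

empirical_mean = y.mean()
empirical_var = y.var()
theoretical_var = lam + lam**2 / a   # negative binomial variance

print(empirical_mean, empirical_var, theoretical_var)
```

The variance is well above the mean, which is the overdispersion that a plain Poisson model cannot capture.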



Inference and challenges

Based on the observed data $Y = (Y_{ij})$, we want to infer:
• the parameters $\theta = (\mu, \nu, \alpha, \pi, \rho)$
• the hidden status

Classical algorithm: Expectation Maximization algorithm
1. Expectation-step: calculate $p(H \mid Y)$ → sometimes impossible
2. Maximization-step: maximize $E[\log p(Y, H; \theta) \mid Y]$ with respect to $\theta$ → generally similar to standard MLE.
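The two steps can be sketched on a case where the E-step is tractable: a one-dimensional mixture of Poisson distributions, in which the hidden status of each observation is independent of the others. This is a swapped-in toy model, not the latent block model of these slides, and the function name `em_poisson_mixture` and its initialisation are assumptions made for the example.

```python
import numpy as np

def em_poisson_mixture(y, k=2, n_iter=50):
    """EM for a k-component Poisson mixture (toy example, tractable E-step)."""
    y = np.asarray(y, dtype=float)
    weights = np.full(k, 1.0 / k)
    # Deterministic initial rates spread over the quantiles of the data.
    rates = np.quantile(y, (np.arange(k) + 0.5) / k) + 1.0
    for _ in range(n_iter):
        # E-step: posterior p(H | Y) of each hidden label, up to a constant.
        log_post = np.log(weights) + y[:, None] * np.log(rates) - rates
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood estimates.
        weights = post.mean(axis=0)
        rates = (post * y[:, None]).sum(axis=0) / post.sum(axis=0)
    return weights, rates

rng = np.random.default_rng(0)
y = np.concatenate([rng.poisson(5, 300), rng.poisson(50, 300)])
weights, rates = em_poisson_mixture(y)
```

In the latent block model the row and column labels are coupled through the data, so the posterior computed in the E-step here has no closed form there.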


Expectation Maximization algorithm: a variational point of view

Neal and Hinton (1999), Wainwright and Jordan (2008)

Notations: $H = (Z, W, U)$

Lower bound of the log-likelihood
For any distribution $q(H)$:

$$\log p(Y) \geq \log p(Y) - KL[q(H) \,\|\, p(H \mid Y)] = E_q[\log p(Y, H)] + \mathcal{H}(q(H))$$

Link with EM: taking $q(H) = p(H \mid Y)$ makes the bound tight,

$$\log p(Y) = E[\log p(Y, H) \mid Y] + \mathcal{H}(p(H \mid Y))$$
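The lower bound can be verified on a tiny discrete example. In the sketch below the hidden variable H takes three values and the joint probabilities `joint` are hypothetical numbers chosen for illustration. The gap between $\log p(Y)$ and the bound is the Kullback-Leibler divergence, and it vanishes when q is the exact posterior.

```python
import numpy as np

# Hypothetical joint probabilities p(Y = y_obs, H = h) for a 3-state latent H.
joint = np.array([0.10, 0.25, 0.05])

log_evidence = np.log(joint.sum())   # log p(Y)
posterior = joint / joint.sum()      # p(H | Y)

def elbo(q):
    """E_q[log p(Y, H)] plus the entropy of q, for a discrete q without zeros."""
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * np.log(joint)) - np.sum(q * np.log(q)))

q_arbitrary = np.array([0.6, 0.2, 0.2])
gap = log_evidence - elbo(q_arbitrary)                        # always >= 0
kl = float(np.sum(q_arbitrary * np.log(q_arbitrary / posterior)))
```

Here `gap` and `kl` coincide, and `elbo(posterior)` equals `log_evidence`.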


Poisson-Gamma LBM - variational distributions

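Mean-field approximation
$\mathcal{Q}$ = set of factorisable distributions

$$\mathcal{Q} = \Big\{ q : q(H) = \prod_{\ell} q_\ell(H_\ell) = q_1(Z)\, q_2(W)\, q_3(U) \Big\}$$

Optimal variational distributions
The minimizer $q^*_\ell$ of $KL[q(H) \,\|\, p(H \mid Y)]$ with respect to $q_\ell$, for fixed $q_{-\ell}$, satisfies

$$q^*_\ell(H_\ell) \propto \exp E_{q_{-\ell}}[\log p(Y, H)] \quad \text{for all } \ell.$$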


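Poisson-Gamma LBM - inference

E-step

$$E_q(Z_{ik}) \propto \pi_k \exp\Big( \sum_{j,r} \sum_{g} E_q(W_{jg})\, A^{kg}_{ijr} \Big)$$

$$E_q(W_{jg}) \propto \rho_g \exp\Big( \sum_{i,r} \sum_{k} E_q(Z_{ik})\, A^{kg}_{ijr} \Big)$$

$$E_q(U_{ijr}) = \frac{a_{ijr}}{b_{ijr}}$$

$$E_q(\log U_{ijr}) = \psi(a_{ijr}) - \log b_{ijr}$$

with

$$A^{kg}_{ijr} = -\mu_i \nu_{jr} \alpha_{kg}\, E_q(U_{ijr}) + y_{ijr} \big[ \log(\mu_i \nu_{jr} \alpha_{kg}) + E_q(\log U_{ijr}) \big]$$

$$a_{ijr} = a + \sum_{k,g} E_q(Z_{ik}) E_q(W_{jg})\, y_{ijr} = a + y_{ijr}$$

$$b_{ijr} = a + \sum_{k,g} E_q(Z_{ik}) E_q(W_{jg})\, \mu_i \nu_{jr} \alpha_{kg}$$

and $\psi$ the digamma function.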
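A minimal sketch of one pass of such updates is given below, under simplifying assumptions: the replicate index r is dropped, Gamma(a, a) is read as shape a and rate a, and the updates are the ones obtained by applying the mean-field rule $q^*_\ell \propto \exp E_{q_{-\ell}}[\log p(Y, H)]$ to the Poisson-Gamma model. Under these assumptions $q(U_{ij})$ is a Gamma distribution with shape $a + y_{ij}$ and rate $a + \mu_i \nu_j \sum_{k,g} \tau_{ik} \eta_{jg} \alpha_{kg}$. The names `e_step`, `tau` (for $E_q(Z_{ik})$) and `eta` (for $E_q(W_{jg})$) are hypothetical, and terms that do not depend on the row group k are omitted because they cancel in the normalisation.

```python
import numpy as np

def e_step(y, tau, eta, mu, nu, alpha, a, pi):
    """One mean-field update of q(U) and q(Z) (illustrative sketch)."""
    base = mu[:, None] * nu[None, :]        # mu_i * nu_j, shape (n, m)
    alpha_bar = tau @ alpha @ eta.T         # sum_kg tau_ik eta_jg alpha_kg

    # q(U_ij) is Gamma(shape, rate); E[log U] would be digamma(shape) - log(rate).
    shape = a + y
    rate = a + base * alpha_bar
    e_u = shape / rate

    # Terms of E[log p(Y, H)] that depend on the row group k:
    # sum_j sum_g eta_jg * (-mu_i nu_j alpha_kg E[U_ij] + y_ij log alpha_kg).
    lin = -(base * e_u) @ eta @ alpha.T     # shape (n, K)
    log_term = y @ eta @ np.log(alpha).T    # shape (n, K)

    # Normalise with a numerically stable softmax over k.
    logits = np.log(pi)[None, :] + lin + log_term
    logits -= logits.max(axis=1, keepdims=True)
    tau_new = np.exp(logits)
    tau_new /= tau_new.sum(axis=1, keepdims=True)
    return tau_new, e_u

# Hypothetical toy inputs.
rng = np.random.default_rng(0)
n, m, K, G = 5, 4, 2, 3
y = rng.poisson(10.0, size=(n, m))
tau = rng.dirichlet(np.ones(K), size=n)
eta = rng.dirichlet(np.ones(G), size=m)
alpha = rng.uniform(0.5, 2.0, size=(K, G))
tau_new, e_u = e_step(y, tau, eta, np.full(n, 10.0), np.ones(m),
                      alpha, a=7.6, pi=np.full(K, 1.0 / K))
```

The update of `eta` is symmetric, with the roles of rows and columns exchanged, and a full variational EM would alternate these updates with an M-step on the parameters.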


Application to plant-microbial communities interactions

In collaboration with C. Mougel, IGEPP Rennes

Hyp: Plant genotype modifies the structure of the bacterial community in the rhizosphere (the region of soil directly influenced by root secretions and associated soil microorganisms).

Aim: Better understand these interactions

Long term perspective → Reduction of nitrogen inputs responsible for various forms of pollution (sustainable agriculture) by controlling the soil


Medicago truncatula
Small annual legume: model organism in genomic research. Forms symbioses with nitrogen-fixing rhizobia and arbuscular mycorrhizal fungi.

M. truncatula genotypes corresponding to the core collection used in genome-wide association studies.

MiSeq high-throughput amplicon sequencing of the Medicago truncatula rhizosphere (2-3 replicates × 172 genotypes)


Application to Medicago truncatula rhizosphere

Results at genus level (288 genera)

Poisson-Gamma LBM with K = 16 and G = 12 (a = 7.6)

[Figure: plot of "Value" (y-axis) against "Yfit" (x-axis), both axes ticked at 0, 2000 and 4000]


Application to Medicago truncatula rhizosphere

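[Figure, left panel: $\log(\mu_i) \sim Z_i$, with log(Mu) (y-axis ticks from −2.5 to 7.5) shown by row group Z in the order 1, 5, 7, 2, 4, 9, 13, 6, 8, 10, 3, 15, 12, 16, 14, 11. Right panel: $\log(\nu_j) \sim W_j$, with log(nu) (y-axis ticks from −0.2 to 0.1) shown by column group W, groups 1 to 12]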


Application to Medicago truncatula rhizosphere

Interpretation of the $\alpha_{kg}$
• $\log(\alpha_{16,10}) = -0.32$: negative interaction; 2 bacteria are less abundant in group 10 (25 samples)
• $\log(\alpha_{16,7}) = 0.16$: positive interaction

• Genotype groups: nothing obvious on the ecophysiology of the plant

• The replicates of a genotype do not always share the same rhizosphere bacterial community

• Bacteria are not known: interpreting the clusters takes time
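The interaction terms can be read on the count scale by exponentiating them. The short sketch below assumes natural logarithms and treats $\alpha_{kg}$ as a multiplicative factor on the expected count $\mu_i \nu_j$, as in the model definition. The helper name `fold_change` is hypothetical.

```python
import math

def fold_change(log_alpha):
    """Multiplicative effect of block (k, g) on the expected count mu_i * nu_j."""
    return math.exp(log_alpha)

negative = fold_change(-0.32)   # about 0.73: roughly 27% below mu_i * nu_j
positive = fold_change(0.16)    # about 1.17: roughly 17% above mu_i * nu_j

print(round(negative, 2), round(positive, 2))
```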


Conclusions

• LBM for overdispersed count data

• A parsimonious yet complex model enables us to reduce data dimension

• ICL criterion to select the number of groups

• Parameters are biologically interpretable


Acknowledgments

For experiments and biological expertise

C. Mougel and the other partners of the MetaRhizo project

S. Robin S. Schbath


A dedicated bioinformatic workflow for microbial communities diversity analysis: the GnS-PIPE

[Figure: workflow diagram with the boxes: Raw reads (454, Illumina, etc.); Taxonomy affiliation using international databases; Statistics, phylogeny, ...; Clustering (k-mer based filtering algorithms); Filtering (single singleton, chimera); Demultiplexing; Trimming; Clean reads; Alignment]