
Parameter estimation via stochastic variants of the ECM algorithm with applications to plant growth modeling

Computational Statistics and Data Analysis 78 (2014) 82–99


S. Trevezas a,∗, S. Malefaki b, P.-H. Cournède a

a Laboratory of Mathematics Applied to Systems, École Centrale Paris, Grande Voie des Vignes, 92290 Châtenay-Malabry, France
b Department of Mechanical Engineering & Aeronautics, University of Patras, GR 26500 Rio Patras, Greece

Article info

Article history:
Received 20 February 2013
Received in revised form 7 April 2014
Accepted 9 April 2014
Available online 18 April 2014

Keywords:
Plant growth model
Hidden Markov model
Monte Carlo ECM-type algorithm
Metropolis-within-Gibbs
Automated Monte Carlo EM algorithm
Sequential importance sampling with resampling

Abstract

Mathematical modeling of plant growth has gained increasing interest in recent years due to its potential applications. A general family of models, known as functional–structural plant models (FSPMs) and formalized as dynamic systems, serves as the basis for the current study. Modeling, parameterization and estimation are very challenging problems due to the complicated mechanisms involved in plant evolution. A specific type of a non-homogeneous hidden Markov model has been proposed as an extension of the GreenLab FSPM to study a certain class of plants with known organogenesis. In such a model, the maximum likelihood estimator cannot be derived explicitly. Thus, a stochastic version of an expectation conditional maximization (ECM) algorithm was adopted, where the E-step was approximated by sequential importance sampling with resampling (SISR). The complexity of the E-step creates the need for the design and the comparison of different simulation methods for its approximation. In this direction, three variants of SISR and a Markov chain Monte Carlo (MCMC) approach are compared for their efficiency in parameter estimation on simulated and real sugar beet data, where observations are taken by censoring the plant's evolution (destructive measurements). The MCMC approach seems to be more efficient for this particular application context and also for a large variety of crop plants. Moreover, a data-driven automated MCMC–ECM algorithm for finding an appropriate sample size in each ECM step and also an appropriate number of ECM steps is proposed. Based on the available real dataset, some competing models are compared via model selection techniques.

© 2014 Elsevier B.V. All rights reserved.

1. Introduction

Mathematical modeling of plant development and growth has gained increasing interest in the last twenty years, with potential applications in agricultural sciences, plant genetics or ecology. Functional–structural plant models (FSPMs; Sievänen et al., 2000) combine the description of plant architectural development and ecophysiological functioning, and offer the most promising perspectives for a better understanding of plant growth (Vos et al., 2010). However, the parameterization of FSPMs is generally impeded by several difficulties: the complex and interacting mechanisms which guide a plant's evolution generally translate into strongly nonlinear models involving a large number of equations and parameters; experimental protocols to collect detailed data are heavy and often inaccurate; finally, plant models are generally developed without an appropriate statistical framework.

∗ Corresponding author. Tel.: +33 0141131798. E-mail addresses: [email protected] (S. Trevezas), [email protected] (S. Malefaki), [email protected] (P.-H. Cournède).

http://dx.doi.org/10.1016/j.csda.2014.04.004
0167-9473/© 2014 Elsevier B.V. All rights reserved.


As a consequence, plant growth models often remain descriptive, without a real predictive capacity. Efforts have thus been undertaken in recent years to develop methods for parameter estimation and uncertainty assessment adapted to complex models of plant growth (Ford and Kennedy, 2011; Cournède et al., 2011; Trevezas and Cournède, 2013). In this paper, a certain class of plants with known organogenesis (in plants, organogenesis is the process of creation of new organs) is studied, whose growth is modeled by the GreenLab FSPM (de Reffye and Hu, 2003). Many agronomic plants can be modeled in this way, such as maize (Guo et al., 2006), rapeseed (Jullien et al., 2011), grapevine (Pallas et al., 2011), or even trees (Mathieu et al., 2009). The parameters of the model are related to plant functioning. The vector of observations consists of organ masses, measured only once by censoring the plant's evolution at a given observation time (destructive measurements).

In Cournède et al. (2011), a first approach for parameter estimation was introduced, but based on the rather restrictive assumption of an underlying deterministic model of biomass production and uncorrelated errors in the mass measurements of different organs in the plant structure. In Trevezas and Cournède (2013), the authors proposed a more general framework for statistical analysis, which can potentially be applied to a large variety of plant species by taking into account process and measurement errors. They provided a frequentist statistical methodology for parameter estimation in plants with deterministic organogenesis rules. This framework can also serve as the basis for statistical analysis in plant models with stochastic organogenesis (see Kang et al., 2008 for the description of GreenLab with stochastic organogenesis). The basic idea consists in describing the data (organ mass measurements) as resulting from the evolution of a non-homogeneous hidden Markov model (HMM), where the hidden states of the model correspond to the sequence of unknown biomasses (masses measured for living organisms) produced during successive growth cycles. In such a complex model, the maximum likelihood estimator (MLE) cannot be derived explicitly, and for this reason a Monte Carlo ECM-type (expectation conditional maximization) algorithm (Dempster et al., 1977; Meng and Rubin, 1993; Jank, 2005b; McLachlan and Krishnan, 2008) was adopted to compensate for the non-explicit E-step and also the non-explicit M-step. The authors used sequential importance sampling with resampling (SISR) to simulate from the hidden states given the observed data. The M-step is performed with a conditional maximization approach (see ECM in Meng and Rubin, 1993), in which the parameters of the model are separated into two groups: one for which explicit updates can be derived by fixing the parameters of the other group, and one for which updates are derived via numerical maximization.

Due to the typically large number of equations and time steps to consider in plant growth models, the computational load is an important factor to take into account. Consequently, the efficiency of the estimation algorithms is a key issue, especially when the final objective is decision aid in agriculture. Likewise, as one of the objectives of FSPMs is to be able to differentiate between genotypes (two different genotypes should be characterized by two different vectors in the parameter space; Letort, 2008 and Yin and Struik, 2010), the accuracy of the estimation algorithms has to be assessed. In this context, it is very important to profit from advanced simulation techniques in order to reduce the Monte Carlo error associated with a given estimation algorithm. For this reason, we focus on the comparison of different simulation techniques for performing the E-step. The resulting approximation of the so-called Q-function (computed in the E-step) is crucial to the quality of parameter estimation. The most efficient algorithm can subsequently be used to calibrate agronomic plants with the method of MLE and then make model comparison and selection. An example of this type is presented in the current paper, based on a dataset from the sugar beet plant. Moreover, in order to enhance computational efficiency, the design of automated and data-driven algorithms should help by making an efficient use of Monte Carlo resources, for the benefit of the users. The above arguments motivate the current study.

In this paper, we compare different simulation techniques for performing the E-step: in particular, three variants of sequential importance sampling with resampling (SISR) and a Markov chain Monte Carlo (MCMC) algorithm. The three variants concern: (i) the SISR presented in Trevezas and Cournède (2013), where the resampling step is multinomial (Gordon et al., 1993); (ii) a modification of the previous algorithm performing the resampling step with a combination of residual and stratified resampling (see, e.g., Cappé et al., 2005 and references therein); and (iii) a sequential importance sampling algorithm with partial rejection control (see, e.g., Liu et al., 1998, 2001). The variant of the MCMC algorithm that we developed is a hybrid Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990), where simulations from the full conditional distributions of the hidden states were replaced by Metropolis–Hastings (MH) steps (Metropolis et al., 1953; Hastings, 1970). In order to further optimize the MCMC approach and to simplify a routine use of the method, a data-driven automated algorithm is also proposed. The main benefits of such an algorithm in the EM context concern the automatic determination of the Monte Carlo sample size in each EM step and of the total number of EM steps. One of the most commonly used algorithms of this type, which is both efficient and computationally cheap, is the one presented in Caffo et al. (2005). We adapted this algorithm to our context to find an appropriate sample size in each ECM step and also an appropriate number of ECM steps. The details can be found in Section 4. Simulation studies from a synthetic example and also a real dataset from the sugar beet plant were used to illustrate the performance of the competing algorithms.

The MCMC–ECM algorithm proved to be the most efficient in parameter estimation for the plant growth model that we study in this paper. In addition to the significant reduction of the Monte Carlo error, the MCMC algorithm revealed another advantage compared to SISR in the specific context of this study. Since plant organs generally have large expansion durations, by censoring the plant's evolution at a given time, a whole batch of organs will not have completed their expansion (immature organs). As will be explained in Section 3, the MCMC algorithm can better handle this type of asymmetry. The automated version of MCMC–ECM was thus selected for further statistical inference. In particular, two different types of hidden Markov models were described and tested on a real dataset for their fitting quality.


The rest of this paper is organized as follows. In Section 2, we review the basic assumptions of the GreenLab FSPM and we give a short description of the non-homogeneous hidden Markov model developed in Trevezas and Cournède (2013). We also describe a new competing model, which operates in the log-scale, and give the framework for making MLE feasible within the framework of EM-type algorithms. In Section 3, we describe the MCMC approximation to the Q-function of the E-step and compare the current approach based on MCMC with the one proposed in Trevezas and Cournède (2013) based on three variants of SISR. Automated Monte Carlo EM algorithms are reviewed in Section 4, where the adaptation of the automated algorithm of Caffo et al. (2005) to our context is also provided. The resulting automated MCMC–ECM is compared with the non-automated one in synthetic examples. In Section 5, the performance of the aforementioned algorithms is tested on a real dataset from the sugar beet plant, and a model comparison is also carried out. Finally, in the last section an extended discussion is provided.

2. Description of the plant growth model

In this section we recall the basic assumptions of the GreenLab model and its formulation as an HMM given in Trevezas and Cournède (2013). Additionally, we propose a new candidate model and describe an appropriate version of an ECM algorithm for MLE. Starting with the initial mass of the seed q_0, plant development is considered to be the result of a cyclic procedure. The cycle duration is determined by the thermal time needed to set up new organs in the plant and is called a Growth Cycle (GC). At each GC the available biomass is allocated to organs in expansion, and at the same time new biomass is produced by the green (non-senescent) leaves, which will be available for allocation at the next GC. The set of different classes (types) of organs of a plant is denoted by O. In our application context with the sugar beet plant, the set of organs consists of blade b, petiole p and root r, i.e., O = {b, p, r}. Let us now give the assumptions concerning biomass production and allocation.

2.1. Modeling assumptions

In the rest of the paper, we use the compact notation x_{i:j} for vectors (x_i, . . . , x_j), where i ≤ j.

Assumption 1. (i) At the n-th GC, denoted by GC(n), the produced biomass q_n is fully available for allocation to all expanding (preexisting + newly created) organs, and it is distributed proportionally to the class-dependent empirical sink functions given by

s_o(i; p^o_al) = p_o c(a_o, b_o) ((i + 0.5)/t_o)^{a_o−1} (1 − (i + 0.5)/t_o)^{b_o−1},   i ∈ {0, 1, . . . , t_o − 1},   (1)

where p^o_al = (p_o, a_o, b_o) ∈ R*_+ × [1, +∞)² is the class-specific parameter vector, with (p_o)_{o∈O} a vector of proportionality constants representing the sink strength of each class (by convention p_b = 1), t_o is the expansion time period for organs belonging to the class o ∈ O, and c(a_o, b_o) is the normalizing constant of a discrete Beta(a_o, b_o) function, whose unnormalized generic term is given by the product of the last two factors of (1).
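To fix ideas, here is a minimal Python sketch of Eq. (1); the paper itself provides no code, so all function and variable names are illustrative. The normalizing constant c(a_o, b_o) is obtained by summing the unnormalized discrete Beta terms over i = 0, . . . , t_o − 1.

def beta_term(i, t, a, b):
    # Unnormalized discrete Beta(a, b) term at expansion stage i: the last two factors of (1).
    x = (i + 0.5) / t
    return x ** (a - 1) * (1.0 - x) ** (b - 1)

def sink(i, p_o, a_o, b_o, t_o):
    # Sink value s_o(i; p_al^o): sink strength times the normalized discrete Beta term.
    c = 1.0 / sum(beta_term(j, t_o, a_o, b_o) for j in range(t_o))  # c(a_o, b_o)
    return p_o * c * beta_term(i, t_o, a_o, b_o)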

(ii) As in Lemaire et al. (2008), we suppose that the expansion durations are the same for blades and petioles, and T denotes their common value: t_b = t_p = T.

We denote by p_al ≜ (p^o_al)_{o∈O} the vector of all allocation parameters and by (N^o_n)_{o∈O} the vector of organs preformed at GC(n), for all n ∈ N (determined by plant organogenesis, and deterministic in this study).

Definition 1. The total biomass demand at GC(n), denoted by d_n, is the quantity expressing the sum of the sink values of all expanding organs at GC(n).

Since we consider that there is only one root compartment, and since an organ is in its i-th expansion stage if and only if it has been preformed at GC(n − i) (see Assumption 1), we have that

d_n(p_al) = ∑_{o∈O−{r}} ∑_{i=0}^{min(n,T−1)} N^o_{n−i} s_o(i; p^o_al) + s_r(n; p^r_al).   (2)
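Reusing the sink function sketched after Eq. (1), the demand (2) can be evaluated as follows; this is again a hypothetical sketch, where N_org[o][k] is assumed to hold the number of organs of class o preformed at GC(k) and sink_params[o] the tuple (p_o, a_o, b_o, t_o).

def demand(n, N_org, sink_params, T):
    # Total biomass demand d_n(p_al) of Eq. (2); 'b' and 'p' are the classes in O - {r}.
    d = 0.0
    for o in ('b', 'p'):
        p_o, a_o, b_o, t_o = sink_params[o]
        for i in range(min(n, T - 1) + 1):
            d += N_org[o][n - i] * sink(i, p_o, a_o, b_o, t_o)
    p_r, a_r, b_r, t_r = sink_params['r']    # single root compartment
    return d + sink(n, p_r, a_r, b_r, t_r)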

Except for the initial mass of the seed q_0, the subsequent biomasses {q_n}_{n≥1} are the result of photosynthesis by the leaf blades.

Definition 2. (i) The photosynthetically active blade surface at GC(n + 1), denoted by s^act_n, is the quantity expressing the total surface area of all leaf blades that have been preformed until GC(n) and will be photosynthetically active at GC(n + 1);
(ii) the ratio (percentage) of the allocated q_l which contributes to s^act_n will be denoted by π^act_{l,n}.

Assumption 2. (i) The initial mass of the seed q_0 is assumed to be fixed and known;
(ii) the leaf blades have a common photosynthetically active period, which equals T;
(iii) the leaf blades have a common surfacic mass, denoted by e_b.

Now, we describe how the biomasses {q_n}_{n≥1} are obtained.


Assumption 3. In the absence of modeling errors, the sequence of produced biomasses {q_n}_{n≥1} is determined by the following recurrence relation, known as the empirical Beer–Lambert law (see Guo et al., 2006):

q_{n+1} = F_n(q_{(n−T+1)^+:n}, u_{n+1}; p) = u_{n+1} µ s_pr (1 − exp(−k_B s^act_n(q_{(n−T+1)^+:n}; p_al)/s_pr)),   (3)

where x^+ = max(0, x), u_n denotes the product of the photosynthetically active radiation during GC(n) modulated by a function of the soil water content, p ≜ (µ, s_pr, k_B, p_al), µ is the radiation use efficiency, s_pr is a characteristic surface that represents the two-dimensional projection on the ground of the space potentially occupied by the plant, k_B is the extinction coefficient in the Beer–Lambert extinction law, and s^act_n is given by

s^act_n(q_{(n−T+1)^+:n}; p_al) = e_b^{−1} ∑_{l=(n−T+1)^+}^{n} π^act_{l,n}(p_al) q_l,   (4)

and

π^act_{l,n}(p_al) = (1/d_l(p_al)) ∑_{j=0}^{min(l, l+T−n−1)} N^b_{l−j} s_b(j; p^b_al),   (n − T + 1)^+ ≤ l ≤ n,   (5)

where d_l is given by (2), s_b by (1), and N^b_n is the number of blades preformed at GC(n).

Note that q_{n+1} also depends on p_al, but only through s^act_n, and that p could have lower dimension if some of the aforementioned parameters are fixed or calibrated in the field.
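As an illustration of the recursion (3), a sketch under the same illustrative conventions follows; s_act is assumed to be a callable implementing Eqs. (4)–(5) on the biomass window q_{(n−T+1)^+:n}.

import math

def production(q_window, u_next, mu, s_pr, k_B, s_act):
    # Beer-Lambert production step of Eq. (3): returns the next biomass q_{n+1}.
    return u_next * mu * s_pr * (1.0 - math.exp(-k_B * s_act(q_window) / s_pr))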

In Trevezas and Cournède (2013) the available data Y were rearranged sequentially into sub-vectors Y_n by taking into account the preformation time (one GC before their first appearance) of all available organs, except for the root mass, which was excluded from the data vector. In this paper we use the same data decomposition and we also indicate a way to take into account the root mass. Each sub-vector Y_n contains the masses of the organs which are preformed at GC(n). Whenever the root mass is included, it is contained in the last sub-vector. If we denote by G_n the vector-valued function that expresses the theoretical masses of all the different classes of organs which started their development at GC(n), then by summing the allocated biomass at each expansion stage and using Assumption 1 we obtain directly

G_n(q_{n:(n+T−1)}; p_al) = ( ∑_{j=0}^{T−1} (q_{j+n}/d_{j+n}(p_al)) s_o(j; p^o_al) )_{o∈O−{r}}.   (6)

The theoretical root mass, whenever included, is given by

G_r(q_{0:N}; p_al) = ∑_{j=0}^{N} (q_j/d_j(p_al)) s_r(j; p^r_al).   (7)

The following assumptions determine the stochastic nature of the model.

Assumption 4. Let (W_n)_{n∈N} and (V_n)_{n∈N} be two mutually independent sequences of i.i.d. random variables and random vectors respectively, independent of Q_0, where W_n ∼ N(0, σ²) and V_n ∼ N_d(0, Σ), with Σ an unknown covariance matrix and d the cardinality of O − {r}. By setting N^o_n = 1 for all o ∈ {b, p}, two types of model equations will be assumed:

(a) model M1: for n ≥ 0,

Q_{n+1} = F_n(Q_{(n−T+1)^+:n}; p)(1 + W_n),   (8)
Y_n = G_n(Q_{n:(n+T−1)}; p_al) + V_n,   (9)

(b) model M2: for n ≥ 0,

Q_{n+1} = F_n(Q_{(n−T+1)^+:n}; p) e^{W_n},   (10)
Y_n = G_n(Q_{n:(n+T−1)}; p_al) ⊙ e^{V_n},   (11)

where F_n is given by (3), G_n is given by (6), e^x ≜ (e^{x_1}, . . . , e^{x_d}) for a d-dimensional vector x = (x_1, . . . , x_d), and x ⊙ y is the Hadamard (entrywise) product of two vectors.
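For intuition, one forward step of each model can be sketched as follows (illustrative numpy code; F_n and G_n are assumed to be callables implementing (3) and (6)):

import numpy as np

rng = np.random.default_rng(0)

def step_M1(F_n, q_window, sigma):
    # State equation (8): multiplicative Gaussian noise on the produced biomass.
    return F_n(q_window) * (1.0 + sigma * rng.standard_normal())

def step_M2(F_n, q_window, sigma):
    # State equation (10): log-normal noise, which keeps Q_{n+1} positive.
    return F_n(q_window) * np.exp(sigma * rng.standard_normal())

def observe_M2(G_n, q_window, Sigma):
    # Observation equation (11): entrywise log-normal measurement error.
    V = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma)
    return G_n(q_window) * np.exp(V)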

Remark 1. (i) Assumption 4(a) corresponds to the model equations adopted in Trevezas and Cournède (2013).
(ii) When a dataset Y_{0:N} is available and the root mass is included, then the dimension of Y_N, G_N and V_N given in (9) or (11) is increased by one to incorporate the root mass given by (7), observed with error V_{N,d+1} ∼ N(0, σ²_r).

Both models given above correspond to state-space models with state sequence Q, satisfy Assumptions 1–3, and differ in the state and observation equations given by Assumption 4(a) or 4(b).

Now, we give their equivalent formulation as hidden Markov models (HMMs); see Cappé et al. (2005). The proof is direct and will be omitted.


Proposition 1. Under Assumptions 1–4, the bivariate stochastic process (Q, Y), defined on a probability space (Ω, F, P_θ), where θ = (p, Σ) or (p, σ², Σ), can be represented as an HMM, where

(i) the hidden sequence Q, with values in R_+, evolves as a time-inhomogeneous T-th order Markov chain with initial distribution P_θ(Q_0 ∈ ·) = δ_{q_0}(·) (Dirac at q_0), where q_0 ∈ R*_+, and transition dynamics due to Assumption 4(a) for model M1:

P_θ(Q_{n+1} ∈ · | Q_{(n−T+1)^+:n}) ≈ N( F_n(Q_{(n−T+1)^+:n}; p), σ² F²_n(Q_{(n−T+1)^+:n}; p) ),   (12)

and due to Assumption 4(b) for model M2:

P_θ(Q_{n+1} ∈ · | Q_{(n−T+1)^+:n}) = logN( log F_n(Q_{(n−T+1)^+:n}; p), σ² ),   (13)

where logN stands for the log-normal distribution;

(ii) the observable sequence Y, with values in (R_+)^d, conditioned on Q, forms a sequence of conditionally independent random vectors, and each Y_n given Q depends only on Q_{n:(n+T−1)}, with conditional distribution due to Assumption 4(a) for model M1:

P_θ(Y_n ∈ · | Q_{n:(n+T−1)}) ≈ N_d( G_n(Q_{n:(n+T−1)}; p_al), Σ ),   (14)

and due to Assumption 4(b) for model M2:

P_θ(Y_n ∈ · | Q_{n:(n+T−1)}) = logN_d( log G_n(Q_{n:(n+T−1)}; p_al), Σ ),   (15)

where log x ≜ (log x_1, . . . , log x_d) for a d-dimensional vector x = (x_1, . . . , x_d).

Remark 2. The model M1 is the one assumed in Trevezas and Cournède (2013), and normality in (12) and (14) is only valid approximately (with small variances), since we deal with positive random variables.

2.2. Maximum likelihood estimation

The available data y_{0:N} contain organ masses, measured at a given GC(N) by censoring the plant's evolution (destructive measurements). In Cournède et al. (2011) a parameter identification method was proposed for the GreenLab model in the absence of modeling errors in biomass production (σ² = 0) and of correlation (diagonal covariance matrix Σ) in the mass measurements. In Trevezas and Cournède (2013) the method was extended to cover the case of a special type of modeling errors and to introduce correlation in the mass measurements. The authors managed to estimate the parameters based on an appropriate stochastic variant of a generalized EM algorithm (Dempster et al., 1977; Meng and Rubin, 1993; Jank, 2005b; McLachlan and Krishnan, 2008). Each iteration of an EM algorithm consists of an expectation step (E-step) and a maximization step (M-step). The E-step involves the computation of the conditional expectation of the complete-data log-likelihood given the observed data under the current parameter value (called the Q-function). In the M-step, the parameters are updated by maximizing the Q-function of the E-step. When the integral involved in the E-step is analytically intractable, the Q-function, denoted here by Q(θ; θ′), should be approximated. Several efforts have been made in this direction, e.g., the Stochastic EM (SEM) (Celeux and Diebolt, 1985), the Monte Carlo EM (MCEM) (Wei and Tanner, 1990), the Stochastic Approximation EM (SAEM) (Delyon et al., 1999), as well as the Quasi-Monte Carlo EM (Jank, 2005a). The common characteristic of the aforementioned variants is the approximation of the Q-function by simulating the hidden state sequence from its conditional distribution given the observed data (see Jank, 2005b and Jank, 2006). In the context of hidden Markov models (Cappé et al., 2005) the two most popular and efficient simulation mechanisms are SISR, see Gordon et al. (1993), Doucet et al. (2001), Cappé et al. (2005), and MCMC, see Metropolis et al. (1953), Hastings (1970), Geman and Geman (1984), Gelfand and Smith (1990). The resulting algorithms will be referred to as the SISR–EM and the MCMC–EM algorithm. In order to perform the E-step for the HMM M1, the authors in Trevezas and Cournède (2013) approximate the Q-function via an SISR estimate Q̂(θ; θ′). In the next section we propose an approximation of the Q-function based on MCMC. The estimate of the Q-function can be expressed in a unified way as:

Q̂(θ; θ′) = ∑_{i=1}^{M} w_i log p_θ(q^{(i)}_{0:N}, y_{0:N}),   (16)

where p_θ(q_{0:N}, y_{0:N}) is the density function of the complete model when the true value is θ, and {w_i, q^{(i)}_{0:N}} is a weighted M-sample (w_i, q^{(i)}_{0:N} and M depend on θ′) from the conditional distribution of the hidden states q_{0:N} given the observed data y_{0:N} when the true parameter is θ′. In the case of an MCMC estimate the weights w_i are equal to 1/M.

Very often in real-life applications the M-step is analytically intractable as well. Unfortunately, any stochastic EM-type algorithm that can be designed for the HMMs M1 and M2 given by Proposition 1 leads to a non-explicit M-step. For this reason, a numerical maximization procedure of quasi-Newton type could be implemented at each iteration of a stochastic EM algorithm (see Trevezas and Cournède, 2013), in the same way as it is implemented in a deterministic EM algorithm (see Lange, 1995). Nevertheless, it is certainly desirable, whenever possible, for computational cost and accuracy reasons, to reduce the number of parameters to be updated via numerical maximization. A way to overcome a complicated M-step was proposed in Meng and Rubin (1993) with the so-called ECM (expectation conditional maximization) algorithm, where the authors separated the intractable M-step into smaller tractable conditional M-steps and updated the parameters of the model in a cyclic fashion.


In order to perform the M-step for the HMM M1, the authors in Trevezas and Cournède (2013) combined conditional and numerical maximization. First, they updated explicitly, in a conditional maximization step, the parameters which have explicit updates given fixed values of the rest, and then updated the rest of the parameters by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton algorithm. This approach is also adopted here for both models M1 and M2. Let Q̂(θ; θ^(t)) denote the approximation of Q(θ; θ^(t)) given by (16) at the t-th EM iteration (θ′ = θ^(t)), and let (θ_1, θ_2) be a decomposition of θ into two sub-vectors, where θ_1 can be explicitly updated given θ_2. The maximization of Q̂(θ; θ^(t)) with respect to θ = (θ_1, θ_2) is described by the following two steps:

θ_1^{(t+1)} = argmax_{θ_1} Q̂(θ_1, θ_2^{(t)}; θ_1^{(t)}, θ_2^{(t)}),   (17)

θ_2^{(t+1)} = BFGSmax_{θ_2} Q̂(θ_1^{(t+1)}, θ_2; θ_1^{(t)}, θ_2^{(t)}),

where the notation BFGSmax corresponds to the solution of the maximization problem with the BFGS quasi-Newton algorithm. The explicit step (17) corresponding to model M1 can be found in Trevezas and Cournède (2013). The solution to (17) for the model M2 is given below. The proof is provided in the Appendix.
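Schematically, one such ECM iteration could be coded as below; this is a sketch, not the authors' implementation, with Qhat standing for the Monte Carlo Q-function of (16) and explicit_update for the closed-form argmax in (17).

import numpy as np
from scipy.optimize import minimize

def ecm_step(theta1, theta2, Qhat, explicit_update):
    # Explicit CM-step for theta1, with theta2 fixed at its current value, cf. (17).
    theta1_new = explicit_update(theta2)
    # Numerical CM-step for theta2 with BFGS (maximize Qhat = minimize its negative).
    res = minimize(lambda t2: -Qhat(theta1_new, t2), x0=np.asarray(theta2), method='BFGS')
    return theta1_new, res.x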

Proposition 2. Let θ = (θ_1, θ_2), where θ_1 = (µ, Σ) and θ_2 contains all parameters of p except for µ. The update equations for θ_1 are given as follows:

µ̂_N(θ_2; θ′) = µ′ exp( N^{−1} ∑_{n=1}^{N} E_{θ′}[log Q_n − log F_{n−1}(θ_2) | y_{0:N}] ),   (18)

Σ̂_N(θ_2; θ′) = (N + 1)^{−1} ∑_{n=0}^{N} E_{θ′}[ (log y_n − log G_n(θ_2))(log y_n − log G_n(θ_2))^⊤ | y_{0:N} ].   (19)

If σ² is estimated as well, then its update equation is given by:

σ̂²_N(θ_2; θ′) = N^{−1} ∑_{n=1}^{N} E_{θ′}[ (log Q_n − log F_{n−1}(θ_2) + log µ′)² | y_{0:N} ] − (log µ̂_N(θ_2; θ′))².   (20)

3. MCMC approximation of the Q-function

In this section we propose a suitable approximation of the Q-function by using an MCMC algorithm, and we compare this approach with the one based on SISR developed in Trevezas and Cournède (2013), as well as with two improvements of the latter algorithm that will be briefly described in Section 3.2.

3.1. E-step via Markov Chain Monte Carlo

At each iteration of the EM algorithm, the basic problem is to sample in the most effective way from p_{θ′}(q_{1:N} | q_0, y_{0:N}), where θ′ is the current value of the model parameters. For the rest of this paper, we drop the index θ′, since we focus on the general sampling problem (θ′ is known and fixed at each iteration). Thus, conditionally on Q_0 = q_0 and Y_{0:N} = y_{0:N}, the hidden states are sampled from:

Q_{1:N} ∼ p(q_{1:N} | q_0, y_{0:N}) ∝ p(q_{1:N}, y_{0:N} | q_0).   (21)

One of the most important MCMC algorithms for sampling from a multidimensional distribution (such as (21)) is the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990). The Gibbs sampler uses only the full conditional distributions in order to sample a Markov chain whose stationary distribution is the corresponding multidimensional target. The full conditional distribution of Q_n given all the other variables, denoted by π_n(q_n | q_{0:n−1}, q_{n+1:N}) ≜ p(q_n | q_{0:n−1}, q_{n+1:N}, y_{0:N}), corresponding to model M1, defined by Eqs. (12) and (14), can be written in the form:

π_n(q_n | q_{0:n−1}, q_{n+1:N}) ∝ ∏_{i=n+1}^{(n+T)∧N} [1 − exp(−δ s^act_{i−1}(q_n))]^{−1}
    × exp( −(1/(2σ²)) [ (q_n/F_{n−1}(q_n) − 1)² + ∑_{i=n+1}^{(n+T)∧N} (q_i/F_{i−1}(q_n) − 1)² ] )
    × exp( −(1/2) ∑_{i=(n−T+1)^+}^{n} (y_i − G_i(q_n))^⊤ Σ^{−1} (y_i − G_i(q_n)) ),   (22)

where δ = k_B/s_pr > 0 (see (3)), and all the other quantities that appear in this expression are explained in Section 2 and are expressed here only as functions of q_n. The full conditional distributions corresponding to model M2 can be defined in a similar manner.
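Only the unnormalized log-density of (22) is needed in the Metropolis–Hastings step described next. A sketch for model M1 follows, with illustrative names: F, G and s_act are assumed to be callables that recompute F_i, G_i and s^act_i on the trajectory q after the value at index n has been replaced by a candidate.

import numpy as np

def log_full_conditional(n, q, y, F, G, s_act, sigma2, Sigma_inv, T, N, delta):
    # Unnormalized log pi_n(q_n | rest) of Eq. (22), evaluated on the trajectory q.
    hi = min(n + T, N)
    lp = -sum(np.log1p(-np.exp(-delta * s_act(q, i - 1))) for i in range(n + 1, hi + 1))
    quad = (q[n] / F(q, n - 1) - 1.0) ** 2
    quad += sum((q[i] / F(q, i - 1) - 1.0) ** 2 for i in range(n + 1, hi + 1))
    lp -= quad / (2.0 * sigma2)
    for i in range(max(n - T + 1, 0), n + 1):
        r = y[i] - G(q, i)
        lp -= 0.5 * r @ Sigma_inv @ r
    return lp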


Clearly, direct simulation from (22) is impossible. For this reason, alternative sampling techniques are required, such as a hybrid Gibbs sampler. The hybrid Gibbs sampler is a Gibbs sampler where at least one of the simulations from the full conditional distributions is replaced by a Metropolis–Hastings (MH) step (Metropolis et al., 1953; Hastings, 1970). Let π_n, n = 1, . . . , N, be the densities of the unnormalized full conditional distributions given by (22), and let f_n(z_n | q_{1:n−1}, q_{n+1:N}), n = 1, . . . , N, be the densities of the proposal distributions. Let also

α(q^{t−1}_n, z_n) = min{ 1, [π_n(z_n | q^t_{1:n−1}, q^{t−1}_{n+1:N}) f_n(q^{t−1}_n | q^t_{1:n−1}, z_n, q^{t−1}_{n+1:N})] / [π_n(q^{t−1}_n | q^t_{1:n−1}, q^{t−1}_{n+1:N}) f_n(z_n | q^t_{1:n−1}, q^{t−1}_n, q^{t−1}_{n+1:N})] }

denote the acceptance probability of the MH step. The hybrid Gibbs sampler can be described as follows:

Initialize q^0_{1:N}
For t = 1 to M
    For n = 1 to N
        Draw z_n ∼ f_n(q_n | q^t_{1:n−1}, q^{t−1}_{n+1:N})
        Set q^t_n = z_n with probability α(q^{t−1}_n, z_n);
        otherwise set q^t_n = q^{t−1}_n
    End
End

The proposal distribution can be chosen arbitrarily, subject to the conditions that ensure convergence to the target distribution (Meyn and Tweedie, 1993; Robert and Casella, 2004). Nevertheless, the proposal distribution significantly affects the convergence rate of the Markov chain to the target distribution. The convergence is faster if the proposal is closer to the target. Moreover, it should be easy to sample from and not computationally expensive. In this paper, we used as proposal distribution for the hidden states the one resulting from the prior transition kernel of the hidden chain under the current parameter values, given by (12) for model M1 and by (13) for model M2. In the next subsection we give numerical evidence that the MCMC–ECM algorithm based on this type of proposal distribution was generally more efficient than all the versions of the SISR–ECM. Even if the latter versions involve more informative proposal distributions (Trevezas and Cournède, 2013, p. 258) than the prior transition kernel, this seeming advantage is not enough to outweigh the gain from the smoothing property of the Gibbs sampler, where at each conditional simulation (for a given sweep of the algorithm) of a hidden state, all data are taken into account. For this reason, we kept the proposals in the Metropolis–Hastings step as simple as possible. We also tried a random-walk Metropolis–Hastings with different variances, and the results that we obtained were worse.
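A compact Python rendering of the sweep above (a sketch; log_pi(n, q, v) stands for the log-density of (22) with v substituted for q[n], and prior_propose/prior_logpdf for sampling and evaluating the prior transition kernel (12) or (13)):

import numpy as np

rng = np.random.default_rng(0)

def hybrid_gibbs(q, M, log_pi, prior_propose, prior_logpdf):
    # Metropolis-within-Gibbs over the hidden states q[1..N]; q[0] = q0 stays fixed.
    N = len(q) - 1
    for _ in range(M):
        for n in range(1, N + 1):
            z = prior_propose(q, n)   # candidate from the prior kernel
            log_a = (log_pi(n, q, z) - log_pi(n, q, q[n])
                     + prior_logpdf(q, n, q[n]) - prior_logpdf(q, n, z))
            if np.log(rng.uniform()) < min(0.0, log_a):
                q[n] = z              # accept the MH move
    return q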

3.2. MCMC versus sequential importance sampling using simulated datasets

In order to evaluate the effect of the MCMC approximation of the Q-function on parameter estimation and to compare this approach with the one proposed by Trevezas and Cournède (2013) using SISR, we performed several tests with simulated data. Moreover, we add to this comparison two improved versions of the original SISR–ECM (which was based on multinomial resampling), obtained by improving the resampling step.

The first one consists in replacing multinomial resampling with a combination of residual and stratified resampling, sketched in code below. Residual sampling is a variance reduction technique which is useful when resampling should be performed, and it is also known as remainder stochastic sampling (Whitley, 1994). Stratified sampling is based on the idea of stratification (Kitagawa, 1996), where the support (0, 1] of the uniform distribution is partitioned into different strata. The uniform random variables which are needed for the inversion method in the resampling step are then generated from the strata instead of just (0, 1]. Both resampling techniques dominate multinomial resampling in the sense that the conditional variances of both residual and stratified sampling are always lower than that of multinomial sampling (see, e.g., Cappé et al., 2005). It is also known that the combination of these two sampling mechanisms can only decrease the conditional variance (Cappé et al., 2005, p. 248, Remark 7.4.5). The details of the proposed implementation can be found in Cappé et al. (2005).
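A numpy sketch of this combined scheme (illustrative, not the authors' code): the integer parts floor(M w_i) are kept deterministically (residual part), and the remaining R slots are filled by stratified draws on the normalized residual weights.

import numpy as np

rng = np.random.default_rng(0)

def residual_stratified_resample(weights):
    # Residual resampling with the remainder allocated by stratified sampling.
    M = len(weights)
    scaled = M * np.asarray(weights, dtype=float)
    counts = np.floor(scaled).astype(int)             # deterministic copies
    idx = np.repeat(np.arange(M), counts)
    R = M - counts.sum()                              # slots left to fill
    if R > 0:
        resid = (scaled - counts) / R                 # residual weights, sum to 1
        u = (np.arange(R) + rng.uniform(size=R)) / R  # one uniform per stratum of (0, 1]
        idx = np.concatenate([idx, np.searchsorted(np.cumsum(resid), u)])
    return idx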

The second one is sequential importance sampling with partial rejection control (Liu et al., 2001), which is a combination of rejection control (Liu et al., 1998) and resampling, and can be seen as a delayed resampling, as explained in Liu et al. (2001). The rejection control method enables the rejuvenation of the particles at appropriately chosen check-points. These points can be chosen statically (in advance) or dynamically (according to a control threshold). In the variant that we implemented, the check-points are selected dynamically, and rejection control takes place when the effective sample size drops below a first threshold, exactly as in the case of multinomial resampling. In particular, when a check-point is encountered, a second control threshold is computed, which in our case is the median of the weights, and particles with weights inferior to the control threshold are only accepted with probability equal to the ratio of their weights to the control threshold. When a rejection takes place in the classical rejection control algorithm, the particle is totally discarded and a new one is proposed. The procedure is repeated until the proposed particle survives all the previous check-points. Partial rejection control overcomes the computational inefficiency of this procedure by introducing a delayed resampling step. Instead of starting from scratch when a rejection takes place, new states for the current particle are only simulated from the previous check-point. The previous states of the current particle are selected by drawing one particle from those that had survived at the previous check-point. A residual and stratified resampling mechanism can also be chosen in this case for more efficiency. Further details of the proposed implementation can be found in Liu et al. (2001); a simplified sketch of the check-point acceptance step is given below.
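The acceptance step at a check-point can be sketched as follows (a simplified illustration: c is the median weight, particle i survives with probability min(1, w_i/c), and survivors take the adjusted weight max(w_i, c); the delayed-resampling bookkeeping of Liu et al. (2001) for the rejected particles is omitted).

import numpy as np

rng = np.random.default_rng(0)

def rejection_control(weights):
    # One (partial) rejection-control step: survivor mask and adjusted survivor weights.
    w = np.asarray(weights, dtype=float)
    c = np.median(w)                                     # control threshold
    keep = rng.uniform(size=w.size) < np.minimum(1.0, w / c)
    return keep, np.maximum(w, c)[keep]                  # rejected particles are re-simulated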


Table 1
Parameter values used to generate the data (for σ ∈ {0.02, 0.1}), where σ_b, σ_p and ρ are the standard deviations and the correlation coefficient of the measurement error model. The explanation of the other parameters is given in Section 2. The parameters to be estimated are given in the first column.

unknown param.        known param.        known param.
a_b      3            q_0    0.003        e_b    0.0083
a_p      3            a_r    5.5          b_r    2
p_p      0.8165       p_r    400          b_b    2
µ^{−1}   100          k_B    0.7          b_p    2
σ_b      0.05         t_r    100
σ_p      0.05         T      10
ρ        0.8          s_pr   500

Fig. 1. Parameter estimation during 100 ECM iterations for four model parameters, µ^{−1} (leaf resistance), p_p (sink petiole), a_b (alpha blade), and a_p (alpha petiole), using three independent realizations of the MCMC–ECM algorithm (σ = 0.1). In every iteration of the ECM, the sample size was fixed at 150,000. The dotted lines correspond to the parameter values used to generate the data.


In the following synthetic example, we present a comparison of the aforementioned competing algorithms. All the tests were run on one core of a Xeon X5650 (2.67 GHz). We generated data vectors y_{0:N} from the model M1 for N = 50 and for several values of σ, and we present here the cases where σ ∈ {0.02, 0.1}. The parameter values that we used to simulate the data are presented in Table 1.

As a stopping criterion for the EM algorithm we used a predefined number of EM steps (100 EM steps), just for comparison purposes. For each independent run of the algorithm, the sample size was increased piecewise linearly (with increasing slope) for the first 50 iterations (starting from 250, then increased by 10 for the first 25 iterations and by 20 for the subsequent 25 iterations), and for the last 50 iterations we used a quadratic increase until we reached a size of 10,000. In the next section, a more sophisticated method for the specification of the number of EM steps and of the Monte Carlo sample size in each EM step is proposed. The burn-in period for the MCMC was fixed at 500 iterations. For a similar type of simulation schedule and some discussion of alternatives, see Cappé et al. (2005).

In such a model the theoretical solution of the MLE is unknown, but several runs from different initial values can be performed and, if more than one solution exists, the estimated log-likelihood values corresponding to the different solutions are compared. Moreover, the convergence properties of these algorithms have only been established for likelihood functions that belong to (curved) exponential families (Fort and Moulines, 2003), which is not generally the case here. Nevertheless, a good approximation of the theoretical solution can be obtained in a preliminary stage by simulating (any stochastic version can be used) with a very large sample size (Cappé et al., 2005). In Fig. 1, we give an example of this procedure for one of the tests presented here. Notice that the parameter paths which result from 3 independent realizations of the algorithm are almost identical.


Fig. 2. Boxplots of the estimates of six parameters (row-wise: µ^{−1}, p_p, a_b, a_p, σ_b and ρ) based on the synthetic example of the paper when σ = 0.02, for the four competing algorithms: sequential importance sampling with (i) multinomial resampling (SISmR), (ii) residual and stratified resampling (SISrsR), (iii) partial rejection control (SISprc), and finally Markov chain Monte Carlo (MCMC). The estimates are based on 200 independent runs of each algorithm, and averaging is used over the last 25 ECM iterations of each run.

Table 2
Mean CPU execution times (time/run) for all the competing algorithms and both values of σ ∈ {0.02, 0.1}, corresponding to the synthetic example presented in Figs. 2 and 3.

            SISmR      SISrsR     SISprc     MCMC
σ = 0.02    5 m 27 s   5 m 50 s   6 m 24 s   7 m 19 s
σ = 0.10    6 m 43 s   6 m 35 s   7 m 48 s   8 m 35 s

In order to judge the effectiveness of parameter estimation for the different algorithms, we performed 200 independent runs of the algorithms. The sample distribution of the resulting estimates is a good index of the Monte Carlo error associated with the solution obtained from a single run of each algorithm under a given simulation schedule. We present in Figs. 2 and 3 the boxplots of the solutions from the independent runs, for σ = 0.02 and σ = 0.1 respectively. The ranking of the algorithms w.r.t. their concentration around the mean value (the one obtained by averaging the independent realizations) was the same when different datasets were generated from the same parameter. In all our tests the worst performance was that of the SISR with multinomial resampling and the best performance was that of the MCMC. The other types of SISR (residual and stratified resampling versus partial rejection control) had similar performance. All algorithms give similar means for both values of σ, and the means were closer when σ was smaller. We also noticed in some supplementary tests that, as σ increases, the superiority of MCMC–ECM becomes clearer; that is, compared to the other algorithms, it gives much more concentrated estimates of the structural parameters (the first four) over independent runs. In these examples, for very small values of σ, i.e., σ < 0.01, all the algorithms achieved comparable results.

Notice also that the mean estimates that we obtain for the structural parameters with both algorithms are closer to the true ones (see also Table 1) when σ = 0.02; this could be expected since, as σ (which is directly related to the model uncertainty) increases, the uncertainty in the values of the structural parameters becomes larger.

The mean CPU time per run is given for all the algorithms and both values of σ in Table 2. We remark that the MCMC–ECM is slightly more computationally expensive, but since it has significantly less Monte Carlo error, its overall performance is the best.

We also tested the effect of the averaging technique developed by Fort and Moulines (2003) (see also Cappé et al., 2005, p. 407). The authors proposed to smooth the parameter estimates during the last iterations of the ECM algorithm by using a weighted moving average of the last ECM updates. The averaging starts near the convergence region, at iteration t_0, and the subsequent averaged parameter estimates θ̄^(t) are given by

θ̄^(t) = ∑_{u=t_0}^{t} w_u θ^(u),   t_0 < t ≤ t_f,

where θ^(u) are the ECM updates, w_u their corresponding weights, proportional to the Monte Carlo sample size used at iteration u, and t_f is the total number of ECM iterations. This technique is typically used when the simulation noise at convergence is still significant.
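In code, the smoothing is a weighted average of the stored ECM updates, with weights proportional to the Monte Carlo sample sizes and normalized to sum to one (numpy sketch with illustrative names):

import numpy as np

def average_updates(theta_path, m_path, t0):
    # Weighted moving average of the ECM updates from iteration t0 on;
    # theta_path has shape (t_f, dim), m_path holds the MC sample sizes m_u.
    w = np.asarray(m_path[t0:], dtype=float)
    return (w / w.sum()) @ np.asarray(theta_path[t0:])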


Fig. 3. Boxplots of the estimates of six parameters (row-wise: µ^{−1}, p_p, a_b, a_p, σ_b and ρ) based on the synthetic example of the paper when σ = 0.1, for the four competing algorithms: sequential importance sampling with (i) multinomial resampling (SISmR), (ii) residual and stratified resampling (SISrsR), (iii) partial rejection control (SISprc), and finally Markov chain Monte Carlo (MCMC). The estimates are based on 200 independent runs of each algorithm, and averaging is used over the last 25 ECM iterations of each run.

We tested three different scenarios: no averaging, and averaging over the last 25 or 50 iterations, for both values of σ ∈ {0.02, 0.1}. For the majority of the parameters, the best results were obtained by averaging the solutions from the last 25 iterations. Averaging also acts differently for the different values of σ: when σ = 0.02, it improved most of the estimators of all the algorithms with respect to the standard error. Finally, if we increase the averaging window too much (from 25 to 50), the improvement decreases, although not significantly. This is reasonable, since averaging should be used near the convergence region and not too early.

The two methods have been tested on several sets of parameters and, in all of them, both methods returned similar mean estimates over independent runs of the algorithms. Nevertheless, their standard errors depend on the value of σ and on the algorithm employed. In the examples that we ran, the MCMC–ECM gave smaller standard errors than the other algorithms except for very small σ, but such values are not appropriate for the model fitting with the real dataset, as will be explained in Section 5.

Another advantage of the MCMC approach concerns the amount of data taken into account for the estimation. For a large value of T, the SISR versions that we presented can generally take into account only some (and not all) of the organs that had not reached their full expansion stage when the plant was cut (the immature members). The reason behind this is that the underlying hidden Markov process is T-dependent, and consequently the last weights associated with the particles in a sequential implementation could degenerate before all the data are taken into account. We refer to Trevezas and Cournède (2013) for further details of this implementation. From Eq. (4.5) of the above reference, the following result holds for the final weights of the improved filter:

w^{(i)}_{N−T+1} = w^{(i)}_{N−T} p_θ(y_{N−T+1} | q^{(i)}_{N−T:N−1}) ∏_{n=N−T+2}^{N} p_θ(y_n | q^{(i)}_{n:N−1}, q^{(i)}_N),

where {w^{(i)}_{N−T}, q^{(i)}_{N−T:N−1}}_{i=1}^{M} stands for the available weighted sample one iteration before the last update, and the q^{(i)}_N are the final proposed particle positions. Since the last product has T − 1 factors, a practical implementation of this filter needs to stop the algorithm when the effective sample size (ESS) drops below a threshold for the first time (see Trevezas and Cournède, 2013 for the explanation of the ESS). This is the reason why some data may be lost, and this could be a serious problem for large values of T. When the MCMC algorithm is used, this problem does not exist. In this example we excluded all the immature members from the data vector in order to compare both algorithms on the same data.

In order to evaluate the aggregated performance of the MCMC–ECM algorithm, we also generated multiple datasets from the same model M1 for the two different values of σ ∈ {0.02, 0.1}. Since the variability of parameter estimates between different datasets, which is related to the distribution of the MLE, is expected to be much larger than the one that we obtain within the same dataset (due to the Monte Carlo error), the general performance of all the algorithms presented in this section should be comparable.


Fig. 4. Boxplots of the estimates of six parameters (µ^{−1}, p_p, a_b, a_p, σ_b and ρ) based on 200 independent datasets generated from the same model M1 for the two different values of σ: σ = 0.02 in the first column and σ = 0.1 in the second column. For each subfigure, the left boxplot corresponds to the estimates from the SISR–ECM with partial rejection control and the right boxplot to the MCMC–ECM.

For this reason, in this test, together with the performance of the MCMC–ECM, we only present the results from one representative of the class of SISR–ECM algorithms, the one which uses partial rejection control. Note that if we assume that the solution obtained with any of these algorithms is a good approximation of the MLE corresponding to each dataset, then this test also gives an idea of the properties of the MLE. In Fig. 4, we present the results for both algorithms based on 200 datasets.

The results showed that both algorithms are identical in their mean performance for all the parameters, without any atypical behavior. On the one hand, the mean estimates were very close to the true ones and no significant bias was detected for any of the parameters; on the other hand, the uncertainty in the parameter estimates was relatively small. This test also revealed that the increase in the noise level from 0.02 to 0.1 increased the uncertainty in the estimation of three parameters: the leaf resistance µ^{−1} and those involved in the allocation, that is, a_b and a_p.

4. Ascent-based MCMC–ECM algorithm

In the previous section we did not emphasize the specification of the Monte Carlo sample size in each ECM step and/or the number of ECM steps. It is known that, if the MC sample size remains constant over the EM iterations, the MCEM algorithm will not converge, due to a persistent Monte Carlo error, unless this size is unnecessarily large. Moreover, it is inefficient to start with a large sample size, since the initial guess of the MLE may be far from the true value (Wei and Tanner, 1990). Many authors use a deterministic increase of the MC sample size (independently of the data) and stop the algorithm after a predefined number of EM steps (McCulloch, 1994, 1997; Chan and Kuk, 1998). Nevertheless, these are not the most effective ways to tackle these problems. Thus, several data-driven strategies have been proposed recently in the literature.

4.1. Data-driven automated stochastic EM algorithms

Over the last decades, data-driven strategies have been proposed in the literature to control the Monte Carlo sample size and the number of EM steps. In Booth and Hobert (1999), an automated procedure was proposed to determine at each EM step whether the Monte Carlo sample size should be increased or not. This procedure concerns those Monte Carlo EM algorithms for which the random sample in the E-step is simulated either by exact draws from the corresponding conditional distribution or by importance sampling from a proposal distribution close enough to the exact one. Based on the random sample of each step t, an asymptotic confidence interval of the current estimate of the parameter θ^(t) is constructed.


If the past value θ^(t−1) lies in it, then the EM step is said to be swamped by the Monte Carlo error, and the sample size is increased. The size of the additional sample is arbitrary (e.g., m_t → m_t + m_t/c, c = 2, 3, . . .). Moreover, Booth and Hobert (1999) proposed to stop the MCEM algorithm when a stopping rule is satisfied for three consecutive iterations. The most commonly used stopping criterion is a sufficiently small relative change in the parameter values.

The automated Monte Carlo EM algorithm of Booth and Hobert (1999) was generalized from random to dependent samples by Levine and Casella (2001). One basic difficulty which arises with dependent samples is how to determine the aforementioned confidence interval. In this direction, the authors in Levine and Casella (2001) invoke a central limit theorem (see Theorem 1, Levine and Casella, 2001) on the basis of the subsampling scheme of Robert et al. (1999). In particular, the Monte Carlo sample size is increased if at least one of the estimated partial derivatives of the Q-function with respect to θ^(t−1), computed on the basis of the subsample, lies in the appropriately designed confidence interval.

Following the steps of Booth and Hobert (1999) and Levine and Casella (2001), the authors in Levine and Fan (2004) proposed an alternative automated MCMC–EM algorithm. The method for increasing the sample size is likewise based on the construction of an appropriate confidence interval. The main innovation of that paper is that the authors give a specific formula quantifying the increase in the MC sample size. In this approach, the EM procedure has to be applied twice at each iteration, once for the complete sample and once for the subsample. This is not an issue when the overall implementation of the EM algorithm is not time consuming, but if, for example, a numerical maximization is needed for the M-step, this method could be computationally expensive.

In the rest of this subsection we present the data-driven automated MCEM algorithm proposed by Caffo et al. (2005), which is computationally cheap and can easily be adapted to our case, where numerical maximization is involved as well. We now give a short description of the basic ideas of the algorithm. Let

\Delta Q = Q(\theta^{(t)}; \theta^{(t-1)}) - Q(\theta^{(t-1)}; \theta^{(t-1)}),  (23)

\Delta\hat{Q} = \hat{Q}(\theta^{(t)}; \theta^{(t-1)}) - \hat{Q}(\theta^{(t-1)}; \theta^{(t-1)}),  (24)

where Q corresponds to the true Q-function of the model and Q̂ to its estimate given by (16), where the approximation is based on the m_t-sample generated at the t-th iteration of the EM. The most important feature of this algorithm is that it is an ascent-based Monte Carlo EM algorithm, since the main goal is to recover with high probability the ascent property of the EM. This means that the MC sample size should be chosen throughout the iterations in such a way that ΔQ > 0 with high probability. The authors claim that ΔQ̂ is a strongly consistent estimator of ΔQ and, by evoking the appropriate central limit theorem, the following asymptotic result holds true:

\sqrt{m_t}\,\big(\Delta\hat{Q} - \Delta Q\big) \xrightarrow[m_t \to \infty]{d} \mathcal{N}(0, \sigma_Q^2),  (25)

where the regularity conditions and the asymptotic variance σ²_Q depend on the sampling mechanism employed. A sketch of the proof is given in the case where the simulations result from i.i.d. draws, and a remark is made that, if an MCMC algorithm is employed, then (25) holds true under stringent regularity conditions. With the help of (25) and a consistent estimator σ̂²_Q of σ²_Q, the following asymptotic lower bound (ALB) with confidence level 1 − α can be given for ΔQ:

\mathrm{ALB} = \Delta\hat{Q} - \frac{\hat{\sigma}_Q}{\sqrt{m_t}}\, z_\alpha,  (26)

where z_α is the upper α-quantile of the standard normal distribution. In the same way, an asymptotic upper bound (AUB) with confidence level 1 − γ can also be obtained for ΔQ:

\mathrm{AUB} = \Delta\hat{Q} + \frac{\hat{\sigma}_Q}{\sqrt{m_t}}\, z_\gamma.  (27)

The authors use (26) to decide whether the current update based on the m_t-sample will be accepted. In particular:
• if ALB > 0, then with high probability ΔQ > 0, and θ^(t) is accepted as the new update;
• if ALB ≤ 0, then Q̂ is said to be swamped with MC error, and a new sample is appended to the existing one to obtain a new parameter estimate. A geometric increase is recommended (e.g., m_t → m_t + m_t/k, k = 2, 3, ...). The process is repeated until ALB > 0 for the first time.
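A minimal sketch of this accept-or-augment step, assuming user-supplied routines for sampling, for evaluating ΔQ̂ and for estimating σ²_Q; all names below are illustrative and not the authors' implementation:

import numpy as np
from scipy.stats import norm

def ascent_ecm_step(theta_prev, m_t, draw_sample, delta_q_hat, var_estimate,
                    maximize, alpha=0.25, k=3):
    # One ascent-based step: augment the sample until ALB > 0, then accept theta_t.
    sample = draw_sample(theta_prev, m_t)            # E-step draws (MCMC or i.i.d.)
    while True:
        theta_t = maximize(sample, theta_prev)       # (C)M-step on the current sample
        dq_hat = delta_q_hat(sample, theta_t, theta_prev)
        sigma2_hat = var_estimate(sample, theta_t, theta_prev)
        # Eq. (26): ALB = dq_hat - z_alpha * sigma_hat / sqrt(m)
        alb = dq_hat - norm.ppf(1.0 - alpha) * np.sqrt(sigma2_hat / len(sample))
        if alb > 0:                                  # ascent recovered with high prob.
            return theta_t, sample, dq_hat, sigma2_hat
        # Swamped by MC error: append m/k further draws and recompute.
        sample = np.concatenate([sample, draw_sample(theta_prev, len(sample) // k)])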

After the acceptance of θ^(t), the MC sample size for the next MCEM step is determined by using the approximation

\Delta\hat{Q}_{t+1} \sim \mathcal{N}\!\left(\Delta\hat{Q}_t,\; \frac{\sigma_Q^2}{m_{t+1}}\right),  (28)

where ΔQ̂_t is given by (24) and ΔQ̂_{t+1} corresponds to the same quantity with t → t + 1. Indeed, the size m_{t+1} is chosen so as to prespecify the probability of rejecting the estimate θ^(t+1) (ALB < 0) when ΔQ > 0 (type-II error). If we set this probability equal to β and add the logical requirement m_t ≤ m_{t+1}, then it can easily be shown from (28) that

m_{t+1} = \max\!\left\{ m_t,\; \hat{\sigma}_Q^2 (z_\alpha + z_\beta)^2 / (\Delta\hat{Q}_t)^2 \right\},  (29)
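For completeness, (29) follows from (28) by a standard power calculation; a short sketch, treating σ_Q as known and replacing it by its estimate σ̂_Q at the end:

P(\mathrm{ALB} \le 0) = P\!\left(\Delta\hat{Q}_{t+1} \le \frac{z_\alpha \sigma_Q}{\sqrt{m_{t+1}}}\right)
 = \Phi\!\left(z_\alpha - \frac{\Delta\hat{Q}_t \sqrt{m_{t+1}}}{\sigma_Q}\right) \overset{!}{=} \beta
 \;\Longrightarrow\; \sqrt{m_{t+1}} = \frac{\sigma_Q (z_\alpha + z_\beta)}{\Delta\hat{Q}_t},

since Φ(−z_β) = β; combining this with the requirement m_t ≤ m_{t+1} yields (29).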


Table 3
Parameter estimation results with the automated MCMC–ECM algorithm for the synthetic example when σ ∈ {0.1, 0.02}. Means and standard deviations of the estimates are based on 50 independent runs. The results are obtained for different values of α, β and γ (see Eqs. (26) and (27)) and for a geometric increase in the sample size (m_t → m_t + m_t/3) when ALB ≤ 0.

                     σ = 0.1                                   σ = 0.02
α, β, γ     0.25         0.1–0.25–0.1  0.1          0.25         0.1–0.25–0.1  0.1

ab          2.8961       2.9004        2.9057       2.9728       2.9744        2.9749
            (7.73·10⁻³)  (5.74·10⁻³)   (7.31·10⁻³)  (1.61·10⁻³)  (1.43·10⁻³)   (1.39·10⁻³)
ap          2.8996       2.9039        2.9092       2.9736       2.9750        2.9755
            (7.67·10⁻³)  (5.71·10⁻³)   (7.28·10⁻³)  (1.45·10⁻³)  (1.28·10⁻³)   (1.23·10⁻³)
Pp          0.8153       0.8153        0.8153       0.8153       0.8153        0.8153
            (0.05·10⁻⁴)  (0.06·10⁻⁴)   (0.05·10⁻⁴)  (7.00·10⁻⁶)  (5.00·10⁻⁶)   (5.00·10⁻⁶)
µ⁻¹         100.6717     100.6421      100.6004     100.2758     100.2671      100.2637
            (0.0583)     (0.0494)      (0.0559)     (0.0094)     (0.0087)      (0.0088)
σb          0.0505       0.0506        0.0506       0.0477       0.0477        0.0477
            (1.48·10⁻⁴)  (0.74·10⁻⁴)   (0.64·10⁻⁴)  (4.80·10⁻⁵)  (2.00·10⁻⁵)   (2.50·10⁻⁵)
σp          0.0537       0.0537        0.0537       0.0504       0.0504        0.0504
            (2.02·10⁻⁴)  (0.86·10⁻⁴)   (0.71·10⁻⁴)  (3.70·10⁻⁵)  (2.90·10⁻⁵)   (2.21·10⁻⁵)
ρ           0.8536       0.8540        0.8540       0.8283       0.8284        0.8284
            (1.47·10⁻³)  (0.62·10⁻³)   (0.52·10⁻³)  (2.96·10⁻³)  (2.25·10⁻³)   (1.60·10⁻³)

In (29), m_t corresponds to the initial MC sample size of iteration t (before any eventual augmentation) and z_β to the upper β-quantile of the standard normal distribution. The last requirement is the stopping criterion: the MCEM algorithm stops if AUB < δ, where AUB is given in (27) and δ is a predefined small constant. If this criterion is satisfied, then the change in the Q-function is acceptably small. The adaptation of this approach to the MCMC–ECM that we propose in this paper is straightforward, as long as a method for estimating the variance σ²_Q is available.

There are several methods for estimating σ²_Q (see, e.g., Geyer, 1992). One of the most well-known relies on the spectral estimator, which involves the estimation of autocorrelations weighted by a prespecified function; a presentation of different choices of weight functions can be found in Priestley (1981). Another popular method is batch means (BM) (Bratley et al., 1987), which divides the MC sample into a predefined number of batches of equal size. The batch means are treated as independent, which is only approximately true if the length of each batch is much longer than the characteristic mixing time of the chain. If the batch size is allowed to increase with the sample size m, then the method is referred to as consistent batch means (CBM); usually, the batch size is set equal to ⌊m^l⌋, where l = 1/2 or l = 1/3. An alternative method for variance estimation is based on regenerative simulation (RS) (see Mykland et al., 1995), where random times at which the Markov chain probabilistically restarts itself are identified. In fact, CBM can be viewed as an ad hoc version of the RS method (see Jones and Hobert, 2001): both methods split the sample into pieces, with the difference that the RS method guarantees that the pieces are truly independent. Nevertheless, the conditions of RS are hard to verify. The different variance estimation methods are compared in several papers (see Jones and Hobert, 2001, Jones et al., 2006 and Flegal and Jones, 2010); in Jones and Hobert (2001), the authors concluded that CBM and RS give similar results. Despite the theoretical advantages of RS and of the spectral estimator, we adopt the CBM method in the proposed algorithm, since it is significantly simpler and faster in practice.
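The following minimal sketch illustrates the CBM estimator described above; the function name and the assumption that the per-draw evaluations are stored in a one-dimensional array are ours, not part of the paper's implementation.

import numpy as np

def cbm_variance(x, l=0.5):
    # Consistent batch means estimate of the asymptotic variance of mean(x),
    # with batch size floor(m^l), l = 1/2 or l = 1/3, as in the text.
    x = np.asarray(x, dtype=float)
    m = len(x)
    b = int(np.floor(m ** l))            # batch size
    a = m // b                           # number of full batches
    batch_means = x[:a * b].reshape(a, b).mean(axis=1)
    mu_hat = x[:a * b].mean()
    return b * np.sum((batch_means - mu_hat) ** 2) / (a - 1)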

4.2. The automated MCMC–ECM algorithm on simulated datasets

In order to evaluate the performance of the automated MCMC–ECM algorithm, we performed the same synthetic tests as the ones presented in Section 3.2.

The augmentation rule for the Monte Carlo sample size is given by (29). As a stopping criterion we used AUB < 10⁻³, see (27). The initial sample size was fixed at 250 and the burn-in period at 500. Our final estimates for all the tests were based on means from 50 independent runs. In Table 3, the parameter estimates and the corresponding standard errors are presented for different combinations of the asymptotic levels α, β and γ for σ = 0.1 and σ = 0.02, see (26) and (27). For each such combination, we compared two different rates of geometric increase in the sample size (m_t → m_t + m_t/k, for k = 2, 3) when ALB ≤ 0, see (26); since the results are quite similar, we present here only the case where m_t → m_t + m_t/3. Moreover, in Table 4, the corresponding descriptive statistics for the total sample size (TSS), the final sample size (FSS) and the number of ECM iterations (Iter) until convergence are given. Note that, since the algorithm is automated, the final sample size and the number of iterations until convergence differ among independent realizations. For this reason, we also present in Table 5 the effect of weighting the estimates from independent runs with weights proportional to their final sample size.
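For reference, the weighting used for Table 5 is a simple sample-size-weighted average across runs; a one-function sketch with illustrative names:

import numpy as np

def weighted_estimate(estimates, fss):
    # estimates: array of shape (runs, n_params); fss: final MC sample size per run.
    return np.average(np.asarray(estimates), axis=0, weights=np.asarray(fss))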

For all the tested values of α, β and γ, the best results with respect to the standard errors were obtained in the majority of the cases for the values 0.1, and then for 0.1–0.25–0.1, as expected (with some exceptions). This is best reflected in the parameters of the measurement error.


Table 4
Descriptive statistics for the total sample size (TSS), the final sample size (FSS) and the number of iterations until convergence (Iter) corresponding to the tests given in Table 3. The mean execution times are also given.

                  σ = 0.1                             σ = 0.02
α, β, γ    0.25       0.1–0.25–0.1  0.1          0.25      0.1–0.25–0.1  0.1

Min TSS    49247      90505         137264       3685      9895          7193
Mean TSS   250800     593939        794573       22413     41078         59441
Max TSS    637305     1401011       1936174      89009     159998        235068
Min FSS    3405       8508          13309        1066      2988          3861
Mean FSS   16495      48803         65199        6314      12106         17465
Max FSS    56985      164524        169807       27098     40842         66197
Min Iter   46         45            28           12        10            8
Max Iter   86         74            66           27        18            18
Mean time  10 m 42 s  27 m 05 s     28 m 13 s    1 m 15 s  3 m 06 s      3 m 28 s

Table 5
Parameter estimation results with the automated MCMC–ECM algorithm for the synthetic example when σ ∈ {0.1, 0.02}. Means and standard deviations of the weighted estimates based on 50 independent runs, when the estimates have weights proportional to the final sample size. The results are obtained for different values of α, β and γ (see Eqs. (26) and (27)). The sample size was increased as m_t → m_t + m_t/3 when ALB ≤ 0.

                     σ = 0.1                                   σ = 0.02
α, β, γ     0.25         0.1–0.25–0.1  0.1          0.25         0.1–0.25–0.1  0.1

ab          2.8949       2.8988        2.9039       2.9726       2.9742        2.9745
            (6.46·10⁻³)  (4.85·10⁻³)   (5.31·10⁻³)  (1.44·10⁻³)  (1.27·10⁻³)   (1.14·10⁻³)
ap          2.8984       2.9023        2.9074       2.9734       2.9748        2.9751
            (6.41·10⁻³)  (4.82·10⁻³)   (5.30·10⁻³)  (1.29·10⁻³)  (1.15·10⁻³)   (1.02·10⁻³)
Pp          0.8153       0.8153        0.8153       0.8153       0.8153        0.8153
            (5.00·10⁻⁶)  (5.00·10⁻⁶)   (5.00·10⁻⁶)  (6.00·10⁻⁶)  (5.00·10⁻⁶)   (5.00·10⁻⁶)
µ⁻¹         100.6825     100.6528      100.6140     100.2767     100.2685      100.2660
            (0.0481)     (0.0372)      (0.0393)     (0.0080)     (0.0078)      (0.0067)
σb          0.0505       0.0505        0.0506       0.0477       0.0478        0.0477
            (1.45·10⁻⁴)  (6.70·10⁻⁵)   (6.30·10⁻⁵)  (3.60·10⁻⁵)  (1.70·10⁻⁵)   (2.20·10⁻⁵)
σp          0.0537       0.0537        0.0537       0.0504       0.0504        0.0504
            (1.91·10⁻⁴)  (0.83·10⁻⁴)   (0.67·10⁻⁴)  (3.10·10⁻⁵)  (2.30·10⁻⁵)   (1.90·10⁻⁵)
ρ           0.8537       0.8538        0.8541       0.8283       0.8284        0.8284
            (1.38·10⁻³)  (5.97·10⁻⁴)   (4.77·10⁻⁴)  (2.47·10⁻³)  (1.77·10⁻³)   (1.54·10⁻³)

However, if we run the algorithm with the values set at 0.1, a great computational cost is involved (see Table 4), which is not compensated by the gain in precision. For this reason, it could be wiser to decrease the asymptotic confidence levels (by increasing α, β and γ) so as to obtain a rapid algorithm with acceptable precision. Furthermore, the weighted averages with respect to the final sample size (see Table 5) generally decreased the standard deviations, independently of the values of the asymptotic levels.

It is noteworthy that the automated algorithm gives mean estimates which are closer to the real values than the original MCMC–ECM algorithm. On the other hand, even the ''best'' automated algorithm gives more variable estimates than the non-automated one with independent runs of the algorithm. This could be expected, due to the variability in the final sample size and in the number of iterations until convergence of the automated algorithm. The main point here is that the resulting estimators are of acceptable accuracy in significantly fewer ECM steps, and thus in less CPU time, provided the asymptotic levels are not set too low. This is very important for a routine use of the algorithm, combined with the fact that the automated algorithm uses the computational resources efficiently.

5. Application to a real dataset and model comparison

In this section, we present an application of our method to experimental data from the sugar-beet. The experimental protocol is presented in detail in Lemaire et al. (2008). This real-data case was presented in Trevezas and Cournède (2013) to motivate the use of a hidden Markov model as the best choice among competing models. The current data contain mass measurements from 42 blades and petioles, assumed to have expansion durations T = 10. Under this assumption, all measurements correspond to leaves which had completed their expansion when the plant was cut. The measurements are given for reference in Table 6. The parameters are divided into two categories: those which were calibrated directly in the field, and the unknown parameter θ that has to be estimated. In Table 7 we give the values of the fixed parameters and the initial values that we used for the parameters that have to be estimated (determined in a preliminary searching stage).

In Table 8 we present the parameter estimation results that we obtained for the four competing algorithms, sequential importance sampling with (i) multinomial resampling (SISmR), (ii) residual and stratified resampling (SISrsR), (iii) partial rejection control (SISprc), and Markov chain Monte Carlo (MCMC), by fitting the real data with the model M1. The corresponding mean CPU times are given as well. The details of the implementation are given in Section 3.


Table 6
A dataset from the sugar-beet plant. Mass measurements from 42 blades (bl) and petioles (pe).

Rank  1     2     3     4     5     6     7     8     9     10    11    12    13    14
bl    0.021 0.069 0.084 0.138 0.246 0.414 0.604 0.85  0.892 0.99  1.398 1.627 1.568 1.774
pe    0.01  0.014 0.023 0.045 0.079 0.29  0.475 0.529 0.537 0.649 0.857 0.988 1.059 1.216

Rank  15    16    17    18    19    20    21    22    23    24    25    26    27    28
bl    1.728 1.625 1.349 1.297 1.212 1.184 1.097 1.028 0.943 0.856 0.744 0.615 0.555 0.476
pe    1.317 1.263 1.154 1.204 1.134 1.106 1.056 0.964 0.904 0.889 0.797 0.687 0.655 0.532

Rank  29    30    31    32    33    34    35    36    37    38    39    40    41    42
bl    0.422 0.361 0.326 0.277 0.238 0.191 0.179 0.15  0.124 0.117 0.079 0.089 0.106 0.095
pe    0.52  0.471 0.392 0.365 0.296 0.241 0.242 0.186 0.167 0.126 0.091 0.094 0.094 0.083

Table 7
Initial values for both unknown and fixed parameters used to initialize the algorithms in the real data case, where σb, σp and ρ are the standard deviations and the correlation coefficient of the measurement error model (see Section 2 for the explanation of the other parameters).

param.  unknown   param.  known    param.  known

ab      2.829     σ       0.1      eb      0.0083
ap      1.813     ar      3.1      spr     500
pp      0.8139    pr      329.48   bb      2
µ⁻¹     97.95     kB      0.7      bp      2
σb      0.076     tr      60       br      2
σp      0.059     T       10
ρ       0.136     q0      0.003

Table 8
Parameter estimation results based on the real dataset. Means and standard deviations of the estimates are based on 50 independent runs for the four competing algorithms: sequential importance sampling with (i) multinomial resampling (SISmR), (ii) residual and stratified resampling (SISrsR), (iii) partial rejection control (SISprc), and Markov chain Monte Carlo (MCMC). Averaging is used for the last 25 ECM iterations of each run of the algorithm. The mean CPU times are also provided.

           Mean                                    Standard deviation
param.     SISmR     SISrsR    SISprc    MCMC      SISmR      SISrsR     SISprc     MCMC

ab         2.8296    2.8306    2.8219    2.8379    0.0296     0.0239     0.0274     0.0128
ap         1.8037    1.8241    1.8173    1.8297    0.0208     0.0175     0.0196     0.0091
Pp         0.8150    0.8147    0.8147    0.8147    1.24·10⁻⁴  1.13·10⁻⁴  1.18·10⁻⁴  0.25·10⁻⁴
µ⁻¹        98.2648   98.0964   98.2490   97.9714   0.545      0.4286     0.5001     0.2336
σb         0.0752    0.0762    0.0763    0.0761    2.77·10⁻⁴  2.18·10⁻⁴  2.18·10⁻⁴  1.12·10⁻⁴
σp         0.0588    0.0589    0.0588    0.0589    1.18·10⁻⁴  1.07·10⁻⁴  1.09·10⁻⁴  0.52·10⁻⁴
ρ          0.1373    0.1258    0.1275    0.1246    3.02·10⁻³  2.82·10⁻³  2.64·10⁻³  1.23·10⁻³

Mean time  6 m 22 s  6 m 28 s  7 m 21 s  8 m 43 s

The parameter σ² represents a standard level of uncertainty for the mean biophysical model given by (3). The value σ = 0.1 corresponds to the model which best fits the data, as shown in Trevezas and Cournède (2013). In Table 9 we present the parameter estimation results that we obtained with the automated MCMC–ECM algorithm. The details of the implementation are given in Section 4. Moreover, in Table 10 the corresponding descriptive statistics for the total sample size (TSS), the final sample size (FSS) and the number of ECM iterations (Iter) until convergence are given.

We remark that the mean parameter estimates that we obtained with the different variants of SISR–ECM, the MCMC–ECM and the automated MCMC–ECM algorithm are similar. We reach the same conclusion even if we use the averaging techniques. Nevertheless, notice in Table 8 that the standard deviations from the mean estimates among independent realizations are roughly two to six times smaller with the MCMC–ECM than with any variant of SISR–ECM. The gain in precision is an important advantage of the MCMC–ECM, since the CPU time needed for a single run is only slightly higher for the MCMC–ECM (by a mean factor of 1.2–1.4 in our tests) than for the other algorithms. The results of the automated MCMC–ECM algorithm are presented in Table 9. The choice α = β = γ = 0.25 results in significantly smaller standard deviations than the SISR variants, and slightly smaller than the non-automated MCMC. When the parameters α, β and γ decrease, then, as expected, the standard deviations decrease, since the final and the total Monte Carlo sample sizes increase. Notice also in Table 10 that smaller values of α, β and γ decrease the total number of ECM steps until convergence. The advantages of the automated algorithm cannot be counterbalanced by using averaging in the non-automated algorithm, as we can see in Table 8. Consequently, the choice of a single run of an automated MCMC–ECM is very reasonable, even with the choice α = β = γ = 0.25. Nevertheless, depending on the desired accuracy, it is always possible to combine a small number of independent runs to obtain weighted mean estimates. Furthermore, the automated algorithm makes an efficient use of Monte Carlo resources, and there is no need to determine a priori the total number of ECM steps or how the Monte Carlo sample size should be increased.


Table 9
Parameter estimation results based on the real dataset with the automated MCMC–ECM algorithm. Means and standard deviations of the estimates are based on 50 independent runs. The results are obtained for different values of the asymptotic levels α, β and γ, see relations (26) and (27). Averaging is used for the last 25 ECM iterations of each run of the algorithm.

          Mean                              Standard deviation
α, β, γ   0.25     0.1–0.25–0.1  0.1        0.25       0.1–0.25–0.1  0.1

ab        2.8372   2.8345        2.8346     0.0120     0.0078        0.0068
ap        1.8290   1.8268        1.8264     0.0087     0.0054        0.0045
Pp        0.8147   0.8147        0.8147     0.57·10⁻⁴  0.49·10⁻⁴     0.47·10⁻⁴
µ⁻¹       97.9824  98.0311       98.0220    0.2076     0.1473        0.1307
σb        0.0761   0.0761        0.0761     2.03·10⁻⁴  1.18·10⁻⁴     1.43·10⁻⁴
σp        0.0589   0.0589        0.0588     0.87·10⁻⁴  0.49·10⁻⁴     0.63·10⁻⁴
ρ         0.1253   0.1257        0.1267     2.62·10⁻³  1.74·10⁻³     1.86·10⁻³

Table 10
Descriptive statistics for the total sample size (TSS), the final sample size (FSS) and the number of iterations until convergence (Iter) corresponding to the tests given in Table 9. The mean CPU times are also provided.

α, β, γ     0.25      0.1–0.25–0.1  0.1

Min TSS     7376      4504          4530
Mean TSS    30173     41645         36744
Max TSS     66311     105438        230250
Min FSS     2675      3158          3561
Mean FSS    9570      19354         22487
Max FSS     20820     47450         160257
Min Iter    6         4             3
Max Iter    21        15            8
Mean time   2 m 24 s  6 m 22 s      4 m 18 s

Table 11
MLE obtained with the models M1 (ρ as a free parameter), M1* (ρ = 0), M2 (ρ as a free parameter) and M2* (ρ = 0) for the sugar-beet dataset. In the last two columns the corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC) are estimated based on 100 samples of 5·10⁵ independent evaluations. The standard deviations are given in parentheses. The above criteria are given by AICc = −2(log L − d) + 2d(d + 1)/(n − d + 1) and BIC = −2 log L + d log n, where d is the number of free parameters and n the sample size.

Model  ab     ap     Pp      µ⁻¹    σb      σp      ρ        AICc            BIC

M1*    2.836  1.852  0.8142  98.48  0.0750  0.0591  0.0000   −344.13 (0.03)  −330.63 (0.03)
M1     2.837  1.829  0.8147  97.98  0.0761  0.0589  0.1253   −342.17 (0.02)  −326.63 (0.02)
M2*    3.019  2.044  0.8031  98.76  0.1585  0.2119  0.0000   −334.72 (0.03)  −321.23 (0.03)
M2     3.139  2.172  0.8051  96.83  0.1647  0.2114  −0.3380  −336.18 (0.04)  −320.64 (0.04)
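The criteria reported in Table 11 are direct functions of the estimated log-likelihood; a literal transcription of the two formulas in the caption (log_L, d and n supplied by the user):

import math

def aicc(log_L, d, n):
    # Corrected Akaike information criterion, as defined in Table 11.
    return -2.0 * (log_L - d) + 2.0 * d * (d + 1) / (n - d + 1)

def bic(log_L, d, n):
    # Bayesian information criterion, as defined in Table 11.
    return -2.0 * log_L + d * math.log(n)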

In the last part of this section we present the results of the model comparison when fitting the experimental data presented in Table 6. Two types of models, referred to as models M1 and M2, were considered in this paper, and their hidden Markov formulation is given by Proposition 1. For each model, we distinguished the cases where the correlation coefficient ρ between the mass measurement errors of the blade and the petiole is a free parameter that has to be estimated (models M1 and M2) or is null (models M1* and M2*). In the latter cases we have one parameter less to estimate. We ran the automated MCMC–ECM for all these models with α = β = γ = 0.25, and the obtained results are presented in Table 11. We also give the estimated corrected Akaike information criterion (AICc) and the Bayesian information criterion (BIC) for all the models that we tested (see, e.g., Bengtsson and Cavanaugh, 2006 and the references therein). Since the best model is the one with the lowest values in both criteria, the additive error in the mass measurements (models M1* and M1) is better adapted than the log-additive one (models M2* and M2). Among all of them, the model M1* had the best performance on this dataset. Even though we have restricted ourselves to the comparison between these models, the comparison method is of course general, and could be applied to other formulations of the error models or of the functional models.

6. Discussion

In this paper we proposed simulation techniques based on MCMC for parameter estimation via an ECM algorithm for a class of plant growth models which can be characterized by deterministic structural development and include process error in the biomass production dynamics, initially introduced in Trevezas and Cournède (2013). The resulting MCMC-based estimation algorithm improves the one developed in Trevezas and Cournède (2013), where the authors used SISR to perform the Monte Carlo E-step, by significantly reducing the variance of the parameter estimates obtained by independent runs of the algorithm. Another important advantage of this algorithm, as compared to the one proposed in Trevezas and Cournède (2013), is that the organ masses of the last immature members can all be taken into account even for large expansion durations, and this could be very important for improving the quality of parameter estimation. Moreover, the adaptation of the data-driven automated algorithm of Caffo et al. (2005) to our algorithm was shown to be a good solution for an intelligent use of Monte Carlo resources. Simulation studies from a synthetic example and a real dataset from the sugar-beet plant were used to illustrate the performance of the proposed algorithm. Two different types of hidden Markov models were described and tested on a real dataset for their fitting quality.

The resulting algorithm is very promising and can be further exploited for decision aid in agricultural science. In this direction, further effort is needed to adapt this algorithm to other crop plants with deterministic organogenesis and for model comparison and validation. Furthermore, despite the interest in individual plant growth modeling, the genetic variability of plants, even of the same variety, can be very important, and, if we add locally varying climatic effects, the development of two plants in the same field could be highly different. Consequently, a population-based model could be more appropriate to describe the population dynamics and the inter-individual variability (de Reffye et al., 2009). We are currently studying an extension to the population level by coupling with a nonlinear mixed effects model (Kuhn and Lavielle, 2005). Another interesting perspective is to broaden the applicability of the proposed statistical methodology to plants with stochastic organogenesis (e.g. trees), where the total number of organs of each class at each growth cycle is a random variable (see, e.g., Loi and Cournède, 2008).

Acknowledgments

We would like to thank the anonymous reviewers and the associate editor for their suggestions, which helped us to improve the quality of this paper.

Appendix

Proof of Proposition 2. In order to simplify the proof, we change the state variables of the model M2. By setting R_n = log Q_n and Z_n = log Y_n, we can rewrite Eqs. (10) and (11) as follows:

R_{n+1} = \log F_n(R_{(n-T+1)^+:n}; \mu, p_{al}) + W_n,  (30)

Z_n = \log G_n(R_{n:(n+T-1)}; p_{al}) + V_n.  (31)

Now, let us analyze the Q-function of the model. Let us also write F_n, given by (3), as F_n = µK_n. In what follows, we identify the functions K_n and G_n (see (6)) with the induced random variable K_n(θ₂) and the induced random vector G_n(θ₂), respectively, for an arbitrary θ₂ ∈ Θ₂, where Θ₂ is an appropriate Euclidean subset. By the assumptions of the model M2 and Eqs. (30) and (31), we have:

Q(\theta; \theta') = E_{\theta'}\big[\log p_\theta(R_{0:N}, z_{0:N}) \mid z_{0:N}\big]
 = \sum_{n=1}^{N} E_{\theta'}\big[\log p_\theta(R_n \mid R_{(n-T)^+:n-1}) \mid z_{0:N}\big]
 + \sum_{n=0}^{N} E_{\theta'}\big[\log p_\theta(z_n \mid R_{n:(n+T-1)\wedge N}) \mid z_{0:N}\big]
 = C(\theta_2; \theta') + Q_1(\mu, \sigma^2, \theta_2; \theta') + Q_2(\Sigma, \theta_2; \theta'),  (32)

where

Q_1(\mu, \sigma^2, \theta_2; \theta') = -\frac{N}{2}\log\sigma^2
 - \frac{1}{2\sigma^2}\sum_{n=1}^{N} E_{\theta'}\big[(R_n - \log K_{n-1}(\theta_2) - \log\mu)^2 \mid z_{0:N}\big],

Q_2(\Sigma, \theta_2; \theta') = -\frac{N+1}{2}\log(\det\Sigma)
 - \frac{1}{2}\sum_{n=0}^{N} E_{\theta'}\big[(z_n - \log G_n(\theta_2))^\top \Sigma^{-1} (z_n - \log G_n(\theta_2)) \mid z_{0:N}\big],

and C(θ₂; θ′) is independent of θ₁.

Note that, for fixed θ₂, the initial maximization problem of Q w.r.t. θ₁ can be separated into two distinct maximization problems, of Q₁ and Q₂ w.r.t. (µ, σ²) and Σ respectively. By maximizing Q₁ we easily obtain (18) and (20), and by maximizing Q₂ we obtain (19). In the latter case the proof is the same as for an additive measurement error model (with the transformed variables), and a detailed proof can be found in Trevezas and Cournède (2013), Web Appendix C.
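To indicate the form of the first step: for fixed θ₂, setting the partial derivatives of Q₁ with respect to log µ and σ² to zero gives the familiar Gaussian-type solutions below (a sketch in our notation of the quantities behind (18) and (20), not a restatement of those equations):

\log\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} E_{\theta'}\big[R_n - \log K_{n-1}(\theta_2) \mid z_{0:N}\big],
\qquad
\hat{\sigma}^2 = \frac{1}{N}\sum_{n=1}^{N} E_{\theta'}\big[(R_n - \log K_{n-1}(\theta_2) - \log\hat{\mu})^2 \mid z_{0:N}\big].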

References

Bengtsson, T., Cavanaugh, J.E., 2006. An improved Akaike information criterion for state-space model selection. Comput. Statist. Data Anal. 50 (10), 2635–2654.
Booth, J.G., Hobert, J.P., 1999. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 (1), 265–285.
Bratley, P., Fox, B.L., Schrage, L.E., 1987. A Guide to Simulation. Springer-Verlag, New York.
Caffo, B.S., Jank, W., Jones, G.L., 2005. Ascent-based Monte Carlo expectation–maximization. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 235–251.
Cappé, O., Moulines, E., Rydén, T., 2005. Inference in Hidden Markov Models. Springer, New York.
Celeux, G., Diebolt, J., 1985. The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Comput. Stat. Q. 2, 73–82.
Chan, J.S.K., Kuk, A.Y.C., 1998. Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 53, 86–97.
Cournède, P.-H., Letort, V., Mathieu, A., Kang, M.-Z., Lemaire, S., Trevezas, S., Houllier, F., de Reffye, P., 2011. Some parameter estimation issues in functional–structural plant modelling. Math. Model. Nat. Phenom. 6 (2), 133–159.
Delyon, B., Lavielle, M., Moulines, E., 1999. Convergence of a stochastic approximation version of the EM algorithm. Ann. Statist. 27, 94–128.
Dempster, A.P., Laird, N.M., Rubin, D.B., 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 39, 1–38.
de Reffye, P., Hu, B.G., 2003. Relevant choices in botany and mathematics for building efficient dynamic plant growth models: the GreenLab case. In: Hu, B.G., Jaeger, M. (Eds.), Plant Growth Models and Applications. Tsinghua University Press and Springer, pp. 87–107.
de Reffye, P., Lemaire, S., Srivastava, N., Maupas, F., Cournède, P.-H., 2009. Modeling inter-individual variability in sugar beet populations. In: Li, B.G., Jaeger, M., Guo, Y. (Eds.), 3rd International Symposium on Plant Growth and Applications (PMA09), Beijing, China, November 9–12. IEEE.
Doucet, A., De Freitas, N., Gordon, N., 2001. Sequential Monte Carlo Methods in Practice. Springer-Verlag, New York.
Flegal, J.M., Jones, G.L., 2010. Batch means and spectral variance estimators in Markov chain Monte Carlo. Ann. Statist. 38 (2), 1034–1070.
Ford, E.D., Kennedy, M.C., 2011. Assessment of uncertainty in functional–structural plant models. Ann. Bot. 108 (6), 1043–1053.
Fort, G., Moulines, E., 2003. Convergence of the Monte Carlo expectation maximization for curved exponential families. Ann. Statist. 31, 1220–1259.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85, 398–409.
Geman, S., Geman, D., 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741.
Geyer, C.J., 1992. Practical Markov chain Monte Carlo. Statist. Sci. 7 (4), 473–483.
Gordon, N., Salmond, D., Smith, A.F., 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F 140 (2), 107–113.
Guo, Y., Ma, Y.T., Zhan, Z.G., Li, B.G., Dingkuhn, M., Luquet, D., de Reffye, P., 2006. Parameter optimization and field validation of the functional–structural model GREENLAB for maize. Ann. Bot. 97, 217–230.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 (1), 97–109.
Jank, W., 2005a. Quasi-Monte Carlo sampling to improve the efficiency of Monte Carlo EM. Comput. Statist. Data Anal. 48 (4), 685–701.
Jank, W., 2005b. Stochastic variants of EM: Monte Carlo, quasi-Monte Carlo and more. In: Proceedings of the American Statistical Association.
Jank, W., 2006. The EM algorithm, its stochastic implementation and global optimization: some challenges and opportunities for OR. In: Alt, F., Fu, M., Golden, B. (Eds.), Topics in Modeling, Optimization and Decision Technologies: Honoring Saul Gass' Contributions to Operations Research. Springer-Verlag, pp. 367–392.
Jones, G.L., Haran, M., Caffo, B.S., Neath, R., 2006. Fixed-width output analysis for Markov chain Monte Carlo. J. Amer. Statist. Assoc. 101 (476), 1537–1547.
Jones, G.L., Hobert, J.P., 2001. Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Statist. Sci. 16 (4), 312–334.
Jullien, A., Mathieu, A., Allirand, J.-M., Pinet, A., de Reffye, P., Cournède, P.-H., Ney, B., 2011. Characterisation of the interactions between architecture and source:sink relationships in winter oilseed rape (Brassica napus L.) using the GreenLab model. Ann. Bot. 107 (5), 765–779.
Kang, M.Z., Cournède, P.-H., de Reffye, P., Auclair, D., Hu, B.G., 2008. Analytical study of a stochastic plant growth model: application to the GreenLab model. Math. Comput. Simul. 78 (1), 57–75.
Kitagawa, G., 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Statist. 5 (1), 1–25.
Kuhn, E., Lavielle, M., 2005. Maximum likelihood estimation in nonlinear mixed effects models. Comput. Statist. Data Anal. 49, 1020–1038.
Lange, K., 1995. A gradient algorithm locally equivalent to the EM algorithm. J. Roy. Statist. Soc. Ser. B 57 (2), 425–437.
Lemaire, S., Maupas, F., Cournède, P.-H., de Reffye, P., 2008. A morphogenetic crop model for sugar-beet (Beta vulgaris L.). In: International Symposium on Crop Modeling and Decision Support: ISCMDS 2008, April 19–22, 2008, Nanjing, China.
Letort, V., 2008. Multi-scale analysis of source–sink relationships in plant growth models for parameter identification. Case of the GreenLab model. Ph.D. Thesis, Ecole Centrale Paris.
Levine, R.A., Casella, G., 2001. Implementations of the Monte Carlo EM algorithm. J. Comput. Graph. Statist. 10 (3), 422–439.
Levine, R.A., Fan, J., 2004. An automated (Markov chain) Monte Carlo EM algorithm. J. Stat. Comput. Simul. 74 (5), 349–360.
Liu, J.S., Chen, R., Logvinenko, T., 2001. A theoretical framework for sequential importance sampling with resampling. In: Doucet, A., De Freitas, N., Gordon, N. (Eds.), Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science, Springer, New York, pp. 225–246.
Liu, J.S., Chen, R., Wong, W.H., 1998. Rejection control and sequential importance sampling. J. Amer. Statist. Assoc. 93 (443), 1022–1031.
Loi, C., Cournède, P.-H., 2008. Generating functions of stochastic L-systems and application to models of plant development. Discrete Math. Theor. Comput. Sci. Proc. AI, 325–338.
Mathieu, A., Cournède, P.-H., Letort, V., Barthélémy, D., de Reffye, P., 2009. A dynamic model of plant growth with interactions between development and functional mechanisms to study plant structural plasticity related to trophic competition. Ann. Bot. 103 (8), 1173–1186.
McCulloch, C.E., 1994. Maximum likelihood variance components estimation for binary data. J. Amer. Statist. Assoc. 89, 330–335.
McCulloch, C.E., 1997. Maximum likelihood algorithms for generalized linear mixed models. J. Amer. Statist. Assoc. 92, 162–170.
McLachlan, G.J., Krishnan, T., 2008. The EM Algorithm and Extensions. John Wiley & Sons Inc.
Meng, X.-L., Rubin, D.B., 1993. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80, 267–278.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E., 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21 (6), 1087–1092.
Meyn, S.P., Tweedie, R.L., 1993. Markov Chains and Stochastic Stability. Springer-Verlag.
Mykland, P., Tierney, L., Yu, B., 1995. Regeneration in Markov chain samplers. J. Amer. Statist. Assoc. 90, 233–241.
Pallas, B., Loi, C., Christophe, A., Cournède, P.-H., Lecoeur, J., 2011. Comparison of three approaches to model grapevine organogenesis in conditions of fluctuating temperature, solar radiation and soil water content. Ann. Bot. 107 (5), 729–745.
Priestley, M.B., 1981. Spectral Analysis and Time Series. Academic Press, London.
Robert, C.P., Casella, G., 2004. Monte Carlo Statistical Methods. Springer.
Robert, C.P., Rydén, T., Titterington, D.M., 1999. Convergence controls for MCMC algorithms, with applications to hidden Markov chains. J. Stat. Comput. Simul. 64, 327–355.
Sievänen, R., Nikinmaa, E., Nygren, P., Ozier-Lafontaine, H., Perttunen, J., Hakula, H., 2000. Components of a functional–structural tree model. Ann. For. Sci. 57, 399–412.
Trevezas, S., Cournède, P.-H., 2013. A sequential Monte Carlo approach for MLE in a plant growth model. J. Agric. Biol. Environ. Stat. 18 (2), 250–270.
Vos, J., Evers, J.B., Buck-Sorlin, G.H., Andrieu, B., Chelle, M., De Visser, P.H.B., 2010. Functional–structural plant modelling: a new versatile tool in crop science. J. Exp. Bot. 61 (8), 2101–2115.
Wei, G., Tanner, M., 1990. A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. J. Amer. Statist. Assoc. 85, 699–704.
Whitley, D., 1994. A genetic algorithm tutorial. Stat. Comput. 4 (2), 65–85.
Yin, X., Struik, P.C., 2010. Modelling the crop: from system dynamics to systems biology. J. Exp. Bot. 61 (8), 2171–2183.