A Bayesian Analysis of Very Small Unreplicated Experiments


Special Issue Article

(wileyonlinelibrary.com) DOI: 10.1002/qre.1578 Published online 27 December 2013 in Wiley Online Library

Víctor Aguirre-Torres a*† and Román de la Vara b

It is not uncommon to deal with very small experiments in practice. For example, if the experiment is conducted on the production process, it is likely that only very few experimental runs will be allowed. If testing involves the destruction of expensive experimental units, we might only have very small fractions as experimental plans. In this paper, we will consider the analysis of very small factorial experiments with only four or eight experimental runs. In addition, the methods presented here could be easily applied to larger experiments. A Daniel plot of the effects to judge significance may be useless for this type of situation. Instead, we will use different tools based on the Bayesian approach to judge significance. The first tool consists of the computation of the posterior probability that each effect is significant. The second tool is referred to in Bayesian analysis as the posterior distribution for each effect. Combining these tools with the Daniel plot gives us more elements to judge the significance of an effect. Because, in practice, the response may not necessarily be normally distributed, we will extend our approach to the generalized linear model setup. By simulation, we will show not only that, in the case of discrete responses and very small experiments, the usual large sample approach for fitting generalized linear models may produce very biased and variable estimators, but also that the Bayesian approach provides very sensible results. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: Generalized Linear Models; Unreplicated Factorial Experiments; Bayesian Model Selection; Significant Effects; Small Sample Analysis

1. Introduction

In practice, we may find situations where experimental runs are very expensive, and thus, small unreplicated experiments are used. For example, in a study of a cookie production process, each run was meant to produce a batch of 500 kg of product, and then, the responses were measured on samples of cookies taken from the batch. For this case, we were able to use at most two factors, each at two levels. Some of the responses were hardness (continuous), likeness of texture evaluated by a panel of 10 judges (binomial), and sweetness evaluated by a panel of 10 judges (binomial). Hence, in this paper, we will consider experiments with four or eight experimental runs and models that allow for non-normal data. Unreplicated experiments are typically analyzed using the Daniel1 plot, but for small fractions, this plot of the effects could provide little or no information.

We will give two examples in order to illustrate our ideas in this paper. First, we will consider a simulated 2^3 unreplicated full factorial experiment where the response has a gamma distribution. The observations are generated using the linear predictor η = 1.5A + 1.5B − 1.5C, with log as the link function (both terms will be defined in Section 2). The observations and the estimated coefficients are given in Table I. The coefficients were obtained using the function 'glm' from the R (Foundation for Statistical Computing, Vienna, Austria) statistical package2 after fitting the saturated model.

Clearly, the coefficients for A and C stand out from the rest. The usual procedure to determine their significance is based on the Daniel plot3 in Figure 1. Significance is hard to judge because all the coefficients fall along a straight line. If we include the Shapiro–Wilk normality test4 as part of the analysis of unreplicated experiments, the resulting p-value is 0.9894, suggesting that the discrepancy observed in A and C falls within normal variation. This is in agreement with the Daniel plot's interpretation.
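For readers who want to reproduce this kind of setup, the following is a minimal R sketch of how such an experiment could be simulated and fit with 'glm'. The shape parameter of the gamma distribution and the random seed are arbitrary illustration choices, not values taken from the paper, so the numbers will differ from Table I:

    # Simulate a 2^3 gamma experiment with eta = 1.5A + 1.5B - 1.5C and a log link,
    # then fit the saturated model with glm; shape = 1 is an arbitrary choice.
    set.seed(1)
    d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
    mu <- exp(1.5 * d$A + 1.5 * d$B - 1.5 * d$C)
    d$y <- rgamma(8, shape = 1, scale = mu)      # E(y) = mu when shape = 1
    fit <- glm(y ~ A * B * C, family = Gamma(link = "log"), data = d)
    coef(fit)                                    # saturated-model coefficients
    shapiro.test(coef(fit)[-1])                  # Shapiro-Wilk test on the seven effects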

Now, we will consider the more challenging situation of the cookie production process. As we mentioned before, several responses were measured, one of them being sweetness. In Section 4, we will discuss this case in more detail; meanwhile, we will consider Table II, which shows the number of times that a panel of 10 judges found the formula given by the treatment combination sweeter than a control formulation. This table also shows the estimates of the effects provided by 'glm' using a binomial model. Figure 2 is the corresponding Daniel plot.

a Statistics Department, Instituto Tecnológico Autónomo de México (ITAM), Río Hondo #1, México, D.F. 01080, Mexico
b Quality Engineering Department, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, GTO 03600, Mexico
*Correspondence to: Víctor Aguirre-Torres, Statistics Department, Instituto Tecnológico Autónomo de México (ITAM), Río Hondo #1, México, D.F. 01080, Mexico.
†E-mail: [email protected]

Copyright © 2013 John Wiley & Sons, Ltd. Qual. Reliab. Engng. Int. 2014, 30 413–426


Table I. Observations and estimated coefficients for the simulated 2^3 factorial experiment with gamma response

Label   Observation   Coefficient
0          0.57         −0.43
A          7.69          1.36
B          0.61          0.66
AB        52.08         −0.39
C          0.01         −1.66
AC         0.48         −0.41
BC         0.37          0.16
ABC        0.21         −0.84


The Daniel plot is very difficult to interpret, because a 'central line' is not evident. The p-value of the Shapiro–Wilk normality test is 0.08, indicating a lack of normality. However, it is difficult to judge significance because all effects have a similar magnitude (around 6). Therefore, based on the previous examples, we can now enumerate some of the challenges present in the analysis of very small unreplicated factorial experiments:

1. Let us assume that the response is normally distributed:

   1. The Daniel plot alone could be very difficult to interpret with 7 points, not to mention 3 points.
   2. Normality tests may have very little power to detect departures for 3 or even 7 points. Evidence of this assertion will be shown in Section 5.
   3. The interpretation of the Daniel plot depends on effect sparsity. This can be very restrictive for three effects.

2. When normality cannot be assumed, the analysis can be carried out by using a generalized linear model (GLM). This is a nonlinear model that is estimated iteratively.

   1. A referee noticed that when the response is binomial, the maximum likelihood estimator (MLE) may not exist.5 Then, the analysis of the Daniel plot of the output of the iterative procedure, which may not be the MLE, could be worthless.
   2. If the response is assumed to be Poisson, even if the MLE existed, the Daniel plot would rely on the MLE's asymptotic normality, which is dubious for four or even eight observations. Challenge 1.1 applies here too.
   3. In fact, we ran some simulations for two 2^3 experiments, one experiment with binomial response and the other with Poisson response. In both cases, we collected the estimates produced by the R function 'glm'. These estimates varied widely, in that the bias and dispersion around the true values of the effects were huge in both cases. The latter is understandable for the binomial case because, theoretically, the MLE may not exist, but it was unexpected for the Poisson model. More discussion of these results is given in Section 5.
   4. In the case of a gamma response, when none of the observations is zero and the link function is the one mentioned in Table III, it is shown6 that the MLE exists and that it is finite and unique. In addition to the MLE's lack of normality in such small samples, challenges 1.1, 1.2, and 1.3 apply here too.

Figure 1. Daniel plot. Simulated 2^3 full factorial with gamma response


Table II. Observations and estimated coefficients for the cookie experiment

Label   Sweetness   Sweetness effects (GLM)
0           3                 —
A           6                7.1
B           0               −5.6
AB          9                6.5


3. There are other approaches used to analyze the GLM besides the Daniel plot. They correspond to information criteria7 (the Bayesian information criterion [BIC] and the Akaike information criterion [AIC]) or stepwise model selection based on deviance.8

   1. The BIC and AIC are very similar. They both are a function of the MLE, something that, at least for the binomial and Poisson distributions, would be dubious.
   2. The BIC and AIC are computed for all submodels, and then, a decision is made by optimizing the criteria (a sketch of this all-submodels search is given after this list). Although, theoretically, the MLE exists for the gamma distribution, when we simulated several 2^3 experiments, it turned out that the R function 'glm' failed to converge for several submodels. The latter may be caused by numerical problems of the iterative procedure. Hence, the approach could not be applied for the gamma model and very small factorials. You may see the supporting information in the online version of this paper.
   3. The stepwise procedure based on deviance depends on the large sample approximation of the χ2 distribution to the difference of the deviances. This is very controversial for very small factorials.
   4. One of the referees suggested considering the bootstrap. This approach has been used in experimental design;9 however, the authors mentioned that their method is more convenient for moderate size designed experiments and replications. Neither of these conditions holds in the situation considered in this paper. There are, however, other forms of bootstrap in a regression setup that are interesting to explore.
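To make the convergence problem concrete, here is a hedged R sketch of the all-submodels AIC search for a 2^3 gamma experiment; 'd' is a data frame with columns y, A, B, C (for instance, the simulated data of Section 1), and any fit that fails or does not converge is simply recorded as NA:

    # AIC over all 2^7 submodels of a saturated 2^3 gamma model (sketch).
    terms <- c("A", "B", "C", "A:B", "A:C", "B:C", "A:B:C")
    subsets <- list(character(0))
    for (m in seq_along(terms)) subsets <- c(subsets, combn(terms, m, simplify = FALSE))
    aics <- sapply(subsets, function(s) {
      f <- reformulate(c("1", s), response = "y")
      fit <- try(glm(f, family = Gamma(link = "log"), data = d), silent = TRUE)
      if (inherits(fit, "try-error") || !fit$converged) NA else AIC(fit)
    })
    names(aics) <- sapply(subsets, paste, collapse = " + ")
    head(sort(aics), 5)   # best submodels under R's convention (smaller AIC is better)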

To overcome the challenges mentioned previously in this paper, we propose to supplement the Daniel plot with tools from the Bayesian approach. The first tool is the posterior probability that the effects are active (PPA), and the second tool is the posterior distribution of each effect (PDE). We will now discuss the opportunities that these tools represent.

1. Both the PPA and PDE are an exact solution for small samples. Hence, they do not depend on the MLE's asymptotic normality or any other large sample approximation.
2. Both the PPA and PDE could work even if the MLE does not exist. This is because the posterior distributions or posterior probabilities do not depend on the MLE.
3. Even if the Daniel plot does not show significance, the PPA and PDE could provide evidence of significance. This is exemplified with the simulated experiment with gamma response.
4. If the 95% posterior probability interval of an effect is completely on one side of zero, then there would be evidence that the effect is active. The center of the distribution also gives the sign and an estimate for the effect.
5. Even if the posterior probability interval is not completely on one side of zero, the odds that an effect is positive (or negative) can still be used as an indication of significance. This interpretation is possible because the PDE is a probability density for the unknown effect.

Figure 2. Daniel plot. Cookie sweetness 2^2 factorial experiment


Table III. Links and log-likelihood functions ln f(y|Mi, θi) for the generalized linear models considered

Binomial:  link ηj = ln[μj / (1 − μj)];
           ln f(y|Mi, θi) = Σ_{j=1}^{n} [ yj xj^t βi − nj ln(1 + exp(xj^t βi)) ]

Poisson:   link ηj = log(μj);
           ln f(y|Mi, θi) = Σ_{j=1}^{n} [ −exp(xj^t βi) + yj xj^t βi ]

Gamma:     link ηj = log(μj);
           ln f(y|Mi, θi) = n[r ln r − ln Γ(r)] − r n β0 − r Σ_{j=1}^{n} yj exp(−xj^t βi) + (r − 1) Σ_{j=1}^{n} ln yj


6. In addition, a by-product of the PPA is the probability that none of the effects are active. The latter is useful because, if this probability is small, then there is evidence that at least one of the effects should be active, and consequently, the pattern of probabilities of the effects would suggest which of them are active.

7. Neither the PPA nor the PDE depends on factor sparsity.

Some authors10,11 directly compare the posterior probabilities of all possible models and then choose the model with the highest posterior probability. As seen in those papers, several models may differ by a very small amount of probability, and then, the decision is ambiguous. Instead, we have observed that computing the posterior probabilities for each effect is more informative.

The paper is organized in the following way. Section 2 presents the GLM formulation and the frequentist large sample approach of analysis. It also shows some simulation results for small samples, displaying the need to supplement this approach with other tools. Section 3 gives the Bayesian setup and its application to the computation of the PPA and PDE. Section 4 shows the application of the Bayesian tools to the examples of the first section. Section 5 shows the results of simulation studies evaluating the PPA's performance and a comparison with the normality test approach, and it also contains a sensitivity analysis to evaluate the impact of different prior choices on the PPA.

2. The generalized linear model

As shown in the cookie experiment, non-normal responses are present in industrial experiments.12,13,3 We also found an experiment in our statistical practice where the response was just a Bernoulli random variable that indicated a change in color of a shampoo over time. Instead of trying to normalize the response, it is more natural to model it directly with an appropriate distribution. The previous situations led us naturally to consider a GLM to analyze these experiments, because this approach intrinsically incorporates the non-normal nature of the response.

However, a GLM is a nonlinear model, and the analysis relies on the large sample properties of the MLE of its parameters. To examine this more clearly, let y^t = [y1, y2, …, yn] be a vector of independent observations, with vector of means [μ1, μ2, …, μn]. Under the GLM, the observation yi has a distribution that is a member of the exponential family, that is,

f(yi | ζi, ϕ) = exp{ r(ϕ)[yi ζi − b(ζi)] + c(yi, ϕ) },

where r(·), b(·) and c(·) are specific functions depending on the family of distributions, ζi is the natural location parameter, ϕ is the dispersion parameter, and the systematic part of the model involves the factors of the experiment plus their interactions, represented by the variables x1, …, xk. The model is built around the linear predictor η = β0 + β1x1 + ⋯ + βkxk and is constructed through the use of a link function ηj = g(μj), j = 1, …, n. The link function is required to be monotonic and differentiable. A link is canonical if ηj = ζj. The variance Var(yj) also depends on μj; this specific functional dependence changes with the distribution.

Let f(y|θ) be the likelihood function of the model. The vector θ contains the vector β of parameters in the linear predictor. If β̂ is the MLE of the coefficients of the linear predictor, then typically, the analysis relies on the asymptotic normality given by

√n (β̂ − β) →_L N(0, V(β)^{−1}), as n → ∞.

The matrix V(β) has the form X^t D(X, β) X. The matrix X is the usual design matrix containing main effects and interactions, and D(X, β) is a diagonal matrix that depends on the covariates, the density of yi, and the link function's derivatives. For explicit expressions of D(X, β) see, for example,14 page 5.

It is tempting to use the ratio of β̂i divided by the estimator of the asymptotic standard error obtained from V(β̂) to find the significant effects. However, we tried this approach in example 7.4 of3, and the estimated standard error was enormous relative to the effects. Therefore, the individual normal standardized values are worthless. Instead, it is customary to make a normal plot of the standardized effects and then judge which points that deviate from a central line could be considered as candidates for active effects.
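For reference, such a normal (Daniel) plot of the estimated effects can be produced directly from a fitted 'glm' object. A minimal sketch follows, where 'fit' is a saturated fit such as the one in the Section 1 sketch:

    # Daniel (normal) plot of the estimated effects from a saturated glm fit.
    eff <- coef(fit)[-1]                      # drop the intercept
    qq <- qqnorm(eff, main = "Daniel plot of glm effects")
    qqline(eff)
    text(qq$x, qq$y, labels = names(eff), pos = 4, cex = 0.8)
    shapiro.test(eff)                         # normality test used in the paper as a proxy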

In order to demonstrate clearly the impact of a small sample and the MLE's nonexistence, we generated 200 replicates of a 2^3 factorial experiment with a binomial response with 10 trials and a probability of success given by the linear predictor η = 2A − 3B + 3C + 2BC. By the same arguments as in5, it can be shown that for the data of this experiment the MLE never exists. Figure 3 is the histogram of the estimates of effect B.
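A hedged R sketch of one such replicate follows; whenever a cell count equals 0 or 10, the logits of the saturated model are infinite, which is the numerical symptom of the nonexistent MLE that 'glm' reports through warnings and huge coefficients:

    # One replicate of the 2^3 binomial experiment, eta = 2A - 3B + 3C + 2BC, 10 trials per run.
    d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
    p <- plogis(2 * d$A - 3 * d$B + 3 * d$C + 2 * d$B * d$C)
    d$y <- rbinom(8, size = 10, prob = p)
    fit <- glm(cbind(y, 10 - y) ~ A * B * C, family = binomial, data = d)
    coef(fit)   # typically huge, with warnings, whenever some y is 0 or 10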


Figure 3. Histogram of effect B. Monte Carlo simulated binomial 2^3 factorial experiment


This plot clearly points to several problems: (i) the distribution is clearly non-normal; (ii) it takes only a discrete set of values; and (iii) the distribution is centered around −11, far from the true value of −3, suggesting that the estimator could be very biased. Similar facts were observed for the rest of the effects. In addition, we used the Shapiro–Wilk normality test as an indicator in order to verify whether the Daniel plot gives evidence of significance. Figure 4 is the histogram of the p-value of this test statistic for the 200 simulated experiments. It shows that less than 5% of the experiments gave a p-value of less than 0.1, suggesting that in most of the cases the Daniel plot did not give evidence of significance in terms of departure from the straight line. This also shows the normality test's low power.

We also generated 200 replicates of a 2^3 factorial experiment with a Poisson response with the linear predictor η = 1 + A − B + 0.75C + 0.5BC. Figure 5 gives the histogram for effect B. It should be centered around −1, but instead it is centered around −6 and with a large dispersion. Similar things happen with the other effects. This is similar to what we observed for the binomial distribution, suggesting that the MLE does not exist in this situation as well. Thus, it does not make sense to use it. Furthermore, the standard errors computed from the asymptotic variance were large (12,313.17 in this case) and therefore useless.

3. The Bayesian approach

The use of Bayesian methods in GLMs is not new.11,14–16 The latter works mainly address the case of large data sets. This paper differs from those works in that we consider very small structured data sets that present all of the challenges mentioned previously.

The Bayesian approach incorporates prior knowledge of the parameters. This knowledge is represented by a prior density function denoted by f(θ). Then, the prior knowledge and the empirical evidence represented by the likelihood function are blended by Bayes' Theorem. This blend can take several forms. In this paper, we only use two, namely, to compute the PPA and the parameter's posterior probability distribution.

Figure 4. Histogram of p-values, Shapiro–Wilk normality test. Simulated 2^3 factorial experiment with binomial response and 10 trials


Figure 5. Histogram of effect B. Monte Carlo simulated Poisson 2^3 factorial experiment


3.1. A prior density

We will first consider the case of β. It is treated in several ways in the literature; see, for example,14,17. In particular, in17, page 421, the authors suggest the use of a multivariate normal prior with mean β0 and covariance matrix V0, without indicating a specific procedure on how to obtain these parameters. More explicit proposals for the binomial model are found in18, which assume the existence of a previous study from which the prior density for β is given. By using Bayes' Theorem, they obtain a posterior density, which is used as the prior density for the current study. If a previous study is not available, then a subset of the data could be used to furnish the prior. In our experience, no previous study is available, and our data sets are so small that splitting the data would not be feasible.

In Bayesian data analysis,17 a conjugate prior approach is suggested that consists of adding hypothetical data points and then applying the non-informative prior distribution approach to the augmented data set. This approach is not useful for our situation because, in general, there is no guarantee that the integrated likelihood (10) exists for a general GLM when an improper prior is used.

Given the previous considerations, we used the following approach to define a prior. First, we considered the parameter β0, which is linked to the original scale by the relationship

β0 = g(μ0).   (1)

In order to obtain a prior distribution for β0, notice that μ0 could have the following two interpretations: first, it could be considered the mean response when none of the effects are significant; second, if the factors in the experiment are continuous, it is the mean response when all the factors in the experiment are set to zero, namely, the mean response in the central region of the experiment. Notice that μ0 is related to an observable characteristic in the experiment. Regardless of the interpretation, this approach assumes that the experimenter has a broad idea about the mean response's value, and it is stated in terms of a probability interval of the form

P(Lμ < μ0 < Uμ) = 1 − δμ,   (2)

where Lμ and Uμ are a lower and an upper bound for μ0, and δμ is a small fraction, 5% or 1%. Assuming a strictly increasing link function, from (1) and (2) it follows that

P(g(Lμ) < β0 < g(Uμ)) = 1 − δμ.   (3)

Then, one way to employ (3) with a normal prior distribution is by choosing the following parameters for the prior density of β0:

μ_β0 = [g(Lμ) + g(Uμ)] / 2,   σ_β0 = [g(Uμ) − μ_β0] / z_(1−δμ/2),   (4)

where zξ is the ξ-th percentile of the standard normal distribution. If the link function is strictly decreasing, then μ_β0 remains the same, but σ_β0 changes to σ_β0 = [g(Lμ) − μ_β0] / z_(1−δμ/2). In the extreme case, when there is no information about μ0, a normal distribution around zero could be used. In Section 5, we give the results of this choice in a brief sensitivity study for all three distributions, where, if we have no information about μ0, we use a normal distribution N(0, 10) (variance equal to 100).
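The construction in (4) is easy to put into a small helper. A sketch in R follows, assuming an increasing link g (log by default); the function name and defaults are ours, not the paper's:

    # Prior mean and sd for beta0 from an interval (L, U) for mu0, eq. (4).
    beta0_prior <- function(L, U, delta, g = log) {
      m <- (g(L) + g(U)) / 2
      s <- (g(U) - m) / qnorm(1 - delta / 2)
      c(mean = m, sd = s)
    }
    beta0_prior(0.5, 12, 0.01)   # the gamma-response choice used in Section 5.2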

For the rest of the parameters, βi, 1 ≤ i ≤ k, we will assume that they are independent Cauchy random variables centered at zero, because it is assumed that there is no information about the sign of the effect. If, however, there is information that an effect should have a certain sign, this could easily be incorporated by truncating accordingly when sampling from the prior distribution.


The assumption of independence is common in the literature.19–21,10 The Cauchy distribution is chosen so that the prior is less informative. This choice is also supported by the sensitivity study of Section 5.

The previous proposal is sufficient for the binomial and Poisson models. We use the same procedure for β, but for the gamma distribution there is an extra parameter, namely, the shape parameter r. To deal with this parameter, it is assumed that the density of yj is given by

f(yj | Mi) = [1 / (Γ(r) λj^r)] exp(−yj/λj) yj^(r−1).   (5)

This model supposes that the parameter r is the same for all observations, that E(yj) = μj = r λj, and that the canonical link is g(μj) = 1/μj = ηj; but this link is not used because some estimates may give negative values for ηj. Consequently, a log link will be used, that is, log(μj) = ηj.

It is well known that Var(yj) = μj^2 / r; thus, the coefficient of variation of yj is given by CV(yj) = √Var(yj) / μj = 1/√r. To obtain a prior distribution for r, we assume that a probability interval for the coefficient of variation of the response is available, namely, that there are constants Lc, Uc, and δc such that

P(Lc < CV(y) < Uc) = 1 − δc,

and then

P(1/Uc^2 < r < 1/Lc^2) = 1 − δc.   (6)

Because r is positive, a two-parameter Γ(a, b) distribution is used as a prior. In order to fulfill (6), we need to solve in (a, b) the following system of nonlinear equations:

P(Γ(a, b) < 1/Uc^2) = w δc   and   P(Γ(a, b) > 1/Lc^2) = (1 − w) δc,   (7)

where 0 < w < 1 is a weight that could be taken as 0.5 if both tails of the interval have the same probability. A solution for (7) could be obtained by minimizing

Qw(a, b) = [P(Γ(a, b) < 1/Uc^2) − w δc]^2 + [P(Γ(a, b) > 1/Lc^2) − (1 − w) δc]^2

with respect to (a, b). A possible strategy is to start with w = 0.5, and if a solution is not found, then iterate on w. When there is no information about CV(yj), we propose tentatively to use the interval (Lc, Uc) = (0.15, 3.6) and δc = 0.01; this choice represents a very generous interval.
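A sketch of this minimization in R, using pgamma and optim; the rate parameterization of the Γ(a, b) prior and the log-scale optimization are our assumptions for the illustration:

    # Find (a, b) of the gamma prior for the shape r from a CV interval, eqs. (6)-(7).
    shape_prior <- function(Lc = 0.15, Uc = 3.6, delta = 0.01, w = 0.5) {
      Q <- function(p) {                       # p = log(a), log(b), to keep both positive
        a <- exp(p[1]); b <- exp(p[2])
        (pgamma(1 / Uc^2, shape = a, rate = b) - w * delta)^2 +
          (1 - pgamma(1 / Lc^2, shape = a, rate = b) - (1 - w) * delta)^2
      }
      p <- optim(c(0, 0), Q)$par
      c(a = exp(p[1]), b = exp(p[2]))
    }
    shape_prior()   # the generous default interval proposed in the text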

3.2. Posterior probability that an effect is active

This approach is an extension to the GLM of the ideas given in19–21, which dealt with cases where the observations have a normal distribution and the link function is the identity.

For a fractional factorial with n = 2^k runs, let m = 2^(n−1). Consequently, there are m possible models M0, M1, M2, …, M(m−1), where M0 is the constant model in which none of the effects is significant. Model i, denoted by Mi, has a parameter vector θi, which contains the active effects and, in the case of the gamma distribution, the shape parameter. The approach is focused on computing the PPA, which is given by

Pj = Σ_{Mi : xj is present} p(Mi | y),   (8)

where p(Mi|y) is the posterior probability of model Mi given that the experiment has been observed. From Bayes' Theorem,

p(Mi | y) = p(Mi) f(y | Mi) / Σ_{h=0}^{m−1} p(Mh) f(y | Mh).   (9)

In the previous formula, p(Mi) is the prior probability that model Mi is true (to be determined later) and f(y|Mi) is the so-called integrated likelihood of model Mi, which is given by

f(y | Mi) = ∫_{Θi} f(y | Mi, θi) f(θi | Mi) dθi,   (10)

and the density f(θi|Mi) is the prior density for the parameters present in model Mi. If a parameter is not active, then it is set to zero.


Effects with relatively large posterior probabilities are considered candidates for active effects. Notice that this proposal does not depend on a large sample size; in fact, it could be computed for any sample size. However, it may depend on the choice of prior distribution; hence, it is convenient to conduct a sensitivity analysis and observe how the pattern of posterior probabilities changes with respect to changes in the prior distribution.

Notice also that prior knowledge of the phenomenon can be incorporated by means of a prior distribution. For example, the subject matter expert may know that an effect should have a positive sign. If the latter is so, it can be easily incorporated into the calculation of the integrated likelihood by truncation.

For the normal distribution and the identity link function, Box and Meyer derived an analytic expression for (10). However, there is no closed form formula for this integral in a general GLM case. Hence, we estimated the integral using a crude Monte Carlo average by simulating from the prior distribution of the parameters f(θi|Mi).
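As an illustration, here is a minimal sketch of this crude Monte Carlo estimate for a binomial model Mi; the function name and arguments are ours, X holds the ±1 columns of the effects present in Mi, and the priors are those of Section 3.1 (Cauchy scale 1 is assumed for the effects):

    # Crude Monte Carlo estimate of the integrated likelihood (10), binomial case.
    int_lik_binom <- function(y, X, size, b0.mean, b0.sd, B = 1e5) {
      vals <- replicate(B, {
        beta <- c(rnorm(1, b0.mean, b0.sd), rcauchy(ncol(X)))   # draw from f(theta_i | Mi)
        prod(dbinom(y, size, plogis(cbind(1, X) %*% beta)))     # likelihood at the draw
      })
      mean(vals)
    }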

In order to define the prior probability of a model p(Mi), we considered that in a screening experiment all columns may have the same importance. If we let α be the prior probability that any one effect is active, then the probability of observing a model with ti significant effects will be19 α^ti (1 − α)^(n−ti). From empirical evidence on fractional factorial experiments, known as factor sparsity, it has been observed that 0 < α < 0.4; thus, we consider the value of 0.2 for 8 and 16 experimental runs, and 0.3 for four experimental runs.

In the Bayesian variable-selection approach,10 a weighting procedure is used that tries to impose effect heredity, meaning that an interaction term is likely to be present when the corresponding main effects are present in the model. We did not pursue that option because the simulations given in Section 5 show that the choice of19 gives sensible results.

Table III gives the corresponding links and log-likelihood functions for each of the models considered here. Regarding computation, it is convenient to calculate first the log-likelihood. The latter is particularly useful for the gamma distribution because the computation of the log-gamma function is much more stable than the gamma function itself.

3.3. Posterior distribution of the effects

Now, let us consider a second approach. From Bayes' Theorem, the PDE given the experiment is obtained from

π(θi | Mi, y) ∝ f(y | Mi, θi) f(θi | Mi).   (11)

In order to obtain a simulated sequence from the posterior distribution, a numerical method, namely, Markov chain Monte Carlo,22,23 is used. This method simulates a sequence from a Markov chain whose limiting distribution is the posterior distribution π(θi|Mi, y), and its application uses the terms in the sequence after a burn-in period. The method is implemented in the free software OpenBUGS.24,25

The density π(θi|Mi, y) could be used to find posterior probability intervals as well as to compute probabilities of events. For example, if A is an event, then its odds are P(A)/(1 − P(A)). With the posterior density, we can calculate the odds that a parameter is positive or negative and use it as evidence of whether the effect is active.
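The paper's posteriors are computed with OpenBUGS; as a rough, self-contained alternative, the same posterior (11) can be explored with a simple random-walk Metropolis sampler in R. The sketch below targets the cookie sweetness model (intercept, A, B, AB); the prior for β0 assumes, for illustration, the (Lμ, Uμ) = (0.1, 0.9), δμ = 0.01 interval of Section 5.1, and the effects get Cauchy(0, 1) priors:

    # Random-walk Metropolis sketch for the posterior (11), cookie sweetness data.
    log_post <- function(beta, y, X, size, b0.sd) {
      eta <- X %*% beta
      sum(dbinom(y, size, plogis(eta), log = TRUE)) +
        dnorm(beta[1], 0, b0.sd, log = TRUE) + sum(dcauchy(beta[-1], log = TRUE))
    }
    y <- c(3, 6, 0, 9); size <- 10                      # Table II, labels 0, A, B, AB
    A <- c(-1, 1, -1, 1); B <- c(-1, -1, 1, 1)
    X <- cbind(1, A, B, AB = A * B)
    b0.sd <- qlogis(0.9) / qnorm(0.995)                 # eq. (4) with the logit link
    set.seed(1)
    beta <- rep(0, 4); draws <- matrix(NA_real_, 5000, 4)
    for (i in 1:5000) {
      prop <- beta + rnorm(4, sd = 0.5)
      if (log(runif(1)) < log_post(prop, y, X, size, b0.sd) -
                          log_post(beta, y, X, size, b0.sd)) beta <- prop
      draws[i, ] <- beta
    }
    apply(draws[-(1:1000), ], 2, quantile, c(0.025, 0.5, 0.975))   # compare with Table V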

4. Some examples

In this section, we will show the proposal’s application to the examples of the first section.

4.1. Simulated gamma example

Figure 6 shows the PPA for the simulated experiment with a gamma response. The probability labeled with zero is the probability that none of the effects is significant. In this example, it is very small, thereby providing evidence that at least one of the effects should be significant. Furthermore, in this plot, effects A and C stand out from the rest, supporting the fact that they should be significant.

We were able to obtain the posterior distribution for β0, r, and five effects with eight observations and a gamma model. Figure 6 also shows the boxplots of the PDE. They give strong evidence that effect A is positive and that effect C should be negative.

Table IV gives some quantiles from the posterior distribution. In particular, the 95% posterior probability intervals support the fact that effect A is positive and that effect C should be negative. Notice also that they contain the true values of the parameters. In this case, the odds are, for A to be positive, 0.975/0.025 = 39, or 39 to 1, and for C to be negative, 0.98/0.02 = 49, or 49 to 1. Both are huge.

Therefore, in this case, the tools from the Bayesian approach led us to a much more enriching analysis than that of the Daniel plot.

4.2. Cookie experiment: sweetness.

In order to evaluate sweetness, factory management considered it convenient to use a 2-alternative forced choice response.26 Each of the 10 judges evaluated four pairs of cookies in random order between and within pairs. Each pair contained a control formulation and a formula coming from the corresponding treatment combination. If the formula from the treatment combination was considered sweeter than the control, then it was considered a success. Thus, the response to be modeled was the total number of successes for each treatment combination.

Table II contains this response as well as the estimates of the effects produced by 'glm'. Figure 2 is the corresponding Daniel plot and, as mentioned previously, its analysis is difficult. In contrast, Figure 7, which gives the PPA, provides the following interpretations:


Figure 6. Posterior probability that an effect is active and posterior distribution of the effect boxplots. Simulated gamma response. [1] A, [2] B, [3] C, [4] AB, and [5] AC


1. The probability that none of the effects is active is practically zero; thus, there is at least one active effect.
2. Effect A is clearly active.
3. Effect B is clearly inactive.
4. Effect AB is very likely to be active.

Figure 7 gives the boxplots of the PDE and suggests that both effects A and AB are active and positive. This is also confirmed by Table V and the 95% posterior probability intervals for A and AB. From Figure 7, the third quartile of effect B's posterior is close to zero; hence, the posterior odds that effect B is negative are around 3 to 1, which is not very definite. Therefore, in this example and for this response, the tools from the Bayesian approach gave us a more enriching analysis of the data.

5. Some simulation results

In order to compare the sampling behavior of the posterior probability method and the frequentist approach, we ran a simulation for the three distributions discussed here. We will only present the results for the binomial and gamma distributions because the results for the Poisson distribution are similar to those of the binomial case. We used a 2^3 experiment in all cases with different linear predictors. In all cases, we ran 200 Monte Carlo replications. This is a small number but sufficient to show the main features of both the Bayesian and frequentist approaches.

For each replication, we also computed the estimated effects from the function 'glm' from the R package, the Shapiro–Wilk normality test on the estimated effects (to resemble the interpretation of the Daniel plot) and its corresponding p-value, and the posterior probability that each of the effects is active.

Then, we ran a sensitivity study. The purpose is to evaluate the impact of three different aspects of prior choices on the PPA. We used examples with binomial, Poisson, and gamma responses.

5.1. Binomial distribution

The linear predictor was η = 2A − 3B + 3C + 2BC. Prior considerations were (Lμ, Uμ) = (0.1, 0.9) and δμ = 0.01. As we mentioned in Section 2, the estimators of the effects are very biased and show a strong effect of discreteness. See, for example, Figure 8, which is the scatter plot of the estimates of effects C and AB.

The estimator of effect C is around 10 when its true value is 3. The estimator of effect AB is around −2.5 when its true value is 0. There is also a positive correlation between the two estimators, around 0.84; however, this contaminates the interpretation of the Daniel plot because in some samples AB may look significant, but that is due to this correlation with a significant effect.

Table IV. Means, standard deviations, and quantiles of the posterior distribution of the effects for a simulated 2^3 factorial experiment with a gamma response

Effect   node      mean    sd     2.5%    median   97.5%
A        beta[1]    0.83   0.43   −0.05    0.84     1.64
B        beta[2]    0.38   0.43   −0.47    0.38     1.22
C        beta[3]   −0.98   0.45   −1.83   −0.98    −0.08
AB       beta[4]    0.04   0.43   −0.80    0.04     0.90
AC       beta[5]   −0.48   0.43   −1.33   −0.48     0.38


Figure 7. Posterior probability that an effect is active and posterior distribution of the effect boxplots. Cookie sweetness experiment. [1] A, [2] B, and [3] AB


Other effects are also highly correlated. Because it is not possible to visually interpret the 200 replications, in order to evaluate the Daniel plot's performance, we used the p-values obtained from the Shapiro–Wilk normality test as a proxy. Figure 4 shows the histogram of these p-values. We noticed that they are below 0.10 for less than 5% of the replications, thus implying a low power to detect non-normality.

On the other hand, Figure 13 (1) gives the boxplots of the posterior probabilities that each of the effects is active. It clearly shows that the posterior probabilities for effects B and C are practically one for most of the replications, and that the corresponding probabilities for effects A and BC are similar to each other, because they are of the same size. Furthermore, the probabilities of the rest of the effects are, in general, close to zero, except for some exceptional samples in which they are greater than 0.5.

It is remarkable that in this very small experiment there is no need for effect sparsity, because four effects (50%) were active and they were all detected effectively.

5.2. Gamma distribution

The linear predictor was η = 1 + 2A − B + 0.5C. Prior considerations were, for μ0, (Lμ, Uμ) = (0.5, 12) and δμ = 0.01, and, for the coefficient of variation, (Lc, Uc) = (0.15, 3.6) and δc = 0.01. Both intervals are very generous. Figure 9 gives the histogram for the estimates of effect A, where the distribution is unimodal, centered around the true value, and without gaps.

Figure 10 is the scatter plot between the estimates of effects A and AB. It shows that the estimates of effect AB are centered around zero, as they should be, and also that they are not correlated (the estimated correlation is 0.027).

Figure 11 gives the histogram of the p-values of the Shapiro–Wilk normality test. It shows that the p-value is below 0.10 for less than 20% of the replications, which is a very low score if we expect the Daniel plot to show signs of non-normality.

Finally, Figure 12 gives the boxplots of the PPA. The probability that none of the effects is active is quite close to zero with very low variation, as expected. The probabilities of effect A, the largest effect, are also very close to one and with low variation. The probabilities of effect B have an interquartile range between 0.2 and 0.8. The probabilities of the smallest active effect are, in general, larger than those of the null effects. Thus, this approach behaves as expected.

5.3. A sensitivity analysis

As mentioned by both referees, in analyzing a small fractional factorial experiment with a Bayesian approach, the prior used can influence the result. For this reason, we included a sensitivity analysis with respect to three aspects of the prior choice. The analysis is constructed using a fractional factorial setup. The linear predictors and prior considerations are the same as in Section 5 for the binomial and gamma distributions. For the Poisson response, the linear predictor is η = 1 + A − B + 0.75C + 0.5BC, and the prior considerations for μ0 are (Lμ, Uμ) = (0.5, 50) and δμ = 0.01. In this case, μ0 = e^1, and it is contained in the a priori interval.

The factors and their levels are given in Table VI. The experiment consisted of a 2^(3−1) fraction with defining contrast I = ABC = −1 and 200 independent Monte Carlo replications for each treatment combination for each response: binomial, Poisson, and gamma. Notice that the treatment (−1, −1, −1) is the choice used in the examples and in the previous simulation results. The order of treatments, or scenarios, is given in Table VII.

Table V. Means, standard deviations, and quantiles of the posterior distribution of the effects for the cookie sweetness 2^2 factorial experiment

Effect   node      mean    sd     2.5%    median   97.5%
A        beta[1]    2.68   0.95    1.19    2.53     4.88
B        beta[2]   −0.91   0.94   −3.04   −0.78     0.62
AB       beta[3]    2.00   0.95    0.51    1.85     4.22


Figure 8. Scatter plot of effect estimates, AB versus C. Simulated 2^3 experiment with binomial response and 10 trials. Linear predictor: η = 2A − 3B + 3C + 2BC


Factor A represents the experimenter's willingness to assume that there are active effects. Factor B represents the prior knowledge of the values of the response. Factor C represents a choice between a peaked, infinite variance distribution and a flat, finite variance distribution for the effects; both choices are symmetric around zero.

Figure 13 gives the boxplots of the PPA for each scenario for the binomial distribution. Generally speaking, effects with a PPA close to one or zero remained quite similar. A comparable thing happened with the other responses. More evident changes occurred for effects with probabilities in the middle. In order to measure their impact, we used the factorial structure of the study and, as a response, the mean PPA. Then, we computed the effects for each aspect in the usual way. Table VII gives the means and standard deviations of the average PPA of effect A for the binomial response.

We calculated the estimated effects and corresponding standard deviations using the data of Table VII for some variables where the previous exploratory analysis suggested an important change. Table VIII gives the impacts for chosen variables and distributions.

Let us consider the prior probability α for the binomial example. It has a negative effect of around 8%, but for the Poisson and gamma distributions, it has a positive effect of 8% and 46%, respectively. All of these effects are significant. Given this result, and because this choice is inexpensive to change, we suggest getting the PPA for both values of α and then comparing the resulting patterns.

Regarding prior information on β0, there is a significant negative effect; the exception is the binomial distribution. The latter is expected because, in this case, β0 is equal to zero, which corresponds to μ0 = 0.5 and is within the prior interval. Therefore, this suggests the importance of giving an approximate interval for μ0, which, according to our experience, is not too difficult to elicit because we have found, in general, that the people involved in experiments have a broad idea of the average values of the response.

The effects of peaked versus flat distribution, at least for all the examples shown, are negative and significant, meaning that the Cauchy distribution is a better choice than the uniform distribution. We chose the Cauchy distribution because it has an infinite variance, which seems to be useful.

Figure 9. Histogram of effect A. Monte Carlo simulated 2^3 factorial experiment with gamma response


Figure 10. Scatter plot of effect estimates, AB versus A. Simulated 2^3 experiment with gamma response. Linear predictor: η = 1 + 2A − B + 0.5C

Figure 11. Histogram of p-values, Shapiro–Wilk normality test. Simulated 2^3 factorial experiment with gamma response

Figure 12. Posterior probability that an effect is active boxplots. Simulated 2^3 factorial experiment with gamma response. Linear predictor: η = 1 + 2A − B + 0.5C


Table VI. Factors and levels for the sensitivity analysis

Factor   Prior aspect                                      (−)                  (+)
A        Prior probability that an effect is active, α     0.2                  0.3
B        Prior information on β0 and density               Yes, N(μ_β0, σ_β0)   No, N(0, 10)
C        Prior distribution for βi                         Cauchy               Uniform(−30, 30)

Figure 13. Posterior probability that an effect is active boxplots. Sensitivity analysis with binomial response. Linear predictor: η = 2A − 3B + 3C + 2BC

Table VII. Sensitivity analysis: average posterior probability that an effect is active (PPA) and its standard deviation for the binomial response, variable A

 A    B    C    Average PPA   Std. Dev. PPA
−1   −1   −1      0.6316         0.3248
 1    1   −1      0.5275         0.4109
 1   −1    1      0.4405         0.3989
−1    1    1      0.4904         0.3859

PPA: posterior probability that an effect is active

Table VIII. Summary of the sensitivity analysis for selected variables: effects of the aspects on the average posterior probability that an effect is active

           Binomial, A        Poisson, A         Gamma, B
Aspect   Effect   t-stat    Effect   t-stat    Effect   t-stat
A        −0.08    −2.85      0.08     3.12      0.46    17.16
B        −0.03    −1.00     −0.09    −3.32     −0.23    −8.34
C        −0.11    −4.23     −0.18    −6.66     −0.25    −9.12


6. Concluding remarks

In this paper, we proposed to supplement the use of Daniel plots with tools from the Bayesian approach, namely, the PPA and PDE, when the experiments are unreplicated. We showed that this is particularly useful when the size of the experiment is very small, for both normal and non-normal responses.

Acknowledgements

Víctor Aguirre-Torres is a Professor in ITAM's Statistics Department, and Román de la Vara is a Professor in CIMAT's Quality Engineering Department. The first author was partially supported by Asociación Mexicana de la Cultura A.C. and conducted part of this research while on sabbatical leave at CIMAT's Probability and Statistics Department. Both authors wish to thank the editor and two anonymous referees for their helpful comments that helped to improve the paper.

References

1. Daniel C. Applications of Statistics to Industrial Experimentation. John Wiley and Sons: New York, NY, 1976.
2. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/, 2012.
3. Myers RH, Montgomery DC, Vining G. Generalized Linear Models: with Applications in Engineering and the Sciences. Second edition. John Wiley and Sons: New York, NY, 2010.
4. Benski HC. Use of a normality test to identify significant effects in factorial designs. Journal of Quality Technology 1989; 21:174–178.
5. Albert A, Anderson JA. On the existence of maximum likelihood estimates in logistic regression models. Biometrika 1984; 71:1–10.
6. Wedderburn R. On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models. Biometrika 1976; 63:27–32.
7. Wang X, George E. Adaptive Bayesian criteria in variable selection for generalized linear models. Statistica Sinica 2007; 17:667–690.
8. SAS Institute. 9.2 User's Guide (2nd ed.). SAS Institute: Cary, NC, 2013.
9. Kenett R, Rahav E, Steinberg D. Bootstrap analysis of designed experiments. Quality and Reliability Engineering International 2006; 22:659–667.
10. Chipman H, Hamada M, Wu CFJ. A Bayesian variable-selection approach for analyzing designed experiments with complex aliasing. Technometrics 1997; 39:372–381.
11. Weaver BP, Hamada M. A Bayesian approach to the analysis of industrial experiments: an illustration with binomial count data. Quality Engineering 2008; 20:269–280.
12. Hamada M, Nelder J. Generalized linear models for quality-improvement experiments. Journal of Quality Technology 1997; 29:292–303.
13. Lewis S, Montgomery DC. Examples of designed experiments with non-normal responses. Journal of Quality Technology 2001; 33:265–278.
14. Dey K, Sujit K, Bani K. Generalized Linear Models: A Bayesian Perspective. Marcel Dekker: New York, NY, 2000.
15. Wu CFJ, Hamada M. Experiments: Planning, Analysis, and Parameter Design Optimization. John Wiley and Sons: New York, NY, 2000.
16. Ntzoufras I, Dellaportas P, Forster JJ. Bayesian variable and link determination for generalised linear models. Journal of Statistical Planning and Inference 2003; 111:165–180.
17. Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis (2nd edition). Chapman & Hall/CRC: Boca Raton, FL, 2004.
18. Chen M, Shao Q, Ibrahim J. Monte Carlo Methods in Bayesian Computation. Springer Series in Statistics: New York, NY, 2000.
19. Box G, Meyer R. An analysis for unreplicated fractional factorials. Technometrics 1986; 28:11–18.
20. Box G, Meyer R. Analysis of unreplicated factorials allowing for possibly faulty observations. In Design, Data, and Analysis, Mallows C (ed.). John Wiley and Sons: New York, NY, 1987.
21. Box G, Meyer R. Finding the active factors in fractionated screening experiments. Journal of Quality Technology 1993; 25:94–104.
22. Gilks WR, Richardson S, Spiegelhalter DJ (eds.). Markov Chain Monte Carlo in Practice. Chapman and Hall: London, UK, 1996.
23. Brooks SP. Markov chain Monte Carlo method and its application. The Statistician 1998; 47:69–100.
24. OpenBUGS, version 3.2.2 rev 1063. http://www.openbugs.info/w.cgi/FrontPage (3 February 2013).
25. Gilks W. Derivative-free adaptive rejection sampling for Gibbs sampling. In Bayesian Statistics 4, Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds). Oxford University Press: UK, 1992; 641–665.
26. Gacula M, Singh J, Bi J, Altan A. Statistical Methods in Food and Consumer Research (2nd edition). Academic Press: New York, NY, 2009.

Authors' biographies

Víctor Aguirre-Torres has a PhD in Statistics from North Carolina State University. He is an elected member of Mexico's Sistema Nacional de Investigadores in the area of Physics and Mathematics. He is a professor in the Statistics Department at Instituto Tecnológico Autónomo de México.

Román de la Vara has a Doctoral degree in Probability and Statistics from the Center for Research in Mathematics (CIMAT) in México. He is a professor in CIMAT's Quality Engineering Department.

Supporting information

Additional supporting information may be found in the online version of this article at the publisher’s web site.
