ROK Bioconsulting [email protected]

Design of Experiments Example 009

Khurana et al. (2007) Indian J. Microbiol. (47) 144-152

Statistical optimization of alkaline xylanase production from Streptomyces violaceoruber under submerged fermentation using response surface methodology

©ROK Bioconsulting 2016

Background to Example

This example has been adapted from the original paper to show how to design, run and analyse a DOE experiment. In addition, this example shows how to complement a factorial design to account for significant curvature in the data. An Excel data file for this example can be obtained from [email protected].

In the example, the authors describe the optimisation of fermentation growth conditions and growth medium raw materials. The aim of the optimisation is to improve the production of the industrial enzyme, Xylanase, from Streptomyces violaceoruber fermentations. Optimisation experiments were carried out in 125 mL shake flasks with 30 mL of growth medium, which were incubated in a shaking incubator. The five factors selected for optimisation are shown in Table 1.

Factors (Parameters)         Abbreviation   Units     Ranges
Wheat Bran concentration     A              % (w/v)   1.5 to 3.5
Peptone concentration        B              % (w/v)   0.4 to 0.8
Agitation speed              C              rpm       150 to 250
Incubation time              D              hours     36 to 72
Beef Extract concentration   E              % (w/v)   0.4 to 0.8

Table 1. Factor names and ranges

Model set up

To start setting up the experimental design, open MODDE and start a new design with File>New>Experimental design.

Figure 1. Start Experimental design setup.

The Design wizard (Figure 2) will ask you to set up the design factors and responses.

Figure 2.

In the factor table (Figure 4), enter the name and details of each factor from Table 1. Use Wheat Bran, Peptone, Agitation, Incubation time, and Beef Extract as factor names. Change the factor abbreviations to A, B, C, D and E. Enter the ranges shown in Table 1 for each factor directly into the ranges column. For more detailed changes to the factor set up, double click on a factor. In the dialog (Figure 3), select the “Advanced” tab and change the No. of decimals to “1”. Repeat this for each factor. This step is needed for this design but is optional for your own designs.

Figure 3. Advanced factor definition settings.

Close the factor definition dialog and you should have a factor table similar to Figure 4.

Figure 4.

In the Design wizard, click Next to enter the details of the responses. In this experiment there is only one response. Add “Xylanase” as the response. Change the abbreviation to “Prod” and the units to “IU/mL”.

Figure 5. Define responses

Click Next and select a screening design. In the select model and design dialog (Figure 6), select the Full factorial (2 levels) design, add 4 center points to the design and select Finish.

Figure 6. Select model and design
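
To make the size of this design concrete, the runs can be enumerated outside MODDE. The following is a minimal sketch, not MODDE's own algorithm: the factor ranges come from Table 1, while the function name and the row order are assumptions made for illustration (MODDE's standard order may list the runs differently).

```python
from itertools import product

# Factor ranges (low, high) taken from Table 1.
FACTORS = {
    "A": (1.5, 3.5),      # Wheat Bran, % (w/v)
    "B": (0.4, 0.8),      # Peptone, % (w/v)
    "C": (150.0, 250.0),  # Agitation, rpm
    "D": (36.0, 72.0),    # Incubation time, hours
    "E": (0.4, 0.8),      # Beef Extract, % (w/v)
}
N_CENTRE = 4  # centre points requested in the design dialog


def full_factorial_with_centre(factors, n_centre):
    """Return every low/high combination, followed by the centre points."""
    names = list(factors)
    runs = [dict(zip(names, levels))
            for levels in product(*(factors[n] for n in names))]
    centre = {n: (lo + hi) / 2 for n, (lo, hi) in factors.items()}
    runs.extend(dict(centre) for _ in range(n_centre))
    return runs


design = full_factorial_with_centre(FACTORS, N_CENTRE)
print(len(design))   # 2**5 corner runs plus 4 centre points
print(design[0])     # first corner run (all factors at their low level)
print(design[-1])    # a centre point (all factors at mid-range)
```

The two-level full factorial in five factors contributes 32 runs, so the worksheet has 36 experiments once the 4 centre points are added.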

MODDE will create an experimental worksheet similar to Figure 7. This table shows the experimental conditions sorted by default in Exp No order (also known as standard order). Experiments are typically run in the randomised run order; to sort the worksheet into run order, select the column and right click to select the sort order. For this example, make sure that the worksheet is sorted according to Exp No.

Figure 7
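
The difference between Exp No order and run order can be illustrated with a short sketch. It assumes a 36-run worksheet as in this example; the function name and the seed are hypothetical, and the random order produced here will not match the one MODDE generates.

```python
import random

N_RUNS = 36  # 32 factorial runs + 4 centre points in this example


def randomised_run_order(n_runs, seed=None):
    """Return a list where entry i is the run position of experiment i + 1."""
    rng = random.Random(seed)
    order = list(range(1, n_runs + 1))
    rng.shuffle(order)
    return order


run_order = randomised_run_order(N_RUNS, seed=2016)

# Each worksheet row is (Exp No, Run Order).
worksheet = list(zip(range(1, N_RUNS + 1), run_order))

by_exp_no = sorted(worksheet, key=lambda row: row[0])     # standard order
by_run_order = sorted(worksheet, key=lambda row: row[1])  # order to run in the lab
```

Results must be pasted against the correct Exp No, which is why the worksheet is kept in Exp No order before the response data is copied in.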

Open the “KHURANA Xylanase data.xls” file and go to the FF design worksheet. Select and copy the Xylanase activity results from the coloured area and paste them into the Xylanase column of the MODDE worksheet.

Evaluation of Raw data

In the Home menu, fit the model (and ignore any errors). Then select the analysis wizard.

The analysis wizard dialog will guide you through diagnostic steps to check the fitted model. The first dialog shows the replicate analysis for the Xylanase response (Figure 8). If a design has more than one response, the replicate plot for each response can be selected from the response drop down menu. For this design, the only replicates are the 4 center points, which are shown as blue points in the replicate plot. The bar charts on the right show R2 and Q2. R2 is the fraction of the response variation explained by the model, while Q2 is the fraction of the response variation predicted by the model in cross-validation. As a rule of thumb, R2 should be > 0.85 and Q2 should be within 0.2 of R2. The R2 and Q2 values shown (0.5 and 0.374) are too low for an acceptable model, suggesting that there is a problem with the design or that there are outlier points.

Figure 8.
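
The relationship between R2 and Q2 can be sketched with a one-factor straight-line model and leave-one-out cross-validation. This is an illustration only: the data values are made up, the function names are hypothetical, and MODDE's own cross-validation settings may differ. Q2 is computed here as 1 - PRESS/SS, where PRESS is the sum of squared prediction errors for points left out of the fit.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r2_and_q2(xs, ys):
    """R2 from the full fit, Q2 from leave-one-out predictions."""
    n = len(xs)
    my = sum(ys) / n
    ss_total = sum((y - my) ** 2 for y in ys)

    b0, b1 = fit_line(xs, ys)
    ss_resid = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

    press = 0.0
    for i in range(n):
        # Refit without point i, then predict the point that was left out.
        c0, c1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (c0 + c1 * xs[i])) ** 2

    return 1 - ss_resid / ss_total, 1 - press / ss_total


# Made-up data with strong curvature: high in the centre, low at both ends.
xs = [-1.0, -1.0, 0.0, 0.0, 1.0, 1.0]
ys = [10.0, 12.0, 30.0, 31.0, 14.0, 15.0]
r2, q2 = r2_and_q2(xs, ys)
print(round(r2, 3), round(q2, 3))
```

A straight line cannot follow curved data, so both values come out low, which is the same symptom seen for the factorial model in Figure 8.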

Click next to move to the histogram diagnostic screen. The histogram shows the distribution of response results. The results look skewed, which can be corrected by transforming the Xylanase response data. For simplicity, try to avoid applying a transformation unless absolutely necessary. For this example, no transformation will be applied to the response data.

Figure 9

Click next to move to the Fit summary diagnostic. The missing model validity measure indicates that there is a problem with the model such as outliers, an incorrect model or a transformation problem. Click next to move to the coefficient plot.

Figure 10

The coefficients plot (Figure 11) shows the coefficient for each term in the model and error bars to indicate whether coefficients are significant. Error bars for non-significant terms cross the zero line. In this plot, only the A term (Wheat bran) is significant. In addition, potential model problems were detected. A square test has detected significant curvature in the dataset. Press “Edit model” to add squared terms to the model.

Figure 11.

In the edit model dialog (Figure 12), select all the factors in the factor column and press the “SQUARES=>” button to add squared terms to the model.

Figure 12.

You will get an error dialog (Figure 13) and will not be able to add squared terms. This is because the dataset is “ill conditioned” and there is not enough data to accurately fit the curvature. Additional experimental runs will need to be carried out and added to the dataset to resolve the curvature and create a reliable model.

Figure 13.
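
One way to see why the squared terms cannot be fitted is to look at the design in coded units (-1 for low, +1 for high, 0 for centre). The sketch below is an illustration of the general principle, not a reproduction of MODDE's condition number check: in a two-level factorial with centre points, every squared column takes exactly the same values, so the individual squared effects cannot be separated.

```python
from itertools import product

N_FACTORS = 5
N_CENTRE = 4

# Coded units: -1 = low, +1 = high, 0 = centre.
corner_runs = [list(levels) for levels in product((-1, 1), repeat=N_FACTORS)]
centre_runs = [[0] * N_FACTORS for _ in range(N_CENTRE)]
design = corner_runs + centre_runs

# One column of squared values per factor (A*A, B*B, C*C, D*D, E*E).
square_columns = [[run[j] ** 2 for run in design] for j in range(N_FACTORS)]

# Every squared column is 1 at the corners and 0 at the centre points.
all_identical = all(col == square_columns[0] for col in square_columns)
print(all_identical)  # True
```

The centre points reveal that curvature exists, but not which factor causes it. Runs at additional levels, such as star points, are needed to tell the squared terms apart.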

Complementing the design

Save the current design file. To add experiments to the design, select New in the File menu and select Complement design (Figure 14).

Figure 14

In the complement design wizard dialog (Figure 15), select “Estimate square terms in a screening design”. This option will allow you to select the additional experiments that you would like to carry out. Click next.

Figure 15

As you want to estimate the squared terms for all the factors, select all the factors in the list (Figure 16). Typical star distance values for face centred, central composite and orthogonal designs are suggested in the dialog. For this dataset, enter 2.38 into the star distance box. Click next.

Figure 16
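
The star distance of 2.38 coincides with the value that makes a central composite design rotatable when it is built on a full factorial with 32 corner runs, namely the fourth root of the number of corner runs. The sketch below checks this and converts the star distance into factor units. It assumes that the star distance is expressed in coded units (half of the factor range); the function name is hypothetical.

```python
N_FACTORIAL_RUNS = 2 ** 5  # corner runs of the five-factor full factorial

# Rotatable central composite design: alpha = (corner runs) ** (1/4).
alpha = N_FACTORIAL_RUNS ** 0.25

# Factor ranges (low, high) taken from Table 1.
FACTORS = {
    "Wheat Bran": (1.5, 3.5),
    "Peptone": (0.4, 0.8),
    "Agitation": (150.0, 250.0),
    "Incubation time": (36.0, 72.0),
    "Beef Extract": (0.4, 0.8),
}


def star_levels(low, high, alpha):
    """Low and high star point settings in the factor's own units."""
    centre = (low + high) / 2
    half_range = (high - low) / 2
    return centre - alpha * half_range, centre + alpha * half_range


print(round(alpha, 2))
for name, (low, high) in FACTORS.items():
    lo_star, hi_star = star_levels(low, high, alpha)
    print(f"{name}: {lo_star:.2f} to {hi_star:.2f}")
```

Under these assumptions the Beef Extract star points round to 0.1 and 1.1 % (w/v), well outside the factorial range of 0.4 to 0.8, so it is worth checking that the star settings are practical before running them.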

The dialog box in Figure 17 gives the option to change the name and location of the new design file. By default, the new complemented design will be saved in the same folder as the original design with “- Complemented” added to the file name. Four additional centre points are also added to the design to estimate any effect of the two blocks in the design (the factorial design block and the complemented design block).

Figure 17

After this step, a dialog will ask whether you want to include a block variable in the new worksheet. As this new block of experiments is independent of the first factorial design block, the effect of the new experimental block needs to be estimated, so select Yes to include the block variable. Additional experiments C37 to C50 will be added to the worksheet. Experiments C37 to C46 are the star points, while experiments C47 to C50 are the additional center points.

Figure 18
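
The structure of the complemented worksheet can be sketched in coded units as follows. This is an illustration of how the 50 runs break down, not MODDE's worksheet format: the block labels 1 and 2 and the function name are assumptions, and the order of the star points may differ from the one MODDE uses.

```python
from itertools import product

N_FACTORS = 5
ALPHA = 2.38  # star distance entered in the complement design dialog


def complemented_design(n_factors, alpha, centre_per_block=4):
    """Return rows of (block, coded levels) for both blocks of the design."""
    rows = []
    # Block 1: two-level full factorial plus its centre points.
    for levels in product((-1.0, 1.0), repeat=n_factors):
        rows.append((1, list(levels)))
    rows.extend((1, [0.0] * n_factors) for _ in range(centre_per_block))
    # Block 2: one low and one high star point per factor, then centre points.
    for j in range(n_factors):
        for sign in (-1.0, 1.0):
            star = [0.0] * n_factors
            star[j] = sign * alpha
            rows.append((2, star))
    rows.extend((2, [0.0] * n_factors) for _ in range(centre_per_block))
    return rows


rows = complemented_design(N_FACTORS, ALPHA)
star_rows = rows[36:46]    # experiments C37 to C46
centre_rows = rows[46:50]  # experiments C47 to C50
print(len(rows), len(star_rows), len(centre_rows))
```

Each star point moves one factor to plus or minus the star distance while holding the other four at their centre values, which is what allows each squared term to be estimated separately.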

Switch to the Excel workbook “KHURANA Xylanase data.xls” and select the worksheet “Complemented design results”. Select the Xylanase response results indicated by the green area. Copy and paste all the results into the first cell of the Xylanase response column.

Analysis of complemented data set

You now need to analyse the new complemented data set. In the Home menu, select the analysis wizard and the replicates analysis will be displayed (Figure 19). Replication of the centre points for each block was good, as shown by the blue markers. The R2 and Q2 values are much improved (R2 = 0.995 and Q2 = 0.986) when compared to the original design analysis (Figure 8).

Figure 19

Click next to move to the histogram response plot (Figure 20). Two peaks are visible in the histogram which are a result of the strong curvature. Since this is captured by the model (good R2 and Q2) no Xylanase response transformation is required.

Figure 20

Click next to move to the summary of fit plot (Figure 21). The R2 and Q2 fit values for the model are much improved when compared to the original factorial model. The data reproducibility is very good. However, there is no estimate of model validity which will be investigated further by examining the ANOVA table later.

Figure 21

Click next to move to the coefficients plot. This plot shows the relative effects of each of the model parameters. All the parameters except the interaction parameter C*E are significant, including a small block effect ($Bl). Terms A and C (Wheat Bran and Agitation) have strong positive effects, while term D (Incubation time) has a negative effect on Xylanase production. There is a strong interaction between terms A and D (A*D), and all the squared terms added to the model (A*A, B*B, C*C, D*D and E*E) are significant. Insignificant model terms such as C*E can be removed at this stage by editing the model.

Figure 22

Click next to move to the residual plot. This plot shows a cumulative normal probability plot of the model residuals and is useful for identifying outlier points. Experiment 44 is eight standard deviations from the mean and is well outside the 4 SD limits. The results for experiment 44 should be checked to ensure that the assay calculations were correct and that no other error occurred in experimental execution. The root cause of this outlier should also be identified and noted as a reason for excluding the experimental point. Double click on the point to exclude experiment 44 from the model. The model will be recalculated and an updated probability plot displayed (Figure 24).

Figure 23
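
The idea behind the 4 SD limits can be sketched by scaling each residual by the residual standard deviation and flagging anything beyond the limit. This is a simplified illustration with made-up numbers: MODDE's plot may use a different residual definition (for example deleted studentized residuals), and the function names are hypothetical.

```python
import math


def standardised_residuals(observed, predicted, n_model_terms):
    """Residuals divided by the residual standard deviation of the fit."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    dof = len(residuals) - n_model_terms
    rsd = math.sqrt(sum(r * r for r in residuals) / dof)
    return [r / rsd for r in residuals]


def flag_outliers(std_resid, limit=4.0):
    """Return the 1-based experiment numbers outside +/- limit."""
    return [i + 1 for i, r in enumerate(std_resid) if abs(r) > limit]


# Made-up example: 30 runs scattered by +/- 1 unit, with run 10 far off.
predicted = [100.0] * 30
observed = [100.0 + (1.0 if i % 2 == 0 else -1.0) for i in range(30)]
observed[9] = 120.0

std_resid = standardised_residuals(observed, predicted, n_model_terms=2)
flagged = flag_outliers(std_resid)
print(flagged)
```

A flagged point is only a candidate for exclusion; as noted above, the experiment records should be checked and a root cause recorded before the point is removed.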

The new probability plot shows some minor deviations outside the 4 SD limits, which can be ignored. Click next to move to the observed vs predicted plot.

Figure 24

The observed vs predicted plot shows the Xylanase predictions from the model compared with the observed experimental values in a parity plot. The parity line shows how close the predictions are to the observed values. As all the points sit on the parity line, the model predicts the experimental results well.

Figure 25

For an additional overview of the model, select ANOVA table from the Analyse menu. The ANOVA table (Figure 26) shows that the model diagnostic parameters are very good: R2, R2(ADJ) and Q2 are all > 0.95. As all the centre point experiments yielded the same xylanase response value, the replicate error and lack of fit could not be estimated, which explains why the model validity is missing. This excellent replication of center points over two independent experimental blocks is highly unusual!

Figure 26
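
The reason the lack of fit cannot be estimated can be sketched with the standard lack of fit F-ratio, which divides the lack of fit mean square by the pure error (replicate) mean square. The sketch below uses made-up response values and hypothetical function names, and it is not MODDE's model validity calculation; it only shows that identical replicates give a pure error of zero, leaving the ratio undefined.

```python
def pure_error_ss(replicate_groups):
    """Sum of squared deviations of replicates from their own group mean."""
    total = 0.0
    for group in replicate_groups:
        mean = sum(group) / len(group)
        total += sum((y - mean) ** 2 for y in group)
    return total


def lack_of_fit_f(ss_residual, df_residual, replicate_groups):
    """Lack of fit F-ratio, or None when it cannot be estimated."""
    ss_pe = pure_error_ss(replicate_groups)
    df_pe = sum(len(group) - 1 for group in replicate_groups)
    ss_lof = ss_residual - ss_pe
    df_lof = df_residual - df_pe
    if ss_pe == 0.0 or df_pe == 0 or df_lof <= 0:
        return None  # no replicate error to compare against
    return (ss_lof / df_lof) / (ss_pe / df_pe)


# Made-up centre point responses: identical replicates vs scattered ones.
identical = [[1200.0] * 8]
scattered = [[10.0, 12.0, 11.0, 13.0]]

print(lack_of_fit_f(10.0, 20, identical))  # None
print(lack_of_fit_f(15.0, 8, scattered))
```

With scattered replicates the ratio can be calculated; with identical replicates there is nothing in the denominator, which is the situation described for this dataset.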

The model can be investigated in more detail by viewing the list of coefficient values, available from the Analyse>Coefficients menu (Figure 27).

Figure 27

The table of coefficient values, errors and probabilities can be reviewed in Figure 28. As all of the probabilities are less than 0.05 (95% confidence level), all the model factors, interactions and squared terms are significant. The block effect ($B & $Block) also appears to have a small but significant effect.

Figure 28

A similar effects plot, available from the Analyze menu, shows the relative effects of each factor on the Xylanase concentration. The effects are plotted in order of size (Figure 29). Main effect factors Wheat Bran (A) and Agitation (C) have strong positive effects, while incubation time (D) has a strong negative effect on Xylanase production. Beef extract (E) has a small positive effect, while Peptone (B) has a smaller effect, similar to the small block effect. It could be argued that peptone could be reduced further or removed from the growth medium to reduce costs.

Figure 29

All the squared terms have significant negative effects. Incubation time (D*D and D) has a strong negative effect, suggesting that Xylanase degrades as the culture incubation time increases, possibly as a result of protease action. There is also a strong interaction between Wheat bran and incubation time (A*D), which can be investigated further by plotting an interaction plot (Figure 30) from the Analyze menu. Right click on the plot and change the properties of the interaction plot in the drop down menu to “A*D”.

Figure 30

The curves show that there is a strong non-linear effect of Wheat Bran concentration on Xylanase concentration, which also interacts with Incubation time. For low Wheat Bran concentration, high incubation time results in increased Xylanase concentration. For high Wheat Bran concentration, Xylanase concentration is much greater at lower incubation times.

Visualization & Prediction

MODDE offers a wide range of visualization options to view the model. One of the most useful is the 4D Contour plot available from the contour tab in the Home menu.

Figure 31

The contour figure shows how Xylanase concentration is affected by four factors: Wheat bran (%) on the lower axis, Peptone (%) on the left axis, Agitation (rpm) on the top axis and Incubation time (hours) on the right axis. The remaining parameters can be adjusted using the properties window on the right. The plot clearly shows how increased Xylanase concentrations can be achieved using a combination of high wheat bran concentrations, high agitation and low incubation times. Optimum conditions for Xylanase production can be identified using the Optimizer tool in the Home menu.

Figure 32

In the optimizer window (Figure 33), the criterion to optimize each response can be set to minimize, maximize, target, predicted or excluded, and multiple responses can be optimized together. In the lower window, each factor can be modified freely or kept constant. In addition, the factor range limits can be changed. For this example, set the criterion to maximize Xylanase.

Figure 33

Run the optimizer and switch to the setpoint tab (Figure 34), which shows the optimum setpoint selected based on the lowest DPMO (defects per million opportunities). The setpoint tab also gives an option to find the most robust setpoint. This is shown in the alternative setpoints list as R.

Figure 34

Validation

A prediction model is only as good as the predictions that it makes. Best practice for DOE based models and experiments is to verify the predictions by carrying out additional experiment(s) at the optimized setpoint(s). It is also good practice to verify a number of different predicted setpoints to evaluate the stability of the predictions in different parts of the model. In this example, the authors published a set of conditions with which they validated their model. These conditions are listed in the Excel workbook KHURANA Xylanase data.xls in the worksheet “model validation”. Select the validation conditions from the Excel sheet area marked in green. In the MODDE Predict menu, open the Prediction worksheet, paste the validation conditions into the worksheet and set all $Block cells to 0.0. MODDE will predict Xylanase for each of the validation conditions.

Figure 35

To compare these to the experimentally obtained values, select the Xylanase predictions and copy them. Switch to the Excel spreadsheet and paste the predictions into the blue area. Excel will update a graph to compare the actual experimental values obtained with the predicted results. The graph should look like Figure 36. The data in Figure 36 has been fitted with a best fit regression line and the equation for the line is shown in the top left hand corner. While the R2 value shows that the fit is good (0.92), the fitted line does not match the ideal line exactly. An ideal validation should have a slope close to 1 and an intercept close to 0.0. In this case the slope is 0.889 and the intercept is 118.7.

Figure 36. Model Validation

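Table 2 shows more detail on the regression. The slope (0.889) lies about 1.2 standard errors (0.091) below the ideal value of 1, so with 8 degrees of freedom it is not significantly different from 1 at the 95% confidence level. Likewise, the intercept (118.7) is smaller than its standard error (126.9), so it is not significantly different from the ideal value of 0. The large standard errors mean that the validation data constrain the slope and intercept only loosely.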

Slope            ->  0.888886    118.7253   <- Intercept
Std Err on Slope ->  0.090587    126.8788   <- Std Err on Intercept
R2               ->  0.923287    33.28819   <- Std Err on Y estimate
F-value          ->  96.28527    8          <- Degrees of freedom
ssreg            ->  106694      8864.827   <- ssresid

Table 2. Detailed regression analysis of the validation plot.
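
The entries in Table 2 can be cross-checked with a few lines of arithmetic. The sketch below recomputes R2, the F-value and the standard error of the Y estimate from the sums of squares, and compares the slope and intercept with their ideal values using a two-sided t-test. The choice of a t-test at the 95% level is an assumption made here for illustration; 2.306 is the corresponding critical value of Student's t for 8 degrees of freedom.

```python
import math

# Values copied from Table 2.
SLOPE, INTERCEPT = 0.888886, 118.7253
SE_SLOPE, SE_INTERCEPT = 0.090587, 126.8788
SS_REG, SS_RESID, DF = 106694.0, 8864.827, 8

T_CRIT_95_DF8 = 2.306  # two-sided 95% critical value, 8 degrees of freedom

# Distance of each estimate from its ideal value, in standard errors.
t_slope = (SLOPE - 1.0) / SE_SLOPE
t_intercept = (INTERCEPT - 0.0) / SE_INTERCEPT

# Quantities that follow from the sums of squares.
r2 = SS_REG / (SS_REG + SS_RESID)
f_value = SS_REG / (SS_RESID / DF)
se_y = math.sqrt(SS_RESID / DF)

print(f"t (slope vs 1)     = {t_slope:.2f}")
print(f"t (intercept vs 0) = {t_intercept:.2f}")
print(f"R2 = {r2:.4f}, F = {f_value:.2f}, SE(Y) = {se_y:.2f}")
print("slope differs from 1:    ", abs(t_slope) > T_CRIT_95_DF8)
print("intercept differs from 0:", abs(t_intercept) > T_CRIT_95_DF8)
```

The recomputed R2, F-value and standard error agree with the table, and under the stated assumptions both t values fall inside the critical value.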

[Figure 36 plot: Predicted Xylanase Activity (vertical axis, 1000 to 1600) against Actual Xylanase activity (horizontal axis, 1000 to 1600), with the fitted line y = 0.8889x + 118.73, R² = 0.9233]

Summary

This example has shown how an initial experimental design can be augmented with additional experiments to better model significant curvature. Models can be quickly assessed using the analysis wizard, and more detail on the resulting models can be obtained from the Analyse menu. Like other DOE software, MODDE has extensive model prediction capabilities. If you would like to discuss this example or other examples, please contact [email protected]

Notes on issues with paper

Some issues were identified with this paper while preparing this example. The CCC design stated in the paper does not match the default designs in MODDE. The set of 50 experiments is made up of 32 design points, 10 star (alpha) points and 8 centre points. There is an error in Table 1 of the paper at standard order 40, 41 and 42: Beef extract should be 0.6, 0.1 and 1.1. The alpha factor implied in Table 1 of the paper (alpha = 2.0) does not match the alpha points in Table 2 of the paper. It is likely that the original design was a factorial design that was augmented to a CCC design with a fold over using alpha = 2.38. The authors created a 2^5 factorial central composite design with 8 replicate centre points using Design Expert 6, leading to a set of 50 experiments (32