Detection of phase shifts in batch fermentation via statistical analysis of the online measurements: A case study with rifamycin B fermentation

Xuan-Tien Doan a, Rajagopalan Srinivasan a,b,∗, Prashant M. Bapat c, Pramod P. Wangikar c,∗∗

a Institute of Chemical and Engineering Sciences, 1 Pesek Road, Jurong Island, Singapore 627833
b Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117576
c Department of Chemical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India

Abstract

Industrial production of antibiotics, biopharmaceuticals and enzymes is typically carried out via a batch or fed-batch fermentation process. These processes go through various phases based on sequential substrate uptake, growth and product formation, which require monitoring due to the potential batch-to-batch variability. The phase shifts can be identified directly by measuring the concentrations of substrates and products or by morphological examinations under microscope. However, such measurements are cumbersome to obtain. We present a method to identify phase transitions in batch fermentation using readily available online measurements. Our approach is based on Dynamic Principal Component Analysis (DPCA), a multivariate statistical approach that can model the dynamics of non-stationary processes. Phase transitions in fermentation produce distinct patterns in the DPCA scores, which can be identified as singular points. We illustrate the application of the method to detect transitions such as the onset of exponential growth phase, substrate exhaustion and substrate switching for rifamycin B fermentation batches. Further, we analyze the loading vectors of the DPCA model to illustrate the mechanism by which the statistical model accounts for process dynamics. The approach can be readily applied to other industrially important processes and may have implications in online monitoring of fermentation batches in a production facility.

Keywords: Amycolatopsis mediterranei; Complex medium; Multivariate statistical analysis

Abbreviations: Glc, glucose; DSF, defatted soybean flour; CSL, corn steep liquor; AMS, ammonium sulfate; DPCA, dynamic principal component analysis; SP, singular point; PC, principal component; PLS, partial least square

1. Introduction

Fermentation processes have innumerable applications in food, agrochemical and pharmaceutical industries. For safety and health reasons, fermentation products are subjected to stringent regulatory standards (de Noronha Pissarra, 2004). Further, the cost-competitive nature of such products demands an optimal operation of the process (Nielsen, 1998; Nissen et al., 2000; Olsson et al., 1998; Vara et al., 2002). Therefore, fermentation process supervision is of particular importance to ensure consistent operation and thereby achieve high quality products. Industrial fermentation is typically carried out in batch or fed-batch mode to overcome the limitations of carbon and nitrogen catabolite repression (Bapat et al., 2006b). The key challenges in the monitoring of fermentation processes are batch-to-batch variation and complex dynamics. The batch-to-batch variation may result from the variation in the raw material quality or the variations in the seed culture. The variables that are desired to be monitored and controlled may include the biomass or product concentration(s). These variables are typically available only via offline measurements. Online measurements that are readily available include pH, temperature, agitation speed, dissolved oxygen, and exhaust CO2 and O2. However, these measurements do not give direct information on the state of the process (Vaidyanathan et al., 1999).



Industrial fermentations typically use a multi-substrate complex medium, which may result in sequential and/or simultaneous utilization of the available substrates (Bapat et al., 2006a; Bapat and Wangikar, 2004). The metabolism in each phase is different and hence deserves its own consideration in terms of modeling and supervisory control strategy (Konstantinov and Yoshida, 1989; Muthuswamy and Srinivasan, 2003). In addition, it is desirable to minimize offline sampling and the concomitant risk of contamination, yet obtain sufficient information on nutrient uptake and product formation in real time. As a result, online identification of phases and phase shifts in complex media is of critical importance.

The recently published methods for the identification of phase shifts via online measurements have required the quantitative evaluation of key components such as the biomass concentration. Consequently, these methods require advanced sensors such as infrared/mass spectrometers (Feng and Glassey, 2000; Grube et al., 2002), electronic nose (Bachinger and Mandenius, 2001; Pinheiro et al., 2002), or calorimetric sensors (Voisard et al., 2002). In addition, these methods suffer from the disadvantage that extensive time and experience are often required to implement them. Further, the low signal-to-noise ratio (Schugerl, 2001) and specific requirements of aseptic conditions (Clementschitsch and Bayer, 2006) of such sensors have limited their application in large-scale industrial processes. Another class of methods has focused on utilizing the routinely available online data for qualitatively identifying fermentation phases. Qualitative trend analysis and expert systems are the two most common methods belonging to this class. A formal framework for deducing process trends from the online process variables was developed (Cheung and Stephanopoulos, 1990) and applied to fermentation data (Stephanopoulos et al., 1997). Alternatively, Srinivasan et al. (2004) proposed a clustering approach using a similarity factor derived from dynamic principal component analysis for process state identification. The approach relied on identifying the steady states to locate and subsequently segment historical data into different process phases. Consequently, it is not readily applicable in batch fermentation processes, where steady states do not normally exist. An expert system uses process knowledge gathered from experts such as biochemical engineers, biochemists, and microbiologists and coded in the form of "if–then" rules. These rules may be crisp or based on fuzzy logic (Kamimura et al., 1996). However, the limitation of the expert system technique is that it is system-specific and difficult to customize for different fermentation processes (Venkatasubramanian et al., 2003).

Here, we present a method for the detection of phase shifts in batch fermentation via dynamic principal component analysis (DPCA) of the online measurements. We illustrate the application of the method for rifamycin B fermentation. Rifamycin B is a polyketide antibiotic from the ansamycin family with a pronounced anti-mycobacterial activity and is extensively used in clinical treatment of tuberculosis, leprosy and AIDS-related mycobacterial infections (Sepkowitz et al., 1995). Further, we analyze the DPCA model in terms of the loading vectors in an attempt to understand the mechanism by which the DPCA model uses the process history.


2. Materials and methods

2.1. Experimental methods

2.1.1. Strain and fermentation medium

Prof. Heinz Floss (Washington University, USA) kindly donated the rifamycin B overproducing strain of Amycolatopsis mediterranei S699 that does not require barbital (Yu et al., 2001). The preculture was propagated as described by (Kim et al., 1996). One hundred and fifty milliliters of preculture (10%, v/v) was used to inoculate the bioreactor. The media contained (per liter of distilled water) glucose, 80 g; potassium phosphate, 1 g; magnesium sulphate, 1 g; ferrous sulfate, 1 g; zinc sulfate, 0.010 g; cobalt chloride, 0.0030 g. In addition, the medium contained one or more of the following: ammonium sulfate, 4 g; potassium nitrate, 5.1 g; defatted soybean flour (DSF), 8 g; corn steep liquor solids (CSL), 8 g. After adjusting the pH to 7.0 with 1 N sodium hydroxide, the fermentor was sterilized by autoclaving at 121 °C for 15 min.

2.1.2. Bioreactor and cultivation conditions

Batch cultivations were conducted in a 6.5-l BIOSTAT® B (BBI; B. Braun Biotech International, Schwarzenberger, Germany) bioreactor at a working volume of 1.50 l at 28 °C. The pH and the dissolved oxygen (pO2) were recorded by using an autoclavable pH-electrode and a polarographic pO2-electrode (INGOLD, USA), respectively. Agitator speed was used as a control variable to maintain dissolved oxygen at 40% via cascade control. A mass flow controller (BBI, Germany) was used to supply a constant airflow of 1.0 vvm (volume of air per minute per volume of media). The concentrations of CO2 and O2 in the exit gas stream from the bioreactor were measured by infrared spectroscopy and paramagnetic analysis, respectively (Analyzer BINOS1002M® with sample conditioning unit, Rosemount analytical, Germany).

2.1.3. Analytical techniques

Samples were drawn from the fermentation medium at regular intervals to analyze the dry cell weight and the concentrations of glucose, ammonium sulfate, free amino acids and rifamycin B as described previously (Bapat et al., 2006a). Glucose was analyzed via RI detector on HPLC (Hitachi, Merck KgaA, Darmstadt, Germany) using HP-Aminex-87-H column (Biorad, Hercules, CA, USA) at 60 °C. The concentration of free amino acids was estimated via the ninhydrin method (Moore, 1968). The concentrations of the ammonium and nitrate ions were determined by the respective ion specific electrodes (EA940 Ion analyzer, Thermo Orion, USA). Rifamycin B was detected on a spectrophotometer (V-540, Jasco, Tokyo, Japan) at a wavelength of 425 nm.

2.2. Data analysis methods

2.2.1. Principal component analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique, which is optimal in capturing the variance in the data. It determines a set of orthogonal vectors, called loading vectors, which are used to transform the original variables into a new set of variables, often referred to as principal components (PCs). The PCs are weighted, linear combinations of the original variables and, due to the orthogonality of the loading vectors, are uncorrelated with each other. The loading vectors are usually ordered by the amount of variance they explain; thus although the total variance in the new variables remains unchanged by the transformation, it is redistributed so that most of the variance is explained in the first PC (denoted as PC1); the next largest amount goes to the next (PC2), and so on. The development of the PCA model can be found in numerous literature including (Ralston et al., 2001; Russell et al., 2000).
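The properties described above (orthogonal loading vectors, uncorrelated PCs, variance ordered from PC1 downward) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function name `pca` and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def pca(X, n_components):
    """Return loadings, scores and explained-variance ratios for X (samples x variables)."""
    # Auto-scale: zero mean and unit variance for every variable
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Sample covariance matrix of the scaled data
    S = Z.T @ Z / (Z.shape[0] - 1)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # re-order by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    P = eigvecs[:, :n_components]              # loading vectors as columns
    scores = Z @ P                             # principal components
    explained = eigvals / eigvals.sum()
    return P, scores, explained[:n_components]

# Synthetic example (hypothetical data): two correlated variables and one independent one
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t + 0.1 * rng.normal(size=(200, 1)),
               -t + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
P, scores, explained = pca(X, 2)
```

In this sketch the first PC absorbs the variance shared by the two correlated variables, which is the redistribution of variance the paragraph refers to.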

2.2.2. Dynamic principal component analysis (DPCA)

The underlying assumption of classical PCA is that the measurements at one time instant are statistically independent of measurements at past time instants (Russell et al., 2000). However, for dynamic systems such as batch fermentation, this assumption is not valid and the value of a variable at an instant depends on past values. In other words, the state of such dynamic systems at time t needs to be represented by the observations over the time interval [t − D, t], rather than at time t alone.

In order to capture the process dynamics as well as the time-dependent relationships between variables, Ku et al. (1995) proposed the dynamic PCA (DPCA), where PCA is performed on a time-lagged version of the input data $X^0$, formed as follows:

$$
X_d^0 = \begin{pmatrix}
x(d+1)^T & x(d)^T & \cdots & x(1)^T \\
x(d+2)^T & x(d+1)^T & \cdots & x(2)^T \\
\vdots & \vdots & \ddots & \vdots \\
x(n)^T & x(n-1)^T & \cdots & x(n-d)^T
\end{pmatrix} \tag{1}
$$

where $x(t) = [x_{t,1}\; x_{t,2}\; \cdots\; x_{t,m}]^T$ is the m-dimensional measured variable vector at time t, and n is the number of sampling times. If online samples are collected every τ h, then specifying d is equivalent to specifying the duration D = d × τ. Note that the time-lagged input $X_d^0$ has dimensions $(n - d) \times (md + m)$, which usually represents data with very high dimensionality. Consequently, dimension reduction becomes necessary and this is where PCA can be applied.

After auto-scaling the time-lagged data $X_d^0$ to obtain $X_d$, the corresponding covariance matrix S can be obtained as

$$
S = \frac{X_d^T X_d}{n - d - 1} \tag{2}
$$

and eigenvalues and loading vectors (i.e. eigenvectors) calculated by eigen-decomposition of S. The principal components can be obtained from the data and the loading matrix P as shown below.

$$
\mathrm{PC}(t) = x_d(t)^T P = [\,x(t)^T \;\; x(t-1)^T \;\; \cdots \;\; x(t-d)^T\,]\, P \tag{3}
$$

where $x_d(t) = [\,x(t)^T \;\; x(t-1)^T \;\; \cdots \;\; x(t-d)^T\,]^T$ is the time-lagged vector of the current measurement x(t). In this work, we use plots of one PC versus another (e.g. PC1 versus PC2), called score plots, for depicting the process trajectory. Hypothesis testing can also be performed using a generalization of Student's t-statistic, called Hotelling's $T^2$ statistic, which is a scaled squared-norm of an observation vector from its mean.

$$
T^2(t) = x_d(t)^T P_a \Sigma_a^{-2} P_a^T x_d(t) \tag{4}
$$

where $P_a$ is the loading matrix containing a loading vectors and $\Sigma_a$ is a diagonal matrix containing the a corresponding singular values. Note that unless the $T^2$ statistic is used for confidence estimation, an assumption of a multivariate normal distribution of the process measurements is not required. Even without that assumption, changes in the process dynamics can be observed from a plot of the $T^2$ statistic versus time.
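Eqs. (1) to (4) can be sketched in a few lines of NumPy. The sketch below is illustrative rather than the authors' code: the function names `lagged_matrix` and `dpca`, the synthetic random-walk data and the parameter values are assumptions, and $\Sigma_a^2$ is taken to be the retained eigenvalues of S.

```python
import numpy as np

def lagged_matrix(X, d):
    """Build the time-lagged matrix of Eq. (1): row t is [x(t), x(t-1), ..., x(t-d)]."""
    n = X.shape[0]
    # Block k holds the measurements delayed by k samples
    return np.hstack([X[d - k : n - k] for k in range(d + 1)])

def dpca(X, d, a):
    """Return loadings, scores and Hotelling's T2 for lag d and a retained PCs."""
    Xd = lagged_matrix(X, d)
    Xd = (Xd - Xd.mean(axis=0)) / Xd.std(axis=0, ddof=1)   # auto-scaling
    S = Xd.T @ Xd / (Xd.shape[0] - 1)                       # Eq. (2); Xd has n - d rows
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:a]                   # a largest eigenvalues
    P, lam = eigvecs[:, order], eigvals[order]
    scores = Xd @ P                                         # Eq. (3), one row per time point
    # Eq. (4), assuming the squared singular values equal the eigenvalues of S
    T2 = np.sum(scores**2 / lam, axis=1)
    return P, scores, T2

# Synthetic stand-in for five online variables (hypothetical data, not from the paper)
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(300, 5)), axis=0)
P, scores, T2 = dpca(X, d=16, a=3)
```

Plotting one column of `scores` against another gives a score plot, and plotting `T2` against the time index gives the kind of profile in which changes in process dynamics would be sought.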

2.2.3. Singular points of a signal

It has been observed that the information content in a time-varying signal is not homogeneously distributed throughout (Srinivasan and Qian, 2005). Some landmarks, termed singular points (SPs), in the trajectory contain more information about the dynamic behavior than others. Examples of SPs include points of discontinuities, trend changes, and extrema. SPs can be used for annotating signals as well as signal comparison (Srinivasan and Qian, 2005). In this paper, SPs are used to detect phase shifts during rifamycin B fermentation experiments.
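One of the SP types listed above, extrema, can be located automatically in a sampled signal. The following is a minimal sketch under simplifying assumptions (noise-free signal, no flat plateaus); the function name `local_extrema` is hypothetical and the example values are made up.

```python
import numpy as np

def local_extrema(signal):
    """Return indices of interior local maxima and minima of a 1-D signal."""
    y = np.asarray(signal, dtype=float)
    slope_sign = np.sign(np.diff(y))
    # An extremum lies where the slope changes sign between consecutive intervals
    change = slope_sign[:-1] * slope_sign[1:] < 0
    return np.where(change)[0] + 1

y = np.array([0.0, 1.0, 3.0, 2.0, 1.0, 2.0, 4.0])
idx = local_extrema(y)   # picks the peak (3.0) and the trough (1.0)
```

Measured signals are noisy, so in practice some smoothing or a minimum-prominence rule would probably be needed before applying such a test.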

3. Results

3.1. Description of the case studies and measured variables

Here we present DPCA analysis for several case studies of batch runs involving rifamycin B production via Amycolatopsis mediterranei fermentation. The case studies differ in their initial medium composition and in turn in the profiles of substrate utilization and production of biomass and rifamycin B. This reflects in the profiles of the measured variables, both offline and online. The case studies include (i) defined medium containing glucose as the sole carbon substrate and AMS as the sole nitrogen substrate; (ii) defined medium containing glucose as the sole carbon substrate and AMS and KNO3 as nitrogen substrates; (iii) complex medium containing glucose as a carbon substrate and DSF–CSL as a carbon and nitrogen substrate; (iv) complex medium containing glucose, AMS and DSF–CSL.

A total of five process variables were measured online and recorded every 5 min for each of the case studies. These variables, which are typically observed in any fermentation process, include vent CO2 (%), vent O2 (%), pH, dissolved O2 (%) and stirring rate (rpm). Although this work treats the online measurements as random variables in the development of the DPCA technique, note that the measurements indicate some physical phenomenon related to the fermentation process. For example, the values of vent CO2 (%) and vent O2 (%) indicate the instantaneous rates of CO2 production and O2 consumption in the fermentor, respectively. Similarly, the trends of pH values may be qualitative indicators of phenomena such as organic acid production or consumption. The dissolved O2 concentration is maintained at 40% of saturation and hence its value typically varies between 40 and 100% of saturation. The stirring rate is a control variable, which is varied between 120 and 1200 rpm in order to maintain dissolved O2 at 40%. A higher oxygen demand by the fermentation culture usually leads to a higher stirring rate. Thus, the values of vent CO2, vent O2, pH, dissolved O2 and stirring rate together contain information on the overall state of the fermentation process. Note that the knowledge of the relationships between the measured variables and the physical phenomena is not used in the method presented here. Further, the offline variables such as concentrations of substrates, cell mass and product in the liquid medium are not used in the DPCA model; rather, the DPCA results are compared qualitatively with the profiles of the offline measurements.

3.2. Case study with defined medium

In this case study, the initial fermentation medium contained glucose and AMS. Based on the profile of the biomass, the fermentation appears to have progressed through three main phases: an initial lag phase of 20 h followed by an exponential growth phase until around 80 h and a slower growth phase from 80 h until the end of the batch (Fig. 1A). During the lag phase, the cells utilize free amino acids as a source of carbon and nitrogen (data not shown). Although free amino acids were not added to the initial medium, they get transferred along with the seed culture. During the second phase, the biomass growth and product formation were concomitant with the utilization of glucose and AMS. AMS was exhausted around 80 h.

Fig. 1. Growth profile of A. mediterranei S699 in the glucose–AMS case study. Initial media contained glucose 80 g/l and ammonium sulfate (AMS, 4 g/l). Additional micronutrients included potassium phosphate, 1 g/l; magnesium sulphate, 1 g/l; ferrous sulfate, 1 g/l; zinc sulfate, 0.010 g/l; cobalt chloride, 0.0030 g/l. (A) Offline concentration measurements of substrates and products; (B) online measurements.

The profiles of the online measurements are shown in Fig. 1B. Although certain trends may be observed in these profiles, there exists no single measurement from which all phases of the fermentation process can be inferred. To that end, a DPCA model was developed by using the profiles of the five online variables shown in Fig. 1B. The values of the model parameters D and τ were chosen based on the preliminary process knowledge (Bapat et al., 2006a). The microorganism used in this study has a doubling time of approximately 10 h. Thus, as an initial exercise, the DPCA model was evaluated for values of D in the range 2–20 h and values of τ in the range 0.2–1.0 h. The results were not satisfactory for values of D less than 5 h or greater than 12 h (data not shown). Moreover, the results did not vary substantially for τ values in the range of 0.2–0.5 h. Thus, for subsequent analysis, we decided to use the values of D and τ as 8 h and 0.5 h, respectively, resulting in d = 16. With the five online measurements and d = 16, each time point of the batch process was represented in an 85-dimensional space spanned by the time-lagged variables. The variables were normalized and then subjected to dimension reduction via PCA. The first three PCs explain 92% of the variance in the data. First we examined the score plots of PC1 versus PC2 (Fig. 2A) and PC2 versus PC3 (Fig. 2B).
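As a worked check of the dimensions quoted above, using only the values given in the text (D = 8 h, τ = 0.5 h, m = 5 online variables):

```latex
d = \frac{D}{\tau} = \frac{8\ \mathrm{h}}{0.5\ \mathrm{h}} = 16,
\qquad
m(d + 1) = 5 \times 17 = 85 .
```

Each time-lagged observation therefore stacks the current sample and its 16 predecessors for each of the five variables.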

Fig. 2. Dynamic PCA score plots for online measurements of the glucose–AMS case study. The DPCA model parameters D and τ were chosen as 8 h and 0.5 h, respectively. Singular points (SPs), where the direction of the trajectory changes by at least 90°, are marked based on visual examination. (A) Score plot: PC1 vs. PC2; (B) score plot: PC2 vs. PC3.

Fig. 3. Hotelling's $T^2$ statistic from DPCA study of the glucose–AMS case study. Dashed and solid lines represent the $T^2$ statistic with two and three PCs retained, respectively. SPs are marked separately for the $T^2$ statistic with two PCs (x) and with three PCs.

Note that the direction of the trajectory of the score plots changes at a finite number of locations. These have been marked as singular points (SPs) in the score plots (Fig. 2A and B). It is of interest to understand if the SPs correlate with the known phase shifts in the batch. For example, the score plots show up to five SPs. Of these, the SP at 17.5 h corresponds with the end of the lag phase as seen from the offline cell mass measurements (Fig. 1A). The SP at 80 h corresponds with exhaustion of AMS (Fig. 1A). In addition, the SPs at times between 17.5 and 80 h may correspond to substrate acclimatization, morphological changes, etc.

Fig. 3 shows Hotelling's $T^2$ statistic evaluated using two PCs and three PCs. The SPs for the $T^2$ statistics, which are local maxima and minima, have been marked. The time points where SPs occur in these plots are approximately the same as those in the score plots (Fig. 2A and B). Note that the score plots and $T^2$ statistic are more sensitive when three PCs are used. In this case, the number of SPs is greater than the number of known phase shifts in the fermentation. The additional SPs may correspond to morphological/physiological changes in the organism. For this reason, three PCs have been retained for other case studies in this work.
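The SPs in the score plots were marked by visual examination, using a change in trajectory direction of at least 90° as the criterion. One way such a criterion might be automated is sketched below; this is an assumption-laden illustration, not the procedure used in the study, and the function name `turning_points` and the example path are hypothetical.

```python
import numpy as np

def turning_points(scores, min_angle_deg=90.0):
    """Indices where consecutive steps of a trajectory turn by at least min_angle_deg."""
    steps = np.diff(np.asarray(scores, dtype=float), axis=0)
    a, b = steps[:-1], steps[1:]
    dots = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Treat zero-length steps as "no turn" to avoid dividing by zero
    cos_angle = np.divide(dots, norms, out=np.ones_like(dots), where=norms > 0)
    # A turn of at least theta is equivalent to cos(angle) <= cos(theta)
    cos_limit = np.cos(np.radians(min_angle_deg))
    return np.where(cos_angle <= cos_limit)[0] + 1

# Hypothetical (PC1, PC2) trajectory with two right-angle turns
path = np.array([[0, 0], [1, 0], [2, 0], [2, 1], [2, 2], [1, 2]])
sp_idx = turning_points(path)
```

With real score trajectories the step-to-step direction is noisy, so the scores would likely need smoothing or coarser sampling before a test of this kind gives results comparable to visual marking.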

3.3. Glucose–AMS–KNO3 case study

In this case study, the medium contained glucose as the C source, and AMS and KNO3 as substitutable N sources (Fig. 4). Of these, AMS is known to be the preferred N substrate. The utilization of KNO3 is more complex because the organism needs to convert KNO3 into ammonia before utilization. Offline data (Fig. 4B) show utilization of AMS from 12 to 60 h. Interestingly, the AMS concentration increases between 80 and 95 h. This may be due to the rapid synthesis of ammonia from KNO3. Subsequently, there appears to be a balance between ammonia synthesis and consumption rates.

DPCA analysis shows three SPs for this batch (Fig. 4C). The SP at 12 h signals the end of the initial lag phase and the start of the AMS uptake. It can also be observed that the second SP at 47 h divides the $T^2$ profile into two segments: a first half of slowly changing pace and a second half with fluctuations. This correlates well with the fact that AMS is utilized in the first half and nitrate in the second half. The third marked SP (at 112 h) does not correlate with any of the known phenomena observed from the offline data.

Fig. 4. Glucose–AMS–KNO3 case study. Initial media contained glucose (80 g/l), AMS (1.3 g/l), KNO3 (4.76 g/l) and other micronutrients as shown in legend to Fig. 1. (A) Online measurements; (B) offline concentration measurements; (C) $T^2$ statistic from DPCA study (retaining three PCs).

3.4. Glucose–DSF–CSL case study

In this batch, the initial fermentation medium contains glucose, defatted soybean flour (DSF) and corn steep liquor solids (CSL) (Fig. 5). DSF–CSL is primarily a mixture of proteins, peptides and amino acids. The organism is able to take up the free amino acids and small peptides directly while the larger peptides and proteins are first hydrolyzed in the extra-cellular medium before their uptake (Bapat et al., 2006b). The measurements for free amino acids (AA), glucose, and rifamycin B and a model-predicted profile for biomass concentration are shown in Fig. 5B (Bapat et al., 2006a).

The organism has a choice of either utilizing AA as the sole source of carbon and nitrogen or simultaneously utilizing glucose and AA. It is observed that the organism takes up AA as the sole substrate for the first 30 h (Fig. 5B). This may be because the organism has been adapted to an AA-rich medium during the seed culture. During the second phase, the organism starts utilizing glucose in addition to the AA. During this phase, the glucose concentration decreases but the AA concentration remains steady as the medium gets a steady supply of AA by hydrolysis of proteins available in the DSF and CSL.

DPCA analysis for this batch shows three SPs between 20 and 40 h (Fig. 5C). The first two SPs correspond to the transition from the AA phase to the glucose + AA phase, which occurred from 21 to 26 h. The third SP at 35 h may correspond with the depletion of free amino acids in the medium. From this point onward, the rate of production of amino acids via hydrolysis is equal to that of consumption so that the concentration of amino acids remains at a low but unchanged value. In addition, other SPs provide additional insights into the fermentation. For example, the last two SPs at approximately 98 and 105 h coincide with the choking of the vent filter. Thus, SPs that occur at unexpected time points may provide clues on the abnormal behaviour of the fermentor.

Fig. 5. Glucose–DSF–CSL case study. Initial media contained glucose (80 g/l), defatted soybean flour (DSF, 8 g/l), corn steep liquor solids (CSL, 8 g/l) and other micronutrients as shown in legend to Fig. 1. (A) Online measurements; (B) offline concentration measurements; (C) $T^2$ statistic from DPCA analysis of the online measurements (retaining three PCs).

.5. Glucose–AMS–DSF–CSL case study

In this batch, initial fermentation medium contains glucose,MS, and DSF–CSL (Fig. 6). Offline measurements for AMS,A, glucose, and rifamycin B and model-predicted profile ofiomass concentration are shown in Fig. 6B. The fermentationedium contains several carbon and nitrogen substrates. As a

esult, the major phases of substrate utilization in this batchnclude (1) utilization of AA as a sole carbon and nitrogen

ubstrate, (2) utilization of glucose and AA, and (3) utiliza-ion of glucose and AMS. The AA utilization phase is observedor the first 20 h (Fig. 6B). The figure also indicates that thelucose–AMS phase starts at around 35 h when both glucose
and AMS concentrations start to decrease significantly. Note that a substantial amount of the amino acids has been consumed by 20 h. This seems to indicate that AA has become the limiting substrate and that simultaneous utilization of AA and AMS is likely during the transition period (20–35 h). The DPCA analysis shows four SPs (Fig. 6C). The first SP at 20 h seems to signal the end of the AA utilization phase, and the last SP at 34.5 h marks the beginning of the glucose–AMS phase. The other two SPs (at 26 and 30.5 h) are believed to signal the adaptation to, or possibly the utilization of, some other substrate combination.

Fig. 6. Glucose–AMS–DSF–CSL case study. Initial media contained glucose (80 g/l), AMS (4 g/l), DSF (8 g/l), CSL (8 g/l) and other micronutrients as shown in the legend to Fig. 1. (A) Online measurements; (B) offline concentration measurements; (C) $T^2$ statistic from DPCA study (retaining three PCs).

3.6. DPCA and the process dynamics: Interpretation of the loading vectors

The dynamics of a process over an interval $[t_1, t_2]$ are usually analyzed by asking two questions: (i) have the measured variable values changed in the interval? (ii) have the trends of the measured variables changed in the interval? The first question can be answered by trivial methods. However, to answer the second question, one needs to look at the profiles of variable values in the neighborhood of $t_1$ and $t_2$. The techniques of qualitative trend analysis use the information on the first and second derivative to distinguish between different categories of trends (Cheung and Stephanopoulos, 1990). The DPCA technique presented here seems to be able to detect changes in the trends of the variables when the fermentation process enters a new phase. While the DPCA technique is based on a statistical data reduction method, it would be of interest to understand the mechanism by which the reduced dimensional data vector captures the important characteristics associated with the trends of the variables. Note that DPCA uses time-lagged input data over the interval "D" and then compresses it by using the loading vectors. Specifically, a loading vector may combine the original variables to obtain a simple arithmetic mean or a simple difference of the time lagged data. Such combinations of the data would be equivalent to the average value or the first derivative of the online measurements in the interval D. Alternatively, the loading vector may be based on some linear combination of the above two, which in turn would affect the sensitivity of a given PC to the perturbations in the measured variables. Further, the different PCs may have different levels of sensitivity to small changes in the online measurements. Below, we show mathematically a qualitative way of interpreting the loading vectors for this purpose.

The projection of a time lagged measurement vector to the principal component space is

$$PC = x_d P \qquad (5)$$

where $PC = [PC_1 \; PC_2 \; \cdots \; PC_a]$ is the score vector; $x_d$ is the time lagged measurement vector; $P = [p_1 \; p_2 \; \cdots \; p_a]$ is the loading matrix.

For the parameter settings in previous case studies, both the time lagged measurement $x_d$ and the loading vectors $p_i$ ($i = 1, \ldots, a$) contain $m(d + 1) = 85$ elements. This implies that there are $d + 1 = 17$ elements for each category of measurement, that is, $m = 5$ online variables. Hence, let us assume that the elements in $x_d$ and $p_i$ are arranged so that the first $d + 1$ elements correspond to variable 1 (say vent CO2); the next $d + 1$ elements correspond to variable 2 (vent O2), and so on. Thus,

$$PC_i = x_d \cdot p_i = \sum_{j=1}^{m(d+1)} x_j \, p_{i,j} = \sum_{v=1}^{m} \left( \sum_{j=1}^{d+1} x_j^{v} \cdot p_{i,j}^{v} \right) \qquad (6)$$
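As a rough illustration of Eqs. (5) and (6), the following Python sketch builds a time-lagged vector with the variable-by-variable layout assumed above and checks that a score equals the sum of its per-variable contributions. The function names, the random placeholder data and the lag ordering inside each block (current sample first) are assumptions made for this sketch, not part of the original study.

```python
import random


def lagged_vector(series, t, d):
    """Build x_d at sample index t: d + 1 lagged values per variable.

    `series` is a list of m equally long lists, one per online variable.
    Blocks are stacked variable by variable; within a block the current
    sample comes first and the oldest last (an assumed ordering).
    """
    if t < d:
        raise ValueError("need at least d samples before index t")
    x_d = []
    for values in series:
        x_d.extend(values[t - lag] for lag in range(d + 1))
    return x_d


def score(x_d, p_i):
    """PC_i = x_d . p_i, the full dot product on the left of Eq. (6)."""
    return sum(x * p for x, p in zip(x_d, p_i))


def score_contributions(x_d, p_i, m, d):
    """Per-variable inner sums on the right-hand side of Eq. (6)."""
    n = d + 1
    return [
        sum(x_d[v * n + j] * p_i[v * n + j] for j in range(n))
        for v in range(m)
    ]


# Placeholder data only: m = 5 variables, d = 16 lags, so m(d + 1) = 85.
m, d = 5, 16
rng = random.Random(0)
series = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(m)]
p_i = [rng.gauss(0.0, 1.0) for _ in range(m * (d + 1))]  # hypothetical loading vector

x_d = lagged_vector(series, t=30, d=d)
contributions = score_contributions(x_d, p_i, m, d)
print(len(x_d), abs(score(x_d, p_i) - sum(contributions)) < 1e-9)
```

Splitting the score by variable in this way shows which online measurement drives a given PC at a given time, which is the starting point for the per-variable analysis that follows.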

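For each $v$ (corresponding to the online variables), let us consider the term

$$\sum_{j=1}^{d+1} x_j \, p_{i,j} = \bar{p}_i \sum_{j=1}^{d+1} x_j + \sum_{j=1}^{d+1} x_j \left( p_{i,j} - \bar{p}_i \right) \qquad (7)$$

where

$$\bar{p}_i = \frac{1}{d + 1} \sum_{j=1}^{d+1} p_{i,j}$$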

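In the above equation, the first term $\bar{p}_i \sum_{j=1}^{d+1} x_j$ is equivalent to the integral (or average) of the variable value in the interval D. The second term can be rearranged in the form of a first derivative since the consecutive $x_j$ are essentially time-lagged variables of one kind (Eq. (8)).

$$\sum_{j=1}^{d+1} x_j \left( p_{i,j} - \bar{p}_i \right) = \sum_{j=1}^{d} b^o_j \left( x_j - x_{j+1} \right) = \sum_{j=1}^{d} b_j \frac{\partial x}{\partial t}(t = t_j) \qquad (8)$$

where

$$b^o_j = \sum_{l=1}^{j} \left( p_{i,l} - \bar{p}_i \right), \qquad b^o_{d+1} = \sum_{l=1}^{d+1} \left( p_{i,l} - \bar{p}_i \right) = \sum_{l=1}^{d+1} p_{i,l} - (d + 1)\,\bar{p}_i = 0, \qquad b_j = b^o_j \, \Delta t$$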
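The rearrangement in Eqs. (7) and (8) is a summation by parts, and it can be checked numerically for a single variable block. The Python sketch below uses arbitrary placeholder numbers for the lagged values and the loading entries; the function name is hypothetical.

```python
import random


def split_block(x, p):
    """Split sum_j x_j * p_j into the two terms of Eq. (7).

    x, p: the d + 1 lagged values of one variable and the matching
    loading-vector entries. Returns (p_bar, b_o, average_term,
    difference_term), where b_o[j - 1] is the partial sum b^o_j.
    """
    n = len(x)
    if n != len(p) or n < 2:
        raise ValueError("x and p need the same length, at least 2")
    p_bar = sum(p) / n
    b_o, running = [], 0.0
    for p_j in p:
        running += p_j - p_bar
        b_o.append(running)
    average_term = p_bar * sum(x)
    # Eq. (8): only j = 1..d contribute, because b^o_{d+1} = 0.
    difference_term = sum(b_o[j] * (x[j] - x[j + 1]) for j in range(n - 1))
    return p_bar, b_o, average_term, difference_term


rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(17)]  # d + 1 = 17 placeholder samples
p = [rng.uniform(-1.0, 1.0) for _ in range(17)]  # placeholder loading entries

p_bar, b_o, average_term, difference_term = split_block(x, p)
direct = sum(x_j * p_j for x_j, p_j in zip(x, p))
print(abs(direct - (average_term + difference_term)) < 1e-9)
print(abs(b_o[-1]) < 1e-9)
```

Each difference $x_j - x_{j+1}$ approximates the first derivative multiplied by the sampling interval, which is why the partial sums are scaled by $\Delta t$ to give the derivative weights $b_j$.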

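Hence, the $i$th principal component $PC_i$ can be rearranged as shown in Eq. (9)

$$\begin{aligned}
PC_i &= \sum_{v=1}^{m} \left( \sum_{j=1}^{d+1} x_j \, p_{i,j} \right)_v \\
&= \sum_{v=1}^{m} \left( \bar{p}_i \sum_{j=1}^{d+1} x_j + \sum_{j=1}^{d} b_j \frac{\partial x}{\partial t}(t = t_j) \right)_v \\
&= \sum_{v=1}^{m} \left( \sum_{j=1}^{d+1} \left( \bar{p}_i \, x_j + b_j \frac{\partial x}{\partial t_j} \right) \right)_v
\end{aligned} \qquad (9)$$

The last form extends the inner sum to $j = d + 1$, which is permitted because $b_{d+1} = b^o_{d+1} \, \Delta t = 0$.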

Eq. (9) states that the $i$th principal component $PC_i$ is a weighted sum of the average of the variable and its first derivative at various time points in the interval D. Continuing in a similar fashion, an expression involving the second derivative of the variables can be derived as follows:

$$PC_i = \sum_{v=1}^{m} \left( \sum_{j=1}^{d+1} \left( \bar{p}_i \, x_j + b \frac{\partial x}{\partial t_j} + c_j \frac{\partial^2 x}{\partial t_j^2} \right) \right)_v \qquad (10)$$

The weights $\bar{p}_i$, $b_j$, $b$ and $c_j$ in Eqs. (9) and (10) are dependent on the original loading vector $p_i$ for each score $PC_i$. The expression of the score in terms of the variables and their first and second derivatives highlights the extra capability of DPCA in capturing dynamic information embedded in the online measurements, which ordinary PCA is unable to achieve. Moreover, the coefficients in the dynamic expression are calculated automatically by DPCA. Hence, the weights of the online variables and their derivatives are automatically obtained by the PCA methodology.

The glucose–AMS case study (Section 3.2) has been chosen for illustration. We analyzed the first four loading vectors, which together explain 96% of the variance in the original time-lagged version of the data. Specifically, the weights associated with the average variable values, $\bar{p}_i$, and the first derivative values, $b_j$, were evaluated. Fig. 7 shows the weights $\bar{p}_i$ associated with the first loading vector. The weights show a positive correlation between vent CO2 and stirring speed and a negative correlation between stirring speed and dissolved O2. Further, we analyzed the relative weights of the first derivative and the average variable value ($b_j / \bar{p}_i$) for the first four loading vectors (Fig. 8). Note that the relative weight of the first derivative increases significantly from the first loading vector to the fourth loading vector. As this ratio represents the weight of the dynamics (as captured in the first derivative $\partial x / \partial t$), the increase in the weighting ratio implies that retaining more loading vectors for further analysis might describe process dynamics better but possibly at the expense of magnifying process noise.
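A sketch of how quantities like those plotted in Figs. 7 and 8 could be computed from a loading vector is given below in Python. It assumes the variable-by-variable layout described earlier and a sampling interval `dt`; the function name and the toy loading vector are illustrative only, and the exact procedure used in the original study may differ.

```python
def derivative_to_average_ratios(p_i, m, d, dt):
    """For each variable block of loading vector p_i return (p_bar, ratios).

    p_bar is the mean loading of the block (the weight on the average in
    Eq. (7)); ratios[j - 1] = b_j / p_bar with b_j = b^o_j * dt (Eq. (8)),
    for j = 1..d. Ratios are None when p_bar is zero.
    """
    n = d + 1
    if len(p_i) != m * n:
        raise ValueError("loading vector must have m * (d + 1) entries")
    result = []
    for v in range(m):
        block = p_i[v * n:(v + 1) * n]
        p_bar = sum(block) / n
        b, running = [], 0.0
        for p_j in block[:-1]:  # j = 1..d; b_{d+1} is zero
            running += p_j - p_bar
            b.append(running * dt)
        ratios = [b_j / p_bar for b_j in b] if p_bar != 0 else None
        result.append((p_bar, ratios))
    return result


# Toy loading vector: m = 2 variables, d = 2 lags, dt = 0.5 h.
toy = [0.5, 0.5, 0.5,   # flat block: pure "average" weighting
       0.4, 0.2, 0.0]   # sloped block: adds "derivative" weighting
for p_bar, ratios in derivative_to_average_ratios(toy, m=2, d=2, dt=0.5):
    print(p_bar, ratios)
```

A flat block gives zero derivative weights, while a block whose entries slope across the lags gives non-zero ratios; a loading vector with larger ratios therefore responds more strongly to changes in trend than to changes in level.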

Fig. 7. Weighting coefficient $\bar{p}_i$ for the first loading vector in DPCA study of the glucose–AMS case study. $\bar{p}_i$ was evaluated using the definition in Eq. (7).

Fig. 8. Weighting ratio $b_j / \bar{p}_i$ (Eqs. (7) and (8)) in DPCA study of the glucose–AMS case study. (A) Loading vector 1; (B) loading vector 2; (C) loading vector 3; (D) loading vector 4.

4. Discussion

The identification of phase shifts in batch fermentations is important in monitoring the process. A direct approach would involve the examination of the time-profiles of the concentrations of the substrates, products and cell mass. Although rich in information content, these offline measurements are expensive, time consuming and hence infrequently available. On the other hand, the online measurements such as pH and dissolved oxygen are readily available but do not provide direct information about the phase shifts. Here, we present a technique based on DPCA coupled with the $T^2$ statistic to obtain information on the phase shifts solely from the online measurements. The principal idea is that when the fermentation progresses from one phase to another, the information is captured and reflected in the online measurements. The challenge is that this information is not readily apparent from any single variable at any particular time but is rather spread across all online variables over an interval. The technique presented here is able to extract this information in the form of SPs (Srinivasan and Qian, 2005).

The key parameters of the DPCA model include the process history to be considered (D), the sampling time (τ) and the number of principal components a retained. The first two parameters are process dependent and hence some process knowledge such as the process time constant is required while choosing their values. After experimentation with different values of the model parameters, we chose values of D, τ and a as 8 h, 0.5 h and 3, respectively. Retaining a larger number of PCs gives extra sensitivity but may introduce noise in the analysis. Although the DPCA results may vary with the parameter values, the overall conclusions about the phase shifts remain unchanged as long as the parameter values are within a reasonable limit.
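The parameter choices above fix the size of the time-lagged vector, and the retained scores feed the $T^2$ statistic. The Python sketch below works through that arithmetic and uses the standard Hotelling $T^2$ form (squared scores scaled by the corresponding eigenvalues), which is assumed here because the definition of the statistic lies outside this section. The count of five online variables follows from $m(d + 1) = 85$; the function names and the numeric scores are made up for illustration, and the sketch does not implement the identification of SPs.

```python
def lag_count(history_h, sampling_h):
    """Number of lags d such that d * sampling time spans the history D."""
    d = round(history_h / sampling_h)
    if d < 1 or abs(d * sampling_h - history_h) > 1e-9:
        raise ValueError("history must be a positive multiple of the sampling time")
    return d


def hotelling_t2(scores, eigenvalues):
    """T2 = sum_k score_k**2 / eigenvalue_k over the retained PCs."""
    if len(scores) != len(eigenvalues):
        raise ValueError("one eigenvalue per retained score is required")
    if any(lam <= 0 for lam in eigenvalues):
        raise ValueError("eigenvalues of retained PCs must be positive")
    return sum(s * s / lam for s, lam in zip(scores, eigenvalues))


d = lag_count(history_h=8.0, sampling_h=0.5)  # D = 8 h, tau = 0.5 h
m = 5                                         # online variables, from m(d + 1) = 85
print(d, m * (d + 1))

# Hypothetical scores and eigenvalues for a = 3 retained PCs.
print(hotelling_t2([2.0, 1.0, 0.5], [4.0, 1.0, 0.25]))
```

Because each squared score is divided by its eigenvalue, low-variance PCs contribute as much to $T^2$ as high-variance ones, which is consistent with the remark that retaining more PCs adds sensitivity but can also amplify noise.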

In the past, phase shifts have been identified by using the techniques of trend analysis (Cheung and Stephanopoulos, 1990; Janusz and Venkatasubramanian, 1991). This involves the use of first and second derivatives of the measured variables. In our analysis, DPCA scores are obtained as a weighted sum of variable values over an interval. The scores can be rearranged as a linear combination of the first derivative and the average of the variable values in that interval. We find that the first score consists of higher weights for the average value while the second and subsequent scores consist of higher weights for the first derivative. Note that in DPCA, the relative weights of the average variable values and the first derivative value are automatically decided. Thus, the DPCA technique, although based on time-lagged data, offers a novel method of analyzing trends and phase shifts compared to the conventional trend analysis methods.

The technique presented here can be applied to other fermentation processes where minimal process knowledge regarding phase shifts is available. For industrial fermentation, the technique may have implications in online process supervision of repetitive batches in a production facility.

References

Bachinger, T., Mandenius, C.F., 2001. Physiologically motivated monitoring of fermentation processes by means of an electronic nose. Eng. Life Sci. 1, 33–42.

Bapat, P.M., Bhartiya, S., Venkatesh, K.V., Wangikar, P.P., 2006a. Structured kinetic model to represent the utilization of multiple substrates in complex media during rifamycin B fermentation. Biotechnol. Bioeng. 93, 779–790.

Bapat, P.M., Sohoni, S.V., Moses, T.A., Wangikar, P.P., 2006b. A cybernetic model to predict the effect of freely available nitrogen substrate on rifamycin B production in complex media. Appl. Microbiol. Biotechnol. 2, 662–670.

Bapat, P.M., Wangikar, P.P., 2004. Optimization of rifamycin B fermentation in shake flasks via a machine-learning-based approach. Biotechnol. Bioeng. 86, 201–208.

Cheung, J.T.Y., Stephanopoulos, G., 1990. Representation of process trends. Part I. A formal representation framework. Computers Chem. Eng. 14, 495–510.

Clementschitsch, F., Bayer, K., 2006. Improvement of bioprocess monitoring: development of novel concepts. Microb. Cell Factories 5, 19.

de Noronha Pissarra, P., 2004. Changes in the business of culture. Nat. Biotechnol. 22, 1355–1356.

Feng, M., Glassey, J., 2000. Physiological state-specific models in estimation of recombinant Escherichia coli fermentation performance. Biotechnol. Bioeng. 69, 495–503.

Grube, M., Gapes, J.R., Schuster, K.C., 2002. Application of quantitative IR spectral analysis of bacterial cells to acetone–butanol–ethanol fermentation monitoring. Anal. Chim. Acta 471, 127–133.

Janusz, M.E., Venkatasubramanian, V., 1991. Automatic generation of qualitative descriptions of process trends for fault detection and diagnosis. Eng. Appl. Artif. Intell. 4, 329–339.

Kamimura, R., Konstantinov, K., Stephanopoulos, G., 1996. Knowledge-based systems, artificial neural networks and pattern recognition: applications to biotechnological processes. Curr. Opin. Biotechnol. 7, 231–234.

Kim, C.G., Kirschning, A., Bergon, P., Zhou, P., Su, E., Sauerbrei, B., Ning, S., Ahn, Y., Breuer, M., Leistner, E., Floss, H.G., 1996. Biosynthesis of 3-amino-5-hydroxybenzoic acid, the precursor of mC7N units in ansamycin antibiotics. J. Am. Chem. Soc. 118, 7486–7491.

Konstantinov, K., Yoshida, T., 1989. Physiological state control of fermentation processes. Biotechnol. Bioeng. 33, 1145–1156.

Ku, W., Storer, R.H., Georgakis, C., 1995. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 30, 179–196.

Moore, S., 1968. Amino acid analysis: aqueous dimethyl sulfoxide as solvent for the ninhydrin reaction. J. Biol. Chem. 243, 6281–6283.

Muthuswamy, K., Srinivasan, R., 2003. Phase-based supervisory control for fermentation process development. J. Process Control 13, 367–382.

Nielsen, J., 1998. The role of metabolic engineering in the production of secondary metabolites. Curr. Opin. Microbiol. 1, 330–336.

Nissen, T.L., Kielland-Brandt, M.C., Nielsen, J., Villadsen, J., 2000. Optimization of ethanol production in Saccharomyces cerevisiae by metabolic engineering of the ammonium assimilation. Metab. Eng. 2, 69–77.

Olsson, L., Schulze, U., Nielsen, J., 1998. On-line bioprocess monitoring – an academic discipline or an industrial tool? TrAC Trends Anal. Chem. 17, 88–95.

Pinheiro, C., Rodrigues, C.M., Schafer, T., Crespo, J.G., 2002. Monitoring the aroma production during wine-must fermentation with an electronic nose. Biotechnol. Bioeng. 77, 632–640.

Ralston, P., DePuy, G., Graham, J.H., 2001. Computer-based monitoring and fault diagnosis: a chemical process case study. ISA Trans. 40, 85–98.

Russell, E.L., Chiang, L.H., Braatz, R.D., 2000. Data-driven Techniques for Fault Detection and Diagnosis in Chemical Process. Springer–Verlag, London.

Schugerl, K., 2001. Progress in monitoring, modeling and control of bioprocesses during the last 20 years. J. Biotechnol. 85, 149–173.

Sepkowitz, K.A., Raffalli, J., Riley, L., Kiehn, T.E., Armstrong, D., 1995. Tuberculosis in the AIDS era. Clin. Microbiol. Rev. 8, 180–199.

Srinivasan, R., Qian, M.S., 2005. Off-line temporal signal comparison using singular points augmented time warping. Ind. Eng. Chem. Res. 44, 4697–4716.

Srinivasan, R., Wang, C., Ho, W.K., Lim, K.W., 2004. Dynamic principal component analysis based methodology for clustering process states in agile chemical plants. Ind. Eng. Chem. Res. 43, 2123–2139.

Stephanopoulos, G., Locher, G., Duff, M.J., Kamimura, R., Stephanopoulos, G., 1997. Fermentation database mining by pattern recognition. Biotechnol. Bioeng. 53, 443–452.

Vaidyanathan, S., Macaloney, G., Vaughan, J., McNeil, B., Harvey, L.M., 1999. Monitoring of submerged bioprocesses. Crit. Rev. Biotechnol. 19, 277–316.

Vara, A.G., Hochkoepple, A., Nielsen, J., Villadsen, J., 2002. Production of teicoplanin by Actinoplanes teichomyceticus in continuous fermentation. Biotechnol. Bioeng. 77, 589–598.

Venkatasubramanian, V., Rengaswamy, R., Kavuri, S.N., Yin, K., 2003. A review of process fault detection and diagnosis. Part III. Process history based methods. Computers Chem. Eng. 27, 327–346.

Voisard, D., Pugeaud, P., Kumar, A.R., Jenny, K., Jayaraman, K., Marison, I.W., Stockar, U.V., 2002. Development of a large-scale biocalorimeter to monitor and control bioprocesses. Biotechnol. Bioeng. 80, 125–138.

Yu, T.-W., Muller, R., Muller, M., Zhang, X., Draeger, G., Kim, C.-G., Leistner, E., Floss, H.G., 2001. Mutational analysis and reconstituted expression of the biosynthetic genes involved in the formation of 3-amino-5-hydroxybenzoic acid, the starter unit of rifamycin biosynthesis in Amycolatopsis mediterranei S699. J. Biol. Chem. 276, 12546–12555.