A three-level problem-centric strategy for selecting NMR precursor labeling and analytes

Metabolic Engineering 8 (2006) 491–507
www.elsevier.com/locate/ymben

Soumitra Ghosh (a), Ignacio E. Grossmann (a), Mohammad M. Ataai (b), Michael M. Domach (a,*)

(a) Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
(b) Department of Chemical and Petroleum Engineering, University of Pittsburgh, Pittsburgh, PA 15213, USA
(*) Corresponding author. Fax: 1 412 268 7139. E-mail address: [email protected] (M.M. Domach).

Received 21 January 2006; received in revised form 24 April 2006; accepted 1 May 2006
Available online 10 May 2006

1096-7176/$ - see front matter © 2006 Elsevier Inc. All rights reserved.
doi:10.1016/j.ymben.2006.05.001

Abstract

We have developed a sequential set of computational screens that may prove useful for evaluating analyte sets for their ability to accurately report on metabolic fluxes. The methodology is problem-centric in that the screens are used in the context of a particular metabolic engineering problem. That is, flux bounds and alternative flux routings are first identified for a particular problem, and then the information is used to inform the design of nuclear magnetic resonance (NMR) experiments. After obtaining the flux bounds via MILP, analytes are first screened for whether the predicted NMR spectra associated with various analytes can differentiate between different extreme point (or linear combinations of extreme point) flux solutions. The second screen entails determining whether the analytes provide unique flux values or multiple flux solutions. Finally, the economics associated with using different analytes is considered in order to further refine the analyte selection process in terms of an overall utility index, where the index summarizes the cost–benefit attributes by quantifying benefit (contrast power) per cost (e.g., NMR instrument time required). We also demonstrate the use of an alternative strategy, the Analytic Hierarchy Process, for ranking analytes based on weights that the individual experimentalist assigns to the relative value of flux scenario contrast, unique inversion of NMR data to fluxes, etc.

Keywords: Metabolic flux analysis; Nuclear magnetic resonance; Non-linear programming; Global optimization; E. coli

1. Introduction

13C nuclear magnetic resonance (NMR) is amongst the tools employed in metabolic engineering for metabolic flux identification. Such a method is employed because extracellular measurements such as glucose uptake are insufficient to fully determine all the metabolic reaction rates. To obtain flux information, the NMR-determined labeling pattern of specific metabolites derived from a labeled precursor (e.g., 1-13C glucose), in conjunction with the extracellular measurements, is "inverted" in order to uncover the intracellular reaction rates.

A related, but more focused application is using spectroscopic data for scenario discrimination. Here, mathematical analysis has led to a particular mutation strategy that will achieve a desirable flux distribution. For example, linear programming has indicated that attenuating the activity of pyruvate kinase will diminish acetate formation from glucose (Phalakornkule et al., 2000; Lee et al., 2000). However, two different flux distributions can reconcile the mutational block and acid-reduction outcome. The NMR spectra associated with different analytes that can help to discriminate between fluxes in subnetworks can be predicted. The extent to which the observed and predicted spectra concur will then either validate or refute the hypothesis that a particular flux distribution is actually used by the cells.

Several problems can arise that confound the use of NMR data for flux identification. When inverting NMR data to fluxes using isotopomer mapping matrices (Schmidt et al., 1997, 1999), a non-linear optimization problem can result that is highly non-convex due to the bi-linear and tri-linear terms present in the isotopomer balance equations. Consequently, local versus global solutions can be found, especially when conventional gradient-based optimizers are



used. In an attempt to overcome the problem of local solutions, we have recently described the use of a two-step strategy (Ghosh et al., 2005). First, a Mixed Integer Linear Programming (MILP) formulation is used to provide lower and upper bounds on individual fluxes for the NMR data-to-fluxes problem. Apart from their utility for analyzing NMR data, MILP solutions are useful to obtain because they provide the flux bounds for the trafficking alternatives that equally fulfill an objective or a set of extracellular measurements. Thus, such solutions not only indicate potentially different targets for metabolic engineering; the tightened flux bounds also enable the inverse problem for spectra-to-fluxes to be solved by using a deterministic non-linear global optimization algorithm of the branch-and-bound type (BARON; Sahinidis and Tawarmalani, 2002; Ghosh et al., 2005).

Another related problem is that more than one flux distribution might lead to an analyte exhibiting a similar NMR spectrum; hence, the data-to-fluxes problem could, in turn, generate alternative solutions that are not strictly local solutions. Intuitively, one would expect that as more data are accumulated and used (e.g., spectra from more analytes), alternate solutions become less likely to arise. Overall, when a hydrolyzed protein or cell extract sample is used, a finite set of metabolites or amino acids is available that can provide unique flux solutions, and it is desirable to identify those analytes in advance.

In an attempt to address the above problems, we have developed a sequential set of computational screens that may prove useful for evaluating analyte sets for their ability to accurately report on metabolic fluxes. The methodology is problem-centric in that the screens are used in the context of a particular metabolic engineering problem. That is, flux bounds and alternative flux routings are first identified using METABOLOGICA (Zhu et al., 2003) for a particular problem, and then the information is used to inform the design of NMR experiments.

The general outline of the screens is as follows. After obtaining the flux bounds via MILP, analytes are first screened for whether the predicted NMR spectra associated with various analytes can differentiate between different extreme point (or linear combinations of extreme point) flux solutions. The second screen entails determining whether the analytes provide unique flux values or multiple flux solutions. The screening method employed is mathematically different from MILP, but the outcome is similar. Our MILP formulation enumerates all feasible extreme point flux solutions that meet constraints and an objective (Phalakornkule et al., 2001). The computation performed for analyte selection similarly enumerates all flux distributions that can yield, within a tolerance, a particular NMR spectrum from a metabolite (or a set of metabolites). Finally, the economics associated with using different analytes is considered in order to further refine the analyte selection process.

In this paper, a "test" metabolic network will be first described that will enable the illustration of how flux scenarios are enumerated, and how NMR calculations are performed. The aforementioned screens will then be illustrated in the context of the "test" network where the attributes of different analyte sets are calculated. The computational screening strategies will also be applied to the more practical case of a larger Escherichia coli network where pyruvate kinase activity has been deleted to inhibit acetate by-product formation. Economics will be addressed in two ways. For the "test" problem, the intensity of analyte sample preparation will be qualitatively weighed against NMR attributes. For the larger problem, abundance data will be used in conjunction with screening results to infer how much NMR resource time and cost would be needed to acquire a usable spectrum. Finally, to illustrate an alternate means of using the quantitative results from the screens, the Analytic Hierarchy Process (AHP; Saaty, 2000) is applied to the larger problem in order to rank order potential NMR analytes.

2. Abstracted test network: flux bounds, NMR calculations, and screens

The "test" network is shown in Fig. 1a, where r represents a net flux (e.g., mmol/g cell h). The model is based on the abstracted representation of central metabolism described by Forbes et al. (2001). We assume, as posed by the originators, that the extracellular fluxes r1, r2, and r9 are known (i.e., directly measured). The values are {r1, r2, r9} = {1, 0.3, 0.4}.

2.1. Determination of flux bounds

To find the flux bounds, the network in Fig. 1a can be formulated as a linear programming problem given by Eq. (1) by considering the flux balances around each node in the network, and the three additional constraints imposed by measurements of extracellular fluxes. As noted earlier, such bounds greatly facilitate the operation of global optimization methods such as BARON when used to invert NMR data to fluxes. A dummy objective of Z = 0 is provided so that any feasible solution to the linear program is accepted:

$$\min Z = 0 \tag{1a}$$

Subject to

$$r_1 + r_2 = r_3 + r_4, \tag{1b}$$

$$r_4 + r_7 = r_5, \tag{1c}$$

$$r_5 = r_7 + r_6, \tag{1d}$$

$$r_6 + r_8 = r_5, \tag{1e}$$

$$r_9 = r_8 + r_{10}, \tag{1f}$$

$$r_1 = 1, \tag{1g}$$

$$r_2 = 0.3, \tag{1h}$$

$$r_9 = 0.4, \tag{1i}$$

$$r_j \ge 0 \quad \text{for all } j = 1, 2, \ldots, 10. \tag{1j}$$

Fig. 1. Hypothetical metabolic network used to demonstrate NMR calculations and analyte screens. (a) The conversion of precursors G, S, and Q to excreted products P and R are shown as well as a cyclic series of reactions. (b) Shown are the metabolic fluxes obtained from the optimization using synthetic NMR data. The black circles indicate 13C while the white ones are 12C. The grey metabolites, A and E, are the NMR analytes, which are among the analytes that can be used to successfully solve the inverse NMR-to-fluxes problem (Eq. (2)).

Table 1
Alternate optima obtained from the recursive MILP formulation

Fluxes   Solution I   Solution II   Solution III   Solution IV
r1^a     1            1             1              1
r2^a     0.3          0.3           0.3            0.3
r3       1.3          0             1.3            0
r4       0            1.3           0              1.3
r5       0.4          1.3           0              1.7
r6       0            1.3           0              1.3
r7       0.4          0             0              0.4
r8       0.4          0             0              0.4
r9^a     0.4          0.4           0.4            0.4
r10      0            0.4           0.4            0

^a Denotes fixed fluxes.

One characteristic of metabolic networks is that multiple feasible flux scenarios can arise. Multiplicity arises when the number of experimental observations is less than the number of unknowns. Another source is redundancy, which can manifest when, for example, multiple reactions can contribute to providing a set amount of ATP or NADPH. In order to resolve the existence of multiple flux distributions, a recursive MILP algorithm (Phalakornkule et al., 2001) or a Depth-First-Search (DFS) method (Zhu et al., 2003) can be employed, and either method yields the same four flux distributions shown in Table 1. These distributions correspond to the extreme points of the feasible region for the metabolic flux solutions.
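As a quick consistency check, the four solutions of Table 1 can be substituted into Eqs. (1b)–(1j), and per-flux bounds can be read off as the minimum and maximum over the extreme points. The following is a minimal sketch in Python, not the MILP or DFS procedure itself; the names `SOLUTIONS`, `residuals`, `is_feasible`, and `flux_bounds` are hypothetical and introduced only for illustration.

```python
# Extreme-point flux solutions transcribed from Table 1 (keys are flux indices 1..10).
SOLUTIONS = {
    "I":   {1: 1.0, 2: 0.3, 3: 1.3, 4: 0.0, 5: 0.4, 6: 0.0, 7: 0.4, 8: 0.4, 9: 0.4, 10: 0.0},
    "II":  {1: 1.0, 2: 0.3, 3: 0.0, 4: 1.3, 5: 1.3, 6: 1.3, 7: 0.0, 8: 0.0, 9: 0.4, 10: 0.4},
    "III": {1: 1.0, 2: 0.3, 3: 1.3, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.4, 10: 0.4},
    "IV":  {1: 1.0, 2: 0.3, 3: 0.0, 4: 1.3, 5: 1.7, 6: 1.3, 7: 0.4, 8: 0.4, 9: 0.4, 10: 0.0},
}


def residuals(r):
    """Left-hand side minus right-hand side of Eqs. (1b)-(1i)."""
    return [
        r[1] + r[2] - (r[3] + r[4]),  # (1b)
        r[4] + r[7] - r[5],           # (1c)
        r[5] - (r[7] + r[6]),         # (1d)
        r[6] + r[8] - r[5],           # (1e)
        r[9] - (r[8] + r[10]),        # (1f)
        r[1] - 1.0,                   # (1g)
        r[2] - 0.3,                   # (1h)
        r[9] - 0.4,                   # (1i)
    ]


def is_feasible(r, tol=1e-9):
    """True if r satisfies the balances and the non-negativity constraint (1j)."""
    balanced = all(abs(x) <= tol for x in residuals(r))
    nonnegative = all(value >= -tol for value in r.values())
    return balanced and nonnegative


def flux_bounds(solutions):
    """Lower and upper bound of each flux over the extreme points."""
    return {
        j: (min(s[j] for s in solutions.values()),
            max(s[j] for s in solutions.values()))
        for j in range(1, 11)
    }


for name, solution in SOLUTIONS.items():
    print(name, is_feasible(solution))
print(flux_bounds(SOLUTIONS))
```

Because the feasible region is the convex hull of its extreme points, the minimum and maximum over those points bound every feasible flux, which is the form of bound used later in Eq. (2l).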

2.2. NMR data-to-fluxes calculation method

The actual fluxes within the "test" network can be a linear combination of the four solutions shown in Table 1. To determine the actual intracellular flux values, a 13C-labeled precursor is added to label the metabolites within the network. The relative abundance of the 2^n isotopomers for an n-carbon analyte reflects which reaction paths, and at what relative rates, yielded the isotopomers from the labeled precursor.

To briefly describe the NMR experimentation and calculations in this context, we assume that 1-13C G is fed to the system, where G denotes a carbon source such as glucose. A sample of labeled intracellular metabolites is then analyzed after attaining isotopic steady state. By integrating the group of NMR peaks, the Isotopomer Distribution Vectors (IDV) of various metabolites such as A, E, C, etc. can be obtained (Schmidt et al., 1997, 1999).

Obtaining the fluxes from the IDV data is achieved by solving the Inverse Problem, which is stated in Eq. (2). The objective function chosen (Z; Eq. (2a)) is the Euclidean norm for the error (i.e., a least-squares criterion). The constraints are comprised primarily of the Isotopomer Mapping Matrix (IMM) equations constructed for each intracellular metabolite. There are 10 net fluxes, [r1, r2, ..., r10], in the system, and as noted earlier, the reaction O ↔ E is reversible. Eqs. (2b)–(2f) are the isotopomer balances. Also, each of the net fluxes is a convex combination of the four MILP/DFS solutions as required by Eqs. (2g)–(2i).

To account for the reversible reaction (O ↔ E), an exchange coefficient variable, ε8 = (v11/v8), could be introduced. As an alternative, the reaction rates are split into 10 forward fluxes and one backward flux, [v1, v2, ..., v11], and ε8 is later extracted from the calculations. By adopting this formulation, further bi-linearity can be prevented from being introduced into the problem. In order to account for the fact that 0 ≤ ε8 ≤ 1, the inequality constraint, Eq. (2j), is provided. This constraint is based on noting that ε8 = (v11/v8), and v8 ≥ v11 follows from r8 ≥ 0 (Eq. (1j)) and r8 = v8 − v11. Finally, Eqs. (2k) and (2l) provide the appropriate bounds to the flux variables to aid the branch-and-reduce algorithm for global optimization:

$$\min Z = \sum_{i=1}^{2} \left( \mathrm{IDV}_{A,i} - \mathrm{IDV}_{A,i}^{\mathrm{expt}} \right)^2 + \sum_{i=1}^{4} \left( \mathrm{IDV}_{E,i} - \mathrm{IDV}_{E,i}^{\mathrm{expt}} \right)^2, \tag{2a}$$

Subject to

$$(v_4 + v_3)\,\mathrm{IDV}_F = v_2\,\mathrm{IMM}_{S \to F}\,\mathrm{IDV}_S + v_1\,\mathrm{IMM}_{G \to F}\,\mathrm{IDV}_G, \tag{2b}$$

$$v_5\,\mathrm{IDV}_A = v_4\,\mathrm{IMM}_{F \to A}\,\mathrm{IDV}_F + v_7\,\mathrm{IMM}_{C \to A}\,\mathrm{IDV}_C, \tag{2c}$$

$$(v_7 + v_6)\,\mathrm{IDV}_C = v_5\,\mathrm{IMM}_{A \to C}\,\mathrm{IDV}_A \otimes \mathrm{IMM}_{O \to C}\,\mathrm{IDV}_O, \tag{2d}$$

$$(v_5 + v_{11})\,\mathrm{IDV}_O = v_6\,\mathrm{IMM}_{C \to O}\,\mathrm{IDV}_C + v_8\,\mathrm{IMM}_{E \to O}\,\mathrm{IDV}_E, \tag{2e}$$

$$(v_8 + v_{10})\,\mathrm{IDV}_E = v_{11}\,\mathrm{IMM}_{O \to E}\,\mathrm{IDV}_O + v_9\,\mathrm{IMM}_{Q \to E}\,\mathrm{IDV}_Q, \tag{2f}$$

$$\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 = 1, \tag{2g}$$

$$v_k = \alpha_1 r_k^{(1)} + \alpha_2 r_k^{(2)} + \alpha_3 r_k^{(3)} + \alpha_4 r_k^{(4)} \quad \forall k \in \text{irreversible reactions}, \tag{2h}$$

$$v_8 - v_{11} = \alpha_1 r_8^{(1)} + \alpha_2 r_8^{(2)} + \alpha_3 r_8^{(3)} + \alpha_4 r_8^{(4)}, \tag{2i}$$

$$v_8 \ge v_{11}, \tag{2j}$$

$$0 \le \mathrm{IDV}_{M,i} \le 1 \quad \forall M \in \text{metabolites}, \; \forall i = 1, 2, \ldots, \mathrm{length}(\mathrm{IDV}_M), \tag{2k}$$

$$\min\{r_k^{(1)}, r_k^{(2)}, r_k^{(3)}, r_k^{(4)}\} \le v_k \le \max\{r_k^{(1)}, r_k^{(2)}, r_k^{(3)}, r_k^{(4)}\} \quad \forall k \in \text{irreversible reactions}. \tag{2l}$$

Fig. 2. Performance of NMR signal molecules with respect to contrast power over the entire flux space. Analyte C performs the best. [Bar chart. x-axis: NMR analyte candidates (A, E, C, O); y-axis: angular deviation between extreme point scenarios (radians), 0 to 0.9.]
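The bookkeeping behind Eqs. (2a) and (2g)–(2j) can be illustrated with a short Python sketch. It forms net fluxes as a convex combination of the Table 1 extreme points, recovers v8 and the exchange coefficient ε8 from r8 = v8 − v11, and evaluates one analyte's least-squares term. The function names, the explicit non-negativity check on the weights (implied by "convex combination" but not written in Eq. (2g)), and the example weights are assumptions made for illustration; the isotopomer balances (2b)–(2f) are not implemented here.

```python
# Net fluxes r1..r10 at the four extreme points of Table 1.
EXTREME_POINTS = [
    [1, 0.3, 1.3, 0, 0.4, 0, 0.4, 0.4, 0.4, 0],    # Solution I
    [1, 0.3, 0, 1.3, 1.3, 1.3, 0, 0, 0.4, 0.4],    # Solution II
    [1, 0.3, 1.3, 0, 0, 0, 0, 0, 0.4, 0.4],        # Solution III
    [1, 0.3, 0, 1.3, 1.7, 1.3, 0.4, 0.4, 0.4, 0],  # Solution IV
]


def net_fluxes(alpha):
    """Convex combination of the extreme-point net fluxes, as in Eqs. (2g)-(2i)."""
    if any(a < 0 for a in alpha) or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return [
        sum(a * point[k] for a, point in zip(alpha, EXTREME_POINTS))
        for k in range(10)
    ]


def split_reversible(r8, v11):
    """Return (v8, eps8) given the net flux r8 = v8 - v11 and backward flux v11."""
    v8 = r8 + v11
    eps8 = v11 / v8 if v8 > 0 else 0.0  # eps8 = v11 / v8, so 0 <= eps8 <= 1
    return v8, eps8


def least_squares(idv_model, idv_expt):
    """One analyte's contribution to the objective Z of Eq. (2a)."""
    return sum((m - e) ** 2 for m, e in zip(idv_model, idv_expt))


# Hypothetical equal weights on the four extreme points.
r = net_fluxes([0.25, 0.25, 0.25, 0.25])
print([round(x, 3) for x in r])

# Net flux r8 = 0.32 with backward flux v11 = 0.46.
print(split_reversible(0.32, 0.46))
```

With r8 = 0.32 and v11 = 0.46, the sketch gives v8 = 0.78 and ε8 ≈ 0.59.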

2.3. Analyte screen 1: how do analyte spectra contrast over entire feasible flux space?

First, available precursor labeling and analyte options need to be considered. For the "test" problem, the only possible labeling scenario for G in Fig. 1(a, b) is a 1-13C label, because a label in any other position would be immediately lost in the two products of reaction r4. An inspection of the network suggests that the analyte candidates are A, C, O, E, and P. Note that E and P would have equivalent NMR spectra because they are the same molecule, but only differ in intracellular versus extracellular localization.

One indication of how well the analytes would perform over the entire flux space would be to establish how well they can distinguish between the four extreme point solutions obtained from the MILP formulation (Table 1). We will henceforth refer to the ability to distinguish between flux distributions as "contrast power." To determine contrast power, the spectra corresponding to the four alternatives are first determined. One spectrum is chosen as a reference. Then, the angular differences (in radians) between the reference spectrum and the spectra determined from the other three flux solutions are computed. In the calculation, the exchange coefficients are set to 0.5. The average angular differences between the different extreme point NMR spectra are compared in Fig. 2. C provides the best contrast power over the entire flux space, while E offers the worst. Note that an angle difference is not presented for A. A possesses only one carbon atom; hence, the angle difference between the vectorized spectra of A will always be zero. Such an analyte omission will generally not arise in less abstracted problems because one-carbon analytes are not prevalent (CO2 is the major one-carbon species) and the lack of line splitting structure for small carbon compounds limits the information they provide.

The discrepancy in the analytes' contrast power can be understood by examining Table 1. Note that r8 is the flux connecting E to the rest of the network (Fig. 1a), and r8 = 0 in both Solutions II and III of the MILP (Table 1). This zero flux detaches the sub-network of Q, E, and P from the main metabolic network; hence, the 13C label will never be able to make it to E. Additionally, E is infused with unlabeled carbon from Q via r9. E will thus tend not to be highly enriched with 13C, which reduces the magnitude of its average difference.
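The contrast-power calculation described above reduces to the angle between two spectrum vectors, averaged over the non-reference scenarios. The following Python sketch shows that arithmetic under stated assumptions: the function names are hypothetical, the example spectra are made-up placeholders rather than predicted NMR spectra, and returning zero for an all-zero spectrum is a convention chosen here.

```python
import math


def angle_between(u, v):
    """Angle in radians between two spectrum vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for an all-zero spectrum
    # Clamp to [-1, 1] to guard against round-off before taking acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def contrast_power(spectra, reference=0):
    """Average angle between the reference spectrum and all other spectra."""
    ref = spectra[reference]
    others = [s for i, s in enumerate(spectra) if i != reference]
    return sum(angle_between(ref, s) for s in others) / len(others)


# Placeholder two-peak spectra for four flux scenarios (illustrative values only).
spectra = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(contrast_power(spectra))

# A single-peak spectrum is a one-element vector, so its angle is always zero.
print(angle_between([0.3], [0.9]))
```

The last line mirrors the remark about analyte A: two one-element vectors with positive entries are always parallel, so their angular difference carries no contrast information.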

2.4. Analyte screen 2: do analytes invert differently to fluxes?

The second screen entails determining if the labeling information possessed by an analyte reverts to a unique flux distribution. Failure to revert to the correct flux distribution would indicate that other analytes need to be included in the set, or some analytes should be omitted based on the allowable flux distribution(s) and available precursor labelings. The varying ability to revert to the correct flux distribution will henceforth be referred to as "resolving power."

Fig. 3. Existence of "multiple" global minima or clones of a function y = f(x). Point P is clearly a local minimum while Q is the global minimum. Within the prescribed tolerance, however, R is also a "global" minimum. [Schematic plot of y = f(x) versus x marking a local minimum, the level y0, the tolerance δ, and the multiple global minima within the prescribed tolerance.]

For the "test" network and a particular problem, variable resolving power can be demonstrated by first assuming that a mutational strategy is hypothesized to lead to a particular flux distribution R. An example of a "hypothesized" flux distribution is shown in Table 2; it is a convex combination of the four extreme point solutions shown in Table 1. The experimental design task thus entails selecting one or more analytes that can provide information that confirms or refutes the hypothesis that R is indeed the flux distribution operating within the cell.

The hypothesized flux distribution was used to generate each analyte's IDV, and each analyte or set was subjected to the Inverse Problem (Eq. (2)). Table 2 shows the fluxes identified when each analyte or a combination of them was used. These results indicate that A and E (or even E by itself) can generate an accurate flux map, which is also in agreement with Forbes et al. (2001). Thus, E is a very good choice for an NMR analyte. In contrast, the flux solution provided from C is vastly different from the hypothesized solution, which indicates that C has low resolving power. The existence of such a solution suggests that an NMR experiment based on analyzing C may not help to confirm or refute the hypothesis because the spectrum of C is consistent with R or another much different flux solution. The "uneven" performance of analytes indicates the need for a more structured framework that can better screen the metabolites so that accurate flux information is obtained. A more structured version of a screen for resolving power is described next.

From the results of Table 2, analyte C can be perceived to provide a "local" solution to the inverse problem. An operational way of viewing this is that there might exist more than one flux distribution (or "clone"), all of which would give the same IDV of C (within the limits of the prescribed tolerance). Detecting and enumerating such clones potentially provides a means for ascertaining how "tight" the correspondence is between an analyte's spectrum and the flux distribution.

Table 2
NMR to flux inversion results when using different analytes

Analyte        v3     v4     v5     v6     v7     v8     v10    v11    ε8
Hypothesized   0.43   0.87   1.19   0.87   0.32   0.78   0.08   0.46   0.59
A&E            0.43   0.87   1.19   0.87   0.32   0.78   0.08   0.46   0.59
C              1.3    0      0      0      0      0      0.4    0      0.12
O              1.3    0      0      0      0      0      0.4    0      0.37
A              1.3    0      0      0      0      0      0.4    0      0.9
E              0.43   0.87   1.19   0.87   0.32   0.78   0.08   0.46   0.59

This problem can be considered further with the aid of Fig. 3. The function y = f(x) has a distinct local minimum at P. Q is the global minimum here, but the dotted rectangular regions around both Q and R would qualify as global minima within the tolerance shown in Fig. 3. This is akin to saying that all these clonal abscissas would define flux distributions that would give the same IDV or NMR spectra, as defined by values of y within the tolerance δ of y0. Thus, to probe for clonal solutions, we modified the global optimization procedure (BARON; Sahinidis and Tawarmalani, 2002) so that instead of determining the global optima, a set of solutions is obtained that differ from the global solution within a small tolerance.

Table 3
Clonal solutions obtained with metabolite C as the NMR analyte

                  Multiple global solutions from C
Fluxes   Actual   Solution I   Solution II   Solution III
v1       1        1            1             1
v2       0.3      0.3          0.3           0.3
v3       0.43     1.3          0.37          0.47
v4       0.87     0            0.93          0.83
v5       1.19     0            1.27          1.13
v6       0.87     0            0.93          0.83
v7       0.32     0            0.34          0.3
v8       0.78     0            0.18          0.6
v9       0.4      0.4          0.4           0.4
v10      0.08     0.4          0.058         0.095
v11      0.46     0            0.15          0.3

Traditionally, BARON implements a deterministic global optimization procedure that relies on a spatial branch-and-bound method that is applied to structured problems that involve non-convex functions such as bi-linear, tri-linear, and concave terms. The basic idea of the method consists of successively partitioning the solution space using a tree enumeration search, where the lower and upper bounds of the continuous variables define the space. At each node of this tree, a lower bound on the global optimum of the objective function is computed using valid underestimators of the non-convex functions. An upper bound to the global optimum is computed by solving the original non-linear problem from a starting point that lies in the current partition. Nodes whose lower bound lies below the upper bound are further partitioned, while nodes whose lower bound exceeds the upper bound are fathomed. The search terminates when the lower and upper bounds of the objective function lie within a small tolerance. The effectiveness of this global optimization search relies largely on the quality of the lower bound for the global optimum, which in turn relies on how tight the lower and upper bounds are for the variables. Although BARON uses a reduction strategy in an attempt to improve the variable bounds, it is very important to provide tight values to reduce the computational cost of the search, which otherwise may be substantial. Thus, having predetermined flux bounds from either a MILP or DFS solution is beneficial.
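The partition, bound, and fathom logic of a spatial branch-and-bound search can be illustrated on a one-dimensional toy problem. The Python sketch below is not BARON and does not use its convex underestimators; as a stated substitution, it obtains a valid lower bound on each interval from an assumed Lipschitz constant, and it takes the midpoint value as the upper bound instead of running a local solve. The test function and the constant 21 (a bound on its derivative over [-2, 2]) are illustrative choices.

```python
import heapq


def branch_and_bound(f, lo, hi, lipschitz, tol=1e-3):
    """Minimize f on [lo, hi], given a valid Lipschitz constant for f."""

    def lower_bound(a, b):
        # f(x) >= f(mid) - L * |x - mid|, so this underestimates f on [a, b].
        return f(0.5 * (a + b)) - lipschitz * 0.5 * (b - a)

    best_x = 0.5 * (lo + hi)
    best_f = f(best_x)                      # incumbent (upper bound)
    heap = [(lower_bound(lo, hi), lo, hi)]  # open nodes ordered by lower bound
    while heap:
        lb, a, b = heapq.heappop(heap)
        if lb > best_f - tol:
            break  # smallest lower bound is within tol of the incumbent: stop
        mid = 0.5 * (a + b)
        for left, right in ((a, mid), (mid, b)):  # partition the node
            x = 0.5 * (left + right)
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx            # improved upper bound
            child_lb = lower_bound(left, right)
            if child_lb <= best_f - tol:          # otherwise the child is fathomed
                heapq.heappush(heap, (child_lb, left, right))
    return best_x, best_f


def toy(x):
    """Non-convex test function with two local minima on [-2, 2]."""
    return x ** 4 - 3 * x ** 2 + x


x_star, f_star = branch_and_bound(toy, -2.0, 2.0, lipschitz=21.0)
print(x_star, f_star)
```

The width of an interval controls how loose its lower bound is, which is why tight variable bounds supplied in advance (here, the initial interval) shorten the search.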

In order to obtain multiple solutions to the inverse problem with the IDV of the metabolite C in the objective function, we changed the options file in BARON to obtain the K-best solutions of the model. When implemented, the (up to) K-best solutions found during every branch-and-bound iteration are maintained, and the nodes provably worse than the Kth best solution are deleted (Sahinidis and Tawarmalani, 2002). For K = 3, the "clonal" solutions listed in Table 3 were obtained. Solution I lies the furthest from the desired flux distribution. The second and third solutions are reasonably close to the correct fluxes. A similar analysis was performed on A and E, and it was observed that no such alternative solutions exist within the given tolerance. The optimization was performed on a Pentium IV, 1.8 GHz machine with 512 MB RAM. The network model had 51 variables and 47 constraints, and took ~1 min to solve with BARON.
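The idea of collecting every minimum that lies within a tolerance of the global optimum (the "clones" of Fig. 3) can be shown with a brute-force stand-in. The Python sketch below scans a grid on a one-dimensional toy function; it is not the K-best option of BARON, and the function `near_global_minima`, the grid size, and the test function are assumptions chosen for illustration.

```python
def near_global_minima(f, lo, hi, delta, n=2001):
    """Grid local minima of f on [lo, hi] whose value is within delta of the best."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    y0 = min(ys)  # grid estimate of the global minimum value
    minima = []
    for i in range(n):
        # Strict on the left and non-strict on the right, so that a flat
        # stretch of equal values is reported only once.
        left_ok = i == 0 or ys[i] < ys[i - 1]
        right_ok = i == n - 1 or ys[i] <= ys[i + 1]
        if left_ok and right_ok and ys[i] <= y0 + delta:
            minima.append((xs[i], ys[i]))
    return minima


def double_well(x):
    """Two wells near x = -1 and x = +1 whose depths differ by about 0.02."""
    return (x * x - 1.0) ** 2 + 0.01 * x


# A loose tolerance admits both wells as "global" minima.
print(near_global_minima(double_well, -2.0, 2.0, delta=0.05))
# A tight tolerance leaves only the deeper well.
print(near_global_minima(double_well, -2.0, 2.0, delta=0.005))
```

An analyte whose objective behaves like the loose-tolerance case would have low resolving power in the sense used above, because distinct abscissas (flux distributions) fit the data equally well within the tolerance.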

2.5. Analyte screen 3: economic considerations

After establishing the resolving and contrast powers of candidate analytes, the next screen that is applied is economics. Here, one may be able to determine, for example, what minimal outlay will yield satisfactory results. The economics of analyte selection is driven, in part, by the material and labor costs associated with their isolation. Thus, at the minimum, analytes can be ranked in coarse (low, medium, high) expense categories that are associated with the complexity of their isolation. How much NMR facility time is required is also a comparative economic factor. The time component will be considered in the next case study.

While the test problem is an abstracted view of metabolism, it presents some options for analyte isolation that can be potentially economically distinguished, and then combined with their contrast and resolving power attributes in order to devise different cost–benefit scenarios. Metabolites C, E, and O would probably require at least a cytosolic extraction. Alternately, O can be viewed to be the carbon skeleton for aspartate; hence, NMR analysis of protein hydrolysis products would provide a source of information on how O is labeled. Thus, cell concentration, cell lysis, protein hydrolysis, chromatography, sample clean up (e.g., metal chelation), and concentration (e.g., lyophilization) would all be required.

In contrast, P is excreted in the growth medium; hence, a simpler cell–liquid separation (e.g., centrifugation or rapid filtration) followed by cleanup (e.g., metal chelation) and concentration (e.g., lyophilization) would be required. P would be especially useful if a defined minimal medium (e.g., glucose plus inorganic salts) was used to grow the cells, and significant consumption of the labeled glucose occurred. Fewer signals would appear in the sample spectrum that originate from naturally labeled and abundant carbon compounds (e.g., amino acids in nutrient broth) as well as unconsumed labeled glucose. The absence of such signals would lessen the potential for confounding the interpretation of the line splitting pattern of P.

E and thus P invert reliably to the correct flux distribution. However, C and O perform significantly better when the contrast power over the entire flux space is considered. Thus, one low expense, but effective strategy that would simultaneously maximize flux resolution and contrast power would be to use P (found in cell-free growth medium; high resolving power) and C (cytosolic extraction; high contrast power). To envision implementing the strategy, the actual flux distribution would lie between the extreme points. In a sequential approach, the data from C in an objective function would drive the solution towards the correct flux distribution or an alternate that fits C's data well. Data from P would then be used to determine if the correct solution has been found. Using the data from C and P simultaneously would then be expected to locate the correct solution in one step, which has been observed (not shown).

3. Larger metabolic engineering network problem

3.1. Network and flux solutions to contrast by NMR

The methodologies described in the prior section have been used to investigate a metabolic engineering problem in E. coli that involves a larger metabolic network. The network is shown in Fig. 4. Through prior experimentation, it has been proposed that removing the enzyme pyruvate kinase, which catalyzes PEP → pyruvate, could eliminate acid by-product formation (Goel et al., 1995) while preserving the wild-type’s rapid growth rate. Subsequent network analysis and mutation (Lee et al., 2000; Zhu et al., 2001) confirmed the efficacy of the mutational strategy. More recently, Siddiquee et al. (2004) have also studied the metabolic regulation of E. coli lacking a functional pykF gene. They also observed low flux ratios through the lactate- and acetate-forming pathways, which corroborates the efficacy of the mutation strategy.

Fig. 4. The metabolic network of E. coli (diagram not reproduced). The glycolytic, pentose phosphate pathway, malic enzyme, and Krebs cycle reactions are shown. The double-ended arrows represent potentially reversible reactions and the solid-headed arrows depict the net directions. AcCoA, acetyl-CoA; CIT, citrate; DHAP, dihydroxy-acetone phosphate; E4P, erythrose-4-phosphate; F6P, fructose-6-phosphate; FDP, fructose-1,6-bisphosphate; G6P, glucose-6-phosphate; KG, α-ketoglutarate; OAA, oxaloacetate; PEP, phosphoenolpyruvate; PGP, phosphoglycerol phosphate; PG3, 3-phosphoglycerate; P2G, 2-phosphoglycerate; PYR, pyruvate; R5P, ribose-5-phosphate; Succ, succinyl-CoA; T3P, triose phosphate; 3PG, 3-phosphoglycerate; MAL, malate; S7P, sedoheptulose-7-phosphate; aa, amino acids; na, nucleic acids.

Interestingly, network analysis indicates that for similar glucose uptake, there are two feasible extreme point flux distributions where the pyruvate kinase-catalyzed flux is zero and acid production is nil. The bounds of the flux space differ mainly by the flux ratios at the G6P branch point (Ghosh et al., 2005). In one case, the flux to the hexose monophosphate pathway (G6P → ribulose 5P) is greater than G6P isomerization to F6P, and the other extreme point exhibits the opposite partitioning at the branch point. The analyte screening methodology illustrated for the test problem was applied to the problem of designing an NMR experiment that would resolve the fluxes in a pyruvate kinase-deficient mutant. This particular problem of flux scenario differentiation is representative of a practical problem, and the larger size of the network was envisioned to provide useful information on how fast computations can be performed for less abstracted network problems.

In order to implement the analyte screening methodology, NMR data were generated using a linear combination of α1 = 0.4 and α2 = 0.6 of the fluxes in the two solutions. We assumed that resolving a “middle of the road” case (i.e., one extreme solution does not dominate) would be reasonably challenging. Relaxing the “middle of the road” assumption would be straightforward and would entail repeating the analysis for several different values of α to explore the utility of different analytes.
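The linear combination of extreme point solutions can be illustrated with a minimal sketch. The three-element flux vectors and the helper name `combine_fluxes` are hypothetical and chosen only for illustration; they are not the E. coli extreme point solutions.

```python
# Hypothetical three-flux toy vectors (illustrative values only, not the
# E. coli extreme point solutions discussed in the text).
extreme_1 = [1.0, 0.2, 0.8]  # assumed flux vector at extreme point 1
extreme_2 = [1.0, 0.7, 0.3]  # assumed flux vector at extreme point 2


def combine_fluxes(v1, v2, alpha1):
    """Return alpha1*v1 + (1 - alpha1)*v2, a convex combination of two flux vectors."""
    if len(v1) != len(v2):
        raise ValueError("flux vectors must have the same length")
    if not 0.0 <= alpha1 <= 1.0:
        raise ValueError("alpha1 must lie in [0, 1]")
    alpha2 = 1.0 - alpha1
    return [alpha1 * a + alpha2 * b for a, b in zip(v1, v2)]


# The "middle of the road" case: alpha1 = 0.4, alpha2 = 0.6.
middle_case = combine_fluxes(extreme_1, extreme_2, alpha1=0.4)

# Relaxing the assumption amounts to scanning several values of alpha1.
scan = {a: combine_fluxes(extreme_1, extreme_2, a) for a in (0.0, 0.25, 0.5, 0.75, 1.0)}
```

Because the combination is convex, any flux that is equal in both extreme points (the first entry here) stays fixed for every value of α1.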

3.2. Analyte set

The analysis limited the scope to cytosolic or protein hydrolysis-derived amino acids, which are commonly used in NMR experimentation. The computational methods, however, are not limited to evaluating amino acid analytes. The criteria used to select candidate amino acid analytes for the purpose of demonstrating the computational methods were (1) abundance and (2) metabolic origin. Abundance was considered a virtue because, as will be discussed in more detail later, the more abundant an analyte is, the less sample preparation effort and NMR facility time are required, which lowers the cost of experimentation. Metabolic origin considers whether all the analytes are derived from different metabolites within mainstream metabolism or if some of the analytes report on the same metabolite. We attempted to keep the example set tractable while ensuring that the analytes are derived from different and important metabolic nodes, with some redundancy. Within Section 4, we comment further on the origin of the example analytes in terms of their “connectivity to metabolic nodes.”

The cytosolic candidates chosen were the three most abundant amino acids: glutamate, alanine, and glycine (Tempest and Meers, 1970). These amino acids are synthesized from the central metabolites α-ketoglutarate, pyruvate, and 3-phosphoglycerate, respectively. For brevity, the four most prevalent amino acids that can be derived from protein hydrolysis were chosen for demonstration purposes. They were alanine, glutamate/glutamine, aspartate/asparagine, and glycine. These four amino acids are synthesized from pyruvate, α-ketoglutarate, oxaloacetate, and 3-phosphoglycerate, respectively.

Based on metabolic origin, alanine and glycine are somewhat redundant. Glycine is derived from 3-phosphoglycerate, which maps carbon-to-carbon into alanine’s precursor, pyruvate. Thus, both alanine and glycine are derived from similarly labeled 3-carbon intermediates in glycolysis. However, the spectra and isotopomer possibilities differ for a 2-carbon (glycine) versus a 3-carbon compound (alanine). Overall, the analyte set can be viewed as covering the top three abundant amino acids in the cytosol, and all three are diverse in metabolic origin. For the top four abundant amino acids in proteins, three are diverse in origin and two (alanine and glycine) both report on the labeling of 3-carbon compounds in glycolysis. Thus, the screening results can be viewed as a report on the overall utility of different analytes as well as a means for differentiating between two analytes that report on the labeling of similarly labeled glycolytic metabolites. There is also overlap between the cytosol- and protein hydrolysis-derived analyte candidates. However, the relative abundances and thus signal strengths of each analyte will differ in a cytosol- versus a protein hydrolysis-derived sample. As will be seen later, abundance difference can affect the utility of an analyte. Finally, other amino acids could be included in the computational screens that follow, such as the much less abundant aromatic amino acids that are synthesized from erythrose-4-phosphate.

3.3. Analyte screen 1: how do analyte spectra contrast over entire feasible flux space?

Unlike the “test” problem, different precursor (i.e., glucose) labelings are possible. Quantitatively, detectable contrast is provided by high 13C enrichment (i.e., significant NMR line intensity) and how the allowable fluxes variably enrich different carbons in the analyte. Thus, glucose labeling was considered in parallel with extreme point comparisons when determining the contrast powers of the analytes. Singly (from 1 through 6) and uniformly labeled glucose were evaluated. The NMR spectra for the candidate analytes were simulated for the two extreme point solutions (corresponding to α1 = 1 and α1 = 0).

For 1-13C and 2-13C labeling of glucose, it was found that the extreme point spectra were notably different, while the other labels provided very little contrast. For example, glutamate spectra (three internal carbons) are shown in Fig. 5 for when 1-13C, 2-13C, or U-13C labeled glucose is used. When 1-13C and 2-13C labeled glucose is used, the spectra corresponding to the extreme point solutions can easily be distinguished by eye. The same occurs for the other analytes (not shown). The difference between glutamate spectra is difficult to discern by eye when U-13C labeled glucose is used (Fig. 5), as is also the case for other analytes (not shown). Glutamate spectra that are difficult to differentiate also arise in both of the extreme cases when 3-, 4-, 5-, or 6-13C labeled glucose is used (not shown).

Fig. 5. NMR spectra of glutamate when derived from 1-13C, 2-13C, or U-13C labeled glucose. [Plots not reproduced: one panel per labeling for each of α1 = 1 and α1 = 0; horizontal axis 20–60 ppm; peaks labeled C2, C3, and C4.] Comparison of the extreme point spectra (corresponding to α1 = 0 and 1) shows that 1-13C glucose provides the highest contrast, followed by 2-13C glucose, while using U-13C glucose results in significantly more spectral lineforms, but hard to discern contrast.

In order to quantify the above observations, the spectra were vectorized as was previously described. The results are shown in Table 4. As far as comparing extreme point spectra is concerned, using 1-13C labeled glucose endows glutamate, alanine, and aspartate with better contrast, while using 2-13C labeled glucose provides better contrast for glycine. Using U-13C labeled glucose, however, was found to provide nil contrast between the two extreme points for all the analytes considered.
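As a rough illustration of how two vectorized spectra can be contrasted, the sketch below computes the angle between two intensity vectors. It assumes that the angular deviation is the angle, in radians, between the vectors, and that both spectra are sampled on the same grid; the exact vectorization is described earlier in the paper and is not reproduced here. The function name and the example vectors are hypothetical.

```python
import math


def angular_deviation(spec_a, spec_b):
    """Angle in radians between two spectra treated as intensity vectors.

    Assumption: both spectra are sampled at the same chemical-shift positions.
    """
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must be sampled on the same grid")
    dot = sum(a * b for a, b in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(a * a for a in spec_a))
    norm_b = math.sqrt(sum(b * b for b in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a spectrum with no signal has no direction")
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


# Hypothetical line intensities at four shared peak positions.
spectrum_alpha1_is_1 = [0.20, 0.05, 0.10, 0.00]
spectrum_alpha1_is_0 = [0.05, 0.15, 0.10, 0.02]
contrast = angular_deviation(spectrum_alpha1_is_1, spectrum_alpha1_is_0)
```

Under this definition the measure is insensitive to overall signal strength: scaling one spectrum by a constant leaves the angle unchanged, so strength has to be assessed separately.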

Table 4
Performance comparison of candidate NMR analytes when 1-13C, 2-13C, or U-13C labeled glucose is used. "Angular deviation" is the angular deviation of the vectorized extreme point spectra.

Analyte     Angular deviation               Sum(NMR), 1-13C     Sum(NMR), U-13C     Sum(NMR), 2-13C
            1-13C    U-13C     2-13C        α1 = 1   α1 = 0     α1 = 1   α1 = 0     α1 = 1   α1 = 0
Glutamate   0.486    2.70E-06  0.1629       0.2193   0.3985     0.75     0.7476     0.2063   0.2669
Alanine     0.0908   9.44E-05  0.0784       0.1403   0.2428     0.9999   0.9996     0.1889   0.2498
Glycine     0.046    8.50E-04  0.0886       0.1403   0.2347     0.9999   0.9986     0.1908   0.2498
Aspartate   0.5536   1.90E-06  0.1564       1.0034   0.5681     1.7497   1.7483     0.1998   0.2651

Table 5
Performance comparison of candidate NMR analytes when different mixtures of 1-13C/U-13C labeled glucose are used. Column headings give the 1-13C:U-13C ratio.

Analyte     Angular deviation                   Sum(NMR), 0.25:0.75   Sum(NMR), 0.5:0.5     Sum(NMR), 0.75:0.25
            0.25:0.75   0.5:0.5   0.75:0.25     α1 = 0   α1 = 1       α1 = 0   α1 = 1       α1 = 0   α1 = 1
Glutamate   0.0473      0.1091    0.2135        0.6943   0.713        0.5885   0.6331       0.4299   0.5083
Alanine     0.0335      0.0939    0.1778        0.792    0.8101       0.5794   0.6156       0.3622   0.4161
Glycine     0.0342      0.0958    0.1814        0.792    0.8082       0.5794   0.612        0.3622   0.411
Aspartate   0.102       0.2292    0.2987        1.5803   1.5664       1.4002   1.3625       1.2078   1.1368

1 Stable Isotopes 1997–1998, Cambridge Laboratories Inc.

Label selection can be viewed from another vantage point. The spectral strength can be defined as the sum of the line intensities present in the analyte’s NMR spectrum. From this standpoint, U-13C glucose produces the best spectra in both the extreme cases (α1 = 0 and α1 = 1). This result follows intuitively from the fact that U-13C glucose yields the lowest fraction of unlabeled analyte; hence, greater total NMR signal will be yielded per quantity of analyte. Moreover, more analyte carbons will become labeled despite the activity of CO2-evolving reactions that expel 13C from the cell. For example, when 1-13C glucose is used, the reaction 1-13C glucose 6P → ribulose 5P + 13CO2 removes 13C from the system that would otherwise label amino acid analytes. Thus, a trade-off exists between spectrum contrast and strength.

One way to resolve the trade-off would be to use a mixture of 1-13C and U-13C labeling (i.e., the winning labels in the two categories). The effect of label mixing can be quantified by computing the analytes’ IDVs when 1-13C or U-13C glucose is used, and then taking an average of the IDVs that is weighted according to the proportions of 1-13C and U-13C glucose. Based on the weighted IDVs and line splitting rules, a spectrum can be calculated. Table 5 summarizes the results when different ratios of 1-13C:U-13C labeling are used. The distinction between extreme points is better when there is a higher fraction of 1-13C label in the mixture (e.g., 0.2135 with 75% 1-13C as opposed to 0.0473 with 25% 1-13C in the case of glutamate). However, the spectral strength decreases as the fraction of 1-13C labeling grows.
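The weighted averaging of IDVs can be sketched as follows. The IDV values, the isotopomer ordering, and the function name `mix_idvs` are hypothetical; a two-carbon analyte is used only to keep the vectors short, and the subsequent application of line splitting rules is not shown.

```python
def mix_idvs(idv_from_1c, idv_from_u, frac_1c):
    """Weighted average of two isotopomer distribution vectors (IDVs).

    frac_1c is the fraction of 1-13C glucose in the feed; the remainder is
    U-13C glucose. Each IDV holds mole fractions that sum to one.
    """
    if len(idv_from_1c) != len(idv_from_u):
        raise ValueError("IDVs must describe the same isotopomers")
    if not 0.0 <= frac_1c <= 1.0:
        raise ValueError("frac_1c must lie in [0, 1]")
    return [frac_1c * a + (1.0 - frac_1c) * b
            for a, b in zip(idv_from_1c, idv_from_u)]


# Hypothetical IDVs for a two-carbon analyte; isotopomers are ordered as
# [unlabeled, C2 labeled, C1 labeled, both labeled] (assumed ordering).
idv_1c = [0.70, 0.25, 0.05, 0.00]
idv_u = [0.10, 0.05, 0.05, 0.80]

mixed = mix_idvs(idv_1c, idv_u, frac_1c=0.75)  # 75/25 mixture of 1-13C/U-13C
```

Since each input sums to one and the weights sum to one, the mixed IDV is again a valid set of mole fractions.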

Based on the results in Table 5, one way to select the glucose labeling is to maximize contrast. Another way is to use “enough” 1-13C glucose to provide “sufficient” contrast, and add U-13C glucose to provide stronger and thus more usable spectral features. Using glutamate as an example, the different results obtained when the former and latter approaches are used are shown in Figs. 5 (top) and 6, respectively. Using 1-13C glucose (Fig. 5 top) results in a simple spectrum with a few strong features that significantly change between extreme points. Using a 25/75 mixture of U-13C and 1-13C glucose results in a larger number of strong features that significantly change between extreme points. Both approaches could work in terms of yielding fluxes, but the approach that provides more “varying and above the noise data points” could be argued to provide better flux results. For example, the effect of random noise on the ratio of the intensity of strong and weaker lines could skew the results more when there are few line features to analyze.

Interestingly, Fischer et al. (2004) used U-13C and 1-13C labeled glucose in two separate experiments and concluded that a comparable 20/80 mixture of U-13C/1-13C glucose allowed one to resolve the metabolic network of E. coli MG1655 strain with high resolution. It is also interesting to note that 1-13C and U-13C are the least expensive labelings of glucose (footnote 1). Based on the results (Figs. 5 (top) and 6) and prior experimental work (Fischer et al., 2004), the subsequent analyses focused on refining the selection of analytes when a 75/25 mixture of 1-13C/U-13C labeled glucose is used.

Fig. 6. NMR spectra of glutamate when derived from a 75/25 mixture of 1-13C and U-13C glucose. [Plots not reproduced: one panel each for α1 = 0 and α1 = 1; horizontal axis 20–60 ppm; peaks labeled C2, C3, and C4.] Comparison of the extreme point spectra (corresponding to α1 = 0 and 1) to that of glutamate derived from 1-13C glucose (see Fig. 5) shows that the mixture provides more lineforms that change between the extreme points.

Note that using a 25/75 mixture of U-13C/1-13C glucose will provide high contrast power and spectral strength for three of the four analyte candidates. Glycine was “sacrificed” to be suboptimal because, with the chosen glucose labeling, (1) high spectral strength and contrast power are obtained for the majority of the candidate analytes and (2) that majority reports on the labeling of different metabolites (i.e., pyruvate, α-ketoglutarate, and oxaloacetate). Overall, the experimental design used here emphasizes the reporting ability of analytes derived from different metabolites, while retaining some contrast power and spectral strength for an analyte that provides redundant information on the labeling of 3-carbon compounds in glycolysis. Similar computations could enable the implementation of an alternative experimental design (Mollney et al., 1999).

3.4. Analyte screen 2: do analytes invert differently to fluxes?

All the candidate molecules were subjected to the “multiple flux distribution from NMR data test.” The spectra from the signal molecules all inverted to an α1 = 0.4 and α2 = 0.6 combination of the extreme point flux distributions when labeling is provided by a 75/25 mixture of 1-13C/U-13C labeled glucose. Multiple flux distributions do not arise in this case upon the inversion of NMR data because, even though the E. coli problem is significantly larger than the prior example, several extracellular measurements are available. The degrees of freedom are thereby reduced in the NLP of the inverse problem, which guides the solution to the optimal α values.
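The idea of inverting a spectrum to α1 can be illustrated with a deliberately simplified toy. The sketch assumes that the spectrum varies linearly between the two extreme point spectra, which is a strong simplification; the actual inverse problem is a nonlinear program solved with a global optimizer. All spectra and names below are hypothetical.

```python
def fit_alpha1(observed, spec_at_alpha1_1, spec_at_alpha1_0):
    """Least-squares estimate of alpha1 under an assumed linear forward model.

    Model (an assumption for illustration only):
        spectrum(alpha1) = alpha1 * spec_at_alpha1_1 + (1 - alpha1) * spec_at_alpha1_0
    The minimizer of the sum of squared residuals has a closed form.
    """
    diff = [a - b for a, b in zip(spec_at_alpha1_1, spec_at_alpha1_0)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Zero contrast between extreme points: alpha1 cannot be identified.
        raise ValueError("extreme point spectra are identical")
    numer = sum(d * (o - b) for d, o, b in zip(diff, observed, spec_at_alpha1_0))
    return min(1.0, max(0.0, numer / denom))


# Hypothetical extreme point spectra (line intensities at shared positions).
s1 = [0.20, 0.05, 0.10]
s0 = [0.05, 0.15, 0.10]

# Synthetic "measured" spectrum generated at alpha1 = 0.4.
measured = [0.4 * a + 0.6 * b for a, b in zip(s1, s0)]
alpha1_hat = fit_alpha1(measured, s1, s0)
```

The failure branch mirrors the first screen: when the extreme point spectra coincide, no amount of fitting can recover α1.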

3.5. Analyte screen 3: economics-related overall utility index

While all the candidate analytes were found to yield spectra that reliably invert to fluxes, the analytes differ in contrast power. The analytes also differ in abundance. Low abundance and/or spectral strength is problematic because signal to noise will be low, and thus little information for flux resolution will be provided unless effort is expended for sample concentration and/or many free induction decays are collected. Both add to the cost of experimentation. Long-term data acquisition can be especially costly because signal to noise increases only with the square root of the number of free induction decays. Throughput through an NMR facility is also lowered.

To introduce the relative cost of NMR experimentation time, we assume that the inverse of NMR time depends on the product of analyte abundance and spectral strength. Analyte abundance is now introduced as opposed to using only the spectral strength because the spectral strength was based on an analyte’s IDV. An IDV records the mole fractions of isotopomers, which must sum to one. Thus, the spectra (e.g., Figs. 5 top and 6) and spectral strengths (Table 5) computed from the information in IDVs are all on the same concentration basis. Therefore, scaling further by relative analyte abundance attempts to further distinguish the analytes based on the magnitude scale the experimentalist would actually “see.” An overall utility index can then be defined as data quality (i.e., contrast power) per experiment cost (measure of NMR instrument time requirement). This index is provided by the product of the three terms: (1) contrast power of an analyte, (2) spectral strength, and (3) an analyte’s relative abundance.

Table 6
Combination of NMR resolution and amino acid abundance of candidate analytes when 75/25 1-13C/U-13C labeled glucose is used (a)

Columns: (1) pool-free content; (2) relative frequency in proteins; (3) extreme point NMR comparison; (4) average sum of NMR peaks; (1)×(3)×(4) combination based on cytosolic content; (2)×(3)×(4) combination based on relative frequency in proteins.

Amino acid   (1)       (2)       (3)      (4)      (1)×(3)×(4)   (2)×(3)×(4)
Glutamate    72.5000   10.8100   0.2135   0.4691   7.2611        1.0827
Alanine      17.5000   13.0200   0.1778   0.3892   1.2108        0.9009
Glycine      3.7500    7.8100    0.1814   0.3866   0.2630        0.5477
Aspartate    0.0000    9.9000    0.2987   1.1723   0.0000        3.4666

(a) Abundance data obtained from Tempest and Meers (1970).

2 Multiple Global Optima Test (MGOT) has not been included as a criterion in AHP, because all the candidates successfully passed this test. In the event that the analytes perform differently in this test, MGOT should be included as a criterion with the appropriate weights.

The alternative to using linear scaling to develop an overall utility index is to contrast analytes on the basis of similar signal to noise ratio (SNR). That is, one can compare analytes by contrast power/time (i.e., cost) to achieve similar SNR. As before, one would expect that as abundance and/or spectral strength decrease, more time (i.e., money) will be required to acquire a usable spectrum. However, the statistics of signal averaging introduces a non-linearity between SNR and time allotted to spectrum acquisition. Here, SNR depends on the square root of the number of free induction decays. Because the number of free induction decays is proportional to total instrument time, an alternative version of the utility index is (contrast power) × (spectral strength × relative abundance)^2. When either version of the overall utility index is used, the same rank order for the analytes results.
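The square-root dependence of SNR on the number of free induction decays has a simple practical consequence, sketched below. The function name and the numbers are hypothetical; the only relation used is the one stated above, that SNR scales with the square root of the number of free induction decays.

```python
import math


def scans_for_target_snr(current_scans, current_snr, target_snr):
    """Free induction decays needed to reach target_snr.

    Uses SNR proportional to sqrt(number of scans), so the required scan
    count grows with the square of the desired SNR improvement.
    """
    if current_scans <= 0 or current_snr <= 0 or target_snr <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(current_scans * (target_snr / current_snr) ** 2)


# Doubling SNR costs four times the scans (and thus instrument time).
doubled = scans_for_target_snr(current_scans=100, current_snr=10.0, target_snr=20.0)

# An analyte with half the signal per scan needs four times the scans
# to match the SNR of the stronger analyte.
weak_analyte = scans_for_target_snr(current_scans=64, current_snr=5.0, target_snr=10.0)
```

This quadratic cost is why the alternative utility index squares the product of spectral strength and relative abundance.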

The abundance of amino acids in the cytosol of E. coli (Tempest and Meers, 1970) and within proteins (Lehninger, 1972) is shown in Table 6. Using the aforementioned utility index, the results in Table 6 indicate that glutamate emerges as the best choice for an analyte when NMR analytes are obtained via cytosolic extraction. Alanine is ranked second followed by glycine. In the case of protein hydrolysis-derived analytes, aspartate emerges as the best followed in rank order by glutamate, alanine, and glycine.
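Both versions of the utility index can be recomputed from the Table 6 entries, as in the sketch below. The numerical inputs are copied from Table 6; the dictionary layout and function names are assumptions made for illustration.

```python
# Table 6 inputs per analyte:
# (cytosolic content, frequency in proteins, contrast power, spectral strength)
TABLE6 = {
    "Glutamate": (72.5, 10.81, 0.2135, 0.4691),
    "Alanine": (17.5, 13.02, 0.1778, 0.3892),
    "Glycine": (3.75, 7.81, 0.1814, 0.3866),
    "Aspartate": (0.0, 9.90, 0.2987, 1.1723),
}


def linear_index(abundance, contrast, strength):
    """Contrast power x spectral strength x relative abundance."""
    return contrast * strength * abundance


def squared_index(abundance, contrast, strength):
    """Contrast power x (spectral strength x relative abundance)^2."""
    return contrast * (strength * abundance) ** 2


def rank(index_fn, abundance_column):
    """Analyte names sorted from highest to lowest index.

    abundance_column: 0 for cytosolic content, 1 for frequency in proteins.
    """
    scores = {
        name: index_fn(row[abundance_column], row[2], row[3])
        for name, row in TABLE6.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


cytosol_rank = rank(linear_index, 0)
protein_rank = rank(linear_index, 1)
```

With these inputs, `rank(squared_index, 0)` and `rank(squared_index, 1)` return the same orderings as the linear index, in line with the statement that either version yields the same rank order.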

3.6. Alternative analyte selection using the AHP

The previously described manner of ranking analytes entailed posing a multifactor objective function, and then attempting to maximize the function. Situations can conceivably arise, however, where different experimentalist- or problem-dependent values are placed on the various aspects of resolving fluxes from NMR data. For example, NMR facilities may be scant so minimizing facility usage is a priority and sufficient versus maximal flux resolution is acceptable. In this case, one could pose a different objective function or focus entirely on relative analyte abundance and spectral strength. An alternative approach that retains all information while allowing different relative emphases to be introduced is provided by the AHP (Saaty, 2000), which is briefly described in Appendix A. This section illustrates the use of the AHP for the purpose of ranking NMR analytes.

AHP is based on the innate human ability to use information, experience, and/or individual preferences to estimate relative magnitudes through paired comparisons. These comparisons are used to construct ratio scales on a variety of dimensions, both tangible and non-tangible. Thus, the method proceeds from judgments on comparisons with respect to dominance (the generic term for expressing importance, preference, or likelihood) of a property that the compared items have in common, to their numerical representation according to the strength of that dominance, and then derives a ratio scale. Arranging these dimensions in a hierarchic or network structure allows a systematic procedure to organize our basic reasoning and intuition by breaking down a problem into its smaller constituent parts. The AHP thus leads from simple pairwise comparison judgments to the priorities in the hierarchy.

In this problem, the following criteria (see footnote 2) have been considered while ranking the NMR analytes:

(i) extreme point comparison (ExP),
(ii) abundance data (Abun) based on both cytosolic content (cc) and frequency in proteins (fP),
(iii) sum × abundance data.

In the first step, the criteria are compared in pairs, e.g., sum × abundance data is considered seven times more important than ExP and nine times more important than Abun. Once all the pairwise comparisons in a group are completed, a scale of relative priorities is derived from them. This process is repeated for all groups on all levels (i.e., the analytes).

Next, each criterion is assigned a weight relative to a node in the next higher level, e.g., both cytosolic content and frequency in proteins are considered to contribute equally to the net abundance criterion. Since each of these nodes carries only its priority of the unit goal, the derived scale is suitably transformed through multiplication by the weights of the criteria so that each alternative receives its portion of the unit goal.

Two separate hierarchical analyses have been considered, one considering frequency of amino acids in proteins (Table 7) while the other takes into account cytosolic content (Table 8). The extreme point criterion is, however, the same in both cases.
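The step from pairwise comparisons to a scale of relative priorities can be sketched as follows. Two things here are assumptions rather than statements from the text: the ExP-versus-Abun judgment of 5, which is not given above, and the use of the row geometric mean, a common approximation to the principal-eigenvector priorities used in the AHP. The value 5 is used because, together with the stated judgments of 7 and 9, it gives weights close to those listed in Tables 7 and 8.

```python
import math

CRITERIA = ["Sum x abundance", "ExP", "Abun"]

# Reciprocal pairwise comparison matrix: entry [i][j] states how many times
# more important criterion i is than criterion j.
# 7 and 9 come from the text; 5 (ExP vs. Abun) is an ASSUMED judgment.
PAIRWISE = [
    [1.0, 7.0, 9.0],
    [1.0 / 7.0, 1.0, 5.0],
    [1.0 / 9.0, 1.0 / 5.0, 1.0],
]


def priorities(matrix):
    """Approximate AHP priorities via normalized row geometric means."""
    size = len(matrix)
    geometric_means = [math.prod(row) ** (1.0 / size) for row in matrix]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


weights = priorities(PAIRWISE)
weight_by_criterion = dict(zip(CRITERIA, weights))
```

The same routine would be applied at the analyte level, where each criterion gets its own pairwise matrix comparing the candidate analytes.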

Table 7
Implementation of the Analytic Hierarchy Process for the selection of NMR analytes (considering frequency of amino acids in proteins only)

Analytes              Sum × abun (fP)   ExP      Abun (fP)   Overall priorities   Overall rank
(criterion weights)   0.772             0.1734   0.0545
Glutamate             0.1875            0.3650   0.2718      0.2229               2
Alanine               0.1875            0.0795   0.2718      0.1733               3
Aspartate             0.5625            0.3650   0.2718      0.5124               1
Glycine               0.0625            0.0314   0.1017      0.0592               4

Table 8
Implementation of the Analytic Hierarchy Process for the selection of NMR analytes (considering abundance of amino acids in cytosol only)

Analytes              Sum × abun (cc)   ExP      Abun (cc)   Overall priorities   Overall rank
(criterion weights)   0.772             0.1734   0.0545
Glutamate             0.6430            0.3650   0.4846      0.5861               1
Alanine               0.2683            0.0795   0.3360      0.2392               2
Aspartate             0.0372            0.3650   0.0448      0.0945               3
Glycine               0.0515            0.0314   0.0448      0.0477               4

Thus, in the final tabulation, the decision-maker judges the importance of each criterion in pairwise comparisons and the outcome is a prioritized ranking of each alternative. Tables 7 and 8 show the performance of each NMR analyte according to the abovementioned criteria. Aspartate ranks first when considering frequency in proteins while glutamate ranks first when abundance in cytosolic content is taken into account. This ranking is consistent with the results of our structured methodology used in the previous sections.
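The overall priorities in Tables 7 and 8 are consistent with a weighted sum of each analyte's local priorities, which the sketch below recomputes. The numbers are copied from the two tables; the variable and function names are hypothetical.

```python
# Criterion weights from Tables 7 and 8: (Sum x abun, ExP, Abun).
CRITERION_WEIGHTS = (0.772, 0.1734, 0.0545)

# Local priorities per analyte, in the same criterion order.
TABLE7_PROTEINS = {
    "Glutamate": (0.1875, 0.3650, 0.2718),
    "Alanine": (0.1875, 0.0795, 0.2718),
    "Aspartate": (0.5625, 0.3650, 0.2718),
    "Glycine": (0.0625, 0.0314, 0.1017),
}
TABLE8_CYTOSOL = {
    "Glutamate": (0.6430, 0.3650, 0.4846),
    "Alanine": (0.2683, 0.0795, 0.3360),
    "Aspartate": (0.0372, 0.3650, 0.0448),
    "Glycine": (0.0515, 0.0314, 0.0448),
}


def overall_priorities(local_priorities, weights=CRITERION_WEIGHTS):
    """Weighted sum of local priorities for every analyte."""
    return {
        name: sum(w * p for w, p in zip(weights, scores))
        for name, scores in local_priorities.items()
    }


def ranked(priority_by_analyte):
    """Analyte names from highest to lowest overall priority."""
    return sorted(priority_by_analyte, key=priority_by_analyte.get, reverse=True)


protein_overall = overall_priorities(TABLE7_PROTEINS)
cytosol_overall = overall_priorities(TABLE8_CYTOSOL)
```

Passing a different `weights` tuple to `overall_priorities` gives a quick way to check how sensitive the ranking is to the experimenter's judgments.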

It is important to note, however, that the relative weights are subjective and are at the discretion of the experimenter. Hence, changing their values might lead to a change in the overall ranking of the candidates. In this particular instance, however, glutamate and aspartate lead their respective categories by a wide margin, so they are unlikely to be ‘dethroned’ if the weights are changed.

The advantage of the AHP is that the structure of the problem represented in the hierarchy can be easily extended to include more comparison criteria and/or more metabolites. Further, the judgment process is so simple that the decision makers are in command of the problem as they see it.

3.7. Computational characteristics for reversion

Glutamate was found to provide high contrast and resolving power. To conclude, we report on the computational characteristics of inverting a glutamate spectrum to fluxes using the methods described in this paper. A least-square-like objective function (akin to Eq. (2a)) was used to reconcile the NMR spectra of glutamate to the fluxes. When BARON was used for the optimization, the fluxes shown in Fig. 7 were obtained, where the values of some important fluxes are indicated numerically while those with zero flux values have been removed. The problem had 910 variables and 897 constraints, and took ~30 min to solve.

4. Discussion

A structured and problem-centric framework for screening metabolites for their potential use as NMR analytes for flux identification has been described. The first step entails probing the contrast power of each analyte candidate over the whole feasible flux space. Having MILP solutions in hand facilitates this step. Such solutions provide the flux bounds a network can attain when varying amounts of extracellular data exist. Moreover, the labeling of a precursor such as glucose can be considered in this step such that contrast and the desired data type (e.g., few strong lines vs. many strong spectral lines) are obtained. Next, the analytes are evaluated on the ability of their NMR spectra to invert to a unique flux distribution. Here, it is of interest to determine in advance whether an analyte’s NMR spectrum could revert to clonal solutions that are characterized by highly similar values of an objective function. Determining this prospect in advance can facilitate how analyte sets are constructed as well as illuminate what uncertainties may lie within an experimental approach. Appendix B summarizes the numerical recipes used.

Fig. 7. Metabolic flux map of E. coli (diagram not reproduced). The numbers associated with some important fluxes are indicated, while pathways with zero flux values have been removed from the network. Flux values labeled in the figure: r1 = 4.1, r2 = 1.8, r3 = 3.0, r14 = 2.4, r19 = 1.2, r20 = 1.2, r22 = 0.8, r26 = 2.1, r28 = 0.8, r32 = 1.8, r49 = 1.0, r53 = 0.9. The abbreviations for the species can be found in the legend for Fig. 4.

An alternative exists to finding clonal solutions, where the clonal solutions correspond to different flux vectors that satisfy the objective function within a set, small tolerance. The MILP-determined bounds of a particular flux can instead be contracted further for different analytes and/or precursor labelings. A possible formulation of the ‘bound contraction’ problem involves min/max v_i (where v_i is the ith flux value), and an additional constraint in the form of

$$\sum_i \left(\mathrm{NMR}_{i,\mathrm{analyte}} - \mathrm{NMR}_{\mathrm{expt},\mathrm{analyte}}\right)^2 \le \varepsilon.$$

A rather similar implementation using BARON would result. The results one would obtain, however, would be the relative variation of an individual flux (r_i) that is consistent with the NMR data to some tolerance. The variation could be expressed as a relative deviation of individual fluxes Δr_i/⟨r_i⟩, where ⟨r_i⟩ is a midpoint value. For the problems investigated in this paper, both finding clonal solutions and individual flux variations lead to the same ranking of analytes.
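The relative deviation of an individual flux can be illustrated with a short sketch. It assumes that the variation is the width of the contracted interval and that the midpoint is the average of the two bounds; the function name and the example bounds are hypothetical.

```python
def relative_variation(lower_bound, upper_bound):
    """Interval width divided by interval midpoint for a single flux.

    Assumes the variation is (upper - lower) and the midpoint value is the
    average of the two contracted bounds.
    """
    if upper_bound < lower_bound:
        raise ValueError("upper_bound must not be below lower_bound")
    midpoint = 0.5 * (lower_bound + upper_bound)
    if midpoint == 0.0:
        raise ValueError("relative variation is undefined for a zero midpoint")
    return (upper_bound - lower_bound) / midpoint


# A flux contracted to the interval [1.0, 1.2] (in flux units).
example = relative_variation(1.0, 1.2)
```

For the interval [1.0, 1.2] the midpoint is 1.1 and the relative variation is about 0.18; an analyte that contracts the bounds further gives a smaller value.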

There are positives and negatives with such an alternative formulation. The positive is that the results are quantified in a way many experimentalists may more naturally think about data (e.g., relative error = variation/mid-interval value). The negative is that more computations are involved, and one is mainly getting a refinement of information already in hand from the MILP, namely the bounds on individual fluxes. In contrast, the method presented in this paper finds alternate flux vectors, which is significant because the experimentalist is sometimes more concerned with (i) resolving disparate trafficking patterns/mechanisms as opposed to (ii) whether a particular flux is 1.0 or 1.2 mmol/g h. The former implies radically different regulatory mechanisms while the latter is usually related to measurement error. Thus, both approaches have merit, and similar global optimization strategies are used where flux bounds are established beforehand, and solutions that yield similar values of an objective within a tolerance are sought.

Some examples of how economics can be introduced togenerate cost–benefit scenarios were also presented. Oneexample sought to determine an effective yet a minimal andeasily isolated analyte set. Another example first found thatthe least expensive forms of labeled glucose providedsignificant contrast and resolving power, and then analytescould be ranked with an overall index that considered theNMR facility time requirement. Finally, as an alternativeto posing an arbitrary objective function, we presented howa relative value-based scheme, the AHP, can also be used torank analytes. It is important to note that both rankingmethodologies used the same numerical inputs from thescreens.

When these screens were applied to the case of an E. coli mutant system, glutamate emerged as a highly useful NMR analyte in that it can provide high flux resolution per unit of experimental cost. This result is interesting from numerous standpoints. First, glutamate has desirable NMR properties. Glutamate’s three central carbons have equivalent spin-lattice relaxation times and Nuclear Overhauser Enhancement effects. Thus, signal magnitudes from each of the three internal carbons are directly comparable to one another because saturation affects each nucleus equally. Secondly, glutamate is readily isolated by cytosolic extraction due to its high abundance. Glutamate is also a prevalent amino acid residue in proteins. Overall, glutamate provides a strong and easily interpreted signal. For the particular E. coli problem examined, glutamate NMR data also inverted reliably to the correct fluxes.

The candidate analytes were chosen based on high abundance in the cytosol or within proteins, and metabolic origin. Abundance was deemed desirable because less strenuous sample preparation would be required to obtain analyte spectra that exhibited significant signal-to-noise. Concerning origin, the candidate set attempted to cover most of the amino acid families as a starting point for demonstration. Also, an analyte such as glutamate is often qualitatively viewed as lying “deep” within metabolism with the result that ample flux history is “recorded” as the 13C nuclei within labeled glucose traverse the branching, converging, and molecular rearranging reactions of glycolysis, the Krebs cycle, and other pathways. The simple reasoning applied to origin could exclude some useful analytes, and “deep” is a subjective notion.

The recent application of the “Kevin Bacon Test” (Fell and Wagner, 2000) to the structure of the E. coli core metabolism suggests some potentially interesting overlaps may exist between the test’s conclusions and the NMR analyte selection problem. Fell and Wagner (2000) found that glutamate lies at the core of E. coli metabolism as indicated by the minimum separation distance between it and other metabolites. Alanine scored second in minimal separation distance.

Interestingly, glutamate and alanine are first and second in abundance in the E. coli cytosol, and within proteins they are the top two residues when glutamine is scored as equivalent to glutamate. Thus high abundance and minimal separation distance correlate. Here, the correlation to abundance may reflect the development of high and coordinated mass action potentials that can drive connected reactions (e.g., Majewski and Domach, 1989).

One speculative connection between “degree of separation” and NMR is that, because glutamate and alanine are closely connected to all other metabolites, the origin and thus the labeling of glutamate and alanine are highly representative of the network’s reaction processes. Overall, the connected feature plus their abundance may underlie why glutamate and alanine score and perform well when used as NMR analytes. The connected nature of glutamate and alanine may also suggest that the set of analytes used for demonstration purposes is a reasonable starting point. Scrutinizing these possible connections further might entail editing out transaminase reactions from a “degree of separation analysis.” Transaminase reactions transfer amino groups between different carbon skeletons as opposed to altering or building new carbon skeletons, which shuffles 13C and creates isotopomers.
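As a minimal illustration of the “degree of separation” notion, the sketch below computes the mean shortest-path distance from each metabolite to all others by breadth-first search. The five-node graph is a hypothetical toy network built only for illustration; it is not the E. coli network analyzed by Fell and Wagner (2000), and the function name `mean_separation` is an illustrative choice. The graph is assumed to be undirected and connected.

```python
from collections import deque

def mean_separation(graph, source):
    """Mean shortest-path length (in reaction steps) from source to every other node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:  # first visit gives the shortest distance
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    others = [d for name, d in dist.items() if name != source]
    return sum(others) / len(others)

# Hypothetical toy network (undirected adjacency lists), NOT real metabolic data.
toy_graph = {
    "glutamate": ["alanine", "aspartate", "2-oxoglutarate"],
    "alanine": ["glutamate", "pyruvate"],
    "aspartate": ["glutamate"],
    "2-oxoglutarate": ["glutamate"],
    "pyruvate": ["alanine"],
}

separations = {m: mean_separation(toy_graph, m) for m in toy_graph}
most_central = min(separations, key=separations.get)
```

In this toy case the node with the smallest mean separation is the hub of the graph; removing selected edges (for example, those standing in for transaminase reactions) and recomputing would show how the ranking responds.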

Acknowledgments

This publication is based upon work supported, in part, by EPA Grant R82 9589 and NSF Grants BES-0224603 and BES-0118961 in cooperation with the Interagency Program in Metabolic Engineering. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the EPA or the National Science Foundation.

Appendix A. Analytic Hierarchy Process (AHP)

The AHP is a procedure for ranking a given number of alternatives using a specified set of attributes. A key feature of the AHP is that it is based on performing qualitative pairwise comparisons of the alternatives and attributes using an arbitrary quantitative scale. In this appendix we present the simplest version of the AHP procedure.

Assume that we are given $N_A$ alternatives, $A_i$, $i = 1, \ldots, N_A$, that are to be ranked according to $N_C$ criteria or attributes, $C_j$, $j = 1, \ldots, N_C$. The steps are as follows:

1. For each attribute $C_j$, $j = 1, \ldots, N_C$:

  1.1 Define a matrix $B_j = [b^j_{ik}]$ for the alternatives, where the coefficient $b^j_{ik}$, $i, k = 1, \ldots, N_A$, represents a measure of the pairwise comparison between alternatives $i$ and $k$.

    (a) An arbitrary scale between 1 and 9 is commonly used to denote the following for upper triangular elements $b^j_{ik}$, $i < k$:
        $b^j_{ik} = 1$: alternative $i$ equally preferred over $k$;
        $b^j_{ik} = 3$: alternative $i$ moderately preferred over $k$;
        $b^j_{ik} = 5$: alternative $i$ strongly preferred over $k$;
        $b^j_{ik} = 7$: alternative $i$ very strongly preferred over $k$;
        $b^j_{ik} = 9$: alternative $i$ extremely strongly preferred over $k$.
        The numbers 2, 4, 6 and 8 are used when a compromise is in order. When alternative $k$ is preferred over $i$, use the inverse of the above values.

    (b) Set diagonal elements to $b^j_{ii} = 1$ and lower triangular elements $b^j_{ki} = 1/b^j_{ik}$, $i \le k$.

  1.2 The priority $a_{ij}$ for alternative $i$ under criterion $j$ is calculated as

$$a_{ij} = \frac{\sqrt[N_A]{\prod_{k=1}^{N_A} b^j_{ik}}}{\sum_{i=1}^{N_A} \sqrt[N_A]{\prod_{k=1}^{N_A} b^j_{ik}}}.$$

2. A weighting scheme for the attributes $C_j$ is developed as follows:

  2.1 Define matrix $D = [d_{jl}]$, $j, l = 1, \ldots, N_C$:

    (a) Assign scale between 1 and 9 for $d_{jl}$, $j < l$, in pairwise comparison of criteria $j$ and $l$ using scale as in step 1.1.a.

    (b) Set $d_{lj} = 1/d_{jl}$, $j \le l$.

  2.2 Compute weights $w_j$ of each attribute $j$ as follows:

$$w_j = \frac{\sqrt[N_C]{\prod_{l=1}^{N_C} d_{jl}}}{\sum_{j=1}^{N_C} \sqrt[N_C]{\prod_{l=1}^{N_C} d_{jl}}}.$$

3. For each alternative $i$ compute priority $p_i$:

$$p_i = \sum_{j} w_j a_{ij}.$$

4. The alternatives $i = 1, \ldots, N_A$ are ranked from highest to lowest value of the priority $p_i$.

As a simple example consider the case of 2 alternatives $A_1$, $A_2$, under criteria $C_1$, $C_2$. If the matrices $B_1$, $B_2$, $D$ are given as follows (where $B_i$ is the comparison of alternatives based on criterion $C_i$ and $D$ is the comparison of the criteria themselves):

$$B_1 = \begin{bmatrix} 1 & 7 \\ 1/7 & 1 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 1 & 1/3 \\ 3 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 5 \\ 1/5 & 1 \end{bmatrix},$$

then the priority vectors can be computed as: $a_1 = [0.875 \;\; 0.125]$ (for matrix $B_1$); $a_2 = [0.25 \;\; 0.75]$ (for matrix $B_2$); and $w = [0.8333 \;\; 0.1667]$ (for matrix $D$).

Therefore, $p_{A_1} = 0.7708$ and $p_{A_2} = 0.2292$. Hence, alternative $A_1$ is chosen over alternative $A_2$.
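The procedure and the two-alternative example above can be checked with a short script. The following is a minimal sketch of the simplest AHP version described in this appendix (normalized row geometric means); the function names `priorities` and `ahp_rank` are illustrative and do not come from any AHP software package.

```python
import math

def priorities(matrix):
    """Normalized geometric means of the rows of a pairwise comparison matrix."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def ahp_rank(B_list, D):
    """Overall priority p_i = sum_j w_j * a_ij for each alternative i."""
    w = priorities(D)                      # weights of the criteria
    a = [priorities(B) for B in B_list]    # a[j][i]: alternative i under criterion j
    n_alt = len(B_list[0])
    return [sum(w[j] * a[j][i] for j in range(len(B_list))) for i in range(n_alt)]

# Matrices from the example in this appendix.
B1 = [[1, 7], [1 / 7, 1]]
B2 = [[1, 1 / 3], [3, 1]]
D = [[1, 5], [1 / 5, 1]]

p = ahp_rank([B1, B2], D)
print([round(v, 4) for v in p])  # [0.7708, 0.2292]
```

The printed values agree with $p_{A_1}$ and $p_{A_2}$ above, so alternative $A_1$ ranks first.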

Appendix B. Summary of numerical recipes for screening analytes

Step 1: Formulate the metabolic network as a linear programming (LP) model by considering flux balances around each metabolite and any additional (linear) physiological constraints.

$$\begin{aligned} \min\ & Z = 0, \\ \text{subject to}\ & A r = b, \\ & r \ge 0. \end{aligned}$$

Step 2: A MILP/Depth First Search (DFS) formulation is adopted to enumerate all $K$ feasible extreme point solutions of this LP.
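To make the notion of an extreme point solution concrete, the sketch below enumerates the basic feasible solutions of $Ar = b$, $r \ge 0$ for a very small system. Note that this is a brute-force enumeration over column subsets, swapped in for illustration; it is not the MILP/DFS formulation used in this work and would not scale to realistic networks. The three-flux network (a fixed uptake split between two alternative routes) is hypothetical, $A$ is assumed to have full row rank, and exact rational arithmetic is used to avoid round-off.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(M, rhs):
    """Solve M x = rhs exactly by Gauss-Jordan elimination; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs[i])] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def extreme_points(A, b):
    """Enumerate basic feasible solutions of A r = b, r >= 0 (A assumed full row rank)."""
    m, n = len(A), len(A[0])
    found = set()
    for basis in combinations(range(n), m):
        sub = [[A[i][j] for j in basis] for i in range(m)]
        xb = solve_square(sub, b)
        if xb is None or any(v < 0 for v in xb):
            continue  # singular basis or infeasible (negative flux)
        r = [Fraction(0)] * n
        for j, v in zip(basis, xb):
            r[j] = v
        found.add(tuple(r))  # the set removes duplicates from degenerate bases
    return sorted(found)

# Hypothetical toy network: uptake r1 = 1, and r1 = r2 + r3 at the branch point.
A = [[1, 0, 0],
     [1, -1, -1]]
b = [1, 0]
points = extreme_points(A, b)  # two routings: all flux through r3, or all through r2
```

Each returned vector is one extreme flux routing; any feasible flux distribution of the toy network is a convex combination of them.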

Step 3: A set of $M$ 13C substrate labels as well as $N$ NMR signal molecules (based on abundance data) are identified a priori to be used as candidates in our selection framework.

Step 4 (analyte screen 1):

For i = 1 to M:
    For j = 1 to N:
        Simulate the intensity vectors of the NMR spectra for the 1st extreme point solution → $\mathrm{NMR}_{ij0}$.
        For k = 2 to K:
            Simulate the intensity vectors of the NMR spectra for the kth extreme point solution → $\mathrm{NMR}_{ijk}$.
            Compute $T_{ijk} = \cos^{-1}\left[ (\mathrm{NMR}_{ij0} \cdot \mathrm{NMR}_{ijk}) / (|\mathrm{NMR}_{ij0}|\,|\mathrm{NMR}_{ijk}|) \right]$ (radians)
        End for
        Compute $S_{ij} = \sum_{k=2}^{K} T_{ijk} / (K - 1)$
    End for
End for

The label–molecule combination(s) with the highest $S_{ij}$ (contrast power) perform best in Screen 1.

Another term, signal strength ($z_{ij}$), is also computed to be used in Step 6:

$$z_{ij} = \sum_{k=1}^{K} \mathrm{NMR}_{ijk} \Big/ K.$$
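The contrast power calculation of Step 4 can be sketched for a single label–molecule pair as follows. This is a minimal sketch, assuming the simulated spectra are already available as plain lists of intensities; the spectrum simulation itself is not shown, the three two-component vectors are synthetic placeholders rather than simulated NMR data, and the function names are illustrative.

```python
import math

def angle(u, v):
    """Angle in radians between two intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against round-off
    return math.acos(cosine)

def contrast_power(spectra):
    """Mean angle between the first spectrum (reference) and each remaining one.

    spectra[0] corresponds to the 1st extreme point solution; spectra[1:] to
    extreme points 2..K, so the mean is taken over K - 1 angles.
    """
    reference = spectra[0]
    angles = [angle(reference, s) for s in spectra[1:]]
    return sum(angles) / len(angles)

# Synthetic placeholder spectra for K = 3 extreme points (not real NMR data).
spectra = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
S = contrast_power(spectra)  # mean of pi/2 and pi/4
```

A larger value of `S` means the predicted spectra point in more distinct directions, i.e., the analyte discriminates better among the candidate flux scenarios.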

Step 5 (analyte screen 2): For a given analyte-label combination and some synthetic experimental data, the Inverse NLP is formulated akin to Eq. (2). The parameters numsol (number of feasible solutions to be found) and isoltol (separation distance between these solutions) in the BARON options file (baron.opt) are now altered in order to accommodate the multiple global solutions. The isoltol parameter has been kept at its default value (1e-4) in our case.

Those analyte-label combinations which give closely spaced global solutions are considered to perform best in Screen 2.


Step 6 (analyte screen 3): The Economic-related Overall Utility Index is formulated as: $\mathrm{OUI}_{ij} = S_{ij} \times z_{ij} \times$ (abundance of the $j$th signal molecule).

The analyte–label (i–j) combination(s) with the best OUI is (are) considered most suitable for the NMR experiment.
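A ranking by the Overall Utility Index of Step 6 can be sketched as below. All numerical values of contrast power, signal strength, and abundance are hypothetical placeholders chosen only to show the mechanics; the label names are likewise placeholders, and "best" is taken here to mean the largest index.

```python
def overall_utility(contrast, signal, abundance):
    """OUI = S_ij * z_ij * abundance of the j-th signal molecule."""
    return contrast * signal * abundance

# (label, analyte) -> (S_ij, z_ij, abundance); hypothetical values, not measured data.
candidates = {
    ("label_1", "glutamate"): (0.8, 0.5, 10.0),
    ("label_1", "alanine"): (0.6, 0.5, 5.0),
    ("label_2", "aspartate"): (0.9, 0.4, 1.0),
}

# Sort label-analyte pairs from highest to lowest OUI.
ranked = sorted(
    candidates,
    key=lambda pair: overall_utility(*candidates[pair]),
    reverse=True,
)
```

With these placeholder inputs, a highly abundant analyte can outrank one with greater contrast power, which is the trade-off the index is meant to expose.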

References

Fell, D.A., Wagner, A., 2000. The small world of metabolism. Nat. Biotechnol. 18, 1121–1122.

Fischer, E., Zamboni, N., Sauer, U., 2004. High throughput metabolic flux analysis based on gas chromatography–mass spectrometry derived 13C constraints. Anal. Biochem. 325 (2), 308–316.

Forbes, N.S., Clark, D.S., Blanch, H.W., 2001. Using isotopomer path tracing to quantify metabolic fluxes in pathway models containing reversible reactions. Biotechnol. Bioeng. 74 (3), 196–211.

Ghosh, S., Zhu, T., Grossmann, I.E., Ataai, M.M., Domach, M.M., 2005. Closing the loop between feasible flux scenario identification for construct evaluation and resolution of realized fluxes via NMR. Comput. Chem. Eng. 29, 459–466.

Goel, A., Lee, J.W., Domach, M.M., Ataai, M.M., 1995. Suppressed acid formation by co-feeding glucose and citrate in Bacillus cultures: emergence of pyruvate kinase as a potential metabolic engineering site. Biotechnol. Prog. 11, 380–386.

Lee, S., Phalakornkule, C., Ataai, M.M., Domach, M.M., Grossmann, I.E., 2000. Recursive MILP model for finding all the alternate optima in LP models for metabolic networks. Comput. Chem. Eng. 24, 711–716.

Lehninger, A.L., 1972. Biochemistry, sixth printing. Worth Publishers, New York, p. 93.

Majewski, R.A., Domach, M.M., 1989. Effect of regulatory mechanism on hyperbolic reaction network properties. Biotechnol. Bioeng. 36, 166–178.

Mollney, M., Wiechert, W., Kownatzki, D., de Graaf, A.A., 1999. Bidirectional reaction steps in metabolic networks IV: optimal design of isotopomer labeling experiments. Biotechnol. Bioeng. 66 (2), 86–103.

Phalakornkule, C., Fry, B., Koepsel, R., Ataai, M.M., Domach, M.M., 2000. 13C NMR evidence for pyruvate kinase flux attenuation underlying suppressed acid formation in B. subtilis. Biotechnol. Prog. 16, 169–175.

Phalakornkule, C., Lee, S., Zhu, T., Koepsel, R., Ataai, M.M., Grossmann, I.E., Domach, M.M., 2001. A MILP-based flux alternative generation and NMR experimental design strategy for metabolic engineering. Metab. Eng. 3, 124–137.

Saaty, T.L., 2000. Fundamentals of Decision Making and Priority Theory, second ed. RWS Publications, Pittsburgh, PA.

Sahinidis, N.V., Tawarmalani, M., 2002. GAMS/BARON 5.0: Global Optimization of Mixed-Integer Nonlinear Programs.

Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J., 1997. Modeling isotopomer distributions in biochemical networks using isotopomer mapping matrices. Biotechnol. Bioeng. 55, 831–840.

Schmidt, K., Carlsen, M., Nielsen, J., Villadsen, J., 1999. Quantitative analysis of metabolic fluxes in E. coli, using 2-dimensional NMR spectroscopy and complete isotopomer models. J. Biotechnol. 71, 175–189.

Siddiquee, K.A., Arauzo-Bravo, M.J., Shimizu, K., 2004. Effect of a pyruvate kinase (pykF-gene) knockout mutation on the control of gene expression and metabolic fluxes in E. coli. FEMS Microbiol. Lett. 235 (1), 25–33.

Tempest, D.W., Meers, J.L., 1970. Influence of environment on the content and composition of microbial free amino acid pools. J. Gen. Microbiol. 64, 171–185.

Zhu, T., Phalakornkule, C., Koepsel, R.R., Domach, M.M., Ataai, M.M., 2001. Cell growth and by-product formation in a pyruvate kinase mutant of E. coli. Biotechnol. Prog. 17, 624–628.

Zhu, T., Phalakornkule, C., Ghosh, S., Grossmann, I.E., Koepsel, R.R., Ataai, M.M., Domach, M.M., 2003. A metabolic network analysis and NMR experiment design tool with user interface-driven model construction for depth-first search analysis. Metab. Eng. 5, 74–85.