An Automated Translation from a Narrative Language for ... · An Automated Translation from a...

17
Technical Report CoSBi 12/2007 An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero DISI, University of Trento [email protected] John K. Heath School of Biosciences, University of Birmingham [email protected] Corrado Priami CoSBi and DISI, University of Trento [email protected] This is the preliminary version of a paper that will appear in Proceedings of CMSB07, LNBI 4695, 136-151, 2007 available at http://www.springerlink.com/content/vt23126776012835/

Transcript of An Automated Translation from a Narrative Language for ... · An Automated Translation from a...

Page 1: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

Technical Report CoSBi 12/2007

An Automated Translation from a Narrative

Language for Biological Modelling intoProcess Algebra

Maria Luisa Guerriero

DISI, University of Trento

[email protected]

John K. Heath

School of Biosciences, University of Birmingham

[email protected]

Corrado Priami

CoSBi

and

DISI, University of Trento

[email protected]

This is the preliminary version of a paper that will appear inProceedings of CMSB07, LNBI 4695, 136-151, 2007

available at http://www.springerlink.com/content/vt23126776012835/

Page 2: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

An Automated Translation from a NarrativeLanguage for Biological Modelling into

Process Algebra

Maria Luisa Guerriero1, John K. Heath2, and Corrado Priami1,3

1 Dipartimento di Informatica e Telecomunicazioni, Universita di Trento, Italy2 CRUK Growth Factor Group, School of Biosciences, University of Birmingham, UK

3 The Microsoft Research - University of TrentoCentre for Computational and Systems Biology, Italy

{guerrier,priami}@dit.unitn.it, [email protected]

Abstract. The aim of this work is twofold. First, we propose an highlevel textual modelling language, which is meant to be biologically intu-itive and hence easily usable by life scientists in modelling intra-cellularsystems. Secondly, we provide an automatic translation of the proposedlanguage into Beta-binders, a bio-inspired process calculus, which allowslife scientists to formally analyse and simulate their models. We use theGp130 signalling pathway as a case study.

1 Introduction and Motivations

Process calculi, originally developed for modelling mobile communicating sys-tems [10], have proved to be both useful and powerful tools for modelling biolog-ical signalling pathways [14, 12, 13, 2, 9]. The need for such modelling approacheshas arisen from the complexity of these processes, which is not easily analysedby biological intuition [6]. A key challenge in the application of process calculi tobiological problems is specification of models. The issue poses several practicaland computational problems. First, it is unlikely that most practising biologistswould be motivated to formulate their ideas in a non-intuitive language devisedfor computational execution. Second, the process of specifying the model canidentify areas of ignorance or uncertainty in biological understanding which maynot be apparent to a computer scientist taking models from the literature. Thebiological literature frequently articulates biological models in the form of infor-mal diagrams or employs various formal graphical notations (e.g. Kohn molecularinteraction maps [8] and Kitano diagrams [7]), but these can be confusing or am-biguous and particularly may conceal or abstract biological knowledge that maybe useful for the computational modeller. It is also clear that a two-dimensionalstatic representation may not adequately represent all pertinent features of thedynamic evolution of a temporal and spatially directed process. Indeed theseways of describing a biological process in reality involve the biologist translatinga narrative of events into a graphical format.

Prompted by these considerations we here present an approach to biologicalmodel specification which is based on a narrative style language. In developing

Page 3: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

this language there were three desiderata. First, the language should be bio-logically intuitive using formalisms and syntax familiar to biologists. Second,the language should exploit the specific advantages of a process calculus basedapproach to modelling highlighting, for example, the roles of concurrency, de-pendency and spatial confinement. Thirdly, the language should encourage thebiologists to critically examine their understanding in the process of model spec-ification highlighting points of uncertainty.

In the proposed framework, biologists can specify their model by providinga textual description of the system, containing the list of compartments, the listof entities (i.e. proteins) composing the model, the list of reactions with theirrate parameters, and a narrative describing the evolution of the system.

A translation into process calculus (namely, Beta-binders [13]) has been de-veloped and a few models have been specified into the proposed language andtranslated into Beta-binders. This allowed us to run the models by using theBetaWB simulator [15].

The front-end of the translation is designed to be independent of the lan-guage, so that when implementing a translation to other modelling languagesit could be reused. The main aim of the current work, in fact, is to define anhigh level language which hides the implementation details to the modeller, andis generic so that it could be translated into other kinds of formalisms di!erentfrom Beta-binders (e.g. other process algebras, SBML [5], and possibly di!eren-tial equations). This would allow us to have a common ground in which modelscould be specified, but also allow modellers to choose the output language thatbetter fits their aims. In fact it is di"cult to find one formal language which isappropriate to describe all kinds of systems, or that could be used to study allaspects of the systems. In addition, starting with the same input model, wouldmake the comparison of di!erent formalisms much easier.

We use as a case study the Gp130 signalling pathway [11]. This was chosen asan example of signalling concurrency which has yet to be explicitly articulatedin a process calculus model.

In Sect. 2 the modelling language is described, using the Gp130 pathway asa modelling example. In Sect. 3 the translation algorithm is described, and thegenerated Gp130 pathway Beta-binders model is shown. Finally, we draw someconclusions on the pros of our approach and on issues that need to be tackled.

2 The Narrative Language

In this section we present the modelling language. The structure of models in theproposed language is inspired by SBML [5], one of the most well known descrip-tion languages for biology. The main di!erences between the two languages arethe following ones. First, SBML species have only one possible state, while in ourlanguage they can have multiple states (this choice was made bearing in mindthat our target language, process algebra, is a multistate one. Secondly, SBMLis very abstract and it does not represent details about species and reactions,while the proposed language can easily represent them. Finally, the description

Page 4: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

of system evolution in SBML is given in terms of reactions with reactants andproducts, while in our language it is given in terms of a narrative of events.

A model in our language is composed of four sections:

– the description of the biological compartments in which the involved entitiescan be located during the evolution of the system;

– the description of the entities composing the system;– the description of the reactions occurring;– the narrative description of the evolution of the system, i.e. the list of the

events occurring.

A compartment is identified by an integer number; moreover, its name, size,and number of spacial dimensions can be specified. Compartments could repre-sent cellular or sub-cellular compartments, but also abstract locations.

A component (i.e. a protein) is identified by its name, and it can be seen asa list of interaction sites. Each site is defined by a name and a state (e.g. phos-phorylated, bound, active, etc.). This choice reflects the fact that each basicevent in the evolution of intra-cellular systems is the modification of one inter-action site for each protein involved in the reaction. However, since interactionsites are not always known, states can also be associated to the protein itself,to represent modifications involving generic interaction sites. If the position ofthe protein is relevant, the compartments in which it can be located during thesystem evolution can be specified. Finally, the initial quantity/concentration ofthe component should be set.

A reaction is identified by an integer number; its type (e.g. phosphorylation,binding, etc.) and the reaction rate parameter should also be specified.

A reliability value can be associated to each numerical value (e.g. rate pa-rameters and initial quantities); it is a percentage value that can be used todistinguish between values that are certain because obtained from wet experi-ments, and others which are the result of not verified assumptions. Modellers cantake this information into account during the important step of model refiningby means of parameter space search and sensitivity analysis.

Finally, the evolution of the system is described by means of a narrative ofevents. This narrative is a sequence of basic events, each of which is a textual de-scription of a reaction involving at most two components. Events can be groupedinto processes.

An event is identified by an integer number; in addition to the textual semi-formal description of the event, the identifier of the reaction associated withthe event should be specified. The description of the event is a string of theform if condition then event descr, with the conditional part being optional.A condition is a string of the form component is state, component.site is state,component is in compartment, etc. Multiple conditions can be specified by sep-arating them with the keyword and. Event descr is a string of the form compo-nent reaction for monomolecular events, or component reaction component forbimolecular events, where reaction can be for example phosphorylates, de-phosphorylates, binds, activates. As mentioned before, we assume that each

Page 5: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

event involves an interaction or modification of one site for each involved pro-tein. If the exact site is known, it can be specified with a clause of the formon sites; otherwise, it is assumed that one of the component’s defined states isinvolved. Hence, conceptually, each step involves a single site; however, a list ofsites (separated by semicolons) can be specified as a shortcut for simultaneoussteps involving di!erent sites (e.g. simultaneous phosphorylation of two sites).

In real life, two events occurring in a system of interacting entities can beeither concurrent (independent events, e.g. events involving di!erent proteins),or sequential (one event can occur only after the other one has occurred, e.g. aphosphorylation of a site of a protein allowed only after it is bound to anotherprotein), or alternative (if one event occurs, the other one cannot occur, e.g.binding of competing ligands to a receptor).

Consequently, in our language it is necessary to distinguish between con-current, sequential, and alternative events. If a reaction event is alternative toanother one, the identifier of the alternative event should be specified. Conditionsshould be used to enforce the ordering of sequential events. Events that are notexplicitly declared either alternative or sequential, are considered independentand are treated as concurrent events.

The grammar of the language follows. Some attributes are left as optionaland are not actually used in the translation to Beta-binders, because they referto details that cannot be easily handled in Beta-binders, or about which we arenot interested at the moment; they have been added to allow them to be used inan hypothetical translation to other languages that might handle those details.

!model" ::= !comparts decl"!compons decl"!reacts decl"!procs decl"!comparts decl" ::= Compartments !comparts list"!compons decl" ::= Components !compons list"!reacts decl" ::= Reactions !reacts list"!procs decl" ::= Narrative !procs list"

!comparts list" ::= !compartment"| !compartment"!comparts list"

!compons list" ::= !component"| !component"!compons list"

!reacts list" ::= !reaction"| !reaction"!reacts list"

!procs list" ::= !proc"| !proc"!procs list"

!compartment" ::= (!id", !compart name", !opt size", !opt unit", !opt dim")!component" ::= (!name", !opt inform descr", !opt sites def",

!opt states def", !opt comparts def", !initial quantity")!reaction" ::= (!id", !react type", !rate const")!proc" ::= Process !opt inform descr"!events list"!events list" ::= !event"

| !event"!events list"!event" ::= (!id", !form descr", !react id", !opt altern event")

Page 6: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

!opt sites def" ::=| !sites def"

!sites def" ::= !site def"| !site def"; !sites def"

!site def" ::= !name" : !state name" : !is active"

!opt states def" ::=| !states def"

!states def" ::= !state def"| !state def"; !states def"

!state def" ::= !state name" : !is active"

!opt comparts def" ::=| !comparts def"

!comparts def" ::= !compart def"| !compart def"; !comparts def"

!compart def" ::= !id" : !is active"

!initial quantity" ::= (!quantity", !opt reliability")!rate const" ::= (!rate", !opt unit", !opt reliability")

!form descr" ::= !event descr"| if !conds" then !event descr"

!conds" ::= !cond"| !cond" and !conds"

!cond" ::= !names" is !state name"| !names" is not !state name"| !names" is in !id"| !names" is not in !id"

!names" ::= !name"| !name".!name"| !name"; !names"| !name".!name"; !names"

!sites" ::= !name"| !name"; !sites"

!event descr" ::= !complex name"!bimol react"!complex name" on !sites"| !complex name"!bimol react"!complex name"| !complex name"!monomol react" on !sites"| !complex name"!monomol react"| !complex name" relocates from !id" to !id"| !complex name" degrades| !complex name" degrades !complex name"| !complex name" synthesises !complex name"| !complex name" homodimerises| !complex name" dehomodimerises| !complex name" dimerises with !complex name"| !complex name" dedimerises from !complex name"

!complex name" ::= !name"| !name" : !complex name"

Page 7: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

!id" ::= Int!opt size" ::=

| Int!opt unit" ::=

| Str

!opt dim" ::=| Int

!name" ::= Ide!opt inform descr" ::=

| Str!quantity" ::= Int | Real!opt reliability" ::=

| Int!rate" ::= Int | Real | inf!react id" ::= Int!opt altern event" ::=

| alternative to !id"!is active" ::= Bool

!compart name" ::= nucleus | cytosol | exosol| cellMembrane | nucleusMembrane | Ide

!react type" ::= phosphorylation | dephosphorylation| binding | unbinding| homodimerization | dehomodimerization| dimerization | dedimerization| activation | deactivation| hydrolysis | dehydrolysis| degradation | synthesis | relocation

!state name" ::= phosphorylated | bound | active | hydrolysed | dimer

!bimol react" ::= phosphorylates | dephosphorylates | binds | unbinds| activates | deactivates | hydrolyses | dehydrolyses

!monomol react" ::= phosphorylates | dephosphorylates | hydrolyses | dehydrolyses

2.1 Case Study: a Narrative Model of the Gp130 SignallingPathway

In this section we describe a model of the Gp130 signalling pathway in theproposed language.

The Gp130 pathway is the subject of significant clinical and biological inter-est not least due to the key role it plays in human fertility, neuronal repair andhaematological development [11]. Robust experimental platforms are availableand there is much information on pathway components and behaviour. Vari-ous features of this pathway make it an attractive case study for the modellingapproach. First, the Gp130 signalling pathway involves a family of private and

Page 8: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

public receptors where biological outcomes are dictated by the relative occupa-tion of di!erent receptor combinations. Thus, it exhibits signalling concurrency,which has not been explicitly tackled by computational modelling approachesyet. Moreover, a key feature of the Gp130 system is nuclear/cytoplasmic shut-tling of key signalling components whose dynamics have been investigated usingimaging techniques; hence, this is an excellent test case for developing spatiallyconfined compartment models which can be tested against high quality datasets.

The model is made of six entities: two ligands (LIF and OSM), three membrane-bound receptors (gp130, LIFR and OSMR) and one e!ector (STAT3).

Four compartments are involved in the system: the exosol (the extracellularspace, where the two ligands are located), the cell membrane (location of thereceptors), and the cytosol and the nucleus (compartments between which thee!ector shuttles). Table 1 lists the compartments with their attributes.Table 1. List of compartments (The volumes are calculated based on the average cellradius and ratio between intra-cellular compartments volumes stated in [1]).

id name size unit of measure dimensions

1 exosol 9.95 · 10!12 l 32 cellMembrane 12.57 · 10!8 dm2 23 cytosol 2.10 · 10!12 l 34 nucleus 0.25 · 10!12 l 3

The components representing the involved proteins are listed in Table 2. Thedefinition of each component contains its name, an informal description, the listof sites (each defined by its name, state, and a boolean flag specifying whetherit is or not in an active state at system initialisation), the list of protein states(each defined by its name and the active/inactive flag), the list of compartmentsin which the protein could be located (each defined by its identifier referringto the definition in Table 1, and the active/inactive flag), and finally its initialquantity and the reliability of this numerical value.

Table 3 contains the list of reactions. Each reaction definition contains anidentifier, its type, and a rate parameter with its unit of measure and reliability.The rate parameter is the reaction kinetic constant (Ka, Ko! , etc.). In order to beused in stochastic models (such as Beta-binders), the given kinetic constants needto be translated into stochastic reaction rates. As described in [3], for first orderreactions (unbinding, phosphorylation and relocation events) the stochastic rater is equal to the kinetic rate k; for second order heterologous reactions (bindingevents) r = k

V ·NA, where V is the reaction volume and NA is Avogadro’s number,

while for second order homologous reactions (dimerization events) r = 2·kV ·NA

.Finally, the narrative of events is shown in Table 4. The events are grouped

into processes, relative to the binding/unbinding of ligand/receptor pairs, thedownstream LIF and OSM pathways, and the downstream STAT3 pathway,which starts with the binding of STAT3 to one of the receptors and ends inits translocation into the nucleus. Each event is described by an identifier, thesemiformal description, the identifier of the reaction referring to the definitionin Table 3 and the optional identifier of the alternative event.

Page 9: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

Table 2. List of components (The initial quantities values for the ligands LIF andOSM are calculated based on the known extracellular concentration values, 500pM).

name descr site site state site act state state act compart compart act initial quant reliab

LIF ligand bound false 1 true 3000 100%OSM ligand bound false 1 true 3000 100%gp130 receptor LIF bound false dimer false 2 true 1000 50%

OSM bound falseY767 phospho falseY814 phospho falseY905 phospho falseY915 phospho false

LIFR receptor LIF bound false bound false 2 true 1000 50%OSM bound false dimer falseY981 phospho falseY1001 phospho falseY1028 phospho false

OSMR receptor OSM bound false bound false 2 true 1000 50%Y917 phospho false dimer falseY945 phospho false

STAT3 e!ector Y705 phospho false bound false 3 true 5000 0%gp130 bound false dimer false 4 falseLIFR bound falseOSMR bound false

Table 3. List of reactions.

id type rate unit of measure reliability

1 binding 8 · 105 M!1s!1 (ka) 50%2 unbinding 6 · 10!4 s!1 (ko!) 50%3 binding 8 · 105 M!1s!1 (ka) 50%4 unbinding 6 · 10!3 s!1 (ko!) 50%5 binding 8 · 105 M!1s!1 (ka) 50%6 unbinding 6 · 10!3 s!1 (ko!) 50%7 binding 8 · 105 M!1s!1 (ka) 50%8 unbinding 6 · 10!4 s!1 (ko!) 50%9 binding 8 · 105 M!1s!1 (ka) 50%

10 unbinding 6 · 10!4 s!1 (ko!) 50%11 dimerization inf M!1s!1 50%12 phosphorylation 0.2 s!1 (kcat) 50%13 binding 104 M!1s!1 (ka) 50%14 dimerization inf M!1s!1 (ka) 50%15 phosphorylation 0.2 s!1 (kcat) 50%16 unbinding 10!3 s!1 (ko!) 50%17 homodimerization inf s!1 50%18 relocation 10 min (t1/2) 50%19 relocation 100 min (t1/2) 50%

3 The Translation into Beta-binders

3.1 Beta-binders

Beta-binders [13] is a language, which belongs to the family of the bio-inspiredprocess calculi, strongly inspired by pi-calculus. The main advantage of process

Page 10: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

Table 4. List of events.

id description react alt

LIF-LIFR binding1 if LIFR.LIF is not bound and LIF is not bound then LIF binds LIFR on LIF 12 if LIFR.LIF is bound and LIF is bound then LIF unbinds LIFR on LIF 2

LIF-gp130 binding3 if gp130.LIF is not bound and LIF is not bound then LIF binds gp130 on LIF 34 if gp130.LIF is bound and LIF is bound then LIF unbinds gp130 on LIF 4

OSM-LIFR binding5 if LIFR.OSM is not bound and OSM is not bound then OSM binds LIFR on OSM 5 16 if LIFR.OSM is bound and OSM is bound then OSM unbinds LIFR on OSM 6

OSM-OSMR binding7 if OSMR.OSM is not bound and OSM is not bound then OSM binds OSMR on OSM 78 if OSMR.OSM is bound and OSM is bound then OSM unbinds OSMR on OSM 8

OSM-gp130 binding9 if gp130.OSM is not bound and OSM is not bound then OSM binds gp130 on OSM 9 3

10 if gp130.OSM is bound and OSM is bound then OSM unbinds gp130 on OSM 10LIF pathway

11 if LIFR.LIF is bound then LIFR dimerises with gp130 11 212 if gp130 is dimer then gp130 phosphorylates on Y767;Y814;Y905;Y915 1213 if LIFR is dimer then LIFR phosphorylates on Y981;Y1001;Y1028 12

OSM pathway14 if LIFR.OSM is bound then LIFR dimerises with gp130 11 615 if OSMR.OSM is bound then OSMR dimerises with gp130 14 816 if gp130 is dimer then gp130 phosphorylates on Y767;Y814;Y905;Y915 1217 if OSMR is dimer then OSMR phosphorylates on Y917;Y945 12

STAT3 pathway18 if gp130.Y767 is phospho and STAT3 is in 3 then gp130 binds STAT3 on gp130 1319 if LIFR.Y981 is phospho and STAT3 is in 3 then LIFR binds STAT3 on LIFR 1320 if OSMR.Y917 is phospho and STAT3 is in 3 then OSMR binds STAT3 on OSMR 1321 if STAT3.gp130 is bound then STAT3 phosphorylates on Y705 1522 if STAT3.LIFR is bound then STAT3 phosphorylates on Y705 1523 if STAT3.OSMR is bound then STAT3 phosphorylates on Y705 1524 if STAT3.gp130 is bound and STAT3.Y705 is phospho then gp130 unbinds STAT3 1625 if STAT3.LIFR is bound and STAT3.Y705 is phospho then LIFR unbinds STAT3 1626 if STAT3.OSMR is bound and STAT3.Y705 is phospho then OSMR unbinds STAT3 1627 if STAT3.Y705 is phospho and STAT3 is not bound then STAT3 homodimerises 1728 if STAT3 is dimer then STAT3 relocates to 4 1829 if STAT3 is in 4 then STAT3 relocates to 3 19

calculi is that, in addition to simulation, they allow for analysing statically themodels (e.g. causality, locality, equivalence and reachability analysis). They alsoallow the modeller to easily execute so called in silico genetics experiments, i.e.to modify some components of the system, and execute simulations to verify themodified system behaviour.

Beta-binders language is quite new, but much work has been done with it inthe past few years, and a simulator has also been recently developed [15].

Beta-binders was developed to better adhere to the structure and dynamicsof biological systems. By introducing the concept of a!nity, the calculus relaxesthe key-lock model of interaction, commonly assumed in classical process cal-culi, and hence it permits us to model more correctly domains and interactionsbetween enzymes and small molecules based on their types and a"nities. In Beta-binders, pi-processes are encapsulated into boxes (also called bio-processes) withinteraction capabilities, represented by specialised binders (called beta binders).

Page 11: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

Beta binders have the form !(x : " ) (active) or !h(x : " ) (hidden) where thename x is the subject of the beta binder and " represents the type of x. Theactions that bio-processes can execute are communications (x(y) for input andx!y" for output) and operations to manipulate the interaction sites of the boxes(expose(x, " ), hide(x) and unhide(x)). The system is a parallel composition ofbio-processes that can be either the deadlock bio-process Nil or the elementarybio-process B[ P ].

The reader is referred to [15] for a detailed description of the syntax andsemantics of the language.

3.2 The Translation Algorithm

Beta-binders bio-processes are an intuitive representation of proteins, and hencein the translation into Beta-binders we choose to have one bio-process for eachcomponent. Component interaction sites, states and locations are translated intobeta binders on the interface of the respective bio-processes. Two beta bindersare defined for each state and site, to represent their active/inactive state, andonly one of them is set to be initially visible depending on the active/inactiveflag value.

The translation is subdivided in two main steps. First, one bio-process iscreated for each component (Algorithm 1). Then, the pi-processes representingthe translation of events are added to the bio-processes (Algorithm 7).

As Algorithm 1 describes, the bio-processes are initially empty, and then betabinders are added on their interface.

Algorithm 1 ComponentsToBioprocesses1: for all component # Components do ! each component is a bio-process2: component.bioprocess $ new empty bio-process3: CompartsToBetaBinders (component)4: StatesToBetaBinders (component)5: SitesToBetaBinders (component)6: end for

The names of the beta binders representing possible compartments, statesand sites are constructed based on their names. As Algorithm 2 shows, thesubject of the beta binder representing a compartment is the compartment name,while its type is the compartment name concatenated with the compartmentidentifier. In the example, STAT3 is located in compartment 3 (the cytosol),hence a visible binder !(cytosol : cytosol 3) is added to STAT3 bio-process.

As Algorithm 3 shows, two beta binders are created to represent a state in itsactive/inactive forms, and the subjects of the beta binders are the state names,while their types are the state names concatenated with the component name. Inthe example, LIFR can be a dimer (but it is a monomer at system initialisation),hence a visible binder !(monomer : monomer LIFR) and an hidden binder!(dimer : dimer LIFR) are added to LIFR bio-process.

As Algorithm 4 shows, two beta binders are created to represent a site inits active/inactive forms, and the subjects of the beta binders are the site name

Page 12: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

Algorithm 2 CompartsToBetaBinders (c)1: for all compart # c.compartments do ! each location is a beta binder2: binder $ new beta binder in c.bioprocess3: binder.name $ compart.name"compart.id ! "(cytosol : cytosol 3)4: binder.type $ compart.name"compart.id5: if c is in compart at initial state then6: binder.is visible7: else8: binder.is hidden9: end if

10: end for

Algorithm 3 StatesToBetaBinders (c)1: for all state # c.states do ! each state is a pair of beta binders2: binder1, binder2 $ new beta binder in c.bioprocess3: binder1.name $ state.name ! "(dimer : dimer LIFR)4: binder2.name $ state.opposite name5: binder1.type $ state.name"c.name ! "(monomer : monomer LIFR)6: binder2.type $ state.opposite name"c.name7: if c is in state at initial state then8: binder1.is visible and binder2.is hidden9: else

10: binder1.is hidden and binder2.is visible11: end if12: end for

concatenated with the states names, while their types are the state names con-catenated with the component name and the site name. In the example, site Y981of receptor LIFR can be phosphorylated (but it is dephosphorylated at systeminitialisation), hence a visible binder !(Y 981 depho : depho LIFR Y 981) and anhidden binder !(Y 981 pho : pho LIFR Y 981) are added to LIFR bio-process.

Algorithm 4 SitesToBetaBinders (c)1: for all site # c.sites do ! each site is a pair of beta binders2: b1, b2 $ new beta binder in c.bioprocess3: b1.name $ site.name"site.state.name ! "(Y 981 pho : pho LIFR Y 981)4: b2.name $ site.name"site.state.opposite name

! "(Y 981 depho : depho LIFR Y 981)5: b1.type $ site.state.name"c.name"site.name6: b2.type $ site.state.opposite name"c.name"site.name7: if site is in state at initial state then8: binder1.is visible and binder2.is hidden9: else

10: binder1.is hidden and binder2.is visible11: end if12: end for

Page 13: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

As Algorithms 5, 6 and 7 describe, each monomolecular event step is trans-lated into one sequential pi-process which is placed into the bio-process repre-senting the involved component, while each bimolecular event step is translatedinto two sequential pi-processes which are placed into the bio-processes repre-senting the involved components.

The constructed pi-processes consist in sequences of communications, hideand unhide operations (Algorithms 5 and 6). The names of bio-processes, pi-processes, beta binders and types follow a template, so that they are standard-ised, it is possible to refer to the previously defined beta binders, and no nameclash occurs. Reaction rates and types a"nities are assigned based on the in-put definitions. The constructed sequence of events represents a single eventstep, so the reaction rate is assigned to the first action, while the others areassigned infinite rates (so that after the first one occurs, then the others areimmediately executed). The actual sequence of actions depends on the reac-tion type. In the example, event 17 involves the phosphorylation of sites Y917and Y945 on OSMR, translated into a sequence of two pairs of hide/unhideactions. Moreover, the phosphorylations can occur only if OSMR is a dimer,hence the sequence is prefixed by an additional hide/unhide pair on the betabinder representing the dimer state. The final pi-process is, therefore, Pho 17 =hide(0.2, dimer) . unhide(inf, dimer) . hide(inf, Y 917 depho) . unhide(inf, Y 917 pho) .hide(inf, Y 945 depho) . unhide(inf, Y 945 pho) . Pho 17, and it is added to OSMRbio-process.

Algorithm 5 EventToPiprocess (event)1: if event.reaction is relocation then2: piproc $ hide(binder from) . unhide(binder to)3: else if event.reaction is phosphorylation then4: piproc $ hide(binder site depho) . unhide(binder site pho)5: else if event.reaction is dephosphorylation then6: piproc $ hide(binder site pho) . unhide(binder site depho)7: else if ... then8: end if9: if event.condition is specified then

10: piproc $ hide(binder cond) . unhide(binder cond) .piproc11: end if

The ordering of the events is given in the following way (Algorithm 7). If tworeactions are concurrent, they are translated into processes placed in parallelcomposition. If, instead, they have to be executed one after the other, the secondone is prefixed by one operation on a binder which is unblocked at the end ofthe first one. If, finally, they are mutually exclusive, both events are prefixed byone operation on a binder which is blocked at the end of the other one.

3.3 Case Study: the Translation of the Gp130 Signalling PathwayModel

A prototype of the tool has been developed and it is currently under integrationinto BetaWB. The model described in Sect. 2.1 was translated into Beta-binders

Page 14: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

Algorithm 6 EventToPiprocesses (event)1: if event.reaction is phosphorylation then2: piproc1 $ binder event.id!"3: piproc2$ binder site depho() . hide(binder site depho) . unhide(binder site pho)4: AddA!nity (binder event.id, binder site depho)5: else if ... then6: end if7: if event.condition is specified on event.component1 then8: piproc1 $ hide(binder cond) . unhide(binder cond) .piproc19: end if

10: if event.condition is specified on event.component2 then11: piproc2 $ hide(binder cond) . unhide(binder cond) .piproc212: end if

Algorithm 7 NarrativeToPiprocesses1: for all ev ! Events do ! each event is one or two sequential pi-processes2: if ev is monomolecular then ! one pi-process is inserted into the bio-process

of the involved component3: pproc " EventToPiProcess (ev)4: if ev is alternative to prev ev then5: pproc " hide(binder prev ev blocked) . unhide(binder prev ev blocked) .pproc6: prev ev.pproc" hide(binder ev blocked) . unhide(binder ev blocked) .prev ev.pproc7: end if8: AddPiprocessInParallel (ev.component.bioproc, pproc)9: else if ev is bimolecular then ! one pi-processes is inserted into the bio-process

of each involved component10: # pproc1, pproc2 $ " EventToPiProcesses (ev)11: if ev is alternative to prev ev then12: if both ev and prev ev involve ev.component1 then13: pproc1" hide(binder prev ev blocked) . unhide(binder prev ev blocked) .pproc114: prev ev pproc " hide(binder ev blocked) . unhide(binder ev blocked) .prev ev pproc15: end if16: if both ev and prev ev involve ev.component2 then17: pproc2" hide(binder prev ev blocked) . unhide(binder prev ev blocked) .pproc218: prev ev pproc " hide(binder ev blocked) . unhide(binder ev blocked) .prev ev pproc19: end if20: end if21: AddPiprocessInParallel (ev.component1.bioproc, pproc1)22: AddPiprocessInParallel (ev.component2.bioproc, pproc2)23: end if24: end for

by using this prototype, and then simulated by using BetaWB. Figure 1 is aBetaDesigner screenshot showing the graphical visualisation and part of theimported Beta-binders code which has been obtained from the translation of theGp130 pathway model.

We do not present in this work any biologically relevant result: in order toachieve this final goal, more details on the pathway dynamics should be takeninto account and precise information on reaction rates should be acquired. More-over, some aspects of the translation should be improved for the tool to be prac-tically used for translation of complex models. This work was primarily meantto be a sort of feasibility study. We believe that the proposed language, togetherwith the automatic translation into Beta-binders and the existing simulator, al-lows the modeller to describe biological systems in simple words, simulate the

Page 15: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

Fig. 1. The Gp130 pathway Beta-binders model imported in BetaDesigner.

model, and obtain sensible results. The Gp130 model we have described in thiswork was developed by a biologist who had no previous experience in modelling.This gives us some confidence that our main goal, which is to have a user-friendlylanguage which biologists feel comfortable with, was achieved.

4 Conclusions and Further Work

The proposed modelling language and the automatic translation into a formallanguage allow us to hide the formal details from life scientists. Therefore, we be-lieve that life scientists could easily use the textual language to describe systems,and automatically obtain simulation and analysis results.

The choice of which primitives had to be included in our language has beendone to have a simple and basic set of events which could be described. The lan-guage and the tool can be extended with new constructs and some improvementson the already present ones can be done. For example, the fact that two bindersare used to model the active/inactive states of sites and proteins is not optimal,in terms of code readability, of simulation e"ciency and biological correctness.The usage of events [15] and biological transactions [4] to tackle this problem isunder investigation. We are also considering the suitability of extending Beta-binders with constructs representing conditions on the state of beta binders,which would greatly reduce the number of actions needed to translate condi-tions for sequential and alternative events.

Page 16: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

A formal comparison with other description languages will be done, and theinterchangeability with graphical notations should be taken into account.

Finally, we believe that in order to fully benefit of this approach an integra-tion with some of the many biological databases (i.e. the automatic extractionof data and parameters) would be much useful. In addition to this, another in-teresting aspect is the automatic extraction of information from the simulationoutput to be used again to refine the input model.

Acknowledgments. The authors wish to thank Nicholas Underhill-Day (Can-cer Research UK).

References

1. B. Alberts, A. Johnson, J. Lewis, M. Ra", K. Roberts, and P. Walter. Molecularbiology of the cell. Garland Science, 2002.

2. Mu"y Calder, Stephen Gilmore, and Jane Hillston. Modelling the Influence ofRKIP on the ERK Signalling Pathway Using the Stochastic Process Algebra PEPA.T. Comp. Sys. Biology, 7:1–23, 2006.

3. Luca Cardelli. On Process Rate Semantics. Available athttp://lucacardelli.name/Papers/On%20Process%20Rate%20Semantics.pdf,2007.

4. F. Ciocchetta and C. Priami. Beta-binders with Biological Transactions. TechnicalReport TR-10-2006. Technical report, The Microsoft Research - University ofTrento Centre for Computational and Systems Biology, 2006.

5. M. Hucka, A. Finney, H.M. Sauro, H. Bolouri, J.C. Doyle, and H. Kitano. The sys-tems biology markup language (SBML): a medium for representation and exchangeof biochemical network models. Bioinformatics, 19:524–531, 2003.

6. Boris N. Kholodenko. Cell signalling dynamics in time and space. Nature ReviewsMolecular Cell Biology, 7(3):165–176, 2006.

7. Hiroaki Kitano. A graphical notation for biochemical networks. Biosilico, 1(5):169–176, 2003.

8. K. W. Kohn. Molecular interaction map of the mammalian cell cycle control anddna repair systems. Molecular Biology of the Cell, 10:2703–34, 1999.

9. M. Kwiatkowska, G. Norman, D. Parker, O. Tymchyshyn, J. Heath, and E. Ga"ney.Simulation and verification for computational modelling of signalling pathways. InProc. Winter Simulation Conference, pages 1666–1675. Omnipress, 2006.

10. R. Milner. Communication and Concurrency. International Series in ComputerScience. Prentice hall, 1989.

11. Underhill-Day N. and Heath J. K. Oncostatin m (osm) cytostasis of breast tu-mor cells: Characterization of an osm receptor "-specific kernel. Cancer Research,66(22):10891–10901, 2006.

12. A. Phillips and L. Cardelli. A Correct Abstract Machine for the Stochastic Pi-calculus. In BioConcur ’04, Workshop on Concurrent Models in Molecular Biology,2004.

13. C. Priami and P. Quaglia. Operational patterns in Beta-binders. Transactions onComputational Systems Biology, 1:50–65, 2005.

14. Corrado Priami, Aviv Regev, William Silverman, and Ehud Shapiro. Applicationof a stochastic name-passing calculus to representation and simulation of molecularprocesses. Information Processing Letters, 80(1):25–31, 2001.

Page 17: An Automated Translation from a Narrative Language for ... · An Automated Translation from a Narrative Language for Biological Modelling into Process Algebra Maria Luisa Guerriero1,

15. A. Romanel, L. Dematte, and C. Priami. The Beta Workbench. Technical Re-port TR-03-2007. Technical report, The Microsoft Research - University of TrentoCentre for Computational and Systems Biology, 2007.