Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik...

242
Language Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184

Transcript of Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik...

Page 1: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Language Based Techniques forSystems Biology

Henrik Pilegaard

Kongens Lyngby 2007

IMM-PHD-2007-184

Page 2: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Technical University of Denmark

Informatics and Mathematical Modelling

Building 321, DK-2800 Kongens Lyngby, Denmark

Phone +45 45253351, Fax +45 45882673

[email protected]

www.imm.dtu.dk

IMM-PHD: ISSN 0909-3192

Page 3: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Summary

Process calculus is the common denominator for a class of compact, idealised,domain-specific formalisms normally associated with the study of reactive con-current systems within Computer Science. With the rise of the interaction-centred science of Systems Biology a number of bio-inspired process calculi havesimilarly been used for the study of bio-chemical reactive systems.

In this dissertation it is argued that techniques rooted in the theory and practiceof programming languages, language based techniques if you will, constitute astrong basis for the investigation of models of biological systems as formalisedin a process calculus. In particular it is argued that Static Program Analysisprovides a useful approach to the study of qualitative properties of such models.

In support of this claim a number of static program analyses are developed forRegev’s BioAmbients – a bio-inspired variant of Cardelli’s Ambient Calculusthat incorporates all features of Milner’s π-calculus:

The property of spatial reachability, which is related to the function of cellulartransport mechanisms, is addressed by two traditional Control Flow Analyses(CFAs). The simpler of the two, a mono-variant analysis (0CFA), is contextinsensitive, while the other, a poly-variant analysis (2CFA), is context-sensitive.These analyses compute safe approximations to the set of spatial configurationsthat are reachable according to a given model. This is useful in the qualitativestudy of cellular self-organisation and, e.g., the effects of receptor defects ordrug delivery mechanisms.

The property of sequential realisability. which is closely related to the function ofbiochemical pathways, is addressed by a variant of traditional Data Flow Anal-ysis (DFA). This so-called ‘Pathway Analysis’ computes safe approximations tothe set of reaction sequences that is realisable according to given model. This

Page 4: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

ii

is useful in the qualitative study of the metabolic pathways that emerge from agroup of connected biochemical agents.

Technically, these approaches are complementary, but the analyses all over-approximate the set of run-time enabled reactions. This is used in an iterativenarrowing scheme that achieves considerable synergy between CFA and DFA,and dramatically improves the results of both.

The specified analyses are proved correct with respect to the semantics of Bio-Ambients, and their strength is illustrated by application to abstract models ofbiological phenomena:

One is a model of the LDL degradation pathway, where it is shown that theanalyses are able to pinpoint the effects of certain genetic defects that are knownto be associated to cardiovascular disease.

The other is a model of genetic transcription that relies only on the π calculusfragment of BioAmbients.

In both cases the analyses compute very precise estimates of the temporal struc-ture of the underlying pathways; hence they are applicable across a family ofwidely used bio-ware languages that descend from Milner’s Calculus of Com-municating Systems.

The presented set of analyses constitutes a nice toolbox for the analysis of bi-ological models. The individual tools range in complexity from low polynomialto exponential, while the precision scales similarly. Thus, the toolbox may pro-vide useful information at all stages of a models lifetime, including development,where one is interested in frequent quick estimates, verification, and prediction,where one is willing to wait longer for more precise estimates.

Page 5: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Resume

Process kalkule fungerer som fællesbetegnelse for en klasse af kompakte ide-laiserede domæne specifikke formalismer, der normal associeres med studiet afreaktive systemer indendfor datalogien. I forbindelse med udviklingen af deninteraktionsorienterede videnskab, der nu kaldes systembiologi, er et antal biol-ogisk inspirerede process kalkuler ligeledes blevet brugt som basis for studiet afbiokemiske reaktive systemer.

I denne afhandling argumenteres for at sprog-basserede teknikker, der har deresrod i det teoretiske og praktiske fundament for programmeringssprog, udgøret stærkt udgangspunkt for analysen af modeller af biologiske systemer, derer formaliserede i en process kalkule. Særligt argumenteres der for, at statiskprogram analyse er et nyttigt værktøj til studiet af de kvalitative egenskaberved sadanne modeller.

For at understøtte denne pastand er der udviklet en række analyser til RegevsBioAmbients kalkule, der er en biologisk inspireret variant af Cardellis ambientkalkule, some desuden indeholder Milners π kalkule.

De rumlige egenskaber ved modeller, der er relevante for de cellulære transportmekanismer, angribes med to traditionelle kontrol flow analyser (CFA).Den sim-pleste af de to - en mono-variant analyse - er ikke sensitiv overfor kontekst, menden anden - en poly-variant analyse - er. De beregner begge sikre tilnærmelsertil mængden af spatielle konfigurationer, som ifølge modellen kan nas fra engiven starttilstand, Dette er nyttigt for det kvalitative studie af cellers selv-organisering og, for eksempel, effekten af defekte receptorer eller medicinalefremførings mekanismer.

De tidslige egenskaber ved modeller, der er relevante for biokemiske stier generelt,angribes med en variant af traditionel data flow analyse (DFA). Denne sakaldte

Page 6: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

iv

‘pathway analyse’ beregner sikre tilnærmelser til mængden af transitionssekvenser,som ifølge modellen kan realiseres af et givent system.

Teknisk set er disse tilgange komplæmentære, men begge overaproksimerermængden af reaktioner som modellens dynamik tillader. Dette udnyttes i eniterativ tilgang, som opnar betydelig synergi mellem CFA og DFA og dervedmarkant forbedrer resultatet af begge.

Det bevises at analyserne er korrekte med hensyn til sprogets (BioAmbients)formelle semantik og deres styrke illustreres ved brug pa abstrakte modeller afbiologiske fænomener:

Den ene modellerer den sekvens af biologiske operationer, der leder til optagelseog nedbrydelse af Low Density Lipo-protein i den den eukaryotiske celle. Deteftervises her, at analyserne er i stand til at udpege effekten af genetiske fejlkod-ninger, der vides at være forbundet med hjerte-/kar-sygdomme.

Den anden modellerer genetisk transkription pa et meget abstrakt niveau, ogkun ved brug af π kalkule delen af BioAmbients sproget. Det faktum, at anal-yserne i begge tilfælde er i stand til at uddrage meget præcise estimater af deunderliggende stiers tidslige struktur, viser at de kan bruges pa et spektrum afbiologisk inspirede kalkuler der stammer fra Milners Calculus of CommunicatingSystems.

De udarbejdede analyser udgør en slagkraftig værktøjskasse til brug ved anal-yse af modeller af biologigiske systemer. De individualle værktøjer spænderkompleksitetsmæssigt fra lav polynomiel til eksponentiel kompleksitet, menspræcisionen skallerer tilsvarende. Dermed kan værktøjskassen levere brugbarinformation i alle faser af en models liv, lige fra udviklingsfasen, hvor man er in-teresseret i hurtige estimater relativt ofte, over verfikationsfasen, til driftsfasen,hvor man i højere grad er villig til at vente pa præcise estimater.

Page 7: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Preface

This dissertation was prepared at the department of Informatics and Math-ematical Modelling, Technical University of Denmark, in partial fulfillment ofthe requirements for acquiring the Ph.d. degree in engineering. The Ph.d. studyhas been carried out under the supervision of Professor Flemming Nielson andProfessor Hanne Riis Nielson.

Acknowledgements. I am grateful to my supervisors, Flemming Nielson andHanne Riis Nielson, for their continuous support and guidance. Furthermore Ithank them, as well as the rest of the LBT group: Jorg Bauer, Han Gao,Christoffer Rosenkilde Nielsen, Sebastian Nanz, Christian W. Probst, TerkelTolstrup, Ye Zhang, and Fan Yang, for providing an inspiring and motivatingworking environment.

A special thanks goes to Terkel Tolstrup for being a good source of inspirationand alternative viewpoints, as well as a good friend, both during our threeyears of sharing an office and after. I also thank Terkel Tolstrup, ChristofferRosenkilde Nielsen, and Christian Probst for commenting on various fragmentsof this dissertation.

I thank Professor Mila Majster Cederbaum and Verena Wolf, Univesitat Mann-heim, for enthusiastically welcoming me into their group and making the ar-rangements for a visit that was, regrettably, cancelled for health related reason.

Finally, I am in debt to family and friends, especially Anette and Leonora – thegems of my life, for their patience and support, without which I could not havefinished this dissertation.

Lyngby, July 2007

Henrik Pilegaard

Page 8: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

vi

Page 9: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Contents

Summary i

Resume iii

Preface v

1 Introduction 1

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.3 Preliminary Conclusion . . . . . . . . . . . . . . . . . . . . . . . 8

1.4 Dissertation Outline . . . . . . . . . . . . . . . . . . . . . . . . . 8

I Setting the Scene 11

2 The Eukaryotic Cell 13

2.1 Cellular Information Processes . . . . . . . . . . . . . . . . . . . 14

Page 10: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

viii CONTENTS

2.2 Cellular Organisation . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.3 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 26

3 Modelling in Process Calculus 27

3.1 The BioAmbients Modelling Language . . . . . . . . . . . . . . . 28

3.2 Semantics of BioAmbients . . . . . . . . . . . . . . . . . . . . . . 38

3.3 CASE: Modelling the LDL Degradation Pathway . . . . . . . . . 44

3.4 CASE: Modelling Genetic Transcription . . . . . . . . . . . . . . 47

3.5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 50

4 Static Analysis Techniques 53

4.1 Order Theoretic Preliminaries . . . . . . . . . . . . . . . . . . . . 55

4.2 Monotone Frameworks . . . . . . . . . . . . . . . . . . . . . . . . 58

4.3 Flow Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 70

II Analysing for Structural Properties 71

5 Well-formed Programs and Their Properties 73

5.1 Free Names and Identifiers . . . . . . . . . . . . . . . . . . . . . . 74

5.2 Well-formed and Initial Programs. . . . . . . . . . . . . . . . . . 76

5.3 Substitution of Identifiers and Names. . . . . . . . . . . . . . . . 79

5.4 Properties of Programs . . . . . . . . . . . . . . . . . . . . . . . . 81

5.5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 85

Page 11: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CONTENTS ix

6 Context Insensitive Control Flow Analysis 87

6.1 Concurrently Possible Capabilities . . . . . . . . . . . . . . . . . 88

6.2 Control Flow Analysis . . . . . . . . . . . . . . . . . . . . . . . . 92

6.3 CASE: Analysing the LDL Degradation Pathway . . . . . . . . . 103

6.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 106

7 Context Sensitive Control Flow Analysis 109

7.1 The Spatial Shape of Static Scope . . . . . . . . . . . . . . . . . 111

7.2 Relevant Variables . . . . . . . . . . . . . . . . . . . . . . . . . . 114

7.3 Relevant Prefixes and Ambient Roles . . . . . . . . . . . . . . . . 118

7.4 Control Flow Analysis . . . . . . . . . . . . . . . . . . . . . . . . 121

7.5 CASE: Analysing the LDL Degradation Pathway . . . . . . . . . 136

7.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 138

III Analysing for Causal Properties 141

8 Pathway Analysis 143

8.1 Extended Multisets . . . . . . . . . . . . . . . . . . . . . . . . . . 145

8.2 Computing and Preserving Exposed Prefixes . . . . . . . . . . . 146

8.3 Constructing the Automaton . . . . . . . . . . . . . . . . . . . . 156

8.4 CASE: Analysing the LDL Degradation Pathway . . . . . . . . . 164

8.5 CASE: Analysing Genetic Transcription . . . . . . . . . . . . . . 165

8.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 166

Page 12: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

x CONTENTS

9 An Iterative Analysis 169

9.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

9.2 CASE: Analysing the LDL Degradation Pathway . . . . . . . . . 172

9.3 CASE: Analysing Genetic Transcription . . . . . . . . . . . . . . 176

9.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . 178

10 Conclusion 179

10.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

10.2 Evaluation Results . . . . . . . . . . . . . . . . . . . . . . . . . . 181

10.3 Conclusion and Further Work . . . . . . . . . . . . . . . . . . . . 182

A Variants of the LDL Degradation Pathway 185

A.1 The LDL Pathway with Normal Receptors . . . . . . . . . . . . . 186

A.2 The LDL Pathway with Defects in Exoplasmic Domain . . . . . 187

A.3 The LDL Pathway with Defects in Cytosolic Domain . . . . . . . 188

B Analysis Results for the LDL Degradation Pathway 189

B.1 Analysis Results for the 0CFA . . . . . . . . . . . . . . . . . . . . 190

B.2 Analysis Results for the 2CFA . . . . . . . . . . . . . . . . . . . . 194

B.3 Analysis Results for the Iterative 0CFA . . . . . . . . . . . . . . 199

B.4 Analysis Results for the Iterative 2CFA . . . . . . . . . . . . . . 202

C Analysis Results for Genetic Transcription 207

C.1 Ordinary 2CFA - One Gene . . . . . . . . . . . . . . . . . . . . . 208

Page 13: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CONTENTS xi

C.2 Iterative 2CFA - One Gene . . . . . . . . . . . . . . . . . . . . . 209

C.3 Ordinary 2CFA - Two Genes . . . . . . . . . . . . . . . . . . . . 210

C.4 Iterative 2CFA - Two Genes . . . . . . . . . . . . . . . . . . . . . 211

Page 14: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

xii CONTENTS

Page 15: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

List of Tables

3.1 The syntax of BioAmbients processes, P . . . . . . . . . . . . . . 35

3.2 Syntax of BioAmbients capabilities, M . . . . . . . . . . . . . . . 36

3.3 Heating relation, P ⇛ Q, on processes. . . . . . . . . . . . . . . . 39

3.4 The reaction relation, Pℓ

−→ Q, on processes. . . . . . . . . . . . 41

3.5 Abstract model of the LDL degradation pathway. . . . . . . . . . 46

3.6 Abstract model of genetic transcription. . . . . . . . . . . . . . . 48

4.1 Maximal Fixed Point algorithm for Monotone Frameworks. . . . 63

4.2 Syntax of the Alternation-free Least Fixed Point Logic. . . . . . 68

4.3 Constraint generation algorithm for Flow Logics. . . . . . . . . . 69

5.1 Bound names, bn(M), and free names, fn(M), of capabilities M . 75

5.2 Free names, fnΓfn(P ), of processes, P . . . . . . . . . . . . . . . . . 75

5.3 The free process identifiers, fpi(P ), of processes, P . . . . . . . . . 76

Page 16: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

xiv LIST OF TABLES

5.4 Well-formedness, C ⊢ΓfnP , of a process, P , with respect to a set

of constants, C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.5 Substitution, P [Q/X ], of a process Q for an identifier X in aprocess P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5.6 Substitution, P [m/x], of a constant m for a name x in a process P . 80

6.1 Occurring capabilities, capΓcap(P ), of a process P . . . . . . . . . . 88

6.2 Concurrently possible capabilities, CPΓcap,∆CP(P ), of a process P . 89

6.3 The 0CFA acceptability judgement. . . . . . . . . . . . . . . . . . 93

6.4 0CFA closure conditions for movement. . . . . . . . . . . . . . . 95

6.5 0CFA closure conditions for communication. . . . . . . . . . . . . 97

6.6 0CFA constraint generation algorithm. . . . . . . . . . . . . . . . 103

7.1 Ambient roles, ambΓamb(P ), occurring in a process, P . . . . . . . 111

7.2 Static scopes, SCPΓamb,∆SCP(P ), of a process, P . . . . . . . . . . . 112

7.3 Free variables, fvΓfv(P ), of a process P . . . . . . . . . . . . . . . . 115

7.4 Relevant variables, RVΓfv,Γcap,∆RV(P ), of a process, P . . . . . . . . 115

7.5 Relevant prefixes and ambients, RPAΓcap,Γamb,∆RPA(P ), of P . . . . . 119

7.6 2CFA acceptability judgement, (I,R,F) |=։

µ P . . . . . . . . . . . 124

7.7 2CFA closure conditions for movement. . . . . . . . . . . . . . . 126

7.8 2CFA closure conditions for communication. . . . . . . . . . . . . 129

7.9 2CFA closure condition for propagation of variable bindings. . . . 131

7.10 Generation of 2CFA constraints. . . . . . . . . . . . . . . . . . . 135

8.1 Exposed capabilities, EΓE[[P ]], of a process, P . . . . . . . . . . . 146

Page 17: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

LIST OF TABLES xv

8.2 Generated capabilities, GΓE ,∆G[[P ]], of a process, P . . . . . . . . 149

8.3 Killed capabilities, K∆K[[P ]], of a process, P . . . . . . . . . . . . 152

8.4 Maximal Fixed Point algorithm for the Pathway Analysis. . . . . 158

8.5 Procedure, update(qs, ℓ, E), for updating states of pathway au-tomata. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

8.6 Procedure, clean-up(Q,W, δ), for eliminating dead states. . . . . . 161

9.1 Parameterised 0CFA clause generator for iteration. . . . . . . . . 170

9.2 Iterative analysis algorithm. . . . . . . . . . . . . . . . . . . . . . 171

Page 18: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

xvi LIST OF TABLES

Page 19: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

List of Figures

1.1 The goal of modelling based approaches. . . . . . . . . . . . . . . 2

2.1 Protein-protein interaction. . . . . . . . . . . . . . . . . . . . . . 15

2.2 DNA double helix. . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3 Flow of information in biological systems. . . . . . . . . . . . . . 18

2.4 The transcription of genes [Jon07]. . . . . . . . . . . . . . . . . . 18

2.5 Post-transcriptional processing of pre-mRNA into mRNA. . . . . 19

2.6 Structure of the cell membrane. . . . . . . . . . . . . . . . . . . . 20

2.7 Structure of eukaryotic cells [Vil07a]. . . . . . . . . . . . . . . . . 22

2.8 LDL degradation pathway . . . . . . . . . . . . . . . . . . . . . . 25

3.1 Communication styles of the BioAmbients calculus. . . . . . . . . 36

3.2 Movement styles of the BioAmbients calculus. . . . . . . . . . . . 36

3.3 Annotated LDL degradation pathway. . . . . . . . . . . . . . . . 44

Page 20: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

xviii LIST OF FIGURES

4.1 The nature of approximation. . . . . . . . . . . . . . . . . . . . . 54

5.1 The hierarchy of notions that defines well-formedness. . . . . . . 74

6.1 0CFA closure condition for enter movement. . . . . . . . . . . . . 96

6.2 0CFA closure condition for local communication. . . . . . . . . . 98

6.3 0CFA analysis of normal receptor LDL pathway model. . . . . . 104

6.4 0CFA analysis of defect receptor LDL pathway models. . . . . . 105

7.1 Schematic view of eukaryotic cell. . . . . . . . . . . . . . . . . . . 122

7.2 2CFA closure condition for enter movement. . . . . . . . . . . . . 128

7.3 2CFA closure condition for local communication. . . . . . . . . . 130

7.4 2CFA analysis of normal receptor LDL pathway model. . . . . . 136

7.5 2CFA analysis of defect receptor LDL pathway models. . . . . . 137

8.1 Pathway analysis of normal receptor LDL model. . . . . . . . . . 164

8.2 Pathway analysis of defect receptor LDL models. . . . . . . . . . 165

8.3 Pathway analysis of genetic transcription model. . . . . . . . . . 166

9.1 Iteration of 0CFA and Pathway Analysis — normal receptors. . . 172

9.2 Iteration of 2CFA and Pathway Analysis — normal receptors. . . 173

9.3 Iteration of 0CFA and Pathway Analysis — exo-plasmic defects. 174

9.4 Iteration of 2CFA and Pathway Analysis — exo-plasmic defects. 174

9.5 Iteration of 0CFA and Pathway Analysis — cytosolic defects. . . 175

9.6 Iteration of 2CFA and Pathway Analysis — cytosolic defects. . . 175

Page 21: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

LIST OF FIGURES xix

9.7 Iterative analysis — transcription of single gene. . . . . . . . . . 176

9.8 Iterative analysis — transcription of two genes. . . . . . . . . . . 177

Page 22: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

xx LIST OF FIGURES

Page 23: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 1

Introduction

“How much broader our view of life would be if we could study itthrough reducing glasses.”

— Louis Bolk

Throughout time the concept of life has been one of the most fundamentalmysteries of every human society. And, even though a great number of highlymotivated people, some of whom remarkably clever, have spend entire lifetimespondering the subject, it has remained elusive to us.

This might be about to change. The technological revolution of the last decadeshas drastically accelerated the traditional reductionist bottom-up approach toLife Sciences. As a result the bottlenecks of Life Sciences now lie beyond thedata acquisition phase, where massive amounts of data are now easily col-lected by cell-wide measurements of transcriptomes, proteomes, metabolomes,etc. [vSvdH04].

The first main insight spawned by these post-genomics approaches is that noobservable property of the cell emerges as a simple function of a single geneor protein. Rather it has become evident that individual biochemical entitiesmerely enact roles within one or more elaborate clusters of entities that interactcooperatively in a systematic manner in order to cause the observable behaviour.Thus, the main challenge now is to identify and describe systems, rather thanindividual molecules [vMJ+07].

This challenge has been accepted by the emerging science of Systems Biology,which aims squarely for a holistic understanding of life. The basic tenet ofSystems Biology is that in order to understand life we must understand both

Page 24: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

2 Introduction

Reality

Observ

atio

n

Insig

ht

Result

AnalysisModel

Figure 1.1: The goal of modelling based approaches.

the qualitative and the quantitative aspects of the underlying systems [Kit02].

For a fraction of Systems Biology, known as Computational Biology – the areawhere this dissertation contributes – the goal is the scenario depicted in Fig. 1.1,where in vivo/in vitro experiments, such as those involved in drug development,are carried out in silico. Basically, we hope to compute important propertiesof the connectivity and self-organisation of systems from mathematical modelsthat incorporate our knowledge about the individual constituents and the com-munication and self-organisation mechanisms that are available to them [van05].

An open issue in the modelling approach of Computational Biology is thatof representation. In recent years, however, it has become increasingly clearthat reactive systems, which have been studied intensively within the ComputerScience area of Concurrency Theory for 25 years, are in many ways similar tothe systems of biology. In both cases the systems of interest are composed fromcollections of independent, yet somehow connected, agents, either primitive orthemselves systemic, the interactions of which give rise to the global behaviourof the systems [RS02].

Thus it is hardly surprising that Process Calculi, a class of elegant algebraicdomain-specific modelling languages that has been tremendously successful inthe study of reactive systems, is now also widely applied, and studied, in thecontext of Computational Biology [Reg03, Car05b, DL04, PQ04].

These languages are intensional rather than extensional; hence process calculusmodels are operational in the manner of computer programs — a fact that mightbecome a crucial factor when making the step from drug discovery to drug design[Car04]. Process calculi are also compositional in the sense that the propertiesof a model emerges, in a well-understood manner, from the properties of theimmediate constituents and the method of composition [Mil99] — a featurethat is very desirable both from a modelling and analysis perspective. Finally,

Page 25: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

3

while it is common for process calculi to focus on qualitative aspects, it is well-understood how to integrate quantitative information [Hil96]. This can even bedone in a totally orthogonal manner, which enables a convenient separation ofconcerns [Her02].

However, process calculi are also Turing complete, i.e., powerful to the pointwhere many, not to say most, interesting properties of models cannot be com-puted within finite time and memory [Tur36] — a feature that is hardly ac-ceptable to a research field like Systems Biology, which relies heavily on fullyautomated analysis.

In order to circumvent this issue, computational biologists often defer to simu-lation in favour of formal verification, e.g., [Reg03, PC04]. In a sense this is afairly direct approach to the investigation of models, as it relies on execution ofthe underlying formal semantics. Usually, however, the method relies on sam-pling, rather than exhaustive investigation, and hence it provides probabilitiesrather than formal assurances.

Another option is to go the last mile and perform exhaustive simulation. Thisis the approach of finite state Model Checking [CGP00], which often works wellfor state spaces of moderate size, but, due to the so-called state space explosion,becomes intractable for large state spaces and undecidable for infinite ones.Thus, in the presence of complicated models this approach often fails to provideformal assurances.

Abstraction is the key to such problems. This was realised decades ago withinthe Computer Science area of Compiler Construction. Traditionally, optimisingcompilers rely on the fully automatic, but approximative, approach of StaticProgram Analysis in order to provide the assurances required in order to safelyoptimise a program into another, more efficient program, while preserving thedesired extensional behaviour [NNH99]. The basic tenet of Static ProgramAnalysis is that a loss in precision often makes a property decidable. Thus,the fundamental challenge of the approach is to achieve an appropriate balancebetween efficiency and precision.

Main Thesis. This leads to the main thesis of this dissertation:

The approximative approach of Static Program Analysis can be usedto automatically decide biologically relevant properties of processcalculus expressions that model biological systems.

In principle, any property that can be used to pinpoint errors during modeldevelopment or predict the consequences of perturbations qualifies as interesting.

Page 26: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

4 Introduction

For the purpose of this dissertation, however, the focus will be on the qualitativeaspects of the spatial and temporal structure of models.

1.1 Background

In order to investigate this thesis the dissertation combines the following threeelements:

An Object Domain. The object of interest to this dissertation is the eukary-otic cell – the smallest living component of mammalian organisms – as well asits various subsystems. This constitutes a homeostatic system with two mainfeatures:

The cell is an amazingly flexible reactive system that quickly and precisely re-sponds to a huge array of environmental stimuli. This ability is intimatelyconnected with a highly evolved information processing capability. The under-lying processing plant is implemented by a set of tightly connected regulatorynetworks.

The cell is a robust self-organising system that maintains an almost infalliblespatial separation between the components of external and, various, internalenvironments. The underlying compartment system constitutes an elaboratenetwork of bio-membranes that define compartment boundaries. The essentialdifferences between the fluids of the various compartments are maintained by acollection of transport networks.

These aspects emerge as high-level behaviours from highly connected systemsof interacting bio-chemical agents. Each of the networks comprises a number ofmetabolic pathways that constantly produce, modify, and degrade bio-chemicalagents in a selective manner.

A Modelling Formalism. Process calculi have established themselves astools of choice for the modelling and analysis of concurrent and reactive sys-tems. The use of process algebraic formalisms outside of traditional computingcontexts is a growing trend and, in particular, the emerging area of SystemsBiology has received a lot of attention from the process algebra community.

This dissertation commits to this movement by using a variant of the Bio-Ambients calculus [RPS+04] as the basis for modelling and analysing sub-cellular biological systems. The BioAmbients calculus is a descendant of theCCS [Mil80] family that extends both the π calculus [Mil99] and the ambient

Page 27: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Contributions 5

calculus [CG00]. It retains a notion of spatial boundaries, called ambients, fromthe ambient calculus, which allows for models that preserve the spatial struc-ture of the bio-domain. Furthermore, a notion of interaction capabilities, in thestyle of the synchronous π calculus, allows the modelling of biological reactionsbetween sub-systems.

Thus, the calculus is equally well suited for the modelling of regulatory networks,cellular transport networks, and the underlying metabolic pathways.

A Formal Approach to Model Analysis. The notion of approximation ispervasive in the application of mathematics in the analysis of real-world phenom-ena. If a model is too strong, seemingly simple properties become intractable,and, if it is too weak, the computable properties are of no interest. Thus, engi-neers and scientists alike habitually walk the edge of Ockham’s razor and intro-duce simplifying assumptions in order to strike sensible compromises betweenthe accuracy of estimates and the tractability of computation.

In the context of computer programs, however, such simplifying assumptions areof no help as programs are normally intended to do exactly what the instruc-tions describe. The solution proposed by Static Program Analysis [NNH99]is to incorporate the approximation into the analysis techniques rather thanthe models. The result is a set of techniques that systematically perform ap-proximations in order to calculate some aspect of the behaviour of a model byinspection, rather than execution, of the program code.

For this approach to work analyses are usually required to be 1) correct withrespect to the formal semantics, such that verification of a property impliesassurance, 2) exhaustive in order not to restrict the programmer, 3) fully au-tomatic, in order to pose minimal requirements on the programmer, and 4)efficient, in order to handle the complexity of more realistic models.

It is exactly this set of required features that makes the approach attractive inthe context of Computational Biology also.

1.2 Contributions

In this context this dissertation contributes a number of static analyses in sup-port of the main thesis. It is proved that the specified analyses are both exhaus-tive and correct with respect to BioAmbients models. Furthermore, it is shownthat the analyses admit the computation of best analysis results and thus areimplementable; all results presented in this dissertation have been automatically

Page 28: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

6 Introduction

computed by working prototypes.

Mono-variant Control Flow Analysis. In accordance with the ordinaryapproach of Flow Logic based program analysis, the first analysis is a contextinsensitive or mono-variant Control Flow Analysis (0CFA [Shi88]) for the Bio-Ambients language. The analysis aims to safely over-approximate the set ofspatial configurations that are reachable from a given model. In order to do soit keeps track of the contents of ambients and their abilities for participating invarious interactions.

Thus, the focus of the approximation is on the spatial hierarchy establishedby the nesting of ambient boundaries. The analysis takes the perspective thata nesting hierarchy is a tree, but only approximates the shape of this tree bya binary relation I between ambients and sub-ambients or capabilities, i.e.,(a, b) ∈ I means that the ambient or capability b occurs inside the ambient a.

Analysis results are considered acceptable if the associated relations, withinthe limited precision of the relational representation, a) faithfully represent theinitial configuration, and b) are closed with respect to conditions that mimicthe semantics. In this respect the approach is similar to Abstract Interpretation[CC77].

Parts of this work were previously presented in [NNPdR07].

Poly-variant Control Flow Analysis. The second analysis is a context sen-sitive or poly-variant Control Flow Analysis (2CFA [Shi88]) for the BioAmbientslanguage. The aims and means of the analysis are similar to those of the afore-mentioned 0CFA.

However, where the 0CFA approximates trees in terms of a binary relation the2CFA does it in terms of a quaternary relation, where (a, b, c, d) ∈ I means thatd is inside c, c is inside b, and b is inside a — or d is inside c in the context ofa, b.

This context information makes the analysis context sensitive, i.e., able to differ-entiate between (sub-)configurations that are possible in different settings. Notonly does this, in itself, yield a more precise representation, but it also allowsfor closure conditions that mimic the semantics much more closely.

Parts of this work were previously presented in [PNN06b].

Pathway Analysis. The CFA analyses safely approximate the set of spatialconfigurations that may arise at run-time. In contrast they produce no infor-mation about the sequential order of the transitions that lead to the recorded

Page 29: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Contributions 7

configurations.

This motivates the development of the third analysis: a flow sensitive PathwayAnalysis that aims to safely over-approximate the set of reaction sequences thatare realisable by a given model. In order to do so, it focuses on the notion ofexposed prefixes, i.e., the capability prefixes that might participate in the nextreaction.

Technically, this analysis resorts to techniques normally associated with classicalData Flow Analysis [KU77]. The analysis takes the perspective that a state,i.e., a process, is an extended multiset of exposed prefixes. It then computesa (partially) Deterministic Finite Automaton, that safely approximates the setof realisable transition sequences, by tracking how these multisets change whenreactions occur.

Parts of the work presented here have also been submitted as part of [PNN07].

Iterative Analysis. At first sight the two types of analyses are quite different:The CFA analyses approximate the spatial structure of models, and, in contrast,the Pathway Analysis approximates the temporal structure.

These differences, however, are not as profound as they might appear. Clearly,the analyses are concerned with different aspects of the same thing, namely therun-time behaviour of BioAmbients processes, and, regardless of the analysisapproach, this is closely related to the realisable reactions, i.e., the set of reac-tions that might occur. Indeed, both the CFAs and the Pathway Analysis relyon preliminary safe estimates of this set in order to improve precision, and bothof them compute new, safe – but smaller – estimates as part of the analysisresults.

Due to their inherent differences, the two types of analysis often produce differ-ent estimates of the realisable reactions. This is used in an iterative narrowingscheme that alternates the application of a CFA and the Pathway Analysis untilthe estimate of realisable reactions reaches a fixed point.

While this is remarkably simple, it improves the precision of both CFAs andPathway Analysis dramatically.

This work has not been presented previously.

Case Studies. The analyses are evaluated in the context of two case studiesthat address very different aspects of the eukaryotic cell.

The first case study deals with the internalisation of Low Density Lipo-protein,

Page 30: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

8 Introduction

and is thereby related to cellular self-organisation. The associated model is fairlyconcrete and emphasises the spatial features as well as the receptor dynamicsof the LDL degradation pathway. The modelled system is related to knowncardiovascular diseases; this plays a key role in the investigation of the model,which is analysed with the full battery of static analyses.

The second case study deals with genetic transcription, and is thus related tocellular information processing. The associated model is very abstract and reliesonly on the π calculus fragment of BioAmbients. Due to the omission of spatialproperties the model is primarily investigated with the Pathway Analysis.

The former model and parts of the associated analytic investigation was previ-ously treated in [PNN05]. The latter model is new.

1.3 Preliminary Conclusion

Can Static Analysis decide interesting properties of biological systems?

The conclusion, based on the work and results to be presented in the following, isthat yes, it can. The established battery of analyses, which ranges in complexityfrom (low) polynomial to exponential, is able to precisely, and with increasingaccuracy, pinpoint the most essential aspects of the studied case models.

In particular, the analyses succeed in pinpointing the effects of certain geneticdefects, known to be associated with cardiovascular disease, from the model ofthe LDL degradation pathway. Furthermore, the analyses are able to extractvery precise estimates of the metabolic pathways that emerge from both models;a fact that indicates that the analyses are applicable across an entire family ofwidely used bio-ware languages that descend from Milner’s CCS.

Thus the set of analyses presented in this dissertation constitutes a strong toolthat can both support the development of models and automate significant partsof their post-modelling analysis. The former point in particular is supported bythe, subjective, practical modelling experiences of the author.

1.4 Dissertation Outline

The present dissertation contains ten chapters, of which this introductory chap-ter is the first. The remaining nine chapters are organised into three parts:

Page 31: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Dissertation Outline 9

Part I is mainly concerned with background material.

Chapter 2 examines the main features of the eukaryotic cell – including the twophenomena that are modelled and analysed in later chapters: the process ofgenetic transcription and the LDL degradation pathway.

Chapter 3 explains the central notions of process calculi in a biological con-text, and, in particular, the ’biological compartment as computational ambient’abstraction of the BioAmbients modelling language. Furthermore, the chapterpresents the two case models to be analysed in later chapters.

Chapter 4 introduces some well established approaches to static analysis —both the classical Monotone Frameworks approach to Data Flow Analysis andthe more recent Flow Logic approach to Control Flow Analysis.

Part II is concerned with the use of Control Flow Analysis (CFA) techniquesfor approximating spatial properties of BioAmbients models.

Chapter 5 formalises some of the subtler concepts of the BioAmbients calculusand defines a notion of well-formed programs.

Chapter 6 specifies a classic context-independent Control Flow Analysis (0CFA),which safely over-approximates the set of reachable spatial configurations of anywell-formed BioAmbients program, and submits the model of the LDL degra-dation pathway to analysis.

Chapter 7 presents a context-dependent Control Flow Analysis (2CFA), whichconstitutes an improvement of the 0CFA, and submits the model of the LDLdegradation pathway to analysis.

Part III is concerned with the use of Data Flow Analysis (DFA) techniques forapproximating temporal properties of BioAmbients models.

Chapter 8 introduces the Pathway Analysis, a DFA that safely over-approximatesthe set of realisable causal sequences of any well-formed BioAmbients program,and submits both of the case models to analysis.

Chapter 9 shows how the developed CFAs and DFA can be combined into asingle iterative analysis and submits both of the case models to analysis.

Finally, Chapter 10 summarises, concludes, and presents the perspectives forfurther work.

Page 32: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

10 Introduction

Page 33: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Part I

Setting the Scene

Page 34: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics
Page 35: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 2

The Eukaryotic Cell

“It is astonishing to think that this remarkable piece of machinery,which possesses the ultimate capacity to construct every living thingthat ever existed on Earth, from giant redwood to the human brain,can construct all its own components in a matter of minutes andweighs less than 10−16 grams. It is of the order of several thousandmillion million times smaller than the smallest piece of functionalmachinery ever constructed by man.”

— Michael Denton

In this chapter we review the main features of the eukaryotic cell. The goal is toestablish a rudimentary understanding of the biological domain. In order to fixa universe of discourse for the later technical developments we shall primarilybe interested in the qualitative aspects of cellular information processes andcellular transport. The material covered can be found in any good textbook onthe molecular biology of the cell, such as [AJL+02] or [LBZ+99], and containsno original contributions.

We start, in Section 2.1, with a brief note on the nature of biochemical interac-tions. Then we give an overview of cellular information processing and introducethe underlying bio-chemical polymer plant. This material will help to appreciatethe basic molecule-as-reactive-process abstraction of later chapters. Finally, inSection 2.2, we describe the organisational hierarchy of eukaryotic cells and dis-cuss the underlying self-organising compartment technology. This material willhelp to appreciate the more advanced compartment-as-computational-ambientabstraction of later chapters.

Page 36: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

14 The Eukaryotic Cell

2.1 Cellular Information Processes

The cell is a reactive system. This ability is intimately connected with a highlyevolved information processing capability. The underlying processing plant isimplemented by a highly connected system of interacting bio-polymers — theclass of polymers that are produced by living organisms.

Molecular Complementarity. The notion of molecular complementarity iscentral in all chemical interactions. Like most relationships, the bonds formed byatomic and molecular entities are based on the mutual satisfaction of interests.

Some of the formed relationships are very strong and may form stable moleculesfrom otherwise separate entities. This is the case for covalent bonds caused bythe sharing of electrons between atoms of complementary valence.

Other relationships are weaker and may form stable or less stable molecularcomplexes. Ionic bonds are caused by electrostatic forces between atoms ofcomplementary charge, i.e., one is electronegative and the other electropositive.Hydrogen bonds are caused by (weaker) electrostatic forces between an elec-tronegative atom and an (eletropositive)) hydrogen atom bound in a dipolarconstellation to another electronegative atom. Finally, van der Waals interac-tion is a weak unspecific attractive force between atoms in close proximity.

Water, the solvent of the cell, is an example of a dipole; it exhibits a smallpositive charge near the hydrogen atoms and a small negative one near theoxygen. The water solubility of molecules is determined by their electrochemicalproperties: Charged molecules are generally soluble, whereas the solubility ofuncharged molecules is determined by their ability to form hydrogen bonds withwater molecules. Hydrophobic (water insoluble) molecules tend to associatetightly when submerged in water. This is called the hydrophobic effect.

2.1.1 Bio-molecular Agents and their Interactions

For the small molecules typically found in ordinary solution chemistry it is easyto predict the effects of these various forces. This is different, however, for bio-molecular agents that are mostly long polymeric chains composed of smallermonomeric units with individual electrochemical properties. The complicatedstructure of these entities results in a very involved notion of molecular comple-mentarity that induces a high degree of binding specificity. This is one of thefeatures that most distinguish biochemistry from typical solution chemistry.

Page 37: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Cellular Information Processes 15

Figure 2.1: Protein-protein interaction.

Three types of bio-polymers play a central role in cellular information processing:

Proteins. The proteins are the main actors of the cellular metabolism, i.e.,the set of chemical reactions that occur in the living cell. They are sequencesof amino acids, i.e., simple acids consisting of a residue, an amino group, and acarboxylate group. About 20 different amino acids occur in living organisms.

The properties of any protein are determined by the sequence of amino acidsfrom which it is composed. This sequence constitutes the primary structure ofthe protein.

In turn, the properties of each individual amino acid are determined by its acidresidue, which may be non-polar, basic, acid, or uncharged polar. The diversityof these properties allows non-trivial interaction patterns to emerge within theprimary structure. In particular, hydrogen bonds may form between certainresidues and cause stable localised foldings, such as α-helices, β-sheets, andturns. The resulting spatial arrangements constitute the secondary structure ofthe protein.

The properties of the resulting chain of secondary structure elements are largelydetermined by the properties of the individual substructures. Some sectionsmay be hydrophilic (water soluble), and others hydrophobic. When submergedin water, proteins seek the most stable conformation and invariably fold tohide hydrophobic sections and expose the hydrophilic ones. This is an exampleof the hydrophobic effect. Internal formation of strong or, more commonly,weak bonds further stabilises the folded structures. The resulting 3-dimensionalconformations constitute the tertiary structure of the protein.

The properties of a protein that has folded into a stable conformation are deter-mined by the resulting solvent-accessible surface. The physical and electrochem-ical contours of the surface are characteristic for the protein. Two folded poly-mers may exhibit surface areas of (nearly) complementary contour and thesebinding sites then allow them to interact biochemically. If the binding sites

Page 38: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

16 The Eukaryotic Cell

match well they allow the formation of many stabilising bonds. The moleculesthen have a high affinity for one another and may form stable structures calledcomplexes or coordination compounds. The number and organisation of subunitsin such a compound constitutes the quaternary structure. If the interaction is abrief reaction and the protein is a catalyst of change in the other molecule wecall it an enzyme. In contrast, if the protein is changed by the interaction wecall it a substrate.

DNA. Deoxyribonucleic Acids are the main carriers of cellular hereditary in-formation. They are sequences of nucleotides, i.e., monomeric units consisting ofa base, a sugar, and one or more phosphate groups. In DNA the sugar is alwaysa deoxyribose and the bases are Adenine, Guanine, Cytosine, and Thymine.

Every cell has a repository of hereditary information stored in DNA. The small-est unit of heredity is the gene. The collection of all genes present in a cell isthe genome and, under normal circumstances, this is an extremely stable entity.

Figure 2.2: DNA double helix [Jon07].

This stability owes to the structure of DNA. The polymer sequence is formedas the sugar phosphates are linked up sequentially to form the backbone. Fromthis backbone the bases protrude as a sequence of stubs. The bases are prone tothe formation of hydrogen bonds between pairs (A-T and C-G). Thus, comple-mentary strands can base-pair and link up in anti-parallel to form a very stableduplex shaped like a double helix. Genetic information is invariably stored inthis form.

It is usual to represent DNA as strings over the corresponding four letter al-phabet: A,C,G,T. The sequence is directional; one end is denoted by 5′ and theother by 3′ – denotations that reveal the positions of recognisable terminatorson the sugar-ring of the first and last molecule synthesised, respectively. Thus,DNA and RNA are invariably synthesised in the direction of 5′ → 3′, which iscalled downstream. The opposite direction is upstream.

Page 39: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Cellular Information Processes 17

RNA. Ribonucleic Acids are the main facilitators of interaction between pro-tein and DNA/RNA. They are nucleic acids where the sugar is a ribose and thebases are adenine, cytosine, guanine, and Uracil.

Structurally, RNA is very similar to DNA. In contrast to DNA, however, itmostly occurs as comparatively short single-stranded chains of nucleotides and,due to the extra hydroxyl group of ribose, it is more prone to hydrolysis. Con-sequently RNA is relatively unstable and tends to fold into more stable (3-dimensional) conformations in order to stabilise. Folded RNA molecules oftenforms mixed ribonucleoprotein (RNP) complexes with proteins.

Thus, RNA strands can act both as information carriers (coding RNA) andfunctional units (non-coding RNA):

Coding RNA is synonymous with messenger RNA (mRNA). This type of mole-cules carry transcripts of genes that encode proteins from the genome to theribosomes, where the proteins are produced in accordance with the transcribedinformation.

The major types of non-coding RNA also play important roles in the transferof information from DNA to proteins. Ribosomal RNA (rRNA) molecules con-stitute the main part of the ribosomes, the enzymes that read mRNA in orderto produce proteins. Two ribosomal ribonucleoproteins (rRNPs), known as thelarge and the small subunit, respectively, form a functional ribosome. TransferRNA (tRNA) molecules are the adaptors that select and hold individual aminoacids in place for the ribosomal processing.

Other types of non-coding RNA serve in various regulatory capacities through-out the information transfer process.

2.1.2 The Central Dogma

The Central Dogma of Molecular Biology, first pronounced by Crick in 1958[Cri58], states that the molecular flow of information is from DNA via tran-scription to RNA and from RNA via translation to protein(s). As shown inFigure 2.3 there are known exceptions, but these are associated with abnormalconditions.

CASE: Transcription of Genes. Each gene encodes either a set of proteinisoforms or a non-coding RNA string. The first step in actually producing theseentities is the transcription of the corresponding DNA into RNA.

Page 40: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

18 The Eukaryotic Cell

Figure 2.3: Flow of information in biological systems.

Figure 2.4: The transcription of genes [Jon07].

The stretch of DNA that is the gene consists of two regions. The coding regionis what describes the actual product(s) and upstream from that is the promoterregion.

A number of enzymes are involved in the transcription process, which has threephases: During initiation an RNA polymerase attaches at the promoter andmelts the DNA locally. During elongation the polymerase synthesises a primarytranscript RNA from the sense strand (the one that codes the gene) by chainingof nucleoside tri-phosphates (NTPs). Finally, during termination the nascent(growing) RNA strand and the polymerase are released from the DNA.

Transcription is heavily regulated. A number of regulatory regions, located inthe upstream or downstream area of the promoter, accommodate the bindingof transcription factors. These affect the affinity of the promoter for RNApolymerase and may be activators or repressors, depending on their influence.

Page 41: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Cellular Organisation 19

Figure 2.5: Post-transcriptional processing of pre-mRNA into mRNA.

The gene usually contains both coding regions (exons) and non-coding regions(introns). The introns are removed post-transcriptionally from the primarytranscript, called precursor mRNA (pre-mRNA), by a process called splicing.Regulated variations in this process allows a single gene to code a family ofmRNA.

Translation of RNA. Translation begins when the small ribosomal subunitassembles on the mRNA and seeks out a start codon. Once the start codonis found the large subunit assembles and translation commences by stepwiseelongation. When a stop codon is encountered the subunits disassemble andtranslation terminates.

Each elongation step consumes a tRNA molecule. At one end the tRNA has athree-nucleotide sequence (anti-codon) that can base-pair to a matching three-nucleotide fragment (codon) of mRNA. At the other end they have a bindingdomain that matches one of the twenty available amino acid monomers. Thisimplicitly defines the genetic code.

Proteins are folded and subjected to various enzyme-induced modifications asthey emerge from the ribosome during translation.

2.2 Cellular Organisation

The cell is highly organised and robust due to an amazing capacity for self-organisation. The underlying compartment system constitutes an elaboratenetwork of bio-membranes that define compartment boundaries and maintainthe essential differences between the inside fluid and the outside fluid.

Page 42: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

20 The Eukaryotic Cell

Figure 2.6: Structure of the cell membrane [Vil07b].

2.2.1 Lipid Bi-layer Membranes

Bio-membranes are bi-layered sheets. As shown in Fig. 2.6 they are mainly com-posed of amphiphatic lipid molecules and membrane proteins that perform freelateral diffusion on the surface. As the two monolayers face different biochemicalenvironments their composition is different; hence the bi-layers are oriented.

Phospholipids. Phospholipids are the main constituents of bio-membranes.They consist of a hydrophilic head and two hydrophobic tails made from fattyacids. In order to hide their hydrophobic tails from polar molecules they spon-taneously form self-healing spheres of bi-molecular sheets when submerged inwater.

Membrane lipids perform rapid lateral diffusion within the same mono-layer, orleaflet. Thus, the bi-layer acts as a well-stirred 2-dimensional fluid. Transversediffusion, from one mono-layer to the other, is rare because the hydrophilic headwould have to penetrate the hydrophobic interior of the bi-layer.

Cholesterol. Cholesterol molecules usually intersperse the membrane lipids.Their presence increases the rigidity of the membrane and also lowers the freez-ing temperature by reducing the interaction between molecules. Careful regu-lation of their prevalence allows a fine-grained control over the small moleculepermeability of membranes.

Membrane Proteins. Membranes are not just passive barriers. They provideselective permeability and are generally sites of considerable metabolic activity.The mediators of these functions are the membrane proteins.

Page 43: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Cellular Organisation 21

Transmembrane proteins consist of two functional domains that protrude oneither side of the membrane and are separated by a beta barrel (a stable forma-tion comprising multiple β-sheets) or a number of regular α-helices that extendthrough the membranal bi-layer.

Anchored membrane proteins are attached only to one leaflet of a membrane.They are covalently bound to either a fatty acid or a phospholipid that is ableto embed in the leaflet itself, thereby acting as an anchor.

Peripheral membrane proteins are associated to membrane surfaces by non-covalent interactions with membrane proteins or lipids. Such proteins usuallyadhere only temporarily to the membrane, and often in a regulatory capacity.

Like the membrane lipids, the proteins move by lateral diffusion. As they arericher in structure, however, they move considerably slower and never performtransverse diffusion.

2.2.2 Cellular Organelles

The eukaryotic cell, which is shown in Fig. 2.7, is not a monolithic structure.Rather it has elaborate internal structure in the form of membrane-boundedorganelles. The membrane of each organelle maintains a local environment thatis optimal for a set of highly specialised functions.

The most prominent compartment is the cell. It is bounded by the plasma mem-brane, which separates the cytoplasm, where the organelles float in cytosol, fromthe exoplasm. The outermost layer of the plasma membrane is the exoplasmicleaflet, and the innermost layer the cytosolic leaflet. The cytosol is an importantsite of metabolic activity. In particular this is where ribosomes translate mRNAinto protein.

The nucleus is where the genome is maintained and transcribed into RNA. It isshielded by the nuclear envelope — a double-membrane penetrated by nuclearpores that facilitate the extremely selective exchange of materials between thenucleoplasm and the cytosol.

The endoplasmic reticulum (ER) has two parts: The Rough ER is involved in thefolding and stabilisation of proteins for the plasma membrane or secretion. It isrough because the surface is studded with protein manufacturing ribosomes thatattach as soon as the nascent proteins prove to be destined for the membrane orsecretion. The smooth ER extends the rough ER and is involved in the synthesisof fatty acids and phospholipids for the membranes.

Page 44: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

22 The Eukaryotic Cell

Figure 2.7: Structure of eukaryotic cells [Vil07a].

The Golgi complex sorts and processes membrane-bound and secretory proteins,and attaches molecular labels according to their transport destinations.

The lysosomes are the digestive units of the cell. They utilise enzymes to breakdown macromolecules and also act as a waste disposal system.

The mitochondria are responsible for energy production by aerobic respiration.

The peroxisomes are responsible for selective enzymatic oxidation of proteinsand fatty acids.

Finally, vesicles are small, short lived, membrane-enclosed transport units thattransfer molecules between different compartments. Vesicles form as bubblesthat bud off the membrane of an existing compartment.

2.2.3 Cellular Transport

The most important function of the compartment membranes is to ensure thestability and organisation of the cellular environment. In particular it is impor-tant that the cellular environment is:

• electrochemically stable,

Page 45: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Cellular Organisation 23

• biochemically well-organised,

• and effectively separated from the outside environment.

These properties are ensured by a number of transport mechanisms that, col-lectively, deal with all of these aspects.

Small Molecule Transport. The electrochemical stability of compartmentsis primarily maintained by the selective permeability of small molecules, whichis implemented by a large class of transmembrane proteins.

Passive transport amounts to diffusion. Some gases may pass the membraneby simple diffusion. Slightly larger molecules, i.e., various ions and even water,pass by facilitated diffusion through molecular channels.

Active transport requires a source of energy. Molecular pumps hydrolyse ATP,while molecular transporters use the power of electrochemical gradients.

Large Molecule Transport. The biochemical organisation of the cell ismaintained by the selective transport of large molecules, e.g., proteins, fromproduction site to deployment site, The central mechanism is that of proteintargeting. Every translated protein has one or more signal sequences, identify-ing the appropriate deployment site, embedded in its amino acid chain. Eachsuch sequence is recognised by the transport machinery associated with thecorresponding destination.

Non-secretory proteins are produced in the cytosol and subsequently trans-ported to the lumen or membrane of an organelle. Nuclear Localisation Sig-nals (NLSs), for example, are recognised by Nuclear Pore Complexes (NPCs)that facilitate transport from cytosol to nucleoplasm. Other signals direct pro-teins to the lumens or membranes of peroxisomes, or to the membranes orsub-compartments of mitochondria.

RNA molecules are produced in the nucleoplasm and, if appropriate, subse-quently transported to the cytosol. In order to be recognised and transportedby the NPCs they must form ribonucleoprotein complexes with proteins thatexhibit Nuclear Export Signals(NESs).

The Secretory Pathway. Secretory proteins are meant for deployment in, oron membranes in contact with, exoplasmic solutions. Such solutions are rich inentities that contain important metabolites but are potentially harmful. Thus,many secretory proteins are hydrolases, i.e., enzymes that break down organiccompounds, and cannot be allowed to roam freely inside the cell.

Page 46: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

24 The Eukaryotic Cell

Signal recognition particles recognise secretory proteins already during transla-tion, and immediately associate them with the rough ER surface. Here theyare injected directly into either the membrane or the lumen of the rough ER byco-translational translocation.

Once folded into the proper conformation within the ER the proteins are pack-aged into anterograde vesicles and moved forward to the Golgi complex. Herethey are matured and sorted into vesicles according to their final destination,which might be either the lysosome or the cell surface. Facilitating proteins arecontinuously shipped back to the ER in retrograde vesicles.

Finally, once fully matured, the secretory proteins leave the Golgi complex invesicles. Some go to the plasma membrane, where they are secreted into theexoplasm by exocytosis. Acid hydrolases, on the other hand, are deployed tothe lysosome, where they are used to break down organic compounds.

In the latter case the vesicle is coated with a double protein coat. The ac-tual formation of the vesicle happens due to a coat of Clathrin particles, butunderneath this coat there is another, comprised of adopter protein complexes(APs).

The Endocytic Pathway. The mechanism also facilitates receptor mediatedendocytosis, which is used by the cell to selectively subsume particles from theexoplasm. The process is facilitated by specialised transmembranal receptorproteins whose exoplasmic domains are able to ligate specific particles.

Meanwhile, and independent of this, clathrin particles continuously assemble onthe cytosolic side of the plasma membrane - thereby forcing it to form clathrincoated pits that grow progressively deeper until released into the cytosol asseparate clathrin coated vesicles.

The diffusing receptors tend to associate with clathrin coated pits because theirintra-cellular domain binds to complementary adaptin (AP) molecules exposedby the clathrin coat. Such associated receptors and the particles that they bind,if any, are internalised when the coated vesicle is formed.

Once internalised, coated vesicles shred their clathrin coat and become earlyendosomes. At this stage the subsumed particles are still ligated by the receptorproteins. This changes, however, when the early endosome merges with a lateendosome. The acidic environment in this compartment makes the receptorsseparate from the ligated particles.

From the late endosomes the receptor proteins are recycled to the plasma mem-brane. The internalised particles, however, are transferred by vesicles to lyso-

Page 47: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Cellular Organisation 25

Figure 2.8: LDL degradation pathway. Copyright 2004 from Molecular CellBiology by Lodish et al [LBZ+99]. Reproduced by permission of W.H. Freemanand Company/Worth Publishers.

somes where they are broken down into useful metabolites,

CASE: The LDL Degradation Pathway. The best known example of thisprocess is the LDL degradation pathway shown in Fig. 2.8. This is one mecha-nism, by which the cell acquires the cholesterol required for membrane synthe-sis. Transmembranal LDL receptors ligate LDL when the exoplasmic domainencounters the ApoB binding site exposed by LDL particles. The cholesterolis released when the tightly packed cholesteryl esters of the LDL particles arehydrolysed in the lysosome [AJL+02, LBZ+99].

Page 48: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

26 The Eukaryotic Cell

Related Diseases. It happens that the gene encoding the transmembranalLDL receptor proteins somehow mutates. Sometimes these mutations are be-nign, and they cause no particular problems. It may also happen, however, thata mutation affects the coding of either the exoplasmic or the cytosolic bindingdomain in an adverse way. Either case leads to transcription of transmembranalreceptor proteins that exhibit reduced affinity between the affected binding siteand the corresponding binding sites of the ordinary ligands.

When such a defect affects the extra-cellular part of the receptor, its ability tobind LDL particles is reduced. In contrast, when the defect affects the intra-cellular part of the receptor protein, it can bind but not internalise LDL parti-cles. Both cases lead to abnormally high blood levels of LDL, which dramaticallyincreases the risk of the cardiovascular disease atherosclerosis. The resulting dis-order is called familial hypercholesterolemia and is hereditary, as it propagateswith the mutated gene.

2.3 Concluding Remarks

The dogma, if you will, of Systems Biology is that we must understand both theconnectivity in systems and their self-organisation in order to understand biol-ogy [vMJ+07]. Therefore this chapter has presented the main features of cellularinformation processing and cellular transport. In the course of the presentationwe have identified two aspects, i.e., genetic transcription and receptor mediatedendocytosis of Low Density Lipo-protein, that we shall abstractly model andsubject to static analysis in later chapters.

Page 49: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 3

Modelling in Process Calculus

“By relieving the brain of all unnecessary work, a good notationsets it free to concentrate on more advanced problems, and in effectincreases the mental power of the race.”

— Alfred North Whitehead

This chapter presents a variant of the BioAmbients calculus of Regev et al.[RPS+04, Reg03, Car04], a sibling of Mobile Ambients (Cardelli and Gordon[CG00]) designed to model biological systems. The material is intended to servemultiple purposes. First of all it introduces the modelling formalism addressedby the technical developments of later chapters. It also introduces the basicconcepts of process calculi, in particular a large family of languages that descendfrom the Calculus of Communicating Systems [Mil89, Mil99, CG00]. And finallyit serves as an introduction to the ‘cells-as-computation’ abstraction that waspioneered by Regev et al [RPS+04].

The BioAmbients calculus is very expressive and includes modelling primitivesfor many essential aspects of the biological domain.

Firstly, the calculus preserves the notion of ambients as bounded, mobile, sitesof activity that may nest hierarchically. This provides an intuitive means formodelling the chemically active membrane-bound compartments that are ubiq-uitous in eukaryotic cells. In order to make this spatial abstraction operational,the calculus incorporates a set of (co-)capabilities that allow processes to alterthe local nesting hierarchy.

Secondly, the calculus preserves the notion of channelled communication fromthe π-calculus. This allows simpler biological entities (i.e., proteins, RNA, and

Page 50: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

28 Modelling in Process Calculus

DNA) and their interaction networks to be modelled as networks of interact-ing π-style processes. In order to make this part of the modelling formalismoperational in the context of the ambient-as-compartment abstraction the cor-responding set of (co-)actions allows communication across ambient boundariesas well as locally.

The resulting calculus is quite extensive in terms of modelling primitives. Alsothe set of control structures for processes is slightly larger than what is tra-ditionally studied for Mobile Ambients, including non-deterministic (external)choice, as well as a general recursion construct, in the manner of CCS [Mil89].This is needed in order to allow a precise modelling of biological phenomena.

What brings all of these elements together and completes the biological ab-straction is a reaction semantics in the style of the Chemical Abstract Machine[BB90]. The interpretation provided by this semantics is exactly the right one:A BioAmbients model describes a chemical soup of reactive entities distributedover some spatial configuration. Two such entities may react (synchronously)if they are close to one another and exhibit molecular complementarity, i.e., ex-pose reactive domains (modelled by prefixes) that are complementary in termsof shape (channel name) and purpose (matching capability/co-capability).

The chapter is in five sections. We start in Section 3.1, by giving an informalstep-by-step introduction to a family of process calculi commonly used for themodelling of biological systems. Here we shall consider a notion of simple reac-tive processes in the style of CCS [Mil80], a notion of complex forming processesin the style of π-calculus [Mil99], and, finally, the class of compartment formingprocesses modelled by the BioAmbients calculus [RPS+04]. Then, in Section3.2, we formalise the reaction semantics of the BioAmbients calculus. After thiswe present a BioAmbients model of the LDL degradation pathway in Section3.3 and a BioAmbients model of genetic transcription in Section 3.4. Finally, inSection 3.5, we summarise and relate to other approaches.

3.1 The BioAmbients Modelling Language

3.1.1 Simple Reactive Processes

In order to gently introduce the analogy between chemical reagents and com-putational reactive processes we start from the basics and consider a simplelanguage of concurrent reactive process expressions. Besides the omission ofinternal actions this language is identical to Milner’s Calculus of Communicat-

Page 51: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

The BioAmbients Modelling Language 29

ing Systems (CCS) [Mil80], and forms the core of the BioAmbients modellinglanguage.

As usual for process calculi the language incorporates two design elements:

Firstly, there is a set of primitives, each of which denotes the capability of per-forming some basic biological activity, which we consider as atomic. In the caseof simple reactive processes the fundamental activity is to engage in reaction.As described in the previous chapter, reaction in the bio-molecular domain isprimarily a matter of complementary physical shapes, the binding sites exposedby molecules, coming sufficiently close to each other as shown in Fig. 2.1. In thecalculus we capture this by constants, n,m, that we pick from a denumerableset, Const, and use as abstract representatives of the interactions such facili-tated. The sites that participate in reactions are physical complements of oneanother, i.e. if the one is convex then the other is concave. This leads to a notionof simple capabilities for reaction, where n! and n? denote the complementarybinding sites that facilitate the interaction identified by n. Technically, we shalluse standard terminology and call n a channel to point out that n denotes an(invisible) bond in the interaction topology of bio-chemical reactions.

Definition 3.1 (Simple Capabilities) A simple capability is an element ofthe set defined by the following syntax:

M ∈ Cap ::= n! Reaction capability| n? Reaction co-capability

where n ∈ Const. �

Secondly, there is a set of control structures that allow process expressions to becomposed from the primitives in various ways. Sequential process expressionsare the most basic of such compositions, and they are suitable for abstractdescriptions of isolated reactive entities such as, e.g., proteins.

Definition 3.2 (Sequential processes expressions) A sequential process isan element of the set defined by the following syntax:

P ∈ Proc ::=∑

i∈I M ℓi

i . Pi Summation (guarded choice)

| rec X. P Recursive process (defining X , rec X. P )| X Process identifier

where I is any finite indexing set. We shall use P,Q,R and their primed orsub-scripted variants (e.g., P ′) to denote process expressions.

Remark 3.3 (Labels) In preparation of the static analyses of later chapterswe attach a label, ℓi ∈ Lab, to each capability prefix, Mi. It shall merely serveas a useful pointer into the process, and has no semantic significance. For anyconcrete process, P , we shall assume that Lab is a finite set.

Page 52: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

30 Modelling in Process Calculus

Notation 3.4 It is customary to write concrete k-way instances of summation,i.e., |I| = k, as M1 . P1 + · · ·+ Mk . Pk. Two special cases emerge when k = 0 ork = 1. They implicitly give rise to the following additional constructs:

P ∈ Proc ::= . . .| 0 Inaction| M ℓ . P Prefix

We shall often omit trailing occurrences of 0, i.e., simply write M rather thanM .0, when writing process expressions.

Definition 3.5 (Identifier binding) In rec X. P the displayed occurrence ofX is identifier binding, asserting that X , P , with scope P . An occurrenceof a process identifier Y in a process is bound if it is, or lies within the scopeof, a binding occurrence of Y . If not bound, Y is free. A process with no freeidentifiers is identifier closed.

Notation 3.6 (Identifier substitution) We shall write P [Q/X ] for the sub-stitution of Q for X in P , i.e., for the process that is as P except that every freeoccurrence of the process identifier X is replaced by the process expression Q.

Remark 3.7 In the presence of recursive processes the operational definitionsof bound and free process identifiers as well as application of substitution arenot straightforward and we postpone the formal treatment until Chapter 5. �

Informally, the expressions carry the following meaning:

Inaction, 0, denotes a process that can do nothing. Technically, this is theterminal process that is the end of all things (including recursive analysis).Biologically, this is a system that is completely depleted of reactive entities; weshall think of it merely as superfluous solvent, e.g., water, that may dissipateby evaporation.

The prefix, M ℓ . P , denotes a process that is capable of participating in thereaction identified by the capability M ; exercising M turns the process into thecontinuation P . This corresponds to a biological entity, such as a protein, thatexposes a single binding site and is altered in some way by the correspondingreaction.

The summation,∑

i∈I M ℓi

i . Pi, generalises the prefix and, thus, denotes a pro-cess that is capable of participating in any one of the k = |I| reactions identifiedby the capabilities Mi. Again, exercising some Mj turns the process into thecorresponding continuation Pj ; the remaining capabilities and the correspond-ing continuations disappear. This corresponds to a biological entity, such as a

Page 53: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

The BioAmbients Modelling Language 31

protein, that exposes k distinct binding sites and is altered in some way whenone of them engages in reaction.

The recursive process, rec X. P , models recurrent behaviour. Technically, theconstruct denotes a process that behaves as P [recX. P /X ]. Biologically, thiscorresponds to entities that have cyclic behaviour. One example is an entityrec X.M ℓ .X, such as an enzyme, that may participate in the same reaction –identified by M – over and over. Another example, that of a stateful entityrec X. (off? . on? . X + M ℓ . X), emerges if you allow the enzyme to be (oftentemporarily) inhibited. More generally, the recursive process may be used tomodel phenomena such as recycling, replication, or the unbounded supplies of,e.g., nutrients associated with open systems.

Concurrent process expressions emerge when sequential (or concurrent) processexpressions are combined by parallel composition; they are suitable for abstractdescriptions of reactive systems such as, e.g., bio-chemical solutions.

Definition 3.8 (Concurrent processes) A concurrent process is an elementof the set defined by the following syntax:

P ∈ Proc ::= (n)P Name restriction| P P Parallel composition

|∑

i∈I M ℓi

i . Pi Summation (guarded choice)

| rec X. P Recursive process (defining X , rec X. P )| X Process identifier

where I is any finite indexing set.

Definition 3.9 (Constant binding) In (n)P the displayed occurrence of theconstant n is binding with scope P . �

The language extends that of sequential process expressions. Informally the newconstructs carry the following meaning:

The parallel composition, P Q, denotes the concurrent composition of processesP and Q. In the resulting reactice system reactions may occur between P and Qif the one exposes a capability n! and the other the corresponding co-capabilityn?. For example, the system n! . P ′ n? . Q′ may react and become P ′ Q′. IfP and Q are themselves reactive systems, reactions may occur in each of themindependently of the other. This corresponds to a chemical solution of reactiveentities.

Finally, in the name restriction, (n)P , the scope of the name n is restrictedto P . Technically, we may think of the name n as bound to some channel and

Page 54: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

32 Modelling in Process Calculus

the knowledge of the particular association being private to the sub-system P .Thus, two concurrent processes within P may react on n, but no process in Pcan react on n with a process outside of P . Biologically, this corresponds toa notion of confinement; in the context of concurrent process expressions thecorrespondence is rather weak because it is not straightforward how to modelthe dynamics of either physical confinements, such as compartments, or virtualones, such as complexes. In the former case n would simple correspond to atype of reaction only taking place in a particular compartment, in the latter toa reaction facilitated by binding sites that are exposed toward the interior of aparticular coordination compound only.

Remark 3.10 In the context of parallel composition the notion of replicationarises as a special case of the recursive process. Intuitively. rec X. P X denotesa potentially unbounded supply of instances of P . This is usually written !P . �

3.1.2 Complex Forming Processes

In order to strengthen the analogy between chemical reagents and compu-tational processes we now extend the language into one of complex formingprocess expressions. Save the omission of internal actions and conditionals,this language is identical to the π-calculus of Milner, Parrow, and Walker[MPW92, Mil99, SW01, Par01]. Suitable for the modelling of biochemical enti-ties and complexes it forms a powerful subset of the BioAmbients calculus.

Complex forming processes emerge when we allow names to be communicatedwhen reactions occur. This is embodied in the local communication capabili-ties, a new set of primitives that subsumes the simple capabilities as a specialcase. The definition relies on a notion of variables, p, q, that we pick from adenumerable set, Var, and use as placeholders for the constants received duringcommunication. As we shall see, the syntax admits both constants and vari-ables in many positions; hence it is common to refer to them simply as names,x, y, and pick them from the denumerable set, Name, that is composed as thedisjoint union of constants and variables, i.e., Name = Const ⊎ Var.

Definition 3.11 (Local communication capabilities) By a local communi-cation capability we shall understand an element of the set defined by the fol-lowing syntax:

M ∈ Cap ::= x!{y} Local output capability| x?{p} Local input (co-)capability

Definition 3.12 (Variable binding) In x?{p} . P the displayed occurrence ofthe variable p is binding with scope P .

Page 55: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

The BioAmbients Modelling Language 33

Definition 3.13 (Free and bound names) An occurrence of a name x in aprocess is bound if it is, or lies within the scope of, a binding occurrence of x.If not bound, x is free. A process with no free name is name closed.

Notation 3.14 (Name substitution) We shall write P [m/x] for the substi-tution of the constant m for the name x in the process expression P , i.e., theprocess that is as P except that every free occurrence of the name x is replacedby the constant m.

Remark 3.15 Again, we postpone the operational definitions of bound and freenames until Chapter 5. �

As was the case for the simple capabilities, the local communication capabilitiesonly become operational when used for prefixing. Because of the involved namepassing, however, reactions are no longer symmetric activities between processes;rather they require one of the involved processes to act as sender and the otherto act as recipient: The prefix n!{m} . P is a sender; it denotes a process thatis capable of participating in the reaction identified by n, and if this capabilityis exercised the process shares its knowledge about the binding of m with thereaction partner and continues as P . In a complementary fashion the prefixn?{p} . Q is a recipient; it denotes a process that is capable of participating inthe reaction identified by n, and if this capability is exercised the process learnsabout some name binding, e.g., m, that is then substituted for p throughout Q,such that the process continues as Q[m/p] rather than Q.

Thus, the extension enables the passing of values from one entity to the other.However, the main strength lies in the notion of scope extrusion that accom-panies this ability. For example, in a system (m)n!{m} . P n?{p} . Q P ′ alocal communication reaction might occur such that in the resulting system(m) (P Q[m/p]) P ′ the scope of m has been extruded to include both P andQ[m/p], i.e., P and Q[m/p] now share the bond denoted by m, but it is notshared by anyone else.

This can be used to model the biological notion of complexes — especially the,often temporary, coordination compounds of molecular machineries that occurin conjunction with, e.g., the transcription of genes or the translation of RNA.These compounds often form in an ad-hoc manner in order to perform some non-atomic piece of work, after which they disassemble again. The ad-hoc nature ofsuch a compound calls for individual modelling of their constituting components.The fixed cooperation structure of the compound, however, requires the modelto enforce a temporary ’lock’ on the set of constituents throughout the lifetimeof a complex. This is exactly what scope extrusion and recursion provides.

Page 56: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

34 Modelling in Process Calculus

Now, when a communication reaction takes place, e.g., in n!{m} . P n?{p} . Q,

then the result is defined in terms of a substitution P Q[m/p] of a constant fora variable. If m is bound in Q, i.e., Q contains a sub-expression (m)Q′, then mbecomes spuriously bound by this binder if Q′ is of the form · · · p · · · .

This is called constant capture and, in order to avoid it, it is usual to demandstatic scope and define a notion of α-equivalence that allows bound constantsto be freely changed.

Definition 3.16 (α-equivalence) A change of bound constants or α-renamingwithin a process P is the replacement of a subterm (n)Q of P by (m)Q[m/n]where m /∈ fn(Q). It is common to say that P and Q are α-equivalent, P ≡α Q,if Q can be obtained from P by a finite number of changes of bound constants.

Remark 3.17 Communication is defined in terms of a substitution, P [n/x], ofa constant for a name, and α-equivalence is defined in terms of a substitution,P [m/n], of a constant for a constant. In contrast we shall ensure in Chapter 5that the need of substituting a variable for a variable or a process identifier fora process identifier shall never arise; hence we define α-equivalence neither forvariables nor process identifiers.

Remark 3.18 (Canonical names) Many of the technical developments ofthe following sections and chapters will rely on the ability to statically track thenames that occur in the processes of interest. Due to α-equivalence, however,the direct syntactical representation of names is not stable under the seman-tic relations; hence ordinary names are not suitable for carrying static analysisinformation.

We solve this issue in the manner that is standard for Flow Logic based staticanalysis, e.g., [NNP04, HJNN99, BDNN01]. We associate each name x with acanonical name ⌊x⌋ ∈ Name and demand that α-renaming be disciplined, i.e.,performed in such a way that canonical names are preserved, even when thesyntactical representations change. When N is a set of names we shall write⌊N⌋ to denote the point-wise extension of ⌊x⌋. Similarly we extend the operatorto capabilities, i.e., ⌊M⌋ is M where every name is canonicalised, and processes,i.e., ⌊P ⌋ is P where every name is canonicalised.

For any concrete process, P , we shall assume that Name, in contrast to Name,is a finite set. And, like the ordinary names, this set is composed as the disjointunion of the canonical constants, ⌊n⌋ ∈ C, and the canonical variables, ⌊p⌋ ∈ V,i.e., Name = C ⊎ V.

In practice we are going to take Name ⊂ Name. In the case of variables, whichare not subject to α-conversion, it then suffices to choose ⌊p⌋ = p. In the case of

Page 57: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

The BioAmbients Modelling Language 35

P ∈ Proc ::= (n)P Name restriction| P µ Ambient boundary| P P ′ Parallel composition

|∑

i∈I M ℓi

i . Pi Summation (guarded choice)

| rec X. P Recursive process (defining X , rec X. P )| X Process identifier

Table 3.1: The syntax of BioAmbients processes, P .

constants, however, we shall assume that each distinct constant, n, occurring inthe initial static program code gives rise to an equivalence class of constants rep-resented by the corresponding canonical constant, ⌊n⌋. Disciplined α-renamingthen demands that the α-renaming process always picks the replacement namefrom the equivalence class of the replaced name and that the corresponding α-equivalence (Table 3.3) only holds when the bound names are indeed from thesame class. �

3.1.3 Compartment Forming Processes

As noted by Regev [Reg03] membrane bounded compartments may be modelledas complex forming processes, but the underlying analogy is quite weak. Thusshe proposed a better alternative in the form of BioAmbients. In doing so sheproposed that compartment forming processes emerge from complex formingprocesses, when you allow spatial boundaries to be explicitly modelled.

Definition 3.19 (BioAmbients processes) The set of BioAmbients processexpressions is defined by the syntax of Table 3.1.

Notation 3.20 We use the heavy brackets, and , to represent ambient bound-aries; the ordinary brackets, [ and ], are reserved for the notion of substitution.

Remark 3.21 (Roles) In preparation for the static analyses of later chapterswe attach a role, µ ∈ Role, to each ambient boundary. It shall merely serveas a useful pointer into the process, and has no semantic significance. For anyconcrete process, P , we shall assume that Role is a finite set. �

This extends the language of complex forming processes with the ambient bound-ary construct, P µ, which denotes a process P encapsulated by a spatialboundary. There is an obvious correspondence to the biological concept of acompartment, which is exactly a physical enclosure that creates a strong spatialseparation between inside and outside. In some instances, however, it is also

Page 58: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

36 Modelling in Process Calculus

M ∈ Cap ::= Communication capabilitiesx!{y} | x?{p} local communication

| x !{y} | x ?{p} parent to child communication| x !{y} | x ?{p} child to parent communication| x#!{y} | x#?{p} sibling to sibling communication

Movement capabilities| enter x | accept x enter movement| exit x | expel x exit movement| merge– x | merge+ x merge movement

Table 3.2: Syntax of BioAmbients capabilities, M .

(a) local (b) parent to child (c) child to parent (d) sibling

Figure 3.1: Communication styles of the BioAmbients calculus.

useful for the modelling of ’hard’ compounds, i.e., complexes that are not formedas temporary coordination compounds. Finally, it may model non-membranal,enclosure forming, biological entities, such as the clathrin coats that force theformation of internalisation vesicles by the plasma membrane (Section 2.2.3).

Definition 3.22 (BioAmbients capabilities) The set of BioAmbients capa-bilities is defined by the syntax of Table 3.2. �

The pair x !{y}/x ?{p} causes parent to child communication, i.e., a process maycommunicate a message to a process encapsulated by a neighbouring ambient(Fig. 3.1(b)); this may correspond to a variety of receptor mediated interactionswhere the co-action is perceived as a surface receptor of a compartment.

(a) enter (b) exit (c) merge

Figure 3.2: Movement styles of the BioAmbients calculus.

Page 59: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

The BioAmbients Modelling Language 37

In contrast, the pair x !{y}/x ?{p} causes child to parent communication, i.e., aprocess may communicate a message to a process that is a neighbour of its im-mediately enclosing ambient (Fig. 3.1(c)); biologically this could e.g. correspondto interaction between a component of a complex and a molecule neighbouringthe complex.

Finally, the pair x#!{y}/x#?{p} allows sibling to sibling communication, a pro-cess within some ambient may communicate a message to a process locatedwithin an ambient neighbouring the first (Fig. 3.1(d)); biologically this corre-sponds, e.g., to receptor mediated interaction between a complex and a com-partment, communication between components of different complexes, or inter-compartment (e.g., hormonal) signalling.

In the case of movement capabilities the effect of reaction is that the localhierarchy of nested ambients changes.

The effect of an interaction between enter x and accept x is that one ambiententers a neighbouring one (Fig. 3.2(a)); biologically this corresponds, e.g., toendocytosis where a compartment (selectively) subsumes another large entity.

The capabilities exit x and expel x cause one ambient to leave the immediatelyenclosing one (Fig. 3.2(b)); this corresponds, e.g., to exocytosis where a com-partment (selectively) secretes some matter.

Finally, merge+ x and merge– x simply make two neighbouring ambients merge(Fig. 3.2(c)); this corresponds to fusion of biological compartments.

Example 3.23 Our running example shall be the following program, Peat, which

is inspired by the production of metabolites by catabolism of nutrients; as we shall

explain later it models how ’food particles’ (nutrients) may either be ignored, digested,

or emitted as secretion by the cell:

(rj) (ac) (rea) (RL)( rec Z. expel rj1 . Z

( recY. ( rea ?{rl}2 . expel rl3 . Y+exit rj4 . Y+enter ac5 . Y )

exit RL6 .0 nutrient ) food

rec S. ( rea !{RL}7 . S+expel rj8 . S

+accept ac9 . S ) cell ) system

Page 60: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

38 Modelling in Process Calculus

3.2 Semantics of BioAmbients

The notion of Reaction Semantics is inspired by Berry and Boudol’s ChemicalAbstract Machine [BB90], which pioneered the view that concurrent processexpressions are ‘really’ chemical solutions of reactive entities. These solutionsare heated and stirred by an invisible device, and entities may react if theycome close and exhibit molecular complementarity. Being extremely appealing,this abstraction is now widely used in the context of process calculi. In partic-ular, it is the traditional choice for ambient calculi. Thus, when conceiving ofBioAmbients, Regev [RPS+04, Reg03] gave the language a reaction semantics,which ensures a high degree of coherence between the inherently (bio-)chemicalmodelling domain and the operational model of the language.

It is custom to define a reaction semantics in terms of structural congruence, ≡,and reaction, −→, both binary relations on processes. We shall diverge slightlyfrom this tradition and, following Berry and Boudol’s original proposal rather

closely, define the semantics in terms of heating, ⇛, and reaction,ℓ

−→.

Definition 3.24 (Heating relation) The heating relation, ⇛, is the least bi-nary relation on Proc that is inductively defined by the axioms and rules ofTable 3.3. When it holds between two process expressions P and Q, writtenP ⇛ Q, it means that Q arises from P by any number of occurrences of

• insignificant syntactic restructuring (stirring, ≡),

• elimination of an inactive process (evaporation, ⇛),

• elimination of a useless restriction (diffusion, ⇛), and

• unfolding of a recursive process (catalysis, ⇛).

Remark 3.25 (Structural congruence) The ordinary structural congruencerelation, ≡, of BioAmbients [Reg03] is nearly fully embedded in the heating re-lation, where we write ≡ for the conjunction of ⇛ and ⇚. However, we disallowthe random introduction of inactive processes and vacuous restrictions, and as-sert that recursive processes can only be unfolded and not folded back.

Remark 3.26 (Unfolding of recursive processes) We use the heating re-lation to unfold recursive processes. Another choice is usually associated withStructural Operational Semantics of process calculi [Mil99], where it is customto have the following inference rule:

P [recX. P /X ]α

−→ Q

rec X. Pα

−→ Q

Page 61: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Semantics of BioAmbients 39

Scope of restrictions:

h-sres (n) (m)P ≡ (m) (n)P

h-samb (n) ( P µ) ≡ (n)P µ

h-sext (n) (P Q) ≡ ((n)P ) Q if n /∈ fn(Q)

Reordering of parallel processes:

h-pcom P Q ≡ Q P

h-pass (P Q) R ≡ P (Q R)

Reordering of summands:

h-scom P + Q ≡ Q + P

h-sass (P + Q) + R ≡ P + (Q + R)

α-equivalence:

h-alph (n)P ≡ (m) (P [m/n]) if m 6∈ fn(P ) and ⌊n⌋ = ⌊m⌋

Evaporation and diffusion: Unfolding of recursive processes:

h-neva P 0 ⇛ P

h-rdif (n)0 ⇛ 0

h-urec rec X. P ⇛ P [recX. P /X ]

Reflexivity and transitivity:

h-refl P ⇛ Ph-tran

P ⇛ Q Q ⇛ R

P ⇛ R

Congruence:

h-cresP ⇛ Q

(n)P ⇛ (n)Qh-camb

P ⇛ Q

P µ ⇛ Q µh-cpar

P ⇛ Q

P R ⇛ Q R

h-csumP ⇛ Q

M ℓ . P + R ⇛ M ℓ . Q + R

Table 3.3: The heating relation P ⇛ Q on processes. We write P ≡ Q whenboth P ⇛ Q and P ⇚ Q.

This works fine in the presence of a reaction rule

−→ P ′ Qα

−→ Q′

P Qτ

−→ P ′ Q′

that allows two recursive processes to be simultaneously unfolded. In the pres-

Page 62: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

40 Modelling in Process Calculus

ence of a reaction semantics, however, this does not work, and we delegateunfolding to the heating relation in order to allow simultaneous unfolding. �

The heating relation serves both as a stirrer, cleaner, and catalyst, much like realheat does to a real chemical solution. The rules and axioms have the followingmeaning:

The rules for scope of restrictions ensure that a name restriction is (almost)completely free to migrate in and out of ambient boundaries, parallel composi-tions, other name restrictions. This guarantees that a name restriction cannotspuriously block reactions, because sooner or later it will migrate out of theway. Note, that together with α-conversion (application of h-alph) this axiomimplicitly captures the aforementioned notion of scope extrusion.

The rules for reordering of parallel processes make systems behave like multisetsof process expression, i.e. solutions of reactive entities. Similarly, the reorderingrules for summations make summations behave as (subordinate) multisets, i.e.,dynamic entities that spin and thereby make their binding sites accessible fromall directions.

The notion of α-equivalence ensures that solutions are indistinguishable up todisciplined changes of bound names. Thus the concrete representation of aparticular channel is immaterial, as long as the relevant reactive entities agreeon it.

The h-neva rule serves to eliminate inactive processes. This corresponds toBerry and Boudol’s notion of inaction cleanup and biologically we shall thinkof this as evaporation.

The h-rdif rule serves to eliminate useless restrictions. This corresponds toBerry and Boudol’s notion of restriction cleanup and biologically we shall thinkof this as diffusion or secretion of garbage.

The rule for unfolding of recursive processes unfolds a recursive definition once,thereby releasing a single instance of the body into the system. This correspondsto Berry and Boudol’s fixpoint rule and biologically we shall think of this as aprocess that is catalysed by heating.

The rules for reflexivity and transitivity ensure that the heating of a system maycause any number, that is zero or more, concrete stirring, cleaning or unfoldingactions. Note that symmetry of ≡ follows implicitly.

Finally, the rules for congruence ensure that the equivalence extends to all

Page 63: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Semantics of BioAmbients 41

Movement:

r-ent(enter nℓ1 . P + P ′) P ′′ µ1 (accept nℓ2 . Q + Q′) Q′′ µ2

(ℓ1,ℓ2)−→

P P ′′ µ1 Q Q′′ µ2

r-ext(exit nℓ1 . P + P ′) P ′′ µ1 (expel nℓ2 . Q + Q′) Q′′ µ2

(ℓ1,ℓ2)−→

P P ′′ µ1 Q Q′′ µ2

r-mrg(merge– nℓ1 . P + P ′) P ′′ µ1 (merge+ nℓ2 . Q + Q′) Q′′ µ2

(ℓ1,ℓ2)−→

P P ′′ Q Q′′ µ2

Communication:

r-l2l (n!{m}ℓ1 . P + P ′) (n?{p}ℓ2 . Q + Q′)(ℓ1,ℓ2)−→ P Q[m/p]

r-p2c(n !{m}ℓ1 . P + P ′) (n ?{p}ℓ2 . Q + Q′) Q′′ µ (ℓ1,ℓ2)

−→P Q[m/p] Q′′ µ

r-c2p(n !{m}ℓ1 . P + P ′) P ′′ µ (n ?{p}ℓ2 . Q + Q′)

(ℓ1,ℓ2)−→

P P ′′ µ Q[m/p]

r-s2s(n#!{m}ℓ1 . P + P ′) P ′′ µ1 (n#?{p}ℓ2 . Q + Q′) Q′′ µ2

(ℓ1,ℓ2)−→

P P ′′ µ1 Q[m/p] Q′′ µ2

Execution in context:

r-resP

ℓ−→ Q

(n)Pℓ

−→ (n)Qr-amb

Pℓ

−→ Q

P µ ℓ−→ Q µ

r-parP

ℓ−→ Q

P Rℓ

−→ Q R

Stirring, cleaning, and catalysis:

r-auxP ⇛ P ′ P ′ ℓ

−→ Q′ Q′ ⇛ Q

Pℓ

−→ Q

Table 3.4: The reaction relation, Pℓ

−→ Q, on processes.

relevant sub-expressions, i.e., two process expressions denote the same solutionif all sub-expressions denote the same sub-ordinate solutions.

Definition 3.27 (Reaction relation) The reaction relation is the least bi-nary relation on Proc defined inductively by the axioms and rules of Table

3.4. When it holds between two processes P and Q, written Pℓ

−→ Q where

Page 64: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

42 Modelling in Process Calculus

ℓ = (ℓ1, ℓ2) is a pair of labels, it means that P can evolve into Q by a singlemovement or communication reaction involving two prefixes that are labelled ℓ1and ℓ2, respectively.

Example 3.28 The semantics of the example program Peat is illustrated below:

1

2

3

4

6

5

7

8

system

cell food

nutrient

system

cell

food

nutrient

system

cell

food

nutrient

system

cell

food

nutrient

system

cell

foodnutrient

system

cell food

nutrient

system

cell

food

nutrient

system

cell

foodnutrient

rl=RL

(6,3)

(4,8)

(4,8)

(5,9)

(5,9)

(4,1)

(4,1)

(7,2)

(7,2)

rl=RL

The initial configuration is shown in frame 1 and here the tree structure reflects a

scenario where cell and food are siblings inside system and nutrient is a sub-ambient of

food. In this configuration (4, 1) can react to move food out of system and obtain the

stuck configuration of frame 2. Alternatively, (5, 9) can react to move food into cell

(frame 3). Then (4, 8) can react to move food back out of cell (frame 1 again), or (7, 2)

can react to bind the variable rl to the constant RL (frame 4). After that only (6, 3)

can react to move nutrient out of food (frame 5). Here (7, 2) may react to bind the

variable rl to the constant RL once more and thereby obtain the stuck configuration

of frame 8. Alternatively, (4, 8) can react to move food out of cell (frame 6). From

here (5, 9) may react to move food back into cell (frame 5). Alternatively, (4, 1) can

move food out of system and thereby obtain the stuck configuration of frame 7. �

Remark 3.29 (Semantic labels) The labels, ℓ, that we attach to the seman-

tic arrow,ℓ

−→, have no semantic significance. They simply constitute an instru-

mentation that helps us define a collecting semantics,L

−→⋆, i.e., the reflexive

transitive closure ofℓ

−→, defined as follows:

P⋆ε

−→⋆P⋆

P⋆L

−→⋆P Pℓ

−→ Q

P⋆Lℓ−→⋆Q

Thus, for a given derivation, PL

−→⋆Q, the string L constitutes a record, in theform of a trace, of the transitions that have come to pass. We shall use this in

Page 65: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Semantics of BioAmbients 43

the sequel, when formulating our correctness results, and quite often we shall

write P⋆L

−→⋆Pℓ

−→ Q for a sequence of reactions that evolve an initial programP⋆, first into P via the sequence L of reactions, and then into Q via a finalreaction ℓ. �

The reaction relation constitutes our definition of the reactive behaviour of pro-cesses. Every axiom has two ingredients. In the case of movement reactions thelocal ambient hierarchy is changed and in the case of communication reactionsa constant is substituted for a variable in the receiving process. In both casesthe prefixes involved in the synchronising reaction are removed to leave roomfor their continuations and summands are discarded.

The movement reactions are those of enter movement, exit movement, and mergemovement, defined by axioms r-ent, r-ext, and r-mrg, respectively, and theyenable the movement behaviours described in Section 3.1.3.

The communication reactions are those of local communication, parent to childcommunication, child to parent communication, and sibling to sibling commu-nication, defined by axioms r-l2l, r-p2c, r-c2p, and r-s2s, respectively, andthey enable the communication behaviours described in Section 3.1.3.

The rules for execution in context ensures that reactivity extends to all relevantparts of the system. This asserts that sub-expressions of restrictions, ambients,and parallel compositions are indeed reactive and, thus, denote subordinatesolutions.

Finally, the rule for cleaning, stirring, and catalysis allows systems to be stirredand heated before and after reactions. This enables the view that reaction is arelation on warm and well-stirred solutions rather than fixed algebraic expres-sions.

Remark 3.30 (Unrestricted sums vs. guarded choice) Note that we omitthe unrestricted choice, P + Q, of the original language [Reg03] in favour of themore conventional guarded sum,

i∈I Mi . Pi. We do this because unrestrictedchoice a) is difficult to handle in a reaction semantics in the presence of generalrecursion [PNN06a], and b) seems to provide no essential extension of expressivepower. The point a) is of relevance to the developments in Chapter 8, where awell-defined kill component is extremely difficult to specify in the presence ofunrestricted choice. �

Page 66: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

44 Modelling in Process Calculus

Figure 3.3: Annotated LDL degradation pathway [LBZ+99].

3.3 CASE: Modelling the LDL Degradation Path-

way

We now turn to the modelling of the LDL degradation pathway (Section 2.2.3),which is the first of our two case studies. Receptor mediated endocytosis is pri-marily a transport scenario. It involves both transmembranal coordination andcompartment based transport; this is modelled by ambients that communicateand move.

In accordance with Regev’s examples and guidelines we take the approach that

Page 67: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Modelling the LDL Degradation Pathway 45

each kind of physical compartment as well as each kind of multiprotein complexshould correspond to one ambient role. Thus, we have the following correspon-dences:

• The LDL role models LDL particles.

• The EE role models early endosomes (true compartments).

• The LE role models late endosomes (true compartments).

• The CC role models clathrin coats (coating the EE in the coated vesicle).

• The LY SO role models lysosomes (true compartments).

• The CELL role models cells (true compartments).

• The XV role models transfer vesicles (true compartments).

• The CH role models cholesterol.

When we can do so without ambiguity, we will use the abbreviated ambient rolesalso when referring to the biological entities that they model. As will be evidentfrom the explanation below, the model emphasises the receptor dynamics thatfacilitates the initial LDL binding but ignores the details of receptor recycling.Nonetheless this allows the analysis to highlight certain medical issues.

In Nature each compartment and reaction would be present in the thousands.For the purposes of this dissertation, however, we are merely interested in qual-itative information and, hence, it suffices for us to model a single representativefor each biological entity.

In Table 3.3 the LDL (described in LipoProtein) is initially located outside ofthe CELL (described in Cell). Here it offers an ApoB signal via the channelLDLrcpt that corresponds to the extra-cellular binding site of the transmem-branal LDL receptor of the CELL.

At this stage the early endosome has not been formed yet. We model, however,the membrane patch and the transmembranal LDL receptors, which are latergoing to fold into the early endosome, as a process capable of evolving into theEE ambient. As explained in Section 2.2.3, the clathrin coated early endosomemay be formed with or without bound LDL particles. We model this by asummation that demands one of the following three binding scenarios to occurbefore the EE ambient is released:

Page 68: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

46 Modelling in Process Calculus

LipoProtein =LDLrcpt#!{ApoB}1 . enter ApoB2 . enter ee3 . enter xv4 . proc ?{Hydr}5 .

( expel Hydr6 .0exit Hydr7 .0 CH ) LDL

Endo =enter AP223,16,9 . exit AP224,17,10 .merge– Le25,18,11 .0

EarlyEndo =accept ee22,15 .Endo EE

ClathrinCoat =EErcpt ?{ap2}26 . accept ap227 . expel ap228 .0 CC

XferVesicle =( accept xv31 .0

exit Le32 .merge– lyso33 .0 ) XV

LateEndo =( merge+ Le29 . expel Le30 .0

XferVesicle ) LE

Lysosome =merge+ lyso34 . proc !{hydr}35 .0 LY SO

Cell =( ( EErcpt !{AP2}8 . Endo EE

+EErcpt !{AP2}12 . LDLrcpt#?{apob}13 . accept apob14 .EarlyEndo

+LDLrcpt#?{apob}19 . accept apob20 . EErcpt !{AP2}21 .EarlyEndo )ClathrinCoat

LateEndo

Lysosome ) CELL

(LDLrcpt) (EErcpt) (ApoB) (AP2) (ee) (cc) (lyso) (xv)(Le) (proc) (hydr) (ap2) (ap)

( LipoProtein Cell )

Table 3.5: Abstract model of the LDL degradation pathway.

1. The extra-cellular part LDLrcpt of the LDL receptor binds the ApoBsignal of the LDL thus forcing LDL to enter the CELL. Subsequentlythe intra-cellular part EErcpt of the receptor is bound by the AP2 domainexposed by the CC bound adaptin molecules.

2. The intra-cellular part EErcpt of the receptor is bound by the AP2 do-main exposed by the CC bound adaptin molecules. Subsequently theextra-cellular part LDLrcpt of the LDL receptor binds the ApoB signalof the LDL thus forcing LDL to enter the CELL.

3. The intra-cellular part EErcpt of the receptor is bound by the AP2 do-main of the CC and the extra-cellular part LDLrcpt is never bound.

Page 69: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Modelling Genetic Transcription 47

If the LDL is in place inside the CELL after the binding scenario it may enterthe EE otherwise not. Either way, the internalisation of the clathrin coated pitmay be completed by the EE entering the CC.

In Nature this internalisation process is atomic since the

CH LDL EE CC CELL

(or EE CC CELL) configuration arises instantaneously when the coatedvesicle is completed and internalised. By modelling this as a sequence of eventswe are introducing modelling artifacts as is indeed a common phenomenon whenusing process algebras for describing biological systems. Most importantly, forthe LDL to enter the EE we have to allow it into the CELL, which is bio-logically unsound. Thus, we must keep in mind, when interpreting the analysisresults, that the EE must both enter and leave, the latter corresponding to theinternalised early endosome shredding its clathrin coat, the CC in order to enterthe CELL.

When released in the CELL the EE is able to merge with the LE. This releasesthe LDL into the LE from where it may enter the XV . After the merge theXV is free to leave the LE with, or without, the LDL cargo.

Finally, the XV may merge with the LY SO, thus releasing the LDL cargo intoits final destination where it may be hydrolysed into CH.

3.4 CASE: Modelling Genetic Transcription

In a similar manner we shall also fashion an abstract model of genetic tran-scription (Section 2.1.2), which is the second of our case studies. Genetic tran-scription is primarily a coordination scenario. It involves both the formation ofcoordination compounds and recurrent behaviour; this is modelled by recursiveprocesses that coordinate within extruded scopes.

The resulting model has three components:

The genome is a collection of genes. Each gene repeatedly allows polymerase toattach at the promoter and then transcribe the gene by reading the nucleotides ofthe sequence one at a time. Hence, in the model we portray a gene as a recurrentprocess genei where each recurrence corresponds to a complete transcription andhas three phases:

Initiation The gene supports the formation of coordination compounds by

Page 70: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

48 Modelling in Process Calculus

gene1 =rec G1. (g1) (b1) (d1)tr!{g1}1 . g1!{b1}2 . g1!{a}3 .

( g1!{c}4 . ( g1!{a}5 . ( g1!{c}6 . b1!{d1}7 .G1+b1?{d11}8 .G1 )

+b1?{d11}9 .G1 )+b1?{d11}10 .G1 )

genome =gene1

ATP =rec A. ( a!{a}11 .0 A )

CTP =rec C. ( c!{c}12 .0 C )

GTP =rec G. ( g!{g}13 .0 G )

UTP =rec U. ( t!{t}14 .0 U )

nTP =( ATP CTP GTP UTP )

polymerase =rec P. tr?{gene}15 . gene?{b}16 . rec X. ( gene?{p}17 . ( b!{d}18 .P

+p?{dp}19 .X )+b?{dp}20 . ( 0 MRNA P ) )

(a) (c) (g) (t) (tr) (d)( genome nTP polymerase )

Table 3.6: Abstract model of genetic transcription.

offering scope extrusion of the private channels gi and bi on the publicchannel tr.

Elongation The gene supports a sequential ‘reading’ of its nucleotide sequenceby offering a sequence of corresponding communications on the channelgi. At each step the gene is prepared for the coordination compound tobe broken up via a message on the private channel bi.

Termination If all nucleotides are ‘read’ the gene breaks up the coordinationcompound by sending a message on the private channel b1.

Note that, in the concrete example, the genome is a collection of only a singlegene. As the BioAmbients syntax does not allow empty synchronising commu-nications we use the di as dummy messages.

The cell provides a potentially unbounded supply of nucleoside tri-phosphates

Page 71: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Modelling Genetic Transcription 49

that provides the nucleotides for the growing messenger RNA. In the modelwe capture this by replicating processes corresponding to each of the NTPsin question. Each replica offers just a single interaction on a public channelcorresponding to the specific nucleotide. In the concrete model the nTP issimply an unbounded collection of ATP, GTP, CTP, and UTP.

Polymerase is an enzyme that makes RNA copies of DNA templates. It repeat-edly attaches to the promoter region of a (not specific) gene and then synthesisesa corresponding RNA replica by processing one nucleotide at the time. Herewe model the enzyme as a recurrent process, polymerase, where each major re-currence, rec P. · · · , corresponds to a complete transcription and each minorrecurrence, rec X. · · · , corresponds to a single nucleotide. Thereby we achieve ageneric polymerase that will transcribe a gene of any length in three phases:

Initiation The polymerase forms a coordination compound with a gene byengaging in interaction on the public channel tr. Here it receives twoprivate channels gene and b that allows it to coordinate specifically withthe gene in question.

Elongation The recursive (sub-)process, rec X. · · ·, either handles the next nu-cleotide in two steps:

• First, the type of nucleotide is communicated from gene to poly-merase via the private channel gene.

• Then, if too long time passes before a corresponding NTP turnsup, the polymerase breaks the coordination compound by sendinga signal on the private channel b and then recurs to its initial state,(rec P. · · · ). Otherwise, the polymerase reacts with the NTP andrecurs to handle the next nucleotide (rec X. · · · ).

Termination Or the genetic sequence (ACAC in the concrete case) is fullytranscribed and the coordination compound is broken by a signal on b. Thepolymerase then recurs to its initial state (rec P. · · · ) while, at the sametime, releasing a piece of messenger RNA corresponding to the transcribedsequence.

Note that this model does not pay any attention to the cooperativity issues ofactivators, inhibitors, genes, and polymerases. These issues, that are particu-larly relevant to the initiation phase, have been treated in great detail by Kuttlerand Niehren [KN06] and also by Blossey, Cardelli, and Phillips [BCP06].

Also note that the model rests on the assumption that transcription may termi-nate prematurely. This type of behaviour, however, does not seem to be frequent

Page 72: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

50 Modelling in Process Calculus

in living systems; rather it seems to be connected with certain regulatory mech-anisms that we shall not treat in detail. For our purposes the assumption simplyresults in a model that is slightly more complex and, thus, constitutes a moreinteresting subject of analysis in later chapters.

3.5 Concluding Remarks

In this chapter we have introduced a variant of the BioAmbients calculus [Reg03,RPS+04] that incorporates general recursion in the manner of CCS [Mil80] andomits unrestricted choice in favour of guarded sum. The fundamental conceptsof the calculus were introduced in three stages: First the fragment of simplereactive processes, which is strongly related to Milner’s CCS [Mil80]. Thenthe fragment of complex forming processes, which is strongly related to theπ-calculus by Milner, Walker, and Parrow [MPW92, Mil99]. And, finally, com-partment forming processes in the form of the full BioAmbients calculus, whichadds elements of Cardelli and Gordon’s Mobile Ambients [CG00], but in thestyle of Levi and Sangiorgi’s Mobile Safe Ambients [LS03] where all progress isa result of binary reactions rather than unary actions. One may observe thatthe resulting three classes of process expression correspond to the three classesof biological abstract machines identified by Cardelli [Car05a]:

The protein machine is concerned with Biochemical Networks, and seems tobe adequately modelled by the notion of simple reactive processes. This isthe view taken by Calder et al, who use the Performance Evaluation ProcessAlgebra (PEPA), which is basically a stochastic extension of simple reactiveprocesses, for the modelling and analysis of various signalling pathways [CGH06,CGH05, CDGH06, CGHV06], and, especially, Danos and Krivine in their workon formal biology in CCS [DK03]. Danos et al have made a few attempts atcapturing special properties of the biological domain by addressing them at amore fundamental level: Reversible CCS aims to capture the concept of trulyreversible reactions [DK03, DK04]. The κ-calculus focuses on the notion ofcomplexation and its effect on the folding of proteins, in terms of exposure orhiding of binding sites, [DL03a, DL03b, DL04].

The Gene Machine is concerned with Gene Regulatory Networks, and seemsto be adequately modelled by the notion of complex forming processes. Thisview was pioneered by Regev et al [RSS01], who worked with Priami [PRSS01]in order to base their BioSPI stochastic simulation tool on Priami’s Stochasticversion of the π-calculus [Pri96] coupled with Gillespie type rate estimations[Gil77, Gil76]. Technically this line of work has prospered at the hands of Phillipsand Cardelli who have worked on graphical notations [PC05], simulation tools

Page 73: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Concluding Remarks 51

[PC04], and modelling methodologies (Blossey et al) [BCP06]. The modellingefforts of Kuttler et al [KN06] have also led to technical extensions in the form ofconcurrent objects [Kut07]. Priami and Quaglia et al have fashioned β-binders[PQ04] – a calculus that combines features from the κ-calculus [DL04], thestochastic π-calculus [Pri96], and the ambient calculus [CG00]. in a mannerthat goes some way towards relinquishing the deterministic key-lock assumptionin favour of a more nuanced notion of affinity.

The Membrane Machine is concerned with Transport Networks, and seems tobe adequately modelled by the notion of membrane forming (BioAmbients) pro-cesses – again a view that was pioneered by Regev et al [Reg03, RPS+04]. Otheralternatives have emerged. Most notably Cardelli’s Brane calculus [Car05b]that quite accurately models biomembranes as oriented 2-dimensional surfacesof bubbles that fuse and split. This model was simplified by Danos et al [DP04],but features for the modelling of large molecules and complexes have only beenproposed quite recently [CP05]. A recent comparative analysis of BioAmbientsand Brane calculus concludes that, from a meta point of view the two calculi arevery similar [Ver07]. The β-binders [PQ04] also introduces a notion of boxedenclosures. These enclosures, however, do not nest and move, which renders themodelling of compartments less intuitive.

Page 74: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

52 Modelling in Process Calculus

Page 75: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 4

Static Analysis Techniques

“It is the mark of an educated mind to rest satisfied with the degreeof precision which the nature of the subject admits, and not to seekexactness where only an approximation is possible.”

— Aristotle

Most semantic properties of Turing complete languages, e.g., BioAmbients, thatcan express any computable function, are undecidable, i.e., not generally com-putable within finite time and memory. This is a consequence of Rice’s Theorem[Ric53], and the most well known example is Turing’s famous halting problem[Tur36].

Small-step operational semantics [Plo04] expresses the behaviour of programsin terms of transition systems. In case the behaviour is finite we may computethe corresponding transition system by exhaustive simulation and subsequentlycheck it for interesting properties. This is the approach of finite state ModelChecking [CGP00], which often works well for state spaces of moderate size,but becomes intractable for large state spaces and undecidable for infinite ones.

In contrast, the basic tenet of Static Analysis [NNH99] is that a loss in preci-sion often makes a property decidable, even for programs of infinite behaviour.Rather than computing the exact behaviour of a program, as defined by thesemantics, a static analysis aims to compute an acceptable approximation tothe behaviour directly from the static program code.

Traditionally, this type of analysis has been developed and used in the construc-tion of optimising compilers [ASU86], where static analysis estimates serve asthe basis of semantically safe program transformations. The design principles

Page 76: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

54 Static Analysis Techniques

Universe

Exact Answer Over−approximation Under−approximation Unacceptable

over−approximation under−approximationThe exact world unacceptable

Figure 4.1: The nature of approximation.

that give static analyses their strength all owe to this heritage: Exhaustive – Astatic analysis should apply to all programs. Infinite behaviour must be accept-able. Correct – A static analysis should be safe with respect to the semantics.As shown in (Fig. 4.1), approximations are safe only if strict. Implementable– A static analysis should admit efficient implementations. Static analyses aremeant to be invoked often. Useful – A static analysis should be useful. Thisgenerally requires a sensible compromise between the precision of the analysisand its decidability and computational tractability.

It is customary to distinguish between static Data Flow Analysis [Kil72], whereone tracks how data moves through a collection of atomic computations, andControl Flow Analysis [Shi88], where one tracks how the point of control tra-verses a program. Traditionally, data flow properties have been the primarystudy in the classic setting of imperative languages, and control flow propertiesthe primary study in the setting of functional languages. In process calculi,however, it is difficult to make a clear distinction between data and controlstructures. Thus, while it is often beneficial to approach process calculi withcontrol flow techniques [BDNN98, HJNN99, NNH02, NNPdR07, NNB04], weshall be applying both control and data flow techniques to BioAmbients.

Classically there are two approaches to the design of static analyses. On theone hand, the design is semantics directed if the specification is calculatedfrom a semantic specification; this is the approach of Abstract Interpretation[CC77, CC79]. On the other hand, the design is semantics based if the analysisinformation can be proved correct with respect to a semantic specification; thisis the approach of Monotone Frameworks [KU77], Flow Logic [NN02], and TypeSystems [Mil78]. As we shall see, the analyses specified in this dissertation areall semantics based.

The chapter is in four sections. We start in Section 4.1 by briefly reviewingbasic order theory. Complete lattices, monotone functions, fixed points, andMoore families are all key notions in static analysis. In Section 4.2 we go onto introduce the central concepts of Monotone Frameworks. This is a classicalconcept in static analysis, which originally emerged as a generalisation of DataFlow Analysis. In Section 4.3 we turn to the main concepts of Flow Logic. This

Page 77: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Order Theoretic Preliminaries 55

is a relative modern concept in static analysis, which originally emerged as ageneralisation of Control Flow Analysis. Finally, in Section 4.4, we make someconcluding remarks.

4.1 Order Theoretic Preliminaries

In preparation for the analysis sections we now give a cursory introduction tothe notion of complete lattices, For a very detailed treatment of the subject, see[DP02] or perhaps [Tay99], and for a more succinct introduction, see [NNH99].

Definition 4.1 (Partial order) A partial order (S,⊑) is a set S accompaniedby a binary relation ⊑ which is:

1. Reflexive: ∀s ∈ S : s ⊑ s

2. Transitive: ∀s, s′, s′′ ∈ S : s ⊑ s′ ∧ s′ ⊑ s′′ ⇒ s ⊑ s′′

3. Antisymmetric: ∀s, s′ ∈ S : s ⊑ s′ ∧ s′ ⊑ s ⇒ s = s′

Notation 4.2 When necessary, we shall use subscripts to disambiguate thedenotation of operators and e.g. write ⊑S rather than just ⊑. �

If S has an element s ∈ S such that ∀s′ ∈ S : s′ ⊒ s then this element is calledthe least element of S and is denoted ⊥. Analogously, the greatest element ofS is an element s ∈ S such that ∀s′ ∈ S : s′ ⊑ s and is denoted ⊤. Generalisingthis leads to the definition of upper bounds:

Definition 4.3 (Upper bound) For a partial order (S,⊑) and subset Y ⊆ Sthe element s ∈ S is an upper bound of Y iff

∀s′ ∈ Y : s′ ⊑ s

Definition 4.4 (Least upper bound) For a partial order (S,⊑) and subsetY ⊆ S the element s is a least upper bound (lub) of Y iff

1. s is an upper bound of Y , and

2. for every upper bound s′ of Y , s ⊑ s′.

Whenever Y ⊆ S has a lub we denote it by⊔

Y . �

The binary least upper bound of s, s′ ∈ S is written s ⊔ s′.

Page 78: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

56 Static Analysis Techniques

The converse notions of a lower bound and a greatest lower bound (glb) may bedefined similarly, and we write s ⊓ s′ for the binary glb of s, s′ ∈ S.

Definition 4.5 (Lattice) Any non-empty partial order (L,⊑) that has l ⊔ l′

and l ⊓ l′ for all l, l′ ∈ L, or equivalently⊔

Y andd

Y for all finite Y ⊆ L, is alattice.

Definition 4.6 (Complete lattice) Any non-empty partial order (L,⊑) thathas

Y andd

Y for all Y ⊆ L is a complete lattice. �

Note that, if (L,⊑) is a complete lattice, then ⊥ =⊔

∅ =d

L is the leastelement and ⊤ =

d∅ =

L is the greatest element. Furthermore, Definitions4.5 and 4.6 ensure that any finite lattice is complete.

Proposition 4.7 (Cartesian product) If (L1,⊑L1) and (L2,⊑L2

) are bothcomplete lattices then so is the Cartesian product

(L1 × L2,⊑L1×L2)

where the componentwise order is defined by

(l1, l2) ⊑L1×L2(l′1, l

′2) iff l1 ⊑L1

l′1 ∧ l2 ⊑L2l′2

Proof See e.g. [NNH99]. �

Proposition 4.8 (Function space) If S is any set and (L,⊑L) is a completelattice then so is the function space

(S → L,⊑S→L)

where the pointwise order is defined by

f ⊑S→L g iff ∀s ∈ S : f(s) ⊑L g(s)

Proof See, e.g., [NNH99]. �

Definition 4.9 (Moore family) A subset, Y , of a complete lattice, ⊆ (L,⊑),is a Moore family if it is closed under greatest lower bounds, i.e.,

∀Y ′ ⊆ Y :d

Y ′ ∈ Y�

Note that a Moore family always contains a least element,d

Y , and a greatestelement,

d∅ = ⊤L; thus it is never empty.

Definition 4.10 (Monotone function) A function, f : S1 → S2, betweenpartially ordered sets, (S1,⊑S1

) and (S2,⊑S2), is monotone if

Page 79: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Order Theoretic Preliminaries 57

∀s, s′ : s ⊑S1s′ ⇒ f(s) ⊑S2

f(s′)

Definition 4.11 (Chain) A subset, Y , of a partial order, (S,⊑), is a chain ifit is totally ordered, i.e.,

∀s, s′ ∈ Y : (s ⊑ s′) ∨ (s′ ⊑ s)

If Y is a finite subset then it constitutes a finite chain.

Definition 4.12 (Ascending chain condition) A partial order, (S,⊑), sat-isfies the ascending chain condition if any ascending chain, {s1, s2, . . . , sn, . . .} ⊆S, for which s1 ⊑ s2 ⊑ · · · ⊑ sn ⊑ · · · eventually stabilises, i.e.,

∃k ∈ N : sk = sk+1 = . . .�

Note, that this condition is trivially satisfied by any finite partial order.

Definition 4.13 (Fixed point) Consider a monotone function, f : L → L,on a complete lattice, (L,⊑). A fixed point of f is an element, l ∈ L, such thatf(l) = l. We write

Fix(f) = {l | f(l) = l}

for the set of such fixed points. �

Proposition 4.14 (Least fixed point) Any monotone function, f : L → L,on a complete lattice, (L,⊑), has a unique least fixed point, LFP(f), definedby:

LFP(f) =⊔

i≥0 f i(⊥)

for which f(LFP(f)) = LFP(f).

Proof See, e.g., [NNH99]. �

Clearly, if L satisfies the ascending chain condition, then LFP(f) can be com-puted in a finite number of iterations.

Page 80: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

58 Static Analysis Techniques

4.2 Monotone Frameworks

Data Flow Analysis problems emerge when one tries to compute how data movesthrough a collection of atomic computations. Monotone Frameworks constitutea generalised approach to solving such problems. Originally invented as thebasis of safe transformations within optimising compilers they date back to thework of Kam and Ullman [KU77], who generalised Kildall’s lattice theoreticapproach to Data Flow Analysis [Kil72]. For a thorough introduction to thesubject see Aho, Sethi, and Ullman [ASU86] or Nielson, Nielson, and Hankin[NNH99].

Basic Concepts. Originally the notion of global Data Flow Analysis problemsarose in the context of imperative languages. In this context it is natural to thinkof programs, P , as flow-graphs where the nodes denote so-called elementaryblocks and the edges denote flow of control. Generally, the notion of elementaryblocks depends on the language at hand. However, we shall think of them simplyas the least pieces of code that are of interest to the analysis problem and hasone entry point, one exit point, and no internal looping.

Example 4.15 Consider the following minimal imperative language:

x ∈ V variablesn ∈ C constantsℓ ∈ Lab labels

a ::= x | n | · · · Arithmetic expressions

b ::= t | f | · · · Boolean expressions

S ::= Program Statements

[x:=a]ℓ assignment statement| S1; S2 sequential statement

| while [b]ℓ do S loop statement

where the labelled entities [B]ℓ are the elementary blocks. In this context the factorial

program

Sfact = [y := x]1; [z := 1]2; while [y > 1]3 do ([z := z ∗ y]4; [y := y − 1]5); [y := 0]6

may be thought of as the flow-graph

[y := x]1 [z := 1]2 [y > 1]3yes

no

[z := z ∗ y]4 [y := y − 1]5

[y := 0]6

Page 81: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Monotone Frameworks 59

The aim of a Monotone Framework is to approximate some interesting propertyof the data flow of a given program. For this purpose it is useful to think ofthe program as a transition system. The states correspond to the data flowinformation available at the entry points of elementary blocks. The transitionscorrespond to the effects of the elementary blocks on the information. In thissetting it is the information of the states that we seek to approximate.

Example 4.16 Consider Sfact of Example 4.15. In terms of data flow this program

might be seen as a non-deterministic transition system

q11 q2

2 q33

3

q44 q5

5

q66

where q1 is the initial state and each edge corresponds to the effect of the elementary

block of the same label. �

Property Space. In order to reason about this transition system we need arepresentation of states. The corresponding universe of discourse is called aproperty space.

It is customary to demand that the property space is a complete lattice

(L,⊑)

that satisfies the ascending chain condition. It should be designed such thatstates of the transition system of interest correspond to elements of L. Further-more, the least upper bound operation

associated with L must be suitable forsafely combining the effects of transitions whenever multiple transitions enterthe same state. The ascending chain condition helps to ensure that the DataFlow approximation may be computed in a finite manner.

Transfer Function Space. We also need a representation of the effect of tran-sitions. The corresponding universe of discourse is called a transfer functionspace.

Here it is customary to demand that the transfer function space is a set of func-tions

F = {f : L → L | f is monotonic}

that contains the identity function id and is closed under composition. It shouldbe designed such that the transitions of the transition system of interest corre-

Page 82: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

60 Static Analysis Techniques

spond to elements of F . The monotonicity requirement helps to ensure that theData Flow approximation may be computed in a finite manner.

Monotone Framework. These two ingredients formalise a given Data Flow Anal-ysis problem as a Monotone Framework.

It is normal to refer to the Monotone Framework of some global Data FlowAnalysis problem, A, as a triple

A = (L,⊔

L, F )

consisting of a suitable property space, L, the corresponding least upper boundoperation,

L, and a suitable transfer function space, F .

If all functions f in F are required to be distributive, i.e., f(l1⊔l2) = f(l1)⊔f(l2),the framework specialises to a distributive framework. This concept is somewhatstronger as it often allows for more efficient algorithms [NNH99].

If furthermore the property space is restricted to

L = (P(D),⊑)

for some finite set D and ⊑=⊆ or ⊑=⊇, then the transfer function space caninvariably be restricted to

F = {f : L → L | ∃lk, lg ∈ L : f(l) = (l \ lk) ∪ lg}

and the framework further specialises to a bitvector framework.

Example 4.17 Consider the global Data Flow Analysis problem of Reaching Defi-nitions:

RD: What definitions (assignments) may reach what basic blocks?

The goal of the corresponding analysis is to determine, for every program point, theorigin of every variable assignment that might be in effect when the point is reached.

The property space of interest, LRD, is the set of functions from finite subsets ofV to Lab?, where Lab? is the infinite set of labels Lab extended with ? (meaning‘undefined’), i.e. LRD ⊂ P(V×Lab?). This is a complete lattice where

F

=S

, ⊑=⊆,⊥ = ∅.

In order to define the corresponding transfer function space we first define two helpfulfunctions. The generate function

genRD[x := a]ℓ = {(x, ℓ)}

genRD[b]ℓ = ∅

Page 83: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Monotone Frameworks 61

defines how elementary blocks contribute new information, and the kill function

killRD[x := a]ℓ = {(x, ?)} ∪ {(x, ℓ′) | (x, ℓ′) ∈ l}

killRD[b]ℓ = ∅

defines how elementary blocks invalidate old information. Now, let l ∈ L. Then

1. if x ∈ V and ℓ ∈ Lab then (λl.(l \ killRD[x := a]ℓ) ∪ genRD[x := a]ℓ) ∈ FRD,

2. if ℓ ∈ Lab then (λl.(l \ killRD[b]ℓ) ∪ genRD[b]ℓ) ∈ FRD,

3. (λl.l) ∈ FRD, and

4. if f1, f2 ∈ F then (f1 ◦ f2) ∈ FRD.

Instances. When a general Monotone Framework is confronted with a par-ticular program, P⋆, of interest, a Data Flow Analysis emerges as a concreteinstance of the framework.

Such an instance comprises a number of elements:

• The property space (L,⊔

)

• The transfer function space F

• A finite transition system (Q, q⋆, δ) where Q is the set of program pointsin P⋆. δ is a labelled transition relation representing the flow-graph of P⋆,and q⋆ is the (set of) distinguished entry point(s).

• An extremal value ι, to be associated with the entry point(s).

• A mapping f : Lab⋆ → F

where Lab⋆ is the set of labels actually occuring in P⋆.

It also gives rise to a set of constraints, A⊑, defined by:

A[qt] ⊒⊔

{fℓ(A[qs]) | (qs, ℓ, qt) ∈ δ} ⊔ ιqtq⋆

where

ιqtq⋆

=

{

ι if qt ∈ q⋆

⊥ otherwise

Page 84: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

62 Static Analysis Techniques

Example 4.18 Consider Sfact of Example 4.15. It yields the following instance ofRD:

L = P(Vfact × Lab?fact) ⊂ P(V × Lab

?)

F is as FRD but restricted to Vfact and Labfact

q⋆ = {q1}

Q = {q1, · · · , q6}

δ = {(q1, 1, q2), (q2, 2, q3), (q3, 3, q4), (q3, 3, q6), (q4, 4, q5), (q5, 5, q3)}

fℓ(l) = (l \ killl[B]ℓ) ∪ genl[B]ℓ where [B]ℓ is an elementary block of Sfact

ι = {(x, ?) | x ∈ Vfact}

RD⊑ = {ι ⊑ RD[q1], (f2(RD[q2]) ⊔ f5(RD[q5])) ⊑ RD[q3],

f1(RD[q1]) ⊑ RD[q2], f3(RD[q3]) ⊑ RD[q4],

f3(RD[q3]) ⊑ RD[q6], f4(RD[q4]) ⊑ RD[q5]}

Note that P(Vfact × Lab?fact) is finite, in contrast to P(V × Lab?), and hence the

instance satisfies the ascending chain condition. �

Remark 4.19 Note, that in the context of an imperative language the transi-tion system (Q, q⋆, δ) is a priori fixed by the flow-graph, and only a solution,A⋆ ∈ (Lab⋆ → L), to A⊑remains to be computed. As we shall see in Chapter8, this is not necessarily the case for process calculi. �

Worklist Algorithm. The constraint system, A⊑(P ), derived from P con-stitutes a declarative specification of the solution space. We are interested inthe most informative solution, A⋆, which emerges as the least fixed point of theconstraint system. In the terminology of Monotone Frameworks this is calledthe Meet Over all Paths (MOP) and is sometimes hard to compute. In thecase of Distributive Frameworks, however, it coincides with the so-called Maxi-mal Fixed Point (MFP), which is computed by the worklist algorithm shown inTable 4.1.

Example 4.20 For Sfact the following result emerges as the MFP solution to RD⊑,and we say that RDfact |= RD⊑(Sfact):

RDfact[q1] = {(x, ?), (y, ?), (z, ?)}

RDfact[q2] = {(x, ?), (y, 1), (z, ?)}

RDfact[q3] = {(x, ?), (y, 1), (z, 2), (y, 5), (z, 4)}

RDfact[q4] = {(x, ?), (y, 1), (z, 2), (y, 5), (z, 4)}

RDfact[q5] = {(x, ?), (y, 1), (y, 5), (z, 4)}

RDfact[q6] = {(x, ?), (y, 1), (z, 2), (y, 5), (z, 4)}

Page 85: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Monotone Frameworks 63

input : Instance ((L,F ), (Q, q⋆, δ), ι, f )

output : Least solution A⋆ such that A⋆ |= A⊑

method : Step 1: Initialisation of W and A

W = q⋆;for all q in Q do

if q ∈ q⋆

then A[q] := ιelse A[q] := ⊥

Step 2: Iteration (updating W and A)while W 6= ∅ do

select qs from W; W := W \ {qs};for each (qs, ℓ, qt) in δ do

let l := fℓ(A[qs]) inif l 6⊑ A[qt]then (A[qt] := A[qt] ⊔ l; W := W ∪ {qt})

Step 3: Presenting the resultA⋆ := A

Table 4.1: Maximal Fixed Point algorithm for Monotone Frameworks.

Desirable Properties. A number of properties are desirable for a MonotoneFramework; hence their presence is subject to proof.

Termination. The analysis must be exhaustive and compute a result for all pro-grams. We prove this property by showing that the worklist algorithm alwaysterminates. Here, this is ensured by the ascending chain property and mono-tonicity, but in Chapter 8, where we define the so-called Pathway Analysis, weshall have to prove termination.

Preservation of Solutions. The solutions produced by the worklist algorithmmust be preserved under the semantics. This can be proved by showing a resultof the following kind:

If A |= A⊑(P ) and P −→⋆Q then A |= A⊑(Q)

Intuitively, this expresses that analysis results remain valid as execution com-mences, and, as we shall see, this is one of the most fundamental properties ofa static analysis.

Correctness. The solution, however, must also correctly capture the property ofinterest.

Page 86: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

64 Static Analysis Techniques

This might be a property related to, e.g., the memory state of semantic con-figurations 〈P, s〉, in the manner of structural operational semantics [Plo04]. Inthis case it is customary to define relation R ⊆ (Lab×L)×State that formallycaptures the notion of correctness, i.e.,

If A |= A⊑(P ), AR s, and 〈P, s〉 −→⋆〈Q, s′〉 then A R s′

Alternatively, as is more common for process calculi equipped with a reductionstyle semantics, it can be a property related to the sequence of events that leadto a certain configuration P . In this case one often relies on a collecting seman-tics, i.e., an instrumented version that keeps track of additional information thatpertains to the analysis. Thus, all configurations include a trace, tr ∈ Trace, ofsuch information, and we define a correctness relation, R ⊆ (Lab×L)×Trace.Correctness of the analysis then follows from:

If A |= A⊑(P ), AR tr, and 〈P, tr〉 −→⋆〈Q, tr′〉 then A R tr′

Remark 4.21 Note that the collecting semantics of BioAmbients collects such

traces, in the sense that 〈P⋆, ε〉, and 〈P⋆, ε〉L

−→⋆〈Q, L〉. We shall use this laterwhen formulating our correctness results. �

4.3 Flow Logic

Control Flow Analysis problems emerge when one tries to compute how focus ofcontrol moves through a program. The initial conceptualisation of static ControlFlow Analysis dates back to work on interprocedural analysis and obtainedmomentum with Shivers’ work on functional languages [Shi88], and was furtherrefined by Jagannathan [JW95]. In this context, and based on the view that, interms of flow, data and control are two sides of the same thing [NNH99], FlowLogic was pioneered in the late 1990’s, by Nielson and Nielson [NN98, NN97,NN02] as a unifying specification oriented approach to constraint based staticanalysis.

Basic Concepts. The Flow Logic framework makes a clear distinction be-tween the specification of an analysis and the computation of correspondinganalysis results. This approach allows the designer to focus on the specificationof analyses without making compromises dictated by implementation consid-erations. The implementation phase is also simplified and improved, as theimplementor is always free to choose the best available tool — no particular

Page 87: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Flow Logic 65

tool or formalism is prescribed by the framework.

Analysis Domain. As for Monotone Frameworks, a Flow Logic specification isbased on a suitable universe of discourse, L. Here it is customary to follow theapproach of Monotone Frameworks and demand that L is a complete lattice.However, the domain of a Flow Logic specification should be designed such thatelements correspond to global Control Flow Analysis information of the programof interest. This is in contrast to the property space of a monotone framework,which is designed such that the elements correspond to the Data Flow Analysisinformation of individual program points (or states if you will) of the programof interest,

Acceptability Judgement. Fundamentally, a Flow Logic specification is concernedwith the relationship between programs P ∈ lang and static analysis estimatesA ∈ L. This connection is captured by an acceptability judgement

A |= P

intended to hold precisely when A constitutes an acceptable analysis estimatefor P .

The judgement is defined by clauses; typically there is one clause for each syn-tactic construct φ of lang and they take the form

A |= φ(· · ·Pi · · · ) iff(some formula ϕ with A |= P ′

for various sub-programs P ′)

where ϕ is usually a formula in a suitable fragment of First Order Logic, fol.

Flow Logic. These ingredients formalise a given Control Flow Analysis problemas a Flow Logic. Obviously the involved judgement relation “|=” has function-ality

|=: (L × lang) → {true, false}

Generally, as we do not demand the specification to be syntax directed, i.e., foreach of the defining clauses that each P ′ occurring in ϕ is one of the Pi occurringin φ(· · ·Pi · · · ), we have to define the meaning of “|=” by co-induction ratherthan induction.

Thus, when formally assigning meaning to a Flow Logic specification, we do soby regarding it as defining a functional

Q : ((L × lang) → {true, false}) → ((L × lang) → {true, false})

the greatest fixed point of which we take as the formal meaning.

However, when the specification is syntax directed the least and greatest fixedpoints coincide, and hence “|=” is adequately defined by ordinary induction.

Page 88: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

66 Static Analysis Techniques

In the terminology of Flow Logic, and by analogy to Structural OperationalSemantics [Plo04], we call such specifications compositional. Specifications thatare not syntax directed we call abstract.

Desirable Properties. For a setup like this to qualify as a Flow Logic it mustposses a number of desirable properties that we shall elaborate in the following:

Well-definedness. The analysis must be well-defined, i.e., for every combinationof P ∈ lang and A ∈ L, the acceptability of A as an analysis estimate for P isunambiguously defined. This amounts to showing that “|=”, with functionalityas outlined above, constitutes a total function. In the case of a compositionalspecification this is immediate to show by structural induction in P . In the caseof an abstract specification, however, the judgement is well-defined only if Qconstitutes a monotone functional over the complete lattice

((L × lang) → {true, false},⊑)

where the ordering ⊑ is given by:

Q1 ⊑ Q2 iff ∀(A, P ) : (Q1(A, P ) = true) ⇒ (Q2(A, P ) = true)

When this is the case the existence of a suitable greatest fixed point is guaranteedby Tarski’s fixed point theorem.

Semantic Correctness. Intuitively, an analysis is semantically correct if analysisestimates that are acceptable for a process P contain information about everypossible evolution of P that is allowed by the semantics. As Flow Logic is asemantics based approach to static analysis this notion of correctness is usuallycaptured by a subject reduction result:

if A |= P and P −→ Q then A |= Q

which expresses that the acceptability of analysis estimates is preserved by thereaction relation. That is, if A is an acceptable estimate for P , and the seman-tics allows P to evolve into Q, then A is also acceptable for Q. This relies onan auxiliary result for structural congruence:

if P ≡ Q then A |= P iff A |= Q

asserting that the acceptability of analysis estimates is invariant under the struc-tural congruence.

Remark 4.22 Due to the directedness of ⇛ the latter becomes

if A |= P and P ⇛ Q then A |= Q

in our setup. �

Clearly, full semantic correctness, sometimes called semantic soundness followsby the transitive closure of the subject reduction result, i.e.,

Page 89: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Flow Logic 67

if A |= P and P −→⋆Q then A |= Q

Occasionally, it is convenient to use a collecting semantics and formalise therelationship, R ⊆ L×Trace, between analysis estimates and traces in order toformulate correctness:

if A |= P,AR tr, and 〈P, tr〉 −→⋆〈Q, tr′〉, then A |= Q and AR tr′

Moore Family Property. Finally, it is desirable for every program, P , to actuallyhave an acceptable analysis estimate and, indeed, a unique least such. This isthe case if:

{A | A |= P} constitutes a Moore Family for all P

Recall that, trivially, a Moore family is never empty as it always contains agreatest element

d∅ = ⊤L, which is the trivial worst (i.e., least informative)

acceptable analysis estimate. In contrast it is also guaranteed to have a leastelement A⋆ =

d{A | A |= P}, which is the least admissible result under the

ordering ⊑L of L and, hence, the best (most informative) acceptable estimate.

Implementation. The implementation of a Flow Logic specification is en-abled by a simple change of viewpoint: Intuitively, the acceptability judgement

A |= P iff ϕ

associates each program, P , with a formula, ϕ, such that an analysis estimate,A, is acceptable for P if, and only if, A constitutes a model of ϕ. Thus, as longas we are able to compute the appropriate ϕ the remaining task of finding asuitable model, A, can be left to an auxiliary logical solver.

In the presence of an abstract Flow Logic specification this idea clearly breaksdown because ϕ is not necessarily finite for every given P . In the case of acompositional Flow Logic specification, however, the finiteness of ϕ follows di-rectly from the finiteness of the syntactic representation of P and the inductivenature of the specification. If, furthermore, the specification is well-defined thenit defines a total function from programs P ∈ lang to formulae ϕ ∈ fol. Andwhen this is the case we shall refer to the Flow Logic specification of some globalControl Flow Analysis problem, A, as a binary predicate

A = FL(lang, fol).

The ALFP Logic. Clearly, the theoretical as well as practical properties of an

Page 90: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

68 Static Analysis Techniques

term ::= c | x | f(term1, . . . , termk)

pre ::= R(term1, . . . , termk) | ¬R(term1, . . . , termk)| term1 = term2 | term1 6= term2

| pre1 ∧ pre2 | pre1 ∨ pre2 | ∀x : pre | ∃x : pre

clause ::= R(x1, . . . , xk) | 1 | clause1 ∧ clause2

| pre =⇒ clause | ∀x : clause

Table 4.2: Syntax of the Alternation-free Least Fixed Point Logic.

analysis implementation depend on the formulae, ϕ, derived from programs, P ,as well as the solver used for the computation of models.

In order to ensure nice properties, the Flow Logic specifications presented inthis dissertation are expressed using an extension of Horn clauses, known asAlternation-free Least Fixed Point Logic (alfp) [NNS02]. This logic is presentedin Table 4.2, where we write x for variables, c for constants, f for functionsymbols, R for predicates, term for terms, pre for preconditions, and clause forclauses.

The clauses are interpreted over a universe U of ground terms. The semanticsis given in terms of satisfaction relations

(ρ, σ) |= pre and (ρ, σ) |= clause

that are defined in the standard way. Thus, ρ is an interpretation of predicatesymbols, σ is an interpretation of terms, and the clause, e.g., R(x1, . . . , xk isinterpreted according to the following rule:

(ρ, σ) |= R(x1, . . . , xk) iff (σx1, . . . , σx2) ∈ ρR

The logic has the nice feature that clauses with no free variables satisfy themodel intersection property, i.e., given an interpretation σ0 of the free variablesin a clause cl, the set of interpretations {ρ | (ρ, σ0) |= cl} constitutes a Moorefamily [NNS02].

Note that in the presence of negation in preconditions the solvability of clausesis subject to a notion of stratification, which is not relevant, however, in thecontext of this dissertation.

As a matter of convenience we shall often use other mathematical notation towrite certain alfp constructs:

Page 91: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Flow Logic 69

INPUT : a Flow Logic FL(lang,alfp) and

a program P.

OUTPUT : an alfp formula ϕ such that A |= ϕ ⇔ A |= P .

METHOD : Set ϕ := A |= P

while ϕ contains A |= P ′

and there is a rule α iff β in FL

and a substitution θ

such that θα = A |= P ′

do replace A |= P ′ with θβ in ϕ.

Table 4.3: Constraint generation algorithm for Flow Logics.

• The predicates of alfp represent relations. For a such a relation R weshall often write (x1, . . . , xk) ∈ R to denote the alfp atom R(x1, . . . , xk).

• When testing for intersection we often write R(x1, . . . , xi)∩R(y1, . . . , yi) 6=∅ to denote ∃zj , . . . , zk : R(x1, . . . , xi, zj , . . . , zk)∧R(y1, . . . , yi, zj , . . . , zk).

• When expressing subset relationships we write R(x1, . . . , xi) ⊆ R(y1, . . . , yi)to denote ∀zj , . . . , zk : R(x1, . . . , xi, zj , . . . , zk) ⇒ R(y1, . . . , yi, zj , . . . , zk).

Clause Generation. Algorithmically we capture the required change of view-point in a clause generator that works largely as the chaotically iterative al-gorithm outlined in Table 4.3. This procedure takes as input a Flow LogicFL(lang,alfp) and a program P . And, from this, it produces an alfp clause,the models of which are exactly the acceptable analysis estimates.

The best (i.e., least under the ordering of L) acceptable analysis estimate A⋆ forP⋆ is the minimal model satisfying the generated clause. The model intersectionproperty of alfp guarantees that this can be unambiguously computed using afixed point engine like those of The Succinct Solver Suite [NNS+04].

Remark 4.23 Generally analyses implemented using the succinct solver ex-hibit an asymptotic worst-case complexity that is of the order of O(nk+1), wheren is the size of the term universe, which is fixed for any given process, and k isthe maximal nesting depth of quantifiers in the generated clause.

Page 92: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

70 Static Analysis Techniques

4.4 Concluding Remarks

In this chapter we have introduced two common approaches to Static ProgramAnalysis. These approaches originate from the analytic needs of two very dif-ferent programming paradigms and thus differ in some respects.

Classically, Monotone Frameworks constitute a generalised approach to the DataFlow Analysis problems of imperative languages – a setting where programs arenormally considered as flow-graphs. However, we have deliberately presentedthe approach in a setting where programs are considered as labelled transitionsystems. This sets the scene for Chapter 8, where we shall use the approachto approximate the temporal structure, i.e., the underlying transition systemsdefined by the semantics, of BioAmbients models.

Similarly, Flow Logic constitutes a generalised approach to the Control FlowAnalysis problems of functional languages – a setting where continuations aredata and the flow of control is predominantly decided at run-time, Here it hastraditionally been used to approximate the control flow structures of programsin the context of dynamic dispatch. In Chapters 6 and 7 we shall similarly useFlow Logic to approximate the spatial structure, i.e., the set of ambient inducednesting hierarchies, that can arise dynamically from a BioAmbients model.

Throughout the technical developments we shall consistently take the approachthat is inherent in semantics based program analysis: In order to study proper-ties of a program P⋆ we analyse the static snapshot given by the direct syntac-tical representation of P⋆. However, we carefully define the analyses such thatthe properties are “⊒-preserved” under heating and reaction and use this to es-tablish the semantic soundness. In this manner we ensure that the information

learnt from the study of P⋆ in P⋆L

−→⋆Q is also valid for Q.

Page 93: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Part II

Analysing for Structural

Properties

Page 94: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics
Page 95: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 5

Well-formed Programs and

Their Properties

“Well-typed programs don’t go wrong.”— Robin Milner

This chapter introduces a notion of well-formed programs defined by a staticwell-formedness condition in the manner of type systems [NNH99]. The presen-tation largely covers material that is also covered in [PNN07].

When defining the syntax and semantics of BioAmbients in Chapter 3, we de-liberately postponed the formal definition of several central notions, such asthe free names and identifiers of processes, identifier substitution, and namesubstitution. This we did because many of these notions are complicated bythe presence of general recursion. In particular, it is important to enforce staticscope if α-equivalence is included in the structural congruence (heating) relation[PV05]. There are three viable approaches to this:

One option is to define recursion via a notion of parametrised process constants

A(~p) , P

where fn(P ) ⊆ {~p} [Mil99]. However, this immediately turns seemingly simpleproperties, such as free names, into fixpoint properties [NNP07].

Another option is to define α-equivalence and α-renaming for all entities thatrisk capture, i.e., constants, variables, and process identifiers and then dynami-cally enforce static scope. However, this results in quite complicated notions of

Page 96: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

74 Well-formed Programs and Their Properties

C ⊢ΓfnP

fnΓfn(P ) fpi(P )

Figure 5.1: The hierarchy of notions that defines well-formedness.

substitution that rely heavily on α-renaming and, hence, are computationallyquite involved [PV05].

The last option is to statically enforce a well-formedness condition that ensuresstatic scope at run-time. This is the option we shall pursue in the following and,as we shall see, this choice simplifies the notions of substitution.

As always, the type of well-formed programs must be preserved by the semanticsand a complication arises in this context:

The decision procedure, C ⊢ P , associated with well-formedness turns out torequire knowledge about the free names of arbitrary sub-processes. This knowl-edge can only be specified in a satisfactory manner if environments are usedto associate the free process identifiers with appropriate information. Thus,we shall specify well-formedness in terms of notions that are parametrised onenvironments as shown in Fig. 5.1.

The remainder of the chapter presents the required developments in more de-tail. In Section 5.1 we define the notions of free names and identifiers. Then, inSection 5.2, we specify the class of well-formed programs that is going to be thefocus of later developments. In 5.3 we define the notions of name and identifiersubstitution for well-formed programs. In Section 5.4 we show a subject reduc-tion result expressing that well-formedness is preserved by reaction. Finally, inSection 5.5, we conclude.

5.1 Free Names and Identifiers

In the following we shall formalise the free names of a process. In doing so weshall rely on the auxiliary definition of free and bound names of capabilities,which is shown in Table 5.1.

Page 97: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Free Names and Identifiers 75

M enter x exit x merge– x x!{y} x#!{y} x !{y} x !{y}bn(M) ∅ ∅ ∅ ∅ ∅ ∅ ∅fn(M) {x} {x} {x} {x, y} {x, y} {x, y} {x, y}

M accept x expel x merge+ x x?{p} x#?{p} x ?{p} x ?{p}bn(M) ∅ ∅ ∅ {p} {p} {p} {p}fn(M) {x} {x} {x} {x} {x} {x} {x}

Table 5.1: Bound names, bn(M), and free names, fn(M), of capabilities M .

fnΓfn((n)P ) = fnΓfn

(P ) \ {n}

fnΓfn( P µ) = fnΓfn

(P )

fnΓfn(P Q) = fnΓfn

(P ) ∪ fnΓfn(Q)

fnΓfn(∑

i∈I

M ℓi

i . Pi) =⋃

i∈I

(fn(Mi) ∪ fnΓfn(Pi) \ bn(Mi))

fnΓfn(rec X. P ) = fnΓfn [X 7→∅](P )

fnΓfn(X) = Γfn(X)

Table 5.2: Free names, fnΓfn(P ), of processes, P .

Free names. As customary we specify the set of free names, fnΓfn(P ) ∈

P(Name), of a process, P , in terms of a recursive function that is defined asshown in Table 5.2.

The function is parametrised on an environment, Γfn : Pid → P(Name), becauselater, when defining higher level notions such as well-formedness, we shall some-times use it to specify the free names of sub-expressions where process identifiersoccur free, e.g., a sub-process of the P in rec X. P . In this case the higher levelnotion can only be correctly defined if Γfn is used to associate each free processidentifier, such as X, with the set of names occurring free in the correspondingprocess body, P .

The definition is pretty straightforward. The interesting cases are those of re-striction, (n)P , where the bound name n is subtracted from the set of free namessynthesised from the sub-process P , and summation,

i∈I M ℓi

i . Pi, where thefree names are collected from all capabilities Mi and bound names are sub-tracted, as appropriate, from the sets synthesised from the subterms Pi.

In the case of recursive processes, rec X. P , the corresponding bound processidentifier, X, is wiped from the environment. The associated information is no

Page 98: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

76 Well-formed Programs and Their Properties

fpi((n)P ) = fpi(P )

fpi( P µ) = fpi(P )

fpi(P Q) = fpi(P ) ∪ fpi(Q)

fpi(∑

i∈I

M ℓi

i . Pi) =⋃

i∈I

fpi(Pi)

fpi(rec X. P ) = fpi(P ) \ {X}

fpi(X) = {X}

Table 5.3: The free process identifiers, fpi(P ), of processes, P .

longer needed because the present invocation of fnΓfnhas access to the full body,

P , of the recursive process. The latter suffices because a) we are computing aset rather than a multiset, and b) every syntactic entity of interest must occurin P in order to occur in rec X. P .

Clearly, if the process P of interest is identifier closed then the contents of Γfn

is irrelevant, i.e., fnΓfn(P ) = fn[ ](P ) for any Γfn. This lead to the following

convention:

Convention 5.1 When P is known to be an identifier closed process it sufficesto write fn(P ) to denote the free names of P .

Free process identifiers. We now turn to the set of free process identifiers,fpi(P ), of P , formally defined in Table 5.3. In this case no environment isrequired because recursive processes are ’self-closed’, i.e., the substitution of aprocess identifier always results in an identifier closed process.

The only interesting cases are those of the recursive process, rec X. P , and theprocess identifier, X. In the former case X is subtracted from the set of freeprocess identifiers synthesised from the sub-process P because it is bound. Inthe later case X is added to the set of free process identifiers.

5.2 Well-formed and Initial Programs.

The BioAmbients language is very expressive, but we do not consider all pro-cesses equally acceptable. Some are unacceptable because they may cause nameor identifier capture; these are semantically ill-behaved. Some are unacceptable

Page 99: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Well-formed and Initial Programs. 77

wf-resC ⊢Γfn

P

C ⊢Γfn(n)P

if ⌊n⌋ ∈ C

wf-ambC ⊢Γfn

P

C ⊢ΓfnP µ if fpi(P ) = ∅

wf-parC ⊢Γfn

P C ⊢ΓfnP

C ⊢ΓfnP Q

wf-sum∀i ∈ I : C ⊢Γfn

Pi

C ⊢Γfn

i∈I

M ℓi

i . Pi

if∀i ∈ I :(⌊bn(Mi)⌋ ∩ C) = ∅

wf-recC ⊢Γfn

′ P

C ⊢Γfnrec X. P

if X ∈ fpi(P ) ∧ Γfn′ = Γfn[X 7→ fnΓfn [X 7→∅](P )] ∧ nocapΓfn

wf-pid C ⊢ΓfnX

where we write nocapΓfnfor

(∀(M ℓ . P ′) � P : X ∈ fpi(P ′) ⇒ ⌊bn(M)⌋ ∩ ⌊fnΓfn(rec X. P )⌋ = ∅)

Table 5.4: Well-formedness, C ⊢ΓfnP , of a process, P , with respect to a set of

constants, C.

because they do not follow the conventions that ensure tractability of processname spaces; these are hard to analyse statically.

Well-formed Processes. We are now going to formally define the set ofBioAmbients processes that we find acceptable. We shall say that a process Pis well-formed with respect to C if it satisfies the well-formedness predicate C ⊢Γfn

P , defined by the inference system of Table 5.4. The notion of well-formednesscaptured by this definition encapsulates a number of important choices:

We require that processes observe the implicit typing requirements imposed bythe distinction between constants (in C) and variables (in V) in Name. Therule wf-res ensures that names bound by restrictions are indeed in C. Therule wf-sum ensures that names bound by prefixes are not in C.

We disallow infinite nesting of ambients. Technically, this kind of behaviour ishard to handle in a static analysis [NNP04] and, since it has no biological rele-vance either, we omit it from well-formed processes. The rule wf-amb ensures

Page 100: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

78 Well-formed Programs and Their Properties

this by demanding that the P in P µ is process identifier closed. A similarchoice was made for Mobile Safe Ambients in [LS03].

We disallow useless recursive definitions. When the body P of a recursive defini-tion rec X. P does not make direct use of the associated process identifier X, i.e.,X /∈ fpi(P ), then the recursive definition is useless. The rule wf-rec handlesthis by demanding that X ∈ fpi(P ).

Finally, we intend processes to have static scope, and hence we have to ensurethat identifier and name capture cannot occur. In the case of process identifiersthis is ensured by the fact that only the top-most recursive process can be un-folded. In the case of constants we rely on disciplined α-conversion in order toprevent capture. In the case of variables, however, we use the well-formednesscondition to ensure that a process, rec X. P , can only recur through a namebinder, e.g., P = · · · . n?{p}ℓ . · · · . X, if P has no free occurrence of the boundname, i.e., p. This is ruled out by the nocap part of rule wf-rec and preventssituations such as

n?{p}ℓ . · · · . rec X. · · · p · · · . n?{p}ℓ . · · · . X,

and

n?{p}ℓ . · · · . rec Y. · · · p · · · . rec X. · · ·Y n?{p}ℓ . · · · . X.

Convention 5.2 When P is known to be identifier closed it suffices to writeC ⊢ P , rather than C ⊢Γfn

P , for the well-formedness of P with respect to C. �

Programs. The well-formedness predicate gives us the means to distinguishwell-behaved processes. This allows us to formally define the notion of pro-grams – the processes P⋆ that satisfy the predicate PRGC(P⋆) defined as theconjunction of the following conditions:

• P⋆ is process identifier closed: fpi(P⋆) = ∅.

• P⋆ has free names only from the constants: ⌊fn(P⋆)⌋ ⊆ C.

• P⋆ is well-formed with respect to the constants: C ⊢ P⋆.

Convention 5.3 In the following we shall assume that all process expressionsconsidered are part of a (well-formed) program. �

Remark 5.4 (Initial Programs) As a matter of convenience we shall alsoassume that programs, P⋆, subjected to static analysis, are initial in the sensethat they satisfy P⋆ = ⌊P⋆⌋, where ⌊P ⌋ is the process that is as P except thatevery name x is replaced by the corresponding canonical name ⌊x⌋.

Page 101: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Substitution of Identifiers and Names. 79

((n)P )[Q/X ] =

(n)P [Q/X ] if n /∈ fn(Q)

(n′)P [n′

/n][Q/X ] otherwise

where n′ /∈ (fn(Q)∪ fn(P )) ∧ ⌊n′⌋ = ⌊n⌋

( P µ)[Q/X ] = P µ

(P1 P2)[Q/X ] = P1[

Q/X ] P2[Q/X ]

(∑

i∈I

M ℓi

i . Pi)[Q/X ] =

i∈I

M ℓi

i . Pi[Q/X ]

(rec Y. P )[Q/X ] =

{

rec Y. P [Q/X ] if X 6= Y

rec Y. P otherwise

Y [Q/X ] =

{

Q if X = Y

Y otherwise

Table 5.5: Substitution, P [Q/X ], of a process Q for an identifier X in a processP .

5.3 Substitution of Identifiers and Names.

We now turn to the formal definitions of substitution.

Substitution of Process Identifiers. We shall first define what it meansto substitute a process expression Q for a process identifier X in a processexpression P . As pointed out in Section 5.2 the risk of name or identifier capturearises in conjunction with identifier substitution. As evident from the reactionsemantics of Section 3.2, however, a substitution, P [Q/X ], always arises fromthe unfolding of the top-most recursive process; hence it is safe to assume that

• Q is the top-level recursive process that defines X, i.e., Q = rec X. Q′.

• Q is identifier closed, i.e., fpi(Q) = ∅,

• Q is a sub-process of a well-formed program, and hence C ⊢ Q, and

• P is a sub-process of Q, i.e., P ≺ Q or, equivalently, P � Q′.

This justifies the particularly simple definition of Table 5.5.

The substitution over restrictions is nearly the standard one, i.e., constant cap-ture is avoided by α-renaming. The capture of variables is not possible because

Page 102: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

80 Well-formed Programs and Their Properties

((n)P )[m/x] =

(n)P [m/x] if n 6= x ∧ m 6= n

(n′)P [n′

/n][m/x] if n 6= x ∧ m = n

where n′ /∈ ({n,m}∪ fn(P )) ∧ ⌊n′⌋ = ⌊n⌋

(n)P otherwise

( P µ)[m/x] = P [m/x] µ

(P1 P2)[m/x] = P1[

m/x] P2[m/x]

(∑

i∈I

M ℓi

i . Pi)[m/x] =

i∈I

M ℓi

i [m/x] . P ′i where P ′

i =

{

Pi[m/x] if x /∈ bn(Mi)

Pi otherwise

(rec X. P )[m/x] = rec X. P [m/x]

X[m/x] = X

Table 5.6: Substitution, P [m/x], of a constant m for a name x in a process P .

of the separation of Name into C and V.

Similarly the substitution over summations is correctly defined because bn(Mi)∩fnΓfn

(Q) 6= ∅ cannot be the case if X ∈ fpi(P ).

The substitution over ambients is correctly defined because fpi(P ) = ∅.

The substitution over recursive processes is correctly defined because Y /∈ fpi(Q)due to fpi(Q) = ∅.

Finally, the substitution over process identifiers is straightforward to define.

Substitution of Names. We shall now go on to define what it means tosubstitute a constant m for an arbitrary name x in a program P . As pointedout in Section 3.1.2, name substitution can lead to name capture. In the contextconvention 5.3, however, this is relevant only to constants; this facilitates thedefinition of Table 5.6, which does not rely on α-conversion of variables.

The substitution over restrictions is defined in the standard way. In case ofconstant capture we apply disciplined α-conversion in order to avoid the capture.

The substitution over summations is simpler. Capture cannot occur in thecontext of well-formed programs due to the separation of Name into C and V.Of course, we define the associated notion of substitution over capability prefixesM ℓi

i [m/x] such that bound variables p ∈ bn(Mi) are not subject to substitution.

Page 103: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Properties of Programs 81

The substitution over recursive processes is correctly defined. Due to the well-formedness of P we are ensured that, in case x is a variable, then x occurs freein P only if this cannot lead to capture. If x is a constant we rely on α-renamingto avoid capture.

The substitution over process identifiers straightforward.

5.4 Properties of Programs

Later, when proving properties of the analyses, we shall rely on the well-formednessconditions of programs. We can do this because programs evaluate into pro-grams, i.e.,

If PRGC(P⋆) and P⋆L

−→⋆P then PRGC(P )

In order to show this we shall first show a number of minor results.

Free Names. We start by showing some basic properties of free names:

Fact 5.5 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

fnΓfn(P [Q/X ]) = fnΓfn [X 7→fnΓfn

(Q)](P )

=

{

fnΓfn[X 7→∅](P ) ∪ fnΓfn(Q) if X ∈ fpi(P )

fnΓfn(P ) otherwise

Proof The result follows by structural induction on P . In the case of restric-tion we note that α-renaming ensures that the names bound in P cannot capturethe free names of Q. A similar property is ensured for summations as the well-formedness condition demands that X ∈ fpi(P ) ⇒ (⌊bn(Mi)⌋ ∩ ⌊fn(Q)⌋ = ∅) forall i, where we note that fpi(Q) = ∅ ⇒ (fnΓfn

(Q) = fn(Q)) for all Γfn. �

Fact 5.6 fnΓfn(P [m/x]) =

{

(fnΓfn(P ) \ {x}) ∪ {m} if x ∈ fnΓfn

(P )

fnΓfn(P ) otherwise

Proof By structural induction on P . �

Page 104: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

82 Well-formed Programs and Their Properties

Fact 5.7 If fpi(P ) = ∅ and C ⊢ P then both of the following hold:

1. If P ⇛ Q then fn(P ) = fn(Q).

2. If Pℓ

−→ Q then fn(P ) ⊇ fn(Q).

Proof The proof of (1) proceeds by induction on the inference of P ⇛ Q. Inthe case of h-alph we use Fact 5.6. In the case of h-urec we use Fact 5.5.

The proof of (2) follows by induction on the inference of Pℓ

−→ Q, where weuse Fact 5.6 for the communication axioms and (1) in conjunction with theinduction hypothesis in the case of r-aux. �

Free Process Identifiers. We establish a similar result for the free processidentifiers:

Fact 5.8 If fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then fpi(Q) = ∅.

2. If Pℓ

−→ Q then fpi(Q) = ∅.

Proof The proof of (1) follows by induction on the inference of p ⇛ Q.

The proof of (2) follows by straightforward induction on the inference of Pℓ

−→ Qusing (1) in the case of r-aux. �

Well-formedness. A similar development is required for the higher level no-tion of well-formedness with respect to C:

Fact 5.9 Assume Q = rec X. Q′ and fpi(Q) = ∅; if furthermore P ≺ Q then

C ⊢ΓfnP [Q/X ] ⇔

{

C ⊢Γfn′ P ∧ C ⊢Γfn

Q if X ∈ fpi(P )

C ⊢ΓfnP otherwise

where Γfn′ = Γfn[X 7→ fnΓfn [X 7→∅](Q)].

Proof The direction of ⇒ follows by structural induction on P . All but twocases are straightforward:

Page 105: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Properties of Programs 83

In the case of name restriction we rely on Fact 5.11-(1), the induction hypothesis,and the fact that α-conversion is disciplined.

The most interesting case is that of the recursive process. Here we have

C ⊢Γfn(rec Y. P )[Q/X ]

and must show that, if X ∈ fpi(rec Y. P ) then

C ⊢Γfn′ rec Y. P and (1)

C ⊢ΓfnQ (2)

where Γfn′ = Γfn[X 7→ fnΓfn [X 7→∅](Q)], and C ⊢Γfn

rec Y. P otherwise. The lattercase is trivially true and we proceed to show the former, i.e. (1) and (2). Wehave that

C ⊢Γfn(rec Y. P )[Q/X ] ⇒

C ⊢Γfn′′ P [Q/X ] (I)

∧ X ∈ fpi(P [Q/X ]) (II)

∧ (∀M ℓ . P ′ � P [Q/X ] : Y ∈ fpi(P ′) ⇒⌊bn(M)⌋ ∩ ⌊fnΓfn

(rec Y. (P [Q/X ]))⌋ = ∅)(III)

where Γfn′′ = Γfn[Y 7→ fnΓfn [Y 7→∅](P [Q/X ])].

Knowing that X ∈ fpi(P ) we use the induction hypothesis on (I) to obtain

C ⊢Γfn′′ P [Q/X ] ⇒ C ⊢Γfn

′′′ P (IV)

∧ C ⊢Γfn′′ Q (V)

where Γfn′′′ = Γfn

′′[X 7→ fnΓfn′′ [X 7→∅](Q)]. Given that fpi(Q) = ∅ the goal (2) follows

from (V) and only (1) remains.

Now we observe that

C ⊢Γfn′′′′ P (a)

∧ X ∈ fpi(P ) (b)

∀M ℓ . P ′ � P : Y ∈ fpi(P ′) ⇒⌊bn(M)⌋ ∩ ⌊fnΓfn

′(rec Y. P )⌋ = ∅

(c)

⇒ C ⊢Γfn′ (rec Y. P ) (1)

where Γfn′′′′ = Γfn

′[Y 7→ fnΓfn′ [Y 7→∅](P )]. Given that fpi(Q) = ∅ the goal (b) follows

from (II) and only (a) and (c) remain.

From (III) we get

Page 106: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

84 Well-formed Programs and Their Properties

(∀M ℓ . P ′ � P [Q/X ] : Y ∈ fpi(P ′) ⇒⌊bn(M)⌋ ∩ ⌊fnΓfn

(rec Y. (P [Q/X ]))⌋ = ∅)

Since fpi(Q) = ∅ and according to the definition of Γfn′′′ this is the same as

(∀M ℓ . P ′ � P : Y ∈ fpi(P ′) ⇒⌊bn(M)⌋ ∩ ⌊fnΓfn

′′′(P )⌋ = ∅)

However, it follows from simple calculations and Fact 5.5 that Γfn′′′ = Γfn

′′′′ =fnΓfn

′(rec Y. P ); hence (c) follows from (III) and (a) follows from (IV).

Finally, (1) follows from (a), (b), and (c), which concludes the proof of ⇒.

The proof of ⇐ follows by similar considerations. �

Fact 5.10 If C ⊢ΓfnP and ⌊m⌋ ∈ C then C ⊢Γfn

P [m/x]

Proof The proof proceeds by induction on the structure of P . In the case ofrestrictions we use that α-renaming is disciplined. In the case of recursive pro-cesses we rely on disciplined α-renaming to ensure that the separation betweenfree and bound names is preserved by the substitution. The remaining cases arestraightforward. �

Fact 5.11 If fpi(P ) = ∅ and C ⊢ P then the following both hold:

1. If P ⇛ Q then C ⊢ Q

2. If Pℓ

−→ Q then C ⊢ Q

Proof (1) follows by induction on the inference of P ⇛ Q. In the case of h-

alph we rely on the fact that α-renaming is disciplined. In the case of h-urec

we use Fact 5.9. In case of h-camb we use Fact 5.8-(1) in conjunction with theinduction hypothesis. The remaining axioms are trivial and the remaining rulesfollow by the induction hypothesis.

(2) follows by induction on the inference of Pℓ

−→ Q. The axioms for movementyield the desired result by simple calculation whereas the axioms for communi-cation use Fact 5.10. In the case of r-amb we use Fact 5.8-(2) in conjunctionwith the induction hypothesis. Finally, r-aux follows from (1), and the remain-ing rules follow by application of the induction hypothesis. �

Lemma 5.12 (Subject reduction) If PRGC(P ) and Pℓ

−→ Q then PRGC(Q).

Proof The result follows from Fact 5.11, Fact 5.7, and Fact 5.8. �

Page 107: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Concluding Remarks 85

Corollary 5.13 (Type soundness) Assume that P⋆ is a well-formed initialprogram then:

If P⋆L

−→⋆Pℓ

−→ Q then PRGC(P ) and PRGC(Q).

Proof The result follows by induction of the length of the derivation sequenceL, where we use Fact 5.12 to establish the base case and, in conjunction withthe induction hypothesis, to establish the inductive step. �

5.5 Concluding Remarks

This chapter has introduced a notion of well-formed programs that can be stat-ically checked and is invariant under reaction. This notion of well-formednesshelps us achieve simple definitions of substitution that rely on neither heavy useof α-renaming [PV05] nor parametrised process constants [SW01, Cai04]. Fur-thermore, well-formed programs do not exhibit behaviours, such as unboundednesting, that we find incompatible with the biological domain.

In the following chapters we shall (mostly) restrict our attention to the classof well-formed processes. This helps us formulate and show the required cor-rectness results; hence this is the class of programs for which we guarantee theanalyses to be correct. The running example and the two case models all qualifyas well-formed programs.

Page 108: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

86 Well-formed Programs and Their Properties

Page 109: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 6

Context Insensitive Control

Flow Analysis

“Seek simplicity but distrust it.”— Alfred North Whitehead

This chapter presents a context insensitive or mono-variant Control Flow Anal-ysis (0CFA) for the BioAmbients language, and evaluates it by analysing theLDL degradation pathway of Section 3.3. Much of the presented material waspreviously covered in [NNPdR07].

The Control Flow Analysis is defined as a Flow Logic

0CFA = FL(BioAmbients,alfp)

that aims to safely over-approximate the set of spatial configurations that mayarise at run-time. Thus, the focus of the approximation is on the spatial hierar-chy established by the nesting of ambient boundaries. More precisely, the resultof application of the analysis to a program P⋆ yields an over-approximation of

1. the set of roles and capabilities that might show up in each role and

2. the set of constants that might be bound to each variable.

The approach of the analysis bears some resemblance to abstract interpretation:Analysis results are acceptable if they, within the limited precision of the analysisdomain,

1. faithfully represent the initial configuration and

Page 110: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

88 Context Insensitive Control Flow Analysis

capΓcap((n)P ) = capΓcap

(P )

capΓcap( P µ) = capΓcap

(P )

capΓcap(P Q) = capΓcap

(P ) ∪ capΓcap(Q)

capΓcap(∑

i∈I M ℓi

i . Pi) =⋃

i∈I({ℓi} ∪ capΓcap(Pi))

capΓcap(rec X. P ) = capΓcap[X 7→∅](P )

capΓcap(X) = Γcap(X)

Table 6.1: Occurring capabilities, capΓcap(P ), of a process P .

2. are closures with respect to a set of conditions that mimic the semantics.

In most cases the closure conditions contribute the bulk of the analysis infor-mation and, hence, it is well worthwhile to be as precise as possible and limitthe scope of the closure conditions to the set of capabilities that actually havea chance of becoming concurrently possible at run-time. This is captured by astatically computed auxiliary relation CP⋆.

The chapter is in four sections. First, in Section 6.1, we define the auxiliaryrelation CP⋆, and show that it constitutes a safe approximation to the set ofconcurrently possible capabilities. Then, in Section 6.2, we specify the 0CFAanalysis as a Flow Logic, show that it is correct, and how to compute it. InSection 6.3 we apply the 0CFA to the LDL degradation pathway in order toboth evaluate the analysis and obtain information about the pathway. Finally,in Section 6.4 we summarise our findings.

6.1 Concurrently Possible Capabilities

For any given program, P⋆, the problem of deciding the precise set of capabilityinteractions that may come to pass is undecidable. From a semantic pointof view, however, two capabilities can react only if simultaneously present inthe same subordinate solution [BDPZ03]. In the following we shall use thisinsight to compute a first crude approximation to the set of potential capabilityinteractions. This approximation will be useful later when we specify the 0CFA.

We start by defining the set of occurring capabilities, capΓcap(P ) ∈ P(Lab),

of a process P as shown in Table 6.1. Like the free names of the previouschapter, the set is parameterised on an environment, Γcap : Pid → P(Lab),intended to associate each process identifier occurring free in P with the setof capabilities occurring in the corresponding process body. The definition is

Page 111: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Concurrently Possible Capabilities 89

CPΓcap,∆CP((n)P ) = CPΓcap,∆CP

(P )CPΓcap,∆CP

( P µ) = CPΓcap,∆CP(P )

CPΓcap,∆CP(P Q) = CPΓcap,∆CP

(P ) ∪ CPΓcap,∆CP(Q) ∪

(capΓcap(P )×capΓcap

(Q)) ∪

(capΓcap(Q)×capΓcap

(P ))

CPΓcap,∆CP(∑

i∈I M ℓi

i . Pi) =⋃

i∈I(CPΓcap,∆CP(Pi))

CPΓcap,∆CP(rec X. P ) = CPΓcap[X 7→capΓcap[X 7→∅](P )],∆CP[X 7→∅](P )

CPΓcap,∆CP(X) = ∆CP(X)

Table 6.2: Concurrently possible capabilities, CPΓcap,∆CP(P ), of a process P .

straightforward — the associated function simply performs a recursive traversalof process expressions. Each occurring prefix label is picked up as defined in thecase for guarded sums.

Intuitively, two capabilities can interact only if separated by a ‘ ’. Furthermore,they cannot possibly interact if separated by a ‘+’ [BDPZ03]. We use theseinsights to specify the set of concurrently possible capabilities, CPΓcap,∆CP

(P ), ofa BioAmbients process, P , as shown in Table 6.2. The set is parameterised ontwo environments:

The environment Γcap helps in making the distinction between ‘ ’ and ‘+’. In

the case of a parallel composition, P Q, the auxiliary function capΓcap() is used

to record that prefixes in the two branches may interact with one another. Incontrast, no such information is recorded for non-deterministic choice, P + Q,i.e., summation.

The environment ∆CP : Pid → P(Lab×Lab) simply helps us define CPΓcap,∆CP()

in such a way that it can be applied to sub-expressions that are not identifierclosed.

Thus, in the case of a recursive process, rec X. P , the environment Γcap is updatedto associate X with the capabilities occurring in P , whereas the environment∆CP is updated to ensure that X is associated with no information.

Convention 6.1 When P is known to be identifier closed we write CP(P ),rather than CP[ ],[ ](P ), to denote the concurrently possible capabilities of P .When furthermore the subject of approximation is an initial program, P⋆, wewrite CP⋆ for CP(P⋆). �

Example 6.2 For the running example Peat we obtain the relation CPeat shown be-

Page 112: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

90 Context Insensitive Control Flow Analysis

low:

(ℓ1, ℓ2), (ℓ1, ℓ3), (ℓ1, ℓ4), (ℓ1, ℓ5), (ℓ1, ℓ6), (ℓ1, ℓ7), (ℓ1ℓ8), (ℓ1, ℓ9),

(ℓ2, ℓ1), (ℓ2, ℓ6), (ℓ2, ℓ7), (ℓ2, ℓ8), (ℓ2, ℓ9),

(ℓ3, ℓ1), (ℓ3, ℓ6), (ℓ3, ℓ7), (ℓ3, ℓ8), (ℓ3, ℓ9),

(ℓ4, ℓ1), (ℓ4, ℓ6), (ℓ4, ℓ7), (ℓ4, ℓ8), (ℓ4, ℓ9),

(ℓ5, ℓ1), (ℓ5, ℓ6), (ℓ5, ℓ7), (ℓ5, ℓ8), (ℓ5, ℓ9),

(ℓ6, ℓ1), (ℓ6, ℓ2), (ℓ6, ℓ3), (ℓ6, ℓ4), (ℓ6, ℓ5), (ℓ6, ℓ7), (ℓ6ℓ8), (ℓ6, ℓ9),

(ℓ7, ℓ1), (ℓ7, ℓ2), (ℓ7, ℓ3), (ℓ7, ℓ4), (ℓ7, ℓ5), (ℓ7, ℓ6),

(ℓ8, ℓ1), (ℓ8, ℓ2), (ℓ8, ℓ3), (ℓ8, ℓ4), (ℓ8, ℓ5), (ℓ8, ℓ6),

(ℓ9, ℓ1), (ℓ9, ℓ2), (ℓ9, ℓ3), (ℓ9, ℓ4), (ℓ9, ℓ5), (ℓ9, ℓ6)

As must be expected, this constitutes a crude over-approximation of the interactions

that may take place. �

Soundness of CP⋆. We shall now show that for any P⋆ the correspondingCP⋆ constitutes a safe over-approximation, i.e.,

If P⋆L

−→⋆P then CP⋆ ⊇ CP(P ).

The result rests on a number of minor results:

Fact 6.3 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

capΓcap(P [Q/X ]) = capΓcap[X 7→capΓcap

(Q)](P )

=

{

capΓcap[X 7→∅](P ) ∪ capΓcap(Q) if X ∈ fpi(P )

capΓcap(P ) otherwise

Proof The proof proceeds by induction in the structure of P . �

Facts 6.4 If C ⊢ P and fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then capΓcap(P ) ⊇ capΓcap

(Q)

2. If Pℓ

−→ Q then capΓcap(P ) ⊇ capΓcap

(Q)

Proof For (1) the proof proceeds by induction on the shape of the proof treeestablishing P ⇛ Q, using Fact 6.3 in the case of h-urec. For (2) the proof

proceeds by induction on the shape of the proof tree establishing Pℓ

−→ Q, using(1) to prove the property in the case of r-aux. �

Page 113: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Concurrently Possible Capabilities 91

Fact 6.5 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

CPΓcap,∆CP(P [Q/X ]) =

{

CPΓ′cap,∆CP[X 7→∅](P ) ∪ CPΓcap,∆CP

(Q) if X ∈ fpi(P )

CPΓcap,∆CP(P ) otherwise

where Γ′cap = Γcap[X 7→ capΓcap

(Q)].

Proof The proof proceeds by structural induction on P . In the cases of par-allel compositions and recursive processes it uses Fact 6.3. �

Facts 6.6 If C ⊢ P and fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then (CPΓcap,∆CP(P ) ⊇ CPΓcap,∆CP

(Q))

2. If Pℓ

−→ Q then (CPΓcap,∆CP(P ) ⊇ CPΓcap,∆CP

(Q))

Proof For (1) the proof proceeds by induction on the shape of the proof treeestablishing P ⇛ Q, using Fact 6.5 in the case of h-urec.

For (2) the proof proceeds by induction in the shape of the proof tree establishing

Pℓ

−→ Q, using (1) in the case of r-aux. �

Corollary 6.7 (Subject reduction)

If PRGC(P ) and Pℓ

−→ Q then CP(P ) ⊇ CP(Q). �

Finally this thread of reasoning leads to the sought result:

Lemma 6.8 (CP⋆ is semantically correct) Assume that P⋆ is a well-formedinitial program, then

if P⋆L

−→⋆Pℓ

−→ Q then CP⋆ ⊇ CP(P ) ⊇ CP(Q)

Proof The result follows by induction on the length of L. The base casefollows from Corollary 6.7, and the inductive step follows from the inductionhypothesis in conjunction with Corollaries 5.13 and 6.7. �

Page 114: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

92 Context Insensitive Control Flow Analysis

6.2 Control Flow Analysis

We now turn to the task of approximating the set of spatial configurations thatmay arise at run-time. For this purpose we shall specify a context independentControl Flow Analysis (0CFA).

6.2.1 The Analysis Domain

For a given BioAmbients program P⋆ we want the analysis to specify the fol-lowing three components:

• An approximation of the relevant name bindings:

R ⊆ V × C

where we write n ∈ R(p) or (p, n) ∈ R to assert the truth of predicateR(p, n), i.e., that R records that the variable p might become bound tothe constant n.

• An approximation of the contents (ambients, prefixes) of ambients:

I ⊆ Role × (Role ∪ (Cap × Lab))

where we write µ ∈ I(µ′) or (µ′, µ) ∈ I (or, M ℓ ∈ I(µ′) or (µ′,M ℓ) ∈ I)to denote the truth of predicate I(µ′, µ) (or I(µ′,M ℓ)), i.e., that I recordsthat an ambient of role µ (or a prefix M ℓ) might occur inside an ambientof role µ′.

• An approximation of the pairs of capabilities that may react:

F ⊆ Lab × Lab

where we shall write (ℓ1, ℓ2) ∈ F to denote the truth of F(ℓ1, ℓ2), i.e., thatF records that prefixes labelled ℓ1 and ℓ2 might react.

The domain of the analysis is the direct product of those corresponding tothe three components. This clearly constitutes a complete lattice under thecomponent-wise subset ordering.

Remark 6.9 Note that in the context of a concrete program P⋆ the domainspecialises to a finite lattice over V⋆,C⋆,Role⋆,Cap⋆, and Lab⋆, which are allfinite sets. Hence all instances satisfy the ascending chain condition. �

Page 115: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 93

(I,R,F) |=µ (n)P iff (I,R,F) |=µ P

(I,R,F) |=µ P µc iff µc ∈ I(µ) ∧ (I,R,F) |=µc P

(I,R,F) |=µ P Q iff (I,R,F) |=µ P ∧ (I,R,F) |=µ Q

(I,R,F) |=µ∑

i∈I M ℓi

i . Pi iff ∀i ∈ I : (⌊Mi⌋ℓi ∈ I(µ) ∧closure⌈Mi⌉ ∧ (I,R,F) |=µ Pi)

(I,R,F) |=µ rec X. P iff (I,R,F) |=µ P

(I,R,F) |=µ X iff true

Table 6.3: The 0CFA acceptability judgement.

6.2.2 The Acceptability Judgement

The acceptability judgement of the analysis takes the form

(I,R,F) |=µ P

and expresses that, when the sub-process P (of P⋆) is enclosed within an ambientof role µ ∈ Role, then I, R, and F correctly capture the behaviour of P —meaning that I approximates the contents that may occur in each ambient, Rthe bindings of names that may take place, and F the capability pairs that mayreact, as P evolves inside P⋆.

The judgement is specified in Table 6.3 and refers to Table 6.4 and Table 6.5for specifications of the closure conditions, closure⌈M⌉, where ⌈M⌉ is as M butwith names replaced by ’·’. The specification is syntax directed, which makesthe 0CFA compositional in the terminology of Flow Logic.

The clauses of the definition carry the following meaning:

Name restriction: An analysis estimate (I,R,F) is acceptable for a process,(n)P , located inside an ambient of role µ if, and only if, acceptable for the sub-process P in µ. Avoiding further requirements helps us ensure that the analysisis invariant under, rather than preserved by, the heating relation, even whenh-eva is destabilising the set of name restrictions.

Ambient boundary: An analysis estimate (I,R,F) is acceptable for a pro-cess, P µc , located inside an ambient of role µ if, and only if,

• (µ, µc) ∈ I and

• (I,R,F) is acceptable for the sub-process P in µc.

Page 116: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

94 Context Insensitive Control Flow Analysis

Parallel composition: An estimate (I,R,F) is acceptable for P Q in µif, and only if, acceptable for both P and Q in µ. Note that this makes theacceptability conditions for parallel and choice appear identical. We leave it tothe auxiliary closure conditions, closure⌈M⌉, to ensure a differentiated treatmentof the two types of composition by using CP⋆.

Summation: An estimate (I,R,F) is acceptable for∑

i∈I M ℓi

i . Pi in µ if, andonly if, for every i ∈ I it is the case that

• (µ,M ℓi

i ) ∈ I,

• the associated closure condition, closure⌈Mi⌉, is satisfied by (I,R,F), and

• (I,R,F) is acceptable for the associated sub-process Pi in µ.

Recursive process: An analysis estimate (I,R,F) is acceptable for rec X. Pin µ if, and only if, acceptable for the sub-process P in µ.

Process identifier: Any (I,R,F) is an acceptable analysis estimate for X inµ, hence we ignore them.

This simple treatment of recursion is acceptable because:

• The well-formedness conditions ensure that no process identifier occursfree inside an ambient; thus, it suffices to analyse the sub-process P (thebody) in the context where it is first defined.

• Furthermore, the flow-insensitivity of the CFA, i.e., that prefix sequencesare represented as sets rather than lists, ensures that a single inspectionof the body suffices.

6.2.3 The Closure Conditions

Acceptable 0CFA estimates take the dynamic behaviour of capabilities into ac-count. This is ensured by a set of closure conditions that mimic, within thelimited precision of the analysis domain, the axioms of the reaction relation.Acceptable estimates are closures with respect to these conditions: whenever anestimate satisfies the pre-conditions it must also satisfy the conclusion.

In dealing with these requirements the conditions rely on the R component,which records only the potential bindings of variables to names and does notinclude any information about constants. In order to provide the required in-formation about constant definitions we introduce a new relation

Page 117: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 95

closureenter · = ∀µ, µ1, µ2, x, y, ℓ1, ℓ2 :enter xℓ1 ∈ I(µ1) ∧ µ1 ∈ I(µ)∧accept yℓ2 ∈ I(µ2) ∧ µ2 ∈ I(µ)∧〈R〉(x) ∩ 〈R〉(y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ µ1 ∈ I(µ2) ∧ (ℓ1, ℓ2) ∈ Fclosureaccept · = true

closureexit · = ∀µ, µ1, µ2, x, y, ℓ1, ℓ2 :exit xℓ1 ∈ I(µ1) ∧ µ1 ∈ I(µ2)∧expel yℓ2 ∈ I(µ2) ∧ µ2 ∈ I(µ)∧〈R〉(x) ∩ 〈R〉(y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ µ1 ∈ I(µ) ∧ (ℓ1, ℓ2) ∈ Fclosureexpel · = true

closuremerge– · = ∀µ, µ1, µ2, x, y, ℓ1, ℓ2 :merge– xℓ1 ∈ I(µ1) ∧ µ1 ∈ I(µ)∧merge+ yℓ2 ∈ I(µ2) ∧ µ2 ∈ I(µ)∧〈R〉(x) ∩ 〈R〉(y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ I(µ1) ⊆ I(µ2) ∧ (ℓ1, ℓ2) ∈ Fclosuremerge+ · = true

Table 6.4: 0CFA closure conditions for movement.

〈R〉 ⊆ Name × C

defined by the following closure condition:

completeR = (∀p,m : R(p,m) ⇒ 〈R〉(p,m)) ∧ (∀n : n ∈ C ⇒ 〈R〉(n, n))

This condition asserts that 〈R〉 contains everything that R does, and that 〈R〉binds all constants to themselves. Thus 〈R〉 acts exactly like R with respect tovariable bindings but also acts as the identity on constants.

Movement Closures. The closure conditions for movement ensure that anal-ysis estimates are only acceptable if they reflect the potential consequences ofmovements that might take place at run-time. If, e.g., (I,R,F) implies that anenter movement might become enabled in some ambient (role) µ (Fig. 6.1), i.e.,

• An enter and an accept capability might occur in sibling ambients:

enter xℓ1 ∈ I(µ1) ∧ µ1 ∈ I(µ)∧accept yℓ2 ∈ I(µ2) ∧ µ2 ∈ I(µ)

• The corresponding prefixes might be concurrently possible:

(ℓ1, ℓ2) ∈ CP⋆

Page 118: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

96 Context Insensitive Control Flow Analysis

I µ

µ1 µ2

enter nℓ11

R(n1)∩R(n2) 6=∅

accept nℓ22

=⇒

I µ

µ1 µ2

enter nℓ11

accept nℓ22

µ1

enter nℓ11

Figure 6.1: 0CFA closure condition for enter movement.

• The enter and accept capabilities might agree on a communication channel:

〈R〉(x) ∩ 〈R〉(y) 6= ∅

Remark 6.10 Note that the capabilities may reference the channel name di-rectly, by using a constant, or indirectly, by using a variable. Both situationsare captured when the condition is expressed as a requirement of non-emptyintersection in 〈R〉 rather than R. �

Then (I,R,F) should reflect that:

• The moving ambient role µ1 might occur in an ambient of role µ2:

µ1 ∈ I(µ2)

• The corresponding capabilities, ℓ1 and ℓ2, might react:

(ℓ1, ℓ2) ∈ F

Communication Closures. The closure conditions for communication en-sure that analysis estimates are only acceptable if they reflect the potentialconsequences of communications that might take place at run-time. If, for ex-ample, (I,R,F) implies that a local communication might become enabled insome ambient (role) µ (Fig. 6.2):

Page 119: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 97

closure·!{·} = ∀µ, x, y, z, p, ℓ1, ℓ2 :x!{z}ℓ1 ∈ I(µ) ∧y?{p}ℓ2 ∈ I(µ) ∧〈R〉(x) ∩ 〈R〉(y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ 〈R〉(z) ⊆ R(p) ∧ (ℓ1, ℓ2) ∈ Fclosure·?{·} = true

closure· !{·} = ∀µ, µ1, x, y, z, p, ℓ1, ℓ2 :x !{z}ℓ1 ∈ I(µ) ∧y ?{p}ℓ2 ∈ I(µ1) ∧ µ1 ∈ I(µ) ∧〈R〉(x) ∩ 〈R〉(y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ 〈R〉(z) ⊆ R(p) ∧ (ℓ1, ℓ2) ∈ Fclosure·ˆ?{·} = true

closure·ˆ!{·} = ∀µ, µ1, x, y, z, p, ℓ1, ℓ2 :x !{z}ℓ1 ∈ I(µ1) ∧ µ1 ∈ I(µ) ∧y ?{p}ℓ2 ∈ I(µ) ∧〈R〉(x) ∩ 〈R〉(y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ 〈R〉(z) ⊆ R(p) ∧ (ℓ1, ℓ2) ∈ Fclosure· ?{·} = true

closure·#!{·} = ∀µ, µ1, µ2, x, y, z, p, ℓ1, ℓ2 :

x#!{z}ℓ1 ∈ I(µ1) ∧ µ1 ∈ I(µ) ∧y#?{p}ℓ2 ∈ I(µ2) ∧ µ2 ∈ I(µ) ∧〈R〉(x) ∩ 〈R〉(y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ 〈R〉(z) ⊆ R(p) ∧ (ℓ1, ℓ2) ∈ Fclosure·#?{·} = true

Table 6.5: 0CFA closure conditions for communication.

• A local input action and a local output action might occur in the samecontext:

x!{z}ℓ1 ∈ I(µ) ∧ y?{p}ℓ2 ∈ I(µ)

• The corresponding prefixes might be concurrently possible:

(ℓ1, ℓ2) ∈ CP⋆

• The input and output actions might agree on a communication channel:

〈R〉(x) ∩ 〈R〉(y) 6= ∅

Then (I,R,F) should reflect that:

Page 120: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

98 Context Insensitive Control Flow Analysis

I µ

n1!{m}ℓ1

R(n1)∩R(n2) 6=∅

n2?{p}ℓ2 ⇒

I µ

n1!{m}ℓ1

R(m)⊆R(p)

n2?{p}ℓ2

Figure 6.2: 0CFA closure condition for local communication.

• Any name that might become bound to z might become the object of thecommunication and might thus become bound to p:

〈R〉(z) ⊆ R(p)

• The corresponding capabilities, ℓ1 and ℓ2, might react:

(ℓ1, ℓ2) ∈ F

Remark 6.11 Note that, while semantically z is always a constant, it may beeither a constant or a variable in the representation of (I,R,F). In contrast palways denotes a variable. This is why the inclusion requirement is expressedwith 〈R〉 as the source and R as the target. �

Example 6.12 The best analysis estimate (Ieat,Reat,Feat) of the running examplePeat is shown below:

nutrient

cell

food

system

n R(n)

rl RL

Reat component

ℓ F(ℓ)

ℓ7 ℓ2ℓ4 ℓ1, ℓ8ℓ6 ℓ3ℓ5 ℓ9

Ieat component Feat component

Page 121: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 99

In the figure the contents of Ieat is presented graphically, while Reat and Feat arepresented as tables where last components are sorted into bins identified by first com-ponents. In the graph the triple bordered node represents the super-environment, ⊤,used as superscript in (Ieat,Reat,Feat) |=⊤ Peat, the double bordered nodes connectedby bold (black) edges represent the initial configuration (as specified by Table 6.3),and the remaining (red) edges represent the system dynamics (as specified by Table6.4, 6.5, and 6.2). The trees of the individual frames of Example 3.28 are all sub-treesof these graphs.

Considering the semantics of Peat (Example 3.28) it should be clear that nutrients can

only be released from the food particles in the context of the cell. The analysis result,

however, exhibits poor precision and is not able to reach this conclusion. �

6.2.4 Properties of the 0CFA

We have now defined an acceptability judgement that specifies the notion of anacceptable CFA estimate (I,R,F). It remains, to be shown that this judgementdoes indeed specify a static analysis that is 1) well-defined, 2) sound with respectto the semantics, and 3) implementable. This we shall do in the following.

Well-definedess. Due to the compositional nature of the specification thefirst property is easy to establish:

Theorem 6.13 (Well-defined) The analysis judgement, (I,R,F) |=µ P , iswell-defined, i.e., for every pair, (I,R,F) and P , it unambiguously specifieswhether (I,R,F) is acceptable for P .

Proof The proof proceeds by straightforward structural induction on P . �

This means that we can always check an a priori given analysis estimate.

Semantic Correctness. In order to establish the second property we haveto show that the analysis is correct in the sense that the studied propertiesare“⊒-preserved” under heating and reaction.

We shall find use for a minor fact regarding substitution and (I,R,F) |=µ P .

Fact 6.14 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

(I,R,F) |=µ P [Q/X ] ⇔

{

(I,R,F) |=µ P ∧ (I,R,F) |=µ Q if X ∈ fpi(P )

(I,R,F) |=µ P otherwise

Page 122: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

100 Context Insensitive Control Flow Analysis

Proof The fact follows by structural induction on P , where the well-formed-ness of P ensures that Q can only occur within µ. �

Now we can easily show that analysis acceptability is preserved under heating.

Lemma 6.15 If C ⊢ P and fpi(P ) = ∅ then the following holds:

If (I,R,F) |=µ P and P ⇛ Q, we have (I,R,F) |=µ Q.

Proof The proof proceeds by induction in the shape of the proof tree estab-lishing P ⇛ Q. In the case of scope rules for name bindings the lemma triviallyholds because name restrictions are ignored by the analysis judgement. For thecongruence requirements the result follows from the induction hypothesis. Inthe case of α-equivalence we have that if P ≡α Q then ⌊P ⌋ = ⌊Q⌋ and hence(I,R,F) |=µ ⌊P ⌋ ⇔ (I,R,F) |=µ ⌊Q⌋ follows by referential transparency. Fi-nally, in the case of unfolding of recursion the desired result follows directlyfrom Fact 6.14. �

To finally show the invariance under reaction we shall introduce an expansion ofI into the new relation I@R, which takes into account the bindings of variablesspecified by the R component. This relation is defined as follows:

If M ℓ ∈ I(µ), x ∈ fn(M) and n ∈ 〈R〉(x) then M ℓ [n/x] ∈ I@R(µ).

where we remind that the involved names are, indeed, canonical.

This expansion satisfies a useful substitution property.

Fact 6.16 If ⌊n⌋ ∈ R(x) and (I,R,F) |=µ P then (I@R,R,F) |=µ P [n/x].

Proof The proof proceeds by structural induction on P . �

In this context we can show that the acceptability of analysis estimates is pre-served under reaction in the following sense:

Lemma 6.17 If C ⊢ P and fpi(P ) = ∅ then, if furthermore CP(P ) ⊆ CP⋆ thefollowing holds:

If (I,R,F) |=µ P and Pℓ

−→ Q then (I@R,R,F) |=µ Q and ℓ ∈ F .

Proof The proof is by induction on the inference of reactions Pℓ

−→ Q. In thecase of movement it suffices to expand both (I,R,F) |=⊤ P and (I,R,F) |=⊤ Q

Page 123: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 101

using the definition of (I,R,F) |=µ P . In the case of communication we performa similar expansion and then obtain the desired result using Fact 6.16. Theremaining cases follow from the induction hypothesis and, in the case of r-aux,uses Lemma 6.15. �

Corollary 6.18 (Subject reduction) Assume PRGC(P ) and CP(P ) ⊆ CP⋆;

if furthermore (I,R,F) |=µ P and Pℓ

−→ Q then (I@R,R,F) |=µ Q and ℓ ∈ F .�

This means that an analysis estimate that is acceptable for a process P is alsoacceptable for any process Q derived from it by a single reaction.

Analysis estimates are typically based on initial programs P⋆ related to Q

through a longer reaction sequence of the form P⋆L

−→ ⋆Pℓ

−→ Q. It is im-mediate to show that I@R = (I@R)@R and hence we can state the overallcorrectness result as follows:

Theorem 6.19 (Semantic correctness) Assume that P⋆ is a well-formed ini-

tial program, then (I,R,F) |=⊤ P⋆ and P⋆L

−→⋆P entails (I@R,R,F) |=⊤ Pand L ⊆ F .

Proof The corollary follows by induction on the length of L. The basecase holds vacuously. The inductive step is established using Corollary 6.17,Corollary 5.13, the induction hypothesis, and the above insight. �

Intuitively, this asserts that an analysis estimate acceptable for P⋆ is acceptablefor any process Q derivable by a sequence of reactions.

Implementability. Having defined the notion of acceptable analysis esti-mates and established their semantical soundness we shall now show that thereis always a best estimate and indicate how to compute it.

Recall from Section 4.3 that a Moore family results guarantees best estimates.Here we write best to mean least with respect to the partial order of the analysisdomain:

(I1,R1,F1) ⊑ (I2,R2,F2) iff (I1 ⊆ I2) ∧ (R1 ⊆ R2) ∧ (F1 ⊆ F2)

In the context of this ordering we can prove the theorem:

Theorem 6.20 (Moore family (0CFA)) For any program P⋆ the set of ac-ceptable analyses under |=µ constitutes a Moore family, i.e.,

Page 124: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

102 Context Insensitive Control Flow Analysis

∀S′ ⊆ {(I,R,F) | (I,R,F) |=µ P⋆} :d

S′ ∈ {(I,R,F) | (I,R,F) |=µ P⋆}.

Proof Assume that for some index set I and process P we have

∀i ∈ I : (Ii,Ri,Fi) |=µ P

It then remains to be shown thatd

i∈I(Ii,Ri,Fi) |=µ P

This follows by structural induction on P.

Case (n)P : Follows by the induction hypothesis.

Case P µ: Assume that ∀i ∈ I : (Ii,Ri,Fi) |=µ P µc . It follows thatµc ∈ Ii(µ) and (Ii,Ri,Fi) |=µc P for all i ∈ I. The induction hypothesisensures that

di∈I(Ii,Ri,Fi) |=

µ P and, since µc ∈ Ii(µ) for all i ∈ I, wealso have µc ∈

i∈I Ii(µ). The desired resultd

i∈I(Ii,Ri,Fi) |=µ P µ

follows directly.

Case P Q: follows from the induction hypothesis.

Case∑

j∈J Mℓj

j . Pj: Assume that ∀i ∈ I : (Ii,Ri,Fi) |=µ∑

j∈J Mℓj

j . Pj . It

follows that ∀j ∈ J : (⌊Mj⌋ℓj ∈ Ii(µ) ∧ closure⌈Mj⌉ ∧ (Ii,Ri,Fi) |=µ Pj)

for all i ∈ I. As before,d

i∈I(Ii,Ri,Fi) |=µ Pj for all j ∈ J follows by

the induction hypothesis, and ⌊Mj⌋ℓj ∈ (⋂

i∈I Ii)(µ) for all j ∈ J followsbecause the formula is universally quantified.

It remains to be shown that closure⌈Mj⌉ for all j ∈ J is satisfied bydi∈I(Ii,Ri,Fi). This is done by case analysis on M , and we shall con-

sider the case for the enter capability:Fix µ, µ1, µ2, x, y, ℓ1, ℓ2 such that

enter xℓ1 ∈ (⋂

i∈I Ii)(µ1) ∧ µ1 ∈ (⋂

i∈I Ii)(µ)∧accept yℓ2 ∈ (

i∈I Ii)(µ2) ∧ µ2 ∈ (⋂

i∈I Ii)(µ)∧〈⋂

i∈I Ri〉(x) ∩ 〈⋂

i∈I Ri〉(y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

It then follows that µ1 ∈ Ii(µ2)∧ (ℓ1, ℓ2) ∈ Fi for all i ∈ I, and hence thatµ1 ∈ (

i∈I Ii)(µ2) ∧ (ℓ1, ℓ2) ∈ (⋂

i∈I Fi). This concludes the case.

The remaining cases are similar.

Case rec X. P : follows from the induction hypothesis.

Case X: Holds vacuously.�

Page 125: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Analysing the LDL Degradation Pathway 103

INPUT : a Flow Logic FL(BioAmbients,alfp) and

a BioAmbients process P⋆.

OUTPUT : an alfp formula ϕ such that (I,R,F) |= ϕ ⇔ (I,R,F) |=⊤ P⋆.

METHOD : Set ϕ := (V

{CP⋆(ℓ1, ℓ2) | (ℓ1, ℓ2) ∈ CP⋆}) ∧

(V

{C(x) | x ∈ C}) ∧

completeR∧

(I,R,F) |=⊤ P

while ϕ contains (I,R,F) |=µ P ′

and there is a rule α iff β in FL

and a substitution θ

such that θα = (I,R,F) |=µ P ′

do replace (I,R,F) |=µ P ′ with θβ in ϕ.

Table 6.6: 0CFA constraint generation algorithm.

Thus, trivially,d∅ = (⊤I ,⊤R,⊤F ), is always an acceptable analysis estimate

(the worst), and there is always a best estimate (I⋆,R⋆,F⋆) =d{(I,R,F) |

(I,R,F) |=µ P⋆}. As outlined in Section 4.3, we compute (I⋆,R⋆,F⋆) for agiven P⋆ by first deriving an alfp clause, ϕ, as shown in Table 6.6 and thencomputing the least model using the Succinct Solver Suite [NNS+04, NNS02],which then guarantees polynomial complexity.

6.3 CASE: Analysing the LDL Degradation Path-

way

In order to investigate our model of the LDL degradation pathway (3.3) weshall now subject it to the 0CFA. The resulting (I,R,F) is only included inAppendix B.1, but the corresponding ambient role containment hierarchy (I) isshown graphically in Fig. 6.3.

Basically, the analysis results indicate that cholesterol (CH) might occur, inthe lumen of every modelled compartment. This is in sharp contrast to the factthat CH can only really be released in the lumen of the lysosome (LY SO).Thus, the result is almost maximally un-informative. Due to the nature of over-approximation, however, we cannot guarantee that CH may actually reach thelumens of the indicated compartments.

Page 126: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

104 Context Insensitive Control Flow Analysis

XVLY SO

LE

CC

EE

CH

CELL

LDL

Figure 6.3: 0CFA applied to an LDL pathway with normal receptors. The spe-cial ⊤ node with triple borders denotes the super-environment. The remaining(double bordered) nodes represent ambients roles and the edges represent thecontainment relation I. The bold black edges represent the system in its initialstate. The thinner red edges account for the dynamic evolution of the system.

6.3.1 Related Diseases

As mentioned in Section 2.2.3 familial hypercholesterolemia is caused by defectsin the genetic coding of the LDL receptor proteins.

When such a defect affects the exo-plasmic domain of the protein the receptoris no longer able to bind an LDL particle. As mentioned in Section 3.3 thiscorresponds to the model included in Appendix A.2. The corresponding 0CFAresult, (I,R,F), is reported in Appendix B.1.2, and the I relation, which isshown graphically in Fig. 6.4(a), reveals that the cell can no longer internaliseLDL particles. It still internalises early endosomes (EE). Empty endosomesmay occur inside the clathrin coat (CC) within the CELL, but in this casethey carry no LDL cargo.

When the defect affects the intra-cellular part of the protein the receptor canbind, but not internalise, LDL particles. This corresponds to the model includedin Appendix A.2. Unfortunately, the corresponding I relation, presented verba-

Page 127: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Analysing the LDL Degradation Pathway 105

XV

LY SO

LE

CC

EE

CH

CELL LDL

XVLY SO

LECC

EE

CH

CELL

LDL

(a) extra-cellular defects (b) intra-cellular defects

Figure 6.4: 0CFA analysis of defect receptor LDL pathway models.

tim in Appendix B.1.2 and shown graphically in Fig. 6.4(b), barely shows theeffect. In most aspects the graph is completely similar to the graph in Fig. 6.3,which represents the normal receptor model. However, it conclusively showsthat the EE can never enter the CC. As this is a prerequisite of entering theCELL we conclude that, in fact, the LDL cannot enter. But how do we thenexplain the remaining spurious edges? Well, they occur due to a conspiracy oftwo factors:

• The aforementioned modelling artifact (Section 3.3) allows EE to enterCELL before entering the CC.

• The analysis is flow-insensitive, hence, the behaviour of the EE is analysedidentically regardless of the reactions that have passed before. Or, in otherwords, the analysis does not care if the EE has passed through the CCor not.

Still, these results are slightly more encouraging than the normal receptor resultwould initially have us believe. In these particular cases the 0CFA is, just,precise enough to indicate that the system, as modelled here, cannot performits normal function if severely perturbed.

Page 128: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

106 Context Insensitive Control Flow Analysis

6.4 Concluding Remarks

The 0CFA presented in this chapter extends a tradition of CFAs for ambientcalculi that began with [HJNN99, NNHJ99]. In the biological branch of this tra-dition the work reported here, largely that of [NNP04], emerged as an evolutionof the 0CFA presented in [NNPdR07].

The 0CFA presented in this chapter differs from that presented in [NNPdR07]in two major respects:

In [NNPdR07] communication causes updates in both I and R. In this disser-tation, however, we keep the binding information strictly confined to R, storeonly representatives in I, and then expand these representatives “on the fly”.This approach optimises with respect to space consumption, which has provena sensible strategy when using the Succinct Solver [NNS02]. In this context,however, we must expand the I relation into (I@R,R) when formulating thecorrectness result with respect to a substitution based reaction semantics.

In [NNPdR07] the closure conditions may match and fire any two capabilitiesthat might occur in the right positions of the ambient hierarchy. Here we takeinspiration from [BDPZ03] and use an auxiliary relation CP⋆ to impose a re-striction; only capabilities that might be concurrently possible may be matched.

Furthermore, the 0CFA presented here also differs from the work originallypublished in [NNP04] in a couple of respects:

In [NNP04] a relation similar to CP⋆ is used to control the application of theclosure conditions. At the time, however, it had not yet been realised that, inthe context of general recursion, CP⋆ is really a fixed point property. Here werectify this mistake by using an indexed hierarchy of functions, capΓcap

() andCPΓcap,∆CP

(), to compute the necessary information.

In [NNP04] process identifiers are allowed to occur within ambient boundaries.Hence, the developments of [NNP04] include a general mechanism for ensuringthat the corresponding body is analysed in all relevant contexts that may arisedynamically. For the purpose of this dissertation we see no biological relevancein allowing free process identifiers within ambient boundaries and, hence, thisdevelopment is not included.

There has previously been positive comments [NNPdR07][NNP04] regardingthe use of 0CFA for biological applications. However, the results of this chapterseem to indicate, quite conclusively, that the presented 0CFA is too weak. Theanalysis results of both the running example and, in particular, the LDL degra-

Page 129: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Concluding Remarks 107

dation pathway model exhibits poor precision, which emerges as spurious edgesin the corresponding graphical representations.

Reflecting upon this, it seems obvious that the observed imprecision owes tolack of context and, to a lesser degree, flow information. This conclusion is whatoriginally motivated the context sensitive Control Flow Analysis presented inthe following chapter.

Page 130: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

108 Context Insensitive Control Flow Analysis

Page 131: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 7

Context Sensitive Control

Flow Analysis

“From naive simplicity we arrive at more profound simplicity.”— Albert Schweitzer

This chapter presents a context sensitive or poly-variant Control Flow Anal-ysis (2CFA) for the BioAmbients language and evaluates it by analysing theLDL degradation pathway of Section 3.3. Parts of the material was previouslypublished in [PNN06b].

The analysis uses two levels of context information and, like the 0CFA of theprevious chapter, it is defined as a Flow-Logic

2CFA = FL(BioAmbients,alfp)

that keeps track of the contents of ambients and their abilities for participat-ing in various interactions and, thus, shares many characteristics with previousanalyses of Mobile Ambients [NNH02, NNB04, LM01, NNPdR07, NNP04]

The analysis is structurally similar to the 0CFA, but it adds several novel ele-ments in order to increase precision:

First, the Control Flow Analysis is context sensitive, i.e., it approximates thecontainment contexts of ambients, capabilities, and bindings and is thereby ableto differentiate between interactions that are possible in different settings. It hasbeen found that two enclosing ambient roles suffices for analysing fixed-structurecellular models with simple intruders [PNN05]. This strikes a balance betweenthe classical approaches that completely ignore context [NNH02, NNPdR07] and

Page 132: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

110 Context Sensitive Control Flow Analysis

the more complex shape analysis of [NN00].

Second, the use of contexts means that extensive copying is delegated to theclosure conditions, and this offers multiple handles for controlling the precision:

Whenever two capabilities might match to cause an ambient movement then themoving ambient, the relevant parts of its contents, and the relevant associatedvariable bindings might also occur in some ‘new’ context. Here ‘relevant’ refersto those that do not disappear due to non-deterministic choice when the move-ment occurs. In the case of variable bindings this notion of relevance is formallycaptured by an auxiliary relation RV⋆, that safely approximates the relevantvariables, and is used to control the copying of variable bindings within R thatoccurs in conjunction with ambient movement. And, in the case of ambientcontents, by an auxiliary relation RPA⋆ that safely approximates the relevantprefixes and ambients, and is used to control the copying of capabilities andprefixes within I that occurs in conjunction with ambient movement.

Furthermore, whenever two capabilities might match to cause a communicationthen all names that might be communicated might become bound, both inthe immediate context of the receiving capability, M ℓ , and in all relevant sub-contexts. Here the relevant sub-contexts are the ambients that lie within thestatic scope of the variable bound by M ℓ . This notion of relevance is formallycaptured by an auxiliary relation SCP⋆, that safely approximates the spatialshape of static scope, and is used to control the propagation of name bindingsinto sub-contexts.

The chapter is in six sections. First, in Section 7.1, we define the auxiliary rela-tion SCP⋆, and show that, for each variable, it constitutes a safe approximationto the spatial shape of the corresponding static scope. In Section 7.2 we definethe auxiliary relation RV⋆, and show that it constitutes a safe approximation tothe set of relevant variables of each capability. Similarly, in Section 7.3, we definethe auxiliary relation RPA⋆, and show that it constitutes a safe approximationto the set of relevant prefixes and ambients of each capability, Then, in Section7.4, we specify the 2CFA analysis as a Flow Logic, show that it is correct, andhow to compute it. In Section 7.5 we apply the 2CFA to the LDL degradationpathway in order to both evaluate the analysis and obtain information aboutthe LDL pathway. Finally, in Section 7.6 we summarise our findings.

Page 133: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

The Spatial Shape of Static Scope 111

ambΓamb((n)P ) = ambΓamb

(P )

ambΓamb( P µ) = {µ} ∪ ambΓamb

(P )

ambΓamb(P Q) = ambΓamb

(P ) ∪ ambΓamb(Q)

ambΓamb(∑

i∈I

M ℓi

i . Pi) =⋃

i∈I

ambΓamb(Pi)

ambΓamb(rec X. P ) = ambΓamb[X 7→∅](P )

ambΓamb(X) = Γamb(X)

Table 7.1: Ambient roles, ambΓamb(P ), occurring in a process, P .

7.1 The Spatial Shape of Static Scope

When defining the 2CFA in Section 7.4 we shall find it useful to have an a priorigiven estimate of the spatial shape of static scopes, i.e., a relation that associatesevery variable with the ambients where it might be in scope.

In order to specify the required information we shall first define the set of oc-curring ambients, ambΓamb

(P ), of a process P . The set is defined in terms ofthe recursive function of Table 7.1 and, as usual, it is parameterised on an envi-ronment, Γamb : Pid → P(Role), used to associate process identifiers with setsof ambient roles. The function performs a straightforward recursive descent inorder to simply accumulate the set of occurring ambient roles. Consequently,the main case is that for the ambient boundary.

In this context we capture the spatial shape of the static scopes, SCPΓamb,∆SCP(P ),

that occur in a process, P , by associating each occurring bound variable withthe set of ambient roles that occur in its static scope. This is captured by thefunction shown in Table 7.2, which is parameterised on the aforementioned Γamb

environment as well as another environment, ∆SCP : Pid → P(V×Role), used toassociate process identifiers with spatial shapes of static scopes. The interestingcase is that of summation, where each variable occurring bound in a prefix,M ℓi

i , is associated with the set of ambient roles occurring in the correspondingcontinuation process Pi.

Convention 7.1 When P is known to be identifier closed we write SCP(P ),rather than SCP[ ],[ ](P ), to denote the spatial shape of static scope of P . Whenfurthermore the subject of approximation is an initial program, P⋆, we writeSCP⋆ for SCP(P⋆). �

Page 134: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

112 Context Sensitive Control Flow Analysis

SCPΓamb,∆SCP((n)P ) = SCPΓamb,∆SCP

(P )

SCPΓamb,∆SCP( P µ) = SCPΓamb,∆SCP

(P )

SCPΓamb,∆SCP(P Q) = SCPΓamb,∆SCP

(P ) ∪ SCPΓamb,∆SCP(Q)

SCPΓamb,∆SCP(∑

i∈I

M ℓi

i . Pi) =⋃

i∈I

( (bn(Mi) × ambΓamb(Pi)) ∪

SCPΓamb,∆SCP(Pi) )

SCPΓamb,∆SCP(rec X. P ) = SCPΓamb[X 7→ambΓamb[X 7→∅](P )],∆SCP[X 7→∅](P )

SCPΓamb,∆SCP(X) = ∆SCP(X)

Table 7.2: Static scopes, SCPΓamb,∆SCP(P ), of a process, P .

Soundness of SCP⋆. We shall now show that for any P⋆ the correspondingSCP⋆ constitutes a safe over-approximation, i.e.,

if P⋆L

−→⋆P then SCP⋆ ⊇ SCP(P )

The proof involves a number of minor results.

Fact 7.2 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

ambΓamb(P [Q/X ]) = ambΓamb[X 7→ambΓamb[X 7→∅](Q)](P )

=

{

ambΓamb[X 7→∅](P ) ∪ ambΓamb(Q) if X ∈ fpi(P )

ambΓamb[X 7→∅](P ) otherwise

Proof The result follows by structural induction on P . �

Facts 7.3 If C ⊢ P and fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then ambΓamb(P ) ⊇ ambΓamb

(Q).

2. If Pℓ

−→ Q then ambΓamb(P ) ⊇ ambΓamb

(Q).

Proof The proof of (1) is by induction on the shape of the proof tree estab-lishing P ⇛ Q. We use Fact 7.2 in the case of h-urec. The remaining axiomsare straightforward and the rules all follow from the induction hypothesis, and,in the case of h-tran, the transitivity of ⊇.

The proof of (2) is by induction on the shape of the proof tree establishing

Page 135: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

The Spatial Shape of Static Scope 113

Pℓ

−→ Q. The axioms follow by calculation. The rules follow by the inductionhypothesis, using (1) in the case of r-aux. �

Fact 7.4 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

SCPΓamb,∆SCP(P [Q/X ]) =

{

SCPΓ′amb

,∆′SCP

(P ) ∪ SCPΓamb,∆SCP(Q) if X ∈ fpi(P )

SCPΓamb,∆SCP(P ) otherwise

where Γ′amb

= Γamb[X 7→ ambΓamb[X 7→∅](Q)] and ∆′SCP

= ∆SCP[X 7→ ∅].

Proof The proof proceeds by structural induction in P . The base case andmost other cases are trivial. The most interesting case is that of summationwhere, for each summand M ℓ . P , if X ∈ fpi(M ℓ . P ), then:

SCPΓamb,∆SCP((M ℓ . P )[Q/X ]) = SCPΓamb,∆SCP

(M ℓ . P [Q/X ])

= bn(M ℓ) × ambΓamb(P [Q/X ]) ∪

SCPΓamb,∆SCP(P [Q/X ])

(by ind. hyp.) = bn(M ℓ) × ambΓamb(P [Q/X ]) ∪

SCPΓamb[X 7→ambΓamb[X 7→∅](Q)],∆SCP[X 7→∅](P ) ∪

SCPΓamb,∆SCP(Q)

(by Fact 7.2 & def. SCP) = SCPΓamb[X 7→ambΓamb[X 7→∅](Q)],∆SCP[X 7→∅](Mℓ . P ) ∪

SCPΓamb,∆SCP(Q).

In

the case of the recursive process we make similar use of Fact 7.2. �

Facts 7.5 If C ⊢ P and fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then SCPΓamb,∆SCP(P ) ⊇ SCPΓamb,∆SCP

(Q).

2. If Pℓ

−→ Q then SCPΓamb,∆SCP(P ) ⊇ SCPΓamb,∆SCP

(Q).

Proof The proof of (1) follows by induction on the shape of the proof treeestablishing P ⇛ Q. Most axioms are straightforward. However, in the caseof h-urec we use Fact 7.4. The rules all follow from the induction hypothesis,and, in the case of h-tran, transitivity of ⊇.

The proof of (2) proceeds by induction on the shape of the prooftree establishing

Pℓ

−→ Q. Again, the axioms follow by straightforward calculation. The rulesmostly follow directly from the induction hypothesis, but, in the case of r-aux,also depends on (1). �

Page 136: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

114 Context Sensitive Control Flow Analysis

Corollary 7.6 (Subject reduction)

If PRGC(P ) and Pℓ

−→ Q then SCP(P ) ⊇ SCP(Q). �

Finally, the sought result emerges as a special case of the following lemma:

Lemma 7.7 (Semantic correctness) Assume that P⋆ is a well-formed initialprogram then:

If P⋆L

−→⋆Pℓ

−→ Q then SCP⋆ ⊇ SCP(P ) ⊇ SCP(Q).

Proof The result follows by induction on the length of the derivation sequence

P⋆L

−→⋆P . The base case is given by Corollary 7.6 and the inductive step followsfrom the induction hypothesis, Corollary 7.6, and Corollary 5.13. �

7.2 Relevant Variables

We shall also find use for a safe estimate of the variable bindings that remainrelevant after a reaction, i.e., a relation associating each capability with thosevariables that do not disappear due to non-deterministic choice when the capa-bility engages in reaction.

In order to specify this information we first need to formalise the free variables,fvΓfv

(P ), of a process P . We capture this by the function shown in Table 7.3,which, in contrast to the free names function used by the semantics, only consid-ers variables. The associated environment, Γfv : Pid → P(V), associates processidentifiers with sets of free variables. The interesting information is collected inthe case of summation. The canonical version of each free name that occurs ina prefix M ℓi

i , but not in the set of constants C, is collected. We are computingan over-approximation and therefore the information synthesised from varioussub-expressions is combined by taking the union of the sets.

In this context we capture the relevant variables, RVΓfv,Γcap,∆RV(P ), occurring

in a process, P , by associating each occurring capability label, ℓ, with the setof variables that may occur either in parallel to, or sequentially after, ℓ. Theinformation is computed as shown in Table 7.4. Here Γfv is the environmentused by fvΓfv

(), Γcap the environment used by capΓcap() (Section 6.1), and ∆RV :

Pid → P(Lab × V) is an environment that associates process identifiers withrelevant names estimates. We shall briefly examine the interesting cases:

Page 137: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Relevant Variables 115

fvΓfv((n)P ) = fvΓfv

(P )

fvΓfv( P µ) = fvΓfv

(P )

fvΓfv(P Q) = fvΓfv

(P ) ∪ fvΓfv(Q)

fvΓfv(∑

i∈I

M ℓi

i . Pi) =⋃

i∈I

((⌊fn(Mi)⌋ \ C) ∪ (⌊fvΓfv(Pi)⌋\⌊bn(Mi)⌋))

fvΓfv(rec X. P ) = fvΓfv[X 7→∅](P )

fvΓfv(X) = Γfv(X)

Table 7.3: Free variables, fvΓfv(P ), of a process P .

RVΓfv,Γcap,∆RV((n)P ) = RVΓfv,Γcap,∆RV

(P )

RVΓfv,Γcap,∆RV( P µ) = RVΓfv,Γcap,∆RV

(P )

RVΓfv,Γcap,∆RV(P Q) = RVΓfv,Γcap,∆RV

(P ) ∪ RVΓfv,Γcap,∆RV(Q) ∪

(capΓcap(P )×fvΓfv

(Q))∪(capΓcap(Q)×fvΓfv

(P ))

RVΓfv,Γcap,∆RV(∑

i∈I

M ℓi

i . Pi) =⋃

i∈I

(({ℓi} × fvΓfv(Pi)) ∪ RVΓfv,Γcap,∆RV

(Pi))

RVΓfv,Γcap,∆RV(rec X. P ) = RVΓfv[X 7→fvΓfv[X 7→∅](P )],

Γcap[X 7→capΓcap[X 7→∅](P )],∆RV[X 7→∅]

(P )

RVΓfv,Γcap,∆RV(X) = ∆RV(X)

Table 7.4: Relevant variables, RVΓfv,Γcap,∆RV(P ), of a process, P .

• In the case of parallel composition, P Q, the union of two direct productsis recorded; the product of all capabilities of P (i.e., capΓcap

(P )) and allfree (canonical) variables of Q (i.e., fvΓfv

(Q)) – and vice versa.

• For each summand M ℓi

i . Pi of a summation the label ℓi is associated withthe free variables of the continuation Pi (i.e., fvΓfv

(Pi)).

• In the case of a recursive process, rec X. P , Γcap and Γfv are updated in orderto correctly associate X with, respectively, the set of labelled capabilitiesand the set of free variables of P .

The function computes an over-approximation; hence the information emergingfrom sub-expressions is combined by taking the union of the emerging sets.

Page 138: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

116 Context Sensitive Control Flow Analysis

Example 7.8 For the running example Peat we get

RVPeat = {(ℓ2, rl)}

which in fact indicates that no names are relevant to any ambient movement. �

Convention 7.9 When P is known to be identifier closed we write RV(P ),rather than RV[ ],[ ],[ ](P ), to denote the relevant variables of P . When further-more the subject of approximation is an initial program, P⋆, we write RV⋆ forRV(P⋆). �

Soundness of RV⋆. We shall now show that for any P⋆ the correspondingRV⋆ constitutes a safe over-approximation, i.e.,

if P⋆L

−→⋆P then RV⋆ ⊇ RV(P )

meaning that the relation RV⋆ correctly relates the label of each capability, M ℓ ,occurring in P to all variables in P that might be relevant, i.e., those that donot disappear because of non-deterministic choice when M ℓ engages in reaction.

The proof involves a number of minor results.

Fact 7.10 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

fvΓfv(P [Q/X ]) = fvΓfv[X 7→fvΓfv[X 7→∅](Q)](P )

=

{

fvΓfv[X 7→∅](P ) ∪ fvΓfv(Q) if X ∈ fpi(P )

fvΓfv[X 7→∅](P ) otherwise

Proof The result follows by induction in the structure of P . In the case ofsummations we rely on the well-formedness of Q, and the fact that P is a sub-process of Q, to ensure that the bound variables of P cannot capture the freevariables of Q. �

Facts 7.11 If C ⊢ P and fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then fvΓfv(P ) ⊇ fvΓfv

(Q).

2. If Pℓ

−→ Q then fvΓfv(P ) ⊇ fvΓfv

(Q).

Proof The proof of (1) is by induction on the shape of the proof tree estab-lishing P ⇛ Q. The base case is straightforward. In the case of h-urec we useFact 7.10. The remaining cases follow by the induction hypothesis.

Page 139: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Relevant Variables 117

The proof of (2) is by induction on the shape of the proof tree establishing

Pℓ

−→ Q. In the cases of movement and communication axioms the resultfollows by simple calculation. In the case of r-aux we use (1). The remainingcases follow by the induction hypothesis. �

Fact 7.12 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

RVΓfv,Γcap,∆RV(P [Q/X ]) =

RVΓ′fv,Γ′

cap,∆RV[X 7→∅](P ) ∪

RVΓfv,Γcap,∆RV(Q) if X ∈ fpi(P )

RVΓfv,Γcap,∆RV(P ) otherwise

where Γ′fv

= Γfv[X 7→ fvΓfv[X 7→∅](Q)] and Γ′cap = Γcap[X 7→ capΓcap[X 7→∅](Q)].

Proof The proof proceeds by structural induction on P . The base case issimple. In the cases of parallel composition and recursive processes we rely onFact 6.3, Fact 7.10, and the induction hypothesis. In the case of summation weonly need Fact 7.10 and the induction hypothesis. The remaining cases followby the induction hypothesis. �

Facts 7.13 If C ⊢ P and fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then RVΓfv,Γcap,∆RV(P ) ⊇ RVΓfv,Γcap,∆RV

(Q).

2. If Pℓ

−→ Q then RVΓfv,Γcap,∆RV(P ) ⊇ RVΓfv,Γcap,∆RV

(Q).

Proof The proof of (1) follows by induction on the shape of the proof treeestablishing P ⇛ Q. Most base cases are straightforward. In the case of h-

urec, however, we rely on Fact 7.12. In the case of h-cpar we use Fact 6.4-(1),Fact 7.11-(1), and the induction hypothesis. In the case of h-csum we use 7.11-(1) and the induction hypothesis. The remaining rules follow by the inductionhypothesis.

The proof of (2) proceeds by induction on the shape of the prooftree establishing

Pℓ

−→ Q. The movement axioms follow by calculation. The communicationaxioms likewise, but here we use the observation that e.g.

fvΓfn(n!{n}ℓ . Q) = fvΓfn

(Q[m/p])

for all n,m ∈ C. In the case of r-par we use Fact 6.4-(2) and Fact 7.11-(2).In the case of r-aux we use (1). The remaining rules follow by the inductionhypothesis. �

Page 140: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

118 Context Sensitive Control Flow Analysis

Corollary 7.14 (Subject reduction)

If PRGC(P ) and Pℓ

−→ Q then RV(P ) ⊇ RV(Q). �

Finally, the sought result emerges as a special case of the following lemma:

Lemma 7.15 (Semantic correctness) Assume that P⋆ is a well-formed ini-tial program then:

If P⋆L

−→⋆Pℓ

−→ Q then RV⋆ ⊇ RV(P ) ⊇ RV(Q).

Proof The result follows by induction on the length of the derivation sequence

P⋆L

−→ ⋆P . The base case is given by Corollary 7.14 and the inductive stepfollows from the induction hypothesis, Corollary 7.14, and Corollary 5.13. �

7.3 Relevant Prefixes and Ambient Roles

As for variables we shall also be interested in the relevance of capabilities andambient roles when we specify the 2CFA. Thus we shall now specify a safe esti-mate of the capabilities and ambients that remain relevant after a transition, i.e.,a relation associating each capability prefix M ℓ with those capabilities and am-bients that do not disappear due to non-deterministic choice when M ℓ engagesin reaction.

Thus the intuition behind the approximation is similar to that of the relevantvariables approximation and, as we shall see, the method is also similar. Therelevant prefixes and ambients, RPAΓcap,Γamb,∆RPA

(P ), of a process, P , is the setspecified by the function of Table 7.5. The environments Γcap and Γamb are asbefore, but ∆RPA : Pid → P(Lab × Role) can be used to associate processidentifiers to sets of relevant prefixes and ambients. and we shall briefly examinethe intersting cases:

• In the case of parallel composition, P Q, the union of two direct productsis recorded; the product of all capability labels of P (i.e., capΓcap

(P )) andall occurring prefixes and ambients of Q (i.e., ambΓamb

(Q) ∪ capΓcap(Q)) –

and vice versa.

• For each summand M ℓi

i . Pi of a summation the label ℓi is paired with theset of prefixes and ambients occurring in the continuation Pi.

Page 141: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Relevant Prefixes and Ambient Roles 119

RPAΓcap,Γamb,∆RPA((n)P ) = RPAΓcap,Γamb,∆RPA

(P )

RPAΓcap,Γamb,∆RPA( P µ) = RPAΓcap,Γamb,∆RPA

(P )

RPAΓcap,Γamb,∆RPA(P Q) = RPAΓcap,Γamb,∆RPA

(P ) ∪ RPAΓcap,Γamb,∆RPA(Q) ∪

(capΓcap(P ) × (capΓcap

(Q) ∪ ambΓamb(Q))) ∪

(capΓcap(Q) × (capΓcap

(P ) ∪ ambΓamb(P )))

RPAΓcap,Γamb,∆RPA(∑

i∈I

M ℓi

i . Pi) =⋃

i∈I

( ({ℓi} × (capΓcap(Pi) ∪ ambΓamb

(Pi))) ∪

RPAΓcap,Γamb,∆RPA(Pi) )

RPAΓcap,Γamb,∆RPA(rec X. P ) = RPAΓcap[X 7→capΓcap[X 7→∅](P )],

Γamb[X 7→ambΓamb[X 7→∅](P )],∆RPA[X 7→∅]

(P )

RPAΓcap,Γamb,∆RPA(X) = ∆RPA(X)

Table 7.5: Relevant prefixes and ambients, RPAΓcap,Γamb,∆RPA(P ), of P .

• For a recursive process, rec X. P , Γcap and Γamb are updated in order tocorrectly associate X with the set of capability labels and ambient roles,respectively, of P .

Again, as the function defines an over-approximation it combines the informa-tion emerging from sub-expression by taking the union of the computed sets.

Convention 7.16 When P is known to be identifier closed we write RPA(P ),rather than RPA[ ],[ ],[ ](P ), to denote the relevant prefixes and ambients of P .When furthermore the subject of approximation is an initial program, P⋆, wewrite RPA⋆ for RPA(P⋆). �

Soundness of RPA⋆. We shall now show that for any P⋆ the correspondingRPA⋆ constitutes a safe over-approximation, i.e.,

if P⋆L

−→⋆P then RPA⋆ ⊇ RPA(P )

meaning that the relation RPA⋆ correctly relates the label of each capabilityoccurring in P to the labels of all capabilities and the roles of all ambients inP that might be relevant to it, i.e., names that do not disappear because ofnon-deterministic choice. The proof involves a few minor results.

Fact 7.17 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

Page 142: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

120 Context Sensitive Control Flow Analysis

RPAΓcap,Γamb,∆RPA(P [Q/X ]) =

RPAΓ′cap,Γ′

amb,∆RPA[X 7→∅](P ) ∪

RPAΓcap,Γamb,∆RPA(Q) if X ∈ fpi(P )

RPAΓcap,Γamb,∆RPA(P ) otherwise

where Γ′cap = Γcap[X 7→ capΓcap[X 7→∅](P )] and Γ′

amb= Γamb[X 7→ ambΓamb[X 7→∅](P )].

Proof The proof proceeds by structural induction in P .The base case isstraightforward. In the case of parallel composition, summation, and recur-sive processes we utilise the induction hypothesis in conjunction with Fact 6.3and 7.2. The remaining cases follow by the induction hypothesis. �

Facts 7.18 If C ⊢ P and fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then RPAΓcap,Γamb,∆RPA(P ) ⊇ RPAΓcap,Γamb,∆RPA

(Q).

2. If Pℓ

−→ Q then RPAΓcap,Γamb,∆RPA(P ) ⊇ RPAΓcap,Γamb,∆RPA

(Q).

Proof The proof of (1) follows by induction on the shape of the proof treeestablishing P ⇛ Q. Most of the axioms are straightforward. In the case ofh-urec, however, we use Fact 7.17. In the case of h-cpar and h-csum we usethe induction hypothesis in conjunction with Fact 6.4-(1) and Fact 7.3-(1). Theremaining rules follow from the induction hypothesis.

The proof of (2) proceeds by induction on the shape of the proof tree establishing

Pℓ

−→ Q.The axioms all follow by toilsome calculation. In the case of r-par weuse the induction hypothesis in conjunction with Fact 6.4-(2) and Fact 7.3-(2).In case of r-aux we use the induction hypothesis in conjunction with (2). Theremaining cases follow by the induction hypothesis. �

Corollary 7.19 (Subject reduction)

If PRGC(P ) and Pℓ

−→ Q then RPA(P ) ⊇ RPA(Q). �

Finally, the sought result emerges as a special case of the following lemma:

Lemma 7.20 (Semantic correctness) Assume that P⋆ is a well-formed ini-tial program then:

If P⋆L

−→⋆Pℓ

−→ Q then RPA⋆ ⊇ RPA(P ) ⊇ RPA(Q).

Page 143: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 121

Proof The result follows by induction on the length of the derivation sequence

P⋆L

−→ ⋆P . The base case is given by Corollary 7.19 and the inductive stepfollows from the induction hypothesis, Corollary 7.19, and Corollary 5.13. �

7.4 Control Flow Analysis

Being both context and flow insensitive the 0CFA approach of the previouschapter leads to highly imprecise analysis estimates. The 2CFA analysis of thischapter is context sensitive: Whenever the analysis makes an entry into I orR, corresponding to a semantic action that changes the system, it also recordsthe k ambients enclosing the site of change. This can be done in relationsI ⊆ Rolek × (Role ∪ (Cap × Lab)) and R ⊆ Rolek × Name × Name.

The parameter k is subject to conflicting interests and must be chosen withcare:

• On the one hand, for the analysis to be precise, k must be large enough tolocate all tracked entities uniquely in the ambient hierarchy of the model.

• On the other hand, for the analysis to be computationally efficient, k mustnot be too large.

In terms of nesting, the eukaryotic cell, shown schematically in Fig. 7.1, is themost complex entity of Molecular Biology. It contains a number of interior com-partments called organelles. Some of these are nested structures with multiplecompartments within compartments.

The analysis is intended to distinguish compartments only up to their roles; alllysosomes are the same. Thus, it is safe to assume that all roles are uniquelyplaced within the nesting hierarchy of the cell, and hence that only the nestingdepth of ingested molecules are relevant to k.

No biological processes, except for certain cases of phagocytosis, i.e., the in-gestion of large foreign bodies by macrophages, allow truly nested entities intothe living cell. Some biological processes, though, can only be modelled inBioAmbients if non-nested entities are encoded as ambients; the running exam-ple, Peat, and the LDL pathway model of Section 3.3 both rely on this type ofmodelling.

Encodings normally require two levels of context (k = 2). Three layer encodings

Page 144: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

122 Context Sensitive Control Flow Analysis

Figure 7.1: Schematic view of eukaryotic cell. Copyright 2002 from MolecularBiology of the Cell by Alberts et al [AJL+02]. Reproduced by permission ofGarland Science/Taylor & Francis LLC.

(k = 3) seem contrived but may rarely be useful. Thus for the purpose of thisdissertation it is assumed that k = 3 is sufficient to analyse any single cellsystem. We call the resulting analysis, which we shall specify in the following,a 2CFA because the 0CFA of the previous chapter, where k = 1

2 , is used as thebaseline. Higher values of k may be required for the modelling of multicellularsystems, because different cell types contain similar compartments and the aimis to distinguish between these. Of course the technical developments generalizestraightforwardly to any k, but using a clever naming scheme when modellingmight be a better option,

7.4.1 The Analysis Domain

Notation 7.21 In the following we shall write։

µ to denote µgp, µp, µ and→µ to

denote µp, µ. �

Page 145: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 123

The analysis specifies the following three components:

• A localised approximation of the relevant name bindings:

R ⊆ Role3 × V × C

where we write n ∈ R(։

µ, p) or (։

µ, p, n) ∈ R to assert the truth of predicate

R(։

µ, p, n), i.e., that R records that the variable p may be bound to the

name n in the context of։

µ .

• A localised approximation of the contents of ambients:

I ⊆ Role3 × (Role ∪ (Cap × Lab))

where we write µ′ ∈ I(։

µ) or (։

µ, µ′) ∈ I (or, M ℓ ∈ I(։

µ) or (։

µ,M ℓ) ∈ I)

to denote the truth of predicate I(։

µ, µ′) (or I(։

µ,M ℓ)), i.e., that I records

that an ambient of role µ′ (or a prefix M ℓ) may occur in the context of։

µ .

• An approximation of the pairs of capabilities that may react:

F ⊆ Lab× Lab

where we shall write (ℓ1, ℓ2) ∈ F to denote the truth of F(ℓ1, ℓ2), i.e. thatF records that prefixes labelled ℓ1 and ℓ2 may react.

The domain of the 2CFA is then the direct product of those corresponding to thethree components, which constitutes a complete lattice under the component-wise subset ordering.

7.4.2 The Acceptability Judgement

An acceptability judgement defines the set of acceptable 2CFA estimates. Ittakes the form

(I,R,F) |=µgp,µp,µ P

and expresses that a 2CFA estimate (I,R,F) is acceptable for a (sub-)expres-sion P (of P⋆) occurring in the context of µgp, µp, µ. This means that I approx-imates the set of prefixes and ambient roles that might occur in each realisablecontext, R approximates the bindings of names that might come into effect ineach realisable context, and F the prefix pairs that may react, as P evolvesinside P⋆.

The judgement is specified in Table 7.6 and refers to Table 7.7 and Table 7.8for the specification of the closure conditions closure⌈M⌉, which ensure that ac-

Page 146: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

124 Context Sensitive Control Flow Analysis

(I,R,F) |=։

µ (n)P iff (I,R,F) |=։

µ P

(I,R,F) |=µgp,→µ P µc iff µc ∈ I(µgp,

→µ) ∧ (I,R,F) |=

→µ,µc P

(I,R,F) |=։

µ P P ′ iff (I,R,F) |=։

µ P ∧ (I,R,F) |=։

µ P ′

(I,R,F) |=։

µ ∑

i∈I M ℓi

i . Pi iff ∀i ∈ I : (⌊M ℓi

i ⌋ ∈ I(։

µ) ∧

(I,R,F) |=։

µ Pi ∧ closure⌈Mi⌉)

(I,R,F) |=։

µ rec X. P iff (I,R,F) |=։

µ P

(I,R,F) |=։

µ X iff true

Table 7.6: 2CFA acceptability judgement, (I,R,F) |=։

µ P .

ceptable (I,R,F) correctly capture the dynamics of capabilities. Again, thespecification is syntax directed, making the 2CFA compositional in the termi-nology of Flow Logic. The resulting recursive definition is intentionally similarto that of the 0CFA.

The individual clauses of the specification carry the following meaning:

Name restriction: A 2CFA estimate (I,R,F) is acceptable for a process (n)P

occurring in the context of։

µ if, and only if, acceptable for the sub-process P in

the context of։

µ . Further requirements are avoided to ensure that the analysisis invariant under the heating relation. The treatment of constants is deferredto the closure conditions closure⌈M⌉ (where ⌈M⌉ is as M but with all namesreplaced by ‘·’) defined in Section 7.4.3

Ambient boundary: An analysis estimate (I,R,F) is acceptable for a process

P µc , occurring in the context of µgp,→µ , if, and only if, I appropriately records

the location of µc, i.e., (µgp,→µ, µc) ∈ I and (I,R,F) is acceptable for P in the

context of→µ, µc.

Parallel composition: An estimate (I,R,F) is acceptable for P Q occurring

in the context of։

µ if, and only if, acceptable for each of P and Q in the context

of։

µ . It is left to the closure conditions, closure⌈M⌉, to ensure a differentiatedtreatment of parallel composition and summation.

Summation: An estimate (I,R,F) is acceptable for∑

i∈I M ℓi

i . Pi occurring

Page 147: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 125

in the context of։

µ if, and only, if for every i ∈ I it is the case that

• I records the location of the corresponding prefix, i.e., (։

µ,M ℓi

i ) ∈ I,

• (I,R,F) is closed under the associated closure condition, closure⌈Mi⌉, and

• (I,R,F) is acceptable for the sub-process Pi in the context of։

µ .

Recursive process: An analysis estimate (I,R,F) is acceptable for rec X. P

occurring in the context of։

µ if, and only if, acceptable for P in the context of։

µ .

Process identifier: any (I,R,F) is an acceptable analysis estimate for X in

the context of։

µ .

Again, this treatment is correct because the analysis is flow-insensitive and thewell-formedness condition prevents free process identifiers from occurring withinambients.

7.4.3 Closure Conditions

As for the 0CFA a set of closure conditions govern acceptability of analysisestimates with respect to the dynamic behaviour of processes. Generally theseclosure conditions follow the schema used for the 0CFA closures. However,the specifications are somewhat more complex and in the following they are,therefore, explained in detail.

As was the case for the 0CFA we introduce a new relation, 〈R〉, in order to pro-vide the necessary information about constant definitions. Based on a bindingenvironment R this completion of R

〈R〉 ⊆ Role3 × Name × C

is defined by the closure condition

completeR = (∀։

µ : R(։

µ) ⊆ 〈R〉(։

µ)) ∧

(∀։

µ, ν, n : (։

µ, ν) ∈ I ∧ n ∈ C ⇒ 〈R〉(։

µ, n, n)) (7.1)

where ν ∈ (Role ∪ (Cap × Lab)). This asserts that 〈R〉 contains everythingthat R does, and that 〈R〉 binds all constants to themselves in every realisablenon-empty context.

Page 148: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

126 Context Sensitive Control Flow Analysis

closureenter · =∀µgp, µp, µ, µ1, µ2, x, y, ℓ1, ℓ2 :

enter xℓ1 ∈ I(µp, µ, µ1) ∧ µ1 ∈ I(µgp, µp, µ) ∧accept yℓ2 ∈ I(µp, µ, µ2) ∧ µ2 ∈ I(µgp, µp, µ) ∧〈R〉(µp, µ, µ1, x) ∩ 〈R〉(µp, µ, µ2, y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ µ1 ∈ I(µp, µ, µ2)∧(∀ν : (ℓ1, ν) ∈ RPA⋆ ∧ (µp, µ, µ1, ν) ∈ I ⇒ (µ, µ2, µ1, ν) ∈ I) ∧(∀ν, µ′ : (ℓ1, ν) ∈ RPA⋆ ∧ (µ, µ1, µ

′, ν) ∈ I ⇒ (µ2, µ1, µ′, ν) ∈ I) ∧

(∀p : (ℓ1, p) ∈ RV⋆ ⇒ R(µp, µ, µ1, p) ⊆ R(µ, µ2, µ1, p)) ∧(∀p, µ′ : (ℓ1, p) ∈ RV⋆ ⇒ R(µ, µ1, µ

′, p) ⊆ R(µ2, µ1, µ′, p)) ∧

(ℓ1, ℓ2) ∈ Fclosureaccept · = true

closureexit · =∀µgp, µp, µ, µ1, µ2, x, y, ℓ1, ℓ2 :

exit xℓ1 ∈ I(µ, µ2, µ1) ∧ µ1 ∈ I(µp, µ, µ2) ∧expel yℓ2 ∈ I(µp, µ, µ2) ∧ µ2 ∈ I(µgp, µp, µ) ∧〈R〉(µ, µ2, µ1, x) ∩ 〈R〉(µp, µ, µ2, y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ µ1 ∈ I(µgp, µp, µ) ∧(∀ν : (ℓ1, ν) ∈ RPA⋆ ∧ (µ, µ2, µ1, ν) ∈ I ⇒ (µp, µ, µ1, ν) ∈ I) ∧(∀ν, µ′ : (ℓ1, ν) ∈ RPA⋆ ∧ (µ2, µ1, µ

′, ν) ∈ I ⇒ (µ, µ1, µ′, ν) ∈ I) ∧

(∀p : (ℓ1, p) ∈ RV⋆ ⇒ R(µ, µ2, µ1, p) ⊆ R(µp, µ, µ1, p)) ∧(∀p, µ′ : (ℓ1, p) ∈ RV⋆ ⇒ R(µ2, µ1, µ

′, p) ⊆ R(µ, µ1, µ′, p)) ∧

(ℓ1, ℓ2) ∈ Fclosureexpel · = true

closuremerge– · =∀µgp, µp, µ, µ1, µ2, x, y, ℓ1, ℓ2 :

merge– xℓ1 ∈ I(µp, µ, µ1) ∧ µ1 ∈ I(µgp, µp, µ) ∧merge+ yℓ2 ∈ I(µp, µ, µ2) ∧ µ2 ∈ I(µgp, µp, µ) ∧〈R〉(µp, µ, µ1, x) ∩ 〈R〉(µp, µ, µ2, y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ (∀ν : (ℓ1, ν) ∈ RPA⋆ ∧ (µp, µ, µ1, ν) ∈ I ⇒ (µp, µ, µ2, ν) ∈ I) ∧(∀ν, µ′ : (ℓ1, ν) ∈ RPA⋆ ∧ (µ, µ1,mu′, ν) ∈ I ⇒ (µ, µ2, µ

′, ν) ∈ I) ∧(∀ν, µ′, µ′′ : (ℓ1, ν) ∈ RPA⋆ ∧ (µ1, µ

′, µ′′, ν) ∈ I ⇒ (µ2, µ′, µ′′, ν) ∈ I) ∧

(∀p : (ℓ1, p) ∈ RV⋆ ⇒ R(µp, µ, µ1, p) ⊆ R(µp, µ, µ2, p)) ∧(∀p, µ′ : (ℓ1, p) ∈ RV⋆ ⇒ R(µ, µ1, µ

′, p) ⊆ R(µ, µ2, µ′, p)) ∧

(∀p, µ′, µ′′ : (ℓ1, p) ∈ RV⋆ ⇒ R(µ1, µ′, µ′′, p) ⊆ R(µ2, µ

′, µ′′, p)) ∧(ℓ1, ℓ2) ∈ F

closuremerge+ · = true

Table 7.7: 2CFA closure conditions for movement.

Page 149: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 127

Movement Closures. The 2CFA movement closures, specified in Table 7.7,ensure that acceptable analysis estimates (I,R,F) are closed such as to safelyover-approximate the consequence of all movement reactions that might occur atrun-time. If, e.g., a 2CFA estimate (I,R,F) indicates that an enter movement

might become enabled in some context։

µ , i.e.,

• An enter capability and an accept capability might occur in sibling con-texts:

enter xℓ1 ∈ I(µp, µ, µ1) ∧ µ1 ∈ I(µgp, µp, µ) ∧accept yℓ2 ∈ I(µp, µ, µ2) ∧ µ2 ∈ I(µgp, µp, µ)

• The corresponding prefixes might be concurrently possible:

(ℓ1, ℓ2) ∈ CP⋆

• The enter and accept actions might agree on the name of the communica-tion channel:

〈R〉(µp, µ, µ1, x) ∩ 〈R〉(µp, µ, µ2, y) 6= ∅

As mentioned by Remark 6.10 this requires non-empty intersection in the〈R〉 relation.

Then (I,R,F) should reflect that:

• The moving ambient (role) µ1 might occur in the context of µp, µ, µ2:

(µ1 ∈ I(µp, µ, µ2))

• The prefixes and ambient roles that are relevant to ℓ1 and might occurin the originating context, µp, µ, µ1, might also occur in the new context,µ, µ2, µ1:

(∀ν : (ℓ1, ν) ∈ RPA⋆ ∧ (µp, µ, µ1, ν) ∈ I ⇒ (µ, µ2, µ1, ν) ∈ I) ∧(∀ν, µ′ : (ℓ1, ν) ∈ RPA⋆ ∧ (µ, µ1, µ

′, ν) ∈ I ⇒ (µ2, µ1, µ′, ν) ∈ I)

Note that we have two contributions depending on whether we considerthe children or the grandchildren of µ1.

• The variables relevant to ℓ1 that might become bound in the originatingcontext, µp, µ, µ1, might also become bound in the new context, µ, µ2, µ1.A similar update is needed for the relevant variables in a child µ′ of µ1:

(∀p : (ℓ1, p) ∈ RV⋆ ⇒ R(µp, µ, µ1, p) ⊆ R(µ, µ2, µ1, p)) ∧(∀p, µ′ : (ℓ1, p) ∈ RV⋆ ⇒ R(µ, µ1, µ

′, p) ⊆ R(µ2, µ1, µ′, p))

Again, note that we have two contributions depending on whether weconsider the children or the grandchildren of µ1.

Page 150: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

128 Context Sensitive Control Flow Analysis

Figure 7.2: 2CFA closure condition for enter movement.

• The corresponding capabilities, ℓ1 and ℓ2, might react:

(ℓ1, ℓ2) ∈ F

As only the ambient hosting the enter capability moves, the corresponding ac-cept capability does not cause similar updates.

Communication Closures. Similarly, the 2CFA communication closures,specified in Table 7.8, ensure that acceptable analysis estimates (I,R,F) areclosed such as to safely over-approximate the consequence of all communicationreactions that might occur at run-time. If, e.g., a 2CFA estimate (I,R,F) in-

dicates that an local communication might become enabled in some context։

µ ,i.e.,

• A local input action and a local output action might occur in the samecontext:

x!{z}ℓ1 ∈ I(։

µ) ∧ y?{p}ℓ2 ∈ I(։

µ)

• The corresponding prefixes might be concurrently possible in the sensedefined by the precomputed filter CP⋆ of Section 6.1:

(ℓ1, ℓ2) ∈ CP⋆

• The input and output actions might agree on the name of the communi-cation channel:

〈R〉(։

µ, x) ∩ 〈R〉(։

µ, y) 6= ∅

Note here that Remark 6.10 about intersection testing in 〈R〉 still applies.

Page 151: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 129

closure·!{·} , ∀։

µ, x, y, z, p, ℓ1, ℓ2 :

x!{z}ℓ1 ∈ I(։

µ) ∧

y?{p}ℓ2 ∈ I(։

µ) ∧

〈R〉(։

µ, x) ∩ 〈R〉(։

µ, y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ 〈R〉(։

µ, z) ⊆ R(։

µ, p) ∧(ℓ1, ℓ2) ∈ F ∧propagateR

closure·?{·} , true

closure· !{·} , ∀µgp,→µ, µ1, x, y, z, p, ℓ1, ℓ2 :

x !{z}ℓ1 ∈ I(µgp,→µ) ∧

y ?{p}ℓ2 ∈ I(→µ, µ1) ∧ µ1 ∈ I(µgp,

→µ) ∧

〈R〉(µgp,→µ, x) ∩ 〈R〉(

→µ, µ1, y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ 〈R〉(µgp,→µ, z) ⊆ R(

→µ, µ1, p) ∧

(ℓ1, ℓ2) ∈ F ∧propagateR

closure·ˆ?{·} , true

closure·ˆ!{·} , ∀µgp,→µ, µ1, x, y, z, p, ℓ1, ℓ2 :

x !{z}ℓ1 ∈ I(→µ, µ1) ∧ µ1 ∈ I(µgp,

→µ) ∧

y ?{p}ℓ2 ∈ I(µgp,→µ) ∧

〈R〉(→µ, µ1, x) ∩ 〈R〉(µgp,

→µ, y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ 〈R〉(→µ, µ1, z) ⊆ R(µgp,

→µ, p) ∧

(ℓ1, ℓ2) ∈ F ∧propagateR

closure· ?{·} , true

closure·#!{·} , ∀µgp,→µ, µ1, µ2, x, y, z, p, ℓ1, ℓ2 :

x#!{z}ℓ1 ∈ I(→µ, µ1) ∧ µ1 ∈ I(µgp,

→µ) ∧

y#?{p}ℓ2 ∈ I(→µ, µ2) ∧ µ2 ∈ I(µgp,

→µ) ∧

〈R〉(→µ, µ1, x) ∩ 〈R〉(

→µ, µ2, y) 6= ∅ ∧ (ℓ1, ℓ2) ∈ CP⋆

⇒ 〈R〉(→µ, µ1, z) ⊆ R(

→µ, µ2, p) ∧

(ℓ1, ℓ2) ∈ F ∧propagateR

closure·#?{·} , true

Table 7.8: 2CFA closure conditions for communication.

Page 152: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

130 Context Sensitive Control Flow Analysis

µgp

µp

µ

x!{z}ℓ1

R(։

µ,x)∩R(։

µ,y) 6=∅

y?{p}ℓ2

µgp

µp

µ

x!{z}ℓ1

R(։

µ,z)⊆R(։

µ,p)

y?{p}ℓ2

Figure 7.3: 2CFA closure condition for local communication.

Then (I,R,F) should reflect that:

• Any name that might be bound to z might be the object of the commu-nication and might thus be bound to p:

〈R〉(։

µ, z) ⊆ R(։

µ, p)

Note here that Remark 6.11 about propagation of bindings still applies.

• Any binding of p that might be made is valid, not only in the present

context of։

µ , but also in any sub-context that is part of the static scopeof p:

propagateR

Remark 7.22 This closed formula is actually a closure condition in itself; it isdefined in Table 7.9 and explained in the following subsection. Invoking it here,as a consequence of action closures, simply ensures that it has to be consideredonly if at least one communication might take place.

Scope Propagation Closure. The scope propagation closure is structurally

similar the other closures, but it is simpler: For every context µgp,→µ and every

ambient role µc that, according to (I,R,F), might occur in it

µc ∈ I(µgp,→µ)

and, according to SCP⋆, might also lie within the shape of the static scope of

Page 153: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 131

propagateR = ∀µgp,→µ, µc, p : µc ∈ I(µgp,

→µ) ∧ µc ∈ SCP⋆(p) ⇒

R(µgp,→µ, p) ⊆ R(

→µ, µc, p) (7.2)

Table 7.9: 2CFA closure condition for propagation of variable bindings.

some variable p

µc ∈ SCP⋆(p),

the estimate (I,R,F) is only acceptable if: every binding of p that might occur

in the context of µgp,→µ might also occur in the context of

→µ, µc:

R(µgp,→µ, p) ⊆ R(

→µ, µc, p)

As scoping is static this suffices for correctly propagating the localised envi-ronment from ‘outside µc’ into ‘inside µc’ and, thus, ensures that all variablebindings are recorded throughout static scope.

Example 7.23 The least fixed point analysis of the running example Peat gives riseto the I and R components shown in the tables below.

Page 154: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

132 Context Sensitive Control Flow Analysis

µgp µp µ I(→µ, µ)

⊤ ⊤ ⊤ food, system⊤ ⊤ system cell, food, expel rjℓ1

⊤ ⊤ food enter acℓ5 , exit rjℓ4 , rea ?{rl}ℓ2 , expel rlℓ3 , nutrient⊤ system food nutrient, expel rlℓ3 , rea ?{rl}ℓ2 , exit rjℓ4 , enter acℓ5

⊤ system cell nutrient, food, rea !{RL}ℓ7 , expel rjℓ8 , accept acℓ9

⊤ food nutrient exit RLℓ6

system food nutrient exit RLℓ6

system cell food enter acℓ5 , exit rjℓ4 , rea ?{rl}ℓ2 , expel rlℓ3 , nutrientcell food nutrient exit RLℓ6

T

system food

food cell

nutrient food nutrient

nutrient

nutrient

µgp µp µ p R(→µ, µ, p)

system cell food rl RL

ℓ F(ℓ)

5 9

6 3

4 1

4 8

7 2

Furthermore, the tree graph gives a graphical relation of the ambient part of the

containment relation I. The triple bordered node represents the super-environment,

the double bordered nodes connected by bold lines represent the initial configuration

(Table 7.6), and the remaining nodes represent the system dynamics (Tables 7.7, 7.8,

and 7.9). The trees of the individual frames of Example 3.28 are all sub-trees of this

figure. Note, that although the analysis is formally an over-approximation the result

is rather precise. �

Page 155: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 133

7.4.4 Properties of the 2CFA

Having defined a judgement that specifies the set of acceptable 2CFA estimate(I,R,F) we shall now show that it does indeed specify a static analysis that is1) well-defined, 2) sound with respect to the semantics, and 3) implementable.The required results are very similar to those established for the 0CFA and,hence, the following will be brief.

Well-definedness. As the specification is syntax directed the acceptabilityjudgement is well-defined

Theorem 7.24 (Well-defined) The analysis judgement (I,R,F) |=։

µ P iswell-defined.

Proof The proof proceeds by structural induction on P . �

Semantic Correctness. The analysis is correct as indicated by the following:

Fact 7.25 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

(I,R,F) |=։

µ P [Q/X ] ⇔

(I,R,F) |=։

µ P ∧ (I,R,F) |=։

µ Q if X ∈ fpi(P )

(I,R,F) |=։

µ P otherwise

Proof Put։

µ for µ in the proof of Fact 6.14. �

Lemma 7.26 If C ⊢ P and fpi(P ) = ∅ then the following holds:

If (I,R,F) |=։

µ P and P ⇛ Q then (I,R,F) |=։

µ Q.

Proof The proof is by induction on the inference of P ⇛ Q and is similarto that of Lemma 6.15. In the cases of h-alph the result follows by referen-tial transparency because renaming is disciplined. The remaining axioms arestraightforward. In the case of h-urec the result follows from Fact 7.25. Theinduction hypothesis and arduous calculation does the rest. �

Again, to show the invariance under reaction we introduce an expansion of Iinto I@R, which takes into account the bindings of variables specified by the Rcomponent:

Page 156: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

134 Context Sensitive Control Flow Analysis

If M ℓ ∈ I(։

µ), x ∈ fn(M), and n ∈ 〈R〉 then M ℓ [n/x] ∈ (I@R,R)(։

µ).

which satisfies

Fact 7.27 If ⌊n⌋ ∈ R(x) and (I,R,F) |=։

µ P then (I@R,R,F) |=։

µ P [n/x].

Proof The proof proceeds by structural induction on P . �

In this context we can show that the acceptability of analysis estimates is pre-served under reaction in the following sense:

Lemma 7.28 Assume C ⊢ΓfnP , if furthermore CP⋆ ⊇ CP(P ). SCP⋆ ⊇ SCP(P ).

RV⋆ ⊇ RV(P ), and RPA⋆ ⊇ RPA(P ) then the following holds:

If (I,R,F) |=։

µ P and Pℓ

−→ Q then (I@R,R,F) |=։

µ Q and ℓ ∈ F

.

Proof The proof proceeds by induction of the inference of Pℓ

−→ Q. In thecase of movement axioms we compute by equational reasoning. In the case ofcommunication axioms we do the same and use Fact 7.27. In the case of r-

aux we use the induction hypothesis in conjunction with Lemma 7.26. Theremaining cases follow by the induction hypothesis. �

Corollary 7.29 (Subject reduction) Assume PRGC(P ); if furthermore CP⋆ ⊇CP(P ). SCP⋆ ⊇ SCP(P ). RV⋆ ⊇ RV(P ), and RPA⋆ ⊇ RPA(P ) then the follow-ing holds:

If (I,R,F) |=։

µ P and Pℓ

−→ Q then (I@R,R,F) |=։

µ Q and ℓ ∈ F .

And, finally, as I@R = (I@R)@R we can state the overall correctness resultas follows:

Theorem 7.30 (Semantic correctness) Assume that P⋆ is a well-formed ini-

tial program, then (I,R,F) |=⊤ P⋆ and P⋆L

−→⋆P entails (I@R,R,F) |=⊤ Pand L ⊆ F .

Proof The theorem follows by induction on the length of the derivationsequence and uses Corollary 7.29, Corollary 5.13, the induction hypothesis, andthe above insight to establish the inductive step. �

Page 157: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Control Flow Analysis 135

INPUT : a Flow Logic FL(BioAmbients, fol) and

a BioAmbients process P⋆.

OUTPUT : a fol formula ϕ such that (I,R,F) |= ϕ ⇔ (I,R,F) |=⊤ P⋆.

METHOD : Set ϕ := (V

{CP⋆(ℓ1, ℓ2) | (ℓ1, ℓ2) ∈ CP⋆}) ∧

(V

{SCP⋆(p, µ) | (p, µ) ∈ SCP⋆}) ∧

(V

{RV⋆(ℓ, p) | (ℓ, p) ∈ RV⋆}) ∧

(V

{RPA⋆(ℓ, ν) | (ℓ, ν) ∈ RPA⋆}) ∧

(V

{C⋆(n) | n ∈ C⋆})∧

completeR∧

(I,R,F) |=⊤ P⋆

while ϕ contains (I,R,F) |=։

µP ′

and there is a rule α iff β in FL

and a substitution θ

such that θα = (I,R,F) |=։

µP ′

do replace (I,R,F) |=։

µP ′ with θβ in ϕ.

Table 7.10: Generation of 2CFA constraints.

Implementability. Obviously the set of acceptable analysis estimates consti-tutes a Moore Family :

Theorem 7.31 (Moore family (2CFA)) For any program, P⋆, the set of ac-

ceptable analyses under |=։

µ is a Moore family, i.e.,

∀S′ ⊆ {(I,R,F) | (I,R,F) |=։

µ P⋆} :d

S′ ∈ {(I,R,F) | (I,R,F) |=։

µ P⋆}.

Proof The proof proceeds by structural induction in P and is similar to thatfor the 0CFA. �

Recall, from Chapter 4, that a Moore family is never empty and is also guar-

anteed to have a least element (I⋆,R⋆,F⋆) =d{(I,R,F) | (I,R,F) |=

։

µ

P⋆}, which is the most informative acceptable estimate. The computation of(I⋆,R⋆,F⋆) is made possible by a clause generation algorithm as outlined inTable 7.10. In order to compute (I⋆,R⋆,F⋆) for a given P⋆ we use the con-straint generator to produce a suitable ϕ and then compute the least modelusing the Succinct Solver Suite [NNS+04]. Again, this implementation strategyguarantees polynomial complexity, but the degree is higher than for the 0CFA.

Page 158: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

136 Context Sensitive Control Flow Analysis

CELL

LY SO LE

XV

XVCC

EE

EE

LDL

LDLLDL

LDL

LDLLDL

LDL

LDL

CH

CHCH

CHCH CH

CH

CH

CH

Figure 7.4: 2CFA analysis of normal receptor LDL pathway model. The special⊤ node with triple borders represents the super-environment, The other nodesrepresent ambient roles and the edges represent the containment relation I.The nodes with double borders connected with bold (black) edges represent thesystem in its initial state. The remaining (red) nodes and edges account for thedynamic evolution of the system.

7.5 CASE: Analysing the LDL Degradation Path-

way

We now continue our investigation of the LDL pathway model by subjecting itto the 2CFA. The resulting (I,R,F) is only included in Appendix B.2.1, butthe I relation is depicted graphically in Fig. 7.4.

At first look the analysis result appears remarkably more precise than the cor-responding 0CFA result of the previous chapter. Formally, this is an over-approximation - but the picture depicts exactly the behaviour that we expectthe normal receptor LDL model to exhibit. In particular, we notice that choles-terol (CH) might be released only in the lumen of the lysosome (LY SO), whichis essential.

Note that the graph indicates that LDL might float freely in the cytosol (the

Page 159: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Analysing the LDL Degradation Pathway 137

CELL

LY SO LE

XV

XVCC

EE

EE

LDL

CH

CELL

LY SO LE

XV

XVCC EE

LDLLDL

LDL

LDLLDL

LDL

LDL

CH

CH

CHCH CH

CH

CH

CH

(a) extra-cellular defects (b) intra-cellular defects

Figure 7.5: 2CFA analysis of defect receptor LDL pathway models.

top-level fluid of the CELL). This is the modelling artifact noted in Section 3.3appearing again. Also note that the LDL can occur inside an un-coated earlyendosome (EE) after the clathrin coat (CC) has been shredded; in this case thedouble bordered EE node represents both the initial configuration and a laterstage of evolution.

7.5.1 Related Diseases

Fig. 7.5(a) shows the effect of disabling the exo-plasmic binding site of the LDLreceptor. This corresponds to the model of Appendix A.2 and the analysisresult is presented verbatim in Appendix B.2.2. As the graph reveals, the cellcan no longer internalise LDL particles. The internal part of the pathway stillworks; hence the cell still internalises early endosomes, but only empty ones(EE may occur inside CC, but it carries no LDL cargo). In fact, the pathwayprocesses the endosomes right up to the point where transfer vesicles merge withthe lysosome (XV might occur inside CELL). Intuitively, this is exactly theexpected result.

Similarly, Fig. 7.5(b) shows the effect of disabling the cytosolic binding siteof the LDL receptor. This corresponds to the model of Appendix A.3 and theanalysis result is presented verbatim in Appendix B.2.3. We expect the receptorprotein to be able to bind LDL particles, but internalisation of early endosomes

Page 160: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

138 Context Sensitive Control Flow Analysis

should fail. In contrast to before, however, the analysis fails to capture this.It appears that the clathrin coating process is simply skipped and the EE isinternalised directly. This is similar to the failure we observed for the 0CFAanalysis. Fortunately the 2CFA is much more precise, and we can now see theproblem: The modelling artifact allows the LDL into the cytosol, where it canenter the EE. Since the analysis is completely flow insensitive it fails to capturethat the EE can only merge with the late endosome LE after it has been cladby, and subsequently shredded, the CC.

7.6 Concluding Remarks

The 2CFA presented in this chapter extends the tradition of CFAs that spawnedthe 0CFA of the previous chapter [NNH02, NNB04, LM01, NNPdR07, NNP04].In particular, precision has been increased by incorporating context in the man-ner of 2CFA and by using auxiliary information produced by four separate staticanalyses:

As for the 0CFA of the previous chapter a safe approximation to the concur-rently possible prefixes, CP⋆, is used to impose a restriction on the scope of theclosure conditions. We only compute the closure of semantic actions that are notimpossible because the involved capabilities are separated by non-deterministicchoice. A safe approximation to the spatial shape of static scope, SCP⋆, is usedto ensure that name bindings are only propagated into sub-context that mightactually be part of the static scope of the corresponding bound variables.

A safe approximation to the relevant variables, RV⋆, is used to restrict thecopying of name bindings (in R) that is demanded by the closure conditions forambient movement. Only the bindings of variables that do not disappear dueto non-deterministic choice are copied to the new location.

Finally, a safe approximation to the relevant prefixes and ambients, RPA⋆. isused to achieve a similar restriction on the copying of prefixes and ambients (inI) that is demanded by the closure conditions for movement.

These relations all emerge as fix-point properties; hence the information is com-puted by a hierarchy of indexed functions

Page 161: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Concluding Remarks 139

SCP⋆ RPA⋆ RV⋆ CP⋆

SCPΓamb,∆SCP() RPAΓcap,Γamb,∆RPA

() RVΓfv,Γcap,∆RV() CPΓcap,∆CP

()

ambΓamb() capΓcap

() fvΓfv()

Compared to the original presentation in [PNN06b] these functions have beenrefactored in this dissertation, such that the hierarchy is more natural. Also, thescope approximation is now sound, which, unfortunately, it was not in [PNN06b].

The results obtained by applying the 2CFA to both the running example Peat andthe LDL pathway model show a very clear improvement over the 0CFA of theprevious chapter; due to the spatial separation of contexts the resulting graphsare both easier to understand and much more precise. Still, however, the resultobtained from the LDL pathway with cytosolic receptor defects is disappointing.Here, the 2CFA quite simply falls short because it is flow-insensitive, and thereare two possible responses to this problem:

One can take the engineers approach and fashion a model that is easier toanalyse, e.g., by incorporating explicit synchronisation points. In [PNN05] thisapproach was used to good effect for a simpler version of the 2CFA.

Fundamentally, however, one should apply Ockham’s Razor and always keepmodels as pure as possible. This is the approach we advocate by specifying aflow-sensitive analysis in the following chapter.

Page 162: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

140 Context Sensitive Control Flow Analysis

Page 163: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Part III

Analysing for Causal

Properties

Page 164: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics
Page 165: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 8

Pathway Analysis

“Any important disease whose causality is murky, and for whichtreatment is ineffectual, tends to be awash in significance.”

— Susan Sontag

This chapter presents a flow sensitive Pathway Analysis that given a model, P⋆,computes a safe approximation to the set of, possibly infinite, reaction sequencesthat, according to P⋆, may occur. The analysis is evaluated in the context ofboth the LDL degradation pathway model of Section 3.3 and the transcriptionmodel of Section 3.4. Some of the presented material has been submitted aspart of [PNN07].

When developing the Pathway Analysis our focus shall be on the notion ofexposed prefixes. Intuitively, the exposed prefixes of a process (or state) arethose that might participate in the next reaction. Consider, for example, thesystem configuration

P = (n!{m}ℓ1 .m?{q}ℓ2 . P + .?{.}ℓ3 . Q) n?{p}ℓ4 .m?{q}ℓ2 . R

where we write ‘.’ for an arbitrary name. Here there is one exposed occurrenceof each of the prefixes n!{m}ℓ1 , .?{.}ℓ3 , and n?{p}ℓ4 . Due to their syntacticpositions relative to one-another, the exposed prefixes, n!{m}ℓ1 and n?{p}ℓ4 ,

enable a reaction, P(ℓ1,ℓ4)−→ Q, such that the resulting configuration,

Q = m?{q}ℓ2 . P m?{q}ℓ2 . R,

only has two occurrences of m?{q}ℓ2 as exposed prefixes,

Thus, for the purposes of the analysis, we shall abstractly characterise systemconfigurations (or states) by their extended multisets of exposed prefixes. We

Page 166: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

144 Pathway Analysis

then develop the desired automaton by tracking how these multisets evolve whenreactions occur. This is complicated by the fact that a reaction may cause someexposed prefixes to disappear, and others to emerge.

Technically, we will turn to the Monotone Frameworks (Section 4.2), normallyassociated with Data Flow Analysis, in order to deal with this. In doing so, weshall attach transfer functions to the reactions of processes. Corresponding toa forward Data Flow Analysis [NNH99], these transfer functions will computehow the extended multisets of exposed prefixes are transformed as reactionsoccur. Much akin to Bitvector Frameworks (Section 4.2) our transfer functionswill take the simple form

fstate(E) = (E\killstate) ∪ genstate.

However, we shall need to generalise the developments to extended multisets,rather than the simple powersets usually associated with bitvector frameworks.

As opposed to the Flow Logic (Section 4.3) approach of the previous chaptersthe approach of Monotone Frameworks offers no convenient separation betweenspecification and implementation. Thus, once we have defined the notion ofexposed prefixes and the corresponding transfer functions, we shall turn to theissue of implementation and define a suitable worklist algorithm.

Given a program P⋆ the idea is to construct a Finite Automaton such thatthe potentially infinite transition system of P⋆ is faithfully represented withinthe states and transitions of the automaton. As outlined in Section 4.2, thecomputed automaton will take the form

(Q⋆, q⋆, δ⋆,E⋆)

where Q⋆ is a set of states, q⋆ the initial state, δ⋆ a transition relation, and E⋆

a vector of extended multisets corresponding to the states. It will emerge fromthe construction that the resulting automaton is partially deterministic.

The chapter will be in six sections. First we introduce the notion of extendedmultisets in Section 8.1. Then, in Section 8.2, we specify the transfer functionsassociated with the Pathway Analysis. In 8.3 we define a worklist algorithmfor the analysis. Then we turn to the case studies. First we analyse the LDLpathway model in Section 8.4, and then the transcription model in Section 8.5.Finally, in Section 8.6, we summarise our findings.

Page 167: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Extended Multisets 145

8.1 Extended Multisets

We define an extended multiset, M, as an element of the domain

M = Lab → (N ∪∞)

which is equipped with an ordering ≤M defined by:

M ≤M M′ iff ∀ℓ : M(ℓ) ≤ M′(ℓ) ∨ M′(ℓ) = ∞

The domain (M,≤M) constitutes a complete lattice where the constants andoperations corresponding to ⊥,⊤,⊓,⊔ are defined as follows:

⊥M is defined by:

∀ℓ : ⊥M(ℓ) = 0

⊤M is defined by:

∀ℓ : ⊤M(ℓ) = ∞

minM is defined by:

(M minM M′)(ℓ) =

min{M(ℓ),M′(ℓ)} if M(ℓ) ∈ N ∧ M′(ℓ) ∈ N

M(ℓ) if M′(ℓ) = ∞

M′(ℓ) otherwise

maxM is defined by:

(M maxM M′)(ℓ) =

{

max{M(ℓ),M′(ℓ)} if M(ℓ) ∈ N ∧ M′(ℓ) ∈ N

∞ otherwise

Furthermore we define addition and subtraction on multisets as follows:

+M is defined by:

(M +M M′)(ℓ) =

{

M(ℓ) + M′(ℓ) if M(ℓ) ∈ N ∧ M′(ℓ) ∈ N

∞ otherwise

−M is defined by:

(M −M M′)(ℓ) =

M(ℓ) − M′(ℓ) if M(ℓ) ∈ N ∧ M(ℓ) ≥ M′(ℓ)

0 if M′(ℓ) ∈ N ∧ M(ℓ) < M′(ℓ)

0 if M(ℓ) ∈ N ∧ M′(ℓ) = ∞

∞ otherwise

In particular note that ∞−∞ is intended to give ∞ in order to ensure that thetransfer function defined in Section 8.2.3 is always a correct over-approximation.

Page 168: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

146 Pathway Analysis

EΓE[[ (n)P ]] = EΓE

[[P ]]EΓE

[[ P µ ]] = EΓE[[P ]]

EΓE[[P1 P2 ]] = EΓE

[[P1 ]] +M EΓE[[P2 ]]

EΓE[[∑

i∈I M ℓi

i . Pi ]] =∑

Mi∈I⊥M[ℓi 7→ 1]

EΓE[[ rec X. P ]] = LFP(λE. EΓE [X 7→E][[P ]])

= EΓE [X 7→⊥M][[P ]] ⊲⊳M EΓE [X 7→EΓE [X 7→⊥M][[ P ]]][[P ]]

EΓE[[X ]] = ΓE(X)

(M ⊲⊳M M′)(ℓ) =

{

M(ℓ) if M(ℓ) = M′(ℓ)

∞ otherwise

Table 8.1: Exposed capabilities, EΓE[[P ]], of a process, P .

8.2 Computing and Preserving Exposed Prefixes

Using the extended multisets we shall now define the concept of exposed prefixes.We formalise this as a function,

E⋆[[]] : Proc → M,

intended to capture the following intuition: For an identifier closed process, P ,the expression E⋆[[P ]] (ℓ) denotes the number of distinct occurrences of ℓ that

might participate in the next reaction, Pℓ

−→ Q.

Recall that our focus is not just a given process P , but on all processes thatmay arise by heating of P ; in particular those that arise by the unfolding ofrecursion. In order to appropriately address this issue we use the parameterisedfunction EΓE

[[ ]], shown in Table 8.1, when computing the exposed prefixes. Theenvironment ΓE : Pid → M is a mapping that is used to associate each freeprocess identifier with an extended multiset of exposed prefixes. It is needed inorder to support the least fixed point computation that is required for computingthe exposed prefixes of recursive processes; it simply unfolds the recursion untilno further exposed prefixes arise from doing so.

It is obvious that the naive computation may not terminate because (M,≤M)admits infinite ascending chains. However, it turns out that we are justified inrecasting the computation in terms of the expansion operator, ⊲⊳M, defined inTable 8.1, which ensures termination in two iterations. We refer to [NN08] fora formal proof of this result, but provide an informal argument below.

Informally, it is easy to see that, if a process identifier X occurs un-guarded in

Page 169: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Computing and Preserving Exposed Prefixes 147

the body of rec X. P , e.g., in rec X. (X P ), then the exposed prefixes of P willhave infinitely many occurrences and the naive computation of the fixed pointnever terminates. However, it is also the case that, the number of exposed pre-fixes will grow in every iteration, only if the X is indeed unguarded. Obviously,two iterations suffice in order to determine this kind of behaviour.

Besides this technicality, the function EΓE[[ ]] performs a rather straightforward

recursive descent into processes; it uses the addition operation of extended mul-tisets in order to calculate the total number of exposed prefixes, i.e., those thatmight participate in the next reaction. As should be evident from the caseconcerning summations, only the top-most prefixes contribute and the result isobtained by addition of their multiplicities.

Finally, the desired function, E⋆[[]] , is defined by:

E⋆[[P ]] = E[ ][[P ]]

Convention 8.1 In the case of initial programs, P⋆, we shall use the distin-guished symbol E⋆ to denote E⋆[[P⋆]] . �

Example 8.2 The result Eeat of subjecting the program Peat to the exposed capa-bility analysis is shown below:

Eeat = ⊥M [1 7→ 1, 2 7→ 1, 4 7→ 1, 5 7→ 1, 6 7→ 1, 7 7→ 1, 8 7→ 1, 9 7→ 1]

One may observe that exactly one copy of every capability – except for 3, which is not

exposed – is exposed. �

Correctness of EΓ[[P ]]. Intuitively, correctness of EΓ[[P ]] means (i) that it isinvariant under heating and (ii) that it correctly captures the prefixes that maybe involved in the first reaction step. We start by showing the usual substitutionresult:

Fact 8.3 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

EΓE[[P [Q/X ] ]] = EΓE [X 7→EΓE

[[ Q ]]][[P ]]

Proof This is easily shown by structural induction on P . �

Page 170: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

148 Pathway Analysis

And then we go on to show the main result:

Lemma 8.4 If C ⊢ P and fpi(P ) = ∅ then both of the following hold:

1. If P ⇛ Q then EΓE[[Q ]] ≤M EΓE

[[P ]]

2. If P(ℓ1,ℓ2)−→ Q then ℓ1 ∈ dom(EΓE

[[P ]]) and ℓ2 ∈ dom(EΓE[[P ]])

Proof The proof of the first part is by induction on the inference of P ⇛ Q.Most cases are trivial. However, in the case of unfolding of recursion we useFact 5.11, that LFP (λE. EΓE [X 7→E][[P ]]) is indeed a fixed point, and Fact 8.3.

The second part follows by induction on the inference of Pℓ

−→ Q. The resultis immediate for the axioms and in the case of r-aux we make use of the firstpart of the lemma. The remaining rules follow from the induction hypothesis. �

Corollary 8.5 If PRGC(P ) and P ⇛ Q then EΓE[[Q ]] ≤M EΓE

[[P ]] and further-

more, if P(ℓ1,ℓ2)−→ Q then ℓ1 ∈ dom(EΓE

[[P ]]) and ℓ2 ∈ dom(EΓE[[P ]]). �

8.2.1 Generated Prefixes

In preparation for the transfer function, we shall define a notion of generatefunctions,

G⋆[[]] : Proc → T,

where elements of type

T = (Lab → M)

are mappings from labels into extended multisets. The intuition intended is the

following: For a process, P , such that P(ℓ1,ℓ2)−→ Q, the expression G⋆[[P ]] (ℓ1)(ℓ)

denotes a safe (over-) approximation to the number of occurrences of ℓ that maybecome exposed in Q due to the involvement of ℓ1 in the transition from P .Whenever G⋆[[P ]] (ℓ1)(ℓ) = m and G⋆[[P ]] (ℓ2)(ℓ) = n, we write G⋆[[P ]] ((ℓ1, ℓ2))(ℓ)to denote G⋆[[P ]] (ℓ1)(ℓ) + G⋆[[P ]] (ℓ2)(ℓ) = m + n.

The domain T is a straightforward extension of M, and the constants ⊥T,⊤T

are defined as expected. The associated operations, ≤T, minT, maxT, +T, and −T,are all defined as point-wise extensions of the corresponding operators on M.For example, ≤T is defined by:

T1 ≤T T2 iff ∀ℓ : T1(ℓ) ≤M T2(ℓ)

Page 171: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Computing and Preserving Exposed Prefixes 149

GΓE ,∆G[[ (n)P ]] = GΓE ,∆G

[[P ]]

GΓE ,∆G[[ P µ ]] = GΓE ,∆G

[[P ]]

GΓE ,∆G[[P1 P2 ]] = GΓE ,∆G

[[P1 ]] maxT GΓE ,∆G[[P2 ]]

GΓE ,∆G[[∑

i∈I

M ℓi

i . Pi ]] = MAXTi∈I(⊥T[ℓi 7→ EΓE[[Pi ]]] maxT GΓE ,∆G

[[Pi ]])

GΓE ,∆G[[ rec X. P ]] = LFP(λG. GΓE [X 7→EΓE

[[ recX. P ]]],∆G [X 7→G][[P ]])

= GΓE [X 7→EΓE[[ recX. P ]]],∆G [X 7→⊥T][[P ]]

GΓE ,∆G[[X ]] = ∆G(X)

Table 8.2: Generated capabilities, GΓE ,∆G[[P ]], of a process, P .

Once more, we have to take the unfolding of recursion into account when com-puting the generate function. For this, we use the parameterised recursive pro-cedure GΓE ,∆G

[[ ]] defined in Table 8.2. Here ΓE : Pid → M is as before, i.e., amapping that associates an extended multiset with each free process identifier,and ∆G : Pid → T is a mapping that associates a mapping from labels intoextended multisets with each free process identifier. We use the ΓE environ-ment to ensure that the number of prefixes exposed in a given continuation iscomputed correctly, that is, in the case of rec X. P , ΓE is used to appropriatelyassociate X with the multiset of prefixes, EΓE

[[ rec X. P ]], exposed by the recur-sion construct. The ∆G environment, on the other hand, is used to support thefixed point computation required for rec X. P — much like the ΓE environmentof E⋆[[]] .

The interesting case is that of the guarded sum construct,∑

i∈I M ℓi

i . Pi. It

is straightforward to see that every guard, M ℓi

i , generates all of the prefixesexposed by the corresponding continuation, Pi. However, each distinct ℓ mayhave several occurrences in a process, P , and, as we are aiming for a safe over-approximation, we must take this into account. We do so by ensuring that theresults obtained for sub-expressions are combined using maxT. This ensures thatthe computed GΓE ,∆G

[[P ]] is a safe over-approximation, in the sense that, forevery ℓ, GΓE ,∆G

[[P ]](ℓ) safely over-approximates every multiset generated by aspecific occurrence of ℓ in P — even if labels are not unique.

In the case of the recursion construct, however, we have to unfold the recursionuntil no further information arises from doing so. This amounts to the leastfixed point computation shown in Table 8.2. Again, the termination of the naive

Page 172: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

150 Pathway Analysis

computation is endangered, because (T,≤T) admits infinite ascending chains.As formally proved in [NN08], however, the computation actually stabilises aftera single iteration, which justifies the alternative formulation of Table 8.2.

The desired function G⋆[[]] is defined by:

G⋆[[P ]] = G[ ],[ ][[P ]]

Convention 8.6 In the case of initial programs, P⋆, we shall use the distin-guished symbol G⋆ to denote G⋆[[P⋆]] . �

Example 8.7 The result, Geat, of subjecting the program Peat to the generated ca-pability analysis is shown below:

Geat = ⊥T [ 1 7→ ⊥M [1 7→ 1],2 7→ ⊥M [3 7→ 1],3 7→ ⊥M [2 7→ 1, 4 7→ 1, 5 7→ 1],4 7→ ⊥M [2 7→ 1, 4 7→ 1, 5 7→ 1],5 7→ ⊥M [2 7→ 1, 4 7→ 1, 5 7→ 1],7 7→ ⊥M [7 7→ 1, 8 7→ 1, 9 7→ 1],8 7→ ⊥M [7 7→ 1, 8 7→ 1, 9 7→ 1],9 7→ ⊥M [7 7→ 1, 8 7→ 1, 9 7→ 1]]

Labels mapped to ⊥M are left out in the enumeration of the multisets. �

Correctness of GΓE ,∆G[[P ]]. Due to the different role that GΓE ,∆G

[[ ]] playsin the transfer function, the correctness of GΓE ,∆G

[[P ]] is slightly more involvedthan was the case for EΓE

[[P ]]. As usual, substitution possesses nice properties:

Fact 8.8 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

GΓE ,∆G[[P [Q/X ] ]] ={

GΓE [X 7→EΓE[[ Q ]]],∆G [X 7→⊥T][[P ]] maxT GΓE ,∆G

[[Q ]] if X ∈ fpi(P )

GΓE [X 7→EΓE[[ Q ]]],∆G [X 7→⊥T][[P ]] otherwise

(8.1)

Proof The result is shown by structural induction on P . �

And, surely, the safety of the approximation, GΓE ,∆G[[P ]], must be preserved

under heating:

Page 173: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Computing and Preserving Exposed Prefixes 151

Lemma 8.9 If C ⊢ P and fpi(P ) = ∅ then the following holds:

If P ⇛ Q then GΓE ,∆G[[Q ]] ≤T GΓE ,∆G

[[P ]].

Proof The proof is by induction on the inference of P ⇛ Q. When showingthe lemma for h-urec we use Fact 5.11, Fact 8.8, and thatLFP(λG. GΓE [X 7→EΓE

[[ recX. P ]]],∆G [X 7→G][[P ]]) is indeed a fixed point. In the case

of h-csum we use Lemma 8.4-(1).

The remaining axioms follow by simple calculations and the rules by the induc-tion hypothesis. �

Finally, we must show that the safety of the approximation is preserved underreduction. This amounts to the following ’local’ subject reduction result:

Lemma 8.10 If C ⊢ P and fpi(P ) = ∅ then the following holds:

If Pℓ

−→ Q then GΓE ,∆G[[Q ]] ≤T GΓE ,∆G

[[P ]].

Proof The proof is by induction on the shape of the inference of Pℓ

−→Q. The axioms follow by straightforward calculation using that GΓE ,∆G

[[P ]] =GΓE ,∆G

[[P [n/p] ]] and EΓE[[P ]] = EΓE

[[P [n/p] ]] in the case of communication.

For the rule using the structural congruence we use Lemma 8.9. The remainingrules follow by the induction hypothesis. �

Corollary 8.11 If PRGC(P ) and Pℓ

−→ Q then GΓE ,∆G[[Q ]] ≤T GΓE ,∆G

[[P ]]. �

8.2.2 Killed Prefixes

As the last component of the transfer function we shall also define a notion ofkill functions,

K⋆[[]] : Proc → T,

the intention of which is the following: For a process, P , such that P(ℓ1,ℓ2)−→ Q, the

expression K⋆[[P ]] (ℓ1)(ℓ) denotes a safe (under-)approximation to the number ofoccurrences of ℓ that are no longer exposed in Q because of the involvement of ℓ1in the transition from P . Whenever K⋆[[P ]] (ℓ1)(ℓ) = m and K⋆[[P ]] (ℓ2)(ℓ) = n,we write K⋆[[P ]] ((ℓ1, ℓ2))(ℓ) to denote K⋆[[P ]] (ℓ1)(ℓ) + K⋆[[P ]] (ℓ2)(ℓ) = m + n.

Page 174: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

152 Pathway Analysis

K∆K[[ (n)P ]]env = K∆K

[[P ]]K∆K

[[ P µ ]] = K∆K[[P ]]

K∆K[[P1 P2 ]] = K∆K

[[P1 ]] minT K∆K[[P2 ]]

K∆K[[∑

i∈I M ℓi

i . Pi ]] = MINTi∈I(⊤T[ℓi 7→ M ] minT K∆K[[Pi ]])

where M = +Mj∈I(⊥M[ℓj 7→ 1])K∆K

[[ rec X. P ]] = LFP(λK.K∆K[X 7→K][[P ]])K∆K

[[X ]] = ∆K(X)

Table 8.3: Killed capabilities, K∆K[[P ]], of a process, P .

The unfolding of recursion also complicates the computation of kill information,and therefore we use the parameterised function K∆K

[[ ]], defined in Table 8.3,for the computation. Here, the environment ∆K : Pid → T serves as a mappingthat associates a mapping from labels into extended multisets with each freeprocess identifier. Similar to before, it is used to support the fixed point com-putation required for rec X. P . Note, that we do not need a ΓE environment,because the definition of K∆K

[[ ]] does not rely on EΓE[[ ]].

Again, the interesting case is that of guarded sum constructs,∑

i∈I M ℓi

i . Pi. Itis straightforward to see that every guard of such a construct kills exactly oneoccurrence of each guard in the construct, including itself. It is still the case,however, that each distinct ℓ may have several occurrences in a process P . Weensure the safety of the computed under-approximation by combining the resultsobtained for sub-expressions using minT. As a consequence, the computed K∆K

[[ ]]is always a safe approximation – in the sense that, for every ℓ, K∆K

[[P ]](ℓ) safelyunder-approximates every multiset killed by a specific occurrence of ℓ in P –even when labels are not unique.

As usual, the recursion construct requires a fixed point computation where un-folding is performed until no further information arises from doing so. Thisamounts to the least fixed point computation shown in Table 8.3, which is guar-anteed to terminate because T admits no infinite descending chains.

Again, the desired function, K⋆[[]] , is defined by:

K⋆[[P ]] = K[ ][[P ]]

Convention 8.12 In the case of initial programs, P⋆, we shall use the distin-guished symbol K⋆ to denote K⋆[[P⋆]] . �

Example 8.13 The result, Keat, of subjecting the program Peat to the killed capa-bility analysis is shown below:

Page 175: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Computing and Preserving Exposed Prefixes 153

Keat = ⊤T [ 1 7→ ⊥M [1 7→ 1],2 7→ ⊥M [2 7→ 1, 4 7→ 1, 5 7→ 1],3 7→ ⊥M [3 7→ 1],4 7→ ⊥M [2 7→ 1, 4 7→ 1, 5 7→ 1],5 7→ ⊥M [2 7→ 1, 4 7→ 1, 5 7→ 1],6 7→ ⊥M [6 7→ 1],7 7→ ⊥M [7 7→ 1, 8 7→ 1, 9 7→ 1],8 7→ ⊥M [7 7→ 1, 8 7→ 1, 9 7→ 1],9 7→ ⊥M [7 7→ 1, 8 7→ 1, 9 7→ 1]]

Correctness of K∆K[[P ]]. Besides of being an under- rather than an over-

approximation, the correctness of K∆K[[P ]] is rather similar to that of GΓE ,∆G

[[P ]].We have the usual substitution property:

Fact 8.14 Assume Q = rec X. Q′ and C ⊢ Q; if furthermore fpi(Q) = ∅ andP ≺ Q then

K∆K[[P [Q/X ] ]] =

{

K∆K[X 7→⊥T][[P ]] minT K∆K[[Q ]] if X ∈ fpi(P )

K∆K[X 7→⊥T][[P ]] otherwise

Proof The result follows by structural induction on P . �

Then we must ensure that the safety of the approximation is preserved by heat-ing:

Lemma 8.15 If C ⊢ P and fpi(P ) = ∅ then the following holds:

If P ⇛ Q then K∆K[[P ]] ≤T K∆K

[[Q ]].

Proof The proof is by induction on the inference of P ⇛ Q. In the caseof h-urec we use Fact 5.11, that LFP(λK. K∆K[X 7→K][[P ]]) is indeed a fixedpoint, and Fact 8.14. The remaining axioms follow by simple calculations andthe rules by the induction hypothesis. �

Also, we must show that the safety of the approximation is preserved underreduction:

Lemma 8.16 If C ⊢ P and fpi(P ) = ∅ then the following holds:

If Pℓ

−→ Q then K∆K[[P ]] ≤T K∆K

[[Q ]].

Page 176: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

154 Pathway Analysis

Proof The proof is by induction on the shape of the inference of Pℓ

−→Q. The axioms follow by straightforward calculation using that K∆K

[[P ]] =K∆K

[[P [n/p] ]] in the case of communication.

In the case of r-aux we use Lemma 8.15. The remaining rules follow by theinduction hypothesis. �

Corollary 8.17 If PRGC(P ) and Pℓ

−→ Q then K∆K[[P ]] ≤T K∆K

[[Q ]]. �

8.2.3 The Transfer Function

In the classical Bit Vector Frameworks (Section 4.2) transfer functions take theform:

fblock(E) = (E\killblock) ∪ genblock

In the case of a forward analysis, E is the information holding at the entry tothe elementary block, block, killblock is the information invalidated by block, andgenblock is the information created by block.

We follow this template when defining the transfer functions of our setup. How-ever, for us, transitions serve the role of basic blocks. Thus, E is the extendedmultiset of exposed prefixes characterising some state P where a transition,

Pℓ

−→ Q, might be enabled for some Q, K⋆[[P ]] (ℓ) is the extended multiset ofprefixes that is guaranteed to be disabled by the transition, and G⋆[[P ]] (ℓ) is theextended multiset of prefixes that might be enabled by the transition. Thus thetransfer function takes the form:

transferP,ℓ(E) = (E −M K⋆[[P ]] (ℓ)) +M G⋆[[P ]] (ℓ)

Given a multiset of exposed prefixes, E, the transfer function computes, for ana priori given process, P , and transition, ℓ, a safe over-approximation to the setof prefixes exposed after the transition.

Correctness of the Transfer Function. The following result states thatthis transfer function provides a safe approximation to the exposed actions ofthe process that results from the transition:

Theorem 8.18 (Subject reduction) If C ⊢ P and fpi(P ) = ∅ then the fol-lowing holds:

Page 177: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Computing and Preserving Exposed Prefixes 155

If Pℓ

−→ Q then (E⋆[[Q]] ≤M transferP,ℓ(E⋆[[P ]] )).

Proof The proof is by induction of the inference of Pℓ

−→ Q. We have thefollowing cases:

Case r-ent:Writing lhs for

(enter nℓ1 . P + P ′) P ′′ µ1 (accept nℓ2 . Q + Q′) Q′′ µ2

we observe that

E⋆[[P ]] ≤M G⋆[[enter nℓ1 . P + P ′]] (ℓ1) ≤M G⋆[[lhs]] (ℓ1)

and

E⋆[[Q]] ≤M G⋆[[accept nℓ2 . Q + Q′]] (ℓ2) ≤M G⋆[[lhs]] (ℓ2).

Similarly we have

K⋆[[lhs]] (ℓ1) ≤M K⋆[[enter nℓ1 . P + P ′]] (ℓ1) ≤M E⋆[[enter nℓ1 . P + P ′]]

and

K⋆[[lhs]] (ℓ2) ≤M K⋆[[accept nℓ2 . Q + Q′]] (ℓ2) ≤M E⋆[[accept nℓ2 . Q + Q′]] .

By calculation we get

E⋆[[P ]] +M E⋆[[P′′]] +M E⋆[[Q]] +M E⋆[[Q

′′]]

≤M (E⋆[[enter nℓ1 . P + P ′]] +M E⋆[[P′′]] −M E⋆[[enter nℓ1 . P + P ′]] +M E⋆[[P ]] )

+M (E⋆[[accept nℓ2 . Q + Q′]] +M E⋆[[Q′′]] −M E⋆[[accept nℓ2 . Q + Q′]] +M E⋆[[Q]] )

≤M (E⋆[[lhs]] −M K⋆[[lhs]] (ℓ1, ℓ2)) +M G⋆[[lhs]] (ℓ1, ℓ2) (8.2)

which is exactly the desired result. The remaining axioms follow by similarreasoning.

Case r-res:In this case the proof follows from the induction hypothesis as names are ignoredby the Pathway Analysis.

Case r-amb:Again the proof is by the induction hypothesis as ambients are ignored by thePathway Analysis.

Case r-par:It follows from the induction hypothesis that

E⋆[[Q]] ≤M (E⋆[[P ]] −M K⋆[[P ]] (ℓ)) +M G⋆[[P ]] (ℓ).

Furthermore we have

K⋆[[P R]] (ℓ) ≤M K⋆[[P ]] (ℓ)

Page 178: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

156 Pathway Analysis

and

G⋆[[P ]] (ℓ) ≤M G⋆[[P R]] (ℓ)

so that

E⋆[[Q]] ≤M (E⋆[[P ]] −M K⋆[[P ]] (ℓ)) +M G⋆[[P ]] (ℓ)

≤M (E⋆[[P ]] −M K⋆[[P R]] (ℓ)) +M G⋆[[P R]] (ℓ)

Thus we may calculate

E⋆[[Q R]] = E⋆[[Q]] +M E⋆[[R]]

≤M (E⋆[[P ]] −M K⋆[[P R]] (ℓ)) +M G⋆[[P R]] (ℓ) +M E⋆[[R]]

= ((E⋆[[P ]] +M E⋆[[R]] ) −M K⋆[[P R]] (ℓ)) +M G⋆[[P R]] (ℓ)

= (E⋆[[P R]] −M K⋆[[P R]] (ℓ)) +M G⋆[[P R]] (ℓ)

which finishes the case.

Case r-aux:This case is straightforward because the safety of E⋆[[]] ,G⋆[[]] , and K⋆[[]] is pre-served by heating due to Lemmas 8.4, 8.9, and 8.15. �

Corollary 8.19 (Subject reduction)

If PRGC(P ) and Pℓ

−→ Q then E⋆[[Q]] ≤M transferP,ℓ(E⋆[[P ]] ). �

Finally, we establish the semantic soundness of the transfer function as a straight-forward corollary of the following theorem:

Theorem 8.20 (Semantic correctness)

P⋆L

−→⋆Pℓ

−→ Q ⇒ (E⋆[[Q]] ≤M transferP⋆,ℓ(E⋆[[P ]] )).

Proof This follows by induction on the length of L. The base case followsfrom Corollary 8.19. The inductive step is established using Corollaries 8.10,8.16, 8.19, and 5.13. �

This shows that the approximation is safe for all reaction sequences that mayarise from an initial program, P⋆.

8.3 Constructing the Automaton

We now turn to the pragmatics of calculating the finite automata that are thegoals of the Pathway Analysis. Given a program, P⋆, the idea is to construct

Page 179: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Constructing the Automaton 157

a finite automaton, such that the potentially infinite transition system of P⋆ isfaithfully represented within the states and transitions of the automaton. Thecomputed automaton will have the following components:

• A set, Q⋆, of states. Each state, q, is associated with an extended multiset,E⋆[q], and is intended to represent all processes, P , for which E⋆[[P ]] ≤M

E⋆[q].

• An initial state, q⋆ ∈ Q⋆, associated with the corresponding set, E⋆, ofexposed prefixes.

• A transition relation, δ⋆, containing transitions, qs(ℓ1,ℓ2)=⇒ qt, reflecting that

in the state qs, two prefixes, labelled ℓ1 and ℓ2, respectively, may react andgive rise to the new state qt.

We shall refer to this automaton as (Q⋆, q⋆, δ⋆,E⋆).

It will emerge from the construction that the resulting automaton is partially

deterministic, in the sense that, if qsℓ

=⇒ q1 and qsℓ

=⇒ q2 then q1 = q2. Thissuffices for our purposes, and hence we shall not make the effort of adding a failstate in order to obtain a deterministic finite automaton.

8.3.1 The Worklist Algorithm

Motivated by Monotone Frameworks (Section 4.2) the heart of the computationis an iterative worklist algorithm that computes a least solution to the frameworkinstance given as input. It simply starts out from the initial state and constructsthe automaton by adding more and more states and transitions.

The algorithm, which is defined in Table 8.4, computes over four data structures:

• A set, Q, of states.

• A vector, E, of associated multisets of exposed capabilities.

• A worklist, W ⊆ Q, of states that have yet to be processed.

• A set, δ, of transitions valid for the current automaton.

The algorithm is initialised in line (1) and (2). First the start state, q⋆, and

Page 180: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

158 Pathway Analysis

input : Instance (q⋆, E⋆,F⋆)

output : Least solution (Q⋆, q⋆, δ⋆,E⋆)

such that (Q⋆, q⋆, δ⋆,E⋆) |= pathway(P⋆)

method : Step 1: Initialisation of Q,W, δ, and E

(1) Q := {q⋆}; E[q⋆] := E⋆;(2) W := {q⋆}; δ := ∅;

Step 2: Iteration (updating Q,W, δ, and E)(3) while W 6= ∅ do(4) select qs from W; W := W\{qs};

(5) for each ℓ ∈ enabled(E[qs]) do(6) let E = transferP⋆,ℓ(E[qs])

(7) in update(qs, ℓ, E)

Step 3: Presenting the result(Q⋆, q⋆, δ⋆,E⋆) := (Q, q⋆, δ,E)

Table 8.4: Maximal Fixed Point algorithm for the Pathway Analysis.

associated multiset, E⋆, of prefixes exposed by P⋆ are added to the otherwiseempty Q and E. Then the start state, q⋆, is added to the worklist and thecurrent set of transitions, δ, is set to the empty set.

Line (3) defines the iterative loop, which inspects the worklist until it is finallyempty. In every iteration, line (4) selects and removes a state, qs, from theworklist. The set of reactions potentially causing transitions out of qs is thencomputed, using the procedure call enabled(E[qs]), in line (5). The correspond-ing procedure is defined in Subsection 8.3.2 and will yield a set of pairs, i.e.,enabled(E[qs]) ⊆ E[qs]×E[qs]. For each potential reaction, ℓ, an extended multi-set, E, denoting the corresponding next state, is computed in line (6) by a call,transferP⋆,ℓ(E[qs]), to the transfer function of Section 8.2.

Finally, in line (7), the automaton is updated to reflect the new transition usinga call, update(qs, ℓ, E), to the update procedure defined in Section 8.3.3. As weshall see, it is crucial for termination that this update is performed in a waysuch that Q remains finite. In order to ensure this, we shall enable extensivereuse of states, using a clever way of comparing them, and, only if we fail infinding a suitable preexisting state do we add a new one.

Page 181: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Constructing the Automaton 159

8.3.2 Enabled Reactions

Writing E = E[q] for the extended multiset associated with some state, q, weuse the procedure enabled(E) to compute the set of potential reactions of q. Ifℓ1 ∈ dom(E) and ℓ2 ∈ dom(E) then the pair (ℓ1, ℓ2) may be enabled in q if thecorresponding prefixes

1. match in the sense that one is complementary to the other, and

2. may be concurrently possible.

Inspecting the definition of F⋆ for either one of the CFAs of the previous chap-ters, it becomes clear that F⋆ precisely captures the intuitions suggesting (1)and (2).This leads to estimating the set of enabled transitions simply by takingthe pairs in F⋆ where both of the two capabilities are exposed in the presentstate:

enabled(E) = {(ℓ1, ℓ2) | (ℓ1, ℓ2) ∈ F⋆ ∧ ℓ1 ∈ dom(E) ∧ ℓ2 ∈ dom(E)}

Correctness of enabled. The function, enabled, constructed in this way, iscorrect in the sense that it safely over-approximates the set of enabled transi-tions:

Lemma 8.21 If P⋆L

−→⋆Pℓ

−→ Q and E⋆[[P ]] ≤M E then ℓ ∈ enabled(E).

Proof The lemma follows from Theorem 6.19 and Lemma 8.5. �

8.3.3 Updating Data Structures

When updating the data structures with a newly computed transition we mustdo it in such a way that the resulting automaton stays finite and the constructionterminates.

The corresponding procedure, update(qs, ℓ, E), is defined in Table 8.5, wherethe parameters qs, ℓ, and E denote a transition labelled ℓ from state qs to a(potentially new) state characterised by E that is to be added to the automaton.The procedure does the following:

The line (1) first checks whether a suitable state is already present in the au-tomaton. Here, we enforce a partitioning of states according to the domains oftheir corresponding multisets of exposed capabilities; two states, q1 and q2, areidentified if dom(E[q1]) = dom(E[q2]). If a suitable state, q, exists it is used as

Page 182: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

160 Pathway Analysis

(1) if some q ∈ Q with dom(E[q]) = dom(E)(2) then qt := q(3) else select qt from outside Q; Q := Q ∪ {qt}; E[qt] := ⊥M;(4) if ¬(E ≤M E[qt])(5) then E[qt] := E[qt] ∇M E; W := W ∪ {qt}

(6) δ := (δ\{(qs, ℓ, q) | q ∈ Q}) ∪ {(qs, ℓ, qt)};(7) clean-up(Q,W, δ)

Table 8.5: Procedure, update(qs, ℓ, E), for updating states of pathway automata.

the target state of the new transition in line (2). Otherwise a fresh state, qt,is inserted into the automaton, and the corresponding entry in E initialised to⊥M in line (3).

Then, in line (4), it is checked whether the multiset, E[qt], corresponding to thetarget state, already includes the information contributed by E. If this is not thecase the information is updated using a widening operator, ∇M : M × M → M

(see, e.g., [NNH99]), described below, and qt is added to the worklist in line (6).The widening operator is defined by

(M1 ∇M M2)(ℓ) =

M1(ℓ) if M2(ℓ) ≤ M1(ℓ)

M2(ℓ) if M1(ℓ) = 0 ∧ M2(ℓ) > 0

∞ otherwise

and guarantees that new information is added to the pre-existing in a mannerthat stabilises after finitely many iterations and still ensures that M1 maxM M2≤M

M1 ∇M M2.

In line (7) the new transition, (qs, ℓ, qt), is added to the automaton, while thepre-existing transition(s) out of qs along ℓ is removed as the destination may nolonger be correct. This may leave some states unreachable, and, in line (8), asuitable clean-up procedure, defined in the following section, is invoked in orderto ensure that these are removed.

Remark 8.22 (Complexity) It is clear that the worst-case complexity of thePathway Analysis depends on the maximal number of states, as well as the num-ber of transitions leaving each state. Thus, controlling the size of the automatonis one way of controlling the complexity of the algorithm. The partitioning ofstates, enforced by the check dom(E[q]) = dom(E) in line (1), asserts this kindof control. It is, of course, possible to devise other such granularity functionsthat result in higher or lower complexity and precision.

Another handle to control the complexity is the widening. Here, we interpret

Page 183: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Constructing the Automaton 161

Qreach := {q⋆} ∪ {q | ∃n,∃q1, · · · , qn : (q⋆, · · · , q1) ∈ δ ∧ · · · ∧ (qn, · · · , q) ∈ δ};

Q := Q ∩ Qreach;

W := W ∩ Qreach;

δ := δ ∩ (Qreach × (Lab × Lab) × Qreach)

Table 8.6: Procedure, clean-up(Q,W, δ), for eliminating dead states.

growing numbers of exposed prefixes as a sign of infinite behaviour and go to⊤M immediately upon detecting such behaviour. One could, of course, makeother choices, e.g., in order to separate the first k steps of a recurrent behaviour.

However, in this dissertation we shall not investigate any of these options. �

Cleaning Up

The cleanup procedure, shown in Table 8.6, simply computes the set of states,Qreach, that is reachable from the start state, q⋆, and uses this to restrict the setof states, Q, the set of transitions, δ, and the worklist, W, by intersection.

Example 8.23 Applying the Pathway Analysis to Peat we obtain the result shownbelow. Comparing to Fig. 3.28 this corresponds to the exact behaviour that arisesfrom collapsing frames 1,2, and 3 into one and 5, 6, and 7 into one. In each of thesecases it is not possible to distinguish the states corresponding to the collapsed framesbecause the multisets of globally exposed reaction capabilities are the same.

q_0 q_1 q_2 q_3

(4,8)(4,8) (4,1)(4,1) (5,9)(5,9)

(7,2)(7,2) (6,3)

Page 184: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

162 Pathway Analysis

8.3.4 Correctness of the Algorithm

Intuitively, the outlined worklist algorithm is correct if it terminates producing afinite partially deterministic automaton able to faithfully simulate all transitionsequences of the program P⋆ of interest.

We address these issues separately starting with termination, where the follow-ing result holds:

Lemma 8.24 (Termination) The worklist algorithm always terminates.

Proof Clearly, for any program P⋆ the algorithm operates over a finite set,Lab⋆, of labels. Now let us consider a possibly non-terminating execution.Notethat Q as well as E[·] grow in a non-decreasing manner.

The set {dom(E[q]) | q ∈ Q} grows in a non-decreasing manner and sincedom(E) ⊆ Lab⋆ the value of the set must eventually stabilise. After this thetest in line 1) of Table 8.5 will always succeed and the production of new statesin line 3) will cease. Thus Q stabilises.

The vector (E[q])q∈Q grows in a non-decreasing manner, but due to the proper-ties of the widening it must eventually stabilise, thereby stopping the growth ofW.

From this point on lines (4-7) of Table 8.4 will decrease the size of W by onein every iteration. Eventually W will be empty and hence the algorithm willterminate. �

We then turn to the correctness of the format of the output. Here we have:

Lemma 8.25 The worklist algorithm always produces a partially deterministicautomaton.

Proof We prove the claim by showing that

∀(q1, ℓ1, q′1), (q2, ℓ2, q

′2) ∈ δ : q1 = q2 ∧ ℓ1 = ℓ2 ⇒ q′1 = q′2

is an invariant at line (4) of Table 8.4. It is maintained due to the constructionof δ in line (6) of Table 8.5. �

Finally, we address the correctness of the contents of the output. We will showthis by a simulation result. We shall say that a state q, denoting the exposed

Page 185: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Constructing the Automaton 163

capabilities E, represents a process, P , whenever P ⊲ E where

P ⊲ E iff E⋆[[P ]] ≤M E

Using this we can state:

Lemma 8.26 If PRGC(P ), P ⇛ Q and P ⊲ E then Q ⊲ E.

Proof This follows from Corollary 8.5. �

Now the following result shows that a single step in the semantics is correctlysimulated by the automaton:

Theorem 8.27 The worklist algorithm produces a finite automaton, (Q, q0, δ,E),such that, if

P ⊲ E[q] and Pℓ

−→ Q,

then there exists a unique q′ ∈ Q, such that

Q ⊲ E[q′] and (q, ℓ, q′) ∈ δ.

Proof Consider the last time, t0, that the state q was removed from W inline (4) of Table 8.4. Now let E0 denote the corresponding values of the datastructures such that E0[q] = E[q] and hence P ⊲ E0[q].

It follows from Pℓ

−→ Q, Lemma 8.21, and the fact that enabled is monotonic,that ℓ ∈ enabled(E0[q]) and, hence, that ℓ is selected for consideration in line (5)of Table 8.4. By Theorem 8.18 line (6) then produces E such that Q ⊲ E.

Following line (7) of Table 8.4 it is immediate that lines (1-3) of Table 8.5identify a state, q′, and that lines (4-6) yield (q, ℓ, q′) ∈ δ1 and E ≤M E1[q

′],where δ1 and E1 denote the corresponding new data structures.

As this is the last iteration over q there will be no further calls of update(q, ℓ, ...).Thus, line (8) of Table 8.6 will not remove (q, ℓ, q′) from δ at a later stage.Clearly, the values of E[·] grow in a non-decreasing manner, and, writing δ andE for the final values of the data structures, we have (q, ℓ, q′) and E ≤M E1[q

′]≤M

E[q′], which completes the proof.

Uniqueness of q′ follows from Lemma 8.25. �

Page 186: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

164 Pathway Analysis

q_0 q_1

q_2

q_3

q_24

q_4

q_5

q_6

q_7

q_8

q_9

q_10

q_11

q_12

q_22

q_13

q_14

q_15

q_16 q_17

q_18

q_19

q_20

q_21 q_23 q_25q_26 q_27

q_28

q_29

q_30

q_31

q_32(1, 19)

(8, 26)

(12, 26)

(2, 20)

(9, 27)

(1, 13)

(10, 28)

(2, 14)

(11, 29)

(3, 15)

(16, 27)

(16, 27)

(17, 28)

(17, 28)

(18, 29)

(18, 29)

(32, 30)

(32, 30)

(32, 30)

(33, 34)

(33, 34)

(33, 34)

(35, 5) (7, 6)

(21, 26) (3, 22)

(23, 27)

(23, 27)

(24, 28)

(24, 28)

(4, 31)(4, 31)

(4, 31)

(4, 31)

(4, 31)

(4, 31)

(4, 31)

(4, 31)(4, 31)

(25, 29)(25, 29)

Figure 8.1: Pathway analysis of normal receptor LDL model.

We may define the reflexive and transitive closure of δ inductively as follows:

(q⋆, ε, q⋆) ∈ δ⋆

(q⋆,Λ, q) ∈ δ⋆ (q, ℓ, q′) ∈ δ

(q⋆,Λℓ, q′) ∈ δ⋆

This allows us to state the following corollary:

Corollary 8.28 The worklist algorithm produces a finite automaton, (Q, q⋆, δ,E),such that, if

P⋆Λ

−→⋆P,

then there exists a q ∈ Q, such that

P ⊲ E[q] and (q⋆,Λ, q) ∈ δ⋆. �

Proof The result follows straightforwardly by induction on the length of Λ. �

This shows that arbitrary reaction sequences are correctly simulated by theautomaton.

8.4 CASE: Analysing the LDL Degradation Path-

way

When subjecting the normal receptor LDL pathway model to Pathway Analysiswe obtain the result shown in Fig. 8.2, independently of the particular CFAused to produce the auxiliary relation F⋆.

The resulting automaton clearly exhibits the transitions that might lead to theconfigurations identified by the 2CFA analysis. However, some of the exhib-ited transitions are spurious, false positives that cannot really take place. In

Page 187: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Analysing Genetic Transcription 165

q_0

q_1

q_2

q_3 q_4 q_5 q_6 q_7(8, 26)

(12, 26)

(9, 27) (10, 28) (11, 29) (32, 30) (33, 34)

(a) extra-cellular defects

q_0 q_1 q_2(1, 19) (2, 20)

(b) intra-cellular defects

Figure 8.2: Pathway analysis of defect receptor LDL models.

particular, only one of the transitions labelled (4, 31), corresponding to LDLentering the XV , is actually possible, namely the one going from state 16 intostate 17. This is because the EE must merge with the LE before LDL canenter the XV . Every other occurrence of (4, 31) can be attributed to the factthat the Pathway Analysis incorporates no context information. Consequently,the states 11, 13, 15, 28, 30, and 32 are analysis artifacts, i.e., states that are notreachable in practice, and the state 20 is actually a dead end.

8.4.1 Related Diseases

When the LDL pathway model that incorporates exo-plasmic receptor defectsis subjected to Pathway Analysis we obtain the pathway automaton shown inFig. 8.2(a). It is evident that the internalisation of EE still works completelyas intended. Only the extracellular binding capacity is affected. This resultcontains no surprises.

In the case of cytosolic defects we obtain the pathway automaton shown inFig. 8.2(b). Here we see that the receptor protein can ligate an LDL particle,but the steps that correspond to internalisation, i.e., LDL entering EE, cannotoccur. Note in particular, that only the steps associated with the aforementionedmodelling artifact might occur. The remaining steps, leading to the rather wildover-approximation of Fig. 7.5(b), cannot occur.

8.5 CASE: Analysing Genetic Transcription

We now turn to the examination of the abstract model of genetic transcriptionthat was presented in Section 3.1.2. When subjecting this model to PathwayAnalysis we obtain the pathway automaton shown in Fig. 8.3, regardless of theCFA used for computing F⋆.

The result is rather imprecise, and in more than one way:

Page 188: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

166 Pathway Analysis

q_0

q_1q_2

q_3

q_4

q_5

q_6

q_7

q_8q_9

q_10

q_11

q_12

q_13

q_14

q_15 q_16

(1, 15)(2, 16)

(3, 17)

(18, 10) (4, 17)

(18, 9) (5, 17)

(18, 8)(6, 17)

(7, 19)

(11, 19)

(11, 19)

(11, 19)

(11, 19)

(12, 19)(12, 19)

(12, 19)

(12, 19)

(7, 20)

(11, 20)

(11, 20)

(11, 20)

(11, 20)

(11, 20)

(11, 20)

(12, 20)

(12, 20)

(12, 20)

(12, 20)

(12, 20)

(12, 20)

Figure 8.3: Pathway analysis of genetic transcription model.

First of all, the system is not expected to have dead end states. Indeed, inspec-tion reveals that the edges (12, 20), (11, 20), (7, 19) are analysis artifacts, and, ifthis is taken into account, the model seems quite reasonable.

Secondly, the edges (11, 19), (12, 19), corresponding to the attachment of actualnucleotides, always appear in pairs. This is adequately explained by the factthat the Pathway Analysis does not take the binding of names into account.

8.6 Concluding Remarks

The CFAs of the previous chapters are mainly concerned with the spatial prop-erties of reachable configurations. In contrast, the Pathway Analysis focuseson the causal properties of the realisable transition sequences. Given a model,P⋆, the Pathway Analysis computes a finite, partially deterministic, automatonthat safely over-approximates the set of, possibly infinite, sequential behavioursthat are realisable by P⋆.

In aims and scope the Pathway Analysis is superficially related to previousoccurrence counting analyses of ambient calculi [HJNN99, LM04, GL05]. How-ever, the Pathway Analysis counts capability prefixes, rather than ambients,and, technically, it resorts to an adaptation of classical Monotone Frameworks[KU77, NNH99], rather than Abstract Interpretation. Thus, the analysis ex-tends a line of work on data flow related analyses for the CCS family of processcalculi [NN06, NN08].

Pathway automata model system configurations by extended multisets of ex-posed prefixes, The dynamic nature of processes is captured by transfer functionsin the manner of classical Bit-vector Frameworks. These functions determinehow extended multisets grow and shrink as prefixes react. A worklist algorithm

Page 189: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Concluding Remarks 167

computes the final automaton. It expands the state space iteratively until nonew states arises by application of the transfer functions.

The practical value of the Pathway Analysis is, in part, determined by its com-putational complexity. As presented here the algorithm is exponential, but thepresented algorithm is remarkably flexible in this respect. Three mechanisms intotal are used to control the tradeoff between precision and termination prop-erties of the algorithm:

Firstly, the analysis counts and hence a notion of widening, in the manner ofAbstract Interpretation [CC77, CC79, NNH99], helps to ensure termination inthe presence of infinite multiplicities.

Secondly, the F⋆ relation, computed by either of the previously presented CFAs,is used to bound the number of enabled transitions explored by the algorithm.

Finally, an equivalence function on states is used to decide whether a computedstate is new or constitutes an update to an already existing state. The only suchequivalence explored in this dissertation is equality for dom(E), but in practicea variety of functions may be used – as long as they are finitary, stable, andinjective in the terminology of [NN08]. A fine-grained equivalence is likely togenerate fairly precise analysis results at a high complexity. A coarser equiva-lence relation will give more approximative analysis results at a lower (perhapseven low polynomial) complexity.

Analysis results obtained in the context of the two ongoing case studies arepromising, but also indicate viable avenues of improvement:

When applied to the normal receptor LDL pathway model of Section 3.3 theanalysis retrieves an automaton that quite clearly identifies the interaction se-quences that constitute the pathway. There are a few spurious edges. Theseanalysis artifacts owe to the fact that the analysis does not take context infor-mation into account. The results obtained from the defect receptor models arealso promising. In the case of exoplasmic defects the resulting pathway automa-ton basically accounts for the findings of the CFAs. But, in the case of cytosolicdefects, the pathway automaton excludes the interactions that might lead tothe wild over-approximations observed in the corresponding CFA results. Theobtained result is invariant with respect to the CFA used to produce F⋆.

When applied the transcription model of Section 3.4 the analysis retrieves anautomaton that exhibits the overall behaviour of the transcription process, i.e.a step-wise elongation procedure that may be interrupted at any point, shouldthe required metabolites not materialise. Again, however, the result containsanalysis artifacts that are clearly connected to the fact that the analysis does

Page 190: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

168 Pathway Analysis

not take the binding of names into account.

As this model is flat, in the sense that it does not model spatial structure, it isnot surprising that the obtained result is also invariant with respect to the CFAused for producing F⋆. This is in contrast to the LDL pathway result, whereone might have hoped that 2CFA would lead to better F⋆ estimates.

Technically, the two analysis approaches are complementary — one is contextsensitive and considers the bindings of names, while the other is flow-sensitivebut ignores bindings and context. This is evident in the results, which seem toindicate that an iterative positive-negative feedback loop between the two anal-yses would improve the precision of both; an idea that immediately motivatesthe developments of the following chapter.

Page 191: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 9

An Iterative Analysis

“It’s performance-based. It’s systemic. Not just an item here orthere. It comes together as a system and operates better. The endis greater than the sum of the parts.”

— Jim Folkman

This chapter presents a iterative analysis for the BioAmbients language. Theanalysis is evaluated in the context of both the LDL degradation pathway ofSection 3.3 and the transcription model of Section 3.4. None of the presentedmaterial has previously been covered.

At first sight the two types of analysis presented in the previous chapters arequite different. The CFA analyses of Chapter 6 and 7 approximate the set ofreachable spatial configurations. In contrast, the Pathway Analysis of Chapter 8approximates the set of realisable causal sequences. These differences, however,are not as profound as they might appear. Clearly, the analyses are all concernedwith different aspects of the same thing, namely the run-time behaviour ofBioAmbients processes, and, whichever way you put it, this is closely related tothe set of reactions that might occur.

The Pathway Analysis already takes avantage of this by using the F⋆ relation,produced by either of the CFA analyses, as a safe approximation to the setof run-time enabled reactions. The Pathway Analysis further refines this firstestimate internally, when computing the set of enabled transitions, becausein each state only capabilities with a positive number of occurrences can beenabled. This is evident in the Pathway Analysis result for the LDL pathway(Section 8.4), where δ⋆ does not contain all members, ℓ, of F⋆. As δ⋆ is formallyan over-approximation it is safe to conclude that the unused members of F⋆

Page 192: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

170 An Iterative Analysis

INPUT : a Flow Logic FL(BioAmbients,alfp),

a BioAmbients process P⋆, and

A safe estimate, CP, of the concurrently possible capabilities.

OUTPUT : an alfp formula ϕ such that

(I,R,F) |= ϕ ⇔ (I,R,F) |=⊤ P⋆.

METHOD : Set ϕ := (V

{CP⋆(ℓ1, ℓ2) | (ℓ1, ℓ2) ∈ CP}) ∧

(V

{C(n) | n ∈ C}) ∧

completeR∧

(I,R,F) |=⊤ P

while ϕ contains (I,R,F) |=µ P ′

and there is a rule α iff β in FL

and a substitution θ

such that θα = (I,R,F) |=µ P ′

do replace (I,R,F) |=µ P ′ with θβ in ϕ.

Table 9.1: Parameterised 0CFA clause generator for iteration.

cannot take place.

In the following we take advantage of this observation, and show how a simpleiterative analysis can improve on the previous results.

The chapter contains fours sections. In Section 9.1 we present the iterativeanalysis and state some correctness results. Then, in Section 9.2, we apply it tothe LDL pathway model in order to obtain evaluation data. In Section 9.3 weapply it to the transcription model. Finally, in Section 9.4, we summarise ourfindings.

9.1 Analysis

In the following we shall develop an iterative approach to CFA and PathwayAnalysis. In doing so, we pursue the idea that both types of analysis should, ofcourse, only consider the least set of reactions that is known to safely approxi-mate the true set of run-time enabled reactions.

The Pathway Analysis is already parameterised on a safe estimate F⋆ of therun-time enabled reactions, and, in order to specify the iterative analysis, we

Page 193: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Analysis 171

INPUT : a BioAmbients process P⋆.

OUTPUT : (I⋆,R⋆,F⋆) and (Q⋆, q⋆, δ⋆,E⋆) such that

(I⋆,R⋆,F⋆) |=⊤ P⋆, (Q⋆, q⋆, δ⋆, E⋆) |= pathway(P⋆), and

F⋆ = {ℓ | ∃q1, q2 : (q1, ℓ, q2) ∈ δ⋆}

METHOD : ϕ0 := genConstraints(CFA, P⋆, CP⋆);

(I,R,F) := solve(ϕ0);

(Q, q⋆, δ, E) := pathway(P⋆,F);

CP := {ℓ | ∃q1, q2 : (q1, ℓ, q2) ∈ δ};

while F 6= CP do

ϕ := genConstraints(CFA, P⋆, CP);

(I,R,F) := solve(ϕ);

(Q, q⋆, δ, E) := pathway(P⋆,F);

CP := {ℓ | ∃q1, q2 : (q1, ℓ, q2) ∈ δ};

(I⋆,R⋆,F⋆) := (I,R,F);

(Q⋆, q⋆, δ⋆, E⋆) := (Q, q⋆, δ, E)

Table 9.2: Iterative analysis algorithm.

shall parameterise the CFAs in a similar way. The basic idea is to pipe the set ofreactions contained in δ⋆ back into the CFA as a refined version of the estimate,CP⋆, of concurrently possible capabilities. This requires a modification of theCFA clause generators. In the case of 0CFA the resulting clause generator isshown in Table 9.1, and the 2CFA can be similarly modified.

In this context the iterative analysis for BioAmbients is defined as shown inTable 9.2, where CFA can be either the 0CFA or the 2CFA. The algorithm startsby computing an ordinary CFA, followed by an ordinary Pathway Analysis, inorder to obtain first estimates for F and CP. If these sets are not equal thiscomputation is repeated until they are.

Lemma 9.1 (Termination) The algorithm terminates.

Proof The CFA ensures that F ⊆ CP and the Pathway Analysis ensures that{ℓ | ∃q1, q2 : CP ⊆ F}; hence each iteration either shrinks the estimate of run-time enabled reactions or the algorithm terminates. As P(Lab⋆ × Lab⋆) doesnot admit infinite descending chains the algorithm will eventally terminate. �

Theorem 9.2 (Soundness) The outputs, (I⋆,R⋆,F⋆) and (Q⋆, q⋆, δ⋆,E⋆), con-stitute safe approximations in the manner of Theorems 6.19, 7.30, and 8.28.

Page 194: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

172 An Iterative Analysis

XVLY SO

LE

CC

EE

CH

CELL

LDL

q_0 q_1

q_2

q_3

q_24

q_4

q_5

q_6

q_7

q_8

q_9

q_10

q_11

q_12

q_22

q_13

q_14

q_15

q_16 q_17

q_18

q_19

q_20

q_21 q_23 q_25q_26 q_27

q_28

q_29

q_30

q_31

q_32(1, 19)(8, 26)

(12, 26)

(2, 20)

(9, 27)

(1, 13)

(10, 28)

(2, 14)

(11, 29)

(3, 15)

(16, 27)

(16, 27)

(17, 28)

(17, 28)(18, 29)

(18, 29)(32, 30)

(32, 30)

(32, 30)

(33, 34)

(33, 34)

(33, 34)

(35, 5) (7, 6)(21, 26) (3, 22)

(23, 27)

(23, 27)

(24, 28)

(24, 28)(4, 31)(4, 31)

(4, 31)(4, 31)

(4, 31)

(4, 31)

(4, 31)

(4, 31)(4, 31)

(25, 29)(25, 29)

Figure 9.1: Iteration of 0CFA and Pathway Analysis — normal receptors.

Proof This follows directly from the fact that each F and CP of the descendingchain are formally over-approximations. �

9.2 CASE: Analysing the LDL Degradation Path-

way

We now conclude our examination of the LDL pathway model by subjecting itto the iterative analyses outlined in the previous section.

We start by applying the 0CFA and the Pathway Analysis iteratively to thenormal receptor LDL model. This yields the containment graph and the path-way automaton shown in Fig. 9.1. Here we experience no difference from theresults previously obtained by the ordinary 0CFA (Section 6.3) and the ordi-nary pathway analysis (both based on 0CFA and 2CFA in Section 8.4). Thecorresponding 0CFA analysis estimates, presented verbatim in Appendix B.1.1

Page 195: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Analysing the LDL Degradation Pathway 173

CELL

LY SO LE

XV

XVCC

EE

EE

LDL

LDLLDL

LDL

LDLLDL

LDL

LDL

CH

CHCH

CHCH CH

CH

CH

CH

Figure 9.2: Iteration of 2CFA and Pathway Analysis — normal receptors.

and Appendix B.3.1, are identical.

We then go on to apply the 2CFA and the Pathway Analysis iteratively to thenormal receptor model. This yields the containment graph shown in Fig. 9.2 andwe abstain from showing the related pathway automaton, as it is identical to thatshown in Fig. 9.1. Again, there is no observable difference between the showngraph and the previous result for the ordinary 2CFA (Section 7.5). Indeed, thecorresponding 2CFA analysis estimates, presented verbatim in Appendix B.2.1and Appendix B.4.1, are identical.

From these observations we conclude that both the 0CFA and 2CFA results areas precise as possible for this particular model, and, hence, that the 2CFA isvastly more informative than the 0CFA. In the context of the previous exami-nations this is unsurprising.

9.2.1 Related Diseases

We now turn to revisit the disease related models, turning first to the issue ofdefect exo-plasmic binding sites. When the corresponding model is subjected to

Page 196: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

174 An Iterative Analysis

XV

LY SO

LE

CC

EE

CH

CELL LDL

q_0

q_1

q_2

q_3 q_4 q_5 q_6 q_7(8, 26)

(12, 26)

(9, 27) (10, 28) (11, 29) (32, 30) (33, 34)

Figure 9.3: Iteration of 0CFA and Pathway Analysis — exo-plasmic defects.

CELL

LY SO LE

XV

XVCC

EE

EE

LDL

CH

q_0

q_1

q_2

q_3 q_4 q_5 q_6 q_7(8, 26)

(12, 26)

(9, 27) (10, 28) (11, 29) (32, 30) (33, 34)

Figure 9.4: Iteration of 2CFA and Pathway Analysis — exo-plasmic defects.

iterative application of 0CFA and Pathway Analysis we obtain the containmentgraph and pathway automaton shown in Fig. 9.3. At first sight the result looksidentical to those previously obtained by the 0CFA (Section 6.3), but inspectionof the corresponding analysis results, printed verbatim in Appendices B.1.2 andB.3.2, shows that the number of reactions considered by the iterative 0CFAresult is less than half of that considered by the ordinary 0CFA result. Asshown in Fig. 9.4, this result basically repeats itself for the 2CFA, also for theverbatim analysis results in Appendices B.2.2 and B.4.2.

We then turn to the model of defect cytosolic binding sites. When this model issubjected to iterative application of 0CFA and Pathway Analysis we obtain the

Page 197: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Analysing the LDL Degradation Pathway 175

XV

LY SO LE CC EE

CH

CELL

LDL

q_0 q_1 q_2(1, 19) (2, 20)

Figure 9.5: Iteration of 0CFA and Pathway Analysis — cytosolic defects.

CELL

LY SO LE

XV

CC EE LDL

LDL

CH

CH

q_0 q_1 q_2(1, 19) (2, 20)

Figure 9.6: Iteration of 2CFA and Pathway Analysis — cytosolic defects.

containment graph and pathway automaton shown in Fig. 9.5. This result showsa remarkable improvement over the result previously obtained by the ordinary0CFA (Section 6.3). It is now easy to see that cytosolic defects also cripplethe system and prevent LDL particles from being internalised. The exo-plasmicdomain of the receptor might still ligate LDL particles, but the internal bindingsites cannot adhere to the clathrin coat that forms around the coated pit andshapes the early endosome. Hence, the LDL particles cannot be internalised,and the occurrence of LDL in the cytosol is caused by the previously mentionedmodelling artifact. The corresponding analysis results can be found in AppendixB.1.3 and B.3.3. Again, the iterative application of 2CFA, shown in Fig. 9.6exhibits a similar improvement (see Appendix B.2.3 and B.4.3).

Page 198: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

176 An Iterative Analysis

q_0

q_1 q_2

q_3q_4

q_5q_6

q_7 q_8 q_9

q_10

(1, 15)(2, 16)

(3, 17)

(18, 10) (4, 17)

(18, 9) (5, 17)

(18, 8)(6, 17) (11, 19)(11, 19)

(11, 19)

(11, 19)

(12, 19)(12, 19)

(12, 19)

(12, 19)

(7, 20)

Figure 9.7: Iterative analysis — transcription of single gene.

9.3 CASE: Analysing Genetic Transcription

To complete the examination of the case studies we now return to the transcrip-tion model of Section 3.1.3. The result of iterative application of any CFA andPathway Analysis is the pathway automaton shown in Fig. 9.7.

It is immediate to see that this result is much more precise than the correspond-ing automaton of the previous chapter. In particular, all of the spurious edgesand dead end states have disappeared, and the transcription process is nowplain to see; the two first transitions orchestrate the coordination compoundcomprising the gene and the polymerase. This is followed by four elongationsteps, where first the next nucleotide of the sequence is read from the gene(single edge forward), and then either a suitable nucleotide tri-phosphate isligated (double edge forward, one for each possible nTP) or the transcriptionis terminated prematurely (backedge to start state). After four such roundstranscription is terminated normally, and the coordination compound is broken(backedge to start state).

The one remaining imprecision is that the analysis cannot distinguish the par-ticular nTP that is ligated in any of the elongation steps. This is no surprise asthe Pathway Analysis does not take name bindings into account.

Due to the lack of information regarding name bindings crosstalk is inevitablewhen, e.g., more than one gene is available for transcription. Hence, we cannotexpect the analysis to scale very well. Unfortunately this suspicion is confirmedby the analysis result shown in Fig. 9.8, which is obtained by iterative analysis ofa two-gene transcription system. Due to the locking of coordination compounds,we expect the transcription of one gene to terminate, before transcription ofthe other commences. Thus, intuitively, the corresponding pathway automatonwould simply be two separate one-gene automata, but this is not the case.

Page 199: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

CASE: Analysing Genetic Transcription 177

q_0

q_1

oHudr

q_2

yItJvz

q_3

PxBOR

q_4

rajtRZ

q_5dbSA

L

q_6

rY21Xp

F9yIH9

q_21

YmOiH1

q_22

rY21Xp

F9yIH9

pEfKGs

q_7lZ2jP

Zegw1

n9

q_8

yItJvz

q_9

seLoT

q_10

rY21Xp

F9yIH9

q_51

rajtRZ

q_53

pxeXK

yOojf

q_11lZ2jP

Zegw1

n9

q_12

yItJvz

q_13

nAG0N

q_14

rY21Xp

F9yIH9

q_58

rajtRZ

q_59yboE

R

d8RPi

q_15lZ2jP

Zegw1

n9

q_16

yItJvz

q_17PKW

Vk

q_18

rY21Xp

F9yIH9

q_64

rajtRZ

q_117

p5Px3

q_19

f2YKA

q_20

lZ2jPZ

egw1n9

q_150

yItJvz

rY21Xp

F9yIH9

utpA8

q_86

rY21Xp

F9yIH9

rZlr7g

q_23lZ2jP

Zegw1

n9

q_24

oHudr

q_25

tZ9rFH

q_26

rY21Xp

F9yIH9

PxBOR

q_126

YXOynC

NGtEFj

q_27

lZ2jPZ

egw1n9

q_28

oHudr

q_29

s4M6zs

q_30

rY21Xp

F9yIH9

q_122

PxBOR

q_123

PSNlEa

GmtSMR

q_31lZ2jP

Z

egw1n9

q_32

oHudr

q_33rY21

Xp

F9yIH9

q_34

VK8XI7

q_118

PxBOR

q_119

XpmfJW

q_35

oHudr

ney0PR

q_36lZ2jP

Z

egw1n9

q_37

PxBOR

q_38

APTlht

tKjEhb

q_54rY21

Xp

F9yIH9

q_39

dbSAL

q_40rY21

Xp

F9yIH9

q_41

VK8XI7

Zr4yh

q_42

tKjEhb

q_43

rY21Xp

F9yIH9

GmtSMR

pEfKGs

q_147

lZ2jPZ

egw1n9

ney0PR

q_45lZ2jP

Z

egw1n9

q_44

yItJvz

q_46

PxBOR

q_47rajtR

Z

tKjEhb

q_141

dbSAL

q_142

rY21Xp

F9yIH9

q_48dbSA

L

q_49

NV50wo

q_50

rY21Xp

F9yIH9

Zr4yhq_13

9

YmOiH1

q_140

rY21Xp

F9yIH9

q_52

pEfKGs

lZ2jPZ

egw1n9

lZ2jPZ

egw1n9

q_56

dbSAL

q_137

YmOiH1

q_138

rY21Xp

F9yIH9

oHudr

q_55

seLoT

NV50wo

q_57

rY21Xp

F9yIH9

q_114

oHudr

yOojf

lZ2jPZ

egw1n9

pEfKGs

lZ2jPZ

egw1n9

q_60

seLoT

q_61

YmOiH1

q_62

rY21Xp

F9yIH9

NV50wo

q_115

nAG0N

q_116

rY21Xp

F9yIH9

yOojf

lZ2jPZ

egw1n9

rZlr7g

pEfKGs

q_63

lZ2jPZ

egw1n9

q_65

seLoT

q_66

tZ9rFH

q_67rY21

Xp

F9yIH9

YmOiH1

q_108

nAG0N

q_109

rY21Xp

F9yIH9

rZlr7g

yOojf

q_69

lZ2jPZ

egw1n9

NGtEFj

pEfKGs

q_68lZ2jP

Zegw1

n9

s4M6zs

q_70seLo

T

q_71rY21

Xp

F9yIH9

tZ9rFH

q_73

nAG0N

q_74rY21

XpF9yIH

9

NGtEFj

yOojf

q_72lZ2jP

Z

egw1n9

q_76

nAG0N

q_77

s4M6zs

q_78rY21

Xp

F9yIH9

rZlr7g

d8RPi

q_75lZ2jP

Z

egw1n9

tZ9rFH

q_102

PKWVk

q_111

rY21Xp

F9yIH9

NGtEFj

d8RPi

q_80

lZ2jPZ

egw1n9

GmtSMR

yOojf

q_79lZ2jP

Z

egw1n9

q_81

nAG0N

q_82rY21

XpF9yIH

9

q_83

VK8XI7

s4M6zs

q_84

PKWVk

q_85rY21

Xp

F9yIH9

GmtSMR

d8RPi

q_105

lZ2jPZ

egw1n9

ney0PR

yOojf

q_95lZ2jP

Zegw1

n9

f2YKA

NGtEFj

q_87lZ2jP

Z

egw1n9

q_88

yItJvz

utpA8

q_89

s4M6zs

q_90rY21

XpF9yIH

9

q_92rajtR

Z

f2YKA

GmtSMR

q_91lZ2jP

Z

egw1n9

utpA8

q_93

rY21Xp

F9yIH9

q_94

VK8XI7

utpA8

YmOiH1

q_103

rY21Xp

F9yIH9

ney0PR

f2YKA

q_96

lZ2jPZ

egw1n9

tKjEhb

q_97

nAG0N

q_98rY21

XpF9yIH

9

utpA8

tKjEhb

q_100

rY21Xp

F9yIH9

ney0PR

d8RPi

q_99

lZ2jPZ

egw1n9

tKjEhb

PKWVk

q_101

rY21Xp

F9yIH9

f2YKA

rZlr7g

q_104

lZ2jPZ

egw1n9

utpA8

tZ9rFH

q_106

rY21Xp

F9yIH9

PKWVk

VK8XI7

q_107

rY21Xp

F9yIH9

d8RPi

q_110

lZ2jPZ

egw1n9

YmOiH1

q_112

PKWVk

q_113

rY21Xp

F9yIH9

f2YKA

lZ2jPZ

egw1n9

PxBOR

d8RPi

lZ2jPZ

egw1n9

NV50wo

q_131

PKWVk

q_132

rY21Xp

F9yIH9

dbSAL

q_120

s4M6zs

q_121

rY21Xp

F9yIH9

Zr4yh

q_129

rY21Xp

F9yIH9

q_130

VK8XI7

GmtSMR

lZ2jPZ

egw1n9

dbSAL

q_124

tZ9rFH

q_125

rY21Xp

F9yIH9

Zr4yh

q_127

s4M6zs

q_128

rY21Xp

F9yIH9

NGtEFj

lZ2jPZ

egw1n9

Zr4yh

q_148

tZ9rFH

q_149

rY21Xp

F9yIH9

GmtSMR

lZ2jPZ

egw1n9

lZ2jPZ

egw1n9

q_144

ney0PR

q_133

f2YKA

q_134

lZ2jPZ

egw1n9

rY21Xp

F9yIH9

q_135

NV50woutpA8

NV50wo

q_136

rY21Xp

F9yIH9

lZ2jPZ

egw1n9

rZlr7g

lZ2jPZ

egw1n9

rZlr7g

lZ2jPZ

egw1n9

ney0PR

pEfKGs

q_143

lZ2jPZ

egw1n9

tKjEhb

seLoT

q_145

rY21Xp

F9yIH9

rY21Xp

F9yIH9

q_146

Zr4yh

lZ2jPZ

egw1n9

seLoT

VK8XI7

q_151

rY21Xp

F9yIH9

NGtEFj

lZ2jPZ

egw1n9

rajtRZ

bwzDO

Figure 9.8: Iterative analysis — transcription of two genes. The figure is not in-tended for reading. Rather, it illustrates the combinatorial blow-up that emergesin the presence of spurious cross-talk between similar subsystems.

Page 200: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

178 An Iterative Analysis

9.4 Concluding Remarks

The main contribution of this chapter is a simple idea that, to a large extent,reconciles the otherwise orthogonal analyses of previous chapters. These anal-yses all take as input a safe estimate of the set of run-time enabled reactionsand produce as output a more precise estimate that remains safe. Thus, to ob-tain the best possible estimate we simply iterate, alternating CFA and PathwayAnalysis until the estimate stabilises. This approach is somewhat related to thedistinction that is made in Data Flow Analysis between faint and dead variables[NNH99], but, in our case, facilitates analysis refinement rather than programoptimisation.

The case studies show that the approach works quite well, in particular for pro-totypical examples. This positive outcome owes to the profound difference inthe approaches of CFA and Pathway Analysis. The former combines contextsensitivity with binding information in order to approximate the set of enabledprefixes, whereas the latter relies solely on flow-sensitivity. The outlined itera-tive approach achieves synergy between these features.

However, the study of genetic transcription also shows that, even for the iter-ative approach, the Pathway Analysis does not scale in the presence of modelsthat exhibit several similar pathways. Due to the lack of appropriate name bind-ing information the analysis simply cannot distinguish between the pathways.Consequently, the results are obfuscated by cross-talk.

Similarly, the study of the LDL degradation pathway shows that the Path-way Analysis would benefit from the incorporation of context information. Theobtained result exhibits several spurious edges corresponding to capabilities re-acting ‘out of context’.

In contrast, the iterative CFA results are remarkably precise. Of course, 2CFAis much more informative than 0CFA, but both exhibit good results. And, whileCFA does not seem to offer much information about flat models, such as thegenetic transcription model, they still prove very useful to the iterative analysis.

Page 201: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Chapter 10

Conclusion

“I never do last words. They always seem so final.”— Laurell K. Hamilton

In this dissertation we have investigated the use of Static Program Analysistechniques in conjunction with BioAmbients process expressions that modelsub-cellular biological systems. Our main thesis was that

The approximative approach of Static Program Analysis can beused to automatically decide biologically relevant properties of pro-cess calculus expressions that model biological systems.

Where by interesting properties we mean anything that can be used to eitherpinpoint errors during model development or predict the consequences of per-turbations. For the purpose of this dissertation, however, we have focused onthe reachability of spatial configurations, a property related to the operation ofthe secretory pathway, and the realisability of sequential behaviours, a propertyrelated to the operation of cellular pathways in general.

The aim of this chapter is to evaluate our findings with respect to the abovethesis. In order to do this we start in Section 10.1 by briefly summing upthe work that has been contributed in support of the thesis. In Section 10.2 wegloss over the main findings. Then, in Section 10.3, we discuss how the obtainedresults reflect upon the thesis and discuss a number of research ideas that wouldfurther elaborate on and support the thesis.

Page 202: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

180 Conclusion

10.1 Contributions

In order to support the thesis we have contributed a number of technical devel-opments:

• We have formalised a variant of the BioAmbients language. The pro-posed variant incorporates a general recursion construct in the manner ofthe Calculus of Communicating Systems [Mil80] and discards unrestrictednon-deterministic choice in favour of guarded sums (Chapter 3).

• The introduction of general recursion complicates many aspects of thetechnical developments. We have resolved these issues for a class of well-formed initial programs (Chapter 5).

• In order to analyse for structural/spatial properties we have adapted thetraditional 0CFA (mono-variant Control Flow Analysis) approach of FlowLogic into a space efficient analysis that for any BioAmbients program,P⋆, safely approximates the set of reachable configurations by a singletree (Chapter 6).

• Being dissatisfied with the resulting 0CFA we have made a further adap-tation that incorporates context in the manner of 2CFA and relies on mul-tiple auxiliary minor analyses in order to significantly extend the 0CFAapproach (Chapter 7).

• To further analyse for causal/temporal properties we have adapted theclassical approach of Monotone Frameworks into a ‘Pathway Analysis’that for any BioAmbients program, P⋆, safely approximates the set ofrealisable transition sequences by a finite automaton (Chapter 8).

• Finally, we have devised an iterative algorithm that, to a certain extent,achieves synergy between the context sensitivity of the Control Flow Anal-yses and the flow sensitivity of the Pathway Analysis, thereby improvingthe results of both (Chapter 9).

We have proved that the specified analyses are both exhaustive and correctwith respect to BioAmbients programs. Furthermore we have shown that theanalyses admit the computation of best analysis results and, hence, that theycan be implemented; all results presented throughout the dissertation have beenautomatically computed by working prototype implementations.

Page 203: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Evaluation Results 181

10.2 Evaluation Results

What remains to be discussed is the issue of usefulness, which, in turn, is closelyrelated to the tradeoff between precision and efficiency that is inherently presentin all static analyses [NNH99]. Throughout the dissertation we have evaluatedeach of the analyses by applying it to a number of variations on two small casestudies:

Genetic transcription is the process by which genetic information is tran-scribed into mRNA molecules. The process involves the formation ofa temporary coordination compound (complex) that comprises severalmolecules but operates as a single unit. We have subjected abstract mod-els comprising an RNA polymerase, a supply of nucleotide tri-phosphate,and either a single or two genes to Pathway Analysis (Section 8.5) anditerative analysis (Section 9.3).

The LDL degradation pathway is a prototypical receptor mediated endo-cytic pathway. The pathway may fail due to defects in the genetic codingof crucial receptor binding sites. High cholesterol and cardiovascular dis-eases are well-known pathological indications associated with such failure.We have subjected abstract models of both normal and defect systemsto control flow (Section 6.3 and Section 7.5), pathway (Section 8.4), anditerative analysis (Section 9.2).

The simplest, and computationally least expensive, analysis is the 0CFA. Thissimplicity, however, is paid in terms precision. The 0CFA is generally too weakto clearly expose interesting properties in an intelligible manner. In our studiesof the LDL degradation pathway (Section 6.3) the analysis does detect seriousperturbations, but in all cases the results exhibit many analysis artifacts.

The complexity of the 2CFA analysis is somewhat higher. Correspondingly,the results exhibit very few analysis artifacts and, in many cases, they seem tocapture the exact set of reachable spatial configurations (Section 7.5). Somewhatsurprisingly, however, the 2CFA does not do significantly better than the 0CFAin pinpointing the effects of receptor related perturbations of the LDL pathwaymodel. Due to the higher precision of the 2CFA, however, we are now able tosee that this inadequacy owes to the lack of flow-sensitivity in the CFAs.

The Pathway Analysis, as it is presented in Chapter 8, constitutes another in-crease in complexity and exhibits an asymptotic worst-case complexity that isexponential in the size of the computed automaton. This complexity is notinherent as particular choices of widening operators and equivalence relations

Page 204: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

182 Conclusion

can be used to trade precision for efficiency (Section 8.3.3). The results arequite precise. In particular, the effects of receptor related perturbations of theLDL pathway model are pinpointed very precisely. In fact, the analysis resultfor the normal LDL pathway model (Section 8.4) is good enough to pinpointthe causal relationship between all of the events depicted in Fig. 2.8 while ex-hibiting only a few analysis artifacts. These artifacts arise because the analysisdoes not incorporate context information (Section 8.4). In the case of the genetranscription model the analysis result is also informative, but here we see moreanalysis artifacts (Section 8.5). These artifacts owe to crosstalk, between seem-ingly unrelated capability prefixes, that occurs because the analysis does notinclude name binding information.

These findings are supported by the iterative analysis strategy of Chapter 9.Here the context and name binding information tracked by a CFA analysisinfluences the Pathway Analysis and, in turn, the flow information tracked by thePathway Analysis influences the CFA analysis. The improvement is immediate:In the case of receptor related perturbations of the LDL pathway model bothof the CFAs now pinpoint the effects very precisely (Section 9.2). And, in thecase of genetic transcription, the Pathway Analysis becomes much more preciseand exhibits nearly no analysis artifacts (Section 9.3). The iterative PathwayAnalysis results for the ordinary LDL pathway model, on the other hand, is notimproved. This owes to the very indirect nature of the mutual influence. Thisalso explains the rather poor result obtained for the 2-gene transcription model(Section 9.3), which seems to indicate that, presently, the iterative strategy doesnot scale well to systems with multiple similar pathways.

10.3 Conclusion and Further Work

In order to finally conclude on the thesis we return to the central question: Canstatic analysis decide interesting properties of models of biological systems?

Our claim, based on the outlined results, is that yes, it can. The establishedbattery of analyses, which ranges in complexity from (low) polynomial to expo-nential, is able to accurately, and with increasing precision, pinpoint the mostessential aspects of the studied case models.

Based on these results we conjecture that the set of analyses presented in thisdissertation constitutes a strong tool that can both support the development ofprototypical models and automate significant parts of their post-modelling anal-ysis. Beyond the presented results the former point, in particular, is supportedby our, subjective, practical modelling experiences.

Page 205: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Conclusion and Further Work 183

Further Work The presented results are very promising. The single sourgrape in our basket of attractive fruits is the fact that the analysis packagedoes not maintain the high precision when models grow. Fortunately, the abovereflections on the matter reveals a number of key points that may be addressedby further research in order to improve on this:

Flow Sensitive Control Flow Analysis The results clearly indicate thatthe Control Flow Analyses would benefit from the inclusion of flow-sensitivity[BC05]. In terms of complexity vs. precision, a set of flow sensitive CFAs,independent of the expensive Pathway Analysis, would cover the middle groundbetween the present CFAs and the Pathway Analysis, and thus be a worthwileextension of the present analysis package. An auxiliary flow-relation, in thevein of the relevant name estimate of Section 7.2, and localised per-label namebinding estimates would likely be required.

Name Tracking Pathway Analysis It is also quite clear that the PathwayAnalysis would benefit from the inclusion of name binding information. Again,using the CFAs to obtain such information is clearly not the best way. Ratherit would be prudent for the Pathway Analysis itself to track the bindings ofnames, much in the manner of the CFAs. This type of analysis is already beinginvestigated in [NN07], and it seems that an integration with the present Path-way Analysis would be straightforward. In terms of complexity we note that,even though more information is considered in order to make the analysis moreprecise, the average complexity might well decline as the increased precisionwould, most likely, lead to fewer transitions in the pathway automaton.

Context Sensitive Pathway Analysis Another way to improve the PathwayAnalysis would be to include context information in the manner of e.g. 2CFA.Primarily, this appears to be a necessary precondition for meaningful PathwayAnalysis of elaborate cellular systems, where many processes are completelyseparated by physical barriers. It would also add another dimension to the it-erative analysis strategy of Chapter 9 as the contextualised Pathway Analysisand the 2CFA would be able to exchange contextualised, rather than the presentflat, information about realisable reactions. However, as a contextualised Path-way Analysis is likely to be computationally quite expensive, the alternativeoption – incorporating contextualised name binding information directly intothe Pathway Analysis – may constitute a more worthwhile pursuit.

Post-analysis Data Mining For large models all of the presented, and indeedall of the above suggested, static analyses are likely to compute quite overwhelm-ing analysis estimates that are beyond human interpretation. Thus it would beobvious to supplement the analyses with a tool-set for post-analysis data-mining.In particular, the idea of using model checking techniques, based on, e.g., Com-putation Tree Logic, for analysing the always finite pathway automata is quite

Page 206: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

184 Conclusion

obvious. A more risky, but potentially interesting, endeavour would be to usemodel checking related techniques for guiding the computation of the PathwayAnalysis, thereby a priori restricting the universe of interest [Saı00]. Dually,the analysis information may be used to direct the model checking of the modelitself, which perhaps, in turn, could lead to another iterative approach [BV01].

Stochastic Pathway Analysis Clearly, the Pathway Analysis is somehow re-lated to the state space aggregation algorithms used by the stochastic processalgebra community [GHR01, BSW06]. Thus, one could ask if the PathwayAnalysis could somehow be used to simplify stochastic models? Well, a di-rect coupling seems infeasible – the effect of the various approximations on thestochastic information is rather unpredictable. In the vein of the suggestions ofthe previous paragraph, however, it may be possible to use pathway automatafor guiding stochastic model checking [FSCR04].

Other Modelling Languages Finally, one can take a fundamentally differentapproach and try to influence the precision of the analyses by choosing anothermodelling formalism. The benefit would be to obtain a cleaner model of, e.g.,transport pathways, in the case of Brane calculus, thereby avoiding some analy-sis artifacts. This is feasible. On the one hand, it would be a large undertaking,and the analysis artifacts would most likely just “move” to another aspects ofthe model. But on the other hand, a recent study comparing BioAmbients andBrane calculus leads to optimism as the calculi exhibit more similarities thandifferences [Ver07].

Page 207: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Appendix A

Variants of the LDL

Degradation Pathway

Page 208: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

186 Variants of the LDL Degradation Pathway

A.1 The LDL Pathway with Normal Receptors

LipoProtein =LDLrcpt#!{ApoB}1 . enter ApoB2 . enter ee3 . enter xv4 . proc ?{Hydr}5 .

( expel Hydr6 .0exit Hydr7 .0 CH ) LDL

Endo =enter AP223,16,9 . exit AP224,17,10 .merge– Le25,18,11 .0

EarlyEndo =accept ee22,15 .Endo EE

ClathrinCoat =EErcpt ?{ap2}26 . accept ap227 . expel ap228 .0 CC

XferVesicle =( accept xv31 .0

exit Le32 .merge– lyso33 .0 ) XV

LateEndo =( merge+ Le29 . expel Le30 .0

XferVesicle ) LE

Lysosome =merge+ lyso34 . proc !{hydr}35 .0 LY SO

Cell =( ( EErcpt !{AP2}8 . Endo EE

+EErcpt !{AP2}12 . LDLrcpt#?{apob}13 . accept apob14 .EarlyEndo

+LDLrcpt#?{apob}19 . accept apob20 . EErcpt !{AP2}21 .EarlyEndo )ClathrinCoat

LateEndo

Lysosome ) CELL

(LDLrcpt) (EErcpt) (ApoB) (AP2) (ee) (cc) (lyso) (xv)(Le) (proc) (hydr) (ap2) (ap)

( LipoProtein Cell )

Page 209: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

187

A.2 The LDL Pathway with Defects in Exoplas-

mic Domain

LipoProtein =LDLrcp#!{ApoB}1 . enter ApoB2 . enter ee3 . enter xv4 . proc ?{Hydr}5 .

( expel Hydr6 .0exit Hydr7 .0 CH ) LDL

Endo =enter AP223,16,9 . exit AP224,17,10 .merge– Le25,18,11 .0

EarlyEndo =accept ee22,15 .Endo EE

ClathrinCoat =EErcpt ?{ap2}26 . accept ap227 . expel ap228 .0 CC

XferVesicle =( accept xv31 .0

exit Le32 .merge– lyso33 .0 ) XV

LateEndo =( merge+ Le29 . expel Le30 .0

XferVesicle ) LE

Lysosome =merge+ lyso34 . proc !{hydr}35 .0 LY SO

Cell =( ( EErcpt !{AP2}8 . Endo EE

+EErcpt !{AP2}12 . LDLrcpt#?{apob}13 . accept apob14 .EarlyEndo

+LDLrcpt#?{apob}19 . accept apob20 . EErcpt !{AP2}21 .EarlyEndo )ClathrinCoat

LateEndo

Lysosome ) CELL

(LDLrcpt) (LDLrcp) (EErcpt) (ApoB) (AP2) (ee) (cc) (lyso)(xv) (Le) (proc) (hydr) (ap2) (ap)

( LipoProtein

Cell )

Page 210: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

188 Variants of the LDL Degradation Pathway

A.3 The LDL Pathway with Defects in Cytosolic

Domain

LipoProtein =LDLrcpt#!{ApoB}1 . enter ApoB2 . enter ee3 . enter xv4 . proc ?{Hydr}5 .

( expel Hydr6 .0exit Hydr7 .0 CH ) LDL

Endo =enter AP223,16,9 . exit AP224,17,10 .merge– Le25,18,11 .0

EarlyEndo =accept ee22,15 .Endo EE

ClathrinCoat =EErcp ?{ap2}26 . accept ap227 . expel ap228 .0 CC

XferVesicle =( accept xv31 .0

exit Le32 .merge– lyso33 .0 ) XV

LateEndo =( merge+ Le29 . expel Le30 .0

XferVesicle ) LE

Lysosome =merge+ lyso34 . proc !{hydr}35 .0 LY SO

Cell =( ( EErcpt !{AP2}8 . Endo EE

+EErcpt !{AP2}12 . LDLrcpt#?{apob}13 . accept apob14 .EarlyEndo

+LDLrcpt#?{apob}19 . accept apob20 . EErcpt !{AP2}21 .EarlyEndo )ClathrinCoat

LateEndo

Lysosome ) CELL

(LDLrcpt) (EErcpt) (EErcp) (ApoB) (AP2) (ee) (cc) (lyso)(xv) (Le) (proc) (hydr) (ap2) (ap)

( LipoProtein

Cell )

Page 211: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Appendix B

Analysis Results for the LDL

Degradation Pathway

Page 212: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

190 Analysis Results for the LDL Degradation Pathway

B.1 Analysis Results for the 0CFA

B.1.1 The normal LDL Pathway

µ I(µ)LY SO CH,LDL,merge– lyso33,

exit Le32, accept xv31,merge+ lyso34, proc !{hydr}35

XV CH,LDL, accept xv31,exit Le32,merge– lyso33

LE CH,LDL,merge– Le25,exit AP224, enter AP223, accept ee22,merge– Le18, exit AP217, enter AP216,accept ee15,merge– Le11, exit AP210,enter AP29,merge+ Le29, expel Le30,XV

CC EE,XV ,LE,EErcpt ?{ap2}26, accept ap227, expel ap228

EE CH,LDL, enter AP29,exit AP210,merge– Le11, accept ee15,enter AP216, exit AP217,merge– Le18,accept ee22, enter AP223, exit AP224,merge– Le25

CELL CH,LDL,EErcpt !{AP2}8,EE,EErcpt !{AP2}12, LDLrcpt#?{apob}13,accept apob14, LDLrcpt#?{apob}19, accept apob20,EErcpt !{AP2}21, CC,XV ,LE,LY SO

CH exit Hydr7

LDL LDLrcpt#!{ApoB}1, enter ApoB2, enter ee3,enter xv4, proc ?{Hydr}5, expel Hydr6, CH

TOP CH,LDL,CELL

Page 213: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

191

n R(n)ap2 AP2apob ApoBHydr hydr

ℓ F(ℓ)33 3432 3021 2612 269 2710 2811 2916 2717 2818 2923 2724 2825 298 267 635 54 313 22, 152 20, 141 19, 13

Page 214: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

192 Analysis Results for the LDL Degradation Pathway

B.1.2 The LDL Pathway with Defects in Exoplasmic Do-

main

µ I(µ)LY SO merge– lyso33, exit Le32, accept xv31,

merge+ lyso34, proc !{hydr}35

XV accept xv31, exit Le32,merge– lyso33

LE merge– Le25, exit AP224, enter AP223,accept ee22,merge– Le18, exit AP217,enter AP216, accept ee15,merge– Le11,exit AP210, enter AP29,merge+ Le29,expel Le30,XV

CC EE,XV ,LE,EErcpt ?{ap2}26, accept ap227, expel ap228

EE enter AP29, exit AP210,merge– Le11,accept ee15, enter AP216, exit AP217,merge– Le18, accept ee22, enter AP223,exit AP224,merge– Le25

CELL EErcpt !{AP2}8, EE,EErcpt !{AP2}12,LDLrcpt#?{apob}13, accept apob14, LDLrcpt#?{apob}19,accept apob20, EErcpt !{AP2}21, CC,XV ,LE,LY SO

CH exit Hydr7

LDL LDLrcp#!{ApoB}1, enter ApoB2, enter ee3,enter xv4, proc ?{Hydr}5, expel Hydr6, CH

TOP LDL,CELL

n R(n)ap2 AP2

ℓ F(ℓ)33 3432 3021 2612 269 2710 2811 2916 2717 2818 2923 2724 2825 298 26

Page 215: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

193

B.1.3 The LDL Pathway with Defects in Cytosolic Do-

main

µ I(µ)LY SO CH,LDL,merge– lyso33,

exit Le32, accept xv31,merge+ lyso34, proc !{hydr}35

XV CH,LDL, accept xv31,exit Le32,merge– lyso33

LE CH,LDL,merge– Le25,exit AP224, enter AP223, accept ee22,merge– Le18, exit AP217, enter AP216,accept ee15,merge– Le11, exit AP210,enter AP29,merge+ Le29, expel Le30,XV

CC EErcp ?{ap2}26, accept ap227, expel ap228

EE CH,LDL, enter AP29,exit AP210,merge– Le11, accept ee15,enter AP216, exit AP217,merge– Le18,accept ee22, enter AP223, exit AP224,merge– Le25

CELL CH,LDL,EErcpt !{AP2}8,EE,EErcpt !{AP2}12, LDLrcpt#?{apob}13,accept apob14, LDLrcpt#?{apob}19, accept apob20,EErcpt !{AP2}21, CC,XV ,LE,LY SO

CH exit Hydr7

LDL LDLrcpt#!{ApoB}1, enter ApoB2, enter ee3,enter xv4, proc ?{Hydr}5, expel Hydr6, CH

TOP CH,LDL,CELL

n R(n)apob ApoBHydr hydr

ℓ F(ℓ)33 3432 3011 2918 2925 297 635 54 313 22, 152 20, 141 19, 13

Page 216: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

194 Analysis Results for the LDL Degradation Pathway

B.2 Analysis Results for the 2CFA

B.2.1 The normal LDL Pathway

µgp µp µ I(µgp, µp, µ)

CELL LE XV merge– lyso33, exit Le32, accept xv31,

LDL,

CELL LE LDL enter xv4, proc ?{Hydr}5, expel Hydr6,

CH,

CELL CC EE exit AP224, merge– Le25, exit AP217,

merge– Le18, exit AP210, merge– Le11,

LDL,

CELL EE LDL enter xv4, proc ?{Hydr}5, expel Hydr6,

CH,

CELL XV LDL proc ?{Hydr}5, expel Hydr6, CH,

CELL LY SO LDL proc ?{Hydr}5, expel Hydr6, CH,

CELL LDL CH exit Hydr7,

⊤ CELL LY SO proc !{hydr}35, merge+ lyso34, accept xv31,

LDL, CH,

⊤ CELL LE XV , expel Le30, merge+ Le29,

LDL,

⊤ CELL CC expel ap228, accept ap227, EErcpt ?{ap2}26,

EE,

⊤ CELL EE merge– Le25, exit AP224, enter AP223,

accept ee22, merge– Le18, exit AP217,

enter AP216, accept ee15, merge– Le11,

exit AP210, enter AP29, LDL,

⊤ CELL XV accept xv31, merge– lyso33, LDL,

⊤ CELL LDL enter ee3, enter xv4, proc ?{Hydr}5,

expel Hydr6, CH,

⊤ LDL CH exit Hydr7,

⊤p ⊤ CELL LY SO, LE, XV ,

CC, EErcpt !{AP2}21, accept apob20,

LDLrcpt#?{apob}19, accept apob14,

LDLrcpt#?{apob}13, EErcpt !{AP2}12, EE,

EErcpt !{AP2}8, LDL,

⊤p ⊤ LDL CH, expel Hydr6, proc ?{Hydr}5,

enter xv4, enter ee3, enter ApoB2,

LDLrcpt#!{ApoB}1,

⊤gp ⊤p ⊤ CELL, LDL,

CC EE LDL enter xv4, proc ?{Hydr}5, expel Hydr6,

CH,

LE XV LDL proc ?{Hydr}5, expel Hydr6, CH,

LE LDL CH exit Hydr7,

EE LDL CH exit Hydr7,

XV LDL CH exit Hydr7,

LY SO LDL CH exit Hydr7,

Page 217: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

195

µgp µp µ p R(µgp, µp, µ, p)

⊤ CELL CC ap2 AP2,

⊤p ⊤ CELL apob ApoB,

CELL LY SO LDL Hydr hydr,

CELL LY SO CH Hydr hydr,

LY SO LDL CH Hydr hydr,

ℓ F(ℓ)

2 20

2 14

7 6

35 5

4 31

3 22

3 15

1 19

1 13

8 26

9 27

10 28

11 29

16 27

17 28

18 29

23 27

24 28

25 29

12 26

21 26

32 30

33 34

Page 218: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

196 Analysis Results for the LDL Degradation Pathway

B.2.2 The LDL Pathway with Defects in Exoplasmic Do-

main

µgp µp µ I(µgp, µp, µ)

CELL LE XV merge– lyso33, exit Le32, accept xv31,

CELL CC EE exit AP224, merge– Le25, exit AP217,

merge– Le18, exit AP210, merge– Le11,

⊤ CELL LY SO proc !{hydr}35, merge+ lyso34, accept xv31,

⊤ CELL LE XV , expel Le30, merge+ Le29,

⊤ CELL CC expel ap228, accept ap227, EErcpt ?{ap2}26,

EE,

⊤ CELL EE merge– Le25, exit AP224, enter AP223,

accept ee22, merge– Le18, exit AP217,

enter AP216, accept ee15, merge– Le11,

exit AP210, enter AP29,

⊤ CELL XV accept xv31, merge– lyso33,

⊤ LDL CH exit Hydr7,

⊤p ⊤ CELL LY SO, LE, XV ,

CC, EErcpt !{AP2}21, accept apob20,

LDLrcpt#?{apob}19, accept apob14,

LDLrcpt#?{apob}13, EErcpt !{AP2}12, EE,

EErcpt !{AP2}8,

⊤p ⊤ LDL CH, expel Hydr6, proc ?{Hydr}5,

enter xv4, enter ee3, enter ApoB2,

LDLrcp#!{ApoB}1,

⊤gp ⊤p ⊤ CELL, LDL,

µgp µp µ p R(µgp, µp, µ, p)

⊤ CELL CC ap2 AP2,

ℓ F(ℓ)

8 26

9 27

10 28

11 29

16 27

17 28

18 29

23 27

24 28

25 29

12 26

21 26

32 30

33 34

Page 219: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

197

B.2.3 The LDL Pathway with Defects in Cytosolic Do-

main

µgp µp µ I(µgp, µp, µ)

CELL LE XV merge– lyso33, exit Le32, accept xv31,

LDL,

CELL LE LDL enter xv4, proc ?{Hydr}5, expel Hydr6,

CH,

CELL EE LDL enter xv4, proc ?{Hydr}5, expel Hydr6,

CH,

CELL XV LDL proc ?{Hydr}5, expel Hydr6, CH,

CELL LY SO LDL proc ?{Hydr}5, expel Hydr6, CH,

CELL LDL CH exit Hydr7,

⊤ CELL LY SO proc !{hydr}35, merge+ lyso34, accept xv31,

LDL, CH,

⊤ CELL LE XV , expel Le30, merge+ Le29,

LDL,

⊤ CELL CC expel ap228, accept ap227, EErcp ?{ap2}26,

⊤ CELL EE merge– Le25, exit AP224, enter AP223,

accept ee22, merge– Le18, exit AP217,

enter AP216, accept ee15, merge– Le11,

exit AP210, enter AP29, LDL,

⊤ CELL XV accept xv31, merge– lyso33, LDL,

⊤ CELL LDL enter ee3, enter xv4, proc ?{Hydr}5,

expel Hydr6, CH,

⊤ LDL CH exit Hydr7,

⊤p ⊤ CELL LY SO, LE, XV ,

CC, EErcpt !{AP2}21, accept apob20,

LDLrcpt#?{apob}19, accept apob14,

LDLrcpt#?{apob}13, EErcpt !{AP2}12, EE,

EErcpt !{AP2}8, LDL,

⊤p ⊤ LDL CH, expel Hydr6, proc ?{Hydr}5,

enter xv4, enter ee3, enter ApoB2,

LDLrcpt#!{ApoB}1,

⊤gp ⊤p ⊤ CELL, LDL,

LE XV LDL proc ?{Hydr}5, expel Hydr6, CH,

LE LDL CH exit Hydr7,

EE LDL CH exit Hydr7,

XV LDL CH exit Hydr7,

LY SO LDL CH exit Hydr7,

µgp µp µ p R(µgp, µp, µ, p)

⊤p ⊤ CELL apob ApoB,

CELL LY SO LDL Hydr hydr,

CELL LY SO CH Hydr hydr,

LY SO LDL CH Hydr hydr,

Page 220: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

198 Analysis Results for the LDL Degradation Pathway

ℓ F(ℓ)

2 20

2 14

7 6

35 5

4 31

3 22

3 15

1 19

1 13

11 29

18 29

25 29

32 30

33 34

Page 221: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

199

B.3 Analysis Results for the Iterative 0CFA

B.3.1 The normal LDL Pathway

µ I(µ)

LY SO CH, LDL, merge– lyso33,exit Le32, accept xv31, merge+ lyso34, proc !{hydr}35

XV CH, LDL, accept xv31,exit Le32, merge– lyso33

LE CH, LDL, merge– Le25,exit AP224, enter AP223, accept ee22,merge– Le18, exit AP217, enter AP216,accept ee15, merge– Le11, exit AP210,enter AP29, merge+ Le29, expel Le30, XV

CC EE, XV , LE,EErcpt ?{ap2}26, accept ap227, expel ap228

EE CH, LDL, enter AP29,exit AP210, merge– Le11, accept ee15,enter AP216, exit AP217, merge– Le18,accept ee22, enter AP223, exit AP224, merge– Le25

CELL CH, LDL, EErcpt !{AP2}8,EE, EErcpt !{AP2}12, LDLrcpt#?{apob}13,accept apob14, LDLrcpt#?{apob}19, accept apob20,EErcpt !{AP2}21, CC, XV ,LE, LY SO

CH exit Hydr7

LDL LDLrcpt#!{ApoB}1, enter ApoB2, enter ee3,enter xv4, proc ?{Hydr}5, expel Hydr6, CH

TOP CH, LDL, CELL

n R(n)

ap2 AP2apob ApoBHydr hydr

ℓ F(ℓ)

33 3432 3021 2612 269 2710 2811 2916 2717 2818 2923 2724 2825 298 267 635 54 313 22, 152 20, 141 19, 13

Page 222: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

200 Analysis Results for the LDL Degradation Pathway

B.3.2 The LDL Pathway with Defects in Exoplasmic Do-

main

µ I(µ)

LY SO merge– lyso33, exit Le32, accept xv31,merge+ lyso34, proc !{hydr}35

XV accept xv31, exit Le32, merge– lyso33

LE merge– Le25, exit AP224, enter AP223,accept ee22, merge– Le18, exit AP217,enter AP216, accept ee15, merge– Le11,exit AP210, enter AP29, merge+ Le29,expel Le30, XV

CC EE, XV , LE,EErcpt ?{ap2}26, accept ap227, expel ap228

EE enter AP29, exit AP210, merge– Le11,accept ee15, enter AP216, exit AP217,merge– Le18, accept ee22, enter AP223,exit AP224, merge– Le25

CELL EErcpt !{AP2}8, EE, EErcpt !{AP2}12,LDLrcpt#?{apob}13, accept apob14, LDLrcpt#?{apob}19,accept apob20, EErcpt !{AP2}21, CC,XV , LE, LY SO

CH exit Hydr7

LDL LDLrcp#!{ApoB}1, enter ApoB2, enter ee3,enter xv4, proc ?{Hydr}5, expel Hydr6, CH

TOP LDL, CELL

n R(n)

ap2 AP2

ℓ F(ℓ)

33 3432 3012 269 2710 2811 298 26

Page 223: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

201

B.3.3 The LDL Pathway with Defects in Cytosolic Do-

main

µ I(µ)

LY SO merge+ lyso34, proc !{hydr}35

XV accept xv31, exit Le32, merge– lyso33

LE merge+ Le29, expel Le30, XVCC EErcp ?{ap2}26, accept ap227, expel ap228

EE enter AP29, exit AP210, merge– Le11,accept ee15, enter AP216, exit AP217,merge– Le18, accept ee22, enter AP223,exit AP224, merge– Le25

CELL LDL, EErcpt !{AP2}8, EE,EErcpt !{AP2}12, LDLrcpt#?{apob}13, accept apob14,LDLrcpt#?{apob}19, accept apob20, EErcpt !{AP2}21,CC, LE, LY SO

CH exit Hydr7

LDL LDLrcpt#!{ApoB}1, enter ApoB2, enter ee3,enter xv4, proc ?{Hydr}5, expel Hydr6, CH

TOP LDL, CELL

n R(n)

apob ApoB

ℓ F(ℓ)

2 201 19

Page 224: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

202 Analysis Results for the LDL Degradation Pathway

B.4 Analysis Results for the Iterative 2CFA

B.4.1 The normal LDL Pathway

µgp µp µ I(µgp, µp, µ)

CELL LE XV merge– lyso33, exit Le32, accept xv31,

LDL,

CELL LE LDL enter xv4, proc ?{Hydr}5, expel Hydr6,

CH,

CELL CC EE exit AP224, merge– Le25, exit AP217,

merge– Le18, exit AP210, merge– Le11,

LDL,

CELL EE LDL enter xv4, proc ?{Hydr}5, expel Hydr6,

CH,

CELL XV LDL proc ?{Hydr}5, expel Hydr6, CH,

CELL LY SO LDL proc ?{Hydr}5, expel Hydr6, CH,

CELL LDL CH exit Hydr7,

⊤ CELL LY SO proc !{hydr}35, merge+ lyso34, accept xv31,

LDL, CH,

⊤ CELL LE XV , expel Le30, merge+ Le29,

LDL,

⊤ CELL CC expel ap228, accept ap227, EErcpt ?{ap2}26,

EE,

⊤ CELL EE merge– Le25, exit AP224, enter AP223,

accept ee22, merge– Le18, exit AP217,

enter AP216, accept ee15, merge– Le11,

exit AP210, enter AP29, LDL,

⊤ CELL XV accept xv31, merge– lyso33, LDL,

⊤ CELL LDL enter ee3, enter xv4, proc ?{Hydr}5,

expel Hydr6, CH,

⊤ LDL CH exit Hydr7,

⊤p ⊤ CELL LY SO, LE, XV ,

CC, EErcpt !{AP2}21, accept apob20,

LDLrcpt#?{apob}19, accept apob14,

LDLrcpt#?{apob}13, EErcpt !{AP2}12, EE,

EErcpt !{AP2}8, LDL,

⊤p ⊤ LDL CH, expel Hydr6, proc ?{Hydr}5,

enter xv4, enter ee3, enter ApoB2,

LDLrcpt#!{ApoB}1,

⊤gp ⊤p ⊤ CELL, LDL,

CC EE LDL enter xv4, proc ?{Hydr}5, expel Hydr6,

CH,

LE XV LDL proc ?{Hydr}5, expel Hydr6, CH,

LE LDL CH exit Hydr7,

EE LDL CH exit Hydr7,

XV LDL CH exit Hydr7,

LY SO LDL CH exit Hydr7,

Page 225: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

203

µgp µp µ p R(µgp, µp, µ, p)

⊤ CELL CC ap2 AP2,

⊤p ⊤ CELL apob ApoB,

CELL LY SO LDL Hydr hydr,

CELL LY SO CH Hydr hydr,

LY SO LDL CH Hydr hydr,

ℓ F(ℓ)

2 20

2 14

7 6

35 5

4 31

3 22

3 15

1 19

1 13

8 26

9 27

10 28

11 29

16 27

17 28

18 29

23 27

24 28

25 29

12 26

21 26

32 30

33 34

Page 226: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

204 Analysis Results for the LDL Degradation Pathway

B.4.2 The LDL Pathway with Defects in Exoplasmic Do-

main

µgp µp µ I(µgp, µp, µ)

CELL LE XV merge– lyso33, exit Le32, accept xv31,

CELL CC EE exit AP210, merge– Le11,

⊤ CELL LY SO proc !{hydr}35, merge+ lyso34, accept xv31,

⊤ CELL LE XV , expel Le30, merge+ Le29,

⊤ CELL CC expel ap228, accept ap227, EErcpt ?{ap2}26,

EE,

⊤ CELL EE merge– Le25, exit AP224, enter AP223,

accept ee22, merge– Le18, exit AP217,

enter AP216, accept ee15, merge– Le11,

exit AP210, enter AP29,

⊤ CELL XV accept xv31, merge– lyso33,

⊤ LDL CH exit Hydr7,

⊤p ⊤ CELL LY SO, LE, XV ,

CC, EErcpt !{AP2}21, accept apob20,

LDLrcpt#?{apob}19, accept apob14,

LDLrcpt#?{apob}13, EErcpt !{AP2}12, EE,

EErcpt !{AP2}8,

⊤p ⊤ LDL CH, expel Hydr6, proc ?{Hydr}5,

enter xv4, enter ee3, enter ApoB2,

LDLrcp#!{ApoB}1,

⊤gp ⊤p ⊤ CELL, LDL,

µgp µp µ p R(µgp, µp, µ, p)

⊤ CELL CC ap2 AP2,

ℓ F(ℓ)

8 26

9 27

10 28

11 29

12 26

32 30

33 34

Page 227: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

205

B.4.3 The LDL Pathway with Defects in Cytosolic Do-

main

µgp µp µ I(µgp, µp, µ)

CELL LE XV merge– lyso33, exit Le32, accept xv31,

CELL LDL CH exit Hydr7,

⊤ CELL LY SO proc !{hydr}35, merge+ lyso34,

⊤ CELL LE XV , expel Le30, merge+ Le29,

⊤ CELL CC expel ap228, accept ap227, EErcp ?{ap2}26,

⊤ CELL EE merge– Le25, exit AP224, enter AP223,

accept ee22, merge– Le18, exit AP217,

enter AP216, accept ee15, merge– Le11,

exit AP210, enter AP29,

⊤ CELL LDL enter ee3, enter xv4, proc ?{Hydr}5,

expel Hydr6, CH,

⊤ LDL CH exit Hydr7,

⊤p ⊤ CELL LY SO, LE, CC,

EErcpt !{AP2}21, accept apob20, LDLrcpt#?{apob}19,

accept apob14, LDLrcpt#?{apob}13, EErcpt !{AP2}12,

EE, EErcpt !{AP2}8, LDL,

⊤p ⊤ LDL CH, expel Hydr6, proc ?{Hydr}5,

enter xv4, enter ee3, enter ApoB2,

LDLrcpt#!{ApoB}1,

⊤gp ⊤p ⊤ CELL, LDL,

µgp µp µ p R(µgp, µp, µ, p)

⊤p ⊤ CELL apob ApoB,

ℓ F(ℓ)

2 20

1 19

Page 228: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

206 Analysis Results for the LDL Degradation Pathway

Page 229: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Appendix C

Analysis Results for Genetic

Transcription

Page 230: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

208 Analysis Results for Genetic Transcription

C.1 Ordinary 2CFA - One Gene

µgp µp µ I(µgp, µp, µ)

⊤gp ⊤p ⊤ MRNA, b?{dp}20, p?{dp}19,

b!{d}18, gene?{p}17, gene?{b}16,

tr?{gene}15, t!{t}14, g!{g}13,

c!{c}12, a!{a}11, b1?{d11}10,

b1?{d11}9, b1?{d11}8, b1!{d1}7,

g1!{c}6, g1!{a}5, g1!{c}4,

g1!{a}3, g1!{b1}2, tr!{g1}1,

µgp µp µ p R(µgp, µp, µ, p)

⊤gp ⊤p ⊤ gene g1,

⊤gp ⊤p ⊤ b c, a, b1,

⊤gp ⊤p ⊤ dp c, a, d1,

⊤gp ⊤p ⊤ p c, a, b1,

⊤gp ⊤p ⊤ d11 d,

ℓ F(ℓ)

18 8

18 9

18 10

7 19

7 20

2 17

2 16

3 17

3 16

4 17

4 16

11 19

11 20

5 17

5 16

12 19

12 20

6 17

6 16

1 15

Page 231: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

209

C.2 Iterative 2CFA - One Gene

µgp µp µ I(µgp, µp, µ)

⊤gp ⊤p ⊤ MRNA, b?{dp}20, p?{dp}19,

b!{d}18, gene?{p}17, gene?{b}16,

tr?{gene}15, t!{t}14, g!{g}13,

c!{c}12, a!{a}11, b1?{d11}10,

b1?{d11}9, b1?{d11}8, b1!{d1}7,

g1!{c}6, g1!{a}5, g1!{c}4,

g1!{a}3, g1!{b1}2, tr!{g1}1,

µgp µp µ p R(µgp, µp, µ, p)

⊤gp ⊤p ⊤ gene g1,

⊤gp ⊤p ⊤ p c, a,

⊤gp ⊤p ⊤ dp c, a, d1,

⊤gp ⊤p ⊤ b b1,

⊤gp ⊤p ⊤ d11 d,

ℓ F(ℓ)

18 8

18 9

18 10

7 20

2 16

3 17

4 17

11 19

5 17

12 19

6 17

1 15

Page 232: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

210 Analysis Results for Genetic Transcription

C.3 Ordinary 2CFA - Two Genes

µgp µp µ I(µgp, µp, µ)

⊤gp ⊤p ⊤ MRNA, b?{dp}30, p?{dp}29,

b!{d}28, gene?{p}27, gene?{b}26,

tr?{gene}25, t!{t}24, g!{g}23,

c!{c}22, a!{a}21, b2?{d21}20,

b2?{d21}19, b2?{d21}18, b2!{d2}17,

g2!{c}16, g2!{a}15, g2!{c}14,

g2!{a}13, g2!{b2}12, tr!{g2}11,

b1?{d11}10, b1?{d11}9, b1?{d11}8,

b1!{d1}7, g1!{c}6, g1!{a}5,

g1!{c}4, g1!{a}3, g1!{b1}2,

tr!{g1}1,

µgp µp µ p R(µgp, µp, µ, p)

⊤gp ⊤p ⊤ gene g2, g1,

⊤gp ⊤p ⊤ b c, a, b2, b1,

⊤gp ⊤p ⊤ dp c, a, d2, d1,

⊤gp ⊤p ⊤ p c, a, b2, b1,

⊤gp ⊤p ⊤ d21 d,

⊤gp ⊤p ⊤ d11 d,

ℓ F(ℓ)

7 29

7 30

2 27

2 26

3 27

3 26

4 27

4 26

5 27

5 26

6 27

6 26

1 25

28 8

28 9

28 10

28 18

28 19

28 20

17 29

17 30

12 27

12 26

13 27

13 26

14 27

14 26

21 29

21 30

15 27

15 26

22 29

22 30

16 27

16 26

11 25

Page 233: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

211

C.4 Iterative 2CFA - Two Genes

µgp µp µ I(µgp, µp, µ)

⊤gp ⊤p ⊤ MRNA, b?{dp}30, p?{dp}29,

b!{d}28, gene?{p}27, gene?{b}26,

tr?{gene}25, t!{t}24, g!{g}23,

c!{c}22, a!{a}21, b2?{d21}20,

b2?{d21}19, b2?{d21}18, b2!{d2}17,

g2!{c}16, g2!{a}15, g2!{c}14,

g2!{a}13, g2!{b2}12, tr!{g2}11,

b1?{d11}10, b1?{d11}9, b1?{d11}8,

b1!{d1}7, g1!{c}6, g1!{a}5,

g1!{c}4, g1!{a}3, g1!{b1}2,

tr!{g1}1,

µgp µp µ p R(µgp, µp, µ, p)

⊤gp ⊤p ⊤ gene g2, g1,

⊤gp ⊤p ⊤ b c, a, b2, b1,

⊤gp ⊤p ⊤ dp c, a, d2, d1,

⊤gp ⊤p ⊤ p c, a, b2, b1,

⊤gp ⊤p ⊤ d21 d,

⊤gp ⊤p ⊤ d11 d,

ℓ F(ℓ)

7 29

7 30

2 27

2 26

3 27

3 26

4 27

4 26

5 27

5 26

6 27

6 26

1 25

28 8

28 9

28 10

28 18

28 19

28 20

17 29

17 30

12 27

12 26

13 27

13 26

14 27

14 26

21 29

21 30

15 27

15 26

22 29

22 30

16 27

16 26

11 25

Page 234: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

212 Analysis Results for Genetic Transcription

Page 235: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

Bibliography

[AJL+02] Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, KeithRoberts, and Peter Walter. Molecular Biology of The Cell. GarlandScience, 4th edition, 2002.

[ASU86] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: principles,techniques, and tools. Addison-Wesley Longman Publishing Co., Inc.,Boston, MA, USA, 1986.

[BB90] Gerard Berry and Gerard Boudol. The chemical abstract machine. InPOPL ’90: Proceedings of the 17th ACM SIGPLAN-SIGACT symposiumon Principles of programming languages, pages 81–94, New York, NY,USA, 1990. ACM Press.

[BC05] Chiara Braghin and Agostino Cortesi. Flow-sensitive leakage analysis inmobile ambients. Electr. Notes Theor. Comput. Sci., 128(5):17–25, 2005.

[BCP06] Ralf Blossey, Luca Cardelli, and Andrew Phillips. A compositional ap-proach to the stochastic dynamics of gene networks. Transactions inComputational Systems Biology, 3939:99–122, 2006.

[BDNN98] Chiara Bodei, Pier Paolo Degano, Flemming Nielson, and Hanne RiisNielson. Control flow analysis for the π-calculus. In CONCUR 1998 –Concurrency Theory, volume 1466 of Lecture Notes in Computer Science,pages 84–98. Springer, 1998.

[BDNN01] Chiara Bodei, Pierpaolo Degano, Flemming Nielson, and Hanne Riis Niel-son. Static analysis for secrecy and non-interference in networks of pro-cesses. In Proceedings of Parallel Computing Technologies, volume 2127of Lecture Notes in Computer Science, pages 27–41, 2001.

[BDPZ03] Chiara Bodei, Pier Paolo Degano, Corrado Priami, and Nicola Zannone.An enhanced cfa for security policies. In Proc. of the Workshop on Issueson the Theory of Security (WITS’03), 2003.

Page 236: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

214 BIBLIOGRAPHY

[BSW06] Hauke Busch, Werner Sandmann, and Verena Wolf. A numerical ag-gregation algorithm for the enzyme-catalyzed substrate conversion. InCorrado Priami, editor, Proc. of Computational Methods in Systems Bi-ology (CMSB’06), volume 4210 of Lecture Notes in Computer Science,pages 298–311. Springer, 2006.

[BV01] Guillaume Brat and Willem Visser. Combining static analysis and modelchecking for software analysis. In Proc. of Proceedings of the 16th IEEEinternational conference on Automated software engineering (ASE’01),page 262, Washington, DC, USA, 2001. IEEE Computer Society.

[Cai04] Luis Caires. Behavioral and spatial observations in a logic for the pi-calculus. In Igor Walukiewicz, editor, Proceedings of 7th InternationalConference on Foundations of Software Science and Computation Struc-tures (FOSSACS’04), volume 2987 of Lecture Notes in Computer Science.Springer, 2004.

[Car04] Luca Cardelli. Bioware languages. In Andrew Herbert and Karen S.Jones, editors, Computer Systems: Theory, Technology, and Applications- A Tribute to Roger Needham, Monographs in Computer Science, pages59–65. Springer, 2004.

[Car05a] Luca Cardelli. Abstract machines of systems biology. Transactions onComputational Systems Biology, III:145–168, 2005.

[Car05b] Luca Cardelli. Brane calculi. In Vincent Danos and Vincent Schachter,editors, Proc. of Computational Methods in Systems Biology (CMSB’04),volume 3082 of Lecture Notes in Computer Science, pages 257–280.Springer, 2005.

[CC77] Patrick Cousot and Radhia Cousot. Abstract Interpretation: a unifiedlattice model for static analysis of programs by construction or approx-imation of fixpoints. In POPL’1977: Proc. of the 4th ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, pages238–252. ACM Press, 1977.

[CC79] Patrick Cousot and Radhia Cousot. Systematic design of program analysisframeworks. In POPL’1979: Proc. of the 6th ACM SIGPLAN-SIGACTsymposium on Principles of Programming Languages, pages 269–282.ACM Press, 1979.

[CDGH06] Muffy Calder, Adam Duguid, Stephen Gilmore, and Jane Hillston.Stronger computational modelling of signalling pathways using both con-tinuous and discrete-state methods. In Corrado Priami, editor, Proc. ofComputational Methods in Systems Biology (CMSB’06), volume 4210 ofLecture Notes in Computer Science, pages 63–77. Springer, 2006.

[CG00] Luca Cardelli and Andrew D. Gordon. Mobile ambients. TheoreticalComputer Science, 240(1):177–213, 2000.

[CGH05] Muffy Calder, Stephen Gilmore, and Jane Hillston. Automatically deriv-ing ODEs from process algebra models of signalling pathways. In Gor-don Plotkin, editor, Proc. of Computational Methods in Systems Biology(CMSB’05), pages 204–215, April 2005.

Page 237: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

BIBLIOGRAPHY 215

[CGH06] Muffy Calder, Stephen Gilmore, and Jane Hillston. Modelling the influ-ence of RKIP on the ERK signalling pathway using the stochastic processalgebra PEPA. In Transactions on Computational Systems Biology VII,volume 4230 of Lecture Notes in Computer Science. Springer, 1–23 2006.

[CGHV06] Muffy Calder, Stephen Gilmore, Jane Hillston, and VladislavVyshemirsky. Formal methods for biochemical signalling pathways. InBCS, 2006. To appear.

[CGP00] Edmund M. Clarke, Orna Grumberg, and Doron A. Peled. Model Check-ing. MIT Press, January 2000.

[CP05] Luca Cardelli and Sylvain Pradalier. Where membranes meet complexes.In Proc. of BioConcur’05, 2005.

[Cri58] Francis Henry Compton Crick. The biological replication of macro-molecules. In Proceedings of the 12th Symposia of the Society for Ex-perimental Biology, pages 138–163, New York, 1958. Academic Press.

[DK03] Vincent Danos and Jean Krivine. Formal molecular biology done in CCS-R. In Proc. of BioConcur’03, Electronic Notes in Theoretical ComputerScience. Elsevier, 2003.

[DK04] Vincent Danos and Jean Krivine. Reversible communicating systems. InCONCUR 2004 – Concurrency Theory, volume 3170 of Lecture Notes inComputer Science, pages 292–307. Springer, September 2004.

[DL03a] Vincent Danos and Cosimo Laneve. Core formal molecular biology. InProc. of the 12th ESOP, volume 2618 of Lecture Notes in Computer Sci-ence, pages 308–318. Springer, 2003.

[DL03b] Vincent Danos and Cosimo Laneve. Graphs for core molecular biology. InProc. of Computational Methods in Systems Biology (CMSB’03), volume2602 of Lecture Notes in Computer Science, pages 34–46. Springer, 2003.

[DL04] Vincent Danos and Cosimo Laneve. Formal molecular biology. TheoreticalComputer Science, 325(1):69–110, 2004.

[DP02] Brian A. Davey and Hilary A. Priestley. Introduction to Lattices andOrder. Cambridge University Press, April 2002.

[DP04] Vincent Danos and Sylvain Pradalier. Projective brane calculus. In Proc.of Computational Methods in Systems Biology (CMSB’04), 2004.

[FSCR04] Francois Fages, Sylvain Sollman, and Nathalie Chabrier-Rivier. Modellingand querying interaction networks in the biochemical abstract machinebiocham. Journal of Biological Physics and Chemistry, 4:64–73, 2004.

[GHR01] Stephen Gilmore, Jane Hillston, and Marina Ribaudo. An efficient algo-rithm for aggregating PEPA models. Software Engineering, 27(5):449–464, 2001.

[Gil76] Daniel T. Gillespie. A general method for numerically simulating thestochastic time evolution of coupled chemical reactions. Journal of Com-putational Physics, 22:403–434, 1976.

[Gil77] Daniel T. Gillespie. Exact stochastic simulation of coupled chemical re-actions. The Journal of Physical Chemistry, 81(25):2340–2361, 1977.

Page 238: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

216 BIBLIOGRAPHY

[GL05] Roberta Gori and Francesca Levi. A new occurrence counting analysis forbioambients. In Proceedings of Asian Symposium on Programming Lan-guages and Systems, volume 3780 of Lecture Notes in Computer Science,pages 381–400, 2005.

[Her02] Holger Hermanns. Interactive Markov Chains and the Quest for Quanti-fied Quality, volume 2428 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 2002.

[Hil96] Jane Hillston. A Compositional Approach to Performance Modelling. PhDthesis, 1996.

[HJNN99] Rene R. Hansen, Jacob G. Jensen, Flemming Nielson, and Hanne RiisNielson. Abstract interpretation of Mobile Ambients. In SAS’99: Proc. ofthe 6th International Static Analysis Symposium, volume 1694 of LectureNotes in Computer Science, pages 135–148, 1999.

[Jon07] Mike Jones. Flow of information in biological systems. OriginalWork by Mike Jones for Wikipedia, January 2007. Licensed un-der the Creative Commons Attribution ShareAlike License v. 2.5:http://creativecommons.org/licenses/by-sa/2.5.

[JW95] Suresh Jagannathan and Stephen Weeks. A unified treatment of flowanalysis in higher-order languages. In POPL ’95: Proceedings of the 22ndACM SIGPLAN-SIGACT symposium on Principles of programming lan-guages, pages 393–407, New York, NY, USA, 1995. ACM Press.

[Kil72] Gary Arlen Kildall. Global expression optimization during compilation.PhD thesis, 1972.

[Kit02] Hiroaki Kitano. Systems biology: a brief overview. Science,295(5560):1662–1664, March 2002.

[KN06] Celine Kuttler and Joachim Niehren. Gene regulation in the pi-calculus:simulating cooperativity at the lambda switch. Transactions on Compu-tational Systems Biology VII, 4230:24–55, 2006.

[KU77] John B. Kam and Jeffrey D. Ullman. Monotone Data Flow Analysisframeworks. Acta Informatica, 7(3):305–317, 1977.

[Kut07] Celine Kuttler. Modeling bacterial gene expression in a stochastic pi-calculus with concurrent objects. PhD thesis, University of Lille 1, 2007.

[LBZ+99] Harvey Lodish, Arnold Berk, S. Lawrence Zipursky, Paul Matsudaira,David Baltimore, and James E. Darnell. Molecular Cell Biology. W.H.Freeman and Company, 4th edition, 1999. Fig. ?? reprinted with thepermission of the publisher.

[LM01] Francesca Levi and Sergio Maffeis. An abstract interpretation frameworkfor analysing Mobile Ambients. In SAS’2001: Proc. of the 8th Inter-national Static Analysis Symposium, volume 2126 of Lecture Notes inComputer Science, pages 395–411. Springer, 2001.

[LM04] Francesca Levi and Sergio Maffeis. On abstract interpretation of mobileambients. Information and Computation, 188(2):179–240, 2004.

Page 239: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

BIBLIOGRAPHY 217

[LS03] Francesca Levi and Davide Sangiorgi. Mobile safe ambients. ACM Trans-actions on Programming Languages and Systems (TOPLAS’03), 25(1):1–69, 2003.

[Mil78] Robin Milner. A theory of type polymorphism in programming. Journalof Computer and System Sciences, 17(3):348–375, December 1978.

[Mil80] Robin Milner. A Calculus of Communicating Systems, volume 92 of Lec-ture Notes in Computer Science. Springer-Verlag, 1980.

[Mil89] Robin Milner. Communication and Concurrency. Prentice-Hall, Inc.,Upper Saddle River, NJ, USA, 1989.

[Mil99] Robin Milner. Communicating and Mobile Systems: The π-Calculus.Cambridge University Press, 1999.

[MPW92] Robin Milner, Joachim Parrow, and David Walker. A calculus of Mobileprocesses (I and II). Information and Computation, 100(1):1–77, 1992.

[NN97] Hanne Riis Nielson and Flemming Nielson. Infinitary Control Flow Anal-ysis: a collecting semantics for closure analysis. In POPL’1997: Proc. ofthe 24th ACM SIGPLAN-SIGACT symposium on Principles of Program-ming Languages, pages 332–345, 1997.

[NN98] Hanne Riis Nielson and Flemming Nielson. Flow logics for constraintbased analysis. In Proc. of the 7th International Conference on Com-piler Construction (CC’98), volume 1383 of Lecture Notes in ComputerScience, pages 109–127. Springer, 1998.

[NN00] Hanne Riis Nielson and Flemming Nielson. Shape analysis for mobileambients. In POPL’00: Proc. of the 27th ACM SIGPLAN-SIGACT sym-posium on Principles of Programming Languages, pages 142–154. ACMPress, 2000.

[NN02] Flemming Nielson and Hanne Riis Nielson. Flow Logic: a multi-paradigmatic approach to static analysis. In The Essense of Computation:Complexity, Analysis, Transformation, volume 2566 of Lecture Notes inComputer Science, pages 223–244. Springer, 2002.

[NN06] Hanne Riis Nielson and Flemming Nielson. Data Flow Analysis for CCS.In Thomas Reps and Mooly Sagiv, editors, Festschrift for Reinhard Wil-helm, volume 4444 of Lecture Notes in Computer Science. Springer, 2006.

[NN07] Hanne Riis Nielson and Flemming Nielson. A flow-sensitive analysis ofprivacy properties. In Proceedings of the 20th IEEE Computer SecurityFoundations Symposium, 2007. To Appear.

[NN08] Hanne Riis Nielson and Flemming Nielson. A Monotone Framework forCCS. under revision for Computer Languages, Systems & Structures,2008.

[NNB04] Hanne Riis Nielson, Flemming Nielson, and Mikael Buchholtz. Securityfor mobility. In FOSAD 2001/2002 Tutorial Lecture Notes, volume 2946of Lecture Notes in Computer Science, pages 207–265. Springer, 2004.

[NNH99] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles ofProgram Analysis. Springer, 1999.

Page 240: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

218 BIBLIOGRAPHY

[NNH02] Flemming Nielson, Hanne Riis Nielson, and Rene R. Hansen. Validatingfirewalls using Flow Logics. Theoretical Computer Science, 283(2):381–418, 2002.

[NNHJ99] Flemming Nielson, Hanne Riis Nielson, Rene R. Hansen, and Jacob G.Jensen. Validating firewalls in Mobile Ambients. In CONCUR 1999 –Concurrency Theory, volume 1664 of Lecture Notes in Computer Science,pages 463–477. Springer, 1999.

[NNP04] Hanne Riis Nielson, Flemming Nielson, and Henrik Pilegaard. Spatialanalysis of BioAmbients. In SAS’04: Proc. of the 11th InternationalStatic Analysis Symposium, volume 3148 of Lecture Notes in ComputerScience, pages 69–83, 2004.

[NNP07] Flemming Nielson, Hanne Riis Nielson, and Henrik Pilegaard. Whatis a free name in a process algebra? Information Processing Letters,103(5):188–194, 2007.

[NNPdR07] Flemming Nielson, Hanne Riis Nielson, Corrado Priami, and Debora S.da Rosa. Control flow analysis for BioAmbients. In Proceedings of theFirst Workshop on Concurrent Models in Molecular Biology (BioConcur2003), volume 180 of Electronic Notes in Theoretical Computer Science,pages 65–79, 2007., 2007.

[NNS02] Flemming Nielson, Hanne Riis Nielson, and Helmuth Seidl. A succinctsolver for ALFP. Nordic Journal of Computing, 9:335–372, 2002.

[NNS+04] Flemming Nielson, Hanne Riis Nielson, Hongyan Sun, Mikael Buchholtz,Rene Rydhof Hansen, Henrik Pilegaard, and Helmuth Seidl. The suc-cinct solver suite. In Proc. of Tools and Algorithms for the Constructionand Analysis of Systems (TACAS’04), volume 2988 of Lecture Notes inComputer Science, pages 251–265. Springer, 2004.

[Par01] Joachim Parrow. An introduction to the π-calculus. In Bergstra, Ponse,and Smolka, editors, Handbook of Process Algebra, pages 479–543. Else-vier, 2001.

[PC04] Andrew Phillips and Luca Cardelli. A correct abstract machine for thestochastic pi-calculus. In Bioconcur’04. Electronic Notes in TheoreticalComputer Science, August 2004.

[PC05] Andrew Phillips and Luca Cardelli. A graphical representation for thestochastic pi-calculus. In Proc. of Bioconcur’05, August 2005.

[Plo04] Gordon D. Plotkin. A structural approach to operational semantics. Jour-nal of Logic and Algebraic Programming, 60-61:17–139, 2004.

[PNN05] Henrik Pilegaard, Flemming Nielson, and H. R. Nielson. Static analysisof a model of the LDL degradation pathway. In Gordon D. Plotkin,editor, Proc. of Computational Methods in Systems Biology (CMSB’05).University of Edinburgh, April 2005.

[PNN06a] Henrik Pilegaard, Flemming Nielson, and Hane Riis Nielson. Active eval-uation contexts for process algebra. In Proceedings of Structural Opera-tional Semantics, 2006. To appear.

Page 241: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

BIBLIOGRAPHY 219

[PNN06b] Henrik Pilegaard, Flemming Nielson, and Hanne Riis Nielson. Contextdependent analysis of bioambients. In Francesco Ranzato, editor, Pro-ceedings of the 1st International Workshop on Emerging Applications ofAbstract Interpretation, 2006.

[PNN07] Henrik Pilegaard, Flemming Nielson, and Hanne Riis Nielson. Pathwayanalysis for bioambients. Submitted for publication, 2007.

[PQ04] Corrado Priami and Paola Quaglia. Beta binders for biological interac-tions. In Proc. of Computational Methods in Systems Biology (CMSB’04),Lecture Notes in Bioinformatics. Springer, 2004.

[Pri96] Corrado Priami. Stochastic pi-calculus with general distributions. In Proc.of the 4th Int. Workshop on Process Algebras and Performance Modelling(PAPM’96), 1996.

[PRSS01] Corrado Priami, Aviv Regev, Ehud Shapiro, and William Silvermann. Ap-plications of a stochastic name-passing calculus to representation and sim-ulation of molecular processes. Information Processing Letters, 80(1):25–31, 2001.

[PV05] Catuscia Palamidessi and Frank D. Valencia. Recursion vs replication inprocess calculi: Expressiveness. Bulletin of the European Association ofTheoretical Computer Science, 87:105–125, 2005.

[Reg03] Aviv Regev. Computational Systems Biology: A Calculus for Biomolecu-lar Knowledge. PhD thesis, Tel Aviv University, 2003.

[Ric53] Henry Gordon Rice. Classes of recursively enumerable sets and theirdecision problems. Transactions of the American Mathematics Society,74:358–366, 1953.

[RPS+04] Aviv Regev, Ekaterina M. Panina, William Silverman, Luca Cardelli, andEhud Shapiro. BioAmbients: An abstraction for biological compartments.Theoretical Computer Science, 325(1):141–167, September 2004.

[RS02] Aviv Regev and Ehud Shapiro. Cells as computation. Nature, 419(6905),September 2002.

[RSS01] Aviv Regev, William Silverman, and Ehud Shapiro. Representation andsimulation of biochemical processes using the π-calculus process algebra.In Proc. of Pacific Symposium on Biocomputing (PSB 2001), pages 459–470, 2001.

[Saı00] Hassen Saıdi. Model checking guided abstraction and analysis. In SAS’00:Proc. of the 7th International Static Analysis Symposium, volume 1824 ofLecture Notes in Computer Science, pages 377–396, London, UK, 2000.Springer-Verlag.

[Shi88] Olin Shivers. Control flow analysis in scheme. In PLDI ’88: Proceedings ofthe ACM SIGPLAN 1988 conference on Programming Language designand Implementation, pages 164–174, New York, NY, USA, 1988. ACMPress.

[SW01] Davide Sangiorgi and David Walker. The π-calculus: a Theory of MobileProcesses. Cambridge University Press, 2001.

Page 242: Language Based Techniques for Systems BiologyLanguage Based Techniques for Systems Biology Henrik Pilegaard Kongens Lyngby 2007 IMM-PHD-2007-184 Technical University of Denmark Informatics

220 BIBLIOGRAPHY

[Tay99] Paul Taylor. Practical Foundations of Mathematics (Cambridge Studiesin Advanced Mathematics). Cambridge University Press, May 1999.

[Tur36] Alan M. Turing. On computable numbers, with an application to theEntscheidungsproblem. Proceedings of the London Mathematical Society,2(42):230–265, 1936.

[van05] Jan vanderGreef. Systems biology, connectivity and the future ofmedicine. In IEE Proc. Systems Biology, volume 152, pages 174–178,2005.

[Ver07] Cristian Versari. A core calculus for a comparative analysis of bio-inspired calculi. In Proc. of 16th European Symposium on Programming(ESOP’07), 2007. To Appear.

[Vil07a] Mariana Ruiz Villarreal. Cell illustration. Part of Wikimedia Commons,April 2007. Released into Public Domain.

[Vil07b] Mariana Ruiz Villarreal. Cell membrane illustration. Part of WikimediaCommons, January 2007. Released into Public Domain.

[vMJ+07] Jan vanderGreef, S. Martin, P. Juhasz, A. Adourian, T. Plasterer, E. R.Verheij, and R. N. McBurney. The art and practice of systems biologyin medicine: Mapping patterns of relationships. Journal of ProteomeResearch, 2007.

[vSvdH04] Jan vanderGreef, Paul Stroobant, and Rob van der Heijden. The roleof analytical sciences in medical systems biology. Current Opinion inChemical Biology, 8:559–565, 2004.