Download - Operating System Support for Mobile Interactive Applicationsbumba/research/diss.pdf · Abstract Mobile interactive applications are becoming increasingly important. One such application

Operating SystemSupport for MobileInteracti veApplications

DushyanthNarayanan

August2002CMU-CS-02-168

Schoolof ComputerScienceComputerScienceDepartment

Carnegie Mellon UniversityPittsburgh,PA

Submittedin partial fulfillmentof therequirementsfor thedegreeof Doctorof Philosophy.

ThesisCommittee:M. Satyanarayanan,Chair

ChristosFaloutsosRandyPausch

JohnWilkes,Hewlett-Packard Laboratories

Copyright c�

2002DushyanthNarayanan

This researchwas supportedby the DefenseAdvancedResearchProjectsAgency (DARPA) and the AirForceMateriel Command(AFMC) undercontractsF19628-93-C-0193andF19628-96-C-0061,the SpaceandNaval WarfareSystemsCenter(SPAWAR) / U.S.Navy (USN) undercontractN660019928918,theNa-tional ScienceFoundation(NSF) undergrantsCCR-9901696and ANI-0081396, IBM Corporation,IntelCorporation,Compaq,andNokia.

Theviewsandconclusionscontainedin thisdocumentarethoseof theauthorandshouldnotbeinterpretedasrepresentingtheofficial policiesor endorsements,eitherexpressor implied, of DARPA, AFMC, SPAWAR,USN, theNSF, IBM, Intel, Compaq,Nokia,CarnegieMellon University, theU.S.Government,or any otherentity.

Keywords: interactive applications,mobilecomputing,ubiquitouscomputing,multi-fidelity algorithm,application-awareadaptation,predictive resourcemanagement,history-baseddemandprediction,augmentedreality, machinelearning

Abstract

Mobile interactiveapplicationsarebecomingincreasinglyimportant.Onesuchapplicationalone— augmentedreality— hasenormouspotentialin fieldsrangingfrom entertainmentto aircraft maintenance.Suchapplicationsdemandgoodinteractive response.However,their environmentsareresource-poorandturbulent,with frequentanddramaticchangesinresourceavailability. To keepresponsetimesbounded,theapplicationandsystemtogethermustadaptto changingresourceconditions.

In this dissertation,I presenta new abstraction— multi-fidelity computation— andclaimthatit is theright abstractionfor adaptationin mobile,interactiveapplications.I alsopresentanAPI thatallows a mobile interactive applicationto recastits corefunctionalityasamulti-fidelity computation,

I identify one of the key problemsin applicationadaptation:predictingapplicationperformanceat any givenfidelity. I solve this problemin two steps.History-basedpredic-tion predictsapplicationresourcedemandasa functionof fidelity. A resourcemodelthenmapsapplicationresourcedemandandsystemresourcesupplyto performance.History-basedpredictionis validatedthroughfour casestudiesdemonstratingaccuratepredictionof CPU,memory, network, andenergy demand.

I alsodescribethedesignandimplementationof runtimesupportfor multi-fidelity com-putations:the overall systemarchitectureaswell aseachkey component.I presentfourapplicationcasestudies:of avirtual walkthroughprogram,a3-D graphicsalgorithm,awebbrowser, anda speechrecognizer. In eachcase,I show how theapplicationusesthemulti-fidelity API; thattheprogrammingcostof usingtheAPI is small;andthatthehistory-basedpredictionmethodaccuratelypredictsapplicationresourcedemand.

In evaluatingthe systemprototype,I askthreequestions.First, is adaptationagile inthe faceof changingload conditions?Second,is the systemaccurate in choosingthe fi-delity thatbestmatchestheapplications’needs?Third, doesthesystemprovidesubstantialbenefitcomparedto thenon-adaptivecase?I answerthesequestionsthroughaseriesof ex-perimentsbothwith syntheticandrealworkloads.I show thatadaptationis agile,accurate,andbeneficialin boundingresponsetime despitevarying CPU andmemoryload. I alsoshow thatadaptationreducesthevariability in responsetime,providing amorepredictableandstableuserexperience.

i

Acknowledgements

While I deserve any blamefor this dissertation,many peopledeserve a shareof thecredit.I could never have completeda work of this sizeandscopewithout substantialhelp andencouragementfrom colleagues,friends,andfamily.

I would like to thankmy advisorSatyafor his expert technicalguidance,for thehighstandardshealwayssetshimself andhis students,andfor his confidencein my ability todo quality research.It hasbeenanhonourandaneducationto work with him. Withouthisability to seethe big pictureat all times,I could not have stayedfocussedandmotivatedover thelast7 years.

Large researchsystemsarethe productsof many mindsandman-years.I have beenfortunateto work on thesamesystemwith several first-classresearchers.I would partic-ularly like to thankBrian Noble for his role asmentor, surrogateadvisor, andofficemate;andJasonFlinn, for an enjoyableandfruitful 5-yearcollaboration.JanHarkesprovidedinvaluableLinux expertiseanda much-appreciatedability to keepmy Codavolumeavail-able.

Despitetheir busyschedules,themembersof my committeewerealwaysavailableforfeedbackandadvice. I would like to thankRandyPauschfor keepingme honestaboutusersandinteractivity; JohnWilkesfor muchperceptiveandconstructivecriticism,andforkeepingmeabreastof relatedwork; andChristosFaloutsosfor stimulatingdiscussionsona varietyof topics.

I could not imaginea morefriendly andsupportive environmentthanCMU-CS. Thepeoplewho make this departmenta wonderfulplacearetoo numerousto list, but specialthanksgo to SharonBurks,Tracy Farbacher, David Petrou,SanjayRao,FranciscoPereira,BelindaThom,andMariusMinea. I amalsoindebtedto themanagementof the61CCafe,to their excellentcoffee,andto the many friendsI madethere,for hugelyimproving thequality of my life outsidecomputerscience.

Lastbut certainlynot least,I thankmy parents.Theirhardwork andvisiongavemetheeducationaladvantagesthatbroughtmeto CarnegieMellon. Despiteoccasionalmisgivingsaboutmy avoidingtherealworld, they alwaysencouragedmetopursuemy academicgoals:without their faith in meandmy abilities,I wouldnothavemadeit this far.

iii

iv

0

10000

20000

30000

40000

50000

60000

Dec 26 2000 Mar 26 2001 Jun 24 2001 Sep 22 2001 Dec 21 2001 Mar 21 2002 Jun 19 2002 Sep 17 20020

50

100

150

200

250

Wor

ds�

Pag

es�

Date

WordsPages

Best fit on wordsBest fit on pages

Thescatterplot shows thenumberof wordsandpagesin this dissertationover time. Thelinesarebestfits to thescatterplotscomputedusingleast-squaresregression.

Figure1: Sizeof this dissertationover time

Contents

1 Intr oduction 1

1.1 Scenario— augmentedreality . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 TheThesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 A resourcemodel 5

2.1 Whatis a resource?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Aggregatingresourcemeasurements. . . . . . . . . . . . . . . . . . . . . 7

2.2.1 Operationsandinteractiveapplications . . . . . . . . . . . . . . . 7

2.2.2 Why not time-series?. . . . . . . . . . . . . . . . . . . . . . . . . 8

2.3 A taxonomyof resources. . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.3.1 Time-sharedresources. . . . . . . . . . . . . . . . . . . . . . . . 9

2.3.2 Space-sharedresources. . . . . . . . . . . . . . . . . . . . . . . . 10

2.3.3 Cacheresources . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.3.4 Exhaustibleresources. . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4 Combiningresources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.4.1 Linearandadditivecostmetrics . . . . . . . . . . . . . . . . . . . 16

2.4.2 Concurrentuseof resources . . . . . . . . . . . . . . . . . . . . . 16

2.5 Resourcemodelfor mobileinteractivesystems . . . . . . . . . . . . . . . 16

2.5.1 Example:remoteexecution . . . . . . . . . . . . . . . . . . . . . 17

2.5.2 Example:webbrowser . . . . . . . . . . . . . . . . . . . . . . . . 18

2.5.3 Which resourcesarerelevant? . . . . . . . . . . . . . . . . . . . . 18

2.5.4 Userdistraction:nota resourcecostmetric . . . . . . . . . . . . . 20

2.6 Resourcedependencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

v

vi CONTENTS

3 Multi-fidelity computation 23

3.1 Fidelity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2 Multi-fidelity algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.3 Mobile, interactive,multi-fidelity applications. . . . . . . . . . . . . . . . 26

3.4 Performance-tuningparameters. . . . . . . . . . . . . . . . . . . . . . . . 27

3.5 Nontunableparameters. . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.6 Outputquality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.7 RelatedConcepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4 A multi-fidelity API 31

4.1 Background. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.1.1 Guidingprinciples . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.1.2 Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.1.3 Historicalcontext: Odyssey . . . . . . . . . . . . . . . . . . . . . 32

4.2 Theimportanceof prediction . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.2.1 Demandpredictionthroughloggingandlearning . . . . . . . . . . 34

4.3 Multi-fidelity API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.3.1 C library calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.3.2 ApplicationConfigurationFiles . . . . . . . . . . . . . . . . . . . 35

4.4 Exampleuseof API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

5 Systemsupport for multi-fidelity computation 41

5.1 DesignRationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

5.2 Design. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.3 Thesolver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

5.3.1 Restrictedvs. unrestrictedutility functions . . . . . . . . . . . . . 44

5.3.2 Searchvs. feedback-control . . . . . . . . . . . . . . . . . . . . . 45

5.4 Utility functionsandresourceconstraints . . . . . . . . . . . . . . . . . . 46

5.4.1 Constraintsassigmoids . . . . . . . . . . . . . . . . . . . . . . . 46

5.5 Systemresourcepredictors . . . . . . . . . . . . . . . . . . . . . . . . . . 48

5.5.1 CPU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

CONTENTS vii

5.5.2 Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.5.3 Latency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

5.5.4 Otherpredictors . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5.6 Demandmonitors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5.6.1 Measuringanoperation’sworkingsetsize . . . . . . . . . . . . . . 52

5.7 Thelogger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.8 Remoteexecution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5.9 Overheads. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6 Machine Learning for DemandPrediction 57

6.1 How to build apredictor . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

6.2 Least-squaresregression . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

6.3 Onlineupdates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6.3.1 RecursiveLeastSquares . . . . . . . . . . . . . . . . . . . . . . . 60

6.3.2 GradientDescent. . . . . . . . . . . . . . . . . . . . . . . . . . . 63

6.3.3 Onlinelearningasfeedback-control. . . . . . . . . . . . . . . . . 63

6.4 Data-specificprediction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

6.4.1 Binning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6.4.2 File accessprediction. . . . . . . . . . . . . . . . . . . . . . . . . 66

6.5 Evaluatingpredictionaccuracy . . . . . . . . . . . . . . . . . . . . . . . . 67

6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

7 Applications 71

7.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.1.1 Portingcost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.1.2 Predictionaccuracy . . . . . . . . . . . . . . . . . . . . . . . . . . 73

7.2 AugmentedReality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

7.2.1 Multiresolutionmodels. . . . . . . . . . . . . . . . . . . . . . . . 74

7.3 GLVU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

7.3.1 ExtendingGLVU for multiresolutionrendering . . . . . . . . . . . 75

7.3.2 PortingGLVU to themulti-fidelity API . . . . . . . . . . . . . . . . 77

viii CONTENTS

7.3.3 PredictingGLVU’s CPUdemand. . . . . . . . . . . . . . . . . . . 77

7.3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

7.4 Radiosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

7.4.1 PortingRadiatorto themulti-fidelity API . . . . . . . . . . . . . . 90

7.4.2 PredictingRadiator’sCPUdemand . . . . . . . . . . . . . . . . . 91

7.4.3 PredictingRadiator’smemorydemand. . . . . . . . . . . . . . . . 92

7.4.4 Extrapolatingpredictorsfor higherfidelities . . . . . . . . . . . . . 95

7.4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

7.5 Webbrowsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

7.5.1 PortingNetscapeto themulti-fidelity API . . . . . . . . . . . . . . 104

7.5.2 Predictingthenetwork demandof imagefetch . . . . . . . . . . . 105

7.5.3 Predictingtheenergy demandof webimagefetch . . . . . . . . . . 106

7.5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

7.6 Speechrecognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

7.6.1 PortingJanusto themulti-fidelity API . . . . . . . . . . . . . . . . 112

7.6.2 Predictingresourceconsumptionfor Janus . . . . . . . . . . . . . 113

7.6.3 File accesspredictionfor Janus . . . . . . . . . . . . . . . . . . . 117

7.6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

7.7 Otherapplications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

8 Evaluation 119

8.1 EvaluationMethodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

8.1.1 ExperimentalPlatform . . . . . . . . . . . . . . . . . . . . . . . . 120

8.1.2 Syntheticworkloads . . . . . . . . . . . . . . . . . . . . . . . . . 120

8.1.3 Syntheticloadgenerators. . . . . . . . . . . . . . . . . . . . . . . 121

8.1.4 Evaluationmetrics . . . . . . . . . . . . . . . . . . . . . . . . . . 122

8.2 Adaptationto CPUload . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

8.2.1 Agility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

8.2.2 Accuracy: CPUadaptationin GLVU . . . . . . . . . . . . . . . . . 126

8.2.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

8.3 Adaptationto memoryload . . . . . . . . . . . . . . . . . . . . . . . . . . 137

8.3.1 Agility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

CONTENTS ix

8.3.2 Accuracy: Memoryadaptationin Radiator. . . . . . . . . . . . . . 139

8.4 Concurrentapplications— architectscenario . . . . . . . . . . . . . . . . 153

8.4.1 Experimentalsetup . . . . . . . . . . . . . . . . . . . . . . . . . . 153

8.4.2 Experimentalresults . . . . . . . . . . . . . . . . . . . . . . . . . 154

8.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161

9 RelatedWork 163

9.1 RelatedWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

9.1.1 Mobile applicationadaptation . . . . . . . . . . . . . . . . . . . . 164

9.1.2 Odyssey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

9.1.3 QoSandreal-timeoperatingsystems . . . . . . . . . . . . . . . . 165

9.1.4 Resourceprediction . . . . . . . . . . . . . . . . . . . . . . . . . 166

9.2 DesignSpace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

9.2.1 Adaptationor Allocation? . . . . . . . . . . . . . . . . . . . . . . 167

9.2.2 Centralizedor decentralized?. . . . . . . . . . . . . . . . . . . . . 168

9.2.3 Predictionor feedback-control? . . . . . . . . . . . . . . . . . . . 168

9.2.4 Application-awareor application-transparent?. . . . . . . . . . . . 169

9.2.5 Gray-boxor in-kernel? . . . . . . . . . . . . . . . . . . . . . . . . 170

9.3 Inferringuserpreferences. . . . . . . . . . . . . . . . . . . . . . . . . . . 170

10 Conclusion 173

10.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

10.1.1 ConceptualContributions . . . . . . . . . . . . . . . . . . . . . . 173

10.1.2 Artif acts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

10.1.3 EvaluationResults . . . . . . . . . . . . . . . . . . . . . . . . . . 175

10.2 Currentstatusandongoingwork . . . . . . . . . . . . . . . . . . . . . . . 176

10.3 FutureWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

10.3.1 TheApplicationProgrammingInterface . . . . . . . . . . . . . . . 176

10.3.2 TheSolver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

10.3.3 Resourcemonitoring . . . . . . . . . . . . . . . . . . . . . . . . . 177

10.3.4 Resourcedemandprediction . . . . . . . . . . . . . . . . . . . . . 179

10.3.5 UserInteraction. . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

x CONTENTS

10.3.6 AugmentedReality . . . . . . . . . . . . . . . . . . . . . . . . . . 180

10.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

Bibliography 180

A UnrelatedWork 191

List of Figures

1 Sizeof this dissertationover time . . . . . . . . . . . . . . . . . . . . . . . iv

2.1 A simpleinteractiveapplicationandoperation. . . . . . . . . . . . . . . . 8

2.2 Resourcesrelevantto mobileinteractivesystems. . . . . . . . . . . . . . . 19

2.3 Dependenciesbetweenresources. . . . . . . . . . . . . . . . . . . . . . . 21

3.1 An exampleof aninteractive,multi-fidelity operation . . . . . . . . . . . . 26

3.2 Conceptsrelatedto multi-fidelity algorithms. . . . . . . . . . . . . . . . . 29

4.1 Themulti-fidelity API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4.2 ExampleApplicationConfigurationFile . . . . . . . . . . . . . . . . . . . 37

4.3 Signaturesfor hint modulefunctions . . . . . . . . . . . . . . . . . . . . . 37

4.4 Sampleuseof multi-fidelity API . . . . . . . . . . . . . . . . . . . . . . . 39

5.1 Systemsupportfor themulti-fidelity API . . . . . . . . . . . . . . . . . . 43

5.2 Utility functionfor a latency constraintof 1 sec . . . . . . . . . . . . . . . 47

5.3 A log entryfor the“radiator” program . . . . . . . . . . . . . . . . . . . . 53

5.4 Per-operationoverheadof runtimesystem . . . . . . . . . . . . . . . . . . 54

6.1 Least-squareslinearfit for CPUdemandin Radiator. . . . . . . . . . . . . 59

6.2 Changein linearfit with camerapositionin GLVU . . . . . . . . . . . . . . 61

6.3 Data-specificCPUdemandin Radiator. . . . . . . . . . . . . . . . . . . . 64

6.4 Binning for file accesspredictionin Janus . . . . . . . . . . . . . . . . . . 66

6.5 Exampleof outlierdistributionplot . . . . . . . . . . . . . . . . . . . . . . 68

7.1 Applicationconfigurationfile for GLVU . . . . . . . . . . . . . . . . . . . 75

7.2 Vertex treefor multiresolutionmodels . . . . . . . . . . . . . . . . . . . . 76

xi

xii LIST OFFIGURES

7.3 Modificationsmadeto GLVU . . . . . . . . . . . . . . . . . . . . . . . . . 77

7.4 3-D scenesusedin GLVU . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

7.5 Effectof resolutiononoutputin GLVU . . . . . . . . . . . . . . . . . . . . 79

7.6 GLVU’s CPUdemand:random(top)andfixed(bottom)camerapositions. . 80

7.7 CPUdemandpredictionin GLVU: outlierdistribution . . . . . . . . . . . . 82

7.8 CPUdemandpredictionaccuracy in GLVU . . . . . . . . . . . . . . . . . . 83

7.9 GLVU’s CPUdemand:usertrace,“Notre Dame”scene . . . . . . . . . . . 84

7.10 Staticvs. dynamicpredictors:randomresolution . . . . . . . . . . . . . . 86

7.11 Staticvs. dynamicpredictors:fixedresolution . . . . . . . . . . . . . . . . 87

7.12 CPUdemandpredictionaccuracy for GLVU runningusertraces. . . . . . . 88

7.13 Effectof radiosityona 3-D model . . . . . . . . . . . . . . . . . . . . . . 89

7.14 ApplicationConfigurationFile for Radiator . . . . . . . . . . . . . . . . . 90

7.15 Modificationsmadeto Radiator. . . . . . . . . . . . . . . . . . . . . . . . 91

7.16 3-D modelsusedwith Radiator . . . . . . . . . . . . . . . . . . . . . . . . 91

7.17 CPUdemandof radiosity . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7.18 CPUdemandpredictionfor Radiator:outlierdistribution . . . . . . . . . . 94

7.19 CPUdemandpredictionaccuracy for Radiator. . . . . . . . . . . . . . . . 95

7.20 Memorydemandof radiosity . . . . . . . . . . . . . . . . . . . . . . . . . 96

7.21 Memorydemandpredictionfor Radiator:outlierdistribution . . . . . . . . 97

7.22 Memorydemandpredictionaccuracy for Radiator. . . . . . . . . . . . . . 98

7.23 CPUdemandof radiosityon fastserver . . . . . . . . . . . . . . . . . . . 99

7.24 Memorydemandof radiosityon fastserver . . . . . . . . . . . . . . . . . 100

7.25 CPUdemandpredictionaccuracy for Radiatorona fastserver . . . . . . . 101

7.26 Memorydemandpredictionaccuracy for Radiatoron a fastserver . . . . . 102

7.27 ApplicationConfigurationFile for webimagefetch . . . . . . . . . . . . . 104

7.28 Netscape’s “shootingstars”statuswindow: idle (left) andanimated. . . . . 105

7.29 Codemodificationsfor adaptivewebbrowsing. . . . . . . . . . . . . . . . 105

7.30 Testimagesusedwith webbrowser . . . . . . . . . . . . . . . . . . . . . 106

7.31 Compressionratio asa functionof JPEGQualityFactor . . . . . . . . . . . 107

7.32 Network demandpredictionaccuracy for webimagefetch . . . . . . . . . 108

7.33 Energy demandpredictionaccuracy for webimagefetch . . . . . . . . . . 109

7.34 Energy demandof differentbrowsers. . . . . . . . . . . . . . . . . . . . . 110

LIST OFFIGURES xiii

7.35 ApplicationConfigurationFile for Janus. . . . . . . . . . . . . . . . . . . 112

7.36 Modificationsmadeto Janus . . . . . . . . . . . . . . . . . . . . . . . . . 113

7.37 LocalCPUdemandof speechrecognition . . . . . . . . . . . . . . . . . . 114

7.38 RemoteCPUdemandof speechrecognition . . . . . . . . . . . . . . . . . 114

7.39 Network demandof speechrecognition . . . . . . . . . . . . . . . . . . . 115

7.40 Resourcedemandpredictionaccuracy for Janus . . . . . . . . . . . . . . . 116

8.1 Sigmoidutility function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

8.2 Adaptationphasesfor ahypotheticalapplicationandsystem . . . . . . . . 124

8.3 Exampleof deviationplot . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

8.4 Agility of adaptationto CPUsupply . . . . . . . . . . . . . . . . . . . . . 127

8.5 Adaptationin GLVU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

8.6 Latency in GLVU: with andwithoutadaptation. . . . . . . . . . . . . . . . 130

8.7 Latency outliersin GLVU: with andwithoutadaptation . . . . . . . . . . . 131

8.8 Outlierdistribution for different3-D scenesin GLVU . . . . . . . . . . . . 132

8.9 GLVU outlierdistribution for differenttargetlatencies . . . . . . . . . . . . 133

8.10 GLVU outlierdistribution for differentloadtransitionfrequencies. . . . . . 135

8.11 GLVU outlierdistribution for differentpeakloads . . . . . . . . . . . . . . 136

8.12 Agility of adaptationto memorysupply . . . . . . . . . . . . . . . . . . . 138

8.13 Agility of memoryadaptation:latency distributionanddeviation from �� 140

8.14 Adaptationin Radiatorundervaryingmemoryload . . . . . . . . . . . . . 142

8.15 Radiatorwith no adaptationundervaryingmemoryload . . . . . . . . . . 143

8.16 Memoryadaptationaccuracy in Radiator:whenunloaded. . . . . . . . . . 145

8.17 Memoryadaptationaccuracy in Radiator:underload . . . . . . . . . . . . 146

8.18 Costof abortingon latency callbacksin Radiator . . . . . . . . . . . . . . 147

8.19 Memoryadaptationaccuracy for differentscenesin Radiator:outliergraphs149

8.20 Memoryadaptationaccuracy for differentscenesin Radiator:baddeviationfrequency andcommon-casedeviation . . . . . . . . . . . . . . . . . . . . 150

8.21 Memoryadaptationaccuracy for differentscenesin Radiator:latency dis-tribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

8.22 Adaptationin Radiatorundervaryingmemoryloadon a fastserver . . . . . 151

8.23 Latency distribution for Radiatorwith time-varyingmemoryloadon a fastserver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

xiv LIST OFFIGURES

8.24 Memoryadaptationin Radiatoron two differentplatforms . . . . . . . . . 152

8.25 GLVU andRadiatorin architectscenario . . . . . . . . . . . . . . . . . . . 155

8.26 Latency distributionsfor GLVU: architectscenario. . . . . . . . . . . . . . 156

8.27 Latency distributionsfor Radiator:architectscenario . . . . . . . . . . . . 157

8.28 ConcurrentGLVU andRadiator:latency deviation . . . . . . . . . . . . . . 158

8.29 ConcurrentGLVU andRadiator:latency variability . . . . . . . . . . . . . 160

Symbolsusedin this dissertation

��instantaneousdemandfor resource � � instantaneoussupplyfor resource � ��aggregatedemandfor resource overaperiod

�� averagesupplyfor resource overaperiod

�� CPUdemandin cycles� � �� CPUsupplyin cycles/s�� "!��memorydemand(workingset)in bytes� �� "!��memorysupply(availablephysicalmemory)in bytes��#�%$ � ��network transmitdemand(bytestransmitted)� #�%$ � ��network transmitsupply(bandwidthin bytes/sec)� � � ��& ��network receivedemand(bytesreceived)� � �'��& ��network receivesupply(bandwidthin bytes/sec)� � �(� ��network round-trip“demand”(numberof RPCsmade)� � �(� ��network round-trip“supply” (inverseof round-triptime)��)�* �(� � �"!+��batterydemand(in Joulesconsumed)� )�* �(� � �'!,��batterysupply(Joulesavailablefor consumption)�.-/$(01�'��*2�435� ��file cachedemand(setof filesaccessed)� -/$(01�'��*2��36� ��file cachesupply(setof valid cachedfiles)7 � instantaneouscostof usingresource 8 ��aggregatecostof usingresource overaperiod

�� 9��monetarycostof usingresource : �9��latency costof usingresource ; �<��energy costof usingresource :>=?� �(@A� �CB+��total latency costof usingthenetwork (

:>#�%$ �ED : GF 75H D : � �(� ):total operationlatency: �� optimallatency for anoperation;I01* � �4=J� ! percentageerrorof observedlatency from optimal/predictedvalueK<LCMbad prediction/deviation frequency: fraction of observations that deviatefrom optimal/predictedvalueby morethan20%;IN Mcommon-caseerror:highestdeviationfrom optimal,or predictionerror, ob-servedafterdiscardingtheworst10%of data

xv

xvi LIST OFFIGURES

Chapter 1

Intr oduction

An undertakingof greatadvantage,but nobodyto know whatit is.(prospectusof a “bubble” company. Anon.,London,1720)

Wearablecomputingis alreadycommonin today’s world, andwill be widespreadintomorrow’s. Wearablecomputerswill augmenta user’s environmentin a varietyof ways,simultaneouslyinteractingwith usersandtheir physicalenvironment.Thus,mobileinter-activeapplicationsareof increasingimportance.Theseapplicationsmustprovide a goodinteractiveexperience,but alsoconstraintheirresourceconsumptionto theavailablebatteryenergy, network bandwidth,processingpowerandmemorycapacity. Userrequirementsaswell asresourceconditionsvary acrossspaceandtime: thesystemandapplicationsmustadapt their behaviour to track this variation. Adaptationto resourcevariability is a use-ful strategy in many mobile computingenvironments[35, 54, 79]: I will show that, forinteractiveapplications,it is essential.

How bestcanwe design,implementandexecuteinteractive applicationsin a mobileenvironment?Thisquestionis themainfocusof this dissertation.It addressestheproblemwith a novel theoreticalconcept— themulti-fidelityalgorithm— andputsforth thethesisthat it is a goodabstractionfor adaptationin mobile interactive applications.It presentsaresourcemodelandanApplicationProgrammingInterface(API) whichallow applicationsto provide high-level descriptionsof their core tasksand policies. It also describesthesystemsupportnecessaryto translatethesehigh-level descriptionsinto effective runtimeadaptation.

Thedissertationestablishesits claim throughthedesignandimplementationof multi-fidelity adaptationin Linux, and four applicationcasestudies: virtual reality rendering,radiosity, speechrecognition,andweb browsing. It demonstratesthat the costof addingadaptationsupportto legacy codeissmall;thattheruntimesystemadaptswith accuracy andagility to variationsin resourceavailability; andthatsuchadaptationimprovesapplicationinteractive response.

This chapterbegins with a brief scenariothat exemplifiesthe applicationdomainad-

1

2 CHAPTER1. INTRODUCTION

dressedby this dissertation,andthe challengespresentedby it. It thenstatesthe centralthesis,andpresentsthestepsrequiredto substantiateit. Finally, it providesa roadmaptotheremainderof thedissertation.

1.1 Scenario— augmentedreality

An architect is designingtherenovationof an old warehousefor useasa mu-seum. Using a wearable computerwith a head-mounteddisplay, shewalksthroughthe warehousetrying out manydesignalternativespertainingto theplacementof doors,windows,interior walls, andsoon. Often,shecanimme-diatelyrejecttheproposedmodification:evenwith a “quick-and-dirty” repre-sentation,an aestheticor functionallimitation becomesapparent. Occasion-ally, a more detailedrepresentationis required, and sheis willing to wait alittle longer to view theresult.

For hands-freeoperation,our architectusesa speech-driveninterface. At somelocations, the wearable computerhas good wirelessbandwidthto a remotecomputeserver, and can perform speech recognition remotely. Elsewhere,bandwidthand/or servercyclesare scarce, and speech recognition mustbedonelocally; sincethewearablehaslimited memory, therecognizerswitchesto a smallervocabulary, andaskstheuserto usea restrictedsetof words.

At somepoint, the systemnoticesthat the battery is running low, and thatbatterylifetimewill fall shortof theuser’sstatedgoal. It respondsbyreducingapplicationfidelity andhencepowerconsumption:augmentedreality runsata lower resolution,and whenthe architect browsesweb pages, images arecompressedto reducethe energy usage of the wirelessinterface. The usermightdecidethat this is unacceptable, andthat shewouldpreferhigh fidelityto long batterylifetime: sheusesthespeech interfaceto seta lessaggressivebatterylifetimegoal.

This scenarioillustratesthechallengesthatconfrontmobile interactive systems.Theymustcontinuallymonitortheavailability of multiplesystemresources;predicttheimpactoftheseresourcelevelsonapplications;makeadaptivedecisionsfor eachapplicationthatbestsatisfyusergoals;andallow theuserto modify thesegoalson thefly. In orderto designand build sucha system,we needto answerseveral questions. What is an appropriateprogrammingmodelfor applications?How canthesystemmonitorandpredictresourcesaccuratelyandcheaply? What othersystemsupportis requiredto ensurethat usersseecrisp interactive responsefrom applications?How do we resolve conflicting goalswhenmakingadaptive decisions?What is a simpleyet effective way to allow usersto expresstheir preferences?

1.2. THE THESIS 3

1.2 The Thesis

Thethesisdirectlyanswersthequestionsraisedabove:

Mobile interactive applications are best expressedas multi-fidelity al-gorithms. Given systemsupport for resource monitoring, history-basedprediction of resource consumption, and adaptive decisionmaking, suchalgorithms cansimultaneouslyoptimize acrossmultiple usermetrics of fi-delity, performance,and resourceconsumption. It is feasibleto implementsuchsystemsupport through a user-level, gray-box approachwith no OSkernel modifications.

Thisdissertationestablishesthethesisin thefollowing steps:O It arguesthatmobileinteractiveapplicationspresentspecialchallengesandopportu-nities,andrequireanew approachto designingboththeapplicationsandthesystem.O It introducesthemulti-fidelityalgorithmandshowswhy it is theright abstractionformobileinteractiveapplications.O It providesa resourcemodelandan API for how multi-fidelity algorithmsinteractwith theoperatingsystem.Thisallowsapplicationsto expresshigh-level applicationmetricssuchasperformanceandfidelity, ratherthanlow-level abstractionssuchasresources.In four casestudies,it shows thatthecostsof portinglegacy applicationsto theAPI aremodest.O Applicationsare concernedwith fidelity and performance;operatingsystemsareequippedto manageresources.Tobridgethisgap,wemustbeableto relatefidelity toresourceconsumptionandperformance.Thedissertationshows how history-basedpredictioncanmapapplicationmetricsto systemresourcesin anautomatic,accurate,andhardware-independentway.O It presentsthe designandimplementationof multi-fidelity supportin Linux. Thissupportis built onOdyssey, anexistingLinux-basedplatformfor mobilecomputing.O It presentsexperimentalresultsthatdemonstratethatmulti-fidelity adaptationis ag-ile, accurate,andleadsto betterinteractiveresponsefor applications.

The restof this dissertationis organizedasfollows. Chapter 2 introducesa resourcemodel: a systematicway to characterizeapplicationandsystembehaviour in termsof re-sourcesupplyanddemand. In additionto traditionalresourcessuchasCPUandnetwork,themodelalsoincludesfile cachestateandbatteryenergy, which areof greatimportancewhenmobile.

Chapter3 definesfidelity, and extendstraditional notionsof datafidelity to includevariousadaptive “knobs” and“switches” that arecommonin interactive applications. Itthenderives the notion of a multi-fidelity computation, andarguesthat it is a naturalfitto theneedsof mobile interactive applications.Multi-fidelity computation,alongwith theresourcemodel,is theformalbasisof this dissertation.

4 CHAPTER1. INTRODUCTION

Chapter4 presentsa programmingmodelandAPI for multi-fidelity computations.Indesigningthis API, I was concernedto minimize the amountof modificationto legacyapplicationcode:this considerationinfluencedseveralof my designchoices.Thechapterlocatesthe API in the designspace,andhighlightsthe guiding principlesthat influencedmy design.

Chapter5 describestheruntimesystemcomponentsthatsupportthemulti-fidelity API.Here,oneof my key designmotivationswasdeployabilityof the systemsupportacrossplatforms.Thechapterarguesthatpassiveresourcemonitoringcansupporteffectiveadap-tation; further, that it makes fewer demandson the underlyingsystemthan an alloca-tion/enforcementapproach.It thendescribesthe overall systemdesign,andeachmajorcomponentin detail: the resourcepredictors,the solver, the logger, andthe remoteexe-cution engine. It concludeswith anevaluationof the overheadimposedby eachof thesecomponents.

Multi-fidelity adaptationrequiresus to regulatean application’s resourcedemandbychangingits fidelity. Thisrequiresusto know therelationshipbetweenfidelity andresourcedemand,for eachhardwareplatformthat theapplicationmight executeon. For theappli-cationprogrammerto completelyspecifythis relationshipis a heavy burden. My multi-fidelity prototypeuseshistory-basedpredictionto learnthis relationshipautomatically, byloggingapplicationbehaviour andapplyingstatisticalmachinelearningtechniquesto theloggeddata. Chapter6 shows how applicationresourcedemandpredictorsarebuilt, anddescribesthe key learningtechniquesusedin my applicationcasestudies: least-squaresregression, data-specificprediction, andbinning.

Chapter7 describescasestudiesof four mobile interactive applications— radiosity,virtual walkthrough,speechrecognition,andwebbrowsing.Thesearerepresentativeof mytargetdomain,i.e. resource-intensive,mobileinteractiveapplications(all four applicationswould be involvedin themotivatingscenariodescribedearlierin this chapter).Eachcasestudyshows how theapplicationusesthemulti-fidelity API; evaluatesthecostof portingtheapplication;describesthehistory-basedresourcedemandpredictorsfor theapplication;andevaluatestheaccuracy of thesepredictors.

In previous chapters,variouscomponentsof the systemwere individually evaluated.Chapter8 is a holistic evaluationof systemperformance.It answerstwo questions:doesthesystemadapteffectively; anddoestheadaptationimprove applicationperformance?Ievaluatethesystemwith syntheticworkloadsandsyntheticbackgroundloads;with realap-plicationsandsyntheticbackgroundloads;andwith realapplicationsrunningconcurrently.

Chapter9 summarizesrelatedwork. Chapter 10 concludesthedissertationby identi-fying andsummarizingits key contributions. It thenprovidesanoverview of thesystem’scurrentcapabilitiesanddescribesongoingresearchin multi-fidelity adaptation.Finally, itexploressomefutureresearchdirectionsopenedup by this work.

Chapter 2

A resourcemodel

Thateverythingshouldbeof thesameweightandmeasurethroughouttheRealm—(excepttheCommonPeople).

(W.C.SellarandR.J.Yeatman,1066andAll That, ch. XIX)

Thisthesisaimsto provideaframework for applicationadaptationin mobileinteractivesystems.Its focusis adaptationto resource variation: varying network bandwidth,CPUspeed,andbatterychargelevels. Designingsucha framework requirescleardefinitionsof“resource”and“performance”.What is a resource?How is it measured?How is perfor-mancemeasured,andhow is it relatedto resources?Are theredifferenttypesof resources;what arethey? What arethe resourcesrelevant to a givensystem?This chapteranswersthesequestionswith a resourcemodel.

Most peoplehave an intuitive understandingof what “resource”means:thereis thesetof thingscommonlyconsideredcomputersystemresources,andthemetricscommonlyusedto measurethem. Unfortunately, this intuitive understandingis inadequate:in thischapterI developamorerigorousdefinition. I derivefrom first principlesaresourcemodelthatO describesapplication-systeminteractionin termsof resourcesupplyanddemand.O representsperformanceascostfunctionsof supplyanddemand.O identifiesthreebroadclassesof resourceandtheir characteristicsupplyanddemand

metrics.O showshow traditionalcomputersystemresourcesfit this classification.O providesacriterionfor whetheraparticularresourceis relevantto a givensystem.

Thoughthemodeloftenmatchestheintuitivenotionof resource,it doespresentsomesurprisingresults.Someresourceshavedifferentunitsfor supplyanddemand;sometimes,supplyanddemandarenotevenscalaror numericvalues.I will alsoshow how anunusualresource— file cachestate— fits into theresourcetaxonomy, andarguethatit is arelevantresourcefor adaptivemobileapplications.

5

6 CHAPTER2. A RESOURCEMODEL

I startwith ageneralmodelthatis applicableto avarietyof systems— Internetservers,massively parallelsupercomputers,anddesktopworkstations,aswell asmobilesystems.Iidentify threeresourceclasses— time-shared,space-shared,andexhaustible— andtheircharacteristicsupply, demand,andcostmetrics.At variouspoints,I show how to restrictthemodel,tradinggeneralityfor simplicity by makingassumptionsappropriateto mobileinteractive systems.I illustratetherestrictedmodelwith someexamples.Finally, I definea criterion for decidingif a resourceis relevant,andlist the resourcesthatarerelevant tomobileinteractivesystems.

2.1 What is a resource?

A resource is a measurableentity that is both scarce anduseful. I.e., thereis a finitesupplyandnon-zerodemand.A resource has,associatedwith it:O an instantaneousdemand

��J��PQ�, not alwayszero: themaximumamountof resource

thattheconsumercanusein thenext time interval R P5SQP DUT PCVO an instantaneoussupply � ��PQ� , alwaysfinite: themaximumamountthatthesuppliercanprovide in R P5SWP DXT PCVO a set of cost functions 7 ��PQ� that computethe instantaneouscost of resourcecon-sumptionasa functionof supplyanddemand.Eachcostfunctioncorrespondsto aperformancemetricof interest:e.g.,batterydrainor monetarycost.Wewill seethatlatencyor wall-clock time canalsobea usefulaggregatecostfunction,althoughitsinstantaneouscostfunctionis trivial (i.e. eachperiod T P correspondsto a latency costof T P .)

How doweapplythismodelto applicationadaptation?Here,thesystemis thesupplierof resources,andthe applicationis the consumer. Givensomelevel of supply, the appli-cationadaptsits demandto maintainsomeperformancegoals. A key assumptionhereisthat instantaneoussupplyanddemandareindependent: thatwe canalterdemandwithoutchangingsupply. Of course,presentdemandcanaffect futuresupply:e.g.,agreaterenergydrain(demand)leadsto a lower charge level (supply). If supplyanddemandareinstanta-neouslyindependent,wecaninvert thecostfunctions:i.e.,giventhesupplyandthedesiredcost(performance),wecancomputetheappropriatelevel of demand.

How doesthis definitionof resourcehelpusto describeapplicationadaptation?Appli-cationsadaptby changingtheirbehaviour in responseto changesin systemstate.I abstractout thoseaspectsof applicationbehaviour thataffect performance,andrepresentthemasapplicationdemandfor oneor moreresources.Similarly, aspectsof systemstatethataffectperformancearerepresentedassystemresourcesupply. This independenceof supplyanddemandis crucial to my model: if supplyanddemanddo not affect eachother, we canmeasureor predictthemindividually, andthenapplya costfunction to derive thedesiredperformancemetric. Independenceis a key propertyof the supplyanddemandmetrics

2.2. AGGREGATING RESOURCEMEASUREMENTS 7

derivedin thischapter.

In thischapterI deriveindependentsupplyanddemandmetricsfor CPU,network band-width, memory, disk bandwidth,disk space,batteryenergy, andcachestate.I alsoderivecostfunctionsfor threeof thefour key performancemetricsin mobile interactive applica-tions: interactive responsetimeor latency, monetarycost,andbatterydrain. I explain whythefourthmetric— userdistraction— is outsidethescopeof theresourcemodel.

2.2 Aggregatingresourcemeasurements

Sofar, we have consideredsupply, demandandcostas instantaneousvalues.In practice,we needaggregateratherthan instantaneousmetrics: e.g. batterydrain measuredoversometime interval. What is theappropriatetime interval over which to aggregatesupplyanddemand?How canwe convert the instantaneousmetricsinto aggregateones?Thissectionintroducesthe notion of operation: a naturalunit of aggregation, especiallyforinteractiveapplications.I will arguethattime-seriesarenotagoodway to do aggregation;in the following section,I will show how to derive aggregatemetricswithout usingtime-series.

2.2.1 Operationsand interactiveapplications

Theunit of aggregationis thetime interval overwhich we wish to characterizesupply, de-mand,andperformance.Oftenthesetime intervalscorrespondto somesubtaskor compu-tationexecutedby theapplication.I call thesesubtasksoperations. Thenotionof operationis naturalto interactiveapplications.A typical interactiveapplicationis idle, andconsumesno resources,until it receivessomeuserinput. It thenexecutesfor sometime, producesa response,andbecomesidle again. This executionis an operation.Interactive responsetime is simplyoperationlatency, whichcanbecomputedby acostfunction.Morecomplexapplicationsallow a seconduserrequestwhile thefirst is beingprocessed,i.e., operationscanoverlapandrun concurrently.

Interactive responsetimessmallerthan100msareinvisible to mostusers[12, ch. 2];responsetimesof severalminutesareunacceptablein mostapplications.Thus,this thesisassumesoperationlengthsrangingfrom tenthsto tensof seconds.I denotetheaggregateresourcedemandfor anoperation(or time interval)

�by

� �<��; theaggregatesupplyby� ��

; andits aggregatecostfor somecostmetric8

by8 �<��

.

Figure2.1 shows a simplified modelof an interactive application— a web browser— andan associatedoperation.The applicationis normally idle, waiting for userinput.When the userclicks on a link, it initiatesa “show document”operation. This consistsof a network-boundfetchphaseanda CPU-boundrenderphase(I countas“CPU-bound”not only processingcycles,but also time spenton main memoryaccesses).The supply


Operation"show document"

Fetch(network receive)

Render

(CPU)

user request

Idle state

response

Figure2.1: A simple interactive application and operation

and demandfor CPU andnetwork determinethe latency cost of the operation,i.e. theinteractiveresponsetime.

Many interactive applicationsare easily describedin termsof operations.A speechrecognizermight have a “recognizeutterance”operation,triggeredperhapswhentheuserpausesbetweensentences.In an augmentedreality application,the usergeneratesa re-questby moving or turning their head. The applicationrespondsby executinga “rendernew view” operation.Theapplicationmight have multiple operationtypes:“add objecttoscene”and“deleteobject” in additionto “rendernew view”.

2.2.2 Why not time-series?

Whenwe aggregate,shouldwe retain information aboutthe exact time sequenceof re-sourceconsumption?Wecouldpreservethis informationby representingaggregatesupplyanddemandastime-series: i.e., infinite vectorsof supplyanddemandvalues,onefor eachinstantin time. In practice,we would sampleat discretetime points, to keepthe vectorlengthsfinite.

Thetime-seriesapproachhasamajordrawback:it destroystheindependenceof supplyanddemand.Consideranapplicationthat,at time

P, hasa smallamountY P

of processingtime remaining,after which it will block. I.e., its demandfor the CPU resourceat timeP

is non-zero. If the supply is zero, i.e. the applicationis not scheduledat timeP, then

thedemandwill still be non-zeroat timeP DZY P

. On theotherhand,if theapplicationisscheduledat time

P, thedemandat

P D[Y Pwill bezero. Thusthedemandat time

P D[Y Pdependson the supplyat time

P. Further, when the applicationblocks, it may generate

demandfor someotherresource,saydisk. Now thedemandfor diskat timeP D\Y P

dependson thesupplyof CPUat time

P.

Time-seriescapturethe maximumamountof information,but they make supplyanddemandinterdependent.Canwe make simplifying assumptionsthatresultin independentmetrics,yet retainenoughinformationto beuseful?Theresourcetaxonomyin Section2.3

2.3. A TAXONOMY OFRESOURCES 9

shows, for eachresourceclass,how to derive aggregatesupplyanddemandmetricsthatpreserve independence.

2.3 A taxonomyof resources

I classifysystemresourcesinto threebasiccategories— time-shared, space-shared, andexhaustible— eachwith a characteristicform for its supplyanddemandmetrics. Thisclassificationis not exhaustive; however, it is sufficient for theresourcesof interestin thisthesis.In this section,I describeeachclass,its characteristicsupplyanddemandmetrics,andcommonresourcesthatbelongto it. For eachclass,I alsogive an exampleof a costmetricasa functionof supplyanddemand.My intentionin all casesis to usethesimplestsupplyanddemandmetricthatallow agoodapproximationof performance.

2.3.1 Time-sharedresources

Network transmit(andreceive)bandwidthandCPUaretime-sharedorscheduledresources.Thereis somesystem-widerateof supply ] �<��PQ�

; at any given instantoneapplicationre-ceivestheentiresupply, andothersreceivenone.(I assumethattheoverheadof switchingbetweenapplicationsis negligible.) Thus,for asingleapplication

� �<��PQ�_^a` ] ��PQ�if theapplicationis scheduledto usetheresourcebotherwise

When an applicationwishesto transmitsomedata,or usethe CPU, it can do so atwhatever ratethesystemwill support;i.e., its demandis effectively infinite:��PQ�_^c`ed if theapplicationwishesto usetheresourceb

otherwise

Theinstantaneousresourceconsumptionis fhgji � � �<��PQ�5S��PQ�W�.

Whenthetime-sharingis sufficiently fine-grained,andthecostof switchingsmall,wecanapproximateresourcesupplywith a GeneralProcessorShare(GPS)[73] model: eachapplicationgetsa steadysupply � �<��PQ� , wherek*2010l* �m�on � ��PQ�_^ ] �9��PQ�

I usethe following aggregatesupplyanddemandmetricsfor anoperation�

that lastsfor a time p � p : � �<��_^ qp � p9ros � ��PQ�Q�P

10 CHAPTER2. A RESOURCEMODEL� ��t^ r s fugvi � � ��PQ�6S/+�J��PQ�Q�Q�PI.e.,

� �<��is theaveragerateof supplyto operation

�, and

� �<��is thetotalamountused

by theoperation:numberof CPUcyclesconsumed,bytestransmitted,etc.

Note thataggregatesupplyanddemandnow have differentunits: aggregatesupplyistheaverage rateof supply, whereasaggregatedemandis the integral over timeof demand.My modelis thateachcomputationconsumesafixedamountof resource,ratherthantakinga fixedamountof time. For supply, it is morelogical to assumethat thesystemprovidestheresourceto theapplicationat a certainrate, ratherthana fixedquantumperoperation.With thesemetrics,wecannow computethelatency costof usingresource : ��t^ � �<�� <��If the operationonly usesresource , and is never idle, then this is the latency of theoperation,i.e.

: ��w^ p � p . Thus,expressingsupplyasa ratemakeslatency a functionofsupplyandnot viceversa.

Clearly, GPSis anidealization.However it is anacceptableonein practice,if thetimesharingis fine-grainedcomparedto thelengthof anoperation.In thisthesis,I treatnetworkbandwidthandCPUastime-shared,GPSresources.For completeness,I alsoincludediskI/O asa GPSresource.However, my currentprototypedoesnot supportthis resource;fur-ther, correctlyaccountingfor positionallatencieswould requireamorecomplex treatmentfor disk I/O thanthesimpleGPSmodel.

2.3.2 Space-shared resources

Disk spaceis thebestexampleof aspace-sharedresource.Thereis a fixedtotal amountofspaceavailable:someis usedto storedataandtheremainderis freespace.Therearethreekindsof objectsthatthis spacecontains:O objectspermanentlystoredon disk: thesearenever reclaimedandcanbeexcluded

from any computationof supplyanddemand.O cachedcopiesof objects,whoseoriginalsareelsewhere(e.g. on a remoteserver) orcanberecomputed.I treatthespaceusedfor cachingasa separateresource:“cacheresources”aredealtwith laterin this section.O temporaryobjects,createdby the operationanddeletedbeforeit completes:thesearetheonly objectsthatI considerto bepartof disk-spacedemand.

The instantaneoussupplyis theamountof freespaceavailablefor temporaryobjects.Thedemandis thetotal spacerequiredby theapplicationfor thecurrentsetof temporaryobjects. � �<��PQ�_^ K GF�F ��PQ�

2.3. A TAXONOMY OFRESOURCES 11��PQ�_^ kxzy � �4� �J{|�~} �J�'� F ��where

P FJ�� PQ�is thesetof temporaryobjectsownedby theapplicationat time

P.

Thelatency costis negligible if thereis sufficient freespacefor all theobjects:I donotmodel the performanceimpactof disk spaceutilization or layout. If thereis insufficientspace,theapplicationcannotrun: thenthecostis infinite.

7 �<��PQ�t^ `[d if��PQ�� PQ�b

otherwise

Theaggregatedemandis themaximumamountof temporaryobjectspaceusedby anoperation;theaggregatesupplyis theamountof freespacethat is alwaysavailableduringtheoperation.Thus,wecandefineconservativeaggregatesupplyanddemandmetrics:� ��t^ fu�9�s ��J��PQ�

� �<��t^ fugvis � �<��PQ�8 �<��t^a`ed if

� �<�� <��botherwise

2.3.3 Cacheresources

Memory is a space-sharedresource,but differs from disk spacein that it is virtualizable.An application’s virtual memorymight not be completelyresidentin physicalmemory:the remainderis on backingstore,usuallydisk. In otherwords,physicalmemoryactsasa cache for virtual memory. I call this a cache resource: suchresourcesmerit a differenttreatmentfrom ordinaryspace-sharedresources.A shortageof freespaceis nolongerfatal:thereis afinite costassociatedwith eviction andreplacement.

A typicalcomputersystemhasalargenumberof cacheresources:everylevelof amem-ory hierarchy, exceptthe bottommost,is a cacheresource.In this thesisI am concernedwith two cacheresourcesthathavea largeimpactonmobileinteractiveapplications:phys-ical memoryandclient file cache.I expectthat mostgeneral-purposemobile computingdeviceswill supporta file systemaswell asvirtual memorybackedby a disk drive, low-powerflash-basedstorage[26], or, in thefuture,MEMS-basedstorage[84].

Memory

Whentheapplicationaccessesa non-residentpage,it causesa pagefault. If thefault is onanew page,it is satisfiedcheaplyfrom apoolof freepages;if thepageis onbackingstore,it requiresa costlydisk access.Thus,it seemsinsufficient only to measuretheamountof


memoryavailableto anapplication:we mustconsiderthecurrentcontentsof memoryaswell. The supplymetric is no longera numericscalarvalue— it mustcapturethesetofresidentpages,aswell asthenumberof freepagesavailableto theapplication.� ��4� � �"!+��PQ�_^�� ]�F �J� FJ� Po��PQ�5S K lF�FW��F � ��PQ�W�Here ]�F �J� FJ� Po��PQ�

is aset,whileK lF�FW��F � ��PQ�

is a scalar. ��4� � �"!G��PQ�_^a`�� PQ�5�if theapplicationis accessingpage� ��PQ�� otherwise

Thecostin termsof disk faultsgeneratedis:

7 ��4� � �'!+��PQ�z^c` bif � ��PQ�� ]�F �J� FJ� Po��PQ�

, or � ��PQ�is a new pageand

K GF�FW��F � ��PQ�� bq otherwise

How do we aggregatememorydemandandsupply over an operation�

? We couldrecordtheexact time sequenceof pageaccesses(for thedemand),aswell asthatof pageevictions (for the supply). This is unwieldy andresultsin interdependentsupplyandde-mand: by makinga simplifying assumption,we canreduceboth supplyanddemandtoindependent,scalar, numericvalues.

Therearetwo reasonsfor pagingactivity: coldmissesandcapacitymisses. Theformeroccuron pagesthathave never beenreadin from disk; the latteron pagesthathave beenevicted due to memorypressure. In this work, I model the cost of pagingonly in thepathologicalcasewheretherearea very large numberof capacitymisses:i.e., whenthesystemthrashesby repeatedlyevicting andreloadingthesamepages.

Thrashingoccurswhentheworkingsetof anapplicationexceedsthememoryavailableto it. Thus,thememorydemandof anoperationis thesizeof its working set.Thesupplyis the numberof pagesavailableto it. I approximatethe supplyby the numberof pagesavailableat operationstart: theapplication’s residentpagesandthefreepagesavailabletoit. I neglectthepagingcostwhenthereis no thrashing;thecostof thrashingis assumedtobeinfinite. ��4� � �"!+��t^ p��< �� p� �� "!��t^ p ]�F �J� FJ� Po��P M � p6D K GF�FW�E�+��F � ��P M �

8 ��4� � �"!G��_^c`[d if�� "!�� "!��b

otherwise

LiketheGPSmodel,theworkingsetmodelis anidealization.It is notvalid in situationswherecold misseshave a major performanceimpact. It doesnot capturethe benefitsofprefetching, sincetheaim of prefetchingis to avoid cold misses.For thepurposesof thisdissertation,theselimitationsof theworkingsetmodelareacceptable,andI chooseit overamoregeneralbut morecomplex model.

2.3. A TAXONOMY OFRESOURCES 13

Client file cache

Many applicationsaccessdatafrom remoteservers; in mobile environments,the clientmachineoftencachesthisdataonlocaldisk. Thusclientfile cacheis alsoacacheresource,at a differentlevel of thecachinghierarchy. This sectionshows how to modela read-onlyclient cachein termsof supplyanddemand.

We canmodelclient cacheaswe do memory;however, the working setmodel is nolongervalid. Cold missesarecommon,andcannotbeignored:evenif anobjecthasbeenpreviously cached,it might havebeeninvalidatedby theserver. On theotherhand,thrash-ing is rareon thetimescaleof a typical interactiveoperation.

Thus,for client file cachesin mobileinteractiveenvironments,we concentrateon coldmissesandignorecapacitymisses.The demandof an operationis the setof objectsac-cessed;thesupplyis thesetof valid cacheentries.�.-/$(01�'��*2�435� ��t^ � � p � is accessedby

�� -/$(01�'��*2��36� ��t^ � � p � is cachedandits cacheentryis valid

�Objectsthatarein theformersetbut not in thelatercausecachemisses.Notethatit is vitalto representsupplyanddemandassetsof objects,ratherthanjust astheaggregatesizeoftheobjects:it is thesetrepresentationthatallowsusto determinewhichobjectswill triggercachemisses.

I assumethat cachemissesareservicedby fetchingdataobjectsover a wirelesslink,andstoringthemon the local disk. The primary costof suchmissesis thusthe networkreceivecost, i.e. thetotal numberof bytesfetched:8 -$�0|� �4*2��35� ��t^ kxwyJ� �C¡1¢¤£j¥j¦"¥�§C£ { s }~¨ x�©y<ªo�C¡|¢¤£j¥v¦"¥�§C£ { s }J«¬ P F � ��Fromthis,wecaneasilycomputethelatency costasafunctionof bytesfetchedandnetworkreceivebandwidth.

Thismodeldoesnotcapturethecostof eviction, sinceI ignorecapacitymisses.It alsoignoresstaleness:if the systemknows, or suspect,that a server hasa later versionof anobject, it would ideally balancethe costof revalidationandrefetchagainstthat of usingstaledata. The model can be extendedto supportstalenessby storing someadditionaldatawith eachcacheentry — its versionnumber, the time it waslast validated,andthemost recentversionnumberfor this object that the client is aware of. Given these,wecouldderivea probabilisticestimateof staleness,anda stalenesscostmetricbasedon thisestimate.In somecases,stalenesscanbedeterminedfrom theobjectmeta-data:e.g.,manycacheablewebpagescomewith an“expiry date”.


2.3.4 Exhaustible resources

Thetime-sharedandspace-sharedresourcesconsideredabovearerenewable: time-sharedresourcesarerenewedperiodically, andspace-sharedresourcesarereclaimedwhenappli-cationsreleasethem. Resourcessuchasbatteryenergy do not fit this pattern: thereis afixed amountof resourceavailable,which is depletedby usageandmust be augmentedby recharging. To characterizethe supplyof sucha resource,we needto know the cur-rent resourcelevel

; ��PQ�. When aggregatingsupply over a time interval, we must also

considertherecharging activity during that interval: thus,the recharge rate ] �<��PQ�andthe

maximumpossibleresourcelevel � ��PQ�arealsorelevant. � ��PQ�

is oftena constant(e.g.batterycapacity):for generality, I assumeit variesdynamically. Finally, theresourcemightnot supportarbitrarily high ratesof resourcedrain: theremight bea maximumallowabledepletionrate ��PQ�

. Thus,theinstantaneoussupplyis thevector� �<��PQ�z^®� ; �<��PQ�6S ] ��PQ�5S � �<��PQ�6S �<��PQ�W�The instantaneousdemand

��<��PQ�is the rateat which theapplicationwishesto deplete

theresourcelevel. Theactualdepletionrateis limited by bothsupplyanddemand.Idefinethe instantaneousdepletioncostmeasuredin Joules/s:7 �<��PQ�_^ fugvi �4��<��PQ�5S ��PQ�Q�

In thecaseof batteryenergy, thesupplymetricmustalsocaptureanimportantaspectofenergy-relatedstate:wall power. I assumethatwhenon wall power, applicationbehaviourhasnoeffecton batterychargelevel. Then� )�* �(� � �"!,��PQ�z^®� ;I)�* �(� � �"!,��PQ�6S ] )�* �(� � �"!+��PQ�6S � )�* �(� � �"!,��PQ�6S )�* �(� � �"!,��PQ�6S ��,¯�¯ ��PQ�W�( ��,¯�¯ ��PQ� is aboolean,truewhenonwall powerandfalsewhennot;notethat ] )�* �(� � �"!,��PQ��b�°u± �²�,¯�¯ ��PQ� .)7 )�* �(� � �"!,��PQ�t^a` b

if ��³¯�¯ ��PQ� is truefhgji � ��J��PQ�6S �<��PQ�W�otherwise

Theaggregatedemandof anoperation�

is thetotal amountof resourceconsumedbyit: � �<��z^ ros fhgji � ��<��PQ�6S �<��PQ�W�Q�P

Theaggregatesupplyis theamountof resourcethatis availablefor theapplication:thisdependsonthecurrentresourcelevel, therechargerate,andthesystempolicy for rationingthe resource. In the caseof batteryenergy, I assumethat the operationcan run until itcompletesor the batteryis completelydrained. In this case,the aggregatesupply is theentireavailableresource,includingtherecharging thathappensduringtheoperation:� �<��t^ ; ��P M � D r s ] ��PQ�2+P

2.4. COMBINING RESOURCES 15

Theaggregatedepletioncostis theamountof resourceconsumed:8 ��t^ � �<��For batteryenergy, takingwall power into account,8 )�* �(� � �'!,��t^ ` b

if onwall power��)�* �(� � �"!+��otherwise

For simplicity, I assumethatwall-powerstatusdoesnotchangeduringanoperation.

In my system,I useFlinn’s goal-directedadaptationmechanism[34] to adaptenergyconsumption.Goal-directedadaptationusesa a variantof thedepletioncostmetric. Thecostof an operationdependson its energy consumption,weightedby the “criticality” ofthe batteryresource.This weight reflectsthe extent to which the systemis meetingtheuser’s goal for desiredbatterylifetime: it is zerowhenon wall power, smallwhenbatteryreservesareplentiful, andincreasesasthesereservesshrinkrelativeto thedesiredlifetime.Thebatterylifetime goalwould besetby theuserbasedon theamountof work they needto do,or theamountof time they will beaway from wall power.

2.4 Combining resources

So far, we have looked at supply, demand,andcostfor operationsthat consumea singleresource.How do we characterizeoperationsthat usemultiple resources?Consideranoperation

�with a network-boundphase

�� '��&, duringwhich it receivesdata,anda CPU-

boundphase� � �5� .

Wecanstill measurethetotalresourcedemandasbefore.TheCPUdemand�� _^�� 5� � is thenumberof cyclesusedby

� � �� , andsimilarly thenetwork receivedemandis thenumberof bytesreceivedby

�� '��&. Wecanapproximatesupplyby assumingthatthe

averagesupplyrateis thesamefor bothphases:� � �� _^ � � �� '��& �� '��& �t^ � � �'��& ��Sincethephasesaresequential,thetotal latency: ��_^ : �� D : �� & �^ :t� �� 5� � D : � �'��& �� & �

^ �� 5� �� D � � �'��& �� & �� & �� '��& �^ �� 5� �� D � � � ��& �� '��& ��


2.4.1 Linear and additivecostmetrics

Latency in theaboveexampleis anexampleof a linear, additivecostmetric. A linearcostmetricfor someresource andoperation

�satisfies8 ��z^e�.�_´ � ��

where�.�

is a constantfor theperiodof theoperation.Latency is a linearcostmetricwithrespectto GPSresources:

�.�µ^ ¶ª6· { s } where� �<��

is the averagesupply rate. Batterydepletionis a linearcostwith respectto CPUandnetwork if we assumethateachcycleofprocessing,andeachbytetransmittedor received,consumesafixedamountof energy.

An additivecostmetricfor anoperation�

thatusesresources ¶ S L SJ¸?¸J¸ satisfies8 ��t^ 8 �m¹6�� D 8 �"º?�� D ¸J¸?¸Latency is an additive costmetric if all the resourcesareaccessedsequentially. Batterydepletionis alwaysadditiveassuminganidealbattery:thetotalenergy consumptionof theCPU, network, disk, etc. is the sumof their individual consumptions.Monetarycost isalwaysadditive.

A costmetricthatis bothlinearandadditivesatisfies:8 ��t^[�h� ¹ � � ¹ �� D �h� º � � º �� D ¸J¸J¸E.g. an operationthat only usesGPSresources,andusesthemsequentially, will have alinear, additive latency cost metric. If they are usedconcurrently, then it is linear withrespectto eachresource,but not additive. If we considerthe effect of memoryusingtheworking set model, then it is additive but not linear: the memoryeffect adds d to thelatency if thrashing,and

bif not.

2.4.2 Concurrent useof resources

If anoperationusesresourcesconcurrently, latency is notadditive. If all resourcesareusedsimultaneously, then: ��z^ f»�<� � : �� 6S : �� #�%$ � �5SJ¸J¸J¸(�¼^ f»�<� � �� S ��#��$ � �� #�%$ � �� S?¸J¸J¸(�Morecomplex concurrency patternswill resultin correspondinglycomplex latency metrics.

2.5 Resourcemodel for mobile interactivesystems

In this chapterso far, I presenteda generalresourcemodelbasedon thenotionof supply,demandand cost metrics. I showed how to describeapplicationbehaviour as resource

2.5. RESOURCEMODEL FORMOBILE INTERACTIVE SYSTEMS 17

demand;systemstateas resourcesupply; and performanceas resourcecost. I derivedsupply, demand,andcostmetricsfor commoncomputersystemresourcesandperformancemeasures.In somecases,I showedhow to simplify thesemetricsby makingassumptionsspecificto mobileinteractivesystems.

Thissectionfurtherrestrictstheresourcemodelto thedomainof thisthesis:applicationadaptationin mobileinteractivesystems.We startwith someexamplesof suchadaptation,andapplytheresourcemodelto them.I thendefinetheresourcesandcostmetricsthatarerelevantto this thesis;I explain why money is not a relevantresource,anduserdistractionnot a costmetric. Finally, I show how resourcedependencies— demandfor oneresourceleadingto demandfor another— canbecapturedby appropriatecostmetrics.

2.5.1 Example: remoteexecution

Consideraspeechrecognizerrunningonahand-helddevice. Dueto localresourcepoverty,theactualcomputationis doneonapowerful remoteserver. Theapplicationrecordsauserutteranceandshipsit to the server, which doesthe recognitionandreturnsthe result. Iamassuminga “request-response”modeof execution,wheretherequest(send)phase,thecomputationat theserver, andtheresponse(receive)phaseareall non-overlapping.Weareinterestedin theresponsetimeor latency of theentire“recognize”operation,whichO transmits

��#�%$ � bytesof digitizedspeechdatato theserver,O performs� � � � �'� � � �� cyclesworth of computationon theserver,O receives

� � �'��&bytesof recognizedtext from theserver.

Thethreestepsaresequentialandtheoperationlatency is additive:: � � � �'½ =?$(¾m� ^ :t#�%$ �AD : n � � &2� � D : � � ��&Eachstepusesexactlyoneresource:network transmitbandwidth,serverCPU,andnetworkreceive bandwidthrespectively. We alreadyknow how to computelatency costmetricsfortheseresources: :>#��$ � ^ ��#�%$ �� #�%$ �: n � � &Q� ��^ : � � � �'� � � �� ^ � � � � �'� �¿� �� 4� �'� � � ��: � � ��& ^ � � � ��&� � �'��&Here

� #�%$ � , � � � ��&, and

� � � � �'� � � �� arethe bandwidthto andfrom the server, andthe pro-cessingrateavailableat theserver. Thus,wehave thelinear, additive latency costmetric:: � �'� �'½ =?$�¾2� ^ ��#��$ �� #�%$ � D � � � � �'� � � �� '� �¿� �� D � � �'��&� � � ��&


The remotecomputationalsorequiressufficient memoryat the server. If we usethesimple “working set” latency costmetric (Section2.3.3), thenmemoryhasno effect onlatency whenthereis no thrashing,andit dominatesthelatency whenthrashingoccurs.Inthis case,latency is not linearany more,but it is still additive:: � �4� � � �� "!À^a`ed if

� � � � �'� � �� "!�� '� ��4� � �'!botherwise: � � � � ½ =o$(¾m� ^ :>#�%$ �ED : � � � �'� � � ��ÁD : � � � �'� ��4� � �'! D : � � ��&

2.5.2 Example: web browser

Figure2.1shows thestatetransitionsof a typical, if simplistic,mobileinteractiveapplica-tion: awebbrowser. Theuserrequestsadocument;thebrowserfetchesit overthenetworkandrendersit to the display. The performancemetricsof interestarelatency andbatterydrain.Let usassumethefetchandrenderarenotsequentialbut perfectlypipelined.In thiscase,the latency is dominatedby the slower of two phases,i.e. we have a non-additivelatency costmetric(Section2.4.1): n 3 � @ÃÂm� � � ��4= � ^ fu�9� � :_-� � ��3 S : � � = Â � �/�

Supposewehaveadocumentof size�

bytes,whichrequiresÄ cyclesof processingtorender. If theavailablenetwork receive bandwidthis Å , andtheavailableprocessingrateis ] , then :_-� � ��3 ^ � Å ^ � � �'��&� � � ��& ^ : � �'��&

: � � = Â � ��^ Ä] ^ �.� �� ^ :t� �5�Hence : n 3 �'@ÆÂm� � � �� = � ^ f»�<� � � � �'��&� � �'��& S ��

Batterydrainor energy useis still linearandadditive: if theCPUconsumesÄ � �� Joulesfor every cycle of processing,andthe wirelessinterface Ä � �'��&

Joulesper byte of datare-ceived,then: ; n 3 �'@�Âm� � � �� = � ^ ;I� ��ÁD ; � �'��& ^ Ä � �� ´ �� ÁDXÄ � �'��& ´ � � �'��&2.5.3 Which resourcesare relevant?

Theresourcedefinitionandtaxonomydevelopedherefit awidevarietyof resources.Eventhe threetraditionalresourcesof economicsfit into the taxonomy:Landasspace-shared,

2.5. RESOURCEMODEL FORMOBILE INTERACTIVE SYSTEMS 19

Resource Type Demandunits SupplyunitsCPU Time-shared cycles cycles/sEnergy Exhaustible Joules JoulesNetwork bandwidth(xmit) Time-shared bytes bytes/sNetwork bandwidth(recv) Time-shared bytes bytes/sPhysicalmemory Space-sharedbytes bytesFile cachestate Space-shared(setof objects) (setof objects)Disk I/O (read) Time-shared bytes bytes/sDisk I/O (write) Time-shared bytes bytes/sDisk space Space-sharedbytes bytes

NotethatCPUdemandis ameasurementof processortime,notof theactualnumberof instructioncyclesusedby theapplication.I simply scaleprocessortime by processorclock speed,to obtaincyclesfrom second.This scalingcompensatesfor differencesin measuredCPUtime dueto dif-ferencesin processorclock speed.However, it doesnot wholly eliminatethe effect of variationin processorarchitecture:thesamecomputationmightstill requiredifferentnumbersof cyclesondifferentprocessors.

Figure 2.2: Resourcesrelevant to mobile interactivesystems

Labourastime-shared,andCapitalasan exhaustibleresource.Many suchresourcesarenotrelevantto mobileinteractivesystemsor evento computersystemsin general.Gasolineis usedin many systemscontainingautomobiles,but is usually irrelevant to a hand-heldcomputer.

What resourcesandcostmetricsarerelevant to this thesis,i.e. to mobile interactivesystems?I identify four key cost metrics: latencyor interactive responsetime, batterylifetime, monetarycost, anduserdistraction. I thendefinerelevantresourcesasthosethat:O fit into my resourcetaxonomy.O areinvolved in the interactionbetweena mobile interactive applicationandtheun-

derlyingsystem:thereis non-zeroapplicationdemandandfinite systemsupply.O haveasignificantimpacton at leastoneof thecostmetricsmentionedabove.

On the basisof this definition, we canexcludegasolinefrom the list of relevant re-sources.We canalsoexcludemoney. Application-systeminteractionmight involvemone-tary cost(e.g.if datais transmittedoveracommercialwirelessnetwork); but money is notdirectly involved in the interactionitself. Thesupplierof money is not thesystem,but isexternalto it.

Figure2.2 shows the resourcesrelevant to a mobile interactive system.Of these,mycurrentprototypesupportsCPU,memory, energy, network bandwidth,andfile cachestateon the local machine,aswell asCPU, memoryandfile cachestateon remoteexecutionservers.


2.5.4 Userdistraction: not a resourcecostmetric

Userdistractionis animportantaspectof aninteractiveapplication,especiallyin a mobileenvironment.A mobileuseris probablyengagedin oneor morereal-world taskswhile theyareusingthecomputer. It is critical to provide theinformationor computationrequiredbytheuserwith minimaldistraction.

Userdistractionis a factorof many things,including:O theinput/outputdevicesused.E.g. requiringkeyboardusemightbedistractingO thenumberandtypeof userinteractionsrequired.E.g. pop-updialogwindowsO thereal-world taskthattheuseris engagedinO themodein which resultsarepresented(text, graphical,video,...)O violation of userexpectation.E.g. a low-quality resultwherehigh quality wasex-pected

To expressuserdistractionasa costmetric in my resourcemodel,we would have torepresentall theseaspectsof applicationbehaviour andsystemstateas“resources”.Ratherthanstrivefor animpossiblygeneralresourcemodel,I choseto treatuserdistractionexplic-itly asametricthatis outsidetheresourcemodel.In Chapter3 wewill seeyetanotherusermetricof greatrelevance:outputquality. Althoughwe canoften tradeoutputquality forresourceusage,theformeris notdirectlyaresultof thelatter: rather, bothresultfrom adap-tive choicesmadeby theapplication.For this reason,I treatoutputquality independently,outsidetheresourcemodel.

2.6 Resourcedependencies

In Section2.5.2,we saw that usingwirelessnetwork bandwidthhasan energy cost. Ingeneral,a cost metric for one resourcemight be in termsof demandfor another. E.g.memorypagingresultsin diskaccess;CPUusageconsumesenergy; file cachemissescausefetchesover the network. We canview theserelationshipsasa dependency graph: thereis a dependency from a resourceto its costmetrics,which might themselvesberesources.Figure 2.3 shows the dependenciesbetweenthe resourcesand cost metricsin a mobileinteractivesystem.

Elementsthatarenotsupportedin my currentprototypeareshown in italicsanddashedlines. The prototypedoesnot supportthe monetarycost metric, or the disk spaceanddisk bandwidthresources.It “skips over” thedependencieson disk bandwidth;it usestheworking setmodel to directly estimatelatency costasa function of memorysupplyanddemand;andit assumesthat theonly costof file cacheaccessis thenetwork fetch in thecaseof a miss. It alsoignoresthe dependency of energy consumptionon CPU,network,and disk bandwidth. Instead,it measuresenergy consumptiondirectly as a propertyofapplicationbehaviour.

2.7. SUMMARY 21

LocalCPU

Remote

Disk bwReadWrite

NetworkTransmitReceive

Application

cost

Battery drain

Disk space Monetary

File Cache

MemorycostLatency

cost

Figure2.3: Dependenciesbetweenresources

2.7 Summary

In this chapter, I presenteda resourcemodelbasedon supply, demand,andcostmetrics.I startedwith an extremelygeneralmodel,andthensimplified themodelby makingrea-sonableassumptionsaboutthetargetdomain:mobile interactive systems.I characterizedtraditional systemresourcesin termsof supply anddemand,andalso a novel resource:cachestate. I showedhow to derive costmetrics— latency, batterydrain,andmonetarycost— as functionsof supplyanddemand.Money anduserdistractionareoutsidetheresourcemodel; disk I/O anddisk spacearecoveredby the modelbut not supportedbythecurrentprototype;CPU,energy, network, memory, andfile cachearesupportedby themodelandtheprototype.

Chapter 3

Multi-fidelity computation

Thereasonablemanadaptshimselfto theworld ...(G.B. Shaw, Reason)

Mobile interactive usersrequire low responsetimes and long battery life. In otherwords,applicationsmustboundthelatency andbatterydrainof interactiveoperations.Aswesaw in Chapter2, thesecostsdependpartlyonresourcesupply, whichis highly variable,especiallyin mobileenvironments.Wirelessnetwork bandwidthvariesdueto contention,noise,andradioshadows. Time-sharedresourcesseefluctuationsin offeredload,bothonthe local hostandon remoteservers. Hardwareenergy-saving measures— scalingCPUfrequency, reducingnetwork transmitpower, or selectively disablingmemorybanks— areyet anothersourceof variability. Theclient OScannotcontrolall of thesefactors:hencetherewill alwaysbesomevariability imposedby theenvironment.

If we cannotguaranteesupply, we mustregulatedemand:applicationsmustadaptde-mandto matchsupply in orderto maintainperformance.In this chapter, I introducethemulti-fidelitycomputationasa simpleyet effectiveabstractionfor adaptationin mobilein-teractiveapplications.Multi-fidelity computationgeneralizesandsubsumesmany previousapproachesto applicationadaptation,asalsoseveraltheoreticalextensionsto thenotionofalgorithm. This generalizeddefinition of multi-fidelity computationis oneof the contri-butionsof this thesis,andthebasisof theAPI andsystemdesigndescribedin Chapters4and5.

I startby definingfidelity, andshow thatthenotionof datafidelitydoesnotcapturetheentirerangeof fidelity adaptation.I complementdatafidelity with a new concept:com-putationalfidelity, anddefinemulti-fidelity algorithmsbasedon this broaderdefinitionoffidelity. I thenextendthedefinitionof fidelity further, to includeperformance-tuningpa-rameters. I briefly discussthe role of nontunableparameters in fidelity adaptation,andoutlinethedifferencesbetweenfidelity andoutputquality. Finally, I discusspreviouscon-structsthataresimilar to multi-fidelity algorithms:approximationalgorithms,anytime al-gorithms,andimprecisecomputations.

23

24 CHAPTER3. MULTI-FIDELITY COMPUTATION

3.1 Fidelity

Previous work hasshown the valueof adaptingdatafidelity: degradingdataobjectstoreducenetwork bandwidthusage[37, 71] or batterydrain[31]. Fox et al. [37] have shownthatdegradingdigital imagescanreducefetchtimesfor webdocuments.Nobleet al. [71]describeasystemthatsupportsmultipleconcurrentadaptiveapplications,eachwith its ownmetricof datafidelity. Datafidelity is generallydefinedastheextentto whichthedegradedversionof adataobjectmatchestheperfector referenceversion.

Data fidelity, while valuable,is not sufficient to describethe entire rangeof fidelityadaptation.Considera searchover an imagedatabase.We couldrun anapproximatever-sion of the query, which runs fasterbut might be more inaccurate:e.g., we can reducetheharvest[36] — theamountof dataexamined— asin onlineaggregation [46] or con-gressionalsampling[4]. Anotherexampleof approximatequeryingis approximatemedi-ans[60], whereaccuracy is tradedfor mainmemoryusage.In all thesecases,we degradefidelity by providing a lesstrustworthy result: however, this fidelity is not associatedwithany specificinput dataitem, but with thecomputationitself. Similarly, we could imagineperforminga numericalcomputationto differentdegreesof accuracy; again,theaccuracyis not apropertyof theinputdata,but of thecomputation.

I definecomputationalfidelityasaruntimeparameterof acomputationthatcanchangethe quality of its outputwithout changingits input. Both dataandcomputationalfidelitysharea key characteristic:they allow us to tradeoutputquality (resolutionof an image,or accuracy of a search)for resourceconsumption.I combinethe two into a generalizeddefinitionof fidelity:O A fidelity metricis a runtimeparameterto acomputationthataffectsits outputqual-

ity, eitherby changingtheinput dataor by changingthecomputationitself.O A singlecomputationcanhave several fidelity metrics: i.e., fidelity canbe multi-dimensional.O Eachfidelity metric hasa specificrangeof values,either discreteor continuous.E.g.,thelossinessof JPEGcompressionanimageis usuallyrepresentedasanumberbetween0, and100: thisJPEGlevel is acontinuousfidelity metricwith avaluerangeR b S q bGb V . If thealgorithmonly allows specificJPEGlevels(say25,50,75,and100),thenwe have a discretefidelity metric. Discretefidelity metricscanalsobe non-numeric:e.g.,aspeechrecognitioncomputationcanchoosebetweena “large” anda“small” vocabulary.O Fidelity valuesfor eachmetric are ordered: eachvalue resultsin a higher outputquality thanthelast.

3.2. MULTI-FIDELITY ALGORITHMS 25

3.2 Multi-fidelity algorithms

Traditionally, an algorithm hasa well-definedoutput specification. E.g., any candidatefor a sortingalgorithmmustpreserve all its input elementsandorder themaccordingtoa givencriterion. Only algorithmsthat adhereto this outputspecificationareconsideredacceptable,andwe comparethembasedon CPUtime consumed,memoryused,or othermetricsof resourceusageandperformance.

In an interactive application,a fixed output specificationwill not always matchtheuser’s needs.It might exceedthe user’s needsandwasteresources,or fall shortof themandgenerateunusableresults. Considera list of itemsthat the userwishesto browseinsortedorder. It is importantto producethe first few items quickly (say within 1s), forgoodinteractiveresponse.Theremainingitemscanbesortedwhile theuseris viewing thefirst few. Often,theuserwill terminatetheoperationafterbrowsingthesefirst few items,resultingin resourcesavings; if we hadsortedthe entire list beforehand,we would havewastedresources.

Thekey featureof theabove scenariois that theoutputspecification— thenumberofsorteditemsto compute— is not known. In fact, in this casewe aresurerof our latencyconstraint thanof theoutputspecification.Thelatency constraintimpliesconstraintsonre-sourcedemand:wemustnotusemoreCPUcyclesor transmitmorenetwork bytesthanwecanwithin 1s,at thecurrentlevel of resourcesupply. Similarly, batterylifetime constraintsimply constraintson thecomputation’senergy demand.

In this example,theright thing to do is to presentasmany sorteditemsaswecancom-putewithin our resourceconstraints.We have invertedtherolesof resourceconsumptionandoutputspecification— ratherthanfix theoutputspecificationandaim for the lowestresourceconsumptionpossible,we boundtheresourceconsumptionandproducethebestpossibleoutput.Of course,in thegeneralcase,we might not want to fix eitherof thesetoa particularvalue: we might specifyan allowablerange,andlet the systemfind the bestpossibletradeoff.

Sincethetraditionalnotionof algorithmdoesnot allow usto performthis inversion,Idefineamulti-fidelityalgorithm:

A multi-fidelity algorithmis a sequenceof computingstepsthat (in finitetime)yieldsaresultthatfallswithin arangeof acceptableoutputspecificationsor fidelity levels. Upon termination,a multi-fidelity algorithm indicatesthefidelity level usedto generatetheresult.

Thekey featureof multi-fidelity algorithmsis that they allow a rangeof differentout-puts,giventhesameinput. Theoutputsarenotall equivalent,however: eachhasadifferentfidelity. The essenceof a multi-fidelity algorithmis thatwe cantradethis fidelity for re-sourceconsumption— the resourceconsumptionin its turn affects the latency, batterydrain,andmonetarycostof thecomputation.


user request

response

nontunable params

"show document"Operation

Render

FetchFidelity

decision nontunable params

tunable params

user preferences

(image size)

(JPEG quality)

(image size)

(network, energy)

(CPU, energy)

Idle state

Figure 3.1: An exampleof an interactive,multi-fidelity operation

Theoperationsdiscussedin Section3.1canall becastasmulti-fidelity algorithms:mydefinitioncoversbothdatafidelity andcomputationfidelity. By extendingany computationwith a pre-processingstepthatdegradesits input anda post-processingstepthatdegradesits output,wecancreateamulti-fidelity computationthatsupportsany datafidelity metricsdefinedon theinput or outputdata,in additionto any computationalfidelity metrics.

3.3 Mobile, interactive,multi-fidelity applications

Multi-fidelity algorithmsareanexcellentmatchfor my targetdomain:mobile interactiveapplications.Mobility resultsin variableresourcesupply, necessitatingadaptation.Inter-activity meansthatthefinal resultof acomputationis viewedby auser, not fedinto anothercomputation.As we saw in thesortingexample,usersareimperfectsourcesof input andtolerantsinks of output. They rarely specify the exact fidelity that they desirefrom anoperation;often they will accepta lower fidelity output in exchangefor lower latency orlongerbatterylife; if they aredissatisfiedwith the resultsof adaptation,they areabletotakecorrectiveaction.

Clearly, therearesomeinteractiveapplicationswherefidelity adaptationis undesirable:e.g.a medicalimagingapplicationwhereaccuracy is of paramountimportance.However,thereis a largeclassof interactive applicationswherefidelity canprofitablybe tradedforperformance— thesearethetargetdomainof this dissertation.

We canmodelan adaptive, interactive applicationasa collectionof operations,eachof which couldbea multi-fidelity computation.Every time we begin sucha multi-fidelityoperation [83], wehaveanadaptivedecisionpoint: wecanchoosefidelity valuesto matchcurrentresourcelevelsandlatency (or energy) constraints.

Recallthe“fetch anddisplaydocument”operationof Chapter2. To savenetwork band-width,wecanJPEG-compressimagesataproxyserverbeforefetchingthem.Thelossiness

3.4. PERFORMANCE-TUNINGPARAMETERS 27

of compressiondependson theJPEGquality: a runtimeparameterto thecompressional-gorithm.HeretheJPEGquality is afidelity metric: if we reducefidelity, we reduceoutputqualityaswell asnetwork demand.Figure3.1showsthemulti-fidelity versionof thefetch-and-displayoperation.It is similar to thenon-adaptive version,with oneadditionalpieceof functionality: choosingthefidelity andactingon thatchoice.Chapters4 and5 describethedesignandimplementationof thesystemthatprovidesthis functionality.

3.4 Performance-tuningparameters

Someapplicationshave useful adaptationsthat do not affect output quality at all. E.g.,remoteexecution— runningpartorall of acomputationonamorepowerfulservermachine— doesnot changeits output. However, dynamicallychoosingthe split of computationbetweenlocal andremotehostsis still extremelyvaluablein a mobileenvironment[33]: itallows us to tradelocal resourcesfor network bandwidthandserver resources.Similarly,parametricqueryoptimization[50] tailors the executionplan of a databasequery to theruntimebuffer availability.

Suchdecisions— how to split a computation,which executionplanto use,or whetherto usea CPU-intensive ratherthana memory-intensive algorithm— areruntimeparame-tersof thecomputation.Changingtheseperformancetuning parameters affectsresourceconsumption,but not outputquality. I view performancetuning parametersasdegener-atecasesof fidelity metrics: thosethat have zeroimpacton outputquality. I extendmypreviousdefinitionof fidelity andof multi-fidelity computation:

A fidelity metric is a runtime parameterthat affects the output qualityand/orresourceconsumptionof a computation.A multi-fidelity computationis acomputationwith oneor morefidelity metrics.

3.5 Nontunable parameters

Fidelity metricsare tunableparameters: we canadjusttheir valuesto regulateresourcedemand.A computationmayhave otherparametersthataffect its resourceconsumption,but arenot tunable.They mayhave beenfixedby theapplicationor theuser, or they maybepropertiesof theinputdata.E.g.,thesizeof theinputdataoftenplaysanimportantrolein determiningresourceconsumption:thecomputationalcomplexity [19] of analgorithmis the relationshipbetweenan algorithm’s input datasizeand its resourceconsumption.In Figure3.1, thereis onenontunableparameter:theoriginal (uncompressed)imagesize.This affectsboththecompressedimagesize(andhencethenetwork usage)aswell astheCPUcostof renderingtheimage.


Nontunableparametersarenot candidatesfor adaptation:changesin their valuesareoutsidetheadaptivesystem’scontrol. However, their valuesdo impactresourceconsump-tionandhenceperformance.Thusany adaptivesystemmustbeawareof nontunableparam-etersandtheireffectonresourceconsumption.Adaptivedecision-makingwill beimprovedby takingall known nontunableparametersinto account:wecanview themastheinput tothedecisionprocess,while thetunableparametersaretheoutput.

3.6 Output quality

By outputquality I meantheuser-perceivedquality of the resultsof a computation(inde-pendentof its otheraspects,suchas its performanceor monetarycost). Clearly, outputquality dependson fidelity; it mayalsodependon externalfactorsrelatingto theuserandtheir environment. E.g. the perceived quality of an imagedisplayedby a web browserdependsonthedisplayresolution,theambientlight, theuser’seyesight,etc.By fidelity, ontheotherhand,I meanobjective,situation-independentmeasuresof how a computationisdegraded.Subjective measurementsof outputquality areoutsidethescopeof this thesis:I assumethroughoutthatoutputquality is identicalto fidelity, or that that thereis a fixed,well-known relationshipbetweenthetwo.

3.7 RelatedConcepts

In additionto theexamplesof Section3.1, I amawareof threeprevioustheoreticalexten-sionsto theconceptof analgorithmthatarerelatedto thiswork. I describetheseapproacheshere,andshow that they canbeviewedasspecialcasesof multi-fidelity algorithms(Fig-ure3.2).

Approximationalgorithmsproduceresultsthatareprovably within someboundof thetrueresult.A subsetof thisclass— polynomialtimeapproximationschemes[39] — haveatuningparameteranalogousto fidelity: theerrorbound Ç . Lowervaluesof Ç leadto longerrunningtimes,correspondingto higherfidelity requiringa higherresourceconsumption.Approximationalgorithmsaretypically of interestfor intractable(NP-hard)problems,andconcentrateon reducingasymptoticcomplexity. A recentexceptionis thework by Friezeet al. on approximatemethodsto do LatentSemanticIndexing [38], a problemfor whichpolynomialtime solutionsalreadyexisted. Similar to approximationalgorithmsareprob-abilistic andrandomizedalgorithms[64], alsousedto generatepolynomial-timesolutionsfor intractableproblems.Herethe“fidelity” metricis theprobabilityof computingacorrectanswer, ratherthantheerrorboundon theanswer.

In contrast,multi-fidelity algorithmsareapplicablein many situationswherelow-orderpolynomialsolutionsareavailable.Sorting,for instance,is È �4ÉËÊjÌ�Í_ÉÎ�

in complexity andyet the examplegiven earlier showed why one might usea multi-fidelity algorithm for

3.7. RELATED CONCEPTS 29

computationsImprecise

Multi−fidelity computations

PTAs

Approximationalgorithms

Anytimealgorithms

Any−dimensionalgorithms

Thefigureshowshow approximationalgorithms,anytimealgorithms,andimprecisecomputationscanbe viewed asspecialcasesof multi-fidelity computations.PTAs (polynomial time approxi-mationschemes)area specialcaseof approximationalgorithms;any-dimensionalgorithmsandimprecisecomputationsaretwo differentgeneralizationsof anytimealgorithms.

Figure 3.2: Conceptsrelatedto multi-fidelity algorithms

this purpose.In therealworld, evena reductionof time by a constantfactoris extremelyvaluable.Further, classicalcomplexity measurestypically only measureCPU usage,andoccasionallyresourcessuchas memoryand network. They do not considerenergy: acritical resourcein mobilecomputing.

A multi-fidelity algorithmmay be composedof diverse,unrelatedalgorithms,oneofwhich is dynamicallyselectedbasedon runtimetradeoffs. Thusapproximationalgorithmsareproperlyviewedasa specialcaseof multi-fidelity algorithms.

Anytimealgorithms[21] andtheir generalization,any-dimensionalgorithms[66], area secondimportantextensionto the conceptof an algorithm. An anytime algorithmcanbe interruptedat any point during its executionto yield a result— a longerperiodbeforeinterruptionyieldsa betterresult. Any-dimensionalgorithmsaresimilar, exceptthat theyallow moregeneralterminationcriteria. Sincea rangeof outcomesis acceptable,theseclassesof algorithmscanbeviewedasmulti-fidelity algorithms.However, theirgeneralityis restrictedby the requirementthat valid resultsbe availablenot only on completionbutat all times. Hence,anytime andany-dimensionalgorithmsarealsosubsetsof the moregeneralclassof multi-fidelity algorithms.

Thethird extension,imprecisecomputations[29,30,48], supportsgracefuldegradationof real-timesystemsunderoverloadconditions.Eachcomputationis modeledasamanda-tory partfollowedby anoptionalanytimepartthatimprovestheprecisionof theresult.Thereal-timeschedulerensuresthatthemandatoryportionof everytaskmeetsits deadline,andthattheoverallerroris minimized.Sincethey allow multipleoutcomes,imprecisecompu-tationsareclearlyinstancesof multi-fidelity algorithms.However, their restrictedstructureandreal-timefocusmakesthemasubsetof thelatterclass.

Chapter 4

A multi-fidelity API

HadI beenpresentat theCreation,I wouldhave givensomeusefulhintsfor thebetterorderingof theuniverse.

(Alfonso theWise)

Chapter3 claimedthatamobileinteractiveapplicationshouldbeviewedasacollectionof multi-fidelity computations,for thepurposesof adaptation.How arethesemulti-fidelityapplicationswritten?Whatsystemsupportdo they need,andwhatAPI shouldthey usetoaccessit?

ThischapterdescribestheprogramminginterfacethatI havedesignedfor multi-fidelityapplications;Chapter5 describesthe runtime systembehindthe interface. I start withsomebackground:theprinciplesthatguidedmy designdecisions,andtheprevioussystemswhich influencedthem. I show that thereis a mismatchbetweenapplication-level andsystem-level abstractions,andintroduceapplication-specificpredictors asa way to bridgethis gap.I thendescribethethreecomponentsof theprogramminginterface:library calls,application configuration files, and hint modules. I concludethe chapterwith a simpleexampleusing the API, and outline a 5-stepprocessfor porting applicationsto usethemulti-fidelity interface.

4.1 Background

4.1.1 Guiding principles

My primary designgoal wasa low barrier to deployment: to make it easyto createandrun multi-fidelity applications.Writing an applicationfrom scratchis often prohibitivelyexpensive: thus,I wasconcernedto minimizethecostof porting existingapplications. Toeasedeployment,I wantedtheapplicationsandtheruntimesupportto run on off-the-shelfplatforms.Theseconsiderationsled to thefollowing designprinciples:

31

32 CHAPTER4. A MULTI-FIDELITY API

O high-levelAPI: provide abstractionsthatareeasyfor applicationsto use,ratherthanthosethatarenaturalfor theOSto provide.O small, minimally invasiveAPI: the fewer modificationsmadeto applicationsourcecode,thebetter.O configurationfiles: wherepossible,usetext configurationfilesto communicateinfor-mationto thesystem,ratherthanAPI callsembeddedin theapplication.Thesefilesareeasierto write, maintain,andmodify thanapplicationsourcecode.O minimalism: wheneverpossible,useexistingOSfeaturesratherthaninventnew ones.I useLinux asabasesystem,andmy systemrunson two popularkernelversions—2.2 and2.4 — andtwo hardwarearchitectures— x86-basedPC andStrongARM-basedItsy [8].

4.1.2 Assumptions

My focusis to provideeffectivesupportfor adaptation:I assumethatany applicationusingthis supporthassomeinherentcapability to adapt. In otherwords, the applicationmusthavesometunableparameters.Further, I assumeaccessto thesetunableparameters,eitherthroughsourcecodeor somecodeinterpositioningtechnique.E.g.,Puppeteer[20] manip-ulatesthebehaviour of Windows applicationcomponentsthroughwell-definedinterfacessuchasDCOM. Of the four casestudiesin this dissertation,threeareof source-availableapplications.The fourth is anunmodifiedwebbrowseraugmentedwith an HTTP proxy:theproxy, for which I hadaccessto sourcecode,transparentlyadaptsimagequality on thebrowser’sbehalf.

4.1.3 Historical context: Odyssey

Thiswork grew outof previouswork in Odyssey [71], asystemthatsupporteddatafidelityadaptationto changesin network bandwidth. The key abstractionin Odyssey was theresourcecallback: applicationsregisteredatolerancewindow with Odyssey, andreceivedacallbackwhenresourcelevelsstrayedoutsidethistolerancewindow. Thecallbacktypicallytriggeredanadaptationof datafidelity andanew tolerancewindow registration.

My implementationis basedon Odyssey: it extendsthe Odyssey API andinfrastruc-tureto supportmulti-fidelity computation.It usesexisting featuresof Odyssey, suchasthenetwork bandwidthmonitor. Someof my designprinciples— minimalismanduser-levelimplementation— areinheritedfrom Odyssey. Others— thehigh-level API andthe ac-centon minimal sourcecodemodification— arebasedon pastexperiencewith portingapplicationsto Odyssey.

4.2. THE IMPORTANCE OF PREDICTION 33

4.2 The importance of prediction

Thereis a basicmismatch,or semanticgap,betweenadaptive applications,the OS, andusers:O applicationprogrammersdealin application-specificparameters:fidelitiesandnon-

tunableparametersO theOSmanagessystemstateandresourcesO userscareaboutusermetrics:outputquality andperformance

Which of thesesetsof abstractionsshouldthe API support? If the goal is to mini-mizeapplicationprogrammerburden,thentheAPI shouldtalk aboutfidelities,ratherthanresources.Similarly, user-specifiedpoliciesandconstraintsshouldbein termsof usermet-rics.

How will the systemtranslatebetweenfidelities, resources,andusermetrics? Fromtheresourcemodel(Chapter2), performanceis a functionof resourcesupplyanddemand;supply is a featureof systemstate;anddemanddependson applicationbehaviour (i.e.,on tunableandnontunableparameters).My designencapsulatesthesedependenciesintocomponentscalledpredictors:O Resource demandpredictors predictapplicationresourcedemandasa function of

fidelity andnontunableparameters.O Resource supplypredictors predict the resourcesupplyavailable to an applicationfrom observationsof systemstate.O Performancepredictorspredictperformancemetrics— latency batterylifetime,mon-etarycost— asa functionof resourcesupplyanddemand.

In addition to latency, battery lifetime, and monetarycost, mobile interactive usersare concernedwith output quality and user distraction. The latter are not performancemetrics,andcannotbeexpressedasresourcecostfunctions. This thesisdoesnot addressthe constructionof predictorsfor outputquality or userdistraction:however, the systemdesignallows for suchpredictorsto beeasilypluggedin, if available.

Supplypredictorsaregenericsystemcomponents;performancepredictorsareapplication-specific,but a single predictorwill often serve a large classof applications. Chapter5describessupplypredictorsfor CPU,memory, network bandwidth,andenergy, aswell asthe generic“latency” and“battery depletioncost” predictorsusedby all the applicationsstudiedin this thesis.My currentprototypedoesnotcontainamonetarycostpredictor.

Demandpredictorsareclearlyapplication-specific:however, I wishedto automate,totheextentpossible,thetaskof constructingsuchpredictors.I outlinemy generalapproachhere:Chapter7 describesandevaluatesthespecificpredictorsthatI or othersbuilt aspartof my applicationcasestudies.


4.2.1 Demandprediction thr ough loggingand learning

My approachto predictingresourcedemandis empirical.We samplethefidelity spacebyrunningtheapplicationat differentfidelities; we log the resourcedemandat eachsamplepoint; andwe usemachine learning techniquesto build predictorsbasedon thesamples.At runtime,thesystemcontinuesto observeapplicationresourcedemand:this informationis usedby thepredictorsto updatethemselvesusingonlinelearning techniques.

Onecouldimagineananalyticapproachto thesameproblem.Algorithmic complexityanalysis[19] cangiveusCPUconsumptionasanasymptoticfunctionof inputparameters.In therealworld, however, constantsmatter, andtheseconstantsvary from onehardwareplatformto another. We couldattempta moredetailedanalysis,basedon processorspec-ification sheets.With modernprocessors,this is virtually impossible:we would needtoaccountfor super-scalarexecution,branchmisprediction,TLB misses,andothercompli-catingfactors.Further, this will only giveusCPUconsumption,andnot memory, networkbandwidth,or batteryenergy consumption.

I proposethat algorithmiccomplexity be usedasa startingpoint: to guidethe learn-ing algorithmsthat processlog data. This allows us to specializea generalasymptoticfunctionalform to thespecifichardwareon which theapplicationruns. We canalsospe-cializeour predictionsto thespecificinput dataon which theapplicationoperates,insteadof alwayspredictingaworst-caseor average-casescenario.

4.3 Multi-fidelity API

Basedon thedesigngoalsexplainedhere,I havedesignedanAPI thatisO small: thebaseAPI has3 C library calls.O high-level: fidelities are the basicabstractions,and the applicationdoesnot dealdirectlywith resources.O usesapplication configuration files: thesedescribethe application’s fidelities andnontunableparametersto thesystem.O synchronous: applicationsrequesta fidelity decisionat eachdecisionpoint, ratherthanwait for thesystemto suggestachange.A synchronousAPI is easierto programto: theperformanceadvantagesof asynchrony canbeobtainedby a library layerthatexports the synchronousAPI, but usescallbacksand cachingto interactwith theunderlyingruntimesystem.O supportsapplication-specificpredictors, in theform of hint modules.

The remainderof this sectiondescribesin detail the C library calls, the applicationconfigurationfile syntax,andthehint moduleinterface.

4.3. MULTI-FIDELITY API 35

4.3.1 C library calls

The basicAPI consistsof 3 calls (Figure4.1). The applicationcalls register fidelity onstartup:thesystemthenloadsthespecifiedapplicationconfigurationfile andhint modules(predictors). At the beginning of eachmulti-fidelity computation,the applicationcallsbegin fidelity op. Thevaluesof nontunableparametersarepassedasinputs,andthetunableparametersarereturnedasoutputs.Theapplicationthenrunsthemulti-fidelity computationwith theparameterssuggestedby thesystem.On completion,it callsendfidelity op: thislets the systemmeasurethe resourceconsumptionandlatency of the computation.Thisinformationis loggedfor futureexamination,andalsousedto updatetheresourcedemandandperformancepredictors.

4.3.2 Application Configuration Files

In orderto choosefidelity valuesfor an application,the systemmustknow what fidelitymetricstheapplicationhas,andwhat their allowablevaluesare. Thesearedeclaredin anapplicationconfiguration file (ACF) with a simple declarative syntax: a sampleACF isshown in Figure4.2. EachACF correspondsto anoperationtype: thustheAPI supportsmultipleoperationtypesperapplication,althoughthecurrentsystemprototypedoesnot.

TheACF declareseachfidelity metricwith thekeyword fidelity , andeachnontunableparameterwith thekeyword parameter. Eachof thesecanbeorderedor unordered.Or-deredmetricshave real-numberedvaluesin somerange,which maybecontinuous,or dis-cretizedwith somestepvalue.An unorderedmetrictakesoneoutof afixedsetof possiblevalues,representedasstrings.

TheACFalsoprovidesauniquedescriptorstringthatdescribesthetypeof multi-fidelitycomputation;a log file for observationsof resourcedemand;andthemethodto beusedinselectingfidelity valuesat runtime. Thelatter is indicatedby themodekeyword, andhasoneof two values:normal, in which thesystemtries to pick themostappropriatefidelityvaluesgiven the currentresourcelevels,and training , wherethe systemsamplesfidelityvaluesrandomly. The training modeis usedoffline to generatelogs that cover the entirefidelity space,whicharethenusedto constructpredictors.

Hint modules

TheACF alsopointsto a hint module:a binaryobjectfile that is dynamicallyloadedintothesystemon applicationstartup.Thehint modulecontainsresourcedemandpredictors,which arespecifiedin theACF asentrypointsinto themodule.PredictorsarewrittenasCfunctionswhich receive thefidelity andnontunableparametervalues,anda “snapshot”ofthe currentresourcesupplylevels,asinput (Figure4.3). Themodulecanalsospecifyanapplication-specificlatency predictorto overridethegenericlatency predictor. Thelatency


/* register operation type with runtime system. Input is a pointer to theApplication Configuration File for that operation type. Output is aunique identifier for the operation type. */

int register_fidelit y( IN char *conf_file, OUT int *optype_idp);

/* query runtime for appropriate fidelity, before starting an operation.

inputs are:dataname: a unique identifier for the operation’s input dataoptype_id: the identifier returned by register_fideli tynum_params: number of non-tunable parametersparams: array of non-tunable parameter valuesnum_fidelities: number of tunable parameters (fidelities).

outputs are:fidelities: array of tunable parameter (fidelity) valuesopidp: unique identifier for this operation

*/int begin_fidelity_o p( IN const char *dataname, IN int optype_id,

IN int num_params, IN fid_param_val_t *params,IN int num_fidelities,OUT fid_param_val_ t *fidelities,OUT int *opidp);

/* report completion of an operation. Inputs are the application id,the operation id, and a failure code. The latter can be

SUCCESS: successful completionABORT: If the program crashes without calling end_fidelity_op , the

runtime will log a value of ‘‘ABORT’’; it is not meant tobe passed in directly to this function.

USER_ABORT: operation aborted by userRSRC_ABORT:operation aborted due to resource constraint violationFAILED: operation failed for some other reason

*/int end_fidelity_op( IN int optype_id, IN int opid, IN failure_code failed);

Figure4.1: The multi-fidelity API

4.3. MULTI-FIDELITY API 37

description radiator:radios it ylogfile /usr/odyssey/et c/ ra di at or. ra di os it y.l ogmode normalconstraint latency 10param polygons ordered 0-infinityfidelity algorithm unordered progressive hierarchicalfidelity resolution ordered 0.01-1

hintfile /usr/odyssey/lib /r ad_hint s. sohint cpu radiator_radiosi ty _c pu_hi nthint memory radiator_radios it y_ memor y_ hi nthint latency radiator_radio si ty _la te nc y_ hi ntupdate radiator_radiosi ty _updat eutility radiator_radios it y_ ut il ity

ACFfor theradiositycomputation.Thereis onenontunableparameter— thenumberof polygonsin theinput data— andtwo tunableparameters— thechoiceof radiosityalgorithm,andtheres-olution. Thefile alsospecifiesa latency constraintof 10 secondson every radiositycomputation,andprovidesahint modulewith CPU,memory, andlatency predictors.

Figure 4.2: Example Application Configuration File

typedef int (*hint_func_t)( IN char *dataname,IN fid_param_val_t *params,IN fid_param_val_t *fidelities,OUT double *val,IN struct snapshot *res_snapshot,IN struct server *whichserver);

typedef int (*update_func_t )( IN char *dataname,IN fid_param_val_t *params,IN fid_param_val_t *fidelities,IN int numvals, IN struct res_value *vals,IN failure_code failed);

Predictorsandutility functionsareof type hint func t. They receive asinput a uniqueidentifierfor theoperation’s inputdata(dataname), thenontunableparameterandfidelity values,aresourcesupplysnapshot,andtheserver chosenfor remoteexecution(if any). They returna floatingpointvaluewhich is the predictedresourceconsumption.Updatefunctionsareof type updatefunc t,andarepassedthedataname,fidelity andparametervalues,anarraycontainingresourcedemandandlatency values,andthefailurecodepassedby theapplicationto endfidelity op.

Figure4.3: Signaturesfor hint module functions


predictorinvokestheresourcedemandpredictors,andthencomputeslatency asa functionof resourcesupplyanddemand.

The hint modulecanalsocontainan updatefunction. This function, if present,willbeinvokedat theendof eachoperationandpassedthefidelity, nontunableparameters,theresourceconsumptionandperformanceof that operation. It canthenupdatethe internalstateof thepredictorsusingthis information.

Finally, the hint modulecontainsa utility function: a function that mapseachadap-tive choiceto a numberrepresentingusersatisfactionor utility. Thusthe utility functionencodesthe user’s policy for trading off betweenfidelity and performance.The utilityfunction is passedthefidelity andnontunableparameters;it cantheninvoke the resourcedemandand/orperformancepredictors,andusethesepredictionsto calculatethedesirabil-ity of any particularfidelity choice.Theutility functionis expectedto returnarealnumberin therange Ï�Ð�ÑJÒ?Ó .Resourceconstraints

As wesaw in Chapter3, oneof theadvantagesof multi-fidelity computationsis theabilityto invert therolesof fidelity andresourceconsumption:to producethebestfidelity, whilekeepingresourcedemandwithin somebound. I call theseboundsresource constraints.TheseresourceconstraintsmightbeobtainedÔ from the user: e.g. a soldieron the battlefieldmight want to limit radio network

transmissionto avoid detection.Ô by observingresourcesupply:e.g. thesupplyof memoryactsasa constrainton thedemand,if weassumetheworkingsetmodelfrom Chapter2.Ô from higher-level performanceconstraints:auser-specifiedboundon latency resultsin aboundonCPUdemand;thevalueof theCPUboundvarieswith theCPUsupply.

Resourceandlatency constraintscanbespecifiedstaticallyin theACF;they canalsobeaddedor updatedat runtimethrougha hint constraint call. Optionally, theapplicationcanalsospecifya toleranceboundandaconstraint violation callback function: if theresourceusage(or latency) of anoperationexceedsthe tolerancebound,thecallbackfunctionwillbeinvoked.Currently, my prototypesupportscallbacksonly for latency constraints.

4.4 Exampleuseof API

Figure4.4 shows an exampleuseof the basicmulti-fidelity API by a radiosityapplica-tion. During initialization, the programcalls register fidelity and the systemreadstheApplication ConfigurationFile shown in Figure4.2. The applicationthenwaits for userrequests.Whenit receivesa userrequest,it invokesbegin fidelity op, passingin thenameof theinput datafile andthenumberof polygonsit contains(i.e.,anontunableparameter).

4.4. EXAMPLE USEOFAPI 39

...int optype_id;register_fidelit y( "/ us r/ ody ss ey /e tc /ra di at or .r adi os it y. co nf" ,

&optype_id);...while (!done) {

int opid, success;char *objname, *algorithm;double num_polygons, resolution;fid_param_val_ t nontunables[1], tunables[2];...get_user_reque st (& obj name, &num_polygons);...nontunables[0] .o rd ere d_va l = num_polygons;begin_fidelity _op( obj name, optype_id, 1, nontunables,

2, tunables, &opid);resolution = tunables[0].or dered _v al ;algorithm = tunables[1].uno rd ere d_va l;...success = do_radiosity(ob jn ame, resolution, algorithm)...end_fidelity_o p( optyp e_id , opid, success);

}...

Figure 4.4: Sampleuseof multi-fidelity API


begin fidelity op returnsthetunableparametervalues:thealgorithmandresolutionto use.After completingtheradiositycomputation,theapplicationcallsendfidelity op.

4.5 Conclusion

Theprocessof portinganapplicationto themulti-fidelity API consistsof fivesteps:Ô Describe: Createan ACF that list the application’s fidelity metricsandnontunableparameters.Ô Modify: Insertcallsto themulti-fidelity API into theapplication.Ô Measure: Acquirea log of resourcedemandat differentfidelity values,by usingthesystemin “training” modeandrunningtheapplicationrepeatedly.Ô Learn: Usemachinelearningtechniquesto build demandandperformancepredictorsbasedon thelogs.Ô Hint: Build a hint modulecontainingthe application’s utility function andthe pre-dictors.

In Chapter7 I show, throughfour applicationcasestudies,that the one-timecostofporting applicationsis modest;in Chapter8 I show that the runtime benefitsof multi-fidelity adaptationareconsiderable.

Chapter 5

Systemsupport for multi-fidelitycomputation

I mustCreateaSystem,or beenslav’d by anotherMan’s.(W. Blake,Jerusalem, pl.10,1.20)

What systemsupportdoesthe multi-fidelity API require? What functionality canbesuppliedbygenericsystemcomponentsratherthanapplication-specificones?If application-specificcomponentsarerequired,how canthesystemaid in building them?

In this chapterI describethedesignandimplementationof runtimesupportfor multi-fidelity computation.I outlinetheprinciplesthatguidedthedesignprocess;I thendescribethe high-level design,followed by a descriptionof eachkey component.Someof thesecomponentsweredevelopedby otherresearchers:thesearedescribedbriefly, while com-ponentsdevelopedby measpartof thisthesiswork aredescribedin moredetail. I concludewith a micro-benchmarkbasedmeasurementof the overheadsimposedby eachruntimecomponent.

5.1 DesignRationale

Deployabilitywasthekey considerationin my runtimedesign,just asapplicationportabil-ity wasin theAPI design.I wantedto deploy bothapplicationsandruntimeonoff-the-shelfplatformswith minimaleffort. This led to adesignphilosophythatfavoursÔ minimalism: I usefeaturesof theunderlyingOSwherepossible,ratherthanreimple-

mentthem.Ô agray-box[6] approachthatavoidskernelmodification.My runtimeis implementedasa user-level processthatoccasionallyrelieson knowledgeof kernelinternals,butneveron modificationof theseinternals.

41

42 CHAPTER5. SYSTEMSUPPORT FORMULTI-FIDELITY COMPUTATION

What runtimesupportdoesthe multi-fidelity API require?The runtimesystem’s pri-maryresponsibilityis to makefidelitydecisionsfor theapplicationat thebeginningof eachmulti-fidelity operation.Secondarily, thesystemshouldprovidemonitoringandloggingofapplicationresourcedemand,for performancedebuggingpurposesandto build predictorsfor resourcedemandin thefuture.

The focus of this dissertationis application adaptation:at eachdecisionpoint, thesystemtriesto pick theapplication’s tunableparametervaluesto achieve thebestpossibletradeoff betweenperformanceandoutputquality. Clearly, applicationperformanceis alsoaffectedby OSbehaviour: changesin resourceallocation,CPUfrequency or diskspindownpolicy will affect theapplication’s resourcesupply. My approachis to observeandpredictsystembehaviour — i.e. resourceavailability andallocation— but not to modify it.

A purelypredictiveapproach[23] hasseveraladvantages:Ô it preservestheassumptionsof my resourcemodelby avoiding cyclic dependenciesbetweenresourcesupplyanddemand.Ratherthanattemptto manageboth supplyanddemand,my approachpredictssupplyandregulatesonly demand.Ô noQoSguaranteesarerequiredfrom theunderlyingOS:I requireaweakerproperty,thatof predictabilityof resourceallocation.Predictabilityrequiresonly thattheOS’sallocationalgorithmsbeknown, andthattheOSexportsufficiently detailedstatisticsto enableus to second-guessits resourceallocationdecisions.However, it doesnotrequirethat theseallocationdecisionsconform to any pre-arrangednotion of fairshare,proportionalshare,or guaranteedrate. We will seethat while the allocationdecisionsof a realOS(Linux) arenot perfectlypredictable— not all of therelevantstateis visible at userlevel — we canstill build predictivemodelsthatwork well inpractice.Ô theapproachworksequallywell for resourceswhereguaranteesareimpossible:e.g.,wirelessnetwork bandwidthis affectedby environmentalconditionsthatareoutsidetheOS’scontrol.Ô legacy, non-adaptiveapplicationsareassuredthattheir resourceallocationsandper-formancewill not be affected: the OS continuesto make the sameallocationdeci-sionsasbefore.Ô a prediction-onlyapproachmakesit easierto avoid OS modificationusinga gray-box approach:we canobserve andpredictsystembehaviour entirely at userlevel,perhapswith knowledgeof kernelinternalsbut without needingto modify them.

Thedisadvantageof aprediction-onlyapproachis alackof integrationbetweensystem-level resourcemanagementandapplication-level adaptation.Whentherearemultiple ap-plicationsrunning,thesystemcouldpotentiallymake betterdecisionsif it knew how eachapplicationwould adaptin responseto thosedecisions.This thesisdoesnot addressthehard problemof integrating resourceallocation,hardware management,andapplicationadaptation:the systemleavesthe first two to the default OS mechanisms,andpassivelyobserveandpredicttheireffecton applicationresourcesupply.

5.2. DESIGN 43

Solver

Demand predictors(CPU, network, remote CPU,)

energy, ....

Utility function

DemandmonitorsCPU, network,

memory, energyremote CPU/memory

predictors

network, energyCPU, memory,

remote CPU/memory

Supply

Performance predictors(latency, battery drain)

Logger

App

begin_fidelity_op

end_fidelity_op

updates

utilityqueries

Hint module

Genericperformance

latency

predictors

resource supplysnapshot

(1) (3)

(4)

(5)

(6)

(7)

(8)

(2)

Figure 5.1: Systemsupport for the multi-fidelity API

5.2 Design

Figure5.1shows thehigh-level designof theruntimesystem.Its primaryfunctionality—makingfidelity decisions— is triggeredby a begin fidelity op call from the application,andis implementedby thefollowing steps:

1. Theapplicationpassesin thevaluesof thenontunableparameters.2. A set of supplypredictors predictsthe application’s resourcesupply for the near

future.This informationis capturedin a “resourcesupplysnapshot”.3. A solversearchesthespaceof tunableparametersto find thebestpossiblevalues.It

evaluatescandidatevaluesfor the tunableparametersby computingtheir goodnessor utility.

4. Utility is computedby the application-specificutility function in the hint module.This function is given the tunableandnontunableparametervaluesaswell as theresourcesupplysnapshot.

5. The utility function invokesperformancepredictors that computethe latency andbatterydrainresultingfrom this particularchoiceof tunableparameters.

6. Performancepredictorsin their turn, call application-specificresourcedemandpre-dictors; they thencombinesupplyanddemandto computeperformance.

Thesecondpieceof functionality— monitoringandlogging— is triggeredby a callto endfidelity op:


7. Resourcedemandmonitorscomputetheaggregateresourcedemandof theoperationjust concluded.

8. A logger recordstheresourcedemand,tunableandnontunableparametervalues,andthesuccess/failurecodeof theoperationin adiskfile.

I divide the runtime designinto generic and application-specificcomponents.Thesolver, supplypredictors,demandmonitors,and loggeraregenericsystemcomponents.Thehint module,which containstheutility function,demandpredictors,andperformancepredictors,is application-specific.Althoughperformancepredictionis application-specific,asimplelatency predictionschemeworksfor a largeclassof applications.I includeamongthegenericsystemcomponentsa latency predictorthatis usedby all theapplicationsstud-ied in this dissertation.

Thischapterdescribestheimplementationof thegenericsystemcomponents:thesolver,thesupplypredictors,thegenericlatency predictor, thedemandmonitorsandthelogger.

5.3 The solver

Thesolver’s taskis to searchthespaceof tunableparametersandfind thesetof valuesthatmaximizetheoperation’sutility. It supportstwo basictypesof parameter:Ô discrete: thesehave a finite setof possiblevalues:e.g. integersfrom somerange,or

oneof asetof enumeratedvalues.Ô continuous: thesetakeareal-numberedvaluefrom aspecifiedrange.In otherwords,thesetof possiblevaluesis infinitebut bounded.

My searchstrategy is very simple: for discreteparameters,the systemexhaustivelysearchesall combinationsof values. Although this approachhascomplexity exponentialin the numberof parameters,it works well whenthe numberof parametersis small. Forcontinuousparameters,it finds theoptimalvaluesby a gradient-descentapproach.Whentherearebothdiscreteandcontinuousparameters,I combinethetwo strategies: for everycombinationof discreteparametervalues,the systemrunsthe hill-climbing algorithmtofind theoptimalvaluesof thecontinuousparameters.I have foundthat this simplesolveris sufficient for theapplicationsI studied;moresophisticatedsolverscaneasilybepluggedinto thesystemwithout modifyingothercomponents.

5.3.1 Restrictedvs. unrestricted utility functions

Thedifficulty of thesolver’s taskdependslargely on theutility function. We canbroadlyclassifyutility functionsinto two types,restrictedandunrestricted.

Restrictedutility functionsareconstrainedto someparametrizedform, e.g.polynomialor piecewise linear. Specifyinga utility functionconsistsof specifyingtheparameterval-

5.3. THE SOLVER 45

ues:e.g. thecoefficientsof a polynomial. With restrictedutility functions,thesolver canexploit its knowledgeof theunderlyingfunctionalform for faster, moreaccurate,or morerobustoperation.E.g.Lee[57] hasshown efficient,near-optimalsolutionsbasedonintegerprogrammingfor weighted-sumutility functionsoverdiscrete,orderedparameterspaces.

Thedisadvantageof restrictedfunctionsis thatwemustcoerceall utility functionsintothespecifiedform. I intendutility functionsto beageneralexpressionof applicationpolicy,andit is not clearif any givenform will suffice for all usefulapplicationpolicies.Further,utility is a functionof performance,which is a functionof resourcedemand.This means,for example,that a linear utility function is only usableby applicationsfor which bothresourcedemandandperformancearelinearfunctionsof thetunableparameters.

I decidedto useunrestrictedutility functions:I only constraintherangeof thefunctionto Ï�Ð�ÑJÒoÓ andrequirethat the sameinputswill alwaysproducethe sameoutputs(idempo-tency). ThisallowedmetoÔ useaprocedure-callinterfacebetweenthesolverandtheutility function.This inter-

faceis standardandindependentof solver implementation.Ô write utility functionsin C, ratherthandevise a specialsyntaxfor them. TheseCfunctionsarecompiledandloadedinto thesolver’saddressspace.

Thedisadvantagesof this approachare:Ô utility functionsareTuring-completeandthesolver canmake no assumptionsabouttheir behaviour. A pathologicalutility function cancausethe searchto fail, to beinefficient,or to generatesuboptimalresults.Ô loadingutility functionsinto thesolver’s addressspaceis unsafe:a buggy or mali-ciousfunctioncancrashthesystem.

Unrestrictedutility functionsgiveussimplicity, generality, andefficiency attheexpenseof safetyand robustness. In Section10.3 I will discussalternative representationsthataddresstheselimitations.

5.3.2 Search vs. feedback-control

An alternative modelto search-basedadaptationis local feedback-control: asthe systemruns,it continuallyevaluatesits currentperformance,andincrementallyadjustsfidelity up-wardsor downwardsasrequired.A purely local approachhasmany drawbacks.It cannotrespondquickly yet accuratelyto dramaticchangesin theenvironment:thatrequireslargeyetaccuratechangesto tunableparametervalues,whichareimpossibleif it cannotpredicttheeffectof thesechangesonperformance.Thus,many iterationsof thefeedbackloopwillberequiredbeforeweconvergeonthecorrectparametervalues.Further, if therearemulti-ple tunableparameters,we cannot know which onewill giveusthedesiredimprovement:wemusttry eachin turn,makingadaptationevenlessagile.


In my design,I usefeedback-controlonly to augmentprediction.E.g.,batteryenergyis managedthroughgoal-directedadaptation[34], a form of feedback-control.I alsousefeedbacktechniqueswhenapplicationbehaviour variesin waysthatthepredictorsdid notanticipate. In suchcases,I usefeedbackto correct the predictors,rather than directlyadjustingtheapplication’s tunableparameters.

5.4 Utility functions and resourceconstraints

In additionto utility functions,applicationsor usersmight have per-operationconstraintson resourcedemandor cost(e.g. latency). As we saw in Chapter3, a valuablefeatureofmulti-fidelity computationis the ability to invert the rolesof outputquality andresourceconsumption:to say“givemethebestpossiblequalitywithoutusingmorethan Õ unitsofresource.” Resourceor performanceconstraintsmight arisefrom thesystem,theapplica-tion programmer, or theuser:Ô To satisfy battery lifetime goals, the runtime systemdynamicallysetsconstraints

on per-operationenergy usage,basedon the currentbatterylevel and the desiredbatterylifetime. Similarly, to ensurethatexcessive memorydemanddoesnot causethrashing,the runtime systemcomputesa constrainton eachoperation’s memorydemand,basedon thecurrentlyavailablememorysupply.Ô The applicationprogrammercanspecifya latency constraintin the ACF, basedontheir knowledgeof acceptableresponsetimesfor theoperation.Ô the usercan override the application’s default constraints,or addadditionalones.E.g.,asoldieron thebattlefieldcouldconstrainnetwork transmissionto avoid detec-tion.

I assumethatperformanceandresourceconstraintshavea binaryeffect on userutility.If the constraintis violated, then utility is greatly reduced;if the constraintis satisfied,utility is unaffected.E.g.,whena memoryconstraintis violated,thesystemthrashes,andutility is nearzero.As longaswestaywithin theconstraint,memorydemandhasnoeffecton utility. I representthis binaryeffect by multiplying a stepfunction(Figure5.2(a))intotheutility function. Whentheconstraintis violated,thestepfunction’s valueis Ð , andtheutility is forcedto Ð . Whentheconstraintis satisfied,thevalueis Ò , andutility is unchanged.Theadvantageof this multiplicativeapproachis thatwe cancomposeanarbitrarynumberof functionsin this way, yet ensurethattherangeof theresultingfunctionis always Ï�Ð�ÑJÒoÓ .5.4.1 Constraints assigmoids

The disadvantageof stepfunctionsis that they do not allow us to specify toleranceor“slack”. E.g.,theusermaydesirea latency constraintof 100ms: all valuesbelow this areequallygood.Above100ms,theuser’s utility degradessteadilyuntil it is nearzeroat 1s:

5.4. UTILITY FUNCTIONSAND RESOURCECONSTRAINTS 47

-1

-0.5

0

0.5

1

1.5

2

0 0.5 1 1.5 2

Util

ity

Latency (sec)

-1

-0.5

0

0.5

1

1.5

2

0 0.5 1 1.5 2

Util

ity

Latency (sec)

(a) Stepfunction

-1

-0.5

0

0.5

1

1.5

2

0 0.5 1 1.5 2

Util

ity

Latency (sec)

-1

-0.5

0

0.5

1

1.5

2

0 0.5 1 1.5 2

Util

ity

Latency (sec)

(b) Sigmoid(0.1sectolerance)

Figure 5.2: Utility function for a latencyconstraint of 1 sec

all valuesabovethisareequallybad.Thus,theregion theregionbetween100msand1s isa tolerancezone, wheretheuser’sutility degradesgracefullyratherthansharply.

To incorporatethe notion of tolerance,I usesigmoid(Figure5.2(b)) ratherthanstepfunctions.A sigmoidfor aconstraintvalue Ö anda tolerance× hastheformÒÒwØXÙlÚ|ÛJÜ,ÝJÞ ©"ßIt goesfrom Ò at à�á to Ð at Ø¼á . Thecurvehasthreeregions:the“good” part Ï(à�á�ÑWÖ�à¼× Ó ,thetolerancezone Ï�Öâà�× Ñ/Ö�Øu× Ó , andthe“bad” part Ï¤ÖÁØh× Ñ5Ø¼áXÓ . Theutility at thetolerancezoneboundariesis 0.88and0.12respectively; in thetolerancezone,utility degradesalmostlinearly. This meansthat,evenin the“bad” zone,utility canbeashigh as0.12. To avoidthis, I actuallyusethesigmoid ÒÒwØUÙ?ã ÚäÛJÜ,Ý?Þ ©"ßwhich hasboundaryvaluesof 0.98 and0.02: the multiplier of å ensuresthat the utilityoutsidethetolerancezoneboundariesis verycloseto 0 (or 1).

My runtimesystemprovidesa helperfunction that computessigmoidsfor any givenconstraintandtolerancevalues.Applicationutility functionscanuseit to incorporateuserconstraintson latency or resourcedemandinto theutility. System-determinedconstraintsonmemoryandbatterydrainaremultiplied in by theruntimesystembeforethefinal utilityfunctionis providedto thesolver.


5.5 Systemresourcepredictors

My currentprototypehasfive supplypredictors— for CPU, memory, network, energy,and file cache— one genericperformancepredictor— for latency — and one genericdemandpredictor— for file cache. Of these,the CPU, memory, and latency predictorsweredevelopedaspartof this thesiswork: I describethemin detail,andtheothersbriefly.

5.5.1 CPU

Thiscomponentpredicts,at thestartof eachmulti-fidelity operation,theCPUsupplyavail-ableto it in cycles/s.It is basedonanumberof simplifying assumptions:Ô the computationof interestis containedwithin one processor thread,and its be-

haviour is independentof theotherprocessesin thesystem.Ô this processis entirelyCPU-boundduringthecomputationÔ the scheduleris round-robin: it givesan equalprocessorshareto all runnablepro-cesses.Ô we have temporal locality: the recentpastpredictsthe nearfuture whenpredictingthenumberof runnableprocessesÔ temporallocality operatesat all timescalesof interest.I.e., theaveragebehaviour ofthelast1spredictstheaveragefor thenext 1s, thelast10spredictthenext 10s,etc.In this dissertationI amconcernedwith timescalesbetween0.5sand50s.

Theseassumptionsareidealistic,butwecanderivefromthemasimplepredictivemodelthatis effective in practice.I predictaprocessæ ’sCPUsupplyover thenext ç secondsasè éëêì�íZîâï?ðwhere

îis theprocessorspeed,and

ðis thepredictednumberof runnableprocesses— the

processorload — over thenext ç seconds.To predictð

, thesystemperiodicallysamplesthe instantaneousloadaveragefrom /proc/loadavg andsmoothesthis measurementusinganexponentialdecayscheme: ðòñ(ó�ôzíöõ�ðòñ Øö÷2ÒIà õ>ømù�ñHere

ð¼ñis thesmoothedvaluecorrespondingto the ú ’ th raw measurement

ù�ñ. Thedecay

constantõ

is computedbasedon thedesireddecaytimeconstantç ª :õ�í Ù Ü,ûäü © ýJþwhere ÿ ê is the time interval betweensamples:my prototypeusesa ÿ ê of 0.5s. With thisvalueof

õ, eachsamplewill decayby afactor Ò ï Ù every ç ª seconds,i.e., ç ª determinesour

“window” into thepast.Basedon theassumptionof temporallocality overall timescales,ç ª is set to the time period ç over which predictionis desired. This is also the desired

5.5. SYSTEMRESOURCEPREDICTORS 49

latency of thecomputation:if we areaimingfor a latency of 1s, thenwe try to predicttheCPUsupplyover thenext 1s.

This load-smoothingschemeis identicalto thestandardloadsmoothingschemeusedin Linux. However, Linux only computessmoothedvalueswith certainfixed decaytimeconstants(60s, 300s, and900s). My schemeallows the decaytime constantç ª to bechosendynamicallyto matchthedesiredtimeperiodof prediction.Wecouldalsoimaginemorecomplex autoregressivemodels[10, 23] for betterpredictionaccuracy: however, thesimpleexponential-decayschemeworkssufficiently well for my purposes.

Sofar I haveassumedthatthefutureloadof thesystemis predictedby thepastload,attheinstantprocessæ beginsacomputation.Thisis areasonableassumptionfor all processesbut æ , giventhattheirbehavior is independentof æ ’s. However, æ itself will beCPU-bound,andcontributea loadof 1 duringthenext ç seconds,irrespective of its offeredloadin thepast.æ ’s pastloadis thusanunderestimateof its futureloadandso

ð��will underestimate

thetotal load.Thesystemactuallyusesacorrectedestimateð�� í�ð�� ØZÒIà ù ÷(æ øwhere

ù ÷ëæ øis theamountof loadofferedby æ in thepast. At any givenmoment,

ù ÷ëæ øis

either0 (not runnable)or 1 (runnable);it is sampledfrom /proc/pid/stat andsmoothedina mannersimilar to

ð��. Thusmy final predictorfor a processæ ’s CPUsharerelieson two

measurements:theoverallsystemloadð��

andasæ ’spastloadù ÷(æ ø

.

5.5.2 Memory

Physicalmemoryon Linux andmany otherOSesis not ahardconstraint:applicationscanusemorevirtual memorythanthe availablephysicalmemory. However, if the combinedworkingset[22] of all applicationsexceedstheavailablephysicalmemory, thenthesystemthrashes.Thus, I candefinethe “supply” of memoryto an applicationasthe amountofmemoryit canusewithout thesystemthrashing(Section2.3.3).

Therearethreekindsof virtual memorypagesthatanapplicationcansafelyuse:Ô pagesalreadyownedby theapplication:its residentsetÔ unmappedpagesfrom thefreepoolÔ pagesownedby someotherprocess,but not in thatprocess’s working set: inactivepages

Thus, the total size of an application’s working set shouldbe limited to the sum ofresidentsize,freepool sizeandnumberof inactive pages.The residentsetandfreepoolsizesarestraightforwardto measure;how do we estimatethenumberof inactive pagesinthesystem?

TheLinux 2.4VM systemusespageaging, anLFU schemefor pagereplacement.Theleastfrequentlyusedpagesareplacedon an “inactive list”: theseareassumedto be the


pagesleastlikely to beaccessedin thenearfuture. Victims for eviction arechosenfromthis list. The list is divided into cleananddirty pages,dependingon whetherthe pageshave datato beflushedto disk. Thenumberof inactive cleanandinactive dirty pagescanbereadfrom /proc/meminfo.

It would seemthatthatthe“inactivepages”statisticis exactly theonewe need:it tellsushow many pagescanbereclaimedwith alow probabilitythatthey will beaccessedagain.Unfortunately, Linux’snotionof inactivity is relative: membershipof thelist indicatesonlythatapageis lessactivethanotherpagesin thesystem,not thatit is inactivein any absolutesense.Whenthereis little memorypressure,pageaginghappensrelativelyslowly, andonlypagesthathavenotbeenaccessedfor a long timewill maketheinactive list. Whenthereisdemandfor memory, thekernelagesmoreaggressively, andthethresholdfor inactivity islowered.

Thus,the“inactivepages”statisticmight over- or under- estimatetheactualnumberofinactive pagesin thesystem.Further, someof theseinactive pagesbelongto theprocessfor whichwearedeterminingmemorysupply:addingtogethertheresidentsetsizeandtheinactivepagecountwill double-countthesepages.To compensatefor theseerrors,I useacontrol-feedbackapproach.At eachdecisionpoint, thesystemincreasesits estimateby theamountof availablememory;if thereis swappingactivity, it infers that theapplicationisusingtoomuchmemory, andreducesits estimate:è�� è�� Ø � Ø��\à îHere

è�� and

è �� are the new andold memorysupply estimates(measuredin

pages)respectively;�

is the numberof free pages; �� is the numberof inactive cleanpages;and

îis thesystempagingactivity duringthelastoperation.îËí�� ÷ î��! 4ê5ñ#" Ñ î��$�% ê&'ì û ø

whereî��$�% êñ'"

andî��$�! 4ê&'ì û arethenumberof pagesswappedin from, andswappedout to,

disk. Theintuition is thatwe needonly worry aboutpagesthatareswappedin aswell asout: thesearepotentialindicatorsof thrashing.

5.5.3 Latency

Thegenericlatency predictorcomputeslatency asalinearadditivecostmetric(Section2.4.1).It canbeusedfor any operationwhichspendsall its elapsedtimeusingeithertheCPU(ei-ther local or remote)or thenetwork: i.e., it doesnot overlapcomputationandI/O, or usemultipleCPU’s in parallel.

Thepredictedlatency is

( í*) éëêìè é~êì Ø ) �+�$�� û � é~êìè��+�$�� û �¿éëê5ì Ø ) Û �%ñ ûè Û �%ñ û Ø ) �+�'é-,è��+� é�, Ø ) � û(ûè�� û(û

5.6. DEMAND MONITORS 51

Hereè éëêì

andè��+�$�� û � éëêì

arethe availableCPU supply in cycles/s,returnedby the CPUsupplypredictorsrunningon the local andremotemachinesrespectively.

è Û ��ñ û andè��+� é-,

arethe transmitandreceive bandwidthmeasuredby thenetwork predictor;è�� û(û is the in-

verseof thenetwork round-triptime, i.e. thenumberof round-tripspersecond.Resourcedemand( ) éëêì

, ) Û �%ñ û , etc.) is obtainedby invokingapplication-specificdemandpredictors.Here ) � û(û is thenumberof round-trips(i.e. serverRPCs)madeduringtheoperation.

5.5.4 Other predictors

Network

For network prediction,I usetheexistingOdyssey network monitoringinfrastructure[71].Odyssey passively monitorstraffic on eachconnection,andestimatesits bandwidthandround-triptime. Themonitormakestwo assumptions:Ô thebandwidthbottleneckis at thefirst hop,which is sharedby all connections:this

is a reasonableassumptionfor mobilehostswith awirelessfirst hop.Ô thebottlenecklink is symmetric,with equalthetransmitandreceivebandwidths.

Energy

Theenergy predictormonitorsbatterycharge levelsandpower state,andestimatesthere-mainingbatterylifetime. I usethebatterymonitoringinfrastructuredevelopedbyFlinn [32],which queriesthehardwareusingtheACPI [49] interface.

File cachepredictor

I usetheCodafile system[55] to storedatasharedby clientsandservers,andto managedataconsistency. The file cachepredictor[32] queriesCodato get the file cachesupply,i.e., thelist of valid cacheentries.

5.6 Demandmonitors

At theendof eachoperation,thesystemcollectsinformationabouttheoperation’sresourceconsumptionandperformance.This is doneby asetof demandmonitors,oneperresourceor performancemetric. We observed that supply predictorsanddemandmonitorsfor agivenresourceusethesamemechanisms:in my design,I combinetheminto asinglemod-ule. Eachresourceis handledby asinglecodecomponentthatis responsiblefor measuringbothsupplyanddemand.

Currently, my prototypemonitors


Ô thenumberof CPUcycles(bothlocalandremote)usedby theoperation,from /proc.Actually, /proc providesthe CPU usagein seconds;I scalethis by the known pro-cessorspeedin MHz (from /proc/cpuinfo) to obtainmillions-of-cycles.Ô the numberof network bytestransmittedand received, and the numberof RPC’s(roundtrips) made.Ô theoperation’sworkingsetsize:I describethis measurementin detailbelow.Ô theenergy consumption,if thehardwaresupportsACPI,or if we areusingthePow-erScopeenergy profiling infrastructure[31].Ô theCodafilesaccessedby theoperation,by queryingCoda.Ô theoperationlatency, by callinggettimeofday.

5.6.1 Measuring an operation’sworking setsize

How do we measurethe working set sizeof an operation? I assumethat the process’sresidentsetsizeis anupperboundonits workingsetsize:this is trueif thereis nomemorycontention,i.e., theprocess’s pagesarenever evictedduring theoperation.I approximatethe working setsizeby measuringthe residentsetsizeat the endof eachoperation.OnLinux, memoryallocationdoesnotalwayscreatenew residentpages,nordoesdeallocationalwaysreducetheresidentsetsize:theprocess’sheapmaycontainresidentbut unallocatedpages.This meansthememorydemandmeasurementis only valid for thefirst operationexecutedby theapplicationprocess:whenwe generatelogsof memorydemand,we mustrestarttheapplicationfor eachoperation.

Thus,memorydemandmeasurementsareonlyvalidundercontrolledexperimentalcon-ditions: in Chapter7, I will show how I generatelogsof memorydemandundersuchcon-ditions,andusethemto generatepredictorsthat canbeusedevenwhenthoseconditionsareno longertrue.

5.7 The logger

Theloggerassemblesall theinformationabouttheoperation— thefailurecodepassedinto endfidelity op, thetunableandnontunableparametervalues,theresourceconsumptionand latency — andwrites it out asa timestampedlog entry. I usea text format that ismeantto behuman-readable,andalsoeasilyparseableby scripts:a seriesof white-spaceseparatedtermsof the form ./1032lú�0 « 4 Ù65 í .$/70 4-8 Ù65 . A samplelog file entry is shown inFigure5.3.

5.8. REMOTE EXECUTION 53

...995296364.0889 80 [radiator:radio si ty ] opid=0 dataname="drago n_lig ht ed"polygons=10859 0. 000000 algorithm="prog re ssi ve " resolution=0.07 8310failed=0 energy:pid=167 47.0 00000 energy:start_t s=995296354.0 75474energy:stop_ts =995296364. 086669 cpu=2014.159510 child_cpu=2014 .15 9510memory=5622169 6. 000000 latency=9.93535 1 net:xmit=0.00000 0net:recv=0.000 000 net:roundtrips=0 .0 00000 remote_cpu=0.0 00000remote_child_c pu=0.00 0000 remote_coda:fi les = predicted_cpu=1 782.0 02520predicted_memo ry =56652476.3 88394 predicted_late nc y=8.5 59642constraint_lat ency =10.0 00000 constraint_memor y=61245030. 400000...

Figure 5.3: A log entry for the “radiator” program

5.8 Remoteexecution

Remoteserversareaccessedthroughthe Spectra[33] remoteexecutionengine,which isintegratedinto themulti-fidelity runtime. Theseserversrun application-specificservices:Spectrakeepstrackof availableserversandtheservicesprovidedby them.It tracksband-width andround-triptime to eachserver, andcommunicateswith CPU andmemorypre-dictorsrunningon theserver to obtainestimatesof CPUandmemorysupply. SpectraalsoprovidesanRPC-like interfacefor applicationsto communicatewith theservers. It tracksthetraffic overeachRPCconnection,andreportsit to thenetwork demandmonitor.

Whenafidelity decisionis to bemade,thesolverobtainsalist of candidateserversfromSpectra,aswell asthecorrespondingresourcesupplyvalues.It choosestheserverthatwillmaximizeoperationutility, andusesthatserver for all remoteexecutionsrequestedby thatoperation.Thus,serverdiscoveryandselectionaremanagedby theruntimesystemandarecompletelytransparentto theapplication.

5.9 Overheads

In order to measurethe runtimeoverheadof the variouscomponents,I createda “null”operation:onethatusesthemulti-fidelity API but doesnocomputation.For thenull opera-tion, overheadis just theelapsedtime from theinvocationof begin fidelity op to thereturnfrom endfidelity op. I computedtheper-componentcostby selectively enablingcompo-nentsandcomparingthe resultingoverheadto thebaseline.Sincea singlenull operationtakeslessthan1ms, I ran 1000consecutive operationsduring eachtrial, anddivided by1000to gettheper-operationelapsedtime. Eachoverheadnumberis themeanof 100suchtrials. I alsoreport the standarddeviations,which werecomputedsimilarly: I computedthestandarddeviation over the100trials, eachof 1000operations,anddividedby 9 Ò�Ð�Ð�Ðto gettheper-operationstandarddeviation.


Component(s) OverheadTotal Component

(Baseline) 0.72ms (0.36ms)Logging 0.87ms (0.29ms) 0.15msLogging+ Synchronousflushto kernel 0.92ms (0.40ms) 0.05msLocalCPUmonitor 2.10ms (0.84ms) 1.38msLocalmemorymonitor 8.82ms (0.53ms) 6.72msLoc. CPUmon.+ Solver 12.66ms (1.50ms) 10.56msLoc. CPUmon.+ Solv. + Genericlatencypred. 20.66ms (0.87ms) 8.01ms

Thetableshows microbenchmark-basedoverheadsfor variousruntimecomponent.Thefirst col-umnshows whatcomponentswereactive in additionto thebaselineoverheadof two calls to theAPI. The secondcolumnshows the meanper-operationoverhead(standarddeviationsin paren-theses)computedover100trials of 1000operationseach.Thethird columnshows theadditionaloverheadimposedby thehighlightedcomponentalone.

Figure 5.4: Per-operation overheadof runtime system

Figure5.4 shows the componentoverheadsmeasuredon an IBM ThinkPad 560 witha 233MHz Mobile PentiumMMX processoron a Linux 2.4.2kernel. Thebaselineover-head— makingtwo callsto theruntimesystem— is primarily dueto thecommunicationoverheadof theOdyssey infrastructure:eachAPI call is convertedto a systemcall, whichentersthe kernelandis thenredirectedto a user-level process;the returnpathalsogoesthroughthe kernel. Thereis a small cost for logging eachoperationto a disk file, andasmalladditionalcostfor synchronouslyflushingthelog from userspaceto thekernelaftereachoperation.Thisensuresthatall completedoperationsareloggedevenif theuser-levelservercrashes.Notethatthis doesnot protectagainstOScrashes,sincetheOSkernelstillbuffersdisk writes: synchronouswriting to disk on eachoperationwould imposeanevenlargeroverhead.

TheCPUandmemorymonitorsmustopenandreadfiles in /proc at thebeginningandendof eachoperation,in orderto computetheresourcedemandat theendof theoperation.Thesefile operationsarethemainsourceof overheadin theresourcemonitors.

To measuretheoverheadaddedby thesolver, I addeda singlecontinuous-valuedtun-ableparameterto the null operation:the applicationsstudiedin this dissertationhave atmostonecontinuous-valuedtunableparameter. I alsoaddedapplication-specificCPUandlatency predictors.Sincethetunableparameteris real-valued,themeasuredoverheadis re-ally that of the iterative gradient-descentalgorithm. Adding discreteparameterswouldmultiply this overheadby the numberof discretevalue combinations,since the solverwould checkeachof them separately. Finally, the genericpredictorcheckslocal CPU,network, andremoteCPUavailability: thusit hasanadditionaloverheadcomparedto theapplication-specificpredictor, which “knows”, for example,that the operationonly usesthelocalCPUresource.

5.10. SUMMARY 55

In theworstcase,thesystemcouldhave anoverheadof about27ms: addingtheover-headof the last experiment(local CPU monitor, solver, andgenericlatency predictor)tothat of the local memorymonitor. Although a total overheadin the tensof millisecondsseemshigh, I foundit acceptablefor theapplicationsstudiedin this dissertation.All theseapplicationshaveoperationlatencieson theorderof 1s or more,andthelatency gainsdueto multi-fidelity adaptationoutweighthecostsby far. Therearetwo mainwaysto reducetheoverheadfurther:Ô moreefficient implementationsof theapplication-systeminterfaceandthesolver.Ô custominterfacesfor efficiently readingkernel resourcestatisticswhenmeasuring

operationresourcedemandfor the logs. My gray-boxapproachrestrictsthesystemto thestandardsysteminterface(/proc): I have madea consciousdecisionto acceptthis limitation ratherthansacrificedeployability.

5.10 Summary

In thischapterI describedthedesignandimplementationof aruntimesystemto supportthemulti-fidelity API. I usea minimalist, gray-boxapproachthat avoids kernelmodificationbut sometimesrelieson knowledgeof kernelinternals.I presentedthetop-level designaswell andthendescribedtheprincipalsystemcomponents:thesolver, theutility functions,and predictorsfor resourcesupply, resourcedemand,and performance. I describedindetailtheCPU,memory, andlatency monitors,whichwerebuilt aspartof this thesiswork;we briefly describedthe remoteexecutionengineandthe network, energy andfile cachemonitors,which werebuilt by otherresearchers.Finally, I showed that the total runtimeoverheadper operationcanrangefrom under1ms to almost30ms, dependingon whichcomponentsareused.Theprimarysourceof overheadis readingkernelstatisticsfrom the/proc file system.

Chapter 6

Machine Learning for DemandPrediction

Thouart besidethyself;muchlearningdothmake theemad.( ACTS 26:24)

Resourcepredictionis crucialfor adaptation.In Chapter5, I describedhow theruntimesystempredictsresourcesupply. HereI addressthe complementaryproblem: predictingan operation’s resourcedemandasa function of its tunableand nontunableparameters.Resourcedemandpredictorsareapplication-specific:hereI describethebasictechniquesthatweuseto constructthem.Chapter7 describestheapplication-specificresourcedemandpredictorsthatwebuilt usingthesemethods.

To build a resourcedemandpredictor, I observe resourcedemandat differentpointsintheparameterspace;I appendthis informationto ahistorylog; andI fit thelog datato someparametricmodelusingmachinelearning. The aim hereis not to develop sophisticatedmachinelearningalgorithms,but to show thefeasibility of history-basedprediction.I usesimple learningtechniques,refining them only when necessary. This chapterdescribesthe learningtechniquesusedin this dissertation:least-squaresestimation, online update,data-specificlearning, andbinning.

I startwith anoutlineof thestepsinvolvedin constructingaresourcedemandpredictorfrom anapplicationlog. I thendescribeeachlearningtechniquein detail.Finally, I describehow I evaluatepredictoraccuracy, by measuringcommon-caseerror andbad predictionfrequency.

6.1 How to build a predictor

Usingmachinelearningto build predictorsconsistsof four steps:

57

58 CHAPTER6. MACHINE LEARNING FORDEMAND PREDICTION

1. Selectthe input and output features. The output featureis the value we wish topredict. The input featuresarethosetunableandnontunableparametersthatmightaffecttheoutput.E.g. if wewishto predicttheCPUdemandof arenderingoperation,we would expectthat it will dependon the numberof polygonsto be rendered.Iftheapplicationcanscalethenumberof polygonsbeforerendering,thenthescalingfactorwill alsohave animpacton theCPUdemand.In this case,thepolygoncountandthescalingfactoraretheinputs;CPUdemandis theoutput.

2. Selecta parametrizedmodelthat representstherelationshipbetweenthe inputsandthe output. Clearly, a thoroughunderstandingof the computationbeingmeasuredandthemulti-fidelity algorithmsin it will greatlyaid theselectionof a goodmodel.However, I have found that it is possibleto build goodpredictorsevenwith a smallamountof domainknowledgeandsimplelearningmodels.

3. Extractthe input andoutputfeaturesfrom theapplicationlog, andfind theoptimalmodelparametervalues:thosevaluesthatminimizetheoverallpredictionerror.

4. If required,addanupdatemethod:away to recomputetheoptimalparametervalueswhennew measurementsareobtained.This allows thepredictorto trackchangesinenvironmentalconditionsor userbehaviour over time.

6.2 Least-squaresregression

In all theapplicationsI studied,onebasictechnique— least-squaresestimation[42,98] —provedto beof greatvalue.Often,theoutputis somelinearfunctionof theinput features;in othercases,we cantransformthe input featuresin someway, andthenapply a linearfunction.

Theleast-squaresmethodfits datato a linearmodelof theform

: í<;>= Ø ;Jô+?¿ô Ø ;&@A?B@ ØDCECEC ;F"G?H"Æí IÕ ýKJ ILwhere

IÕ ýis thetransposeof theinput featurevector:

IÕ ý í Ï(Ò�Ñ ?¿ô Ñ ?B@ ÑECECEC6Ñ ?H" Óand

ILis thecoefficient vector

IL�í Ï ;>= Ñ ;�ô Ñ�CECEC6Ñ ;&" Ó ýGivenasetof datapoints

. : ô Ñ IÕ ô 5oÑM. : @ Ñ IÕ @ 5oÑECEC�C?Ñ6. :ON Ñ IÕ N 5wefind thevalueof

ILthatminimizesthesum-of-squareserror

P í NQ ñSR�ô ÷ : ñ à IL J IÕ ñ�ø @

6.2. LEAST-SQUARESREGRESSION 59

0

500

1000

1500

2000

2500

0 0.02 0.04 0.06 0.08 0.1

Dcp

u (m

illio

ns o

f cyc

les)

T

r (resolution)

Figure6.1: Least-squareslinear fit for CPU demandin Radiator

by solving UV J IL²íWIXwherethematrix

UVandthevector

IXaregivenbyUV í NQñSR�ô IÕ ñYIÕ ýñIX í NQñSR�ô : ñ IÕ ñ

(EachIÕ ýñ

is thetransposeof thecorrespondingIÕ ñ

.)

My codebaseincludesscriptsthat,givenanapplicationlog andthenamesof theinputandoutputfeatures,will computetheregressioncoefficients.

For example,the “progressive radiosity” operationin the RadiatorapplicationhasaCPUdemandthatcanbemodelledas

) éëêì�í<;>= Ø ;Jô æÃØ ;&@ æH2Here æ is a nontunableparameter(numberof input polygons)and 2 is a tunableparameter(scalingfactoror resolution).To build aCPUdemandpredictorfor Radiosity, I transformedtheinput featuresetfrom Ï æ¿ÑA29Ó to Ï æ Ñ æH29Ó , andthenusedleast-squaresestimationto find theoptimalvalues

;&= Ñ ;Jô Ñ ;&@ .If we want to build a predictorfor a specificdataobject, then æ is fixed, andwe can

combinetheterms;>= Ø ;�ô æ into a singleconstant.In this casewe canusea simplemodel

with only 2 coefficients: ) éëêì�í�;>= Ø ;Jô 2Figure6.1showsgraphicallyhow I fit thelattermodelto theCPUdemandof Radiator, foraspecificscene(“dragon”)containing100Kpolygons.Thepointsrepresentmeasurementsof ) éëêì

and 2 obtainedby runningtheprogramwith differentvaluesof 2 . Theline repre-sentstheleast-squaresregressionfit to thesepoints,i.e. it is theline : íZ;&= Ø ;�ô�?

. (In thiscase,

;&=Áí<[ Ò�Ò\C^]3_ and;�ôtíZ` Ð ` ]\]%C|Ð\a ).


6.3 Online updates

Someapplicationruntimeparametersare invisible: i.e. the predictoris unawareof theseparameters.Considera userbrowsinga virtual art gallery, who movesfrom paintingsbyRenoirto sketchesby Picasso.TheJPEG-compressibilityof theimageschanges,andwithit thenetwork andenergy demandof fetchingthem;but theapplicationandsystemareun-awareof thesechanges.In othercases,thepredictoris awareof someruntimeparameters,but doesnotknow how they impactresourcedemand.E.g.,in arenderingapplication,CPUdemandvarieswith thecamerapositionandorientation,but we do not know therelation-shipbetweencameraparametersandCPUdemand:thecameraparametersareasgoodasinvisible,sincewedonot know how to makeuseof them.

Whenthereareinvisible parameters,thepredictormustusean incompletemodel:onethat mapsvisible parametersto resourcedemand,but ignoresinvisible ones. When thevalueof aninvisibleparameterchanges,sodoestheresourcedemand.Fromthepredictor’spoint of view, it appearsthatthemappingfrom visible parametersto resourcedemandhaschanged:thepredictormustupdateitself in orderto reflectthenew mapping.How do weperformthis updatewithout knowledgeof theinvisibleparameters?

In mostcases,we canmake useof a commonpropertyof dynamicsystems:temporallocality. We canassumethat the invisible parameterschangeonly by small amounts,orwith low probability, overshortintervalsof time. Thismeansthatthemappingfrom visibleparametersto resourcedemandalsochangesgradually. Recentmeasurementsof visibleparametersand resourcedemandwill reflect the currentmappingmore accuratelythanolderdata.

To make useof temporallocality, we needpredictorsthatgraduallyforget thepastasthey acquirenew data: i.e., we needonline updatemethodsthat give greaterweight torecentdata. Suchmethodsallow us to track changesin applicationbehaviour over time,withoutknowing theunderlyingcausesfor thesechanges.

6.3.1 RecursiveLeastSquares

For linear predictorsgeneratedby least-squaresestimation,thereis a well-known onlineupdatemethod:exponentiallyweightedrecursiveleast-squaresestimation[98, pp. 60–65],which we will refer to asRLS. RLS is a simplemodificationto least-squaresestimationthat givesgreaterweight to recentdata. It is iterative: the estimationcoefficientscanberecomputedcheaplywheneverweacquireanew datapoint.

RLS usestheweightedsum-of-squareserrormetric

P í NQñSR�ô ÷ : ñ à IL J IÕ ñ�ø @ õ N Ü ñwhere

õ( Ðcb õ b Ò ) is theexponentialdecayconstant.Intuitively, eachnew datapoint

6.3. ONLINE UPDATES 61

050

100150200250300350400450500

0 20 40 60 80 100 120 140 160

Dcp

u (m

illio

ns o

f cyc

les)

T

p (thousands of polygons rendered)

Position 1Position 2

Figure 6.2: Changein linear fit with cameraposition in GLVU

causesthe importanceof every previousdatapoint to decayby a factorõ

. Smallervaluesof

õcausethepastto beforgottenmorerapidly.

To findIL, wesolve UV J IL²í IX

where UV í �Q ñSR�ô IÕ ñdIÕ ñ ý õ N Ü ñIX í �Q ñSR�ô : ñeIÕ ñ�õ N Ü ñ

Wheneverwe receiveanew datapoint . :Ef ó�ô Ñ IÕ N ó�ô 5 , weupdateUV � õ UV Ø IÕ N ó�ô IÕ ýN ó�ôIX � õgIX Ø :hN ó�ôiIÕ N ó�ô

andrecomputeIL

beforewemakeanotherprediction.

RLS requiresus to maintainstateof size j ÷ ù @ ø , and the complexity of solving thematrix equationon eachupdateis j ÷ ù�kø , where

ùis thenumberof coefficients.However,

the costsof statemaintenanceand incrementalupdateare independentof the numberofdata points: thus,RLS is extremelyefficient whenwe have a linear modelwith a smallnumberof coefficients.

For example,theCPUdemandof renderingascenein thevirtual walkthroughapplica-tion GLVU is linearin thenumberof polygonsused.Wecanuse

) éëêì�íl;>= Ø ;�ô æwhere æ is the numberof polygonsthat we scalethe sceneto. However, the value ofIL í Ï ;>= Ñ ;Jô Ó ý also dependson the camerapositionand orientation. I.e., as the camera


moves,therelationshipbetweenCPUdemandandpolygoncountalsochanges:whenwearelooking at a complex portionof thescene,) éëê5ì

will behigher(for thesamevalueofæ ) thanwhenwe arelooking at a portionof lower complexity. Figure6.2 shows theCPUdemandof GLVU asa functionof polygoncount,at two differentcamerapositions.Weseethateachpositionhasadifferentbestfit line, i.e.,differentcoefficientvalues

;>=, and

;�ô. As

thecameramovesfrom position1 to position2, RLS will move thebestfit line from thefirst line to thesecond.

A hybrid approach

An online methodsuchasRLS customizesthe predictive model to the recentbehaviourof the application. Sometimes,the recentpastmight be misleadingandcausepredictionanomalies.E.g.,in theshortterm,evenif fidelity is decreasing,resourcedemandmight beincreasingfor otherreasons.RLS cannotdistinguishsuchtemporal variationin resourcedemand(e.g., becauseof changingcameraposition) from parametric variation (due tochangingfidelity). Thus, it will concludethat resourcedemandis a decreasingfunctionof fidelity; thesolverwill thenattemptto maximizeutility by picking thehighestpossiblefidelity, leadingto a large latency for that operation. Theseanomaliesare rare and areimmediatelycorrectedby RLS; however, they do causeannoying performanceglitcheswhenthey occur.

A staticpredictor, sinceit incorporatesa wide rangeof input data,workloadscenarios,andfidelities,is robustagainstsuchanomalies,but hasbadaccuracy in thecommoncase:it doesnotcustomizeits behaviour to thecurrentsituation.I combinetheadvantagesof thestaticanddynamicmethodsby usingahybridRLS approach.I useamodifiedupdate:

UV � õ UV Ø�m UV = Ø IÕ N ó�ôiIÕ ýN ó�ôõ Ø�m ØZÒIX � õ IX Ø�m IX = Ø :hN ó�ô IÕ N ó�ôõ Ø�m Ø[Ò

HereUV =

andIX =

are the valuesfor the offline least-squaresfit: we periodically “re-inject” theminto theonlinepredictor, to exponentialdecayfrom forgettingtheoffline his-tory altogether. In otherwords,wenow haveahistory-basedpredictorwhereÔ themostrecentobservationis givena weightof 1.Ô recent(exponentially-decayed)historyis givenacollectiveweightof

õ.Ô offline measurementsaregivenacollectiveweightof m .

Sucha predictoravoids theunstableworst-casebehaviour without reducingagility in thecommoncase.I believe thatmerelyincreasingthevalueof

õwill not provide this benefit,

but only increasestabilityatthecostof agility. However, I havenotdoneadetailedanalysisto validatethis claim.

6.3. ONLINE UPDATES 63

In practice, m shouldbe small: the offline datashouldprovide stability againstrareanomalies,but not interferewith predictionaccuracy in thecommoncase.For thevirtualwalkthroughapplication,I use

õ�í Ð%C'n and m í Ð7CäÐ3n . I have foundthatthesevaluesworkwell in practice,althoughI havenotdonea sensitivity analysison theseparameters.

6.3.2 Gradient Descent

Incrementalgradientdescent[63, pp. 89–94]is anonlineupdatealgorithmfor linearmod-els. It is a cheaperbut lessrobustway to implementthesamefunctionalityasRLS.Givena linearmodel : í IÕ ýKJ ILweupdate

ILfor anew datapoint . :hN ó�ô Ñ IÕ N ó�ô 5 with

ILo� IL Øqp J ÷ :hN à IL J IÕ ýN ó�ô ø�IÕ ýN ó�ôIntuitively, wemove

ILin thedirectionof steepestgradientwrt : , by anamountpropor-

tional to theerrorobservedon the latestdatapoint. I.e., we want to reducetheerrorwiththe leastpossibleperturbationof

IL. The learningrate p determineshow aggressively we

compensatefor this error: highervaluesof p causeusto forgetthepastmorerapidly.

Althoughthisalgorithmis moreefficient— j»÷ ù øin bothtimeandspace— thanRLS,

it is notasrobust. Its accuracy dependscritically onall input features?Eñ

beingon thesamescale:if one

?Eñis muchlarger thantheothers,thecorresponding

;ñwill seea correspond-

ingly largecorrection.Theneteffect will beanoscillationin thevalueof this;5ñ

, andlittlechangein theothercomponentsof

IL.

For thecasestudiesin thisdissertation,I uselinearmodelswith asmallnumberof inputfeatures:I useRLSfor onlineupdatesbecausetheextraoverheadis negligible andjustifiedby theaddedrobustness.

6.3.3 Online learning asfeedback-control

Online learningforms part of a control-feedbackloop: it modifiesthe predictors,whichinfluencethe adaptive behaviour of the system,which in turn determineswhat new datapointswill be fed to the online learningalgorithm. Thus,the efficacy of online learningwill dependon the natureand the rateof changesin applicationanduserbehaviour. Itis properly viewed as a fall-back technique,for situationswhereoffline learningis notadequate.


05

101520253035404550

0 10 20 30 40 50 60

Dcp

u (b

illio

ns o

f cyc

les)

T

rp (thousands of polygons)

Best fit (Dragon)Best fit (Whale)Best fit (generic)

Figure 6.3: Data-specificCPU demandin Radiator

6.4 Data-specificprediction

Theresourcedemandof acomputationdependsnotonly onits runtimeparameters,but alsoon its input data.Somefeaturesof theinput dataareeasilyextracted,andcanbeincludedasnontunableparametersto theoperation.E.g.,inputdatasizeis oftenanimportantfactorin determiningresourceconsumption.However, in the caseof complex operations,suchasthe3-D graphicscomputationsconsideredin this thesis,thereoftenaredata-dependenteffectstoocomplex for usto expressor evento understand.

In someapplications,the sameoperationmight be executedmore than onceon thesameinput data,with differentruntimeparametervalues.Our architectmight first requesta quick-and-dirtyrenderingof a scene,andthenwish to view the samesceneat a higherfidelity. In suchcases,we would like our resourcepredictorsto make useof previouslyacquired,data-specificknowledge. In orderto do this, I requireapplicationsto provide auniquelabel for its input data: this could be a filename,a hashof the data,or any otheruniquelyidentifying string. (This label is passedto begin fidelity op asthedatanamepa-rameter.)

We cannow make data-specificpredictionsby maintaininga cacheof previously seenlabels:we have a genericpredictorcomputedover all log entries,andalsoa data-specificpredictorfor eachlabel seenso far. Whenwe seea new label,we createa new predictorfor it, andinitialize it with a copy of thegenericpredictor. Subsequently, only log entrieswith thatparticularlabelwill beusedto updatethedata-specificpredictor.

Of theapplicationsstudiedin thisdissertation,data-specificpredictionis of greatvaluein thetwo 3-D graphicsapplications:GLVU andRadiator. Theresourcedemandof bothofthesedependsnot only on parameterssuchaspolygoncount,but alsoon thespecific3-Dscenebeingoperatedon. Often we will performseveral operationson a singlescene:inthis case,wewill wantto tuneourpredictorfor thatparticularscene.

For example,theCPUdemandof radiositydependson theresolution2 (a tunablepa-rameter)andon theoriginal polygoncount æ (a nontunableparameter).In fact,CPUde-

6.4. DATA-SPECIFICPREDICTION 65

mand ) éëêìis a linear functionof thescaledpolygoncount 25æ . However, differentscenes

have differentlinearrelationshipsbetween) é~êìand 26æ : i.e. therearedata-specificeffects

thatarenotcapturedby thepolygoncountæ . Figure6.3showsCPUdemandfor the“hier-archicalradiosity” operationasa functionof thescaledpolygoncount 25æ for two differentscenes.We seethatdata-specificlinearpredictorsaremuchmoreaccuratethana genericlinearpredictorthatattemptsto fit datafrom bothscenes.

Similarly, whenusingJPEG-compressionon web images,we could usedata-specificpredictorsfor thecompressibilityof eachindividual image.Thesecouldbecomputedof-fline: e.g.,by compressingtheimageatdifferentquality levels,measuringthecompressionratio in eachcase,applyinga least-squares-basedmodel,andstoringthe least-squaresco-efficientsalongwith theimage.

In speechrecognitionalso,resourcedemanddependsontheinputdata,i.e. theutterancebeingrecognized.However, we areextremelyunlikely to seeexactly the sameutterance(i.e. sampledsoundwaveform)ever again.Thusdata-specificpredictionis of little useinthis application.

6.4.1 Binning

Whenwe have discreteparameterswith a small numberof valuecombinations,we canbuild a predictorthat is essentiallya lookup table. Eachvalueor combinationof valuesmapsto an entry in the table. Eachentry would containa predictorthat factoredin theremaininginput features:thosethatarecontinuous-valued,or arediscretebut with a largenumberof possiblevalues.

E.g.,thespeechrecognitionapplicationhastwo discrete-valuedfeatures.“Vocabularysize” canbe small or large and“location” canbe local, remote, or hybrid. We alsohaveonecontinuous-valuedfeature,“utterancelength”, which rangesfrom Ð to á . My localCPU, remoteCPU,andnetwork predictorsfor this applicationareall lookup tableswith6 entrieseach:onefor eachcombinationof vocabulary sizeandlocation. Eachentry is alinearpredictorwhosesingleinput is utterancelength.

Building suchpredictorsis straightforward: we collect logs that cover all the valuecombinations,whichwecall “bins”. Wethenseparateout thelog entriesbelongingto eachbin, andapply othertechniquessuchasleast-squaresestimationto createa predictorforthatbin.

We call this technique“binning”; it is extremelysimple, andperhapsdoesnot evenqualify as“machinelearning”. However, it hasgreatpracticalvalue: many applicationshave a small numberof discrete-valuedfeatureswith a small numberof possiblevalues.We canview data-specificpredictionasa specialcaseof binning,wherewe do not knowall thebinsbeforehand,but createthemdynamicallyasandwhenweseenew datalabels.


Full vocabulary ReducedvocabularyLocal r local:map.dict,local:map.lms r local:smallmap.dict,local:smallmap.lmsHybrid r remote:map.dict,remote:map.lms r remote:smallmap.dict,remote:smallmap.lmsRemote r remote:map.dict,remote:map.lms r remote:smallmap.dict,remote:smallmap.lms

(a) File accesspredictions

map.dict 55KBmap.lm 277KBsmallmap.dict 27KBsmallmap.lm 224KB

(b) File sizes

For eachcombinationof vocabulary sizeandlocation,the tableshows the local andremotefilesaccessedby Janus.The lower tableshows thesizeof eachfile (the local andremoteversionsofeachfile areidentical).

Figure6.4: Binning for file accessprediction in Janus

6.4.2 File accessprediction

I usea simple extensionof the binning methodto predictfile cache demand: the Codafiles thatwill beaccessedby anoperation.Thefile cachedemandpredictor, built by JasonFlinn [32], runsboth on clientsandon computeservers. It usesbinning on all discreteparametersandignorescontinuousparameters:typically, it is only discreteparametersthathave an effect on which files are accessed,sincethey changethe executionpath of theapplication.

Within eachbin, the predictormaintainsa list of files, with an associatedprobabilityof access.Theseprobabilitiesareupdatedaftereachoperationusinganexponentialdecayscheme:if the file wasaccessedby the last operation,we have a samplevalueof 1, oth-erwise0. Given the accessprobability for eachfile andthe file cachesupply(i.e., whichfiles arecached),thepredictorcomputesfind theprobabilityof a cachemisson eachfile;by combiningthis with the last-known file sizes,it estimatesthecachemisscostin termsof bytesfetchedfrom thefile server(s). It thenqueriesCodato find thebandwidthto thefile server(s),andcomputestheexpectedlatency costof thecachemisses.

E.g., the Janusspeechpredictorhastwo discreteparameters:vocabulary size(eitherfull or reduced)and location (eitherlocal, hybrid, or remote).Eachcombinationof theseparameterscausesJanusto readadifferentdictionaryandlanguagemodelfrom Coda.Thevocabulary size and location determineexactly which files are accessed;thesefiles areread-only, andhencetheir sizesarealsoknown exactly. Theseobservationsgave rise to

6.5. EVALUATING PREDICTIONACCURACY 67

a very simplebinningpredictor, shown in Figure6.4. Whenthevocabulary sizeis “full”,Janusaccessesthelargedictionaryandlanguagemodel(map.dict andmap.lm); whenit is“reduced”,it accessesthesmallerversions(maps.dictandmaps.lm). Whenthelocationis“local”, thesefiles areaccessedfrom Codaat thelocal client. For “hybrid” and“remote”,the computationis shippedto a remotecomputeserver, andso the files areaccessedattheserver. Sincethevocabulary sizeandlocationexactly determinethefiles accessed,theprobabilityof accessis alwayseither1 or 0. In general,however, file accessesmaynot beexactly predictablegiven the parametervalues. In thesecases,for eachbin, we estimatetheprobabilitythateachfile will beaccessed,basedon log entriesbelongingto thatbin.

6.5 Evaluating prediction accuracy

The error madeby a predictoris the differencebetweenthe prediction æ and the actualobserved value

?. When æ and

?arenumericalvalues(e.g. CPU, memory, or network

demand),we canrepresenterrorby thenumericaldifference t æ à ? t . Typically, however,weareinterestednot in theabsoluteerror, but in therelativeerror

u í t æ.à ? t?This is theerrormetricthatI usefor real-valuedpredictorsthroughoutthisdissertation.

Clearly, the error on a single predictionis not a good indicator of overall predictoraccuracy. Wemustmeasuretheerroroverasetof samplesthatcoversthespaceof predictorinputs.We alreadyhave sucha sampleset: thelog datausedto derive thepredictorin thefirst place. Conventionally, a predictive model is evaluatedon a testsetwhich is distinctfrom the training setusedto generatethemodel: the two setsarechosenrandomlyfromthetotal setof observations.This would helpusto detectcaseswherethemodelperformswell onobserveddata(thetrainingset)but badlyon futuredata(thetestset).

In thecaseof linearregression,thereis no needfor suchseparation,providedwe haveenoughdata. I.e., a linearmodelcomputedfrom a (large) randomlychosensubsetof thedatawill look very similar to thatcomputedover all thedata,andthuswill have thesamepredictionaccuracy for futuredatapoints.In this thesis,I uselinearpredictorsof very lowcomplexity (2 or 3 parameters)with datasetswith 50 or morepoints,and,for simplicity, Iusetheentiredatasetbothto deriveandto evaluatethepredictors.

Giventherelativeerrorfor eachsample,wecancomputetwo aggregatemetrics:Ô the badpredictionfrequency( v @+= ): the percentageof samplesthathada predictionerrorabovesomethreshold.I usea thresholdof 20%.Ô the common-caseerror (

Piw =): an error boundsatisfiedby the majority (90%) of

predictions. I.e., the common-caseerror is the 90-th percentilevalueof the errordistribution.


0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Out

liersx

Error threshold

f20=8%

E90=19%

Figure6.5: Exampleof outlier distrib ution plot

Often,however, I will simply show theentiredistribution of errorswith anoutlier dis-tribution plot. This plot haserrortoleranceon the

?-axis,andthefrequency of outliers—

predictionerrorsthatexceedthetolerance— on the : -axis. In otherwords,it is thecumu-lative frequency distribution of (relative) predictionerrors. Figure6.5 shows an exampleof sucha plot. On sucha plot, v @+= is the : correspondingto an

?of 20%;

P�w =is the

?correspondingto a : of 10%. In this example, v @+=.í a3y and

P�w =�í ÒM]3y , which meansthat:Ô 92%of thetime,wewill seelessthan20%errorÔ 90%of thetime,wewill seelessthan19%error

6.6 Summary

This chapterdescribedthe techniquesI useto build andevaluateapplicationresourcede-mandpredictors.We use least-squaresregression, and its online variant, recursive least-squares regression, to model the dependenceof resourcedemandon continuous,real-valuedparameters.I usebinning to model the effect of discrete-valuedparameters,ef-fectively by building a separatepredictorfor eachvaluecombination.Similarly, I handledependenceon input datawith data-specificpredictors. Thechapterconcludeswith a dis-cussionon evaluatingpredictoraccuracy. In this dissertation,I usetheoutlier distribution

6.6. SUMMARY 69

plot to representpredictoraccuracy. I alsousetwo valuesderivedfrom this plot: thebadpredictionfrequencyv @+= andthecommon-caseerror

Piw =.

Chapter 7

Applications

Il n’existepasdesciencesappliquees,maisseulementdesapplicationsdela science.(LouisPasteur)

In orderto validatethemulti-fidelity API andimplementation,I andotherresearchersporteda small, selectedsetof applicationsto the multi-fidelity API. I choseapplicationsthat:Ô arerealistic,andnot toy applications.Ô werewritten by otherpeople,to evaluatethedifficulty of portinglegacy codeto the

API.Ô areinteractive,andwouldbeusefulin amobile/wearablescenario.Ô areresource-intensive,requiringadaptationto runwith goodperformanceonmobilehardware.Ô permitadaptationof fidelity andperformance:this rulesout applicationsthatcannottoleratefidelity degradation(e.g. medicalimaging)or requireperformanceguaran-tees(hardreal-timeapplications).Ô arecompatiblewith, or easilyportedto, thebaseOS(Linux).

I chosefour applicationsfor my casestudies:ÔGLVU, a “virtual walkthrough”program.Ô Radiator, a radiosityprogramÔ Netscape, awebbrowserÔ Janus, aspeechrecognizer

Webbrowsingandspeechrecognitionarewell-knownapplications,andhavepreviouslybeenshown to benefitfrom adaptation[71, 31]. HereI show how thebenefitsof adaptationcanbeobtainedwith asmallamountof codemodification,andhow theeffectsof adaptationon resourcedemandcanbepredicted.

71

72 CHAPTER7. APPLICATIONS

GLVU and Radiatorare relatively novel to the mobile computingworld: they are 3-D graphicsapplications,chosenfor their relevanceto the augmentedreality scenarioofChapter1. Section7.2 explains briefly why theseapplicationsare important,and howmulti-resolutionmodelsprovide themwith anintrinsic notionof fidelity.

Theremainderof thischapteris organizedasfollows. Section7.1broadlydescribesthemethodologyandevaluationmetricsusedin my casestudies.Sections7.3–7.6aredetailedcasestudiesof thefour applications,in theordershown above.

7.1 Methodology

In eachcasestudy, I describetheapplicationandits coremulti-fidelity operation;theop-eration’s tunableandnontunableparameters;thework involvedin porting theapplicationto usethemulti-fidelity API; the resourcedemandpredictorsI constructedfor it; andtheaccuracy of thesepredictors.Theexperimentalplatformin all caseswasanIBM ThinkPad560 with a 233MHz Mobile PentiumMMX processorand 96MB of memory, runningLinux 2.4.Theremainderof this sectiondescribesthemetricsthatI usein eachcasestudyto quantifyporting costandpredictionaccuracy.

7.1.1 Porting cost

Oneof theprincipalaimsof themulti-fidelity API is to minimizethecostof portinglegacyapplicationsto it. In eachof thecasestudiespresentedhere,I evaluatedthis portingcostin termsof numberof modifiedsourcelinesandfiles. I derivedthis numberby runningthe“dif f ” programto compareeveryfile in themodifiedsourceto thecorrespondingfile in theoriginal source.I thenquantifiedtheresultsof this diff in termsof thenumberof linesandfiles modified,asdescribedbelow. I chosethe methodto err on the sideof caution: i.e.,alwaysto overestimateratherthanunderestimateline counts.

For eachdiffering segment,I countedthenumberof linesin theold andnew versions,andtook the larger of the two values. The sumof all thesenumbersis the porting cost.Whennew files wereadded,I addedtheir entireline countto theportingcost;whenfilesweredeleted,I ignoredthem.Similarly, modifiedandaddedfilescontributeto themodifiedfile count,but deletedfilesdonot.Insomecases,therewerechangesunrelatedto themulti-fidelity API: e.g.bugfixesandadditionof traceplaybackfor repeatableexperiments.Whenthediffs correspondingto thesechangeswereclearlyidentifiable,I excludedthemfrom theline count;otherwise,I includedthem.

Of course,“lines andfiles changed”is not theonly way to measurethecostof modi-fying code.However, it is theonly metricthatwaseasilyreliably measurablein eachcasestudy. Further, thesmallamountof modificationrequiredin eachcasestudyindicatesthatothermetricsof modificationcost(e.g.man-hours)wouldalsobesmall.

7.2. AUGMENTEDREALITY 73

7.1.2 Prediction accuracy

For eachapplication,I also measuredthe accuracy of history-baseddemandprediction.Recall from Chapters4 that theseare built by logging applicationresourcedemandforvariousinput dataandruntimeparametervalues. I appliedthe predictiontechniquesdis-cussedin Chapter6 — least-squaresregression,data-specificprediction,andbinning —to theselogs. I thencomparedthevaluesgeneratedby thepredictorsto theactualloggedvalues,andcomputedthepredictionerrors.I reportthebad-predictionfrequency ( zG{+| ) andthe common-caseerror ( }i~+| ); I alsoshow the datagraphicallyasan outlier distributionplot. Recall from Section6.5 that zG{+| is the percentageof predictionsthat exceedan er-ror thresholdof 20%; }��\� is thehighesterrorobservedon thebest90%of thepredictions(i.e.,the90thpercentileerror);andtheoutlierdistributionshows zG� (percentagepredictionsexceedinganerror � ) asa functionof � .

I evaluateddemandpredictorsfor thefollowing resources:

� CPU (both local andremote),measuredin millions of cycles. RecallthatCPUde-mandis actuallymeasuredin seconds,andthenscaledby the processorspeed(inMHz). In Section7.4.4we will seethatalthoughthis scalingdoesnot compensateperfectlyfor all thedifferencesbetweendifferentprocessors,it doessomuchbetterthana time-based(unscaled)measurementscheme.� Memory, measuredin MB� Energy, measuredin Joules� Network transmitandreceive,measuredin bytes

Thecorecomputationsin theseapplicationsareoftenlargeandcomplex. My objectivein building andevaluatingresourcedemandpredictorswasnot to understandor captureeverydetailof thesecomputations,but to extractasmall,simplesetof featuresthataccountfor mostof thevariationin resourcedemand.I believethatthesimplepredictorsdescribedhereshow theviability of resourcedemandprediction:if greateraccuracy is required,it canbeprovidedby moresophisticatedpredictorsbuilt by domainexpertsandmachinelearningresearchers.

7.2 AugmentedReality

Section1.1describedanarchitectusingaugmentedrealitysoftwareto aid in adesigntask.Onekey functionalityof suchsoftwareis theoverlayingof virtual 3-D objectsor scenesontherealworld. E.g.,in thearchitectscenario,we might want to virtually adda wall to theinterior of a room; this requiresthe softwareto overlay the relevantportion of the user’sfield of view with theimageof thewall.

A full-fledgedaugmentedreality applicationwouldbeextremelycomplex, andrequiresubstantialdomainexpertiseto understandandmodify. Further, suchapplicationsarehard


to find in source-available,Linux-compatibleform. I decidedinsteadto experimentwithtwo simplerbut representative3-D graphicsapplicationsthatimplementdifferentpiecesofaugmented-realityfunctionality. GLVU is avirtual walkthroughprogram:it allowsausertospecifya3-D scene,and“walk through”thescene:i.e.,renderit from any camerapositionandorientation.Radiator usesradiosity[18] to colourandshadea3-D sceneaccordingtothelight sourcesilluminating thatscene.Wecanthink of Radiatorasapre-processingstepfor GLVU: it producesa morerealisticlookingscenefor rendering.

7.2.1 Multir esolutionmodels

The input to both GLVU andRadiatoris a 3-D scenerepresentedasa polygonalsurfacemodel: a setof verticesandpolygons.A vertex is simply a point in 3-D space;a polygonis an orderedlist of vertices. The resourcedemandof many 3-D graphicsapplications,includingGLVU andRadiator, scaleswith thenumberof polygonsin theinput scene.

TheQSlim [40] programconvertsa3-D surfacemodelinto amultiresolutionmodel: anannotatedversionof the original that providessuccessive approximationsby eliminatingsomepolygonsandvertices. The degradedversionsaresimilar to the original, but withfewer polygonsand correspondingloss of detail. This gives us a naturalway to tradefidelity for resourceconsumption.Thefidelity metricis resolution: thenumberof polygonsretainedin thedegradedversion,expressedasafractionof theoriginalpolygoncount.Thisnumbervariesfrom � to � .

QSlim usesa quadric-basedsimplification algorithm [41] to annotatethe 3-D scenewith a list of edge contractions. Eachcontractioneliminatestwo adjacentverticesandreplacesthemwith a singlenew vertex. It alsoeliminatesany polygonsincidenton thecontractededge.By applyingor undoingthesecontractions,wecanscalethemodelto anydesiredresolution.Thecontractionsareorderedin increasingorderof error:wefirst dothecontractionswhichcausetheleastdeviation from theoriginal shape.

I useQSlim-generatedmultiresolutionmodelswith both GLVU and Radiator. ForGLVU, we canreducethe resolutionof the input scenebeforethemaincomputation:i.e.,resolutionis a datafidelity metric. In Radiator, on the otherhand,the bulk of the com-putationis doneon thedegradedversion,but thefinal stepis appliedto theoriginal, full-resolutionversion. I.e., both the input andoutputof Radiatorhave resolution1, but thedetail andaccuracy of the shadingandcolouring in the outputdependon the resolutionchosenfor thecomputation.In this case,resolutionis moreproperlyviewedasacomputa-tional fidelity metric.

7.3. GLVU 75

description glvu:drawlogfile /usr/odyssey/et c/ gl vu _dr aw.l ogmode normalhintfile /usr/odyssey/l ib /g lv u_d ra w. soutility glvu_draw_utili tyupdate glvu_draw_updateinit glvu_draw_init

param polygons ordered 0-infinityfidelity resolution ordered 0-1

hint latency glvu_draw_late nc y_h in thint cpu glvu_draw_lcpu _hin t

Figure7.1: Application configuration file for GLVU

7.3 GLVU

GLVU [75] is a toolkit thatallows usto virtually “walk through”any 3-D scene.Thecorecomputationperformedby GLVU is to rendera view of the scenebasedon the currentpositionandorientationof the camera,i.e. the user. Thus,“render” is the multi-fidelityoperationfor this application: we tradeits fidelity againstits resourceconsumptionandlatency. Renderingis almostentirely processor-boundin the absenceof specialized3-Dgraphicshardware, so I focus on the CPU resource. Figure 7.1 shows the ApplicationConfigurationFile for therenderoperation.It has1 tunableparameter— theresolution—and1 nontunableparameter— thescene’s originalpolygoncount.

In theremainderof this section,I describethemultiresolutionextensionsthat I addedto GLVU, andevaluatethecostof thismodificationaswell asthatof insertingmulti-fidelitycalls into GLVU. I thendescribetheconstructionof a CPUdemandpredictorfor the“ren-der” operation:creatinglogsof resourcedemand,creatinga predictive modelwith offlinelearning,andupdatingthemodelthroughonlinelearning.Finally, I evaluatethepredictor’saccuracy in variousscenarios.

7.3.1 Extending GLVU for multir esolutionrendering

I extendedGLVU’s file parserto readmultiresolutionmodels,andthe “scene”object tosupportdynamicscalingof polygoncount. Theverticesof themultiresolutionmodelarestoredin abinarytree(Figure7.2).Wheneveranedgecontractioncollapsestwo vertices�and � into anew vertex �<�l��A�7� , wemake � theparentof � and � . Eachleafof thetreecorrespondto a vertex of the original model;eachinterior vertex correspondsto an edge


v2

v3

v8

v v6

v7

v9v

4v

10

v11

v13

v12

5v

1

P1

P2

Thefigureshows how verticesof a multiresolutionmodelareorganizedinto a tree. Theoriginalverticesof the modelarethe leavesof the tree,i.e., ��&� . A simplified versionis givenby across-sectionof thetree,e.g.thevertices��-��$�F��$�&�&��&� . Thefigurealsoshows two of themodel’spolygons,�!� and �%� . In thesimplifiedversion,�!� becomesinvisible sincetwo of its vertices( �&�and �&� ) havebeencollapsed.� � is transformedfrom � �&�>��&�&�$� �� to � � �� +� .

Figure 7.2: Vertex tr eefor multir esolutionmodels

contraction.

The set of visible verticesat any given resolutionis given by a cross-sectionof thetree(the colouredverticesin Figure7.2). We apply a contractionby moving up the tree(colouring ��+� insteadof ��$| and � ã ) andmarkingasinvisible thepolygonsincidentonboththecontractedvertices:thereareat mosttwo suchpolygons.Reversingthesestepsundoesthecontraction;bothcontractionandits inverseoperationtake K��6� time.

In additionto addingthevertex-treedatastructureandavisible-polygonbitmap,I mod-ified thecorerenderingloop to skip over any polygonsmarkedinvisible. For eachvisiblepolygon,we find its verticesby startingfrom eachoriginal vertex andmoving up the treeuntil we find its unique“visible ancestor”. To avoid repeatedpointer-chasing,we cachepointersto thevisible ancestorof eachvertex. Thesepointersaresoft state:if we followoneanddonotfind avisible vertex, we recomputethepointer.

Thesedatastructuresandalgorithmsaredesignedfor efficient contractionanddecon-traction,especiallywhenthe target resolutionis closeto the currentresolution. The as-sumptionis thatsmallchangesin fidelity will befrequentandlargeroneslessso.

7.3. GLVU 77

Files Lines Purpose1 13 ApplicationConfigurationFile1 245 Hint module1 170 Callsto gluecode2 132 Gluecode

5 560 Totalmodificationsfor multifidelity144 26951 Original codebase

2 369 Multiresolutionextensions

Figure7.3: Modifications madeto GLVU

7.3.2 Porting GLVU to the multi-fidelity API

To make GLVU adaptive, I wroteanApplicationConfigurationFile (Figure7.1),aswellasahint modulecontainingtheCPUdemandpredictordescribedlaterin thissection.I alsomodifiedGLVU’s sourcecodeto invoke themulti-fidelity API. I did thisby writing agluecodelayerthatconvertsfrom a GLVU-specificinterfaceto thegenericmulti-fidelity API; Itheninsertedcalls to the application-specificinterfaceinto thesourcecode. Thepurposewasto minimizechangesto existingsourcefiles,at thecostof addingtwo additionalfiles.

Figure7.3showsthecostof thesemodifications:560linesin 5 files,about2.1%of thetotal codebase.I alsoshow thecostof themultiresolutionextensions:thesechangesareindependentof themulti-fidelity API but requiredfor adaptation.

7.3.3 Predicting GLVU’sCPU demand

My approachto demandpredictionis empirical(Chapters4 and6): I measuretheapplica-tion’sresourceconsumptionoveravarietyof tunableandnontunableparametervalues,andderive a modelthat mapsparametervaluesto resourceconsumption.I generatedlogs ofGLVU’s resourceconsumptionatdifferentresolutionsfor 4 differentscenes,rangingin sizefrom 127K to 236K polygons.Figure7.4shows thefour scenesandtheir polygoncounts.Figure7.5 shows thevisual differencebetweenfull resolutionand10%resolutionon the“Notre Dame”scene.

For GLVU’s CPUdemand,I usea linearpredictorof theform¡£¢S¤F¥ �<¦>|�§q¦E�-¨H©

where is theoriginalpolygoncount, © is theresolution,and!© is thepolygoncountaftermultiresolutionscaling.

To evaluatethis predictor, I collected100 log entriesperscene,eachwith a randomlychosenresolutionandcameraposition/orientation.Figure7.6(a)shows thegraphof CPU


(a)Taj Mahal (b) Cafe (interior)

(c) NotreDame (d) BuckinghamPalace(inte-rior)

Scene Numberof polygonsTaj Mahal 127406Cafe 138598NotreDame 160206BuckinghamPalace 235572

(e)Scenepolygoncounts

Figure7.4: 3-D scenesusedin GLVU

7.3. GLVU 79

(a)NotreDame,full resolution

(b) NotreDame,0.1resolution

Figure7.5: Effect of resolutionon output in GLVU


0

100

200

300

400

500

600

700

800

900

1000

0 50000 100000 150000 200000 250000

CP

U d

eman

d (m

illio

ns o

f cyc

les)

ª

Number of polygons rendered

Taj MahalCaféNotre DameBuckingham Palace

(a)Randomcameraposition

0

100

200

300

400

500

600

700

0 50000 100000 150000 200000 250000

CP

U d

eman

d (m

illio

ns o

f cyc

les)

ª

Number of polygons rendered

Taj MahalCaféNotre DameBuckingham Palace

(b) Fixedcameraposition

The « -axis is thenumberof polygonsrendered,i.e. ¬G where¬ is theoriginal modelsizeandthe is theresolution.The ® axisis theCPUdemandin millions of cycles.

Figure 7.6: GLVU’s CPU demand: random (top) and fixed (bottom) camerapositions

7.3. GLVU 81

demandagainstrenderedpolygoncountfor theseexperiments.We seea roughcorrelationbetweenrenderedpolygoncountandCPUconsumption,but with a largeamountof varia-tion. I repeatedtheexperiment,this time with thecamerafixedat anarbitrarypositionforeachscene(Figure7.6(b)).ThisgaveusastronglinearcorrelationbetweenpolygoncountandCPU demand,with someamountof noise. However, eachscenefollows a differentline, indicatingthatweneeddata-specificprediction.

I built both data-specificanddata-independent(generic)linear predictorsfrom theselogs. Eachdata-specificpredictorwasderivedthroughleast-squaresregressionon thelogentriesfrom a specificscene;the genericpredictorwasderived from the union of all thelogs.For eachscene,I evaluatedtheaccuracy of thedata-specificandthegenericpredictor:I thenaggregatedthe resultsfor thegenericpredictorto evaluateits accuracy acrossall 4scenes.

Figures7.7 and 7.8 show theseresults. With a fixed cameraposition, data-specificpredictiondoesvery well (the badpredictionfrequency is at most 5%), but the genericpredictorperformspoorly. This is unsurprising:cameraposition is scene-specific,andfixing thecamerapositiondoesnot reducethevariationacrossscenes.Whenthecamerapositionis random,bothschemesperformpoorly: thevariationin camerapositioninjectsa largeamountof noiseinto thedata.Theseresultsshow thatdata-specificpredictorsoffera clearbenefitover genericpredictors,but only if we cansomehow accountfor cameramovement.

Userpath traces

Theresultsjust presentedshow thatCPUdemandis affectedby cameramovement.How-ever, neitherthe“fix edcamera”nor the“randomcamera”scenariosis a realisticmodelofsuchmovement.Usersrarelyjumprandomlyfrom onepositionto another;andthey donotstayatafixedposition.Rather, they movealongsomepath,causingacontinuousvariationin camerapositionandorientationover time.

In order to createrepeatableyet realisticworkloads,I usedGLVU’s path record andplayback functionality: asa usernavigatesa scene,thesoftwarecansave thecameraposi-tion at eachstepto a disk file. Subsequently, we canreplaythis trace,creatingexactly thesameeffect asthe original user. The tracedoesnot recordtiming information,however,andthusdoesnotcapture“userthink time”.

I usedthesetracesasclosed-loopworkloads, whereeachoperationstartsassoonastheprecedingoneis completed.Suchworkloadsmodelanimpatientuserwho is alwaysaheadof thesoftware. I createdfour suchtracesof realusers:onefor eachof the3-D scenesinthe casestudy. I alsomodifiedGLVU to automaticallyreplaythesetracesin batchmode,without requiringuserinteraction. In the remainderof this dissertation,all experimentswith GLVU arebasedon thesetraces.


0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80%

Out

liers¯

Error threshold

Taj MahalCafé

Notre DameBuckingham Palace

All models

(a)Randomcamera

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80%

Out

liers¯

Error threshold

Taj MahalCafé


All models

(b) Fixedcamera

Thegraphsshow theoutlierdistribution for linearpredictorsof CPUdemandin GLVU, bothwhenthecamerais positionedrandomlyfor eachrender, andwhenit is fixed. I evaluatedadata-specificpredictorfor eachscene,aswell asa genericpredictoroverall four scenes.

Figure7.7: CPU demandprediction in GLVU: outlier distrib ution

7.3. GLVU 83

Scene Randomcamera FixedcamerazG{+| }i~+| zG{+| }i~+|Taj Mahal 27% 36% 5% 14%Cafe 46% 39% 3% 12%NotreDame 27% 31% 1% 9%BuckinghamPalace 22% 30% 5% 12%All scenes 45% 42% 42% 54%

Thetableshowsthebadpredictionfrequency andcommon-caseerrorfor linearpredictorsof CPUdemandin GLVU, both whenthe camerais positionedrandomlyfor eachrender, andwhenit isfixed. We show the accuracy of a data-specificpredictor for eachscene,as well as a genericpredictoroverall four scenes.

Figure 7.8: CPU demandprediction accuracyin GLVU

Online-update predictor

Given that camerapositionvariesover time, and that CPU demanddependson cameraposition,how dowebuild acamera-awarepredictor?It seemswemustlearnCPUdemandasa function of all 6 cameraparameters,in additionto polygoncount: 3 parametersforpositionand3 for orientation.Othercameraparameters— aspectratio, field-of-view, andnearandfar clip planedistances— typically do not changeover time. However, buildingsuchapredictoris difficult, andcertainlyimpossiblewith asimplelinearmodel.TheCPUdemandat any particularcamerapositionwill dependon thelocal propertiesof thescene,andthereis no singleglobalmodelthatcancapturetheCPUdemandat all positions.Wemight try to samplethe cameraparameterspace,and usea nearest neighbour[63, pp.231–236]approachto find the closestmatchingsampleto any givenpositionat runtime.Unfortunately, samplingat a fine granularityin an 6-dimensionalspacewould requireaprohibitively largenumberof samples.

Instead,I adopteda simplerapproach.I observedthat,in realuse,camerapositionhastemporal locality: i.e., it changesincrementallyover time. If CPUdemandhasspatial lo-cality — asmallchangein cameraparameterscausingonly asmallchangein CPUdemand— thenit shouldalsohave temporal locality — i.e.,varysmoothlyovertime. Thissuggestsanonline-updatepredictor— onethattracksrecenthistory, andgraduallyforgetsthemoreremotepast.

I implementeda data-specific,online-updatepredictionschemefor GLVU. At applica-tion startup,thedemandpredictorusesagenericlinearestimatorderivedfrom the“randomcamera”data. Whenever a new sceneis loaded,it createsa data-specificpredictorwhichis initially identicalto thegenericpredictor. With eachsubsequentoperationon thescene,thedata-specificpredictorupdatesitself usinga hybrid RLS predictor(Section6.3.1)with° �±�%²'³ and ´q�±�%²^�3³ . Thuswe have specializedthepredictornot only to thescene,butalsoto thecamerapositionwithin thescene.


−60 −40 −20

0 20 40

0 10 20 30 40 50 60 70 80 90 100Pre

dict

ion

erro

r (%

)

µ

Trace step

0

200

400

600

800

1000

CP

U d

eman

d (m

illio

ns o

f cyc

les)

¶

ActualPredictedFrames shown below

(a)CPUdemandandpredictionerror

(b) Viewsat highlightedtimepoints

Thegraphshows theCPUdemandof renderingat full resolution,for thefirst 100stepsof a usertraceon the“Notre Dame”scene;we alsoshow the% errorof theRLS predictoralongthesametime line. The imagescorrespondto thecameraviews at thethefour highlightedtime pointsareshown below.

Figure 7.9: GLVU’s CPU demand: user trace, “Notr eDame” scene

7.4. RADIOSITY 85

I evaluatedthis online-updateschemeon userpathtracesfor eachof the four scenes.I.e., I ran GLVU on thesetraces,andloggedboth theactualaswell asthe predictedCPUdemand.Figure7.9(a)showshow thepredictortracksCPUdemandover time. Weseethatthe very first predictionwasextremely inaccurate:we startedwithout any scene-specificor camerapositionspecificknowledge. As soonasthepredictoracquiredfeedbackfromthefirst operation,it rapidly adjusteditself to trackthecurrentsceneandcameraposition,andsubsequentpredictionsweremuchmoreaccurate.NotethatalthoughtheCPUdemandvariesby a factorof 2, predictionerroris usuallyunder20%.

To illustratethe reasonfor the wide variationin CPU demand,I selectedfour pointsin the tracewith widely differentCPUdemandvalues(the four highlightedpointsin Fig-ure7.9(a)).Thecameraviews at thesefour pointsareshown in Figure7.9(b). Intuitively,CPUdemanddependson the“complexity” of thevisible portionof thescene:however, Iknow of noquantitativemeasurefor thiscomplexity thatcouldaid in CPUdemandpredic-tion.

Figures7.10,7.11,and7.12show theoverallpredictionaccuracy achievedby thedata-specificdynamicpredictorson thefirst 100stepsof eachof the4 usertraces.I comparedthesewith data-specific,but static,predictorscomputedusingleast-squaresregressionoverall logentriesfor aparticularexperiment.I rantheexperimentswith theresolution(fidelity)varying randomly, aswell aswith resolutionfixedat 1. In all cases,the dynamicpredic-tors weresubstantiallymoreaccuratethanthe staticpredictors;both staticanddynamicpredictorsdid betterwhenresolutionis fixedratherthanvaryingrandomly. In realuse,wewouldexpecttheresolutionto vary, but not randomly:thesystemwill adapttheresolutionto varying CPU load, andwe canexpect a predictionaccuracy somewherebetweenthe“randomresolution”andthe“fix edresolution”cases.

7.3.4 Summary

I studiedGLVU, a virtual walkthroughprogramwhosecore task is the renderingof 3-D scenes.I describedhow I modified this applicationfor multi-fidelity adaptation,andshowedthat thecostof this modificationwasreasonable.I observedthat this wasa CPU-boundoperation,andthatCPUdemandcouldbeadaptedby changingthenumberof poly-gonsrendered.I useda simplelinear modelto built a CPU demandpredictorfor GLVU,basedon logsfrom 4 differentscenes.I showedthatpredictoraccuracy is significantlyim-provedby data-specificlearning— customizingthepredictorto new scenesasthey appear— andonline learning— trackingvariationover time dueto changesin cameraposition.With theseimprovements,thecommon-casepredictionerrorvariedbetween4% and27%,andthebadpredictionfrequency between1% and16%.


0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80%

Out

liers¯

Error threshold

Taj MahalCafé


(a)Staticpredictors

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80%

Out

liers¯

Error threshold

Taj MahalCafé


(b) Dynamicpredictors

The graphsshow CPU demandpredictionaccuracy for GLVU runningusertraceworkloads: foreachscene,thecameramovesthroughthefirst 100stepsof thecorrespondinguserpathtrace,andtheresolutionvariesrandomly. We comparethestaticlinearpredictorwith a dynamicRLS-basedlinearpredictor.

Figure7.10: Static vs. dynamic predictors: random resolution

7.4. RADIOSITY 87

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80%

Out

liers¯

Error threshold

Taj MahalCafé


(a) Staticpredictors

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80%

Out

liers¯

Error threshold

Taj MahalCafé


(b) Dynamicpredictors

The graphsshow CPU demandpredictionaccuracy for GLVU runningusertraceworkloads: foreachscene,the cameramovesthroughthe first 100 stepsof the correspondinguserpath trace,andtheresolutionis fixedat 1. We comparethestaticlinearpredictorwith a dynamicRLS-basedlinearpredictor.

Figure7.11: Static vs. dynamic predictors: fixed resolution


Scene Staticpredictors DynamicpredictorsRandom© Fixed © Random© Fixed ©z6{+| }�~+| z6{+| }�~+| zG{+| }i~+| zG{+| }i~+|

Taj Mahal 32% 42% 12% 33% 9% 18% 1% 4%Cafe 43% 33% 78% 42% 13% 23% 1% 9%NotreDame 23% 36% 21% 31% 16% 27% 3% 14%BuckinghamPalace 47% 52% 42% 45% 11% 22% 4% 13%

The tableshows CPU demandpredictionaccuracy for GLVU runningusertraceworkloads: foreachscene,thecameramovesthroughthefirst 100stepsof thecorrespondinguserpathtrace.Wecomparethestaticlinearpredictorwith a dynamicRLS-basedlinearpredictorin two cases:withrandomlyvaryingresolution,andresolutionfixedat 1.

Figure7.12: CPU demandprediction accuracyfor GLVU running user traces

7.4 Radiosity

A radiosity[18] computationcoloursandshadesa3-D sceneaccordingto thelight sourcespresentin thescene.Wewould runsuchacomputationonourvirtual or augmentedrealitywhenever the scenechanges,i.e., whenobjectsor light sourcesare added,removed, ormodified. Radiosityis view-independent: it is computedover over the entirescene,andneednot be recomputedwhenthe camerapositionor orientationchanges.The outputofradiosityhaseachinput polygonannotatedwith colourandlighting informationthatmakefor morerealisticrendering.Figure7.13showsascenerenderedbeforeandafterradiosity,bothatlow fidelity (0.01)andathighfidelity (0.1)In theory, thefidelity (i.e.,theresolution)canbe ashigh as1, but for a scenewith 100kpolygons,this is well beyond the resourcelimits of my testplatform.

Radiator[96] is apublicly availableimplementationof severalcommonradiosityalgo-rithms,with built-in supportfor multiresolutionmodels.It allows theuserto loada scene,selecta radiosityalgorithmanda resolution,andrun the algorithm. In this dissertation,I considertwo of the most commonalgorithms: progressiveand hierarchical radiosity.While bothalgorithmshavenon-trivial CPUandmemorydemand,progressiveradiosityismorememory-intensiveandhierarchicalradiosityis moreCPU-intensive.Thustheoptimalchoiceof algorithmat runtimewill dependon the CPU andmemorysupply. The outputqualitiesof thetwo algorithmsarecomparable,andI assumeherethatthethey areequalatany givenresolution:astudyof user-perceivedquality is beyondthescopeof thethesis.

Figure7.14shows theApplicationConfigurationFile for Radiator. It hastwo tunableparameters,onediscrete— thechoiceof algorithm— andonecontinuous— theresolution.Thereis onenontunableparameter:theinputscene’s polygoncount.

Theremainderof thissectiondescribesthecostof modifyingRadiatorto usethemulti-fidelity API; theCPUandmemorydemandpredictorsfor bothprogressiveandhierarchical

7.4. RADIOSITY 89

(a)Beforeradiosity

(b) After radiosity(low fidelity) (c) After radiosity(highfidelity)

This scenecontainsthreeelements:a 100K-polygonmodelof a dragon,theground,representedby a singlegreensquarebelow the dragon,anda light sourcevertically above the dragon. Thetop imageshows the imageasrenderedbeforea radiositycomputation:we seethat the lightinghasno effect on the observed image. The bottomimagesshow the sameimageafter a radiositycomputationat low fidelity (0.01),andat highfidelity (0.1).

Figure 7.13: Effect of radiosity on a 3-D model


description radiator:radio si tylogfile /usr/odyssey/etc /r adia tor .r adio si ty. lo gmode normalconstraint latency 10param polygons ordered 0-infinityfidelity algorithm unordered progressive hierarchicalfidelity resolution ordered 0.01-1

hintfile /usr/odyssey/li b/ ra d_hin ts .s ohint cpu radiator_radios it y_ cp u_h in thint memory radiator_radio si ty _memory _hin thint latency radiator_radios it y_l at ency _hintupdate radiator_radios it y_ updateutility radiator_radiosi ty _uti lit y

Figure7.14: Application Configuration File for Radiator

radiosity;andtheaccuracy of thesepredictors.

7.4.1 Porting Radiator to the multi-fidelity API

Figure7.15summarizesthemodificationsthatI madeto Radiator. In orderto make Radi-atoradaptive, I wroteanACF, providedgluecodefor themulti-fidelity API, andinsertedcalls to thegluecodeinto theapplication,both thecommand-lineandthe GUI version. Ialsowroteahint modulecontainingtheCPUandmemorydemandpredictorsdescribedinthis section.In all, I modified5 filesand599lines,about1.2%of thetotal codebase.

I alsosimplifiedRadiator’s GUI. Theoriginal GUI hadseveralcontrolsfor expertusersto experimentwith radiosity parametersettings. I eliminatedthese: the algorithm andtheresolutionarenow chosenadaptively by invoking themulti-fidelity API, andall otherparametersarefixedattheirdefaultvalues.Theuseris givenasinglecontrolto specifytheirdesiredlatency. My aimwasto simplify theuseof theapplicationfor anon-expertuser, yetintelligentlyadaptthoseparametersthatimpactresourceconsumptionandperformance.Tomodify the GUI, I madea small numberof changesto 3 sourcefiles, andeditedthe GUI

layout definition usingthe XFormsUserInterfaceDesigner[100]. The last two lines ofFigure7.15show thecostof thesemodification.

Additionally, I fixedanumberof memoryleaksandotherbugsin Radiator:thesemodi-ficationsarenotcountedin theevaluation,asthey aretangentialto multi-fidelity adaptation.

7.4. RADIOSITY 91

Files Lines Purpose1 15 ApplicationConfigurationFile1 274 Hint module1 68 Callsto gluecode(command-lineversion)1 123 Callsto gluecode(GUI version)2 119 Gluecode


1 125 GUI layoutdescription3 23 OtherGUI modifications

Figure7.15: Modifications madeto Radiator

Model PolygoncountDragon 108590Whale 101814Bunny 69543Car 56972Polarbear 48963Humanbust 29450

These3-D modelswereprovidedby Andrew Willmott, designerandauthorof Radiator.

Figure 7.16: 3-D modelsusedwith Radiator

7.4.2 Predicting Radiator’ sCPU demand

Radiator’s CPU demandpredictoris similar to GLVU’s: I measuredthe CPU demandofboth progressive andhierarchicalradiosityfor 6 differentscenes.Eachsceneconsistsofa 3-D model,an overheadlight source,anda greenpatchof “ground” underneath.Themodelswerechosento have a wide rangein size: from 29k polygonsto 109k polygons.Figure7.16lists the3-D modelsusedin thetestscenes.

For eachscene,I ranprogressive andhierarchicalradiosity50 timeseach,with a ran-domlychosenresolutionfor eachcomputation.Thelogsfrom theseexperimentswereusedto derive bothCPUandmemorydemandpredictors.To avoid measurementerrorsduetoheapre-use(Section5.6.1),I restartedtheapplicationfor eachexperimentalrun. I limitedtheCPUconsumptionof eachrunto 300s(about70billion cyclesonmy testplatform)andthememoryfootprint to 64MB. Eachrun includesapplicationinitializationandloadingofthescenefile, aswell astheradiositycomputationitself: thustheactualresourcelimits forthe radiositycomputationweresomewhat lower. This meantthat, for someof the larger


models,I couldnotcover theentirerangeof possibleresolutions.

Figure7.17shows the resultsof theseexperiments.We noticethat the CPU demandfor hierarchicalradiosityis anorderof magnitudehigherthanthatof progressiveradiosity:whentheprocessoris thebottleneck,we shouldalwaysusethe latteralgorithm. We alsoseethat CPU demandincreaseslinearly with polygon count, and that eachscenehasadifferentslope.

Theseresultssuggestdata-specificpredictorsfor CPU demand.Sincepart of the ra-diosity algorithmoperateson theoriginal model(i.e., on all thepolygons),we expectthattheoriginalpolygoncount , aswell asthescaledpolygoncountH© , will alsohaveaneffecton CPUdemand.I useda linearmodel:¡£¢S¤&¥ �l¦&|�§q¦E��¨·§�¦>{¨H©for thedata-independentpredictor. For data-specificprediction,however, ¨ is a constant,andI couldreducethemodelto ¡£¢S¤&¥ ��¦F¸| §q¦F¸ � ©Figures7.18and 7.19show theaccuracy of data-specificpredictorsfor eachscene,andthatof a data-independentpredictor. We seethat for all scenesexceptfor the “Human bust”,progressiveradiosityhasverypredictableCPUdemand;thedata-independentpredictorhasa higher, but still acceptablecommon-caseerror (17%) thanthedata-independentpredic-tors. Hierarchicalradiosity’s CPU demandis slightly lesspredictablefor data-dependentpredictors,andveryunpredictablewith thedata-independentpredictor( }�~+|¹�l³\³\º ).

Unlike GLVU, Radiatoris view-independentand unaffectedby variationsin cameraposition:thus,wedonotneedonlinelearningupdatesto capturesuchvariationsovertime.However, I douseonlinelearningto customizethedata-independentpredictorto everynewscene,i.e., to createdata-specificpredictorsfor new scenesat runtime.

7.4.3 Predicting Radiator’ smemory demand

To build amemorydemandpredictorfor Radiator, I took thesamelogsasbeforeandplot-tedmemorydemandagainstresolution(Figure7.20). We seethatprogressive radiosity’smemorydemandincreasesalmost7 timesasfastashierarchicalradiosity’s. We alsosee,for bothalgorithms,a stronglinearcorrelationbetweenmemorydemandandthepolygoncount ¨H© . Thereseemsto bedata-specificity— eachscenefollows a differentline — butall theselineshave similar slopes,anddiffer only in their offsets. I hypothesizedthat theinitial offsetis contributedby theoriginal (full-resolution)model,andtheremainderby thedegradedversion.Thus,a linearmodelof theform¡£» � »�¼�½�¾ ��¦>|�§q¦E��¨�§q¦&{�¨H©shouldbeagoodpredictorof memorydemand,where¦>| is thefixedmemoryoverhead,¦E�-¨is theoriginalmodel’s contribution,and ¦>{�¨H© is thememoryfor operatingon thedegradedversion.

7.4. RADIOSITY 93

0

0.5

1

1.5

2

2.5

3

0 2 4 6 8 10 12 14 16 18

CP

U d

eman

d (b

illio

ns o

f cyc

les)

¿

Number of polygons after scaling (thousands)

DragonWhaleBunny

CarPolar bear

Human bust

(a) Progressiveradiosity

0

5

10

15

20

25

30

35

40

45

50

0 10 20 30 40 50 60 70

CP

U d

eman

d (b

illio

ns o

f cyc

les)

¿

Number of polygons after scaling (thousands)

DragonWhaleBunny

CarPolar bear

Human bust

(b) Hierarchicalradiosity

The « -axis is thenumberof polygonsafterscaling,i.e. ¬G where¬ is theoriginal modelsizeandthe is the resolution.The ® axis is theCPUdemandin millions of cycles. Note thatbothaxesareon very differentscalesfor thetwo graphs.Progressive radiosityusesmuchlessCPU,but isconstrainedby its memorydemandto a maximumof 18k polygons.

Figure7.17: CPU demandof radiosity


0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Out

liers¯

Error threshold

DragonWhaleBunny

CarPolar bear

Human bustAll Scenes

(a)Progressive radiosity

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Out

liers¯

Error threshold

DragonWhaleBunny

CarPolar bear



Thegraphsshow theoutlierdistributionfor linearpredictorsof CPUdemand,for bothprogressiveandhierarchicalradiosity. I show thedistributionbothfor data-specificpredictors,andfor asinglegenericpredictorappliedacrossall scenes.

Figure 7.18: CPU demandprediction for Radiator: outlier distrib ution

7.4. RADIOSITY 95

Model ProgressiveRadiosity HierarchicalRadiosityzG{+| }i~+| zG{+| }�~+|Dragon 0% 5% 4% 16%Whale 0% 5% 0% 3%Bunny 0% 9% 10% 22%Car 0% 8% 8% 18%Polarbear 0% 14% 6% 8%Humanbust 19% 24% 4% 5%All scenes 4% 17% 39% 55%

Thetableshows thebadpredictionfrequency À �� andthecommoncaseerror Á �� for linearpre-dictorsof CPU demand,for both progressive andhierarchicalradiosity. We show the accuracybothof data-specificpredictors,andfor a singlegenericpredictorappliedacrossall scenes.

Figure7.19: CPU demandprediction accuracyfor Radiator

Figures7.21and7.22show theaccuracy of a data-independentlinearpredictorof thisform, aswell asthat of data-specificpredictors.We seethat the data-independentmodelhasvery goodaccuracy: we have 0% outliersfor both algorithms,andthe common-caseerroris 1.3%for progressiveand3.3%for hierarchicalradiosity. At runtime,I usethedata-independentpredictorfor its simplicity: it avoidstheoverheadof doingonlinelearningandmaintainingscene-specificpredictors,at averysmallcostin increasedpredictionerror.

7.4.4 Extrapolating predictors for higher fidelities

Thelimitationsof my mobilehardwareplatformmadeit impossibleto measureRadiator’sresourcedemandover the entire rangeof fidelities: I hadto restrictmemorydemandtoavoid thrashingandCPUdemandto boundruntimeto a reasonablevalue.I couldmeasureprogressiveradiosityonly up to 8k polygonsandhierarchicalradiosityup to 70kpolygons,whereasthelargestmodelhas109kpolygonsat full fidelity.

This limitation led usto askthefollowing questions:

� Does the dataacrossall fidelities, on a fastermachine,show the sametrendsasbefore:linearanddata-independentfor memory;linearanddata-dependentfor CPU?� Canwe extrapolatethemeasurementson thebaselineconfiguration(IBM ThinkPad560with 233MHz Mobile Pentiumand96MB memory)to predictresourcedemandon thefastermachine?

To answerthesequestions,I repeatedthe measurementsof Radiatoron a fastservermachine(2.2GHz Intel Xeonprocessor, 512MB of memory): this time I testedtheentirerangeof resolutionsfor bothprogressive andhierarchicalradiosity. Figures7.23and7.24show the CPU and memorydemandcurves from theseexperiments. We seethe same


0

10

20

30

40

50

60

0 2000 4000 6000 8000 1000012000140001600018000

Mem

ory

dem

and

(MB

)

Number of polygons after scaling

DragonWhaleBunny

CarPolar bear

Human bust


0

10

20

30

40

50

60

0 10000 20000 30000 40000 50000 60000 70000

Mem

ory

dem

and

(MB

)


DragonWhaleBunny

CarPolar bear

Human bust


The « -axis is thenumberof polygonsafterscaling,i.e. ¬G where¬ is theoriginal modelsizeandthe is theresolution.The ® axisis thememorydemandin megabytes.Notethatthe « -axisis ondifferentscalesfor thetwo graphs.

Figure 7.20: Memory demandof radiosity

7.4. RADIOSITY 97

0%

20%

40%

60%

80%

100%

0% 1% 2% 3% 4% 5%

Out

liers¯

Error threshold

DragonWhaleBunny

CarPolar bear



0%

20%

40%

60%

80%

100%

0% 1% 2% 3% 4% 5%

Out

liers¯

Error threshold

DragonWhaleBunny

CarPolar bear


(b) Outlierdistribution: hierarchicalradiosity

The graphsshow the outlier distribution for linear predictorsof memorydemand,for both pro-gressiveandhierarchicalradiosity. We show thedistributionbothfor data-specificpredictors,andfor a singlegenericpredictorappliedacrossall scenes.

Figure7.21: Memory demandprediction for Radiator: outlier distrib ution


Model ProgressiveRadiosity HierarchicalRadiosityz6{+| }i~+| zG{+| }i~+|Dragon 0% 0.1% 0% 2.1%Whale 0% 0.2% 0% 0.9%Bunny 0% 0.2% 0% 3.8%Car 0% 0.7% 0% 1.2%Polarbear 0% 0.5% 0% 1.3%Humanbust 0% 0.5% 0% 1.2%All objects 0% 1.3% 0% 3.3%

Thetableshows thebadpredictionfrequency À �� andthecommoncaseerror Á �� for linearpre-dictorsof memorydemand,for bothprogressiveandhierarchicalradiosity. We show theaccuracybothof data-specificpredictors,andfor asinglegenericpredictorappliedacrossall scenes.

Figure 7.22: Memory demandprediction accuracyfor Radiator

trendson the fastserver ason the ThinkPad (Sections7.4.2and7.4.3): CPU demandislinearbut data-dependent;memorydemandis linearandappearsto bedata-dependent,butthedifferencesbetweenscenesaredueto thedifferencesin their originalpolygoncount .

To confirm thesefindings, I generatedlinear predictors— both data-dependentanddata-independent— for the new data,andmeasuredtheir predictionerrors. Figure7.25shows the CPU demandpredictionaccuracy for the new data,andcomparesit with twoother cases:the original (ThinkPad) predictorstestedon the new (Xeon) data,and theThinkPadpredictorstestedon theThinkPaddata.We seethat:

� asbefore,data-dependentpredictorsdobetterthandata-independentones.� linearpredictorsdowell, but notaswell asbefore:perhapsbecausenon-lineareffectssuchasL1 cachecontentionaremoreapparentat higherfidelities.� The extrapolatedpredictor(TP/Xeon)doeswell for progressive radiosity, but verybadly for hierarchicalradiosity. In the latter case,the slopeof CPU demandvs.polygoncount(Figure7.23(b))is consistentlyhigher for the Xeon. I.e., the fasterprocessoris actuallydoing lessper cycle, causingthe extrapolatedpredictorto un-derestimatethenumberof cyclesrequired.� Theextrapolatedpredictor(TP/Xeon/unscaled)hasenormouserrors.In otherwords,if wejustmeasureCPUdemandin termsof time,withoutscalingfor processorspeed,thenwe consistentlyoverestimateCPUdemandby a factorof 5-11(theXeon’s pro-cessorspeedis 9.5timesthatof theThinkPad’s).

Thesefindingsconfirm my previous observations: that CPU demandis linear anddata-dependent.They alsoshow that “cyclesconsumed”asa metricof CPUdemanddoesnotalwaystranslatewell acrossprocessorarchitectures:predictorsderived on oneprocessormustbemodifiedwhenusedonanother. For simplepredictors,onlineupdateswill achieve

7.4. RADIOSITY 99

0

5000

10000

15000

20000

25000

0 20000 40000 60000 80000 100000 120000

CP

U d

eman

d (m

illio

ns o

f cyc

les)

¶


DragonWhaleBunnyCarPolar bearHuman bust


0

20000

40000

60000

80000

100000

120000

140000

160000

180000

200000

0 20000 40000 60000 80000 100000 120000

CP

U d

eman

d (m

illio

ns o

f cyc

les)

¶


DragonWhaleBunnyCarPolar bearHuman bust


The « -axis is thenumberof polygonsafterscaling,i.e. ¬G where¬ is theoriginal modelsizeandthe is theresolution.The ® axisis theCPUdemandin millions of cycles:notethatit is on verydifferentscalesfor thetwo graphs.

Figure7.23: CPU demandof radiosity on fast server


0

50

100

150

200

250

300

350

0 20000 40000 60000 80000 100000 120000

Mem

ory

dem

and

(MB

)


DragonWhaleBunny

CarPolar bear

Human bust


0

10

20

30

40

50

60

70

80

0 20000 40000 60000 80000 100000 120000

Mem

ory

dem

and

(MB

)


DragonWhaleBunny

CarPolar bear

Human bust


The « -axis is thenumberof polygonsafterscaling,i.e. ¬G where¬ is theoriginal modelsizeandthe is the resolution. The ® axis is the memorydemandin megabytes:notethat it is on verydifferentscalefor thetwo graphs.

Figure 7.24: Memory demandof radiosity on fast server

7.4. RADIOSITY 101

Model Xeon/Xeon TP/Xeon TP/Xeon/unscaled TP/TPzG{+| }i~+| zG{+| }i~+| zG{+| }i~+| zG{+| }i~+|Dragon 4% 4% 0% 6% 100% 872% 0% 5%Whale 8% 13% 14% 22% 100% 751% 0% 5%Bunny 24% 78% 14% 25% 100% 944% 0% 9%Car 4% 6% 0% 9% 100% 857% 0% 8%Polarbear 14% 30% 2% 16% 100% 890% 0% 14%Humanbust 2% 11% 4% 19% 100% 940% 19% 24%All objects 13% 23% 19% 25% 100% 882% 4% 17%


Model Xeon/Xeon TP/Xeon TP/Xeon/unscaled TP/TPzG{+| }i~+| zG{+| }i~+| zG{+| }i~+| zG{+| }i~+|Dragon 10% 24% 94% 54% 100% 563% 4% 16%Whale 2% 4% 96% 36% 100% 594% 0% 3%Bunny 0% 11% 84% 34% 100% 1114% 10% 22%Car 2% 6% 92% 40% 100% 766% 8% 18%Polarbear 0% 5% 52% 26% 100% 908% 6% 8%Humanbust 0% 4% 42% 23% 100% 849% 4% 5%All objects 64% 95% 67% 62% 100% 862% 39% 55%


The first six lines of eachtableshow the badpredictionfrequency À �$� andcommon-caseerrorÁ �� for data-specificlinear predictors;the last line correspondsto a data-independentpredictorevaluatedoverall theobjects.We show four combinationsof predictorsandtestdata:Xeon/Xeon(derivedfrom andtestedonXeondata);TP/Xeon(derivedfrom ThinkPaddataandtestedonXeondata);TP/Xeon/unscaled(the sameas the former, but without CPU speedscaling);andTP/TP(derivedfrom ThinkPaddataandtestedon ThinkPaddata).

Figure 7.25: CPU demandprediction accuracyfor Radiator on a fast server


Model Xeon/Xeon TP/Xeon TP/TPz6{+| }�~+| zG{+| }i~+| zG{+| }i~+|Dragon 0% 1.6% 0% 1.5% 0% 0.1%Whale 0% 0.4% 0% 1.2% 0% 0.2%Bunny 0% 1.2% 0% 2.4% 0% 0.2%Car 0% 0.5% 0% 2.8% 0% 0.7%Polarbear 0% 0.5% 0% 1.4% 0% 0.5%Humanbust 0% 1.1% 0% 2.3% 0% 0.5%All objects 0% 1.4% 0% 1.4% 0% 1.3%


Model Xeon/Xeon TP/Xeon TP/TPz6{+| }�~+| zG{+| }i~+| zG{+| }i~+|Dragon 0% 1.4% 0% 3.7% 0% 2.1%Whale 0% 1.3% 0% 2.2% 0% 0.9%Bunny 0% 3.9% 0% 4.8% 0% 3.8%Car 0% 1.2% 0% 1.9% 0% 1.2%Polarbear 0% 1.2% 0% 2.5% 0% 1.3%Humanbust 0% 1.1% 0% 3.3% 0% 1.2%All objects 0% 3.6% 0% 4.2% 0% 3.3%


The first six lines of eachtableshow the badpredictionfrequency À �$� andcommon-caseerrorÁ �� for data-specificlinear predictors;the last line correspondsto a data-independentpredictorevaluatedoverall theobjects.Weshow threecombinationsof predictorsandtestdata:Xeon/Xeon(derivedfrom andtestedonXeondata);TP/Xeon(derivedfrom ThinkPaddataandtestedonXeondata);andTP/TP(derivedfrom ThinkPaddataandtestedon ThinkPaddata).

Figure7.26: Memory demandprediction accuracyfor Radiator on a fast server

7.5. WEB BROWSING 103

this aftera few operations;morecomplex predictorsneedto be“ported” by othermeans.Nevertheless,“cyclesconsumed”is amuchbettermetricthan“CPU time”, whichdoesnotcompensatein any way for changesin processorspeed.

Figure7.26shows thepredictionaccuracy for memorydemand.All threetestconfigu-rationsshow very low error, bothfor data-dependentanddata-independentprediction(theTP/Xeonpredictorhasslightly highererrors,but this is only becausetheextrapolationin-evitably magnifiesany noisein theoriginaldata).Thisconfirmsmy originalfindingsaboutRadiator’s memory— that it is linearanddata-independent– andalsoshows that it is notsensitive to changesin processorarchitecture.

7.4.5 Summary

I studiedRadiator: an implementationof radiosity, a view-independent3-D shadingal-gorithm. With a small numberof modifications,I madethe applicationadaptive: it nowchoosesoneof two algorithms,anda resolutionbetween0 and1, by invoking themulti-fidelity API. I built and evaluatedlinear predictorsfor both CPU andmemorydemand.Data-specificpredictorsfor CPU demandgave usa common-caseerrorof 3%–24%,andweresignificantlybetterthandata-independentpredictors.Data-independentpredictorsformemoryhadanerrorof at most3.3%;data-specificonesdid slightly better.

7.5 Web browsing

Web browsing is a commonactivity on both mobile anddesktopcomputers.In the mo-bile case,fetchinglargeimagesoverawirelessnetwork resultsin consumptionof valuablebatteryenergy, aswell as increasednetwork traffic. For suchimages,lossyJPEGcom-pression[94] is a well-known way to adaptenergy [34] andnetwork [37, 71] demandtoresourcesupply.

My adaptive webbrowser[71] consistsof anunmodifiedNetscapebinaryanda localHTTP proxycalledthecellophane, originally writtenby Eric Tilton. Thecellophaneinter-ceptsall webrequestsandtransformstheminto Odyssey [71] systemcalls. Odyssey thenrequestsa degradedversionof theimagefrom a distilling server locatedon theothersideof thewirelesslink. Thedistiller fetchestheimagefrom thewebserver, JPEG-compressesit if required,andreturnsit to theclient; it usestheIndependentJPEGGrouplibrary [43]to do theactualcompression.

For this application,a multi-fidelity operationconsistsof compressing,fetching andrenderinga singleimageon thescreen.CurrentlyI ignorethecostof fetchingthe imagefrom thewebserver to thedistiller: this stephaslittle impacton themobileclient’senergyusageor wirelesstraffic, thoughit doesincreasethe overall latency. Eachoperationhasone datafeatureor nontunableparameter— the size of the original image— and one


description web:fetchimagelogfile /usr/odyssey/et c/ web.f et ch im age .l ogparam imagesize ordered 0-infinityfidelity jqf ordered 5-80 step 1

Figure 7.27: Application Configuration File for web imagefetch

fidelity metric — the JPEGQuality Factor(JQF)[94, 14] at which compressionis done.Figure7.27showstheApplicationConfigurationFile for thisapplication.TheJQFcantakeany integervaluefrom 0–100:however, I foundthatvaluesbelow 5 causethecompressionalgorithmto behave unreliably, andvaluesabove 80 causea sharpincreasein the sizeofthe compressedimage,sometimesexceedingthat of the original image. Consequently, IrestrictedtheJQFto therange5–80.

In the remainderof this sectionI describehow I modified the cellophaneto usethemulti-fidelity API, andhow I predictedthenetwork andenergy demandof fetchingimagesover a wirelessnetwork. In all of my experiments,the distiller ran on an IBM ThinkPad570with a366MHz Mobile PentiumII processorand128MB of RAM. Thewirelesslinkbetweenclientandserverwasprovidedby a2Mbps,2.4GHzLucentWaveLAN in ad-hocmode.All imageswerefetchedfrom awebserver runningonthesamehostasthedistiller.

7.5.1 Porting Netscapeto the multi-fidelity API

The Netscape“application” consistsof threeparts: the Netscapebinary (at the time, thesourcecodewasnot freely available),thecellophane,andthedistilling server. In ordertousethemulti-fidelity API, I only neededto modify thecellophane.On receiving a requestfor animage,themodifiedcellophane

� getstheimagesizefrom thedistiller.� invokes « ��Â1ÃÄ z�Ã�Å��MÆ-ÃÇÉÈ Ê&¨ to find theappropriatefidelity (JQF)for thatimagesize.� passestheJQFto Odyssey.� fetchesthedegradedimagefrom thedistilling server (throughOdyssey), andpassesit to Netscape.� waits for Netscapeto finish renderingtheimage,by trackingNetscape’s statuswin-dow.� calls �EÄËÅ z�Ã�Å��MÆ-ÃÇÉÈ Ê&¨ .

Normally, Netscape’s operationis drivenby a userinteractingwith its GUI. However,whencollectinglog data,I hadto automaticallyexecutea largenumberof operationswith-out requiringuserinteraction. For this purpose,I useda remotecontrol program,basedon theNetscapeRemoteControlreferenceimplementation[99], whichallowedusto issue“fetch anddisplayURL” commandsto Netscapefrom anotherprocess.An extendedver-


Figure 7.28: Netscape’s “shooting stars” statuswindow: idle (left) and animated

Files Lines Purpose1 83 Insertionof multi-fidelity calls1 393 Remotecontrolprogram1 211 TrackingNetscapestatuswindow1 4 ApplicationConfigurationFile2 170 Hint module

6 861 Totalmodificationsfor multifidelity2 697 Cellophane(original size)7 3224 Distiller (unmodified)

Note that the count for the remotecontrol program(line 2) excludescode from the originalNetscapereferenceimplementation,andcodeto trackNetscape’sstatuswindow.

Figure 7.29: Codemodifications for adaptiveweb browsing

sion,built by JasonFlinn andKip Walker [81], canalsodetectwhenNetscapehasfinisheddisplayingadocument.It doessoby trackingthestateof Netscape’s“shootingstars”statuswindow (Figure7.28),whichdisplaysastaticimagewhenNetscapeis idle andananimatedonewhenit is busyfetchingor renderingadocument.Theremotecontrolprogramalsoin-vokesmulti-fidelity API andsetstheJQFonbehalfof theapplication:thus,in batchmode,I usedtheunmodifiedcellophanewhichdoesnotmakemulti-fidelity calls.

Figure7.29shows thecostof portingNetscapeto themulti-fidelity API.

7.5.2 Predicting the network demandof imagefetch

Thenetwork demand— thenumberof bytesreadfrom theserver — is determinedby thesizeof the JPEG-compressedimage. Canwe predict this sizeasa function of the JPEGQualityFactor?I measuredthecompressionratioachievedby JPEGatvariousJQFvalues,on six imagesof sizesvaryingfrom 8KB to 1.3MB. (Figure7.30). Figure7.31(a)showsthe compressionratiosasa function of the JQF. We seethat the plot is roughly linear in


Image Size(bytes)nsh 1394081apple 174650radio 114816castle 58223circuit 19685laserdt 8802

Figure 7.30: Testimagesusedwith web browser

therange5–80:beyond80 it beginsto increaserapidly andnon-linearly. We seethis non-linearityevenonalog-logplot (Figure7.31(b)),indicatingthatit is notasimplepolynomialor exponentialrelationship.Sincecompressionratioscloseto or exceeding1 areneitherpredictablenoruseful,I avoidedJQFvaluesabove80.

In therange5–80,I modelthecompressionratio © asa linearfunctionof theJQF z :

©Ì�l¦&|�§�¦M�ÍzAs Figure7.31shows, the valuesof ¦>| and ¦M� aredifferent for different images,i.e. themodelneedsto bedata-specific.Giventhecompressionratio © andtheoriginal imagesizeÎ

, wecanpredictthenetwork demandas

¡·½ � ¢-Ï �l¦F¸| §�© Î �l¦F¸| §l�¦&|�§q¦E�ÐzË� Î �l¦F¸| §�¦&¸ � Î §q¦F¸{ z ÎFigure7.32showstheaccuracy of thismodel,evaluatedover6 imageswith 100datapointseach.As wemight expect,weseehighaccuracy for data-specificpredictorsandlow accu-racy for thedata-independentpredictor:wehavealreadyseenthattherelationshipbetweenJQFandcompressionratio is data-specific.

7.5.3 Predicting the energy demandof web imagefetch

Fetchinglarge objectsover a wirelessnetwork consumesa substantialamountof energy.Compressioncanhelpherealso:it reduceswirelesstraffic, andalsothetime for which thewirelessinterfaceis active. To measuretheeffectof compressiononenergy usage,I ranthesameexperimentsasin theprevioussectionandmeasuredtheenergy usageof eachopera-tion usingPowerScope[31]. PowerScopeallowsusto samplethepowerconsumptionof alaptop,andto attributeit to oneof themany processesrunningon themachine.I extendedPowerScopeto includea timestampwith eachsample. In post-processing,I usedthesetimestampsto correlatepowersampleswith theoperationsloggedby Odyssey. I computedthetotalenergy consumedduringanoperation,subtractedouttheknownbackgroundpowerconsumption,andattributedthe remainingenergy consumptionto thatoperation.On mytestmachine,thisbackgroundor baselinepowerconsumptionwas7.94Watts.


0

0.2

0.4

0.6

0.8

1

0 20 40 60 80 100

Com

pres

sion

rat

io

JPEQ Quality Factor

nshappleradiocastlecircuitlaserdt

(a) Normalscale

0.01

0.1

1

10

1 10 100

Com

pres

sion

rat

io

JPEQ Quality Factor

nshappleradiocastlecircuitlaserdt

(b) Log scale

The « -axisshows theJPEGQuality Factor À , from 0 to 100. The ® -axisshows thecompressionratio achievedat thatquality factor.

Figure7.31: Compressionratio asa function of JPEG Quality Factor


0%

20%

40%

60%

80%

100%

0% 5% 10% 15% 20% 25% 30%

Out

liersÑ

Error threshold

nshappleradio

castlecircuit

laserdtAll Images

(a)Outlierdistribution

Image Pred.errorzG{+| }i~+|nsh 0% 5%apple 0% 7%radio 0% 5%castle 0% 6%circuit 0% 8%laserdt 0% 4%All images 70% 526%

(b) Bad prediction fre-quency andcommon-caseerror

Thegraphshowspredictionaccuracy for data-specificanddata-independentlinearmodelsof net-work demandfor imagefetches.Thetableshows thecorrespondingbadpredictionfrequency À&�$�andthecommoncaseerror Á�� .

Figure7.32: Network demandprediction accuracyfor web imagefetch


0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Out

liersÒ

Error threshold

nshappleradio

castlecircuit

laserdtAll Images

(a)Outlier distribution: Netscape

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Out

liersÒ

Error threshold

nshappleradio

castlecircuit

laserdtAll Images

(b) Outlier distribution: simplebrowser

Image Netscape Simplebrowserz6Ó-ÔÐÕ }i~+| z6Ó-ÔÐÕ }i~+|nsh 0% 10% 1% 3%apple 9% 20% 1% 9%radio 14% 22% 0% 8%castle 23% 34% 0% 11%circuit 60% 60% 3% 8%laserdt 58% 62% 1% 10%All images 55% 100% 58% 72%

(c) Bad predictionfrequency andcommon-caseerror

The graphsshow predictionaccuracy for a linear modelof network demandfor imagefetches:usingNetscape,andusinga simplebrowser. The tableshows the correspondingbadpredictionfrequency À&ÖØ×ÉÙ andthecommoncaseerror Á�� .

Figure7.33: Energy demandprediction accuracyfor web imagefetch


0

1

2

3

4

5

0 20 40 60 80 100

Ene

rgy

dem

and

(Jou

les)

Ú

JPEQ Quality Factor

(a) Netscape

0

1

2

3

4

5

0 20 40 60 80 100

Ene

rgy

dem

and

(Jou

les)

Ú

JPEQ Quality Factor

(b) Simplebrowser

The « -axis is theJPEGQuality factor. The ® -axisis theenergy demandin Joulesof fetchingandrenderingthe“castle” image.Eachpoint representsanexperimentalrun; thelinesarelinearbestfits throughthesamplepoints.

Figure7.34: Energy demandof different browsers

We expectthe energy demandto have a fixed portion,a network portion linear in theamountof datafetched,a decompressionportion linear in the compressedand uncom-pressedimagesizes,anda renderingportionlinearin theuncompressedimagesize.I.e.,

¡ �ÛE� ½�ÜÉ¾ �<¦>|�§q¦E�É© Î §l�$¦>{Í© Î §q¦&Ý Î �Þ§q¦ ã Îwhere© is thecompressionratio. This reducesto

¡ �Û�� ½�ÜÉ¾ �l¦ ¸| §q¦ ¸ � © Î §�¦ ¸{ ÎIf weusea linearmodelfor compressionratio © asa functionof theJQF z , weget

¡ �Û�� ½�ÜÉ¾ ��¦F¸ ¸| §q¦F¸ ¸� Î §q¦F¸ ¸{ z ÎFigure7.33(a)shows theaccuracy of this predictorfor Netscape’s energy demand.We seethat,asbefore,data-specificpredictorsoutperformedthegenericpredictor;however, eventhedata-specificpredictorshadahigherrorrate,especiallyfor thesmallerimages.

A predictormight perform badly for two reasons:either the underlyingmodel waspoorly chosen(i.e. the true relationshipis not linear), or there are noisesourcesthatcausedeviations from the model. In the caseof Netscape,the poor accuracy was dueto noisein thedata. I suspectthat thenoiseis causedby thebackgroundthreadthatani-matesNetscape’s“shootingstars”statuswindow: non-deterministiceffectsin theuser-levelthreadschedulercausethis threadto consumeavariableamountof CPUandhenceenergy.

How well would my predictorsperformin the absenceof thesenon-deterministicef-fects?SinceI did nothavesourcecodeto Netscapeat thetime,I answeredthisquestionby

7.6. SPEECHRECOGNITION 111

building another, simplerbrowser. ThisprogramcansendHTTPrequeststo thecellophane,readimagedata,anddisplaytheimageusingxv [11], a freelyavailableimageeditorfor X.I repeatedmy experimentswith this browserandfoundadramaticimprovementin predic-tion accuracy, asshown by Figures7.33(b)and7.33(c).Figure7.34showsgraphicallythecontrastbetweenNetscapeandthesimplebrowserfor the“castle”object.

7.5.4 Summary

I studiedthe network andenergy demandof web imagefetch, andshowed that both canbereducedthroughmulti-fidelity adaptation.I alsoshowedthatmulti-fidelity adaptationispossibleevenwithout applicationsourcecode,throughtheuseof proxies. However, thisblack-boxapproachleadsto additionalcomplexity of implementation.It alsohindersthebuilding of simpleandaccuratepredictors:sometimestheapplicationexhibits side-effectsthatcannotbedebuggedwithout sourcecodeaccess.

I alsoshowed thatdata-specificpredictorscanaccuratelypredictthe compressedsizeof animageasafunctionof theJPEGQualityFactor. However, adata-independentpredic-tor of compressedsizehadlow accuracy. This is unsurprising:JPEGcompressionratiosdependon imagecontent,andcannotbepredictedfrom imagesizealone.My resultsherearebroadlyin agreementwith thoseof Hanetal. [44], whostudiedJPEGtranscodingfor alargenumberof webimages.They show that,for afixedJPEGlevel:

� Thereis a linearcorrespondencebetweenuncompressedandcompressedimagesize,but with a largeamountof noise,i.e. variationacrossimages.� Thereis a betterlinear correspondencebetweenthe area (numberof pixels)of theuncompressedimageandthecompressedimagesize.� Thevariancefrom thetrendis higherfor larger images( in termsof eitherbytesizeandarea),i.e. thedistribution is heteroscedastic.

By combiningtheseresultswith mine— perhapswith a linearmodelon JPEGlevel andimageareaaswell as input byte size— the accuracy of the data-independentpredictormight beimproved.

7.6 Speechrecognition

Janus[92] is a speechrecognitionsystemthat takesa digital soundsample— an utter-ance— and returnsthe words it recognizesin the utteranceas an ASCII string. It isbothCPUandmemory-intensive: hereI focuson adaptingJanus’sCPUdemandusingthemulti-fidelity API. TheCPUdemanddependson thesizeof the languagemodelused:oneadaptationtechniqueis to usea smallerlanguagemodelfor thesametask,giving usfasterrecognitionat the price of degradedrecognition. In the experimentsdescribedhere,I al-low Janusto switch betweena normal(“large”) languagemodelanda reduced(“small”)


description janus:recognizelogfile /usr/odyssey/et c/ ja nus _r ec ogni ze. lo gmode normalhintfile /usr/odyssey/l ib /j anus- re co gnize .s outility janus_recognize _uti lit yinit janus_recognize_ in it

param utterance_lengt h ordered 0-infinityfidelity vocab_size unordered small largefidelity location unordered local hybrid remote

Figure7.35: Application Configuration File for Janus

version.

Theheavy resourcedemandsof Janusalsomakeremoteexecutionanattractivestrategy.JasonFlinn hasmodified Janusso that it can executeeachrecognitionin one of threedifferentconfigurations:entirely local, entirely on a remotecomputeserver, or “hybrid”.In the hybrid mode,the first phaseis donelocally, in orderto reducethe amountof datashippedover thenetwork. This is a vectorquantization[76] stepthat transformsthe rawspeechutteranceinto a morecompactrepresentation.The bulk of the processingis thendoneat the remoteserver. Thus“location” is a tunableparameterfor Janus,with threepossiblevalues.Althoughtheoutputquality is thesamein eachcase,theresourcetradeoffsbetweenlocalCPU,memory, network bandwidth,andremoteCPUaredifferent.

Dependingon the resourcesupplyat runtime,the solver will pick oneof thesethreemodes.If a remoteserver is required,thesolveralsodeterminesthebestserver to use:thisinformationis notexposedto theapplication.To performtheremoteexecution,thesystemusesSpectra[33], anenginefor serverdiscoveryandclient-servercommunicationthatwasbuilt by JasonFlinn, andis integratedwith themulti-fidelity runtime.

Theresourcedemandof speechrecognitionalsodependson thelengthof theutterancebeingrecognized.Thus,Janushastwo fidelity metrics— modelsizeandlocation— andoneinput parameter— utterancelength(Figure7.35).

7.6.1 Porting Janusto the multi-fidelity API

TheJanuscodebaseis structuredasa toolkit. Thecorefunctionality is implementedasCprocedures;Janus“applications”areTcl scriptsthat invoke this functionality. To make amulti-fidelity Janusapplications,we

� wroteanACFandresourcehint module.


Files Lines Purpose1 10 ApplicationConfigurationFile1 86 Hint module1 394 Command-lineclient3 350 Server-sidewrapper(C)4 241 Server-sidewrapper(Tcl)


1 1045 Large(original)dictionary1 105 Small(reduced)dictionary

Notethat theTcl files werelargely basedon existing Janussourcecode: the line countshereareconservative, i.e. they over-estimatethenumberof new linesof codewritten. Also notethat thesmalldictionarywascreatedsimply by deletingentriesfrom thelargedictionary(eachdictionaryentrycountsasoneline).

Figure7.36: Modifications madeto Janus

� createda reducedversionof the languagemodel. The languagemodelis automati-cally createdfrom adictionary— atext file with oneentryperline. Theexperimentsin this sectionusea “Pittsburgh tourist” languagemodel,which is partof theJanusdistribution. I createdthe reducedversionof the languagemodelby removing allpropernounsfrom thedictionary.� createdaremoteexecutionserver, by addingacodelayerthatinterfacesbetweentheSpectraserver-sidesupportand the Janustoolkit. This server exportsfive RPC’s:“load vocabulary”, “unload vocabulary”, “do quantization”,“recognizequantizeddata”, and“recognizeraw (unquantized)data”. This server runsboth on the clientmachineandonremotecomputeservers.� wroteasimpleclientprogram,whichreadsasoundfile andinvokesthemulti-fidelityAPI to find the appropriatelocationandvocabulary size. The client then invokesSpectrato performthecomputationon thelocal and/orremoteservers.

Figure7.36showsthecostof thesemodifications.

7.6.2 Predicting resourceconsumptionfor Janus

To build resourcedemandpredictorsfor Janus,I used15 pre-recordedutterancesrangingin sizefrom 20KB to 140KB. For eachutterance,I performed5 recognitionsfor eachofthe6 combinationsof locationandvocabulary size,for a total of 450samples.

Figures7.37, 7.38, and 7.39 show the local CPU, remoteCPU, and network trans-missiondemandrespectively, for theseexperiments.In eachcase,I plot resourcedemand


0

500

1000

1500

2000

2500

3000

3500

0 50 100 150 200 250

Loca

l CP

U d

eman

d (1

0

Ú6 c

ycle

s)

Utterance length (KB)

Full vocabularyReduced vocabulary

(a) “Local” mode

0

10

20

30

40

50

60

70

80

0 50 100 150 200 250

Loca

l CP

U d

eman

d (1

0

Ú

6 cyc

les)



(b) “Hybrid” mode

The « -axis is the utterancelength in bytes. The ® -axis is the local CPU demandin millions ofcycleson a Mobile PentiumMMX processor. Eachbar is the min-maxrangefor 5 recognitionsof a singleutterance.Thelinesarebestfits for thefull andreducedvocabularycase,respectively.Note thedifferent ® scalesfor the“local” and“hybrid” modes.The“remote” modeis not shownasit usesanegligible amountof localCPU.

Figure7.37: Local CPU demandof speechrecognition

0

500

1000

1500

2000

2500

0 50 100 150 200 250Rem

ote

CP

U d

eman

d (1

0

ß

6 cyc

les)



(a) “Hybrid” mode

0

500

1000

1500

2000

2500

0 50 100 150 200 250Rem

ote

CP

U d

eman

d (1

0

ß

6 cyc

les)



(b) “Remote”mode

The « -axis is theutterancelengthin bytes.The ® -axis is the remoteCPUdemandin millions ofcycleson a Mobile PentiumII processor. Eachbar is the min-maxrangefor 5 recognitionsof asingleutterance.Thelinesarebestfits for thefull andreducedvocabularycase,respectively.

Figure7.38: RemoteCPU demandof speechrecognition


0

10000

20000

30000

40000

50000

60000

0 50 100 150 200 250

Net

wor

k by

tes

tran

smitt

edà



(a) “Hybrid” mode

0

50000

100000

150000

200000

250000

300000

0 50 100 150 200 250

Net

wor

k by

tes

tran

smitt

ed

à



(b) “Remote”mode

The « -axis is theutterancelengthin bytes.The ® -axis is thenetwork transmissionin bytes.Eachbar is the min-maxrangefor 5 recognitionsof a singleutterance.The lines arebestfits for thefull and reducedvocabulary case,respectively. Network receive demandis not shown as it isnegligible.

Figure 7.39: Network demandof speechrecognition

againstutterancesize.Eachutteranceis plottedasaverticalbarrangingfrom theminimumto themaximumof the5 runs.I alsoshow thebest-fitlinesfor resourcedemandasa func-tion of utterancesize.I donotshow network or remoteCPUdemandfor the“local” mode,sinceit doesnot usethoseresources;I alsoomit the local CPU demandof the “remote”mode,andthenetwork receivedemandin all cases,asthey werenegligible.

Wenotethefollowing featuresof thedata

� local andremoteCPUincreaseroughlylinearlywith utterancelength.� network transmissionincreasesexactly linearly with utterancelength.� the remotemodealwaystransmitsexactly 5 timesasmuchdataasthe hybrid: thequantizationstepalways“downsamples”theaudiodataby a factorof 5.� the local CPU usedby the hybrid mode is small (less than 3% of the total CPUdemandof recognition);thus“hybrid” is preferableto “remote” unlesswe have ex-tremelyhigh bandwidthto theremoteserver, or a very largedisparitybetweenlocalandremoteCPUspeeds.� the local CPU demandof the “local” modediffers slightly from the remoteCPUdemandof the “remote” mode,althoughboth performthe samecomputation:thisis becausethe two CPUshave slightly differentarchitectures,andrequiredifferentnumbersof cyclesfor thesamecomputation.� thevariationin resourceconsumptionfor a singleutterance(for a givenremoteexe-cutionmodeandvocabulary size)is extremelysmall.

Thevariationacrossutterancesindicatesthatadata-specificresourcepredictionscheme


0%

20%

40%

60%

80%

100%

0% 10% 20% 30% 40% 50%

Out

liersÒ

Error threshold

local,fulllocal,reduced

hybrid,fullhybrid,reduced

(a)Outlier distribution: localCPU

0%

20%

40%

60%

80%

100%

0% 10% 20% 30% 40% 50%

Out

liersÒ

Error threshold


remote,fullremote,reduced

(b) Outlier distribution: remoteCPU

0%

20%

40%

60%

80%

100%

0% 0.5%

Out

liersÒ

Error threshold


remote,fullremote,reduced

(c) Outlierdistribution: network

Resource Localmode Hybrid mode RemotemodeFull Reduced Full Reduced Full ReducedzG{+| }i~+| z6{+| }�~+| z6{+| }�~+| z6{+| }�~+| z6{+| }�~+| zG{+| }i~+|

LocalCPU 25% 31% 3% 11% 3% 10% 1% 8% n.a. n.a.Rem.CPU n.a. n.a. 24% 33% 8% 13% 27% 32% 8% 15%Network n.a. n.a. 0% 0.2% 0% 0.2% 0% 0.02% 0% 0.02%

(d) Badpredictionfrequency andcommon-caseerror

The graphsshow outlier distributions for linear predictorsof resourcedemand(local CPU, re-moteCPU,andnetwork transmission)for speechrecognition.Thetableshows thebadpredictionfrequency À �$� andthecommon-caseerror Á �� correspondingto thegraphs.

Figure 7.40: Resourcedemandprediction accuracyfor Janus


would performwell. Unfortunately, suchpredictorsarenot practical,sinceit is very un-likely thata realuserwill generateexactly thesameaudiodataa secondtime. We areleftwith utterancelength á asthe only easilyextractedfeatureto baseour predictionson. Iusea linearmodel ¡ �l¦&|�§q¦E�Fáfor all threeresourcesof interest:local CPU,remoteCPU,andnetwork transmission

Figure7.40showstheaccuracy of linearpredictorsof localCPU,remoteCPU,andnet-work transmission.It showstheoutlierdistribution,badpredictionfrequency andcommoncaseerror for eachresource,for eachof the 6 “bins” definedby locationandvocabularysize.Wenotethat

� the network demandpredictorsare extremely accurate,with errorsalways below0.5%.� the local CPU demandfor “hybrid” (i.e. the vectorquantizationstep),apartfrombeingsmall,is alsoverypredictable,with }�~+| under3%.� the CPU demandof the main (post-quantization)recognitionstep— whetherexe-cutedlocally or remotely— is smaller, andalsomorepredictable,with thereducedvocabulary thanwith thefull vocabulary.

We havesizeable( }i~+|i�Zâlå�º ) predictionerrorsonly for theCPUdemandof themainrecognitionstep(i.e., local CPU demandfor “local”, remoteCPU demandfor “hybrid”and“remote”) whenwe usethe full vocabulary. However, in practice,even theseerrorsdo not usually leadto incorrectadaptive decisions.Typically, the differencein predictedlatency betweendifferentchoices— e.g. executinglocally vs. remotely— is larger thanthe predictionerror, andso the runtimesystemmakes the correctchoice. Of course,inscenarioswheredifferent adaptive choicesgive us only slightly differing latencies,thesystemis morelikely to makeincorrectchoices:however, in suchscenarios,thepenaltyfortheseincorrectchoicesis alsolow. JasonFlinn andtheauthorhave shown that themulti-fidelity systemalwayschoosestheoptimalremoteexecutionmodeandvocabulary sizeina varietyof resourcescenarios[33]: in otherwordsalthoughthereareinevitably errorsinresourceprediction,they aresmallenoughthat theoptimalconfigurationis still predictedto beoptimal.

7.6.3 File accessprediction for Janus

In Chapter2, I observedthatfile cachestateisakey resourcein mobilesystems.Thismeansthat it is importantto beableto predictthefiles thatwill beaccessedby theapplicationatvariousfidelities,in orderto pick fidelity valuesthatreducecachemisses.

In the caseof Janus,eachscenario(e.g.,“Pittsburgh tourist”) hasits own specializedvocabulary. If the requiredlanguagemodel is not in the cache,thenrecognitionon thatmachinewill incur anexpensivecachemiss.(Thelanguagemodelsusedin my evaluation


arearound250KB in size.)

I storedthelanguagemodelsin Coda,andusedthefile cachepredictor(Sections5.5.4and 6.4.2)to predictthe cachemisscostfor any combinationof remoteexecutionmodeandvocabulary size. This approachworksextremelywell for Janus:thefiles requiredontheclient andserver aredeterminedentirelyby thescenario,vocabulary size,andremoteexecutionmode. I.e., it can always predict, with 100% accuracy, the files that will beaccessedby a recognitionoperation[33].

7.6.4 Summary

I studiedspeechrecognition,which adaptsits resourcedemandby reducingits vocabularysizeand/oroffloadingsomecomputationon to remotecomputeservers. I showedthat thelocalandremoteCPUdemandof thisoperationarepredictableto within 24%,andthatthenetwork demandof shippingdatato remoteserversis predictableto within 0.5%.

7.7 Other applications

JasonFlinn andSoYoungPark have portedthefollowing applicationsto themulti-fidelityAPI:

� LATEX, adocumentprocessor(portedwith nosourcecodemodification)� PANGLOSS [70], a languagetranslationsystem

They providesomefurthervalidationof thegeneralityof my programmingmodel;however,I did not evaluatethemquantitatively, anddonotdiscussthemin this dissertation.

Chapter 8

Evaluation

Measurementbeganourmight(W.B.Yeats,UnderBenBulben, IV)

Theevaluationof themulti-fidelity runtimesystemwasdrivenby the following ques-tions:

� Is applicationadaptationagile: doesit respondquickly to changesin resourcesup-ply?� Is adaptationaccurate: doesit pick the fidelity that bestmeetsusergoalsat thecurrentlevel of resourcesupply.� Is it beneficial: whatis theimprovementover thenon-adaptivecase?

This chapteranswersthesequestionsthrougha seriesof experimentswith GLVU andRadiator, aswell assyntheticCPU andmemoryload generators.The chapteris organ-isedasfollows. Section8.1 describesmy evaluationmethodology:thebaseplatform,thesyntheticapplicationsandbackgroundloadgenerators,andtheperformancemetricsused.Section8.2 evaluatesadaptationto varyingCPUload in a syntheticapplicationaswell asin GLVU. Section8.3evaluatesadaptationto varyingmemoryload in a syntheticapplica-tion andin Radiator. Section8.4thenevaluatessystembehaviour whenGLVU andRadiatorrun concurrentlyandcompetefor resources.Section8.5 summarizesmy mainevaluationresults.

8.1 Evaluation Methodology

This sectiondescribesvariouscomponentsof the experimentalsetup: the hardwareandsoftwareplatform,thesyntheticworkloadsandloadgenerators,andtheevaluationmetrics.Realapplicationworkloadsbasedon GLVU andRadiatoraredescribedlaterin thechapter.

119

120 CHAPTER8. EVALUATION

8.1.1 Experimental Platform

All experimentsusedthe sameconfiguration:an IBM ThinkPad 560X with a 233MHzMobile Pentiumprocessorand96MB of memory. This hardware is a generationor twobehindtoday’s desktopandlaptopworkstations,andthusrepresentative of theprocessingpowerwecanexpectfrom lightweighthandheldor wearablecomputersin thenearfuture.

The baseOS is a standard,unmodifiedLinux 2.4.2 kernel. This kernel had severalvirtual memorymanagementbugs: for memory-relatedexperiments,I useda laterkernelversion(2.4.17-rmap12a)in which many of thesebugshavebeenfixed.

8.1.2 Syntheticworkloads

My evaluationusestwo syntheticapplications:cputestandmemtest. Theseapplicationsusethemulti-fidelity API to adapttheir resourcedemandto thepreciseamountsuggestedby the system: thus, their performancedependsentirely on the accuracy and agility ofresourcesupplyprediction.Syntheticapplicationsprovideduswith easilyunderstoodmi-crobenchmarks,while experimentswith real applicationsgave us a pictureof systembe-haviour in morerealisticscenarios.

cputest is a synthetic,CPU-bound,multi-fidelity “application”. Eachof its “opera-tions” consistsof spinningin a loop for a specifiedamountof CPU time, measuredincycles. This amountis the single “fidelity” (tunableparameter) . cputest’s utility in-creaseslinearly with ¨ : the runtimesystemmustchoosethe highestvalueof ¨ that doesnot violate latency constraints.Thus,the optimal value ¨ ¼Ø¤Fã resultsin a latency ä ¼Ø¤Fã thatexactly matchesour constraint.To predictthe latency for any particularvalueof ¨ , I usethegenericlatency predictor(Section5.5.3)

äå�¡£¢S¤F¥Î ¢S¤F¥æ� ¨Î ¢S¤&¥

Hence ¨ ¼Ø¤Fã �lä ¼Ø¤Fã Î ¢S¤F¥ä ¼Ø¤Fã is known; thustheaccuracy of computing

¼ç¤&ãdependsentirelyon theaccuracy of the

CPUsupplyestimateÎ ¢S¤&¥

.

memtestis amemory-bound,synthetic,multi-fidelity application.Its “fidelity” � spec-ifies theoptimalworking setsizefor eachoperation.An operationconsistsof writing oneword to everypagein a block of size � andthensleepingfor a specifiedperiodof time è .On startup,memtestpreallocatesa largeblock (128MB) of virtual memory;eachopera-tion thentouchessomeinitial portion of size � , to emulatea working setof size � . Thepreallocationavoids the fragmentationand unpredictableaccesspatternsthat can resultfrom repeatedcallsto éëê�Æ$Æ�ÊG¦ and z�©h�M� .

SincetheCPUtimespentin touchingthepagesis negligible, theexpectedlatency is èwhenall thepagesareresidentin physicalmemory. Whenthereis memorypressure,i.e.,

8.1. EVALUATION METHODOLOGY 121

memorysupply is lessthan � , all pageswill not be residentsimultaneously, andlatencywill increasesharplydueto paging.

If memorysupplydecreasessharplywhile an operationis in flight — e.g.,if anotherapplicationstartsup— thesystemmightbegin to thrash.Theapplicationcanonly adaptitsfidelity at thenext adaptivedecisionpoint: thestartof thenext operation.Whenthesystemis thrashing,thecurrentoperationmight takeavery long time to complete:it is sometimesbetterto abortit, andrestartat a lower fidelity, ratherthanincur thepenaltyof thrashing.To do this, I usea latencyconstraint violation callback (Section4.3.2),which is triggeredwhenanoperationtakesmuchlongerthanexpected:in my experiments,I useda thresholdof 10 timestheexpectedlatency è . memtestthenabortsthecurrentoperationandstartsanew one: theassumptionis that theruntimewill choosea muchlower fidelity for thenewoperation,giventheincreasedmemorypressurein thesystem.

8.1.3 Synthetic load generators

JustasI usecputestandmemtestto mimic adaptivemulti-fidelity applications,I usesyn-thetic load generatorsto mimic the non-adaptive, non-interactive backgroundload in thesystem.Thisallowsusto runexperimentsundervariableloadconditionswithout thecom-plexity andnonrepeatibilityof runningrealbackgroundapplications.I usethesesyntheticloadgeneratorsto measuretheadaptive behaviour of bothsyntheticandrealapplications.Theseload generatorsaresimilar in motivation to host load playback [24] andnetworktracemodulation[72], but simpler: I do not replaytracesof real load,but only syntheticones.

cpuloadcanemulateany integer-valuedprocessorloadaverageÄ : it spawns Ä threads,eachof which spinsin a tight loop. It takestwo parameters:the desiredload Ä , andthedesiredexecutiontime è . I generatedtime-varyingloadpatternswith scriptsthatrepeatedlyinvokecpuloadwith differentparametervalues.

memloadtakesthreeparameters:theamountof memory ì to use,theexecutiontimeè , andthe refreshperiod Ç . Theprogramallocatesa contiguousvirtual memoryblock ofsize ì , anddirtieseachpagein thisblockevery Ç seconds.In my evaluationI usearefreshperiodof 1s. Onmy experimentalplatform,this is sufficient to keepall theallocatedpagesactivewith a low CPUoverhead:theinitial allocationandpage-incostsabout0.05msperpage(4KB), anda 1s periodicrefreshof a 64MB block causesa backgroundCPU loadof about0.002%on anunloadedsystem.As with cpuload, I generatedtime-varyingloadpatternsthroughrepeatedinvocationsof theprogram.

I usecpuload and memload to generatesimple squareload waveforms. Thesearesimple, repeatable,andeasyto analyse. I do not intend this load patternto capturethecomplex propertiesof real workloads. RatherI useit to observe the adaptive system’sresponseunderstressfulconditions(frequent,sharptransitions).I alsousethesewaveformsto studytheeffectof frequency andamplitudeof loadvariationonmulti-fidelity adaptation.


0

0.2

0.4

0.6

0.8

1

0.5 1 1.5

Util

ity

Latency (sec)

L opt

=0.

9 se

c

Thegraphshowshow utility varieswith latency, for a latency targetof 1sanda toleranceof 10%.Thedottedline shows theoptimallatency íïîØðÉñ , 0.9s in this example.

Figure8.1: Sigmoidutility function

8.1.4 Evaluation metrics

Theprimaryaim of this thesiswork is to improve theperformanceof interactive applica-tions: to boundthelatency of interactiveoperations,andto reducethevariability in latency.To do this, thesystemmustmustmatchthedemandfor resources— CPUandmemory—to theavailablesupply. Thegoalis not to minimizedemandby runningat thelowestavail-ablefidelity, but to keepthelatency within user-specifiedconstraintswithoutunnecessarilysacrificingfidelity. Toohigha latency degradesinteractivity; too low a latency indicatesanunnecessarilylow fidelity. Similarly, aggressivememoryusagecausesthrashing,while ex-cessiveconservatismwastesopportunitiesto improvefidelity. For this reason,my primaryevaluationmetric is not the meanvalueof latency or memorydemand,but its deviationfrom theoptimalvalue.

The optimal latency ä ¼Ø¤Fã for anoperationis easilyderived from theuser-specifiedla-tency constraintandtolerance.E.g.,a1s constraintwith a10%toleranceis expressedasasigmoid(Figure8.1) whosetolerancezoneis òó�%²^�%�E�\²ô��õ : theutility startsdroppingat 0.9s,andis nearzeroat 1.1s. ä ¼Ø¤Fã is 0.9s, thepointat which theutility beginsto dropsharply.

In a memory-boundapplication,it is not desirableto aim for theuser-specifiedlatencyä ¼Ø¤Fã . The effect of memorydemandon latency is non-linear: whenthereis no memorycontention,theeffect is nearzero,andwhenthereis memorycontention,latency increasesrapidly with small increasesin memorydemand.Theoptimalmemorydemandì ¼Ø¤Fã

is atthekneeof this curve: thehighestpossiblememorydemandthatcausesno swapactivity.This is preciselythequantitythatthememorysupplypredictor(Section5.5.2)estimatesatthestartof eachoperation.

ì ¼Ø¤Fãis not known a priori: how thendo we know whetherour supplypredictores-

timatesit accurately?We caneasilydetectoverestimatesby observingthe swap activityinducedin thesystem;but underestimatescannotbedetectedwithoutknowing ì ¼Ø¤Fã

. Even


in theabsenceof explicit backgroundload(i.e.,memload), wecannotassumethat ì ¼Ø¤Fãis

equalto the total physicalmemoryin thesystem:thereis alwayssomememorydemandfrom thekernel,theX server, andvariousbackgroundtasks.

In my evaluation,I determineì ¼Ø¤Fãpostfacto: afterall the trials of any givenexperi-

ment,I find the highestmemorydemandthat the applicationwasableto sustainwith noswapactivity. I assumethatthis valueis optimalfor operationswhenmemload is not run-ning. If therewasa backgroundload of ìöÓ Ü during an operation,thanthe optimal valuefor thatoperationis ì ¼Ø¤Fã à�ì÷Ó Ü .

Agility

Wewantadaptationto beagile: to respondrapidlyto changesin resourcesupply. I measureagility by subjectingthe systemto load transitions, both upwardsand downwards,andstudy the system’s behaviour aroundthe transitions. Typically, the systemwill deviatefrom its optimal latency or memorydemandfor someperiodof time, andthenreturntonormaloperation(I definenormaloperationas“within 10%of theoptimalvalue”).

Figure8.2showsthelatency of ahypotheticalapplicationovertime. Whenlatency is inthe“targetzone” (betweenthetwo horizontallines), thesystemis in “normal operation”;otherwise,it is “adapting”.Thegraphshows threedistinctadaptivephases:

� initial convergence(phase‘A’ in the graph): after applicationstartup,the time forwhich latency is outsidethetargetzone.� adapt-down(phase‘B’ in the graph): after an upward load transition,the time forwhich latency is above thetargetzone.� overshootcorrectionafter adapt-down(phase‘C’ in the graph): after adapt-down,thetime (if any) for which latency is below thetargetzone.

In addition,I definetwo kindsof adaptivephasenot shown in thegraph:

� adapt-up: aftera downwardloadtransition,thetime for which latency is below thetargetzone.� overshootcorrectionafter adapt-up: after adapt-up,the time (if any) for which la-tency is above thetargetzone.

To measureagility, I subjectedasyntheticapplicationto asimple“up-and-down” back-groundload: no load for sometime è , a constantload for time è , andno loadagainfortime è . I thenmeasuredthe lengthsof the adaptive phasesthat follow the upward anddownwardloadtransitions.


0 1 2

0 2 4 6 8 10 12 14 16

Load

Time (sec)

0

0.5

1

1.5

2

Late

ncy

(sec

)

AB

C

The graphdepictspictorially the notion of initial convergence,adapt-down, andovershootcor-rectionphases:it doesnot representthemeasuredperformanceof any realapplicationor system.The bottomgraphshows the backgroundload, which increasesfrom 1 to 2 at T=5s. In the topgraph,thepointsrepresentoperationlatency over time. Thehorizontallinesatheights0.9and1.1demarcatethetargetzone.ThethreesegmentsA, B, andC, correspondto theinitial convergence,theadapt-down, andtheovershootcorrectionperiodsrespectively.

Figure8.2: Adaptation phasesfor a hypothetical application and system

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Out

liersø

Error threshold

f20=30%

E90=43%

Figure8.3: Exampleof deviation plot


Accuracy

To measurethe accuracy of adaptation,I subjectedadaptive applications— GLVU andRadiator— to a time-varying backgroundload (CPU or memory). I thenmeasuredthelatency or memorydemandof eachoperation,andcomparedit to theoptimal value. ForCPU-boundoperations,I measuredthedeviation of latency from ùeúØûFü ; for memory-boundoperations,thatof memorydemandfrom ý÷úØûFü . I reportthesemeasurementsin threeways:

þ a timeline graphof one representative run of the experiment,showing latency (ormemorydemand)of the adaptive applicationover time, aswell asthe backgroundCPUor memoryload.þ a histogramof latency or memorydemandover all the operationsin 5 runsof theexperiment.þ theoutlierdistributiongraph,thebaddeviation frequency ÿ�� , andthecommon-casedeviation

�� over 5 runs. Theseshow the frequency ÿ with which deviation fromtheoptimalresourcedemandexceedssomethreshold

�, asa functionof

�.�

is theunsignedrelativeerror. E.g. the latencydeviation

�� ü�� ù�àæùeúØûFü �ù úçû&ü

where ù is themeasuredlatency and ùeúØûFü is theoptimalvalue. ÿ�� is the frequencywith which we seean

�exceeding0.2(20%);

�� is theerrorthresholdthatbounds90%of observations.Thesemetricsareexactlyanalogousto thoseusedfor predictoraccuracy (Section6.5). In thelattercase,I measuredthedeviationof predictedvaluesfrom observed values;here, I measurethe deviation of observed valuesfrom theoptimal value. Figure8.3 shows an exampleof a deviation plot, andthe interceptscorrespondingto ÿ�� and

�� .

Benefitof adaptation

To evaluatethebenefitof adaptation,I comparedthebehaviour of theadaptiveapplicationundertime-varyingloadto thatof thesameapplicationrunningwithoutadaptation.I reportthe outlier distribution, the baddeviation frequency, and the common-casedeviation forbothconfigurations.

In thenon-adaptive case,theapplicationrunsat a fixedfidelity. In somecases,this isthe maximumpossiblefidelity: e.g. if GLVU doesno multiresolutionscalingat all, thenits resolutionis 1. In othercases,full fidelity is not a viableoption: e.g.,runningradiosityat fidelity 1 would requirefar more memorythan is available on my test platform. Inthis cases,I chosea reasonablevaluefor fidelity: the chosenvaluesarenot necessarilyoptimal for any givenexperiment,but they tell uswhatwe canrealisticallyexpectfrom anon-adaptiveapproach.


8.2 Adaptation to CPU load

8.2.1 Agility

To measurethe agility of the CPU supplypredictor, I measuredthe operationlatency ofcputestrunninga largenumberof consecutiveoperations.Thebackgroundloadwasgen-eratedby cpuload: 30s of no load,30s of a loadof 1 (i.e., oneCPU-boundprocess),and30s of no loadagain. I seta latency constraintof 1s with a 10%tolerance,which meanstheoptimallatency ù úçû&ü is 0.9s.

Figure8.4 shows the resultsof this experiment.Figure8.4(a)shows theoperationla-tency over time for a single 90s run. We seethe initial convergence,adapt-down, andadapt-upphases,but no overshoot.During adapt-down, oneoperationsuffereda large in-creasein latency: this wastheoperationin flight whenthe loadtransitioned.We alsoseethatwhenthereis backgroundload,thelatency constraintis maintained,but thebehaviouris noisier. This is causedby thegranularityof CPUschedulingin Linux: competingappli-cationsreceive quantaof 200msat a time, andsocputest’s CPU-shareover a 1s period,whenthe backgroundload was1, could vary between45% and55%. Whenthe systemwasunloadedtheCPUsharewasconstant(closeto 100%)andthe latency constraintwasmaintainedexactly.

Figure 8.4(b) shows the length of eachadaptive phase,averagedover 5 runs. Fig-ure8.4(c)shows how theoperationlatenciesaredistributed.We seethatthevastmajorityarearound0.9s, asexpected:deviationsoccuronly at andaroundload transitions.Fig-ure 8.4(d) shows the outlier frequency distribution. 10% of the time, the error is above9.2%(

�� is 9.2%);5.4%of thetime, it is above20%( ÿ�� is 5.4%);and0.8%of thetime,it is morethan40%.This0.8%correspondsto thesingleoperationperrunthatwasin flightduringtheupwardloadtransition.

8.2.2 Accuracy: CPU adaptation in GLVU

In orderto measuretheaccuracy andeffectivenessof adaptationin GLVU, I useda traceofausernavigatingthe“Notre Dame”scene(Section7.3.3).Thebackgroundload,generatedby cpuload, wasasquarewaveform:10speriodsof noloadalternatingwith 10speriodsofload1. I usedautility functionthatincreaseslinearlywith fidelity, anda latency constraintof 1s� 10%, which givesus a target latency ( ùeúØûFü ) of 0.9s. Thesevaluesrepresentthebaselinecase: later in this sectionI show the effects of varying the input scene,targetlatency, loadtransitionfrequency, andpeakload.

Figure8.5showstheresultsof thebaselineexperiment.Figure8.5(a)showsapplicationperformanceover time for onerepresentative run: the fidelity chosenfor eachoperation,its latency, andthebackgroundload. We seethat thatmostoperationshada latency closeto optimal (0.9s), with somedeviation dueto variationin CPUsupplyanddemand.Fig-

8.2. ADAPTATION TO CPULOAD 127

0 1 2

0 10 20 30 40 50 60 70 80 90

Load

Time (sec)

0

0.5

1

1.5

2

Ope

ratio

n la

tenc

y (s

ec)

�

ActualPredicted

(a) Timeline

Typeof adaptation Time OperationsInitial convergence 2.0s (0.8s) 2.4 (1.1)Adapt-down 2.3s (0.9s) 2.0 (0.7)Overshootcorrectionafteradapt-down 0.0s (0.0s) 0.0 (0.0)Adapt-up 2.8s (0.4s) 3.8 (0.4)Overshootcorrectionafteradapt-up 0.0s (0.0s) 0.0 (0.0)

(b) Adaptationtimes

0%

20%

40%

60%

80%

100%

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2

Fre

quen

cy�

Latency (sec)

(c) Latency distribution

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100% 120%

Out

liersø

Error threshold

f20=5.4%

E90=9.1%

(d) Outlier distribution

Thetimelineshowslatency andbackgroundloadfor onerunof cputest. Thetableshowsthemeanlength(andstandarddeviation) of thedifferentadaptive phases,over 5 runs. Thebottomgraphsshow thelatency distributionandthedeviation from thetarget(0.9s), for 5 runs.

Figure8.4: Agility of adaptation to CPU supply


0 1 2

0 50 100 150 200 250 300

Load

Time (sec)

0

1

2

3

Late

ncy

(sec

)

0

1F

idel

ity�

(a)Timeline

0%

5%

10%

15%

20%

25%

30%

0 0.5 1 1.5 2 2.5

Fre

quen

cy�

Latency (sec)

(b) Latency distribution

0%

20%

40%

60%

80%

100%

0% 50% 100% 150% 200%

Out

liersø

Error threshold

f20=30%

E90=43%

(c) Outlierdistribution

Thetop graphshows fidelity, latency andbackgroundloadfor onerun of the“Notre Dame”usertrace. The lower horizontalline shows the optimal latency (0.9s) andthe higheroneshows theupperlimit of thelatency constraint(1.1s). Thebottomgraphsshow the latency distribution andthedeviation from theoptimallatency (0.9s) over5 runs.

Figure8.5: Adaptation in GLVU


ure8.5(b)showsthedistributionof operationlatenciesaggregatedover5 runsof theexper-iment: weseethatmostof thedeviation is on thelow sideof ùeúØûFü . Figure8.5(c)showstheoutlier distribution: 30%of theoperationsdeviatedfrom thetargetby morethan20%( ÿ��is 30%)andonly 10%by morethan43%(

��is 43%).

Benefitof adaptation

In orderto measurethebenefitof adaptation,I comparedtheresultsof thesameexperimentwith threedifferentconfigurations:

þ no adaptation: GLVU did not adaptat all, but alwaysrenderedthesceneat full reso-lution, i.e.,with fidelity 1.þ demandadaptationonly: GLVU adaptedits CPUdemandto meetthe1s constraint,but ignoredthevariationin supplyþ supplyanddemandadaptation: this is thebaselinecasefrom theprevioussection.

Figure8.6 shows the resultsof theseexperiments:for eachconfiguration,I show thetimeline from onerun andthe latency distribution over 5 runs. With no adaptationat all,therewasa very wide variationin latency: moreover, theapplicationnever met its latencyconstraint,but alwayshada latency above 1s. With demand-onlyadaptation,theapplica-tion wasableto limit eachoperationto 1s of CPUtime. This gave usthedesiredlatencywhenthe systemis unloaded,but twice the desiredlatency undera load of 1. Whenthesystemadaptedto bothsupplyanddemand,latency wasclusteredaroundtheoptimal0.9s.Note thatwith a closed-loopworkload,high latenciesalsoresultin a longertime to com-pletion: thusthenon-adaptivecasetook 1000s, thedemand-onlycase400s,andthefullyadaptivecase300s to executethesameworkload.

Figure8.7 shows theoutlier distributionsfor 5 runsof eachof thethreecases:we seethatbothsupplyanddemandadaptationcontributeto reducingdeviation from the latencyconstraint.

Variation with input data

Figure8.8showstheoutlierdistributionfor GLVU runningauserpathtraceonfourdifferentscenes,with 5 experimentalrunsper scene.We seethat the distribution is similar in allcases:thesystemmaintainsits latency constraintequallywell acrossthedifferentscenes.

Variation with target latency

When the application’s latency constraintchanges,the time scaleat which the systemadapts,andthusits adaptive behaviour, alsochange.Figure8.9 shows the outlier distri-bution for threedifferenttarget latenciesin additionto thebaseline1s case:0.25s, 0.5s,


0

1

2

3

4

5

6

7

8

0 200 400 600 800 1000

Late

ncy

(sec

)

Time (sec)

0%

1%

2%

3%

4%

5%

6%

7%

8%

9%

0 1 2 3 4 5 6 7 8

Fre

quen

cy�

Latency (sec)

(a)No adaptation

0

1

2

3

4

5

6

7

8

0 50 100 150 200 250 300 350 400

Late

ncy

(sec

)

Time (sec)

0%

5%

10%

15%

20%

25%

30%

35%

40%

0 1 2 3 4 5 6 7 8

Fre

quen

cy�

Latency (sec)

(b) Demandadaptationonly

0

1

2

3

4

5

6

7

8

0 50 100 150 200 250 300

Late

ncy

(sec

)

Time (sec)

0%

5%

10%

15%

20%

25%

30%

0 1 2 3 4 5 6 7 8

Fre

quen

cy�

Latency (sec)

(c) Supplyanddemandadaptation(baseline)

The left-handgraphsshow latency during one run of the “Notre Dame” trace. The right-handgraphsshow the latency distribution over 5 runs.Thegraphsshow threedifferentconfigurations:no adaptation,demandadaptationonly, andbothsupplyanddemandadaptation.

Figure 8.6: Latency in GLVU: with and without adaptation


0%

20%

40%

60%

80%

100%

0% 50% 100% 150% 200%

Out

liers�

Error threshold

No adaptationDemand only

Supply and demand

(a)Outlier distribution

Configuration ÿ�� No adaptation 100% 494%Demandadaptationonly 44% 116%Supplyanddemandadaptation 30% 43%

(b) Baddeviation frequency andcommon-casedeviation

Figure8.7: Latency outliers in GLVU: with and without adaptation


0%

20%

40%

60%

80%

100%

0% 50% 100% 150% 200%

Out

liers�

Error threshold

Taj MahalCafé


Scene Numberof polygons ÿ�� Taj Mahal 127406 30% 40%Cafe 138598 26% 41%NotreDame 160206 30% 43%BuckinghamPalace 235572 35% 47%

Figure8.8: Outlier distrib ution for different 3-D scenesin GLVU


0%

20%

40%

60%

80%

100%

0% 50% 100% 150% 200%

Out

liers�

Error threshold

0.25 sec0.5 sec

1 sec2 sec

Latency constraint ÿ�� 0.25s 48% 68%0.5s 30% 39%1.0s 30% 43%2.0s 38% 48%

For the2s latency constraint,32.8operationson average(out of 324)wereat full fidelity but hada latency lower thanthetarget1.8s: thelatency of theseis countedas1.8s.

Figure8.9: GLVU outlier distrib ution for different target latencies


and2s,correspondingto ùeúØûFü valuesof 0.225s,0.45s,and1.8s respectively. Thesevaluesspantherangeof reasonablefidelitiesandinteractive responsetimesfor GLVU on thetestplatform.

Performancewasbestat the 0.5s and1s time scales:both higher and lower valuesincreasedtheamountof deviation:

þ At time scalesbelow 0.5s, thegranularityof processorschedulingcausesdeviation.On Linux, CPU-boundprocessesgetquantaof 0.2s at a time. Whenthereis aback-groundload of 1, GLVU gets50% of the CPU on average,but in any given 0.2speriod,it mayreceiveanything from 0% to 100%.þ At timescalesabove1s,loadtransitionsappearmorefrequentlyrelativeto thework-load.For 1soperations,andaloadthatvariesevery10s,about10%of theoperationsarehit by a transition,andmisstheir target latency. With a 2s target,twice asmanyoperationsareaffected. At time scalesmuchlarger thanthe load period(10s), wewould expectdeviation to decreaseagain:eachoperationwould seemultiple transi-tions,andanaverageloadof 0.5. I.e., thevariationwould besmoothedout over thecourseof theoperation.However, suchlargetime scalesarenot meaningfulfor thisapplication.

For the2s case,I foundsomeoperationsthattook lessthantheoptimal1.8s,but wereat full fidelity. Theseoperationsdid not sacrificefidelity unnecessarily, and thusdo notrepresenta deviation from optimal behaviour; in the latency and outlier distributions, Ireportthelatency of suchoperationastheoptimalvalue(1.8s).

Variation with background load

We have alreadyseenthat the time scale,i.e., frequency, of backgroundload variationhasan effect on adaptationaccuracy. I measuredthe adaptationaccuracy acrossa rangeof load frequencies:I usedthesamesquare-wave loadpattern,but variedthe time periodbetweenloadtransitions(Figure8.10).Weseethatasthistimeperioddecreases,adaptationaccuracy getsworse,astheprobabilityof a transitionduringanoperationincreases.It isworstat 1s: a loadvaryingat thesamefrequency astheadaptive decision-makingcausesthe most interference.With a further decreaseto a 0.5s time period,accuracy improvessignificantly: now theeffect of loadvariationis smoothedout over thecourseof a singleoperation.

I alsostudiedthe effect of peakload on adaptation:i.e., I kept the square-waveformwith a1speriod,andvariedits amplitude(Figure8.11).As expected,largertransitionsledto worseadaptation:in the“10 process”case(i.e. backgroundloadswitchesbetween0 and10 everysecond),thecommon-casedeviationwas89%.


0

1

2

0 5 10 15 20 25 30 35 40 45 50

Load

Time (sec)

10 sec

0 1 2

Load 5 sec

0 1 2

Load 1 sec

0 1 2

Load 0.5 sec

0%

20%

40%

60%

80%

100%

0% 50% 100% 150% 200%

Out

liersø

Error threshold

0.5 sec1 sec5 sec

10 sec

Periodbetweenloadtransitions ÿ�� 0.5s 37% 44%1.0s 80% 64%5s 48% 50%10s 30% 43%

The top graphshows the differentbackgroundload patternsbeingcompared;the bottomgraphandtableshow thelatency deviation for eachof theloadpatterns.

Figure 8.10: GLVU outlier distrib ution for different load transition fr equencies


0 1 2 3 4 5 6 7 8 9

10 11 12 13 14 15

0 5 10 15 20 25 30 35 40 45 50

Load

Time (sec)

1 process2 processes5 processes

10 processes

0%

20%

40%

60%

80%

100%

0% 50% 100% 150% 200%

Out

liersø

Error threshold

1 process2 processes5 processes

10 processes

Peakbackgroundload ÿ�� 1 process 30% 43%2 processes 41% 57%5 processes 57% 79%10processes 72% 89%

The top graphshows the differentbackgroundload patternsbeingcompared;the bottomgraphandtableshow thelatency deviation for eachof theloadpatterns.

Figure8.11: GLVU outlier distrib ution for different peak loads

8.3. ADAPTATION TO MEMORY LOAD 137

8.2.3 Summary

I measuredtheagility of CPUadaptationusingasyntheticworkloadandasimple“up-and-down” backgroundload;I foundthatthesystemrespondedto changesin loadwithin 2-3s.I alsoevaluatedtheaccuracy of CPUadaptationin GLVU, by evaluatingthedeviation fromthetargetlatency in variousscenarios.In thebaselinecase,I foundthatadaptationreducedlatency deviation considerably— thecommon-casedeviation wasreducedfrom 494%inthe non-adaptive caseto 43% in the adaptive case— and that both supply anddemandpredictionwerenecessaryto achievethis reduction.I repeatedthisevaluationwith varyingparametervalues,andfoundthat

þ latency deviation wasindependentof the3-D sceneused.þ deviation washighestwhenthe time scaleof loadvariationmatchesthatof adapta-tion: very rapidor veryslow loadvariationhadlittle negative impact.þ latency deviation increasedwith themagnitudeof loadvariation: it was43%whentheloadvariedbetween0–1,and89%whenit variedbetween0–10.

8.3 Adaptation to memory load

8.3.1 Agility

I measuredtheagility of memorysupplypredictionby subjectingmemtestto sharpupwardand downward transitionsin backgroundmemoryload. My measurementsfor memorywereon a time scaleof tensof secondsratherthanseconds.Every second,theLinux VMupdatestheagesof �!�"$# of thesystem’spages(Section5.5.2):thusit takestensof secondsto reflectchangesin load. Sincethe multi-fidelity runtimeusesstatisticsexportedby thekernelto guideadaptation,it cannotadapteffectively at timescalessmallerthanthis.

Eachoperationin memtestconsistsof writing to every pagein a block of size % —where % is a tunableparameterwhosevalueis computedby the multi-fidelity system—and then sleepingfor 5s. Thus the expectedlatency for an operationwith no memorycontentionis just over 5s. To prevent memorycontentionfrom delayingan operationindefinitely, I useda latency constraintviolation callback:any operationtakingmorethan50swasaborted.I subjectedmemtestto an“up-and-down” loadpattern:no loadfor 100s,a loadof 48MB (half thetotal systemmemory)for 300s, andno loadagainfor 100s. Inorderto startfrom a known VM stateeachtime,eachexperimentalrun wasprecededby averyheavy memoryloadto flushall unusedpagesfrom thesystem,followedby a10swaitfor thesystemto quiesce.

Figure8.12(a)shows thebehaviour over time of onerun of this experiment.We seeaTCP-like back-off behaviour: the applicationrampsup its memoryusageup to the opti-mal value ý÷úØûFü (84MB without memload); whenit exceedsý÷úØûFü , it experiencesmemorycontention,backsoff rapidly, and thenbegins to rampup again. When the background


0 10 20 30 40 50 60

0 100 200 300 400 500

Late

ncy

(sec

)

Time (sec)

0 10 20 30 40 50 60 70 80 90

Sw

appi

ng (

MB

s)

&

0 10 20 30 40 50 60 70 80 90

Bg

load

(M

B)

'

0 10 20 30 40 50 60 70 80 90

Mem

ory

dem

and

(MB

)

SuccessfulAbortedOptimal

(a) Time line

Typeof adaptation Time OperationsInitial convergence 6.2s (2.3s) 1.2 (0.4)Adapt-down 60.8s (28.5s) 1.8 (0.4)Overshootcorrectionafteradapt-down 61.7s (20.2s) 11.8 (3.7)Adapt-up 22.0s (8.4s) 2.4 (1.9)Overshootcorrectionafteradapt-up 0.0s (0.0s) 0.0 (0.0)

(b) Adaptationtimes

Thegraphshowsmemorydemand,backgroundload,systemswapactivity,andlatency for onerunof memtest. Thetableshows themeanlength(andstandarddeviation)of thedifferentadaptationphasesover5 runs.

Figure8.12: Agility of adaptation to memory supply


load increasessharplyto 48MB, theoperationin flight suffersheavy memorycontention,exceedsthe latency callbackthreshold,and is aborted. memtest thenadaptsdownwardand resumesthe TCP-like oscillationaroundthe new ý÷úØûFü of 36MB. When the load isremoved,it rampsbackup to theoriginalmemorydemand.Notethatoperationlatency in-creaseswith systemswapactivity, which is my measureof memorycontention.Swappingof memtest’sown pagesincreasesthenumberof timesit muststallonadiskaccess;swap-ping of otherprocesspagesincreasesthelengthof eachstall by introducingcontentionforthedisk.

Figure8.12(b)showsthelengthof eachadaptationperiod,averagedover5 runs.All ofthemareontheorderof tensof seconds,exceptfor theovershootcorrectionafteradapt-up:whenthesystemadaptsfidelity upwardsit backsoff beforeovershootingtheoptimalvalue.

þ whenadaptingupwards,thesystemrely onthefreeandinactivepagecountsexportedby the VM. Inactive pagecountsareupdatedover a 32s period. Freepagecountsareupdatedimmediately, but freepagesarecreatedonly on demand,i.e., whenthefree pagepool becomestoo small. Thusaseachsubsequentoperationusesmorememory, thesystemrespondsby creatingasmallnumberof freepages,andrequiresmany operationsto rampup to theoptimalvalue.þ adaptdownwardsin responseto a loadspike only takesa few operations.However,becauseof memorycontention,thesefew operationstakea long time. This timecanbeboundedusinglatency callbacksandaborts:in fact,theadapt-down time (60s) isdominatedby thelatency of thesingleabortedoperation(50s). For memtest, I couldhave useda moreaggressive (lower) callbackvalue;however, in a real applicationan excessively aggressive valuewould leadto falsecallbacks,if theoperationtooklongerthanexpectedfor reasonsotherthanmemorypressure.

Figure8.13(a)shows a histogramof latency distribution averagedover 5 runs: mostoperationstake 5s,with a tail thatrepresentstheoperationsaffectedby swappingactivity.Figure8.13(b)shows thecorrespondingoutlier distribution: thedeviation of memoryde-mandfrom the optimal value ý÷úØûFü (84MB whenunloaded,36MB whenloaded). ÿ�� is35%,i.e.,35%of operationsdeviatedby morethan20%.Fromthelatency distribution,wecaninfer thatthesystemusuallyerrson thesideof caution:mostoperationsdo not suffermemorycontention,andhavea latency of 5s. I.e.,mostoperationsoccurduringaramp-upphase,ratherthanaback-off.

8.3.2 Accuracy: Memory adaptation in Radiator

To measurethe accuracy of memoryadaptationin Radiator, I ran radiosityrepeatedlyona singlescenewith a latency constraintof 10s. With this constraint,on the testplatform,progressive radiosityis alwaysa betterchoicethanhierarchicalradiosity. ThelatterhasaveryhighCPUdemand,andis only usefulonaveryfastprocessor, or if theusercantoleratevery high latencies.In all of my experiments,themulti-fidelity runtimechoseprogressive


0%

20%

40%

60%

80%

100%

0 5 10 15 20 25 30

Fre

quen

cy�

Latency (sec)

(a)Latency distribution

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100% 120% 140%

Out

liersø

Error threshold

f20=35%

E90=44%

(b) Memoryoutlierdistribution

Thegraphsshowsthedistributionof operationlatency (left), andthedeviationfrom optimalmem-ory demand(right) overfiverunsof memtestsubjectedto asingleupwardandasingledownwardloadtransition.

Figure 8.13: Agility of memory adaptation: latency distrib ution and deviation fr omý÷úØûFü

radiosityfor every singleoperation;in the remainderof this sectionI use“Radiator” and“radiosity” to signify progressiveradiosityalone.

My methodfor measuringeachoperation’s memorydemand— its working set— isbasedon measuringits residentsetsize(Section5.6.1),andis not valid whentheremightbememorycontention.For this reason,thememorydemandmeasurementspresentedherearebasedon the valuespredictedby the demandpredictorfor eachoperation,given itsfidelity: we alreadyknow from controlledexperimentsthat thesevaluesare extremelyaccurate(Section7.4.3).

My benchmarkonly approximatesa realisticuseof radiosity: a real userwould onlyre-runradiosityif shemodifiedthescenein someway. Unfortunately, Radiatordoesnotincludesceneeditingfunctionality, whichpreventsusfrom modifying thesceneon thefly.Theeffect of modifying thescenewould beto introduceadditionalerror into theresourcedemandpredictions.However, thiseffect is certainto besmall:Radiator’smemorydemandis predictablewith low error (1.3%)even with a singlegenericpredictoracrossmultiplescenes,andany error inducedby modificationsto a singlescenewill besmallerthanthis.Sincememorydemandis highly predictable,I amreallyevaluatingtheaccuracy of memorysupplyprediction,andits effectonRadiator.

I useda square-wave backgroundmemoryloadswitchingbetween0 and48MB every200s; theentireexperimenttakes1200s, i.e. 3 cyclesof loadvariation. Theselarge timeperiodsaredeterminedby thelargetimeconstantsof theunderlyingoperatingsystem.TheLinux VM’ s pageagingalgorithmupdatestheageof eachmemorypageonceevery 32s,


andany interestingVM behaviour canonly observedafterseveraliterationsover theentirelist of pages.

As with memtest, I useda latency constraintviolationcallbackto abortoperationssuf-fering from excessive memorycontention,i.e. thrashing.A low callbackthresholdlimitsthe time andresourceconsumedby an abortedoperation. However, too low a thresholdwill causeunnecessaryaborts,dueto underestimatesof CPUdemandratherthanmemorycontention.TheCPUdemandpredictionerror for radiosityis well under50%in thecom-moncase(Section7.4.2): thus,any mispredictionof latency by a factorof 2 or greater, isalmostcertainlydueto memorycontention.HenceI setthe latency callbackthresholdto20s: twice thetargetlatency.

WhenRadiatorreceivesa callback,it mustcleanup the statecreatedby the abortedoperationbeforeit canbegin anew one.In suchacomplex application,thecleanuprequiresmany memoryaccesses,whichaggravatesthememorycontention.A cheapyetcleanabortmechanismwould requireconsiderablechangesto the sourcecode. Instead,I chosethesimplerapproachof killing andrestartingtheRadiatorprocesson abort.This is a feasibleapproachfor Radiatorbecauseit only hassoftstate, which canberegeneratedby readingtheinput datafile andre-executingtheradiositycomputation.

Figure8.14showstheresultsof oneexperimentalrunof Radiatorandmemload. Whenthereis no load, Radiator’s fidelity is closeto 0.08,correspondingto a memorydemandof 55MB. This is significantly lessthanthe memoryavailablein the system:Radiator’sfidelity is limited by the 10s latency constraint: it is CPU-boundrather than memory-bound.

Whenthebackgroundloadincreasesto 48MB, Radiatorbecomesmemory-bound,andmustadaptits fidelity downwards.Weseetheexpectedpattern:

þ The operationin flight when the load increasessuffers heavy memorycontention,andis abortedafter20s.þ Radiatorthenadaptsits memorydemanddown to 35MB.þ It correctstheovershoot,andconvergeson40MB � 5MB: notethattheoscillationislargerthanin theno-loadcase.þ Whenthe load drops,Radiatorrapidly adaptsupwardsbackto the 55MB level, atwhich point it is CPU-boundagain.

I ran the sameexperimentwith Radiatorin a non-adaptive configuration,i.e., with afixedfidelity andwith latency callbacksdisabled.In theadaptivecase,theoptimalfidelitywas0.09 in the absenceof memoryload, and0.02 when loaded: thus, I used0.05 asagoodcompromisevalue.Figure8.15showstheperformanceof non-adaptiveRadiator. Weseethat during the unloadedor CPU-boundphases,thereis no memorycontention,andlatency is around5s, well shortof the 9s optimum: thestaticfidelity is too conservativein this case.Whenthereis load, the applicationis memory-bound,andthe samefidelityis now tooaggressive: theapplicationsuffersheavy memorycontentionandlatencieswellabove 20s. If I hadenabledthe latency callbacksin the non-adaptive case,almostevery


0

5

10

15

20

25

0 200 400 600 800 1000

Late

ncy

(sec

)

Time (sec)

0 5

10 15 20 25 30 35 40 45 50

Sw

appi

ng (

MB

s)

(

0

10

20

30

40

50

60

Mem

ory

dem

and

(MB

)

CompletedAbortedBg load

0

0.1F

idel

ity)

Figure 8.14: Adaptation in Radiator under varying memory load


0 10 20 30 40 50 60 70

0 200 400 600 800 1000

Late

ncy

(sec

)

Time (sec)

0

10

20

30

40

50

60

Sw

appi

ng (

MB

s)

(

0 5

10 15 20 25 30 35 40 45 50

Mem

ory

dem

and

(MB

)

CompletedBg load

0

0.1F

idel

ity)

Figure8.15: Radiator with no adaptation under varying memory load


operationexecutedunder load would have beenaborted;in the adaptive case,only theoperationthathadalreadycommencedwhentheloadincreased,is aborted.Notealsothatin thenon-adaptivecase,thetotal latency of operationsexecutedunderloadis significantlylower thanthe lengthof the correspondingloadperiod: memorycontentionprolongstheinter-operationintervals (which are normally extremely short) as well as the operationsthemselves.

Radiatorthusalternatesbeweentwo phases:CPU-boundandmemory-bound.In orderto evaluatetheaccuracy of adaptation,I considerthetwo phasesseparately. WhenRadiatoris CPU-bound,weexpectlatency to becloseto ùeúØûFü (9s). Whenmemory-boundweexpectmemorydemandto be closeto ý÷úØûFü (39MB for this experiment),andlatency to be lessthan ùeúØûFü .

Figures8.16(a)and8.16(b)show thelatency distributionduringtheCPU-boundphases,for the adaptive andnon-adaptive configurationsrespectively. In the former caselatencyis usuallyaround9s; in the latter, almostalways5s. Figure8.16(c)shows the deviationof theselatenciesfrom the optimal. Whenadaptive, latency is closeto optimal (usuallywithin 10%); whennot adaptive, it is consistentlyat 40% underthe optimal 9s, indicat-ing an unnecessarysacrificeof fidelity (0.05ratherthan0.09). Figure8.16(d)shows thecorrespondingvaluesof ÿ�� and

�� ; it alsoshowstheaveragenumberof successfulopera-tionsperexperimentalrun,andthenumberof abortedoperationsin theadaptivecase.Thenon-adaptivecasecompletedalargernumberof operationsdueto its lowerfidelity andcor-respondinglylower latency. Therewereno abortsin theadaptive casewhenCPU-bound,astherewasnomemorycontention.

Figures8.17(a)and8.17(b)show the latency distributionsduring the memory-boundphases.In theadaptivecase,latency is usuallyunder9s. Sometimesweseehigherlatenciesdueto CPUmispredictions,or to theafter-effectsof a recentlyconcludedmemory-boundphases(thesystemcontinuesto swapfor a while evenafter thebackgroundmemoryloadis removed). The graphdoesnot show abortedoperations:thesehave a latency of 20seach,andthereis onefor eachupward transitionin load(i.e. 3 perexperimentalrun). Inthenon-adaptive case,thefixedfidelity leadsto a fixedmemorydemandof about48MB,whichexceedsthememorysupplyandcausesmemorycontentionandhigh latencies.

Figure8.17(c)showsthedeviationfrom ý÷úØûFü in thetwo configurations.Whenadaptive,memorydemandis usuallywithin 10% of optimal; whennot adaptive, it is consistently25% above optimal. Figure 8.17(d)shows the correspondingÿ�� and

�� , andalso thenumberof successfulandabortedoperations.By adaptingmemorydemand,theadaptivecasewasableto boundlatency, andcompletea largernumberof operations,at thecostofreducedfidelity, andoneabortedoperationperupwardtransitionin memoryload.

Costsand benefitsof aborting

In the precedingexperiments,we saw that a sharpincreasein memory load during anoperationcausesseverememorypressure:my solutionwasto abortthe operationafter a


0%

20%

40%

60%

80%

100%

0 2 4 6 8 10 12 14 16 18 20

Fre

quen

cy�

Latency (sec)

(a) Latency distribution: adaptive

0%

20%

40%

60%

80%

100%

0 5 10 15 20 25 30 35

Fre

quen

cy�

Latency (sec)

(b) Latency distribution: not adap-tive

0%

20%

40%

60%

80%

100%

0% 20% 40% 60% 80% 100%

Out

liersø

Error threshold

AdaptiveNot adaptive

(c) Latency outliers: adaptive andnon-adaptive

Configuration Latency deviation Successfulops Abortsÿ�� Adaptive 2.7% 10.3% 58.6 0Not adaptive 98.0% 38.1% 88.2 n.a.

(d) Baddeviation frequency andcommon-casedeviation

Figure8.16: Memory adaptation accuracyin Radiator: whenunloaded


0%

20%

40%

60%

80%

100%

0 2 4 6 8 10 12 14 16 18 20

Fre

quen

cy�

Latency (sec)

(a)Latency distribution: adaptive

0%

20%

40%

60%

80%

100%

0 10 20 30 40 50 60 70

Fre

quen

cy�

Latency (sec)

(b) Latency distribution: not adap-tive

0%

20%

40%

60%

80%

100%

0% 10% 20% 30% 40% 50% 60%

Out

liersø

Error threshold

AdaptiveNot adaptive

(c) Memoryoutliers: adaptive andnon-adaptive

Configuration Memorydeviation Successfulops Abortsÿ�� Adaptive 0.5% 7.7% 39.8 3Not adaptive 100.0% 23.5% 8.8 n.a.

(d) Baddeviation frequency andcommon-casedeviation

Figure8.17: Memory adaptation accuracyin Radiator: under load


Meantime to abort( * ,+ ú.-�ü ) 20.6s (0.8s)Meantime to recover ( */-��$ú�01�2- ) 49.6s (5.3s)

Thetableshows the frequency andthecostof latency-basedabortingfor Radiator. We show themeanvaluesof 5 experimentalruns,with standarddeviationsin parentheses.Notethatabortsonlyoccurwhenanoperationis hit by anupwardloadtransition.

Figure 8.18: Cost of aborting on latencycallbacksin Radiator

certaintimeoutperiod,andto restartit. However, abortingoperationscomesat thecostofadditionallatency: the time takenby theabortedoperationis wasted,andtheapplicationmustalsospendsometimeto recover from theabortandstartanew operation.Figure8.18shows the costof abortingfor the 5 experimentalrunsdescribedin the previous section.Themeantime to abort * ,+ ú.-�ü is the lengthof theabortedoperationandis almostexactlyequalto thelatency callbackthreshold;themeanrecoverytime *3-4�.��ú.0,��- is thetimeto restarttheRadiatorprocess,andre-readtheinput datafile.

Thetotal costof anabortis in thetensof seconds:however, this is usuallypreferableto continuingexecutionat high fidelity undermemorypressure,which cantake hundredsof secondsandseverelyimpactotherapplicationsin themeantime.

Abort-and-restartis ageneralandpowerful solutionto acommonlyobservedasymme-try in many adaptivesystems:if resourcedemandis too highcomparedto resourcesupply(e.g. if resourcesupplydropssharply),thenprogramexecutionslows down, andthewaittime until the next adaptive decisionpoint canbe arbitrarily long. On the otherhand,ifresourcedemandandfidelity aretoosmall,theapplicationwill reachthenext adaptivede-cision point quickly, andcanrapidly adjustfidelity upwards. E.g.,Nobleet al. [71] haveobserved that an adaptive video streamcanincreasefidelity rapidly whennetwork band-width improves;but if bandwidthdropssuddenly, adaptationis slow: it mustwait for thecurrentsegmentof video to be fetchedat high fidelity beforeit candegradesubsequentsegments.

Aborts canbe usedto recover not only from resourceshortages,but also from bugsandcrashes.The importanceof cleanandcheapabortmechanismshasbeenfrequentlyobserved in a wide rangeof contexts. E.g. RecoverableVirtual Memory [82] providesa transactionalview of memoryoperationsin Unix processes,with theability to abortorcommit a seriesof modificationsto programmemory. My observation is that for single-threadedapplicationswith soft state,anefficient abortmechanismcanbeusefulby itself,evenwithout supportfor persistenceor serializability.

Of course,adaptationthroughabortingcomesat a cost. Abort andrecovery consumetime andresources,which mustbebalancedagainstthe potentialsavings. Dependingonthe implementation,abortsmay alsohave a visual impacton the user. E.g., my simple“kill-and-restart”mechanismcausesRadiator’swindow to disappear, reappear, andremainblankwhile the input file is beingread. Any adaptive policy mustbalanceall thesecosts


againstthepotentialbenefitsof aborting.Thelatency callbackthresholdsusedin my pro-totypeprovide anextremelysimpleform of suchcost/benefitanalysis.More sophisticatedmethodswoulddynamicallypredicttherelativecostsof abortingvs. non-aborting,in termsof latency aswell asotherresourcesanduserdistraction.

Variation acrossscenes

Theresultsin thepreviousexperimentwerefor asinglescene:“dragon”. How wouldtheseresultsvary from onesceneto another?Weknow thatmemorydemandvarieswidely fromonesceneto another, but in all casesis apredictablefunctionof polygoncountandfidelity.Thuswe expect that while the valueof the optimal fidelity ÿ6úØûFü will vary from scenetoscene,theaccuracy of adaptationwill besimilar.

Figure8.19 shows the latency andmemorydeviation for the unloaded(CPU-bound)and loaded(memory-bound)phasesrespectively. We seethat, for all scenes,latency iswithin 10.3%of optimal90%of thetimewhenCPU-bound,andmemorydemandis within32%of optimal90%of thetimewhenmemory-bound.Figure8.20showsthebaddeviationfrequency andcommon-casedeviationcorrespondingto thegraphs.

Figure8.21shows the distribution of latency for the CPU-boundandmemory-boundphases.We seethat,whenCPU-bound,latency is closeto theoptimal9s. Whenmemory-bound,it is usuallybelow 9s: i.e.,whenthereis deviationfrom theoptimalmemoryusage,it is usually on the conservative side, and the systemavoids thrashing. We seea widespreadin latency dueto variationin CPUdemandacrossscenes:recall thatCPUdemandis data-specific(Section7.4.2).

Scalingthe experiment

The experimentdescribedat the beginning of this sectionshowed Radiatoradaptingitsfidelity to memoryload;however, thehighestfidelity everachievedwas0.1(Figure8.14).This is becausethelimited memorysupplyon themobilehardwareplatform(96MB) andthe large numberof polygonsin the scene(109K polygons)prevent operationat higherfidelity, evenwhenthereis no load.

Mobile devices,however, aregrowing morepowerful every day. Moreover, whenwehave connectivity to a fastcomputeserver, we might wish to executetheradiositycompu-tationremotely, thusachieving a higherfidelity. To verify thatmulti-fidelity adaptationisequallyeffective at high fidelitieson a fastserver machine,I repeatedthebaselineexperi-ment— Radiatorexecutingprogressiveradiosityrepeatedlyonthe“Dragon” scene— onafastserver(2.2GHzIntel Xeonprocessor, 512MB of memory).I usedasimilar, but higher,backgroundmemoryload: asquarewavepatternalternatingbetween0MB and256MB.

Figure 8.22 shows a timeline for one run of this “server” experiment. We seeverysimilar behaviour to the“mobile” case:


0%

20%

40%

60%

80%

100%

0% 10% 20%

Out

liers5

Error threshold

DragonWhaleBunny

CarPolar bear

Human bust

(a) Latency outliers:whenCPU-bound

0%

20%

40%

60%

80%

100%

0% 10% 20% 30% 40% 50% 60%

Out

liers5

Error threshold

DragonWhaleBunny

CarPolar bear

Human bust

(b) Memoryoutliers:whenmemory-bound

Figure 8.19: Memory adaptation accuracy for different scenesin Radiator: outliergraphs


Scene Latency deviation Memorydeviation(whenCPU-bound) (whenmemory-bound)ÿ�� ÿ��

Dragon 2.7% 10.3% 0.5% 7.7%Whale 2.7% 9.7% 1.6% 15.3%Bunny 2.5% 9.2% 14.2% 24.9%Car 2.4% 8.9% 18.4% 31.3%Polarbear 2.3% 8.6% 14.2% 27.2%Humanbust 1.9% 8.6% 20.5% 31.7%

Figure8.20: Memory adaptation accuracyfor different scenesin Radiator: bad devi-ation fr equencyand common-casedeviation

0%

20%

40%

60%

80%

100%

0 2 4 6 8 10 12 14 16 18 20

Fre

quen

cy�

Latency (sec)

(a)CPU-bound

0%

20%

40%

60%

80%

100%

0 5 10 15 20 25

Fre

quen

cy�

Latency (sec)

(b) Memory-bound

Figure 8.21: Memory adaptation accuracy for different scenesin Radiator: latencydistrib ution


0

5

10

15

20

25

0 200 400 600 800 1000

Late

ncy

(sec

)

Time (sec)

0

50

100

150

200

250

Sw

appi

ng (

MB

s)

(

0

50

100

150

200

250

300

350

Mem

ory

dem

and

(MB

)

CompletedAbortedBg load

00.10.20.30.40.50.60.70.80.9 1

Fid

elity)

Figure8.22: Adaptation in Radiator under varying memory load on a fast server


0%

20%

40%

60%

80%

100%

0 2 4 6 8 10 12 14 16 18 20

Fre

quen

cy�

Latency (sec)

(a) CPU-boundphases

0%

20%

40%

60%

80%

100%

0 5 10 15 20 25

Fre

quen

cy�

Latency (sec)

(b) Memory-boundphases

Figure 8.23: Latency distrib ution for Radiator with time-varying memory load on afast server

Configuration Memorydeviation Latency deviation Probabilityof abort(whenloaded) (whenunloaded) (on upwardloadtransition)ÿ�� ÿ��

Fastserver (Xeon) 1.8% 12.7% 20.3% 32.5% 0.8Baseline(ThinkPad) 0.5% 7.7% 2.7% 10.3% 1.0

Figure 8.24: Memory adaptation in Radiator on two different platforms

þ Whenthereis no memoryload,Radiatoris CPU-bound,andlatency is closeto op-timal (9s). On this platform, the optimal latency correspondsto a fidelity slightlyunderthemaximumvalue,which is 1.0.þ Whenmemoryload increases,the operationin flight is sometimesaborted;subse-quentoperationshavevery low memorydemand,andgraduallyrampup towardstheoptimalvalue.þ When the memoryload decreases,Radiatorrampsup its fidelity until it is CPU-boundagain.

Figure 8.23 shows the distribution of operationlatency during the CPU-boundandmemory-boundphases,aggregatedover 5 runsof the experiment. We seeagainthat thelatency whenCPU-boundis closeto the target (9s); whenmemory-bound,it is usuallyunderthe target. Figure8.24 shows the deviation of memorydemandand latency fromoptimal,andtheprobabilityof abortfor this configuration,aswell asthebaselineconfigu-ration.Weseethatthoughdeviation is slightly higher, latency is still usuallywithin 13%ofoptimalwhenCPU-bound.Whenmemory-bound,memoryusageis usuallywithin 33%ofoptimal; from thehistogramwe canseethaterrorsarealmostalwayson theconservativeside,i.e., thesystemavoidsthrashing.

8.4. CONCURRENTAPPLICATIONS — ARCHITECT SCENARIO 153

8.4 Concurrent applications— architect scenario

In theprecedingsections,wesaw how multi-fidelity adaptationbenefitsasingleapplicationcompetingwith a time-varyingbut non-adaptive load,by reducingvariationin interactivelatency andavoiding memorycontention.In this sectionI evaluatetheeffect of adaptationon two interactiveapplicationsrunningconcurrentlyandcompetingfor resources.My aimis to evaluatethebenefitof adaptationby comparingtheadaptive (fidelity is dynamicallychosenby theruntimesystem)andnon-adaptive(fidelity is staticallydeterminedandfixedat runtime)approacheson thesameworkload.

8.4.1 Experimental setup

Theexperimentalsetupemulatesthefollowing scenario:

An architect is showingthe plans for a building to her client, who is using GLVU to“walk through” a modelof the building. During the walkthrough, the client suggestschangesin the placementof windowsor indoor lighting; each time, the architect makesthesechangesandrunstheRadiatorprogramto seetheeffectof themodifications.WhileRadiator is running, the client continueswith the walkthrough; whenthe new scenehasbeenprocessed,it is substitutedfor theold in GLVU. ThusGLVU runscontinuouslyin theforeground;Radiatorrunssporadically in thebackground,whenever thearchitectwishesto recomputethelighting effects.

Thecorefunctionality— renderingandradiosity— requiredto implementthisscenariois presentin GLVU and Radiator. However, thereare aspectsof the scenariothat theseapplicationsdonot support:

þ GLVU cannotreadtheannotatedscenesproducedby Radiator, andsoI couldnot feedtheradiosity-annotatedsceneinto GLVU.þ Radiatordoesnot offer any tools for sceneediting: thus I wasconstrainedto runradiosityon the samescenerepeatedly, sincetherewasno way to modify it on thefly.

Theexperimentalsetupattemptsto mimic thearchitectscenariosubjectto theselimitations.I ran the “Notre Dame” benchmarkon GLVU: to mimic the continually-runningvirtualwalkthrough,I ranGLVU continuouslyon theuserpathtrace,with a 1s latency constraint.Therecomputationof lighting effectswasrepresentedby sporadicallyrunningRadiatorinthebackground,alsoonthe“NotreDame”scene.EachRadiatoroperationranaprogressiveradiositycomputationwith a 10s latency constraint.Betweenoperations,Radiatoridledfor a “think time” chosenrandomlyfrom therange0–10s.

I ranthis experiment5 timesoneachof 4 configurations:

1. Both adapt: both GLVU andRadiatorusemulti-fidelity adaptationto matchfidelity


to their latency constraint.2. GLVU adapts: GLVU aloneadapts,Radiatorusesastaticpolicy of settingthefidelity

to 0.05for all operations.3. Radiator adapts: Radiatoraloneadapts,GLVU usesa static policy of setting the

fidelity to 0.54. Neitheradapts: GLVU andRadiatorusedfixedfidelitiesof 0.5and0.05respectively

Thestaticfidelity valuesusedherewerechosento representreasonablehand-optimizedvalues.I.e.,GLVU’s latency for the“Notre Dame”sceneatafidelity of 0.5is about1swhenthereis noCPUload.Thelatency will, howevervaryovertimeevenwithoutCPUload:wehavealreadyseenthatGLVU’s CPUdemandatagivenfidelity varieswith cameraposition.A fidelity of 0.05for Radiatorgivesusa latency of around6s whenGLVU is not running,andavoidsmemorycontentionwhenGLVU is running.While thesevaluesmightnotbetheoptimalstaticvaluesfor any particularbenchmark,I believethey representthebestthatwecanreasonablyexpectfrom astaticapproach.In Section8.4.2I will show thatbenchmark-specificfidelity tuning canachieve the desiredmeanperformancefor any workload,butthatperformancevariability canonly bereducedby dynamicfidelity adaptation.

This experimentalsetupavoidsmemorycontentionbetweenGLVU andRadiator:thus,both applicationsare adaptingbasedon the CPU resourcealone. This also avoids theneedfor abortsin thecaseof Radiator:thoughI hadenabledlatency callbacksof 20s forRadiatorwhenadaptive,they werenever triggered.

8.4.2 Experimental results

Figure 8.25 shows one representative run of the architectscenario,in eachof the fourconfigurations.I show GLVU’s fidelity andlatency, andRadiator’sfidelity andlatency overtime. Note that when GLVU adapts,the latency is usually around1s; when it doesnotadapt,latency is highly variable.Multi-fidelity adaptationsubstitutesvariability in fidelityfor variability in latency, by adaptingto changesin CPUdemandasGLVU movesthroughtheusertrace,andto changesin CPUsupplycausedby thesporadicexecutionof Radiator.

Radiator’s latency doesnot varymuch,evenwhenit is notadaptive. This is because:

þ Radiator’s CPUdemanddoesnot changeover time,sinceit executesthesamecom-putationon thesameinput data.þ At thetime scaleof 10s, thebackgroundloadcausedby GLVU is constant,anddoesnotvary from oneradiositycomputationto thenext.

However, thestaticallychosenfidelity convergesto a latency of 14s,whereastheadaptiveconfigurationconvergesto theoptimal9s.

Figures8.26and8.27show thelatency distributionsfor GLVU andRadiatorfrom 5 runsof the experimentin eachconfiguration.As we expect,we seeGLVU’s latency clusteredaround1s whenadaptive, andhighly variablewhennot. Radiatoris consistentlyat 9s


0

10

20

30

0 50 100 150 200 250 300Rad

. lat

ency

(se

c)

Time (sec)

00.020.040.060.08 0.1

Rad

. fid

elity6

0 1 2 3 4 5

GLV

U la

tenc

y (s

ec)

7 0

0.2 0.4 0.6 0.8 1

GLV

U fi

delit

y

(a)Both adapt

0

10

20

30

0 50 100 150 200 250 300Rad

. lat

ency

(se

c)

Time (sec)

00.020.040.060.08 0.1

Rad

. fid

elity6

0 1 2 3 4 5

GLV

U la

tenc

y (s

ec)

7 0

0.2 0.4 0.6 0.8 1

GLV

U fi

delit

y

(b) GLVU adapts

0

10

20

30

0 100 200 300 400 500 600Rad

. lat

ency

(se

c)

Time (sec)

00.020.040.060.08 0.1

Rad

. fid

elity6

0 1 2 3 4 5

GLV

U la

tenc

y (s

ec)

7 0

0.2 0.4 0.6 0.8 1

GLV

U fi

delit

y

(c) Radiatoradapts

0

10

20

30

0 100 200 300 400 500 600 700Rad

. lat

ency

(se

c)

Time (sec)

00.020.040.060.08 0.1

Rad

. fid

elity6

0 1 2 3 4 5

GLV

U la

tenc

y (s

ec)

7 0

0.2 0.4 0.6 0.8 1

GLV

U fi

delit

y

(d) Neitheradapts

NotethatwhenGLVU is adaptive, therearefewer pointson theRadiatortimeline: this is becausethe entireexperimenttakesmuchlesstime whenGLVU is adaptive, i.e., it runsthroughthe usertracemuchfaster.

Figure 8.25: GLVU and Radiator in architect scenario


0%

10%

20%

30%

0 1 2 3 4 5

Fre

quen

cy�

Latency (sec)

(a) Both adapt

0%

10%

20%

30%

0 1 2 3 4 5

Fre

quen

cy�

Latency (sec)

(b) GLVU adapts

0%

10%

0 1 2 3 4 5

Fre

quen

cy�

Latency (sec)

(c) Radiatoradapts

0%

10%

0 1 2 3 4 5

Fre

quen

cy�

Latency (sec)

Outliers (> 5 sec): 0.06%

(d) Neitheradapts

Figure8.26: Latency distrib utions for GLVU: architect scenario


0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

0 5 10 15 20 25 30

Fre

quen

cy�

Latency (sec)

(a)Both adapt

0%

10%

20%

30%

40%

50%

60%

70%

80%

0 5 10 15 20 25 30

Fre

quen

cy�

Latency (sec)

(b) GLVU adapts

0%10%20%30%40%50%60%70%80%90%

100%

0 5 10 15 20 25 30

Fre

quen

cy�

Latency (sec)

(c) Radiatoradapts

0%

10%

20%

30%

40%

50%

60%

70%

80%

0 5 10 15 20 25 30

Fre

quen

cy�

Latency (sec)

(d) Neitheradapts

Figure8.27: Latency distrib utions for Radiator: architect scenario


0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0% 50% 100% 150% 200%

Fre

quen

cy8

Error threshold

Both adaptGLVU adapts

Radiator adaptsNeither adapts

(a)Latency outlierdistribution: GLVU

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

0% 20% 40% 60% 80% 100%

Fre

quen

cy8

Error threshold

Both adaptGLVU adapts

Radiator adaptsNeither adapts

(b) Latency outlierdistribution: Radiator

Adaptation GLVU RadiatorGLVU Radiator ÿ�� ÿ�� Adaptive Adaptive 38% 46% 12% 22%Adaptive Not adaptive 36% 42% 100% 99%Not adaptive Adaptive 87% 296% 5% 9%Not adaptive Not adaptive 89% 311% 100% 46%

(c) Baddeviation frequency andcommon-casedeviation

Figure 8.28: Concurrent GLVU and Radiator: latencydeviation


whenadaptive, and14s whennot. The adaptive casehasa small numberof outliersonthe low side,correspondingto an initial ramp-upof fidelity to thesteady-statevalue. Thenon-adaptive casehasa few outlierson thehigh side,againcorrespondingto thefirst fewoperations,whichareaffectedby thepagingactivity causedby GLVU andRadiatorstartingup simultaneously.

Figure 8.28 shows the deviation of GLVU’s and Radiator’s latency from the optimalvalues:0.9sand9s respectively. Weseethatbothapplicationsseesignificantbenefitfromadaptation.We alsoseethat eachis eachis able to adapteffectively whetherthe otheradaptsor not. I.e., GLVU, whenadaptive,hasthesamebehaviour whetherRadiatoradaptsor not. Similarly anadaptive Radiatoris ableto masktheeffect of GLVU’s adaptation,orlackof it.

WhenGLVU is not adaptive, thereis somedependenceon Radiator’s behaviour: GLVU

performsslightly betterwhenRadiatoris adaptive. This is becauseadaptationshortensRadiator’s “on” periods,while the think timesor “off ” periodsareunaffected: the “on”periodsarethe onesthat causehigh latency in GLVU. Radiator, whennot adaptive, per-forms slightly betterwhen GLVU is not adaptive. This is simply becausewhen GLVU isnot adaptive, the walkthroughtakessignificantly longer: thus the outlierscausedby theinitial page-inactivity have a smallerweightcomparedto thelargernumberof operationsin steadystate.

Variance

So far, I have focussedon the ability of adaptive applicationsto keeplatency closeto apre-determinedoptimalvalue ùeúØûFü . However, for many interactive applications,variancein latency is alsoimportant[62]: i.e., deviation from theexpected, ratherthantheoptimalvalue. In otherwords,userexpectationsof latency might bebasednot only on thelatencyconstraintthey imposeonanapplication,but alsoon theruntimebehaviour of theapplica-tion itself. A usermight seta latency constraintof 1s,but realisethatthey would preferaconstantresponsetime of 2s to a responsetime of 1s 90%of thetime and3s 10%of thetime.

My currentprototypedoesnot supportsettingexplicit boundson latency variance.However, simply by attemptingto meeta pre-determinedlatency constraint,the systemreducesvarianceasa side-effect. Figure8.29shows thestandarddeviation of GLVU’s andRadiator’s latency for the datafrom the “architectscenario”experiments.In this case,Imeasurethe deviation of latency from the observed meanvalue, ratherthan from a pre-determinedoptimalvalue.

Weseethatadaptationreducesnotonly themeanlatency 9 , but alsothestandarddevia-tion : , i.e. thevariationfrom themean.In thecaseof GLVU, adaptationdecreasesvariationby adjustingfidelity to keepCPUdemandsteady;for Radiator, by avoidingtheinitial high-latency outliersduringstartup.In fact,adaptationreducesnot only theabsolutevariance,but alsothecoefficientof variation :;!<9 : thestandarddeviationscaledby themean.


GLVU’s strategy Radiator’s strategy GLVU’s latency Radiator’s latency9 : :;!<9 9 : :=!<9

Adaptive Adaptive 0.9s 0.3s 0.29 8.5s 0.6s 0.07Adaptive Static 0.9s 0.3s 0.28 14.2s 1.9s 0.13Static Adaptive 2.1s 0.9s 0.44 8.3s 0.4s 0.05Static Static 2.3s 1.0s 0.43 13.1s 1.4s 0.10Optimalstatic Optimalstatic 1.0s 0.44s 0.46 8.4s 0.3s 0.03

Thetableshows themean> , standarddeviation ? , andcoefficientof variation ?A@B> for thelatencyof GLVU andRadiator, over 5 runsof eachconfigurationof the architectscenario.The optimalstaticconfigurationusedafidelity of 0.17for GLVU and0.019for Radiator.

Figure8.29: Concurrent GLVU and Radiator: latencyvariability

Benchmark-specifictuning

Multi-fidelity adaptationreducesmeanlatency by degradingfidelity to theappropriatelevelfor the resourceconditions;it also reducesvariation in latency by compensatingfor dy-namicchangesin resourcesupplyanddemand.Thefirst of thesebenefitscanbeachievedthroughbenchmark-specifictuning: choosingastaticfidelity thatachievesthedesiredmeanlatency, givenaparticularworkloadandbackgroundloadpattern.Suchtuningis unwieldyand impractical: often, we do not know the runtime conditionsthat an applicationwillexperience. Moreover benchmark-specifictuning can bring meanlatency on target, butcannotregulatedynamicvariation.

To validatethis claim, I repeatedthe “architect scenario”experimentin the optimalstaticconfiguration: fidelity for bothapplicationsis fixed,but thefixedvalueis chosentokeepmeanlatency ontargetfor thisworkload.To computetheoptimalstaticfidelity, I usedthemeanfidelity chosenby thefully-adaptiveconfigurationover5 runsof theexperiment:0.17for GLVU and0.019for Radiator.

Thelastline of Figure8.29showsthemeanlatency andvariancefor thisoptimalstaticcase.Now meanlatency is closeto target,asin theadaptivecase.However, GLVU’s latencyis still asvariableasin thenon-adaptive case.For Radiator, theoptimal staticstrategy isin fact the bestone,sinceits resourcesupplyanddemandareconstantduring the exper-iment. The adaptive casedeviatesslightly from optimal during the first few operations,beforeconvergingto thecorrectvalue.Theoptimalstaticstrategy doesnotsuffer from thisproblem,andhasslightly lowervariability.

8.5. SUMMARY 161

8.5 Summary

In this chapterI evaluatedmulti-fidelity adaptationwith both syntheticandreal applica-tions. I showed that thesystemrespondedto sharpchangesin CPU loadwithin seconds,andto changesin memoryloadwithin tensof seconds.With adaptation,latency wasusu-ally within 43%of GLVU’s latency constraintwhencompetingwith asyntheticbackgroundCPUload,comparedto 494%at full fidelity. I comparedtheseresultsacrossmultiple in-put scenes,latency constraints,andbackgroundloadpatterns.I showedthat largerscenesshowedslightly larger deviation; that loadvariationon the sametime-scaleasadaptationhadthemostdestructiveeffectonadaptationaccuracy; andthatlargerpeak-to-troughtran-sitionsin loadcausedlessaccurateadaptation.

Thesystemcanmaintainlatency constraintsassmallas0.5s,but not lower:

þ to reduceCPU demandbelow 0.5s on my testplatform, fidelity would have to bedecreasedto anunusablelevel;þ evenso,thelargeschedulerquantumwould causelargedeviationsfrom thelatencyconstraint.

Moore’sLaw will soonsolvethefirst problem,while thesecondsimplyrequiresmorefine-grainedscheduling.Modernprocessorscaneasilysupportschedulingquantaof 10mswithlittle overhead;I hopethatfutureversionsof theLinux kernelwill supportsmallerquanta,andenableus to achieve the desired“seamless”interactive responsetime of 100ms [12]for virtual walkthrough.

In the caseof Radiator, constrainingmemoryusageis key to good performance. Ialsoshowedthat,with adaptation,Radiator’s memoryusage(with a syntheticbackgroundmemoryload)is within 10%of optimalin thecommoncase,comparedto 38%whenusinga static,conservatively chosenfidelity. I notedthateffectivememoryadaptationrequiresacleanandcheapabortmechanismfor in-flight operationsthatarethrashingthesystembyconsumingtoo muchmemory. My simpleabortmechanism— killing andrestartingtheapplicationprocess— costsusanadditional70s perabortedradiosityoperation.

I alsoevaluatedtheadaptationof GLVU andRadiatorwhenthey ranconcurrentlyandcompetedwith eachother. I showedthatadaptationaidedbothapplicationsin meetingtheirlatency constraint,independentlyof the otherapplication’s behaviour. Finally, I showedthatadaptationhelpsnotonly to maintainlatency constraints,but alsoto reducevariabilityin latency despitedynamicresourcevariation.Benchmark-specifictuningcanprovide theformerbenefitbut not thelatter.

Chapter 9

RelatedWork

Thevillany you teachmeI will execute,andit shallgo hardbut I will bettertheinstruction.

(TheMerchantof Venice,III.i)

Theideaof adaptingfidelity to gainperformanceis anold one:however, to thebestofmy knowledge,theresearchpresentedhereis thefirst to proposea generalcomputationalmodelfor multi-fidelity adaptation.I believe that theresourcemodel,theoperation-basedAPI, andthehistory-basedpredictionof resourcedemandasa functionof fidelity, arealsonovel. Of course,I derived inspirationandborrowedmany ideasfrom otherresearchers,andhopethat this work will influencefuture researchinto mobile andadaptive systems.This chapterpresentsselectedresearchrelevant to oneor moreof the issuesaddressedinthis dissertation.

Section9.1 describeswork that addressessimilar problems,or usessimilar solutions,to thosedescribedin this dissertation.In eachcase,I briefly summarizethe cited work,contrastit with mine,andlist thenovel contributionsof my work.

Section9.2presentsa broaderview of adaptivesoftwaresystems:its purposeis not toenumerateall suchsystems,but to mapout the designspacewith a few examples. Thedesignspaceis decomposedalongseveralaxes;for eachaxis,I indicatethechoicesI madein designingthe multi-fidelity runtimesystem,anddescribea few examplesof differingdesignchoices.

In aninteractivesystem,thetruemeasureof successis userhappiness.Agile andaccu-rateadaptationis of little useif thesystemoptimizesthewrongutility function: for exam-ple minimizing latency whentheuserwishedto maximizefidelity. Section9.3 discussessomeresearchthatcouldhelpusderivegoodutility functions:by quantifyingthenotionofapplication“quality”, andby inferring userpreferencesdynamicallyfrom context.

163

164 CHAPTER9. RELATED WORK

9.1 RelatedWork

9.1.1 Mobile application adaptation

The ideaof fidelity adaptationin resource-constrainedmobile environmentsis not new.Fox [37] hasshown thatdynamicdistillation of web contentcanimprove web downloadtimesby degradingthefidelity of downloadedcontentto matchthenetwork bandwidthandthe capabilitiesof the viewing device. Noble [71] showed that fidelity adaptationin theOdyssey systemenablesa variety of applicationsto maintainperformancedespitevaria-tionsin network bandwidth

My work differspreviousresearchin mobileapplicationadaptationby

þ including many differentkinds of adaptationundera broadunifying concept: themulti-fidelitycomputation.þ demonstratingtheimportanceof history-basedresourcedemandpredictionto effec-tivefidelity adaptationþ showing throughempiricalevaluationthatmulti-fidelity adaptationis well suitedtotheneedsof interactiveapplications.

More recently, de Lara [20] hasshown that component-basedadaptationcanenableadaptationin applicationsevenwithout sourcecodeaccess.This researchis complemen-tary to mine: while I have assumedsourcecodeaccessin my applicationcasestudies,thegeneralideasof multi-fidelity computationandhistory-baseddemandpredictionarealsoapplicableto binary-onlysituations.

9.1.2 Odyssey

The largestcontribution to the concept,designandimplementationof my systemcomesfrom Odyssey [71]. Odyssey demonstratedthe importanceof adaptingto network band-width in mobilescenarios,andproposedthefirst genericAPI for application-awareadap-tation. Themulti-fidelity runtimebuilds on Odyssey, andusesOdyssey’s monitoringandpredictioncodefor its network supplypredictor.

Concurrentlywith my research,JasonFlinn alsoworkedontheOdyssey infrastructure;heextendedit to supportenergy asaresourceandbatterylifetime asaperformancemetric.Thus,at theconceptuallevel, hiscontribution wasthedemonstrationthatapplication-levelfidelity adaptationandremoteexecutioncanhelp in improving batterylifetimes [34, 33].My contribution is the broadconceptof multi-fidelity computation,the API andruntimesystemarchitecturethatsupportsthesecomputations,andthedemonstrationthat togetherthey canimprove interactive responsetimes.

At the implementationlevel, Flinn built thesupplypredictoranddemandmonitor forenergy, the remoteexecutionengine,andthe file cachepredictor[32]; I built the solver,

9.1. RELATED WORK 165

andthesupplypredictorsanddemandmonitorsfor CPUandmemory.

9.1.3 QoSand real-timeoperating systems

QoS-awareoperatingsystemsandmiddleware[67] focuson dividing resourcesacrossap-plicationsto satisfysomenotion of fairness.Thus,eachapplicationreceivesa promisedrate of processingor network bandwidth,or sometimesa fixed share of availableband-width.

In thesesystems,applicationsareexpectedto specifytheir resourceneeds,andadaptcorrectlyto their runtimeresourceallocation.While this maybea straightforwardprocessfor multimediaapplicationssuchasvideo,we have seenin this dissertationthatcomplex,interactiveapplicationsrequireanadditionalstep:mappingapplication-specificparametersto resourcedemand.In someQoSarchitectures,theneedfor suchanexplicit mappingisrealized:e.g.,the “applicationresourceprofile” in Q-RAM [57]. However, to thebestofmy knowledge,this is thefirst work to developandvalidatea generalmethod— history-baseddemandprediction— to generatesuchmappingsor profiles

Thus,while QoSresearchfocusseson fair allocationacrossapplications,my researchenableseffective andautomaticadaptationwithin applications. The two techniquesarecomplementary.

A secondpoint of differenceis in the target applicationdomain: mostQoSsystemsfocusonstreamingmultimediaapplicationssuchasvideo.Theseapplicationscanbechar-acterizedasstreamsof periodic operations,eachof fixed or boundedresourcedemand:thesystem’sobjective is thento make surethatall activestreamscanbesustainedwithoutexcessive delayor jitter. Sucha modeldoesnot fit interactive applications,wherewe careabouttheresponsetime of individual operationsratherthantheaggregatethroughputandjitter of a stream.Further, interactive1applicationsarenot periodic— operationsaretrig-geredby userrequests— nor aretheoperationsalwaysof fixedsize– runtimeparameterscancausewidevariationsin per-operationresourcedemand.

Somereal-timeoperatingsystemsallow a broaderworkloaddefinition,althoughtheyarestill focussedon multimediaapplications.E.g., rate-basedexecution[52] allows taskstreamsthat arenot perfectlyperiodic,but that have someaveragetaskrateandperiod.However, tasksizeis still fixed: thesystemassumesa staticworst-caseresourcedemandestimate.Abeni [2, 3] describesa systemwhereboth tasksizeand interarrival time arerepresentedasprobability distributionsratherthanfixed values. However, the probabil-ity distributionsarestaticallyderived, andthusresourcedemandestimatesarenot madedependentonapplicationruntimeparameters.

In summary, my researchdiffers from that in QoSandreal-timeoperatingsystemsinthreeways:

þ thepresenceof explicit, history-baseddemandprediction


þ theuseof adaptationratherthanallocationto achieveperformancegoalsþ thefocuson interactiveratherthanstreamingapplications

9.1.4 Resourceprediction

Resourcesupplyprediction— i.e. loadprediction— is foundin someform or theotherinevery adaptive systems.Typically, predictionis implicit in thesesystems:offeredloadorothersystemindicatorsareusedascontrolsignalsthatdrive loadbalancing[58], schedul-ing [27], processmigration[28, 45], or resourceallocation[89].

A few systemsuseexplicit resourceprediction. TheRunningTime Advisor [25] pre-dictstherunningtimeof a job onany hostin adistributedsystem,givenits nominalexecu-tion time: i.e., it predictsresourcesupplyandperformance,but theapplicationmustspecifytheresourcedemand.TheRTA is basedonDindaetal.’swork onhostloadprediction[23]usinga varietyof time seriesmodels.For example,autoregressive linearmodels[10] takethevarianceaswell asthemeanof recentobservationsinto account.Suchtechniquescouldimprovetheaccuracy of my relatively simplesupplypredictors(Section5.5).

Resourcedemandpredictionis comparatively rare. Someload-balancingsystems[58,45] estimatetheCPUdemand(executiontime)of a job from ananalyticallyor empiricallyderivedprobabilitydistribution. However, they do not makeuseof any dynamic,instance-specificinformationsuchasthejob’sruntimeparameters:atmost,they usethejob’scurrentlifetime to predictits futureexecutiontime. For regular, loop-basedcodes,staticcompileranalysishasbeenusedto automaticallypartitionprogramsinto subtasksof known resourcedemand[87].

Demandpredictionasa function of runtimeparametersoccursin two systemsthat Iknow of. Abdelzaher’s “automatedprofiling systemfor QoS” [1] dynamicallyestimatestheCPUutilization of a multimediastreamasa linearfunctionof its task(frame)rateandaveragetasksizein bytes. The techniqueused— linear estimationwith forgetting— isidenticalto RecursiveLeastSquares(Section6.3.1).Thepredictionsof utilizationarethenusedfor QoSadmissionscontrolor allocation.

More closelyrelatedto my work is the PUNCH system[53]. PUNCH usesmachinelearningtechniquesto predicttheCPUdemandof aprogramrunasafunctionof application-specificruntimeparameters.This is very similar to history-baseddemandprediction,withmoresophisticated(andexpensive)learningalgorithmsthanlinearestimation.Thedemandpredictionsarenotusedfor applicationadaptation,however, but for loadbalancingin agridcomputingframework.

Thus,while thereis a large amountof work on load predictionandsomeon demandprediction,I believemy systemis thefirst thathasall of thefollowing properties:

þ Demandis predictedfrom empiricaldata,bothoffline andonline.þ Demandis predictedasa functionof theapplication’s runtimeparameters,bothtun-

9.2. DESIGNSPACE 167

ableandnontunable.þ Demandpredictionsareusedto driveadaptationof thetunableparameters.

9.2 DesignSpace

To moreeasilydescribethe large designspacefor adaptive softwaresystems,I have de-composedit into five axes,not necessarilyorthogonal:adaptationvs. allocation, central-izedvs. decentralized, predictionvs. feedback-control, application-awarevs. application-transparent, andgray-boxvs. in-kernel. Sections9.2.1–9.2.5dealwith eachof theseinturn.

9.2.1 Adaptation or Allocation?

Variationin applicationresourcesupplycanbeaddressedat two levels: theOScanchangethe allocation of sharedresourcesacrosscompetingapplications,and applicationscanadapt their behaviour to their currentallocation. The two arecomplementary, anda re-ally effectivesystemwill useacombinationof thetwo techniques.

My approachis an “adaptation-only”one: the systempredictseachapplication’s re-sourceshare,but doesnot attemptto modify it. This hasa numberof practicalbenefits(Section5.1): it facilitatesa gray-box,user-level implementation,it assuresnon-adaptivelegacy applicationsthat their resourcesharewill not be affected,and it doesnot requireguaranteesfrom theunderlyingresources(which aredifficult to obtain,for example,fromwirelessnetworks).This best-effort, adaptation-onlyphilosophyis sharedby othermobileadaptivesystems[37, 71].

A large body of work, which I refer to collectively as “Adaptive QoS”, focussesonallocation: sharingresourcesacrossapplicationsto maximizeoverall utility. Givenanal-location,eachapplicationis expectedto adaptits tunableparametersto maximizeits ownutility. I believethatadaptationandallocationarecomplementary. My resourcemodelandhistory-basedpredictionapproachcanmapresourceallocationsinto utility; theseapplica-tion utility functionscould be usedby a QoS-awareallocatorto maximizeglobal utility.David PetrouandI have proposedanarchitecturefor combinedadaptationandallocation,basedon goodnesshints [74], where

þ applicationsprovide demandpredictors,anduserstheir utility functions,to thesys-temashints.þ the systemperformsa two-level optimization: applicationtunableparametersareadjustedto maximizeapplicationutility given a resourceallocation,and resourceallocationsareadjustedto maximizeglobalutility.


9.2.2 Centralized or decentralized?

My approachto finding optimaladaptiveparametervaluesis to usea centralizedsolveroroptimizer(Section5.3). Lee[57] usesasimilarapproach,anddescribesoptimizationalgo-rithmsthatefficiently maximizeutility: i.e., find optimalor near-optimalparametervaluesundercertainconditions. Incorporatingthesealgorithmsinto my solver couldpotentiallyimprove its efficiency, generality, robustnessandtheoreticalsoundness.

Stratford and Mortier [90] advocatemarket modelsas an alternative to centralizedsolvers.Herethesystempriceseachresourceaccordingto theobserveddemand.Eachap-plicationis givenafixednumberof creditsto spend,andmustallocatethemacrossdifferentresources.NeugebauerandMcAuley [68] have developeda market-basedCPU allocatorbasedon congestionpricing. Thesemicroeconomicapproachesgeneralizeproportional-shareschemessuchaslotteryscheduling[93], whichallocatesresourcesstrictly in propor-tion to creditallotment,andusesadifferentcurrency for eachresource.

Market models,while decentralizedandpossiblymorerobust thansolvers,imposeanadditionalburdenon the applicationprogrammer. Insteadof utility functions,they mustnow derive biddingstrategiesthatallocateapplicationcreditsacrossresources,given theprice of eachresource.I believe goodbidding strategiesareharderto createthangoodutility functions,andknow of no automaticway to derive theformerfrom thelatter.

9.2.3 Prediction or feedback-control?

In Section5.3.2,I discussedtwo differentwaysof doingadaptation:

þ feedback-control: iteratively adjustingfidelity basedon recentperformance.þ predict-and-search: searchingfor theoptimalvalueacrosstheentirespaceof fideli-ties,basedonpredictorsof performanceasa functionof fidelity.

A goodexampleof the first approachis the feedback-driven CPU allocatorof Steereet al. [89]. Thesystemcontinuouslymonitorseachapplication’s throughputor “progress”,andadjustsits CPUshareto maintainasteadyrateof progress.For someclassesof applica-tions,thesystemcantransparentlymeasureprogressby monitoringtheapplication-kernelinterface:thefill levelof asocketbuffercantell uswhethertheproducerandconsumerratesaremismatched.Lutfiyya et al.[59] describea feedback-drivenapproachbasedon probes:embeddedcodemodulesthatmeasureapplication-specificmetricsandtriggeralarmswhenthesemetricsexceedspecifiedbounds.Thesystemreactsby consultingapre-specifiedrulebase:e.g.,“if theinput buffer is full, increasetheCPUpriority of theconsumerprocess”.

By contrast,the researchpresentedin this dissertationusesthepredict-and-searchap-proach,which is basedon two broadtypesof predictors:

þ temporal: usingpastvaluesof a time-varyingsignal C to predictfuturevalues.Theseanswer“what next” questions(“what will CPUloadbeoverthenext 1s?”),andI use

9.2. DESIGNSPACE 169

themfor resourcesupplyprediction.I.e., I assumethatanapplication’sresourcesup-ply in thenearfutureis independentof its adaptiveparameters,but canbepredictedfrom therecentpast.þ parametric: usingpastvaluesof C andassociatedparametervalues D�EGFIHJEK�LHNMNM�MPO , topredict C asa functionof D2EGFIH1EK�LHNMNMNMO . Parametricschemesanswer“what if ” ques-tions(“what wouldCPUdemandbeatfidelity 0.7?”) I usethesefor resourcedemandprediction.

9.2.4 Application-aware or application-transparent?

In multi-fidelity adaptation,as in Odyssey, applicationsare aware and in control of theadaptiveprocess:thesystemprovidessupportfor makingfidelity decisions,but leavesthedecisionsto theapplication.This awarenesscomesat a cost: theapplication’s behaviourmustbe modified,which usually involvessourcecodemodifications. The multi-fidelityAPI is designedto minimize this cost: however, sourcecodemodificationstill presentsabarrierto modifying legacy applications.

Application-transparentadaptationdoesnot requireusto modify theapplications,butis restrictedin its scopeandpower. Suchadaptationusuallyhappensat the systemlevel,i.e., to adjustsystem-wideparametersratherthanapplicationfidelity. Perhapsthe mostfamousexampleis TCP congestioncontrol [51], which is invisible to applicationsusingthestandardsocket interface.Similarly, Coda[55] adaptsto masktheeffect of low band-width or disconnectionfrom file servers,but presentsa standardfile systeminterfacetoapplications.

How canwe achieve applicationadaptationwithout sourcecodemodification?Trans-parentadaptationof datafidelitycanoftenbeachievedthroughproxytechniques. By plac-ing a proxy on the datapathbetweena server anda client, we candegradedatafidelitywithoutmodifyingeither. Fox etal. [37] usethis techniqueto degradewebcontent,asdoesthe multi-fidelity web browser (Section7.5). Component-basedadaptation[20] allowsevenfiner-grainedcontrol,by interceptingthedataflow betweendifferentcomponentsof asingleapplication.

Computationalfidelity adaptation,however, ofteninvolveschangingstatevariablesandexecutionpathswithin a program. Herewe needtechniquesto insertcodeinto applica-tions without modifying their sourcecode. Suchtechniquesare typically usedto insertprofiling, prefetching,or otherperformanceoptimizations;they couldpotentiallybeusedto insertfidelity adaptationcodeaswell. Binary rewriting tools[88, 77, 17] cantransformapplicationswithout sourcecodeaccess.Compiler-basedapproaches[65] requireread-only accessto sourcecode.Sometimes,programminglanguagefeaturescanhelpto reducethe costof sourcecodemodification,thoughnot eliminateit entirely: Changet al. [16]uselanguageextensionsto export parallelizability informationto the system,which thenchoosesadaptiveparametervaluesandthedegreeof parallelismat runtime.


9.2.5 Gray-box or in-kernel?

Themulti-fidelity runtimesystemextendsOSfunctionalitywith servicesnot providedbytheLinux kernel.I choseto build it at user-level to easedeploymentanddebugging.How-ever, to predict applicationresourcesupply, the systemneedknowledgeof the kernel’sresourceallocationdecisions.It infers this informationby translatingload statisticsintopredictionsof resourceallocation.

Arpaci-DusseauandArpaci-Dusseau[6] usetheterm“gray-box” for suchapproaches,which combineknowledgeof kernelinternalswith observationsmadeat userlevel. Theydescribegray-boxtechniquesfor inferringbuffer cachecontents,diskfile layout,andmem-ory availability by usingactiveprobes: inferring resourcestateby attemptingto usetheresource.My approachusespassiveobservations: while lesspowerful, they have loweroverheadsandavoid modificationof the statebeingread(the Heisenberg effect). Gray-box techniquescanpotentiallymodify or controlkernelbehaviour aswell aspredictingit:Changet. al [15] enforceresourceallocationsat userlevel by adjustingprocessprioritiesandpageprotectionsin WindowsNT.

Gray-boxtechniqueseasedevelopmentanddeployability of third-partyOSextensions.Whenthesearenot overridingconcerns,thereareadvantagesto carefullydesignedkernelmodifications.For example,oneproblemwith themulti-fidelity runtimesystemis that itdoesnot properlyaccountfor resourcesspentby thekernelor a sharedserver (e.g. theXserver)on anapplication’sbehalf.New kernelabstractionssuchasresourcecontainers [7]couldsolve this problem.

More generally, SeltzerandSmall [86] advocateself-monitoringandself-adaptingOSkernels.They recommendrecordingtracesandlogsof a largenumberof kernelvariablesandevents,andusingoff-line aswell ason-lineanalysisto correlatethesewith applicationperformance.Theseanalyseswould be usedfor adaptationssuchasmodifying prefetchstrategiesor dynamiccoderecompilation.This is very similar to my philosophyof “mea-sure,log,learn,adapt”(Chapter4), which I useatuserlevel for applicationadaptation.

9.3 Inferring userpreferences

The end-goalof applicationadaptationis to maximizeuserutility. This requiresus toquantify the effect of fidelity andperformanceon utility. In this dissertation,I assumedthatutility functionsareprovidedexplicitly by theapplicationor theuser, whichraisestwoquestions:

þ How canapplicationprogrammersderivegooddefault utility functions?þ How canutility functionsbespecializedto individualusers’needsat runtime?

To creategoodutility functions,wemustquantifytheeffectonusersof differentlevelsof fidelity. This hasbeenwidely studiedfor commonmediaapplicationssuchasvideo.

9.3. INFERRINGUSERPREFERENCES 171

Webster[95] describeshow objective measurementsof video quality can be correlatedwith subjectiveratingsof videoclipsby users.ChandraandEllis [14] recommendusingtheJPEGQuality Factor(JQF)to representtheuser-perceivedquality of a JPEG-compressedimage:for theadaptivewebbrowser(Section7.5),thiswouldmakeoutputqualityidenticalto fidelity. Wilson andSasse[97] measuredthecorrelationbetweenvideoframerateandbiometricindicatorsof stresssuchasheartrate.Detailedstudiessuchasthesearelackingfor adaptive, interactive applicationssuchasaugmentedreality. This is an importantareafor futureresearch:for effectiveadaptation,wemustknow theuser-perceivedeffectof anychangein polygoncount,textures,etc.

Similarly, the effect of latency on usersatisfactionneedsto be quantified. Thereis aconsensusamongHCI researchers[62, 12, 69, 13] that therearethreeimportantlevelsofinteractive response:“perceptualprocessing”(100ms), “immediateresponse”(1s), and“unit task” (10s). However, this doesnot tell ushow to classifyinteractiveoperationsintothesethreecategories,nor how much fidelity we shouldsacrificein aiming for a lowerlatency constraint. Often, an operationis simply too expensive to meeta lower latencytarget,forcingusto “demote”it to aslowercategory. E.g.,to achieve“immediateresponse”in Radiator, we would have to reducefidelity to an absurdlevel, which is why I usethe10s “unit task” constraintinstead.Similarly, my intuition is that the100ms“perceptual”responsetime is theappropriateonefor GLVU; however, theschedulerquantumof 200msmakesthisanimpossiblegoalunderload,andsoI usethe1s“immediateresponse”target.

A well-chosenutility functioncanonly provideagooddefault: userutility will still varyacrossusersandtaskcontexts. In the initial stagesof design,our architect(Section1.1)would wantquick responsesfor rapidprototyping.Later, shemight prefera high level ofdetailandaccuracy, evenat thecostof higherlatency. An appropriatelydesignedinterfacecanallow usersto directly adjustutility functions,but at the costof increasinguserdis-traction.Alternatively, thesystemcaninfer theuser’s preferencesfrom observationsabouttheir context; however, suchinferencesareoftenuncertainandunreliable,andcanleadtobadadaptivedecisions.

Clearly, weneedsomecombinationof thetwo approaches.Horvitz [47] describessucha combination,which hetermsa mixed-initiativeapproach.A mixed-initiativesystemal-lowsusersto directlymanipulatesystemsettings;it alsoadjuststhesesettingsautomaticallyby makinginferencesabouttheuser. Thecostsof explicit userinterventionarecontinuallybalancedagainsttherisksof automateddecision-making,andtheuseris alwaysallowedtooverrideor correctbaddecisionsby thesystem.

Chapter 10

Conclusion

To write historyis sodifficult thatmosthistoriansareforcedto make concessionstothetechniqueof legend.

(ErichAuerbach,Mimesis, ch. I)

Wearable,interactivecomputingrequiressmall lightweightdevices,pervasivewirelessconnectivity, andinteractive applicationswritten for mobile ratherthandesktopusers.Inthe lastdecadewe have seenrapidadvancesalongall threefronts. Soonwe will have themobilehardware,thewirelessnetworks,andtheapplicationsoftware: whatwe will needis operatingsystemsoftwareto mediateeffectively betweenthem.

This dissertationhasshown thatany suchsystemmustsupportapplicationadaptation,andhasproposeda novel framework for suchadaptation,basedon multi-fidelity compu-tation anda supply-demandresourcemodel. It setsforth a modularsystemdesignbasedon this framework, anddescribesits key componenttechnologies:history-basedpredic-tion, passive resourcemonitoring,and the gradient-descentsolver. Througha prototypeimplementationandevaluation,the dissertationshows the systemsignificantly improvesthelatency responseof interactiveapplications.

In this chapter, I summarizetheresearchcontributionsmadeby this dissertation(Sec-tion 10.1);describethecurrentstatusof thesystemandongoingresearch(Section10.2);andexploresomedirectionsfor futureresearch(Section10.3).

10.1 Contrib utions

10.1.1 ConceptualContrib utions

The primary conceptualcontribution of this dissertationis multi-fidelity computation. Itgeneralizesthe classicalnotion of algorithm,wherethe output specificationis fixed butresourceconsumptiondependson the input data. With multi-fidelity computations,both

173

174 CHAPTER10. CONCLUSION

output quality and resourceconsumptionare allowed to vary, and we can dynamicallyexplore the tradeoffs betweenthem. In developingthis conceptI alsoexpandedpreviousdefinitionsof fidelity. I introducedthenotionof computationalfidelity asa complementtodatafidelity; moregenerally, I usethetermfidelity to coverall runtimetunableparametersthataffect outputquality and/orresourcedemand.

Thesecondconceptualcontribution is theresourcemodel, which letsusabstractappli-cationbehaviour asresourcedemand,systemstateasresourcesupply, andperformanceasa costfunctionon supplyanddemand.For theperformancemetricof greatestinteresttous— latency — I showedthatsimplecostfunctionsaresufficient for many applications.

A third contribution is my history-basedmethodology for predictingan application’sresourcedemandasa function of its tunableparameters.This bridgesthe gapbetweenquantitiesrelevant to the application(i.e., fidelity metrics)and thoseunderstoodby thesystem: resourcesupply and demand. It allows us to make predictionsfrom empiricalobservationsinsteadof relyingexclusively onanalyticmodels,andto specializepredictivemodelsto time-or data-dependentvariationin applicationbehaviour.

Fourth, my API designshows how we can make significantchangesin applicationbehaviour with only smallchangesto their sourcecode,by isolatingmostof thenew func-tionality to configuration files andhint modules. Oncean applicationhasbeenmodifiedto usethe multi-fidelity API, third partieswithout accessto sourcecodecancontinuetoexperimentwith differentresourcedemandpredictorsandutility functions.

Fifth, I haveshown thatapredictive, gray-boxapproachto resourcemanagementis fea-sibleandeffective. In otherwords,we cansubstantiallymitigatetheill effectsof resourcevariability without changingtheOS’s resourceallocationpoliciesor modifying its kernelsourcecode.

Finally, my evaluationstrategy highlightsa usefulnew performancemeasure:thede-viation plot. With fidelity adaptation,thereis hardly ever a free lunch: latency can beimprovedonly by sacrificingfidelity, andvice versa.Thusthetruetestof anadaptivesys-tem is to measurehow closelyit tracksuserpolicy, i.e., maximizesutility underresourceconstraints.Whenuserpolicy is expressedasa latency constraint,the latency deviationplot providesuswith a simpleyet informative measureof thesystem’s successin stayingon target.

10.1.2 Artifacts

This dissertationhasproducedonesubstantialartifact: the multi-fidelity runtimesystem.The systemis modularandcontainsseveral components,eachwith valueindependentofthesystemasawhole:

Q The resourcesupply and latency predictorsestimatethe resourceavailability andperformanceof anapplicationin thenearfuture.

10.1. CONTRIBUTIONS 175

Q Theloggertransparentlycollectslogsof applicationresourcedemandto beusedforperformancedebuggingaswell asconstructingresourcedemandpredictors.Q Thesolvercanoptimizeutility acrossmultiple dimensionsof performance,resourcedemand,andoutputquality.Q Theapplications,whichcannow adaptto a varietyof resourcescenariosandperfor-mancerequirements.

10.1.3 Evaluation Results

The experimentalevaluationof this work hascontributedseveral importantfindings forapplicationadaptation.I showedthat,with acarefullydesignedAPI, thecostof applicationmodificationcanbemadesmallif notzero.I alsoshowedthathistory-basedtechniquescaneffectively predictapplicationresourcedemand,especiallywith onlineupdatesanddata-specificprediction.

My evaluationshowsthatthesystemoverheadperoperationrangesfrom 1msto 20ms.Thisoverheadis notnegligible,but acceptablefor operationsof 1sor longer. Theoverheadcanbe reducedby moreefficient ways of readingkernel load statistics,andby a betteroptimizedsolver.

I have shown that gray-box techniquescan accuratelypredict resourcesupply: i.e.,that kernel modification is not necessary. My approachis an intermediatepoint in thedesignspace:it avoidskernelmodificationbut notkerneldependence.My memorysupplypredictordependson theLinux VM andthestatisticsexportedby it; if thesechangein alaterkernel,thepredictormustchangealso.

I showed that the agility andaccuracy of resourcesupplypredictionarecloseto thelimits imposedby the underlyingOS. The CPU supply predictor is accurateover timeperiodsof 1s or greater, and its agility is on the sametime scale. The memorysupplypredictor’s time scaleis in the tensof seconds:it is limited by therateat which VM loadstatisticsareupdated.Adaptingdownwards to a spike in memoryload canbe prolongedfor anotherreason:applicationsthrashwhenthereis memorycontention,slowing themdown anddelayingtheirnext adaptationpoint. Thus,acheapabort-and-restartmechanismis essentialfor memory-intensiveapplications.

I showedthateffectiveadaptationdependsonbothsupplyanddemandprediction:bothcontribute to meetinglatency goals. I alsoshowed that,with adaptation,applicationsaremuch more likely to meetthesegoals. I also showed that the benefitof adaptationforeachapplicationis independentof the adaptive behaviour of other, concurrentlyrunningapplications.Finally, I showed that adaptationreducesvariancein latency in additiontobringingmeanlatency closerto thetargetvalue.


10.2 Curr ent statusand ongoingwork

Work on themultifidelity prototypeis ongoing,andotherresearchershaveaddedcapabili-tiesbeyondthosedescribedin thedissertationsofar.

JasonFlinn, RajeshBalan,SoYoungPark andTadashiOkoshihave portedthesystemto the StrongARM-basedItsy [8] and iPAQ hand-heldcomputers. The Spectraremoteexecutionengine[33] allows multi-fidelity computationsto executewholly or partly onremoteservers; the latestversionof the GLVU virtual walkthroughsystemusesSpectraservicesto do remoterendering. I.e.,whenappropriate,the3-D renderingis doneona fastserverandtheresultingimageshippedbackto theclient. Thiscapabilitywasimplementedby ShaheenGandhi,TadashiOkoshi,Tridib Chakravarty, andNingningHu.

Researchin multi-fidelity adaptationis ongoingaspartof theChromaproject,whosegoalis providing middlewarefor easyandautomaticadaptationandresourcemanagementin mobile applications. Currentwork in Chromais focussedon automaticstubgenera-tion: givenanApplicationConfigurationFile,Chromaautomaticallygenerates“adaptationstubs”to be linkedwith theapplication.Thesestubsreducetheburdenof modifying ap-plications,by providing ahigh-level interfacetailoredto theapplication’sneeds.They alsoinsulateapplicationprogrammersfrom changesin theunderlyingruntimesystemandAPI:thestubgenerator, andnot theapplication,will trackthesechanges.

Chromais partof the largerAura [80] project,which addressesthechallengesof per-vasive computingsimultaneouslyat several softwarelayers. In Aura, a tasklayer calledPrism managesusertasksand the groupsof applicationsthat participatein thesetasks.Theadaptationlayer (Chroma)translatesbetweenapplicationneedsandsystemresources.Lower layersdealwith physicalvariablessuchaslocationandresourceavailability.

10.3 Futur eWork

This work hasopenedup severaldirectionsfor futureresearch.Someof thesearemodest,incrementalimprovementsto thesystem;othersarebroaderandmoreambitiousin scope.HereI explore differentareasof improvement,in eachcaseprogressingfrom short-termgoalsto long-termambitions.

10.3.1 The Application Programming Interface

Themulti-fidelity API is bothsimpleandgeneral.My experiencewith this API hascon-vincedmethatwe shouldretainthesimplicity, but specializetheAPI to theneedsof eachapplication.E.g.,applicationsshouldbeableto refer to tunableandnon-tunableparame-tersby name,insteadof passingthemto thesystemasanonymousarraysof values.Theproblemof translatingbetweena high-level application-specificinterfaceandthegeneric

10.3. FUTUREWORK 177

multi-fidelity API is beingaddressedby theChromastubgeneratordescribedin theprevi-oussection.

Thehint mechanism,thoughsimpleandefficient, is not very secure.Binary modulesare loadedinto the runtimesystem’s addressspace,wherea maliciousor buggy modulecould wreakhavoc. Onepossiblesolutionto this problemis to usea restrictedlanguagethat is interpretedat runtime. A slightly lesssecurebut moreefficient approachwould befor a stubgeneratorto compiletherestrictedcodeinto trustedbinarymodules.

Thereareseveral limitationsto beaddressedin themulti-fidelity programmingmodel,especiallythecurrentsimplenotionof interactiveoperations. E.g., theAPI doesnot cur-rently supportmultiple concurrent,communicatingoperations,wherethefidelity andper-formanceof onemight dependon that of another, even in the absenceof resourcecon-tention. Anotherdesirableextensionis supportfor streamingmediaapplicationssuchasvideo. Here the low-level “operations”(e.g. rendera frame)arenot driven by userac-tions, but are performedperiodically. Further, the primary performancemetricsarenotper-operationmeasurementssuchaslatency, but aggregatestatisticssuchasframerateandjitter. We needto expandthenotionof operation,andcorrespondinglytheAPI, to supportadaptivestreamingmedia.

10.3.2 The Solver

Themulti-fidelity solver workswell whenthenumberof tunableparametersis small andthe utility functionsaresimpleandwell-behaved. We do not know how its performanceor accuracy scalewith thenumberof parameters,nor whatkind of utility functionsmightcausepathologicalbehaviour. To measurethesolver’s robustnessandscalability, we needa more comprehensive evaluation. To improve theseproperties,we probablyalso needa theoreticallysoundapproachto theoptimizationproblem.Lee’s solver [57] is provablyefficient,andtheconstraintsontheutility functionsarewell-defined.However, oneof theseconstraintsis very restrictive: all runtimeparametersmusthave an ordered setof values,andbothresourcedemandandutility mustincreasemonotonicallyaswe progressthroughthis set. Often, we cannotachieve suchan ordering. E.g., for Radiator, the progressiveradiosityalgorithmhasahigherCPUdemandbut lowermemorydemandthanhierarchicalradiosity, makingit impossibleto find a total orderon thesetwo choices.It would bebothinterestingandusefulto adaptLee’ssolver to themulti-fidelity runtimesystem,andto findwaysto relaxthemonotonicityrequirement.

10.3.3 Resourcemonitoring

My resourcemonitorsaresimpleanddemonstratethefeasibilityof agray-boxapproachtoresourcesupplyprediction.They areagileandaccuratewhenthenumberof applicationsissmall. In theshortto mediumterm,I would like a morecomprehensive evaluationwith a


wider rangeof applicationsandloadpatterns.HereI indicatea few obviousimprovementsto theresourcemonitors;adetailedevaluationwill probablysuggestseveralmore.

Currently, thesystemmonitorsmemorydemandby measuringVM statisticsat thebe-ginning and the endof eachoperation(Section5.5.2). This approachis limited by thetendency of VM stateto build upover time. In particular, it suffersfrom ahighwatermarkeffect: if thecurrentoperationuseslessmemorythana previousone,thenits working setwill besmallerthanits residentset,andwe will overestimatetheformer. Thiseffect couldbeavoidedby addingfine-grainedinformationabouttheapplication’s memoryallocation,e.g.,by instrumentingRTSAU�U�V�W and X/Y�ZLZ .

TheCPUsupplypredictorassumesthatschedulingis approximatelyround-robin.Thisis only true for processeswith equalpriority: we needto extendthe predictorto predictschedulerbehaviour for processeswith differentpriorities.

My currentprototypelacks supply and demandmonitoring for an importantsystemresource: the disk. While disk capacityis easily modelledas a space-sharedresource(Chapter2), disk bandwidthis a morecomplex matter: it is time-shared,but it doesnotfit the GPSmodel very well. Disk “supply” dependsnot only on bandwidthshare,butalsoon seekandrotationalpositions. Similarly, “demand” is characterizednot only byread/writebyte counts,but alsoby the positionof thosebyteson the disk. An accurateresourcemodelfor disk bandwidthmustmodeltheseadditionaldimensionsto supplyanddemand,asalsotheeffectof any memorycachesinsidethestoragedevice.

The ultimategoal of resourcemanagementis to combineadaptationwith allocation,suchthat both system-level andapplication-level decisionsaresimultaneouslyoptimizedto maximizeuserutility. I.e., a singledecision-makingprocessshoulddetermineboththesharingof resourcesacrossapplications,andthe tunableparametersof eachapplication.Section9.2.1discussedvariousallocation-basedstrategies,andhow they might be com-binedwith multi-fidelity adaptation.Thereareseveralchallengesthatany suchcombinedsolutionmustaddress:

Q Themobileclient’s OSdoesnot have controlover resourcessuchasnetwork band-width or remoteserver cycles;thus,we mustreconcileourselvesto a predictive ap-proachfor theseresources,or solve theproblemof distributedresourceallocation.Q It is not clearhow to combineindividualapplicationutilities into a single“userutil-ity”. Applicationsrun at differenttime scales:how arewe to comparetheutility of10 GLVU operationsagainstthatof 1 speechrecognitionof thesameduration.Fur-ther, applicationscompetenot only for systemresources,but alsofor userattention.This cancausestrangeinterferenceeffects: e.g.,the usermight desirehigh-qualityvideoif thatis theonly applicationrunning,but will notnoticeadegradationin videoquality if they areoccupiedwith anothertask.Q If allocationis tied to adaptation,how canwe prevent ill-behavedapplicationsfromhoggingresources,perhapsby generatingbogusutility functions? I.e., how do wedetectwhenapplicationslie abouttheir resourceneeds?

10.3. FUTUREWORK 179

10.3.4 Resourcedemandprediction

This work hasshown that history-basedtechniques— logging and learning— can ac-curatelypredictanapplication’s resourcedemandasa functionof its tunableparameters.Thereis enormousscopefor futureresearchin this area.I haveusedverysimplestatisticalmodelssuchas least-squaresregression;it would be worthwhile to studyothermachinelearning(ML) algorithmssuchasnearestneighbour[63, pp. 231–236],andto quantifytheir accuracy aswell astheirperformanceoverheads.

Convertingresourcelogsinto predictorsstill requiresmanualinterventionto chooseanML techniqueandaparametricmodel.To automatetheprocessfurther, I proposea toolkitapproach:a library of many parametricmodels,overwhich a “meta-learner”caniteratetofind theonemostappropriatefor thegivendata.

Building sucha toolkit would be an excellentopportunityfor collaborationbetweenML andsystemsresearchers.ML algorithmswill help systemsbuilderscharacterizeandpredictthebehaviour of largesystems;logsfrom suchsystemswill providereal-world testcasesfor thesealgorithms.

10.3.5 User Interaction

Section9.3discussedtheproblemof knowing userpreferences.Wewishusersto beabletosettheirpreferences(e.g.,desiredresponsetime)dynamically. To thisend,wehavestartedbuilding amulti-fidelitycontrol interfacethatallowsusersto adjustapplicationutility func-tions: to changelatency constraints,desiredbatterylifetimes, or the relative weightsofdifferentquality metrics.Building suchaninterfaceis a challenge:it mustberich enoughto gatherinformationaboutuserpreferencesalongmultiple dimensions,but simpleandunobtrusiveenoughto avoid excessiveuserdistraction.

Ideally, thesystemwould automaticallyinfer theuser’s preferencesfrom their context,avoiding userdistractionaltogether. E.g., the systemwould know that userswatchingasportsvideopreferhigh frameratesto high quality images;userson a videotour of a mu-seumwantpreciselytheopposite.Currentwork on the tasklayer of Aura [80] is focussedon this issue:identifying theuser’shigher-level tasks,inferring theappropriateapplicationsettingsandtradeoffs, andtranslatingtheminto utility functionsfor theruntimesystemtooptimize?

Whetherwe solicit userpreferencesor infer them,we must always be aware of theattendantcost: increaseduserdistraction. In theformercase,thereis thecognitiveburdenof manipulatingauserinterface.In thelattercase,thereis therisk of erroneousinferencescausingunpredictableandunstablebehaviour. User attentionis often the mostvaluableresourcein any computingsystem:thus,all systemdecisionsneedto bemeasuredagainstthisyardstickin additionto traditionalmetricsof performance.Quantifyinguserdistractionis thusa researchprojectof immenseimportance,presentingformidablechallengesand


promisingenormousbenefits.

10.3.6 AugmentedReality

GLVU andRadiatoronly partially fulfil the vision of augmentedreality containedin thearchitectscenario.My researchconcentrateson the resourcemanagementaspectof aug-mentedreality; to make the vision cometrue,otherhardproblemsmustbe solved. E.g.,in additionto 3-D rendering,augmentedrealityneedslocationsensingtechniquesto knowtheuser’s positionandorientationandmachinevision algorithmsto recognizereal-worldobjectsin theuser’sfield of view; all thesemustrunconcurrentlyandcooperateto providea gooduserexperiencein spiteof resourcevariability. Many of thesecomponenttech-nologiesare at the cutting edgeof researchin differentfields; this is a perfectmomentto begin cross-cuttingresearchcollaborationbetweenHCI, imageprocessing,andsystemsresearchersto build ausableaugmentedrealitysystem.

10.4 Summary

Theprimaryfocusof this dissertationwascrispinteractive responsein resource-intensive,mobileapplications.I showedthatmulti-fidelity computationis essentialto dynamicregu-lation of resourcedemandandthusof performance.I alsoshowedthatsystemsupportforsupplyanddemandpredictionis necessaryto supportmulti-fidelity computation,andthatit is sufficient — with somelimitations — to achieve boundedinteractive responsetimeswithoutunnecessarilysacrificingfidelity. Of themany directionsfor futureresearch,prob-ably themostimportantis thatof combiningprediction-basedadaptationwith QoS-basedresourceallocation.

Bibliography

[1] TarekF. Abdelzaher. An automatedprofiling subsystemfor QoS-awareservices.InProceedingsof theSixthIEEE Real-TimeTechnology andApplicationsSymposium(RTAS’00), pages208–217,Washington,DC, June2000.

[2] LucaAbeniandGiorgio Buttazzo.Integratingmultimediaapplicationsin hardreal-timesystems.In Proceedingsof the19thIEEEReal-TimeSystemsSymposium(RTSS’98), pages4–13,Madrid,Spain,December1998.

[3] LucaAbeni andGiorgio Buttazzo.QoSguaranteeusingprobabilisticdeadlines.InProceedingsof the11th IEEE Euromicro Conferenceon Real-TimeSystems, pages242–249,York, UK, June1999.

[4] SwarupAcharya,Phillip B. Gibbons,andViswanathPoosala.Congressionalsam-ples for approximateansweringof group-byqueries. In Proceedingsof the ACMSIGMODInternationalConferenceon Managementof Data (SIGMOD’00), pages487–498,Dallas,TX, May 2000.

[5] K. AppelandW. Haken. Everyplanarmapis four colorable.AmericanMathemati-cal SocietyBulletin, 82(5):711–712,1976.

[6] AndreaArpaci-Dusseauand Remzi Arpaci-Dusseau. Information and control ingray-boxsystems.In Proceedingsof the18thACM Symposiumon Operating Sys-temsPrinciples(SOSP2001), pages43–56,ChateauLake Louise,Banff, Canada,October2001.

[7] Gaurav Banga,PeterDruschel,andJeffrey C. Mogul. Resourcecontainers:A newfacility for resourcemanagementin server systems.In Proceedingsof the3rd Sym-posiumonOperatingSystemsDesignandImplementation(OSDI’99), pages45–58,New Orleans,LA, February1999.USENIX.

[8] JoelF. Bartlett,LawrenceS.Brakmo,KeithI. Farkas,William R.Hamburgen,Timo-thy Mann,MarcA. Viredaz,CarlA. Waldspurger, andDeborahA. Wallach.TheItsyPocket Computer. CompaqWesternResearchLaboratory, Palo Alto, CA, October2000.WRL TechnicalNote2000.6.

181

182 BIBLIOGRAPHY

[9] Ludwig van Beethoven. String quartetno. 16 in F, Op. 135, 1826. DedicatedtoJohannWolfmayer.

[10] GeorgeE.P. Box,Gwilym M. Jenkins,andGregoryC.Reinsel.Timeseriesanalysis:forecastingandcontrol. PrenticeHall, UpperSaddleRiver, NJ,3rd edition,1994.

[11] JohnBradley. xv: Interactiveimagedisplayfor thex window system.ftp://ftp.cis.upenn.edu/pub/xv/docs/xv docs .pdf , 1994. (Accessedon July 232002).

[12] StuartK. Card,ThomasP. Moran, andAllen Newell. ThePsychology of Human-ComputerInteraction. LawrenceErlbaumAssociates,Mahwah,NJ,1983.

[13] StuartK. Card,GeorgeG. Robertson,andJockD. Mackinlay. TheInformationVi-sualizer, aninformationworkspace.In Proceedingsof theACM SIGCHIConferenceon HumanFactors in ComputingSystems(CHI ’91), pages181–188,New Orleans,LA, May 1991.

[14] SurendarChandraandCarlaSchlatterEllis. JPEGcompressionmetricasa qualityawareimagetranscoding.In Proceedingsof the2ndUSENIXSymposiumonInternetTechnologiesandSystems(USITS’99), pages81–92,Boulder, CO,October1999.

[15] FangzheChang, Ayal Itzkovitz, and Vijay Karamcheti. User-level resource-constrainedsandboxing.In Proceedingsof the4th USENIXWindowsSystemsSym-posium(WSS2000), pages25–36,Berkeley, CA, August2000.

[16] FangzheChang,Vijay Karamcheti,andZvi Kedem.Exploiting applicationtunabil-ity for efficient,predictableresourcemanagementin parallelanddistributedsystems.Journalof Parallel andDistributedComputing, 60(11):1420–1445,November2000.

[17] Fay ChangandGarthA. Gibson. AutomaticI/O hint generationthroughspecula-tiveexecution. In Proceedingsof the3rd Symposiumon Operating SystemsDesignand Implementation(OSDI ’99), pages1–14, New Orleans,LA, February1999.USENIX.

[18] Michael F. CohenandJohnR. Wallace. Radiosityand RealisticImage Synthesis.AcademicPressProfessional,Boston,MA, 1993.

[19] ThomasH. Cormen,CharlesE. Leiserson,andRonaldL. Rivest. IntroductiontoAlgorithms. TheMIT Press,Boston,MA, 1990.

[20] EyaldeLara,DanS.Wallach,andWilly Zwaenepoel.Puppeteer:Component-basedadaptationfor mobile computing. In Proceedingsof the 3rd USENIXSymposiumon InternetTechnologiesand Systems(USITS-01), pages159–170,Berkeley, CA,March2001.

BIBLIOGRAPHY 183

[21] ThomasDeanandMark Boddy. An analysisof time-dependentplanning. In Pro-ceedingsof the7thNationalConferenceonArtificial Intelligence(AAAI ’88), pages49–54,SaintPaul,MN, August1988.AAAI Press/MITPress.

[22] P. J.Denning.Theworking setmodelof programbehavior. Communicationsof theACM, 11(5):323–333,May 1968.

[23] P. Dinda,B. Lowekamp,L. Kallivokas,andD. O’Hallaron. Thecasefor prediction-basedbest-effort real-timesystems.In Proceedingsof the7th InternationalWork-shoponParallel andDistributedReal-TimeSystems(WPDRTS’99), pages308–318,SanJuan,PR,April 1999.Springer-Verlag.

[24] PeterDindaandDavid O’Hallaron.RealisticCPUworkloadsthroughhostloadtraceplayback. In Proceedingsof the5th Workshopon LanguagesCompilers, andRun-timeSystemsfor ScalableComputers (LCR2000), pages265–280,Rochester, NY,May 2000.Springer-Verlag.

[25] PeterA. Dinda.Onlinepredictionof therunningtimeof tasks.In Proceedingsof the10th IEEE InternationalSymposiumon High PerformanceDistributedComputing(HPDC ’01), pages383–394,SanFrancisco,CA, August 2001. IEEE ComputerSociety.

[26] Fred Douglis, RamonCaceres,FransKaashoek,Kai Li, Brian Marsh, JoshuaA.Tauber, andD. E. Shaw. Storagealternativesfor mobilecomputers.In Proceedingsof the1stSymposiumonOperatingSystemsDesignandImplementation(OSDI’94),pages25–37,Monterey, CA, November1994.USENIX.

[27] AndreaC. Dusseau,RemziH. Arpaci, andDavid E. Culler. Effective distributedschedulingof parallelworkloads. In Proceedingsof the Joint InternationalCon-ferenceon Measurementand Modelingof ComputerSystems(ACM SIGMETRICS’96), pages25–36,Philadelphia,PA, May 1996.

[28] Derek L. Eager, Edward D. Lazowska, and JohnZahorjan. The limited perfor-mancebenefitsof migratingactive processesfor load sharing. In ProceedingsoftheJoint InternationalConferenceonMeasurementandModelingof ComputerSys-tems(ACM SIGMETRICS’88), pages63–72,SantaFe,NM, May 1988.

[29] W. FengandJaneW. S. Liu. An extendedimprecisecomputationmodelfor time-constrainedspeechprocessingandgeneration.In Proceedingsof theIEEEWorkshopon Real-TimeApplications, pages76–80,New York, NY, May 1993.

[30] W. FengandJaneW. S. Liu. Algorithms for schedulingtaskswith input errorandend-to-enddeadlines.TechnicalReportUIUCDCS-R-94-1888,Universityof Illinoisat Urbana-Champaign,September1994.

184 BIBLIOGRAPHY

[31] J. Flinn andM. Satyanarayanan.PowerScope:A tool for profiling the energy us-ageof mobile applications. In Proceedingsof the 2nd IEEE Workshopon MobileComputingSystemsandApplications(WMCSA’99), pages2–10,New Orleans,LA,February1999.

[32] JasonFlinn. ExtendingMobileComputerBatteryLife throughEnergy-AwareAdap-tation. PhDthesis,Carnegie Mellon University, 2001.

[33] JasonFlinn, DushyanthNarayanan,andM. Satyanarayanan.Self-tunedremoteex-ecutionfor pervasivecomputing.In Proceedingsof the8thWorkshoponHot Topicsin Operating Systems(HotOS-VIII), pages61–66,SchlossElmau,Germany, May2001.IEEEComputerSociety.

[34] JasonFlinn andM. Satyanarayanan.Energy-awareadaptationfor mobile applica-tion. In Proceedingsof the17thACM Symposiumon Operating SystemsPrinciples(SOSP’99), pages48–63,KiawahIsland,SC,December1999.

[35] G. H. FormanandJ. Zahorjan. The challengesof mobilecomputing. IEEE Com-puter, 27(4):38–47,April 1994.

[36] ArmandoFox andEric A. Brewer. Harvest,yield andscalabletolerantsystems.InProceedingsof the7th Workshopon Hot Topicsin Operating Systems(HotOS-VII),pages174–178,Rio Rico,AZ, March1999.IEEE ComputerSociety.

[37] ArmandoFox, StevenD. Gribble,Eric A. Brewer, andElanAmir. Adaptingto net-workandclientvariability viaon-demanddynamicdistillation. In Proceedingsof the7th InternationalConferenceonArchitectural Supportfor ProgrammingLanguagesand Operating Systems(ASPLOS’96), pages160–170,Cambridge,MA, October1996.ACM Press.

[38] Alan Frieze,Ravi Kannan,andSantoshVempala.FastMonte-Carloalgorithmsforfinding low-rankapproximations.In Proceedingsof theIEEE Symposiumon Foun-dationsof ComputerScience(FOCS’98), pages370–378,November1998.

[39] M.R. Garey andD.S.Johnson.ComputersandIntractability. FreemanandCo.,NewYork, NY, 1979.

[40] Michael Garland. QSlim 2.0 sourcecode and online documentation. http://graphics.cs.uiuc.edu/˜garl and/ soft ware/ qsli m.htm l ,March1999. (AccessedonJuly23 2002).

[41] Michael GarlandandPaul S. Heckbert. Surfacesimplificationusingquadricerrormetrics. In Proceedingsof SIGGRAPH’97, pages209–216,Los Angeles,CA, Au-gust1997.AddisonWesley.

BIBLIOGRAPHY 185

[42] Carl FriedrichGauss.TheoriaCombinationisObservationumErroribusMinimumObnoxiae. Royal Societyof Gottingen,1821. Reprintedby SIAM Classicsin Ap-pliedMathematics,1995,with Englishtranslationby G.W. Stewart.

[43] The IndependentJPEGgroup. http://www.ijg.org/ . (Accessedon July 232002).

[44] RichardHan,Pravin Bhagwat, RichardLaMaire,ToddMummert,VeroniquePeret,andJim Rubas.Dynamicadaptationin animagetranscodingproxy for mobilewebbrowsing. IEEE PersonalCommunications, 5(6):30–44,December1998.

[45] Mor Harchol-BalterandAllen B. Downey. Exploiting processlifetime distributionsfor dynamicloadbalancing.In Proceedingsof theJoint InternationalConferenceonMeasurementandModelingof ComputerSystems(ACM SIGMETRICS’94), pages13–24,Nashville,TN, May 1994.

[46] JosephM. Hellerstein,PeterJ.Haas,andHelenJ.Wang.Onlineaggregation.In Pro-ceedingsof the ACM SIGMODInternationalConferenceon Managementof Data(SIGMOD’97), pages171–182,Tucson,AZ, May 1997.

[47] E.Horvitz. Principlesof mixed-initiativeuserinterfaces.In Proceedingsof theACMSIGCHI Conferenceon HumanFactors in ComputingSystems(CHI ’99), pages159–166,Pittsburgh,PA, May 1999.

[48] David Hull, W. Feng,and JaneW. S. Liu. Operatingsystemsupportfor impre-cisecomputation. In Proceedingsof the 1996 AAAI Fall Symposiumon FlexibleComputationin IntelligentSystems:Results,Issues,andOpportunities, pages9–11,Cambridge,MA, November1996.AAAI Press/MITPress.

[49] Intel, Microsoft, andToshiba.Advancedconfigurationandpower interfacespecifi-cationv2.0. http://www.acpi.info/ , February1998. (Accessedon July 232002).

[50] YannisE. Ioannidis,RaymondT. Ng, KyuseokShim,andTimosK. Sellis.Paramet-ric queryoptimization.VLDB Journal: VeryLarge Data Bases, 6(2):132–151,May1997.

[51] VanJacobson.Congestionavoidanceandcontrol. In Proceedingsof theSymposiumon CommunicationsArchitecturesandProtocols(SIGCOMM’88), pages314–329.ACM Press,August1988.Stanford,CA.

[52] Kevin Jeffay andDavid Bennett. A rate-basedexecutionabstractionfor multime-dia computing. In Proceedingsof the5th InternationalWorkshopon NetworkandOperating SystemSupportfor Digital Audio andVideo, pages64–75,Durham,NH,April 1995.Springer-Verlag.

186 BIBLIOGRAPHY

[53] Nirav H. Kapadia,Jose A. B. Fortes,andCarlaE. Brodley. Predictive application-performancemodelingin a computationalgrid environment. In Proceedingsof the8th IEEE International Symposiumon High PerformanceDistributed Computing(HPDC’99), pages47–54,Los Angeles,CA, August1999.

[54] R.H. Katz.Adaptationandmobility in wirelessinformationsystems.IEEEPersonalCommunications, 1(1):6–17,1994.

[55] J.J.Kistler andM. Satyanarayanan.Disconnectedoperationin theCodafile system.ACM Transactionsof ComputerSystems, 10(1):3–25,February1992.

[56] Milan KunderaandMichaelHenryHeim(Translator).TheUnbearableLightnessofBeing. Harper& Row, 1984.

[57] ChenLee,JohnLehoczky, DanSiewiorek,RaghunathanRajkumar, andJeff Hansen.A scalablesolutionto themulti-resourceQoSproblem. In Proceedingsof the20thIEEEReal-TimeSystemsSymposium(RTSS’99), pages315–326,Phoenix,AZ, De-cember1999.

[58] Will E. LelandandTeunisJ.Ott. Load-balancingheuristicsandprocessbehavior. InProceedingsof theJoint InternationalConferenceonMeasurementandModelingofComputerSystems(ACM SIGMETRICS’86), pages54–69,Raleigh,NC,May 1986.

[59] HananLutfiyya, Gary Molenkamp,Michael Katchabaw, and Michael Bauer. Is-suesin managingsoftQoSrequirementsin distributedsystemsusingapolicy-basedframework. In Proceedingsof the Workshopon Policies for Distributed SystemsandNetworks(Policy 2001), pages185–201,Bristol, UK, January2001.Springer-Verlag.

[60] GurmeetSinghManku,SridharRajagopalan,andBruceG. Lindsay. Approximatemediansandotherquantilesin onepassandwith limited memory. In Proceedingsof theACM SIGMODInternationalConferenceon Managementof Data (SIGMOD’98), pages426–435,June1998.

[61] BobMarley. CouldYouBe Loved(12” Mix). IslandRecords,1980.5:26min.

[62] Robert B. Miller. Responsetime in man-computerconversationaltransactions.AFIPSFall Joint ComputerConferenceProceedings, 33:267–277,December1968.

[63] TomM. Mitchell. MachineLearning. McGraw-Hill, 1997.

[64] R.MotwaniandP. Raghavan.RandomizedAlgorithms. CambridgeUniversityPress,Cambridge,UK, 1995.

[65] ToddC.Mowry, AngelaK. Demke,andOrranKrieger. Automaticcompiler-insertedI/O prefetchingfor out-of-coreapplications.In Proceedingsof the2ndSymposium

BIBLIOGRAPHY 187

on Operating SystemsDesignandImplementation(OSDI ’96), pages3–17,Seattle,WA, October1996.USENIX.

[66] David J. Musliner, EdmundH. Durfee,andKang G. Shin. Any-dimensionalgo-rithms. In Proceedingsof the9th IEEE Workshopon Real-TimeOperating SystemsandSoftware (RTOSS’92), pages78–81,May 1992.

[67] KlaraNahrstedt,DongyanXu, DuangdaoWichadukul,andBaochunLi. QoS-awaremiddleware for ubiquitousand heterogeneousenvironments. IEEE Communica-tions, 39(11):140–148,November2001.

[68] Rolf NeugebauerandDerekMcAuley. Congestionpricesasfeedbacksignals:Anapproachto QoSmanagement.In Proceedingsof the9th ACM SIGOPSEuropeanWorkshop, pages91–96,Kolding,Denmark,September2000.

[69] Allen Newell. UnifiedTheoriesof Cognition. HarvardUniversityPress,Cambridge,MA, 1990.

[70] Sergei Nirenburg, editor. The PANGLOSS Mark III machinetranslationsystem.TechnicalReportCMU-CMT-95-145,Carnegie Mellon University Centerfor Ma-chineTranslation,April 1995.

[71] BrianD. Noble,M. Satyanarayanan,DushyanthNarayanan,JamesEric Tilton, JasonFlinn, andKevin R. Walker. Agile application-awareadaptationfor mobility. InProceedingsof the 16th ACM Symposiumon Operating SystemsPrinciples(SOSP’97), pages276–287,SaintMalo, France,October1997.

[72] Brian D. Noble,M. Satyanarayanan,Giao T. Nguyen,andRandyH. Katz. Trace-basedmobile network emulation. In Proceedingsof the ACM SIGCOMMConfer-ence:Applications,Technologies,Architectures,andProtocolsfor ComputerCom-munication(SIGCOMM’97), pages51–62,Cannes,France,September1997.

[73] AbhayK. ParekhandRobertG.Gallager. A generalizedprocessorsharingapproachto flow control in integratedservicesnetworks: the single-nodecase. IEEE/ACMTransactionson Networking, 1(3):344–357,June1993.

[74] David PetrouandDushyanthNarayanan.Positionsummary:Hinting for goodness’sake. In Proceedingsof the 8th Workshopon Hot Topics in Operating Systems(HotOS-VIII), pages177–177,SchlossElmau,Germany, May 2001.IEEEComputerSociety.

[75] The Walkthru Project. GLVU sourcecodeandonline documentation.http://www.cs.unc.edu/˜walk/software /glvu / , February2002. (AccessedonJuly23 2002).

[76] L. RabinerandB.-H. Juang. Fundamentalsof Speech Recognition. PrenticeHallSignalProcessingSeries.PrenticeHall, UpperSaddleRiver, NJ,1993.

188 BIBLIOGRAPHY

[77] Ted Romer, Geoff Voelker, Dennis Lee, Alec Wolman, Wayne Wong, HankLevy, Brian Bershad,and J. Bradley Chen. Instrumentationand optimizationofWin32/Intel executablesusingEtch. In Proceedingsof the USENIXWindowsNTWorkshop, pages1–7,Berkeley, CA, August1997.

[78] EdwardSaid.Representationsof theIntellectual. VintageBooks,1993.

[79] M. Satyanarayanan.Mobile informationaccess.IEEE PersonalCommunications,3(1):26–33,February1996.

[80] M. Satyanarayanan.Pervasive computing:Vision andchallenges.IEEE PersonalCommunications, 8(4):10–17,August2001.

[81] M. Satyanarayanan,JasonFlinn, andKevin R. Walker. VisualProxy:ExploitingOScustomizationswithout applicationsourcecode. ACM Operating SystemsReview,33(3):14–18,July1999.

[82] M. Satyanarayanan,Henry H. Mashburn, PuneetKumar, David C. Steere,andJamesJ.Kistler. Lightweightrecoverablevirtual memory. In Proceedingsof the14thSymposiumonOperatingSystemsPrinciples(SOSP’93), pages146–160,Asheville,NC, December1993.

[83] M. SatyanarayananandDushyanthNarayanan.Multi-fidelity algorithmsfor inter-activemobileapplications.WirelessNetworks, 7:601–607,2001.

[84] Steve Schlosser, JohnGriffin, Dave Nagle,andGreg Ganger. Designingcomputersystemswith mems-basedstorage.In Proceedingsof the9th InternationalConfer-enceon Architectural Supportfor ProgrammingLanguagesandOperating Systems(ASPLOS2000), pages1–12,Cambridge,MA, November2000.ACM Press.

[85] WalterCarruthersSellarandRobertJulianYeatman.1066andAll That: A Memo-rableHistory of England. Methuen& Co.Ltd., London,UK, 1930.

[86] MargoSeltzerandChristopherSmall.Self-monitoringandself-adaptingsystems.InProceedingsof the6th Workshopon Hot Topicsin Operating Systems(HotOS-VI),pages124–129,CapeCod,MA, May 1997.IEEE ComputerSociety.

[87] BruceS. Siegell andPeterSteenkiste.Automaticgenerationof parallelprogramswith dynamicload balancing. In Proceedingsof the 3rd IEEE InternationalSym-posiumon High PerformanceDistributedComputing(HPDC ’94), pages166–175,SanFrancisco,CA, August1994.

[88] Amitabh Srivastava andAlan Eustace.ATOM: A systemfor building customizedprogramanalysistools. In Proceedingsof theACM SIGPLANConferenceon Pro-grammingLanguage Designand Implementation(PLDI ’94), pages196–205,Or-lando,FL, June1994.

BIBLIOGRAPHY 189

[89] David C. Steere,Ashvin Goel,JoshuaGruenberg, Dylan McNamee,CaltonPu,andJonathanWalpole. A feedback-drivenproportionallocatorfor real-ratescheduling.In Proceedingsof the3rd Symposiumon Operating SystemsDesignandImplemen-tation (OSDI’99), pages145–158,New Orleans,LA, February1999.USENIX.

[90] Neil StratfordandRichardMortier. An economicapproachto adaptive resourcemanagement.In Proceedingsof the7th Workshopon Hot Topicsin Operating Sys-tems(HotOS-VII), pages142–147,Rio Rico,AZ, March1999.IEEE ComputerSo-ciety.

[91] GauriViswanathan.Masksof Conquest:Literary StudyandBritish Rulein India.ColumbiaUniversityPress,New York, NY, 1989.

[92] A. Waibel. Interactive translationof conversationalspeech. IEEE Computer,29(7):41–48,July1996.

[93] Carl A. Waldspurger and William E. Weihl. Lottery scheduling: Flexibleproportional-shareresourcemanagement.In Proceedingsof the1stSymposiumonOperating SystemsDesignandImplementation(OSDI’94), pages1–11,Monterey,CA, November1994.USENIX.

[94] GregoryK. Wallace.TheJPEGstill picturecompressionstandard.Communicationsof theACM, 34(4):30–44,April 1991.

[95] A. A. Webster, C. T. Jones,M. H. Pinson,S. D. Voran, andS. Wolf. An objec-tive video quality assessmentsystembasedon humanperception. In Proceedingsof SPIE: HumanVision, Visual Processing, and Digital Display IV, pages15–26,February1993.

[96] Andrew J. Willmott. Radiatorsourcecodeandonline documentation.http://www.cs.cmu.edu/˜ajw/software/ , October1999. (Accessedon July 232002).

[97] G. Wilson andM.A. Sasse.Do usersalwaysknow what’s goodfor them?Utilisingphysiologicalresponsesto assessmediaquality. In Proceedingsof the14thAnnualConferenceof theBritish HCI Group(HCI 2000), pages327–339,Sunderland,UK,September2000.Springer-Verlag.

[98] PeterYoung.RecursiveEstimationandTime-SeriesAnalysis. Springer-Verlag,Hei-delberg, Germany, 1984.

[99] JamieZawinski. Remotecontrolof Unix Netscape.http://home.netscape.com/newsref/std/x- remote.html , December1994. (Accessedon July 232002).

[100] T.C.ZhaoandMark Overmars.Xformshomepage.http://world.std.com/˜xforms/ , 1995.(Accessedon July232002).

190 BIBLIOGRAPHY

Appendix A

UnrelatedWork

Muchexcellentresearchis moreor lessirrelevantto thefocusof thisdissertation.Herewebriefly survey someunrelatedwork.

The Four-Colour Theorem[5] is a pioneeringexampleof computer-aidedtheorem-proving. It hardlyinfluencedour research.Said[78] articulatesa vision of theintellectual“as exile andmarginal,asamateur, andastheauthorof a languagethattriesto speaktruthto power”, while Viswanathan[91] arguesconvincingly that the developmentof EnglishStudiesasanacademicdisciplineis closelyrelatedto thepolitical imperativesof theBritishEmpirein India. Thisdissertationdoesnot build on theseanalyses.

Somequestionsthathave rarelyplaguedthesystemscommunityhave beenanswered,e.g. “Muss essein?” [9, 56]. Otherproblems,of equallynegligible import for the futureof our field, remainunsolved. We notethat, despitethe efforts of many researchers,theproblemof “CouldYou BeLoved?” [61] remainsopen.Lastandprobablyleast,SellarandYeatman[85] have shown conclusively that history is not what you think it is, it is whatyoucanremember. Wehavenot improvedon this resultin any way.

191