Privacy-Preserving Transparency-Enhancing Tools

Tobias Pulls

LICENTIATE THESIS | Karlstad University Studies | 2012:57

Computer Science
Faculty of Economic Sciences, Communication and IT




Distribution: Karlstad University, Faculty of Economic Sciences, Communication and IT, Computer Science, SE-651 88 Karlstad, Sweden, +46 54 700 10 00

© The author

ISBN 978-91-7063-469-7

Print: Universitetstryckeriet, Karlstad 2012

ISSN 1403-8099

Karlstad University Studies | 2012:57


www.kau.se


Privacy-Preserving Transparency-Enhancing Tools
TOBIAS PULLS
Department of Computer Science
Karlstad University
Sweden

Abstract

Transparency is a key principle in democratic societies. For example, the public sector is in part kept honest and fair with the help of transparency through different freedom of information (FOI) legislations. In the last decades, while FOI legislations have been adopted by more and more countries worldwide, we have entered the information age enabled by the rapid development of information technology. This has led to the need for technological solutions that enhance transparency, for example to ensure that FOI legislation can be adhered to in the digital world. These solutions are called transparency-enhancing tools (TETs), and consist of both technological and legal tools. TETs, and transparency in general, can be in conflict with the privacy principle of data minimisation. The goal of transparency is to make information available, while the goal of data minimisation is to minimise the amount of available information.

This thesis presents two privacy-preserving TETs: one cryptographic system for enabling transparency logging, and one cryptographic scheme for storing the data for the so-called Data Track tool at a cloud provider. The goal of the transparency logging TET is to make data processing by data controllers transparent to the user whose data is being processed. Our work ensures that the process in which the data processing is logged does not leak sensitive information about the user, and that the user can anonymously read the information logged on their behalf. The goal of the Data Track is to make it transparent to users which data controllers they have disclosed data to under which conditions. Furthermore, the Data Track intends to empower users to exercise their rights, online and potentially anonymously, with regard to their disclosed data at the recipient data controllers. Our work ensures that the data kept by the Data Track can be stored at a cloud storage provider, enabling easy synchronisation across multiple devices, while preserving the privacy of users by making their storage anonymous toward the provider and by enabling users to hold the provider accountable for the data it stores.

Keywords: Transparency-Enhancing Tools, Privacy by Design, applied cryptography, anonymity, unlinkability.


Acknowledgements

It is commonly said that you learn the most when you surround yourself with better people than yourself. My time at Karlstad University in the PriSec research group, working in the PrimeLife project and within the realm of a Google research award, has convinced me of the truth of this saying. Without the help and influence of several people the work presented in this thesis would never have happened.

First and foremost, I am grateful to my supervisor Simone Fischer-Hübner and my co-supervisor Stefan Lindskog. Their support and constructive advice have kept me on the right track and focused on the task at hand. Thank you Hans Hedbom for being my, from my point of view, informal supervisor when I first got hired at the department. Without your guidance I would not have gotten into the PhD program, or been hired in the first place.

Thank you to my colleagues at the Department of Computer Science who have provided me with a wonderful working environment; be it in the form of rewarding discussions on obscure topics, or the regular consumption of sub-par food on Fridays during lunch followed by delicious cake. In particular, I would like to thank Stefan Berthold, Philipp Winter, and Julio Angulo for the fruitful, and often ad hoc1, discussions and collaborations.

I would also like to thank all the inspirational researchers I have had the opportunity to collaborate with as part of the different projects the PriSec group has participated in. My experiences in PrimeLife, HEICA, U-PrIM, and with Google have helped me grow as a research student. In particular, I am grateful for the collaboration with Karel Wouters. I hope our work will continue, just as it has so far, even though PrimeLife ended over a year ago.

Last, but not least: to my family and friends outside of work, thank you for all of your support over the years. I am in your debt.

The work in this thesis was a result of research funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 216483, and a Google research award on “Usable Privacy and Transparency Tools”.

Karlstad, December 2012 Tobias Pulls

1 Initiated by stuffed animals or balls being thrown in different directions.


List of Appended Papers

A. Tobias Pulls, Karel Wouters, Jo Vliegen, and Christian Grahn. Distributed Privacy-Preserving Log Trails. In Karlstad University Studies, Technical Report 2012:24, Department of Computer Science, Karlstad University, Sweden, 2012.

B. Hans Hedbom and Tobias Pulls. Unlinking Database Entries—Implementation Issues in Privacy Preserving Secure Logging. In Proceedings of the 2nd International Workshop on Security and Communication Networks (IWSCN 2010), pp. 1–7, Karlstad, Sweden, May 26–28, IEEE, 2010.

C. Tobias Pulls. (More) Side Channels in Cloud Storage—Linking Data to Users. In Privacy and Identity Management for Life – Proceedings of the 7th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife International Summer School, Trento, Italy, September 2011, Revised Selected Papers, pp. 102–115, IFIP AICT 375, Springer, 2012.

D. Tobias Pulls. Privacy-Friendly Cloud Storage for the Data Track—An Educational Transparency Tool. In Secure IT Systems – Proceedings of the 17th Nordic Conference (NordSec 2012), Karlskrona, Sweden, October 31–November 2, Springer LNCS, 2012.

Comments on my Participation

Paper A. This technical report was joint work by four authors. Karel Wouters and I collaborated on the bulk of the work. I came up with the idea of cascading and wrote all the algorithms defining the (non-auditable) system, including the specification for a trusted state. Karel made the system auditable, performed a thorough investigation of related work, and wrote the proof for cascading. Jo Vliegen and Christian Grahn contributed with a description of their respective proof-of-concept hardware and software implementations.

Paper B. This paper was a collaboration with Hans Hedbom. We identified the problem area as part of my Master’s thesis, and jointly came up with the different versions of the shuffler algorithm. I performed the experiments, while Hans was the driving force behind writing the paper.

Paper C. I was the sole author of this paper. As acknowledged in the paper, I received a number of useful comments from Simone Fischer-Hübner, Stefan Lindskog, Stefan Berthold, and Philipp Winter.

Paper D. I was the sole author of this paper. I received a number of useful comments from Stefan Berthold, Simone Fischer-Hübner, Stefan Lindskog, and Philipp Winter.

Some of the appended papers have been subject to minor editorial changes.


Selection of Other Peer-Reviewed Publications

• Jo Vliegen, Karel Wouters, Christian Grahn, and Tobias Pulls. Hardware Strengthening a Distributed Logging Scheme. In Proceedings of the 15th Euromicro Conference on Digital System Design, Cesme, Izmir, Turkey, September 5–8, IEEE, 2012. To appear.

• Julio Angulo, Simone Fischer-Hübner, Erik Wästlund, and Tobias Pulls. Towards Usable Privacy Policy Display & Management for PrimeLife. Information Management & Computer Security, Volume 20, Issue 1, pp. 4–17, Emerald, 2012.

• Hans Hedbom, Tobias Pulls, and Marit Hansen. Transparency Tools. In Jan Camenisch, Simone Fischer-Hübner, and Kai Rannenberg (eds.), Privacy and Identity Management for Life, 1st Edition, pp. 135–143, Springer, 2011.

• Julio Angulo, Simone Fischer-Hübner, Tobias Pulls, and Ulrich König. HCI for Policy Display and Administration. In Jan Camenisch, Simone Fischer-Hübner, and Kai Rannenberg (eds.), Privacy and Identity Management for Life, 1st Edition, pp. 261–277, Springer, 2011.

• Hans Hedbom, Tobias Pulls, Peter Hjärtquist, and Andreas Lavén. Adding Secure Transparency Logging to the PRIME Core. In Privacy and Identity Management for Life, 5th IFIP WG 9.2, 9.6/11.7, 11.4, 11.6/PrimeLife International Summer School, Nice, France, Revised Selected Papers, pp. 299–314, Springer, 2010.

Selected Contributions to Project Deliverables

• Tobias Pulls, Hans Hedbom, and Simone Fischer-Hübner. Data Track for Social Communities: the Tagging Management System. In Erik Wästlund and Simone Fischer-Hübner (eds.), End User Transparency Tools: UI Prototypes, PrimeLife Deliverable 4.2.2, 2010.

• Tobias Pulls and Simone Fischer-Hübner. Policy Management & Display Mockups – 4th Iteration Cycle. In Simone Fischer-Hübner and Harald Zwingelberg (eds.), UI Prototypes: Policy Administration and Presentation – Version 2, PrimeLife Deliverable 4.3.2, 2010.

• Tobias Pulls and Hans Hedbom. Privacy Preferences Editor. In Simone Fischer-Hübner and Harald Zwingelberg (eds.), UI Prototypes: Policy Administration and Presentation – Version 2, PrimeLife Deliverable 4.3.2, 2010.

• Tobias Pulls. A Cloud Storage Architecture for the Data Track. In Usable Privacy and Transparency Tools, Google Research Award Project Deliverable, 2011.


Contents

List of Appended Papers vii

INTRODUCTORY SUMMARY 1

1 Introduction 3

2 Background 4
2.1 Research Projects . . . 4
2.2 A Scenario . . . 5
2.3 The Role of TETs . . . 6
2.4 The Need for Preserving Privacy in TETs . . . 9

3 Related Work 10

4 Research Questions 12

5 Research Methods 13
5.1 Theoretical Cryptography . . . 13
5.2 Cryptography in this Thesis . . . 14
5.3 Research Method for Each Paper . . . 14

6 Main Contributions 15

7 Summary of Appended Papers 16

8 Conclusions and Future Work 17

PAPER A
Distributed Privacy-Preserving Log Trails 23

I Introduction 27

1 Setting and Motivation 27

2 Terminology 29

3 Structure of the Report 30

II Related Work 32

1 Notation 32


2 Related Work 32
2.1 Early Work . . . 33
2.2 Searchability and Privacy . . . 35
2.3 Maturing Secure Logs . . . 36

3 Logging of eGovernment Processes 39
3.1 Building the Trail . . . 40
3.2 Reconstructing the Trail . . . 43
3.3 Auditable Logging . . . 43
3.4 Summary . . . 45

4 Privacy-Preserving Secure Logging 46
4.1 Attacker Model . . . 47
4.2 Technical Overview . . . 48
4.3 Conclusion . . . 52

5 Summary 52

III Threat Model and Requirements 54

1 Threat Model 54
1.1 Outside Attackers . . . 55
1.2 Inside Attackers . . . 55
1.3 Distribution and Collusion . . . 56

2 Requirements 56
2.1 Functional Requirements . . . 56
2.2 Verifiable Authenticity and Integrity . . . 57
2.3 Privacy . . . 57
2.4 Auditability and Accountability . . . 59
2.5 Out of Scope Factors . . . 60

3 Main Components 60
3.1 Data Subjects . . . 60
3.2 Data Processors . . . 61
3.3 Log Servers . . . 62
3.4 Time-Stamping Authorities . . . 62

4 Summary 62

IV Components 63

1 Overview 63


2 The Data Subject’s Perspective 66
2.1 Data Vault . . . 66
2.2 Mandate . . . 66
2.3 Log Consultation . . . 67

3 Integrity and Unlinkability 67
3.1 The Data Processor’s Data Vault . . . 67
3.2 Cascade . . . 67
3.3 Log Server Storage . . . 70
3.4 Log Server State . . . 72
3.5 Data Processor State . . . 74

4 Auditing and Dependability 75
4.1 Log Server Audit . . . 76
4.2 Data Processor Audit . . . 76

5 Logging APIs 78
5.1 Data Processor API . . . 78
5.2 Log Server API . . . 79

6 Summary 82

V Log Usage 88

1 Generating a Log Trail 88

2 Log Trail Reconstruction 89

3 Audit 90
3.1 Accountability of Log Servers . . . 90
3.2 Enabling Log Servers to Show Their Trustworthiness . . . 91
3.3 Auditability Toward Third Parties . . . 92

4 Summary 92

VI Hardware-Based Improvements 97

1 Additional Requirements 97

2 Component Specification 98

3 Necessary Changes due to Hardware 98
3.1 Providing the Authenticated API . . . 99
3.2 Providing the Open API . . . 100
3.3 Data Processor Interactions . . . 101


4 Implementation 106
4.1 Physical Interconnect . . . 106
4.2 Communication Interconnect . . . 106
4.3 Cryptographic Primitives . . . 106
4.4 Implementation . . . 109
4.5 Power Failure . . . 113
4.6 Additional Threats . . . 114

5 Summary 115

VII Software Proof of Concept 116

1 Overall Structure 116
1.1 Common Backbone . . . 116
1.2 Log Server . . . 117
1.3 Data Processor . . . 118
1.4 Data Subject . . . 118

2 Implementation 118
2.1 Common Backbone . . . 119
2.2 Log Server . . . 120
2.3 Data Processor . . . 121
2.4 Data Subject . . . 123

3 Summary and Future Work 124

VIII Evaluation 126

1 Evaluation Against Requirements 126
1.1 Functional Requirements . . . 126
1.2 Verifiable Authenticity and Integrity Requirements . . . 127
1.3 Privacy Requirements . . . 128
1.4 Auditability . . . 130

2 Compromised Entities 131
2.1 Compromised Data Subjects . . . 132
2.2 Compromised Data Processors . . . 132
2.3 Compromised Log Servers . . . 135
2.4 Compromised Data Processor Audit Component . . . 138
2.5 Colluding Log Servers and Data Processors . . . 138
2.6 Evaluating the Impact of Hardware . . . 140

3 Summary 143


IX Concluding Remarks 145

PAPER B
Unlinking Database Entries—Implementation Issues in Privacy Preserving Secure Logging 151

1 Introduction 153

2 A Privacy Preserving Secure Logging Module 154
2.1 A Secure Log . . . 155
2.2 A Privacy Preserving Secure Log . . . 155
2.3 The Implementation . . . 156

3 Problem Description 157

4 Possible Solutions 158
4.1 Version 1: In-Line Shuffle . . . 158
4.2 Version 2: Threaded Shuffle . . . 158
4.3 Version 3: Threaded Table-Swap Shuffle . . . 160

5 Evaluation 161
5.1 Experimental Setup . . . 161
5.2 Initial Performance Comparison . . . 161
5.3 Performance Impact of the Shuffler on Insertion . . . 162
5.4 Performance of Larger Sizes of the Database . . . 163

6 Conclusion and Future Work 165

PAPER C
(More) Side Channels in Cloud Storage—Linking Data to Users 169

1 Introduction 171

2 Deduplication 172

3 Related Work 174

4 Adversary Model 175

5 Linking Files and Users 176
5.1 A Formalised Attack . . . 176
5.2 Wuala - Distributed Storage Among Users . . . 177
5.3 BitTorrent - Efficient File Sharing and Linkability . . . 178
5.4 Freenet - Anonymous Distributed and Decentralised Storage . . . 179
5.5 Tahoe-LAFS - Multiple Storage Providers . . . 179


5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

6 Profiling Users’ Usage 180
6.1 Observing Storage . . . 180
6.2 Mitigating Profiling . . . 181

7 Conclusion 182

PAPER D
Privacy-Friendly Cloud Storage for the Data Track—An Educational Transparency Tool 186

1 Introduction 189
1.1 Motivation . . . 190
1.2 Overview of the Setting . . . 191

2 Adversary Model and Requirements 192
2.1 Adversary Model . . . 192
2.2 Requirements . . . 193

3 Cryptographic Primitives 194
3.1 Encryption and Signatures . . . 194
3.2 History Trees . . . 195
3.3 Anonymous Credentials . . . 195

4 The Data Track Scheme 196

5 Informal Evaluation 200
5.1 Confidentiality of Disclosed Data . . . 200
5.2 An Accountable Cloud Provider . . . 200
5.3 Minimally Trusted Agents . . . 201
5.4 Anonymous Storage . . . 201
5.5 Proof of Concept . . . 202

6 Related Work 202

7 Concluding Remarks 203


Introductory Summary

“The goal is justice, the method is transparency”

Julian Assange, founder of WikiLeaks
Interviewed by John Pilger (2010)


1 Introduction

Sunlight is said to be the best of disinfectants2. This saying embodies the concept of transparency. By making a party transparent, for instance by the mandatory release of documents or by opening up governing processes, it is implied that undesirable behaviour by the party is discouraged or prevented. In other words, access to information about a party enables others to exercise control over the transparent party. This control enabled through transparency is also what makes transparency a key privacy principle. It enables individuals to exercise their right to informational self-determination, i.e., control over their personal spheres. Information is power, as the saying goes.

When the transparent party is the government and the recipient of information is the general public, this public control of the government may be viewed as the essence of democracy [44]. The importance of transparency in a democratic society is recognised by the freedom of information (FOI) legislation found in democratic countries around the world3 [31]. Transparency also plays a key role in the private sector. For example, the Sarbanes-Oxley Act requires that corporations disclose accurate and reliable data concerning their finances for accounting purposes [5]. Transparency is a social trust factor, i.e., openness fosters trust, both in the public and private sectors [4].

This thesis describes the design of technological tools that enhance transparency. These tools are often referred to as TETs, an acronym for either Transparency-Enhancing Tools or Transparency-Enhancing Technologies. In general, the difference between the two acronyms is that the term ‘tool’ also includes legal tools (such as those provided by the EU Data Protection Directive 95/46/EC), in addition to technological tools [18]. While the work in this thesis is focused on technologies, several aspects of our TETs rely on the presence of legal transparency-enhancing tools.

The main goal of this thesis is to design TETs that preserve privacy. In general, transparency can be in conflict with the privacy principle of data minimisation. For example, ensuring the privacy (in particular, the confidentiality) of a private conversation is natural, while making the conversation transparent to a third party is a violation of the expectation of privacy of the conversing parties. This trade-off between transparency and privacy can be found in virtually all FOI legislation [31], where exemptions are made, for example, in the case of national security interests. For FOI requests that for some reason have to be redacted in parts, the redacting party is still obliged to maximise the amount of disclosed information [31]. This balance is analogous to our work on designing privacy-preserving TETs. While making specific information transparent, we ensure that no other information is released due to how the TETs function. Furthermore, we take particular care in protecting the privacy of the recipient of information provided by the TETs.

2 The quote originates from the collection of essays Other People’s Money And How the Bankers Use It (1914) by U.S. Supreme Court Justice Louis Brandeis.

3 Sweden, with what is today referred to as offentlighetsprincipen (the principle of public access), was the first country to introduce such legislation [34].


The remainder of the introductory summary is structured as follows. Section 2 provides the background of my work, both in terms of the setting in which it was done and the underlying motivations. Section 3 discusses related work. My research questions and the research methods that I applied are described in Sections 4 and 5, respectively. Section 6 presents the main contributions, and Section 7 provides a summary of the appended papers. Section 8 ends the introductory summary with concluding remarks and a brief discussion of future work.

2 Background

This section explains the background of the thesis. First, two research projects and the work I did within them are presented. A short scenario follows that shows an example use-case of how a user may use the tools constructed in the two research projects. Next, this section explores the role of TETs in general and the motivation as to why they need to preserve privacy.

2.1 Research Projects

The work done as part of this thesis has been conducted within the scope of the research project PrimeLife and a Google research award project. For each project my focus was on the privacy-preserving design of a particular TET.

2.1.1 PrimeLife and Transparency Logging

PrimeLife, Privacy and Identity Management for Life, was a European research project funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 216483. As part of the project, I worked on a TET that performed logging for the sake of transparency of data processing. The idea of transparency logging is that data processors4, which process personal data about users, should make the processing of personal data transparent to users by continuously logging all actions on data on behalf of the users. My focus was on ensuring that this TET preserved privacy in the sense that an adversary (defined in Paper A) should not be able to deduce any personal information from the log entries, or from the process in which the log entries were generated. The result of the work is presented in Papers A and B.
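The core idea of transparency logging can be illustrated with a minimal sketch. This is not the scheme of Paper A, and all names and the toy encryption are hypothetical: each logged processing action is encrypted for the data subject (so the log itself leaks nothing), and entries are hash-chained so that modifying or removing an entry is detectable.

```python
import hashlib

def append_entry(log, action, encrypt_for_user):
    """Append a transparency log entry: the action description is
    encrypted for the data subject, and each entry commits to the
    previous one via a hash chain so tampering is detectable."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    ciphertext = encrypt_for_user(action)  # hides the content from the log server
    entry = {"prev": prev_hash, "payload": ciphertext}
    entry["hash"] = hashlib.sha256(
        (entry["prev"] + entry["payload"]).encode()
    ).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log):
    """Recompute the hash chain; any modified or removed entry breaks it."""
    prev = "0" * 64
    for entry in log:
        expected = hashlib.sha256((prev + entry["payload"]).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

# Toy "encryption" stand-in for illustration only; a real scheme would
# use public-key encryption under the data subject's key.
toy_encrypt = lambda s: s[::-1]

log = []
append_entry(log, "profile data read for ad selection", toy_encrypt)
append_entry(log, "email address shared with third party X", toy_encrypt)
assert verify_chain(log)
```

The actual system in Paper A additionally addresses unlinkability of entries, anonymous retrieval by the user, and auditability, which this sketch deliberately omits.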

2.1.2 Google and the Data Track

As part of a Google research award project on “Usable Privacy and Transparency Tools”, I worked on the storage for the Data Track; a TET that intends to educate and empower users. The Data Track provides users with an overview of what data they have disclosed to whom under which privacy policy. The idea is that users, from this overview, can exercise their rights of accessing, potentially correcting, and even deleting the data disclosed to service providers that is now stored at the providers’ sides. My work focused on ensuring that the data disclosures tracked by the Data Track could be stored at a cloud storage provider in a privacy-friendly way. This allows Data Track users to view and track data disclosures from multiple devices, since all the data kept by the Data Track is stored centrally in the cloud. The result of the work is presented in Papers C and D.

4 In this thesis, we define a data processor as any entity that performs data processing of personally identifiable information. A data controller is the entity that is legally responsible for the data processing performed by a data processor. A data controller may also be a data processor. These definitions may not be entirely in line with the corresponding definitions in the EU Data Protection Directive 95/46/EC.
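A minimal sketch of the privacy-friendly storage idea follows. This is not the scheme of Paper D: the identifier derivation, the toy cipher, and all names are illustrative assumptions only. Records are stored under identifiers derived from a user-held secret, so the provider cannot link records to a user or to each other, while any of the user's devices holding the secret can recompute the identifiers and decrypt.

```python
import hashlib
import hmac

def record_id(user_secret: bytes, index: int) -> str:
    """Derive an unlinkable per-record identifier from the user's secret.
    Without the secret, the provider cannot tell which records belong
    to the same user."""
    return hmac.new(user_secret, str(index).encode(), hashlib.sha256).hexdigest()

def xor_keystream(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode), a placeholder for
    real authenticated encryption; do not use in practice."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

secret = b"alice-device-secret"
cloud = {}  # the provider sees only random-looking ids and ciphertexts

disclosure = b"email=alice@example.com; policy=example.com/privacy"
cloud[record_id(secret, 0)] = xor_keystream(secret, disclosure)

# Any of Alice's devices holding the secret can recompute the id and decrypt.
restored = xor_keystream(secret, cloud[record_id(secret, 0)])
assert restored == disclosure
```

Paper D further makes the provider accountable for the data it stores and uses anonymous credentials for access, which this sketch does not attempt to capture.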

2.2 A Scenario

In the following scenario, illustrated in Figure 1, Alice discloses some personal data to the website Example.com. In this particular case, Example.com is both the data controller and data processor of Alice’s personal data. The privacy policy that Alice agreed to, prior to disclosing data, specifies for what purposes the data was requested and will be processed, whether the data will be forwarded to third parties, how long the data will be retained, etc. At the time of data disclosure, Alice’s Data Track (DT) client stores a copy of the data disclosure together with Example.com’s privacy policy at her cloud storage provider. Furthermore, at the time of data disclosure, there is a small exchange of messages between Example.com and Alice’s Transparency Logging (TL) client to enable transparency logging. As Example.com is processing Alice’s personal data, a log of all processing is created and stored at Example.com’s log server.

[Figure 1 omitted: a diagram showing Alice, her DT and TL clients, Example.com, its log server, and the cloud provider.]

Figure 1: Alice discloses personal data to Example.com, which specifies its data handling practices in a privacy policy to which Alice has agreed. With the help of her Data Track (DT) and Transparency Logging (TL) clients she can still exercise control over the data that she has disclosed to Example.com.


With the help of her DT client, Alice can later view the data disclosure she made to Example.com together with the privacy policy that she agreed to. Now she wonders: did Example.com really live up to what they promised? That is, did Example.com really follow the privacy policy? Using her TL client she downloads from the log server the log of all data processing performed by Example.com on her data. She can then compare, i.e., match, whether the processing is in accordance with the previously agreed privacy policy. Because all data kept by the DT client are stored at a cloud provider, and all logged data are stored at a log server, Alice can use both tools from multiple devices.
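Alice's matching step can be sketched as a simple comparison of logged processing purposes against the purposes allowed by the agreed policy. This is a hypothetical, simplified representation: real privacy policies and log entries are far richer than a set of purpose strings.

```python
def check_against_policy(allowed_purposes, logged_actions):
    """Return the logged processing actions whose purpose is not
    covered by the agreed privacy policy."""
    return [a for a in logged_actions if a["purpose"] not in allowed_purposes]

# Purposes Example.com's policy permits (illustrative values).
policy = {"order fulfilment", "payment"}

# Actions reconstructed from the transparency log (illustrative values).
logged = [
    {"purpose": "order fulfilment", "data": "postal address"},
    {"purpose": "marketing", "data": "email address"},
]

violations = check_against_policy(policy, logged)
assert [v["purpose"] for v in violations] == ["marketing"]
```

In this toy run, the marketing action is flagged because it was not among the purposes Alice agreed to, which is exactly the kind of mismatch the combination of the DT and TL clients is meant to surface.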

2.3 The Role of TETs

Transparency can be used to empower one party to the detriment of another, through the flow of information that facilitates a form of control, as discussed earlier. In essence, transparency ensures that information is available to fuel the public discussion facilitated by freedom of speech [44], another cornerstone of democratic societies. Next, we elaborate on the purpose of TETs, followed by a discussion on technological TETs supported by legal frameworks.

2.3.1 The Purpose of TETs

As the world moves further into the information age, information technology (IT) plays a larger role in society. In the public sector, eGovernment initiatives intend to bring government services online, facilitating both greater services to citizens and enhanced transparency [35]. In the private sector, large global IT organisations have emerged, such as Google and Facebook. They possess vast amounts of personal information of, and thus wield power over, a significant portion of all users of IT worldwide. Just as we identified the need to keep powerful institutions transparent in meatspace5 as society matured, the same need appears in the rapidly growing cyberspace as it matures. The role of TETs is thus, in general, to facilitate in cyberspace the transparency we have grown to expect from meatspace.

The proliferation of IT threatens to erode the privacy of users of IT [30]. Privacy-Enhancing Technologies (PETs) intend to mitigate the threats to privacy primarily by adhering to the principle of data minimisation and by putting users of IT in control of their personal information [42]. Prime examples of PETs are anonymous credentials, such as idemix [10], and the low-latency anonymity network Tor [16]. Another, broader reaction to the threats to privacy posed by IT is the concept of Privacy by Design (PbD). PbD promotes the principle of designing IT with privacy as the default, throughout the lifecycle of the technology, and striving for a ‘positive sum’ by taking all legitimate interests and objectives into account [12].

Privacy, and in particular the principle of data minimisation, is not always desired by users of technology. For example, on social networking sites

5In real life, the opposite of cyberspace. The term meatspace can be derived from the cyberpunk novel Neuromancer (1984) by William Gibson, which popularised the term ‘cyberspace’.


such as Facebook, the primary purpose for users is to disclose personal information to a group of acquaintances. Here, one mental tool users employ to manage their privacy is their perceived control over the recipients of shared information [3]. PETs, and in the broader sense PbD, can potentially aid users by ensuring that information is shared only with the intended recipients. Diaspora [8], PeerSoN [9], and Safebook [15] are P2P social networks that intend to accomplish just that, and in the process eliminate the need for a social network provider. After all, why does there have to be a provider in the middle intercepting all communication between users? However, as long as there is a need for a provider, TETs could be used to facilitate control over the provider by the users.

The above example highlights an important role of TETs in relation to PETs. First, there is an overlap between the definitions of PETs and TETs, in that both may facilitate control, as shown in Figure 2. In this thesis, we consider the distinguishing characteristic of a TET to be that it enables control through an information flow from one party to another. Furthermore, in the absence of mature and usable PETs, TETs can be deployed to facilitate control over the powerful entity that PETs would significantly weaken or remove the need for altogether. This can be said to be the primary purpose of TETs, i.e., to reduce asymmetries between a strong and a weak party, be it in terms of information, knowledge, or power6, by increasing the information available to the weak party. The relationship between TETs and PETs, in terms of information asymmetries, is illustrated in Figure 3. The goal of TETs is to increase the information available to a weak party, while the primary goal of PETs is to reduce the information available to the stronger party.


Figure 2: Both TETs and PETs may act as facilitators of control.

2.3.2 Legal Frameworks for Supporting Technological TETs

Technological TETs that are supported by legal (privacy) frameworks have the potential to be exceedingly efficient in empowering users. Recent proposals around the so-called ‘Do Not Track’ (DNT) header [48], while arguably more of a PET than a TET, highlight this potential. The DNT header is a browser header set by the user’s user agent (browser) as part of HTTP requests. If the header is set to the value 1, it represents that the user wishes to opt out of being tracked for advertisement purposes. While technically trivial, the DNT header captures the user’s intent of not consenting to be tracked, which

6Technically, TETs enable a flow of information from one party to another. The information is a necessary but not sufficient criterion for one party to gain knowledge about the other. This knowledge empowers one party to the detriment of the other.


Figure 3: How TETs and PETs are related in terms of addressing information asymmetry.

is a (not necessarily valid) request in the legal realm. Given an adequate legal framework, or industry self-regulation as is largely the case for DNT [48], such a simple technical solution greatly empowers users.
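To illustrate how technically simple the mechanism is, a user agent only needs to attach one extra header to each request. The sketch below uses a hypothetical helper function and user-agent string; only the header name `DNT` and the value `1` come from the DNT proposal itself.

```python
# Minimal sketch: a hypothetical user agent composing HTTP request
# headers, attaching "DNT: 1" to signal the tracking opt-out in band.
def build_request_headers(do_not_track: bool) -> dict:
    headers = {"User-Agent": "ExampleBrowser/1.0"}  # hypothetical UA
    if do_not_track:
        headers["DNT"] = "1"  # value 1 = user opts out of tracking
    return headers

print(build_request_headers(True))
```

Whether the receiving server honours the preference is, as discussed above, a matter for the legal realm rather than the protocol.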

This thesis presents work on two TETs: transparency logging, presented in Papers A and B, and the Data Track, presented in Papers C and D.

• As part of performing transparency logging, a data processor agrees to log all actions it performs on users’ data. In the case of an accusation by a user of misuse of data, the transparency log either provides the user with direct proof in the form of log entries, or enables the user to highlight the fact that a malicious action was not logged, further increasing the liability of the data processor. Performing transparency logging is thus a strong legal commitment by a data processor. For example, transparency logging can be used to check compliance with regulations, such as the Sarbanes-Oxley Act, ultimately leading to accountability.

• The Data Track provides a user with an overview of all past data disclosures performed by the user to data controllers. From this overview, the user can send different requests to the recipient data controllers. These requests can be to access, rectify, or delete the data stored at a data controller. Ensuring that the requests are honoured is not based upon any technology, but upon the assumption of the presence of laws or self-regulation that the data controllers are required to comply with.

In Europe, the EU Data Protection Directive 95/46/EC provides several legal provisions (in a sense, legal TETs) that push data controllers towards providing both transparency logging and the functionality needed by the Data Track. Sections IV–V of the directive outline requirements on the information to be given to the data subject, and the right of the data subject to access and rectify data at a data controller. In general, data controllers today meet these obligations by providing a static privacy policy and giving out data manually offline (whether providers actually comply is, however, questionable).

At the time of writing, the European Commission (EC) is proposing a reform of the data protection rules in Europe, published in January 2012 [49].


The proposal includes ‘the right to be forgotten’, empowering data subjects to demand that their data be deleted at a data controller and at any third-party recipients of the data. Furthermore, Article 12 of the proposal “...obliges the controller to provide procedures and mechanism for exercising the data subject’s rights, including means for electronic requests, ...”, and in particular states that “where the data subject makes the request in electronic form, the information shall be provided in electronic form, unless otherwise requested by the data subject”. Presumably, this will push towards allowing data subjects to exercise their rights online with technological tools, in favour of the current, primarily analog model of static privacy policies and manual processing of data access requests. TETs in general, and those described in this thesis in particular, can be used by people to exercise their rights online.

2.4 The Need for Preserving Privacy in TETs

Privacy, in the context of TETs, can be approached in different ways. One approach is to view ensuring that TETs preserve privacy as a form of optimisation. As was discussed in Section 2.3 and illustrated in Figure 3, the primary purpose of TETs is to reduce asymmetries between a weak and a strong party. If a TET, due to how it functions, leaks information about the weak party to the strong party, this reduces the efficiency of the TET. If the leaked information is, according to some metric, more valuable to the strong party than the information the weak party receives in return through the TET, then the TET actually increases the information asymmetry between the two parties. Since it is hard to determine how the stronger party values different kinds of information about the weak party, the conservative approach is to ensure that TETs leak little to no information about the weak party in the first place. This can be viewed as ensuring the accuracy of TETs, similar to the balance needed when partially redacted FOI requests are still required to disclose the maximum amount of information possible. If TETs are inaccurate, the risk of disclosing unintended information may discourage parties from adopting TETs.

In general, one can argue that TETs and PETs are often deployed to address, from a privacy perspective, some problem caused by (or side-effect of) using technology. It is therefore natural to ensure that we do not introduce further problems when we use more technology to solve problems caused by technology in the first place7. In that sense, TETs are like any other piece of software or hardware: they need to be designed with privacy in mind.

In this thesis, due to how the TETs function, the focus has been on protecting the privacy of the recipient of information. The scenario in Section 2.2

7Joseph Weizenbaum, in the book Computer Power and Human Reason: From Judgment To Calculation (1976), distinguishes between deciding and choosing. He argues that computers, while capable of deciding, are not capable of making choices, because choice is a matter of judgement, not computation. One way to interpret this crucial distinction is that we need to exercise great care when constructing technologies, because technology itself will not guide us in the right direction. In other words, just because it is possible to do something does not mean one should do it.


described how the Data Track and Transparency Logging TETs could be used by Alice. For the Data Track, one of the main privacy issues for the recipient (Alice in the scenario) is the storage of the data disclosures at a cloud provider. Our work therefore focused on identifying and addressing privacy issues related to this outsourcing of the storage of the data. For the Transparency Logging, we ensured that the process in which log entries are generated, the way the log entries are stored, and finally the way the log entries are retrieved by the recipient user leak as little information as possible about the user.

3 Related Work

The earliest relevant work on using logs to provide transparency of data processing is that of Sackmann et al. [43]. They identify the interplay between privacy policies, logging for the sake of transparency of data processing, and log entries constituting so-called ‘privacy evidence’. Here, the logged data is used to verify that the actual data processing is consistent with the processing stated in the privacy policy. Figure 4 illustrates this relationship. In such a setting, the primary focus in terms of security and privacy has been on ensuring the confidentiality, integrity, and authenticity of logged data. These logging schemes are often based on schemes from the secure logging area, building upon the seminal work by Schneier and Kelsey [45].

Figure 4: The interplay between privacy policies and logging for achieving transparency of data processing. A similar picture can be found in [1].

A prime example of the state of the art in the secure logging area is BBox [2], which is somewhat distributed (several devices that write, and one collector that stores), similar to the system described in Paper A. A comprehensive description of related work in the secure logging area can be found in Paper A. Ignoring the contents of logged data, privacy primarily becomes an issue when there are multiple recipients of the logged data. This is the case when users take on the role of auditor of their own logged processing records, arguably


enhancing privacy by removing the need for trusted auditors. This is one of the key observations in the prior works of Wouters et al. [51] and Hedbom et al. [24], and the setting of Paper A. In Paper A, we advance the state of the art by building on the Schneier and Kelsey [45] scheme in a fully distributed setting. Our system has multiple writers (data processors), multiple collectors (log servers), and multiple recipients (users or data subjects) of logged data. In this setting, we address the privacy issues that emerge by making the construction of the logged data unlinkable (both in terms of users and log entries), and by allowing users to retrieve their log entries anonymously.
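To make the secure-logging baseline concrete, the forward-security idea underlying Schneier–Kelsey-style schemes can be sketched as follows. This is a simplified illustration only, not the construction of Paper A: each entry is authenticated under the current key, after which the key is hashed forward and the old key discarded, so a later compromise of the logger does not allow silently rewriting earlier entries.

```python
# Simplified sketch of forward-secure log authentication in the spirit
# of Schneier-Kelsey (not the Paper A system): MAC each entry with an
# evolving key, then hash the key forward and discard the old one.
import hashlib
import hmac

def append(log, key, message):
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    log.append((message, tag))
    return hashlib.sha256(key).digest()  # evolved key; old key is gone

def verify(log, initial_key):
    key = initial_key
    for message, tag in log:
        if hmac.new(key, message, hashlib.sha256).hexdigest() != tag:
            return False
        key = hashlib.sha256(key).digest()
    return True

k0 = b"initial secret shared with the auditor"
log, key = [], k0
for m in (b"processed address", b"deleted record"):
    key = append(log, key, m)
print(verify(log, k0))  # True for an untampered log
```

An auditor holding the initial key can replay the chain and detect any modified or reordered entry.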

The Data Track was originally developed within the EU research projects PRIME [11] and PrimeLife [41]. A related TET is the Google Dashboard8

that provides Google users with a summary of all their data stored at Google for a particular account. From the dashboard, users can also delete and manage their data for several of Google’s services. While the Google Dashboard is tied to authenticated Google users and their Google services, the Data Track is a generic tool that allows anonymous access to stored data. The Data Track from PRIME and PrimeLife uses local storage to store all the data tracked by the Data Track. In Paper D, we describe a scheme for using cloud storage for the data needed by the Data Track in a privacy-preserving way. The main advantage of using cloud storage, instead of local storage, is that central storage in the cloud enables easy synchronisation across the multiple devices that a user might use to disclose data and view data disclosures from. One key property of the scheme is that users are anonymous towards the cloud provider. The most closely related work in our cloud storage setting9 is that of Slamanig [46] and Pirker et al. [38], who use and interactively update anonymous credentials to provide fine-grained resource usage. While their work is more elaborate than ours, their scheme is unusable for our purpose due to our additional security and privacy requirements for writing to our cloud storage. We advance the state of the art by (i) providing a simple construct that ensures the size of the anonymity set, and (ii) applying the history tree scheme by Crosby and Wallach [13, 14] in the cloud storage setting. The history tree scheme provides a more efficient construct compared to the hash chains used by, for example, CloudProof [40], where frequent commitments (and verification of those commitments by users) on all data stored at the cloud provider are paramount.
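To see why frequent commitments over all stored data are costly, consider a plain hash chain over successive storage states, the simpler construct that history trees improve upon. This sketch is illustrative only and is neither the CloudProof protocol nor the Crosby–Wallach history tree:

```python
# Illustrative hash chain over successive storage states: each
# commitment binds the previous commitment and the new data, so the
# provider cannot rewrite history without changing all later
# commitments. Verification, however, requires replaying the chain.
import hashlib

def commit(prev: bytes, data: bytes) -> bytes:
    return hashlib.sha256(prev + data).digest()

GENESIS = b"\x00" * 32
c = GENESIS
for blob in (b"disclosure 1", b"disclosure 2", b"disclosure 3"):
    c = commit(c, blob)
# Checking that "disclosure 1" is covered by the latest commitment
# means recomputing every commitment from genesis, i.e. linear work;
# a history tree instead offers logarithmic membership proofs.
```

The linear replay cost is exactly what makes hash chains unattractive when users must frequently verify commitments over everything stored at the provider.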

Implementation details that negatively impact the properties of cryptographic schemes abound, especially in the case of anonymity10. For example, the low-latency anonymity network Tor has been widely deployed for a significant amount of time and has thus been the focus of several papers that identify implementation details that negatively affect the anonymity provided by

8https://www.google.com/dashboard/, accessed 2012-07-24.
9Using only one cloud provider. In the distributed setting there is more related work; see [47] for an overview.
10The fact that anonymity in particular is negatively affected is no surprise, since anonymity can be seen as the absence of information to uniquely identify users. When cryptographic schemes are deployed as systems, they are surrounded by a plethora of other systems which may leak identifying information.


the network [6, 20, 25, 26, 33, 37]. Similarly, in Paper B, we identify and suggest mitigations for a particular implementation detail that may be a threat to the unlinkability property of privacy-preserving secure logging schemes, such as [24] or the system presented in Paper A. When a flaw is a consequence of the (physical) implementation of a particular system, it is often called a side channel. In Paper C, we explore side channels in cloud storage and advance the state of the art by identifying and formalising a new side channel. The work builds upon work by Harnik et al. [23], who present other closely related side channels in cloud storage services.
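The kind of deduplication side channel studied by Harnik et al. can be sketched with a toy model (this is illustrative only, not the formalisation from Paper C): if the provider deduplicates identical content across users on the client side, then whether an upload actually transfers bytes reveals whether someone already stored the same content.

```python
# Toy model of cross-user, client-side deduplication as a side
# channel: the observable "were bytes actually transferred?" signal
# leaks whether identical content already exists at the provider.
import hashlib

class DedupStorage:
    def __init__(self):
        self.blobs = {}

    def upload(self, data: bytes) -> bool:
        """Returns True if bytes were transferred, False if deduplicated."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.blobs:
            return False  # skipped transfer, observable to the client
        self.blobs[digest] = data
        return True

provider = DedupStorage()
provider.upload(b"salary-statement.pdf contents")  # victim stores a file
probe = provider.upload(b"salary-statement.pdf contents")
print(probe)  # False: the prober learns the file already existed
```

In practice the transfer/no-transfer signal would be observed through timing or network traffic rather than a return value, but the leaked bit is the same.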

4 Research Questions

The overall objective of the thesis is the construction of TETs that preserve privacy. The following two research questions are addressed in this thesis:

RQ1. What are potential threats to the privacy of users of TETs?

This question is directly addressed in Papers B and C. Paper B identifies an implementation issue in transparency logging that poses a risk of log entries becoming linkable to other log entries and users. Paper C identifies the risk posed by deduplication in cloud storage services, which may be used by TETs such as the Data Track described in Paper D. In addition, the paper highlights the risk of profiling of users if a storage service is not designed to provide unlinkability of storage and users.

Papers A and D indirectly address this research question with regard to their requirements related to security and privacy. For example, the lack of confidentiality of data disclosures (Requirement 1, Paper D), or the lack of unlinkability of log entries and users (Requirement 9, Paper A), are both examples of threats of the respective TETs to the privacy of their users.

RQ2. How can TETs be designed to preserve privacy?

Each paper in this thesis presents possible solutions to this question. Paper A presents a TET for transparency logging that preserves privacy in the sense of providing anonymous reconstruction of a log trail, while the process that generated the log trail has both unlinkable identifiers and log entries. In Paper B, a problem when implementing transparency logging is identified and possible solutions are explored. Paper C investigates side channels in cloud storage and in the process identifies several requirements that are relevant to the construction of privacy-preserving TETs that rely on cloud storage. Finally, Paper D presents a cryptographic scheme for a TET, in the form of the Data Track, that enables cloud storage to be used while preserving privacy.


5 Research Methods

The research methods used in this thesis are the scientific and mathematical methods [21, 39]. Basically, both methods (iteratively) deal with (i) identifying and characterising a question, (ii) analysing the question and proposing an answer, (iii) gathering evidence with the goal of determining the validity of the proposed answer, and (iv) reviewing the outcome of the previous steps. One difference between the two methods that is essential for the work in this thesis is their respective setting. The mathematical method is set in formal mathematical models, which are abstractions of the real world. The scientific method, on the other hand, is set exclusively in the real natural world. It focuses on studying the natural world, commonly but not necessarily with the help of mathematical tools [21].

This thesis is within the field of computer science. Broadly speaking, computer science is inherently mathematical in its nature [50], for example with regard to the formal theory of computation, but it also deals with the application of this theory in the real world, i.e., it is a science [17]. All papers in this thesis (more or less) end up in both of these domains: they deal with mathematical models that are later applied in some sense, for example by implementation. This duality can also be found within the field of cryptography, which most of the work in this thesis deals with. Basically, the field of cryptography can be split into two sub-fields: applied and theoretical cryptography. Theoretical cryptography uses the mathematical method to study the creation11 of cryptographic primitives, while applied cryptography uses the scientific method to apply the results from theoretical cryptography in the real world.

5.1 Theoretical Cryptography

Directly or indirectly, works in theoretical cryptography formally specify (i) a scheme, (ii) an adversary model, (iii) an attack, (iv) a hard problem, and (v) a proof [7, 22]. The scheme consists of protocols and algorithms that accomplish something, such as encrypting a message using a secret key. The adversary model describes what an attacker has access to and can do, for example query a decryption oracle. The attack describes the goal of the adversary, such as recovering the plaintext from a ciphertext. The hard problem is a mathematical problem that is believed, after a significant amount of research, to be hard to solve. Commonly used hard problems are, for example, the discrete logarithm problem and the integer factorisation problem [36]. Last, but not least, the proof is a formal mathematical proof showing that for an adversary to accomplish the specified attack on the scheme with non-negligible probability, within the assumed adversary model, the adversary must solve the hard problem. This is often referred to as a reduction, i.e., attacking the scheme is reduced to attacking the hard problem.
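Schematically, such a reduction can be written as a generic advantage bound (a standard template, not tied to any specific scheme in this thesis): for every adversary $\mathcal{A}$ mounting the attack, one constructs a solver $\mathcal{B}$ for the hard problem such that

```latex
\mathrm{Adv}^{\text{attack}}_{\text{scheme}}(\mathcal{A})
  \;\leq\;
\mathrm{Adv}^{\text{hard}}_{\text{problem}}(\mathcal{B})
  \;+\;
\mathrm{negl}(\lambda),
```

where $\lambda$ is the security parameter. Since the right-hand side is negligible by the hardness assumption, so is the adversary's success probability against the scheme.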

11Correspondingly, cryptanalysis is the study of how to break cryptographic systems, schemes, or primitives. The umbrella term for cryptography and cryptanalysis is cryptology.


5.2 Cryptography in this Thesis

In this thesis, the TETs found in Papers A and D have not been formally proven to be secure. We have only provided informal sketches of proofs, or argued why our TETs provide different properties. Primarily, this is due to the lack of widely accepted definitions of adversary models and goals within the respective settings. With this in mind, the added value of formally proving some property of any of our TETs is questionable at this early stage of our work [27, 28, 29, 32]. Secondarily, faced with the task of constructing privacy-preserving TETs in such settings, it is also a question of the scope of the work. Within the scope of the respective projects that led to the two privacy-preserving TETs, the work was focused on building upon prior work and identifying key properties of each TET, primarily with regard to privacy. These identified properties can be seen as a step towards sufficient adversary goals in the respective settings. In Paper A, the proposed privacy-preserving TET constitutes a cryptographic system, i.e., we investigate the requirements for deploying the TET in the real world with real-world adversaries. In Paper D, the proposed privacy-preserving TET is a cryptographic scheme, i.e., we only discuss the requirements for the TET in a formal model with a specific adversary model.

5.3 Research Method for Each Paper

Papers A, C, and D use the mathematical method to varying degrees of completeness. In Paper A, the system is formally defined and a quasi-formal adversary model is in place. In Paper C, a side channel is formally modelled together with the adversary goal. In Paper D, a scheme is formally defined, requirements are specified, and formal properties of the cryptographic building blocks are identified. However, the scheme is only informally evaluated. Creating the mathematical models for Papers A, C, and D has mainly been done through literature review in the area of theoretical cryptography. From the point of view of the mathematical method, the work done in Papers A, C, and D is incomplete. Paper D comes the closest to being complete, mainly lacking formal proofs instead of sketches. Section 5.2 discussed the motivation for this approach. Future work intends to address these shortcomings.

Papers A, B, and C use the scientific method to varying degrees. Paper A describes a system where requirements are identified for a system that also considers real-world adversaries. The evaluation of the system is done by a proof-of-concept implementation and a thorough but informal evaluation of each identified requirement. In Paper B, an implementation issue is identified and different solutions are suggested. Each suggested solution is experimentally evaluated in terms of its overhead cost on the average insert time of new log entries. We chose to perform experiments, rather than, for example, an analytical approach, because the problem was caused by an implementation issue. In Paper C, the mathematical model of the side channel is applied to several different systems and schemes, and the impact of the identified side channel is informally evaluated for each application.


6 Main Contributions

This section presents the main novel contributions of this thesis.

C1. A proposal for a cryptographic system for distributed privacy-preserving log trails. Paper A presents a novel cryptographic system for fully distributed transparency logging of data processing where the privacy of users is preserved. The system uses standard, formally verified cryptographic primitives, with the exception of the concept of cascading, described in C2. The system is informally but thoroughly evaluated. In addition, the paper also presents work on proof-of-concept implementations of both the system and enhancements made possible by introducing a trusted state provided by custom hardware. The work directly contributes to RQ2, and indirectly to RQ1 by identifying several potential threats to the privacy of the users of the system.

C2. A method for transforming public keys in discrete logarithm asymmetric encryption schemes that is useful for enabling unlinkability between public keys. Paper A presents the concept of cascading public keys. Given a public key, a method is presented that transforms (i.e., cascades) the public key into another public key, in such a way that decrypting content encrypted under the transformed public key requires knowledge of both the original private key and the cascade value c used during the transformation. The original and transformed public keys are unlinkable without knowledge of c, while the security of the transformed key is the same as that of any other key in the particular scheme, which we formally prove. This method is a key part in ensuring that the system described in Paper A preserves privacy, and therefore contributes to RQ2.
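One way such a cascade can be instantiated in a discrete-logarithm setting is sketched below. This is an illustrative toy with an unrealistically small group, and not necessarily the exact construction of Paper A: given pk = g^x, the cascaded key pk^c = g^(xc) is a regular-looking public key whose effective private key x·c requires both x and c.

```python
# Toy sketch of public-key cascading in a discrete-log group (not
# necessarily the Paper A construction; parameters far too small for
# real use). With pk = g^x, the cascade pk' = pk^c = g^(x*c) looks
# like any other public key, and recovering its private key x*c
# requires both the original private key x and the cascade value c.
import secrets

p, q, g = 23, 11, 4  # toy group: g generates the order-11 subgroup of Z_23*

x = secrets.randbelow(q - 1) + 1   # original private key
pk = pow(g, x, p)                  # original public key
c = secrets.randbelow(q - 1) + 1   # secret cascade value
pk_cascaded = pow(pk, c, p)        # transformed (cascaded) public key

# Decryption under pk_cascaded needs the effective private key x*c:
assert pk_cascaded == pow(g, (x * c) % q, p)
```

Without c, linking pk and pk_cascaded amounts to a discrete-log-type problem in the group, which is what makes the two keys unlinkable to outside observers.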

C3. A proposal for a cryptographic scheme for privacy-preserving cloud storage, where writers are minimally trusted. Paper D presents a cryptographic scheme built specifically for the Data Track, which entails a separation of concerns between writing to and reading from a custom cloud storage provider. The (potentially multiple) agents responsible for writing to the storage are minimally trusted, while the reader has the capability to both read and write. The storage provider is considered an adversary, and assumed to be passive (honest but curious). The scheme uses several known and formally verified cryptographic primitives to accomplish anonymous storage and an accountable cloud provider with regard to data integrity. The scheme itself is informally evaluated in the paper. This work directly contributes to RQ2, since the Data Track is a TET, and indirectly to RQ1 by identifying several potential threats to the privacy of the users of the Data Track.

C4. A general solution for removing the chronological order in which entries in a relational database are stored. Paper B investigates, and presents a solution for, the issue that relational databases preserve the chronological order in which entries are inserted, due to how the database functions internally. This recording of the chronological order of entries poses a threat to the unlinkability of entries, by opening up for correlation attacks with other sources of information. We generalise the problem and present a general algorithm that destroys the chronological order by shuffling the entries, with minimal impact on the performance of inserting new entries into the database. We evaluate several versions of our shuffler algorithm by experiment. This work contributes to RQ1, by identifying a particular threat, and to RQ2 by offering a solution to the problem.
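A simple shuffler of this general kind can be sketched as follows (an illustrative sketch, not necessarily one of the exact variants evaluated in Paper B): swap each newly inserted entry into a uniformly random existing slot, the online form of a Fisher–Yates shuffle, so that an entry's physical position carries no information about its insertion time.

```python
# Illustrative shuffler (not necessarily a Paper B variant): on each
# insert, swap the new entry into a uniformly random slot so that
# storage order reveals nothing about chronological order. This is
# the online ("inside-out") Fisher-Yates shuffle.
import random

def shuffled_insert(table: list, entry) -> None:
    table.append(entry)
    j = random.randrange(len(table))           # uniform slot choice
    table[-1], table[j] = table[j], table[-1]  # constant-time swap

table = []
for i in range(1000):
    shuffled_insert(table, i)
# 'table' now holds a uniformly random permutation of 0..999.
```

Each insert touches only two slots, which is why this style of shuffling can have little impact on average insert time.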

C5. Identification and formalisation of a side channel in cloud storage services. Paper C, in the setting of public cloud storage services, identifies and formalises a side channel due to the use of a technique called deduplication. We investigate the impact of the side channel on several related systems and schemes. This work indirectly contributes to RQ1, since TETs (like the Data Track described in Paper D) may use cloud storage services.

7 Summary of Appended Papers

This section summarises the four appended papers.

Paper A – Distributed Privacy-Preserving Log Trails

This technical report describes a cryptographic system for distributed privacy-preserving log trails. The system is ideally suited for enabling transparency logging of data processing in distributed settings, such as in the case of cloud services. The report contains a thorough related work section with a focus on secure logs. We further describe a software proof-of-concept implementation, enhancements made possible by using custom hardware, and a proof-of-concept implementation of a hardware component.

Paper B – Unlinking Database Entries

This paper investigates an implementation issue for a privacy-preserving logging scheme when using relational databases for storing log entries. If the chronological order of log entries can be deduced from how they are stored, then an attacker may use this information and correlate it with other sources, ultimately breaking the unlinkability property of the logging scheme. The paper investigates three different solutions for destroying the chronological order of log entries when they are stored in a relational database. Our results show that at least one of our solutions is practical, with little to no noticeable overhead on average insert times.

Paper C – (More) Side Channels in Cloud Storage

This paper explores side channels in public cloud storage services, in particular in terms of linkability of files and users when the deduplication technique is


used by the service provider by default across users. The paper concludes that deduplication should be disabled by default, and that storage services should be designed to provide unlinkability of users and data, regardless of whether the data is encrypted or not.

Paper D – Privacy-Friendly Cloud Storage for the Data Track

This paper describes a cryptographic scheme for privacy-friendly cloud storage for the Data Track. The Data Track is a TET built around the concept of providing users with an overview of their data disclosures, from which they can exercise their rights to access, rectify, and delete data stored at remote recipients of data disclosures. The scheme allows users to store their data disclosures anonymously, while the cloud provider is kept accountable with regard to the integrity of the stored data. Furthermore, the Data Track Agents that are responsible for storing data disclosures at the cloud provider are minimally trusted.

8 Conclusions and Future Work

Ensuring that TETs preserve privacy is of key importance with regard to how efficient the tools are at their primary purpose: addressing information asymmetries. TETs are becoming more and more important due to the proliferation of IT, which often leads to further information asymmetries. After all, transparency is just as important in cyberspace as in meatspace, where it has played, and continues to play, a key role in keeping entities honest in both the public and private sectors. This thesis contains four papers with the overarching goal of constructing TETs that preserve privacy. Ultimately, we hope that our work contributes to making cyberspace more just.

Future work for both the transparency logging TET and the Data Track is planned within the scope of another Google research award project and the European FP7 research project A4Cloud. The Data Track scheme for anonymous cloud storage will be generalised to the regular cloud storage setting (like Dropbox12) and used to store personas13. The Data Track itself will be further enhanced with regard to exploring how to realise ‘the right to be forgotten’ and how it can be integrated with the transparency logging. Transparency logging will be used as part of making cloud services accountable with regard to their data processing within the A4Cloud project. We plan to ultimately formally model and prove several key properties of the transparency logging scheme, within an adequate adversary model with proper adversary goals.

12 https://www.dropbox.com/, last accessed 2012-08-03.
13 Personas can be seen as profiles for users, depending on what role they play within a context.


References

[1] Rafael Accorsi. Automated privacy audits to complement the notion of control for identity management. In Elisabeth de Leeuw, Simone Fischer-Hübner, Jimmy Tseng, and John Borking, editors, Policies and Research in Identity Management, volume 261 of IFIP International Federation for Information Processing. Springer-Verlag, 2008.

[2] Rafael Accorsi. BBox: A distributed secure log architecture. In Jan Camenisch and Costas Lambrinoudakis, editors, EuroPKI, volume 6711 of Lecture Notes in Computer Science, pages 109–124. Springer, 2010.

[3] Alessandro Acquisti and Ralph Gross. Imagined communities: Awareness, information sharing, and privacy on the Facebook. In George Danezis and Philippe Golle, editors, Privacy Enhancing Technologies, volume 4258 of Lecture Notes in Computer Science, pages 36–58. Springer, 2006.

[4] Christer Andersson, Jan Camenisch, Stephen Crane, Simone Fischer-Hübner, Ronald Leenes, Siani Pearson, John Sören Pettersson, and Dieter Sommer. Trust in PRIME. In Signal Processing and Information Technology, 2005. Proceedings of the Fifth IEEE International Symposium on, pages 552–559, December 2005.

[5] Stefan Arping and Zacharias Sautner. Did SOX Section 404 make firms less opaque? Evidence from cross-listed firms. Contemporary Accounting Research, Forthcoming, 2012.

[6] Kevin Bauer, Damon McCoy, Dirk Grunwald, Tadayoshi Kohno, and Douglas Sicker. Low-resource routing attacks against Tor. In Proceedings of the 2007 ACM Workshop on Privacy in Electronic Society, WPES '07, pages 11–20, New York, NY, USA, 2007. ACM.

[7] Mihir Bellare. Practice-oriented provable-security. In Eiji Okamoto, George I. Davida, and Masahiro Mambo, editors, ISW, volume 1396 of Lecture Notes in Computer Science, pages 221–231. Springer, 1997.

[8] Ames Bielenberg, Lara Helm, Anthony Gentilucci, Dan Stefanescu, and Honggang Zhang. The growth of Diaspora - a decentralized online social network in the wild. In INFOCOM Workshops, pages 13–18. IEEE, 2012.

[9] Sonja Buchegger, Doris Schiöberg, Le Hung Vu, and Anwitaman Datta. PeerSoN: P2P social networking - early experiences and insights. In Proceedings of the Second ACM Workshop on Social Network Systems (SNS 2009), pages 46–52, Nürnberg, Germany, March 31, 2009.

[10] Jan Camenisch and Els Van Herreweghen. Design and implementation of the idemix anonymous credential system. In Vijayalakshmi Atluri, editor, ACM Conference on Computer and Communications Security, pages 21–30. ACM, 2002.


[11] Jan Camenisch, Ronald Leenes, and Dieter Sommer, editors. PRIME – Privacy and Identity Management for Europe, volume 6545 of Lecture Notes in Computer Science. Springer Berlin, 2011.

[12] Ann Cavoukian. Privacy by design. Information & Privacy Commissioner, Ontario, Canada, http://www.ipc.on.ca/images/Resources/privacybydesign.pdf, accessed 2012-07-07.

[13] Scott A. Crosby and Dan S. Wallach. Efficient data structures for tamper-evident logging. In USENIX Security Symposium, pages 317–334. USENIX Association, 2009.

[14] Scott Alexander Crosby. Efficient tamper-evident data structures for untrusted servers. PhD thesis, Rice University, Houston, TX, USA, 2010.

[15] Leucio Antonio Cutillo, Refik Molva, and Melek Önen. Safebook: A distributed privacy preserving online social network. In 12th IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WOWMOM), pages 1–3. IEEE, 2011.

[16] Roger Dingledine, Nick Mathewson, and Paul F. Syverson. Tor: The second-generation onion router. In USENIX Security Symposium, pages 303–320. USENIX, 2004.

[17] Gordana Dodig-Crnkovic. Scientific methods in computer science. In Conference for the Promotion of Research in IT at New Universities and at University Colleges in Sweden, April 2002.

[18] FIDIS WP7. D 7.12: Behavioural Biometric Profiling and Transparency Enhancing Tools. Future of Identity in the Information Society, http://www.fidis.net/resources/deliverables/profiling/, March 2009.

[19] Simone Fischer-Hübner and Matthew Wright, editors. Privacy Enhancing Technologies - 12th International Symposium, PETS 2012, Vigo, Spain, July 11–13, 2012, Proceedings, volume 7384 of Lecture Notes in Computer Science. Springer, 2012.

[20] Yossi Gilad and Amir Herzberg. Spying in the Dark: TCP and Tor Traffic Analysis. In Fischer-Hübner and Wright [19], pages 100–119.

[21] Peter Godfrey-Smith. Theory and Reality: An Introduction to the Philosophy of Science. Science and Its Conceptual Foundations. University of Chicago Press, 2003.

[22] Shafi Goldwasser and Silvio Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270–299, 1984.

[23] Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg. Side channels in cloud services: Deduplication in cloud storage. IEEE Security & Privacy, 8(6):40–47, November–December 2010.


[24] Hans Hedbom, Tobias Pulls, Peter Hjärtquist, and Andreas Lavén. Adding secure transparency logging to the PRIME Core. In Michele Bezzi, Penny Duquenoy, Simone Fischer-Hübner, Marit Hansen, and Ge Zhang, editors, Privacy and Identity Management for Life, volume 320 of IFIP Advances in Information and Communication Technology, pages 299–314. Springer Boston, 2010. doi:10.1007/978-3-642-14282-6_25.

[25] Nicholas Hopper, Eugene Y. Vasserman, and Eric Chan-Tin. How much anonymity does network latency leak? ACM Transactions on Information and System Security (TISSEC), 13(2):13:1–13:28, March 2010.

[26] Rob Jansen, Paul Syverson, and Nicholas Hopper. Throttling Tor Bandwidth Parasites. In Proceedings of the 21st USENIX Security Symposium, August 2012.

[27] Neal Koblitz. The Uneasy Relationship Between Mathematics and Cryptography. Notices of the AMS, 54(8):973–979, September 2007.

[28] Neal Koblitz and Alfred Menezes. Another look at “provable security”. Cryptology ePrint Archive, Report 2004/152, 2004. http://eprint.iacr.org/.

[29] Neal Koblitz and Alfred Menezes. Another look at “provable security” II. Cryptology ePrint Archive, Report 2006/229, 2006. http://eprint.iacr.org/.

[30] Marc Langheinrich. Privacy by design - principles of privacy-aware ubiquitous systems. In Gregory D. Abowd, Barry Brumitt, and Steven A. Shafer, editors, Ubicomp, volume 2201 of Lecture Notes in Computer Science, pages 273–291. Springer, 2001.

[31] Toby Mendel and UNESCO. Freedom of Information: A Comparative Legal Survey. United Nations Educational, Scientific and Cultural Organization, Regional Bureau for Communication and Information, 2008.

[32] Alfred Menezes. Another look at provable security. In David Pointcheval and Thomas Johansson, editors, EUROCRYPT, volume 7237 of Lecture Notes in Computer Science, page 8. Springer, 2012.

[33] Steven J. Murdoch and George Danezis. Low-cost traffic analysis of Tor. In IEEE Symposium on Security and Privacy, pages 183–195. IEEE Computer Society, 2005.

[34] Juha Mustonen and Anders Chydenius. The World's First Freedom of Information Act: Anders Chydenius' Legacy Today. Anders Chydenius Foundation publications. Anders Chydenius Foundation, 2006.

[35] United Nations Department of Economic and Social Affairs. UN e-Government Survey 2012. E-Government for the People. 2012.


[36] European Network of Excellence in Cryptology II. D.MAYA.3 - Main Computational Assumptions in Cryptography. April 2010.

[37] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website fingerprinting in onion routing based anonymization networks. In Yan Chen and Jaideep Vaidya, editors, WPES, pages 103–114. ACM, 2011.

[38] Martin Pirker, Daniel Slamanig, and Johannes Winter. Practical privacy preserving cloud resource-payment for constrained clients. In Fischer-Hübner and Wright [19], pages 201–220.

[39] George Pólya. How to solve it: a new aspect of mathematical method.Science study series. Doubleday & Company, Inc, 1957.

[40] Raluca Ada Popa, Jacob R. Lorch, David Molnar, Helen J. Wang, and Li Zhuang. Enabling security in cloud storage SLAs with CloudProof. In Proceedings of the 2011 USENIX Annual Technical Conference, USENIX ATC '11, pages 355–368, Berkeley, CA, USA, 2011. USENIX Association.

[41] PrimeLife WP4.2. End User Transparency Tools: UI Prototypes. In Erik Wästlund and Simone Fischer-Hübner, editors, PrimeLife Deliverable D4.2.2. PrimeLife, http://www.PrimeLife.eu/results/documents, June 2010.

[42] Registratiekamer, Rijswijk, The Netherlands and Information and Privacy Commissioner, Ontario, Canada. Privacy-enhancing Technologies: The Path to Anonymity (Volume I). Office of the Information & Privacy Commissioner of Ontario, 1995.

[43] Stefan Sackmann, Jens Strüker, and Rafael Accorsi. Personalization in privacy-aware highly dynamic systems. Communications of the ACM (CACM), 49(9):32–38, September 2006.

[44] Frederick Schauer. Transparency in three dimensions. University of Illinois Law Review, volume 2011, number 4, http://illinoislawreview.org/article/transparency-in-three-dimensions/, accessed 2012-06-27.

[45] Bruce Schneier and John Kelsey. Cryptographic support for secure logs on untrusted machines. In Proceedings of the 7th Conference on USENIX Security Symposium - Volume 7, SSYM '98, pages 53–62, Berkeley, CA, USA, 1998. USENIX Association.

[46] Daniel Slamanig. Efficient schemes for anonymous yet authorized and bounded use of cloud resources. In Ali Miri and Serge Vaudenay, editors, Selected Areas in Cryptography, volume 7118 of Lecture Notes in Computer Science, pages 73–91. Springer, 2011.


[47] Daniel Slamanig and Christian Hanser. A closer look at distributed cloud storage: And what about access privacy? To appear.

[48] Christopher Soghoian. The history of the Do Not Track header. http://paranoia.dubfire.net/2011/01/history-of-do-not-track-header.html, accessed 2012-07-10.

[49] The European Commission. Commission proposes a comprehensive reform of the data protection rules. http://ec.europa.eu/justice/newsroom/data-protection/news/120125_en.htm, accessed 2012-07-11.

[50] Alan M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42:230–265, July 1936.

[51] Karel Wouters, Koen Simoens, Danny Lathouwers, and Bart Preneel. Secure and privacy-friendly logging for eGovernment services. Availability, Reliability and Security, International Conference on, 0:1091–1096, 2008.