
An Internet Data Sharing Framework For Balancing Privacy and Utility

Erin E. Kenneally, CAIDA/UCSD∗
[email protected]

Kimberly Claffy, CAIDA/UCSD
[email protected]

1. INTRODUCTION

We re-visit the common assumption that the privacy risks of sharing1 Internet infrastructure data outweigh the benefits, and suggest that we have a window of opportunity in which to apply methods for undertaking empirical Internet research that can lower privacy risks while achieving research utility. The current default, defensive posture to not share network data derives from the purgatory formed by the gaps in regulation and law, commercial pressures, and evolving considerations of both threat models and ethical behavior. We propose steps for moving the Internet research stakeholder community beyond the relatively siloed, below-the-radar data sharing practices and into a more reputable and pervasive scientific discipline, by self-regulating through a transparent and repeatable sharing framework.

The threat model from not sharing data is necessarily vague, as damages resulting from knowledge management deficiencies are beset with causation and correlation challenges. And at a more basic level, we lack a risk profile for our communications fabric, partly as a result of the data dearth. Notably, society has not felt the pain points that normally motivate legislative, judicial or policy change – explicit and immediate “body counts” or billion dollar losses. But we must admit, the policies that have given rise to the Internet’s tremendous growth and support for network innovations have also rendered the entire sector opaque, unamenable to objective empirical macroscopic analysis, in ways and for reasons disconcertingly resonant with the U.S. financial sector before its 2008 meltdown. The opaqueness, juxtaposed with this decade’s proliferation of Internet security, scalability, sustainability, and stewardship issues, is a cause for concern for the integrity of the infrastructure, as well as for the security and stability of the information economy it supports [9].

Strategies to incentivize sharing by amending or enacting legislation merit consideration, and if the past is any indication, communications legislation will eventually be updated to reflect data sharing challenges [5]. However, regulation, especially in the technology arena, is largely reactive to unanticipated side-effects and damages, rather than proactive, fundamental adjustments to predictable difficulties. The behavioral advertising industry is an instructive example; the FTC deferred to industry self-regulation [11] until repeated failures at market-based management of privacy risk induced government action [10]. Furthermore, the length of the legislative policy cycle, the confluence of variables involved in changing law, and unclear influence dynamics are out of sync with the need for immediate solutions that interested stakeholder DSs and DPs can execute.

∗This project is sponsored by the U.S. Department of Homeland Security (DHS) Science and Technology (S&T) Directorate.
1By “sharing” we mean any deliberate exchange, disclosure, or release of lawfully possessed data by a Data Provider (DP) to one or more Data Seekers (DS).

Internet research stakeholders have an opportunity to tip the risk scales in favor of more protected data sharing by proactively implementing appropriate management of privacy risks. We seek to advance this objective by outlining a model – the Privacy-Sensitive Sharing (PS2) framework – that effectively manages privacy risks that have heretofore impeded more than ad hoc or nod-&-a-wink data exchanges. Our model integrates privacy-enhancing technology with a policy framework that applies proven and standard privacy principles and obligations of data seekers and data providers. We evaluate our policies and techniques along two primary criteria: (1) how they address privacy risks; and (2) how they achieve utility objectives.

2. CHALLENGES AND MOTIVATIONS

Researchers have argued that greater access to real network traffic datasets would “cause a paradigmatic shift in computer security research” [1]. While data providers (DP) acknowledge the potential benefits of sharing, they are sufficiently uncertain about the privacy-utility risk that they yield to a normative presumption that the risks outweigh potential rewards.

At present, there is no legal framework that prescribes, explicitly incentivizes, or forbids the sharing of network measurement data. Implicit incentives to share measurement data exist, but implementations have mostly floundered. Data sharing relationships that occur are market-driven or organically developed. Unsurprisingly then, there are no widespread and standard procedures for network measurement data exchange. Inconsistent, ad hoc and/or opaque exchange protocols exist, but measuring their effectiveness and benefit is challenging. A formidable consequence is the difficulty of justifying resource funding for research and other collaboration costs that would incentivize a sharing regime.

Privacy is hard to quantify, as is the utility of empirical research. Both variables are dynamic and lack normative understanding among domain professionals and the general citizenry. In addition to lacking a common vocabulary, the fields of information privacy and network science both lack structured, uniform means of analysis to parse issues and to understand and communicate requirements, as well as common specific use cases. There is no cost accounting formula for privacy liabilities, nor ROI formula for investment in empirical network traffic research. The conundrum, admittedly not unique to Internet research, is that the risk-averse data provider needs utility demonstrated before data is released, and the researcher needs data to prove utility.

The rational predilection against sharing is strengthened by an uncertain legal regime and the social costs of sensationalism-over-accuracy-driven media accounts in cases of anonymized data being reverse engineered. Although there is interest in efficient sharing of measurement data, it hangs against a backdrop of legal ambiguity and flawed solution models. This backdrop and our experiences with data-sharing inform the Privacy-Sensitive Sharing framework (PS2) we propose.

2.1 An Uncertain Legal Regime

The concept of personally identifiable information (PII) is central to privacy law and data stewardship. Ambiguity over this fundamental concept drives privacy risk assessment. Privacy presumes identity, so unless identity is defined in relation to network data artifacts, the notions of privacy and PII are already disjointed in the Internet data realm. Both the legal and Internet research communities acknowledge this unclarity regarding PII in Internet data – its definition is context-dependent, both in terms of technology and topology. Definitional inconsistencies are exacerbated as evolution of technologies and protocols increases capabilities and lowers the cost of linking network data to individuals. Against this dynamic interpretation of PII, many privacy-related laws must find practical application.

Much of the risk management challenge lies in this linguistic incongruity between the legal and technical discourse about traffic data – its definitions, classifications and distinctions. The legal perspective apportions different risk to content versus addressing, based on a necessarily limiting analogy to human communications versus machine instructions, respectively. Similarly, officers of the court associate IP addresses (IPAs) with a greater privacy risk than URLs, based on our past and still partial ability to link the former to an individual. This distinction was always artificial (albeit not unfounded) since both types of data reference a device or virtual location rather than an individual, and many URLs directly reveal much more user information than an IP address.

Our jurisprudence contributes to (or reflects) our cognitive dissonance about privacy. The U.S. legal regime has not consistently or comprehensively protected IPAs or URLs as PII, and caselaw largely fails to recognize a reasonable expectation of privacy in IPAs and URLs (i.e., no Fourth Amendment constitutional protection). Privacy statutes like HIPAA explicitly include IPAs as protected personal data, while the majority of state data protection laws do not. Juxtaposed with this disparity is a reasonable if not normative societal expectation of privacy in those traffic components (i.e., IPAs or URLs with search terms) as the digital fingerprints between anonymous bits and their carbon-based source. Technologies developed and deployed to strengthen the privacy of these components, e.g., IPA masking services such as Tor and in-browser automated URL deletion features, suggest their privacy is considered a social good.

And yet, IPA data is the principal evidentiary underpinning in affidavits for search warrants or subpoenas relied upon by private intellectual property owners (e.g., RIAA) and government investigators to identify and take legal action against the person associated with the IPA. In practice there is little functional differentiation between IPAs/URLs and other, privacy-protected PII, yet the related legal treatment is far less consistent.

2.2 Flawed Technology Models

In addition to muddy legal waters, technology models for data-sharing proposed thus far have failed to accomplish their own objectives. Efforts by the technical community focus on improving computing technologies to solve the privacy problem, in particular anonymizing data components considered privacy-sensitive. Since privacy risk is influenced by shifting contexts associated with relationships between people, data, technology and institutions, these “point” solutions inherently fail to address the range of risks associated with the lifecycle of shared data. Like networked systems themselves, the privacy threat is evolving, unpredictable, and in constant tension with user needs. It must therefore be managed accordingly, including creating and reinforcing conditions that allow privacy technology to be effective.

Researchers have advocated and supported sharing data for years [2, 16, 7]. Recognizing that these purely technical efforts have had limited success in supporting needed cybersecurity research, DHS developed the PREDICT project to formalize a policy framework for balancing risk and benefit between data providers and researchers. A policy control framework enables the technical dials to allow more privacy risk if a specific use justifies it. A classic example is how to anonymize IPAs critical to answering research questions – traces protected by prefix-preserving anonymization may be subject to re-identification risk or content observation risk, but policy controls can help data providers minimize the chances that sensitive information is misused or wrongfully disclosed.
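To make the prefix-preserving idea concrete, the following toy sketch (our illustration, not PREDICT tooling and not the Crypto-PAn implementation; the key and helper names are assumptions) derives each anonymized address bit from a keyed hash of the bits that precede it in the original address, so that two addresses sharing a real prefix also share an anonymized prefix of the same length:

# Toy prefix-preserving anonymizer (illustrative only; key and names are assumptions).
import hashlib
import hmac
import ipaddress

KEY = b"example-anonymization-key"   # hypothetical per-dataset secret

def pp_anonymize(ip: str) -> str:
    """Anonymize an IPv4 address so that addresses sharing an n-bit real
    prefix also share an n-bit anonymized prefix."""
    bits = format(int(ipaddress.IPv4Address(ip)), "032b")
    out_bits = []
    for i, bit in enumerate(bits):
        # One pseudorandom bit derived from the *original* prefix preceding position i.
        prf = hmac.new(KEY, bits[:i].encode(), hashlib.sha256).digest()[0] & 1
        out_bits.append(str(int(bit) ^ prf))
    return str(ipaddress.IPv4Address(int("".join(out_bits), 2)))

# Two hosts in the same /24 map into the same anonymized /24:
print(pp_anonymize("192.0.2.10"), pp_anonymize("192.0.2.77"))

Because the anonymized prefix structure is preserved for analysis, such traces retain re-identification and content observation risk, which is exactly the gap the policy controls above are meant to cover.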

2.3 Privacy Risks of Internet Research – the Court of Public Opinion

The privacy risks of data sharing fall into two categories: disclosure and misuse. Although we focus on the risks of data sharing, our model can also be applied to initial data collection, which also poses significant privacy risks that should not be ignored.

Public disclosure is the act of making information or data readily accessible and available to the general public via publication or posting on the web, including log files with IP addresses of likely infected hosts.

Accidental or malicious disclosure is the act of making information or data available to third parties as a result of inadequate data protection [4].

Compelled disclosure to third parties risk arises with the obligations attendant to possessing data, such as having to respond to RIAA subpoenas requesting data disclosure in lawsuits.

Government disclosure involves the release of data to government entities, illustrated in the NSA wiretapping revelations and subsequent EFF lawsuits [8].

Misuse of user or network profiles arises with network traffic that contains proprietary or security-sensitive information, or reveals user behaviors, associations, preferences or interests – knowledge that attackers, or advertisers, can then exploit.

Inference misuse risk involves synthesizing first- or second-order PII to draw (possibly false and damaging) implications about a person’s behavior or identity.

Re-identification or de-anonymization misuse risk. Anonymization involves obfuscating sensitive PII by replacing it completely or partially with synthetic identifiers, or using aggregation or statistical techniques intended to break the connection between the persons and reference identifiers. Re-identification or de-anonymization, conversely, involves reversing data masks to link an obfuscated identifier with its associated person. Shared anonymized data poses a misuse risk because it is vulnerable to re-identification attacks that make use of increasingly available public or private information beyond the knowledge or control of the original or intermediate data provider [15, 13].
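As a concrete illustration of this tension between masking and linkability, the sketch below (ours, not a mechanism prescribed by PS2; the key, field names, and record layout are assumptions) replaces IP addresses in a flow log with keyed-hash pseudonyms. The pseudonyms stay consistent so records remain joinable for analysis, which is precisely the property that re-identification attacks exploit:

# Minimal sketch: replacing IP addresses in flow records with synthetic,
# keyed-hash identifiers before sharing. Key, fields, and records are
# illustrative assumptions, not part of the PS2 framework itself.
import hashlib
import hmac

SECRET_KEY = b"per-dataset-secret"   # hypothetical; discarding it later limits reversibility

def pseudonymize_ip(ip: str) -> str:
    """Return a synthetic identifier that stands in for an IP address."""
    digest = hmac.new(SECRET_KEY, ip.encode(), hashlib.sha256).hexdigest()
    return "host-" + digest[:12]

flows = [
    {"src": "198.51.100.7", "dst": "203.0.113.9", "bytes": 1420},
    {"src": "198.51.100.7", "dst": "192.0.2.50",  "bytes": 88},
]
shared = [dict(f, src=pseudonymize_ip(f["src"]), dst=pseudonymize_ip(f["dst"]))
          for f in flows]

# Consistent pseudonyms keep records linkable for analysis, which is why such
# data remains exposed to the re-identification attacks described above when
# joined with external information [15, 13].
print(shared)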

De-anonymization risk bears special consideration given the growing incongruity around PII. DPs face increasing legal and societal pressures to protect the expanding amounts of PII they amass for legitimate business purposes. Yet, DPs are under equal pressure from the marketplace to uncover and exploit PII in order to better connect supply and demand and increase profit margins on their goods and services. DPs will have a growing interest in anonymization to avoid triggering laws that hinge on definitions of PII which exempt aggregate or anonymized data. In the meantime, both legitimate and criminal consumers of PII are motivated to advance de-anonymization techniques to uncover sensitive identity data, as well as strategies to extract information without direct access to data by ‘sending code to the data’ [12]. Similar to the arms race between exploits and defenses in the systems security arena, de-anonymization techniques will become commoditized, lowering the barrier to extracting PII for investigative reporting, law enforcement, business intelligence, research, legal dispute resolution, and the presumed criminal threatscape.

2.4 Communicating the Benefits of Research

Internet measurement supports empirical network science [6], which pursues increased understanding of the structure and function of critical Internet infrastructure, including topology, traffic, routing, workload, performance, and threats and vulnerabilities. But network researchers and funding agencies have yet to outline a network science agenda [3], partly due to their lack of visibility into the nation’s most critical network infrastructure, but also because the field is quite young relative to traditional scientific disciplines. In light of this challenge, we offer the following criteria to help measure and communicate empirical network research utility:

• Is the objective for data use positively related to social welfare?

• Is there a need for such empirical research?

• Is such network research already conducted?

• Could the research be conducted without the data?

• Is sufficiently similar data already being collected elsewhere that could be shared?

• Are research and peer review methods as transparent, objective, and scientific as possible while remaining responsive to privacy concerns?

• Can the results be acted upon meaningfully?

• Are the results capable of being integrated into operational or business processes, or into security improvements, e.g., situational awareness of critical infrastructure?

3. PS2 FRAMEWORK: DESCRIPTION AND EVALUATION

We propose a repeatable process for data sharing which employs and enforces these techniques – a process that supports sharing of not only the data, but also the obligations and responsibilities for privacy risk. We describe our Privacy-Sensitive Sharing Framework and then evaluate its ability to address the privacy risks outlined in 2.3 and the utility criteria in 2.4. A core principle of the framework is that privacy risks associated with shared Internet data are contagious – if the data is transferred, responsibility for containing the risk lies with both provider and seeker of data. Recognizing that privacy risk management is a collective action problem, the PS2 hybrid framework contains this risk by replicating the collection, use, disclosure and disposition controls over to the data seeker.

Our strategy presumes that the DP has a lawful ownership or stewardship right to share the network data, and that the associated data privacy obligations travel commensurate with those rights. Sharing data does not discharge the DP from control or ownership obligations. This inextricable connection is essential to engendering the trust that facilitates data exchanges.

3.1 Components of the PS2 Framework

The components of our framework are rooted in the principles and practices that underlie privacy laws and policies on both the national and global levels.2

2In particular, the Fair Information Practices (FIPS) are considered de facto international standards for information privacy and address collection, maintenance, use, disclosure, and processing of personal information.

• Authorization – Internal authorization to share with exchange partner(s) involves explicit consent of the DP and DS, and may require the consent of individuals identified or identifiable in network traffic, which can often be implicit by way of proxy consent with the DP. Requirements for consent to Internet traffic monitoring are unresolved, but will no doubt be a part of forthcoming legal, policy and community decisions. In addition, the DP and DS should obtain some external explicit authorization to share data, such as through an external advisory or publication review committee, or an organization’s Institutional Review Board (IRB).

• Transparency – The DP and DS should agree on the objectives and obligations associated with shared data. Terms might require that algorithms be made public but data or conclusions protected, or vice-versa. Transparency may require validating analytic models or algorithms. It should be possible to apply scientific principles of testability and falsifiability to analytical conclusions.

• Compliance with applicable law(s) – Collection and use of data should comport with laws that have an authoritative interpretation about proscribed behaviors or mandated obligations.

• Purpose adherence – The data should be used to achieve its documented purpose for being shared.

• Access limitations – The shared data should be restricted from non-DS partners (government, third parties) and those within the DS organization who do not have a need and right to access it.

• Use specification and limitation – Unless otherwise agreed, the DP should disallow merging or linking identifiable data contained in the traffic data.

• Collection and Disclosure Minimization – The DS should apply privacy-sensitive techniques to stewardship of the network traffic (see the sketch following this list), such as:

  A. Deleting all sensitive data.
  B. Deleting part(s) of the sensitive data.
  C. Anonymizing, or otherwise de-identifying, all or parts of sensitive data.
  D. Aggregation or sampling techniques, such as scanning portions of networks and generalizing results.
  E. Mediation analysis, e.g., ‘sending code to the data’ (or a person) in lieu of sharing data.
  F. Aging the data such that it is no longer current or linkable to a person.
  G. Size limitations, e.g., minimizing the quantity of traces that are shared.
  H. Layered anonymization, e.g., applying multiple levels of anonymization to lower the effectiveness of de-identification countermeasures.

• Audit tools – Techniques for provable compliance with policies for data use and disclosure, e.g., secure audit logging via a tamper-resistant, cryptographically protected device connected to, but separate from, the protected data, and accounting policies to enforce access rules on protected data (a hash-chained logging sketch follows this list).

• Redress mechanisms – Technology and procedures to address harms from erroneous use or disclosure of data, including feedback mechanisms to support corrections of data or erroneous conclusions.

• Oversight – Following authorization, any systemic sharing or program needs subsequent third-party checks and balances, such as Institutional Review Boards (IRBs), external advisory committees, or a sponsoring organization.

• Quality data and analyses assurances – Processes should reflect awareness by the DS and DP of false positives and inference confidence levels associated with the analyses of shared, privacy-sensitive data.

• Security – Controls should reasonably ensure that sensitive PII is protected from unauthorized collection, use, disclosure, and destruction.

• Training – Education and awareness of the privacy controls and principles by those who are authorized to engage the data.

• Impact assessment – Experiment design should consider affected parties and potential collateral effects, with the minimal standard of doing no further harm [14]. Considerations include possible psychological harm, physical harm, legal harm, social harm and economic harm.


• Transfer to third parties – Further disclosure should be prohibited unless the same data control obligations are transferred to anyone with whom the data is shared, proportionate to the disclosure risk of that data.

• Privacy Laws – Existing laws can provide enforceable controls and protections that can help shape sharing dynamics. For example, the Electronic Communications Privacy Act has well-defined parameters for traffic content disclosure based on whether the DS is a government entity.
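Several of the minimization techniques listed under Collection and Disclosure Minimization are mechanically simple. The sketch below is our illustration under assumed field names and granularities, not part of the framework itself; it combines aggregation to /24 prefixes (technique D), coarsening timestamps so records are no longer current or precisely linkable (F), and capping the number of released records (G):

# Illustrative-only minimization pass over flow records before sharing.
# Field names, granularities, and the record cap are assumptions for the sketch.
import ipaddress
from typing import Dict, List

def minimize(records: List[Dict], max_records: int = 1000) -> List[Dict]:
    out = []
    for r in records[:max_records]:                      # G: size limitation
        net = ipaddress.ip_network(r["src"] + "/24", strict=False)
        out.append({
            "src_prefix": str(net),                      # D: aggregate to a /24 prefix
            "day": r["timestamp"][:10],                  # F: age/coarsen time to the day
            "bytes": r["bytes"],                         # retained measurement value
        })
    return out

sample = [{"src": "198.51.100.77", "timestamp": "2009-06-01T13:37:42Z", "bytes": 512}]
print(minimize(sample))
# [{'src_prefix': '198.51.100.0/24', 'day': '2009-06-01', 'bytes': 512}]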
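For the Audit tools component, one widely used building block is a tamper-evident, hash-chained access log. The sketch below is an assumption-laden illustration (entry fields, actors, and storage are ours), not a specification from the framework; each entry commits to its predecessor’s hash, so any later modification or deletion breaks verification:

# Minimal hash-chained audit log sketch; entry fields and storage are assumed.
import hashlib
import json
import time

class AuditLog:
    def __init__(self):
        self.entries = []
        self.prev_hash = "0" * 64                      # genesis value

    def record(self, actor: str, action: str, dataset: str) -> None:
        entry = {"ts": time.time(), "actor": actor, "action": action,
                 "dataset": dataset, "prev": self.prev_hash}
        entry_hash = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.entries.append((entry, entry_hash))
        self.prev_hash = entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or dropped entry changes a hash."""
        prev = "0" * 64
        for entry, stored in self.entries:
            if entry["prev"] != prev:
                return False
            if hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest() != stored:
                return False
            prev = stored
        return True

log = AuditLog()
log.record("analyst@ds.example", "query", "trace-2009-06")
log.record("analyst@ds.example", "export-aggregate", "trace-2009-06")
print(log.verify())   # True unless entries were altered or removed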

3.2 Vehicles for Implementing PS2s

Given the legal grey areas and ethical ambiguity around disclosure and use of network measurement data discussed in Section 2.1, we recommend MOUs, MOAs, model contracts, and binding organizational policy as enforceable vehicles for addressing privacy risk both proactively and reactively. For intended wider disclosure of data it may be cost-preferential and risk-proportional to release data under a unidirectional (blanket) AUP, which trades off the per-sharing negotiation costs associated with bilateral agreements against potentially lower DS privacy protections and greater DP enforcement and compliance costs.

The research community would benefit from developing and publishing a reference set of MOUs which embed this or a similar framework, in support of a common interpretation of acceptable sharing and reasonable practices. Recognizing that the law is not a one-way proscription, but ultimately reflects and institutionalizes community norms, network researchers have a window of opportunity to pre-empt privacy risks in legally grey areas, while informing policymakers of possibilities they are not well-positioned to discern themselves.

3.3 How well the PS2 addresses privacy risks

We evaluate the effectiveness of the Privacy-Sensitive Sharing Framework in balancing the risks and benefits by: (1) assessing how it addresses the privacy risks outlined in 2.3; and (2) assessing whether it impedes the utility goals described in 2.4. Tables 1 and 2 illustrate the value of the hybrid policy-technology model.

Although the risks and benefits we enumerate are not all-inclusive, they are a strong foundation for engaging in privacy-utility risk management. For a DS, it provides a tool to examine whether the proposed research balances the privacy risks and utility rewards. For an oversight committee, it helps determine whether possible risks are justified, by explicitly asking the user to assess sharing risks against technical and policy controls, as well as to assess the achievement of utility goals against those controls. For a prospective DP, the assessment will assist the determination of whether or not to participate. This framework can also be applied as a self-assessment tool for data sharing engagements.

3.4 How well the PS2 promotes utility goals

Table 1 suggests that the Minimization techniques (the technical controls discussed in 3.1) partly or completely address privacy risks, implying little need for a policy control backdrop. However, when evaluating how the Minimization techniques fare against the Utility Goals in Table 2, they mostly fail, revealing the limits of a one-dimensional technical approach. This failure is unsurprising, since data minimization techniques intentionally obfuscate information often essential to network management and troubleshooting, countering security threats, and evaluating algorithms, applications, and architectures. A purely technical approach breaks down along the utility dimension, justifying a hybrid strategy that addresses both privacy risk and utility goals.

An evaluation of the PS2 framework should also consider practical issues such as education costs, whether new privacy risk(s) are introduced, whether control(s) are forward-looking only or also address legacy privacy risks, and free rider problems created by non-participants who choose not to share.

4. CONCLUSIONS

We proposed a Privacy-Sensitive Sharing framework that offers a consistent, transparent and replicable methodology for risk-benefit evaluation, rather than relying on subjective, opaque and inconsistent evaluations that turn on “trust me” decision metrics. In designing the framework we have considered practical challenges confronting security professionals, network analysts, systems administrators, researchers, and related legal advisors. We have also emphasized the proposition that privacy problems are exacerbated by a shortage of transparency surrounding the who, what, when, where, how and why of information sharing that carries privacy risks. Transparency is defeated if one of the sharing parties is dishonest about its objectives, or if the DP’s or DS’s organizational motivations supersede the desire to engender trust. We developed PS2 to enable transparency as a touchstone of data-sharing.

Documented experimentation with protected data-sharing models like PS2 will allow those professionals with access to ground truth about privacy risks and measurement rewards to practically influence policy and law at these crossroads. Indeed, we believe a window of opportunity exists in this time and space of uncertainty in interpreting and applying privacy-related laws to the sharing of network traffic and other data for research. Rather than wait for law and policymakers to magically “get it right”, it behooves infrastructure operators and researchers to self-regulate, both to inform any eventual top-down regime and until such a regime provides clarity. Information security controls were initially considered a financial liability until regulations rendered lack of security a compliance liability. We optimistically anticipate circumstances that reveal that, rather than data-sharing being a risk, not sharing data is a liability. We offer the PS2 as a tool to help community mindsets move (or, in some cases, continue moving) in that direction.

Table 1: Privacy risks evaluated against the PS2 privacy protection components. (Minimization refers to the techniques evaluated in Table 2.) Components (rows): Authorization, Transparency, Law Compliance, Access Limitation, Use Specification, Minimization, Audit Tools, Redress, Oversight, Data Quality, Security, Training/Education, Impact Assessment. Privacy risks (columns): Public Disclosure, Compelled Disclosure, Malicious Disclosure, Government Disclosure, Misuse, Inference Risk, Re-identification Risk. [Per-cell markings omitted.]

Table 2: PS2 minimization (of collection and disclosure) techniques evaluated against utility. Minimization techniques (rows): Not Sharing, Delete All, Delete Part, Anonymize, Aggregate, Mediate (SC2D), Age Data, Limit Quantity, Layer Anonymization. Utility and practicality criteria (columns): Is the purpose worthwhile? Is there a need? Is it already being done? Are there alternatives? Is there a scientific basis? Can results be acted upon? Can DS & DP implement? Reasonable education costs? Forward & backward controls? No new privacy risks created? No free rider problem created? [Per-cell markings omitted.]

5. REFERENCES

[1] Allman, M., and Paxson, V. Issues and etiquette concerning use of shared measurement data. In IMC (2007).
[2] Allman, M., Paxson, V., and Henderson, T. Sharing is caring: so where are your data? ACM SIGCOMM Computer Communication Review 38, 1 (Jan 2008).
[3] Computing Research Association. Network Science and Engineering (NetSE) Council, 2008. http://www.cra.org/ccc/netse.php.
[4] Barbaro, M., and Zeller Jr., T. A Face is Exposed for AOL Searcher No. 4417749. New York Times (Aug 2006).
[5] Burstein, A. Amending the ECPA to enable a culture of cybersecurity research. Harvard Journal of Law & Technology 22, 1 (December 2008), 167–222.
[6] Duke, C. B., et al., Eds. Network Science. The National Academies Press, Washington, 2006.
[7] CAIDA. DatCat. http://www.datcat.org.


[8] Cauley, L. NSA has massive database of Americans’ phone calls. USA Today (May 2006).
[9] Claffy, K. Ten Things Lawyers should know about Internet research, August 2008. http://www.caida.org/publications/papers/2008/.
[10] Federal Trade Commission. FTC Staff Report: Self-Regulatory Principles For Online Behavioral Advertising, February 2009. http://ftc.gov/os/2009/02/P085400behavadreport.pdf.
[11] Federal Trade Commission. Online behavioral advertising: Moving the discussion forward to possible self-regulatory principles, 2007. http://www.ftc.gov/os/2007/12/P859900stmt.pdf.
[12] Mogul, J. C., and Arlitt, M. SC2D: an alternative to trace anonymization. In SIGCOMM MineNet Workshop (2006).
[13] Narayanan, A., and Shmatikov, V. Robust De-anonymization of Large Sparse Datasets. IEEE Symposium on Security and Privacy (2008). http://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf.
[14] NIH. The Belmont Report – Ethical Principles and Guidelines for the Protection of Human Subjects of Research.
[15] Porter, C. De-Identified Data and Third Party Data Mining: The Risk of Re-Identification of Personal Information. Shidler Journal of Law, Commerce and Technology, 3 (Sept. 2008). http://www.lctjournal.washington.edu/Vol5/a03Porter.html.
[16] Paxson, V. The Internet Traffic Archive. http://ita.ee.lbl.gov/.
