Directions in Big Data Anonymisation
Josep Domingo-Ferrer
Universitat Rovira i Virgili, Tarragona, Catalonia
Cambridge, the 5th of December, 2016
1 Introduction
2 Big data, law and ethics
3 Nihilists: no privacy possible with big data
4 Fundamentalists: privacy even if data become useless
5 Desiderata in big data anonymization
6 Big data protection under k-anonymity
7 Big data protection under differential privacy
8 Transparent, local and collaborative anonymization
9 Conclusions and further research
Introduction
Big data have become a reality with the new millennium.
Any human activity leaves a digital track that someone collects and stores:
Sensors of the Internet of Things
Social media
Machine-to-machine communication
Mobile video, etc.
Example: Big data from Internet
Example: Big data from smart cities and IoT
Distinguishing features of big data
Volume. The volume of data in the digital universe reached 9.5 billion petabytes in 2015 (9.5 × 10^24 bytes), with an increase of 3 billion petabytes over 2014 (Meeker 2016).
Velocity. Most data are no longer static data sets, but dynamic data. On-line data can be harvested at millions of events per second (e.g. by sensors).
Variety. Data come from several sources and in different formats (numerical, categorical, unstructured text, audio, video, etc.).
New big data technologies
Storage. New technologies have arisen to replace traditional structured storage, like Hadoop, NoSQL, MapReduce, etc.
Data science. Conventional statistics tries to infer population properties from (small) samples. Data science leverages practically all the data of the population of interest. Data variety and volume allow new and very sophisticated analyses.
Big data threat on privacy
While big data are very valuable in many fields, they increasingly threaten the privacy of the individuals about whom they are collected (who are often unaware of the collection).
E.g. a retail chain's prediction model guessed the pregnancy of a teenager before her parents did (Duhigg 2012).
Statistical disclosure control
Statisticians and computer scientists have worried about disclosure risk since the time of small data.
Statistical disclosure control (SDC, Hundepool et al. (2012)) seeks to allow useful inferences on data while preserving the privacy of the subjects to whom the records correspond.
SDC techniques are available for microdata (data sets with records corresponding to individuals), tabular data and on-line queryable databases.
SDC-protected data are also called anonymized data.
Utility-first and privacy-first SDC
Utility-first anonymization (iteratively changing parameters until the empirical disclosure risk is low enough, as usual in official statistics) is slow and lacks formal privacy guarantees.
Privacy-first anonymization (based on enforcing a privacy model, like k-anonymity, t-closeness or ε-differential privacy) may lead to poor data utility/linkability depending on the parameter choice.
Big data, law and ethics
Big data involve collecting all possible data and extracting knowledge from them, possibly using innovative methods.
This conflicts with the privacy of individuals, especially since the data subject (consumer, citizen) is often unaware of providing her data.
The service provider obtains the data as a result of a transaction (e.g. an on-line purchase), in return for a free service (e.g. social media or e-mail) or as a natural requirement of a service (e.g. location when using GPS).
Personal data protection principles in the EU law
Personal data or, more precisely, personally identifiable information (PII), means any information relating to an identified or identifiable natural person.
Principles applicable to PII before big data (Art. 29 Data Protection Working Party, new General Data Protection Regulation; see D'Acquisto et al. (2015)):
Lawfulness (consent obtained, or processing needed for a contract, a legal obligation, the subject's vital interests, a public interest, or legitimate processor's interests compatible with the subject's rights)
Consent (simple, specific, informed and explicit)
Purpose limitation (legitimate and specified before collection)
Personal data protection principles in the EU law (II)
Necessity and data minimization (collect only what is needed and keep it only as long as needed)
Transparency and openness (subjects need to get information about collection and processing in a way they understand)
Individual rights (to access, rectify, erase/be forgotten)
Information security (collected data protected against unauthorized access and processing, manipulation, loss, destruction, etc.)
Accountability (ability to demonstrate compliance with the principles)
Data protection by design and by default (privacy built in from the start rather than added later)
Personal big data conflict with principles
Big data result from collecting and linking data from several sources, often in a continuous way.
Unless personal data are anonymized, there are potential conflicts with the above principles:
Purpose limitation. Big data are often used secondarily for purposes not even known at collection time.
Consent. If the purpose is not clear, consent cannot be obtained.
Lawfulness. Without purpose limitation and consent, lawfulness is dubious.
Necessity and data minimization. Big data result precisely from accumulating data for potential use.
Individual rights. Individuals do not even know which data are stored on them.
Accountability. Compliance does not hold and hence cannot be demonstrated.
Reactions to the big data vs privacy conflict
Minimize privacy. To avoid hindering technology development, privacy protection should be limited to preventing privacy-damaging data uses. Data collection should be free or self-regulated.
Minimize collection. Proponents of this view regard data collection as the primary privacy problem, and they advocate minimizing it. Indeed, there are threats tied to data collection itself.
Threats tied to data collection
Data breaches. The more data are collected, the more attractive they are to potential attackers.
Misuse by employees (Chen 2010). The data controller's employees may misuse the data.
Undesired secondary use. Health data of an opponent of contraceptives may be used to develop new contraceptives.
Changes in corporate practice. The privacy pledge by a data collector may change (e.g. WhatsApp recently decided one-sidedly to share the phone numbers of their users with Facebook).
Government access without due legal guarantees (Solove 2011). For example, NSA access to data of users of the big Internet companies.
The anonymization solution
Anonymization is a possible way to overcome the conflict between big data and privacy.
The conflict relates to PII, but one may argue there are no longer PII after anonymization, so that no protection is needed.
Yet, anonymization of big data faces several challenges...
Challenges in big data anonymization
Too little anonymization, for example mere de-identification (suppression of identifiers), may be insufficient to prevent re-identification (Barbaro and Zeller 2006).
This is especially problematic with big data, whose volume and variety facilitate re-identifying subjects.
Too much anonymization may prevent data corresponding to the same or similar subjects from being linked, which hinders big data construction.
Nihilists: no privacy possible with big data
Nihilists: privacy must be sacrificed
Privacy to be sacrificed to security. Governments (anti-terrorist fight). Companies (biometric identification of employees or customers, which breaks privacy without always guaranteeing more security).
Privacy to be sacrificed to functionality. Free web applications and mobile apps (search engines; Google Calendar, Street View, Latitude, etc.).
Privacy to be sacrificed to functionality and security. Data collected by Internet companies by means of free applications may be leaked to governments (Snowden on the NSA).
The pragmatic nihilists: data brokers
They give no arguments, but they collect all personal data they can possibly find (web, social media, etc.) or buy (loyalty programs, on-line commerce, etc.).
They cluster all information corresponding to the same person, to obtain personal profiles.
They sell those profiles to whoever buys them, typically personalized marketing companies.
Several data brokers operate in the U.S., among which Acxiom accumulates data on over 700 million people worldwide (FTC 2014).
Data brokers threaten privacy even more than Internet companies do, because the former are unknown to the public.
Data broker activity
The extreme nihilists
They claim that aspiring to any privacy in the big data society is delirious (e.g. Teradata's CTO).
The best people can expect is for data collectors not to misuse their data (which subjects cannot verify).
Fundamentalists: privacy even if data become useless
Statistical disclosure control was inaugurated by Dalenius (1977). See Hundepool et al. (2012) for the state of the art.
Later, privacy-preserving data mining (PPDM) arose in computer science as a parallel to SDC (Agrawal and Srikant 2000).
Computer scientists contributed the notion of privacy models, which specify ex ante parameterizable privacy guarantees.
They are enforced by using one (or several) anonymization methods.
Privacy models with very stringent privacy parameters may render data useless for exploratory analysis.
Privacy models: k-anonymity
k-Anonymity (Samarati and Sweeney 1998)
A data set is said to satisfy k-anonymity if each combination of values of the quasi-identifier attributes in it is shared by at least k records (a k-anonymous class).
⟹ Usually enforced via generalization and suppression of quasi-identifiers, but also reachable via microaggregation (Domingo-Ferrer and Torra 2005).
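The definition can be checked mechanically. Below is a minimal Python sketch (the toy table and attribute names are illustrative, not from the slides): it counts how many records share each quasi-identifier combination.

```python
from collections import Counter

def is_k_anonymous(records, qi_attrs, k):
    """True iff every combination of quasi-identifier values
    is shared by at least k records."""
    combos = Counter(tuple(r[a] for a in qi_attrs) for r in records)
    return all(count >= k for count in combos.values())

# Hypothetical generalized table: Age and ZIP act as quasi-identifiers.
data = [
    {"Age": "3*", "ZIP": "436**", "Disease": "flu"},
    {"Age": "3*", "ZIP": "436**", "Disease": "cancer"},
    {"Age": "4*", "ZIP": "437**", "Disease": "flu"},
    {"Age": "4*", "ZIP": "437**", "Disease": "flu"},
]
print(is_k_anonymous(data, ["Age", "ZIP"], k=2))  # True: each QI combination appears twice
```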
Example: a 2-anonymous data set
Privacy models that extend k-anonymity
l-Diversity (Machanavajjhala et al. 2006)
A data set is said to satisfy l-diversity if, for each group of records sharing a combination of quasi-identifier attribute values, there are at least l "well-represented" values for each confidential attribute.
t-Closeness (Li et al. 2007)
A data set is said to satisfy t-closeness if, for each group of records sharing a combination of quasi-identifier attribute values, the distance between the distribution of the confidential attribute in the group and the distribution of the attribute in the whole data set is no more than a threshold t.
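Li et al. measure the distance with the Earth Mover's Distance; for a categorical attribute under the equal-distance ground metric this reduces to the total variation distance, which the sketch below uses (the data are hypothetical):

```python
from collections import Counter

def distribution(values):
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}

def tv_distance(p, q):
    # Total variation distance = half the L1 distance between distributions.
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0) - q.get(x, 0)) for x in keys)

def satisfies_t_closeness(groups, t):
    """groups: lists of confidential-attribute values, one per QI class."""
    whole = distribution([v for g in groups for v in g])
    return all(tv_distance(distribution(g), whole) <= t for g in groups)

groups = [["flu", "flu", "cancer"], ["flu", "cancer", "cancer"]]
print(satisfies_t_closeness(groups, t=0.2))  # True: each class is within 1/6 of the global distribution
```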
Privacy models: ε-differential privacy
ε-Differential privacy (Dwork 2006)
A randomized query function F gives ε-differential privacy if, for all data sets D1, D2 such that one can be obtained from the other by modifying a single record (neighbor data sets), and all S ⊆ Range(F):
Pr(F(D1) ∈ S) ≤ exp(ε) × Pr(F(D2) ∈ S).
Usually enforced via Laplace noise addition.
Later extended to data set publishing (Soria-Comas et al. 2014; Xiao et al. 2014; Xu et al. 2012; Zhang et al. 2014).
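As a concrete illustration of enforcement via Laplace noise, here is a sketch of an ε-differentially private count query (the data and predicate are hypothetical; a count has sensitivity 1 because modifying one record changes it by at most 1):

```python
import random

def laplace_noise(scale):
    # The difference of two i.i.d. Exponential(1/scale) draws is Laplace(0, scale).
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_count(records, predicate, epsilon):
    """epsilon-DP count: sensitivity is 1, so the noise scale is 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(scale=1 / epsilon)

ages = [25, 42, 67, 31, 58]
print(dp_count(ages, lambda a: a > 40, epsilon=0.5))  # true count 3 plus Laplace noise
```

Smaller ε means a larger noise scale, hence more privacy but less accuracy.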
Privacy models: ε-differential privacy (II)
Relaxations of differential privacy
Strict differential privacy is problematic for the exploratory uses typical of big data.
Relaxations of it are being proposed:
Mohan et al. (2012) claim that less protection is needed for older data.
Machanavajjhala and Kifer (2015) restrict the definition of neighbor data sets: the differences between the differing records are bounded.
Dwork and Rothblum (2016) propose concentrated differential privacy, whereby the DP guarantee may be violated with a small probability.
Desiderata in big data anonymization
Anonymized big data that are published should yield results similar to those obtained on the original big data for a broad range of exploratory analyses.
They should not allow unequivocal reconstruction of any subject's profile.
A privacy model for big data should satisfy at least (Soria-Comas and Domingo-Ferrer 2015):
Composability
(Quasi-)linear computational cost
Linkability
Composability
A privacy model is composable if its privacy guarantee holds (perhaps in a limited way) after repeated application.
In other words, a privacy model is not composable if pooling independently released data sets, each of which satisfies the model separately, can lead to a violation of the model.
Composability can be evaluated between data sets satisfying the same privacy model, different privacy models, or between an anonymized data set and a non-anonymized data set (the latter is the most demanding case).
Composability is needed to cope with the velocity and variety features of big data.
(Quasi-)linear computational cost
Low cost is needed to cope with the volume feature of big data.
Normally, there are several SDC methods that can be used to satisfy a privacy model.
The computational cost depends on the selected method.
The desirable cost would be O(n), or at most O(n log n), for a data set of n records.
For methods with higher cost, blocking can be used, but it can damage the utility and/or privacy of the resulting data.
Linkability
In big data, the information on a particular subject is collected from several sources (variety feature of big data).
Hence, the ability to link records corresponding to the same individual or to similar individuals is critical.
Thus, anonymizing data at the source should preserve linkability to some extent.
But... linking records corresponding to the same subject decreases the subject's privacy ⟹ the accuracy of linkage should be lower with anonymized data sets than with original data sets.
Big data protection under k-anonymity
In a context of big data, it is hard to determine the subset of quasi-identifier (QI) attributes (attributes that can be used by an attacker to link with external identified databases).
The safest option is to consider that all attributes are QI attributes.
Composability of k-anonymity
k-Anonymity was designed to protect a single data set and is not composable in principle.
If several k-anonymous data sets that share some subjects have been published, the attacker can mount an intersection attack to discard some records in the k-anonymous classes as not corresponding to the target subject (based on the latter's confidential attributes).
To reach composability, the controllers ought to coordinate so that, for the subjects shared by two data sets, their k-anonymous classes contain the same k subjects.
If such coordination is infeasible, see Domingo-Ferrer and Soria-Comas (2016) for alternative strategies.
Intersection attack against k-anonymity
R1, ..., Rn ← n independent data releases
P ← population consisting of the subjects present in all of R1, ..., Rn
for each individual i in P do
    for j = 1 to n do
        e_ij ← equivalence class of Rj associated with i
        s_ij ← set of confidential values of e_ij
    end for
    S_i ← s_i1 ∩ s_i2 ∩ ... ∩ s_in
end for
return S_1, ..., S_|P|
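The steps above can be sketched in Python. The releases below are hypothetical 2-anonymous data sets, each mapping a shared subject to the set of confidential values in her equivalence class:

```python
def intersection_attack(releases):
    """releases: dicts mapping subject id -> set of confidential values
    in that subject's k-anonymous class. Returns, per subject present in
    all releases, the confidential values surviving the intersection."""
    population = set(releases[0])
    for r in releases[1:]:
        population &= set(r)
    return {i: set.intersection(*(r[i] for r in releases)) for i in population}

# Two hypothetical 2-anonymous releases sharing subjects s1 and s2:
r1 = {"s1": {"flu", "cancer"}, "s2": {"flu", "cancer"}}
r2 = {"s1": {"cancer", "hepatitis"}, "s2": {"flu", "hepatitis"}}
print(intersection_attack([r1, r2]))  # each subject's confidential value is pinpointed
```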
Computational cost of k-anonymity
k-Anonymity is attained by modifying the values of QI attributes, either by combining generalization and suppression (Samarati and Sweeney 1998) or via microaggregation (Domingo-Ferrer and Torra 2005).
Optimal generalization/suppression and optimal microaggregation are NP-hard problems.
Using heuristics and blocking, one can reach O(n log n) complexity, where n is the number of records.
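As an illustration of such a quasi-linear heuristic, here is a simplified univariate microaggregation sketch (not the exact algorithm of the cited papers): sort the values, partition them into groups of at least k consecutive values, and replace each value by its group mean; sorting dominates, giving O(n log n).

```python
def univariate_microaggregation(values, k):
    """Replace each value by the mean of its group of >= k consecutive
    values (after sorting). Cost is dominated by the sort: O(n log n)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    n, i = len(values), 0
    while i < n:
        # The last group absorbs the remainder so no group is smaller than k.
        j = n if n - i < 2 * k else i + k
        group_mean = sum(values[g] for g in order[i:j]) / (j - i)
        for g in order[i:j]:
            out[g] = group_mean
        i = j
    return out

print(univariate_microaggregation([10, 11, 30, 31, 50, 52, 53], k=2))
```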
Linkability of k-anonymity
For a subject known to be in two k-anonymous data sets, we can determine and link the corresponding k-anonymous classes containing her.
If some of the confidential attributes are shared between the data sets, the linkage accuracy improves (one can link within k-anonymous classes).
Summary on k-anonymity for big data
For k-anonymity to be composable, the controllers sharing subjects must coordinate or follow suitable strategies.
There are quasi-linear heuristics for k-anonymity.
Linkability is possible at least at the k-anonymous class level.
With some coordination effort, k-anonymity is a reasonable option to anonymize big data.
Big data protection under differential privacy
ε-Differential privacy (DP) offers strong privacy guarantees.
The smaller ε, the more privacy.
DP can be reached via noise addition or by generating synthetic data from a differentially private model (e.g. a histogram).
A synthetic data set can be either partially or fully synthetic.
In partial synthesis, only values deemed too sensitive are replaced by synthetic data.
Composability of DP: sequential composition
Sequential composition refers to a sequence of computations, each of them providing differential privacy in isolation, also providing differential privacy in sequence.
Theorem
Let κi(D), for i ∈ I, be computations over D providing εi-differential privacy. The sequence of computations (κi(D))_{i∈I} provides (Σ_{i∈I} εi)-differential privacy.
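In practice, sequential composition is often operationalized as a privacy budget that successive queries on the same data consume until it is exhausted. A minimal sketch (the PrivacyBudget class is illustrative, not from any cited work):

```python
class PrivacyBudget:
    """Track cumulative epsilon under sequential composition:
    the epsilons of successive computations on the same data add up."""
    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(4):
    budget.spend(0.25)   # four 0.25-DP queries compose to 1.0-DP
print(budget.spent)      # 1.0; a fifth 0.25-DP query would raise
```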
Composability of DP: parallel composition
Parallel composition refers to several ε-differentially private computations, each on data from a disjoint set of subjects, yielding an ε-differentially private output on the data from the pooled set of subjects.
Theorem
Let κi(Di), for i ∈ I, be computations over Di providing ε-differential privacy. If each Di contains data on a set of subjects disjoint from the sets of subjects of Dj for all j ≠ i, then (κi(Di))_{i∈I} provides ε-differential privacy.
Composability of DP for data sets
Sequential composition. The release of εi-differentially private data sets Di, for i ∈ I, is (Σ_{i∈I} εi)-differentially private.
That is, by accumulating differentially private data about a set of individuals, differential privacy is not broken, but the level of privacy decreases.
Parallel composition. The release of ε-differentially private data sets Di referring to disjoint sets of individuals, for i ∈ I, is ε-differentially private.
Computational cost of DP
DP by noise addition has linear cost O(n).
It has been suggested to use other methods to attain DP with improved utility:
Data synthesis (Cormode et al. 2012; Zhang et al. 2014) has a higher computational complexity.
A microaggregation step prior to noise addition (Sánchez et al. 2014; Soria-Comas et al. 2014) has complexity O(n^2) or O(n log n), depending on whether blocking is used.
Linkability of DP
In general, there is no linkability between two DP data sets generated via noise addition or as fully synthetic data.
Partially synthetic data sets, although they do not satisfy strict DP, allow accurate linkage.
Summary on DP for big data
DP has good composability properties, which may be suitable to anonymize dynamic data.
DP also has a low computational cost, which may be suitable for very large data sets.
Linkability across differentially private data sets is only feasible if the data sets share unaltered attributes.
The main problem with DP is that it does not provide significant utility for exploratory analyses unless the ε parameter is quite large.
Transparent, local and collaborative anonymization
Transparency to subjects and users
In a big data context, there are potentially many controllers dealing with a subject's data, and it cannot be assumed that data subjects or users trust all the controllers involved.
There is a need for anonymization to be transparent to both users and subjects.
The subject must be able to assess how much her data have been anonymized.
The user must be told the SDC methods and parameters used, except any random seeds, in order to maximize data utility.
See the transparency proposals in Domingo-Ferrer and Muralidhar (2016).
Local anonymization
Despite, or because of, transparency, subjects may prefer to take care of anonymizing their own data.
In a big data context, this is also good to relieve the data controller from the computational burden.
Local anonymization is an alternative SDC paradigm in which the subject anonymizes her own data record before handing it to the controller (Warner 1965; Agrawal and Haritsa 2005; Song and Ge 2014).
However, in local anonymization the subject lacks a global view of the data set, which may lead to overdoing anonymization and wasting data utility.
Collaborative anonymization
Proposed by Soria-Comas and Domingo-Ferrer (2015b), it combines the low utility loss of centralized anonymization with the high subject privacy of local anonymization.
It is based on the notion of co-utility (Domingo-Ferrer et al. 2016).
Subjects collaborate to determine the disclosure risk associated with their data and then locally apply the right level of protection.
Conclusions and further research
Conclusions
There is a debate on whether big data are compatible with the privacy of citizens.
There are two extreme positions: nihilism and fundamentalism.
We have tried to break new ground by opening a midway path.
We have stated the desirable properties of privacy models for big data (composability, low computation, linkability).
We have examined how well the two main privacy models (k-anonymity and ε-differential privacy) satisfy those properties.
We have also highlighted the need for transparency and perhaps for local and collaborative anonymization.
Future research
This midway path is by no means ready.
Privacy models are needed that satisfy composability, low computation, linkability and utility preservation for exploratory analyses.
The variety of big data goes beyond data sets formed by records: it includes video, audio, unstructured text, etc., whose anonymization is quite challenging.
Privacy models and SDC methods must be able to cope with velocity and volume: anonymizing dynamic big data is a largely unexplored territory.
Finally, collaborative anonymization is also attractive to preserve the self-determination of subjects without losing utility.
References
S. Agrawal and J.R. Haritsa (2005) A framework for high-accuracy privacy-preserving data mining, in ICDE'05, IEEE, pp. 193-204.
R. Agrawal and R. Srikant (2000) Privacy-preserving data mining, in ACM SIGMOD'00, pp. 439-450.
M. Barbaro and T. Zeller (2006) A face is exposed for AOL searcher no. 4417749, New York Times.
A. Chen (2010) Gcreep: Google engineer stalked teens, spied on chats, Gawker.
G. Cormode, C. Procopiuc, D. Srivastava, E. Shen and T. Yu (2012) Differentially private spatial decompositions, in Proceedings of the 2012 IEEE 28th Intl. Conf. on Data Engineering - ICDE'12, IEEE Computer Society, Washington, DC, USA, pp. 20-31.
G. D'Acquisto, J. Domingo-Ferrer, P. Kikiras, V. Torra, Y.-A. de Montjoye and A. Bourka (2015) Privacy by Design in Big Data: An overview of privacy enhancing technologies in the era of big data analytics, European Union Agency for Network and Information Security (ENISA).
T. Dalenius (1977) Towards a methodology for statistical disclosure control, Statistik Tidskrift 15:429-444.
J. Domingo-Ferrer and K. Muralidhar (2016) New directions in anonymization: permutation paradigm, verifiability by subjects and intruders, transparency to users, Information Sciences 337-338:11-24.
J. Domingo-Ferrer, D. Sánchez and J. Soria-Comas (2016) Co-utility: self-enforcing collaborative protocols with mutual help, Progress in Artificial Intelligence 5(2):105-110.
J. Domingo-Ferrer and J. Soria-Comas (2016) Anonymization in the time of big data, in Privacy in Statistical Databases - PSD 2016, Springer, pp. 225-236.
J. Domingo-Ferrer and V. Torra (2005) Ordinal, continuous and heterogeneous k-anonymity through microaggregation, Data Mining and Knowledge Discovery 11(2):195-212.
C. Duhigg (2012) How companies learn your secrets, New York Times Magazine, Feb. 16.
C. Dwork (2006) Differential privacy, in ICALP'06, LNCS 4052, Springer, pp. 1-12.
C. Dwork and G.N. Rothblum (2016) Concentrated differential privacy (v2), March 16, arXiv:1603.01887v2.
M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page and T. Ristenpart (2014) Privacy in pharmacogenetics: an end-to-end case study of personalized warfarin dosing, in Proc. of the 23rd USENIX Security Symposium, San Diego, CA, USA, pp. 17-32.
FTC (2014) Data Brokers: A Call for Transparency and Accountability, US Federal Trade Commission.
A. Hundepool, J. Domingo-Ferrer, L. Franconi, S. Giessing, E. Schulte-Nordholt, K. Spicer and P.-P. de Wolf (2012) Statistical Disclosure Control, Wiley.
N. Li, T. Li and S. Venkatasubramanian (2007) t-Closeness: privacy beyond k-anonymity and l-diversity, in ICDE'07, pp. 106-115.
A. Machanavajjhala and D. Kifer (2015) Designing statistical privacy for your data, Communications of the ACM 58(3):58-67.
A. Machanavajjhala, D. Kifer, J. Gehrke and M. Venkitasubramaniam (2007) l-Diversity: privacy beyond k-anonymity, ACM Trans. Knowl. Discov. Data 1(1):3.
A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke and L. Vilhuber (2008) Privacy: theory meets practice on the map, in Proceedings of the 2008 IEEE 24th Intl. Conf. on Data Engineering - ICDE'08, IEEE Computer Society, Washington, DC, USA, pp. 277-286.
M. Meeker (2016) 2016 Internet Trends, http://www.kpcb.com/blog/2016-internet-trends-report
P. Mohan, A. Thakurta, E. Shi, D. Song and D.E. Culler (2012) GUPT: privacy preserving data analysis made easy, in Proc. of ACM SIGMOD'12, Scottsdale, AZ.
P. Samarati and L. Sweeney (1998) Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement through Generalization and Suppression, Technical Report, SRI International.
D. Sánchez, J. Domingo-Ferrer and S. Martínez (2014) Improving the utility of differential privacy via univariate microaggregation, in Privacy in Statistical Databases - PSD 2014, Springer, pp. 130-142.
D.J. Solove (2011) Nothing to Hide: The False Tradeoff Between Privacy and Security, New Haven: Yale University Press.
C. Song and T. Ge (2014) Aroma: a new data protection method with differential privacy and accurate query answering, in CIKM'14, ACM, pp. 1569-1578.
J. Soria-Comas, J. Domingo-Ferrer, D. Sánchez and S. Martínez (2014) Enhancing data utility in differential privacy via microaggregation-based k-anonymity, VLDB Journal 23(5):771-794.
J. Soria-Comas and J. Domingo-Ferrer (2015) Big data privacy: challenges to privacy principles and models, Data Science and Engineering 1(1):21-28.
J. Soria-Comas and J. Domingo-Ferrer (2015b) Co-utile collaborative anonymization of microdata, in MDAI 2015, LNCS 9321, Springer, pp. 192-206.
S.L. Warner (1965) Randomized response: a survey technique for eliminating evasive answer bias, J. Am. Stat. Assoc. 60:63-69.
X. Xiao and Y. Tao (2007) m-Invariance: towards privacy-preserving re-publication of dynamic datasets, in SIGMOD'07, ACM, pp. 689-700.
J. Xu, Z. Zhang, X. Xiao, Y. Yang and G. Yu (2012) Differentially private histogram publication, in Proceedings of the 2012 IEEE 28th Intl. Conf. on Data Engineering - ICDE'12, IEEE Computer Society, Washington, DC, USA, pp. 32-43.
J. Zhang, G. Cormode, C.M. Procopiuc, D. Srivastava and X. Xiao (2014) PrivBayes: private data release via Bayesian networks, in Proceedings of the 2014 ACM SIGMOD Intl. Conf. on Management of Data - SIGMOD'14, ACM, New York, NY, USA, pp. 1423-1434.