DISTRIBUTING DATA FOR SECURE DATA SERVICES
Vignesh Ganapathy, Dilys Thomas, Tomas Feder, Hector Garcia-Molina, Rajeev Motwani
March 25, 2011
Stanford, TRDDC, TRUST
Road Map
- Motivation for Secure Databases
- Distributing Data: Encryption, Distribution; Privacy Constraints; Schema Decomposition
- Query Partitioning: Cost Estimation; Where and Select clause processing; Query Decomposition
- Experiments
- Related Work
Motivation 1: Data Privacy in Enterprises
Health: Personal medical details, Disease history, Clinical research data
Banking: Bank statement, Loan details, Transaction history
Finance: Portfolio information, Credit history, Transaction records, Investment details
Insurance: Claims records, Accident history, Policy details
Outsourcing: Customer data for testing, Remote DB administration, BPO & KPO
Retail Business: Inventory records, Individual credit card details, Audits
Manufacturing: Process details, Blueprints, Production data
Govt. Agencies: Census records, Economic surveys, Hospital records
Motivation 2: Government Regulations
Country: Privacy Legislation
Australia: Privacy Amendment Act of 2000
European Union: Personal Data Protection Directive 1998
Hong Kong: Personal Data (Privacy) Ordinance of 1995
United Kingdom: Data Protection Act of 1998
United States: Security Breach Information Act (California S.B. 1386) of 2002; Gramm-Leach-Bliley Act of 1999; Health Insurance Portability and Accountability Act of 1996
Motivation 3: Personal Information
Emails
Searches on Google/Yahoo
Profiles on Social Networking sites
Passwords / Credit Card / Personal information at multiple E-commerce sites / Organizations
Documents on the Computer / Network
Data Privacy
Value disclosure: What is the value of attribute salary of person X?
Perturbation
- Privacy Preserving OLAP
Identity disclosure: Whether an individual is present in the database table
Randomization, K-Anonymity, etc.
- Data for Outsourcing / Research
Linkage disclosure: Linking columns from multiple sites
Losses due to Lack of Privacy: ID-Theft
3% of households in the US affected by ID-Theft
US $5-50B losses/year
UK £1.7B losses/year
AUD $1-4B losses/year
Two Can Keep a Secret: A Distributed Architecture for Secure Database Services
Aggarwal, Bawa, Ganesan, Garcia-Molina, Kenthapadi, Motwani, Srivastava, Thomas, Xu
CIDR 2005
How to distribute data across multiple sites for (1) redundancy and (2) privacy, so that a single compromised site does not lead to data loss.
Cloud Data Services
- Data outsourcing growing in popularity
- Cheap, reliable data storage and management: 1 TB for $399 (< $0.5 per GB); $5000 for Oracle 10g / SQL Server; $68k/year for a DB administrator
- Privacy concerns looming ever larger: high-profile thefts (often insiders)
- UCLA lost 900k records; Berkeley lost a laptop with sensitive information; Acxiom, JP Morgan, ChoicePoint (www.privacyrights.org)
Present solutions
- Application level: Salesforce.com On-Demand Customer Relationship Management, $65/user/month; $995 / 5 users / 1 year
- Amazon Elastic Compute Cloud: 1 instance = 1.7 GHz x86 processor, 1.75 GB RAM, 160 GB local disk, 250 Mb/s network bandwidth; elastic, completely controlled, reliable, secure; $0.10 per instance-hour, $0.20 per GB of data in/out of Amazon, $0.15 per GB-month of Amazon S3 storage used
- Google Apps for your domain: small businesses, enterprise, school, family or group
Encryption Based Solution
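[Figure: Client → client-side processor (encrypt; rewrites query Q as Q’) → DSP; the DSP returns “relevant data”, from which the client-side processor computes the answer.]
Problem: Q’ ≈ “SELECT *”. The DSP cannot evaluate most predicates over encrypted data, so nearly the whole relation must be shipped back to the client.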
The Power of Two
[Figure: the client's query Q goes to a client-side processor, which splits it into Q1 for DSP1 and Q2 for DSP2.]
Key: Ensure Cost(Q1) + Cost(Q2) ≈ Cost(Q)
Privacy Constraints
SB1386 Privacy: the sets
{Name, SSN}
{Name, LicenceNo}
{Name, CaliforniaID}
{Name, AccountNumber}
{Name, CreditCardNo, SecurityCode}
are all to be kept private.
A set is private if at least one of its elements is “hidden”; an element stored in encrypted form counts as hidden.
Techniques for Satisfying Privacy Constraints
Vertical Fragmentation
- Partition attributes across R1 and R2.
- E.g., to obey constraint {Name, SSN}: R1 ← Name, R2 ← SSN.
- Use tuple IDs (TIDs) for reassembly: R = R1 JOIN R2.
Encoding
- One-time pad: for each value v, construct a random bit sequence r; R1 ← v XOR r, R2 ← r.
- Deterministic encryption: R1 ← E_K(v), R2 ← K. Can detect equality and push selections with equality predicates.
- Random addition: R1 ← v + r, R2 ← r. Can push the aggregate SUM.
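The following Python sketch illustrates vertical fragmentation and the three encodings on toy data. It is an illustration under stated assumptions, not the implementation from the paper: the helper names (`fragment`, `reassemble`, `otp_split`, `det_tag`, `equality_select`, `add_split`, `pushed_sum`) are hypothetical, numeric values are assumed to be unsigned 64-bit integers for random addition, and HMAC-SHA256 stands in for deterministic encryption (it is deterministic and keyed, so equality can be tested, but unlike E_K it cannot be decrypted).

```python
import hashlib
import hmac
import secrets

MOD = 2 ** 64  # assumption: numeric values are unsigned 64-bit integers


# Vertical fragmentation: attributes split across R1 and R2, rows keyed by TID
def fragment(rows, r1_attrs, r2_attrs):
    r1 = {tid: {a: row[a] for a in r1_attrs} for tid, row in enumerate(rows)}
    r2 = {tid: {a: row[a] for a in r2_attrs} for tid, row in enumerate(rows)}
    return r1, r2


def reassemble(r1, r2):
    # R = R1 JOIN R2 on TID
    return [{**r1[tid], **r2[tid]} for tid in sorted(r1.keys() & r2.keys())]


# One-time pad: R1 stores v XOR r, R2 stores the random pad r
def otp_split(value: bytes):
    pad = secrets.token_bytes(len(value))
    return bytes(v ^ p for v, p in zip(value, pad)), pad


def otp_join(share: bytes, pad: bytes) -> bytes:
    return bytes(s ^ p for s, p in zip(share, pad))


# Deterministic encoding: R1 stores a keyed tag of v, R2 stores the key K.
# Stand-in assumption: HMAC-SHA256 replaces E_K; it supports equality tests
# but cannot be decrypted, so a real system would need a deterministic cipher.
def det_tag(key: bytes, value: str) -> bytes:
    return hmac.new(key, value.encode(), hashlib.sha256).digest()


def equality_select(tags, key, value):
    # Client computes the tag of the constant; server 1 only compares tags
    target = det_tag(key, value)
    return [tid for tid, tag in tags.items() if hmac.compare_digest(tag, target)]


# Random addition: R1 stores (v + r) mod 2^64, R2 stores r
def add_split(v: int):
    r = secrets.randbelow(MOD)
    return (v + r) % MOD, r


def pushed_sum(r1_values, r2_values):
    # Each server sums its own column; the client subtracts the partial sums
    return (sum(r1_values) - sum(r2_values)) % MOD


rows = [{"Name": "Tom", "SSN": "123-45-6789"}, {"Name": "Ann", "SSN": "987-65-4321"}]
r1, r2 = fragment(rows, ["Name"], ["SSN"])
assert reassemble(r1, r2) == rows

key = secrets.token_bytes(32)
tags = {tid: det_tag(key, row["Name"]) for tid, row in r1.items()}
print(equality_select(tags, key, "Tom"))  # [0]

shares = [add_split(s) for s in (70000, 50000)]
print(pushed_sum([a for a, _ in shares], [b for _, b in shares]))  # 120000
```

Neither share alone reveals v in the one-time-pad and random-addition cases; the SUM example works because the pads cancel modulo 2^64 once the client subtracts server 2's total.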
Example Schema & Privacy Constraints
An Employee relation: {Name, DoB, Position, Salary, Gender, Email, Telephone, ZipCode}
Privacy constraints: {Telephone}, {Email}, {Name, Salary}, {Name, Position}, {Name, DoB}, {DoB, Gender, ZipCode}, {Position, Salary}, {Salary, DoB}
We will use only vertical fragmentation and encoding.
Decomposed schema
R1: {TID, Name, Email, Telephone, Gender, Salary}
R2: {TID, Name, Email, Telephone, DoB, Position, ZipCode}
Encrypted attributes E: {Telephone, Email, Name}
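The decomposition can be checked mechanically. The sketch below uses hypothetical helpers (`visible`, `violations`), not code from the paper. It treats an attribute as visible to a server only if the attribute is stored there and is not encrypted, and it reports every privacy constraint that some server sees entirely in the clear.

```python
CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]

R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Name", "Email", "Telephone", "DoB", "Position", "ZipCode"}
ENCRYPTED = {"Telephone", "Email", "Name"}


def visible(fragment, encrypted):
    """Attributes a server can read in the clear; TID and encrypted columns excluded."""
    return fragment - encrypted - {"TID"}


def violations(constraints, fragments, encrypted):
    """(server, constraint) pairs where every element of the constraint is visible."""
    return [
        (server, c)
        for server, frag in enumerate(fragments, start=1)
        for c in constraints
        if c <= visible(frag, encrypted)
    ]


print(violations(CONSTRAINTS, [R1, R2], ENCRYPTED))  # []
# Moving Gender into R2 in the clear would expose {DoB, Gender, ZipCode} at server 2
print(violations(CONSTRAINTS, [R1, R2 | {"Gender"}], ENCRYPTED))
```

Server 1 sees only {Gender, Salary} and server 2 only {DoB, Position, ZipCode} in the clear, and neither set contains a whole constraint.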
Partitioning, Execution
Partitioning Problem
- Partition to minimize communication cost for a given workload.
- Even a simplified version is hard to approximate.
- Hill-climbing algorithm after starting with a weighted set cover.
Query Reformulation and Execution
- Consider only centralized plans.
- Algorithm to partition SELECT and WHERE clause predicates between the two partitions.
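A toy version of the partitioning loop is sketched below. Everything specific in it is an assumption for illustration: the workload and its weights are invented, the cost function (a query is free only when all its attributes are in the clear at one server, otherwise it pays its weight) is a stand-in for the real communication-cost model, and each attribute is placed either in the clear at server 1 or 2 or encoded at both ("E"). The start state comes from a greedy approximation of weighted set cover that encodes attributes until every constraint contains an encoded one; hill climbing then moves one attribute at a time while the cost strictly drops and all constraints stay satisfied.

```python
# Hypothetical sketch: greedy weighted set cover start + single-attribute hill climbing
ATTRS = ["Name", "DoB", "Position", "Salary", "Gender", "Email", "Telephone", "ZipCode"]
CONSTRAINTS = [
    {"Telephone"}, {"Email"},
    {"Name", "Salary"}, {"Name", "Position"}, {"Name", "DoB"},
    {"DoB", "Gender", "ZipCode"},
    {"Position", "Salary"}, {"Salary", "DoB"},
]
# Hypothetical workload: (attributes a query touches, relative frequency)
WORKLOAD = [
    ({"Name", "Salary"}, 5.0),
    ({"Position", "Salary"}, 3.0),
    ({"DoB", "ZipCode"}, 2.0),
    ({"Gender", "Salary"}, 1.0),
    ({"DoB"}, 4.0),
]
PLACES = ("E", 1, 2)  # "E" = encoded at both servers; 1 / 2 = clear at that server


def feasible(place, constraints):
    """No constraint may be entirely visible in the clear at one server."""
    return not any(
        all(place[a] == server for a in c) for c in constraints for server in (1, 2)
    )


def cost(place, workload):
    """Toy cost: a query is cheap only if all its attributes are clear at one server."""
    total = 0.0
    for attrs, weight in workload:
        sites = {place[a] for a in attrs}
        if "E" in sites or len(sites) > 1:
            total += weight
    return total


def greedy_cover(attrs, constraints, workload):
    """Greedy weighted set cover: encode attributes until every constraint has one."""
    weight = {a: sum(w for q, w in workload if a in q) for a in attrs}
    uncovered = [set(c) for c in constraints]
    encoded = set()
    while uncovered:
        candidates = [a for a in attrs if any(a in c for c in uncovered)]
        best = min(
            candidates,
            key=lambda a: (weight[a] + 1e-9) / sum(a in c for c in uncovered),
        )
        encoded.add(best)
        uncovered = [c for c in uncovered if best not in c]
    return encoded


def hill_climb(attrs, constraints, workload):
    encoded = greedy_cover(attrs, constraints, workload)
    # Every constraint contains an encoded attribute, so this start is feasible
    place = {a: ("E" if a in encoded else 1) for a in attrs}
    best = cost(place, workload)
    improved = True
    while improved:
        improved = False
        for a in attrs:
            for option in PLACES:
                if option == place[a]:
                    continue
                trial = {**place, a: option}
                if feasible(trial, constraints):
                    trial_cost = cost(trial, workload)
                    if trial_cost < best:
                        place, best, improved = trial, trial_cost, True
    return place, best


place, best = hill_climb(ATTRS, CONSTRAINTS, WORKLOAD)
print(best, place)
```

Because only strictly improving moves are accepted, the loop always terminates, but it can stop at a local optimum; singleton constraints such as {Telephone} force those attributes to stay encoded.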
Hill Climbing Approach for Partitioning
Predicates for cost computation
State Definitions for Bottom-Up Evaluation
0: condition clause cannot be pushed to either server
1: condition clause can be pushed to Server 1
2: condition clause can be pushed to Server 2
3: condition clause can be pushed to both servers
4: condition clause can be pushed to either server
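The original OR and AND state-evaluation tables are not reproduced in this transcript, so the combining rules below are our own simplified reading, not the authors' tables. A clause gets state 1, 2, or 4 when server 1, server 2, or either server can evaluate all of it; OR and AND of whole clauses intersect the children's server sets; a conjunction that no single server can evaluate, but whose conjuncts are each evaluable somewhere, gets state 3 (each conjunct is pushed to its own server and the TID lists are intersected). The attribute-to-server map `AT_SERVER` follows the R1/R2 split of the query-partitioning example and is otherwise hypothetical.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical: attributes each server can evaluate predicates on (clear columns)
AT_SERVER = {1: {"Name", "Gender", "Salary"}, 2: {"DoB", "Position", "ZipCode"}}
STATE = {frozenset(): 0, frozenset({1}): 1, frozenset({2}): 2, frozenset({1, 2}): 4}


@dataclass(frozen=True)
class Pred:
    attr: str
    text: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, And, Or]


def servers(e: Expr) -> frozenset:
    """Servers that can evaluate the whole clause on their own."""
    if isinstance(e, Pred):
        return frozenset(s for s, attrs in AT_SERVER.items() if e.attr in attrs)
    return servers(e.left) & servers(e.right)


def conjuncts(e: Expr) -> list:
    """Flatten nested ANDs into a list of conjuncts."""
    if isinstance(e, And):
        return conjuncts(e.left) + conjuncts(e.right)
    return [e]


def state(e: Expr) -> int:
    whole = servers(e)
    if whole:
        return STATE[whole]
    if isinstance(e, And) and all(servers(c) for c in conjuncts(e)):
        return 3  # split: every conjunct can be pushed to some server
    return 0


where = And(
    And(Pred("Name", "Name = 'Tom'"), Pred("Position", "Position = 'Staff'")),
    Or(Pred("ZipCode", "ZipCode = '94305'"), Pred("Salary", "Salary > 60000")),
)
print(state(where.left), state(where.right), state(where))  # 3 0 0
print([state(c) for c in conjuncts(where)])  # [1, 2, 0]
```

Under this reading the full WHERE clause is not pushable as a whole, but its conjuncts in states 1 and 2 can still go to servers 1 and 2, leaving the OR clause (state 0) for the client.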
OR State Evaluation
AND State Evaluation
Query Partitioning
Query 1:
SELECT TID, name, salary
FROM R1
WHERE Name = 'Tom'
Query 2:
SELECT TID, dob, zipcode
FROM R2
WHERE Position = 'Staff'
Original Query
SELECT Name, DoB, Salary
FROM R
WHERE (Name = 'Tom' AND Position = 'Staff') AND (ZipCode = '94305' OR Salary > 60000)
R1: {TID, Name, Email, Telephone, Gender, Salary}
R2: {TID, Email, Telephone, DoB, Position, ZipCode}
Distributed Query Plan
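The two sub-queries shown earlier can be reproduced with a small rewriting sketch. It handles only the simple case in this example and is our illustration, not the authors' algorithm: each top-level conjunct of the WHERE clause is pushed to the fragment that holds all of its attributes, anything else stays at the client, and each server returns TID plus the selected columns and the columns that the client-side predicate needs. The schemas are R1 and R2 as listed above; the names `home` and `decompose` are hypothetical.

```python
# Fragments as listed for this example
R1 = {"TID", "Name", "Email", "Telephone", "Gender", "Salary"}
R2 = {"TID", "Email", "Telephone", "DoB", "Position", "ZipCode"}

SELECT = ["Name", "DoB", "Salary"]
# Top-level WHERE conjuncts: (SQL text, attributes referenced)
CONJUNCTS = [
    ("Name = 'Tom'", ["Name"]),
    ("Position = 'Staff'", ["Position"]),
    ("(ZipCode = '94305' OR Salary > 60000)", ["ZipCode", "Salary"]),
]


def home(attrs):
    """Return 1 or 2 if one fragment holds every attribute, else None (R1 preferred)."""
    for site, frag in ((1, R1), (2, R2)):
        if all(a in frag for a in attrs):
            return site
    return None


def decompose(select, conjuncts):
    where = {1: [], 2: []}
    client = []
    for text, attrs in conjuncts:
        site = home(attrs)
        if site is None:
            client.append(text)
        else:
            where[site].append(text)
    # Columns to fetch: the SELECT list plus attributes of client-side predicates
    needed = list(select)
    for _, attrs in conjuncts:
        if home(attrs) is None:
            needed.extend(attrs)
    cols = {1: ["TID"], 2: ["TID"]}
    for a in needed:
        site = home([a])
        if a not in cols[site]:
            cols[site].append(a)
    queries = {}
    for site in (1, 2):
        sql = f"SELECT {', '.join(cols[site])} FROM R{site}"
        if where[site]:
            sql += " WHERE " + " AND ".join(where[site])
        queries[site] = sql
    return queries, client


queries, residual = decompose(SELECT, CONJUNCTS)
print(queries[1])  # SELECT TID, Name, Salary FROM R1 WHERE Name = 'Tom'
print(queries[2])  # SELECT TID, DoB, ZipCode FROM R2 WHERE Position = 'Staff'
print(residual)    # ["(ZipCode = '94305' OR Salary > 60000)"]
```

The client then joins the two results on TID, applies the residual OR predicate, and projects Name, DoB, Salary.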
Number of Iterations
Performance Gain Experiment
Iterations vs. Privacy Constraints
Papers
[CIDR05] Two Can Keep a Secret.
[SIGMOD05] Privacy Preserving OLAP.
[ICDT05] Anonymizing Tables.
[PODS06] Clustering for Anonymity.
[KDD07] Probabilistic Anonymity.
Thank You!
Acknowledgements: Collaborators
Stanford Privacy Group
TRDDC Privacy Group
PORTIA, TRUST, Google