Towards an end-to-end architecture for handling sensitive data


1

Towards an end-to-end architecture for handling sensitive data

Hector Garcia-Molina

Rajeev Motwani

and students

2

DB Perspective

• Performance

• Preservation

• Distribution (P2P)

• Bad Guys:

  – eavesdrop

  – corrupt

• Trust

3

DB Perspective

• Preservation

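[Figure: privacy (+/−) plotted against preservation (+/−); two quadrants are marked “easy” and the quadrant with both privacy and preservation is marked “goal”]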

4

Privacy Spectrum

• Prevention

• Detection

• Containment

5

Prevention: Our Work

• Privacy-Preserving OLAP

• Distributed Architecture for Secure DBMS (P)

• Data Preservation in P2P Systems

• P2P Trust and Reputation Management (P)

• P2P Privacy Preserving Indexing (P)

6

Distributed Architecture for Secure DBMS

• Motivation: Outsourcing

  – Secure Database Provider (SDP)

[Diagram: Client → (Encrypt) → Service Provider]

7

Performance Problem

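[Diagram: the Client encrypts its data; its Client-side Processor rewrites Query Q as Q’ for the Service Provider, which returns the “Relevant Data”, from which the client computes the Answer]

Problem: Q’ ≈ “SELECT *” (over encrypted data the provider can do little filtering, so the client fetches nearly everything)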

8

The Power of Two

[Diagram: Client connected to two providers, DSP1 and DSP2]

9

Basic Idea

{ CC#, expDate, name } is split into { expDate, name } and { CC# }, one fragment per provider
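The split above can be sketched as a vertical partition of one record. This is a minimal sketch under stated assumptions: the shared `id` column used to rejoin the fragments, the function names `split_record` and `rejoin`, and which provider receives which fragment are all hypothetical, and the split protects the data only if the two providers do not collude.

```python
# Hypothetical sketch: vertically split one customer record so that no single
# provider sees the credit-card number together with the name and expiry date.
def split_record(rec_id, record):
    """Return (fragment for provider 1, fragment for provider 2)."""
    frag1 = {"id": rec_id, "expDate": record["expDate"], "name": record["name"]}
    frag2 = {"id": rec_id, "CC#": record["CC#"]}
    return frag1, frag2


def rejoin(frag1, frag2):
    """Client-side reconstruction: merge the two fragments on their shared id."""
    assert frag1["id"] == frag2["id"], "fragments belong to different records"
    merged = {k: v for k, v in frag1.items() if k != "id"}
    merged["CC#"] = frag2["CC#"]
    return merged


if __name__ == "__main__":
    rec = {"CC#": "4111-0000", "expDate": "12/09", "name": "Alice"}
    f1, f2 = split_record(7, rec)
    print(f1)  # {'id': 7, 'expDate': '12/09', 'name': 'Alice'}
    print(f2)  # {'id': 7, 'CC#': '4111-0000'}
    assert rejoin(f1, f2) == rec
```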

10

Another Example

{ salary } is split into { rand } and { salary + rand }, one share per provider
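Splitting a value into { rand } and { salary + rand } is additive secret sharing: either share alone looks random, and the client recovers the salary by subtraction. A minimal sketch, assuming arithmetic modulo a hypothetical M = 2^64 (so that salary + rand reveals nothing about the salary's size) and hypothetical helper names. The `sum_salaries` helper shows one reason such a split can still support queries: each provider can total its own shares.

```python
import secrets

# Hypothetical modulus: working mod M makes salary + rand uniformly
# distributed, so that share alone reveals nothing about the salary.
M = 2**64


def share_salary(salary):
    """Return (rand, salary + rand mod M); each share goes to a different provider."""
    rand = secrets.randbelow(M)
    return rand, (salary + rand) % M


def recover_salary(rand, masked):
    """Client side: remove the random mask (valid for 0 <= salary < M)."""
    return (masked - rand) % M


def sum_salaries(shares):
    """Each provider sums its own shares; the client subtracts the two totals."""
    total_rand = sum(r for r, _ in shares) % M    # computed at provider 1
    total_masked = sum(m for _, m in shares) % M  # computed at provider 2
    return (total_masked - total_rand) % M        # valid while the true total < M


if __name__ == "__main__":
    salaries = [50000, 72000, 61000]
    shares = [share_salary(s) for s in salaries]
    assert [recover_salary(r, m) for r, m in shares] == salaries
    print(sum_salaries(shares))  # 183000
```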

11

The Power of Two

[Diagram: the Client-side Processor splits Query Q into Q1, sent to DSP1, and Q2, sent to DSP2]

Key: ensure Cost(Q1) + Cost(Q2) ≈ Cost(Q)
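Splitting Q into Q1 and Q2 can be illustrated on the credit-card split from the Basic Idea slide. This is a hypothetical sketch: the providers are in-memory lists, and both the query (“CC# of customers named Alice”) and the semijoin-style plan that passes matching ids from Q1 to Q2 are assumptions; a real system would choose among several such plans.

```python
# Hypothetical in-memory stand-ins for the two providers, holding the
# fragments { id, expDate, name } and { id, CC# }.
DSP1 = [
    {"id": 1, "expDate": "12/09", "name": "Alice"},
    {"id": 2, "expDate": "03/10", "name": "Bob"},
]
DSP2 = [
    {"id": 1, "CC#": "4111-0000"},
    {"id": 2, "CC#": "5500-0000"},
]


def q1_ids_by_name(name):
    """Q1, run at DSP1: ids of rows with this name (DSP1 never sees CC#)."""
    return {row["id"] for row in DSP1 if row["name"] == name}


def q2_cc_by_ids(ids):
    """Q2, run at DSP2: CC# for the requested ids only (DSP2 never sees names)."""
    return {row["id"]: row["CC#"] for row in DSP2 if row["id"] in ids}


def client_query(name):
    """Client-side processor: Q = 'CC# of customers with this name' runs as Q1, then Q2."""
    ids = q1_ids_by_name(name)
    return sorted(q2_cc_by_ids(ids).values())


if __name__ == "__main__":
    print(client_query("Alice"))  # ['4111-0000']
```

Passing the ids to Q2 keeps DSP2 from returning its whole fragment, at the price of revealing to DSP2 which ids the client asked for.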

12

Challenges

• Find a decomposition that

  – obeys all privacy constraints

  – minimizes execution cost for a given workload

• For a given query, find a good plan

13

Example

R(id, a, b, c), privacy constraint: { a, b, c } (a, b, and c must never be stored together at one provider)

Candidate decompositions:

R1(id, a), R2(id, b, c)

R1(id, a, b), R2(id, c)

R1(id, a, b), R2(id, b, c)

R1(id, a, c), R2(id, b, c)

Most popular queries:

• Select on a, b

• Select on b, c

Best decomposition for this workload: R1(id, a, b), R2(id, b, c)
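For a workload this small, the choice can be checked by brute force. The sketch below assumes a hypothetical cost model, namely the number of popular queries that no single fragment can answer on its own (and that therefore need both providers plus a client-side join), and treats a privacy constraint as a set of attributes that no fragment may contain in full.

```python
# Candidate decompositions from the example, as (R1, R2) attribute sets;
# id is replicated in both fragments and omitted here.
CANDIDATES = [
    ({"a"}, {"b", "c"}),
    ({"a", "b"}, {"c"}),
    ({"a", "b"}, {"b", "c"}),
    ({"a", "c"}, {"b", "c"}),
]
ALL_ATTRS = {"a", "b", "c"}
PRIVACY_CONSTRAINTS = [{"a", "b", "c"}]  # no fragment may hold all of these
WORKLOAD = [{"a", "b"}, {"b", "c"}]      # attributes used by each popular selection


def obeys_privacy(decomp):
    """True if no fragment contains every attribute of some privacy constraint."""
    return all(not c <= frag for frag in decomp for c in PRIVACY_CONSTRAINTS)


def cost(decomp):
    """Hypothetical cost: popular queries that no single fragment can answer alone."""
    return sum(1 for q in WORKLOAD if not any(q <= frag for frag in decomp))


def best_decomposition(candidates):
    """Among legal decompositions (private and covering all attributes), pick the cheapest."""
    legal = [d for d in candidates if obeys_privacy(d) and set().union(*d) == ALL_ATTRS]
    return min(legal, key=cost)


if __name__ == "__main__":
    print([cost(d) for d in CANDIDATES])                        # [1, 1, 0, 1]
    print([sorted(f) for f in best_decomposition(CANDIDATES)])  # [['a', 'b'], ['b', 'c']]
```

Only R1(id, a, b), R2(id, b, c) scores 0: replicating b lets each popular query run at a single provider, while each of the other three candidates forces one query to touch both.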

14

Detection: Our Work

• Simulatable Auditing (P)

• k-Anonymity

  – algorithms and hardness
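Under the standard definition of k-anonymity, every combination of quasi-identifier values in a released table must be shared by at least k records. Checking a table is easy; finding an optimal generalization or suppression that achieves k-anonymity is NP-hard. A minimal checker, with a hypothetical, already-generalized table:

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k rows."""
    counts = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


if __name__ == "__main__":
    # Hypothetical table with ZIP codes and ages already generalized.
    table = [
        {"zip": "941**", "age": "20-29", "disease": "flu"},
        {"zip": "941**", "age": "20-29", "disease": "cold"},
        {"zip": "943**", "age": "30-39", "disease": "flu"},
        {"zip": "943**", "age": "30-39", "disease": "asthma"},
    ]
    print(is_k_anonymous(table, ["zip", "age"], 2))  # True
    print(is_k_anonymous(table, ["zip", "age"], 3))  # False
```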

15

Containment: Our Work

• Paranoid Platform for Privacy Preferences (P)

• Entity Resolution

16

Containment

• Trusting

  – privacy policies

• Paranoid

17

Example: Trusting

[Diagram: alice ↔ dealsRus]

(1) browse policy

(2) give info

(3) cross fingers

• Example P3P policies:

  – Current purpose: completion and support of the recurring subscription activity

  – Recipients: DealsRUs and/or entities acting as their agents or entities for whom DealsRUs are acting as an agent...

18

Example: Email

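[Diagram: alice (real address a@z), alice’s agent, and dealsRus]

(1) alice’s agent issues a temporary address, a12@w

(2) alice gives a12@w to dealsRus

(3) dealsRus sends mail To: a12@w

(4) alice’s agent forwards it To: a@z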
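The email example can be sketched as an alias table kept by alice’s agent. The class, its methods, the random alias format, and the revocation step are all hypothetical; the slide shows only steps (1) through (4). Random aliases are used so that a recipient cannot guess other aliases, and recording who received each alias lets alice see who passed it on.

```python
import secrets


class AliasAgent:
    """Hypothetical stand-in for alice's agent: issues temporary addresses and
    forwards mail sent to them to alice's real address."""

    def __init__(self, real_address, domain="w"):
        self.real_address = real_address
        self.domain = domain
        self._aliases = {}  # temporary address -> party it was issued to

    def issue_alias(self, recipient):
        """Step (1): create a hard-to-guess temporary address for one recipient."""
        alias = f"a{secrets.token_hex(4)}@{self.domain}"
        self._aliases[alias] = recipient
        return alias

    def issued_to(self, alias):
        """Which party received this alias (useful if it later attracts spam)."""
        return self._aliases.get(alias)

    def revoke(self, alias):
        """Assumed extension: stop forwarding mail sent to this alias."""
        self._aliases.pop(alias, None)

    def deliver(self, to_address):
        """Steps (3)-(4): mail To: an active alias is forwarded To: the real
        address; anything else is dropped (None)."""
        return self.real_address if to_address in self._aliases else None


if __name__ == "__main__":
    agent = AliasAgent("a@z")
    alias = agent.issue_alias("dealsRus")  # (1)-(2): alice hands this to dealsRus
    print(agent.deliver(alias))            # a@z
    agent.revoke(alias)
    print(agent.deliver(alias))            # None
```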

19

P4P: Paranoid Platform for Privacy Preferences

Framework

[Diagram: Data/Control Types: t1 ... tn; two API boxes; Strategy/Reference Implementation]

20

Private Information

[Figure: private information characterized by ownership (individual, organization), function, and control; other labels: complete privacy, limited time use, no predicate input, no integration, accountable, sharable, identifier, service handle, input to predicate, copy]

21

Entity Resolution

e1: N: a, A: b, CC#: c, Ph: e

e2: N: a, Exp: d, Ph: e

• Applications:

  – mailing lists, customer files, counter-terrorism, ...
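The two records above agree on name (a) and phone (e), so a resolver would likely treat them as one entity. A minimal match-and-merge sketch; the match rule (same N and same Ph), the record format (each attribute maps to a set of values), and the naive pairwise loop are assumptions for illustration, not the group's algorithm. The merged record combines CC# with Exp, something neither input held; the privacy slides that follow examine that effect.

```python
def matches(r1, r2):
    """Hypothetical match rule: the records share a name value and a phone value."""
    return bool(r1.get("N", set()) & r2.get("N", set())) and bool(
        r1.get("Ph", set()) & r2.get("Ph", set())
    )


def merge(r1, r2):
    """Combine two matching records by taking the union of each attribute's values."""
    out = {attr: set(vals) for attr, vals in r1.items()}
    for attr, vals in r2.items():
        out.setdefault(attr, set()).update(vals)
    return out


def resolve(records):
    """Naive resolution: merge any matching pair, then rescan until none match."""
    records = list(records)
    changed = True
    while changed:
        changed = False
        for i in range(len(records)):
            for j in range(i + 1, len(records)):
                if matches(records[i], records[j]):
                    other = records.pop(j)  # j > i, so index i is unaffected
                    records[i] = merge(records[i], other)
                    changed = True
                    break
            if changed:
                break
    return records


E1 = {"N": {"a"}, "A": {"b"}, "CC#": {"c"}, "Ph": {"e"}}
E2 = {"N": {"a"}, "Exp": {"d"}, "Ph": {"e"}}

if __name__ == "__main__":
    resolved = resolve([E1, E2])
    print(len(resolved))          # 1
    print(sorted(resolved[0]))    # ['A', 'CC#', 'Exp', 'N', 'Ph']
```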

22

Privacy

[Figure: Alice’s information (Nm: Alice, Ad: 32 Fox, Ph: 5551212) and records about her held by Bob, annotated with confidence values 1.0 and 0.7; the 0.7 record also lists Ad: 14 Cat, and one record omits Ph]

23

Leakage

Alice: Nm: Alice, Ad: 32 Fox, Ph: 5551212 (1.0)

Bob: Nm: Alice, Ad: 32 Fox, Ph: 5550000 (0.7)

L = 0.6 (between 0 and 1)

24

Multi-Record Leakage

Alice: Nm: Alice, Ad: 32 Fox, Ph: 5551212 (1.0)

Bob: r1, L = 0.9; r2, L = 0.8; r3, L = 0.7

LL = 0.9 (between 0 and 1, e.g., max L)
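With max as the aggregate, LL tracks Bob’s single most revealing record, so r2 and r3 do not raise LL above r1’s 0.9. A minimal sketch; the function name, the option to pass a different aggregate, and returning 0.0 when Bob holds no records are hypothetical choices.

```python
def overall_leakage(record_leakages, combine=max):
    """LL: combine per-record leakages L (each in [0, 1]) into one value.
    max is the example aggregate from the slide; combine can be swapped out."""
    values = list(record_leakages)
    if not values:
        return 0.0  # assumption: no records means nothing has leaked
    return combine(values)


if __name__ == "__main__":
    bob = {"r1": 0.9, "r2": 0.8, "r3": 0.7}
    print(overall_leakage(bob.values()))  # 0.9
```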

25

Q1: Added Vulnerability?

[Diagram: Alice (p) and Bob, who holds records r1 r2 r3 r4]

ΔLL = ??

r4 may cause Bob’s records to snap together!
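The snapping effect can be made concrete with toy definitions. Everything here is an assumption for illustration, not the measures from the talk: L(r) is the fraction of Alice’s (attribute, value) pairs that record r contains (confidences ignored), Bob’s entity resolution merges any records sharing a value, LL is the max over resolved records, and the Cust# and Acct attributes are invented linking fields.

```python
# Toy model, not the talk's definitions: records are sets of (attribute, value) pairs.
P = {("Nm", "Alice"), ("Ad", "32 Fox"), ("Ph", "5551212")}  # Alice's information p

R1 = {("Nm", "Alice"), ("Cust#", "17")}
R2 = {("Ph", "5551212"), ("Acct", "88")}
R3 = {("Ad", "14 Cat")}
R4 = {("Cust#", "17"), ("Acct", "88")}  # contains none of p


def leakage(record):
    """Toy L: fraction of p's (attribute, value) pairs present in the record."""
    return len(record & P) / len(P)


def snap(records):
    """Toy entity resolution: keep merging records that share any (attribute, value) pair."""
    clusters = [set(r) for r in records]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if clusters[i] & clusters[j]:
                    other = clusters.pop(j)  # j > i, so index i is unaffected
                    clusters[i] |= other
                    changed = True
                    break
            if changed:
                break
    return clusters


def ll_after_resolution(records):
    """LL = max L over Bob's records after they have been resolved."""
    return max((leakage(c) for c in snap(records)), default=0.0)


if __name__ == "__main__":
    before = ll_after_resolution([R1, R2, R3])
    after = ll_after_resolution([R1, R2, R3, R4])
    print(leakage(R4))               # 0.0
    print(round(after - before, 3))  # 0.333
```

Here r4 reveals none of Alice’s values, yet it links r1 and r2 into one record holding both her name and her phone, so under these toy definitions ΔLL = 2/3 − 1/3 = 1/3.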

26

Q2: Disinformation?

[Diagram: Alice (p) and Bob, who holds records r1 r2 r3 r4 (lies)]

ΔLL = ??

What is the most cost-effective disinformation?

27

Q3: Verification?

[Diagram: Alice (p) and Bob, who holds records r1, 0.9; r2, 0.8; r3, 0.7; ... and hypothesis h (0.6)]

What is the best fact to verify to increase confidence in the hypothesis?

28

Privacy Spectrum

• Prevention

• Detection

• Containment