Vanish: Increasing Data Privacy with Self-Destructing Data
Roxana Geambasu, Tadayoshi Kohno, Amit Levy, et al.
University of Washington
USENIX Security Symposium, 2009
---
Presented by Joseph Del Rocco
University of Central Florida
Outline
• Distributed Hash Tables (DHT)
• Data Destruction w/ Vanish
  – Motivations
  – Architecture
  – Implementations
  – Results
  – Contributions / Weaknesses / Improvements
• References
Hash Tables (review) [5]
• Key (tag) or data is hashed to a table index
• Hash functions: MD5, SHA-1, CRC32, FSB, HAS-160, etc.
• Hash collisions are unavoidable (see the birthday paradox), so colliding entries are chained in linked lists
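To make the review concrete, a minimal sketch of a chained hash table (illustrative Python, not from [5]; the class name and bucket count are made up):

```python
import hashlib

class ChainedHashTable:
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key: str) -> int:
        # hash the key (SHA-1 here) and fold the digest into a bucket index
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest, "big") % len(self.buckets)

    def put(self, key: str, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))          # collision: chain in the bucket

    def get(self, key: str):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None
```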
Distributed Hash Tables
• Hash tables… split up across machines
• Key space astronomically large (2^128, 2^160)
  2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456
• In 2001, four main DHTs ignited research: - Chord (MIT)
- CAN (Berkeley, AT&T)
- Pastry (Rice, Microsoft)
- Tapestry (Berkeley)
• Availability, Scale, Decentralized, Churn!
Viewed as a Distributed Hash Table [3]
[Figure: the hash-table key space 0 … 2^128−1, partitioned across peer nodes]
• Each peer node is responsible for a range of the hash table, according to the peer's hash key
• Location information about objects is placed on the peer with the closest key (information redundancy)
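A sketch of this closest-key placement rule, under assumptions: `hash160` and `responsible_peer` are hypothetical names, and plain numeric distance stands in for whatever closeness metric a given DHT uses (Kademlia-style systems use XOR instead):

```python
import hashlib

def hash160(data: str) -> int:
    # 160-bit SHA-1 digest interpreted as an integer key
    return int.from_bytes(hashlib.sha1(data.encode()).digest(), "big")

def responsible_peer(key: str, peer_ids: list[int]) -> int:
    # the peer whose ID is numerically closest to the key's hash stores it
    h = hash160(key)
    return min(peer_ids, key=lambda pid: abs(pid - h))
```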
DHT in action: put() [3]
[Figure: a peer that wants to share a file calls insert(K1, V1); the (K, V) pairs live on the peers responsible for them]
• Operation: route the message "I have the file" to the node holding key K1
DHT in action: get() [3]
[Figure: a peer calls retrieve(K1); the request is routed across the (K, V) tables to the responsible peer]
• Operation: retrieve value V1 from the node holding key K1
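A toy single-process illustration of the put()/get() flows pictured above, reusing `hash160()` and `responsible_peer()` from the earlier sketch; a real DHT routes these messages across the network hop by hop:

```python
class ToyDHT:
    def __init__(self, peer_ids):
        self.tables = {pid: {} for pid in peer_ids}   # one local table per peer

    def put(self, key, value):
        # insert(K1, V1): deliver the pair to the node holding key K1
        self.tables[responsible_peer(key, list(self.tables))][key] = value

    def get(self, key):
        # retrieve(K1): ask the node holding key K1 for the value
        return self.tables[responsible_peer(key, list(self.tables))].get(key)

dht = ToyDHT([hash160(f"peer{i}") for i in range(8)])
dht.put("K1", "V1")
assert dht.get("K1") == "V1"
```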
Chord [4]
• Keys and nodes hashed into the same 2^m ID space (keys from filenames, nodes from IPs)
• Each node stores ~K/N keys (K total keys, N nodes)
• "Finger table": finger[i] = successor((n + 2^(i−1)) mod 2^m), for i = 1…m
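A sketch of the finger-table rule (the ring IDs and m = 6 below are made-up illustration values):

```python
def successor(ring_ids, x):
    # first node at or clockwise after position x on the identifier circle
    ids = sorted(ring_ids)
    return next((nid for nid in ids if nid >= x), ids[0])   # wrap around

def finger_table(n, ring_ids, m):
    # finger[i] = successor((n + 2^(i-1)) mod 2^m), for i = 1..m
    return [successor(ring_ids, (n + 2 ** (i - 1)) % 2 ** m)
            for i in range(1, m + 1)]

# hypothetical m = 6 ring; node 8's fingers -> [14, 14, 14, 21, 32, 42]
print(finger_table(8, [1, 8, 14, 21, 32, 38, 42, 48, 51, 56], 6))
```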
CAN – Content Addressable Network [3]
• Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone
• Each peer knows the neighbors of its zone
• Random assignment of peers to zones at startup – split the zone if it is not empty
• Dimensional-ordered multi-hop routing (see the sketch after the publishing figures below)
CAN: Object Publishing [3]
[Figure: node I calls publish(K, V); step (1) hashes the key to a point: a = hx(K), b = hy(K), i.e., x = a, y = b]
CAN: Object Publishing, continued [3]
[Figure: after step (1) computes a = hx(K), b = hy(K), step (2) routes (K, V) from node I to node J, the owner of the zone containing the point (a, b)]
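A sketch of CAN's hash-to-point publishing and routing, under assumptions: the `Zone` class, the per-axis hashes, and the center-distance greedy step are illustrative choices, not CAN's exact algorithm:

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Zone:
    x0: float
    x1: float
    y0: float
    y1: float
    neighbors: list = field(default_factory=list)
    store: dict = field(default_factory=dict)

    def contains(self, a: float, b: float) -> bool:
        return self.x0 <= a < self.x1 and self.y0 <= b < self.y1

    def dist2(self, a: float, b: float) -> float:
        # squared distance from the zone's center to the target point
        cx, cy = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return (cx - a) ** 2 + (cy - b) ** 2

def h(axis: bytes, key: str) -> float:
    # per-axis hash mapping the key to a coordinate in [0, 1)
    d = hashlib.sha1(axis + key.encode()).digest()
    return int.from_bytes(d[:8], "big") / 2 ** 64

def publish(key: str, value, start: Zone) -> None:
    a, b = h(b"x", key), h(b"y", key)    # (1) a = hx(K), b = hy(K)
    zone = start
    while not zone.contains(a, b):       # (2) greedy multi-hop routing toward (a, b)
        zone = min(zone.neighbors, key=lambda z: z.dist2(a, b))
    zone.store[key] = value              # the zone owner (node J) stores (K, V)
```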
Modern, Popular P2P w/ DHTs
• Vuze / Azureus (Java BitTorrent client)
• BitTorrent DHT (based on Kademlia)
• IBM Websphere
• Apache Cassandra
• OpenDHT
• Mainline
• Kazaa, eDonkey, etc. (KAD)
[Image: Dendrobates azureus, the blue poison dart frog]
Vuze (Azureus) Specifics
• Nodes in the network are assigned a "random" 160-bit ID hashed from IP & port (spanning the DHT index range)
• Client sends “put” messages to 20 closest nodes to hashed key index in DHT
• Nodes re-put() entries from local hash tables every 30 minutes to combat churn
• Nodes supposedly remove key/value pairs older than 8 hours if not re-put() by the originator
• Originator node must re-put() to persist?
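A sketch of the placement rule above; the exact `ip:port` hash input is an assumption, and XOR distance is the Kademlia-style closeness measure Vuze uses:

```python
import hashlib

def node_id(ip: str, port: int) -> int:
    # "random" 160-bit ID derived by hashing IP & port, as described above
    return int.from_bytes(hashlib.sha1(f"{ip}:{port}".encode()).digest(), "big")

def closest_nodes(key_hash: int, ids: list[int], k: int = 20) -> list[int]:
    # the 20 nodes closest to the hashed key index receive the put() message
    return sorted(ids, key=lambda nid: nid ^ key_hash)[:k]
```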
Vanish Motivations
• Data frequently cached/archived by email providers, ISPs, network backup systems
• Often available after account termination
• Forensic examination of hard drives (e.g., after a raid)
• Laptops stolen or taken in for repair
• High-profile political scandals
• Some argue the right and ability to destroy data is as fundamental as privacy & liberty
Vanish Motivations
• Hushmail email encryption service offered cleartext contents of encrypted messages to the federal government
• Trusted 3rd party (Ephemerizer) supposedly destroys data after a timeout, but this never caught on… trust issue?
• Subpoenas…
Vanish Goals
• Create a Vanishing Data Object (VDO)
• Becomes unreadable after a timeout, even if one retroactively obtains a pristine copy of the VDO captured before expiration
• Accessible until timeout
• Leverage existing infrastructure
• NO required passwords, keys, special security hardware…
Vanish Architecture
• Encrypt data D w/ random key K into C
• Use threshold secret sharing (T.S.S.) [6] to split K into N shares
• Pick random access key L; use a cryptographically secure PRNG (keyed by L) to derive N DHT indices, one per share
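A minimal encapsulation sketch under loudly stated stand-ins: a toy SHA-256 keystream instead of a real cipher, XOR-based N-of-N sharing instead of Shamir's scheme [6], SHA-1(L || i) as the keyed index PRNG, and `dht_put` as a hypothetical store callback:

```python
import os, hashlib

def keystream(key: bytes, length: int) -> bytes:
    # toy SHA-256 counter-mode keystream; stands in for a real cipher
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

def split_xor(secret: bytes, n: int) -> list[bytes]:
    # N-of-N XOR sharing; stands in for Shamir's threshold scheme [6]
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    acc = secret
    for s in shares:
        acc = bytes(a ^ b for a, b in zip(acc, s))
    return shares + [acc]          # XOR of all n shares recovers the secret

def derive_indices(L: bytes, n: int) -> list[int]:
    # keyed PRNG: index_i = SHA-1(L || i), a 160-bit DHT key per share
    return [int.from_bytes(hashlib.sha1(L + i.to_bytes(4, "big")).digest(), "big")
            for i in range(n)]

def encapsulate(data: bytes, n: int, dht_put):
    K = os.urandom(32)                                   # random data key
    C = bytes(d ^ s for d, s in zip(data, keystream(K, len(data))))
    L = os.urandom(20)                                   # random access key
    for idx, share in zip(derive_indices(L, n), split_xor(K, n)):
        dht_put(idx, share)                              # scatter shares of K
    return (L, C, n)                                     # the VDO
```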
Vanish Architecture
• The threshold of the T.S.S. (threshold ratio) determines how many of the N shares are needed to reconstruct K
• EX: N = 20, threshold = 10
  So any 10 of the 20 shares can be used
• EX: N = 50, threshold = 50
  Better have all shares…
• VDO = (L, C, N, threshold), sent / stored
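For the threshold behavior itself, a sketch of Shamir's (threshold, N) sharing [6] over a prime field (the prime and sizes are toy choices), showing that any `threshold` of the N shares reconstruct K:

```python
import random

P = 2 ** 127 - 1   # a Mersenne prime; large enough for a toy 16-byte secret

def share(secret: int, n: int, t: int):
    # random degree-(t-1) polynomial with f(0) = secret; shares are points
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the secret
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

K = random.randrange(P)
shares = share(K, 20, 10)
assert reconstruct(random.sample(shares, 10)) == K   # any 10 of 20 suffice
```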
Vanish Decapsulation
• Given a VDO:
  - extract access key L
  - derive the locations of the shares of K
  - get() the number of shares required by the threshold
  - reconstruct K
  - decrypt C to obtain D
• # of retrieved shares must be ≥ the threshold
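A decapsulation sketch mirroring the encapsulation sketch above (same toy primitives, reusing `keystream()` and `derive_indices()`; `dht_get` is a hypothetical lookup returning a stored share or None). With the N-of-N stand-in all n shares are needed; real Shamir sharing needs only threshold-many:

```python
def decapsulate(vdo, dht_get):
    L, C, n = vdo
    # same keyed PRNG, so the same share locations are derived from L
    shares = [s for s in (dht_get(i) for i in derive_indices(L, n))
              if s is not None]
    if len(shares) < n:        # threshold = n under the N-of-N stand-in
        raise ValueError("too few shares survive: the VDO has vanished")
    K = shares[0]
    for s in shares[1:]:
        K = bytes(a ^ b for a, b in zip(K, s))        # reconstruct K
    return bytes(c ^ s for c, s in zip(C, keystream(K, len(C))))  # decrypt C -> D
```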
Benefits of Churn!
• Nodes continue to leave/re-enter network
• Supposedly 80% of IPs change in 7 days
• Nodes change IDs (locations in network) as IP changes
• Also, hash tables per node purge themselves after some time period
• So data is guaranteed to NOT last long at its original node…
The Big Question… [1][7]
• How long are we talking w/ churn?
  - Vuze = unclear… (7h, 3h, 2h…)
  - OpenDHT = 1 hour – 1 week
  - Kazaa = ~"several minutes" (2.5)
• Refresh uses K, re-splits into new shares, uses L to derive new indices & re-puts()
• “Naturally, refreshes require periodic Internet connectivity.” [1]
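A refresh sketch continuing the earlier toy code (reusing `split_xor` and a hypothetical `dht_put`); a wall-clock epoch mixed into the index PRNG is one assumed way to let L derive the new indices, as the bullet above describes:

```python
import time, hashlib

def epoch_now(period_secs: int = 8 * 3600) -> int:
    # hypothetical: writer and reader both derive the epoch from wall-clock
    # time, so decapsulation can recompute the current share locations
    return int(time.time()) // period_secs

def refresh(L: bytes, K: bytes, n: int, dht_put) -> None:
    e = epoch_now()
    for i, share in enumerate(split_xor(K, n)):   # re-split K into fresh shares
        idx = int.from_bytes(
            hashlib.sha1(L + e.to_bytes(8, "big") + i.to_bytes(4, "big")).digest(),
            "big")                                # new L-derivable index per epoch
        dht_put(idx, share)                       # re-put() before old copies expire
```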
Implementations
[Screenshots of the implementations; images not recoverable from the slides]
Results
"two minor changes (<50 lines of code)" [1] – the modifications needed to the existing Vuze client
Results
“[With single share VDO model] … ~5% of VDOs continue to live long after 8 hours.” [1]
“…the cause for the latter effect demands more investigation, we suspect that some of the single VDO keys are stored by DHT peers running non-default configurations.” [1]
“These observations suggest that the naive (one share) approach [does not meet our goals] … thereby motivating our need for redundancy.” [1]
Contributions
• Solution utilizes existing, popular, researched technology - used since 2001
• Interesting idea utilizing DHT as general temporary storage
• No required special security hardware, or special operations on the part of the user
• Utilizes the inherent half-life (churn) of nodes in a DHT – data is destroyed by normal network operation
Weaknesses
• Requirements:
  - network connection (for put and get, though not for destruction)
  - complex analysis of DHT networks
  - "refresher" hardware for a reasonable lifetime
• Clearly not for all data! (network flooding)
• Life of data is not mathematically determinate or even guaranteed (depends completely on churn)
• Assumes no hardcopies of data…
Improvement / Future Work
• Instead of refresher hardware, DHT maintenance could refresh automatically
• Utilization of many P2Ps in parallel, choose appropriate one based upon churn
• Analyze network w/ many data objects over very long timeout periods
• Make sure the VDO is well encrypted, or an attacker could defeat the threshold scheme
References
[1] Geambasu, R., Kohno, T., Levy, A. A., Levy, H. M. Vanish: Increasing Data Privacy with Self-Destructing Data. USENIX Security Symposium, 2009.
[2] Wiley, B. Distributed Hash Tables, Part I. Linux Journal, 2003. http://www.linuxjournal.com/article/6797
[3] Hua, K. P2P Search. COP5711 Parallel & Distributed Databases, University of Central Florida, 2008. http://www.cs.ucf.edu/~kienhua/classes/
[4] http://en.wikipedia.org/wiki/Chord_%28peer-to-peer%29
[5] http://en.wikipedia.org/wiki/Hash_table
[6] Shamir, A. How to Share a Secret. Communications of the ACM, 1979.
[7] Stutzbach, D., Rejaie, R. Characterizing Churn in P2P Networks. Technical Report, University of Oregon, 2005.