Vanish: Increasing Data Privacy with Self-Destructing Data
Roxana Geambasu, Tadayoshi Kohno, Amit Levy, et al.
University of Washington
USENIX Security Symposium, 2009
---
Presented by Joseph Del Rocco
University of Central Florida
Outline
• Distributed Hash Tables (DHT)
• Data Destruction w/ Vanish
  – Motivations
  – Architecture
  – Implementations
  – Results
  – Contributions / Weaknesses / Improvements
• References
Hash Tables (review) [5]
• Key (tag) or data is hashed to a table index
• Hash functions: MD5, SHA-1, CRC32, FSB, HAS-160, etc.
• Hash collisions are unavoidable (see the birthday paradox), so colliding entries are chained in linked lists
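To make the review concrete, a minimal sketch of a chained hash table (illustrative Python, not from [5]; the class name and bucket count are made up):

```python
import hashlib

class ChainedHashTable:
    def __init__(self, num_buckets=16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key: str) -> int:
        # hash the key (SHA-1 here) and fold the digest into a bucket index
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest, "big") % len(self.buckets)

    def put(self, key: str, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))          # collision: chain in the bucket

    def get(self, key: str):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None
```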
Distributed Hash Tables
• Hash tables… split up across machines
• Key space astronomically large (2^128, 2^160)
  2^128 = 340,282,366,920,938,463,463,374,607,431,768,211,456
• In 2001, four main DHTs ignited research: - Chord (MIT)
- CAN (Berkeley, AT&T)
- Pastry (Rice, Microsoft)
- Tapestry (Berkeley)
• Availability, Scale, Decentralized, Churn!
Viewed as a Distributed Hash Table [3]
[Figure: the hash-table key space 0 … 2^128−1, partitioned across peer nodes]
• Each peer node is responsible for a range of the hash table, according to the peer's hash key
• Location information about objects is placed on the peer with the closest key (information redundancy)
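A sketch of this closest-key placement rule, under assumptions: `hash160` and `responsible_peer` are hypothetical names, and plain numeric distance stands in for whatever closeness metric a given DHT uses (Kademlia-style systems use XOR instead):

```python
import hashlib

def hash160(data: str) -> int:
    # 160-bit SHA-1 digest interpreted as an integer key
    return int.from_bytes(hashlib.sha1(data.encode()).digest(), "big")

def responsible_peer(key: str, peer_ids: list[int]) -> int:
    # the peer whose ID is numerically closest to the key's hash stores it
    h = hash160(key)
    return min(peer_ids, key=lambda pid: abs(pid - h))
```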
DHT in action: put() [3]
[Figure: a peer that wants to share a file calls insert(K1, V1); the (K, V) pairs live on the peers responsible for them]
• Operation: route the message "I have the file" to the node holding key K1
DHT in action: get() [3]
[Figure: a peer calls retrieve(K1); the request is routed across the (K, V) tables to the responsible peer]
• Operation: retrieve value V1 from the node holding key K1
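A toy single-process illustration of the put()/get() flows pictured above, reusing `hash160()` and `responsible_peer()` from the earlier sketch; a real DHT routes these messages across the network hop by hop:

```python
class ToyDHT:
    def __init__(self, peer_ids):
        self.tables = {pid: {} for pid in peer_ids}   # one local table per peer

    def put(self, key, value):
        # insert(K1, V1): deliver the pair to the node holding key K1
        self.tables[responsible_peer(key, list(self.tables))][key] = value

    def get(self, key):
        # retrieve(K1): ask the node holding key K1 for the value
        return self.tables[responsible_peer(key, list(self.tables))].get(key)

dht = ToyDHT([hash160(f"peer{i}") for i in range(8)])
dht.put("K1", "V1")
assert dht.get("K1") == "V1"
```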
Chord [4]
• Keys and nodes hashed into the same 2^m ID space (keys from filenames, nodes from IPs)
• Each node stores ~K/N keys (K total keys, N nodes)
• "Finger table": finger[i] = successor((n + 2^(i−1)) mod 2^m), for i = 1…m
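A sketch of the finger-table rule (the ring IDs and m = 6 below are made-up illustration values):

```python
def successor(ring_ids, x):
    # first node at or clockwise after position x on the identifier circle
    ids = sorted(ring_ids)
    return next((nid for nid in ids if nid >= x), ids[0])   # wrap around

def finger_table(n, ring_ids, m):
    # finger[i] = successor((n + 2^(i-1)) mod 2^m), for i = 1..m
    return [successor(ring_ids, (n + 2 ** (i - 1)) % 2 ** m)
            for i in range(1, m + 1)]

# hypothetical m = 6 ring; node 8's fingers -> [14, 14, 14, 21, 32, 42]
print(finger_table(8, [1, 8, 14, 21, 32, 38, 42, 48, 51, 56], 6))
```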
CAN – Content Addressable Network [3]
• Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone
• Each peer knows the neighbors of its zone
• Random assignment of peers to zones at startup – split the zone if it is not empty
• Dimensional-ordered multi-hop routing (see the sketch after the publishing figures below)
CAN: Object Publishing [3]
[Figure: node I calls publish(K, V); step (1) hashes the key to a point: a = hx(K), b = hy(K), i.e., x = a, y = b]
CAN: Object Publishing, continued [3]
[Figure: after step (1) computes a = hx(K), b = hy(K), step (2) routes (K, V) from node I to node J, the owner of the zone containing the point (a, b)]
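A sketch of CAN's hash-to-point publishing and routing, under assumptions: the `Zone` class, the per-axis hashes, and the center-distance greedy step are illustrative choices, not CAN's exact algorithm:

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Zone:
    x0: float
    x1: float
    y0: float
    y1: float
    neighbors: list = field(default_factory=list)
    store: dict = field(default_factory=dict)

    def contains(self, a: float, b: float) -> bool:
        return self.x0 <= a < self.x1 and self.y0 <= b < self.y1

    def dist2(self, a: float, b: float) -> float:
        # squared distance from the zone's center to the target point
        cx, cy = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return (cx - a) ** 2 + (cy - b) ** 2

def h(axis: bytes, key: str) -> float:
    # per-axis hash mapping the key to a coordinate in [0, 1)
    d = hashlib.sha1(axis + key.encode()).digest()
    return int.from_bytes(d[:8], "big") / 2 ** 64

def publish(key: str, value, start: Zone) -> None:
    a, b = h(b"x", key), h(b"y", key)    # (1) a = hx(K), b = hy(K)
    zone = start
    while not zone.contains(a, b):       # (2) greedy multi-hop routing toward (a, b)
        zone = min(zone.neighbors, key=lambda z: z.dist2(a, b))
    zone.store[key] = value              # the zone owner (node J) stores (K, V)
```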
Modern, Popular P2P w/ DHTs
• Vuze / Azureus (Java BitTorrent client)
• BitTorrent DHT (based on Kademlia)
• IBM Websphere
• Apache Cassandra
• OpenDHT
• Mainline
• Kazaa, eDonkey, etc. (KAD)
[Image: Dendrobates azureus, the blue poison dart frog]
Vuze (Azureus) Specifics
• Nodes in the network are assigned a "random" 160-bit ID hashed from IP & port (spanning the DHT index range)
• Client sends “put” messages to 20 closest nodes to hashed key index in DHT
• Nodes re-put() entries from local hash tables every 30 minutes to combat churn
• Nodes supposedly remove key/value pairs older than 8 hours if not re-put() by the originator
• Originator node must re-put() to persist?
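A sketch of the placement rule above; the exact `ip:port` hash input is an assumption, and XOR distance is the Kademlia-style closeness measure Vuze uses:

```python
import hashlib

def node_id(ip: str, port: int) -> int:
    # "random" 160-bit ID derived by hashing IP & port, as described above
    return int.from_bytes(hashlib.sha1(f"{ip}:{port}".encode()).digest(), "big")

def closest_nodes(key_hash: int, ids: list[int], k: int = 20) -> list[int]:
    # the 20 nodes closest to the hashed key index receive the put() message
    return sorted(ids, key=lambda nid: nid ^ key_hash)[:k]
```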
Vanish Motivations
• Data frequently cached/archived by email providers, ISPs, network backup systems
• Often available after account termination
• Forensic examination of hard drives (e.g., after a raid)
• Laptops stolen or taken in for repair
• High-profile political scandals
• Some argue the right and ability to destroy data is as fundamental as privacy & liberty
Vanish Motivations
• Hushmail email encryption service offered cleartext contents of encrypted messages to the federal government
• Trusted 3rd party (Ephemerizer) supposedly destroys data after a timeout, but this never caught on… trust issue?
• Subpoenas…
Vanish Goals
• Create a Vanishing Data Object (VDO)
• Becomes unreadable after a timeout, even if one retroactively obtains a pristine copy of the VDO captured before expiration
• Accessible until timeout
• Leverage existing infrastructure
• NO required passwords, keys, special security hardware…
Vanish Architecture
• Encrypt data D w/ random key K into C
• Use threshold secret sharing (T.S.S.) [6] to split K into N shares
• Pick random access key L; use a cryptographically secure PRNG (keyed by L) to derive N DHT indices, one per share
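A minimal encapsulation sketch under loudly stated stand-ins: a toy SHA-256 keystream instead of a real cipher, XOR-based N-of-N sharing instead of Shamir's scheme [6], SHA-1(L || i) as the keyed index PRNG, and `dht_put` as a hypothetical store callback:

```python
import os, hashlib

def keystream(key: bytes, length: int) -> bytes:
    # toy SHA-256 counter-mode keystream; stands in for a real cipher
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

def split_xor(secret: bytes, n: int) -> list[bytes]:
    # N-of-N XOR sharing; stands in for Shamir's threshold scheme [6]
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    acc = secret
    for s in shares:
        acc = bytes(a ^ b for a, b in zip(acc, s))
    return shares + [acc]          # XOR of all n shares recovers the secret

def derive_indices(L: bytes, n: int) -> list[int]:
    # keyed PRNG: index_i = SHA-1(L || i), a 160-bit DHT key per share
    return [int.from_bytes(hashlib.sha1(L + i.to_bytes(4, "big")).digest(), "big")
            for i in range(n)]

def encapsulate(data: bytes, n: int, dht_put):
    K = os.urandom(32)                                   # random data key
    C = bytes(d ^ s for d, s in zip(data, keystream(K, len(data))))
    L = os.urandom(20)                                   # random access key
    for idx, share in zip(derive_indices(L, n), split_xor(K, n)):
        dht_put(idx, share)                              # scatter shares of K
    return (L, C, n)                                     # the VDO
```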
Vanish Architecture
• The threshold of the T.S.S. (threshold ratio) determines how many of the N shares are needed to reconstruct K
• EX: N = 20, threshold = 10
  So any 10 of the 20 shares can be used
• EX: N = 50, threshold = 50
  Better have all shares…
• VDO = (L, C, N, threshold), sent / stored
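For the threshold behavior itself, a sketch of Shamir's (threshold, N) sharing [6] over a prime field (the prime and sizes are toy choices), showing that any `threshold` of the N shares reconstruct K:

```python
import random

P = 2 ** 127 - 1   # a Mersenne prime; large enough for a toy 16-byte secret

def share(secret: int, n: int, t: int):
    # random degree-(t-1) polynomial with f(0) = secret; shares are points
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 recovers the secret
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, -1, P)) % P
    return total

K = random.randrange(P)
shares = share(K, 20, 10)
assert reconstruct(random.sample(shares, 10)) == K   # any 10 of 20 suffice
```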
Vanish Decapsulation
• Given a VDO:
  - extract access key L
  - derive the locations of the shares of K
  - get() the number of shares required by the threshold
  - reconstruct K
  - decrypt C to obtain D
• # of retrieved shares must be ≥ the threshold
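A decapsulation sketch mirroring the encapsulation sketch above (same toy primitives, reusing `keystream()` and `derive_indices()`; `dht_get` is a hypothetical lookup returning a stored share or None). With the N-of-N stand-in all n shares are needed; real Shamir sharing needs only threshold-many:

```python
def decapsulate(vdo, dht_get):
    L, C, n = vdo
    # same keyed PRNG, so the same share locations are derived from L
    shares = [s for s in (dht_get(i) for i in derive_indices(L, n))
              if s is not None]
    if len(shares) < n:        # threshold = n under the N-of-N stand-in
        raise ValueError("too few shares survive: the VDO has vanished")
    K = shares[0]
    for s in shares[1:]:
        K = bytes(a ^ b for a, b in zip(K, s))        # reconstruct K
    return bytes(c ^ s for c, s in zip(C, keystream(K, len(C))))  # decrypt C -> D
```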
Benefits of Churn!
• Nodes continue to leave/re-enter network
• Supposedly 80% of IPs change in 7 days
• Nodes change IDs (locations in network) as IP changes
• Also, hash tables per node purge themselves after some time period
• So data is guaranteed to NOT last long at its original node…
The Big Question… [1][7]
• How long are we talking w/ churn?
  - Vuze = unclear… (7h, 3h, 2h…)
  - OpenDHT = 1 hour – 1 week
  - Kazaa = ~"several minutes" (2.5)
• Refresh uses K, re-splits into new shares, uses L to derive new indices & re-puts()
• “Naturally, refreshes require periodic Internet connectivity.” [1]
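A refresh sketch continuing the earlier toy code (reusing `split_xor` and a hypothetical `dht_put`); a wall-clock epoch mixed into the index PRNG is one assumed way to let L derive the new indices, as the bullet above describes:

```python
import time, hashlib

def epoch_now(period_secs: int = 8 * 3600) -> int:
    # hypothetical: writer and reader both derive the epoch from wall-clock
    # time, so decapsulation can recompute the current share locations
    return int(time.time()) // period_secs

def refresh(L: bytes, K: bytes, n: int, dht_put) -> None:
    e = epoch_now()
    for i, share in enumerate(split_xor(K, n)):   # re-split K into fresh shares
        idx = int.from_bytes(
            hashlib.sha1(L + e.to_bytes(8, "big") + i.to_bytes(4, "big")).digest(),
            "big")                                # new L-derivable index per epoch
        dht_put(idx, share)                       # re-put() before old copies expire
```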
Implementations
[Screenshots of the implementations; images not recoverable from the slides]
Results
"two minor changes (<50 lines of code)" [1] – the modifications needed to the existing Vuze client
Results
“[With single share VDO model] … ~5% of VDOs continue to live long after 8 hours.” [1]
“…the cause for the latter effect demands more investigation, we suspect that some of the single VDO keys are stored by DHT peers running non-default configurations.” [1]
“These observations suggest that the naive (one share) approach [does not meet our goals] … thereby motivating our need for redundancy.” [1]
Contributions
• Solution utilizes existing, popular, researched technology - used since 2001
• Interesting idea utilizing DHT as general temporary storage
• No required special security hardware, or special operations on the part of the user
• Utilizes the inherent half-life (churn) of nodes in a DHT – data is destroyed by normal network operation
Weaknesses
• Requirements:
  - network connection (for put and get, though not for destruction)
  - complex analysis of DHT networks
  - "refresher" hardware for a reasonable lifetime
• Clearly not for all data! (network flooding)
• Life of data is not mathematically determinate or even guaranteed (depends completely on churn)
• Assumes no hardcopies of data…
Improvement / Future Work
• Instead of refresher hardware, DHT maintenance could refresh automatically
• Utilization of many P2Ps in parallel, choose appropriate one based upon churn
• Analyze network w/ many data objects over very long timeout periods
• Make sure the VDO is well encrypted, or an attacker could defeat the threshold scheme
References
[1] Geambasu, R., Kohno, T., Levy, A. A., Levy, H. M. Vanish: Increasing Data Privacy with Self-Destructing Data. USENIX Security Symposium, 2009.
[2] Wiley, B. Distributed Hash Tables, Part I. Linux Journal, 2003. http://www.linuxjournal.com/article/6797
[3] Hua, K. P2P Search. COP5711 Parallel & Distributed Databases, University of Central Florida, 2008. http://www.cs.ucf.edu/~kienhua/classes/
[4] http://en.wikipedia.org/wiki/Chord_%28peer-to-peer%29
[5] http://en.wikipedia.org/wiki/Hash_table
[6] Shamir, A. How to Share a Secret. Communications of the ACM, 1979.
[7] Stutzbach, D., Rejaie, R. Characterizing Churn in P2P Networks. Technical Report, University of Oregon, 2005.