Data Privacy and Security: EDBT 2018 Challenges and ...sisinflab.poliba.it/sebd/2018/invited/Amr El...
EDBT 2018
Data Privacy and Security:
Challenges and Opportunities
Amr El Abbadi
University of California, Santa Barbara
Rachel Lin, Stefano Tessaro, Miriam Metzger, Scott Reid, Tristan Allard, Esther Pacitti, Reza Akbarinia, Cetin Sahin, Victor Zakhary.
NEW CHALLENGES
Cloud
Social Media
CryptoCurrency
Overall Message
• Great Theoretical Cryptography Results
• Need Practical Systems Design
Cloud Computing: Target of Attacks
Google App Engine
Rennes July 2017 8
Is it a Real Concern?
What are the top three challenges or barriers to implementing a cloud computing strategy for your IT organization? [IDG’16]
Data and security concerns are still the main inhibitor!
December 2016:
1 billion accounts are affected
Yahoo! Data Breach (Occurred in 2013)
We need mechanisms to ensure
data security and privacy!
Challenge: Conflicting Goals
[Figure: three axes, Functionality, Performance, and Confidentiality/Privacy, each ranging from Low to High, with the IDEAL point marked. Many crypto protocols and many cloud services sit at different points in this space.]
What is the correct balance?
Outsourced Private Data
Alice
read (a)
ACK
write (c, data)
read (b)
Security Concerns?
Confidentiality of Data
Solution:
Encryption
Building Tools – (1) Encryption
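Deterministic
AES + ECB
Electronic Codebook Mode
Encrypting the same plaintext twice yields the same ciphertext: $\mathrm{ENC}_\kappa(X) = \mathrm{ENC}_\kappa(X)$
Non-deterministic
AES + CBC
Cipher Block Chaining Mode
Encrypting the same plaintext twice yields different ciphertexts: $\mathrm{ENC}_\kappa(X) \neq \mathrm{ENC}_\kappa(X)$
Encryption (with key $\kappa$): plaintext "Hello World!" → ciphertext f559c6da5e9efb90c34cf271701fad34ba5952f9
Decryption (with key $\kappa$): ciphertext f559c6da5e9efb90c34cf271701fad34ba5952f9 → plaintext "Hello World!"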
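The difference between deterministic and non-deterministic encryption can be illustrated with a minimal sketch. It does not use AES: the standard library has no AES, so the code assumes a toy stream cipher built from HMAC-SHA256 as a stand-in. The function names (`enc_deterministic`, `enc_randomized`, `dec`) are hypothetical and only serve the illustration.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode (a stand-in for AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def enc_deterministic(key: bytes, plaintext: bytes) -> bytes:
    # The nonce is derived from the plaintext, so equal plaintexts
    # always produce equal ciphertexts.
    nonce = hmac.new(key, b"nonce" + plaintext, hashlib.sha256).digest()[:16]
    return nonce + _xor(plaintext, _keystream(key, nonce, len(plaintext)))


def enc_randomized(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random nonce per call, so equal plaintexts produce
    # different ciphertexts (with overwhelming probability).
    nonce = os.urandom(16)
    return nonce + _xor(plaintext, _keystream(key, nonce, len(plaintext)))


def dec(key: bytes, ciphertext: bytes) -> bytes:
    # Both variants store the nonce in the first 16 bytes.
    nonce, body = ciphertext[:16], ciphertext[16:]
    return _xor(body, _keystream(key, nonce, len(body)))


key = os.urandom(32)
message = b"Hello World!"
same = enc_deterministic(key, message) == enc_deterministic(key, message)
different = enc_randomized(key, message) != enc_randomized(key, message)
```

Deterministic ciphertexts let a server test equality without the key, which is also exactly what they leak; randomized ciphertexts hide equality but cannot be compared server-side.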
Database Community: Secure SQL?
SELECT SUM(price) AS total
FROM orders
WHERE 10 <= price AND city = 'Vienna'
GROUP BY order_id
HAVING total > 20
GOAL: Developing algorithms that can answer queries over securely outsourced data without fetching all data
• comparison of entries for equality
• keyword search: search for a pattern
• range query: comparison of numerical values
• aggregation
Building Tools – (2) Homomorphic Encryption
A form of encryption that allows some computations to be carried out on ciphertexts without decrypting them.
$\mathrm{DEC}(\mathrm{ENC}(x) \mathbin{\Theta} \mathrm{ENC}(y)) = x + y$
Example: Multiplicative Homomorphic Encryption
To encrypt: $E(m) = m^e \bmod n$
To decrypt: $D(c) = c^d \bmod n$
So: $E(m_1) = m_1^e$ and $E(m_2) = m_2^e$ (all arithmetic mod $n$), hence
$E(m_1) \times E(m_2) = m_1^e \times m_2^e = (m_1 \times m_2)^e = E(m_1 \times m_2)$
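The identity above can be checked numerically. The sketch below uses the encrypt and decrypt formulas from the slide with deliberately tiny parameters (p = 61, q = 53, e = 17), which are assumptions chosen for readability and are far too small for real use.

```python
# Textbook RSA with tiny, insecure parameters (illustration only).
p, q = 61, 53
n = p * q                  # modulus
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    return pow(m, e, n)    # E(m) = m^e mod n


def decrypt(c: int) -> int:
    return pow(c, d, n)    # D(c) = c^d mod n


m1, m2 = 7, 6
# Multiply the two ciphertexts without ever decrypting them.
product_ciphertext = (encrypt(m1) * encrypt(m2)) % n
product = decrypt(product_ciphertext)   # equals (m1 * m2) mod n
```

The product of the plaintexts must stay below n for the result to be exact; otherwise it is only correct modulo n.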
Building Tools – Homomorphic Encryption
Partially Homomorphic
• Either additive or multiplicative
• Paillier, ElGamal, etc.
Fully Homomorphic [Gentry’09]
• Supports computation of arbitrary functions
• Quite inefficient
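As an illustration of the additive case, here is a minimal sketch of the Paillier scheme with toy parameters. The primes (61 and 53) and the function names are assumptions for the example; a real deployment would use primes of a thousand bits or more and a vetted library.

```python
import math
import secrets


def keygen(p: int = 61, q: int = 53):
    """Toy Paillier key generation with g = n + 1 (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(u) = (u - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    n, g = pub
    while True:
        r = secrets.randbelow(n - 1) + 1        # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c: int) -> int:
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add(pub, c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = keygen()
total = decrypt(pub, priv, add(pub, encrypt(pub, 20), encrypt(pub, 22)))
```

Here the operation on ciphertexts is multiplication modulo n², and the decrypted result is the sum of the plaintexts modulo n. Unlike textbook RSA, encryption is randomized, so the server cannot tell whether two ciphertexts hide the same value.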
Order Preserving Encryption (OPE) [Agrawal et al. SIGMOD’04]
$X_i > X_j \Leftrightarrow \mathrm{ENC}_\kappa(X_i) > \mathrm{ENC}_\kappa(X_j)$
• Comparison can be done on the server without decrypting
• Standard database indexes can be used
• Vulnerable to statistical attacks!
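The order-preserving property can be illustrated with a toy construction. This is not the scheme of Agrawal et al.; it is an assumed, simplified stand-in that maps each integer to a running sum of keyed pseudo-random positive gaps, which makes the mapping strictly increasing.

```python
import hashlib
import hmac


def _gap(key: bytes, i: int) -> int:
    """Keyed pseudo-random gap in [1, 65536] for domain position i."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:2], "big")


def ope_encrypt(key: bytes, x: int) -> int:
    """Toy OPE for small non-negative integers: sum of gaps 0..x.

    Every gap is at least 1, so x < y implies ENC(x) < ENC(y).
    Runs in O(x) time, which is fine for a demonstration only.
    """
    return sum(_gap(key, i) for i in range(x + 1))


key = b"demo-key"
prices = [30, 5, 12, 20]
ciphertexts = [ope_encrypt(key, p) for p in prices]

# The server can sort or range-filter on ciphertexts alone.
order_by_ciphertext = sorted(range(len(prices)), key=lambda i: ciphertexts[i])
order_by_plaintext = sorted(range(len(prices)), key=lambda i: prices[i])

# Range query "price >= 10" evaluated on encrypted values only.
bound = ope_encrypt(key, 10)
matching_rows = [i for i, c in enumerate(ciphertexts) if c >= bound]
```

The same property that enables server-side comparison is the leak: the server learns the full ordering of the column, which, combined with knowledge of the value distribution, is what statistical attacks exploit.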
Building Tools: Differential Privacy [Dwork 2006]
[Figure: a server holds the database DB = (x1, x2, x3, …, xn-1, xn) and, using random coins, answers query 1 through query T from users S. A second server holds DB’ = (x1, x2, y3, …, xn-1, xn) and answers the same queries from users S’. The two databases differ in 1 row, and the distance between the distributions of the answers is at most ε.]
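One common way to obtain such a bound for counting queries is the Laplace mechanism. The sketch below is a minimal illustration under stated assumptions: ε denotes the privacy parameter, the query is a count (sensitivity 1, since changing one row changes the count by at most 1), and the function names are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2018)
db = [25, 31, 47, 52, 38]          # e.g. ages, one row per person
db_neighbor = [25, 31, 19, 52, 38]  # differs from db in exactly one row

# With a small epsilon the two noisy answers are hard to tell apart.
answer = private_count(db, lambda age: age > 30, epsilon=0.5, rng=rng)
answer_neighbor = private_count(db_neighbor, lambda age: age > 30, epsilon=0.5, rng=rng)
```

A smaller ε means more noise and stronger privacy; a larger ε means answers closer to the truth.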
Range Query over Encrypted Data
PINED-RQ [ICDE 2018] (with Tristan Allard, Esther Pacitti, Reza Akbarinia)
• Builds a differentially private index over encrypted data to execute range queries
• Supports updates (modify, insert, delete)
• Probabilistic query execution guarantee
• Strong privacy: joint differential privacy and semantically secure encryption
• Very efficient
[Figure: the data provider (trusted) publishes ENC(DB) and the DP-Index to the untrusted side, which receives the queries.]
PINED-RQ: Challenges?
How to create an index?
How to process the query over the index?
How to support updates?
Creating the index
Inspired by B-Trees:
• Well-known efficient data access structures;
• A hierarchy of ranges
Design choices:
• Replace ranges by differentially private histograms
[Figure: a histogram of counts over the ranges [0, 10), [10, 20), [20, 30), …, then refined into a hierarchy of histograms whose lowest level holds pointers to encrypted records.]
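The idea of replacing index ranges by noisy histograms can be sketched as follows. This is a simplified single-level illustration, not the PINED-RQ algorithm: the bin layout, the Laplace noise, the pruning threshold, and all names are assumptions. In particular, the sketch keeps exact pointer lists per bin, which would reveal the true counts; a real design must also hide the true bin sizes, which is not attempted here.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Bin:
    lo: int                      # bin covers the range [lo, hi)
    hi: int
    noisy_count: float = 0.0
    pointers: list = field(default_factory=list)


def laplace(scale: float, rng: random.Random) -> float:
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_index(records, lo, hi, width, epsilon, rng):
    """records: list of (value, pointer_to_encrypted_record)."""
    bins = [Bin(start, start + width) for start in range(lo, hi, width)]
    for value, pointer in records:
        bins[(value - lo) // width].pointers.append(pointer)
    for b in bins:
        # Publish a perturbed count instead of the exact one.
        b.noisy_count = len(b.pointers) + laplace(1.0 / epsilon, rng)
    return bins


def range_query(bins, a, b, threshold=0.5):
    """Return pointers from bins that overlap [a, b) and look non-empty.

    The result is a superset of the true matches at bin granularity;
    the client decrypts and filters. A bin whose noisy count falls
    below the threshold is skipped, so answers are only probabilistic.
    """
    result = []
    for bin_ in bins:
        overlaps = bin_.lo < b and a < bin_.hi
        if overlaps and bin_.noisy_count > threshold:
            result.extend(bin_.pointers)
    return result


records = [(3, "r0"), (7, "r1"), (12, "r2"), (25, "r3")]
index = build_index(records, lo=0, hi=30, width=10, epsilon=1.0, rng=random.Random(1))
candidates = range_query(index, 5, 15)
```

The pruning step shows why the query execution guarantee is probabilistic: negative noise can make a populated bin look empty.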
Full Fledged Secure Systems
CryptDB, SOSP’11
MONOMI, VLDB’13
TrustedDB, SIGMOD’11 (hardware assisted)
Cipherbase, CIDR’13 (hardware assisted)
CryptDB [Popa et al. SOSP’11]
• Supporting various SQL Queries over encrypted data without decrypting on the untrusted DBMS server
Trusted Proxy
Alice
Bob
Carolyn
Stores schema, master key
No data storage
No query execution
Process queries completely at the DBMS,
on encrypted database
SQL Aware Encryption
CryptDB – Onions of Encryption [Popa et al. SOSP’11]
SELECT SUM(price) AS total
FROM orders
WHERE city = 'Venice'
Required for the equality predicate: Deterministic
• Strip off RND layer from Onion Eq
Required for SUM: Homomorphic
• Use Onion Add
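The layering behind Onion Eq can be sketched in a few lines. This is an illustration of the idea only, not CryptDB's implementation: the toy HMAC-based cipher stands in for AES, and the key handling and function names are assumptions.

```python
import hashlib
import hmac
import os


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 counter-mode keystream (toy cipher)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def det_encrypt(key: bytes, value: bytes) -> bytes:
    # Inner layer: nonce derived from the value, so equal values match.
    nonce = hmac.new(key, b"det" + value, hashlib.sha256).digest()[:16]
    return nonce + _xor_stream(key, nonce, value)


def rnd_encrypt(key: bytes, value: bytes) -> bytes:
    # Outer layer: fresh random nonce, so nothing is comparable.
    nonce = os.urandom(16)
    return nonce + _xor_stream(key, nonce, value)


def rnd_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return _xor_stream(key, ciphertext[:16], ciphertext[16:])


k_det, k_rnd = os.urandom(32), os.urandom(32)   # held by the trusted proxy
cities = [b"Venice", b"Vienna", b"Venice"]

# Stored on the server: Onion Eq = RND(DET(city)).
onion_eq = [rnd_encrypt(k_rnd, det_encrypt(k_det, c)) for c in cities]

# For "WHERE city = 'Venice'" the proxy releases k_rnd for this column
# and sends the DET encryption of the constant.
token = det_encrypt(k_det, b"Venice")
det_column = [rnd_decrypt(k_rnd, c) for c in onion_eq]   # RND layer stripped
matches = [i for i, c in enumerate(det_column) if c == token]
```

Once the RND layer is stripped, the column stays at the deterministic level, which is why the confidentiality of a column can only decrease over time.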
CryptDB [Popa et al. SOSP’11]
Designed for OLTP workloads
Supports a wide range of SQL queries
In the long term, confidentiality level degrades to the weakest encryption, e.g. OPE
Supports only 4/22 TPC-H queries (analytical)
Outsourced Private Data
Security Concerns?
Confidentiality of Data
Encryption
Is encryption enough?
Encryption alone is not enough!
Access patterns can leak sensitive information [Islam et al. NDSS’12]
Access Patterns
The 2 requests (from Alice and Bob) read the same block.
If the query is a search for a certain drug, then Alice and Bob have the same disease: the access pattern reveals a medical condition.
Outsourced Private Data
Alice
Initially proposed by [Goldreich and Ostrovsky, JACM’96]
Goal: Oblivious Access
OBLIVIOUS RAM (ORAM)
Translate each logical access to a sequence of random-looking accesses
Multi-Client Scenario
Alice
Bob
Carolyn
read (a)
read (a)
write (c, data)
Concurrency
ACK
Single ORAM Client
Not Efficient!
Asynchrony to the rescue
Tree-Based Asynchronous Oblivious Store
• TaoStore [S&P 2016]
Fully concurrent and asynchronous oblivious access
Concurrent and non-blocking processing of requests
Makes tree-based ORAM concurrent
First formal study of asynchronicity
Trusted Proxy
Alice
Bob
Carolyn
Starting point – Path ORAM [Stefanov et al. CCS’13]
Server
• Storage is organized as a binary tree (Leaf 1, Leaf 2, Leaf 3, Leaf 4), with O(1) blocks per node
• Every access goes to a random path
• Items are randomly re-assigned after every access
Proxy
• Pos Map: stores the assignment (e.g. a→3)
• Stash
Path ORAM – Access
Read/Write block a
1) Read path
• Fetch associated path
• Read/Modify block
• Assign block to a new random path in position map (e.g. a→3 becomes a→1)
2) Flush
• Push every block to the lowest non-full node that intersects with its assigned path (otherwise stash)
• If root is full, move to stash
3) Write-back
• Re-encrypt w/ fresh randomness
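The three access steps (read path, flush, write-back) can be sketched as a small in-memory simulation. This is a minimal sketch under stated assumptions: encryption and re-encryption are omitted, the stash is unbounded, and the class and attribute names are hypothetical. The `trace` list records which leaf path the server sees on each access.

```python
import random


class PathORAM:
    def __init__(self, height=3, bucket_size=4, rng=None):
        self.bucket_size = bucket_size
        self.rng = rng or random.Random()
        self.num_leaves = 2 ** height
        # Server side: heap-indexed binary tree, node 1 is the root.
        self.tree = {node: [] for node in range(1, 2 ** (height + 1))}
        # Proxy side: position map and stash.
        self.pos = {}      # address -> assigned leaf
        self.stash = {}    # address -> data
        self.trace = []    # leaves accessed, as observed by the server

    def _path(self, leaf):
        """Node ids from the given leaf up to the root."""
        node, path = self.num_leaves + leaf, []
        while node >= 1:
            path.append(node)
            node //= 2
        return path

    def access(self, op, addr, data=None):
        # Look up the current path and re-assign the block to a new random one.
        leaf = self.pos.get(addr, self.rng.randrange(self.num_leaves))
        self.pos[addr] = self.rng.randrange(self.num_leaves)
        self.trace.append(leaf)
        path = self._path(leaf)
        # 1) Read path: pull every block on the path into the stash.
        for node in path:
            for a, d in self.tree[node]:
                self.stash[a] = d
            self.tree[node] = []
        result = self.stash.get(addr)
        if op == "write":
            self.stash[addr] = data
        # 2) Flush and 3) write-back: fill nodes from the leaf upwards with
        # stash blocks whose assigned path passes through that node.
        for node in path:
            fits = [a for a in self.stash if node in self._path(self.pos[a])]
            fits = fits[: self.bucket_size]
            self.tree[node] = [(a, self.stash.pop(a)) for a in fits]
        return result


oram = PathORAM(rng=random.Random(0))
for i in range(20):
    oram.access("write", i, f"v{i}")
values = [oram.access("read", i) for i in range(20)]
```

Because every access touches a path chosen independently of the block's identity and then re-randomizes the block's position, two reads of the same block are observed by the server as accesses to unrelated paths.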
TaORAM – Basic Approach
How to handle ≤ k concurrent requests?
STAGE 1: Process k operations
• Fetch corresponding k paths
• Form a subtree in proxy
STAGE 2
• Re-assign k items to new random paths
• Flush along the entire subtree and write-back
Two problems
• Blocking flush and write-back
• Concurrent accesses on same block
Social Media
Attribute Inference Example [DBSec 2017]
Content-based Inference
T1: (L1:60%, L2:30%, L3:5%, others: 5%)
T2: (L1:80%, L4:10%, L5:4%, others: 6%)
T3: (L1:75%, L7:15%, L8:9%, others: 1%)
This user is most probably in location L1
Goal: hide user’s location while maintaining their persona
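The inference in this example can be reproduced with a short sketch. The aggregation rule used here, averaging each location's share across the user's topics, is an assumption chosen for simplicity; the cited work may combine the distributions differently.

```python
from collections import defaultdict

# Location distributions of the three topics the user posts about
# (the "others" mass is left out because it names no location).
topic_profiles = {
    "T1": {"L1": 0.60, "L2": 0.30, "L3": 0.05},
    "T2": {"L1": 0.80, "L4": 0.10, "L5": 0.04},
    "T3": {"L1": 0.75, "L7": 0.15, "L8": 0.09},
}


def infer_location(profiles):
    """Average each location's probability over all topics, pick the max."""
    scores = defaultdict(float)
    for distribution in profiles.values():
        for location, probability in distribution.items():
            scores[location] += probability / len(profiles)
    best = max(scores, key=scores.get)
    return best, dict(scores)


best_location, scores = infer_location(topic_profiles)
```

An obfuscation approach has to shift this aggregate, for example by mixing in posts on topics whose distributions point elsewhere, without changing the topics the user actually cares about.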
LocBorg: Obfuscation [Sigspatial 2018]
The Client and LocBorg:
0- Continuous topic analysis
1- Write a post
2- Suggest topics
3- Write posts
4- Queue posts
5- Publish posts
[Figure: a matrix of topics with rows Tu11 … Tu1k, Tu21 … Tu2l, Tu31 … Tu3m, …, Tun1 … Tunh]
Exploring Group Privacy (with Miriam Metzger, Scott Reid)
• The dangers of algorithmic profiling on group identities.
• Existing aggregation and anonymization privacy methods focus on individual identity, rather than the identity or profile of a group.
• Community Aware Trending Topics
• Strava Heat maps
• Privacy research has yet to examine whether people are aware of privacy risks from group inference technologies.
User experiments to better understand group (versus individual) privacy and its effects on users/data subjects
BITCOIN & BLOCKCHAIN
Borrowing heavily from Bitcoin and cryptocurrency technologies online course
Hash Pointers
• Hash Pointer: Pointer to data + H(data).
• With hash pointers, we can:
• Retrieve data
• Verify that data has not been tampered with.
[Figure: a block containing DATA, referenced by a hash pointer H( ); TransID: H()]
A Block Chain
• Detecting tampering
[Figure: a chain of three DATA blocks linked by hash pointers H( ); TransID: 5, TransID: 6, TransID: H()]
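Tamper detection with hash pointers can be sketched as follows. The block layout (a dictionary with `data` and `prev_hash` fields) and the use of SHA-256 over a JSON encoding are assumptions made for the illustration; they are not Bitcoin's actual block format.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """H(block): SHA-256 over a canonical JSON encoding of the block."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Each new block stores a hash pointer to its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else None
    chain.append({"data": data, "prev_hash": prev_hash})


def verify(chain: list, head_hash: str) -> bool:
    """Walk back from a trusted head hash, checking every hash pointer."""
    expected = head_hash
    for block in reversed(chain):
        if block_hash(block) != expected:
            return False
        expected = block["prev_hash"]
    return True


chain = []
for transaction in ["TransID: 5", "TransID: 6", "TransID: 7"]:
    append_block(chain, transaction)

head = block_hash(chain[-1])         # the only value that must be kept safe
ok_before = verify(chain, head)

chain[0]["data"] = "TransID: 5 (altered)"   # tamper with the oldest block
ok_after = verify(chain, head)
```

Changing any block changes its hash, which no longer matches the pointer stored in the next block, so remembering only the head hash is enough to detect tampering anywhere in the chain.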
Blockchain: How to avoid a centralized authority?
Publish to ALL nodes a history of all transactions
Optimization: multiple transactions per block
Everybody has a copy of the blockchain
Main Challenge
How to decentralize the system and operate without any trusted central authority?
Bitcoin Consensus
• Miners compete to decide who proposes the new block
• Solve a computationally expensive cryptographic puzzle.
• Highly unlikely that two miners will solve the puzzle at the same time
• The winner proposes the new block in the chain
• Other nodes accept by extending the chain with the new block after validation
Mining is hard work!
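The puzzle the miners solve can be illustrated with a simplified proof-of-work loop. This sketch assumes a single SHA-256 over a string and a difficulty given in leading zero bits; Bitcoin itself hashes a binary block header with double SHA-256 and encodes the target differently.

```python
import hashlib


def _digest_value(block_data: str, nonce: int) -> int:
    digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: str, difficulty_bits: int) -> int:
    """Search for a nonce whose hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while _digest_value(block_data, nonce) >= target:
        nonce += 1                      # expected work: about 2**difficulty_bits hashes
    return nonce


def validate(block_data: str, nonce: int, difficulty_bits: int) -> bool:
    """Validation needs a single hash, however long mining took."""
    return _digest_value(block_data, nonce) < (1 << (256 - difficulty_bits))


nonce = mine("block 7: TransID 5, TransID 6", difficulty_bits=12)
accepted = validate("block 7: TransID 5, TransID 6", nonce, difficulty_bits=12)
```

The asymmetry is the point: finding a nonce takes many hash evaluations on average, while any other node can check the proposed block with one.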
Byzantine Agreement to the rescue?
Must be prepared for Malicious Behavior
Byzantine Agreement
Challenges and much recent work:
• Requires multiple rounds of communication
• Requires a priori knowledge of ALL participants
Conclusion
• Need collaborations between theory and systems experts:
• Cryptography
• Database Systems
• Distributed Systems
• Applications experts
• To ensure
• Data Privacy
• Data Security
• Data Scalability
• Data Access Efficiency