Post on 02-Jan-2016
Executing SQL over Encrypted Data in Database-Service-Provider
Model
Hakan HacigumusUniversity of California, Irvine
Bala IyerIBM Silicon Valley Lab.
Chen LiUniversity of California, Irvine
Sharad MehrotraUniversity of California, Irvine
SIGMOD 2002, Madison, Wisconsin, USA
2
What do we want to do?
We want to store the data on “a server”
User Encrypted User DatabaseServer
User Data
But the problem is we do not trust “the server” for sensitive information!
encrypt the data and store it but still be able to run queries over the encrypted data do most of the work at the server
If the server is trusted, ICDE 2002
Distrusted
3
Why is it important anyway?
Application Service Provider (ASP) Model for Database
DB management transferred to service provider for backup, administration, restoration, space management,
upgrades etc.
use the database “as a service” provided by an ASP use SW, HW, human resources of ASP, instead of your own
User Encrypted User Database
(Distrusted) Application Service Provider
User Data
Distrusted Server
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
5
Service Provider Architecture
Encrypted User
Database
Query Translator
Server Site
Temporary Results
Query Executer
MetadataOriginal Query
Server Side Query
Encrypted Results
Actual Results
Service Provider
User
Client Site
Client Side Query ?
? ?
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
8
Relational Encryption
NAME SALARY
PID
John 50000 2
Marry 110000 2
James 95000 3
Lisa 105000 4
etuple N_ID S_ID P_ID
fErf!$Q!!vddf>></|
50 1 10
F%%3w&%gfErf!$ 65 2 10
&%gfsdf$%343v<l
50 2 20
%%33w&%gfs##! 65 2 20Server Site
Store an encrypted string – etuple – for each tuple in the original table
This is called “row level encryption”
Any kind of encryption technique can be used
Blowfish encryption algorithm is used for this work
Create an index for each (or selected) attribute(s) in the original table
9
Building the Index:Partition and Identification Functions
Partition function divides domain values into partitions (buckets)
Partition (R.A) = { [0,200], (200,400], (400,600], (600,800], (800,1000] }
partitioning function has an impact on performance as well as privacy
2000 400 600 800 1000
2 7 5 1 4
Domain Values
Partition (Bucket) ids
Identification function assigns a partition id to each partition of attribute A
e.g. identR.A( (200,400] ) = 7 Any function can be use as identification function, e.g., hash functions
10
Mapping Functions
Mapping function maps a value v in the domain of attribute A to the id of the partition which value v belongs to
e.g. MapR.A( 250 ) = 7, MapR.A( 620 ) = 1
2000 400 600 800 1000
2 7 5 1 4
Domain Values
Partition (Bucket) ids
11
Storing Encrypted Data
R = < A, B, C > RS = < etuple, A_id, B_id, C_id >
etuple = encrypt ( A | B | C ) A_id = MapR.A( A ), B_id = MapR.B( B ), C_id = MapR.C( C )
NAME SALAR
YPID
John 50000 2
Marry 110000 2
James 95000 3
Lisa 105000 4
Etuple N_ID S_ID P_ID
fErf!$Q!!vddf>></|
50 1 10
F%%3w&%gfErf!$ 65 2 10
&%gfsdf$%343v<l
50 2 20
%%33w&%gfs##! 65 2 20
Table: EMPLOYEE
Table: EMPLOYEES
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
14
Mapping Conditions
Q: SELECT name, pname FROM emp, proj WHERE emp.pid=proj.pid AND salary > 100k
Server stores attribute indices determined by mapping functions Client stores metadata and utilizes that to translate the query
Conditions: Condition Attribute op Value Condition Attribute op Attribute Condition (Condition Condition) | (Condition Condition)
| (not Condition)
15
Mapping Conditions (2)
Example:
Attribute = Value Mapcond( A = v ) AS = MapA( v ) Mapcond( A = 250 ) AS = 7
2000 400 600 800 1000
2 7 5 1 4
Domain Values
Partition Ids
16
Mapping Conditions (3)
Attribute1 = Attribute2 Mapcond( A = B ) N (AS = identA( pk ) BS = identB( pl ))
where N is pk partition (A), pl partition (B), pk pl
Partitions
A_id
[0,100] 2
(100,200] 4
(200,300] 3
Partitions
B_id
[0,200] 9
(200,400] 8
C : A = B C’ : (AS = 2 BS = 9) (AS = 4 BS = 9) (AS = 3 BS = 8)
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
18
Relational Operators over Encrypted Relations
Partition the computation of the operators across client and server
Compute (possibly) superset of answers at the server Filter the answers at the client Objective : minimize the work at the client and process the
answers as soon as they arrive without requiring storage at the client
Operators studied: Selection Join Grouping and Aggregation Sorting Duplicate Elimination Set Difference Union Projection
19
Selection Operator
A=250
TABLE
2000 400 600 800 1000
2 7 5 1 4
Example:A=250
D
E_TABLE
A_id = 7
Client Query
Server Query
c( R ) = c( D (S
Mapcond(c)( RS
) )
20
Join Operator
C
EMP PROJ
C : A = B C’ :(A_id = 2 B_id = 9)
(A_id = 4 B_id = 9)(A_id = 3 B_id = 8)
Partitions A_id
[0,100] 2
(100,200] 4
(200,300] 3
Partitions B_id
[0,200] 9
(200,400] 8
R c T = c( D ( RS S
Mapcond(c) TS
)
Example:
C’
E_EMP E_PROJ
A=B
D
Client Query
Server Query
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
23
Query Decomposition
Client Query
Q: SELECT name, pname FROM emp, proj WHERE emp.pid=proj.pid AND salary > 100k
Server Query
Encrypted(EMP)
Encrypted(PROJ)
salary >100k
name,pname
D
D
e.pid = p.pid
EMP
PROJsalary >100k
name,pname
e.pid = p.pid
24
Query Decomposition (2)
E_EMP
E_PROJ
salary >100k
D
D
E_EMP
E_PROJ
salary >100k
D
D
s_id = 1 v s_id = 2
e.pid = p.pid
e.pid = p.pid
name,pname
name,pname
Client Query
Server Query
Client Query
Server Query
25
Query Decomposition (3)
e.p_id = p.p_id
E_EMP
E_PROJ
salary >100k e.pid = p.pid
D
s_id = 1 v s_id = 2
e.pid = p.pid
E_EMP
E_PROJ
salary >100k
D
D
s_id = 1 v s_id = 2
name,pname name,pname
Client QueryClient Query
Server Query Server Query
26
Query Decomposition (4)Q: SELECT name, pname
FROM emp, proj WHERE
emp.pid=proj.pid AND salary > 100k
QS: SELECT e_emp.etuple, e_proj.etuple FROM e_emp, e_proj
WHERE e.p_id=p.p_id
AND s_id = 1 OR s_id = 2
QC: SELECT name, pname FROM temp
WHERE
emp.pid=proj.pid AND salary > 100k
e.p_id = p.p_id
E_EMP
E_PROJ
salary >100k e.pid = p.pid
D
s_id = 1 v s_id = 2
name,pname
Client Query
Server Query
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
Talk Outline
Service Provider Architecture
How to create Metadata: Relational Encryption and Storage Model
Query Decomposition and Relational Operators
Query Decomposition – Examples
Experimental Results
Conclusion
29
Experimental Evaluation
Data TPC-H database, scale factor 0.1
Queries TPC-H Queries, versions of Q#6 and Q#3
Partitioning Strategy Equi-depth histograms for the first set of experiments Equi-width histograms for the second set of experiments
30
Effect of Number of Buckets in Non-Join Query
Client and communications costs decreases with increasing number of buckets due to better filtering at the server
Server cost doesn’t decrease as much, table scan remains best choice in the optimizer
0
10
20
30
40
Que
ry R
espo
nse
Tim
e
2 8Number of Buckets
Cost Factors for Query Response Time
Client SideNetworkServer Side
31
Effect of Number of Buckets in Non-Join Query
Single Server: Server is trusted and performs all operations including decryption on site
Shows that proposed query execution protocol doesn’t introduce significant overhead
05
101520253035
Que
ry R
espo
nse
Tim
e
2 8Number of Buckets
Client/Server v.s. Single Server
Single ServerServer SideClient Side
32
Effect of Number of Buckets in Join Query
Sharp decrease in query response time with increase in the number of buckets due to better filtering at the server
Client side query response time is greater than server side query response time due to dominant decryption cost on the query (second graph)
Client, Server, and Total Response Times
1 75 100 150 250 300 500 750 1500
Number of Buckets
Que
ry R
espo
nse
Tim
e
ClientServerTotal
Effect of Decryption Time
1 75 100 150 250 300 500 750 1500
Number of Buckets
Que
ry R
espo
nse
Tim
e
Client /wdecryptionClient w/odecryption
33
Effect of Number of Buckets in Join Query
Single Server: Server is trusted and performs all operations including decryption on site
Consistent with the previous results showing proposed communication protocol doesn’t introduce significant overhead
Client/Server v.s. Single Server
1 75 100 150 250 300 500 750 1500
Number of Buckets
Que
ry R
espo
nse
Tim
eC/SSingle Server
34
Conclusion
ASP model is a promising solution for enterprise computing in Internet era
We studied data privacy problem in the context of ASP model when the ASP is not trusted
Proposed solution encrypts data, creates “coarse indexes” and stores the data at ASP allows only data owner to decrypt the data
With query decomposition most of query execution performed at ASP client only performs filtering and continues to benefit from ASP
model