Posted on 02-Jan-2016
Secure and efficient data sharing on encrypted cloud relational databases
Introduction
• Relational cloud databases are widely used
Service provider (SP) and user
Item_ID Cost Wholesale_price
1076 10 20
3308 15 50
Store data on cloud
Get back a data item
Item_ID Cost Wholesale_price
1076 10 20
Encryption for security
• Due to security concerns, data is encrypted before being stored on the cloud
Service provider (SP) and user
Item_ID Cost Wholesale_price
Egask5 A42fgs 2S46Dg
asD3j64 139ASs Dd3fj2
Store data on cloud
Get back a data item
Item_ID Cost Wholesale_price
1076 10 20
The key is kept by the user, not by the SP!
The problem of data sharing
Item_ID Cost Wholesale_price
Egask5 A42fgs 2S46Dg
asD3j64 139ASs Dd3fj2
Alice SP Bob
Alice: “Bob is my business partner. I want to let him know the wholesale prices of some of my selected products.”
Requirements:
1. Shared data should be revealed only to Bob (not to the SP).
2. Other unshared data should remain unknown to both Bob and the SP.
3. The cost to Alice should be low (while the costs to Bob and the SP should be affordable).
Alice’s data
Application of data sharing
1. Alice is a company user of the SP. Alice hires Bob, a data analytics expert, to perform analysis, so she has to share some of her data with him.
2. Alice and Bob are business partners. They share some data to gain advantages, e.g., more market information.
Naïve solution of data sharing (e.g., CryptDB, TrustedDB)
• Encryption: use an existing general encryption function, e.g., RSA with padding, to encrypt all data: c = E(p, k)
– Ciphertext: c
– Plaintext: p
– (Public) key: k
– Encryption function: E
Item_ID Cost Wholesale_price
Egask5 A42fgs 2S46Dg
asD3j64 139ASs Dd3fj2
Naïve solution of data sharing - cont
Wholesale_price
2S46Dg
Item_ID Cost Wholesale_price
Egask5 A42fgs 2S46Dg
asD3j64 139ASs Dd3fj2
Alice SP Bob
Share Wholesale_price of Item “Egask5”
Alice sends Bob a copy of the key
On request, the SP sends Bob the shared item
Access control is enforced to prevent Bob from seeing unauthorized items
Item_ID Cost Wholesale_price
1076 10 20
3308 15 50
This solution is not secure! Bob holds Alice's key, so together with the SP he can decrypt the entire table.
Another naïve solution
Item_ID Cost Wholesale_price
Egask5 A42fgs 2S46Dg
asD3j64 139ASs Dd3fj2
Alice SP Bob
Wholesale_price
2S46Dg
Wholesale_price
20
Wholesale_price
20
Alice downloads the items to be shared and decrypts them
Send Bob the plain data
Bob either stores the data on his own or inserts them into the cloud as new tuples
High processing cost to Alice
Problem definition
• Data: relational data
– A table R contains T, a set of tuples, and C, a set of columns (attributes)
– Each tuple t has exactly m values (NULL is also a value)
• Format of data for sharing:
– CS: a subset of C
– TS: a subset of T
– Just like the result of a query
A B C
a1 b1 c1
a2 b2 c2
B C
b2 c2
T = {t1, t2}, C = {“A”, “B”, “C”}, t1 = {a1, b1, c1}, t2 = {a2, b2, c2}
TS = {t2}, CS = {“B”, “C”}, shared part of t2 = {b2, c2}
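The sharing format can be made concrete with a minimal Python sketch. It assumes a table is a list of dicts keyed by column name and uses 0-based tuple indices; the helper `shared_view` is hypothetical and simply projects the tuples in TS onto the columns in CS, like a select-project query.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]

def shared_view(table: List[Row], ts: Sequence[int], cs: Sequence[str]) -> List[Row]:
    """Return the part of `table` selected by tuple indices `ts` and columns `cs`."""
    return [{col: table[i][col] for col in cs} for i in ts]

# Example from the slide: T = {t1, t2}, C = {"A", "B", "C"}
T = [
    {"A": "a1", "B": "b1", "C": "c1"},  # t1
    {"A": "a2", "B": "b2", "C": "c2"},  # t2
]
TS = [1]          # share t2 only (0-based index)
CS = ["B", "C"]   # share columns B and C

print(shared_view(T, TS, CS))  # [{'B': 'b2', 'C': 'c2'}]
```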
Models
• 3 parties: Alice, Bob, SP
– Relationship: see the introduction
• Attack model
– Bob and the SP are semi-honest and colluding
• Bob and the SP function as normal
• An attacker observes everything seen by Bob and the SP
– Requirement: the attacker cannot learn any of Alice's plain data except the data shared with Bob
Solution framework
• The solution includes:
– An encryption method (KeyGen, Enc, Dec)
– A sharing method (Share, SDec)
Alice Bob SP
1. k = KeyGen()
2. c = Enc(p, k)
A B C
ca1 cb1 cc1
ca2 cb2 cc2
2. p = Dec(c, k)
3. H = Share(CS, TS, k)
4. p = SDec(c, H)
Our solution: Relational-based encryption (RBE)
• Problem of using general encryption, e.g., RSA
– The same key is required to decrypt all encrypted values
– To let Bob decrypt one particular data item, the decryption key must be sent to Bob
– Bob is then overpowered: he can decrypt any data encrypted by Alice
Relational-based encryption (RBE)
• Idea: How about having each individual data item encrypted by a unique value key?
A B C
a1 b1 c1
a2 b2 c2
A B C
ka1 kb1 kc1
ka2 kb2 kc2
A B C
ca1 cb1 cc1
ca2 cb2 cc2
Plain values + value key table → encrypted values
To share b1, give kb1 to Bob: Bob can only decrypt cb1, and the other values stay safe because Bob does not have their value keys
However, Alice has to remember all value keys (one per cell), which is a high storage cost
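A minimal sketch of the per-cell idea, assuming values are byte strings and each value key is a fresh random pad of the same length (a one-time pad, one key per cell); all names are illustrative. Handing Bob the single key for b1 lets him decrypt cb1 and nothing else, but Alice must keep one key per cell.

```python
import secrets
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (row index, column name)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt_table(table: List[Dict[str, str]]) -> Tuple[Dict[Cell, bytes], Dict[Cell, bytes]]:
    """Encrypt every cell with its own random value key; return (ciphertexts, key table)."""
    keys: Dict[Cell, bytes] = {}
    cts: Dict[Cell, bytes] = {}
    for r, row in enumerate(table):
        for col, value in row.items():
            p = value.encode()
            k = secrets.token_bytes(len(p))  # unique value key for this cell
            keys[(r, col)] = k
            cts[(r, col)] = xor_bytes(p, k)
    return cts, keys

table = [{"A": "a1", "B": "b1", "C": "c1"},
         {"A": "a2", "B": "b2", "C": "c2"}]
cts, keys = encrypt_table(table)

# To share b1, Alice gives Bob only kb1; it decrypts cb1 and no other cell.
kb1 = keys[(0, "B")]
print(xor_bytes(cts[(0, "B")], kb1).decode())  # b1
print(len(keys))  # 6: one key per cell, so Alice's storage grows with the table
```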
Key abstraction
• Each cell can be located by column identifier and row identifier
• Each tuple has a tuple secret rid; each column has a column secret cid
• Use a one-way hash function: k = h(rid, cid)
• Storage cost at Alice: O(mn) => O(m+n)
A B C
t1 ka1 kb1 kc1
t2 ka2 kb2 kc2
(t1, A) → ka1 = h(rid1, cidA)
(t2, C) → kc2 = h(rid2, cidC)
m: number of columns; n: number of tuples
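A sketch of key abstraction, assuming h is instantiated as SHA-256 over the concatenation rid || cid and that every secret is 16 random bytes; the slides do not fix a concrete hash, so this is one possible choice. Alice keeps only n tuple secrets and m column secrets and recomputes any cell's value key on demand.

```python
import hashlib
import secrets

def value_key(rid: bytes, cid: bytes) -> bytes:
    """k = h(rid, cid); SHA-256 over rid || cid is an assumed instantiation of h."""
    # rid and cid have a fixed length (16 bytes), so the concatenation is unambiguous.
    return hashlib.sha256(rid + cid).digest()

columns = ["A", "B", "C"]                              # m = 3
rids = [secrets.token_bytes(16) for _ in range(2)]     # n = 2 tuple secrets
cids = {c: secrets.token_bytes(16) for c in columns}   # m column secrets

# Alice stores m + n = 5 secrets instead of m * n = 6 value keys.
print(len(rids) + len(cids))  # 5

ka1 = value_key(rids[0], cids["A"])  # value key of cell (t1, A), recomputed on demand
kc2 = value_key(rids[1], cids["C"])  # value key of cell (t2, C)
```

The saving is modest for a 2 × 3 table but grows quickly: for 10^6 tuples and 10 columns, m + n secrets replace m · n keys.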
Towards O(1) storage cost at Alice
• Use an existing encryption function
– E: encryption function
– D: decryption function
• Tuple secrets and column secrets are encrypted and stored at the SP
A B C
E(cidA) E(cidB) E(cidC)
E(rid1) ca1 cb1 cc1
E(rid2) ca2 cb2 cc2
Encryption/decryption process
• Alice first gets back E(cid) and E(rid) of the value to be encrypted/decrypted
– Decrypt them to obtain cid and rid
– Derive the value key of the cell and encrypt/decrypt the cell
• Although the encryption/decryption cost may seem higher now, RBE is more efficient for relational data (details follow the mathematical details)
Details in math
• KeyGen
– Same key generation as the underlying encryption scheme
• Enc
– Tuple ti = <p1, p2, …, pm>
– Obtain cid and rid
– ci = pi XOR h(rid XOR cid)
• h: one-way hash
Details in math
• Dec
– Encrypted tuple t’i = <c1, c2, …, cm>
– Obtain cid and rid
– pi = ci XOR h(rid XOR cid)
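The Enc and Dec steps can be sketched in Python under these assumptions: values are byte strings of at most 32 bytes, rid and cid are 32-byte secrets already obtained by Alice, and h is SHA-256 truncated to the value length. Because both steps XOR with the same pad h(rid XOR cid), decryption is the same operation as encryption.

```python
import hashlib
import secrets
from typing import List

SECRET_LEN = 32  # bytes; matches the SHA-256 digest size (assumption)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def h(x: bytes) -> bytes:
    """One-way hash h; SHA-256 is an assumed choice."""
    return hashlib.sha256(x).digest()

def enc_tuple(values: List[bytes], rid: bytes, cids: List[bytes]) -> List[bytes]:
    """ci = pi XOR h(rid XOR cid_i) for each column i of the tuple."""
    out = []
    for p, cid in zip(values, cids):
        if len(p) > SECRET_LEN:
            raise ValueError("value longer than the hash output in this sketch")
        pad = h(xor_bytes(rid, cid))[: len(p)]  # per-cell pad
        out.append(xor_bytes(p, pad))
    return out

def dec_tuple(cts: List[bytes], rid: bytes, cids: List[bytes]) -> List[bytes]:
    """pi = ci XOR h(rid XOR cid_i); identical to encryption."""
    return enc_tuple(cts, rid, cids)

rid = secrets.token_bytes(SECRET_LEN)
cids = [secrets.token_bytes(SECRET_LEN) for _ in range(3)]
t = [b"1076", b"10", b"20"]
c = enc_tuple(t, rid, cids)
print(dec_tuple(c, rid, cids) == t)  # True
```

Note that this sketch keeps ciphertext length equal to plaintext length, so value lengths are not hidden.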
Correctness of encryption
• pi = ci XOR h(rid XOR cid)   (1)
• ci = pi XOR h(rid XOR cid)   (2)
• Substituting (2) into the RHS of (1):
ci XOR h(rid XOR cid) = pi XOR h(rid XOR cid) XOR h(rid XOR cid) = pi, since x XOR x = 0 and pi XOR 0 = pi
Security
• Encrypted data is stored on the cloud; is it safe?
• ci = pi XOR h(rid XOR cid)
One-time pad: p XOR k
Note: the same key must not be used to encrypt two or more data items!
The one-time pad (with a truly random key) is perfectly secure: it cannot be broken unless the key is leaked
One-way hash function: not reversible. Knowing the hash value does not reveal the hash input (rid XOR cid), an important feature to guard against CPA-style attacks
Overall: as secure as the hash function
There are many highly secure one-way hash functions, including those built from the encryption functions of different encryption schemes
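Why the same pad must never encrypt two items can be seen directly: XORing two ciphertexts made with the same pad cancels the pad. A short illustration, assuming byte-string values and a random pad.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

p1, p2 = b"PRICE=20", b"PRICE=50"
k = secrets.token_bytes(len(p1))  # one pad, (wrongly) reused for two items

c1 = xor_bytes(p1, k)
c2 = xor_bytes(p2, k)

# The attacker never sees k, yet c1 XOR c2 == p1 XOR p2: the pad cancels out.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))  # True
# Knowing p1 (e.g., a guessed or previously shared value) then reveals p2 entirely.
print(xor_bytes(leak, p1))        # b'PRICE=50'
```

In RBE each cell has its own hash input rid XOR cid, so for random secrets each pad h(rid XOR cid) is, with overwhelming probability, used only once.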
Security - cont
• On the other hand, cid and rid can be derived from CN (column name) and E(rid, k)
• Since they are stored as values encrypted under the underlying encryption function (E, D), their security is the same as that of the underlying scheme
Efficiency
• Decrypting a query result with n tuples and m columns
– Traditional method, e.g., RSA: mn decryptions
• In our scheme
– m + n decryptions, mn hashes, 2mn XOR operations
• Cost of decryption >> hash >> XOR
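A small worked count using the operation tallies above; the table size n = 1000, m = 10 is an arbitrary example. The two XORs per cell are rid XOR cid and ci XOR pad.

```python
def traditional_cost(n: int, m: int) -> dict:
    """Traditional scheme (e.g., RSA per cell): one decryption per cell."""
    return {"decryptions": m * n, "hashes": 0, "xors": 0}

def rbe_cost(n: int, m: int) -> dict:
    """RBE: decrypt each E(rid) and E(cid) once, then one hash and two XORs per cell."""
    return {"decryptions": m + n, "hashes": m * n, "xors": 2 * m * n}

n, m = 1000, 10
print(traditional_cost(n, m))  # {'decryptions': 10000, 'hashes': 0, 'xors': 0}
print(rbe_cost(n, m))          # {'decryptions': 1010, 'hashes': 10000, 'xors': 20000}
```

Since decryption dominates hashing and XOR, cutting the expensive operation from 10,000 to 1,010 is where the saving comes from.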
Data sharing
• Input: TS, CS
– Alice sends the rid of each tuple in TS to Bob
– Alice sends the cid of each column in CS to Bob
• H = <HT, HC> = Share(CS, TS, k)
– HT = {rid | rid of t and t in TS}
– HC = {cid | cid of c and c in CS}
• Decryption: SDec(c, H)
– Find the corresponding rid and cid of c
– pi = ci XOR h(rid XOR cid)
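A sketch of Share and SDec on top of the XOR scheme, assuming SHA-256 for h, 32-byte secrets, and an in-memory dict standing in for the encrypted table at the SP. The names `share` and `sdec` are hypothetical, and `share` reads Alice's secrets from module state instead of taking k. The membership check in `sdec` only models bookkeeping; cryptographically, Bob simply cannot compute the pad of a cell whose rid or cid he lacks.

```python
import hashlib
import secrets
from typing import Dict, Iterable, Optional, Tuple

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()  # assumed one-way hash

def crypt(v: bytes, rid: bytes, cid: bytes) -> bytes:
    """Enc and Dec are the same: v XOR h(rid XOR cid)."""
    return xor_bytes(v, h(xor_bytes(rid, cid))[: len(v)])

# Alice's secrets and the encrypted table: (row index, column) -> ciphertext
rids = [secrets.token_bytes(32) for _ in range(2)]
cids = {c: secrets.token_bytes(32) for c in ["Item_ID", "Cost", "Wholesale_price"]}
plain = [{"Item_ID": b"1076", "Cost": b"10", "Wholesale_price": b"20"},
         {"Item_ID": b"3308", "Cost": b"15", "Wholesale_price": b"50"}]
ct = {(r, c): crypt(v, rids[r], cids[c]) for r, row in enumerate(plain) for c, v in row.items()}

def share(cs: Iterable[str], ts: Iterable[int]) -> Tuple[Dict[int, bytes], Dict[str, bytes]]:
    """H = <HT, HC>: the rids of the shared tuples and the cids of the shared columns."""
    return {r: rids[r] for r in ts}, {c: cids[c] for c in cs}

def sdec(cell: Tuple[int, str], H) -> Optional[bytes]:
    """Bob's decryption; returns None if the cell is not covered by H."""
    ht, hc = H
    r, c = cell
    if r not in ht or c not in hc:
        return None
    return crypt(ct[cell], ht[r], hc[c])

H = share(["Wholesale_price"], [0])
print(sdec((0, "Wholesale_price"), H))  # b'20'
print(sdec((1, "Wholesale_price"), H))  # None: rid of the second tuple was not shared
```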
Security
• Revealing some values of cid and rid
Cells not related to any revealed secret: secure, of course
Cells whose cid is known but whose rid is not: secure?
Secure
• ci = pi XOR h(rid XOR cid)
• Note: the above already assumes that Bob and the SP are colluding
– Otherwise, Bob has no access to the encrypted values of other data
The hash input is unknown because rid or cid is unknown
The hash value, i.e., the value key, is therefore unknown
Problem of multiple sharing
• User collusion
• A user retrieves different shared versions at different times
1st sharing
2nd sharing
Additional information can be observed by combining both sharing instances
Advanced solution: Elliptic curve cryptography (ECC)
• Introduction
– ECC: operations are defined on a finite set of 2D points
Example curves: y^2 mod p = (x^3 – x) mod p and y^2 mod p = (x^3 – x + 1) mod p
p: system parameter
Operations on ECC
• “Addition”
• Scalar multiplication
– kP = P + P + … + P (k times)
(Figure: point doubling on the curve, showing P, 2P and −2P)
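To make point addition and scalar multiplication concrete, here is a toy Python sketch over the curve y^2 = x^3 – x + 1 (mod p), with an assumed small prime p = 97 and base point P = (0, 1), which lies on this curve for any p. Real systems use standardized curves over roughly 256-bit primes; this code is for illustration only and is not constant-time.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity O

P_MOD = 97    # toy prime (assumption)
A, B = -1, 1  # curve y^2 = x^3 - x + 1 (mod p), one of the example curves

def on_curve(pt: Point) -> bool:
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x ** 3 + A * x + B)) % P_MOD == 0

def add(p1: Point, p2: Point) -> Point:
    """Elliptic-curve point addition (chord-and-tangent rule)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD  # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD         # chord slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k: int, pt: Point) -> Point:
    """kP = P + P + ... + P (k times), computed by double-and-add in O(log k) steps."""
    result: Point = None
    addend = pt
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result

P = (0, 1)
assert on_curve(P)
Q = scalar_mult(13, P)
naive: Point = None
for _ in range(13):
    naive = add(naive, P)
print(Q == naive, on_curve(Q))  # True True
```

Double-and-add is why kP is cheap to compute even for huge k, while recovering k from P and kP (ECDLP) is believed hard on properly chosen large curves.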
Operations on ECC
• Order of curve
– Number of points on the curve
– Let n be the order of the curve
• (n+1)P = P for all P
• Curve with prime order, i.e., n is prime
– For any points P, Q with P ≠ 0 (the identity), there is an integer k such that kP = Q
• Elliptic curve discrete logarithm problem (ECDLP)
– Given P, Q, it is hard to find k such that kP = Q
• Pairing function e:
– e(aP, bQ) = e(P, Q)^(ab)
– Security: Bilinear Diffie-Hellman (BDH) assumption
• Given P, aP, bP, cP, it is hard to find e(P, P)^(abc)
Improvement over our sharing scheme
• Recall:
• Encryption: ci = pi XOR h(rid, cid)
• Decryption: pi = ci XOR h(rid, cid)
• Share: return all concerned rid and cid
– Define h(rid, cid) = h2(e(rid P, cid Q)), where h2 is a hash function
• P, Q are private (it is fine even if they are public)
Sharing
• Protocol Share
– Alice generates a random r
– Return
• {(r^-1 * rid) P}
• {(r * cid) Q}
Bob’s decryption
• Protocol SDec
– Bob has
• X = (r^-1 * rid) P
• Y = (r * cid) Q
– Compute g(X, Y) = h2(e(X, Y))
• = h2(e((r^-1 * rid) P, (r * cid) Q))
• = h2(e(rid P, cid Q)), since by bilinearity the factors r^-1 and r cancel
Recall: h(rid, cid) = h2(e(rid P, cid Q))
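The cancellation of r in SDec can be checked with a deliberately insecure toy bilinear map. Assumptions: a point aP is represented by its scalar a mod q, e(aP, bP) = g^(ab) mod p in the order-q subgroup of Z_p* (p = 2q + 1 = 2039, q = 1019, g = 4), Q = P as in the security argument, and h2 is SHA-256. The map is genuinely bilinear, but discrete logs are trivial in this representation, so it demonstrates only the algebra, not security.

```python
import hashlib
import secrets

# Toy parameters (assumption): p = 2q + 1 with q prime; g = 4 generates the order-q subgroup.
q, p, g = 1019, 2039, 4

def e(x: int, y: int) -> int:
    """Toy symmetric pairing on 'points' xP, yP (stored as scalars): e(xP, yP) = g^(x*y)."""
    return pow(g, (x * y) % q, p)

def h2(v: int) -> bytes:
    return hashlib.sha256(v.to_bytes(2, "big")).digest()

def h(rid: int, cid: int) -> bytes:
    """h(rid, cid) = h2(e(rid P, cid P))."""
    return h2(e(rid % q, cid % q))

rid = secrets.randbelow(q - 1) + 1
cid = secrets.randbelow(q - 1) + 1

# Share: Alice picks a random r and sends X = (r^-1 * rid) P and Y = (r * cid) P.
r = secrets.randbelow(q - 1) + 1
r_inv = pow(r, -1, q)
X = (r_inv * rid) % q
Y = (r * cid) % q

# SDec: Bob computes g(X, Y) = h2(e(X, Y)); r^-1 and r cancel by bilinearity.
print(h2(e(X, Y)) == h(rid, cid))  # True
```

A different r in each sharing instance makes the values Bob receives differ across instances, while the derived value key stays the same.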
Security in multiple sharing
• Focus on columns, the case for rows is similar
1st sharing: (r1 cidA) Q, (r1 cidB) Q, (r1 cidC) Q
2nd sharing: (r2 cidB) Q, (r2 cidC) Q
The same rid and cid values appear in different sharing instances; is this a concern?
Question: is it secure?
• If we could find e(rid2 P, cidA P), …, we could solve the BDH problem (let Q = P for now)
– Given P, aP, bP, cP, find e(P, P)^(abc)
• In our case
– a = cidA
– b = r2^-1 * rid2
– c = r2
– Generate random unrelated parameters rid1, cidB, r1
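(Figure: the values observed across both sharings, (r1 cidA) P, (r1 cidB) P, (r2 cidB) P, (r1^-1 * rid1) P and (r2^-1 * rid2) P, arranged under columns A, B, C and rewritten in terms of aP, bP, cP and the random parameters)
Any combination of values of a, b, c can be expressed in this way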
Security in multiple rows, columns?
(Figure: the same construction extended to more rows, e.g., (r1^-1 * ridi) P, and more columns, e.g., (r2 cidC) P)
Our security proof covers the general case
Selecting tuples for sharing
• A fundamental problem is how the user defines what data to share with a particular party
• Select tuples by the user's free choice
– Requires at least linear cost (in the number of tuples)
• Another option
– Define by query
Pre-computation for sharing by query
(Figure: message flow among Alice, the SP and Bob, labeled with the query Q, the related data R, the hint H and the shared DB)
Alice issues a query Q to define the data to be shared with Bob
Alice prepares index-like pre-computed information and gives it to the SP
R is related to the query answer and the index
A hint H is generated based on R
Bob can observe the shared data with the hint and the index at the SP
Solution framework
• The solution includes:
– An encryption method (KeyGen, Enc, Dec, BuildTree)
– A sharing method (SQuery, Share, SDec)
Alice Bob SP
1. k = KeyGen()
2. c = Enc(p, k)
A B C
ca1 cb1 cc1
ca2 cb2 cc2
2. p = Dec(c, k)
3. Δ = BuildTree()
4. Φ = SQuery(q)
5. H = Share(CS, Φ, k)
6. p = SDec(c, Δ, H)
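Extending the basic scheme
• Encrypted tuple secrets in a tree over tuples t1 to t8
– Root node (t1 to t8): Ei(k14, k18), Ei(k58, k18), Es(k18)
– Node for t1 to t4: Ei(k12, k14), Ei(k34, k14), Es(k14)
– Leaf-level node for t1, t2: Ei(rid1, k12), Ei(rid2, k12), Es(k12)
– The other nodes (t3 to t8) are built the same way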
Keys for Es are kept at Alice only
Computing the answer of a query
• SQuery(q)
(Figure: the same key tree; the answer nodes, here the node holding Es(k14), are returned to Alice)
Share (CS, Φ, k)
• Φ = {Es(k14)}
• H = <HT, HC> = Share(CS, Φ, k)
– HT = {k14}
– HC = {cid | cid of c and c in CS}
Computing the answer of a query
• Bob’s knowledge: k14
(Figure: the same key tree)
– Bob decrypts Ei(k12, k14) and Ei(k34, k14) to obtain k12 and k34
– He then decrypts Ei(rid1, k12), Ei(rid2, k12), and so on, to obtain the tuple secrets of t1 to t4
– The tuple secrets of t5 to t8 remain unknown
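A sketch of the key tree for eight tuples, under stated assumptions: Ei(x, k) is modeled as x XOR SHA-256(k || label), with a distinct label per child so that no pad is reused under the same parent key, and all keys are 32 random bytes. Es (Alice's own encryption of each node key) is omitted. Given k14 from Share, Bob walks down the tree and recovers rid1 to rid4 only.

```python
import hashlib
import secrets
from typing import Dict, Tuple

Range = Tuple[int, int]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def ei(x: bytes, key: bytes, label: bytes) -> bytes:
    """Assumed Ei: XOR with a pad derived from the parent key and the child's label."""
    return xor_bytes(x, hashlib.sha256(key + label).digest())

di = ei  # XOR-based, so decryption equals encryption

def label(r: Range) -> bytes:
    return f"{r[0]}-{r[1]}".encode()

N = 8
rids = {i: secrets.token_bytes(32) for i in range(1, N + 1)}  # tuple secrets rid1..rid8

node_key: Dict[Range, bytes] = {}  # Alice's node keys (k12, k14, k18, ...)
stored: Dict[Range, bytes] = {}    # at the SP: child range -> Ei(child secret, parent key)

def build(lo: int, hi: int) -> None:
    node_key[(lo, hi)] = secrets.token_bytes(32)
    if hi - lo == 1:  # leaf-level node covers two tuples
        for i in (lo, hi):
            stored[(i, i)] = ei(rids[i], node_key[(lo, hi)], label((i, i)))
        return
    mid = (lo + hi) // 2
    for child in ((lo, mid), (mid + 1, hi)):
        build(*child)
        stored[child] = ei(node_key[child], node_key[(lo, hi)], label(child))

def bob_recover(r: Range, key: bytes) -> Dict[int, bytes]:
    """Walk down from a shared node key and recover all tuple secrets below it."""
    lo, hi = r
    if hi - lo == 1:
        return {i: di(stored[(i, i)], key, label((i, i))) for i in (lo, hi)}
    mid = (lo + hi) // 2
    out: Dict[int, bytes] = {}
    for child in ((lo, mid), (mid + 1, hi)):
        out.update(bob_recover(child, di(stored[child], key, label(child))))
    return out

build(1, N)
recovered = bob_recover((1, 4), node_key[(1, 4)])  # Share gave Bob k14
print(sorted(recovered))                               # [1, 2, 3, 4]
print(all(recovered[i] == rids[i] for i in recovered))  # True
```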
Advantage of using an index
• Without an index, the cost to Alice is at least linear in the number of tuples in the sharing domain
– With the index, it is linear in the number of nodes returned from the tree, which is usually much smaller
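To see why the number of returned nodes is usually small, here is a sketch assuming the shared tuples form a contiguous range [lo, hi] of leaves in a complete binary tree: the minimal set of nodes whose subtrees exactly cover the range has at most two nodes per tree level, regardless of how many tuples it spans. The function `cover` is hypothetical.

```python
from typing import List, Tuple

def cover(lo: int, hi: int, node_lo: int, node_hi: int) -> List[Tuple[int, int]]:
    """Minimal list of tree nodes (as leaf ranges) whose subtrees exactly cover [lo, hi]."""
    if hi < node_lo or node_hi < lo:
        return []                    # disjoint from the query range
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]  # fully inside: return this node, stop descending
    mid = (node_lo + node_hi) // 2
    return cover(lo, hi, node_lo, mid) + cover(lo, hi, mid + 1, node_hi)

print(cover(1, 4, 1, 8))  # [(1, 4)]: one node covers t1..t4, as in the example
print(cover(2, 7, 1, 8))  # [(2, 2), (3, 4), (5, 6), (7, 7)]
# About a million tuples still need only a handful of nodes (at most two per level).
print(len(cover(2, 999_999, 1, 1 << 20)))
```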
Indexing scheme for the multi-sharing scenario
• Use a different function to generate the value key
(Figure: hash tree over tuples t1 to t8: root h18; children h14 and h58; then h12, h34, h56, h78; leaf functions h1, …, h8 for t1, …, t8)
For t1: ci = pi XOR h1(h12(h14(h18(cid))))
Computing the answer of a query
• SQuery(q)
• Φ = {h14 ∘ h18}
(Figure: the same hash tree; the answer nodes are highlighted)
Share (CS, Φ, k)
• Φ = {h14 ∘ h18}
• H = Share(CS, Φ, k)
– H = {h14(h18(cid)) | cid of c and c in CS}
Computing the answer of a query
• Bob’s knowledge
– x = h14(h18(cid))
(Figure: the same hash tree)
– Value key of t1 = h1(h12(x))
– One-way hash: Bob cannot go up the tree, so he cannot see other tuples
– x is specific to this column, not to any other column
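A sketch of the hash-chain value keys, assuming each node function h_v is SHA-256 over a node label and the input (h_v(x) = SHA-256(label_v || x)); the slides only require one-way functions, so this instantiation and all names are assumptions. Alice derives each leaf key from cid along the root-to-leaf path; Share hands Bob x = h14(h18(cid)); Bob reproduces the keys of t1 to t4 but cannot go back up the tree.

```python
import hashlib
import secrets
from typing import List

def hv(label: str, x: bytes) -> bytes:
    """Node function h_label, modeled as SHA-256 over (label || x)."""
    return hashlib.sha256(label.encode() + b"|" + x).digest()

def path_labels(i: int, n: int = 8) -> List[str]:
    """Labels from root to leaf for tuple t_i, e.g. t1 -> ['1-8', '1-4', '1-2', '1']."""
    lo, hi, labels = 1, n, []
    while lo < hi:
        labels.append(f"{lo}-{hi}")
        mid = (lo + hi) // 2
        lo, hi = (lo, mid) if i <= mid else (mid + 1, hi)
    labels.append(str(lo))
    return labels

def value_key_from(x: bytes, labels: List[str]) -> bytes:
    for lab in labels:  # h18 is applied first, then h14, h12, h1 (innermost first)
        x = hv(lab, x)
    return x

cid = secrets.token_bytes(32)  # column secret (Alice only)
alice_keys = {i: value_key_from(cid, path_labels(i)) for i in range(1, 9)}

# Share: Φ = {h14 ∘ h18}, so Bob receives x = h14(h18(cid)) for this column.
x = value_key_from(cid, ["1-8", "1-4"])

# Bob: value key of t1 = h1(h12(x)), of t3 = h3(h34(x)), and so on.
bob_keys = {i: value_key_from(x, path_labels(i)[2:]) for i in range(1, 5)}
print(all(bob_keys[i] == alice_keys[i] for i in bob_keys))  # True
```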
Developed schemes
Scheme | Secure against user-SP collusion? | Secure in multiple sharing? | Cost
Basic | Yes | Partial | O(m+n), very low
Multi | Yes | Yes | O(m+n), low

With pre-computation
Scheme | Alice’s cost | Secure in multiple sharing?
Basic | O(m + u) | Partial
Multi | O(mu) | Yes

m: number of columns; n: number of tuples; u: number of nodes
Related work
• Privacy-preserving data integration, e.g., DMKD 04
– A user issues a query that is answered by an untrusted platform across multiple data sources
– Different model
• Access control by ABE (attribute-based encryption), e.g., ASIACCS 10
– Each data item is associated with an access structure. Each user is associated with certain access attributes. Only a user whose access attributes satisfy the access structure of the data can decrypt the data.
Access control
• Example:
• A file requires “IT Staff” OR (“Marketing” AND “Manager”)
• Alan is <“IT Staff”, “Junior”, “Full Time”>: OK
• Betty is <“Part time”, “Marketing”, “Manager”>: OK
• Cathy is <“Full time”, “Sales”, “Manager”>: No
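The access-structure check in this example can be sketched as a boolean policy evaluated over attribute sets. This only models the policy logic that ABE enforces cryptographically; it does not implement ABE. The nested-tuple encoding of the policy is an assumption.

```python
# A policy is either an attribute name or a tuple ("AND" | "OR", left, right).
def satisfies(policy, attrs: set) -> bool:
    if isinstance(policy, str):
        return policy in attrs
    op, left, right = policy
    if op == "AND":
        return satisfies(left, attrs) and satisfies(right, attrs)
    if op == "OR":
        return satisfies(left, attrs) or satisfies(right, attrs)
    raise ValueError(f"unknown operator {op!r}")

file_policy = ("OR", "IT Staff", ("AND", "Marketing", "Manager"))
users = {
    "Alan": {"IT Staff", "Junior", "Full Time"},
    "Betty": {"Part time", "Marketing", "Manager"},
    "Cathy": {"Full time", "Sales", "Manager"},
}
for name, attrs in users.items():
    print(name, "OK" if satisfies(file_policy, attrs) else "No")
# Alan OK
# Betty OK
# Cathy No
```

The drawback noted below follows directly: sharing with a new ad hoc party means introducing a new attribute into every affected policy, which in real ABE requires re-encryption by the data owner.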
• Features
• Attribute revocation and ciphertext revocation: the SP takes almost all of the workload
– Attribute revocation: a user's permissions change, e.g., Betty becomes <“Full time”,…>
– Ciphertext revocation: a file's permissions change
• Drawback in our case: requires a predefined set of access attributes
• Ad hoc sharing instances?
– A new attribute, say “ABC company”, must be added, which requires the data owner to re-encrypt the entire database
– Side note: this method is attracting considerable attention in the cryptography community.