Secure and efficient data sharing on encrypted cloud relational databases.

Post on 02-Jan-2016


Introduction

• (Relational) cloud databases are popular

User → Service provider (SP): store data on cloud

Item_ID  Cost  Wholesale_price
1076     10    20
3308     15    50

Service provider (SP) → User: get back a data item

Item_ID  Cost  Wholesale_price
1076     10    20

Encryption for security

• Due to security concerns, data is encrypted before being stored on the cloud

User → Service provider (SP): store (encrypted) data on cloud

Item_ID  Cost    Wholesale_price
Egask5   A42fgs  2S46Dg
asD3j64  139ASs  Dd3fj2

Service provider (SP) → User: get back a data item (decrypted by the user)

Item_ID  Cost  Wholesale_price
1076     10    20

Key is kept by user, but not SP!

The problem of data sharing

Alice’s data (stored at SP):

Item_ID  Cost    Wholesale_price
Egask5   A42fgs  2S46Dg
asD3j64  139ASs  Dd3fj2

Parties: Alice, SP, Bob

Alice: “Bob is my business partner. I want to let him know the wholesale price of some of my selected products.”

Requirements:
1. Shared data should be revealed only to Bob (not to SP).
2. Other, unshared data should remain unknown to both Bob and SP.
3. Cost to Alice should be low (while cost to Bob and SP should be affordable).

Application of data sharing

1. Alice is a company user of SP. Alice hires Bob, a data analytics expert, to perform analysis, so she has to share some of her data with Bob.

2. Alice and Bob are business partners. They share some data to gain advantages, e.g., more market information.

Naïve solution of data sharing (e.g., CryptDB, TrustedDB)

• Encryption: use an existing general encryption function, e.g., RSA with padding, to encrypt all data
  c = E(p, k)
– Ciphertext: c
– Plaintext: p
– (Public) key: k
– Encryption function: E

Item_ID  Cost    Wholesale_price
Egask5   A42fgs  2S46Dg
asD3j64  139ASs  Dd3fj2

Naïve solution of data sharing - cont

Parties: Alice, SP, Bob

Example: share Wholesale_price of item “Egask5” (ciphertext 2S46Dg)

1. Alice sends Bob a copy of the key.
2. On request, SP sends Bob the shared item. Access control is enforced to prevent Bob from seeing unauthorized items.

Data stored at SP:

Item_ID  Cost    Wholesale_price
Egask5   A42fgs  2S46Dg
asD3j64  139ASs  Dd3fj2

With the key, every item can be decrypted once the ciphertexts are available:

Item_ID  Cost  Wholesale_price
1076     10    20
3308     15    50

This solution is not secure!

Another naïve solution

Parties: Alice, SP, Bob

Data stored at SP:

Item_ID  Cost    Wholesale_price
Egask5   A42fgs  2S46Dg
asD3j64  139ASs  Dd3fj2

1. Alice downloads the items to be shared (e.g., Wholesale_price 2S46Dg) and decrypts them (Wholesale_price 20).
2. Alice sends Bob the plain data.
3. Bob either stores the data on his own or inserts it into the cloud as new tuples.

High processing cost to Alice

Problem definition

• Data: relational data
– A table R contains
  • T: a set of tuples
  • C: a set of columns (attributes)
– Each tuple t has exactly m values, one per column
  • (NULL is also a value)
• Format of data for sharing:
– CS: a subset of C
– TS: a subset of T
– Just like the result of a query

Example table R:

A   B   C
a1  b1  c1
a2  b2  c2

T = {t1, t2}
C = {“A”, “B”, “C”}
t1 = {a1, b1, c1}
t2 = {a2, b2, c2}

Data to be shared:

B   C
b2  c2

TS = {t2}
CS = {“B”, “C”}
Shared part of t2 = {b2, c2}
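The sharing format above can be sketched in a few lines of Python. This is only an illustration on plain (unencrypted) values; the dictionary layout and the helper name `select_shared` are assumptions made for the example, not part of the original scheme.

```python
# Toy table R from the example: tuples T over the columns C = {"A", "B", "C"}.
T = {
    "t1": {"A": "a1", "B": "b1", "C": "c1"},
    "t2": {"A": "a2", "B": "b2", "C": "c2"},
}

def select_shared(table, TS, CS):
    """Return the cells covered by a sharing request (TS, CS)."""
    return {tid: {col: table[tid][col] for col in CS} for tid in TS}

# Share columns B and C of tuple t2, just like the result of a query.
shared = select_shared(T, TS=["t2"], CS=["B", "C"])
print(shared)
```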

Models

• 3 parties: Alice, Bob, SP
– Relationship: refer to the introduction
• Attack model
– Bob and SP are semi-honest and colluding
  • Bob and SP function as normal
  • An attacker observes everything seen by Bob and SP
  • Requirement: the attacker cannot learn any plain data of Alice except the data shared with Bob

Solution framework

• The solution includes:
– An encryption method (KeyGen, Enc, Dec)
– A sharing method (Share, SDec)

Alice:
1. k = KeyGen()
2. c = Enc(p, k); p = Dec(c, k)
3. H = Share(CS, TS, k)

SP (stores the encrypted table):

A    B    C
ca1  cb1  cc1
ca2  cb2  cc2

Bob:
4. p = SDec(c, H)

Our solution: Relational-based encryption (RBE)

• Problem of using general encryption, e.g., RSA
– The same key is required to decrypt all encrypted values
– In order to let Bob decrypt one particular data item, the decryption key must be sent to Bob
– Bob is now overpowered: he can decrypt any data encrypted by Alice

Relational-based encryption (RBE)

• Idea: How about having each individual data item encrypted by a unique value key?

Plain values     Value key table     Encrypted values

A   B   C        A    B    C         A    B    C
a1  b1  c1   +   ka1  kb1  kc1   →   ca1  cb1  cc1
a2  b2  c2       ka2  kb2  kc2       ca2  cb2  cc2

To share b1:
– Give kb1 to Bob
– Bob can only decrypt cb1; other values are safe since Bob does not have the other value keys

However, Alice has to remember all value keys, which is a high storage cost.

Key abstraction

• Each cell can be located by column identifier and row identifier

• Each tuple has a tuple secret rid; each column has a column secret cid

• Use a one-way hash function
– k = h(rid, cid)

• Storage cost at Alice: O(mn) => O(m+n)
– m: number of columns
– n: number of tuples

Value keys, located by (tuple, column):

     A    B    C
t1   ka1  kb1  kc1
t2   ka2  kb2  kc2

e.g., (t1, A) → ka1 and (t2, C) → kc2
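A minimal sketch of the key abstraction, assuming SHA-256 as the one-way hash h and 16-byte random secrets (both are choices made for this example; the slides do not fix a hash function or a secret length). Alice stores only the m + n secrets and recomputes any of the m × n value keys on demand.

```python
import hashlib
import secrets

def value_key(rid: bytes, cid: bytes, length: int = 16) -> bytes:
    """k = h(rid, cid): derive the value key of one cell from its two secrets."""
    # Length-prefix rid so that different (rid, cid) pairs cannot collide
    # merely because their concatenations happen to be equal.
    data = len(rid).to_bytes(4, "big") + rid + cid
    return hashlib.sha256(data).digest()[:length]

# Alice stores n tuple secrets and m column secrets: O(m + n) values.
rids = {t: secrets.token_bytes(16) for t in ("t1", "t2")}
cids = {c: secrets.token_bytes(16) for c in ("A", "B", "C")}

# Any of the m * n value keys can be recomputed when needed.
keys = {(t, c): value_key(rids[t], cids[c]) for t in rids for c in cids}
print(len(rids) + len(cids), "secrets stored,", len(keys), "value keys derivable")
```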

Towards O(1) storage cost at Alice

• Use an existing encryption function
– E: encryption function
– D: decryption function

• Tuple secrets and column secrets are encrypted and stored at SP

          A        B        C
          E(cidA)  E(cidB)  E(cidC)
E(rid1)   ca1      cb1      cc1
E(rid2)   ca2      cb2      cc2

Encryption/decryption process

• Alice first gets back E(cid) and E(rid) of the value to be encrypted/decrypted
– Decrypt them to get cid and rid
– Compute the value key of the cell and encrypt/decrypt the cell
• Although the encryption/decryption cost may seem higher now, RBE is more efficient for relational data (more details after the math details)

Details in math

• KeyGen
– Just the same key generation as the underlying encryption scheme
• Enc
– Tuple ti = <p1, p2, …, pm>
– Obtain cid and rid
– ci = pi XOR h(rid XOR cid)
  • h: one-way hash

Details in math - cont

• Dec
– Encrypted tuple t’i = <c1, c2, …, cm>
– Obtain cid and rid
– pi = ci XOR h(rid XOR cid)
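The Enc and Dec formulas above can be tried out with a short sketch. It assumes SHAKE-256 as a stand-in for the one-way hash h (so the pad can be as long as the value) and equal-length byte strings for rid and cid; neither choice comes from the slides.

```python
import hashlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def h(data: bytes, length: int) -> bytes:
    """Stand-in one-way hash with a variable-length output (SHAKE-256)."""
    return hashlib.shake_256(data).digest(length)

def enc(p: bytes, rid: bytes, cid: bytes) -> bytes:
    """c = p XOR h(rid XOR cid)."""
    return xor_bytes(p, h(xor_bytes(rid, cid), len(p)))

def dec(c: bytes, rid: bytes, cid: bytes) -> bytes:
    """p = c XOR h(rid XOR cid)."""
    return xor_bytes(c, h(xor_bytes(rid, cid), len(c)))

# Hypothetical 16-byte secrets for one cell.
rid = bytes(range(16))
cid = bytes(range(16, 32))
plain = b"wholesale_price=20"
cipher = enc(plain, rid, cid)
print(dec(cipher, rid, cid))
```

Because Enc and Dec apply the same pad, decrypting with the right (rid, cid) returns the plaintext, while a different cid yields an unrelated byte string.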

Correctness of encryption

• pi = ci XOR h(rid XOR cid) --- (1)

• ci = pi XOR h(rid XOR cid) --- (2)

• Substitute (2) into the RHS of (1):
  ci XOR h(rid XOR cid)
  = pi XOR h(rid XOR cid) XOR h(rid XOR cid)
  = pi

Security

• Encrypted data is stored on the cloud. Is it safe?

• ci = pi XOR h(rid XOR cid)
– One-time pad: p XOR k
  • Note: the same key cannot be used to encrypt two or more data items!
  • A one-time pad is perfectly secure: not breakable unless the key is leaked
– One-way hash function: not reversible
  • Knowing the hash value does not reveal the input to the hash (rid XOR cid), an important feature to guard against CPA-style attacks
– Overall: as secure as the hash function

• There are many highly secure one-way hash functions, including those built from the encryption functions of different encryption schemes

Security - cont

• On the other hand, cid and rid can be derived from CN (column name) and E(rid, k)

• Think of them as encrypted values of the underlying encryption function (E, D): their security is the same as that of the underlying scheme

Efficiency

• Decrypting a query result with n tuples and m columns
– Traditional method, e.g., RSA
  • mn decryptions

• In our scheme
– m+n decryptions, mn hashes, 2mn XOR operations

• Cost of decryption >> hash >> XOR
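To make the operation counts above concrete, the following sketch simply evaluates them for a hypothetical query result with n = 1000 tuples and m = 10 columns. The function name and the sample sizes are illustrative assumptions; no timing is measured.

```python
def decryption_costs(n: int, m: int) -> dict:
    """Operation counts for decrypting a result with n tuples and m columns."""
    return {
        "traditional": {"decryptions": m * n},
        "rbe": {"decryptions": m + n, "hashes": m * n, "xors": 2 * m * n},
    }

costs = decryption_costs(n=1000, m=10)
print(costs["traditional"])  # {'decryptions': 10000}
print(costs["rbe"])          # {'decryptions': 1010, 'hashes': 10000, 'xors': 20000}
```

In this example the number of expensive decryptions drops from 10,000 to 1,010, at the price of cheaper hash and XOR operations.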

Data sharing

• Input: TS, CS
– Alice sends the rid of each tuple in TS to Bob
– Alice sends the cid of each column in CS to Bob

• H = <HT, HC> = Share(TS, CS, k)
– HT = {rid | rid of t and t in TS}
– HC = {cid | cid of c and c in CS}

• Decryption: SDec(c, H)
– Find the corresponding rid and cid of c
  • pi = ci XOR h(rid XOR cid)
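A small sketch of Share and SDec under the basic scheme. It assumes SHAKE-256 as the hash h and fixed toy secrets, and it lets Alice hold rid and cid directly instead of first recovering them from E(rid) and E(cid) with k; the function names `share` and `sdec` mirror the slide, but their signatures are illustrative.

```python
import hashlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def h(data: bytes, length: int) -> bytes:
    """Stand-in one-way hash (SHAKE-256) with a variable-length output."""
    return hashlib.shake_256(data).digest(length)

def enc(p: bytes, rid: bytes, cid: bytes) -> bytes:
    return xor_bytes(p, h(xor_bytes(rid, cid), len(p)))

def share(rids: dict, cids: dict, TS, CS):
    """H = <HT, HC>: tuple secrets of TS and column secrets of CS."""
    return {t: rids[t] for t in TS}, {c: cids[c] for c in CS}

def sdec(c: bytes, tid: str, col: str, H) -> bytes:
    """Bob decrypts one cell using only the secrets contained in the hint H."""
    HT, HC = H
    if tid not in HT or col not in HC:
        raise KeyError("cell is not covered by the hint")
    return xor_bytes(c, h(xor_bytes(HT[tid], HC[col]), len(c)))

# Toy secrets and the encrypted table stored at SP.
rids = {"t1": bytes([1]) * 16, "t2": bytes([2]) * 16}
cids = {"A": bytes([16]) * 16, "B": bytes([32]) * 16, "C": bytes([48]) * 16}
plain = {"t1": {"A": b"a1", "B": b"b1", "C": b"c1"},
         "t2": {"A": b"a2", "B": b"b2", "C": b"c2"}}
table = {t: {c: enc(v, rids[t], cids[c]) for c, v in row.items()}
         for t, row in plain.items()}

H = share(rids, cids, TS=["t2"], CS=["B", "C"])
print(sdec(table["t2"]["B"], "t2", "B", H))
```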

Security

• Revealing some values of cid and rid
– Cells that are not related: of course secure
– Cells whose cid is known but whose rid is not: secure?
  • Secure: ci = pi XOR h(rid XOR cid)
  • The hash input is unknown because rid or cid is unknown, so the hash value is unknown as well

• Note: the above already assumes Bob and SP are colluding
– Otherwise, Bob has no access to the encrypted values of other data

Problem of multiple sharing

• User collusion
• A user retrieves different shared versions at different times
– 1st sharing
– 2nd sharing
– Additional information can be observed by combining both sharing instances
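The leak can be illustrated with a toy model in which each sharing instance is just a set of revealed rid and cid secrets, as in the basic scheme. The two sharing instances below are hypothetical (tuple t1 with columns A and B first, tuple t2 with column B later); the point is only that any revealed rid combines with any revealed cid.

```python
# Each basic-scheme hint reveals some tuple secrets and some column secrets.
first = {"rids": {"rid1"}, "cids": {"cidA", "cidB"}}   # 1st sharing: t1 x {A, B}
second = {"rids": {"rid2"}, "cids": {"cidB"}}          # 2nd sharing: t2 x {B}

def decryptable(hints):
    """Cells that can be decrypted by pooling the secrets of the given hints."""
    rids = set().union(*(hint["rids"] for hint in hints))
    cids = set().union(*(hint["cids"] for hint in hints))
    return {(r, c) for r in rids for c in cids}

intended = decryptable([first]) | decryptable([second])
combined = decryptable([first, second])
leaked = combined - intended
print(leaked)  # {('rid2', 'cidA')}
```

Cell (t2, A) was never shared, yet it becomes decryptable once both hints are combined.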

Advanced solution: Elliptic curve cryptography (ECC)

• Introduction
– ECC: operations are defined on a finite set of 2D points
– Example curves: y^2 = x^3 – x (mod p) and y^2 = x^3 – x + 1 (mod p)
– p: system parameter

Operations on ECC

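• “Addition”
– [Figure: doubling a point P on the curve; the tangent at P meets the curve at -2P, whose reflection is 2P]

• Scalar multiplication
– kP = P + P + … + P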
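A toy sketch of point addition and scalar multiplication on the curve y^2 = x^3 – x + 1 (mod p) mentioned earlier. The modulus p = 97 is an assumption chosen so the example stays readable (real curves use primes of roughly 256 bits), and double-and-add is used as a standard shortcut for computing P + P + … + P. It needs Python 3.8 or later for modular inverses via `pow`.

```python
P_MOD = 97      # toy prime modulus (assumption, far too small for real use)
A, B = -1, 1    # curve y^2 = x^3 - x + 1 (mod p)

def on_curve(P):
    """Check the curve equation; None stands for the point at infinity."""
    if P is None:
        return True
    x, y = P
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0

def add(P, Q):
    """Point addition; None is the identity element."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                     # P + (-P) = 0
    if P == Q:
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)  # tangent slope
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)          # chord slope
    s %= P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def mul(k, P):
    """Scalar multiplication kP by double-and-add."""
    R = None
    while k > 0:
        if k & 1:
            R = add(R, P)
        P = add(P, P)
        k >>= 1
    return R

# Find some point on the curve by brute force (fine for a toy modulus).
G = next((x, y) for x in range(P_MOD) for y in range(1, P_MOD) if on_curve((x, y)))
print(G, mul(2, G), mul(5, G))
```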

Operations on ECC

• Order of curve
– Number of points on the curve
– Let n be the order of the curve
  • (n+1)P = P for all P

• Curve with prime order, i.e., n is prime
– There is an integer k s.t. kP = Q for any points P, Q (P != 0)

• Elliptic curve discrete logarithm problem (ECDLP)
– Given P, Q, it is hard to find k s.t. kP = Q

• Pairing function e:
– e(aP, bQ) = e(P, Q)^{ab}
– Security: Bilinear Diffie-Hellman (BDH) assumption
  • Given P, aP, bP, cP, it is hard to find e(P, P)^{abc}

Improvement over our sharing scheme

• Recall:
– Encryption: ci = pi XOR h(rid, cid)
– Decryption: pi = ci XOR h(rid, cid)
– Share: return all concerned rid and cid

• Define h(rid, cid) = h2(e(rid P, cid Q))
– P, Q are private (even if they are public, it is fine.)

Sharing

• Protocol Share
– Alice generates a random r
– Return
  • {(r^{-1} * rid) P}
  • {(r * cid) Q}

Bob’s decryption

• Protocol SDec
– Bob has
  • X = (r^{-1} * rid) P
  • Y = (r * cid) Q
– Computing g(X, Y) = h2(e(X, Y))
  • = h2(e((r^{-1} * rid) P, (r * cid) Q))
  • = h2(e(rid P, cid Q))

Recall: h(rid, cid) = h2(e(rid P, cid Q))
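The step from the second to the third line of Bob's computation follows from the bilinearity of the pairing, e(aP, bQ) = e(P, Q)^{ab}. Written out, and assuming r^{-1} denotes the inverse of r modulo the (prime) order n of the curve, the random factor cancels in the exponent:

```latex
\begin{align*}
g(X, Y) &= h_2\big(e(X, Y)\big) \\
        &= h_2\big(e((r^{-1} \cdot rid)\,P,\ (r \cdot cid)\,Q)\big) \\
        &= h_2\big(e(P, Q)^{(r^{-1} \cdot rid)(r \cdot cid)}\big) \\
        &= h_2\big(e(P, Q)^{rid \cdot cid}\big) \\
        &= h_2\big(e(rid\,P,\ cid\,Q)\big) = h(rid, cid).
\end{align*}
```

Bob therefore obtains the value key without ever seeing rid or cid themselves.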

Security in multiple sharing

• Focus on columns, the case for rows is similar

1st sharing: r1 cidA Q, r1 cidB Q, r1 cidC Q

2nd sharing: r2 cidB Q, r2 cidC Q

The values of rid and cid are contained in different sharing instances. Is it a concern?

Question: is it secure?

• If we can find e(rid2 P, cidA P), we can solve the BDH problem (let Q = P for now)
– Given P, aP, bP, cP, find e(P, P)^{abc}

• In our case
– a = cidA
– b = r2^{-1} * rid2
– c = r2
– Generate random unrelated parameters rid1, cidB, r1

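Shared values, and how each one is built from the BDH instance (A = aP, B = bP, C = cP):

Shared value        Built as
r1 cidA P           r1 A
r1 cidB P           r1 cidB P
r2 cidB P           cidB C
r1^{-1} * rid1 P    r1^{-1} * rid1 P
r2^{-1} * rid2 P    B

Any combination of values of a, b, c can be expressed in this way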

Security in multiple rows, columns?

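The same construction with additional rows and columns (A = aP, B = bP, C = cP):

Shared value        Built as
r1 cidA P           r1 A
r1 cidB P           r1 cidB P
r2 cidB P           cidB C
r2 cidC P           cidC C
r1^{-1} * rid1 P    r1^{-1} * rid1 P
r1^{-1} * ridi P    r1^{-1} * ridi P
r2^{-1} * rid2 P    B

Our security proof covers the general case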

Selecting tuples for sharing

• A fundamental problem is how the user defines what data to share with a particular party

• Select tuples by the user’s free choice
– Requires at least linear cost (in the number of tuples)

• Another option
– Define by query

Pre-computation for sharing by query

Parties: Alice, SP, Bob

– Alice issues a query Q to define the data to be shared with Bob
– Alice prepares index-like pre-computed information and gives it to SP
– R is related to the query answer and the index
– A hint H is generated based on R
– Bob can observe the shared data (shared DB) with the hint and the index at SP

Solution framework

• The solution includes:
– An encryption method (KeyGen, Enc, Dec, BuildTree)
– A sharing method (SQuery, Share, SDec)

Parties: Alice, Bob, SP

1. k = KeyGen()
2. c = Enc(p, k); p = Dec(c, k)
3. Δ = BuildTree()
4. Φ = SQuery(q)
5. H = Share(CS, Φ, k)
6. p = SDec(c, Δ, H)

Encrypted table at SP:

A    B    C
ca1  cb1  cc1
ca2  cb2  cc2

Extending basic scheme

• Encrypted tuple secrets in a tree

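Root:        Ei(k14, k18)   Ei(k58, k18)   Es(k18)
Level 2:     Ei(k12, k14)   Ei(k34, k14)   Es(k14)    (node covering t1 to t4)
Leaf level:  Ei(rid1, k12)  Ei(rid2, k12)  Es(k12)    (node covering t1, t2)
Tuples:      t1 t2 t3 t4 t5 t6 t7 t8

(The nodes covering the other tuples are analogous.)

Keys for Es are kept at Alice only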

Computing the answer of a query

• SQuery(q)

[Figure: the same tree of encrypted tuple secrets; the nodes covering the answers of q are returned to Alice]

Share (CS, Φ, k)

• Φ = {Es(k14)}

• H = <HT, HC> = Share(CS, Φ, k)
– HT = {k14}
– HC = {cid | cid of c and c in CS}

Computing the answer of a query

• Bob’s knowledge: k14
– With k14, Bob opens the node Ei(k12, k14) Ei(k34, k14) Es(k14) and gets k12 and k34
– With k12, Bob opens the leaf node Ei(rid1, k12) Ei(rid2, k12) Es(k12) and gets rid1 and rid2 (likewise with k34)
– Result: tuple secrets of t1 to t4
– The rest of the tree remains unknown
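The walk down the tree can be sketched as follows. The toy cipher `E` (XOR with a SHAKE-256 stream derived from the key) is only a stand-in for Ei and is not meant for real use; the dictionary layout of the tree is also an assumption, and the Es entries kept for Alice are left out.

```python
import hashlib
import secrets

def E(x: bytes, key: bytes) -> bytes:
    """Toy stand-in for Ei: XOR with a key-derived stream (illustration only)."""
    stream = hashlib.shake_256(key).digest(len(x))
    return bytes(a ^ b for a, b in zip(x, stream))

D = E  # XOR with the same stream decrypts

rid = {i: secrets.token_bytes(16) for i in range(1, 9)}     # tuple secrets
k = {name: secrets.token_bytes(16)
     for name in ("12", "34", "56", "78", "14", "58", "18")}  # node keys

# Each node stores its children's secrets encrypted under the node's own key.
tree = {
    "18": {c: E(k[c], k["18"]) for c in ("14", "58")},
    "14": {c: E(k[c], k["14"]) for c in ("12", "34")},
    "58": {c: E(k[c], k["58"]) for c in ("56", "78")},
    "12": {i: E(rid[i], k["12"]) for i in (1, 2)},
    "34": {i: E(rid[i], k["34"]) for i in (3, 4)},
    "56": {i: E(rid[i], k["56"]) for i in (5, 6)},
    "78": {i: E(rid[i], k["78"]) for i in (7, 8)},
}

def recover(node: str, key: bytes) -> dict:
    """Given one node key, decrypt downwards and collect the tuple secrets below."""
    found = {}
    for child, blob in tree[node].items():
        secret = D(blob, key)
        if child in tree:       # inner entry: the secret is the child's node key
            found.update(recover(child, secret))
        else:                   # leaf entry: the secret is a tuple secret rid
            found[child] = secret
    return found

bob = recover("14", k["14"])    # Bob's knowledge: k14
print(sorted(bob))              # [1, 2, 3, 4]
```

One key in the hint thus replaces four tuple secrets, which is why the cost to Alice depends on the number of nodes rather than the number of tuples.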

Advantage of using index

• Without an index, the cost to Alice must be at least linear in the number of tuples in the sharing domain
– Now, it is linear in the number of nodes returned from the tree, which is usually much smaller

Indexing scheme for multi-sharing scenario

• Use a different function to generate the value key

Root:        h18
Level 2:     h14             h58
Level 3:     h12     h34     h56     h78
Leaf level:  h1  h2  h3  h4  h5  h6  h7  h8
Tuples:      t1  t2  t3  t4  t5  t6  t7  t8

For t1: ci = pi XOR h1(h12(h14(h18(cid))))

Computing the answer of a query

• SQuery(q)
• Φ = {h14 ο h18}

[Figure: the same hash tree, with the answers of q marked at the leaf level]

Share (CS, Φ, k)

• Φ = {h14 ο h18}

• H = Share(CS, Φ, k)
– H = {h14(h18(cid)) | cid of c and c in CS}

Computing the answer of a query

• Bob’s knowledge
– x = h14(h18(cid))

• Value key of t1 = h1(h12(x))

• One-way hash: Bob can’t go up the tree, so he can’t see other tuples

• x is specific to this column, not another column
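The hash chain can be sketched as below. The per-node functions are modelled as SHA-256 with the node label as a prefix, which is an assumption made for the example (the slides only say that a different function is used per node); the helper names `path`, `value_key` and `derive_from_hint` are likewise hypothetical.

```python
import hashlib

def h(label: str, x: bytes) -> bytes:
    """Hypothetical node function h_label: SHA-256 with the label as a prefix."""
    return hashlib.sha256(label.encode() + b"|" + x).digest()

def path(i: int) -> list:
    """Node labels from the root down to tuple i (1..8), e.g. 18, 14, 12, 1."""
    half = "14" if i <= 4 else "58"
    lo = i if i % 2 == 1 else i - 1
    return ["18", half, f"{lo}{lo + 1}", str(i)]

def value_key(i: int, cid: bytes) -> bytes:
    """Alice's view: for t1 this is h1(h12(h14(h18(cid))))."""
    x = cid
    for label in path(i):
        x = h(label, x)
    return x

def derive_from_hint(i: int, hint_label: str, x: bytes) -> bytes:
    """Bob's view: continue hashing downwards from a shared intermediate value."""
    labels = path(i)
    if hint_label not in labels:
        raise ValueError("tuple is not under the shared node")
    for label in labels[labels.index(hint_label) + 1:]:
        x = h(label, x)
    return x

cid_B = b"column-secret-B"          # toy column secret
x = h("14", h("18", cid_B))         # hint for column B, tuples t1 to t4
print(derive_from_hint(1, "14", x) == value_key(1, cid_B))  # True
```

Bob can only hash downwards from x, so tuples outside the subtree and other columns stay out of reach.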

Developed schemes

Scheme  Secure against user-SP collusion?  Secure in multiple sharing?  Cost
Basic   Yes                                Partial                      O(m+n) - Very low
Multi   Yes                                Yes                          O(m+n) - Low

With pre-computation

Scheme  Alice’s cost  Secure in multiple sharing?
Basic   O(m + u)      Partial
Multi   O(mu)         Yes

u: number of nodes
m: number of columns
n: number of tuples

Related work

• Privacy preserving data integration, e.g., DMKD 04
– User issues a query that is to be answered by an untrusted platform across multiple data sources
– Different model

• Access control by ABE (attribute-based encryption), e.g., ASIACCS 10
– Each data item is associated with an access structure. Each user is associated with certain access attributes. Only a user whose access attributes satisfy the access structure of the data can decrypt the data.

Access control

• Example:
– A file requires “IT staff” OR (“Marketing” AND “Manager”)
– Alan is <“IT Staff”, “Junior”, “Full Time”> - OK
– Betty is <“Part time”, “Marketing”, “Manager”> - OK
– Cathy is <“Full time”, “Sales”, “Manager”> - No

• Features
– Attribute revocation and ciphertext revocation: SP takes almost all workload
  • Attribute revocation: user permission changes, e.g., Betty becomes <“Full time”,…>
  • Ciphertext revocation: file permission changes

• Drawback in our case: requires a pre-defined set of access attributes
– Ad hoc sharing instances?
  • Need to add a new attribute, say “ABC company”, which requires re-encryption of the entire database by the data owner
– Side note: this method is attracting a good amount of attention in the crypto area.
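The access-structure example above can be checked with a plain boolean evaluation. This sketch only models the policy logic, not the cryptography of ABE, and it compares attribute names case-insensitively (an assumption, since the slide writes both “IT staff” and “IT Staff”).

```python
def satisfies(attrs) -> bool:
    """Policy from the example: "IT staff" OR ("Marketing" AND "Manager")."""
    a = {s.lower() for s in attrs}
    return "it staff" in a or ("marketing" in a and "manager" in a)

users = {
    "Alan": ["IT Staff", "Junior", "Full Time"],
    "Betty": ["Part time", "Marketing", "Manager"],
    "Cathy": ["Full time", "Sales", "Manager"],
}
results = {name: satisfies(attrs) for name, attrs in users.items()}
print(results)  # {'Alan': True, 'Betty': True, 'Cathy': False}
```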
