
Authenticated Join Processing in Outsourced Databases

Yin Yang, Dimitris Papadias, Stavros Papadopoulos
HKUST, Hong Kong

Panos Kalnis
KAUST, Saudi Arabia

Providence, USA, 2009

Database Outsourcing

Advantages
- The data owner does not need the hardware / software / personnel to run a DBMS
- The service provider achieves economy of scale
- The client enjoys better quality of service

A main challenge
- The service provider is not trusted, and may return incorrect query results

[Figure: the data owner sends the initial data and data updates to the service provider; the client sends a query to the service provider and receives the query results.]

Query Authentication

[Figure: the data owner sends the initial data & signatures and the data updates & signature updates to the service provider; the client sends a query to the service provider and receives the query results & VO.]

- The owner signs its data with a digital signature scheme
- Given a query, the service provider attaches a VO (Verification Object) to the results
- The client verifies the query results with the VO and the owner’s signature
  - soundness
  - completeness

Example Queries

Purchase                  Customer
pid  cid  quantity        cid  name   city
p1   c1   20              c1   Tom    New York
p2   c3   50              c2   Brian  London
p3   c2   80              c3   Susan  Tokyo
p4   c1   200             c4   Jane   New York
p5   c2   500             c5   Carl   London

Range: σ_{quantity>100}(Purchase)

Join: Purchase ⋈_cid Customer

Range & Join: (σ_{quantity>100} Purchase) ⋈_cid (σ_{city=“New York”} Customer)
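To make the three query types concrete, here is a minimal sketch that evaluates them over the example tables in plain Python. The helper names `select` and `join_on_cid` are made up for illustration, and the join assumes that cid is unique in Customer.

```python
# Example tables from the slide.
purchase = [
    ("p1", "c1", 20), ("p2", "c3", 50), ("p3", "c2", 80),
    ("p4", "c1", 200), ("p5", "c2", 500),
]
customer = [
    ("c1", "Tom", "New York"), ("c2", "Brian", "London"),
    ("c3", "Susan", "Tokyo"), ("c4", "Jane", "New York"),
    ("c5", "Carl", "London"),
]

def select(rows, pred):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if pred(row)]

def join_on_cid(purchases, customers):
    """Equi-join on cid (assumes cid is unique in customers)."""
    by_cid = {c[0]: c for c in customers}
    return [p + by_cid[p[1]][1:] for p in purchases if p[1] in by_cid]

# Range: quantity > 100
range_result = select(purchase, lambda p: p[2] > 100)
# Join: Purchase joined with Customer on cid
join_result = join_on_cid(purchase, customer)
# Range & Join: both selections applied before the join
range_join_result = join_on_cid(
    range_result, select(customer, lambda c: c[2] == "New York"))
```

Only p4 survives the combined query: p5 also has quantity > 100, but its customer c2 lives in London.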


State of the Art

- Range authentication: many solutions
- Join authentication: few proposals
  - Materializing join results into views
  - AINL (presented in detail later)
- Joins are inherently more complex than ranges
  - A join combines information from multiple tables
  - Only individual tables are signed

Previous Work

- Multi-dimensional range authentication
  Y. Yang, S. Papadopoulos, D. Papadias, G. Kollios (BU). ICDE’08, VLDB J.
- Continuous range authentication
  S. Papadopoulos, Y. Yang, D. Papadias. VLDB’07, VLDB J.
- Novel authentication framework
  S. Papadopoulos, D. Saccharidis, D. Papadias. ICDE’09

Background

- Concepts in Cryptography
- Authenticated Data Structure (ADS)
  - Merkle Hash Tree
  - MB-Tree
- AINL

Concepts in Cryptography

- One-way, collision-resistant hash functions
  - h = H(m)
  - Computationally infeasible to infer m from h, or to find two messages m1, m2 with the same hash value h
  - Examples: SHA1, SHA2, …
- Public-key encryption
  - Two keys: private key sk, public key pk
  - Public key to encrypt, private key to decrypt
  - Example: RSA
- Digital signature
  - Hard to forge without the secret key
  - Signing: s = encrypt(H(m), sk)
  - Verifying: check if H(m) = decrypt(s, pk)
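The signing and verifying equations can be illustrated with a toy, textbook-style RSA signature. This is only a sketch: the key is built from two small Mersenne primes, there is no padding, and the names `digest`, `sign`, and `verify` are illustrative. A real deployment would use a vetted cryptographic library.

```python
import hashlib

# Toy key material: two Mersenne primes. Far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537                           # public exponent (part of pk)
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent sk (Python 3.8+)

def digest(m: bytes) -> int:
    """H(m) as an integer, reduced modulo N so it fits the toy key."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N

def sign(m: bytes) -> int:
    """s = encrypt(H(m), sk) in the slide's notation."""
    return pow(digest(m), D, N)

def verify(m: bytes, s: int) -> bool:
    """Check whether H(m) = decrypt(s, pk)."""
    return pow(s, E, N) == digest(m)

sig = sign(b"p4,c1,200")
```

Here `verify(b"p4,c1,200", sig)` succeeds, while any other signature value for the same message is rejected.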


Merkle Hash Tree (Merkle, Crypto’89)

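- A binary tree with hash values satisfying h_n = H(h_n.lc | h_n.rc), where n.lc and n.rc are the left and right children of node n
- Authenticates 1D range queries
- Example: a query Q retrieves d4, d5
  - VO(Q) = {sRoot, h1-2, d3, d4, d5, d6, h7-8}, where d3 and d6 are the boundary records that prove completeness
  - The client reconstructs hRoot bottom-up, and verifies the signature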
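The example can be reproduced with a small sketch of a Merkle hash tree over eight records. Several things here are assumptions made for illustration: SHA-256 as H, a power-of-two number of leaves, a nested-tuple VO encoding, and a trusted root hash standing in for the verification of the owner's signature. The function names are illustrative, not taken from the original work.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(records):
    """levels[0] holds the leaf hashes; levels[-1] holds the single root hash."""
    level = [H(r) for r in records]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def make_vo(records, levels, lo, hi, height=None, idx=0):
    """Server side: expand subtrees overlapping leaf positions lo..hi
    (results plus boundary records); prune all others to a single hash."""
    if height is None:
        height = len(levels) - 1
    start = idx << height
    end = start + (1 << height) - 1
    if end < lo or start > hi:
        return ("h", levels[height][idx])      # pruned subtree
    if height == 0:
        return ("d", records[idx])             # record sent in the clear
    return ("n",
            make_vo(records, levels, lo, hi, height - 1, 2 * idx),
            make_vo(records, levels, lo, hi, height - 1, 2 * idx + 1))

def root_from_vo(vo):
    """Client side: reconstruct the root hash bottom-up from the VO."""
    if vo[0] == "h":
        return vo[1]
    if vo[0] == "d":
        return H(vo[1])
    return H(root_from_vo(vo[1]) + root_from_vo(vo[2]))

def vo_records(vo):
    """List the records carried in the VO, left to right."""
    if vo[0] == "d":
        return [vo[1]]
    if vo[0] == "n":
        return vo_records(vo[1]) + vo_records(vo[2])
    return []

records = [f"d{i}".encode() for i in range(1, 9)]   # d1..d8
levels = build_levels(records)
trusted_root = levels[-1][0]   # stands in for the signed hRoot
# Q retrieves d4, d5; positions 2..5 also cover the boundary records d3, d6.
vo = make_vo(records, levels, lo=2, hi=5)
```

The resulting VO carries h1-2, the records d3 to d6, and h7-8, matching the slide, and `root_from_vo(vo)` equals `trusted_root` only if none of these items was altered.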

[Figure: Merkle hash tree over records d1-d8 with leaf hashes h1-h8, internal hashes h1-2, h3-4, h5-6, h7-8, h1-4, h5-8, and hRoot signed by the owner; the query Q covers d4 and d5, and the VO items sent to the client are marked.]

Merkle B-Tree (Li et al. SIGMOD’06)

- Merkle Hash Tree + B-Tree
- Conceptually, a Merkle Hash Tree with a large fanout (>2)

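[Figure: range query on an MB-tree with nodes N1-N4 below the root; the result records s_i ... s_j are returned together with the boundary records s_{i-1} and s_{j+1}, and the VO contains the hash values collected by the left traversal and by the right traversal; nodes carry a pointer to the right sibling.]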

AINL

- For binary joins
- Requires an ADS on the join attribute of the inner relation
- Reduces a join query into multiple ranges
- Algorithm
  - For every tuple in the outer relation
    - Perform an authenticated range query on the inner relation
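The reduction of a join to one range query per outer tuple can be sketched as follows. This simplification tracks only which inner records each range query returns (the matches plus one boundary record on each side) and ignores the hash values of the ADS. The sentinels `NEG_INF` and `POS_INF` and the function name are assumptions made so that every query has two boundaries.

```python
import bisect

NEG_INF, POS_INF = float("-inf"), float("inf")

def ainl_inner_records(R, S_sorted):
    """For each outer value r, return the inner records an authenticated
    range query on r would send: all matches plus both boundary records."""
    padded = [NEG_INF] + S_sorted + [POS_INF]
    per_tuple = []
    for r in R:
        lo = bisect.bisect_left(padded, r)    # first position with value >= r
        hi = bisect.bisect_right(padded, r)   # first position with value > r
        per_tuple.append(padded[lo - 1:hi + 1])
    return per_tuple

R = [3, 5, 8]
S_sorted = [1, 3, 3, 4, 6, 8]
sent = ainl_inner_records(R, S_sorted)
total_inner = sum(len(x) for x in sent)
```

In this run the three range queries send 4 + 2 + 3 = 9 inner records, which equals 2|R| plus the 3 join results, because each query repeats its boundary records independently of the others.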


Example of AINL

[Figure: outer relation R with tuples r1, r2 and the ADS on the inner relation S (root RootS, internal nodes A-G, leaves s1-s15), plotted on the join attribute R.a (S.a).]

VO:
1. r1, hF, h10, s11, s12, hE
2. r2, h1, s2, s3, s4, h5, h6, hC, hG
3. …

Drawbacks of AINL

- Large VO size
  - |R| records from R (outer relation)
  - 2|R| + |R ⋈ S| records from S (inner relation)
  - Numerous hash values
  - Often larger than the combined size of R and S
- High computation overhead at the server and the client

NAI: A Naïve solution

- The server transmits all the data to the client
- The client performs the join locally
- NAI often outperforms AINL

Proposed Methods

- Binary join authentication
  - AISM: requires ADS on one relation
  - AIM: requires ADSs on both relations
  - ASM: requires no ADS
- Complex join query authentication
  - Multi-way join
  - Select-project-join

AISM: Query Processing

- Sort the outer relation R on the join attribute
- Transmit all tuples in R to the client in their verifiable order
- Transmit the sort order of the R tuples on the join attribute
- Incrementally traverse the ADS on S once with the sorted R records
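A rough sketch of the server-side steps above: sort R, then sweep the sorted leaves of S once with a cursor that never moves backwards. As a simplification it records only which S records would be sent as matches or boundary records and leaves out the hash values of pruned subtrees. The function name `aism_server` is illustrative, and positions are 0-based here.

```python
def aism_server(R, S_sorted):
    """Return (order, sent): order[i] is the position in R of the tuple with
    the i-th smallest join value; sent lists the positions of the S records
    transmitted as matches or boundary records."""
    order = sorted(range(len(R)), key=lambda i: R[i])
    sent = set()
    cursor = 0                      # single forward pass over S
    for i in order:
        r = R[i]
        while cursor < len(S_sorted) and S_sorted[cursor] < r:
            cursor += 1
        if cursor > 0:
            sent.add(cursor - 1)    # left boundary record
        j = cursor
        while j < len(S_sorted) and S_sorted[j] == r:
            sent.add(j)             # join partner
            j += 1
        if j < len(S_sorted):
            sent.add(j)             # right boundary record
    return order, sorted(sent)

R = [110, 30, 110, 30]
S_sorted = list(range(10, 160, 10))   # 10, 20, ..., 150
order, sent = aism_server(R, S_sorted)
```

Because duplicates and neighbouring R values share their S records, only six S records are sent in this run, whereas one independent range query per R tuple would send twelve.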


Example of AISM

[Figure: outer relation R with tuples r1-r6 and the ADS TS on the inner relation S (root RootS, internal nodes A-G, leaves s1-s15), plotted on the join attribute R.a (S.a).]

Sort order of R: R[1]=2, R[2]=4, R[3]=6, R[4]=1, R[5]=3, R[6]=5

VO: signature of R, root signature of TS, r1-r6 in their verifiable order
1. R[1], h1, s2, s3, s4
2. R[2], h5, h6, hC, s10, s11, s12
3. R[3]
4. R[4]
5. R[5], h13, h14, s15
6. R[6]

AISM: Verification

The client checks
- the R records
- correctness of the sort order of R
- the boundary records
- whether the reconstructed root hash of TS matches its signature
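One of these checks, the correctness of the sort order, is easy to sketch. Assuming the sort order arrives as a list of 0-based positions into R (an encoding chosen here for illustration), the client confirms that it is a permutation and that it lists the join values in non-decreasing order.

```python
def check_sort_order(R, order):
    """Client-side check of a claimed sort order over the join values in R."""
    # Every position of R must appear exactly once.
    if sorted(order) != list(range(len(R))):
        return False
    # The join values must be non-decreasing in the claimed order.
    values = [R[i] for i in order]
    return all(a <= b for a, b in zip(values, values[1:]))

R = [110, 30, 110, 30]
ok = check_sort_order(R, [1, 3, 0, 2])
```

A server that drops or repeats a tuple fails the permutation test, and one that reorders tuples fails the monotonicity test.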


AIM

- Query processing
  - Requires ADSs on both relations
  - Start with one relation R, traverse its ADS TR down to the first tuple r1
  - Traverse TS until reaching the right boundary record s of r1
  - Traverse TR until reaching the right boundary record r of s
  - Alternately traverse TS and TR in the same manner
- Verification: similar to AISM
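The alternating traversal resembles a zigzag join over two sorted inputs. The sketch below is an approximation: binary search over sorted lists stands in for descending the two ADSs, and the skipped stretches correspond to the parts that the VO would cover with hash values. The function name `zigzag_join` is illustrative.

```python
import bisect

def zigzag_join(R_sorted, S_sorted):
    """Alternate between the two sorted inputs, skipping ahead in one
    input to the first value not smaller than the current value of the other."""
    i = j = 0
    out = []
    while i < len(R_sorted) and j < len(S_sorted):
        r, s = R_sorted[i], S_sorted[j]
        if r < s:
            i = bisect.bisect_left(R_sorted, s, i)   # jump forward in R
        elif s < r:
            j = bisect.bisect_left(S_sorted, r, j)   # jump forward in S
        else:
            i_end = bisect.bisect_right(R_sorted, r, i)
            j_end = bisect.bisect_right(S_sorted, s, j)
            # Emit one pair per combination of equal-valued tuples.
            out.extend((r, s) for _ in range((i_end - i) * (j_end - j)))
            i, j = i_end, j_end
    return out

pairs = zigzag_join([1, 4, 4, 9, 12], [2, 4, 7, 9, 9, 20])
```

Neither input is scanned more than once, and runs of non-joining tuples on either side are skipped in a single jump.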

Example of AIM

[Figure: the ADS TR on R (root RootR, internal nodes H, I, leaves r1-r6) and the ADS TS on S (root RootS, internal nodes A-G, leaves s1-s15), plotted on the join attribute R.a (S.a).]

VO: root signature of TS, root signature of TR, r1
1. hs1, s2, s3, s4
2. r2
3. hs5, hs6, hC, s10, s11, s12
4. r3, r4
5. r5
6. hs13, hs14, s15
7. hr6

ASM

- Idea
  - Sort-Merge-Join: sort at the server, merge at the client
- Query processing
  - Requires no ADS
  - Transmit both R and S in their verifiable order
  - Sort R and S respectively on the join attribute
  - Transmit the sort orders of R and S to the client
  - Transmit bitmaps BR and BS to the client, indicating the tuples with join partners
- Verification
  - Correctness of the base relations / sort orders / the bitmaps
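The division of labour in ASM can be sketched as below: the server sorts and prepares the bitmaps, and the client runs the merge step. Assumptions made for illustration: relations are lists of join values, sort orders are lists of 0-based positions, and each bitmap is indexed in sort order. The function names are illustrative, and the signature checks on the base relations are omitted.

```python
def asm_server(R, S):
    """Sort both relations and mark the tuples that have join partners."""
    order_r = sorted(range(len(R)), key=lambda i: R[i])
    order_s = sorted(range(len(S)), key=lambda i: S[i])
    r_vals, s_vals = set(R), set(S)
    bitmap_r = [R[i] in s_vals for i in order_r]
    bitmap_s = [S[i] in r_vals for i in order_s]
    return order_r, order_s, bitmap_r, bitmap_s

def asm_client_merge(R, S, order_r, order_s):
    """Merge step of a sort-merge join, driven by the received sort orders."""
    out, i, j = [], 0, 0
    while i < len(order_r) and j < len(order_s):
        r, s = R[order_r[i]], S[order_s[j]]
        if r < s:
            i += 1
        elif s < r:
            j += 1
        else:
            k = j
            while k < len(order_s) and S[order_s[k]] == r:
                out.append((order_r[i], order_s[k]))   # (position in R, position in S)
                k += 1
            i += 1   # j stays put so that duplicate r values rescan the same run
    return out

R, S = [7, 3, 5], [5, 9, 3, 3]
order_r, order_s, bitmap_r, bitmap_s = asm_server(R, S)
result = asm_client_merge(R, S, order_r, order_s)
```

The client can cross-check the bitmaps against its own merge result: a sorted position is marked exactly when it appears in some output pair.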


Complex Query Authentication

- Multi-way joins
- Selection-Projection-Join queries

Multi-Way Join

- Build a tree of binary join operators
- m-ASM / m-AISM / m-AIM optimized for multi-way joins
- Example: R ⋈ S ⋈ T
  - Op 1: m-AIM computes R ⋈ S and produces VO(R ⋈ S)
  - Op 2: m-AISM computes (R ⋈ S) ⋈ T and produces VO(R ⋈ S ⋈ T)
- A specialized algorithm AST applies when all relations are joined on the same attribute
  - One single VO

Example of m-AIM and m-AISM

[Figure: Op 1 joins R (ADS with root RootR, leaves r1-r9) and S (ADS with root RootS, leaves s1-s5) on R.a/S.a; Op 2 joins the intermediate result R ⋈ S, with entries [1]-[4], and T (ADS with root RootT, leaves t1-t3) on S.b/T.b.]

VO(R ⋈ S): {root signatures of TR and TS, s1, s2; hA, r4, r5, r6; s3; s4; s5; hC}

VO(R ⋈ S ⋈ T): {root signature of TT, [1], t1, t2; [2]; [3]; [4]; ht3}

Example of AST

[Figure: relations R (ADS with root RootR, internal nodes A-C, leaves r1-r9), S (tuples S[1]-S[4]), and T (ADS with root RootT, leaves t1-t3), all plotted on the common join attribute R.a/S.a/T.a.]

VO: {root signatures of TR and TT, signature of relation S, bitmap BS = “1000”, s1-s4 in a verifiable order, S[1], hr1, r2, r3, r4; t1, t2; S[2]; S[3]; S[4]; hr5, hr6, hC; ht3}

Selection-Projection-Join Query

[Figure: three query plan trees over Purchase and Customer, each combining the selections σ1: quantity>100 and σ2: city=“New York” with the cid operator.]

- Selection: use the m- algorithms for joins
- Projection: build a Merkle Hash Tree for each record
- Query optimization
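The projection idea can be illustrated with a deliberately simplified per-record structure: instead of a binary Merkle tree, the sketch hashes each attribute and then hashes the concatenation, which is a one-level tree. Projected-out attributes are replaced by their hashes, so the client can still recompute the record digest that the owner's signature (omitted here) would cover. The function names and the tuple encoding are illustrative assumptions.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def record_digest(attrs):
    """Digest of a full record: hash of the concatenated attribute hashes."""
    return H(b"".join(H(a) for a in attrs))

def project(attrs, keep):
    """Server side: send kept attributes as values, the others as hashes."""
    return [("v", a) if i in keep else ("h", H(a)) for i, a in enumerate(attrs)]

def digest_from_projection(items):
    """Client side: recompute the record digest from the projected record."""
    return H(b"".join(H(x) if kind == "v" else x for kind, x in items))

record = [b"c1", b"Tom", b"New York"]
projected = project(record, keep={0})     # keep only cid
```

The client sees only cid in the clear, yet `digest_from_projection(projected)` equals `record_digest(record)`, so the projected record can still be tied to the signed data.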

Experiments

- Three synthetic relations
  - R(a1, a2)
  - S(a1, a2, b1, b2)
  - T(b1, b2)
- Queries
  - R ⋈_a1 S
  - R ⋈_a2 S
  - (R ⋈_a1 S) ⋈_b1 T
  - (R ⋈_a2 S) ⋈_b2 T
- Foreign keys
  - S.a1 references R.a1
  - S.b1 references T.b1
- Parameters
  - Tuple size
  - Cardinality of |S|

Repeatability and Workability

- We participated in the ACM SIGMOD 2009 Repeatability & Workability Evaluation (cf. http://homepages.cwi.nl/~manegold/SIGMOD-2009-RWE/).
- The reviewers were able to repeat all the experiments presented in our paper, yielding results that match the ones published in our paper, except for insignificant and expected variation due to randomness and/or hardware/software differences.
- The detailed reports will shortly be made publicly available by ACM SIGMOD.

Evaluations of AINL

Tuple size (bytes)   32    64    128   256    512
CVO (Gbytes)         8.9   9.0   9.2   9.6    10.3
CClient (seconds)    205   207   210   214    219
CDSP (seconds)       262   271   429   1728   4603

|R| / |S|            0.1   0.5   1     2      5
CVO (Gbytes)         7.8   8.9   9.2   9.5    9.7
CClient (seconds)    196   205   210   218    223
CDSP (seconds)       296   311   429   540    647

Binary Join: Effect of Tuple Size

[Figure: VO size (Mbytes), total running time for the client (seconds), and total running time for the DSP (seconds) as functions of tuple size (32, 64, 128, 256, 512 bytes), comparing NAI, AISM, ASM, AIM, and optimal.]

Binary Join: Effect of |R| / |S|

[Figure: VO size (Mbytes), total running time for the client (seconds), and total running time for the DSP (seconds) as functions of |R| / |S| (0.1, 0.5, 1, 2, 5), comparing NAI, AISM, ASM, AIM, and optimal.]

Multi-way Join: Effect of Tuple Size

[Figure: VO size (Mbytes), total running time for the client (seconds), and total running time for the DSP (seconds) as functions of tuple size (32, 64, 128, 256, 512), comparing NAI, m-AISM+m-AISM, m-ASM+m-ASM, m-AIM+m-AISM, and optimal.]

Multi-way Join: Effect of |S| / |R|

[Figure: VO size (Mbytes), total running time for the client (seconds), and total running time for the DSP (seconds) as functions of |S| / |R| (0.1, 0.5, 1, 2, 5), comparing NAI, m-AISM+m-AISM, m-ASM+m-ASM, m-AIM+m-AISM, and optimal.]

Conclusion

- Binary join authentication
  - AISM: authenticated structure on one relation
  - AIM: authenticated structures on both relations
  - ASM: no authenticated structure
- Complex query authentication
  - Multi-way join: eliminate unnecessary intermediate VO elements
  - Selection-projection-join query
- Future work
  - Authenticated structures specialized to joins
  - Hash join instead of SMJ

Thank you!

Questions?
