Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Feifei Li, Florida State University; Ke Yi, Hong Kong University of Science & Technology; Marios Hadjieleftheriou, AT&T Labs Research; George Kollios, Boston University


Page 1: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Feifei Li, Florida State University
Ke Yi, Hong Kong University of Science & Technology
Marios Hadjieleftheriou, AT&T Labs Research
George Kollios, Boston University

Page 2: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Outsourced stream model: stock trading monitoring

Provider: a stock broker

Servers (e.g., Bloomberg)

Clients

Register queries (Q): sliding window queries and/or one-shot queries

Page 3: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Data Publishing Model [HIM02]

• Owner: publishes data
• Servers: host (or monitor) the data and provide query services
• Clients: query the owner's data through servers

Diagram: owner → servers → clients

H. Hacigumus, B. R. Iyer, and S. Mehrotra, ICDE02

Page 4: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Information Security Issues

The third-party (server) cannot be trusted

Lazy server

Malicious intent

Compromised equipment

Unintentional errors (e.g. bugs)

Page 5: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Problem 1: Injection

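Select * from T where 5<A<11

Table T (the owner and the server hold identical copies):

       A     B
r1     …
…      …
ri-1   4
ri     7
ri+1   9
ri+2   11

The server returns 7, 8, 9 to the client: the value 8 has been injected and does not exist in the owner's table.

Diagram: owner → server → client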

Page 6: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Problem 2: Drop

Select * from T where 5<A<11

Table T (the owner and the server hold identical copies):

       A     B
r1     …
…      …
ri-1   4
ri     7
ri+1   9
ri+2   11

The server returns only 7 to the client: the tuple ri+1 (A = 9) has been dropped.

Diagram: owner → server → client

Page 7: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Query Authentication: Goals

• Query correctness: results do exist in the owner's database
• Query completeness: no records have been omitted from the result

Page 8: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

General Approach

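Diagram: owner → servers → clients

• The owner builds authenticated structures over its data (table T with tuples r1, …, ri-1 = 4, ri = 7, …) and passes them to the servers.
• The servers return the query results to the clients together with a verification object (VO).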

Page 9: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Sliding Window Query

Time-based window:

SELECT SUM(stock_price) FROM Stock_trace
WHERE stock_name = A in last 5 Minutes
SLIDES every 1 minute

Tuple-based window:

SELECT SUM(stock_price) FROM Stock_trace
WHERE stock_name = A in last 100 Trades
SLIDES every 1 trade

• This talk concentrates on the tuple-based window; the generalization to the time-based window is in the paper.
• For a tuple-based window, the timestamp is simply the arrival id of the tuple.

Figure: a stream of (price, name) tuples (2, A), (2, B), (5, A), (9, C), (4, A), (8, A), (7, C), (7, B), …, (2, D). The window covers the most recent n tuples xt-n+1, …, xt, with xt-n just outside the window and xt+1 the next arrival.
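The tuple-based window semantics can be sketched in a few lines of Python. This is only an illustration of the query itself, not of the authentication scheme; the function name `sliding_sums` and the sample stream are hypothetical.

```python
from collections import deque

def sliding_sums(stream, name, n):
    """Yield SUM(price) over the most recent n tuples whose name matches,
    re-evaluated after every arriving tuple (the window slides by 1 tuple)."""
    window = deque()   # the last n (price, name) tuples
    total = 0          # running sum of matching prices inside the window
    for price, stock in stream:
        window.append((price, stock))
        if stock == name:
            total += price
        if len(window) > n:                      # expire the oldest tuple
            old_price, old_stock = window.popleft()
            if old_stock == name:
                total -= old_price
        yield total

stream = [(2, "A"), (2, "B"), (5, "A"), (9, "C"), (4, "A"), (8, "A")]
print(list(sliding_sums(stream, "A", n=3)))      # [2, 2, 7, 5, 9, 12]
```

Each arrival changes the answer by at most one added and one expired tuple, which is why incremental maintenance of the window is attractive.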

Page 10: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

One Shot Query

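SELECT SUM(stock_price) FROM Stock_trace
WHERE stock_name = A in last 100 Trades

Tuple-based window

Figure: the stream (2, A), (2, B), (5, A), (9, C), (4, A), (8, A), (7, C), (7, B), …; the query is evaluated once over the recent n tuples, marked between xt-n and xt.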

Page 11: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Merkle Hash Tree[M89]-Amortizing Signature Cost

Messages:  m1   m2   m3   m4   m5   m6   m7   m8
Leaves:    h1   h2   h3   h4   h5   h6   h7   h8
Level 1:   h12  h34  h56  h78          (e.g., h12=H(h1|h2))
Level 2:   h1..4  h5..8
Root:      h1..8, signed by the owner: σ = Sign(h1..8, SK)

Verification path shown in the figure: m5, m6 → h5, h6 → h56; h56 with h78 → h5..8; h5..8 with h1..4 → h1..8; then check Ver(h1..8, σ, PK) = valid?

• Collision-resistant hash function: any change in the tree leads to a different hash value for the root.
• Digital signature of the root: no one except the owner could produce the signature.
• The hash function is publicly known.
• A single signature signs many messages.

[M89] R. C. Merkle. CRYPTO, 1989
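A minimal sketch of the hashing side of this slide, assuming SHA-1 as H and simple byte concatenation for h1|h2. The helper names are hypothetical, and the signature step is left out: the owner would sign `root`, and the client would check that signature instead of comparing against a trusted copy.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def build_levels(messages):
    """Return all tree levels, leaves first; len(messages) must be a power of two."""
    level = [H(m) for m in messages]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof_for(levels, index):
    """Sibling hashes along the path from leaf `index` up to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof

def root_from_proof(message, proof):
    """Recompute the root from one message and its sibling hashes."""
    h = H(message)
    for sibling, sibling_is_left in proof:
        h = H(sibling + h) if sibling_is_left else H(h + sibling)
    return h

messages = [f"m{i}".encode() for i in range(1, 9)]
levels = build_levels(messages)
root = levels[-1][0]              # h1..8, the value the owner would sign
proof = proof_for(levels, 4)      # for m5: siblings h6, h78, h1..4
print(root_from_proof(b"m5", proof) == root)   # True
```

The proof for one of eight messages carries three hashes, one per tree level, so a single signature on the root covers all messages.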

Page 12: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Extends to Range Query: f=2 (f is the fanout)

Sorted values:  1    2    3    4    5    6    9    12
Leaves:         h1   h2   h3   h4   h5   h6   h7   h8
Level 1:        h12  h34  h56  h78
Level 2:        h1..4  h5..8
Root:           h1..8, with Sign(h1..8, SK)

Query q: Select * from T where 5<A<11

• Left boundary LB(q) = 5, right boundary RB(q) = 12
• VO: 5, 12, h1..4, σ

Page 13: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Client Side Verification

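Select * from T where 5<A<11

VO: 5, 12, h1..4, σ
Query results: 6, 9

Reconstruct the query subtree:

Values:   5    6    9    12
Leaves:   h5   h6   h7   h8
Level 1:  h56  h78
Level 2:  h5..8, combined with h1..4 from the VO (the subtree below h1..4 is unknown to the client)
Root:     h1..8

Valid? Ver(h1..8, PK, σ)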
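The client-side check for this example can be sketched as follows, assuming SHA-1 as H and a comparison against a trusted root in place of the signature check Ver. All function names are hypothetical, and the layout is specific to the eight-leaf tree on the slide.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def leaf(v: int) -> bytes:
    return H(str(v).encode())

def root_of(hashes):
    """Fold a list of hashes pairwise up to a single root."""
    while len(hashes) > 1:
        paired = [H(hashes[i] + hashes[i + 1]) for i in range(0, len(hashes) - 1, 2)]
        if len(hashes) % 2:
            paired.append(hashes[-1])      # an odd node is promoted unchanged
        hashes = paired
    return hashes[0]

def verify_range(lo, hi, lb, results, rb, h_left, trusted_root):
    """Check an answer to lo < A < hi: boundaries enclose the range, values are
    sorted and inside the range, and the rebuilt root matches the trusted one."""
    run = [lb] + results + [rb]
    in_order = all(a < b for a, b in zip(run, run[1:]))
    bounded = lb <= lo and rb >= hi and all(lo < r < hi for r in results)
    rebuilt = H(h_left + root_of([leaf(v) for v in run]))
    return in_order and bounded and rebuilt == trusted_root

values = [1, 2, 3, 4, 5, 6, 9, 12]
trusted_root = root_of([leaf(v) for v in values])   # owner side: h1..8
h_left = root_of([leaf(v) for v in values[:4]])     # server side: h1..4 in the VO
print(verify_range(5, 11, 5, [6, 9], 12, h_left, trusted_root))   # True
```

An injected value (6, 8, 9) or a dropped one (6 alone) changes the rebuilt root, so the same call returns False in both cases.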

Page 14: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Solution Overview

• Sign every tuple (with query attribute(s) and timestamp):
  - Expensive update cost for the data provider
  - Expensive communication cost between server and clients, as the VO size is large
  - But it provides timely answers on a per-tuple basis
• Amortize the signing cost by “proof-infusing” on a group of tuples: the response is delayed, which can often be tolerated.
• A query with d query attributes is a query in d+1 dimensions.
• N: maximum window size; n: window size for a particular query; b: the delay

Page 15: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Tumbling Merkle Tree (TM-tree)

Figure: along the time axis, consecutive groups of b tuples, each with its own tree.

• A Merkle binary search tree is built for every b tuples.
• Each tree is signed as Sign(hroot|t1|tb), where ti is the timestamp of the i-th tuple.

Page 16: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

TM-tree Continues

Figure: tuples plotted by time against query attribute A. For each group of b tuples:

1. Sort by A
2. Build Merkle tree
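The per-group construction can be sketched as below, assuming SHA-1 as H and tuples given as (timestamp, value) pairs. The function names and the byte encoding of hroot|t1|tb are hypothetical; the sketch stops at producing the message the owner would sign.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def merkle_root(hashes):
    """Fold a list of hashes pairwise up to a single root."""
    while len(hashes) > 1:
        paired = [H(hashes[i] + hashes[i + 1]) for i in range(0, len(hashes) - 1, 2)]
        if len(hashes) % 2:
            paired.append(hashes[-1])      # an odd node is promoted unchanged
        hashes = paired
    return hashes[0]

def tumbling_messages(stream, b):
    """For every b consecutive (timestamp, value) tuples, yield (t1, tb, message),
    where message = hroot|t1|tb is what the owner would sign."""
    for start in range(0, len(stream) - b + 1, b):
        batch = stream[start:start + b]
        t1, tb = batch[0][0], batch[-1][0]               # boundary timestamps
        by_value = sorted(batch, key=lambda tv: tv[1])   # sort by query attribute
        hroot = merkle_root([H(f"{t}:{v}".encode()) for t, v in by_value])
        yield t1, tb, hroot + f"|{t1}|{tb}".encode()

stream = [(1, 7), (2, 4), (3, 9), (4, 11), (5, 6), (6, 2), (7, 8), (8, 5)]
out = list(tumbling_messages(stream, b=4))
print([(t1, tb) for t1, tb, _ in out])   # [(1, 4), (5, 8)]
```

One signature per b tuples is the amortization the deck calls proof-infusing; the cost is that a tuple is only provable once its group is complete.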

Page 17: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Sliding window query on the TM-tree

1. Initialization: query n/b trees
2. Window slides
3. Incremental update: query four boundary trees

Figure legend: tuples to be removed from results; tuples to be added to results

Page 18: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Query the TM-tree

Figure: query rectangle Q plotted against the Time and Value axes; the query shifts by b. Legend: sent to clients; removed from results; added to results; false positives.

Page 19: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Correctness and Completeness

• Correctness: guaranteed by each individual Merkle tree.
• Completeness within each small Merkle tree: guaranteed by the range-query verification covered in the first part of this talk.
• Overall completeness: check that the results were obtained by querying consecutive trees that fall within the query range on the time dimension, and that these trees completely cover the query range on the time dimension.
• This is possible because the timestamps of the two boundary tuples are signed in each tree (hence the server has to include these timestamps in the VO).
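The overall completeness check can be sketched as a coverage test over the signed boundary timestamps. This assumes tuple-based windows, where timestamps are consecutive arrival ids; the function name `covers_window` is hypothetical.

```python
def covers_window(tree_spans, q_start, q_end):
    """tree_spans: the signed (t1, tb) pair of every tree reported in the VO.
    Returns True if the trees are consecutive and cover [q_start, q_end]."""
    spans = sorted(tree_spans)
    if not spans or spans[0][0] > q_start or spans[-1][1] < q_end:
        return False                      # window sticks out at one end
    # Consecutive trees: each tree starts right after the previous one ends.
    return all(prev_end + 1 == next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))

print(covers_window([(1, 4), (5, 8), (9, 12)], 3, 10))   # True
print(covers_window([(1, 4), (9, 12)], 3, 10))           # False, a tree is missing
```

Because t1 and tb are inside the signed message of each tree, a server cannot hide a whole tree without leaving a visible gap between the spans.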

Page 20: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Limitation of TM-tree

• Only supports one-dimensional queries
• False positives lead to a large VO size, especially when each tuple has non-trivial size.

Page 21: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Merkle kd tree (Mkd-tree)

• To get rid of false positives, we need a multi-dimensional indexing structure.
• KD-tree: an excellent candidate, with bounded query performance of $O(\sqrt{b})$ and $O(b \log b)$ cost to bulk-load.
• A space-partitioning structure: partition along each dimension in turn.

Page 22: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Mkd-tree and TMkd-tree

• Incorporating the Merkle tree into the KD-tree:
  - Leaf node: H(p), where p is the point contained in this node
  - Index node u with children v, w and dividing line lu: H(hv|hw|lu)
• Tumbling Merkle kd-tree (TMkd-tree):
  - Same idea as the TM-tree, but each small tree is an Mkd-tree.
  - Boundary trees no longer introduce false positives!
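The two hash rules can be sketched on a bulk-loaded 2-d kd-tree. This assumes SHA-1 as H, a median split, and an ad hoc byte encoding of the dividing line; `mkd_hash` is a hypothetical name and the input must be non-empty.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def mkd_hash(points, depth=0):
    """Return the root hash of a Merkle kd-tree bulk-loaded over 2-d points."""
    if len(points) == 1:
        return H(repr(points[0]).encode())            # leaf node: H(p)
    axis = depth % 2                                  # partition along each dimension in turn
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    line = pts[mid][axis]                             # dividing line l_u
    hv = mkd_hash(pts[:mid], depth + 1)               # left child v
    hw = mkd_hash(pts[mid:], depth + 1)               # right child w
    return H(hv + hw + f"|{axis}:{line}".encode())    # index node: H(hv|hw|lu)

points = [(1, 5), (2, 3), (4, 8), (7, 1)]             # e.g. (timestamp, value)
print(mkd_hash(points).hex())
```

Binding the dividing line into each index hash lets a client confirm that a pruned subtree really lies outside the query rectangle, which is what removes the false positives.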

Page 23: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Is this good enough?

• Tumbling trees are good for maintaining updates to sliding window queries.
• Both have space linear in N, log b update cost, and a bounded query cost [the query cost expressions, in terms of b and k, are garbled in the transcript].
• But they are expensive for answering one-shot queries (or the initialization of sliding window queries): a query with window size n has to query n/b trees, which is linear in n and could be expensive for large values of n.

Page 24: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Dyadic Merkle kd-tree (DMkd-tree): 1D queries

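Figure: a dyadic hierarchy of trees over the stream. The lowest level holds trees over b tuples each, the next level trees over 2b tuples, then 4b, and so on, up to N+b. Labels in the figure: Merkle tree; Mkd-tree; query Q; discarded.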

Page 25: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Exponential Merkle kd-tree (EMkd-tree):Multi-dimensional queries

Figure: trees T0, T1, …, Tl and T’0, T’1, …, T’l over b, 2b, 4b, … tuples. Legend: materialized Mkd-tree; non-materialized Mkd-tree. The figure also shows a new T0 being formed as further groups of b tuples arrive, and a query Q.

Page 26: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Some Experiments

• We use real streams:
  - World Cup data (WC)
  - IP traces from the AT&T network (IP)
• We perform the following queries:
  - WC: the query attribute is the response size
  - IP: the query attribute is the packet size
• Hardware: Linux machine with a 2.8GHz Intel Pentium 4 CPU

Page 27: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Tumbling trees: update cost

1. b=1000 is a sweet spot
2. This delay is small: in real streams it spans less than one or two seconds

Page 28: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Tumbling trees: size

Both have linear size (in the number of tuples covered by the maximum window size N)

Page 29: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Query cost per sliding period, b=1,000: fixed sliding period as b

A linear scan of the TM-tree at the leaf level results in locality, which greatly improves its performance

Page 30: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

VO size per sliding period, b=1,000: fixed sliding period as b

TM-tree incurs roughly 4γb false positives

Page 31: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

DM-kd Tree, EM-kd Tree Update Cost

Page 32: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

DMkd, EMkd trees: size

Page 33: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

One Shot Query Cost

Page 34: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

One Shot Query: VO size

Page 35: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Summary

• All trees support aggregations.
• TM-tree and DMkd-tree support only one-dimensional queries.
• TMkd-tree and EMkd-tree support multi-dimensional queries.
• Tumbling trees are good for maintaining updates to sliding window queries, while DMkd-tree and EMkd-tree are good for one-shot queries.

Page 36: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Thanks!

Questions

Page 37: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Intuition on Authenticating Aggregation Query

Query q

• Naïve solution: answer it as a range selection query. Linear authentication cost k (k tuples in the range)!
• Find the canonical cover: authentication cost log k!
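The canonical cover idea can be illustrated on a complete binary tree over eight leaves: a range is tiled by a small number of maximal subtrees, and if every node stores the aggregate of its subtree, only those nodes need to be authenticated. The function name and the half-open interval convention are assumptions of this sketch.

```python
def canonical_cover(lo, hi, node_lo=0, node_hi=8):
    """Return the maximal tree nodes, as leaf intervals [a, b), that exactly
    tile the query range [lo, hi) in a complete binary tree over
    leaves [node_lo, node_hi)."""
    if hi <= node_lo or node_hi <= lo:
        return []                              # node is disjoint from the query
    if lo <= node_lo and node_hi <= hi:
        return [(node_lo, node_hi)]            # node is fully inside the query
    mid = (node_lo + node_hi) // 2
    return (canonical_cover(lo, hi, node_lo, mid) +
            canonical_cover(lo, hi, mid, node_hi))

print(canonical_cover(1, 7))   # [(1, 2), (2, 4), (4, 6), (6, 7)]
```

Six leaves are covered by four nodes here; in general the cover has at most about 2 log k nodes for k tuples, which is where the logarithmic authentication cost comes from.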

Page 38: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Public key digital signature schemes

• KeyGen produces the key pair (SK, PK).
• Sender: computes σ = Sign(m, SK) and sends (m, σ) over an insecure channel.
• Recipient: checks Ver(m, PK, σ) = valid?

Page 39: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Merkle Tree: Verifying A Single Value

SELECT Airline FROM Flights WHERE price = $600

Tuples:   t1    t2    t3    t4
Price:    $250  $320  $410  $600
Leaves:               h3    h4
Level 1:  h12         h34
Root:     hroot

Search keys shown at the index nodes: 320, 410, 600

Query result: t4
Verification Object: h3, h12 (sibling hash values along the query path)
Check: Ver(hroot, σ, PK) = valid?

Merkle trees were applied to database authentication in [DGMS03]: P. Devanbu, M. Gertz, C. Martel, and S. G. Stubblebine. Journal of Computer Security, 2003

Page 40: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Reduce S/C communication Cost [MNT04]

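Aggregation signature: Condensed RSA

• Each message mi is signed individually, giving pairs (m1, σ1), …, (mk, σk).
• For results m1, …, mk, a single signature σ = combine(σ1, …, σk) is returned.
• Overhead: computation cost of modular multiplication with a large modulus, close to 100 μs.

[MNT04] E. Mykletun, M. Narasimha, and G. Tsudik. NDSS'04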

Page 41: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Condensed RSA [MNT04]

KeyGen:

• Choose two large primes p and q, $p \neq q$
• Set $n = pq$
• Compute $\varphi(n) = (p-1)(q-1)$
• Choose e s.t. $1 < e < \varphi(n)$ and e is coprime to $\varphi(n)$
• Compute d s.t. $de \equiv 1 \pmod{\varphi(n)}$

(d, n) is the secret key and (e, n) is the public key

Page 42: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Condensed RSA [MNT04]

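Sign:
• Given $m_i$, compute $h_i = H(m_i)$
• Compute $\sigma_i = h_i^{d} \bmod n$
• Compute $\sigma = \prod_{i=1}^{k} \sigma_i \bmod n$

Verify:
• Given $m_i$, compute $h_i = H(m_i)$
• Check that $\sigma^{e} \equiv \prod_{i=1}^{k} h_i \pmod{n}$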
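A toy walk-through of these steps, using the textbook parameters p = 61, q = 53, e = 17. The primes are far too small for any real use, reducing the SHA-1 digest modulo n is an assumption of this sketch, and all function names are hypothetical.

```python
import hashlib
from math import prod

# KeyGen with toy primes (insecure, for illustration only)
p, q = 61, 53
n = p * q                    # 3233
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # coprime to phi
d = pow(e, -1, phi)          # modular inverse, needs Python 3.8+; d = 2753

def h(m: bytes) -> int:
    """Hash a message and reduce it into Z_n (assumption of this sketch)."""
    return int.from_bytes(hashlib.sha1(m).digest(), "big") % n

def sign(m: bytes) -> int:
    return pow(h(m), d, n)                 # sigma_i = h_i^d mod n

def condense(sigs) -> int:
    return prod(sigs) % n                  # sigma = product of sigma_i mod n

def verify(msgs, sigma: int) -> bool:
    return pow(sigma, e, n) == prod(h(m) for m in msgs) % n

msgs = [b"m1", b"m2", b"m3"]
sigma = condense([sign(m) for m in msgs])
print(verify(msgs, sigma))                 # True
```

The server ships one condensed value instead of k signatures, at the price of one modular multiplication per result tuple.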

Page 43: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

DMkd-tree vs. EMkd-tree

Cost metrics compared for the DMkd-tree and the EMkd-tree:

• Space
• Update cost
• Signing operations
• One-shot query cost
• One-shot VO size

[The table cells, asymptotic expressions in N, n, b, and k, are garbled in the transcript.]

Page 44: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Tool 1: Collision-Resistant Hash Functions

• Example: SHA1 maps a variable-size input to a 20-byte output (any newer replacement can also be plugged in)
• Observations:
  - Computation cost: 2-3 μs (for up to 500 bytes of input)
  - Storage cost: 20 bytes
• Collision resistance: it is hard to find inputs x1 ≠ x2 with H(x1) = H(x2)
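The fixed 20-byte output is easy to confirm with Python's standard `hashlib`; the helper name `digest` is hypothetical. Swapping in a newer function such as SHA-256 only changes the constructor and the digest length.

```python
import hashlib

def digest(data: bytes) -> bytes:
    """SHA-1 digest: variable-size input, 20-byte output."""
    return hashlib.sha1(data).digest()

small = digest(b"x1")
large = digest(b"x2" * 250)      # a 500-byte input
print(len(small), len(large))    # 20 20
```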

Page 45: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Tool 2: Public Key Digital Signature Schemes

• Formally defined by [GMR88]; a valid signature guarantees that:
  - The message has not been changed in any way
  - The message is indeed from the sender (corresponding to the public key)
  - No one except the secret key owner could produce a signature
• One such scheme: RSA [RSA78]
• Observations:
  - Computation cost: about 3-4 ms for signing and more than 100 μs for verifying
  - Storage cost: 128 bytes

[GMR88] S. Goldwasser, S. Micali, and R. Rivest. SIAM Journal on Computing, 1988
[RSA78] R. Rivest, A. Shamir, and L. Adleman. Commun. ACM, 1978

Page 46: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Problem 3: Omission

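Select * from T where 5<A<11

Table T before the update:

       A     B
r1     …
…      …
ri-1   4
ri     7
ri+1   9
ri+2   11

Table T after the update:

       A     B
r1     …
…      …
ri-1   4
ri     6     (inserted by the update)
ri     7
ri+1   9
ri+2   11

The server returns 7, 9 to the client: the updated tuple with A = 6 is omitted.

Diagram: owner → server → client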

Page 47: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Roadmap

• Solution overview
• Efficient authentication of sliding window queries when the window slides
• Efficient authentication of one-shot queries (also the sliding window query initialization)
• Experiments
• Conclusion

Page 49: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

Query cost per sliding period, b=1,000: fixed query selectivity as 0.1

Query up to 2·(sliding period)/b + 2 boundary trees

Page 50: Proof-Infused Streams: Authenticating Sliding Window Queries on Data Streams

VO size per sliding period, b=1,000: fixed query selectivity as 0.1

TM-tree incurs roughly (2·(sliding period)/b + 2)·γ·b false positives