University of Toronto - MSRGmsrg.org/publications/presentations/2012/moMW12-Location... · 2012....


Application Scenarios Intuitions Evaluation Conclusions

Location-based Matching in Publish/Subscribe Revisited

Mohammad Sadoghi and Hans-Arno Jacobsen

University of Toronto

December 2012

Mohammad Sadoghi (University of Toronto) Location-based Matching Middleware 2012 1 / 16


Computational Advertising (A Billion-dollar Industry)


Figure: computational advertising modeled as publish/subscribe. Advertisers (Sony, Sears, Amazon) submit advertising campaigns to the broker as subscriptions; online users' profiles and clickstreams arrive at the broker as events; the broker returns the (most relevant) ads.

Example event attributes: car=BMW, year=2008, model=X3, (latitude=43.6481) wt=0.5, (longitude=-79.4042) wt=0.5, (age=25) wt=0.1

Advertisement (BE): (latitude > 42) wt=0.6, (longitude > -80) wt=0.6, (age < 32) wt=0.2, (price = 150) wt=0.1

“BMW X3 2008”: (price < 235) wt=0.2


Application Scenarios

1 Computational advertising (targeted advertising)

2 Computational finance (algorithmic trading)

3 Intrusion detection (deep packet inspection)

4 Real-time data analysis (data analytics)

5 Emerging mobile applications in co-spaces (location-based services)

Problem Statement

To continuously evaluate a set of predefined patterns/specifications (subscriptions) over an incoming event stream.
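The problem statement can be illustrated with the simplest possible matcher, a sequential scan over all subscriptions. This is only a minimal sketch and not the indexing approach of the talk: the tuple layout for predicates, the function names `matches` and `scan_match`, and the subscription ids are assumptions made for illustration; the event values are borrowed from the advertising example.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Assumed representation: a predicate is (attribute, operator, value) and a
# subscription is a conjunction of predicates. This layout is hypothetical.
Predicate = Tuple[str, str, object]
OPS: Dict[str, Callable[[object, object], bool]] = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}

def matches(subscription: List[Predicate], event: Dict[str, object]) -> bool:
    """True if every predicate of the subscription is satisfied by the event."""
    for attr, op, value in subscription:
        if attr not in event or not OPS[op](event[attr], value):
            return False
    return True

def scan_match(subscriptions: Dict[str, List[Predicate]],
               event: Dict[str, object]) -> List[str]:
    """Sequential scan: test each subscription against the event in turn."""
    return [sid for sid, preds in subscriptions.items() if matches(preds, event)]

# Usage with values taken from the advertising example (ids are made up).
subscriptions = {
    "ad-1": [("latitude", ">", 42), ("longitude", ">", -80), ("age", "<", 32)],
    "ad-2": [("age", ">", 40)],
}
event = {"car": "BMW", "year": 2008, "model": "X3",
         "latitude": 43.6481, "longitude": -79.4042, "age": 25}
print(scan_match(subscriptions, event))
```

A scan touches every subscription for every event, which is the cost that index structures for this problem aim to avoid.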


Challenges Derived from Application Scenarios

Key matching problem challenges addressed in this work

1 Retrieve only the most relevant subscriptions for a given event.

2 Handle subscriptions with expressive operators (over discrete/continuous domains) that impose conditions on only a few dimensions, resulting in a high degree of overlap among subscriptions.

3 Scale to large collections of subscriptions with thousands of dimensions.

4 Sustain high event matching rates in the presence of frequent insertions and deletions of subscriptions.

5 Adapt to skewed workload distributions (self-adjusting mechanism), i.e., avoid structure deterioration.


BE-Tree Family Core Design (Two-phase Space-cutting)

Figure: BE-Tree structure. Node types: partition-node (p), cluster-node (c), leaf-node (l). Other labels: p-directory, c-directory, Partitioning, Clustering.

The two-phase space-cutting technique consists of

1 space partitioning: a global structuring to determine the best splitting dimension

2 space clustering: a local structuring for each partition to determine the best grouping of expressions with respect to the expressions’ range of values of the chosen partition
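The two phases can be sketched in a few lines. This is a heavily simplified illustration under stated assumptions, not the BE-Tree implementation: expressions are assumed to be dictionaries of closed integer ranges, "best splitting dimension" is approximated by the attribute constrained by the most expressions, and clustering is approximated by repeatedly halving the attribute's domain until an expression's range no longer fits in one half. All names (`choose_partition_attribute`, `cluster`, `two_phase_cut`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Assumed representation: expression id -> {attribute: (low, high)}.
Expr = Dict[str, Tuple[int, int]]

def choose_partition_attribute(exprs: Dict[str, Expr]) -> str:
    """Phase 1 (partitioning): pick the attribute used by the most expressions.
    Ties are broken alphabetically so the result is deterministic."""
    counts = Counter(attr for e in exprs.values() for attr in e)
    return max(sorted(counts), key=counts.__getitem__)

def cluster(exprs: Dict[str, Expr], attr: str, lo: int, hi: int
            ) -> Dict[Tuple[int, int], List[str]]:
    """Phase 2 (clustering): put each expression into the smallest interval,
    obtained by halving [lo, hi], that fully encloses its range on `attr`."""
    buckets: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for eid, e in exprs.items():
        if attr not in e:
            continue
        a, b = e[attr]
        cur_lo, cur_hi = lo, hi
        while cur_hi - cur_lo > 1:
            mid = (cur_lo + cur_hi) // 2
            if b <= mid:            # range fits in the left half
                cur_hi = mid
            elif a > mid:           # range fits in the right half
                cur_lo = mid + 1
            else:                   # range straddles the midpoint: stop here
                break
        buckets[(cur_lo, cur_hi)].append(eid)
    return dict(buckets)

def two_phase_cut(exprs: Dict[str, Expr], lo: int, hi: int):
    """Run one round of partitioning followed by clustering."""
    attr = choose_partition_attribute(exprs)
    rest = [eid for eid, e in exprs.items() if attr not in e]
    return attr, cluster(exprs, attr, lo, hi), rest

exprs = {
    "s1": {"age": (10, 20), "price": (0, 50)},
    "s2": {"age": (60, 90)},
    "s3": {"price": (5, 5)},
}
print(two_phase_cut(exprs, 0, 99))
```

In this sketch, expressions that do not constrain the chosen attribute (here `s3`) are returned separately so that a later round could partition them on another attribute.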


Intuition Behind the Two-phase Space-cutting Technique

Figure labels: Subscription Space; Y-Axis; Space Partitioning; Space Clustering.


BE*-Tree Novel Features (Hierarchical Top-k Matching)

Figure: score-based comparison of K-index [VLDB'09] (1st-index, 2nd-index, ..., kth-index) and BE*-Tree.

BE*-Tree continuously refines the upper-bound score during the matching process.
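The general idea of pruning with score upper bounds can be sketched as follows. This is a generic threshold-pruning illustration, not BE*-Tree's hierarchical algorithm: the flat list of buckets, the scoring rule (sum of the weights of a fully matched subscription), and the names `score`, `upper_bound`, and `top_k` are all assumptions made for this example.

```python
import heapq
import operator

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}

def score(sub, event):
    """Assumed scoring rule: sum of predicate weights if all predicates hold,
    otherwise None. A predicate is (attribute, operator, value, weight)."""
    total = 0.0
    for attr, op, value, wt in sub:
        if attr not in event or not OPS[op](event[attr], value):
            return None
        total += wt
    return total

def upper_bound(bucket):
    """Largest score any subscription in the bucket could reach."""
    return max(sum(p[3] for p in sub) for _, sub in bucket)

def top_k(buckets, event, k):
    """Visit buckets by decreasing upper bound; stop once no remaining bucket
    can beat the current k-th best score. Returns (results, evaluated)."""
    heap = []       # min-heap of (score, subscription id)
    evaluated = 0   # number of subscriptions actually tested
    for bucket in sorted(buckets, key=upper_bound, reverse=True):
        if len(heap) == k and upper_bound(bucket) <= heap[0][0]:
            break   # every later bucket has an even smaller bound
        for sid, sub in bucket:
            evaluated += 1
            s = score(sub, event)
            if s is None:
                continue
            if len(heap) < k:
                heapq.heappush(heap, (s, sid))
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, sid))
    return sorted(heap, reverse=True), evaluated

buckets = [
    [("ad-1", [("latitude", ">", 42, 0.6), ("longitude", ">", -80, 0.6),
               ("age", "<", 32, 0.2)]),
     ("ad-2", [("age", "<", 32, 0.2)])],
    [("ad-3", [("age", ">", 18, 0.1)])],
]
event = {"latitude": 43.6481, "longitude": -79.4042, "age": 25}
print(top_k(buckets, event, k=1))
```

With `k=1`, the second bucket is never scanned in this example because its bound cannot exceed the best score already found; that skipped work is what a tighter upper bound buys.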


Experimental Evaluation

PC-based Algorithms

1 A-PCM: Parallel BE-Tree (Sadoghi, Jacobsen)

2 BE*: BE*-Tree (Sadoghi, Jacobsen. ICDE’12)

3 BE: BE-Tree (Sadoghi, Jacobsen. SIGMOD’11)

4 GR: IBM Gryphon (Aguilera et al., PODC’99)

5 P: Propagation Algorithm (Fabret et al. SIGMOD’01)

6 k-ind: k-index (Whang et al. VLDB’09)

7 SIFT: Counting Algorithm (Yan et al. TODS’94)

8 SCAN: Sequential Scan

GPU-based Algorithm

1 CLCB: Cuda Location-aware Content-Based Matcher (Cugola, Margara)


Workload Configurations

Table: Synthetic and Real Workload Properties (one row per workload)

Workload               Size      Number of Dim  Cardinality  Avg. Sub Size  Avg. Event Size  Pred Avg. Range Size %  % Equality Pred  Op Class  Match Prob %
Workload Size          100K-1M   400            48           7              15               12                      0.3              Med       1
Number of Dimensions   1M        50-1400        48           7              15               12                      0.3              Med       1
Dimension Cardinality  100K      400            48-150K      7              15               12                      0.3              Med       1
Predicate Selectivity  100K      400            48           7              15               6-50                    0.3              Med       1
Dimension Selectivity  100K      400            2-10         7              15               —                       1.0              Min       —
Sub/Event Size         100K      400            48           5-66           13-81            12                      0.3              Med       1
% Equality Pred        1M        400            48           7              15               12                      0.2-1.0          Med       1
Match Prob             1M        400            48           7              15               12                      0.3              Lo-Hi     0.01-9
DBLP (Author)          100-760K  677            26           8              8                —                       1.0              Min       —
DBLP (Title)           50-250K   677            26           35             35               —                       1.0              Min       —
Match Prob (Author)    400K      677            26           8              16               12                      0.3              Lo-Hi     0.01-9
Match Prob (Title)     150       677            26           30             43               12                      0.3              Lo-Hi     0.01-9
Location Workload      2.5M      100            65K          4              4                —                       0.25             Hi        ≈ 0
Parallel Matching      5M        128            48           7              15               12                      0.4              Hi        ≈ 0-1

The experimental results were verified by the SIGMOD’11 repeatability committee.

BEGen: Our comprehensive Boolean expression workload generator: http://msrg.org/datasets/BEGen.


Effect of Workload Size on Matching (Log Scale)

Table: Comparing BE-Tree (PC) and CLCB (GPU)

Workload Type     BE-Tree 1.1  BE-Tree 1.3  CLCB
without location  0.081 ms     0.045 ms     N/A
with location     0.144 ms     0.067 ms     0.306 ms
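The relative differences in the table are easier to read as ratios. The short calculation below uses only the latencies reported in the table; the helper name `speedup` is introduced here for illustration. With location, CLCB's 0.306 ms is roughly 2.1 times the BE-Tree 1.1 latency and roughly 4.6 times the BE-Tree 1.3 latency.

```python
# Latencies (ms per event) copied from the comparison table.
with_location = {"BE-Tree 1.1": 0.144, "BE-Tree 1.3": 0.067, "CLCB": 0.306}
without_location = {"BE-Tree 1.1": 0.081, "BE-Tree 1.3": 0.045}

def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

for name in ("BE-Tree 1.1", "BE-Tree 1.3"):
    vs_clcb = speedup(with_location["CLCB"], with_location[name])
    # Cost of adding location predicates, relative to the same BE-Tree version.
    overhead = with_location[name] / without_location[name]
    print(f"{name}: {vs_clcb:.2f}x vs CLCB, location overhead {overhead:.2f}x")
```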


Figure: Varying Workload Size. Panels: (c) Uniform: Workload Size; (d) Zipf: Workload Size. X-axis: Varying Number of Subscriptions (100K to 1M). Y-axis: Matching Time/Event (ms), log scale. Series: BE-B, BE, GR, P, k-Ind, SIFT, SCAN.


Effect of Matching Prob. on Top-k Matching (Log Scale)

Figure: Varying % of Matching Probability Predicates. Panels: (a) Zipf Workload (Sub=1M); (b) DBLP Author Workload (Sub=400K). X-axis: Varying Match (%) (0.001 to 9). Y-axis: Matching Time/Event (ms), log scale. Series (top-k algorithms): BE*, BE*(5), BE*(1), k-Ind, k-Ind(1).


Effect of Parallel Matching (Log Scale)

Figure: Varying % of Stream Similarity. Panels: (a) Matching Probability m = 1%; (b) Matching Probability ≈ 0%. X-axis: Varying Overlap Probability (0.1 to 1.0); Sub=5M. Y-axis: Avg. Throughput/Second, log scale. Series: BE-Tree, Bitmap, Parallel, A-PCM.


Conclusions

1 BE-Tree is a major step forward in addressing notable challenges such as scalability, expressiveness, dynamic construction and adaptation, by proposing a novel, self-adjusting index structure [SIGMOD’11].

2 BE-Tree also solves the problem of location-based matching (contrary to the claim that a specialized algorithm is a must for location-based matching) [SIGMOD’11].

3 BE-Tree provably outperforms existing prominent approaches presented in the scientific literature. Our results were verified by the SIGMOD’11 repeatability committee.

4 BE*-Tree has the potential to impact the design of computational advertising engines, in which click streams and user profile information are matched against advertisement inventory to serve the most relevant advertisements [ICDE’12].

5 Our hardware acceleration can play an essential role in the design of event processing engines that require high throughput and low matching latency for real-time data analysis, e.g., algorithmic trading [VLDB’10, DEBS’11, DaMoN’11, ICDE’12].


References

Amer Farroukh, Mohammad Sadoghi, and Hans-Arno Jacobsen. Towards vulnerability-based intrusion detection with event processing. In Proceedings of the 5th ACM international conference on Distributed event-based system, DEBS ’11, pages 171–182, New York, New York, USA, 2011. ACM.

Gianpaolo Cugola and Alessandro Margara. High-performance location-aware publish-subscribe on GPUs. In Proceedings of the ACM/IFIP/USENIX 13th International Middleware Conference, volume 7662 of Lecture Notes in Computer Science, pages 312–331, Montreal, QC, Canada, 2012. Springer.

Mohammad Sadoghi. Towards an extensible efficient event processing kernel. In Proceedings of the on SIGMOD/PODS 2012 PhD Symposium, PhD ’12, pages 3–8, Scottsdale, Arizona, USA, 2012. ACM.

Mohammad Sadoghi, Ioana Burcea, and Hans-Arno Jacobsen. GPX-Matcher: a generic Boolean predicate-based XPath expression matcher. In Proceedings of the 14th International Conference on Extending Database Technology, EDBT/ICDT ’11, pages 45–56, Uppsala, Sweden, 2011. ACM.

Mohammad Sadoghi and Hans-Arno Jacobsen. Indexing Boolean expression over high-dimensional space. Technical Report CSRG-608, University of Toronto, 2010.

Mohammad Sadoghi and Hans-Arno Jacobsen. BE-Tree: an index structure to efficiently match Boolean expressions over high-dimensional discrete space. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, SIGMOD ’11, pages 637–648, Athens, Greece, 2011. ACM.

Mohammad Sadoghi and Hans-Arno Jacobsen. Relevance matters: Capitalizing on less (top-k matching in publish/subscribe). In IEEE 28th International Conference on Data Engineering, ICDE ’12, pages 786–797, Arlington, Virginia, USA, 2012. IEEE Computer Society.

Mohammad Sadoghi, Hans-Arno Jacobsen, Martin Labrecque, Warren Shum, and Harsh Singh. Efficient event processing through reconfigurable hardware for algorithmic trading. Proceedings of the VLDB Endowment, 3(2):1525–1528, 2010.

Mohammad Sadoghi, Rija Javed, Naif Tarafdar, Harsh Singh, Rohan Palaniappan, and Hans-Arno Jacobsen. Multi-query stream processing on FPGAs. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, ICDE ’12, pages 1229–1232, Washington, DC, USA, 2012. IEEE Computer Society.

Mohammad Sadoghi, Harsh Singh, and Hans-Arno Jacobsen. fpga-ToPSS: line-speed event processing on FPGAs. In Proceedings of the 5th ACM international conference on Distributed event-based system, DEBS ’11, pages 373–374, New York, New York, USA, 2011. ACM.

Mohammad Sadoghi, Harsh Singh, and Hans-Arno Jacobsen. Towards highly parallel event processing through reconfigurable hardware. In Proceedings of the Seventh International Workshop on Data Management on New Hardware, DaMoN ’11, pages 27–32, Athens, Greece, 2011. ACM.
