
DMSN 2011
Cagri Balkesen & Nesime Tatbul

Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams

Talk Outline

• Intro & Motivation
• Stream Partitioning Techniques
– Basic window partitioning
– Batch partitioning
– Pane-based partitioning
• Ring-based Query Evaluation
• Experimental Evaluation
• Conclusions & Future Work


Intro & Motivation

[Figure: a DSMS (Data Stream Management System)]

Architectural Overview

• Classical Split-Merge pattern from parallel DBs
• Adjustable parallelism level, d
• QoS constraints on maximum latency & order

[Figure: input stream → Split node (split stage) → d parallel query nodes, each running the query → Merge node (merge stage) → output stream; example QoS: latency < 5 seconds, disorder < 3 tuples]
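A rough illustration of the pattern (not the actual Borealis operators; the function names, the round-robin routing, and the identity query are assumptions of this sketch):

    def split(stream, d):
        """Split stage: partition the input stream across d query nodes."""
        partitions = [[] for _ in range(d)]
        for i, t in enumerate(stream):
            partitions[i % d].append(t)  # placeholder round-robin routing
        return partitions

    def query(partition):
        """One of the d query nodes: run the continuous query on its
        partition (the identity here)."""
        return list(partition)

    def merge(results):
        """Merge stage: combine the node outputs into one output stream.
        A real merge enforces the QoS bounds (e.g., latency < 5 seconds,
        disorder < 3 tuples) instead of fully sorting."""
        return sorted(t for r in results for t in r)

    d = 3  # adjustable parallelism level
    assert merge(query(p) for p in split(range(10), d)) == list(range(10))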

Related Work: How to Partition?

• Content-sensitive
– Flux: fault-tolerant, load-balancing Exchange [1,2]
– Uses the query's group-by values to partition
– Needs explicit load balancing due to skewed data
• Content-insensitive
– GDSM: window-based parallelization (fixed-size tumbling windows) [3]
– Win-Distribute: partition at window boundaries
– Win-Split: partition each window into equi-length subwindows
• The problem:
– How to handle sliding windows?
– How to handle queries without a group-by, or with only a few groups?

[1] Flux: An Adaptive Partitioning Operator for Continuous Query Systems. ICDE '03.
[2] Highly-Available, Fault-Tolerant, Parallel Dataflows. SIGMOD '04.
[3] Customizable Parallel Execution of Scientific Stream Queries. VLDB '05.

Stream Partitioning Techniques

• Chunk the stream into independently processable units
– Window-aware splitting of the stream
• Each window has an id & tuples are marked with
– (first-winid, last-winid, is-win-closer)
• Tuples are replicated for each of their windows (see the sketch below)
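A minimal sketch of this marking, assuming count-based windows over 1-based tuple positions; the helper name mark is illustrative, while the three fields come from the slide:

    import math

    def mark(p, w, s):
        """Window-aware marking of the tuple at 1-based position p for
        count-based sliding windows of size w and slide s, where window i
        (i >= 1) covers positions [(i-1)*s + 1, (i-1)*s + w]."""
        last_winid = (p - 1) // s + 1
        first_winid = max(1, math.ceil((p - w) / s) + 1)
        is_win_closer = p >= w and (p - w) % s == 0
        return (first_winid, last_winid, is_win_closer)

    # w = 6, s = 2 as in the figure below: t6 closes W1 and belongs to
    # W1..W3, so the split replicates it to w/s = 3 partitions.
    assert mark(6, 6, 2) == (1, 3, True)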

Approach 1: Basic Sliding Window Partitioning

w = 6 units, s = 2 units, replication = w/s = 6/2 = 3

[Figure: stream t1 t2 ... t10; overlapping windows W1-W4; Split replicates tuples and routes W1 to Node1, W2 to Node2, W3 to Node3, ...]

Approach 1: Basic Sliding Window Partitioning (cont.)

The problem with basic sliding window partitioning:
• Tuples belong to many windows, depending on the slide
• Excessive replication of tuples, once per window
• Increased output data volume at the split

[Figure: same example as above (w = 6 units, s = 2 units, replication = 6/2 = 3)]

Approach 2: Batch-based Partitioning

• Batch several consecutive windows together to reduce replication
• A "batch-window" has size wb = w + (B-1)*s and slide sb = B*s
– All the tuples in a batch go to the same partition
– Only tuples overlapping between batches are replicated
• Replication is reduced to wb/sb partitions per tuple instead of w/s (see the sketch below)

[Figure: stream t1 t2 ... t10; windows w1-w8 (w = 3, s = 1) grouped into batches B1, B2 with B = 3, so wb = 5, sb = 3; replication drops from 3 to 5/3]

Definitions:
w : window size
s : slide size
B : batch size
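A sketch of the batch-window arithmetic, using the wb and sb formulas above; batch_params and batch_ids are illustrative names:

    from math import ceil

    def batch_params(w, s, B):
        """Batch-window size and slide from the slide: wb = w + (B-1)*s,
        sb = B*s."""
        return w + (B - 1) * s, B * s

    def batch_ids(p, w, s, B):
        """Batches that must receive the tuple at 1-based position p.
        Batches behave like sliding windows of size wb and slide sb, so the
        same first/last computation as for basic window partitioning applies."""
        wb, sb = batch_params(w, s, B)
        last = (p - 1) // sb + 1
        first = max(1, ceil((p - wb) / sb) + 1)
        return list(range(first, last + 1))

    # w = 3, s = 1, B = 3 as in the figure: wb = 5, sb = 3. Tuple t4 overlaps
    # batches B1 and B2 and is replicated; t3 goes to B1 only.
    assert batch_params(3, 1, 3) == (5, 3)
    assert batch_ids(4, 3, 1, 3) == [1, 2] and batch_ids(3, 3, 1, 3) == [1]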

The Panes Technique [1]

• Divide overlapping windows into disjoint panes
• Reduce cost by sub-aggregation and sharing
• Each window has w/gcd(w,s) panes of size gcd(w,s)
• The query is decomposed into pane-level (PLQ) and window-level (WLQ) queries

[Figure: windows w1-w5 sliding over disjoint panes p1 p2 ... p8; consecutive windows share panes]

[1] No Pane, No Gain: Efficient Evaluation of Sliding Window Aggregates over Data Streams. SIGMOD Record '05.
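A small sketch of the decomposition for AVG (an illustrative aggregate; the PLQ/WLQ split is the technique's, the function names are not):

    from math import gcd

    def plq_avg(pane_tuples):
        """Pane-level query (PLQ): sub-aggregate one pane into a
        (sum, count) pair."""
        return sum(pane_tuples), len(pane_tuples)

    def wlq_avg(pane_results):
        """Window-level query (WLQ): combine one window's pane results
        into the final AVG."""
        total = sum(sm for sm, _ in pane_results)
        count = sum(ct for _, ct in pane_results)
        return total / count

    w, s = 6, 4              # window of 6 tuples sliding by 4 tuples
    p = gcd(w, s)            # pane size: 2 tuples
    panes_per_win = w // p   # 3 panes per window

    stream = [3, 1, 4, 1, 5, 9, 2, 6]
    panes = [plq_avg(stream[i:i + p]) for i in range(0, len(stream), p)]
    # Window 1 = panes 1..3 and window 2 = panes 3..5: pane 3 is shared
    # between them and is not recomputed.
    assert wlq_avg(panes[0:panes_per_win]) == sum(stream[0:w]) / w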

Approach 3: Pane-based Partitioning

• Mark each tuple with its pane-id + win-id
– Treat panes as tumbling windows with wp = sp = gcd(w,s)
• Route tuples to a node based on pane-id (see the sketch below)
• Nodes compute the PLQ over their pane tuples
• Combine all PLQ results of a window to form the WLQ result
– Needs an organized topology of nodes
– We propose organizing the nodes in a ring
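A minimal routing sketch; the modulo placement here is a placeholder assumption, since the actual node assignment is the ring organization described next:

    from math import gcd

    def route(p, w, s, d):
        """Pane-based partitioning: the tuple at 1-based position p goes to
        exactly one node, so nothing is replicated. Panes are tumbling
        windows with wp = sp = gcd(w, s)."""
        wp = gcd(w, s)
        pane_id = (p - 1) // wp + 1
        return pane_id, (pane_id - 1) % d  # placeholder pane-to-node mapping

    # w = 6, s = 2: panes of gcd(6, 2) = 2 tuples; t5 and t6 share pane 3
    # and therefore land on the same node.
    assert route(5, 6, 2, 3) == route(6, 6, 2, 3) == (3, 2)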

Ring-based Query Evaluation

• High degree of pipelined result sharing among nodes
• Organized communication topology

Example: w = 6 tuples, s = 4 tuples; pane size p = gcd(6, 4) = 2 tuples

[Figure: Split routes consecutive pane blocks from the input source to Node1, Node2, Node3 arranged in a ring (Node1: P1-P3, P8-P9, ...; Node2: P4-P5, P10-P11, ...; Node3: P6-P7, P12-P13, ...); partial results R3, R5, R7, R9, R11, R13 are forwarded around the ring so that the window results W1, W2, W3, ... can be completed and sent to Merge]
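A sketch of this example in code; window_panes and owner are illustrative names, and the round-robin owner mirrors the figure's assignment:

    # w = 6, s = 4 tuples with pane size gcd(6, 4) = 2, so each window spans
    # ww = 3 panes and slides by sw = 2 panes; d = 3 nodes form the ring.
    ww, sw, d = 3, 2, 3

    def window_panes(i):
        """Pane ids making up window i (1-based)."""
        start = (i - 1) * sw + 1
        return set(range(start, start + ww))

    def owner(i):
        """Round-robin assignment of windows to the d ring nodes (0-based)."""
        return (i - 1) % d

    # Each node computes PLQ results for its local panes, receives the
    # partial it is missing from its ring predecessor, and forwards the
    # partial its successor's window shares (e.g., R3 from Node1 completes
    # window 2).
    for i in range(1, 4):
        shared = sorted(window_panes(i) & window_panes(i + 1))
        print(f"window {i} on node {owner(i)}: panes {sorted(window_panes(i))}, "
              f"forwards partial over pane(s) {shared} to node {owner(i + 1)}")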

Assignment of Windows and Panes to Nodes

• All the pane results a node needs arrive only from its predecessor
• The pane results a node sends to its successor are only its local panes
– Each node is assigned n consecutive windows
– n is the minimum value for which the two properties above hold (checked directly in the sketch below)

Definitions:
ww : window size in # of panes
sw : slide size in # of panes
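The exact minimality condition for n is not spelled out above, so this sketch finds the smallest n by checking the two stated properties directly; it simplifies the ring to a chain of virtual nodes (the real ring maps them cyclically onto d physical nodes):

    def local_panes(n, ww, sw, num_nodes):
        """Panes routed to each virtual node: the panes of its n consecutive
        windows that no earlier node has already received."""
        seen, result = set(), []
        for k in range(num_nodes):
            panes = set()
            for i in range(k * n + 1, (k + 1) * n + 1):  # windows of node k
                start = (i - 1) * sw + 1
                panes |= set(range(start, start + ww))
            result.append(panes - seen)
            seen |= panes
        return result

    def predecessor_only(n, ww, sw, num_nodes=6):
        """Do all non-local panes of every node's windows come from its
        immediate predecessor, as required above?"""
        loc = local_panes(n, ww, sw, num_nodes)
        for k in range(1, num_nodes):
            panes = set()
            for i in range(k * n + 1, (k + 1) * n + 1):
                start = (i - 1) * sw + 1
                panes |= set(range(start, start + ww))
            if not panes <= loc[k] | loc[k - 1]:
                return False
        return True

    # ww = 3, sw = 2 (the w = 6, s = 4 example): one window per node suffices.
    assert min(n for n in range(1, 10) if predecessor_only(n, 3, 2)) == 1
    # A longer window, ww = 5 with sw = 2, forces n = 2.
    assert min(n for n in range(1, 10) if predecessor_only(n, 5, 2)) == 2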

Flexible Result Merging

• Fully-ordered (k = 0)
• FIFO
• k-ordered: k-ordering constraint [1]; a certain amount of disorder is allowed
– Defn: for any tuple s, every tuple s' that arrives at least k+1 tuples after s satisfies s'.A ≥ s.A

[1] Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. ACM TODS '04.
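Under this constraint a merge stage can restore full order with a bounded buffer: holding back k+1 tuples is enough, because a still-unseen tuple smaller than everything buffered would have had to arrive within k tuples of the oldest buffered one, which has already passed. A minimal sketch, assuming a key function extracts the ordering attribute A:

    import heapq

    def k_ordered_merge(stream, k, key=lambda t: t):
        """Fully sort a k-ordered stream on the fly with a buffer of
        k+1 tuples."""
        heap = []
        for n, t in enumerate(stream):
            heapq.heappush(heap, (key(t), n, t))  # n breaks ties on equal keys
            if len(heap) > k + 1:                 # k+1 buffered tuples suffice
                yield heapq.heappop(heap)[2]
        while heap:                               # drain at end of stream
            yield heapq.heappop(heap)[2]

    # A 2-ordered stream: no tuple is more than 2 positions out of place.
    assert list(k_ordered_merge([2, 1, 3, 5, 4, 6], k=2)) == [1, 2, 3, 4, 5, 6]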

Experimental Evaluation

• Techniques implemented in Borealis
• Workload adapted from the Linear Road Benchmark
– Slightly modified segment statistics queries
– Basic aggregation functions with different window/slide ratios

Scalability of Split Operator

• Pane-based partitioning: cost & throughput remain constant regardless of the overlap ratio
• Window- & batch-based partitioning: cost increases and throughput decreases as overlap increases
• The excessive replication of window-based partitioning is reduced by batching

[Plot: maximum input rate (tuples/second) vs. window-size/slide ratio (window overlap)]

Scalability of Partitioning Techniques

• Pane-based partitioning scales close to linearly until the split is saturated
– Per-tuple cost is constant
• Window- & batch-based partitioning suffer from extremely high replication
– The split is not saturated, but throughput scales very slowly

* w/s = overlap ratio = 100

Summary & Conclusions

• Pane-based partitioning is the partitioning technique of choice
– Avoids tuple replication
– Incurs less overhead in split and aggregate
– Scales close to linearly

[Figure: recap of the three techniques: 1) window-based, 2) batch-based, 3) pane-based]

Ongoing & Future Work

• Generalization of the framework
• Support for adaptivity at runtime
• Extending the complexity of query plans
• Extending the performance analysis & experiments

Thank You!
