1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.
![Page 1: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/1.jpg)
1
Shetal Shah, IITB
Dissemination of Dynamic Data: Semantics, Algorithms, and Performance
![Page 2: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/2.jpg)
2
More and more of the information we consume is dynamically constructed…
[Figure: a web page assembled from an ad component, four headline components, a personalized component, and a navigation component]
![Page 3: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/3.jpg)
3
Buying a camera? Track auctions…
![Page 4: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/4.jpg)
4
Dynamic Data
- Data gathered by (wireless sensor) networks
  - Sensors that monitor light, humidity, pressure, and heat
  - Network traffic passing via switches
- Sports scores
  - Score changes by 5 points
- Financials
  - Rice price changes by Rs. 10 compared to the previous day
  - Total value of stock portfolio exceeds $10,000
- Rapid and unpredictable changes; time critical, value critical
- Used in on-line decision making
![Page 5: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/5.jpg)
5
Continual Queries
A CQ is a standing query coupled with a trigger/select condition

CQ stock_monitor:
  SELECT stock_price FROM quotes;
  WHEN stock_price - prev_stock_price > $0.5

CQ RFP_tracker:
  SELECT project_name, contact_info FROM RFP_DB;
  WHERE skill_set_required ⊆ available_skills

Not every change at a source leads to a change in the result of the query
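A minimal sketch of how a continual query such as stock_monitor could be evaluated, assuming that `prev_stock_price` means the value of the previous source update. The `ContinualQuery` class and its field names are hypothetical and not part of the original slides. It shows that only some source updates change the query result.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ContinualQuery:
    """A standing query: `select` builds the result, `when` decides whether to fire."""
    name: str
    select: Callable[[float], float]
    when: Callable[[Optional[float], float], bool]
    prev: Optional[float] = None                       # last value seen at the source
    results: List[float] = field(default_factory=list)

    def on_update(self, value: float) -> bool:
        """Feed one source update; return True if the query result changed."""
        fired = self.when(self.prev, value)
        if fired:
            self.results.append(self.select(value))
        self.prev = value
        return fired


# stock_monitor: fire when the price rose by more than $0.5 since the previous update
stock_monitor = ContinualQuery(
    name="stock_monitor",
    select=lambda price: price,
    when=lambda prev, price: prev is not None and price - prev > 0.5,
)

fired = [stock_monitor.on_update(p) for p in [10.0, 10.2, 11.0, 11.25, 12.0]]
```

Of the five updates fed in, only the jumps to 11.0 and 12.0 satisfy the trigger, so three of the source changes never reach the query result.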
![Page 6: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/6.jpg)
6
Generic Architecture
[Figure: data sources (servers, sensors) connected over a network to proxies/caches/data aggregators, which are connected over a network to end-hosts (wired host, mobile host)]
![Page 7: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/7.jpg)
7
Where should the queries execute?
- At clients
  - Can't optimize across clients, links
- At source (where changes take place)
  - Advantages: minimum number of refresh messages, high fidelity
  - Main challenge: scalability; multiple sources are hard to handle
- At Data Aggregators (DAs/proxies), placed at the edge of the network
  - Advantages: allows scalability through consolidation; handles multiple data sources
  - Main challenge: need mechanisms for maintaining data consistency at DAs
![Page 8: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/8.jpg)
8
Coherency of Dynamic Data
- Strong coherency
  - The client and source are always in sync with each other
  - Strong coherency is expensive!
- Relax strong coherency: bounded coherency in the time or value domain
  - Time domain: t-coherency
    - The client is never out of sync with the source by more than t time units
    - e.g., traffic data not stale by more than a minute
  - Value domain: v-coherency
    - The difference in the data values at the client and the source is bounded by v at all times
    - e.g., only interested in temperature changes larger than 1 degree
![Page 9: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/9.jpg)
9
Coherency Requirement (c)
Example: temperature, max incoherency = 1 degree
$|U(t) - S(t)| \le c$
where S(t) is the value at the source and U(t) is the value at the client
![Page 10: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/10.jpg)
10
[Figure: data/query value at the server and value at the client over time, with the coherency bounds and a violation marked]
![Page 11: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/11.jpg)
11
Source pushes interesting changes
+ Achieves v-coherency
+ Keeps network overhead to a minimum
-- Poor scalability (has to maintain state and keep connections open)
[Figure: Source pushes to DA, DA pushes to User]
![Page 12: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/12.jpg)
12
Pull – interesting changes
Pull after Time to Live (TTL) / Time To Next Refresh (TTR / TNR)
+ Can be implemented using the HTTP protocol
+ Stateless and hence generally scalable with respect to state space and computation
-- Need to estimate when a change of interest will happen
-- Heavy polling for stringent coherency requirements or highly dynamic data
-- Network overheads higher than for Push
[Figure: User pulls from Repository, Repository pulls from Server]
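One way to estimate when a change of interest will happen is to extrapolate from the rate of change seen between the last two polls. The sketch below is an illustrative heuristic, not the estimator used by the author; `next_ttr`, `ttr_min`, and `ttr_max` are hypothetical names and the clamping bounds are arbitrary.

```python
def next_ttr(prev_value: float, new_value: float, elapsed: float, c: float,
             ttr_min: float = 1.0, ttr_max: float = 60.0) -> float:
    """Estimate seconds until the value may drift by c, from the last observed rate."""
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    rate = abs(new_value - prev_value) / elapsed    # units of value per second
    if rate == 0:
        return ttr_max                               # no change seen: back off fully
    # Time for a drift of c at the observed rate, clamped to sane polling bounds.
    return max(ttr_min, min(ttr_max, c / rate))


steady = next_ttr(100.0, 104.0, elapsed=8.0, c=1.0)    # rate 0.5 per second
idle = next_ttr(100.0, 100.0, elapsed=8.0, c=1.0)      # no change observed
volatile = next_ttr(100.0, 150.0, elapsed=1.0, c=1.0)  # rate 50 per second
```

The volatile case is clamped to `ttr_min`, which is the heavy polling the slide warns about for stringent requirements or highly dynamic data.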
![Page 13: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/13.jpg)
13
Complementary Properties
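| Algorithm | Resiliency | Temporal Coherency (fidelity) | Overhead: Comm. | Overhead: Comp. | Overhead: State Space |
|---|---|---|---|---|---|
| Push | Low | High | Low | High | High |
| Pull | High | Low (for tight coherency), High (for loose coherency) | High | Low | Low |

Overheads determine scalability.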
![Page 14: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/14.jpg)
14
Dynamic Content Distribution Networks
To create a scalable content dissemination network (CDN) for streaming/dynamic data.
Metric:
Fidelity: % of time coherency requirement is met
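A minimal sketch of the fidelity metric, assuming the source and client values are sampled at the same equally spaced instants so that "% of time" becomes "% of samples". The function name and the sample traces are hypothetical.

```python
from typing import Sequence


def fidelity(source: Sequence[float], client: Sequence[float], c: float) -> float:
    """Percentage of equally spaced samples at which |U(t) - S(t)| <= c."""
    if len(source) != len(client) or not source:
        raise ValueError("traces must be non-empty and of equal length")
    met = sum(1 for s, u in zip(source, client) if abs(u - s) <= c)
    return 100.0 * met / len(source)


# Values in hundredths to keep the arithmetic exact.
source_trace = [100, 120, 140, 150, 170]
client_trace = [100, 100, 100, 150, 150]
observed = fidelity(source_trace, client_trace, c=30)
```

Here the client is off by 40 at the third sample, which exceeds c = 30, so the requirement is met at four of the five samples.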
![Page 15: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/15.jpg)
15
Dissemination Network: Example
Data set: p, q, r. Max clients: 2.
[Figure: dissemination network with the source and repositories A, B, C, D; coherency labels shown are p: 0.2, q: 0.2, r: 0.2; p: 0.4, r: 0.3; q: 0.3]
![Page 16: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/16.jpg)
16
Challenges – I
- Given the data and coherency needs of repositories, how should repositories cooperate to satisfy these needs?
- How should repositories refresh the data such that coherency requirements of dependents are satisfied?
- How to make the repository network resilient to failures?
[VLDB02, VLDB03, IEEE TKDE]
![Page 17: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/17.jpg)
17
Challenges - II
- Given the data and the coherency available at repositories in the network, how to assign clients to repositories?
- Given the data and coherency needs of clients in the network, what data should reside in each repository and at what coherency?
- If the client requirements keep changing, how and when should the repositories be reorganized?
[RTSS 2004, VLDB 2005]
![Page 18: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/18.jpg)
18
Dynamics along three axes
- Data is dynamic, i.e., data changes rapidly and unpredictably
- Data items that a client is interested in also change dynamically
- Network is dynamic, nodes come and go
![Page 19: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/19.jpg)
19
Data Dissemination
![Page 20: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/20.jpg)
20
Data Dissemination
Different users have different coherency requirements for the same data item.
Coherency requirement at a repository should be at least as stringent as that of the dependents.
Repositories disseminate only changes of interest.
[Figure: source, repositories A, B, C, D, and a client; coherency labels shown are p: 0.2, q: 0.2, r: 0.2; p: 0.4, r: 0.3; q: 0.4; q: 0.3]
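The rule that a repository's coherency requirement must be at least as stringent as that of its dependents can be sketched as a per-item minimum. This is an illustration under the assumption that a smaller number means a more stringent requirement; the function name and the sample requirements are hypothetical.

```python
from typing import Dict, Iterable

Needs = Dict[str, float]  # data item -> coherency requirement (smaller = more stringent)


def repository_needs(own: Needs, dependents: Iterable[Needs]) -> Needs:
    """Per item, the most stringent requirement among the repository and its dependents."""
    needs = dict(own)
    for dep in dependents:
        for item, c in dep.items():
            # A dependent may need an item the repository did not want for itself.
            needs[item] = min(c, needs.get(item, c))
    return needs


own = {"p": 0.4, "r": 0.3}
dependents = [{"q": 0.4}, {"q": 0.3, "p": 0.2}]
effective = repository_needs(own, dependents)
```

The repository ends up holding q even though it did not ask for it, and it holds p at 0.2 rather than its own 0.4.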
![Page 21: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/21.jpg)
21
Data dissemination -- must be done with care
Source, Repository P, Repository Q, with $c_P = 0.3$ and $c_Q = 0.5$
Should prevent missed updates!
[Figure: values over time at the source, P, and Q; the values shown include 1, 1.2, 1.4, 1.5, and 1.7]
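The missed-update problem can be reproduced with a small simulation. The sketch assumes the source moves through 1, 1.2, 1.4, 1.5, 1.7 in that order, that P needs coherency 0.3 and Q needs 0.5, and that a change is "of interest" when it differs from the last forwarded value by at least the requirement. Values are kept in hundredths to avoid floating-point noise; `forward` is a hypothetical helper.

```python
from typing import List, Sequence


def forward(updates: Sequence[int], c: int) -> List[int]:
    """Values passed on when a node forwards only changes of at least c."""
    out = [updates[0]]
    for v in updates[1:]:
        if abs(v - out[-1]) >= c:
            out.append(v)
    return out


source = [100, 120, 140, 150, 170]   # 1, 1.2, 1.4, 1.5, 1.7 in hundredths
C_P, C_Q = 30, 50

at_p = forward(source, C_P)          # what P receives from the source
at_q_via_p = forward(at_p, C_Q)      # what Q receives if P filters naively
at_q_direct = forward(source, C_Q)   # what Q would receive straight from the source
missed = [v for v in at_q_direct if v not in at_q_via_p]
```

P never sees 1.5 because it differs from 1.4 by less than 0.3, so P cannot pass it on, yet 1.5 is exactly the value that Q needed.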
![Page 22: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/22.jpg)
22
Source Based Dissemination Algorithm
For each data item, the source maintains
- the unique coherency requirements of repositories
- the last update sent for that coherency
For every change, the source
- finds the maximum coherency for which it must be disseminated
- tags the change with that coherency
- disseminates (changed data, tag)
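The source-side bookkeeping can be sketched as follows. This is an illustration, not the author's implementation: it assumes a change is due for coherency c when it differs from the last value sent for c by at least c, and that a tagged update also refreshes the "last sent" entry of every requirement at least as stringent as the tag. `Source`, `on_change`, and `received` are hypothetical names; values are in hundredths.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class Source:
    """Tracks, per unique coherency requirement, the last value sent for it."""

    def __init__(self, initial: int, coherencies: Iterable[int]) -> None:
        self.last_sent: Dict[int, int] = {c: initial for c in set(coherencies)}

    def on_change(self, value: int) -> Optional[Tuple[int, int]]:
        """Return (value, tag) if the change must be disseminated, else None."""
        due = [c for c, last in self.last_sent.items() if abs(value - last) >= c]
        if not due:
            return None
        tag = max(due)
        # Assumption: every requirement at least as stringent as the tag
        # receives this value, so its "last sent" entry is refreshed too.
        for c in self.last_sent:
            if c <= tag:
                self.last_sent[c] = value
        return value, tag


def received(messages: Iterable[Tuple[int, int]], c: int) -> List[int]:
    """Values a repository with requirement c accepts: those tagged >= c."""
    return [value for value, tag in messages if tag >= c]


src = Source(initial=100, coherencies=[30, 50])
msgs = [m for m in (src.on_change(v) for v in [120, 140, 150, 170]) if m is not None]
at_p = received(msgs, 30)
at_q = received(msgs, 50)
```

With requirements 0.3 and 0.5, the update 1.4 is tagged 0.3 and the update 1.5 is tagged 0.5, so a repository needing 0.5 receives 1.5 even though its parent would not have asked for it on its own.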
![Page 23: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/23.jpg)
23
Source Based Dissemination Algorithm
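Source, Repository P, Repository Q, with $c_P = 0.3$ and $c_Q = 0.5$
[Figure: values over time at the source, P, and Q under source-based dissemination; the values shown include 1, 1.2, 1.4, 1.5, and 1.7]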
![Page 24: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/24.jpg)
24
Repository Based Dissemination Algorithm
A repository P sends changes of interest to the dependent Q if
$|x_P - x_Q| > c_Q - c_P$
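A minimal simulation of the repository-based rule, assuming the forwarding test is $|x_P - x_Q| > c_Q - c_P$, with $x_P$ the latest value at P and $x_Q$ the last value P sent to Q. It also assumes the source pushes to P whenever the value differs from $x_P$ by at least $c_P$. The function name is hypothetical and values are in hundredths.

```python
from typing import List, Sequence


def repository_based(source: Sequence[int], c_p: int, c_q: int) -> List[List[int]]:
    """Simulate source -> P -> Q; return the value histories [at P, at Q]."""
    x_p = x_q = source[0]
    at_p, at_q = [x_p], [x_q]
    for s in source[1:]:
        if abs(s - x_p) >= c_p:              # change of interest to P
            x_p = s
            at_p.append(x_p)
            if abs(x_p - x_q) > c_q - c_p:   # P's forwarding test for Q
                x_q = x_p
                at_q.append(x_q)
    return [at_p, at_q]


at_p, at_q = repository_based([100, 120, 140, 150, 170], c_p=30, c_q=50)
```

P forwards 1.4 to Q even though it differs from Q's value by less than 0.5, because the source could drift a further 0.3 before P hears about it. The cost is an extra message; the benefit is that Q is never out of bounds.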
![Page 25: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/25.jpg)
25
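Repository Based Dissemination Algorithm
Source, Repository P, Repository Q, with $c_P = 0.3$ and $c_Q = 0.5$
[Figure: values over time at the source, P, and Q under repository-based dissemination; the values shown include 1, 1.2, 1.4, 1.5, and 1.7]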
![Page 26: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/26.jpg)
26
Building the content distribution network
Choose parents for repositories such that the overall fidelity observed by the repositories is high, i.e., reduce communication and computational delays.
![Page 27: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/27.jpg)
27
If parents are not chosen judiciously
It may result in
- Uneven distribution of load on repositories.
- Increase in the number of messages in the system.
- Increase in loss in fidelity!
[Figure: source with repositories A, B, C, D; coherency labels shown are p: 0.2, q: 0.2, r: 0.2; p: 0.4, r: 0.3; q: 0.3]
![Page 28: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/28.jpg)
28
DiTA
Repository N needs data item x
- If the source has available push connections, or the source is the only node in the dissemination tree for x
  - N is made the child of the source
- Else N is inserted in the most suitable subtree, where
  - N's ancestors have more stringent coherency requirements
  - N is closest to the root
![Page 29: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/29.jpg)
29
Most Suitable Subtree?
l: smallest level in the subtree with coherency requirement less stringent than N's.
d: communication delay from the root of the subtree to N.
Smallest (l × d): most suitable subtree.
Essentially, minimize communication and computational delays!
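The l × d preference can be sketched as below. This is an illustration under stated assumptions: levels are counted from 1 at the subtree root, a larger coherency value means a less stringent requirement, and if no node in the subtree is less stringent than N then l is taken as one past the deepest level. `Node`, `smallest_loose_level`, `most_suitable`, and the delay table are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    c: float                                   # coherency requirement (smaller = more stringent)
    children: List["Node"] = field(default_factory=list)


def smallest_loose_level(root: Node, c_new: float) -> int:
    """Smallest level (subtree root = 1) holding a node less stringent than c_new."""
    level, frontier = 1, [root]
    while frontier:
        if any(node.c > c_new for node in frontier):
            return level
        frontier = [child for node in frontier for child in node.children]
        level += 1
    return level        # none found: one past the deepest level (an assumption)


def most_suitable(subtrees: List[Node], c_new: float, delay_ms: Dict[str, float]) -> Node:
    """Pick the subtree root minimising l * d."""
    return min(subtrees, key=lambda r: smallest_loose_level(r, c_new) * delay_ms[r.name])


a = Node("A", 0.1, [Node("E", 0.3)])
b = Node("B", 0.2, [Node("F", 0.2)])
delays = {"A": 30.0, "B": 25.0}      # delay from each subtree root to N, in ms
chosen = most_suitable([a, b], c_new=0.2, delay_ms=delays)
```

For a newcomer with requirement 0.2, subtree A scores 2 × 30 = 60 and subtree B scores 3 × 25 = 75, so A is preferred despite the longer delay.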
![Page 30: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/30.jpg)
30
Example
Initially the network consists of the source.
A and B request service of q with coherency requirement 0.2.
C requests service of q with coherency requirement 0.1.
[Figure: dissemination tree for q rooted at the source, with A (q: 0.2), B (q: 0.2), and C (q: 0.1)]
![Page 31: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/31.jpg)
31
Example
D requests service of q with coherency requirement 0.2
[Figure: dissemination tree for q rooted at the source, before D is inserted; node requirements range from q: 0.1 to q: 0.5]
![Page 32: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/32.jpg)
32
Example
D requests service of q with coherency requirement 0.2
[Figure: the same dissemination tree for q after D is inserted; node requirements range from q: 0.1 to q: 0.5]
![Page 33: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/33.jpg)
33
Resiliency
![Page 34: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/34.jpg)
34
Handling Failures in the Network
- Need to detect permanent/transient failures in the network and to recover from them
- Resiliency is obtained by adding redundancy
  - Without redundancy, failures lead to loss in fidelity
  - Adding redundancy can increase cost, which can itself lead to loss of fidelity!
- Handle failures such that the cost of adding resiliency is low!
![Page 35: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/35.jpg)
35
Passive/Active Failure Handling
- Passive failure detection:
  - Parent sends "I'm alive" messages at the end of every time interval.
  - What should the time interval be?
- Active failure handling:
  - Always be prepared for failures.
  - For example: 2 repositories can serve the same data item at the same coherency to a child.
  - This means lots of work and hence greater loss in fidelity.
![Page 36: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/36.jpg)
36
Middle Path
A backup parent B is found for each data item that the repository needs
Let repository R want data item x with coherency c.
At what coherency should B serve R?
B serves R with coherency k × c
[Figure: parent P serves R at coherency c; backup parent B serves R at coherency k × c]
![Page 37: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/37.jpg)
37
If a parent fails
Detection: Child gets two consecutive updates from the backup parent with no updates from the parent
Recovery: Backup parent is asked to serve at coherency c till we get an update from the parent
[Figure: P serves R at coherency c; B serves R at k × c, and at c during recovery]
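The detection and recovery rules can be sketched as a small child-side state machine. This is an illustration only; `ParentMonitor` and its method names are hypothetical, and the sketch ignores message timing and ordering across links.

```python
class ParentMonitor:
    """Child-side detector: two consecutive backup updates with none from the parent."""

    def __init__(self) -> None:
        self.backup_streak = 0        # backup updates seen since the last parent update
        self.parent_failed = False

    def on_parent_update(self) -> None:
        self.backup_streak = 0
        self.parent_failed = False    # recovery ends once the parent is heard again

    def on_backup_update(self) -> None:
        self.backup_streak += 1
        if self.backup_streak >= 2:
            self.parent_failed = True

    def backup_coherency(self, c: float, k: float) -> float:
        """Coherency to request from the backup: c during failure, else k * c."""
        return c if self.parent_failed else k * c


monitor = ParentMonitor()
monitor.on_backup_update()
monitor.on_parent_update()            # parent is alive, streak resets
monitor.on_backup_update()
alive_before = not monitor.parent_failed
monitor.on_backup_update()            # second consecutive backup update
failed_now = monitor.parent_failed
during_failure = monitor.backup_coherency(c=0.2, k=2)
monitor.on_parent_update()            # parent is back
after_recovery = monitor.backup_coherency(c=0.2, k=2)
```

Because the backup normally serves at the looser coherency k × c, it sends fewer updates than the parent, so two of its updates arriving with nothing from the parent in between is a strong hint that the parent is down.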
![Page 38: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/38.jpg)
38
Adding Resiliency to DiTA
- A sibling of P is chosen as the backup parent of R.
  - If P fails, A serves B with coherency c, so the change is local.
- If P has no siblings, a sibling of the nearest ancestor is chosen.
- Else the source is made the backup parent.
[Figure: nodes A, P, B, and R; P serves R at coherency c and B serves R at k × c]
![Page 39: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/39.jpg)
39
Markov Analysis for k
Assumptions
- Data changes as a random walk along the line
- The probability of an increase is the same as that of a decrease
- No assumptions made about the unit of change or time taken for a change
Expected number of misses for any k is at most $2k^2 - 2$
- for k = 2, expected number of misses ≤ 6
![Page 40: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/40.jpg)
40
Experimental Methodology
- Physical network: 4 servers, 600 routers, 100 repositories
- Communication delay: 20-30 ms
- Computation delay: 3-5 ms
- Real stock traces: 100-1000
- Time duration of observations: 10,000 s
- Tight coherency range: 0.01 to 0.05
- Loose coherency range: 0.5 to 0.99
![Page 41: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/41.jpg)
41
Failure and Recovery Modelling
Failures and recovery modeled based on trends observed in practice
- "Analysis of link failures in an IP backbone" by G. Iannaccone et al., Internet Measurement Workshop 2002
Recovery:
- 10% > 20 min
- 40% > 1 min & < 20 min
- 50% < 1 min
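A minimal sketch of how recovery times could be drawn in a simulation using the three recovery buckets listed above. It only samples the bucket; the distribution of times inside each bucket is not given in the slides, so it is left out. The names and the seed are hypothetical.

```python
import random
from collections import Counter

# Bucket probabilities taken from the recovery figures: 50%, 40%, 10%.
RECOVERY_BUCKETS = ["< 1 min", "1 to 20 min", "> 20 min"]
RECOVERY_WEIGHTS = [0.50, 0.40, 0.10]


def sample_recovery_buckets(n: int, seed: int = 0) -> Counter:
    """Draw n recovery-time buckets with the stated probabilities."""
    rng = random.Random(seed)        # seeded so that runs are repeatable
    return Counter(rng.choices(RECOVERY_BUCKETS, weights=RECOVERY_WEIGHTS, k=n))


counts = sample_recovery_buckets(10_000)
```

With enough samples the observed shares settle close to the stated 50/40/10 split.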
Trend for time between failure:
![Page 42: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/42.jpg)
42
In the Presence of Failures, Varying Recovery Times
Addition of resiliency does improve fidelity.
![Page 43: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/43.jpg)
43
In the Presence of Failures, Varying Data Items
Increasing Load
Fidelity improves with addition of resiliency even for large number of data items.
![Page 44: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/44.jpg)
44
In the Absence of Failures
Increasing Load
Often, fidelity improves with addition of resiliency, even in the absence of failures!
![Page 45: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/45.jpg)
45
Beyond Resiliency
- Scheduling
- Assigning clients to repositories
- Balancing load in the network
- Handling queries
![Page 46: 1 Shetal Shah, IITB Dissemination of Dynamic Data: Semantics, Algorithms, and Performance.](https://reader035.fdocuments.net/reader035/viewer/2022062807/5697c00b1a28abf838cc8705/html5/thumbnails/46.jpg)
46
Acknowledgements
- Allister Bernard & Vivek Sharma
- S. Dharmarajan
- Shweta Agarwal
- T. Siva
- Prof. C. Ravishankar
- Prof. Sohoni and Prof. Rangaraj
- Prof. S. Sudarshan
- Prof. Krithi Ramamritham