Future and Emerging Technologies (FET)
The roots of innovation
Proactive initiative on:
Global Computing (GC)
DBGlobe IST-2001-32645
3rd Meeting
Athens, November 29, 2002
UoI Presentation
Directories :: Resource Location
Data Delivery
Outline
Summaries for Resource Discovery
Maintain summaries (e.g., Bloom filters) to assist the search for a service (resource)
Directories for XML metadata and appropriate summaries
Resource Discovery
Motivation: (DBGlobe) Large Scale and Dynamic Environment
How to locate a resource
System Model:
Sites that store hierarchical descriptions of services (in XML) or XML documents
Path queries
Limitations (so far):
We consider only XML-Trees (no cycles)
No value queries
Joint work with Georgia Koloniari
<xml>
<device>
  <printer>
    <color></color>
    <postscript></postscript>
  </printer>
  <camera>
    <digital></digital>
  </camera>
</device>
device
  printer
    color
    postscript
  camera
    digital
An example XML-description and the corresponding XML-tree
Path queries
From the root: //device/printer
Partial: camera/digital
Overall Approach: maintain Bloom-based indexes to check whether a document (item) exists at a site (peer)
Bloom-Filters
Purpose: test whether an element b exists in a set A = {a1, a2, …, an} of n elements (keys)
Allocate a vector v of m bits, initially all set to 0
Choose k independent hash functions, h1, h2, …, hk, each with range {1, …, m}
For each element a ∈ A, set the bits at positions h1(a), h2(a), …, hk(a) to 1
(A particular bit might be set to 1 multiple times)
Given a query for b, check the bits at positions h1(b), h2(b), …, hk(b)
If any is 0, then b is certainly not in the set A
Otherwise we assume that b is in the set, although this may be wrong (a “false positive”)
[Figure: an element a is hashed to positions h1(a) = P1, h2(a) = P2, h3(a) = P3, h4(a) = P4 of the bit vector v (m bits), and those four bits are set to 1]
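The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the DBGlobe implementation: the class name `BloomFilter`, the method names, and the use of salted SHA-256 as the k hash functions are all assumptions, and positions run from 0 to m-1 rather than 1 to m.

```python
import hashlib


class BloomFilter:
    """A bit vector of m bits probed by k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = 0  # a Python int used as the bit vector v, all bits 0

    def _positions(self, item: str) -> list[int]:
        # Hypothetical hash family: SHA-256 salted with the index i
        # stands in for h1, ..., hk.
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{item}".encode()).digest(), "big") % self.m
            for i in range(self.k)
        ]

    def add(self, item: str) -> None:
        # Set the k bits of the item; a bit may be set more than once.
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # A single 0 bit proves absence; all 1s may be a false positive.
        return all((self.bits >> p) & 1 for p in self._positions(item))
```

Usage: after `bf = BloomFilter(m=1024, k=4)` and `bf.add("printer")`, the call `bf.might_contain("printer")` is always true, while a true answer for a key that was never added is a false positive.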
Breadth (or level) Blooms
The Breadth Bloom Filter (BBF) for an XML tree T with j levels: set of Bloom filters {BBF0, BBF1, BBF2, … BBFi}, i ≤ j
One Bloom filter, denoted BBFi, for each level i of the tree
BBFi: the labels (attributes) of all nodes at level i
BBF0: all attributes that appear in any node of the XML tree T
[Figure: the example XML-tree (device; printer, camera; color, postscript, digital) and its breadth Bloom filters]
BBF0: {device, printer, camera, color, postscript, digital}
BBF1: {device}
BBF2: {printer, camera}
BBF3: {color, postscript, digital}
The BBFi are not all of the same size
We may skip levels
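A breadth Bloom filter for the example tree could be built as sketched below. The slides do not spell out the lookup procedure, so `may_match` is an assumption: it accepts a path query when its labels hit consecutive levels. The names `mask`, `build_bbf`, `may_match`, and the constants `M` and `K` are hypothetical, all filters share one size for simplicity, and no level is skipped.

```python
import hashlib
import xml.etree.ElementTree as ET

M, K = 256, 3  # assumed filter size (bits) and number of hash functions


def mask(label: str) -> int:
    """Bits that a label sets; salted SHA-256 stands in for h1..hk."""
    out = 0
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{label}".encode()).digest()
        out |= 1 << (int.from_bytes(digest, "big") % M)
    return out


def build_bbf(root: ET.Element) -> list[int]:
    """Return [BBF0, BBF1, ...]; BBFi holds the labels found at level i."""
    filters = [0]  # BBF0 is filled in as the levels are visited
    level = [root]
    while level:
        bits = 0
        for node in level:
            bits |= mask(node.tag)
        filters.append(bits)
        filters[0] |= bits
        level = [child for node in level for child in node]
    return filters


def may_match(filters: list[int], path: str) -> bool:
    """Assumed lookup: the labels of the path must hit consecutive levels."""
    labels = path.split("/")
    # Cheap pre-check against BBF0: every label must appear somewhere.
    if any((mask(label) & filters[0]) != mask(label) for label in labels):
        return False
    for start in range(1, len(filters) - len(labels) + 1):
        if all((mask(label) & filters[start + j]) == mask(label)
               for j, label in enumerate(labels)):
            return True
    return False
```

With the example tree, `may_match(filters, "device/printer")` is true. In this sketch `printer/digital` is also accepted, because the levels record which labels occur at each depth but not which parent each label has.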
Depth (or Path) Blooms
The Depth Bloom Filter (DBF) for an XML tree T with j levels: set of Bloom filters {DBF0, DBF1, DBF2, … DBFi-1}, i ≤ j
One Bloom filter, denoted DBFi, for each path of length i (with i+1 nodes) of the tree
DBFi: the labels (attributes) of all paths of length i
DBF0: all attributes that appear in any node of the XML tree T
[Figure: the example XML-tree and its depth Bloom filters]
DBF0: {device, printer, camera, color, postscript, digital}
DBF1: {device/printer, device/camera, printer/color, printer/postscript, camera/digital}
DBF2: {device/printer/color, device/printer/postscript, device/camera/digital}
Special symbol for “root” paths
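The depth variant can be sketched the same way: every downward path of length i is hashed as one key into DBFi. The lookup in `may_match` is an assumption (a query longer than the longest stored path is checked window by window), the special symbol for root paths is left out, and all names (`mask`, `paths_from`, `build_dbf`, `may_match`, `M`, `K`) are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

M, K = 512, 3  # assumed filter size (bits) and number of hash functions


def mask(key: str) -> int:
    """Bits that a key sets; salted SHA-256 stands in for h1..hk."""
    out = 0
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        out |= 1 << (int.from_bytes(digest, "big") % M)
    return out


def paths_from(node: ET.Element, max_edges: int):
    """Yield label paths that start at node and go down at most max_edges edges."""
    yield [node.tag]
    if max_edges > 0:
        for child in node:
            for tail in paths_from(child, max_edges - 1):
                yield [node.tag] + tail


def build_dbf(root: ET.Element, max_edges: int) -> list[int]:
    """Return [DBF0, ..., DBFmax_edges]; DBFi holds all paths of length i."""
    filters = [0] * (max_edges + 1)
    for node in root.iter():
        for labels in paths_from(node, max_edges):
            filters[len(labels) - 1] |= mask("/".join(labels))
    return filters


def may_match(filters: list[int], path: str) -> bool:
    """Assumed lookup: every window of the longest stored length must be present."""
    labels = path.split("/")
    width = min(len(labels), len(filters))  # window size, in nodes
    windows = ["/".join(labels[i:i + width])
               for i in range(len(labels) - width + 1)]
    return all((mask(w) & filters[width - 1]) == mask(w) for w in windows)
```

Hashing whole paths keeps the parent-child relation that the per-level filters lose, at the price of more keys per filter, which is consistent with the later remark that depth Blooms need more space.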
Preliminary performance results
• Both outperform (in terms of false positives) a simple Bloom filter of the same size
• Depth (path) Blooms are very sensitive to the number of levels
• Depth (path) Blooms need more space
• Updates are handled efficiently (only the corresponding vectors are updated)
Distribution
Each site:
local-filter: a Bloom filter for local resources
one or more summary-filters
summary-filter: merge of the Bloom filters of a set X of other sites
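The slide does not say how filters are merged. A common choice, assumed here, is the bitwise OR of filters that share the same size and hash functions, which gives exactly the filter of the union of the underlying sets. The names `mask`, `local_filter`, `merge`, and `might_hold` are hypothetical.

```python
import hashlib

M, K = 256, 3  # assumed: every site uses the same size and hash functions


def mask(resource: str) -> int:
    """Bits that a resource sets; salted SHA-256 stands in for h1..hk."""
    out = 0
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{resource}".encode()).digest()
        out |= 1 << (int.from_bytes(digest, "big") % M)
    return out


def local_filter(resources: list[str]) -> int:
    """Local filter of one site, stored as an int bit vector."""
    bits = 0
    for r in resources:
        bits |= mask(r)
    return bits


def merge(filters: list[int]) -> int:
    """Summary filter of a set X of sites: bitwise OR of their filters."""
    summary = 0
    for f in filters:
        summary |= f
    return summary


def might_hold(filter_bits: int, resource: str) -> bool:
    """True if every bit of the resource is set (possibly a false positive)."""
    return (filter_bits & mask(resource)) == mask(resource)
```

A summary built this way never misses a resource held by one of the merged sites, but it has more bits set than any single local filter, so its false positive rate is higher.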
Horizons
Keep information for up to horizon = d neighbors (as in routing indexes)
A merged-filter for each path: merge of blooms for all sites on the path up to length equal to the horizon
[Figure: a network of sites 1 to 9; the merged filters shown are labeled "Merged of nodes 1, 2", "Merged of nodes 3, 4" and "Merged of nodes 6, 7, 8"]
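One way to read the horizon idea is that a site keeps, for each outgoing link, the merge of the local filters of the sites reachable through that link within d hops. The sketch below follows that reading. It is an assumption, not the DBGlobe protocol: `merged_filter` is a hypothetical name, filters are plain int bit vectors, and merging is a bitwise OR.

```python
def merged_filter(graph: dict[int, list[int]], local: dict[int, int],
                  site: int, neighbor: int, horizon: int) -> int:
    """OR the local filters of sites reached from site via neighbor within horizon hops."""
    merged = 0
    frontier = [neighbor]
    seen = {site, neighbor}
    for _ in range(horizon):
        next_frontier = []
        for s in frontier:
            merged |= local[s]
            for t in graph[s]:
                if t not in seen:  # never walk back toward the origin
                    seen.add(t)
                    next_frontier.append(t)
        frontier = next_frontier
    return merged


# A chain 1 - 2 - 3 - 4 in which every site owns one distinct bit.
graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
local = {1: 0b0001, 2: 0b0010, 3: 0b0100, 4: 0b1000}
print(bin(merged_filter(graph, local, site=1, neighbor=2, horizon=2)))  # 0b110
```

With horizon 2, site 1 summarizes sites 2 and 3 behind its link to site 2, while site 4 stays outside the horizon.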
Hierarchical
[Figure: a hierarchy of sites with root peers 1, 2, 3]
Leaf sites : local filter
Internal sites : summaries for all nodes in their subtrees
Root sites : summaries for other root sites
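A lookup over such a hierarchy could prune every subtree whose summary rules the resource out. The sketch below covers leaf and internal sites only and leaves out the summaries exchanged between root sites. `Site` and `locate` are hypothetical names, filters are plain int bit vectors, and summaries are assumed to be bitwise ORs.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    local: int = 0  # local filter as an int bit vector
    children: list["Site"] = field(default_factory=list)

    def summary(self) -> int:
        """OR of the local filters of every site in this subtree."""
        bits = self.local
        for child in self.children:
            bits |= child.summary()
        return bits


def locate(site: Site, probe: int) -> list[str]:
    """Names of sites whose local filter has all bits of probe set."""
    if (site.summary() & probe) != probe:
        return []  # nothing below this site can match: prune the subtree
    hits = [site.name] if (site.local & probe) == probe else []
    for child in site.children:
        hits += locate(child, probe)
    return hits
```

The summary is recomputed on every call to keep the sketch short; a real site would store it and update it when a descendant changes.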
Future work
• Evaluate distribution strategies
• Other ways of summarizing data (related work on selectivity estimation)
• See how this can be related to ontologies (meaningful path queries)
• See whether/how it can be integrated with querying
Directories :: Resource Location
Data Delivery
Outline
• A survey on different modes to transmit data:
Push/pull
Continuous (periodic)/aperiodic
Multicast/unicast
Directed diffusion (communication only with neighbor nodes)
For the 1st deliverable on the topic
Data Delivery
• The different data delivery modes in DBGlobe
Tradeoffs of using one over the other (e.g., in registering services, directory (location) updates)
To be extended for D10 (Data Delivery and Querying)
For the 1st deliverable on the topic
Data Delivery Modes and Coherence
Focus: how to achieve temporal (currency) and semantic (transaction-based) coherency of data under different modes of data delivery
The Data Broadcast Model
[Figure: a server connected to its clients through a broadcast channel]
• The server broadcasts data from a database to a large number of clients
• push mode + no direct communication with the server
• Data updates at the server
• Periodic updates for the values on the channel
• Efficient way to disseminate information to large client populations with similar interests
• Physical support in wireless networks (satellite, cellular)
• Alternative way of transmitting information for data-intensive applications (e.g., web)
Multiple Versions: Not just one value per item, but k such values [Pitoura&Chrysanthis, IEEE TC 2003]
Temporal and Semantic Coherency (Theory and Protocols) [Pitoura,Chrysanthis&Ramamritham, ICDT03]
Clients must read consistent and current data without contacting the server directly
Currency
Currency Interval of an item x in RS(R), written CI(x, R), is [cb, ce), where cb is the time instance when the value was stored in the database and ce is the time instance of the next change of this value in the database
Overlapping: if the intersection of CI(x, R) over all (x, u) ∈ RS(R) is non-empty, say [cb, ce), the currency of R is equal to ce-; RS(R) is then a subset of an actual database state at the server
Oldest value: otherwise, OV_Currency(R) = ce-, where ce is the smallest among the right limits of the CI(x, R)
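The two cases can be worked through on concrete intervals. The sketch below takes the readset to be "overlapping" when the currency intervals of all its items intersect and "oldest-value" otherwise. That reading, the function name `readset_currency`, and the dictionary layout are assumptions. The function returns the bound ce; the slide's "ce-" is read here as the instant just before it.

```python
import math


def readset_currency(intervals: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Classify a readset and return the bound ce of its currency.

    intervals maps each item x to its currency interval [cb, ce);
    ce may be math.inf if the value read has not changed since.
    """
    latest_begin = max(cb for cb, _ in intervals.values())
    earliest_end = min(ce for _, ce in intervals.values())
    # The intersection of all intervals is [latest_begin, earliest_end),
    # which is non-empty exactly when latest_begin < earliest_end.
    kind = "overlapping" if latest_begin < earliest_end else "oldest-value"
    # In both cases the bound is the smallest right limit.
    return kind, earliest_end


print(readset_currency({"x": (1, 5), "y": (3, 8)}))         # ('overlapping', 5)
print(readset_currency({"x": (1, 3), "y": (4, math.inf)}))  # ('oldest-value', 3)
```

In the first call both values coexisted in the database during [3, 5), so the readset matches an actual database state. In the second call no such instant exists, and the readset is only as current as its oldest value.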
Currency Interval for a set (readset)
Two properties:
Temporal spread (discrepancies among database states)
Temporal lag (how old with regard to some point in time, e.g., T_commit)
Protocols and their properties
Timestamps (versioning)
Invalidation Reports
Propagation
Consistency
Degrees of Consistency
C0
C1 RS(R) ⊆ DS
C2 R serializable with the set of server transactions that read values read (directly or indirectly) by R
C3 R serializable with all server transactions
C4 R serializable with all server transactions, and the serializability order of the server transactions that R observes is consistent with the commit order of transactions at the server
Protocols and their properties
Relation to temporal coherency
Based on broadcasting the serialization graph of the server (or parts of it)
Future Work
Multiple servers model
Applications in sensor networks