Transcript of I2.1: In-Network Storage NS-CTA INARC Meeting 23-24 March 2011 Cambridge, MA.
- Slide 1
- I2.1: In-Network Storage, NS-CTA INARC Meeting, 23-24 March 2011, Cambridge, MA
- Slide 2
- Research Objective: devise techniques to enhance the efficacy of data dissemination and query processing in an information network that is distributed over an unreliable communication network
- Slide 3
- Researchers (category: name, institution/center):
  - Lead: Mudhakar Srivatsa, IBM/INARC
  - Researcher: Tarek Abdelzaher, UIUC/INARC
  - Researcher: Arun Iyengar, IBM/INARC
  - Researcher: Xifeng Yan, UCSB/INARC
  - Collaborator: Guohong Cao, PSU/CNARC
  - Collaborator: Vikas Kawadia, BBN/IRC
- Slide 4
- Task Overview:
  - Data dissemination (Guohong Cao, C2.1): (social-aware) mobility, hybrid networks
  - Provenance dissemination
  - In-network caching (joint work with CNARC)
  - Complex graph query processing on clusters (Xifeng Yan, I2.2)
  - Distributed graph query processing on DTNs
  - Distributed trust computation (Vikas Kawadia, IRC T2.3)
  - [Diagram: tasks grouped under CNARC, I2.1 and INARC]
- Slide 5
- Outline:
  - Provenance dissemination (Srivatsa, Iyengar, PSU/CNARC): in-network storage for provenance queries
  - Diversity-based in-network caching (Tarek, Iyengar, Srivatsa, PSU/CNARC): enhancing the operational capacity of the network
  - Cyclades (Srivatsa, Yan, BBN/IRC): distributed graph query processing
- Slide 6
- Provenance Dissemination: References
  - M. Srivatsa, W. Gao and A. Iyengar. Provenance Driven Data Dissemination in Disruption Tolerant Networks. Under review (Fusion 2011). (IBM/INARC & PSU/CNARC)
  - W. Gao, A. Iyengar, M. Srivatsa and G. Cao. Supporting Cooperative Caching in Disruption Tolerant Networks. In IEEE International Conference on Distributed Computing Systems (ICDCS), 2011. (IBM/INARC & PSU/CNARC)
  - Y. Zhang, W. Gao, G. Cao, T. La Porta, B. Krishnamachari and A. Iyengar. Social-Aware Data Diffusion in Delay Tolerant MANETs. Book chapter to appear in Handbook of Optimization in Complex Networks: Communication and Social Networks, Springer. (IBM/INARC & PSU/CNARC)
  - W. Gao, A. Iyengar and M. Srivatsa. System and Method for Caching Provenance Information. Patent filed. (IBM/INARC & PSU/CNARC)
- Discussions with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC) and Alice Leung (BBN/IRC); our DTN in-network caching code was used to run experiments on traces typical of military scenarios
- Slide 7
- 2010 IBM Corporation. Disruption Tolerant Networks (DTNs):
  - Opportunistic and intermittent network connectivity
  - Low node density
  - Unpredictable node mobility
- Slide 8
- Network Model:
  - Network contact graph at time t: an edge between nodes i and j iff i and j have been in contact before time t
  - Each edge is modeled by a contact process (e.g., a homogeneous Poisson process), so the pairwise inter-contact time is exponentially distributed
  - Parameter: the pairwise contact rate
  - We can therefore predict (probabilistically) when the next contact will happen
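Under the homogeneous Poisson assumption, "predicting the next contact" means predicting a distribution: the contact rate is estimated as contacts per unit time, and the probability that two nodes meet within the next T time units is 1 - exp(-rate * T), regardless of how long ago they last met (memorylessness). A minimal sketch, assuming one pair's contact timestamps are available over a known observation window; the function names are hypothetical.

```python
import math
from typing import Sequence


def estimate_contact_rate(contact_times: Sequence[float], observation_period: float) -> float:
    """Maximum-likelihood rate of a homogeneous Poisson process:
    number of observed contacts divided by the observation window length."""
    if observation_period <= 0:
        raise ValueError("observation_period must be positive")
    return len(contact_times) / observation_period


def prob_contact_within(rate: float, horizon: float) -> float:
    """P(next contact within `horizon`) for exponential inter-contact times.
    By memorylessness this ignores the time elapsed since the last contact."""
    return 1.0 - math.exp(-rate * horizon)


def expected_time_to_next_contact(rate: float) -> float:
    """Mean of an exponential distribution with parameter `rate`."""
    return math.inf if rate == 0 else 1.0 / rate


if __name__ == "__main__":
    # Hypothetical trace: 6 contacts between nodes i and j over 24 hours.
    times = [1.0, 3.5, 7.2, 12.0, 15.4, 21.9]
    lam = estimate_contact_rate(times, observation_period=24.0)  # contacts per hour
    print(f"rate = {lam:.3f}/h, mean wait = {expected_time_to_next_contact(lam):.1f} h")
    print(f"P(contact within 4 h) = {prob_contact_within(lam, 4.0):.3f}")
```

Real traces (e.g., Bluetooth contacts) are often burstier than Poisson, so the exponential model is a convenient approximation rather than a guarantee.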
- Slide 9
- Basic Idea:
  - Semiring provenance model: $d = a \lor (b \land c)$
  - Provenance levels form a partial order: $\{\} < \{b\}, \{c\} < \{a\}, \{b, c\} < \{a, b, c\}$
  - Quantifying the marginal provenance level of a data item $d = f(a_1, \ldots, a_n)$: for $d = a \cdot (b + c)$, $w_d^b = w_d^c = \ldots$; $w_d^a = \ldots$
  - Utility-based data placement: a unified probabilistic framework in which two caching nodes optimize their caches upon contact
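The slide does not spell out how the marginal weights of the sources of d are computed. One possible measure, shown here only as an illustration and not necessarily the one used in the cited work, is a Banzhaf-style count: for each source x, the fraction of subsets S of the other sources such that d is not derivable from S but is derivable from S plus x. The sketch models a provenance expression as a hypothetical Python predicate over the set of available sources.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence

# Hypothetical encoding: a provenance expression says whether data item d
# can be derived from a given set of available sources.
Provenance = Callable[[FrozenSet[str]], bool]


def all_subsets(items: Sequence[str]) -> List[FrozenSet[str]]:
    """Every subset of `items`, including the empty set."""
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def deriving_sets(expr: Provenance, sources: Sequence[str]) -> List[FrozenSet[str]]:
    """Subsets of sources from which d is derivable."""
    return [s for s in all_subsets(sources) if expr(s)]


def marginal_weights(expr: Provenance, sources: Sequence[str]) -> Dict[str, float]:
    """Banzhaf-style weight of each source: the fraction of subsets of the
    other sources for which adding that source makes d derivable."""
    weights = {}
    for x in sources:
        subsets = all_subsets([s for s in sources if s != x])
        pivotal = sum(1 for s in subsets if not expr(s) and expr(s | {x}))
        weights[x] = pivotal / len(subsets)
    return weights


def d_and_or(s: FrozenSet[str]) -> bool:
    """d = a . (b + c): needs a, plus at least one of b or c."""
    return "a" in s and ("b" in s or "c" in s)


def d_or_and(s: FrozenSet[str]) -> bool:
    """d = a v (b ^ c): either a alone, or b and c together."""
    return "a" in s or ("b" in s and "c" in s)


if __name__ == "__main__":
    print(marginal_weights(d_and_or, ["a", "b", "c"]))
    print(sorted(map(sorted, deriving_sets(d_or_and, ["a", "b", "c"]))))
```

Enumerating subsets is exponential in the number of sources, which is acceptable for the small provenance expressions in the example but would need sampling for large ones.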
- Adapt Sedge to DTNs (Sedge component -> DTN counterpart):
  - Master vertex -> partition map, network contact graph, routing table (opportunistic path)
  - Worker -> message queue, overlapped partition (cache)
  - Superstep = time slot (e.g., 1 min, 1 hour, 1 day)
  - [Figure: contact graph whose six nodes host partitions P1 to P6, with cluster connections]
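The "routing table (opportunistic path)" entry can be illustrated under the network model's assumptions: if hop k of a multi-hop path has exponential inter-contact time with rate rates[k] and a message is forwarded at the first contact on each hop, the end-to-end delay is a sum of independent exponentials (a hypoexponential distribution). For distinct rates its CDF has a closed form, which gives the probability of delivery before a deadline. This is a sketch under those assumptions (distinct positive rates, transmission time ignored); the function names are hypothetical and the metric may differ from the one in the cited ICDCS 2011 paper.

```python
import math
from typing import Dict, Sequence


def path_delivery_probability(rates: Sequence[float], deadline: float) -> float:
    """P(a message traverses an r-hop opportunistic path within `deadline`).

    Hypoexponential CDF for distinct rates:
        F(T) = 1 - sum_k C_k * exp(-rates[k] * T),
        C_k  = prod_{j != k} rates[j] / (rates[j] - rates[k]).
    """
    if any(r <= 0 for r in rates):
        raise ValueError("rates must be positive")
    if len(set(rates)) != len(rates):
        raise ValueError("closed form requires distinct rates")
    if deadline <= 0:
        return 0.0
    survival = 0.0
    for k, lam_k in enumerate(rates):
        coeff = 1.0
        for j, lam_j in enumerate(rates):
            if j != k:
                coeff *= lam_j / (lam_j - lam_k)
        survival += coeff * math.exp(-lam_k * deadline)
    return 1.0 - survival


def best_path(paths: Dict[str, Sequence[float]], deadline: float) -> str:
    """Pick the candidate path (name -> per-hop contact rates) most likely to deliver in time."""
    return max(paths, key=lambda name: path_delivery_probability(paths[name], deadline))


if __name__ == "__main__":
    # Hypothetical routing-table entries: a slow direct hop vs. a faster two-hop relay.
    candidates = {"direct": [0.1], "relay": [1.0, 2.0]}
    print(best_path(candidates, deadline=1.0))
```

When two hop rates are nearly equal the coefficients C_k become large and cancel, so a numerically robust version would fall back to numerical convolution.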
- Slide 23
- Preliminary Experiments: Running Sedge on DTNs
  - Data: Web graph with 30M vertices and 150M edges
  - Number of partitions = number of nodes in the contact graph
  - Contact graphs: (1) complete contact graph with 12 nodes; (2) MIT Bluetooth contact trace with 9 nodes over 24 hours (http://crawdad.cs.dartmouth.edu)
  - Assumption: pairwise contacts follow a Poisson process; more sophisticated contact models are also supported by Sedge
  - Query: h-step random walk query
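The h-step random walk query is not specified further on the slide. A plausible minimal version, assuming an unweighted adjacency-list graph and a uniform choice among neighbors, is sketched below (names hypothetical). In a Sedge/Pregel-style execution, a step whose endpoints lie in different partitions would need a message exchanged at a superstep boundary, so under the DTN mapping it would have to wait for a contact; the second helper counts such steps.

```python
import random
from typing import Dict, List, Optional, Sequence


def random_walk(graph: Dict[int, Sequence[int]], h: int, start: Optional[int] = None,
                rng: Optional[random.Random] = None) -> List[int]:
    """Return the vertices visited by an h-step uniform random walk.

    If `start` is None the walk begins at a uniformly random vertex ("random start").
    The walk stops early at a vertex with no outgoing edges.
    """
    rng = rng or random.Random()
    current = start if start is not None else rng.choice(list(graph))
    walk = [current]
    for _ in range(h):
        neighbors = graph.get(current, ())
        if not neighbors:
            break
        current = rng.choice(list(neighbors))
        walk.append(current)
    return walk


def partition_crossings(walk: Sequence[int], partition_of: Dict[int, int]) -> int:
    """Number of walk steps whose endpoints lie in different partitions."""
    return sum(1 for u, v in zip(walk, walk[1:]) if partition_of[u] != partition_of[v])


if __name__ == "__main__":
    g = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    w = random_walk(g, h=5, rng=random.Random(42))
    print(w, partition_crossings(w, {0: 0, 1: 0, 2: 1}))
```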
- Slide 24
- Complete Contact Graph: random walk (RW) query with a random start vertex [Figure: edge weight w(e) = average contacts per superstep]
- Slide 25
- MIT Bluetooth Contact (24 hours) [Figure: contact graph over nodes 1 to 9; edge weight w(e) = average contacts per hour]
- Slide 26
- Cyclades: Cyclades is Sedge for DTNs
  - BSP model: a synch corresponds to a contact in the DTN
  - Naively running Sedge on DTNs may be inefficient (e.g., it requires a large number of contacts, hence high query latency)
  - Revisit graph query processing under DTN constraints: machines have high data rates to each other whenever they are in contact, but contact opportunities may be rare
  - Cyclades innovations:
    - New metric: minimize the number of contacts (synchs in BSP)
    - Algorithms based on intra-machine speculative execution and inter-machine opportunistic aggregation
  - We present results on computing shortest-path-based metrics (e.g., node betweenness)
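Why the number of contacts drives latency can be made concrete under a simplifying assumption (for illustration only): every synch waits for a contact between one fixed pair of machines whose contacts form a Poisson process with rate $\lambda$, and computation and transfer time are negligible compared with the waiting time. By memorylessness each wait is $\mathrm{Exp}(\lambda)$, so a query that needs $k$ synchs has latency

```latex
L_k = \sum_{s=1}^{k} X_s,\qquad X_s \overset{\text{iid}}{\sim} \mathrm{Exp}(\lambda)
\;\Longrightarrow\; L_k \sim \mathrm{Erlang}(k, \lambda),\qquad
\mathbb{E}[L_k] = \frac{k}{\lambda},\qquad
\Pr[L_k \le T] = 1 - \sum_{n=0}^{k-1} e^{-\lambda T}\,\frac{(\lambda T)^n}{n!}.
```

For example, with $\lambda = 0.25$ contacts per hour, a plan needing 10 synchs waits 40 hours on average, versus 8 hours for a plan needing 2 synchs; latency grows linearly in $k$, which motivates the "minimize the number of contacts" metric.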
- Slide 27
- Cyclades initial results:
  - Dataset: DBLP information network with 1.2M papers
  - Each paper is a node; two nodes share an undirected edge if the papers have a common author
  - The weight of an edge is the inverse of the Jaccard distance between the author lists of the two papers
  - Problem: compute node betweenness centrality
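The edge construction can be sketched as follows, assuming each paper is given as a set of author names. Jaccard distance is 1 - |A ∩ B| / |A ∪ B|; two papers with identical author lists have distance 0, so the inverse is undefined for them, and this sketch caps the weight with a hypothetical `max_weight` parameter. Grouping papers by author avoids comparing all pairs of the 1.2M papers, though prolific authors still generate many pairs.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |a & b| / |a | b| for non-empty author sets."""
    return 1.0 - len(a & b) / len(a | b)


def build_coauthor_graph(papers: Dict[str, FrozenSet[str]],
                         max_weight: float = 1e6) -> Dict[Tuple[str, str], float]:
    """Undirected edges between papers that share at least one author.

    Weight = 1 / Jaccard distance of the author lists, capped at `max_weight`
    (used when the lists are identical). Keys are (p, q) with p < q.
    """
    by_author: Dict[str, Set[str]] = defaultdict(set)
    for pid, authors in papers.items():
        for author in authors:
            by_author[author].add(pid)

    edges: Dict[Tuple[str, str], float] = {}
    for pids in by_author.values():
        for p, q in combinations(sorted(pids), 2):
            if (p, q) in edges:  # already added via another shared author
                continue
            d = jaccard_distance(papers[p], papers[q])
            edges[(p, q)] = max_weight if d == 0 else min(max_weight, 1.0 / d)
    return edges
```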
- Slide 28
- Cyclades initial results: Approach
  - Partition nodes into clusters; a perimeter node is a node that has an edge to a node in another cluster
  - Intra-cluster: compute the all-pair shortest path matrix $M$ between perimeter nodes (using only edges within the partition)
  - Inter-cluster: on an opportunistic contact between two partitions $i$ and $j$, merge the all-pair shortest path matrices $M_i$ and $M_j$ into $M_{ij}$
  - Guarantee: $M_{ij}$ is the all-pair shortest path matrix on perimeter nodes in the merger of partitions $i$ and $j$
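One way to realize the merge step, sketched under stated assumptions (matrices stored as dicts keyed by node pairs, undirected weights, and the cross edges between the two partitions known at contact time): run Floyd-Warshall on the small graph whose nodes are the two perimeter sets and whose edges are the entries of $M_i$, the entries of $M_j$, and the i-j cross edges. Any shortest path between perimeter nodes in the merged partition alternates between intra-partition segments, and each segment starts and ends at a perimeter node (a path endpoint or a cross-edge endpoint), so these three edge sets suffice. This is an illustration of the stated guarantee, not the Cyclades implementation; `merge_apsp` is a hypothetical name.

```python
import math
from typing import Dict, Iterable, Set, Tuple

# (u, v) -> shortest distance; a missing key means "no path known" (infinity).
Matrix = Dict[Tuple[str, str], float]


def merge_apsp(m_i: Matrix, m_j: Matrix,
               cross_edges: Iterable[Tuple[str, str, float]]) -> Matrix:
    """Merge perimeter-node APSP matrices of partitions i and j.

    m_i / m_j: shortest distances between perimeter nodes using only edges
    inside partition i / j. cross_edges: (u, v, w) edges between i and j.
    """
    dist: Matrix = {}
    nodes: Set[str] = set()

    def add_edge(u: str, v: str, w: float) -> None:
        # The graph is undirected, so record each entry in both directions.
        nodes.update((u, v))
        for a, b in ((u, v), (v, u)):
            if w < dist.get((a, b), math.inf):
                dist[(a, b)] = w

    for m in (m_i, m_j):
        for (u, v), w in m.items():
            add_edge(u, v, w)
    for u, v, w in cross_edges:
        add_edge(u, v, w)
    for n in nodes:
        dist[(n, n)] = 0.0

    # Floyd-Warshall over the union of perimeter nodes (k is the outer loop).
    order = sorted(nodes)
    for k in order:
        for a in order:
            d_ak = dist.get((a, k), math.inf)
            if d_ak == math.inf:
                continue
            for b in order:
                alt = d_ak + dist.get((k, b), math.inf)
                if alt < dist.get((a, b), math.inf):
                    dist[(a, b)] = alt
    return dist
```

Nodes whose only external edges ran between i and j are no longer perimeter nodes of the merged partition; they can be dropped from the result afterwards to keep the matrix small.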
- Slide 29
- Cyclades initial results: clustering on DBLP data with 100k and 200k nodes; shows the number of perimeter nodes and edges, and the maximum size of the clustered partitions
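Given a node-to-cluster assignment, statistics of the kind reported here can be computed directly from the perimeter-node definition (a node with at least one edge into another cluster). The clustering algorithm itself is not described, so this sketch takes the assignment as input, and it interprets "edges" as inter-cluster (cut) edges, which is an assumption.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple


def partition_stats(edges: Iterable[Tuple[Hashable, Hashable]],
                    cluster_of: Dict[Hashable, int]) -> Dict[str, int]:
    """Count perimeter nodes, inter-cluster edges and the largest cluster size."""
    perimeter: Set[Hashable] = set()
    cut_edges = 0
    for u, v in edges:
        if cluster_of[u] != cluster_of[v]:
            cut_edges += 1
            perimeter.update((u, v))  # both endpoints touch another cluster
    sizes = Counter(cluster_of.values())
    return {
        "perimeter_nodes": len(perimeter),
        "cut_edges": cut_edges,
        "max_partition_size": max(sizes.values(), default=0),
    }
```

Fewer perimeter nodes means smaller matrices $M_i$ to store and merge, which is why the clustering quality matters for the approach.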
- Slide 30
- Cyclades initial results: comparison of four shortest path algorithms
  - Centralized (Dijkstra's algorithm)
  - Pregel random (Pregel with random partitioning)
  - Pregel cluster (Pregel with node clustering)
  - Cyclades (requires the provably minimum number of contacts in the BSP model; guarantees on communication and computation cost are being explored)
- Slide 31
- Cyclades initial results: number of synch operations and communication cost between partitions
  - Each synch operation requires at least one DTN contact
  - Since contacts in DTNs are rare, we have to minimize the number of synchs to ensure low query latency
- Slide 32
- Cyclades initial results: the improved communication cost of our approach comes at the price of higher computation and storage cost; our approach trades off the number of synch operations (query latency) against computation and storage cost
- Slide 33
- Cyclades initial results: computing node betweenness centrality with three sampling strategies
  - Randomly chosen u-v pairs
  - Random walk sampling (picks nodes with high degree)
  - Expansion sampling (greedily picks nodes with maximum expansion $|N(S)|/|S|$)
  - The figure shows the accuracy of node betweenness versus the number of samples
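Expansion sampling can be sketched as a greedy loop, assuming an undirected adjacency-set graph, with N(S) taken as the neighbors of S that are outside S. Because |S| grows by one at every step, maximizing |N(S)|/|S| of the enlarged sample is the same as picking the candidate that yields the largest new neighborhood. Restricting candidates to the current frontier, the seed node, and breaking ties by smallest node id are assumptions of this sketch, not details from the slide.

```python
from typing import Dict, List, Set


def neighborhood(graph: Dict[int, Set[int]], s: Set[int]) -> Set[int]:
    """N(S): nodes adjacent to S that are not themselves in S."""
    out: Set[int] = set()
    for u in s:
        out |= graph[u]
    return out - s


def expansion_sample(graph: Dict[int, Set[int]], seed: int, k: int) -> List[int]:
    """Greedily grow a sample of up to k nodes, each time adding the frontier
    node that maximizes |N(S)| / |S| for the enlarged sample S."""
    sample = [seed]
    s = {seed}
    frontier = neighborhood(graph, s)
    while len(sample) < k and frontier:
        def new_size(v: int) -> int:
            return len(neighborhood(graph, s | {v}))
        best = max(sorted(frontier), key=new_size)  # first max = smallest id on ties
        sample.append(best)
        s.add(best)
        frontier = neighborhood(graph, s)
    return sample
```

Recomputing N(S) for every candidate is quadratic and meant for clarity; an incremental version would track, per frontier node, how many of its neighbors are still unseen.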
- Slide 34
- Military and Network Science Relevance
  - Tactical military networks: intermittent connectivity, multiple modalities of communication, unreliable communication
    - Need disruption-tolerant data dissemination and query answering
    - Need the ability to control tradeoffs between quality and performance
  - Enhancing trust in decision making: provenance dissemination and distributed trust computation
  - Joint analysis of communication and information networks to enhance the efficacy of information delivery and query processing
  - Examine a spectrum of expressiveness of information network models
- Slide 35
- Path Ahead
  - In-network analytics and query answering on DTNs (with CNARC)
  - Examine diversity-based (partial) redundancy elimination mechanisms; quantify tradeoffs between quality and performance of query answering
  - Characterize graph query processing algorithms that benefit from hierarchical decomposition and/or speculative execution
  - Determine better graph partitioning, sampling and clustering strategies to enhance query processing
- Slide 36
- Collaborations
  - Within task: numerous telecons; Xifeng's student (UCSB) -> IBM (summer 2011) to work on distributed graph query processing in DTNs
  - Within INARC:
    - I2.2: provide a query execution interface for the DTN (battlefield) context; modify the distributed information network processing platform for DTNs
    - I1: I2.1 provides a scenario and data set for investigating QoI metrics for data pools (in I1.2) that maximize quality of fusion; I2.1 offers a test scenario for algorithmic advances in I1.1 that focus on improving inference in resource-constrained networks
  - With CNARC (Guohong Cao, PSU C2.1): research interest in DTNs and in-network storage; Guohong's student (PSU/CNARC) -> IBM/INARC (summer 2010 & 2011) to work on delta encoding in DTNs
  - With IRC (Vikas Kawadia, BBN T2.3): distributed trust computation over in-network storage
- Slide 37
- Impact
  - Publications:
    - M. Srivatsa, W. Gao and A. Iyengar. Provenance Driven Data Dissemination in Disruption Tolerant Networks. Under submission, Fusion 2011
    - W. Gao, A. Iyengar, M. Srivatsa and G. Cao. Supporting Cooperative Caching in Disruption Tolerant Networks. In ICDCS 2011
    - Y. Zhang, W. Gao, G. Cao, T. La Porta, B. Krishnamachari and A. Iyengar. Social-Aware Data Diffusion in Delay Tolerant MANETs. Book chapter to appear in Handbook of Optimization in Complex Networks: Communication and Social Networks, Springer
    - W. Gao, A. Iyengar and M. Srivatsa. System and Method for Caching Provenance Information. Patent filed
    - F. Le, M. Srivatsa, A. Iyengar and G. Cao. Resolving Negative Interferences between In-Network Caching Methods. Under preparation
  - Demo/Transitions:
    - Md Y. S. Uddin, Guo-Jun Qi, Tarek Abdelzaher and Guohong Cao. PhotoNet: A Similarity-aware Image Delivery Service for Situation Awareness. IPSN Demo, April 2011
    - Collaborations with Robert Cole (CERDEC), John Hancock (ArtisTech/IRC), Matthew Aguirre (ArtisTech/IRC) and Alice Leung (BBN/IRC); our DTN in-network caching code was used to run experiments on traces typical of military scenarios
- Slide 38
- Questions? Contact: Mudhakar Srivatsa (msrivats@us.ibm.com)