Data Centric Storage using GHT Lecture 13 October 14, 2004 EENG 460a / CPSC 436 / ENAS 960 Networked...
Data Centric Storage using GHT
Lecture 13
October 14, 2004
EENG 460a / CPSC 436 / ENAS 960 Networked Embedded Systems & Sensor Networks
Andreas Savvides
[email protected]
Office: AKW 212
Tel 432-1275
Course Website
http://www.eng.yale.edu/enalab/courses/eeng460a
Data Centric Storage in Sensornets with GHT
S. Ratnasamy, B. Karp, S. Shenker, D. Estrin, R. Govindan, L. Yin and F. Yu
MONET Special Issue on Sensor Networks, August 2003
Overview
Data Centric Storage
– Data is stored inside the network; each name corresponds to a location in space
– All data with the same name will be stored at the same sensor network location
– E.g. an elephant sighting
Why Data Centric Storage?
– Energy efficiency
– Robustness against mobility and node failures
– Scalability
Keywords and Terminology
Observation
♦ Low-level readings from sensors
♦ e.g. detailed temperature readings
Events
♦ Predefined constellations of low-level observations
♦ e.g. temperature greater than 75 F
Queries
♦ Used to elicit information from the sensor network
Performance Metric: Total Usage / Hotspot Usage
Use communication as a cost function for energy consumption
Total Usage
– Total number of packets sent in the sensor network
Hotspot Usage
– The maximal number of packets sent by a particular sensor node
Costs used in the evaluation
– Message flooding cost: O(n)
– Point-to-point routing cost: O(√n)
– n is the number of nodes
Alternative Storage Schemes
External Storage (ES)
– Events propagated and stored at an external location
Local Storage (LS)
– Events stored locally at the detecting node
– Queries are flooded to all nodes and the events are sent back
Data Centric Storage (DCS)
– Data for an event stored within the sensor network
– Queries are directed to the node that stores the data
Local Storage (LS)
[Figure: each event and its data stay at the detecting node; queries are flooded to all the nodes]
Why do we need DCS?
Scalability
Robustness against node failures and node mobility
To achieve energy efficiency
Design Criteria: Scalability & Robustness
Node failures
Topology changes
System scales to a large number of nodes
Energy constraints
Persistence
– A (k,v) pair must remain available to queries, despite sensor node failures and changes in sensor network topology
Consistency
– A query for k must be routed correctly to a node where the (k,v) pairs are stored; if these nodes change, they should do so consistently
Scaling in database size
Topological generality: the system should scale well on a large number of topologies
Assumptions in DCS
Large-scale networks whose approximate geographic boundaries are known
Nodes have short-range communication and are within the radio range of several other nodes
Nodes know their own locations by GPS or some localization scheme
Communication to the outside world takes place through one or more access points
Data Centric Storage
Relevant data are stored by "name" at nodes within the sensor network
All data with the same general name will be stored at the same sensor-net node
– e.g. "elephant sightings"
Queries for data with a particular name are then sent directly to the node storing those named data
Geographic Hash Table
Events are named with keys, and both the storage and the retrieval are performed using keys
GHT provides a (key, value) based associative memory
Geographic Hash Table Operations
GHT supports two operations
♦ Put(k,v): stores v (observed data) according to the key k
♦ Get(k): retrieves whatever value is associated with key k
Hash function
♦ Hashes the key into geographic coordinates
♦ Put() and Get() operations on the same key k hash k to the same location
Storing Data in GHT
[Figure: Put("elephant", data) is routed to location (12,24), since Hash("elephant") = (12,24). Source: lass.cs.umass.edu]
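A minimal sketch of the hashing step, to make the idea concrete. The slides do not say which hash function GHT uses; SHA-256, the function name geo_hash, and the width/height parameters are all assumptions made for illustration. The only property relied on is the one stated above: the same key always maps to the same location inside the known network boundary.

```python
import hashlib


def geo_hash(key, width, height):
    """Map a key to a deterministic (x, y) point inside a width x height field.

    Hypothetical helper: SHA-256 is an assumed choice, not GHT's actual hash.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Treat two disjoint 8-byte slices of the digest as two integers.
    hx = int.from_bytes(digest[:8], "big")
    hy = int.from_bytes(digest[8:16], "big")
    # Scale each integer from [0, 2**64) onto the field dimensions.
    x = hx / 2**64 * width
    y = hy / 2**64 * height
    return (x, y)


if __name__ == "__main__":
    # Put("elephant", ...) and Get("elephant") would both target this point.
    print(geo_hash("elephant", 100.0, 100.0))
```

Because the mapping depends only on the key and the field size, any node can compute the target location locally without a directory service.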
Algorithms Used By GHT
Geographic Hash Table uses GPSR (Greedy Perimeter Stateless Routing) for routing
Peer-to-peer lookup system (a data object is associated with a key, and each node in the system is responsible for storing a certain range of keys)
Algorithm (Contd)
GPSR: packets are marked with the position of their destination, and each node is aware of its own position
Greedy forwarding algorithm
Perimeter forwarding algorithm
[Figure: two diagrams showing nodes A and B]
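Greedy forwarding can be sketched in a few lines. This is an illustrative simplification, not GPSR's implementation: positions are plain (x, y) tuples, and the function name greedy_next_hop is hypothetical. A node forwards to the neighbor closest to the destination, but only if that neighbor is strictly closer than the node itself; otherwise the packet has hit a local maximum and GPSR would fall back to perimeter forwarding.

```python
import math


def greedy_next_hop(current, neighbors, dest):
    """Return the neighbor closest to dest if it improves on current, else None.

    current, dest: (x, y) tuples. neighbors: iterable of (x, y) tuples.
    """
    best = min(neighbors, key=lambda p: math.dist(p, dest), default=None)
    if best is None or math.dist(best, dest) >= math.dist(current, dest):
        # No neighbor makes progress: greedy mode fails here,
        # and the packet would enter perimeter mode.
        return None
    return best


if __name__ == "__main__":
    print(greedy_next_hop((0, 0), [(1, 0), (0, 1)], (5, 0)))
```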
GPSR: Right-Hand Rule in Perimeter Forwarding
[Figure: nodes x, y, z with edges numbered 1, 2, 3]
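The right-hand rule can be stated as: when a packet arrives at node x from node y, the next edge taken is the next one counterclockwise about x from the edge (x, y). The sketch below illustrates only that selection step under simplifying assumptions: positions are (x, y) tuples, the neighbor list is assumed to come from an already planarized graph (which GPSR requires and this code does not compute), and the name right_hand_next_hop is hypothetical.

```python
import math


def right_hand_next_hop(x, prev, neighbors):
    """Pick the first neighbor counterclockwise about x from the edge (x, prev)."""
    base = math.atan2(prev[1] - x[1], prev[0] - x[0])

    def ccw_turn(p):
        angle = math.atan2(p[1] - x[1], p[0] - x[0])
        delta = (angle - base) % (2 * math.pi)
        # The incoming edge itself counts as a full turn, so the packet
        # goes back the way it came only when nothing else is available.
        return delta if delta > 1e-12 else 2 * math.pi

    return min(neighbors, key=ccw_turn)


if __name__ == "__main__":
    # Arriving at the origin from the east, the next edge is the one to the north.
    print(right_hand_next_hop((0, 0), (1, 0), [(1, 0), (0, 1), (-1, 0), (0, -1)]))
```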
Home Node and Home Perimeter
Home node: node geographically nearest to the destination coordinates of the packet
– Serves as the rendezvous point for Get() and Put() operations on the same key
In GHT a packet is not addressed to a specific node but only to a specific location
– Use GPSR to find the home node
– Use only the perimeter mode of GPSR to find the home perimeter
Home perimeter: perimeter that encloses the destination
– Start from the home node, and use perimeter mode to make a cycle and return to the home node
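A toy model of the rendezvous idea, assuming a global view of node positions. In real GHT the home node is discovered by GPSR routing, not by a central search; the class ToyGHT, its methods, and the injected hash_fn are hypothetical names used only to show that Put() and Get() on the same key meet at the node nearest the hashed location.

```python
import math


class ToyGHT:
    """Centralised stand-in for GHT: stores each key at the node nearest its hash."""

    def __init__(self, node_positions, hash_fn):
        # One key-value store per node, indexed by the node's (x, y) position.
        self.store = {pos: {} for pos in node_positions}
        self.hash_fn = hash_fn  # maps a key to an (x, y) location

    def home_node(self, key):
        target = self.hash_fn(key)
        return min(self.store, key=lambda pos: math.dist(pos, target))

    def put(self, key, value):
        self.store[self.home_node(key)].setdefault(key, []).append(value)

    def get(self, key):
        return list(self.store[self.home_node(key)].get(key, []))


if __name__ == "__main__":
    # Reuse the location from the slide figure: Hash("elephant") = (12, 24).
    ght = ToyGHT([(0, 0), (10, 20), (30, 30)], lambda key: (12, 24))
    ght.put("elephant", "sighting at waterhole")
    print(ght.home_node("elephant"), ght.get("elephant"))
```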
Problems
Robustness could be affected
» Nodes could move (identity of the home node?)
» Node failures can occur
» Deployment of new nodes
Not scalable
» Storage capacity of the home nodes
» Bottleneck at home nodes
Solutions to the Problems
Perimeter refresh protocol
– Mostly addresses the robustness issue
Structured replication
– Addresses the scalability issue
– How to handle storage of many events
Perimeter Refresh Protocol
Replicates stored data for key k at nodes around the location to which k hashes
– Stores a copy of the key-value pair at each node on the home perimeter
– Each node on the perimeter is called a replica node
How do you ensure consistency & persistence?
– A node becomes the home node if a packet for a particular key arrives at that node
The perimeter refresh protocol periodically sends out refresh packets
– After a time period Th, generate a refresh packet that contains the data for that key
– Packet is forwarded on the home perimeter in the same way as Get() and Put()
– The refresh packet will take a tour of the home perimeter regardless of the changes in the network topology since the key's insertion
– This property maintains the perimeter
Perimeter Refresh Protocol
How do you guard against node failures?
– When a replica node receives a refresh packet it did not originate, it caches the data in the refresh and sets up a takeover timer Tt
– Timer is reset each time a refresh from another node arrives
– If the timer expires, the replica node initiates a refresh packet addressed to the key's hashed location
Note: that particular node does not determine a new home node. The GHT routing causes the refresh to reach a new home node
Perimeter Refresh Protocol
[Figure: nodes A to F around location L; the home node and the replica nodes are marked]
Assume key k hashes at location L
A is closest to L so it becomes the home node
Perimeter Refresh Protocol
[Figure: nodes B to F around location L; the new home node and the replica nodes are marked]
Suppose the node A dies
Time Specifications
Refresh time (Th)
Takeover time (Tt)
Death time (Td)
General rule
Td > Th and Tt > Th
In GHT, Td = 3Th and Tt = 2Th
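The timer relationships can be illustrated with a small bookkeeping class for one key held at one replica node. This is a sketch under stated assumptions: the class name ReplicaTimers and its methods are hypothetical, time is passed in explicitly instead of using real clocks, and the death timer is assumed to mean "discard the cached copy if no refresh from another node has been seen for Td", which the slides do not spell out.

```python
class ReplicaTimers:
    """Tracks takeover and death deadlines for one key at a replica node."""

    def __init__(self, th, now=0.0):
        self.tt = 2 * th  # takeover time, Tt = 2Th
        self.td = 3 * th  # death time, Td = 3Th
        self.last_refresh = now

    def on_refresh(self, now):
        # A refresh from another node arrived: both timers restart.
        self.last_refresh = now

    def takeover_due(self, now):
        # No refresh for Tt: this replica should send its own refresh packet
        # addressed to the key's hashed location.
        return now - self.last_refresh >= self.tt

    def data_expired(self, now):
        # No refresh for Td: assumed here to mean the cached copy is dropped.
        return now - self.last_refresh >= self.td


if __name__ == "__main__":
    timers = ReplicaTimers(th=10.0)
    print(timers.takeover_due(15.0), timers.takeover_due(20.0), timers.data_expired(30.0))
```

With Tt = 2Th, a single lost refresh does not trigger a takeover; a replica reacts only after two consecutive refresh intervals pass in silence.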
Characteristics of Refresh Packet
Refresh packet is addressed to the hashed location of the key
Every Th seconds the home node will generate a refresh packet
Refresh packet contains the data stored for the key and is routed exactly as Get() and Put() operations
Refresh packet always travels along the home perimeter
Structured Replication
If too many events are detected, the home node will become the hotspot of communication
Structured replication is used to address the scaling problem
Hierarchical decomposition of the key space
– Event names have a certain hierarchy depth
Structured Replication
A node that detects a new event stores that event at its closest mirror
– This is easily computable
This reduces the storage cost, but increases the query cost
GHT has to route the queries to all mirror nodes
– Queries are routed recursively
First route the query to the root, then to the first-level mirrors and then to the second-level mirrors
Structured replication becomes more useful for frequently detected events
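One way to see why the closest mirror is easily computable: assume, following the GHT paper's description, that a hierarchy depth d splits the field into 4^d equal cells and places one mirror in each cell at the same relative offset as the root location. The slides do not give this layout, so treat it, along with the function names mirror_points and closest_mirror, as assumptions for illustration.

```python
import math


def mirror_points(root, width, height, depth):
    """Return the 4**depth mirror locations (root included) for a hashed root point."""
    cells = 2 ** depth  # cells per axis
    cw, ch = width / cells, height / cells
    # Offset of the root inside its own cell, reused in every other cell.
    ox, oy = root[0] % cw, root[1] % ch
    return [(i * cw + ox, j * ch + oy) for i in range(cells) for j in range(cells)]


def closest_mirror(node_pos, root, width, height, depth):
    """Mirror location nearest to the detecting node; events are stored there."""
    mirrors = mirror_points(root, width, height, depth)
    return min(mirrors, key=lambda m: math.dist(m, node_pos))


if __name__ == "__main__":
    print(mirror_points((3, 5), 100, 100, 1))
    print(closest_mirror((90, 90), (3, 5), 100, 100, 1))
```

Storage traffic shrinks because each event travels only to a nearby mirror, while a query must visit every mirror, which is the trade-off stated on the slide.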
Evaluation
Simulation to test if the protocol is functioning correctly
Done in the ns-2 network simulator using an IEEE 802.11 MAC
– This is a well-known event-driven simulator for ad-hoc networks
Larger-scale simulations for the comparative study were done with a custom simulator
Comparative Study
Simulation compares the following schemes
– External Storage (ES)
– Local Storage (LS)
– Normal DCS: a query returns a separate message for each detected event
– Summarized DCS (S-DCS): a query returns a single message regardless of the number of detected events
– Structured Replication DCS (SR-DCS): assuming an optimal level of SR
Comparison based on cost
Comparison based on total usage and hot-spot usage
Assumptions in Comparison
Asymptotic costs of O(n) for floods and O(√n) for point-to-point routing
Event locations are distributed randomly
Event locations are not known in advance
No more than one query for each event type (Q queries in total)
Assume access points to be the most heavily used area of the sensor network
Comparison based on Hot-spot/Total Usage
n: number of nodes
T: number of event types
Q: number of event types queried for
D_total: total number of detected events
D_Q: number of detected events for queries
DCS Types
Normal DCS: query returns a separate message for each detected event
Summarized DCS: query returns a single message regardless of the number of detected events (usually summary is preferred)
Comparison Study (contd.)
            ES             LS               DCS
Total       D_total·√n     Q·n + D_Q·√n     Q·√n + D_total·√n + D_Q·√n
                                            Q·√n + D_total·√n + Q·√n (summary)
Hot spot    D_total        Q + D_Q          Q + D_Q
                                            2Q (summary)
Observations from the Comparison
DCS is preferable only in cases where
Sensor network is large
There are many detected events and not all event types are queried
D_total >> max(D_Q, Q)
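To get a feel for the comparison, the asymptotic costs can be evaluated for concrete numbers. The sketch below assumes O(n) per flood and O(√n) per point-to-point message with all constants dropped, and writes out the per-scheme expressions as given in the GHT paper's analysis; the function name usage_costs and the example parameter values are illustrative only, not simulation results.

```python
import math


def usage_costs(n, q, d_total, d_q):
    """Return (total, hotspot) message-count estimates per storage scheme.

    n: nodes, q: event types queried for, d_total: detected events,
    d_q: detected events for the queried types. Constants are ignored.
    """
    r = math.sqrt(n)  # cost of one point-to-point message
    total = {
        "ES": d_total * r,
        "LS": q * n + d_q * r,
        "DCS": q * r + d_total * r + d_q * r,
        "S-DCS": q * r + d_total * r + q * r,
    }
    hotspot = {
        "ES": d_total,
        "LS": q + d_q,
        "DCS": q + d_q,
        "S-DCS": 2 * q,
    }
    return total, hotspot


if __name__ == "__main__":
    total, hotspot = usage_costs(n=10000, q=10, d_total=1000, d_q=100)
    for scheme in total:
        print(scheme, total[scheme], hotspot[scheme])
```

With many detected events and few queried types, the hotspot term for ES grows with D_total while the DCS variants depend only on Q and D_Q, which is the condition D_total >> max(D_Q, Q) at work.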
Simulations
To check the robustness of GHT
To compare the storage methods in terms of total and hot-spot usage
Simulation Setup
ns-2
Node density: 1 node / 256 m²
Radio range: 40 m
Number of nodes: 50, 100, 150, 200
Mobility rate: 0, 0.1, 1 m/s
Query generation rate: 2 qps
Event types: 20
Events detected: 10/type
Refresh interval: 10 s
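A quick back-of-the-envelope check of what these parameters imply. The sketch assumes a square field with nodes placed uniformly at random and ignores edge effects; neither assumption is stated on the slide, and the helper names are hypothetical.

```python
import math


def field_side(n, area_per_node=256.0):
    """Side length in metres of a square field holding n nodes at the given density."""
    return math.sqrt(n * area_per_node)


def expected_neighbors(radio_range=40.0, area_per_node=256.0):
    """Average number of nodes inside one radio disc, ignoring field edges."""
    return math.pi * radio_range ** 2 / area_per_node


if __name__ == "__main__":
    for n in (50, 100, 150, 200):
        print(n, round(field_side(n), 1))
    print(round(expected_neighbors(), 2))
```

Under these assumptions each node has on the order of twenty neighbors, consistent with the earlier assumption that nodes are within radio range of several other nodes.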
Performance Metrics
Availability of stored data to queriers (in terms of success rate)
Load placed on the nodes participating in GHT (hotspot usage)
Simulation Results for Robustness
GHT offers perfect availability of stored events in the static case
It offers high availability when nodes are subjected to mobility and failures
Simulation Results under Varying Q
[Figure: number of nodes is constant = 10000]
Simulation Results for Comparison of 3 Storage Methods
S-DCS has low hot-spot usage under varying Q
S-DCS has the lowest hot-spot usage under varying n
Conclusion
Data centric storage entails naming of data and storing data at nodes within the sensor network
GHT hashes the keys (events) into geographic coordinates and stores a key-value pair at the sensor node geographically nearest to the hash
GHT uses the perimeter refresh protocol and structured replication to enhance robustness and scalability
DCS is useful in large sensor networks where there are many detected events but not all event types are queried
References
Deepak Ganesan, Deborah Estrin, John Heidemann, Dimensions: why do we need a new data handling architecture for sensor networks?, ACM SIGCOMM Computer Communication Review, Volume 33 Issue 1, January 2003
Scott Shenker, Sylvia Ratnasamy, Brad Karp, Ramesh Govindan, Deborah Estrin, Data-centric storage in sensornets, ACM SIGCOMM Computer Communication Review, Volume 33 Issue 1, January 2003
Sylvia Ratnasamy, Brad Karp, Scott Shenker, Deborah Estrin, Ramesh Govindan, Li Yin, Fang Yu, Data-centric storage in sensornets with GHT, a geographic hash table, Mobile Networks and Applications, Volume 8 Issue 4, August 2003
Chalermek Intanagonwiwat, Ramesh Govindan, Deborah Estrin, John Heidemann, Fabio Silva, Directed diffusion for wireless sensor networking, IEEE/ACM Transactions on Networking (TON), Volume 11, February 2003
R. Govindan, J. M. Hellerstein, W. Hong, S. Madden, M. Franklin, S. Shenker, The Sensor Network as a Database, USC Technical Report No. 02-771, September 2002