Topological Data Analysis and Network Coverage EE122 Project, Spring 2014 Rey Blume, Eric Chu


Page 1: Topological Data Analysis and Network Coverageweb.media.mit.edu/~echu/assets/projects/tda/122_proj_pres.pdf · •most work fell into one of two groups - approaches that utilized

Topological Data Analysis and Network Coverage

EE122 Project, Spring 2014

Rey Blume, Eric Chu


TDA: Motivation and Intro

• http://www.coloquios.info/ponencias/MBGT-TopologicalDataAnalysis.pdf

• Algebraic topology

• Increasing amount of data produced: high-dimensional, large volumes

• Qualitative information helps make sense of the data and its structure

• Metrics and coordinates used to calculate statistical values (means, distances, etc.) are often unjustified, e.g. in biological problems

• Clustering algorithms are brittle to the choice of epsilon

• Why topology?

– Study of qualitative geometric information, connectivity

– Less sensitive to the actual choice of metric; coordinate-free

– Functoriality (inclusion maps between spaces, in our case complexes, allow us to draw conclusions at the global level from local pieces)

• TDA: Applications


Basic Approach

Ex. Torus


Algebraic Topology: Topology

• ‘Rubber-sheet geometry’

• Continuity

• Different levels of ‘sameness’

– Homeomorphism, homotopy equivalence

– Homology computationally tractable

– Ex.


Simplicial Complexes

• Def. A triangulation of a space X is a simplicial complex K together with a homeomorphism |K| -> X

– Choice of triangulation doesn’t matter

– Therefore, can go in reverse: point-cloud -> complex -> space

• A simplicial complex K is a set of simplices where

– Any face of a simplex in K is also in K

– Intersection of any two simplices i, j in K is either empty or a face of both i and j


Simplicial Complexes

• Cech Complex

– Intersection of epsilon/2 balls = edge

– Too computationally intensive for any n-simplex with n > 1

• Therefore, Rips Complex

– Pair-wise computations: add an edge if distance < epsilon

– Add higher-dimensional simplices whenever possible (all of their faces have been added)

– Not homotopy equivalent to the cover of the set, but seems to work reasonably well
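The Rips construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the project (which relied on Javaplex): the function name `rips_complex`, the `max_dim` parameter, and the unit-square example are all assumptions made for the sketch.

```python
from itertools import combinations
import math

def rips_complex(points, epsilon, max_dim=2):
    """Return a Vietoris-Rips complex as a list of vertex-index tuples.

    Hypothetical helper for illustration; max_dim caps the simplex dimension.
    """
    n = len(points)
    # Pair-wise step: an edge joins i and j when their distance is below epsilon.
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) < epsilon
    }
    simplices = [(i,) for i in range(n)] + sorted(close)
    # Higher simplices: add a vertex set as soon as every pair in it is an edge.
    for size in range(3, max_dim + 2):  # size = number of vertices
        for verts in combinations(range(n), size):
            if all(pair in close for pair in combinations(verts, 2)):
                simplices.append(verts)
    return simplices

# Corners of a unit square: sides have length 1, diagonals about 1.414.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
small = rips_complex(square, epsilon=1.1)  # sides only, so a loop survives
large = rips_complex(square, epsilon=1.5)  # diagonals appear, triangles fill in
```

Checking every vertex subset is exponential in general, which is one reason the complex can become a bottleneck as epsilon grows.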


Rips Complex


Homology Groups

• Groups

– Set of elements with an operation that satisfies closure, associativity, identity element, inverse element

– Ex. Integers with addition; integers with multiplication are NOT a group (no inverses); symmetry group (e.g. square with rotations)

• Chain groups

– k-chain = sum of oriented k-simplices

– C_k (K) := kth chain group, set of all k-chains

– Relate chain groups of successive dimensions through the boundary operator

• Alternating sum of its faces
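The slide's formula did not survive extraction. The standard textbook form of the boundary operator on an oriented k-simplex is given here, with vertex names v_0, ..., v_k chosen for illustration; the hat marks the vertex that is omitted.

```latex
\partial_k [v_0, \dots, v_k] \;=\; \sum_{i=0}^{k} (-1)^i \, [v_0, \dots, \hat{v}_i, \dots, v_k]

% Example for a triangle:
\partial_2 [a, b, c] \;=\; [b, c] - [a, c] + [a, b]

% Applying the operator twice always gives zero:
\partial_{k-1} \circ \partial_k \;=\; 0
```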


Homology Groups

• Chain complex C*

• kth cycle group

• kth boundary group

• Homology groups from chain groups

– “cycles mod boundaries” - boundaries become identity element in new group

– Cycles that aren’t boundaries are holes
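The definitions on this slide were images and are missing from the transcript. In standard notation (the symbols Z_k, B_k, and H_k are the usual textbook choices, assumed here rather than taken from the slides), they read:

```latex
% Chain complex
\cdots \xrightarrow{\;\partial_{k+1}\;} C_k \xrightarrow{\;\partial_k\;} C_{k-1} \xrightarrow{\;\partial_{k-1}\;} \cdots

% kth cycle group and kth boundary group
Z_k = \ker \partial_k, \qquad B_k = \operatorname{im} \partial_{k+1}

% B_k \subseteq Z_k because \partial_k \circ \partial_{k+1} = 0, so the quotient is defined
H_k = Z_k / B_k
```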


Betti Numbers

• Rank of nth homology group = nth Betti number

• Computation of rank is just linear algebra given simplices

– Rank-Nullity Thm.

• Euler-Poincare Formula

– Alternating sum of Betti numbers equals the Euler characteristic (topological invariant)

• Betti-0 = # connected components

• Betti-1 = # loops

• Betti-2 = # cavities

• Ex. Circle, Torus, Solid Sphere
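As a rough illustration of "rank is just linear algebra", the sketch below computes Betti numbers from boundary matrices with NumPy, using rank-nullity. The function name `betti_numbers`, the dictionary layout of the boundary maps, and the triangle example are assumptions for this sketch; ranks are taken over the reals, which gives the Betti numbers but ignores torsion.

```python
import numpy as np

def betti_numbers(num_simplices, boundaries):
    """num_simplices[k]: number of k-simplices.
    boundaries[k]: matrix of the boundary map C_k -> C_{k-1}, for k >= 1.
    Hypothetical helper for illustration only.
    """
    rank = {k: int(np.linalg.matrix_rank(m)) for k, m in boundaries.items()}
    betti = []
    for k, n_k in enumerate(num_simplices):
        nullity = n_k - rank.get(k, 0)              # dim of kth cycle group (rank-nullity)
        betti.append(nullity - rank.get(k + 1, 0))  # subtract dim of kth boundary group
    return betti

# Vertices a, b, c (rows); edges ab, ac, bc (columns), each oriented low -> high.
d1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]])
# Boundary of the triangle abc is ab - ac + bc.
d2 = np.array([[1], [-1], [1]])

circle = betti_numbers([3, 3], {1: d1})           # hollow triangle
disk = betti_numbers([3, 3, 1], {1: d1, 2: d2})   # filled triangle
```

The hollow triangle has one component and one loop; filling it in kills the loop. Both cases agree with the Euler-Poincare formula: 3 - 3 = 1 - 1 and 3 - 3 + 1 = 1 - 0 + 0.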


Persistent Homology and Barcodes

• Want to ignore noise, capture features that persist

• Increasing epsilon creates different complexes over time

• Barcodes are simply a neat way of capturing that information

– Current research includes statistical methods to analyze barcodes
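To make the idea of features that persist concrete, here is a minimal sketch of the Betti-0 barcode of a Rips filtration, computed with a union-find structure. It covers connected components only (no loops or cavities), and the function name `h0_barcode` and the four-point example are assumptions for illustration.

```python
from itertools import combinations
import math

def h0_barcode(points):
    """Return (birth, death) bars for connected components as epsilon grows.

    Hypothetical helper: every point is born at epsilon = 0, and a bar dies
    when its component merges into another one.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Process edges in order of increasing length, i.e. increasing epsilon.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for eps, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # two components merge: one bar ends here
            parent[ri] = rj
            bars.append((0.0, eps))
    bars.append((0.0, math.inf))  # the last component never dies
    return bars

# Two well-separated pairs of points on a line.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
bars = h0_barcode(cloud)
```

In this example two short bars end at epsilon = 1 and one long bar ends at epsilon = 9, so the barcode suggests two clusters over a wide range of scales.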


Persistent Homology and Barcodes


Network Coverage: Problem, Classical Solutions

• Most work fell into one of two groups: approaches that used geometric analysis to obtain an exact answer, and those that sought a non-deterministic approximation but assumed significant capabilities of the sensors.

• The former approach requires a great deal of prior knowledge about the geometry of the domain and the exact location of the sensors, or at least exact distances for every pair of sensors. The latter does not require this exactness, but often requires a uniform distribution of nodes or a high level of intelligence in the sensors.

• http://www.elizabethmunch.com/math/research/ElizabethMunch-TimeVaryingPersistence.pdf


Network Coverage: Why TDA?, Applications

• Ghrist

– GPS can be unattractive due to cost, power consumption, and accuracy limitations

• Coverage problem can be solved if we have:

– Exact knowledge of the coverage area shape,

– Exact knowledge of each sensor’s position, and

– Centralized information gathering and processing

• But, using TDA, can solve even if we have:

– Unknown coverage area shape

– Crude proximity information

– Centralized information gathering and processing (still need this one)

• Topology gives global information from local inputs


Network Coverage: Why TDA?, Applications

• Especially applicable to ad-hoc networks, an active area of research that is entering public usage

– new iPhone mesh network functionality

– Egypt, openmeshnetwork

• Robotic sensors


Simulation: Intel Lab Data

• http://db.csail.mit.edu/labdata/labdata.html

• 54 sensors in the Intel Berkeley Research Lab collecting humidity, temperature, light, etc.

• Computations done using Javaplex and Matlab


Simulation: Intel Lab Data, Euclidean Position

Max_filtration = 100, num_divisions = 100, vietoris rips


Simulation: Intel Lab Data, Euclidean Position

Max_filtration = 100, num_divisions = 50, vietoris rips

Smaller num_divisions = larger gap between homology calculations = lose some granularity


Simulation: Intel Lab Data, Euclidean Position

Max_filtration = 100, num_divisions = 10, vietoris rips


Simulation: Intel Lab Data, Complexity

• Rips complex construction can be a bottleneck

– # of simplices for Intel example

• Witness/Lazy-witness creates far fewer simplices than Rips

– Landmark points

• 1) random sampling of landmark points L

• 2) greedy inductive selection process called sequential maxmin

• Formal definitions of each
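The greedy selection named above, sequential maxmin, can be sketched as follows: start from one landmark, then repeatedly add the point that is farthest from all landmarks chosen so far. This is a minimal sketch under assumptions: the function name `sequential_maxmin` and the `first` parameter (a fixed starting index, where a random one would also be reasonable) are illustrative, and this is not Javaplex's implementation.

```python
import math

def sequential_maxmin(points, num_landmarks, first=0):
    """Return indices of landmark points chosen by greedy maxmin.

    Hypothetical helper; `first` is the index of the initial landmark.
    """
    landmarks = [first]
    # min_dist[i] = distance from point i to its nearest landmark so far.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(landmarks) < num_landmarks:
        # Pick the point that maximizes the distance to its nearest landmark.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        landmarks.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return landmarks

line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
chosen = sequential_maxmin(line, num_landmarks=3)
```

Compared with random sampling, maxmin tends to spread landmarks evenly over the data, at the cost of favoring outliers.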


Simulation: Intel Lab Data, Complexity


Simulation: Intel Lab Data, Complexity

• Results: # simplices, run-time for each complex under different parameters

• Num_divisions = 50

– Rips: t = 21.2005 s, num_simplices = 342540

– Witness; Lazy Witness (t in s, num_simplices):

• 20 L pts: 0.9828, 6195; 0.6864, 6195

• 30 L pts: 1.2792, 31930; 1.0920, 31930

• 40 L pts: 3.9156, 102090; 3.6816, 102090
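One observation about these counts, offered as an assumption rather than something the slides state: each reported number equals the number of simplices of dimension 0 through 3 in a complete complex on that many vertices (54 sensors for Rips, or the number of landmark points for the witness complexes). The short check below only verifies that arithmetic; the function name and the `max_dim=3` cutoff are hypothetical.

```python
from math import comb

def complete_complex_size(num_vertices, max_dim=3):
    """Number of simplices of dimension 0..max_dim on num_vertices vertices
    when every possible simplex is present (hypothetical helper)."""
    # A k-simplex is a choice of k + 1 vertices.
    return sum(comb(num_vertices, k + 1) for k in range(max_dim + 1))

sizes = {n: complete_complex_size(n) for n in (20, 30, 40, 54)}
```

If that reading is right, the counts grow roughly like n^4 / 24, which would explain why reducing 54 points to 20 landmarks shrinks the complex by a factor of about 55.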


Witness with only 20 landmark pts is still largely accurate


Simulation: Intel Lab Data, Connectivity Data

• Data: probability that sensor A will be able to talk to sensor B

– Asymmetric

• Create 1-simplex if P(A,B) > thres && P(B,A) > thres

• Create 2-simplex if all directional pairs in triplet > thres

• Global connectivity data from local data

• Studies show poor correlation between distance and signal anyway
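The thresholding rules above can be sketched directly on a matrix of link probabilities. This is an illustrative sketch only: the function name `connectivity_complex`, the nested-list layout `prob[a][b]` for P(A,B), and the 3-sensor matrix are assumptions, not the project's code or the Intel Lab data.

```python
from itertools import combinations

def connectivity_complex(prob, thres):
    """Build edges and triangles from an asymmetric probability matrix.

    prob[a][b] is the probability that sensor a can talk to sensor b
    (hypothetical layout for illustration).
    """
    n = len(prob)

    def linked(a, b):
        # Both directions must clear the threshold.
        return prob[a][b] > thres and prob[b][a] > thres

    edges = [(a, b) for a, b in combinations(range(n), 2) if linked(a, b)]
    edge_set = set(edges)
    # A triangle needs all three of its pairs linked in both directions.
    triangles = [
        t for t in combinations(range(n), 3)
        if all(pair in edge_set for pair in combinations(t, 2))
    ]
    return edges, triangles

# Made-up probabilities: sensor 2 rarely reaches sensor 0.
prob = [
    [1.0, 0.9, 0.8],
    [0.7, 1.0, 0.6],
    [0.2, 0.9, 1.0],
]
strict_edges, strict_tris = connectivity_complex(prob, thres=0.5)
loose_edges, loose_tris = connectivity_complex(prob, thres=0.1)
```

No distances are used anywhere, which matches the point that the complex is built from local connectivity alone.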


Threshold = 0.05, 3, 0, 2026


Threshold = 0.3, 3, 349

Threshold = 0.1, 3, 2, 981


Threshold = 0.3, 4, 118

Threshold = 0.3, 4, 7, 96


Threshold = 0.5, 11, 18

Threshold = 0.5, 11, 1, 3


Threshold = 0.7, 46, 0


Another Dataset/Moving Network?

• http://crawdad.cs.dartmouth.edu/all-byname.html

– Dataset of mobility traces of taxi cabs in San Francisco, USA. !!

– Dataset of WiFi-based connectivity between basestations and vehicles in urban settings.

– Dataset of received signal strength indication (RSSI) collected from within an indoor office building.

– Dataset of coverage and performance-related information of MetroFi, an 802.11x municipal wireless mesh network in Portland, Oregon in 2007. !!

– Dataset consisting of measurements from two different wireless mesh network testbeds (802.11g and 802.11a).

• http://www.wings.cs.sunysb.edu/wiki/doku.php?id=mutli-channel-dataset


Future Research

• ns-3, mobile ad-hoc network

– Google Loon, Facebook Drones

• Distributed homology calculation

• Pursuit-evasion problem: Betti-0 = 0 over time