Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.
Persistent Homology in Topological Data Analysis
Ben Fraser
May 27, 2015
Data Analysis
Suppose we start with some point cloud data, and want to extract meaningful information from it
We may want to visualize the data to do so, by plotting it on a graph
However, in higher dimensions, visualization becomes difficult
A possible solution: dimensionality reduction
Principal Component Analysis
Essentially, fits an ellipsoid to the data, where each of its axes corresponds to a principal component
The smaller axes are those along which the data has less variance
We could discard these less important principal components to reduce the dimensionality of the data while retaining as much of the variance as possible
The reduced data may then be easier to graph, for example to identify clusters
Principal Component Analysis
Done by computing the singular value decomposition of X (each row is a point, each column a dimension):
Then a truncated score matrix, where L is the number of principal components we retain:
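The formulas from this slide did not survive in the transcript. As a rough sketch only, assuming the standard formulation (centre X, factor it as X = U Σ Wᵀ, and keep the first L columns of the scores T = U Σ = X W), the computation might look like the following in NumPy. The function name `pca_scores` and the synthetic data are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def pca_scores(X, L):
    """Project the rows of X onto the first L principal components."""
    Xc = X - X.mean(axis=0)                              # centre each column (dimension)
    U, s, Wt = np.linalg.svd(Xc, full_matrices=False)    # Xc = U @ diag(s) @ Wt
    T_L = U[:, :L] * s[:L]                               # truncated scores, equal to Xc @ Wt[:L].T
    explained = s**2 / np.sum(s**2)                      # fraction of variance per component
    return T_L, explained

rng = np.random.default_rng(0)
# Synthetic 3-dim data lying close to a 2-dim plane (an assumption for this demo).
A = rng.normal(size=(200, 2))
X = np.column_stack([A[:, 0], A[:, 1], 0.01 * rng.normal(size=200)])
T, explained = pca_scores(X, L=2)
```

Here `explained` gives a quick way to judge how many components to retain: the discarded third component carries almost none of the variance in this synthetic data.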
Principal Component Analysis
8-dim data → 2-dim to locate clusters (figure omitted)
Principal Component Analysis
3-dim → 2-dim collapses cylinder to circle (figure omitted)
Principal Component Analysis
Scale sensitive! The same transformation produces a poor result on data of the same shape but a different scale
Data Analysis
One weakness of PCA is its sensitivity to the scale of the data
Also, it provides no information about the shape of our data
We want something insensitive to scale which can identify shape (why?)
Because “data has shape, and shape has meaning” - Ayasdi (Gunnar Carlsson)
Topological Data Analysis
Constructs higher-dimensional structure on our point cloud via simplicial complexes
Then analyze this family of nested complexes with persistent homology
Display Betti numbers in graph form
Essentially, we approximate the shape of the data by building a graph on it and considering cliques as higher dimensional objects, and counting the cycles of such objects.
Algorithm
Since scale doesn't matter in this analysis, we can normalize the data
Also, since we don't want to work with the entire data set (especially if it is very large), we want to choose a subset of the data to work with
We would ideally like this subset to be representative of the original data (but how?)
This process is called landmarking
Landmarking
The method used here is minMax
Start by computing a distance matrix D
Then choose a random point $l_1$ to add to the subset of landmarks L
Then choose each subsequent i-th point to add as that which has maximum distance from the landmark it is closest to:
$l_i = p$ such that $\mathrm{dist}(p, L) = \max\{\mathrm{dist}(x, L) \;\forall\, x \in X\}$
$\mathrm{dist}(x, L) = \min\{\mathrm{dist}(x, l) \;\forall\, l \in L\}$
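The minMax rule can be sketched in a few lines. This is an illustrative version only, assuming a precomputed full distance matrix `D`; the function name `minmax_landmarks` and its `first` and `seed` parameters are hypothetical conveniences, not part of the talk.

```python
import numpy as np

def minmax_landmarks(D, n_landmarks, first=None, seed=0):
    """Pick landmark indices from a full distance matrix D by the minMax rule."""
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    if first is None:
        first = int(rng.integers(n))          # random first landmark l_1
    landmarks = [first]
    dist_to_L = D[first].copy()               # dist(x, L) for every point x
    while len(landmarks) < n_landmarks:
        p = int(np.argmax(dist_to_L))         # point farthest from its nearest landmark
        landmarks.append(p)
        dist_to_L = np.minimum(dist_to_L, D[p])   # update the minimum over the enlarged L
    return landmarks

# Demo: points on a line at 0, 1, 2, 3 and 10 (the last one is an outlier).
pts = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
D = np.abs(pts[:, None] - pts[None, :])
chosen = minmax_landmarks(D, 3, first=0)      # → [0, 4, 3]
```

In this toy run the outlier at 10 is picked immediately after the first landmark, which shows how strongly the rule favours points far from everything else.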
Landmarking
Landmarking is not an exact science, however: on certain types of data the method just used may result in a subset very unrepresentative of the original data. For example: (figure omitted)
Algorithm
As long as outliers are ignored, however, the method used works well to pick points as spread out as possible among the data
Next we keep only the distance matrix between the landmark points, and normalize it
This is all the information we need from the data: the actual position of the points is irrelevant, all we need are the distances between the landmarks, on which we will construct a neighbourhood graph
Neighbourhood Graph
Our goal is to create a nested sequence of graphs. To be precise, we add a single edge at a time, between the points x, y ∈ L for which dist(x,y) is the smallest value remaining in D. We then replace that distance in D with 1 (the maximum of the normalized matrix), so that the same pair is not selected again.
At each iteration of adding an edge, we keep track of r = dist(x,y), r ∈ [0,1]: this is our proximity parameter, and will be important when we graph the Betti numbers later.
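One way to sketch the edge-adding order is shown below. Instead of repeatedly overwriting entries of D, this version sorts all landmark pairs once by normalized distance, which yields the same sequence of edges. The function name `edge_filtration` is an assumption made for illustration.

```python
import numpy as np

def edge_filtration(D_landmarks):
    """Return edges (r, i, j) sorted by normalized distance r in [0, 1]."""
    D = np.asarray(D_landmarks, dtype=float)
    D = D / D.max()                               # normalize so the largest distance is 1
    i, j = np.triu_indices(D.shape[0], k=1)       # each unordered pair exactly once
    order = np.argsort(D[i, j], kind="stable")    # smallest distance first
    return [(float(D[i[k], j[k]]), int(i[k]), int(j[k])) for k in order]

D = np.array([[0.0, 1.0, 4.0],
              [1.0, 0.0, 2.0],
              [4.0, 2.0, 0.0]])
edges = edge_filtration(D)
# → [(0.25, 0, 1), (0.5, 1, 2), (1.0, 0, 2)]
```

The graph at iteration t is then simply the first t entries of `edges`, and the first component of the t-th entry is the proximity parameter r at that step.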
Witness Complex
Def: A point x is a weak witness to a p-simplex $(a_0, a_1, \ldots, a_p)$ in A if $|x - a| < |x - b|$ $\forall\, a \in (a_0, a_1, \ldots, a_p)$ and $b \in A \setminus (a_0, a_1, \ldots, a_p)$
Def: A point x is a strong witness to a p-simplex $(a_0, a_1, \ldots, a_p)$ in A if x is a weak witness and additionally $|x - a_0| = |x - a_1| = \ldots = |x - a_p|$
The requirement may be added that an edge is only added between two points if there exists a weak witness to that edge.
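For an edge, the weak-witness condition says that some data point has the two endpoints as its two strictly nearest landmarks. A minimal sketch of that check, assuming Euclidean distance and arrays of coordinates (the function name `has_weak_witness` is hypothetical), could be:

```python
import numpy as np

def has_weak_witness(points, landmarks, i, j):
    """True if some row of `points` is a weak witness to the edge (i, j)."""
    # d[x, a] = Euclidean distance from data point x to landmark a
    d = np.linalg.norm(points[:, None, :] - landmarks[None, :, :], axis=2)
    others = np.ones(landmarks.shape[0], dtype=bool)
    others[[i, j]] = False                    # landmarks outside the edge
    if not others.any():
        return points.shape[0] > 0            # no other landmarks: condition holds vacuously
    far = np.maximum(d[:, i], d[:, j])        # distance to the farther endpoint
    nearest_other = d[:, others].min(axis=1)  # distance to the closest non-endpoint
    return bool(np.any(far < nearest_other))

landmarks = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
points = np.array([[0.5, 0.1], [3.0, 0.0]])
near_edge = has_weak_witness(points, landmarks, 0, 1)   # → True
long_edge = has_weak_witness(points, landmarks, 0, 2)   # → False
```

The long edge between landmarks 0 and 2 is rejected because every data point has landmark 1 closer than at least one of the endpoints.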
Simplicial Complexes
Next we want to construct higher dimensional structure on the neighbourhood graph: called a simplicial complex
A simplex is a point, edge, triangle, tetrahedron, etc. (a k-simplex is a (k+1)-clique in the graph)
A face of a simplex is a sub-simplex of it
A simplicial k-complex is a set S of simplices, each of dimension ≤ k, such that a face of any simplex in S is also in S, and the intersection of any two simplices is a face of both of them
Simplicial Complexes
At each iteration, we add an edge: all we need to do is see if that creates any new k-simplices
The edge itself adds a single 1-simplex to the complex
A k-simplex (k ≥ 2) is formed if the intersection of the neighbourhoods of the vertices of a (k-2)-simplex contains the two points in the added edge
In other words, if every point in a (k-2)-simplex is joined to the two points in the edge, then together they form a k-simplex
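The update rule can be sketched as follows: after adding the edge, look at the vertices joined to both endpoints, and every clique of k-1 such vertices (a (k-2)-simplex) combines with the edge into a new k-simplex. The adjacency-set representation and the name `new_simplices` are assumptions made for this illustration.

```python
from itertools import combinations

def new_simplices(adj, u, v, max_dim=2):
    """Add edge (u, v) to the graph and list the simplices it creates."""
    adj[u].add(v)
    adj[v].add(u)
    common = sorted(adj[u] & adj[v])          # vertices joined to both endpoints
    created = {1: [tuple(sorted((u, v)))]}    # the edge itself is one new 1-simplex
    for k in range(2, max_dim + 1):
        created[k] = []
        # a (k-2)-simplex has k-1 vertices; it must be a clique inside `common`
        for sigma in combinations(common, k - 1):
            if all(b in adj[a] for a, b in combinations(sigma, 2)):
                created[k].append(tuple(sorted(sigma + (u, v))))
    return created

# Graph with edges 01, 02, 03, 12, 23; adding 13 completes a tetrahedron.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
created = new_simplices(adj, 1, 3, max_dim=3)
# → {1: [(1, 3)], 2: [(0, 1, 3), (1, 2, 3)], 3: [(0, 1, 2, 3)]}
```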
Boundary Matrices
Next we compute boundary matrices. Essentially, these store the information that (k-1)-simplices are faces of certain k-simplices
For instance, in a simplicial complex with 100 triangles (2-simplices) and 50 tetrahedra (3-simplices), the 3rd boundary matrix has 100 rows and 50 columns, with zeros everywhere except where the given triangle is a face of the given tetrahedron, where it is 1.
Boundary Matrices
At each iteration, we need only add rows of zeros to the kth boundary matrix for each (k-1)-simplex that was formed, since the only k-simplices they could possibly be faces of are those new ones which were formed at this iteration
Then add columns for each of these new k-simplices, and fill them with 0s and 1s by finding their faces (one of which is guaranteed to be one of the new (k-1)-simplices)
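As an illustration, a 0/1 boundary matrix can be filled in by listing the faces of each k-simplex. This sketch builds the whole matrix at once rather than incrementally, and assumes simplices are stored as sorted tuples of vertex indices; the name `boundary_matrix` is hypothetical.

```python
import numpy as np
from itertools import combinations

def boundary_matrix(faces, simplices):
    """0/1 boundary matrix: rows are (k-1)-simplices, columns are k-simplices."""
    row = {f: r for r, f in enumerate(faces)}
    B = np.zeros((len(faces), len(simplices)), dtype=np.uint8)
    for c, s in enumerate(simplices):
        for f in combinations(s, len(s) - 1):   # drop one vertex at a time
            B[row[f], c] = 1                    # face f lies on the boundary of s
    return B

edges = [(0, 1), (0, 2), (1, 2), (1, 3)]
triangles = [(0, 1, 2)]
B2 = boundary_matrix(edges, triangles)
# B2 has 4 rows and 1 column; the column is [1, 1, 1, 0]
```

The edge (1, 3) gets a row of zeros because it is not a face of any triangle, which matches the "rows of zeros" step in the incremental update.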
Betti Numbers
The kth Betti numbers are based on the connectivity of the k-dimensional simplicial complexes
The kth Betti number is defined as the rank of the kth homology group, $H_k(X) = \ker(\mathrm{bd}_k) / \mathrm{im}(\mathrm{bd}_{k+1})$
In lower dimensions, can be understood as the number of k-dimensional holes
Betti0 – number of connected components
Betti1 – number of holes
Betti2 – number of voids
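With 0/1 boundary matrices, the kth Betti number can be computed as (number of k-simplices) minus rank(bd_k) minus rank(bd_{k+1}), with all ranks taken mod 2. The following is a small sketch under that assumption; `gf2_rank` and `betti_numbers` are illustrative names, and the elimination is written for clarity rather than speed.

```python
import numpy as np

def gf2_rank(M):
    """Rank of a 0/1 matrix over the field with two elements."""
    M = np.array(M, dtype=np.uint8) % 2
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        pivots = np.nonzero(M[rank:, c])[0]
        if pivots.size == 0:
            continue                          # no pivot in this column
        p = rank + pivots[0]
        M[[rank, p]] = M[[p, rank]]           # move the pivot row into place
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]               # clear the column by XOR (addition mod 2)
        rank += 1
        if rank == rows:
            break
    return rank

def betti_numbers(n_vertices, boundaries):
    """boundaries[k-1] is the matrix bd_k; returns [Betti0, Betti1, ...]."""
    counts = [n_vertices] + [B.shape[1] for B in boundaries]
    ranks = [0] + [gf2_rank(B) for B in boundaries] + [0]   # bd_0 = 0; nothing above the top
    return [counts[k] - ranks[k] - ranks[k + 1] for k in range(len(counts))]

# Triangle with vertices 0, 1, 2 and edges 01, 02, 12.
bd1 = np.array([[1, 1, 0],
                [1, 0, 1],
                [0, 1, 1]], dtype=np.uint8)
bd2 = np.array([[1], [1], [1]], dtype=np.uint8)
hollow = betti_numbers(3, [bd1])        # → [1, 1]: one component, one hole
filled = betti_numbers(3, [bd1, bd2])   # → [1, 0, 0]: filling the triangle closes the hole
```

Note that the last entry is only reliable if the complex really has no simplices above the top dimension supplied; in a truncated complex it can overcount.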
Persistent Homology
Why must we compute the Betti numbers across a range of the proximity parameter r?
Because at low values of r, the points may be too disconnected to see any meaningful structure, and likewise at high values we are approaching a complete graph, also not useful
Persistent Homology
However, the solution is not to “guess” an intermediate value of r whose corresponding simplicial complex best approximates the shape of the data
Indeed, as seen in the previous example, features may briefly appear at some value of r only to disappear within a few edge-adding iterations
So, the idea is to see which features “persist”, as they are more likely to accurately represent the shape of the data
Example: Circle
Choose 3200 points uniformly from the circumference of a circle
From these, choose a landmark subset of 26 points
Iteratively add one edge, compute the simplicial 2-complex, boundary matrices, and Betti numbers
Plot the Betti numbers against the proximity parameter
Example: Circle
As expected, we find a single hole in the data, and it persists across a wide range of r values. The graph has 1 component
Example: Circle
The important information is the lifetime of a feature, which can be displayed in a persistence diagram/interval graph/barcode (figure omitted)
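A simplified version of the circle experiment can be reproduced in a few lines. This sketch departs from the slides in one labeled way: it uses 26 evenly spaced landmarks directly instead of sampling 3200 points and landmarking them, so that the result is deterministic. The helper names `gf2_rank` and `betti_at` are assumptions, and the complex is rebuilt from scratch at each r rather than updated edge by edge.

```python
import numpy as np
from itertools import combinations

def gf2_rank(M):
    """Rank of a 0/1 matrix over the field with two elements."""
    M = np.array(M, dtype=np.uint8) % 2
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        pivots = np.nonzero(M[rank:, c])[0]
        if pivots.size == 0:
            continue
        p = rank + pivots[0]
        M[[rank, p]] = M[[p, rank]]           # move the pivot row into place
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]               # eliminate with XOR (addition mod 2)
        rank += 1
        if rank == rows:
            break
    return rank

def betti_at(points, r):
    """Betti0 and Betti1 of the clique 2-complex with normalized edge length <= r."""
    n = len(points)
    D = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    D = D / D.max()                           # proximity parameter lives in [0, 1]
    edges = [e for e in combinations(range(n), 2) if D[e] <= r]
    eidx = {e: k for k, e in enumerate(edges)}
    tris = [t for t in combinations(range(n), 3)
            if all(e in eidx for e in combinations(t, 2))]
    bd1 = np.zeros((n, len(edges)), dtype=np.uint8)
    for k, (i, j) in enumerate(edges):
        bd1[i, k] = bd1[j, k] = 1             # an edge has its two endpoints as faces
    bd2 = np.zeros((len(edges), len(tris)), dtype=np.uint8)
    for k, t in enumerate(tris):
        for e in combinations(t, 2):
            bd2[eidx[e], k] = 1               # a triangle has its three edges as faces
    r1, r2 = gf2_rank(bd1), gf2_rank(bd2)
    return n - r1, len(edges) - r1 - r2

theta = 2 * np.pi * np.arange(26) / 26
landmarks = np.column_stack([np.cos(theta), np.sin(theta)])
too_small = betti_at(landmarks, 0.05)   # → (26, 0): no edges yet
mid_range = betti_at(landmarks, 0.15)   # → (1, 1): one component, one hole
complete = betti_at(landmarks, 1.0)     # → (1, 0): complete graph, hole filled in
```

Evaluating `betti_at` over a grid of r values and plotting the two counts would give a Betti-number curve of the kind the slides describe.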
Example: Cylinder (figures omitted)
Example: Sphere with 4 voids (figures omitted)
Trial: Lake Monitoring Data
Data was collected from buoys on Lake Nipissing:
Temperature
Specific conductivity
Dissolved oxygen concentration
pH
Chlorophyll (RFU – relative fluorescence units)
Total Algae (RFU)
Trial: Lake Monitoring Data
Sept. 4, 2011, 3-complex, all 6 dimensions (figure omitted)
Trial: Lake Monitoring Data
For higher-dimensional data, it may make more sense to construct higher-dimensional complexes
It may also help to focus our attention on the dimensions that we expect to be more strongly correlated
The next trial constructs a 2-complex on DO concentration, pH, and algae, using a larger set of data from Sept. 4, 2011:
![Page 79: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/79.jpg)
Trial: Lake Monitoring Data
![Page 80: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/80.jpg)
Trial: Lake Monitoring Data
3-complex on Sept. 2, 2011 data:
![Page 81: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/81.jpg)
Trial: Lake Monitoring Data
Every combination of data dimensions and complex dimension tried so far has failed to reveal any significant features in the shape of the data
![Page 82: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/82.jpg)
Trial: Lake Monitoring Data
Every combination of data dimensions and complex dimension tried so far has failed to reveal any significant features in the shape of the data
Combining data sets from different times of year might result in greater variation in the data, and a greater chance of patterns being found
![Page 83: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/83.jpg)
Summary
Construct a filtration of a simplicial complex on our data by building a sequence of neighbourhood graphs across an interval of the proximity parameter
![Page 84: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/84.jpg)
Summary
Construct a filtration of a simplicial complex on our data by building a sequence of neighbourhood graphs across an interval of the proximity parameter
Plot Betti numbers against this proximity parameter
![Page 85: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/85.jpg)
Summary
Construct a filtration of a simplicial complex on our data by building a sequence of neighbourhood graphs across an interval of the proximity parameter
Plot Betti numbers against this proximity parameter
Features that persist longer are more likely to represent the shape of the data
![Page 86: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/86.jpg)
Summary
Construct a filtration of a simplicial complex on our data by building a sequence of neighbourhood graphs across an interval of the proximity parameter
Plot Betti numbers against this proximity parameter
Features that persist longer are more likely to represent the shape of the data
Shape is important!
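The steps in this summary can be illustrated with a small sketch. This is not the program used for the trials above, nor the fast Vietoris-Rips construction credited in the acknowledgments. It is a brute-force illustration under stated assumptions: coefficients are taken in GF(2), only the 2-skeleton of the Vietoris-Rips complex is built, and the names `gf2_rank`, `rips_betti`, `betti_curve`, and `circle` are hypothetical. It returns the first two Betti numbers at each value of the proximity parameter; plotting is left out.

```python
import math
from itertools import combinations


def gf2_rank(vectors):
    """Rank over GF(2) of a list of bitmask-encoded vectors (XOR basis)."""
    pivots = {}  # highest set bit -> reduced vector with that leading bit
    for v in vectors:
        while v:
            high = v.bit_length() - 1
            if high not in pivots:
                pivots[high] = v
                break
            v ^= pivots[high]  # eliminate the leading bit and keep reducing
    return len(pivots)


def rips_betti(points, eps):
    """Return (b0, b1) of the Vietoris-Rips 2-complex at proximity eps.

    Assumption: GF(2) coefficients and brute-force enumeration, so this
    is only practical for small point clouds.
    """
    n = len(points)
    # Neighbourhood graph: join points at distance <= eps.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    index = {e: k for k, e in enumerate(edges)}
    # A triangle is included when all three of its edges are present.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in index and (i, k) in index and (j, k) in index]
    # Columns of the boundary matrices, encoded as bitmasks.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    d2 = [(1 << index[(i, j)]) | (1 << index[(i, k)]) | (1 << index[(j, k)])
          for i, j, k in triangles]
    r1, r2 = gf2_rank(d1), gf2_rank(d2)
    b0 = n - r1                   # connected components
    b1 = (len(edges) - r1) - r2   # independent 1-cycles not filled by triangles
    return b0, b1


def betti_curve(points, eps_values):
    """Betti numbers across an interval of the proximity parameter."""
    return [(eps, *rips_betti(points, eps)) for eps in eps_values]


# Hypothetical test data: 8 evenly spaced points on the unit circle.
circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]

for eps, b0, b1 in betti_curve(circle, [0.5, 1.0, 2.5]):
    print(f"eps={eps}: b0={b0}, b1={b1}")
```

On this circle the points start as 8 separate components, join into a single loop once neighbouring points are connected, and the loop disappears once every pair of points is connected. The range of the proximity parameter over which b1 stays at 1 is what the summary calls persistence.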
![Page 87: Persistent Homology in Topological Data Analysis Ben Fraser May 27, 2015.](https://reader035.fdocuments.net/reader035/viewer/2022081603/56649f1e5503460f94c35c73/html5/thumbnails/87.jpg)
Acknowledgments
Mark Wachowiak (supervisor, artificial data sets)
Renata Smolikova-Wachowiak (lake monitoring data)
Gunnar Carlsson (see “on the shape of data”: https://www.youtube.com/watch?v=kctyag2Xi8o)
Adam Cutbill (author of original program)
Afra Zomorodian (fast construction of the Vietoris-Rips complex)
Vin de Silva (topological estimation using witness complexes)