Lecture 10: Discussion and Challenges
NSF/CBMS Conference
Sayan Mukherjee
Departments of Statistical Science, Computer Science, Mathematics
Duke University
www.stat.duke.edu/~sayan
May 31, 2016
Software and computing
A review paper
Topological Data Analysis: A software survey
Mikael Vejdemo-Johansson
AI Laboratory, Jozef Stefan Institute, Slovenia
Wednesday, March 19, 14
Cubical homology: Pixels and voxels
• Cellular homology theory: building blocks are n-cubes
• Admits very efficient matrix processing methods
• Homotopy reduction techniques reduce to matrix traversals
• Well adapted for 2d and 3d images or pixel/voxel clouds
ChomP
• Cubical homology — with or without persistence
• GUI, command line interface, and C++ library
• Encodes a wide range of both space and mapping analyses
• Includes a wide range of homotopy-based optimizations
http://chomp.rutgers.edu/Software.html
HAP
• Module for the GAP computer algebra system
• Primarily focused on research into group cohomology
• Includes support for cubical persistent homology
http://www.gap-system.org/Packages/hap.html
Plex / jPlex / javaPlex
• Family of software packages developed at Stanford, adapted for use from Matlab
• Implements a range of algorithms — both for constructing complexes and computing their persistent (co)homology
• Current recommended incarnation: javaPlex
http://javaplex.googlecode.com
Dionysus
• Library for computational homology
• Contains example applications implementing persistent homology and cohomology, as well as time-varying persistence (vineyards) & low-dimensional optimizations
• Relies on Boost, and optionally on CGAL for low-dimensional optimizations
• Includes a Python interface through Boost::Python
http://www.mrzv.org/software/dionysus
pHat
• Recently released software package and C++ library
• Implements several optimizations to the persistence algorithm
• Does not (currently) construct the complex for you
• (Currently) restricted to Z/2 coefficients
• Some support for SMP parallelization using OpenMP
http://phat.googlecode.com
Perseus
• Cubical and simplicial complex representation and several different construction methods
• Uses discrete Morse theory to speed up computation
http://www.math.rutgers.edu/~vidit/perseus
ToMATo
• C++ library for topological analysis
• Relies on libANN for approximate nearest neighbors
http://geometrica.saclay.inria.fr/data/ToMATo/
GAP Persistence
• Persistent homology and complex construction in the GAP computer algebra system
http://www-circa.mcs.st-and.ac.uk/~mik/persistence/
Python Mapper
• Open source solution
• Developed by Müllner & Babu at Stanford University
• Focused on being a research tool
• Exports graph structure in several formats: GraphViz .dot, d3.js JSON graph representation
http://math.stanford.com/~muellner/mapper
Packages
Computing persistent homology

Given the boundary matrix D, find R = DV, where
• V is upper-triangular, and
• R is reduced: no two columns have their lowest nonzero entries in the same row.

The reduction is via Gaussian elimination, as in reducing to Smith normal form over Z/2. The rank of R equals the number of nonzero reduced columns; the lowest nonzero entry of each such column pairs a birth with a death.
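As a concrete illustration, the column reduction above can be sketched in a few lines over Z/2. This is a minimal hypothetical implementation (the helper `low`, the set-based column encoding, and the filtered-triangle example are illustrative choices, not taken from any package surveyed here):

```python
# Sketch of the standard persistence reduction R = D V over Z/2.
# Columns of the boundary matrix D are reduced left to right until no
# two nonzero columns share the same lowest nonzero row ("low").

def low(col):
    """Index of the lowest nonzero entry of a column, or -1 if empty."""
    return max(col) if col else -1

def reduce_boundary(D):
    """D: list of columns, each a set of row indices with entry 1 (Z/2).
    Returns the reduced matrix R; pairs (low(R[j]), j) for nonzero
    columns are the persistence pairs."""
    R = [set(c) for c in D]
    lows = {}                            # low row -> column owning it
    for j in range(len(R)):
        while R[j] and low(R[j]) in lows:
            R[j] ^= R[lows[low(R[j])]]   # add earlier column mod 2
        if R[j]:
            lows[low(R[j])] = j
    return R

# Filtered triangle: vertices 0,1,2; edges 3={0,1}, 4={1,2}, 5={0,2};
# 2-cell 6 with boundary {3,4,5}. Columns list face indices.
D = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
R = reduce_boundary(D)
pairs = [(low(c), j) for j, c in enumerate(R) if c]
print(pairs)   # -> [(1, 3), (2, 4), (5, 6)]
```

The third edge's column becomes zero (it creates the loop), and the 2-cell then kills that loop, giving the pair (5, 6).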
Computing persistent homology

[Figure: step-by-step animation of the column reduction (frames omitted).]
Adding Geometry

Building a complex: the single-linkage graph

Čech complex: for balls of radius ε centered at the points, a k-simplex is in the complex iff the corresponding balls have a common intersection.
Adding Geometry

If we only have pairwise information:

Vietoris-Rips complex: for balls of radius ε centered at the points, a k-simplex is in the complex iff its vertices form a clique in the Čech graph (the 1-skeleton).
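Because the Vietoris-Rips complex needs only pairwise distances, it is easy to sketch directly. A minimal illustrative construction (the function `rips_complex` and its clique test are assumptions of this sketch, not any package's API):

```python
# Sketch: Vietoris-Rips complex from pairwise distances only.
# A simplex enters at radius eps iff every pairwise distance among its
# vertices is at most 2*eps, i.e. it is a clique in the eps-graph.
from itertools import combinations

def rips_complex(dist, eps, max_dim=2):
    """dist: symmetric n x n distance matrix (list of lists).
    Returns all simplices (tuples of vertex indices) up to max_dim."""
    n = len(dist)
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):      # k vertices -> (k-1)-simplex
        for s in combinations(range(n), k):
            if all(dist[i][j] <= 2 * eps
                   for i, j in combinations(s, 2)):
                simplices.append(s)
    return simplices

# Three points: two close together, one far away.
dist = [[0, 1, 5],
        [1, 0, 5],
        [5, 5, 0]]
print(rips_complex(dist, eps=1))   # edge {0,1} appears, no triangle
```

The brute-force clique enumeration is exponential in general; the packages above use far more careful expansions of the neighborhood graph.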
Adding geometry

Given a set of points P = {p1, p2, ..., pn} ⊂ R^d and a subset Q ⊆ P:

• A simplex σ = {q0, ..., qk} is weakly witnessed by a point x if d(qi, x) < d(q, x) for all i ∈ {0, ..., k} and all q ∈ Q \ {q0, ..., qk}.
• σ is strongly witnessed if in addition d(qi, x) = d(qj, x) for all i, j ∈ {0, ..., k}.
• The witness complex W(P, Q) is the collection of simplices with vertices from Q, all of whose subsimplices are weakly witnessed by a point in P.
• The construction can be defined for a general metric space.
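The weak-witness condition translates directly into code. A hypothetical brute-force sketch (function names and the toy point set are illustrative; real implementations such as javaPlex are far more efficient):

```python
# Sketch: weak witnesses and the witness complex W(P, Q).
# A simplex on landmarks Q is weakly witnessed by x if every vertex of
# the simplex is strictly closer to x than any landmark outside it.
from itertools import combinations
import math

def is_weakly_witnessed(simplex, x, landmarks, dist):
    """simplex: tuple of landmark indices; x: a point."""
    inside = max(dist(landmarks[i], x) for i in simplex)
    outside = [dist(landmarks[j], x) for j in range(len(landmarks))
               if j not in simplex]
    return not outside or inside < min(outside)

def witness_complex(points, landmarks, dist, max_dim=1):
    """All simplices on the landmarks each of whose faces is weakly
    witnessed by some point in `points`."""
    simplices = []
    for k in range(1, max_dim + 2):
        for s in combinations(range(len(landmarks)), k):
            if all(any(is_weakly_witnessed(f, x, landmarks, dist)
                       for x in points)
                   for m in range(1, k + 1)
                   for f in combinations(s, m)):
                simplices.append(s)
    return simplices

euclid = lambda p, q: math.dist(p, q)
P = [(0, 0), (1, 0), (2, 0), (0.5, 0.1)]
L = [P[0], P[1], P[2]]                 # landmarks on a line
print(witness_complex(P, L, euclid))   # edges {0,1} and {1,2}; no {0,2}
```

No point of P witnesses the long edge {0, 2}, since the middle landmark is always closer, so the complex recovers the path structure.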
Adding geometry
Comparison
Multiparameter persistence
Challenges

• Certificates for various approximate filtrations
• Distributed computing
• Discrete Morse approaches
• Randomized algorithms
• Multiscale persistence
• Multidimensional persistence
• Sampling and distribution properties of persistence
Inference

Paradigms

Paradigm 1 (EDA): Data → Filtration → Barcodes → Interpretation
Paradigm 2 (Modeling): Data → Filtration → Barcodes → Modeling
Paradigms

[Figure: diagram locating TDA within Statistical Theory, Probability Theory, and Applications. Statistical theory (hypothesis testing, bootstrapping, Bayesian estimation, Kalman filtering, E-M): some ideas from TDA. Probability theory (normal distribution, Central Limit Theorem, Gaussian processes, Markov chains, Bayes theorem): developing ideas. Applications (testing drug effects, noise filtering, tracking, pattern recognition, classification): many ideas.]
Paradigm 1

Study the shape of data: what is the (multiscale) topology of the data?

Why, and what summaries?

(1) Extracting βk can provide intuition
(2) Projecting onto RP² can provide intuition
(3) Persistence landscapes and diagrams can provide information
(4) Statistical guarantees on these summaries:
    (i) minimax results
    (ii) confidence/credible intervals
    (iii) consistency
    (iv) central limit theorems
    (v) extreme value theory
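Persistence landscapes, mentioned among the summaries above, are simple enough to sketch. Assuming the standard definition λ_k(t) = k-th largest value of max(0, min(t − b, d − t)) over diagram points (b, d), a minimal illustrative evaluation (the helper name is my own):

```python
# Sketch: evaluating the k-th persistence landscape of a diagram.
# Each point (b, d) contributes a "tent" function peaking at its
# midpoint; lambda_k(t) takes the k-th largest tent value at t.

def landscape(diagram, k, t):
    """k-th landscape function (k = 1 is the largest) evaluated at t."""
    vals = sorted((max(0.0, min(t - b, d - t)) for b, d in diagram),
                  reverse=True)
    return vals[k - 1] if k <= len(vals) else 0.0

diagram = [(0.0, 4.0), (1.0, 3.0)]
print(landscape(diagram, 1, 2.0))   # 2.0: tent of (0, 4) peaks at t = 2
print(landscape(diagram, 2, 2.0))   # 1.0: tent of (1, 3)
```

Because landscapes live in a function space with a vector-space structure, they admit means, central limit theorems, and confidence bands, which is what makes the statistical guarantees in (4) tractable.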
Paradigm 2

Summaries as features for downstream analysis.

(1) Machine learning perspective:
    (i) features for classification and regression
    (ii) features for dimension reduction
    (iii) kernel models/kernel engineering
    (iv) bias-variance tradeoff
    (v) function approximation questions
(2) Sampling distribution perspective:
    (i) sufficiency
    (ii) pseudolikelihoods and empirical likelihoods
    (iii) Jeffrey's conditioning
    (iv) distributions of summaries under null models and hypothesis testing
    (v) understanding topological noise
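For the machine-learning perspective, a persistence diagram must first become a fixed-length vector. One deliberately simple, hypothetical choice of summary statistics (not a canonical feature set):

```python
# Sketch: turning a persistence diagram into a fixed-length feature
# vector usable by any standard classifier or regressor.

def diagram_features(diagram):
    """diagram: list of (birth, death) pairs with death >= birth."""
    pers = [d - b for b, d in diagram]
    if not pers:
        return [0, 0.0, 0.0, 0.0]
    return [len(pers),                 # number of features
            sum(pers),                 # total persistence
            max(pers),                 # most persistent feature
            sum(pers) / len(pers)]     # mean persistence

print(diagram_features([(0.0, 2.0), (0.5, 1.0)]))  # [2, 2.5, 2.0, 1.25]
```

Such crude statistics discard most of the diagram; landscapes, persistence images, and engineered kernels are richer vectorizations, at the cost of the bias-variance and sufficiency questions listed above.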
Where we are

[Figure: the same diagram of Statistical Theory, Probability Theory, and Applications, revisited to mark where TDA currently stands.]
Open questions

(1) Principled approaches to filtration selection
(2) Quantification of ε-sufficiency for different models/modulo invariants
(3) Summaries for graphs
(4) Information geometry for spaces with singularities and stratified spaces
(5) MCMC for models of different dimensions and algebraic structures
(6) Signal processing and dictionary learning for shapes
(7) Summaries of complex objects as vector spaces?
(8) Distribution theory for topological and geometric summaries
Mathematics

Spectral simplicial theory

(1) Cheeger inequalities for middle dimensions
(2) Higher-dimensional versions of PageRank
(3) Limits of random walks as Brownian motion of forms
(4) Graph sparsification with L1
(5) Synchronization and learning maps, and multicommodity flows
(6) SLE on simplicial complexes, loop-erased random surfaces
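Several of these directions rest on the higher combinatorial (Hodge) Laplacians L_k = B_{k+1} B_{k+1}^T + B_k^T B_k, whose kernel dimensions are the Betti numbers. A small sketch on a hand-built hollow triangle (the boundary matrix and sign conventions are illustrative; uses numpy):

```python
# Sketch: Hodge Laplacians of a hollow triangle (3 vertices, 3 edges,
# no 2-cell). dim ker L_k equals the k-th Betti number.
import numpy as np

# Boundary matrix B1: rows index vertices, columns index the edges
# [0,1], [0,2], [1,2], with signs from the vertex ordering.
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)

L0 = B1 @ B1.T                  # graph Laplacian (k = 0, B0 = 0)
L1 = B1.T @ B1                  # edge Laplacian (no B2 term here)

b0 = B1.shape[0] - np.linalg.matrix_rank(L0)   # dim ker L0
b1 = B1.shape[1] - np.linalg.matrix_rank(L1)   # dim ker L1
print(b0, b1)                   # 1 1: one component, one loop
```

Random walks on edges generated by L1, and sparsifiers preserving its spectrum, are exactly the kinds of objects items (2)-(4) ask to understand in higher dimensions.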
Stochastic topology
Acknowledgements
Many people.
Funding:
I Center for Systems Biology at Duke
I NSF DMS, CCF, IIS
I AFOSR, DARPA
I NIH