
Page 1: Feature Discovery Using Topological Data Analysis (TDA), web.stanford.edu/class/archive/ee/ee392n/ee392n.1134/lecture/apr9/...

Discover what you don't know. CONFIDENTIAL

April 2013

CONFIDENTIAL

Feature Discovery Using Topological Data Analysis (TDA)

Ayasdi [ai-yaz-dee] means “to seek” in Cherokee

Page 2

The Age of Big Data!

• Financial transactions, GPS coordinates, and social media generate 2.5 quintillion bytes (2.5 exabytes) of data every day!

• Expected to grow by 100% annually through 2015.

• Huge potential for user-centric solutions

Page 3

A Small Problem :-(

• How do we derive any insights from such big data?

• The traditional approach is to ask questions and query for answers

• How do you ask a question you didn’t know to ask?

From data to insights?

Page 4

Feature Selection

• Typical data sets have hundreds to thousands of features

• Feature selection decides which features to use for prediction and how they are related

• A key challenge, especially in feature-rich data sets such as DNA microarrays

The Curse of Combinatorics!
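The “curse of combinatorics” can be made concrete: with n candidate features there are 2^n - 1 non-empty feature subsets, so exhaustively scoring every subset stops being feasible long before n reaches the hundreds or thousands of features typical of such data sets. A minimal Python sketch of the counts (the function names are illustrative, not from the slides):

```python
from math import comb


def num_feature_subsets(n_features: int) -> int:
    """Number of non-empty subsets of n features: 2**n - 1."""
    return 2 ** n_features - 1


def num_subsets_of_size(n_features: int, k: int) -> int:
    """Number of ways to choose exactly k of the n features (n choose k)."""
    return comb(n_features, k)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Python ints are arbitrary precision, so count the decimal digits.
        digits = len(str(num_feature_subsets(n)))
        print(f"{n} features: a {digits}-digit number of subsets, "
              f"{num_subsets_of_size(n, 3):,} subsets of exactly 3 features")
```

Even restricting the search to 3-feature subsets of 100 features leaves 161,700 candidates, which is one motivation for structure-driven rather than exhaustive approaches to feature selection.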

Page 5

How It Works: A Pioneering Approach

“Ayasdi’s approach is using Topological Data Analysis, one of the top 10 innovations developed at DARPA in the last decade.”
Tony Tether, Director, Defense Advanced Research Projects Agency (2001-2009)

Topology is the study of shape

Our Differentiation is TDA

Topology is a branch of mathematics, dating from the 1700s, that studies the continuity and connectivity of objects and spaces. TDA applies it, using the shape of data to derive meaning from the data.

The combination of Topological Data Analysis (TDA) with machine learning automatically creates topological networks that reveal statistically significant patterns in complex data.
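The slides do not describe how the topological networks are built. A well-known published construction from the TDA literature is the Mapper algorithm (Singh, Mémoli and Carlsson): project the data through a lens (filter) function, cover the lens range with overlapping intervals, cluster the points that fall in each interval, and connect clusters that share points. The following is a minimal, illustrative Python sketch of that idea, not Ayasdi's implementation; the x-coordinate lens, the interval count and overlap, and the distance-threshold clustering (`eps`) are all assumptions chosen for a toy example.

```python
import math
from itertools import combinations


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal intervals, widened so that neighbouring
    intervals overlap by `overlap` times the interval length."""
    length = (hi - lo) / n_intervals
    pad = length * overlap / 2
    return [(lo + i * length - pad, lo + (i + 1) * length + pad)
            for i in range(n_intervals)]


def cluster(indices, points, eps):
    """Single-linkage clustering: connected components of the graph joining
    points whose Euclidean distance is at most eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.5, eps=0.5):
    """Return (nodes, edges): each node is a cluster of point indices inside
    one lens interval; two nodes are joined when they share a point."""
    values = [lens(p) for p in points]
    nodes = []
    for a, b in cover_intervals(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(cluster(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


if __name__ == "__main__":
    # 40 points on a unit circle; the lens is the x-coordinate.
    circle = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
              for k in range(40)]
    nodes, edges = mapper(circle, lens=lambda p: p[0])
    print(len(nodes), "nodes,", len(edges), "edges")
```

Run on 40 points sampled from a unit circle, the sketch yields 6 nodes connected in a single loop of 6 edges, so the network compresses the point cloud while keeping its circular shape. In practice the lens, cover, and clustering method are tuning choices that change the resulting network.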

Page 6

How TDA is Different

Traditional Statistics

• Algebraic models

• Visualization: scatterplots, heatmaps, dendrograms

Page 7

Property #1: Coordinate Freeness

Topology studies properties of geometric objects which are not dependent on the particular coordinate frame in which they are represented.
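Coordinate freeness is one reason topological methods typically work from distances or similarities between data points rather than from raw coordinates. A small Python illustration, assuming Euclidean distance (the helper names are illustrative): rotating and translating a point cloud changes every coordinate but leaves the pairwise distance matrix unchanged.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances: they depend on the shape, not the frame."""
    return [[math.dist(p, q) for q in points] for p in points]


def rotate_and_shift(points, angle, dx, dy):
    """Express the same points in a rotated and translated coordinate frame."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


def same_matrix(a, b, tol=1e-9):
    """True when two matrices agree entry by entry up to floating-point error."""
    return all(math.isclose(u, v, abs_tol=tol)
               for row_a, row_b in zip(a, b) for u, v in zip(row_a, row_b))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
    moved = rotate_and_shift(pts, angle=0.7, dx=5.0, dy=-2.0)
    print(pts[1], "->", moved[1])  # the coordinates change...
    print(same_matrix(distance_matrix(pts), distance_matrix(moved)))  # ...distances do not
```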

Page 8

Property #2: Deformation Invariance

Topology studies properties of curves and surfaces which do not change when you stretch them.
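One way to see deformation invariance numerically (an illustrative sketch, not a method from the slides): sample a circle and an ellipse obtained by stretching it threefold along x, join each sample to its two nearest neighbours, and count independent loops with E - V + C (edges minus vertices plus connected components). Both shapes report one loop even though almost every distance has changed. The 2-nearest-neighbour graph is an assumption that only works because the sampling is dense relative to the curve's size and curvature.

```python
import math


def ellipse(n, a, b):
    """n points at equally spaced parameter values on x = a*cos(t), y = b*sin(t)."""
    return [(a * math.cos(2 * math.pi * k / n), b * math.sin(2 * math.pi * k / n))
            for k in range(n)]


def knn_graph(points, k=2):
    """Undirected edge set joining every point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        edges.update(frozenset((i, j)) for j in nearest)
    return edges


def loop_count(n_vertices, edges):
    """Number of independent cycles of a graph: E - V + C (C = components)."""
    parent = list(range(n_vertices))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for edge in edges:
        u, v = tuple(edge)
        parent[find(u)] = find(v)
    components = len({find(x) for x in range(n_vertices)})
    return len(edges) - n_vertices + components


if __name__ == "__main__":
    circle = ellipse(60, 1.0, 1.0)
    stretched = ellipse(60, 3.0, 1.0)  # the same circle stretched 3x along x
    for name, pts in (("circle", circle), ("stretched", stretched)):
        print(name, "loops:", loop_count(len(pts), knn_graph(pts)))
```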

Page 9

Property #3: Compressed Representation

Topology constructs small, combinatorial representations of continuous objects.
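A standard textbook illustration of a small combinatorial representation (not taken from the slides): a circle can be encoded as a hollow triangle with three vertices and three edges, and a sphere as a hollow tetrahedron with four vertices, six edges, and four triangles. The Python sketch below, with illustrative names, stores each shape as a list of top-dimensional simplices and checks that the encoding preserves a topological invariant, the Euler characteristic (0 for a circle, 2 for a sphere).

```python
from itertools import combinations


def closure(facets):
    """All faces (non-empty vertex subsets) of the given simplices."""
    faces = set()
    for facet in facets:
        for size in range(1, len(facet) + 1):
            faces.update(frozenset(f) for f in combinations(facet, size))
    return faces


def euler_characteristic(facets):
    """chi = #vertices - #edges + #triangles - ... over the whole complex."""
    return sum((-1) ** (len(face) - 1) for face in closure(facets))


# A circle, compressed to the three edges of a hollow triangle.
circle = [("a", "b"), ("b", "c"), ("c", "a")]
# A sphere, compressed to the four triangular faces of a hollow tetrahedron.
sphere = list(combinations("abcd", 3))

if __name__ == "__main__":
    print("circle:", euler_characteristic(circle))
    print("sphere:", euler_characteristic(sphere))
```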

Page 10

How It Works: The Ayasdi Platform™

• Mathematical Algorithms: Data automatically processed through 100s of machine learning algorithms (Algorithm 1 through Algorithm N)

• Topological Data Analysis: Data transformed into topological networks revealing insights and hidden patterns

• Advanced Statistics: 50+ statistical tests provide statistical relevance and integrity to the insights

• User Experience: Domain experts interface with data to validate insights and execute on results

“Ayasdi is one of the real advances in data analysis to have arrived in the last 10 years.”
Eric Schadt, Director of the Institute for Genomics & Multiscale Biology, New York Mount Sinai Medical Center
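The slides do not name the 50+ statistical tests. As a hypothetical example of the kind of test that can explain a region of a topological network, this Python sketch ranks features by Welch's two-sample t statistic, comparing rows in a selected group (for instance, the points in one part of the network) against all remaining rows. The function names, the choice of Welch's t, and the toy data are assumptions.

```python
import math
from statistics import mean, variance


def welch_t(group, rest):
    """Welch's two-sample t statistic (unequal variances); each side needs >= 2 values."""
    return (mean(group) - mean(rest)) / math.sqrt(
        variance(group) / len(group) + variance(rest) / len(rest))


def rank_features(rows, in_group, feature_names):
    """Rank features by |t| between rows flagged in_group and all other rows."""
    scores = {}
    for j, name in enumerate(feature_names):
        group = [row[j] for row, flag in zip(rows, in_group) if flag]
        rest = [row[j] for row, flag in zip(rows, in_group) if not flag]
        scores[name] = welch_t(group, rest)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: feature_a separates the last four rows, feature_b does not.
    rows = [(1.0, 10.0), (1.2, 11.0), (0.9, 9.5), (1.1, 10.5),
            (5.0, 10.2), (5.3, 9.8), (4.8, 10.1), (5.1, 10.4)]
    flags = [False] * 4 + [True] * 4
    for name, t in rank_features(rows, flags, ["feature_a", "feature_b"]):
        print(f"{name}: t = {t:.2f}")
```

Features with the largest |t| are the ones that most distinguish the selected group, which ties network structure back to feature discovery. A real pipeline would also need p-values and a multiple-testing correction across hundreds of features.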

Page 11

Discovering Fraud

Finding fraudulent transactions.

This topological network automatically discovers transactions where fraud exists.

About the Data: ~6M transactions across ~300 attributes

[Figure: topological network, scale from low risk to high risk]

Page 12

Detecting Malware Attacks

Finding new patterns of malware.

This topological network automatically discovers program types highlighting malware.

About the Data: ~30K system calls, hundreds of programs

Page 13

Optimizing Yield Performance

[Figure: topological network, scale from high quality to low quality]

Predicting yield performance across manufacturing lines.

This topological network automatically classifies manufacturing lines to help formulate a strategy for yield optimization.

About the Data: 1,000+ manufacturing (MFG) lines, 500+ attributes per line

Page 14

Developing Targeted Drugs

Finding patient sub-populations to develop targeted drugs.

This topological network automatically grouped patients into subcategories based on genetic characteristics.

About the Data: ~3M genetic markers, ~14K patients

Page 15

Vision

Transform how the world uses data to solve problems

New big data firm to pioneer topological data analysis

The new shape of big data

Has Ayasdi turned machine learning into a magic bullet?

Ayasdi: Stanford Math Begets a Data Company

Ayasdi: A Big Data Startup with a Long History

A New Company Uses Big Data to Fight Cancer (And Rethink Basketball)

Page 16

Questions?