Animashree Anandkumar, Electrical Engineering and CS Dept, UC Irvine at MLconf SEA - 5/01/15


  • Tensor Methods: A New Paradigm for Probabilistic Models and Feature Learning

    Anima Anandkumar

    U.C. Irvine

  • Learning with Big Data


  • Data vs. Information

    Missing observations, gross corruptions, outliers.

    Learning useful information is like finding a needle in a haystack!

  • Matrices and Tensors as Data Structures

    Multi-modal and multi-relational data.

    Matrices: pairwise relations. Tensors: higher order relations.

    Multi-modal data figure from Lise Getoor slides.


  • Spectral Decomposition of Tensors

    Matrix: M2 = Σ_i λ_i u_i ⊗ v_i = λ1 u1 ⊗ v1 + λ2 u2 ⊗ v2 + ...

    Tensor: M3 = Σ_i λ_i u_i ⊗ v_i ⊗ w_i = λ1 u1 ⊗ v1 ⊗ w1 + λ2 u2 ⊗ v2 ⊗ w2 + ...

    We have developed efficient methods to solve tensor decomposition.
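    As a concrete illustration of the decomposition above (a sketch of my own, not the speaker's code), the tensor power method recovers a component of an orthogonally decomposable third-order tensor T = Σ_i λ_i u_i ⊗ u_i ⊗ u_i; all values here are made up for the example:

    ```python
    import numpy as np

    # Toy orthogonally decomposable tensor (hypothetical sizes/weights)
    rng = np.random.default_rng(0)
    d = 5
    lam = np.array([3.0, 2.0, 1.0])
    U, _ = np.linalg.qr(rng.standard_normal((d, 3)))  # orthonormal components

    # Build T = sum_i lam[i] * u_i (outer) u_i (outer) u_i
    T = np.einsum('i,ai,bi,ci->abc', lam, U, U, U)

    # Tensor power iteration: v <- T(I, v, v), then normalize
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(50):
        v = np.einsum('abc,b,c->a', T, v, v)
        v /= np.linalg.norm(v)

    # v converges (up to sign) to one of the components u_i
    overlap = np.abs(U.T @ v).max()
    print(round(overlap, 4))  # ≈ 1.0
    ```

    This is the basic building block; the efficient methods mentioned in the talk add whitening, deflation, and randomized initializations on top of it.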


  • Strengths of Tensor Methods

    Fast and accurate: orders of magnitude faster than previous methods.

    Embarrassingly parallel and suited for cloud systems, e.g. Spark.

    Exploit optimized linear algebra libraries.

    Exploit parallelism of GPU systems.

    [Figure: running time (secs) vs. number of communities k, log-log scale, comparing MATLAB Tensor Toolbox (CPU), CULA Standard Interface (GPU), CULA Device Interface (GPU), and Eigen Sparse (CPU).]

  • Outline

    1 Introduction

    2 Learning Probabilistic Models

    3 Experiments

    4 Feature Learning with Tensor Methods

    5 Conclusion


  • Latent variable models

    Incorporate hidden or latent variables.

    Information structures: relationships between latent variables and observed data.

    Basic approach: mixtures/clusters

    Hidden variable is categorical.

    Advanced: probabilistic models

    Hidden variables have more general distributions.

    Can model mixed membership/hierarchical groups.

    [Figure: graphical model with hidden variables h1–h3 over observed variables x1–x5.]
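    The basic mixture case can be sketched in a few lines (an illustration of my own, with made-up weights and means, not an example from the talk): the hidden variable h is categorical and selects which component generates each observation x.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)
    weights = np.array([0.3, 0.7])   # P(h = i), hypothetical values
    means = np.array([-2.0, 3.0])    # hypothetical component means

    h = rng.choice(2, size=10_000, p=weights)   # hidden categorical variable
    x = rng.normal(loc=means[h], scale=1.0)     # observed data only

    # Learning the LVM means recovering weights and means from x alone;
    # E[x] = 0.3*(-2) + 0.7*3 = 1.5, and the sample mean should be close.
    print(x.mean())
    ```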

  • Challenges in Learning LVMs

    Computational Challenges

    Maximum likelihood: non-convex optimization. NP-hard.

    Practice: local search approaches such as gradient descent, EM, and Variational Bayes have no consistency guarantees.

    They can get stuck in bad local optima, have poor convergence rates, and are hard to parallelize.

    Tensor methods yield guaranteed learning for LVMs

  • Unsupervised Learning of LVMs

    [Figures: graphical models for GMM, HMM (hidden h1, h2, h3 over observed x1, x2, x3), ICA (h1, ..., hk over x1, ..., xd), and multiview and topic models.]

  • Overall Framework for Unsupervised Learning

    Pipeline: Unlabeled Data → Probabilistic admixture models → Tensor Method → Inference.
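    The moment-based step of this pipeline can be sketched on a toy single-topic model (my own example with made-up parameters, not the authors' code): low-order moments of unlabeled data are structured objects whose decomposition reveals the model parameters, since M2 = E[x1 ⊗ x2] = Σ_h w_h μ_h μ_hᵀ.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    w = np.array([0.6, 0.4])            # hypothetical topic proportions
    mu = np.array([[0.7, 0.2, 0.1],     # hypothetical word distributions
                   [0.1, 0.3, 0.6]])

    n = 50_000
    h = rng.choice(2, size=n, p=w)      # latent topic per document
    cdf = mu.cumsum(axis=1)
    # Two exchangeable words per document, drawn i.i.d. given the topic
    x1 = (rng.random(n)[:, None] > cdf[h]).sum(axis=1)
    x2 = (rng.random(n)[:, None] > cdf[h]).sum(axis=1)

    # Empirical second-order moment vs. its model-implied form
    M2_hat = np.zeros((3, 3))
    np.add.at(M2_hat, (x1, x2), 1.0)
    M2_hat /= n
    M2 = np.einsum('h,hi,hj->ij', w, mu, mu)
    print(np.abs(M2_hat - M2).max())  # small, shrinks as n grows
    ```

    Decomposing M2 (and the analogous third-order moment M3) is what the tensor method step does to recover w and μ from unlabeled data.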

  • Outline

    1 Introduction

    2 Learning Probabilistic Models

    3 Experiments

    4 Feature Learning with Tensor Methods

    5 Conclusion

  • Demo for Learning Gaussian Mixtures

  • NYTimes Demo


  • Experimental Results on Yelp and DBLP

    Lowest error business categories & largest weight businesses:

    Rank | Category       | Business                 | Stars | Review Counts
    1    | Latin American | Salvadoreno Restaurant   | 4.0   | 36
    2    | Gluten Free    | P.F. Changs China Bistro | 3.5   | 55
    3    | Hobby Shops    | Make Meaning             | 4.5   | 14
    4    | Mass Media     | KJZZ 91.5FM              | 4.0   | 13
    5    | Yoga           | Sutra Midtown            | 4.5   | 31

    Top-5 bridging nodes (businesses):

    Business            | Categories
    Four Peaks Brewing  | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe
    Pizzeria Bianco     | Restaurants, Pizza, Phoenix
    FEZ                 | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix
    Matts Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch
    Cornish Pasty Co    | Restaurants, Bars, Nightlife, Pubs, Tempe

    Error (E) and Recovery ratio (R):

    Dataset          | k   | Method      | Running Time (s) | E     | R
    DBLP sub (n=1e5) | 500 | ours        | 10,157           | 0.139 | 89%
    DBLP sub (n=1e5) | 500 | variational | 558,723          | 16.38 | 99%
    DBLP (n=1e6)     | 100 | ours        | 5,407            | 0.105 | 95%

  • Discovering Gene Profiles of Neuronal Cell Types

    Learning mixture of point processes of cells through tensor methods.

    Components of mixture are candidates for neuronal cell types.



  • Hierarchical Tensors for Healthcare Analytics

    [Figure: tensor decompositions applied hierarchically at multiple levels.]

    CMS dataset: 1.6 million patients, 15.8 million events.

    Mining disease inferences from patient records.

  • Outline

    1 Introduction

    2 Learning Probabilistic Models

    3 Experiments

    4 Feature Learning with Tensor Methods

    5 Conclusion

  • Feature Learning For Efficient Classification

    Find good transformations of the input for improved classification.

    Figures attributed to Fei-Fei Li, Rob Fergus, Antonio Torralba, et al.


  • Tensor Methods for Training Neural Networks

    m-th order score function of the input pdf p(x):

    S_m(x) := (-1)^m ∇^(m) p(x) / p(x)

    Score functions capture local variations in data.

    Algorithm for Training Neural Networks

    Estimate score functions using an autoencoder.

    Decompose the tensor E[y ⊗ S_m(x)] to obtain the weights, for m ≥ 3.

    Recursively estimate the score function of the autoencoder and repeat.
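    For intuition on the definition above, here is a small check of my own (not from the talk): for a standard normal input pdf p(x), the first-order score function S_1(x) = -p'(x)/p(x) works out to x, which we can confirm by finite differences.

    ```python
    import numpy as np

    def p(x):
        # standard normal pdf, used as a stand-in for the input density
        return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

    def score1(x, eps=1e-5):
        # S_1(x) = -p'(x) / p(x), gradient via central differences
        grad = (p(x + eps) - p(x - eps)) / (2 * eps)
        return -grad / p(x)

    print(score1(1.7))  # ≈ 1.7, matching the analytic S_1(x) = x
    ```

    In the actual algorithm the density is unknown, which is why the score functions are estimated from data with an autoencoder rather than computed analytically.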

  • Demo: Training Neural Networks

  • Combining Probabilistic Models with Deep Learning

    Multi-object Detection in Computer vision

    Deep learning is able to extract good features, but not context.

    Probabilistic models capture contextual information.

    Hierarchical models + pre-trained deep learning features.

    State-of-the-art results on Microsoft COCO.

  • Outline

    1 Introduction

    2 Learning Probabilistic Models

    3 Experiments

    4 Feature Learning with Tensor Methods

    5 Conclusion

  • Conclusion: Tensor Methods for Learning

    Tensor Decomposition

    Efficient sample and computational complexities

    Better performance compared to EM, Variational Bayes etc.

    In practice

    Scalable and embarrassingly parallel: handle large datasets.

    Efficient performance: perplexity or ground truth validation.

  • My Research Group and Resources

    Furong H. Majid J. Hanie S. Niranjan U.N.

    Forough A. Tejaswi N. Hao L. Yang S.

    ML summer school lectures available at http://newport.eecs.uci.edu/anandkumar/MLSS.html
