Learning Mixtures of Structured Distributions over Discrete Domains
Xiaorui Sun (Columbia University)
Joint work with Siu-On Chan (UC Berkeley), Ilias Diakonikolas (U Edinburgh), Rocco Servedio (Columbia University)
Density Estimation
• PAC-type learning model
• C: a set of possible target distributions over [n]
• Learner
 – Knows the set C but does not know the target distribution p ∈ C
 – Independently draws a few samples from p
 – Outputs (a succinct description of) a distribution h which is ε-close to p
• Total variation distance d_TV(p, h) = (1/2) Σ_i |p(i) − h(i)| is the standard measure in statistics
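The total variation distance between two distributions over [n] is easy to compute directly; a minimal Python sketch (illustrative only, not part of the talk):

```python
def total_variation(p, q):
    """d_TV(p, q) = (1/2) * sum over i of |p(i) - q(i)|."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Two distributions over a domain of size 4.
p = [0.4, 0.3, 0.2, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
print(total_variation(p, q))  # ~0.2
```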
Learn a structured distribution
• If C = {all distributions over [n]}, Ω(n/ε²) samples are required
• Much better sample complexities are possible for structured distributions
 – Poisson binomial distributions [DDS12a]: Õ(1/ε³) samples
 – Monotone / k-modal [Bir87, DDS12b]: O(log(n)/ε³) samples / Õ(k log(n)/ε⁴) samples
This work: learn mixtures of structured distributions
• Learn a mixture of k distributions?
 – C: a set of distributions over [n]
 – The target distribution p is a mixture of k distributions from C
 – i.e. p = μ₁p₁ + … + μ_k p_k, with p_i ∈ C, μ_i ≥ 0, such that Σ_i μ_i = 1
• Our result: learn mixtures for several structured distribution classes
 – Sample complexity close to optimal
 – Efficient running time
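A k-mixture Σ_i μ_i p_i can be evaluated pointwise, or sampled by first choosing a component; a small sketch (function names and the list-of-probabilities convention are mine):

```python
import random

def mixture_pmf(weights, components):
    """Pointwise pmf of p = sum_i mu_i * p_i over a common domain [n]."""
    n = len(components[0])
    return [sum(w * c[j] for w, c in zip(weights, components)) for j in range(n)]

def sample_mixture(weights, components, rng=random):
    """Choose component i with probability mu_i, then draw from p_i."""
    i = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.choices(range(len(components[i])), weights=components[i])[0]

weights = [0.7, 0.3]
components = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
print(mixture_pmf(weights, components))  # [0.35, 0.35, 0.15, 0.15]
```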
Our results: learning mixtures of log-concave distributions
• Log-concave distribution p over [n]
 – The support of p is an interval
 – p(i)² ≥ p(i−1) p(i+1) for 1 < i < n
[figure: a bell-shaped log-concave distribution over the domain 1…n]
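The defining inequality is straightforward to test on a given pmf; a hypothetical checker sketch (not from the talk):

```python
def is_log_concave(p):
    """Check contiguous support and p(i)^2 >= p(i-1) * p(i+1)."""
    support = [i for i, x in enumerate(p) if x > 0]
    if support and support[-1] - support[0] + 1 != len(support):
        return False  # support has a gap, so p is not log-concave
    return all(p[i] ** 2 >= p[i - 1] * p[i + 1] for i in range(1, len(p) - 1))

print(is_log_concave([0.05, 0.25, 0.4, 0.25, 0.05]))  # True
print(is_log_concave([0.45, 0.05, 0.45, 0.05, 0.0]))  # False
```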
Our results: log-concave
• Algorithm to learn a mixture of k log-concave distributions
 – Sample complexity: Õ(k/ε⁴)
 – Running time: near-linear in the number of samples (bit operations)
• Lower bound: Ω(k/ε^{5/2}) samples
Our results: mixture of unimodal distributions
• Unimodal distribution p over [n]
 – There is a mode m ∈ [n] s.t. p is non-decreasing on [1, m] and non-increasing on [m, n]
[figure: a single-peaked distribution over the domain 1…n]
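Unimodality can be verified with one pass (climb to a mode, then descend); an illustrative sketch:

```python
def is_unimodal(p):
    """True if p is non-decreasing up to some mode, non-increasing after it."""
    i, n = 0, len(p)
    while i + 1 < n and p[i] <= p[i + 1]:   # climb to the mode
        i += 1
    while i + 1 < n and p[i] >= p[i + 1]:   # descend after the mode
        i += 1
    return i == n - 1

print(is_unimodal([0.1, 0.2, 0.4, 0.2, 0.1]))  # True
print(is_unimodal([0.3, 0.1, 0.3, 0.2, 0.1]))  # False (two peaks)
```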
Our results: mixture of unimodal distributions
• A mixture of 2 unimodal distributions may have 2 modes (a mixture of k may have k modes)
• Algorithm to learn a mixture of k unimodal distributions
 – Sample complexity: Õ(k log(n)/ε⁴) samples
 – Running time: near-linear in the number of samples (bit operations)
• Lower bound: Ω(k log(n)/ε³) samples
Our results: mixture of MHR distributions
• Monotone hazard rate (MHR) distribution
 – Hazard rate of p at i: H(i) = p(i) / Σ_{j≥i} p(j)
 – MHR distribution: H is a non-decreasing function over the support of p
[figure: an MHR distribution over the domain 1…n]
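The hazard rate is computable from tail sums, which makes the MHR property easy to check; a sketch (function names are mine):

```python
def hazard_rate(p):
    """H(i) = p(i) / sum_{j >= i} p(j), at the points where the tail mass is positive."""
    tails = [0.0] * len(p)
    tail = 0.0
    for i in range(len(p) - 1, -1, -1):
        tail += p[i]
        tails[i] = tail
    return [p[i] / tails[i] for i in range(len(p)) if tails[i] > 0]

def is_mhr(p):
    h = hazard_rate(p)
    return all(a <= b for a, b in zip(h, h[1:]))

# A geometric-like distribution has (nearly) constant hazard rate, hence is MHR.
print(is_mhr([0.5, 0.25, 0.125, 0.125]))  # True
print(is_mhr([0.8, 0.1, 0.1]))            # False: hazard rate dips after 0
```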
Our results: mixture of MHR distributions
• Algorithm to learn a mixture of k MHR distributions
 – Sample complexity: Õ(k log(n/ε)/ε⁴)
 – Running time: near-linear in the number of samples (bit operations)
• Lower bound: Ω(k log(n/ε)/ε³) samples
Compare with parameter estimation
• Parameter estimation [KMV10, MV10]
 – Learn a mixture of k Gaussians
 – Independently draw a few samples from p
 – Estimate the parameters of each Gaussian component accurately
• The number of samples inherently depends exponentially on k, even for a mixture of 1-dimensional normal distributions [MV10]
Compare with parameter estimation
• Parameter estimation needs at least exp(k) samples to learn a mixture of k binomial distributions
 – Similar to the lower bound in [MV10]
• Density estimation makes it possible to estimate non-parametric distributions
 – E.g. log-concave, unimodal, MHR
• Density estimation learns a mixture of k binomial distributions over [n] using Õ(k/ε⁴) samples
 – A binomial distribution is log-concave
Outline
• Learning algorithm based on decomposition
• Structural results for log-concave, unimodal, MHR distributions
Flat decomposition
• Key definition: a distribution p is (ε, t)-flat if there exists a partition I of [n] into t intervals such that d_TV(p, (p)_I) ≤ ε
 – I is an (ε, t)-flat decomposition for p
• (p)_I is obtained by "flattening" p within each interval
 – (p)_I(j) = p(I_i)/|I_i| for j ∈ I_i
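Flattening over a given interval partition is mechanical; a sketch with intervals as half-open (start, end) index pairs (my convention, illustrative only):

```python
def flatten(p, intervals):
    """(p)_I: spread the mass of each interval [a, b) uniformly within it."""
    q = [0.0] * len(p)
    for a, b in intervals:  # disjoint intervals covering 0..len(p)-1
        mass = sum(p[a:b])
        for j in range(a, b):
            q[j] = mass / (b - a)
    return q

p = [0.1, 0.3, 0.1, 0.1, 0.4]
print(flatten(p, [(0, 2), (2, 5)]))  # each entry ~0.2
```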
Flat decomposition
[figure: a distribution over 1…n and its flattening over a partition into intervals]
Learn (ε, t)-flat distributions
• Main general Thm: Let C = {all (ε, t)-flat distributions}. There is an algorithm which draws Õ(t/ε³) samples from p ∈ C, and outputs a hypothesis h such that d_TV(p, h) ≤ O(ε).
• Linear running time with respect to the number of samples
Easier problem: known decomposition
• Given
 – Samples from an (ε, t)-flat distribution p
 – An (ε, t)-flat decomposition I for p
• Idea: estimate the probability mass of every interval in I
• O(t/ε²) samples are enough
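With the decomposition in hand, the learner can simply spread each interval's empirical mass uniformly within it; a sketch (function name and interval convention are mine):

```python
import random

def learn_with_known_decomposition(samples, intervals, n):
    """Hypothesis: empirical mass of each interval, spread uniformly within it."""
    m = len(samples)
    h = [0.0] * n
    for a, b in intervals:
        mass = sum(1 for s in samples if a <= s < b) / m
        for j in range(a, b):
            h[j] = mass / (b - a)
    return h

rng = random.Random(0)
samples = [rng.randrange(2) for _ in range(1000)]  # target: uniform over {0, 1}
h = learn_with_known_decomposition(samples, [(0, 2), (2, 4)], 4)
print(h)  # [0.5, 0.5, 0.0, 0.0]
```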
Real problem: unknown decomposition
• Only given samples from an (ε, t)-flat distribution p
• Some (ε, t)-flat decomposition I for p exists, but it is unknown
• A useful fact [DDS+13]: if I is an (ε, t)-flat decomposition of p, and J is a "refinement" of I with t′ intervals, then J is a (2ε, t′)-flat decomposition of p
 – So if we know a refinement of I, that is good enough
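A refinement of two interval partitions (cut at every boundary appearing in either) is cheap to form; a small sketch, again with half-open (start, end) intervals as my convention:

```python
def common_refinement(part1, part2, n):
    """Cut [0, n) at every interval boundary appearing in either partition."""
    cuts = sorted({a for a, _ in part1} | {a for a, _ in part2} | {n})
    return list(zip(cuts, cuts[1:]))

I = [(0, 4), (4, 10)]
K = [(0, 2), (2, 7), (7, 10)]
print(common_refinement(I, K, 10))  # [(0, 2), (2, 4), (4, 7), (7, 10)]
```

Note that the refinement has at most |part1| + |part2| intervals, which is what makes refinements affordable.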
Unknown flat decomposition (cont.)
• Idea: partition [n] into Θ(t/ε) intervals 𝒦, each with small probability mass
 – Achieved by sampling from p
[figure: a partition 𝒦 of the domain 1…n into intervals of small mass]
Unknown flat decomposition (cont.)
• There exists an (unknown) partition 𝒥
 – A refinement of both I and 𝒦
 – At most |I| + |𝒦| intervals
[figure: the common refinement 𝒥 of I and 𝒦 over the domain 1…n]
Unknown flat decomposition (cont.)
• There exists 𝒥
 – A refinement of both I and 𝒦
 – At most |I| + |𝒦| intervals
 – A (2ε, |𝒥|)-flat decomposition for p
[figure: the refinement 𝒥 drawn over the domain 1…n]
Unknown flat decomposition (cont.)
• Compare (p)_𝒥 and (p)_𝒦
[figure: the flattenings (p)_𝒥 and (p)_𝒦 over the domain 1…n]
Unknown flat decomposition (cont.)
• If the total probability mass of every interval of 𝒦 is at most ε/t, then d_TV((p)_𝒦, (p)_𝒥) ≤ 2ε
• Partition [n] into Θ(t/ε) intervals, each with probability mass at most ε/t
 – Õ(t/ε³) samples are enough
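The partition into intervals of small empirical mass can be built greedily from a sample; a sketch (the greedy construction is my choice for illustration, the talk does not specify one):

```python
import random

def small_mass_partition(samples, n, mass_bound):
    """Greedily cut [0, n) so each interval's empirical mass is at most mass_bound.
    A single heavy point may force a one-point interval exceeding the bound."""
    m = len(samples)
    counts = [0] * n
    for s in samples:
        counts[s] += 1
    intervals, start, acc = [], 0, 0
    for j in range(n):
        if j > start and acc + counts[j] > mass_bound * m:
            intervals.append((start, j))
            start, acc = j, 0
        acc += counts[j]
    intervals.append((start, n))
    return intervals

rng = random.Random(1)
samples = [rng.randrange(8) for _ in range(4000)]  # roughly uniform over [0, 8)
parts = small_mass_partition(samples, 8, 0.25)
print(parts)  # consecutive intervals tiling [0, 8), each with empirical mass <= ~0.25
```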
Learn (ε, t)-flat distributions
• Main general Thm: Let C = {all (ε, t)-flat distributions}. There is an algorithm which draws Õ(t/ε³) samples from p ∈ C, and outputs a hypothesis h such that d_TV(p, h) ≤ O(ε).
Learn mixture of distributions
• Lem:A mixture of -flat distributions has an -flat decomposition– Tight for interesting distribution classes
• Thm(Learn mixture): Let be a mixture of -flat distributions. There is an algorithm which draws samples, and outputs a hypothesis s.t.
First application: learning mixtures of log-concave distributions
• Recall the definition:
 – The support of p is an interval
 – p(i)² ≥ p(i−1) p(i+1) for 1 < i < n
• Lem: Every log-concave distribution is (ε, O(log(1/ε)/ε))-flat
• Learn a mixture of k log-concave distributions with Õ(k/ε⁴) samples
Second application: learning mixtures of unimodal distributions
• Lem: Every unimodal distribution is (ε, O(log(n)/ε))-flat [Bir87, DDS+13]
• Learn a mixture of k unimodal distributions with Õ(k log(n)/ε⁴) samples
Third application: learning mixtures of MHR distributions
• Monotone hazard rate distribution
 – Hazard rate of p at i: H(i) = p(i) / Σ_{j≥i} p(j)
 – H is a non-decreasing function over the support of p
• Lem: Every MHR distribution is (ε, O(log(n/ε)/ε))-flat
• Learn a mixture of k MHR distributions with Õ(k log(n/ε)/ε⁴) samples
Conclusion and further directions
• Flat decomposition is a useful way to study mixtures of structured distributions
• Extend to higher dimensions?
• Efficient algorithms with optimal sample complexity?

| Distribution | Sample complexity | Lower bound |
| --- | --- | --- |
| Log-concave | Õ(k/ε⁴) | Ω(k/ε^{5/2}) |
| Unimodal | Õ(k log(n)/ε⁴) | Ω(k log(n)/ε³) |
| MHR | Õ(k log(n/ε)/ε⁴) | Ω(k log(n/ε)/ε³) |
Thank you !