Resilience: A Criterion for Learning
in the Presence of Arbitrary Outliers
Jacob Steinhardt, Moses Charikar, Gregory Valiant
ITCS 2018
January 14, 2018
Motivation: Robust Learning
Question
What concepts can be learned robustly, even if some of the data is arbitrarily corrupted?
Example: Mean Estimation
Problem
Given data $x_1, \ldots, x_n \in \mathbb{R}^d$, of which $(1-\varepsilon)n$ come from $p^*$ (and the remaining $\varepsilon n$ are arbitrary outliers), estimate the mean $\mu$ of $p^*$.
Issue: high dimensions
Mean Estimation: Gaussian Example

Suppose the clean data is Gaussian, with mean $\mu$ and variance 1 in each coordinate:

$$x_i \sim \mathcal{N}(\mu, I), \qquad \|x_i - \mu\|_2 \approx \sqrt{1^2 + \cdots + 1^2} = \sqrt{d}.$$

[Figure: clean points fill a ball of radius $\approx \sqrt{d}$ around $\mu$; outliers placed at that same radius in a common direction shift the empirical mean by $\varepsilon\sqrt{d}$.]

Cannot filter points independently, even if we know the true density!
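To make this failure mode concrete, here is a minimal numerical sketch (my own illustration, not from the talk): each outlier individually looks exactly like a clean point under any norm-based test, yet together they shift the mean by $\varepsilon\sqrt{d}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 1000, 10000, 0.1

# Clean points: N(0, I); each has norm concentrated around sqrt(d).
clean = rng.normal(size=(int((1 - eps) * n), d))

# Outliers: all at distance sqrt(d) from the true mean in one common
# direction, so each one individually looks like a typical clean point.
v = np.zeros(d); v[0] = 1.0
outliers = np.tile(np.sqrt(d) * v, (int(eps * n), 1))

data = np.vstack([clean, outliers])
print(np.median(np.linalg.norm(clean, axis=1)))  # ~ sqrt(d) ~ 31.6
print(np.linalg.norm(outliers[0]))               # also sqrt(d) ~ 31.6
print(np.linalg.norm(data.mean(axis=0)))         # ~ eps * sqrt(d) ~ 3.2
```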
History

Progress in high dimensions only recently:
• Tukey median [1975]: robust but NP-hard
• Donoho estimator [1982]: high error
• [DKKLMS16, LRV16]: first dimension-independent error bounds
• large body of work since then [CSV17, DKKLMS17, L17, DBS17]
• many other problems including PCA [XCM10], regression [NTN11], classification [FHKP09], etc.
This Talk
Question
What general and simple properties enable robust estimation?
New information-theoretic criterion: resilience.
Resilience
Suppose $\{x_i\}_{i \in S}$ is a set of points in $\mathbb{R}^d$.

Definition (Resilience)
A set $S$ is $(\sigma, \varepsilon)$-resilient in a norm $\|\cdot\|$ around a point $\mu$ if, for all subsets $T \subseteq S$ of size at least $(1-\varepsilon)|S|$,

$$\Big\| \frac{1}{|T|} \sum_{i \in T} (x_i - \mu) \Big\| \le \sigma.$$

Intuition: all large subsets have similar mean.
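As a concrete illustration (my own sketch, not from the talk): in the $\ell_2$-norm, for any fixed unit direction $v$, the subset $T$ of size $(1-\varepsilon)|S|$ that shifts the mean furthest along $v$ simply keeps the points with the largest projections onto $v$, so the directional shift is easy to compute; certifying resilience simultaneously over all directions is the nontrivial part.

```python
import numpy as np

def directional_shift(X, mu, v, eps):
    """Largest mean shift along unit direction v over subsets T of size
    (1 - eps)|S|: the maximizer keeps the points with top projections."""
    proj = (X - mu) @ v
    k = int(np.ceil((1 - eps) * len(X)))
    return np.sort(proj)[-k:].mean()   # mean of the k largest projections

rng = np.random.default_rng(0)
d = 20
X = rng.normal(size=(5000, d))         # N(0, I) sample
v = np.zeros(d); v[0] = 1.0
print(directional_shift(X, np.zeros(d), v, eps=0.1))  # ~0.2, within O(sqrt(eps))
```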
Main Result
Let $S \subseteq \mathbb{R}^d$ be a set of $(1-\varepsilon)n$ "good" points.

Let $S_{\mathrm{out}}$ be a set of $\varepsilon n$ arbitrary outliers.

We observe $\tilde{S} = S \cup S_{\mathrm{out}}$.

Theorem
If $S$ is $(\sigma, \frac{\varepsilon}{1-\varepsilon})$-resilient around $\mu$, then it is possible to output $\hat{\mu}$ such that $\|\hat{\mu} - \mu\| \le 2\sigma$.

In fact, outputting the center of any resilient subset of $\tilde{S}$ will work!
Pigeonhole Argument

Claim: If $S$ and $S'$ are $(\sigma, \frac{\varepsilon}{1-\varepsilon})$-resilient around $\mu$ and $\mu'$ and have size $(1-\varepsilon)n$, then $\|\mu - \mu'\| \le 2\sigma$.

Proof:
• Let $\mu_{S \cap S'}$ be the mean of $S \cap S'$.
• By pigeonhole, $|S \cap S'| \ge (1 - \frac{\varepsilon}{1-\varepsilon})|S'|$.
• Then $\|\mu' - \mu_{S \cap S'}\| \le \sigma$ by resilience.
• Similarly, $\|\mu - \mu_{S \cap S'}\| \le \sigma$.
• The result follows by the triangle inequality.
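Spelled out (my own expansion, consistent with the bullets above): the overlap bound is inclusion–exclusion inside the $n$ observed points, and the final bound is one triangle inequality.

$$|S \cap S'| \ge |S| + |S'| - n = (1-2\varepsilon)n = \Big(1 - \tfrac{\varepsilon}{1-\varepsilon}\Big)|S'|,$$

$$\|\mu - \mu'\| \le \|\mu - \mu_{S \cap S'}\| + \|\mu_{S \cap S'} - \mu'\| \le \sigma + \sigma = 2\sigma.$$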
Implication: Mean Estimation

Lemma
If a dataset has bounded covariance, it is $(O(\sqrt{\varepsilon}), \varepsilon)$-resilient (in the $\ell_2$-norm).

Proof: If $\varepsilon n$ points were at distance $1/\sqrt{\varepsilon}$ from the mean, they alone would contribute variance $\varepsilon \cdot (1/\sqrt{\varepsilon})^2 = 1$. Therefore, deleting $\varepsilon n$ points changes the mean by at most $\approx \varepsilon \cdot 1/\sqrt{\varepsilon} = \sqrt{\varepsilon}$.

Corollary
If the clean data has bounded covariance, its mean can be estimated to $\ell_2$-error $O(\sqrt{\varepsilon})$ in the presence of $\varepsilon n$ outliers.

Corollary
If the clean data has bounded $k$th moments, its mean can be estimated to $\ell_2$-error $O(\varepsilon^{1-1/k})$ in the presence of $\varepsilon n$ outliers.
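A quick numerical check of the lemma (my own sketch): delete the $\varepsilon n$ points that pull the mean hardest in a fixed direction and observe that the surviving mean moves by at most about $\sqrt{\varepsilon}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 20000, 50, 0.05
X = rng.normal(size=(n, d))   # identity covariance, true mean 0

# Adversarially delete the eps*n points with the largest projections
# onto a fixed direction v; this is the worst deletion for that direction.
v = np.zeros(d); v[0] = 1.0
keep = np.argsort(X @ v)[: int((1 - eps) * n)]
shift = np.linalg.norm(X[keep].mean(axis=0))
print(shift, np.sqrt(eps))    # ~0.12 vs the sqrt(eps) ~ 0.22 bound
```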
Implication: Learning Discrete Distributions

Suppose we observe samples from a distribution $\pi$ on $\{1, \ldots, m\}$.

Samples come in $r$-tuples, which are either all good or all outliers.

Corollary
The distribution $\pi$ can be estimated (in TV distance) to error $O(\varepsilon\sqrt{\log(1/\varepsilon)/r})$ in the presence of $\varepsilon n$ outliers.

• follows from resilience in the $\ell_1$-norm
• see also [Qiao & Valiant, 2018] later in this session!
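To make the reduction concrete (my own sketch, with made-up parameters): represent each $r$-tuple by its empirical histogram in $\mathbb{R}^m$; estimating $\pi$ then becomes robust mean estimation in the $\ell_1$-norm, where naive averaging only achieves error $\sim \varepsilon$ while resilience gives the $O(\varepsilon\sqrt{\log(1/\varepsilon)/r})$ rate above.

```python
import numpy as np

rng = np.random.default_rng(4)
m, r, n_batches, eps = 10, 100, 2000, 0.1
pi = np.ones(m) / m

# Each good batch of r samples becomes an empirical histogram in R^m;
# bad batches are adversarial histograms concentrated on one symbol.
good = rng.multinomial(r, pi, size=int((1 - eps) * n_batches)) / r
bad = np.tile(np.eye(m)[0], (int(eps * n_batches), 1))
hists = np.vstack([good, bad])

naive = hists.mean(axis=0)
print(np.abs(naive - pi).sum() / 2)  # naive TV error ~ eps, not ~ eps/sqrt(r)
```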
A Majority of Outliers

Can also handle the case where the clean set has size only $\alpha n$ ($\alpha < \frac{1}{2}$):

• cover $\tilde{S}$ by resilient sets
• at least one set $S'$ must have high overlap with $S$...
• ...and hence $\|\mu' - \mu\| \le 2\sigma$ as before.
• Recovery in the list-decodable model [BBV08].
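A toy one-dimensional illustration of the list-decodable guarantee (my own sketch, not the paper's construction): slide a window of $\alpha n$ consecutive points over the sorted data; the tightest windows act as candidate resilient sets, and their centers form a short list, at least one of which is close to the true mean.

```python
import numpy as np

rng = np.random.default_rng(2)
n, alpha, mu = 10000, 0.2, 5.0
good = rng.normal(mu, 1.0, size=int(alpha * n))
bad = rng.uniform(-50, 50, size=n - len(good))   # arbitrary outliers
data = np.sort(np.concatenate([good, bad]))

# Windows of alpha*n consecutive points; the tightest ones are the
# candidate resilient sets, and we output the list of their means.
k = len(good)
widths = data[k - 1:] - data[: n - k + 1]
starts = np.argsort(widths)[:50]                 # 50 tightest windows
candidates = sorted({round(data[s:s + k].mean(), 1) for s in starts})
print(candidates)   # short list; some entry is close to mu = 5.0
```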
Implication: Stochastic Block Models

Set of $\alpha n$ good and $(1-\alpha)n$ bad vertices.

• good ↔ good: dense (avg. deg. $= a$)
• good ↔ bad: sparse (avg. deg. $= b$)
• bad ↔ bad: arbitrary

Question: when can the good set be recovered (in terms of $\alpha$, $a$, $b$)?
Implication: Stochastic Block Models

Using resilience in a "truncated $\ell_1$-norm", we can show:

Corollary
The set of good vertices can be approximately recovered whenever $\frac{(a-b)^2}{a} \gtrsim \frac{\log(2/\alpha)}{\alpha^2}$.

Matches the Kesten–Stigum threshold up to log factors!

For planted clique ($a = n$, $b = n/2$), we recover cliques of size $\Omega(\sqrt{n \log n})$.
• this is tight [S'17]
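As a sanity check (my own arithmetic, up to constants): plugging the planted-clique parameters into the corollary's threshold,

$$\frac{(a-b)^2}{a} = \frac{(n/2)^2}{n} = \frac{n}{4} \gtrsim \frac{\log(2/\alpha)}{\alpha^2} \iff \alpha^2 n \gtrsim \log(2/\alpha),$$

which for $\alpha = \Theta(\sqrt{\log n / n})$ holds exactly at clique size $\alpha n = \Theta(\sqrt{n \log n})$.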
Algorithmic Results

Can (sometimes) turn info-theoretic results into algorithmic results.

Most existing algorithmic results rely on bounded covariance.

We show:
• for strongly convex norms, resilient sets can be "pruned" to have bounded covariance
• if the injective norm is approximable, bounded covariance → efficient algorithm with $\sqrt{\varepsilon}$ error
• both true for $\ell_p$-norms! ($p \in [2, \infty]$)

See [Li, 2017] and [Du, Balakrishnan, & Singh, 2017] for a non-$\ell_p$-norm.
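For intuition about the "bounded covariance → efficient algorithm" step, here is a minimal sketch of the standard filtering idea used in this line of work (my own simplified version, not the paper's exact algorithm): while the top eigenvalue of the empirical covariance is too large, remove the points projecting furthest along the top eigenvector.

```python
import numpy as np

def filtered_mean(X, eps, thresh=10.0, max_iter=200):
    """Filtering sketch: while the top covariance eigenvalue is large,
    drop a small batch of points with extreme top-eigenvector projections."""
    X = X.copy()
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        vals, vecs = np.linalg.eigh(np.cov(X.T))
        if vals[-1] <= thresh:                    # covariance now bounded
            break
        proj = np.abs((X - mu) @ vecs[:, -1])
        k = max(1, int(eps * len(X) / 10))        # prune a small batch
        X = np.delete(X, np.argsort(proj)[-k:], axis=0)
    return X.mean(axis=0)

rng = np.random.default_rng(3)
d, n, eps = 100, 5000, 0.1
X = rng.normal(size=(n, d))
X[: int(eps * n), 0] += 3 * np.sqrt(d)            # coordinated outliers
print(np.linalg.norm(X.mean(axis=0)))             # naive mean: error ~ 3
print(np.linalg.norm(filtered_mean(X, eps)))      # filtered: ~ O(sqrt(eps))
```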
Other Results
• Finite-sample bounds
• Extension to SVD
Summary

Information-theoretic criterion yielding (tight?) robust recovery bounds.
• based on simple pigeonhole arguments

Benefit: from statistical problem to algorithmic problem.

Open questions:
• resilience for other problems (e.g. regression)
• efficient algos under other assumptions
• matching lower bounds?