Statistical Decision Theory and Information Constraints
Michael I. Jordan University of California, Berkeley
November 20, 2014
What Is the Big Data Phenomenon?
• Science in confirmatory mode (e.g., particle physics)
• Science in exploratory mode (e.g., astronomy, genomics)
• Measurement of human activity, particularly online activity, is generating massive datasets that can be used (e.g.) for personalization and for creating markets
• Sensor networks are becoming pervasive
What Are the Conceptual/Mathematical Issues?
• The need to control statistical risk under constraints on algorithmic runtime
– how do risk and runtime trade off as a function of the amount of data?
• Statistical inference with distributed and streaming data
– how is inferential quality impacted by communication constraints?
• The tradeoff between statistical risk and privacy (and other externalities)
• Many other issues that require a blend of statistical thinking (e.g., a focus on sampling, confidence intervals, evaluation, diagnostics, causal inference) and computational thinking (e.g., scalability, abstraction)
Our Approach
• Take (classical) statistical decision theory as a mathematical point of departure
• Treat computation, communication, privacy, etc. as constraints on statistical risk
• This induces tradeoffs among these quantities and the number of data points
• Under the hood: geometry, information theory and optimization
Background
• In the 1930s, Wald laid the foundations of statistical decision theory
• Given a family of probability distributions $\mathcal{P}$, a parameter $\theta(P)$ for each $P \in \mathcal{P}$, an estimator $\hat\theta$, and a loss $l(\hat\theta, \theta(P))$, define the risk:
$$R_P(\hat\theta) := E_P\left[ l(\hat\theta, \theta(P)) \right]$$
• Minimax principle [Wald, '39, '43]: choose estimator minimizing worst-case risk:
$$\sup_{P \in \mathcal{P}} E_P\left[ l(\hat\theta, \theta(P)) \right]$$
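The minimax principle can be made concrete with a small sketch. The Bernoulli model, the squared-error loss, the grid approximation of the supremum, and all function names below are assumptions chosen for illustration; none of them appear in the talk. The sketch compares the worst-case risk of the sample mean with that of the classical constant-risk shrinkage estimator $(S + \sqrt{n}/2)/(n + \sqrt{n})$, where $S$ is the number of successes.

```python
import math

def risk_sample_mean(theta, n):
    # Exact squared-error risk of the sample mean (unbiased): theta(1-theta)/n.
    return theta * (1 - theta) / n

def risk_shrunk(theta, n):
    # Exact risk (bias^2 + variance) of (S + sqrt(n)/2) / (n + sqrt(n)).
    root = math.sqrt(n)
    bias = root * (0.5 - theta) / (n + root)
    var = n * theta * (1 - theta) / (n + root) ** 2
    return bias ** 2 + var

def worst_case(risk, n, grid=1001):
    # Approximate the sup over the family by a grid over theta in [0, 1].
    return max(risk(k / (grid - 1), n) for k in range(grid))

if __name__ == "__main__":
    n = 25
    print(worst_case(risk_sample_mean, n), worst_case(risk_shrunk, n))
```

The shrinkage estimator gives up a little risk near the ends of the interval in exchange for a lower risk at the worst case, which is the quantity the minimax principle ranks estimators by.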
Part I: Privacy and Minimax Risk
with John Duchi and Martin Wainwright University of California, Berkeley
Privacy and Risk
• Individuals are not generally willing to allow their personal data to be used without control on how it will be used and how much privacy loss they will incur
• We will quantify “privacy loss” via differential privacy
• We then treat differential privacy as a constraint on inference via statistical decision theory
• This yields (personal) tradeoffs between privacy loss and inferential gain
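A model of privacy
Local privacy: providers do not trust collector [Warner 65, Evfimievski et al. 03]
[Diagram: private data $X_1, X_2, \ldots, X_n$ each pass through a channel $Q(\cdot \mid X_i)$ to give $Z_1, Z_2, \ldots, Z_n$; only the $Z_i$ reach the estimator $\hat\theta$]
• Individuals with private data: $X_i \overset{\mathrm{iid}}{\sim} P$, $i \in \{1, \ldots, n\}$
• Estimator: $Z_1^n \mapsto \hat\theta(Z_1^n)$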
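A minimal sketch of a local privacy channel, in the spirit of the randomized response idea cited above [Warner 65]. The binary data, the keep probability $e^\alpha/(1+e^\alpha)$, and the function names are assumptions made for illustration, not details taken from the slides. Each individual reports a possibly flipped bit $Z_i$, and the collector de-biases the average of the reports.

```python
import math
import random

def randomized_response(x, alpha, rng):
    # Channel Q(. | x): report the true bit with probability
    # e^alpha / (1 + e^alpha), otherwise report the flipped bit.
    keep = math.exp(alpha) / (1.0 + math.exp(alpha))
    return x if rng.random() < keep else 1 - x

def estimate_proportion(zs, alpha):
    # E[Z] = (1 - keep) + theta * (2 * keep - 1); invert to de-bias the mean.
    keep = math.exp(alpha) / (1.0 + math.exp(alpha))
    z_bar = sum(zs) / len(zs)
    return (z_bar - (1.0 - keep)) / (2.0 * keep - 1.0)

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [1 if rng.random() < 0.3 else 0 for _ in range(20000)]  # private bits
    zs = [randomized_response(x, 1.0, rng) for x in xs]          # released bits
    print(estimate_proportion(zs, 1.0))
```

The collector never sees the $X_i$, only the $Z_i$, which matches the "providers do not trust collector" setting.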
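Definitions of privacy
Definition: channel $Q$ is $\alpha$-differentially private if
$$\sup_{S,\; x \in \mathcal{X},\; x' \in \mathcal{X}} \frac{Q(Z \in S \mid x)}{Q(Z \in S \mid x')} \le \exp(\alpha)$$
[Dwork, McSherry, Nissim, Smith 06]
[Diagram: $X_i \to Q(\cdot \mid X_i) \to Z_i \to \hat\theta$]
[Figure: the log-likelihoods $\log Q(z \mid x)$ and $\log Q(z \mid x')$ differ by at most $\alpha$]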
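For a channel on finite alphabets the definition can be checked directly. The sketch below is illustrative only: the matrix representation `channel[x][z]` and the function name are assumptions, and all entries are assumed strictly positive. For discrete outputs the supremum over sets $S$ is attained at single points $z$, so it is enough to scan the pointwise ratios.

```python
import math

def privacy_level(channel):
    # channel[x][z] = Q(z | x); returns the smallest alpha for which the
    # channel is alpha-differentially private (entries assumed positive).
    worst = 0.0
    for row_x in channel:
        for row_y in channel:
            for qx, qy in zip(row_x, row_y):
                worst = max(worst, math.log(qx / qy))
    return worst

if __name__ == "__main__":
    keep = math.exp(1.0) / (1.0 + math.exp(1.0))
    binary_flip = [[keep, 1.0 - keep], [1.0 - keep, keep]]
    print(privacy_level(binary_flip))
```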
Private Minimax Risk
Central object of study: minimax risk
• Parameter $\theta(P)$ of distribution
• Family of distributions $\mathcal{P}$
• Loss $\ell$ measuring error
Minimax risk:
$$\mathfrak{M}_n(\theta(\mathcal{P}), \ell) := \inf_{\hat\theta} \sup_{P \in \mathcal{P}} E_P\left[ \ell(\hat\theta(X_1^n), \theta(P)) \right]$$
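Private Minimax Risk
Central object of study: minimax risk
• Parameter $\theta(P)$ of distribution
• Family of distributions $\mathcal{P}$
• Loss $\ell$ measuring error
• Family of private channels $\mathcal{Q}_\alpha$
$\alpha$-private minimax risk:
$$\mathfrak{M}_n(\theta(\mathcal{P}), \ell, \alpha) := \inf_{Q \in \mathcal{Q}_\alpha} \inf_{\hat\theta} \sup_{P \in \mathcal{P}} E_{P,Q}\left[ \ell(\hat\theta(Z_1^n), \theta(P)) \right]$$
The outer infimum picks the best $\alpha$-private channel; the inner inf-sup is the minimax risk under the privacy constraint.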
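Vignette: private mean (location) estimation
Example: estimate reasons for hospital visits
• Patients admitted to hospital for substance abuse
• Estimate prevalence of different substances

| Substance | Example patient record | Proportion $\theta_j$ |
|---|---|---|
| Alcohol | 1 | $\theta_1 = .45$ |
| Cocaine | 1 | $\theta_2 = .32$ |
| Heroin | 0 | $\theta_3 = .16$ |
| Cannabis | 0 | $\theta_4 = .20$ |
| LSD | 0 | $\theta_5 = .00$ |
| Amphetamines | 0 | $\theta_6 = .02$ |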
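Vignette: mean estimation
Consider estimation of the mean $\theta(P) := E_P[X] \in \mathbb{R}^d$, with errors measured in the $\ell_\infty$-norm, i.e. $E[\|\hat\theta - \theta\|_\infty]$, for
$$\mathcal{P}_d := \left\{ \text{distributions } P \text{ supported on } [-1, 1]^d \right\}$$
Proposition (minimax rate):
$$\mathfrak{M}_n(\mathcal{P}_d, \|\cdot\|_\infty) \asymp \min\left\{ 1, \frac{\sqrt{\log d}}{\sqrt{n}} \right\}$$
(achieved by sample mean)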
Vignette: mean estimation (same setting: mean $\theta(P) := E_P[X] \in \mathbb{R}^d$, error $E[\|\hat\theta - \theta\|_\infty]$, family $\mathcal{P}_d$ of distributions supported on $[-1, 1]^d$)
Proposition (private minimax rate for $\alpha = O(1)$):
$$\mathfrak{M}_n(\mathcal{P}_d, \|\cdot\|_\infty, \alpha) \asymp \min\left\{ 1, \frac{\sqrt{d \log d}}{\sqrt{n \alpha^2}} \right\}$$
Note: effective sample size $n \mapsto n\alpha^2 / d$
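The effective-sample-size reading can be checked numerically. The sketch below evaluates the two rate expressions from the propositions with all constants set to 1, which is an assumption (the rates hold only up to constant factors); the function names are hypothetical.

```python
import math

def nonprivate_rate(n, d):
    # min{1, sqrt(log d) / sqrt(n)}, constants dropped.
    return min(1.0, math.sqrt(math.log(d)) / math.sqrt(n))

def private_rate(n, d, alpha):
    # min{1, sqrt(d log d) / sqrt(n alpha^2)}, constants dropped.
    return min(1.0, math.sqrt(d * math.log(d)) / math.sqrt(n * alpha ** 2))

def effective_sample_size(n, d, alpha):
    # The substitution n -> n * alpha^2 / d from the note above.
    return n * alpha ** 2 / d

if __name__ == "__main__":
    n, d, alpha = 10 ** 6, 100, 0.5
    n_eff = effective_sample_size(n, d, alpha)
    print(n_eff, private_rate(n, d, alpha), nonprivate_rate(n_eff, d))
```

With these inputs a million private samples behave like 2500 non-private ones, and the two printed rates agree.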
Optimal mechanism?
Non-private observation:
$$X = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}$$
Idea 1: add independent noise (e.g. standard Laplace mechanism)
$$Z = X + W = \begin{bmatrix} 1 + W_1 \\ 0 + W_2 \\ 1 + W_3 \\ 0 + W_4 \\ 0 + W_5 \end{bmatrix}$$
Problem: magnitude much too large (this is unavoidable: provably sub-optimal)
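Optimal mechanism
Non-private observation and two candidate views:
$$X = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \qquad v = \begin{bmatrix} 0 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \ \text{(View 1)}, \qquad 1 - v = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \end{bmatrix} \ \text{(View 2)}$$
• Draw $v$ uniformly in $\{0, 1\}^d$
• With probability $\dfrac{e^\alpha}{1 + e^\alpha}$, choose the closer of $v$ and $1 - v$ to $X$ (here View 1 is closer: 3 overlap; View 2 is farther: 2 overlap)
• otherwise, choose the farther
At end: compute sample average and de-bias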
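The steps above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: it assumes binary data $X \in \{0,1\}^d$ with odd $d$ (so "closer" is never tied), and the de-biasing constant is derived here from the symmetry of the sampling step rather than taken from the talk. All function names are hypothetical.

```python
import math
import random
from itertools import product

def agreement_prob(d, alpha):
    # r = P(Z_j == X_j). The closer view agrees with X in A > d/2 coordinates,
    # where A ~ Binomial(d, 1/2) conditioned on A > d/2 (odd d assumed).
    p = math.exp(alpha) / (1.0 + math.exp(alpha))
    majority = range(d // 2 + 1, d + 1)
    q = sum(a * math.comb(d, a) for a in majority) / (
        d * sum(math.comb(d, a) for a in majority))
    return 0.5 + (2.0 * p - 1.0) * (q - 0.5)

def privatize(x, alpha, rng):
    # Draw v uniformly, then release the closer of v and 1 - v with
    # probability e^alpha / (1 + e^alpha), otherwise the farther one.
    d = len(x)
    v = [rng.randint(0, 1) for _ in range(d)]
    overlap = sum(vi == xi for vi, xi in zip(v, x))
    closer = v if 2 * overlap > d else [1 - vi for vi in v]
    p = math.exp(alpha) / (1.0 + math.exp(alpha))
    return closer if rng.random() < p else [1 - c for c in closer]

def debiased_mean(zs, alpha):
    # E[Z_j | X_j] = (1 - r) + X_j (2r - 1); invert coordinate-wise.
    d, n = len(zs[0]), len(zs)
    r = agreement_prob(d, alpha)
    avg = [sum(z[j] for z in zs) / n for j in range(d)]
    return [(a - (1.0 - r)) / (2.0 * r - 1.0) for a in avg]

def expected_report(x, alpha):
    # Exact E[Z | X = x] by enumerating all 2^d draws of v (small d only).
    d = len(x)
    p = math.exp(alpha) / (1.0 + math.exp(alpha))
    total = [0.0] * d
    for v in product([0, 1], repeat=d):
        overlap = sum(vi == xi for vi, xi in zip(v, x))
        closer = list(v) if 2 * overlap > d else [1 - vi for vi in v]
        for j in range(d):
            total[j] += p * closer[j] + (1.0 - p) * (1.0 - closer[j])
    return [t / 2 ** d for t in total]

if __name__ == "__main__":
    x = [1, 0, 1, 0, 0]
    print(privatize(x, 1.0, random.Random(0)))
    print(debiased_mean([expected_report(x, 1.0)], 1.0))
```

Each released vector has probability proportional to either $e^\alpha/(1+e^\alpha)$ or $1/(1+e^\alpha)$ regardless of $X$, so the likelihood ratio between any two inputs is at most $e^\alpha$. The `expected_report` helper is only a sanity check that de-biasing the exact mean report recovers $X$.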
Empirical evidence
Estimate proportion of emergency room visits involving different substances
Data source: Drug Abuse Warning Network
[Figure: results plotted against sample size $n$]
Sample size reductions
Given an $\alpha$-private channel $Q$, the pair $\{P_1, P_2\}$ induces marginals
$$M_j(S) := \int Q(S \mid x_1, \ldots, x_n) \, dP_j^n(x_1, \ldots, x_n), \qquad j = 1, 2$$
[Diagram: $P_j \to X_i \to Q \to Z_i$]
Question: How much “contraction” does privacy induce?
Theorem (data processing): for any $\alpha$-private channel and i.i.d. sample of size $n$,
$$D_{kl}(M_1 \| M_2) + D_{kl}(M_2 \| M_1) \le 4n(e^\alpha - 1)^2 \|P_1 - P_2\|_{TV}^2$$
Note: $n \mapsto n\alpha^2$ for $\alpha \lesssim 1$
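A small numerical check of the contraction bound, under assumptions that are not in the slides: $P_1$ and $P_2$ are Bernoulli distributions, the channel flips each bit independently with probability $1/(1+e^\alpha)$, and total variation is taken as $\sup_S |P_1(S) - P_2(S)|$, which equals $|a - b|$ for Bernoulli parameters $a$ and $b$. Function names are hypothetical.

```python
import math

def sym_kl_bernoulli(m1, m2):
    # D_kl(M1 || M2) + D_kl(M2 || M1) for Bernoulli(m1) and Bernoulli(m2).
    return (m1 - m2) * math.log(m1 * (1.0 - m2) / (m2 * (1.0 - m1)))

def check_contraction(a, b, alpha, n=1):
    # Returns (left side, right side) of the data processing bound.
    keep = math.exp(alpha) / (1.0 + math.exp(alpha))
    m1 = keep * a + (1.0 - keep) * (1.0 - a)  # P(Z = 1) under P1
    m2 = keep * b + (1.0 - keep) * (1.0 - b)  # P(Z = 1) under P2
    lhs = n * sym_kl_bernoulli(m1, m2)        # KL adds over independent draws
    rhs = 4.0 * n * (math.exp(alpha) - 1.0) ** 2 * (a - b) ** 2
    return lhs, rhs

if __name__ == "__main__":
    for alpha in (0.1, 0.5, 1.0, 2.0):
        print(alpha, check_contraction(0.9, 0.1, alpha))
```

For small $\alpha$ the right side behaves like $4n\alpha^2 \|P_1 - P_2\|_{TV}^2$, which is where the $n \mapsto n\alpha^2$ reading comes from.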
Final remarks: privacy
Almost always: effective sample size reduction $n \mapsto n\alpha^2$
In $d$-dimensional problems: $n \mapsto n\alpha^2 / d$
Key: Allows identification of new optimal mechanisms
Rough technique: Reduction of estimation to testing, then apply information-theoretic testing lower bounds [Le Cam, Hasminskii, Ibragimov, Assouad, Birge, Barron, Yu, ...]
Additional examples
‣ Fixed-design regression
‣ Convex risk minimization
‣ Multinomial estimation
‣ Nonparametric density estimation
Part II: Communication and Minimax Risk
with John Duchi, Martin Wainwright and Yuchen Zhang
University of California, Berkeley
Communication-constraints
• Large data necessitates distributed storage
• Independent data collection (hospitals)
• Privacy?
Setting: each of $m$ agents has sample of size $n$, $X^i = (X^i_1, X^i_2, \ldots, X^i_n)$
Messages $Z_i$ to fusion center
[Diagram: $X^1, X^2, \ldots, X^m$ produce messages $Z_1, Z_2, \ldots, Z_m$, which feed the estimator $\hat\theta$]
Question: tradeoffs between communication and statistical utility?
[Yao 79; Abelson 80; Tsitsiklis and Luo 87; Han & Amari 98; Tatikonda & Mitter 04; ...]
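Minimax communication
Central object of study:
• Parameter $\theta(P)$ of distribution
• Family of distributions $\mathcal{P}$
• Loss $\|\cdot\|_2^2$
• Messages $Z_i = \pi(X^i)$, constrained to be $B$ bits
Minimax risk with $B$-bounded communication:
$$\mathfrak{M}_n(\theta(\mathcal{P}), B) := \inf_{\pi \in \Pi_B} \inf_{\hat\theta} \sup_{P \in \mathcal{P}} E_P\left[ \|\hat\theta(Z_1^m) - \theta(P)\|_2^2 \right]$$
The infimum over $\pi \in \Pi_B$ picks the best protocol with each $Z_i$ smaller than $B$ bits.
[Diagram: $X^1, X^2, \ldots, X^m \to Z_1, Z_2, \ldots, Z_m \to \hat\theta$]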
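Vignette: mean estimation
[Diagram: $X^1, X^2, \ldots, X^m \to Z_1, Z_2, \ldots, Z_m \to \hat\theta$]
Consider estimation in normal location family, $X^i_j \overset{\mathrm{iid}}{\sim} N(\theta, \sigma^2 I_{d \times d})$, $\theta \in [-1, 1]^d$
Theorem (minimax rate): when each agent has sample of size $n$,
$$E\left[ \|\hat\theta(X^1, \ldots, X^m) - \theta\|_2^2 \right] \asymp \frac{\sigma^2 d}{nm}$$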
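Vignette: mean estimation (same setting: normal location family $X^i_j \overset{\mathrm{iid}}{\sim} N(\theta, \sigma^2 I_{d \times d})$, $\theta \in [-1, 1]^d$)
Theorem (minimax rate with $B$-bounded communication): when each agent has sample of size $n$ and sends $B$ bits,
$$\frac{d}{B \wedge d} \cdot \frac{1}{\log m} \cdot \frac{\sigma^2 d}{nm} \;\lesssim\; \mathfrak{M}_n(N_d, B) \;\lesssim\; \frac{d \log m}{B \wedge d} \cdot \frac{\sigma^2 d}{nm}$$
Consequence: each agent sends $\approx d$ bits for optimal estimation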
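To see how the bounds move with the bit budget, the sketch below evaluates both sides of the theorem with all constants set to 1. That is an assumption, since the statement holds only up to constant factors; the function names are hypothetical, and $m \ge 2$ is assumed so that $\log m > 0$.

```python
import math

def centralized_rate(n, m, d, sigma):
    # sigma^2 d / (n m): the rate when all n * m samples are pooled.
    return sigma ** 2 * d / (n * m)

def communication_bounds(n, m, d, sigma, bits):
    # (lower, upper) bounds on the minimax risk with `bits` per agent.
    base = centralized_rate(n, m, d, sigma)
    ratio = d / min(bits, d)  # d / (B ^ d), the penalty for sending < d bits
    return ratio / math.log(m) * base, ratio * math.log(m) * base

if __name__ == "__main__":
    n, m, d, sigma = 100, 10, 20, 1.0
    for bits in (5, 10, 20, 40):
        print(bits, communication_bounds(n, m, d, sigma, bits))
```

Once the budget reaches $d$ bits the penalty factor $d/(B \wedge d)$ equals 1 and further bits do not change either bound, which is the "$\approx d$ bits" consequence stated above.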
Part III: Computation and Minimax Risk
with John Duchi and Brendan McMahan
![Page 59: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/59.jpg)
Towards computation
$$Y \approx X\theta + \varepsilon$$
Target $Y \in \mathbb{R}^n$; highly-varying or sparse $X \in \mathbb{R}^{n \times d}$; $\theta \in \mathbb{R}^d$; noise $\varepsilon \in \mathbb{R}^n$
[D., Hazan, Singer 11; D., Jordan, McMahan 13]
![Page 60: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/60.jpg)
Towards computation
• Both large n and large d
• Analogous models for classification (e.g. logistic)
![Page 61: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/61.jpg)
Towards computation
Columns very different (ill-conditioning)
![Page 62: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/62.jpg)
Sparse and heavy-tailed
Features of 2000 webpages (figure)
Medical risk prediction: $n \approx 10^5$ patients and $d \approx 3 \cdot 10^4$ indicator variables. Binary interactions yield $d \approx 4.5 \cdot 10^8$ (sparse) features.
Text data classification: binary word indicators for $d \approx 5 \cdot 10^4$ words; bigrams lead to $d \approx 3 \cdot 10^6$ variables, < 1% non-zero.
[Manning & Schütze 99; Shah & Meinshausen 13; Li & König 11; Ma et al., 09; Crammer, Dredze, Kulesza 09; ...]
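As a quick sanity check on the medical example, the count of interaction features can be reproduced under the assumption that "binary interactions" means all unordered pairs of the $3 \cdot 10^4$ indicator variables (the slide does not spell this out).

```python
import math

d_indicators = 30_000                      # d ≈ 3·10^4 indicator variables
n_pairs = math.comb(d_indicators, 2)       # unordered pairs (an assumption)
print(n_pairs)                             # 449985000, i.e. about 4.5e8
```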
![Page 63: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/63.jpg)
![Page 64: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/64.jpg)
Convex risk minimization
Large n, large d: focus on prediction. Minimize
$$R_P(\theta) := \mathbb{E}_P[\ell(X; \theta)]$$
where $\ell$ is convex in $\theta$ and $X \sim P$
![Page 65: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/65.jpg)
Convex risk minimization
Computation = optimization complexity [Nemirovski & Yudin 83]
Oracle: a query $\{x, \theta\}$ returns $\ell(x; \theta)$ and $\nabla_\theta \ell(x; \theta)$ at a cost of 1 unit
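The oracle model of computation can be sketched as an object that hands back a loss value and a gradient and charges one unit per query. The class name `CountingOracle`, the choice of logistic loss, and the `(features, label)` sample layout are assumptions made for illustration; the code is not numerically hardened for very large margins.

```python
import math

class CountingOracle:
    """First-order oracle: each query returns (loss, gradient) and costs 1 unit."""

    def __init__(self):
        self.units = 0

    def query(self, x, theta):
        features, label = x                      # label in {-1, +1}
        self.units += 1                          # one unit of computation
        margin = label * sum(a * t for a, t in zip(features, theta))
        loss = math.log1p(math.exp(-margin))     # logistic loss, convex in theta
        weight = -label / (1.0 + math.exp(margin))
        grad = [weight * a for a in features]    # gradient with respect to theta
        return loss, grad

oracle = CountingOracle()
loss, grad = oracle.query(([1.0, 0.0], 1), [0.0, 0.0])
print(loss, grad, oracle.units)
```

A procedure with budget $n$ is then one that makes at most $n$ such queries.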
![Page 66: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/66.jpg)
Convex risk minimization
Minimax risk with n computations
$$\mathfrak{M}_n(\Theta, \mathcal{P}, \ell) := \inf_{\hat\theta \in \mathcal{C}_n} \sup_{P \in \mathcal{P}} \left\{ \mathbb{E}_P[R_P(\hat\theta)] - \min_{\theta \in \Theta} R_P(\theta) \right\}$$
![Page 67: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/67.jpg)
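Optimal convergence
Linear functionals on hypercube: $\ell(x; \theta) = x^\top \theta$, $\Theta = [-1, 1]^d$
[Figure label: $\mathbb{E}[X]$]
Lower bounds
$$\mathfrak{M}_n(\Theta, \mathcal{P}, \ell) \ge \frac{1}{8} \, \frac{1}{\sqrt{n}} \sum_{j=1}^d \sqrt{p_j}$$
Feature $j$ of $X \in \{-1, 0, 1\}^d$ appears with probability $p_j$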
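To see how strongly the lower bound depends on sparsity, the sketch below evaluates $\frac{1}{8\sqrt{n}} \sum_j \sqrt{p_j}$ for a dense design and for a heavy-tailed one. The power-law choice $p_j = 1/j^2$ and the sizes $d$, $n$ are hypothetical, picked only to make the contrast visible.

```python
import math

def lower_bound(p, n):
    """(1/8) * (1/sqrt(n)) * sum_j sqrt(p_j), as stated on the slide."""
    return sum(math.sqrt(pj) for pj in p) / (8.0 * math.sqrt(n))

d, n = 10_000, 1_000_000
dense = [1.0] * d                                    # every feature always appears
heavy_tailed = [1.0 / j ** 2 for j in range(1, d + 1)]  # assumed power law

print(lower_bound(dense, n), lower_bound(heavy_tailed, n))
```

For the dense design the sum is $d$; for this power law it grows only like $\log d$, so the achievable risk is orders of magnitude smaller.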
![Page 68: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/68.jpg)
Methods and challenges
Stochastic approximation
Iterate:
1. Random $X_i$, $g_i = \nabla \ell(X_i; \theta_i)$
2. Update $\theta_{i+1} \leftarrow \theta_i - \frac{1}{\sqrt{i}} g_i$
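The two-step iteration can be sketched in a few lines. Everything beyond the update rule itself is an assumption for illustration: the squared loss, the noise-free synthetic data, and the helper names `sgd` and `squared_loss_grad`.

```python
import math
import random

def sgd(grad, samples, theta0):
    """Stochastic approximation: theta_{i+1} = theta_i - g_i / sqrt(i)."""
    theta = list(theta0)
    for i, x in enumerate(samples, start=1):
        g = grad(x, theta)                       # g_i = gradient of loss(X_i; theta_i)
        step = 1.0 / math.sqrt(i)
        theta = [t - step * gj for t, gj in zip(theta, g)]
    return theta

def squared_loss_grad(x, theta):
    """Gradient of 0.5 * (a . theta - y)^2 for a sample x = (a, y)."""
    a, y = x
    residual = sum(aj * tj for aj, tj in zip(a, theta)) - y
    return [residual * aj for aj in a]

rng = random.Random(0)
theta_star = [0.5, -0.5]                         # hypothetical target parameter
samples = []
for _ in range(5000):
    # Each feature is -1, 0, or +1 and is nonzero with probability 1/2.
    a = [rng.choice([-1.0, 0.0, 0.0, 1.0]) for _ in theta_star]
    y = sum(aj * tj for aj, tj in zip(a, theta_star))   # noise-free target
    samples.append((a, y))

theta_hat = sgd(squared_loss_grad, samples, [0.0, 0.0])
print(theta_hat)
```

Every coordinate shares the same step size $1/\sqrt{i}$, regardless of how often its feature is nonzero.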
![Page 69: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/69.jpg)
Methods and challenges
Efficient? Sometimes
[Robbins & Monro 51; Polyak & Juditsky 92; Nedić & Bertsekas 01; Lai 03; Nemirovski, Juditsky, Lan, Shapiro 09; Agarwal, Bartlett, Ravikumar, Wainwright 12; ...]
![Page 70: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/70.jpg)
Methods and challenges
Problem: method (and variants) sub-optimal for data with high-variance features
![Page 71: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/71.jpg)
Methods and challenges
Sometimes a factor of d worse than necessary...
![Page 72: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/72.jpg)
![Page 73: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/73.jpg)
Methods and challenges
[Figure: contours of $R(\theta)$; axis labels "Common" and "Uncommon"]
![Page 74: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/74.jpg)
Methods and challenges
Can we do something a bit more second order?
![Page 75: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/75.jpg)
Methods and challenges
[Figure: contours of $R(\theta)$ (hard) next to the rescaled risk (easier)]
![Page 76: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/76.jpg)
Methods and challenges
AdaGrad
$$A_i = \mathrm{diag}\Big(\sum_{j \le i} g_j g_j^\top\Big)^{1/2}, \qquad \theta_{i+1} \leftarrow \theta_i - A_i^{-1} g_i$$
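A minimal sketch of the diagonal AdaGrad update follows. The diagonal of $\sum_{j \le i} g_j g_j^\top$ is just the running sum of squared gradient coordinates, so only one number per feature is stored. The small `eps` guard and the function name `adagrad` are additions made here for illustration, and no extra step-size multiplier is used, matching the update as written above.

```python
import math

def adagrad(grad, samples, theta0, eps=1e-12):
    """Diagonal AdaGrad: theta_{i+1} = theta_i - A_i^{-1} g_i."""
    theta = list(theta0)
    sum_sq = [0.0] * len(theta)                  # diagonal of sum_{j<=i} g_j g_j^T
    for x in samples:
        g = grad(x, theta)
        sum_sq = [s + gj * gj for s, gj in zip(sum_sq, g)]
        # A_i = diag(sum_sq)^{1/2}; eps (an assumption) avoids 0/0 for
        # coordinates whose feature has not appeared yet.
        theta = [t - gj / (math.sqrt(s) + eps)
                 for t, gj, s in zip(theta, g, sum_sq)]
    return theta

# Gradients that differ in scale by 100x receive the same first step.
first_step = adagrad(lambda x, theta: x, [[100.0, 1.0]], [0.0, 0.0])
print(first_step)
```

Because each coordinate is scaled by its own gradient history, rarely seen features keep taking large steps while frequently seen ones are damped.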
![Page 77: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/77.jpg)
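Optimal convergence
Upper bounds
Iterate:
1. Gradient $g_i = \nabla \ell(X_i; \theta_i)$
2. Scale $A_i = \mathrm{diag}\big(\sum_{j \le i} g_j g_j^\top\big)^{1/2}$
3. Update $\theta_{i+1} \leftarrow \theta_i - A_i^{-1} g_i$
For any risk functional,
$$\mathbb{E}[R(\hat\theta)] - R(\theta^*) \le \frac{1}{\sqrt{2}\,n} \, \mathbb{E}\left[ \min_{\text{diagonal } A \succeq 0} \left\{ \|\theta^*\|_\infty^2 \operatorname{tr}(A) + \sum_{i=1}^n g_i^\top A^{-1} g_i \right\} \right]$$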
![Page 78: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/78.jpg)
Optimal convergence
Oracle (the minimum over diagonal $A$ in the upper bound): adaptive to this data
![Page 79: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/79.jpg)
![Page 80: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/80.jpg)
Optimal convergence
Upper bounds, linear models on hypercube:
$$\mathbb{E}[R(\hat\theta)] - R(\theta^*) \le \frac{\sqrt{2}}{\sqrt{n}} \sum_{j=1}^d \sqrt{p_j}$$
![Page 81: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/81.jpg)
Optimal convergence
Note: the adaptive upper bound is less than a factor of 12 larger than the lower bound
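One way to see where the factor comes from is sketched below. It assumes the prefactor of the general upper bound reads $1/(\sqrt{2}\,n)$, that $\|\theta^*\|_\infty \le 1$ on $\Theta = [-1, 1]^d$, and that $g_i = X_i$ for the linear loss; the shorthand $G_j$ is introduced here and does not appear on the slides.

```latex
\begin{align*}
G_j &:= \Big(\sum_{i=1}^n g_{i,j}^2\Big)^{1/2}, \qquad
\min_{a_j > 0} \Big\{ \|\theta^*\|_\infty^2 \, a_j + \frac{G_j^2}{a_j} \Big\}
  = 2 \, \|\theta^*\|_\infty \, G_j, \\
\mathbb{E}[R(\hat\theta)] - R(\theta^*)
  &\le \frac{1}{\sqrt{2}\,n} \, \mathbb{E}\Big[ 2 \, \|\theta^*\|_\infty \sum_{j=1}^d G_j \Big]
  \le \frac{\sqrt{2}}{n} \sum_{j=1}^d \sqrt{\mathbb{E}[G_j^2]}
  = \frac{\sqrt{2}}{\sqrt{n}} \sum_{j=1}^d \sqrt{p_j}, \\
\frac{\sqrt{2}/\sqrt{n}}{1/(8\sqrt{n})} &= 8\sqrt{2} \approx 11.3 < 12.
\end{align*}
```

The second inequality uses Jensen's inequality together with $\mathbb{E}[G_j^2] = n p_j$, since $X_{i,j}^2 = 1$ exactly when feature $j$ appears.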
![Page 82: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/82.jpg)
Empirics: text classification
Reuters news feed document classification task: d = 2,000,000 features, 4000 non-zero / document, n = 800,000 documents
[Figure: misclassification rate¹]
¹ [Crammer et al. ’06]
![Page 83: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/83.jpg)
Empirics: text classification
25 - 50% improvement
![Page 84: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/84.jpg)
Empirics: image ranking
Task: 15,000 different nouns
Given noun, rank images in order of relevance for noun
Per noun: $n \approx 2 \cdot 10^6$ observations in $d \approx 10^4$ dimensions
[Bengio, Weston, Grangier 10]
![Page 85: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/85.jpg)
Empirics: image ranking
[Figure: proportion good in top $k$, plotted against $k$]
![Page 86: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/86.jpg)
One more result
[Figure: "Accuracy on Test Set". x-axis: Time (hours), 0 to 120; y-axis: Average Frame Accuracy (%), 0 to 25. Legend: SGD, GPU, Downpour SGD, Downpour SGD w/Adagrad, Sandblaster L-BFGS]
[Dean et al. 12]
![Page 87: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/87.jpg)
One more result
[Figure annotation: "Adaptive method"]
![Page 88: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/88.jpg)
One more result
Order of magnitude improvement in time & cost
In many production systems at Google
(only result I may show...)
![Page 89: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/89.jpg)
Unifying picture
![Page 90: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/90.jpg)
Unifying picture
Constraints in minimax analysis
Reduction of estimation to testing: $V \to X \to Z$
Find initial $V$ using only $Z$ ($Z$ is private, only a few bits, a function evaluation...)
![Page 91: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/91.jpg)
Unifying picture
Information-theoretic techniques: strong data processing
Relate information $V \to Z$ to $V \to X$ via constraint on pair $X \to Z$, then apply:
‣ Fano
‣ Assouad
‣ Le Cam
![Page 92: Statistical Decision Theory and Information Constraints€¦ · Statistical Decision Theory and Information Constraints Michael I. Jordan University of California, Berkeley November](https://reader034.fdocuments.net/reader034/viewer/2022042220/5ec6bd9dd3e7652ec166470c/html5/thumbnails/92.jpg)
Unifying picture
Understanding leads to new, optimal schemes
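A toy binary instance of the chain $V \to X \to Z$ may help make "strong data processing" concrete. It is not the construction used in the talk: here $V$ is a uniform bit, both links are binary symmetric channels with hypothetical noise levels `eps` and `delta`, and the contraction factor $(1 - 2\delta)^2$ is the known strong data processing constant of a binary symmetric channel.

```python
import math

def binary_entropy(q):
    """Entropy in bits of a Bernoulli(q) variable."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def flip_compose(eps, delta):
    """Crossover probability of two binary symmetric channels in series."""
    return eps * (1 - delta) + delta * (1 - eps)

def information_uniform_bit(crossover):
    """I(V; output) in bits for a uniform bit V sent through BSC(crossover)."""
    return 1.0 - binary_entropy(crossover)

eps, delta = 0.1, 0.2                     # hypothetical noise for V->X and X->Z
i_vx = information_uniform_bit(eps)
i_vz = information_uniform_bit(flip_compose(eps, delta))
contraction = (1 - 2 * delta) ** 2        # contraction coefficient of BSC(delta)

print(i_vx, i_vz, contraction * i_vx)
```

The constrained channel $X \to Z$ does more than keep information from increasing: it shrinks $I(V; Z)$ by a fixed factor relative to $I(V; X)$, which is the kind of quantitative statement that then feeds into Fano, Assouad, or Le Cam arguments.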