Regret to the Best vs. Regret to the Average
Eyal Even-Dar, Michael Kearns, Yishay Mansour, Jennifer Wortman
UPenn + Tel Aviv Univ. Slides: Csaba
Motivation
Expert algorithms attempt to control regret to the return of the best expert. Regret to the average return? Same bound! Weak???
EW: $w_{i,1} = 1$, $w_{i,t} = w_{i,t-1} e^{\eta g_{i,t}}$, $p_{i,t} = w_{i,t}/W_t$, $W_t = \sum_i w_{i,t}$
E1: 1 0 1 0 1 0 1 0 1 0
E2: 0 1 0 1 0 1 0 1 0 1
On this sequence $G_{A,T} = T/2 - c\sqrt{T}$, while $G^+_T = G^-_T = G^0_T = T/2$, so $R^+_T \approx c\sqrt{T}$ and $R^0_T \approx c\sqrt{T}$.
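A quick way to see this concretely is the following minimal simulation (not from the slides) of EW on the alternating two-expert sequence above; the tuning $\eta = \sqrt{8\ln N / T}$ is an assumed, standard choice. Both the regret to the best and the regret to the average come out as $\Theta(\sqrt{T})$.

```python
import math

def ew_on_alternating(T, eta):
    """Exponentially weighted forecaster (w_{i,t} = w_{i,t-1} * exp(eta * g_{i,t}))
    on the two-expert sequence E1 = 1,0,1,0,... and E2 = 0,1,0,1,..."""
    w = [1.0, 1.0]                 # unnormalized weights
    gain_A = 0.0                   # cumulated gain of the algorithm
    G = [0.0, 0.0]                 # cumulated gains of the experts
    for t in range(T):
        g = [1.0, 0.0] if t % 2 == 0 else [0.0, 1.0]
        W = w[0] + w[1]
        p = [w[0] / W, w[1] / W]
        gain_A += p[0] * g[0] + p[1] * g[1]
        for i in range(2):
            G[i] += g[i]
            w[i] *= math.exp(eta * g[i])
    return max(G) - gain_A, (G[0] + G[1]) / 2 - gain_A   # regret to best, to average

for T in (1_000, 10_000, 100_000):
    r_best, r_avg = ew_on_alternating(T, eta=math.sqrt(8 * math.log(2) / T))
    print(T, round(r_best, 1), round(r_avg, 1), round(r_avg / math.sqrt(T), 3))
```

The last column stays roughly constant (about 0.3), i.e. the regret to the average scales like $\sqrt{T}$ even though the average is trivial to match.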
Notation - gains
$g_{i,t} \in [0,1]$ - gains
$g = (g_{i,t})$ - sequence of gains
$G_{i,T}(g) = \sum_{t=1}^{T} g_{i,t}$ - cumulated gains
$G^0_T(g) = \frac{1}{N} \sum_i G_{i,T}(g)$ - average gain
$G^-_T(g) = \min_i G_{i,T}(g)$ - worst gain
$G^+_T(g) = \max_i G_{i,T}(g)$ - best gain
$G^D_T(g) = \sum_i D_i G_{i,T}(g)$ - weighted avg. gain
Notation - algorithms
$w_{i,t}$ - unnormalized weights
$p_{i,t} = w_{i,t}/W_t$ - normalized weights, $W_t = \sum_i w_{i,t}$
$g_{A,t} = \sum_i p_{i,t} g_{i,t}$ - gain of A
$G_{A,T}(g) = \sum_t g_{A,t}$ - cumulated gain of A
Notation - regret
Regret to the ...
$R^+_T(g) = (G^+_T(g) - G_{A,T}(g)) \vee 1$ - best
$R^-_T(g) = (G^-_T(g) - G_{A,T}(g)) \vee 1$ - worst
$R^0_T(g) = (G^0_T(g) - G_{A,T}(g)) \vee 1$ - average
$R^D_T(g) = (G^D_T(g) - G_{A,T}(g)) \vee 1$ - distribution D
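As a sanity check of the notation, here is a small helper (not from the slides; the array layout g[t][i] is an assumption) that computes the four gain benchmarks and the corresponding regrets, floored at 1 as in the definitions above.

```python
import numpy as np

def gain_summary(g, p, D=None):
    """g: (T, N) array of expert gains in [0,1]; p: (T, N) array of the algorithm's
    weight vectors (rows sum to 1); D: optional fixed comparison distribution."""
    G = g.sum(axis=0)                       # G_{i,T}: cumulated gain of each expert
    G_alg = (p * g).sum()                   # G_{A,T}: cumulated gain of the algorithm
    G_best, G_worst, G_avg = G.max(), G.min(), G.mean()
    G_D = G @ D if D is not None else G_avg
    regret = lambda target: max(target - G_alg, 1.0)   # regrets are floored at 1
    return {"R+": regret(G_best), "R-": regret(G_worst),
            "R0": regret(G_avg), "RD": regret(G_D)}

# Example: 3 experts, uniform algorithm weights, D concentrated on expert 0.
g = np.random.default_rng(0).random((100, 3))
p = np.full((100, 3), 1.0 / 3.0)
print(gain_summary(g, p, D=np.array([1.0, 0.0, 0.0])))
```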
Goal
Algorithm A is "nice" if
$R^+_{A,T} = O(T^{1/2})$
$R^0_{A,T} \le 1$ (constant regret to the average)
Program:
Examine existing algorithms (difference algorithms) - lower bound
Show nice algorithms
Show that no substantial further improvement is possible
Difference algorithms
Def: A is a difference algorithm if, for $N = 2$ and $g_{i,t} \in \{0,1\}$, its weights depend only on the gain difference: $p_{1,t} = f(d_t)$, $p_{2,t} = 1 - f(d_t)$, where $d_t = G_{1,t} - G_{2,t}$.
Examples:
EW: $w_{i,t} = e^{\eta G_{i,t}}$
FPL: choose $\arg\max_i (G_{i,t} + Z_{i,t})$
Prod: $w_{i,t} = \prod_s (1 + \eta g_{i,s}) = (1+\eta)^{G_{i,t}}$ (for $\{0,1\}$ gains)
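The following sketch (not from the slides) illustrates why each example is a difference algorithm for $N = 2$: the probability put on expert 1 can be written as a function of $d_t$ alone. The values of $\eta$ and the perturbation scale are arbitrary, and exponential perturbations for FPL are an assumption.

```python
import math, random

def ew_p1(d, eta):
    """EW with two experts: p_1 = e^{eta G_1} / (e^{eta G_1} + e^{eta G_2}) = f(d)."""
    return 1.0 / (1.0 + math.exp(-eta * d))

def prod_p1(d, eta):
    """Prod with {0,1} gains: w_i = (1+eta)^{G_i}, so again p_1 = f(d)."""
    return 1.0 / (1.0 + (1.0 + eta) ** (-d))

def fpl_p1(d, scale, n_samples=100_000):
    """FPL picks argmax_i (G_i + Z_i); with i.i.d. perturbations the probability of
    choosing expert 1 is again a function of d only (estimated here by sampling)."""
    wins = sum(random.expovariate(1.0 / scale) + d > random.expovariate(1.0 / scale)
               for _ in range(n_samples))
    return wins / n_samples

for d in (-5, 0, 5):
    print(d, round(ew_p1(d, 0.2), 3), round(prod_p1(d, 0.2), 3), round(fpl_p1(d, 5.0), 3))
```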
A lower bound for difference algorithms
Theorem: If A is a difference algorithm, then there exist sequences $g$, $g'$ (tuned to A) such that
$R^+_{A,T}(g)\, R^0_{A,T}(g') \ge R^+_{A,T}(g)\, R^-_{A,T}(g') = \Omega(T)$.
Equivalently, writing $R^+_{A,T} = \max_g R^+_{A,T}(g)$, $R^-_{A,T} = \max_g R^-_{A,T}(g)$, $R^0_{A,T} = \max_g R^0_{A,T}(g)$:
$R^+_{A,T}\, R^0_{A,T} \ge R^+_{A,T}\, R^-_{A,T} = \Omega(T)$.
Proof
Assume T is even and $p_{1,1} = 1/2$. Sequence $g$: expert 1 gains 1 in every step, expert 2 gains 0.
Let $\tau$ be the first time $t$ with $p_{1,t} \ge 2/3$ $\Rightarrow$ $R^+_{A,T}(g) \ge \tau/3$.
Since $p_{1,t}$ rises from $1/2$ to at least $2/3$, $\exists\, t' \in \{2,3,\ldots,\tau\}$ s.t. $p_{1,t'} - p_{1,t'-1} \ge 1/(6\tau)$.
Proof (2)
Sequence $g'$: drive $d_t$ to the value at which this jump occurs, then alternate so that $p_{1,t}$ oscillates between $p_{1,t'}$ and $p_{1,t'-1}$: when $p_{1,t} = p_{1,t'}$ only expert 2 gains (A gains $1 - p_{1,t'}$); when $p_{1,t} = p_{1,t'-1}$ only expert 1 gains (A gains $p_{1,t'-1}$). In each two-step cycle every expert gains 1, while A gains at most $1 - 1/(6\tau)$.
Hence $G^+_T = G^-_T = G^0_T = T/2$, while $G_{A,T}(g') \le \tau + \frac{T-2\tau}{2}\bigl(1 - \frac{1}{6\tau}\bigr)$
$\Rightarrow R^-_{A,T}(g') \ge \frac{T-2\tau}{12\tau}$ $\Rightarrow$ $R^+_{A,T}(g)\, R^-_{A,T}(g') \ge \frac{T-2\tau}{36}$.
Tightness
We know that for difference algorithms $R^+_{A,T}\, R^0_{A,T} \ge R^+_{A,T}\, R^-_{A,T} = \Omega(T)$. Can a (difference) algorithm achieve this?
Theorem: EW = EW($\eta$), with appropriately tuned $\eta = \eta(\alpha)$, $0 \le \alpha \le 1/2$, has
$R^+_{EW,T} \le T^{1/2+\alpha}(1 + \ln N)$
$R^0_{EW,T} \le T^{1/2-\alpha}$
Breaking the frontier
What's wrong with the difference algorithms? They are designed to find the best expert with low regret (fast)... they don't pay attention to the average gain and how it compares with the best gain.
BestWorst(A)
$G^+_T - G^-_T$: the spread of the cumulated gains.
Idea: Stay with the average until the spread becomes large, then switch to learning (using algorithm A); a code sketch follows the proof below. When the spread is large enough, $G^0_T = G_{BW(A),T} > G^-_T$ $\Rightarrow$ nothing to lose.
Spread threshold: $NR$, where $R = R_{T,N}$ is a bound on the regret of A.
BestWorst(A)
Theorem: $R^+_{BW(A),T} = O(NR)$ and $G_{BW(A),T} \ge G^-_T$.
Proof: At the time of the switch, $G_{BW(A)} \ge \frac{G^+ + (N-1)G^-}{N}$; since $G^+ \ge G^- + NR$, this gives $G_{BW(A)} \ge G^- + R$. After the switch, A's gain on the remaining rounds is at least the best remaining gain minus $R$, so $G_{BW(A),T} \ge G^-_{\text{switch}} + \max_i(\text{remaining gain of } i) \ge G^-_T$ (the final worst gain is at most the smallest prefix plus the largest remaining gain).
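A minimal Python sketch of BestWorst (not from the slides): the base learner A is assumed to expose reset()/weights()/update(), illustrated here with a small EW class; R would be set to A's regret bound, e.g. on the order of $\sqrt{T \ln N}$ for EW.

```python
import math
import numpy as np

class EW:
    """Minimal exponentially weighted forecaster, used as the base learner A."""
    def __init__(self, n, eta):
        self.eta, self.w = eta, np.ones(n)
    def reset(self):
        self.w = np.ones_like(self.w)
    def weights(self):
        return self.w / self.w.sum()
    def update(self, g):
        self.w *= np.exp(self.eta * np.asarray(g))

def best_worst(gains, A, R):
    """BestWorst(A): play the uniform average until the spread G+ - G- of the
    cumulated expert gains reaches N*R, then switch to learning with A."""
    T, N = gains.shape
    G = np.zeros(N)                          # cumulated expert gains
    total, switched = 0.0, False
    for t in range(T):
        if not switched and G.max() - G.min() >= N * R:
            switched = True                  # spread is large: start learning
            A.reset()
        p = A.weights() if switched else np.full(N, 1.0 / N)
        total += p @ gains[t]
        if switched:
            A.update(gains[t])
        G += gains[t]
    return total

# Example run: 2 experts, T = 1000, EW as the base learner.
T, N = 1000, 2
g = np.random.default_rng(1).random((T, N))
A = EW(N, eta=math.sqrt(8 * math.log(N) / T))
print(best_worst(g, A, R=math.sqrt(T * math.log(N) / 2)))
```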
PhasedAggression(A, R, D)
for k = 1 : log2(R) do
    η := 2^{k-1} / R
    A.reset(); s := 0              // local time, new phase
    while (G^+_s - G^D_s < 2R) do  // phase ends once the best expert is 2R ahead of D
        play η·A + (1-η)·D; feed the gains to A; s := s+1
Afterwards play A alone (η = 1) until time T.
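A runnable sketch of this phase structure (not from the slides), assuming a base learner with the reset()/weights()/update() interface of the EW class in the BestWorst sketch above; R is a bound on A's regret to the best expert and D a fixed comparison distribution.

```python
import math
import numpy as np

def phased_aggression(gains, A, R, D):
    """PhasedAggression(A, R, D): in phase k put weight eta = 2^{k-1}/R on A and
    1 - eta on the fixed distribution D; end the phase (and reset A) once the best
    expert is 2R ahead of D within the phase. After log2(R) phases, play A alone."""
    T, N = gains.shape
    D = np.asarray(D, dtype=float)
    total, t = 0.0, 0
    for k in range(1, int(math.log2(R)) + 1):
        eta = 2.0 ** (k - 1) / R
        A.reset()
        G_phase, GD_phase = np.zeros(N), 0.0      # gains within the current phase
        while t < T and G_phase.max() - GD_phase < 2 * R:
            g = gains[t]
            p = eta * A.weights() + (1.0 - eta) * D   # mixture of A and D
            total += p @ g
            A.update(g)
            G_phase += g
            GD_phase += D @ g
            t += 1
    A.reset()
    while t < T:                                  # final stage: eta = 1, A alone
        g = gains[t]
        total += A.weights() @ g
        A.update(g)
        t += 1
    return total
```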
PhasedAggression(A, R, D) - Theorem
Theorem: Let A be any algorithm with regret $R = R_{T,N}$ to the best expert, and let D be any distribution. Then, for PA = PA(A, R, D),
$R^+_{PA,T} \le 2R(\log R + 1)$
$R^D_{PA,T} \le 1$
Proof
Consider local time $s$ during phase $k$ (so $\eta = 2^{k-1}/R$). PA shares the gains, and hence the regrets, of A and D in proportion $\eta : 1-\eta$, and within a phase $G^+_s - G^D_s < 2R$, so
$G^+_s - G_{PA,s} < \frac{2^{k-1}}{R}\, R + \bigl(1 - \frac{2^{k-1}}{R}\bigr)\, 2R < 2R$
$G^D_s - G_{PA,s} \le \frac{2^{k-1}}{R}\, R = 2^{k-1}$
What happens at the end of a completed phase? Then $G^+_s - G^D_s \ge 2R$, so
$G_{PA,s} - G_{D,s} \ge \frac{2^{k-1}}{R}\,(G^+_s - R - G^D_s) \ge \frac{2^{k-1}}{R}\, R = 2^{k-1}$.
What if PA ends in phase $k$ at time $T$?
$G^+_T - G_{PA,T} \le 2R\, k \le 2R(\log R + 1)$
$G^D_T - G_{PA,T} \le 2^{k-1} - \sum_{j=1}^{k-1} 2^{j-1} = 2^{k-1} - (2^{k-1} - 1) = 1$
General lower bounds
Theorem: $R^+_{A,T} = O(T^{1/2})$ $\Rightarrow$ $R^0_{A,T} = \Omega(T^{1/2})$, and $R^+_{A,T} \le (T\log T)^{1/2}/10$ $\Rightarrow$ $R^0_{A,T} = \Omega(T^{\epsilon})$, where $\epsilon \ge 0.02$.
Compare this with $R^+_{PA,T} \le 2R(\log R + 1)$ and $R^D_{PA,T} \le 1$, where $R = (T \log N)^{1/2}$.
Conclusions
Achieving constant regret to the average is a reasonable goal.
Classical algorithms do not have this property; they satisfy $R^+_{A,T}\, R^0_{A,T} = \Omega(T)$.
Modification: learn only when it makes sense, i.e., when the best is much better than the average.
PhasedAggression: an optimal tradeoff.
Can we remove the dependence on T?