Date posted: 16-Jan-2016
Category: Documents

### Transcript of Regret to the Best vs. Regret to the Average

• Regret to the Best vs. Regret to the Average

Eyal Even-Dar, Michael Kearns, Yishay Mansour, Jennifer Wortman

UPenn + Tel Aviv Univ. Slides: Csaba

• Motivation

Expert algorithms attempt to control regret to the return of the best expert. What about regret to the average return? The same bound! Isn't that weak?

EW: w_{i,1} = 1, w_{i,t} = w_{i,t-1} e^{η g_{i,t}}, p_{i,t} = w_{i,t}/W_t, W_t = Σ_i w_{i,t}

E1: 1 0 1 0 1 0 1 0 1 0
E2: 0 1 0 1 0 1 0 1 0 1

G_{A,T} = T/2 − c T^{1/2}, while G^+_T = G^-_T = G^0_T = T/2, so R^+_T ≥ c T^{1/2} and R^0_T ≥ c T^{1/2}
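The gap above is easy to check numerically. A minimal sketch (mine, not from the slides; the function name `ew_gain` is made up): run exponential weights with the usual η = 1/√T tuning on the two alternating experts. Every expert, and hence the average, earns exactly T/2, yet EW falls short by Θ(√T).

```python
import math

def ew_gain(T, eta):
    """Expected cumulative gain of exponential weights (EW) with rate eta
    on the alternating sequences E1 = 1,0,1,0,... and E2 = 0,1,0,1,..."""
    w = [1.0, 1.0]                          # w_{i,1} = 1
    total = 0.0
    for t in range(T):
        gains = (1.0, 0.0) if t % 2 == 0 else (0.0, 1.0)
        W = w[0] + w[1]
        p = (w[0] / W, w[1] / W)            # p_{i,t} = w_{i,t} / W_t
        total += p[0] * gains[0] + p[1] * gains[1]
        w = [w[i] * math.exp(eta * gains[i]) for i in range(2)]
    return total

T = 10000
g = ew_gain(T, 1.0 / math.sqrt(T))          # the standard O(T^{-1/2}) tuning
print(T / 2 - g)                            # regret to the average: order sqrt(T)
```

The deficit arises because EW always overweights the expert that is about to gain 0, losing a little on every other round.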

• Notation - gains

g_{i,t} ∈ [0,1] - gains
g = (g_{i,t}) - sequence of gains
G_{i,T}(g) = Σ_{t=1}^{T} g_{i,t} - cumulated gain of expert i
G^0_T(g) = (Σ_i G_{i,T}(g))/N - average gain
G^-_T(g) = min_i G_{i,T}(g) - worst gain
G^+_T(g) = max_i G_{i,T}(g) - best gain
G^D_T(g) = Σ_i D_i G_{i,T}(g) - weighted avg. gain under distribution D

• Notation - algorithms

w_{i,t} - unnormalized weights
p_{i,t} = w_{i,t}/W_t - normalized weights, W_t = Σ_i w_{i,t}
g_{A,t} = Σ_i p_{i,t} g_{i,t} - gain of A at time t
G_{A,T}(g) = Σ_t g_{A,t} - cumulated gain of A

• Notation - regret

Regret to the ...
R^+_T(g) = max(G^+_T(g) − G_{A,T}(g), 1) - best
R^-_T(g) = max(G^-_T(g) − G_{A,T}(g), 1) - worst
R^0_T(g) = max(G^0_T(g) − G_{A,T}(g), 1) - average
R^D_T(g) = max(G^D_T(g) − G_{A,T}(g), 1) - distribution D
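The notation above can be instantiated on a toy gain matrix. A small illustrative sketch (the numbers and the hypothetical algorithm gain `G_A` are mine):

```python
# Toy gain matrix g[i][t] with N = 3 experts and T = 4 rounds.
g = [
    [1, 0, 1, 1],   # expert 1: G_{1,T} = 3
    [0, 1, 0, 0],   # expert 2: G_{2,T} = 1
    [1, 1, 0, 0],   # expert 3: G_{3,T} = 2
]
N = len(g)

G = [sum(row) for row in g]          # cumulated gains G_{i,T}
G_best = max(G)                      # G^+_T
G_worst = min(G)                     # G^-_T
G_avg = sum(G) / N                   # G^0_T

# Regrets of a hypothetical algorithm that earned G_A = 2 (note the max
# with 1, which keeps every regret at least 1):
G_A = 2.0
R_best = max(G_best - G_A, 1)        # R^+_T
R_avg = max(G_avg - G_A, 1)          # R^0_T
print(G_best, G_worst, G_avg, R_best, R_avg)
```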

• Goal

Algorithm A is "nice" if R^+_{A,T} ≤ O(T^{1/2}) and R^0_{A,T} ≤ 1.

Program:
- Examine existing algorithms (difference algorithms): lower bound
- Show nice algorithms
- Show that no substantial further improvement is possible

• Difference algorithms

Def: A is a difference algorithm if for N = 2 and g_{i,t} ∈ {0,1}, p_{1,t} = f(d_t), p_{2,t} = 1 − f(d_t), where d_t = G_{1,t} − G_{2,t}.

Examples:
EW: w_{i,t} = e^{η G_{i,t}}
FPL: choose argmax_i (G_{i,t} + Z_{i,t})
Prod: w_{i,t} = Π_s (1 + η g_{i,s}) = (1 + η)^{G_{i,t}}
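To see why EW and Prod qualify as difference algorithms, one can write out p_{1,t} for N = 2 and check it depends on the histories only through d_t. A sketch (toy numbers mine):

```python
import math

eta = 0.1
G1, G2 = 7, 4                       # toy cumulated gains
d = G1 - G2                         # d_t = G_{1,t} - G_{2,t}

# EW: w_i = exp(eta * G_i)  =>  p_1 = 1 / (1 + exp(-eta * d))
p_ew = math.exp(eta * G1) / (math.exp(eta * G1) + math.exp(eta * G2))
assert abs(p_ew - 1 / (1 + math.exp(-eta * d))) < 1e-12

# Prod: w_i = (1 + eta)^{G_i}  =>  p_1 = 1 / (1 + (1 + eta)^{-d})
p_prod = (1 + eta) ** G1 / ((1 + eta) ** G1 + (1 + eta) ** G2)
assert abs(p_prod - 1 / (1 + (1 + eta) ** -d)) < 1e-12

# FPL: P(pick expert 1) = P(G1 + Z1 > G2 + Z2) = P(Z2 - Z1 < d),
# again a function of d alone, since the Z_i are i.i.d. perturbations.
print(p_ew, p_prod)
```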

• A lower bound for difference algorithms

Theorem: If A is a difference algorithm then there exist sequences g, g' (tuned to A) such that

R^+_{A,T}(g) R^0_{A,T}(g') ≥ R^+_{A,T}(g) R^-_{A,T}(g') = Ω(T)

For R^+_{A,T} = max_g R^+_{A,T}(g), R^-_{A,T} = max_g R^-_{A,T}(g), R^0_{A,T} = max_g R^0_{A,T}(g), this gives

R^+_{A,T} R^0_{A,T} ≥ R^+_{A,T} R^-_{A,T} = Ω(T)

• Proof

Assume T is even and (w.l.o.g.) p_{1,1} ≤ 1/2. Let g be the sequence on which expert 1 always gains 1 and expert 2 always gains 0.

Let τ be the first time t when p_{1,t} ≥ 2/3. Up to time τ the algorithm loses at least 1/3 per step, so R^+_{A,T}(g) ≥ τ/3. Since p_{1,t} climbs from ≤ 1/2 to ≥ 2/3 within τ steps, ∃ ℓ ∈ {2,3,...,τ} s.t. p_{1,ℓ} − p_{1,ℓ−1} ≥ 1/(6τ).

• Proof (cont.)

Now build g': first drive the difference d_t up to d_ℓ, then alternate the gains so that d_t oscillates between d_ℓ and d_{ℓ−1}. On each such pair of rounds both experts gain 1, while A plays p_{1,ℓ} just before the round in which expert 1 gains 0 and p_{1,ℓ−1} just before the round in which expert 1 gains 1, so A's gain over the pair is at most 1 − (p_{1,ℓ} − p_{1,ℓ−1}) ≤ 1 − 1/(6τ).

Then G^+_T = G^-_T = G^0_T = T/2, while G_{A,T}(g') ≤ τ + ((T − 2τ)/2)(1 − 1/(6τ)), hence R^-_{A,T}(g') ≥ (T − 2τ)/(12τ) and therefore R^+_{A,T}(g) R^-_{A,T}(g') ≥ (T − 2τ)/36 = Ω(T).

• Tightness

We know that for difference algorithms R^+_{A,T} R^0_{A,T} ≥ R^+_{A,T} R^-_{A,T} = Ω(T). Can a (difference) algorithm achieve this?

Theorem: EW = EW(η), with appropriately tuned η = η(α), 0 ≤ α ≤ 1/2, has

R^+_{EW,T} ≤ T^{1/2+α} (1 + ln N)
R^0_{EW,T} ≤ T^{1/2−α}

• Breaking the frontier

What's wrong with the difference algorithms? They are designed to find the best expert with low regret (fast). But they don't pay attention to the average gain and how it compares with the best gain.

• BestWorst(A)

G^+_T − G^-_T: the spread of the cumulated gains.
Idea: Stay with the average until the spread becomes large, then switch to learning (using algorithm A). When the spread is large enough, G^0_T = G_{BW(A),T} ≥ G^-_T ⇒ nothing to lose.
Spread threshold: NR, where R = R_{T,N} is a bound on the regret of A.

• BestWorst(A)

Theorem: R^+_{BW(A),T} = O(NR) and G_{BW(A),T} ≥ G^-_T.

Proof: At the time of the switch, G_{BW(A)} ≥ (G^+ + (N−1) G^-)/N. Since G^+ ≥ G^- + NR, G_{BW(A)} ≥ G^- + R. From then on A loses at most R relative to the best expert, so BW(A) never falls below G^-.
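The switching rule is simple enough to sketch in a few lines. A minimal rendering of the slide's description (mine; the `make_A` factory and its `.weights()`/`.update()` interface are assumptions):

```python
def best_worst(gain_rounds, N, R, make_A):
    """BestWorst(A): play the uniform average until the spread of cumulated
    gains reaches N*R, then switch to the regret-R expert algorithm A.

    gain_rounds: iterable of length-N gain vectors.
    make_A: factory returning an object with .weights() -> prob. vector
            and .update(gains)."""
    G = [0.0] * N
    total, A = 0.0, None
    for gains in gain_rounds:
        if A is None and max(G) - min(G) >= N * R:
            A = make_A()                      # spread is large: start learning
        p = A.weights() if A else [1.0 / N] * N
        total += sum(pi * gi for pi, gi in zip(p, gains))
        if A:
            A.update(gains)
        for i in range(N):
            G[i] += gains[i]
    return total
```

Before the switch the algorithm's gain equals G^0 exactly, which is what makes the "nothing to lose" argument in the proof go through.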

• PhasedAggression(A, R, D)

    for k = 1 : log2(R) do
        η := 2^(k−1)/R
        A.reset(); s := 0    // local time, new phase
        while G^+_s − G^D_s < 2R do
            play η p_{A,s} + (1 − η) D; feed the gains to A; s := s + 1
    play A alone (η = 1) for the remaining rounds
• PA(A, R, D): Theorem

Theorem: Let A be any algorithm with regret R = R_{T,N} to the best expert, and D any distribution. Then for PA = PA(A, R, D),

R^+_{PA,T} ≤ 2R (log R + 1)
R^D_{PA,T} ≤ 1

• Proof

Consider local time s during phase k. PA shares D's and A's gains in proportion (1 − η) : η with η = 2^{k−1}/R, and hence shares their regrets:

G^+_s − G_{PA,s} < (2^{k−1}/R) R + (1 − 2^{k−1}/R) 2R < 2R
G^D_s − G_{PA,s} ≤ (2^{k−1}/R) R = 2^{k−1}

What happens at the end of the phase? Since the phase ends when G^+_s − G^D_s ≥ 2R,

G_{PA,s} − G^D_s ≥ (2^{k−1}/R)(G^+_s − R − G^D_s) ≥ (2^{k−1}/R)(2R − R) = 2^{k−1}

So if PA ends in phase k at time T:

G^+_T − G_{PA,T} ≤ 2R k ≤ 2R (log R + 1)
G^D_T − G_{PA,T} ≤ 2^{k−1} − Σ_{j=1}^{k−1} 2^{j−1} = 2^{k−1} − (2^{k−1} − 1) = 1
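The phase structure above can be sketched directly. An illustrative implementation (mine; the `make_A` factory, its `.weights()`/`.update()` interface, and the stream format are assumptions):

```python
import math

def phased_aggression(gain_rounds, N, R, make_A, D):
    """PhasedAggression(A, R, D): in phase k, mix A's weights into the prior
    distribution D with weight eta = 2^(k-1)/R; reset A and double eta once
    the best expert pulls 2R ahead of D's weighted-average gain (measured
    within the phase). Once eta reaches 1, play A alone."""
    total, k = 0.0, 1
    A = make_A()
    loc = [0.0] * N                           # phase-local cumulated gains
    for gains in gain_rounds:
        eta = min(2 ** (k - 1) / R, 1.0)      # eta doubles each phase
        pA = A.weights()
        p = [eta * a + (1 - eta) * d for a, d in zip(pA, D)]
        total += sum(pi * gi for pi, gi in zip(p, gains))
        A.update(gains)
        loc = [x + gi for x, gi in zip(loc, gains)]
        spread = max(loc) - sum(d * x for d, x in zip(D, loc))
        if eta < 1.0 and spread >= 2 * R:     # phase ends: aggress more
            k, A, loc = k + 1, make_A(), [0.0] * N
    return total
```

Each phase risks at most 2^{k−1} of the gain accumulated over D, but only after the previous phase ended with a 2^{k−1} surplus over D, which is why the regrets to D telescope down to 1.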

• General lower bounds

Theorem:
R^+_{A,T} = O(T^{1/2}) ⇒ R^0_{A,T} = Ω(T^{1/2})
R^+_{A,T} ≤ (T log T)^{1/2}/10 ⇒ R^0_{A,T} = Ω(T^α), where α ≥ 0.02

Compare this with R^+_{PA,T} ≤ 2R (log R + 1), R^D_{PA,T} ≤ 1, where R = (T log N)^{1/2}.

• Conclusions

Achieving constant regret to the average is a reasonable goal.
Classical algorithms do not have this property; they satisfy R^+_{A,T} R^0_{A,T} = Ω(T).
Modification: learn only when it makes sense, i.e., when the best is much better than the average.
PhasedAggression: optimal tradeoff.
Can we remove the dependence on T?