Transcript of Regret to the Best vs. Regret to the Average

Page 1: Regret to the Best vs. Regret to the Average

Eyal Even-Dar, Michael Kearns, Yishay Mansour, Jennifer Wortman

UPenn + Tel Aviv Univ. Slides: Csaba

Page 2: Motivation

Expert algorithms attempt to control regret to the return of the best expert

Regret to the average return? Same bound! Weak???

EW: w_{i,1} = 1, w_{i,t} = w_{i,t-1} e^{η g_{i,t}}, p_{i,t} = w_{i,t}/W_t, W_t = Σ_i w_{i,t}

E1: 1 0 1 0 1 0 1 0 1 0 …
E2: 0 1 0 1 0 1 0 1 0 1 …

G_{A,T} = T/2 - c T^{1/2}

G^+_T = G^-_T = G^0_T = T/2

R^+_T ≤ c T^{1/2},  R^0_T ≤ c T^{1/2}
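
For concreteness, here is a small simulation sketch (my own code, not from the slides): running the EW update above on the alternating two-expert sequence, with the usual tuning η = 1/√T (the tuning is my assumption), the algorithm's gain falls short of the average T/2 by roughly a constant times T^{1/2}, which is exactly the "same bound as to the best" phenomenon the slide points out.

```python
# Sketch: exponential weights (EW) on the alternating sequence E1/E2 above.
# Assumption (not stated on the slide): the usual tuning eta = 1/sqrt(T).
import numpy as np

def ew_gain(gains, eta):
    """Cumulative gain of EW with w_{i,t} = exp(eta * G_{i,t-1})."""
    N, T = gains.shape
    G = np.zeros(N)                      # cumulative expert gains so far
    total = 0.0
    for t in range(T):
        w = np.exp(eta * (G - G.max()))  # shift by max for numerical stability
        p = w / w.sum()
        total += p @ gains[:, t]
        G += gains[:, t]
    return total

for T in [1_000, 16_000, 256_000]:
    gains = np.zeros((2, T))
    gains[0, 0::2] = 1.0                 # E1: 1 0 1 0 ...
    gains[1, 1::2] = 1.0                 # E2: 0 1 0 1 ...
    shortfall = T / 2 - ew_gain(gains, eta=1 / np.sqrt(T))
    print(T, round(shortfall, 2), round(shortfall / np.sqrt(T), 3))  # ratio stays ~constant
```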

Page 3: Notation - gains

g_{i,t} ∈ [0,1] - gain of expert i at time t

g = (g_{i,t}) - sequence of gains

G_{i,T}(g) = Σ_{t=1}^T g_{i,t} - cumulative gain of expert i

G^0_T(g) = (Σ_i G_{i,T}(g))/N - average gain

G^-_T(g) = min_i G_{i,T}(g) - worst gain

G^+_T(g) = max_i G_{i,T}(g) - best gain

G^D_T(g) = Σ_i D_i G_{i,T}(g) - weighted average gain (w.r.t. distribution D)
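
For concreteness, these quantities could be computed from a gain matrix as follows (an illustrative helper, not from the paper; `g[i, t]` holds g_{i,t} and `D` is a distribution over experts):

```python
# Gain summaries for an (N, T) matrix of expert gains g[i, t] in [0, 1].
import numpy as np

def gain_summaries(g, D):
    G = g.sum(axis=1)            # G_{i,T}: cumulative gain of each expert
    return {
        "G0": G.mean(),          # G^0_T: average gain
        "G-": G.min(),           # G^-_T: worst gain
        "G+": G.max(),           # G^+_T: best gain
        "GD": D @ G,             # G^D_T: D-weighted average gain
    }
```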

Page 4: Notation - algorithms

w_{i,t} - unnormalized weights

p_{i,t} = w_{i,t}/W_t, W_t = Σ_i w_{i,t} - normalized weights

g_{A,t} = Σ_i p_{i,t} g_{i,t} - gain of A at time t

G_{A,T}(g) = Σ_t g_{A,t} - cumulative gain of A
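
And the algorithm's cumulative gain, given the normalized weights it actually played (again an illustrative helper; `p[i, t]` holds p_{i,t}):

```python
# Cumulative gain of an algorithm from its played weights p[i, t] and gains g[i, t].
import numpy as np

def algorithm_gain(p, g):
    g_At = (p * g).sum(axis=0)   # g_{A,t} = sum_i p_{i,t} g_{i,t}
    return g_At.sum()            # G_{A,T} = sum_t g_{A,t}
```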

Page 5: Notation - regret

Regret to the...

R^+_T(g) = max(G^+_T(g) - G_{A,T}(g), 1) - best

R^-_T(g) = max(G^-_T(g) - G_{A,T}(g), 1) - worst

R^0_T(g) = max(G^0_T(g) - G_{A,T}(g), 1) - average

R^D_T(g) = max(G^D_T(g) - G_{A,T}(g), 1) - distribution D
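
Putting the last three slides together, the four regrets, each clipped below at 1 as in the definitions above (illustrative code only; inputs as in the previous sketches):

```python
# The four regret notions, clipped below at 1 as on this slide.
import numpy as np

def regrets(p, g, D):
    G = g.sum(axis=1)                    # per-expert cumulative gains
    G_A = (p * g).sum()                  # algorithm's cumulative gain
    return {
        "R+": max(G.max() - G_A, 1.0),   # regret to the best
        "R-": max(G.min() - G_A, 1.0),   # regret to the worst
        "R0": max(G.mean() - G_A, 1.0),  # regret to the average
        "RD": max(D @ G - G_A, 1.0),     # regret to distribution D
    }
```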

Page 6: Goal

Algorithm A is "nice" if:

R^+_{A,T} ≤ O(T^{1/2})

R^0_{A,T} ≤ 1

Program:

Examine existing algorithms ("difference algorithms") - lower bound

Show "nice" algorithms

Show that no substantial further improvement is possible

Page 7: "Difference" algorithms

Def: A is a difference algorithm if, for N = 2 and g_{i,t} ∈ {0,1}, p_{1,t} = f(d_t) and p_{2,t} = 1 - f(d_t), where d_t = G_{1,t} - G_{2,t}.

Examples:

EW: w_{i,t} = e^{η G_{i,t}}

FPL: choose argmax_i (G_{i,t} + Z_{i,t})

Prod: w_{i,t} = Π_s (1 + η g_{i,s}) = (1 + η)^{G_{i,t}}
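
A small sketch of why all three updates are difference algorithms for N = 2 and binary gains: in each case the (expected) normalized weight on expert 1 depends on the history only through d_t = G_{1,t} - G_{2,t}. The code is my own illustration; η and the exponential FPL perturbation are assumed parameters, and FPL's probability is estimated by Monte Carlo.

```python
# For N = 2 experts with binary gains, each update gives p_{1,t} = f(d_t).
# (FPL is randomized: its *expected* weight on expert 1 depends only on d_t.)
import numpy as np

def ew_p1(d, eta):
    # e^{eta G_1} / (e^{eta G_1} + e^{eta G_2}) = sigmoid(eta * d)
    return 1.0 / (1.0 + np.exp(-eta * d))

def prod_p1(d, eta):
    # (1+eta)^{G_1} / ((1+eta)^{G_1} + (1+eta)^{G_2})
    return 1.0 / (1.0 + (1.0 + eta) ** (-d))

def fpl_p1(d, scale, n_samples=100_000, rng=np.random.default_rng(0)):
    Z = rng.exponential(scale, size=(n_samples, 2))   # perturbations Z_{i,t}
    return np.mean(d + Z[:, 0] >= Z[:, 1])            # P(argmax is expert 1)
```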

Page 8: A lower bound for difference algorithms

Theorem: If A is a difference algorithm, then there exist gain sequences g, g' (tuned to A) such that

R^+_{A,T}(g) · R^0_{A,T}(g') ≥ R^+_{A,T}(g) · R^-_{A,T}(g') = Ω(T)

For R^+_{A,T} = max_g R^+_{A,T}(g), R^-_{A,T} = max_g R^-_{A,T}(g), R^0_{A,T} = max_g R^0_{A,T}(g), this gives

R^+_{A,T} · R^0_{A,T} ≥ R^+_{A,T} · R^-_{A,T} = Ω(T)

Page 9: Proof

Assume T is even and p_{1,1} ≤ 1/2.

τ: the first time t when p_{1,t} ≥ 2/3  ⇒  R^+_{A,T}(g) ≥ τ/3

∃ t* ∈ {2, 3, ..., τ} s.t. p_{1,t*} - p_{1,t*-1} ≥ 1/(6τ)

g:  E1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 …
    E2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …

Page 10: Proof/2

Recall: p_{1,t*} - p_{1,t*-1} ≥ 1/(6τ).

G^+_T = G^-_T = G^0_T = T/2

G_{A,T}(g') ≤ τ + ((T - 2τ)/2)(1 - 1/(6τ))

R^-_{A,T}(g') ≥ (T - 2τ)/(12τ)  ⇒  R^+_{A,T}(g) · R^-_{A,T}(g') ≥ (T - 2τ)/36

g': E1: 1 1 1 1 1 1 0 1 0 1 0 1 0 0 0 0 0 0 0
    E2: 0 0 0 0 0 0 1 0 1 0 1 0 1 1 1 1 1 1 1

(Figure annotations: in the alternating middle block, p_{1,t} = p_{1,t*} and p_{1,t+1} = p_{1,t*-1}, so A's gain per pair of rounds is at most 1 - 1/(6τ); in the final block, p_{1,t} = p_{1,T-t}.)
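
An illustrative experiment (my own sketch, not the authors' code): run EW, a difference algorithm, on sequences shaped like g (slide 9) and g' (slide 10) and watch R^+(g) · R^-(g') grow linearly in T. The assumptions here are η = 1/√T, and a g' built from the picture above (ramp for τ rounds, alternation, balancing tail) rather than from the exact t* of the proof.

```python
import numpy as np

def ew_p1_path(gains, eta):
    """p_{1,t} for EW with w_{i,t} = exp(eta * G_{i,t-1}) on a (2, T) gain matrix."""
    T = gains.shape[1]
    G = np.zeros(2)
    p1 = np.empty(T)
    for t in range(T):
        p1[t] = 1.0 / (1.0 + np.exp(-eta * (G[0] - G[1])))
        G += gains[:, t]
    return p1

def ew_gain(gains, eta):
    p1 = ew_p1_path(gains, eta)
    return float(np.sum(p1 * gains[0] + (1 - p1) * gains[1]))

for T in [1_000, 4_000, 16_000, 64_000]:
    eta = 1.0 / np.sqrt(T)
    # g: expert 1 always gains, expert 2 never does (slide 9)
    g = np.vstack([np.ones(T), np.zeros(T)])
    p1 = ew_p1_path(g, eta)
    tau = int(np.argmax(p1 >= 2 / 3)) + 1                # first time p_1 >= 2/3
    R_plus = T - float(p1.sum())                         # regret to the best on g
    # g': ramp for tau rounds, alternate, then a balancing tail (slide 10)
    mid = (T - 2 * tau) // 2
    e1 = np.concatenate([np.ones(tau), np.tile([0.0, 1.0], mid),
                         np.zeros(T - tau - 2 * mid)])
    gp = np.vstack([e1, 1.0 - e1])
    R_minus = gp.sum(axis=1).min() - ew_gain(gp, eta)    # regret to the worst on g'
    print(T, tau, round(R_plus, 1), round(R_minus, 1), round(R_plus * R_minus / T, 3))
```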

Page 11: Tightness

We know that for difference algorithms

R^+_{A,T} · R^0_{A,T} ≥ R^+_{A,T} · R^-_{A,T} = Ω(T)

Can a (difference) algorithm achieve this?

Theorem: EW = EW(η), with appropriately tuned η = η(α), 0 ≤ α ≤ 1/2, has

R^+_{EW,T} ≤ T^{1/2+α} (1 + ln N)

R^0_{EW,T} ≤ T^{1/2-α}
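
A quick check of the exponent arithmetic, under two assumptions of mine: the tuning is η = T^{-(1/2+α)}, and the textbook exponential-weights bound R^+ ≤ (ln N)/η + ηT applies. The bound on R^0 is the non-standard part and needs the paper's own argument.

```latex
% Assumed tuning: \eta = T^{-(1/2+\alpha)} in the textbook EW guarantee.
\[
  R^+_{\mathrm{EW},T}
    \;\le\; \frac{\ln N}{\eta} + \eta T
    \;=\;   T^{1/2+\alpha}\,\ln N \;+\; T^{1/2-\alpha}
    \;\le\; T^{1/2+\alpha}\,(1+\ln N),
\]
% matching the first bound of the theorem; the second bound,
% R^0_{\mathrm{EW},T} \le T^{1/2-\alpha}, is the part specific to the paper.
```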

Page 12: Breaking the frontier

What's wrong with the difference algorithms?

They are designed to find the best expert with low regret (fast)...

...they don't pay attention to the average gain and how it compares with the best gain.

Page 13: BestWorst(A)

G^+_T - G^-_T: the spread of the cumulative gains

Idea: Stay with the average until the spread becomes large. Then switch to learning (using algorithm A).

When the spread is large enough, G^0_T = G_{BW(A),T} ≫ G^-_T  ⇒  "nothing" to lose.

Spread threshold: NR, where R = R_{T,N} is a bound on the regret of A.

Page 14: BestWorst(A)

Theorem: R^+_{BW(A),T} = O(NR), and G_{BW(A),T} ≥ G^-_T.

Proof: At the time of the switch, G_{BW(A)} ≥ (G^+ + (N-1)G^-)/N. Since G^+ ≥ G^- + NR, G_{BW(A)} ≥ G^- + R.
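
A minimal code sketch of the BestWorst(A) idea from the last two slides (my own illustration, not the authors' code): play the uniform average until the spread G^+_t - G^-_t reaches NR, then hand all remaining rounds to the expert algorithm A. The reset/predict/update interface of A is a hypothetical one chosen for this sketch.

```python
import numpy as np

def best_worst(A, gains, R):
    """gains: (N, T) array of expert gains; R: a regret bound for A."""
    N, T = gains.shape
    G = np.zeros(N)                      # cumulative expert gains
    total, switched = 0.0, False
    for t in range(T):
        if not switched and G.max() - G.min() >= N * R:
            A.reset()                    # spread is large enough: start learning
            switched = True
        p = A.predict() if switched else np.full(N, 1.0 / N)
        total += p @ gains[:, t]
        G += gains[:, t]
        if switched:
            A.update(gains[:, t])
    return total                         # >= G.min(), with O(N*R) regret to the best
```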

Page 15: PhasedAggression(A, R, D)

for k = 1 : log2(R) do
    η := 2^{k-1} / R
    A.reset(); s := 0                      // local time, new phase
    while (G^+_s - G^D_s < 2R) do
        q_s := A.getNormedWeights(g_1, ..., g_{s-1})
        p_s := η q_s + (1-η) D;  s := s + 1
    end
end
A.reset()
run A until time T
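
Below is a runnable sketch of my reading of the pseudocode above (illustrative only): EW is used as the base algorithm A, eta_k = 2^{k-1}/R is the mixing weight of phase k, the while-condition compares the within-phase best gain with the within-phase D-weighted gain, and eta_ew is an assumed parameter of the base EW.

```python
import numpy as np

def ew_weights(G, eta_ew):
    w = np.exp(eta_ew * (G - G.max()))       # shifted for numerical stability
    return w / w.sum()

def phased_aggression(gains, R, D, eta_ew):
    """gains: (N, T) expert gains; R: regret bound of the base algorithm A;
    D: prior distribution over experts (e.g. uniform)."""
    N, T = gains.shape
    total, t = 0.0, 0
    for k in range(1, int(np.log2(R)) + 1):
        eta_k = 2.0 ** (k - 1) / R
        G = np.zeros(N)                      # A.reset(): new phase, local time s = 0
        while t < T and G.max() - D @ G < 2 * R:
            q = ew_weights(G, eta_ew)        # A.getNormedWeights(g_1 .. g_{s-1})
            p = eta_k * q + (1 - eta_k) * D  # aggressive mixture with D
            total += p @ gains[:, t]
            G += gains[:, t]
            t += 1
    G = np.zeros(N)                          # final A.reset()
    while t < T:                             # run A alone for the remaining rounds
        p = ew_weights(G, eta_ew)
        total += p @ gains[:, t]
        G += gains[:, t]
        t += 1
    return total
```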

Page 16: PA(A,R,D) – Theorem

Theorem: Let A be any algorithm with regret R = R_{T,N} to the best expert, and let D be any distribution. Then for PA = PA(A,R,D):

R^+_{PA,T} ≤ 2R(log R + 1)

R^D_{PA,T} ≤ 1

Page 17: Proof

Consider local time s during phase k. D and A share the gains & the regret:

G^+_s - G_{PA,s} < (2^{k-1}/R)·R + (1 - 2^{k-1}/R)·2R < 2R

G^D_s - G_{PA,s} ≤ (2^{k-1}/R)·R = 2^{k-1}

What happens at the end of the phase?

G_{PA,s} - G^D_s ≥ (2^{k-1}/R)·(G^+_s - R - G^D_s) ≥ (2^{k-1}/R)·(2R - R) = 2^{k-1}   (since G^+_s - G^D_s ≥ 2R when the phase ends)

What if PA ends in phase k at time T?

G^+_T - G_{PA,T} ≤ 2Rk ≤ 2R(log R + 1)

G^D_T - G_{PA,T} ≤ 2^{k-1} - Σ_{j=1}^{k-1} 2^{j-1} = 2^{k-1} - (2^{k-1} - 1) = 1

Page 18: General lower bounds

Theorem:

R^+_{A,T} = O(T^{1/2})  ⇒  R^0_{A,T} = Ω(T^{1/2})

R^+_{A,T} ≤ (T log T)^{1/2}/10  ⇒  R^0_{A,T} = Ω(T^α), where α ≥ 0.02

Compare this with

R^+_{PA,T} ≤ 2R(log R + 1), R^D_{PA,T} ≤ 1, where R = (T log N)^{1/2}

Page 19: Conclusions

Achieving constant regret to the average is a reasonable goal.

"Classical" algorithms do not have this property, but satisfy R^+_{A,T} · R^0_{A,T} ≥ Ω(T).

Modification: learn only when it makes sense, i.e., when the best is much better than the average.

PhasedAggression: optimal tradeoff.

Can we remove the dependence on T?