BYU Computer Science Department Hierarchical Bayesian Models for Rating Individual Players from...

BYU Computer Science Department

Hierarchical Bayesian Models for Rating Individual Players from Group Competitions

Joshua E. Menke


Why Rank and Rate?

• Ranking in groups is important

• Sports, executive teams between competing corporations, military training, etc.

• Computer and Video Gaming Industry

– Big business: $18 billion gross output in U.S. in 2004

• Players prefer games that help them compare themselves.

• Use for balancing teams: TrueSkill™

• Use for game / level design.


Brief Rating Background

• Elo (1978) for Chess

– Thurstone Case V: Normal distribution

– Later modified to use a logistic distribution

• Glickman (1999, 2001) for Chess

– Bradley-Terry Model (Bradley and Terry, 1952)

– Uncertainty based on number of matches played and time between matches.


Rating Players From Groups

• TrueSkill™ (Herbrich, Graepel, 2006)

– Generalized Bayesian Thurstone Case V

• Huang (2006)

– Generalized Bradley-Terry (Maximum Likelihood)

• Menke et al. (2006)

– Hierarchical Bayesian Bradley-Terry

– Extensions: improve predictions / analyze game


Bradley-Terry Model

• Two opponents, ability parameters $\lambda_1$ and $\lambda_2$, probability the first opponent wins:

$$\frac{\lambda_1}{\lambda_1 + \lambda_2}$$

• Current logistic Elo uses Bradley-Terry with

$$\lambda_x = \exp(\theta_x)$$

• Wider distribution: allows weaker players a greater chance of winning.
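The Bradley-Terry probability above can be sketched in a few lines. This is a minimal illustration, assuming abilities enter through $\lambda = \exp(\theta)$; the function names `bt_win_prob` and `logistic` are hypothetical and not from the slides. It also checks that this form is the logistic function of the rating difference.

```python
import math

def bt_win_prob(theta_1: float, theta_2: float) -> float:
    """Probability that opponent 1 beats opponent 2 under Bradley-Terry,
    assuming lambda = exp(theta)."""
    lam_1 = math.exp(theta_1)
    lam_2 = math.exp(theta_2)
    return lam_1 / (lam_1 + lam_2)

def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

# The two expressions agree: lambda_1 / (lambda_1 + lambda_2)
# equals logistic(theta_1 - theta_2).
print(bt_win_prob(1.0, 0.0), logistic(1.0 - 0.0))
```

Only the difference between the two ratings matters, which is why the mean of the ratings can later be fixed at 0.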


Wolfenstein: Enemy Territory™

• Two Teams or Sides, WWII: Axis vs. Allies

• Objective-based

• Multiplayer

• Online: Players come, go, change teams

• Asymmetry: Team sizes / Maps fairness

• Soccer (Football) Example

• Splash Damage, London


Map-Side in Enemy Territory

• Axis side vs. Allies side

• Matches take place on certain maps

• Different objectives for each side

• Player i on side s for map m


First Data Set

• Matches: 100 per server, 3 servers for 300

• Players: 877

• Matches per Player: ~ 7


Data Example

InitGame: ...\mapname\fueldump\...
Winner: AXIS Time: 1800000
Name: |R!P|Orpheo GUID DFBB5: Axis: 0 Allies: 1450200
Name: |R!P|Crazyeskimo GUID EF071: Axis: 1549800 Allies: 0
Name: sliveR GUID 0A589: Axis: 1614950 Allies: 0
Name: DaSaNi GUID 3F6C7: Axis: 1278400 Allies: 0
Name: BlackSheep GUID 6C875: Axis: 352600 Allies: 1336200*

* Played on both teams

• Map Name, Winner, Duration

• Name, GUID, milliseconds on Axis, Allies
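A player line in this format could be parsed as in the sketch below. The regular expression is an assumption based only on the sample lines shown (for example, that GUIDs are hexadecimal), and the names `PLAYER_RE`, `parse_player`, and `exposure` are hypothetical.

```python
import re

# Pattern inferred from the sample lines; the real log format may differ.
PLAYER_RE = re.compile(
    r"Name: (?P<name>.+?) GUID (?P<guid>[0-9A-F]+): "
    r"Axis: (?P<axis>\d+) Allies: (?P<allies>\d+)"
)

def parse_player(line: str) -> dict:
    """Extract name, GUID, and milliseconds played on each side."""
    m = PLAYER_RE.search(line)
    if m is None:
        raise ValueError(f"not a player line: {line!r}")
    return {
        "name": m["name"],
        "guid": m["guid"],
        "axis_ms": int(m["axis"]),
        "allies_ms": int(m["allies"]),
    }

def exposure(player: dict, duration_ms: int) -> tuple:
    """Fraction of the match spent on Axis and on Allies."""
    return player["axis_ms"] / duration_ms, player["allies_ms"] / duration_ms

player = parse_player("Name: BlackSheep GUID 6C875: Axis: 352600 Allies: 1336200*")
print(player, exposure(player, 1800000))
```

A player who switched sides, like BlackSheep here, ends up with a nonzero fraction on both teams.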


Model

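Bayes Law:

$$p(\theta \mid \text{matches}) = \frac{p(\text{matches} \mid \theta)\, p(\theta)}{\int p(\text{matches} \mid \theta)\, p(\theta)\, d\theta}$$

We need:

• Prior: $p(\theta)$, model the individual players

• Likelihood: $p(\text{matches} \mid \theta)$, model match outcomes given players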


Basic Player Model

• Let $\theta_i$ represent player i’s ability to help their side win a match

• A simple model for $\theta_i$:

$$\theta_i \sim N(\mu, \sigma_\theta^2)$$


Basic

$$\theta_i \sim N(\mu, \sigma_\theta^2)$$

• Let $\mu = 0$ without loss of generality

• $\sigma_\theta^2$ is given a prior distribution

• Symmetric around 0: good players +, bad players −

• But: Assumes map-side has no effect


Accounting for Map-Side Effects

• Map fairness varied in Enemy Territory

• Sometimes harder for Axis, and vice versa

• Basic model naïve

• Map effects uniform for all players



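• Let $\theta_{i,m\text{-}s}$ represent player i’s ability to help side s win a match played on map m:

$$\theta_{i,m\text{-}s} \equiv \theta_i + \gamma_{m\text{-}s} \quad \text{with} \quad \theta_i \sim N(0, \sigma_\theta^2)$$

where $\gamma_{m\text{-}s}$ is the map-side effect.

• $\sigma_\theta^2$ given a prior distribution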

• Player’s rating increases or decreases based on map-side



$$\theta_{i,m\text{-}s} \equiv \theta_i + \gamma_{m\text{-}s}$$

• Similar to Agresti’s (1988) “homefield” parameter, except one for Axis, one for Allies: model decision for simplicity.


Map-Side Effects

• More skilled team can have equal challenge for a given map by playing on the harder side

• Judge which maps are more balanced.

• Useful for map/level designers


Server Difficulty

• Compare players across different servers

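• Determine how a given server affects a player’s rating by adding a server bias $\beta_j$:

$$\theta_{i,m\text{-}s,j} \equiv \theta_i + \gamma_{m\text{-}s} + \beta_j$$

$$\beta_j \sim N(0, \sigma_\beta^2)$$

• With $\sigma_\beta^2$ given a prior distribution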


Server Difficulty

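$$\theta_{i,m\text{-}s,j} \equiv \theta_i + \gamma_{m\text{-}s} + \beta_j$$

• Modeled as an increase instead of decrease in player ability for simplicity.

• A lower server bias $\beta_j$, not a higher one, means a more difficult server.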

• Player performance composed of base ability, map-side offset, and server difficulty


Server Difficulty

• Can use to choose servers

• Rank players globally across servers

• Requires some server “cross-over”


Likelihood

• Choose side s’s probability of winning a match on map m proportional to:

• Exponentiated sum of player ratings

• Modified by map-side and server

$$\lambda_{s,m} = \exp\Big(\sum_{i=1,\, i \in P_s}^{|P_s|} \theta_{i,m\text{-}s,j}\Big)$$

where $P_s$ is the set of players on side s.


Bradley Terry Likelihood

• Probability of $s_{Axis}$ defeating $s_{Allies}$:

$$\frac{\lambda_{s_{Axis},m}}{\lambda_{s_{Axis},m} + \lambda_{s_{Allies},m}}$$
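The team-level probability can be sketched as follows. This is an illustration under stated assumptions: each player's composite rating is taken to be base ability plus a map-side effect plus a server bias, and the names `side_strength` and `axis_win_prob` and all parameter names are hypothetical.

```python
import math

def side_strength(abilities, map_side_effect=0.0, server_bias=0.0):
    """Strength of one side: exp of the summed composite ratings, where each
    composite rating is ability + map-side effect + server bias (assumed)."""
    total = sum(theta + map_side_effect + server_bias for theta in abilities)
    return math.exp(total)

def axis_win_prob(axis, allies, axis_effect=0.0, allies_effect=0.0,
                  server_bias=0.0):
    """Bradley-Terry probability that the Axis side beats the Allies side."""
    lam_axis = side_strength(axis, axis_effect, server_bias)
    lam_allies = side_strength(allies, allies_effect, server_bias)
    return lam_axis / (lam_axis + lam_allies)

# Two evenly rated teams, with a map that favors Axis (made-up numbers).
print(axis_win_prob([0.0, 0.0], [0.0, 0.0], axis_effect=0.4, allies_effect=-0.4))
```

Because the offsets are added once per player in this sketch, the server bias cancels when team sizes are equal and shifts the prediction when they are not.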


Likelihood Function

• Product of map predictions

• G: total # of matches, w(g): winning side for match g, l(g): losing side, m: map

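$$P(w \mid \lambda) = \prod_{g=1}^{G} \lambda_{w(g),m} \left(\lambda_{w(g),m} + \lambda_{l(g),m}\right)^{-1}$$

$$\lambda_{w(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{w(g)}}^{|P_{w(g)}|} \theta_{i,m\text{-}s,j}\Big)$$

$$\lambda_{l(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{l(g)}}^{|P_{l(g)}|} \theta_{i,m\text{-}s,j}\Big)$$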


Public Server Problem

• Players come, go, change teams at will

• Need time played per team

• Available in original data


Simple Exposure Model

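• Weighted sum: % time played per team

• $\tau_{i,w(g)}$ ($\tau_{i,l(g)}$): % of the total match time player i spent on the winning (losing) team

$$\lambda_{w(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{w(g)}}^{|P_{w(g)}|} \tau_{i,w(g)}\, \theta_{i,m\text{-}s,j}\Big)$$

$$\lambda_{l(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{l(g)}}^{|P_{l(g)}|} \tau_{i,l(g)}\, \theta_{i,m\text{-}s,j}\Big)$$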
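The exposure weighting can be illustrated with the BlackSheep line from the data example, who played on both teams. This is a sketch only: the rating value is made up, and `weighted_strength` and `win_prob` are hypothetical names.

```python
import math

def weighted_strength(players):
    """players: list of (tau, theta) pairs, where tau is the fraction of the
    match the player spent on this side and theta is the composite rating."""
    return math.exp(sum(tau * theta for tau, theta in players))

def win_prob(side_a, side_b):
    """Bradley-Terry probability that side_a beats side_b."""
    lam_a = weighted_strength(side_a)
    lam_b = weighted_strength(side_b)
    return lam_a / (lam_a + lam_b)

match_ms = 1_800_000
tau_axis = 352_600 / match_ms      # time BlackSheep spent on Axis
tau_allies = 1_336_200 / match_ms  # time BlackSheep spent on Allies
theta = 1.0                        # hypothetical rating, for illustration

# The same player contributes to both sides, in proportion to time played.
axis = [(tau_axis, theta)]
allies = [(tau_allies, theta)]
print(win_prob(axis, allies))
```

A positively rated player who spends most of the match on Allies pulls the prediction toward Allies, which is the behavior the weighting is meant to capture.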


Prior Selection

• Instead of non-informative priors, hyperprior distributions:

$$\sigma_\theta^2,\ \sigma_\gamma^2,\ \text{and}\ \sigma_\beta^2 \sim \text{Inverse Gamma}$$

• Chosen such that the means are 1 and the variances 1/3.

• Keeps player ratings between −3 and 3

• Hyperpriors to infer relative differences
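The hyperprior choice can be turned into concrete parameters by matching moments. The sketch below assumes the shape/scale parameterization of the Inverse Gamma, with mean $\beta/(\alpha-1)$ and variance $\beta^2/((\alpha-1)^2(\alpha-2))$; the slides do not state which parameterization was used, and `inverse_gamma_params` is a hypothetical name.

```python
def inverse_gamma_params(mean: float, variance: float) -> tuple:
    """Shape alpha and scale beta of an Inverse-Gamma distribution with the
    given mean and variance (shape/scale parameterization assumed).

    mean     = beta / (alpha - 1)                         for alpha > 1
    variance = beta**2 / ((alpha - 1)**2 * (alpha - 2))   for alpha > 2
    Dividing gives variance = mean**2 / (alpha - 2).
    """
    alpha = 2.0 + mean ** 2 / variance
    beta = mean * (alpha - 1.0)
    return alpha, beta

alpha, beta = inverse_gamma_params(mean=1.0, variance=1.0 / 3.0)
print(alpha, beta)
```

Under this parameterization, a mean of 1 and a variance of 1/3 work out to shape 5 and scale 4.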


Fit with MCMC: Quickly

• Markov-Chain Monte Carlo Integration

• Samples complete conditional distributions

– Thousands of samples per parameter

– Take the mean / standard deviation of samples
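The sampling loop can be sketched for a toy two-player case. This is not the fitting code behind the slides: it swaps in a random-walk Metropolis step per parameter instead of sampling the complete conditionals directly, and it fixes the prior variance instead of giving it a hyperprior. All names and settings (`sample_ratings`, `step`, `burn_in`) are hypothetical.

```python
import math
import random

def log_posterior(thetas, matches, sigma2):
    """Log prior plus Bradley-Terry log likelihood, up to a constant."""
    log_prior = sum(-t * t / (2.0 * sigma2) for t in thetas)
    log_lik = 0.0
    for winner, loser in matches:
        log_lik += thetas[winner] - math.log(
            math.exp(thetas[winner]) + math.exp(thetas[loser]))
    return log_prior + log_lik

def sample_ratings(matches, n_players, n_samples=2000, burn_in=500,
                   step=0.5, sigma2=1.0, seed=0):
    """Update one rating at a time with a Metropolis step, then summarize."""
    rng = random.Random(seed)
    thetas = [0.0] * n_players
    current = log_posterior(thetas, matches, sigma2)
    draws = []
    for it in range(burn_in + n_samples):
        for i in range(n_players):
            old = thetas[i]
            thetas[i] = old + rng.gauss(0.0, step)
            proposed = log_posterior(thetas, matches, sigma2)
            accept_prob = math.exp(min(0.0, proposed - current))
            if rng.random() < accept_prob:
                current = proposed      # accept the proposal
            else:
                thetas[i] = old         # reject and restore
        if it >= burn_in:
            draws.append(list(thetas))
    n = len(draws)
    means = [sum(d[i] for d in draws) / n for i in range(n_players)]
    sds = [math.sqrt(sum((d[i] - means[i]) ** 2 for d in draws) / (n - 1))
           for i in range(n_players)]
    return means, sds

# Player 0 beats player 1 in nine of ten matches (made-up data).
matches = [(0, 1)] * 9 + [(1, 0)]
means, sds = sample_ratings(matches, n_players=2)
print(means, sds)
```

The posterior mean serves as the rating and the posterior standard deviation as its uncertainty, matching the last bullet above.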


MCMC Results Example: 3-1

Ranked 2 standard deviations below mean

3rd place 8-1 vs. 8th place 9-0


Combined Server Difficulty

• Ranked in order of difficulty

• Lower posterior mean is more difficult

• Veterans could choose to play on server 2

• Newer players on Server 1


Combined Map-Side Effects

• Oasis biased towards Allies.

• Better players should play on Axis

• Venice a balanced map

• Of interest: both popular maps.


Bayesian χ² Goodness-of-fit

• Valen Johnson, Annals of Statistics, 2004

• Yields p-values for joint samples

• Server 2 does have a less consistent player base

• Biased accuracy near 100%


Problems with MCMC

• Average Enemy Territory match:

– 15 minutes

• Time to fit 300 matches with MCMC:

– 30 minutes

• MCMC cannot keep up with new matches


Second Data Set

• Matches: 5,000

• Players: 2,000+

• Time for MCMC: On the order of days

• Common Efficient Solutions:

– Newton-Raphson method

– Elo / Glickman Update

– Expectation Propagation


Newton-Raphson Method

• Batch Gradient Descent

• L’: vector of first derivatives

• L’’: matrix of second partial derivatives

• k: current iteration

• Note: $[-L'']^{-1}$ is the covariance matrix of the multivariate normal approximation
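The symbols above fit the standard Newton-Raphson step, $\theta^{(k+1)} = \theta^{(k)} - [L''(\theta^{(k)})]^{-1} L'(\theta^{(k)})$. The sketch below applies it to a one-parameter toy problem, which is an assumption made for brevity: the rating gap $d$ between two players with a $N(0, \sigma^2)$ prior on $d$. The name `fit_rating_gap` is hypothetical.

```python
import math

def fit_rating_gap(wins: int, losses: int, sigma2: float = 1.0,
                   iters: int = 25) -> tuple:
    """Newton-Raphson fit of d = theta_1 - theta_2 on the log posterior
    L(d) = wins*log(p) + losses*log(1 - p) - d**2 / (2*sigma2),
    where p = 1 / (1 + exp(-d)). Returns (d, approximate variance)."""
    n = wins + losses
    d = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + math.exp(-d))
        grad = wins - n * p - d / sigma2             # L'
        hess = -n * p * (1.0 - p) - 1.0 / sigma2     # L''
        d -= grad / hess                             # Newton-Raphson step
    p = 1.0 / (1.0 + math.exp(-d))
    hess = -n * p * (1.0 - p) - 1.0 / sigma2
    variance = -1.0 / hess   # [-L'']^{-1}, the normal-approximation variance
    return d, variance

d, variance = fit_rating_gap(wins=9, losses=1)
print(d, variance)
```

In the full model $L'$ is a vector and $L''$ a matrix over all players, maps, and servers, so each step needs a matrix inverse instead of a scalar division.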


Problems with Newton-Raphson

• Requires storing match history and re-fitting the data after every match, becomes impractical and slow.

– Preferable to update based on last match only

• Matrix of second partial derivatives too large.

– Millions of players: impossible to store.

– Takes too long to invert.


Recursive Newton-Raphson

• Based on Bottou and LeCun (2004)

• $\Phi_t$: a “leaky” approximation to $[-L'']^{-1}$ (covariance matrix).


Recursive Newton-Raphson

• Bottou and LeCun: Empirical / Theoretical

– Asymptotically outperforms Newton-Raphson

– Also any batch gradient descent method.


Applied to Enemy Territory

• Derive from the log posterior

• Priors taken from MCMC instead

• Example: Player Rating

– Winning Time − Prediction − Shrinkage


Bayesian Shrinkage Terms

• Batch: applied once on entire set of matches

• Recursive: Applied once per update– Weight each by 1/|matches|

• |matches| unknown a priori

– Weight by infinite geometric series $2^{-t-1}$

• Sums to 1.0, like applying once

• Effect of prior diminishes given data
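One way to read the "Winning Time − Prediction − Shrinkage" update together with the geometric shrinkage weights is sketched below. This is an interpretation, not the slides' algorithm: it is a first-order update with a fixed step size, not the recursive Newton-Raphson scaling matrix, and the names `shrinkage_weight`, `update_rating`, `step`, and `sigma2` are hypothetical.

```python
def shrinkage_weight(t: int) -> float:
    """Weight on the prior term at update t = 0, 1, 2, ...
    The weights 2**(-t-1) sum to 1 over all updates."""
    return 2.0 ** (-t - 1)

def update_rating(theta: float, t: int, tau_win: float, tau_lose: float,
                  p_win: float, sigma2: float = 1.0, step: float = 0.1) -> float:
    """One per-match update of a single player's rating.

    tau_win / tau_lose: fraction of the match spent on the winning / losing side
    p_win: predicted win probability of the side that actually won
    """
    winning_time = tau_win
    prediction = p_win * tau_win + (1.0 - p_win) * tau_lose
    shrinkage = shrinkage_weight(t) * theta / sigma2
    return theta + step * (winning_time - prediction - shrinkage)

# A new player (rating 0) spends the whole match on the winning side of a
# match predicted at 50-50, so the rating moves up.
print(update_rating(theta=0.0, t=0, tau_win=1.0, tau_lose=0.0, p_win=0.5))
print(sum(shrinkage_weight(t) for t in range(60)))
```

The first two terms are the derivative of the match log likelihood with respect to the player's rating under the exposure model, and the third pulls the rating toward the prior mean with a weight that halves on every update.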


Time-Varying

• Recursive algorithms track time-varying differences

• Update a weighted sum of prior performance and recent performance

• Variance approximation leaky, can track changes over time.


Results: Accuracy

• Measured before updating for each match

• For an unfair comparison:

– TrueSkill™ reported for large teams: ~ 0.62

– Meant more to show that 70% is good.


Uses for Ratings

• Rank Players

• Improve Map Design

• Help Choose Servers

• Level up, MMORPG

– Clear progression path

– Play on easier servers first, “graduate” to harder ones


Active Team Balancing

• Public Server dynamics mean teams need to be balanced during play

• Greedy: Move player to bring probability of both teams winning closest to 50-50

• Uncomfortable for player moved

• Increases “fun” factor overall

• Sequential optimal design
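The greedy rule can be sketched as a search over single-player moves. This is a minimal illustration that ignores map-side, server, and exposure terms; `win_prob` and `best_single_move` are hypothetical names and the ratings are made up.

```python
import math

def win_prob(team_a, team_b):
    """Bradley-Terry probability that team_a beats team_b, with team
    strength taken as exp of the summed ratings."""
    return 1.0 / (1.0 + math.exp(sum(team_b) - sum(team_a)))

def best_single_move(team_a, team_b):
    """Return ("a->b", index) or ("b->a", index) for the one move that brings
    the win probability closest to 0.5, or None if no move improves balance."""
    best_gap = abs(win_prob(team_a, team_b) - 0.5)
    best_move = None
    for idx, rating in enumerate(team_a):
        new_a = team_a[:idx] + team_a[idx + 1:]
        gap = abs(win_prob(new_a, team_b + [rating]) - 0.5)
        if gap < best_gap:
            best_gap, best_move = gap, ("a->b", idx)
    for idx, rating in enumerate(team_b):
        new_b = team_b[:idx] + team_b[idx + 1:]
        gap = abs(win_prob(team_a + [rating], new_b) - 0.5)
        if gap < best_gap:
            best_gap, best_move = gap, ("b->a", idx)
    return best_move

team_a = [2.0, 1.0, 0.5]
team_b = [0.0, -0.5, 0.2]
print(best_single_move(team_a, team_b))
```

Each call moves at most one player, so repeated calls during a match trade a small disruption for a closer game.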


Future Directions

• Explicitly Model time-varying changes

• Number of players vs. map-side rating

• Online Bayesian Neural Network Training

• Expectation-Propagation for this model

• Direct Comparisons to TrueSkill™


Questions?

• Thanks for coming!

• Demo if time: http://stats.etpub.org
