BYU Computer Science Department Hierarchical Bayesian Models for Rating Individual Players from...
Posted: 19-Dec-2015
BYU Computer Science Department
Hierarchical Bayesian Models for Rating Individual Players
from Group Competitions
Joshua E. Menke
BYU Computer Science Department
Why Rank and Rate?
• Ranking in groups is important
– Sports, executive teams of competing corporations, military training, etc.
• Computer and video gaming industry
– Big business: $18 billion gross output in U.S. in 2004
• Players prefer games that help them compare themselves.
• Use for balancing teams: TrueSkill™
• Use for game / level design.
Brief Rating Background
• Elo (1978) for chess
– Thurstone Case V: normal distribution
– Later modified to use a logistic distribution
• Glickman (1999, 2001) for chess
– Bradley-Terry model (Bradley and Terry, 1952)
– Uncertainty based on number of matches played and time between matches.
Rating Players From Groups
• TrueSkill™ (Herbrich, Graepel, 2006)
– Generalized Bayesian Thurstone Case V
• Huang (2006)
– Generalized Bradley-Terry (maximum likelihood)
• Menke et al. (2006)
– Hierarchical Bayesian Bradley-Terry
– Extensions: improve predictions / analyze the game
Bradley-Terry Model
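• Two opponents with ability parameters $\lambda_1$ and $\lambda_2$; the probability that the first opponent wins is:
$\lambda_1 / (\lambda_1 + \lambda_2)$
• Current logistic Elo uses Bradley-Terry with
$\lambda_x = \exp(\theta_x)$.
• Wider (heavier-tailed) distribution than the normal:
– Allows weaker players a greater chance of winning.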
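The Bradley-Terry probability and the "wider distribution" point can be checked numerically. The Python sketch below is illustrative only. It uses $\lambda_x = \exp(\theta_x)$ and compares the result with a normal-CDF (Thurstone-style) curve. The scale `NORMAL_SCALE` is an assumption, chosen so both curves have the same slope at an even matchup. The function names are hypothetical.

```python
import math

# Assumed scale: makes the normal CDF's slope at d = 0 equal the logistic
# slope of 1/4, since the normal slope there is 1 / (s * sqrt(2 * pi)).
NORMAL_SCALE = 4.0 / math.sqrt(2.0 * math.pi)


def bradley_terry(theta1, theta2):
    """P(player 1 beats player 2) = lambda1 / (lambda1 + lambda2), lambda = exp(theta)."""
    lam1, lam2 = math.exp(theta1), math.exp(theta2)
    return lam1 / (lam1 + lam2)


def thurstone_like(theta1, theta2):
    """Normal-CDF win probability on the rating difference (slope-matched at 0)."""
    z = (theta1 - theta2) / NORMAL_SCALE
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for gap in (0.0, 1.0, 2.0, 4.0):
    # A weaker player rated -gap against an opponent rated 0.
    print(f"gap={gap}: logistic={bradley_terry(-gap, 0.0):.4f} "
          f"normal={thurstone_like(-gap, 0.0):.4f}")
```

With the slopes matched, a player rated 4 below an opponent wins about 1.8% of the time under Bradley-Terry, versus roughly 0.6% under the normal curve. This is the sense in which the logistic lets weaker players win more often.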
Wolfenstein: Enemy Territory™
• Two Teams or Sides, WWII: Axis vs. Allies
• Objective-based
• Multiplayer
• Online: Players come, go, change teams
• Asymmetry: team sizes and map fairness vary
• Soccer (Football) Example
• Developed by Splash Damage, London
Map-Side in Enemy Territory
• Axis side vs. Allies side
• Matches take place on certain maps
• Different objectives for each side
• Player i on side s for map m
First Data Set
• Matches: 300 (100 per server, 3 servers)
• Players: 877
• Matches per Player: ~ 7
Data Example
InitGame: ...\mapname\fueldump\...
Winner: AXIS Time: 1800000
Name: |R!P|Orpheo GUID DFBB5: Axis: 0 Allies: 1450200
Name: |R!P|Crazyeskimo GUID EF071: Axis: 1549800 Allies: 0
Name: sliveR GUID 0A589: Axis: 1614950 Allies: 0
Name: DaSaNi GUID 3F6C7: Axis: 1278400 Allies: 0
Name: BlackSheep GUID 6C875: Axis: 352600 Allies: 1336200*
* Played on both teams
• Map name, winner, duration
• Per player: name, GUID, milliseconds on Axis, milliseconds on Allies
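A per-player exposure (fraction of match time on each side) can be read off records like the example above. This is a hypothetical parser. The regular expressions assume the exact `Winner: ... Time: ...` and `Name: ... GUID ...: Axis: ... Allies: ...` layout shown. They also assume that times are in milliseconds and that the winner line precedes the player lines.

```python
import re

# Hypothetical patterns for the record layout shown above.
WINNER_RE = re.compile(r"Winner:\s*(\w+)\s+Time:\s*(\d+)")
PLAYER_RE = re.compile(r"Name:\s*(.*?)\s+GUID\s+(\w+):\s*Axis:\s*(\d+)\s+Allies:\s*(\d+)")


def parse_match(lines):
    """Return (winner, duration_ms, {guid: (axis_fraction, allies_fraction)}).

    Assumes the Winner/Time line comes before the player lines.
    """
    winner, duration = None, None
    exposure = {}
    for line in lines:
        m = WINNER_RE.search(line)
        if m:
            winner, duration = m.group(1).upper(), int(m.group(2))
            continue
        m = PLAYER_RE.search(line)
        if m and duration:
            axis_ms, allies_ms = int(m.group(3)), int(m.group(4))
            # Fraction of the whole match spent on each side (the tau values).
            exposure[m.group(2)] = (axis_ms / duration, allies_ms / duration)
    return winner, duration, exposure


record = """Winner: AXIS Time: 1800000
Name: |R!P|Orpheo GUID DFBB5: Axis: 0 Allies: 1450200
Name: BlackSheep GUID 6C875: Axis: 352600 Allies: 1336200*""".splitlines()
winner, duration, tau = parse_match(record)
print(winner, duration, tau)
```

A player who switched sides, such as BlackSheep, gets a nonzero fraction on both sides. This is the information the exposure model needs.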
Model
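Bayes' law:
$p(\theta \mid \text{matches}) = \dfrac{p(\text{matches} \mid \theta)\, p(\theta)}{\int p(\text{matches} \mid \theta)\, p(\theta)\, d\theta}$
We need:
• Prior: $p(\theta)$, to model the individual players
• Likelihood: $p(\text{matches} \mid \theta)$, to model match outcomes given the players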
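For a single parameter, the integral in the denominator can be approximated on a grid. This makes the roles of prior, likelihood, and normalizer concrete. The toy setup is an assumption for illustration: one player of unknown rating $\theta$ faces opponents of known rating 0, with a Bradley-Terry likelihood and a $N(0, 1)$ prior. The function name is hypothetical.

```python
import math


def grid_posterior(wins, losses, lo=-5.0, hi=5.0, n=2001):
    """Posterior of one player's theta on an evenly spaced grid (toy setup).

    Assumed: every opponent has known rating 0, prior theta ~ N(0, 1), and
    Bradley-Terry P(win) = 1 / (1 + exp(-theta)).
    """
    step = (hi - lo) / (n - 1)
    grid = [lo + k * step for k in range(n)]
    unnormalized = []
    for theta in grid:
        p_win = 1.0 / (1.0 + math.exp(-theta))
        likelihood = p_win ** wins * (1.0 - p_win) ** losses
        prior = math.exp(-0.5 * theta ** 2) / math.sqrt(2.0 * math.pi)
        unnormalized.append(likelihood * prior)
    # Denominator of Bayes' law: the integral, approximated by a Riemann sum.
    evidence = sum(unnormalized) * step
    return grid, [u / evidence for u in unnormalized], step


grid, post, step = grid_posterior(wins=3, losses=1)
mean = sum(t * p for t, p in zip(grid, post)) * step
print(f"posterior mean of theta after 3 wins, 1 loss: {mean:.3f}")
```

With many players the integral is high-dimensional, so grids stop being feasible. This is why the model is fit by sampling instead.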
Basic Player Model
• Let $\theta_i$ represent player i's ability to help their side win a match
• A simple model for $\theta_i$:
$\theta_i \sim N(\mu, \sigma^2)$
Basic
$\theta_i \sim N(\mu, \sigma^2)$
• Let $\mu = 0$ without loss of generality
• $\sigma^2$ is given a prior distribution
• Symmetric around 0: good players +, bad players −
• But: assumes map-side has no effect
Accounting for Map-Side Effects
• Map fairness varied in Enemy Territory
• Sometimes harder for Axis, and vice versa
• The basic model is naïve
• Map effects are uniform for all players
Accounting for Map-Side Effects
• Let $\theta_{i,m-s}$ represent player i's ability to help side s win a match played on map m:
$\theta_{i,m-s} \equiv \theta_i + \delta_{m-s}$ with $\theta_i \sim N(0, \sigma^2)$
• $\sigma^2$ given a prior distribution
• A player's rating increases or decreases based on map-side
Accounting for Map-Side Effects
$\theta_{i,m-s} \equiv \theta_i + \delta_{m-s}$
• Similar to Agresti's (1988) “home-field” parameter, except with one parameter for Axis and one for Allies: a modeling decision made for simplicity.
Map-Side Effects
• A more skilled team can face an equal challenge on a given map by playing the harder side
• Judge which maps are more balanced.
• Useful for map/level designers
Server Difficulty
• Compare players across different servers
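• Determine how a given server affects a player's rating by adding a server bias $\gamma_j$:
$\theta_{i,m-s,j} \equiv \theta_i + \delta_{m-s} + \gamma_j$
$\gamma_j \sim N(0, \sigma_\gamma^2)$
• With $\sigma_\gamma^2$ given a prior distribution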
Server Difficulty
$\theta_{i,m-s,j} \equiv \theta_i + \delta_{m-s} + \gamma_j$
• Modeled as an increase instead of a decrease in player ability, for simplicity.
• A lower, not higher, $\gamma_j$ means a more difficult server.
• Player performance composed of base ability, map-side offset, and server difficulty
Server Difficulty
• Can use to choose servers
• Rank players globally across servers
• Requires some server “cross-over”
Likelihood
• Choose side s's probability of winning a match on map m proportional to:
• Exponentiated sum of player ratings
• Modified by map-side and server
$\lambda_{s,m} = \exp\Big(\sum_{i=1,\, i \in P_s}^{|P_s|} \theta_{i,m-s,j}\Big)$, where $P_s$ is the set of players on side s
Bradley Terry Likelihood
• Probability of $s_{Axis}$ defeating $s_{Allies}$:
$\lambda_{s_{Axis},m} / (\lambda_{s_{Axis},m} + \lambda_{s_{Allies},m})$
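A minimal sketch of the side strength $\lambda_{s,m}$ and the Bradley-Terry probability that Axis wins, using the additive rating $\theta_i + \delta_{m-s} + \gamma_j$. The function names and the parameter names `delta` (map-side offset) and `gamma` (server bias) are illustrative choices. All players are assumed to be on the same server.

```python
import math


def player_rating(theta_i, delta_ms, gamma_j):
    """Composite rating theta_{i,m-s,j} = theta_i + delta_{m-s} + gamma_j."""
    return theta_i + delta_ms + gamma_j


def side_strength(thetas, delta_ms, gamma_j):
    """lambda_{s,m}: exp of the summed composite ratings of the players on side s."""
    return math.exp(sum(player_rating(t, delta_ms, gamma_j) for t in thetas))


def p_axis_wins(axis_thetas, allies_thetas, delta_axis, delta_allies, gamma_j):
    """Bradley-Terry: lambda_Axis / (lambda_Axis + lambda_Allies)."""
    lam_axis = side_strength(axis_thetas, delta_axis, gamma_j)
    lam_allies = side_strength(allies_thetas, delta_allies, gamma_j)
    return lam_axis / (lam_axis + lam_allies)


# Identical teams on a map whose Allies side is 0.2 easier per player.
print(round(p_axis_wins([0.0, 0.0], [0.0, 0.0], 0.0, 0.2, 0.0), 3))
```

Two identical two-player teams on such a map give Axis a win probability of about 0.40, since each Allies player adds 0.2 to the exponent.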
Likelihood Function
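• Product of per-match predictions
• G: total # of matches, w(g): winning side for match g, l(g): losing side, m: map
$P(w \mid \lambda) = \prod_{g=1}^{G} \lambda_{w(g),m} \left(\lambda_{w(g),m} + \lambda_{l(g),m}\right)^{-1}$
$\lambda_{w(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{w(g)}}^{|P_{w(g)}|} \theta_{i,m-s,j}\Big)$
$\lambda_{l(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{l(g)}}^{|P_{l(g)}|} \theta_{i,m-s,j}\Big)$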
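In practice the product over matches is handled as a sum of logs, since a product of thousands of probabilities underflows in floating point. The sketch below is a hypothetical helper. It assumes each player's rating has already been combined with any map-side and server terms.

```python
import math


def log_likelihood(matches, theta):
    """log P(w | lambda) = sum over matches g of log[lambda_w / (lambda_w + lambda_l)].

    matches: list of (winning_players, losing_players) lists of player ids.
    theta: player id -> rating, assumed to already include any map-side and
    server terms (a simplification for this sketch).
    """
    total = 0.0
    for winners, losers in matches:
        diff = sum(theta[i] for i in losers) - sum(theta[i] for i in winners)
        # log(lw / (lw + ll)) = -log(1 + exp(diff)); avoid overflow for large diff.
        total -= math.log1p(math.exp(diff)) if diff < 30.0 else diff
    return total


ratings = {"a": 1.0, "b": 0.0}
print(log_likelihood([(["a"], ["b"]), (["b"], ["a"])], ratings))
```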
Public Server Problem
• Players come, go, change teams at will
• Need time played per team
• Available in original data
Simple Exposure Model
• Weighted sum: % of time played per team
• $\tau_{i,w(g)}$ ($\tau_{i,l(g)}$): fraction of the total match time player i spent on the winning (losing) team
$\lambda_{w(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{w(g)}}^{|P_{w(g)}|} \tau_{i,w(g)}\, \theta_{i,m-s,j}\Big)$
$\lambda_{l(g),m} = \exp\Big(\sum_{i=1,\, i \in P_{l(g)}}^{|P_{l(g)}|} \tau_{i,l(g)}\, \theta_{i,m-s,j}\Big)$
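The exposure weighting only changes how each player's rating enters the exponent. In this hypothetical sketch, a player who switched sides appears on both sides with different weights, and a player with $\tau = 0$ contributes nothing. Ratings and names are made up for illustration.

```python
import math


def log_strength(players, theta, tau):
    """log lambda for one side: sum over players of tau_i * theta_i."""
    return sum(tau[i] * theta[i] for i in players)


def p_win(side_a, side_b, theta, tau_a, tau_b):
    """P(side A beats side B) with exposure-weighted Bradley-Terry strengths."""
    diff = log_strength(side_b, theta, tau_b) - log_strength(side_a, theta, tau_a)
    return 1.0 / (1.0 + math.exp(diff))


# Hypothetical ratings; "b" played 20% of the match for A and 75% for B.
theta = {"a": 0.5, "b": 1.0, "c": 0.0}
tau_a = {"a": 1.0, "b": 0.2}
tau_b = {"b": 0.75, "c": 1.0}
print(round(p_win(["a", "b"], ["b", "c"], theta, tau_a, tau_b), 3))
```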
Prior Selection
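• Instead of non-informative priors, hyperprior distributions:
$\sigma^2$, $\sigma_\delta^2$, and $\sigma_\gamma^2$ (player, map-side, and server variances) ~ Inverse Gamma
• Hyperparameters chosen so that the means are 1 and the variances 1/3.
• Keeps player ratings roughly between −3 and 3
• Hyperpriors let the model infer relative differences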
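The requirement "mean 1, variance 1/3" pins down the Inverse Gamma hyperparameters. The standard shape/scale moments are mean $= \beta/(\alpha-1)$ and variance $= \beta^2/((\alpha-1)^2(\alpha-2))$. Solving gives $\alpha = 5$ and $\beta = 4$. The small solver below is a check of that arithmetic; the exact parameterization used in the talk is not stated, and the function name is hypothetical.

```python
def inverse_gamma_from_moments(mean, var):
    """Shape alpha and scale beta of an Inverse-Gamma with the given mean and variance.

    Uses mean = beta / (alpha - 1) and var = mean**2 / (alpha - 2), valid for alpha > 2.
    """
    alpha = 2.0 + mean ** 2 / var
    beta = mean * (alpha - 1.0)
    return alpha, beta


alpha, beta = inverse_gamma_from_moments(1.0, 1.0 / 3.0)
print(alpha, beta)  # approximately 5.0 and 4.0
```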
Fit with MCMC: Quickly
• Markov chain Monte Carlo (MCMC) integration
• Samples complete conditional distributions
– Thousands of samples per parameter
– Take the mean / standard deviation of samples
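The slide does not specify the sampler. One common way to draw from non-standard complete conditionals is Metropolis-within-Gibbs. This sketch covers only the basic model: fixed $\sigma^2$, with no map-side, server, or exposure terms. The function names and tuning values (`step`, `burn`, `iters`) are hypothetical.

```python
import math
import random


def log_conditional(i, theta, matches, sigma2):
    """Log complete conditional of theta[i], up to an additive constant.

    Basic model only: N(0, sigma2) prior plus the Bradley-Terry log-likelihood
    of every match in which player i took part.
    """
    lp = -0.5 * theta[i] ** 2 / sigma2
    for winners, losers in matches:
        if i in winners or i in losers:
            diff = sum(theta[k] for k in losers) - sum(theta[k] for k in winners)
            lp -= math.log1p(math.exp(diff))  # log P(winners beat losers)
    return lp


def gibbs_sampler(n_players, matches, sigma2=1.0, iters=3000, burn=500, step=0.8, seed=0):
    """Metropolis-within-Gibbs; returns posterior means and standard deviations."""
    rng = random.Random(seed)
    theta = [0.0] * n_players
    samples = [[] for _ in range(n_players)]
    for it in range(iters):
        for i in range(n_players):
            # Random-walk proposal for theta[i] with all other ratings held fixed.
            old = theta[i]
            current = log_conditional(i, theta, matches, sigma2)
            theta[i] = old + rng.gauss(0.0, step)
            proposed = log_conditional(i, theta, matches, sigma2)
            if math.log(1.0 - rng.random()) >= proposed - current:
                theta[i] = old  # reject: keep the previous value
        if it >= burn:
            for i in range(n_players):
                samples[i].append(theta[i])
    means = [sum(s) / len(s) for s in samples]
    sds = [math.sqrt(sum((x - m) ** 2 for x in s) / (len(s) - 1))
           for s, m in zip(samples, means)]
    return means, sds


matches = [([0], [1])] * 8 + [([1], [0])] * 2  # player 0 won 8 of 10
means, sds = gibbs_sampler(2, matches)
print([round(m, 2) for m in means], [round(s, 2) for s in sds])
```

With player 0 winning 8 of 10 matches against player 1, the posterior means should separate roughly symmetrically around 0. The standard deviations give each rating's uncertainty.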
MCMC Results Example: 3-1
Ranked 2 standard deviations below mean
3rd place 8-1 vs. 8th place 9-0
Combined Server Difficulty
• Ranked in order of difficulty
• Lower posterior mean is more difficult
• Veterans could choose to play on Server 2
• Newer players on Server 1
Combined Map-Side Effects
• Oasis biased towards Allies.
• Better players should play on Axis
• Venice a balanced map
• Of interest: both popular maps.
Bayesian χ² Goodness-of-Fit
• Valen Johnson, Annals of Statistics, 2004
• Yields p-values for joint samples
• Server 2 does have a less consistent player base
• Biased accuracy near 100%
Problems with MCMC
• Average Enemy Territory match:
– 15 minutes
• Time to fit 300 matches with MCMC:
– 30 minutes
• MCMC cannot keep up with new matches
Second Data Set
• Matches: 5,000
• Players: 2,000+
• Time for MCMC: On the order of days
• Common efficient solutions:
– Newton-Raphson method
– Elo / Glickman update
– Expectation Propagation
Newton-Raphson Method
• Batch gradient descent (second-order update):
$\theta_{k+1} = \theta_k + [-L''(\theta_k)]^{-1} L'(\theta_k)$
• L’: vector of first derivatives
• L’’: matrix of second partial derivatives
• k: current iteration
• Note: $[-L'']^{-1}$ is the covariance matrix of the multivariate normal approximation to the posterior
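For a model this small, the batch update can be written out directly. The standard Newton-Raphson step for a maximum is $\theta_{k+1} = \theta_k + [-L''(\theta_k)]^{-1} L'(\theta_k)$. The sketch applies it to the log posterior of the basic Bradley-Terry model with a $N(0, \sigma^2)$ prior (no map-side, server, or exposure terms). It reads approximate posterior variances from the diagonal of $[-L'']^{-1}$. The pure-Python dense solver is only suitable for a handful of players, which illustrates why the full matrix becomes impractical at scale. Function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a @ x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def newton_raphson(n_players, matches, sigma2=1.0, iters=25):
    """MAP ratings for the basic Bradley-Terry model with a N(0, sigma2) prior.

    matches: list of (winners, losers) lists of player indices.
    Returns (theta, variances); variances is the diagonal of [-L'']^{-1}.
    """
    theta = [0.0] * n_players
    for _ in range(iters):
        grad = [-t / sigma2 for t in theta]  # prior part of L'
        neg_hess = [[1.0 / sigma2 if r == c else 0.0 for c in range(n_players)]
                    for r in range(n_players)]  # prior part of -L''
        for winners, losers in matches:
            x = [0.0] * n_players
            for i in winners:
                x[i] += 1.0
            for i in losers:
                x[i] -= 1.0
            s = sum(xi * ti for xi, ti in zip(x, theta))
            p = 1.0 / (1.0 + math.exp(-s))  # predicted P(winners win)
            for r in range(n_players):
                grad[r] += (1.0 - p) * x[r]
                for c in range(n_players):
                    neg_hess[r][c] += p * (1.0 - p) * x[r] * x[c]
        step = solve(neg_hess, grad)  # [-L'']^{-1} L'
        theta = [t + d for t, d in zip(theta, step)]
    # Diagonal of [-L'']^{-1}, from the last iteration (evaluated before its step).
    variances = [solve(neg_hess, [1.0 if k == j else 0.0 for k in range(n_players)])[j]
                 for j in range(n_players)]
    return theta, variances


matches = [([0], [1])] * 8 + [([1], [0])] * 2
theta, variances = newton_raphson(2, matches)
print(theta, variances)
```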
Problems with Newton-Raphson
• Requires storing the match history and refitting all of the data after every match, which becomes slow and impractical.
– Preferable to update based on the last match only
• The matrix of second partial derivatives is too large.
– Millions of players: impossible to store.
– Takes too long to invert.
Recursive Newton-Raphson
• Based on Bottou and LeCun (2004)
• $\Phi_t$: a “leaky” approximation to $[-L'']^{-1}$ (the covariance matrix).
Recursive Newton-Raphson
• Bottou and LeCun: empirical and theoretical results
– Asymptotically outperforms Newton-Raphson
– And any batch gradient descent method.
Applied to Enemy Territory
• Derive from the log posterior
• Priors taken from the MCMC fit instead of hyperpriors
• Example: player rating update
– Winning time − prediction − shrinkage
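The following sketch shows one way the "winning time − prediction − shrinkage" terms could combine into a single online step. The per-player gradient is derived here from the exposure-weighted log posterior: $\tau_{i,w} - [p\,\tau_{i,w} + (1-p)\,\tau_{i,l}] - w_t\,\theta_i/\sigma^2$. In this expression p is the predicted probability that the eventual winner wins, and $w_t = 2^{-t-1}$ is the shrinkage weight. The diagonal, exponentially leaky curvature estimate, `LEAK`, and all names are assumptions, not the talk's exact recursion.

```python
from dataclasses import dataclass

LEAK = 0.99  # hypothetical forgetting factor; < 1 lets old curvature fade


@dataclass
class PlayerState:
    theta: float = 0.0      # current rating estimate
    curvature: float = 1.0  # leaky running estimate of -d2(log posterior)/d(theta)2
    updates: int = 0        # t: number of updates already applied to this player


def update_player(state, tau_win, tau_lose, p_win, sigma2=1.0):
    """One recursive step for one player after one match (hypothetical scheme).

    tau_win / tau_lose: fraction of the match spent on the winning / losing side.
    p_win: pre-match predicted probability that the side that won would win.
    """
    shrink = 2.0 ** -(state.updates + 1)  # geometric prior weight
    predicted_time = p_win * tau_win + (1.0 - p_win) * tau_lose
    grad = (tau_win - predicted_time) - shrink * state.theta / sigma2
    local_curv = p_win * (1.0 - p_win) * (tau_win - tau_lose) ** 2 + shrink / sigma2
    state.curvature = LEAK * state.curvature + local_curv
    state.theta += grad / state.curvature  # Newton-like scaled step
    state.updates += 1
    return state


winner = update_player(PlayerState(), tau_win=1.0, tau_lose=0.0, p_win=0.5)
loser = update_player(PlayerState(), tau_win=0.0, tau_lose=1.0, p_win=0.5)
print(round(winner.theta, 3), round(loser.theta, 3))
```

Because the curvature is a leaky sum rather than a growing total, the step size never shrinks to zero. This is what lets such a recursion keep tracking a player whose skill changes.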
Bayesian Shrinkage Terms
• Batch: applied once on entire set of matches
• Recursive: applied once per update
– Weight each by 1/|matches|
• |matches| unknown a priori
– Weight by infinite geometric series $2^{-t-1}$
• Sums to 1.0, like applying once
• Effect of prior diminishes given data
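The geometric weighting can be checked directly. With the t-th update (counting from 0) applying weight $2^{-t-1}$, the weights sum to $1 - 2^{-n}$ after n updates. This approaches the single full application used in batch mode. The helper name is hypothetical.

```python
def shrinkage_weights(n_updates):
    """Prior weight for updates t = 0, 1, ..., n_updates - 1: 2**-(t + 1)."""
    return [2.0 ** -(t + 1) for t in range(n_updates)]


for n in (1, 2, 10, 50):
    print(n, sum(shrinkage_weights(n)))  # equals 1 - 2**-n, approaching 1.0
```

After 10 updates the prior has applied all but $2^{-10} \approx 0.1\%$ of its total weight, so later updates are driven almost entirely by match data.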
Time-Varying
• Recursive algorithms track time-varying differences
• Update a weighted sum of prior performance and recent performance
• The variance approximation is leaky, so it can track changes over time.
Results: Accuracy
• Measured before updating for each match
• For an (unfair) comparison:
– TrueSkill™ reported accuracy on large teams: ~0.62
– Included mainly to show that 70% is good.
Uses for Ratings
• Rank Players
• Improve Map Design
• Help Choose Servers
• Level up, MMORPG
– Clear progression path
– Play on easier servers first, “graduate” to harder ones
Active Team Balancing
• Public Server dynamics mean teams need to be balanced during play
• Greedy: move the player that brings the probability of each team winning closest to 50-50
• Uncomfortable for the player who is moved
• Increases “fun” factor overall
• Sequential optimal design
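A minimal sketch of the greedy rule: try moving each player to the other side, and keep the single move that brings the predicted win probability closest to 50-50. It uses the plain Bradley-Terry sum of ratings, with no exposure, map-side, or server terms. It never empties a side. The function names are hypothetical, and it does not attempt the sequential optimal design listed above.

```python
import math


def p_team_a_wins(team_a, team_b, theta):
    """Bradley-Terry: exp(sum A) / (exp(sum A) + exp(sum B))."""
    diff = sum(theta[i] for i in team_b) - sum(theta[i] for i in team_a)
    return 1.0 / (1.0 + math.exp(diff))


def best_single_move(team_a, team_b, theta):
    """Return (player, new_a, new_b, new_p) for the one move that brings
    P(A wins) closest to 0.5, or None if no move improves the balance."""
    best = None
    best_gap = abs(p_team_a_wins(team_a, team_b, theta) - 0.5)
    for player in team_a + team_b:
        if player in team_a:
            new_a = [p for p in team_a if p != player]
            new_b = team_b + [player]
        else:
            new_a = team_a + [player]
            new_b = [p for p in team_b if p != player]
        if not new_a or not new_b:
            continue  # never leave a side empty
        p = p_team_a_wins(new_a, new_b, theta)
        if abs(p - 0.5) < best_gap:
            best_gap = abs(p - 0.5)
            best = (player, new_a, new_b, p)
    return best


ratings = {"x": 1.0, "y": 1.0, "z": 1.0, "w": 0.0}
print(best_single_move(["x", "y", "z"], ["w"], ratings))
```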
Future Directions
• Explicitly Model time-varying changes
• Number of players vs. map-side rating
• Online Bayesian Neural Network Training
• Expectation-Propagation for this model
• Direct Comparisons to TrueSkill™
Questions?
• Thanks for coming!
• Demo if time: http://stats.etpub.org