Columbia University Seminar in Applied Mathematics
8/14/2019 Columbia University Seminar in Applied Mathematics
Mark E. Johnson
SportMetrika
4 October 2005
-
Outline
1. Why a mathematician in sports?
2. Quantitative Decision Making
3. What is statistics?
4. Related problems in sports statistics
5. Some other examples of mathematics in industry
6. Getting here
7. Future directions
-
1. Why a mathematician in sports?
-
A mathematician in sports?
Sports is just behind the times; businesses weren't always analytical either
Off-season decision making
Moneyball
Finding market inefficiencies
Convincing others of a new way to look at things
-
From Michael Lewis's Moneyball
"... OPS was the simple addition of on-base and slugging percentages. Crude as it was, it was a much better indicator than any other offensive statistic of the number of runs a team would score. Simply adding the two statistics together, however, implied that they were of equal importance. If the goal was to raise a team's OPS, an extra percentage point of on-base was as good as an extra percentage point of slugging.
Before his thought experiment Paul (DePodesta) had felt uneasy with this crude assumption; now he saw that the assumption was absurd. An extra point of on-base percentage was clearly more valuable than an extra point of slugging percentage -- but by how much? In his model an extra point of on-base percentage was worth three times an extra point of slugging percentage."
-
A little history of statistics and baseball
Bill James, the pioneer.
SABR: Society for American Baseball Research
  http://www.sabr.org
Some websites and information:
Books
  Bill James Historical Abstracts
  Baseball Hacks, by Joe Adler
Some sites
  http://www.baseballprospectus.com/
  http://www.Baseball-Reference.com
  http://www.Retrosheet.org
Other resources
  http://www.sportmetrika.com/resources.php
-
Video
[insert video here]
-
2. Quantitative Decision Making
-
Sports Decision Making
League-level decisions: Commissioner
Team-level decisions: General Manager
Playing the game: Field Manager
-
League-level
Schedules
  An optimization problem
  Maximize chances of close races at end of season
  Maximize attendance
  Play every team N times, etc.
  No more than X days on the road at a time.
Playoff Format
  5-game playoff versus 7-game playoff
  A wildcard? Why only 8 teams?
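One way to frame the 5-game versus 7-game question is to ask how often the better team actually wins the series. The sketch below assumes that games are independent and that the stronger team wins any single game with a fixed probability p. Both are simplifying assumptions that are not from the talk, and the function name is illustrative.

```python
from math import comb


def series_win_probability(p: float, games: int) -> float:
    """Probability that a team with single-game win probability p wins a
    best-of-`games` series (games must be odd), assuming independent games."""
    if games % 2 == 0:
        raise ValueError("series length must be odd")
    needed = games // 2 + 1
    # The series is equivalent to playing all `games` games and counting wins:
    # the team takes the series exactly when it wins at least `needed` of them.
    return sum(
        comb(games, wins) * p**wins * (1 - p) ** (games - wins)
        for wins in range(needed, games + 1)
    )


if __name__ == "__main__":
    # The values of p are arbitrary illustrations, not estimates for real teams.
    for p in (0.50, 0.55, 0.60):
        best_of_5 = series_win_probability(p, 5)
        best_of_7 = series_win_probability(p, 7)
        print(f"p={p:.2f}  best-of-5: {best_of_5:.3f}  best-of-7: {best_of_7:.3f}")
```

Under these assumptions a longer series favors the better team slightly more often, which is one input a league could weigh against scheduling and attendance goals.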
-
Team-level: Player Evaluation
A player is an investment to a sports team's owner. When is this a good or a bad investment?
Opportunities to acquire a player:
  Amateur Draft
  Trades
  Free agents
  Contract extensions
How can we predict how good a player will be in the future?
  Is there a sufficient amount of data to analyze and project a player's future performance?
  How do we analyze it?
Sure, a good player will increase a team's chances of winning a game, but how much $ is that worth? Is that the only strategy?
-
Player evaluation, continued
Is the data sufficiently detailed to allow us to remove as much context as possible from events?
  Ballparks affect measurements
  So does the strength of your competition
  A pitcher throws differently to a batter, depending on the situation.
One's first objective would be to remove as much context as possible from the observations, so that you compare apples to apples.
  This can be done with wisely chosen mathematical models.
-
In-game strategies
Are there enough data to answer questions like these?
  Setting a lineup
    Should the best hitter bat 3rd, 4th, or 5th?
    Is the 9th slot really the best place for the pitcher?
  Bunting
    When should you give up the opportunity for bigger hits by sacrificing a player's at-bat?
  Intentional Walks
    Do too many teams walk Barry Bonds?
  Stealing bases
    What success rate should you expect out of a player before you start signaling that he attempt to steal a base?
-
3. What is statistics?
-
What are statistics?
Fans of sports use the term statistics to refer to what they read in the morning newspaper.
A journalist tells stories by using story-telling statistics
A mathematician cares about how players and teams are going to do in the future, so they try to define measurements that can be used to predict expected DESIRED outcomes.
-
Some quick examples
What wins games?
Data from The Baseball Archive, http://www.baseball1.com
All MLB teams over the last 10 seasons
[Figure: three scatter plots of team season totals against wins (x-axis, 40 to 120): runs scored (R), hits (H), and runs allowed (RA).]
  Runs scored versus wins
  Runs allowed versus wins
  Hits versus wins
-
Common statistics: relationships with each other
From Joe Adler's online article: Analyzing Baseball Statistics Using R
http://www.oreillynet.com/pub/a/network/2004/10/27/baseball.html
-
What scores runs? (recall Moneyball quote)
[Figure: four scatter plots against runs scored (x-axis, 450 to 1050), with panels AVG, OPS, OPS2, and OPS3.]
-
Return to Moneyball for a moment
Why a 3? Why not a 2?
Linear regression can be used to determine a best fit of A, B, and C in the linear model:
A * OBP + B * SLG + C = Runs
What is A / B?
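A minimal sketch of fitting A, B, and C by ordinary least squares, written in plain Python through the normal equations rather than with a statistics package. The function names are illustrative, and the sample rows are placeholders generated from arbitrary coefficients purely to check the solver. They are not real team data and say nothing about the true value of A / B.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("system is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_runs_model(obp, slg, runs):
    """Least-squares fit of runs = A*OBP + B*SLG + C; returns (A, B, C)."""
    rows = [(x, y, 1.0) for x, y in zip(obp, slg)]
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, runs)) for i in range(3)]
    a, b, c = solve_linear_system(xtx, xty)
    return a, b, c


if __name__ == "__main__":
    # PLACEHOLDER inputs: invented OBP/SLG pairs and arbitrary coefficients.
    obp = [0.310, 0.325, 0.340, 0.355, 0.330, 0.345]
    slg = [0.400, 0.390, 0.450, 0.420, 0.435, 0.410]
    runs = [1800 * o + 1200 * s - 350 for o, s in zip(obp, slg)]
    a, b, c = fit_runs_model(obp, slg, runs)
    print(f"A={a:.1f}  B={b:.1f}  C={c:.1f}  A/B={a / b:.2f}")
```

With real team-season rows in place of the placeholders, the printed A / B is the regression's answer to "why a 3?", subject to the usual cautions about outliers and model choice.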
-
It depends on how you try to answer the question
This most likely is not the right way to answer this question.
  Consider outliers due to odd seasons, such as the numbers that may be generated because Barry Bonds is on your team, or if you play in Coors Field.
  Or, perhaps a least-squares distance metric is not the best choice.
  Or, this may not be the best question to ask.
Pending editorial review, see my Hack in Joe Adler's Baseball Hacks for more on this subject.
-
Defining States in baseball
Baseball is discrete
Events can be recorded, as can the state of affairs before and after the event.
For the time being, consider a state as being the (number of outs, base-runners) pair.
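With 0, 1, or 2 outs and each of three bases either empty or occupied, the (number of outs, base-runners) pair gives 3 x 2^3 = 24 states. The sketch below enumerates them; the tuple encoding and the names are assumptions made for illustration, and the end-of-inning state (three outs) is deliberately left out.

```python
from itertools import product

OUTS = (0, 1, 2)
# Each base is either empty (False) or occupied (True): (first, second, third).
BASE_RUNNERS = tuple(product((False, True), repeat=3))

# Every (outs, runners) combination that can occur during an inning.
STATES = [(outs, runners) for outs in OUTS for runners in BASE_RUNNERS]


def describe(state):
    """Render an (outs, runners) state as readable text."""
    outs, runners = state
    names = [name for name, occupied in zip(("1st", "2nd", "3rd"), runners) if occupied]
    if len(names) == 3:
        bases = "Loaded"
    elif names:
        bases = " and ".join(names)
    else:
        bases = "Empty"
    return f"{outs} out{'' if outs == 1 else 's'}, {bases}"


if __name__ == "__main__":
    print(len(STATES))
    for state in STATES:
        print(describe(state))
```

Recording the state before and after each event then turns a game into a sequence of transitions between these 24 labels.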
-
State transitions and Run Expectancy
The number of runs expected in the remainder of the inning, under average conditions.
These values were taken directly from http://www.tangotiger.net, but can be easily derived either by:
  Averaging outcomes using play-by-play data.

RE 99-02      0 outs   1 out   2 outs
Empty           .56     .30     .12
1st             .95     .57     .25
2nd            1.19     .73     .34
3rd            1.48     .98     .39
1st and 2nd    1.57     .97     .47
1st and 3rd    1.90    1.24     .54
2nd and 3rd    2.05    1.47     .63
Loaded         2.42    1.65     .82
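The "averaging outcomes using play-by-play data" route can be sketched as follows. The record layout (a list of (state before the play, runs scored on the play) pairs per inning) and both function names are assumptions for illustration, not the format of any particular play-by-play source, and the sample inning is invented.

```python
from collections import defaultdict


def inning_observations(plays):
    """Turn one inning's plays into (state, runs_rest_of_inning) pairs.

    `plays` is a list of (state_before_play, runs_scored_on_play) in order.
    """
    observations = []
    runs_remaining = 0
    # Walk backwards so each play sees the runs scored from it onward.
    for state, runs_on_play in reversed(plays):
        runs_remaining += runs_on_play
        observations.append((state, runs_remaining))
    observations.reverse()
    return observations


def run_expectancy(observations):
    """Average the runs scored from each state to the end of the inning."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, runs in observations:
        totals[state] += runs
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


if __name__ == "__main__":
    # PLACEHOLDER inning, invented for illustration only.
    inning = [
        ((0, "Empty"), 0),
        ((0, "1st"), 0),
        ((1, "1st"), 2),
        ((1, "Empty"), 0),
        ((2, "Empty"), 0),
    ]
    table = run_expectancy(inning_observations(inning))
    for state, value in table.items():
        print(state, value)
```

Feeding every inning of several seasons through `inning_observations` and pooling the results is what would produce a table like the one above; a single inning is far too little data to mean anything.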
-
When should you steal a base?
(0,1st) = 0.95
(0,2nd) = 1.19
(1,none) = 0.30
Success is worth +0.24 runs
Failure is worth -0.65 runs
Failure loses 2.71 times what success gains
When is it worth the risk? When you succeed at least 2.71 times as often as you fail (about a 73% success rate).
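The break-even arithmetic can be written out directly: if p is the success rate, the attempt is neutral when p * gain = (1 - p) * loss. The sketch below uses the three run-expectancy values quoted on this slide; the function and parameter names are illustrative.

```python
def steal_break_even(re_before, re_success, re_failure):
    """Success rate at which a steal attempt leaves run expectancy unchanged."""
    gain = re_success - re_before   # runs added by a successful steal
    loss = re_before - re_failure   # runs given up when caught
    # Break-even p solves p * gain = (1 - p) * loss.
    return loss / (gain + loss)


# Runner on 1st with 0 outs; success puts him on 2nd, failure leaves 1 out, bases empty.
rate = steal_break_even(re_before=0.95, re_success=1.19, re_failure=0.30)
print(f"break-even success rate: {rate:.1%}")  # prints 73.0%
```

The same function applies to any other starting state by swapping in the corresponding entries of the run expectancy table.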
-
4. Related problems in sports statistics
-
Other sports and other applications
Other sports
  Football
    College: BCS ranking system; how do you determine a best team with very little data?
    NFL: New England Patriots have won the Super Bowl three out of the last four seasons and are known to be very analytical in their game calling
    When to punt on fourth down, as a function of: time left in game, score, field position, number of time outs left, etc.
  Basketball
    Dean Oliver (Basketball on Paper), worked for the Seattle Supersonics
    http://www.82games.com
    Mark Cuban & Jeff Sagarin
  Tennis
    Ranking & Tournament Scheduling
Other applications
  Fantasy sports
    Constructing a team, but with a different scoring system
  Gambling
    Betting against the line-makers; when to bet against the line or popular demand.
-
5. Some other examples of mathematics in industry
-
Other applications of mathematics in industry
Yahoo
  Ad targeting and pricing
E-commerce
  Calculating product affinities (up-sells and cross-sells)
  Finding what you want
  Search algorithms
  Site testing
  Web Traffic Arbitrage
Netflix
  Supply and demand modeling
  Product recommendation
Entelos
  Mathematical modeling of disease
  Simulations of human response to disease
-
6. Getting here
-
7. Future directions