Columbia University Seminar in Applied Mathematics

  • 8/14/2019 Columbia University Seminar in Applied Mathematics

    1/29

    Columbia University Seminar in Applied Mathematics

    Mark E. Johnson

    SportMetrika, [email protected]

    4 October 2005


    Outline

    1. Why a mathematician in sports?
    2. Quantitative Decision Making
    3. What is statistics?
    4. Related problems in sports statistics
    5. Some other examples of mathematics in industry
    6. Getting here
    7. Future directions


    Outline

    1. Why a mathematician in sports?


    A mathematician in sports?

    Sports is just behind the times; businesses weren't always analytical either

    Off-season decision making

    Moneyball

    Finding market inefficiencies

    Convincing others of a new way to look at things


    From Michael Lewis's Moneyball

    "... OPS was the simple addition of on-base and slugging percentages. Crude as it was, it was a much better indicator than any other offensive statistic of the number of runs a team would score. Simply adding the two statistics together, however, implied that they were of equal importance. If the goal was to raise a team's OPS, an extra percentage point of on-base was as good as an extra percentage point of slugging.

    Before his thought experiment Paul (DePodesta) had felt uneasy with this crude assumption; now he saw that the assumption was absurd. An extra point of on-base percentage was clearly more valuable than an extra point of slugging percentage -- but by how much? In his model an extra point of on-base percentage was worth three times an extra point of slugging percentage."


    A little history of statistics and baseball

    Bill James, the pioneer.

    SABR: Society for American Baseball Research
        http://www.sabr.org

    Some websites and information:

    Books
        Bill James Historical Abstracts
        Baseball Hacks, by Joe Adler

    Some sites
        http://www.baseballprospectus.com/
        http://www.Baseball-Reference.com
        http://www.Retrosheet.org

    Other resources
        http://www.sportmetrika.com/resources.php


    Outline

    2. Quantitative Decision Making


    Sports Decision Making

    League-level decisions: Commissioner

    Team-level decisions: General Manager

    Playing the game: Field Manager


    League-level

    Schedules
        An optimization problem
        Maximize chances of close races at end of season
        Maximize attendance
        Play every team N times, etc.
        No more than X days on the road at a time.

    Playoff format
        5-game playoff versus 7-game playoff
        A wildcard? Why only 8 teams?
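
One way to make the 5-game versus 7-game question concrete is to ask how often the better team wins each format. The sketch below is an illustration and not part of the talk: it assumes every game is independent and that the better team wins each one with the same probability p, which ignores home field, pitching rotations, and everything else a real analysis would need.

```python
from math import comb

def series_win_probability(p: float, best_of: int) -> float:
    """Chance that a team winning each game with probability p
    takes a best-of-`best_of` series (simplifying assumption:
    games are independent and p never changes)."""
    wins_needed = best_of // 2 + 1
    # The series ends on the winner's final win. Sum over how many
    # losses the winner absorbs before that deciding game.
    return sum(
        comb(wins_needed - 1 + losses, losses)
        * p ** wins_needed
        * (1 - p) ** losses
        for losses in range(wins_needed)
    )

for p in (0.50, 0.55, 0.60):
    five = series_win_probability(p, 5)
    seven = series_win_probability(p, 7)
    print(f"p = {p:.2f}: best-of-5 {five:.3f}, best-of-7 {seven:.3f}")
```

Under these assumptions a longer series always favors the better team a little more, which is one input to the format decision, not the whole answer.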


    Team-level: Player Evaluation

    A player is an investment to a sports team's owner. When is this a good or a bad investment?

    Opportunities to acquire a player:
        Amateur Draft
        Trades
        Free agents
        Contract extensions

    How can we predict how good a player will be in the future?
        Is there a sufficient amount of data to analyze and project a player's future performance?
        How do we analyze it?

    Sure, a good player will increase a team's chances of winning a game, but how much $ is that worth? Is that the only strategy?


    Player evaluation, continued

    Is the data sufficiently detailed to allow us to remove as much context as possible from events?
        Ballparks affect measurements
        So does the strength of your competition
        A pitcher throws differently to a batter, depending on the situation.

    One's first objective would be to remove as much context as possible from the observations, so that you compare apples to apples.
        This can be done with wisely chosen mathematical models.


    In-game strategies

    Are there enough data to answer questions like these?

    Setting a lineup
        Should the best hitter bat 3rd, 4th, or 5th?
        Is the 9th slot really the best place for the pitcher?

    Bunting
        When should you give up the opportunity for bigger hits by sacrificing a player's at-bat?

    Intentional walks
        Do too many teams walk Barry Bonds?

    Stealing bases
        What success rate should you expect out of a player before you start signaling that he attempt to steal a base?


    Outline

    3. What is statistics?


    What are statistics?

    Fans of sports use the term statistics to refer to what they read in the morning newspaper.

    A journalist tells stories by using story-telling statistics.

    A mathematician cares about how players and teams are going to do in the future, so they try to define measurements that can be used to predict expected DESIRED outcomes.


    Some quick examples

    What wins games?

    Data from The Baseball Archive, http://www.baseball1.com

    All MLB teams over the last 10 seasons

    [Figure: three scatter plots against team wins (horizontal axis, 40 to 120)]
        Runs scored versus wins (R, 500 to 1100)
        Hits versus wins (H, 1200 to 1800)
        Runs allowed versus wins (RA, 500 to 1100)


    Common statistics: relationship with each other

    From Joe Adler's online article: Analyzing Baseball Statistics Using R

    http://www.oreillynet.com/pub/a/network/2004/10/27/baseball.html


    What scores runs? (recall Moneyball quote)

    [Figure: four scatter plots, each against an axis running from 450 to 1050]
        AVG (0.200 to 0.300)
        OPS (0.600 to 0.900)
        OPS2 (0.900 to 1.300)
        OPS3 (1.200 to 1.600)


    Return to Moneyball for a moment

    Why a 3? Why not a 2?

    Linear regression can be used to determine a best fit of A, B, and C in the linear model:

    A * OBP + B * SLG + C = Runs

    What is A / B?
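
As a rough sketch of what such a fit involves, the code below solves the least-squares normal equations for A, B, and C in plain Python. Everything in it is an assumption made for illustration: the helper name `fit_linear_model` is hypothetical, and the data are SYNTHETIC, generated from arbitrary planted coefficients so the fit can be checked. They are not real team seasons and say nothing about the true value of A / B.

```python
import random

def fit_linear_model(rows):
    """Least-squares fit of runs = A*obp + B*slg + C.

    rows is a list of (obp, slg, runs) tuples. Solves the 3x3 normal
    equations (X^T X) beta = X^T y by Gauss-Jordan elimination.
    """
    # Augmented normal-equation matrix [X^T X | X^T y].
    m = [[0.0] * 4 for _ in range(3)]
    for obp, slg, runs in rows:
        x = (obp, slg, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += x[i] * x[j]
            m[i][3] += x[i] * runs
    # Eliminate column by column, with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))

# SYNTHETIC data only: arbitrary planted coefficients plus noise.
rng = random.Random(0)
TRUE_A, TRUE_B, TRUE_C = 2000.0, 1000.0, -500.0
rows = []
for _ in range(300):
    obp = rng.uniform(0.30, 0.37)
    slg = rng.uniform(0.36, 0.48)
    runs = TRUE_A * obp + TRUE_B * slg + TRUE_C + rng.gauss(0, 20)
    rows.append((obp, slg, runs))

a, b, c = fit_linear_model(rows)
print(f"A = {a:.0f}, B = {b:.0f}, C = {c:.0f}, A/B = {a / b:.2f}")
```

With real data the rows would come from team-season totals, and the estimated ratio would depend on which seasons and teams are included.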


    It depends on how you try to answer the question

    This most likely is not the right way to answer this question.

    Consider outliers due to odd seasons, such as the numbers that may be generated because Barry Bonds is on your team, or if you play in Coors Field.

    Or, perhaps a least-squares distance metric is not the best choice.

    Or, this may not be the best question to ask.

    Pending editorial review, see my Hack in Joe Adler's Baseball Hacks for more on this subject.


    Defining States in baseball

    Baseball is discrete

    Events can be recorded, as can the state of affairs before and after the event.

    For the time being, consider a state as being the (number of outs, base-runners) pair.
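
A minimal sketch of that state space, assuming each base is simply occupied or empty and ignoring who the runners are. The names `OUTS`, `BASES`, and `STATES` are illustrative, not from the talk.

```python
from itertools import product

OUTS = (0, 1, 2)
# Each base is empty (False) or occupied (True), ordered (1st, 2nd, 3rd).
BASES = tuple(product((False, True), repeat=3))

# A state is the (number of outs, base-runners) pair.
STATES = [(outs, bases) for outs in OUTS for bases in BASES]

# 3 out counts x 8 base configurations
print(len(STATES))  # prints 24
```

Every event then maps one of these 24 states to another, or to the end of the inning.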


    State transitions and Run Expectancy

    The number of runs expected in the remainder of the inning, under average conditions.

    These values were taken directly from http://www.tangotiger.net, but can be easily derived either by:
        Averaging outcomes using play-by-play data.

    RE 99-02       0 outs   1 out   2 outs
    Empty           0.56     0.30    0.12
    1st             0.95     0.57    0.25
    2nd             1.19     0.73    0.34
    3rd             1.48     0.98    0.39
    1st and 2nd     1.57     0.97    0.47
    1st and 3rd     1.90     1.24    0.54
    2nd and 3rd     2.05     1.47    0.63
    Loaded          2.42     1.65    0.82
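
The averaging approach can be sketched as follows. This assumes the play-by-play data has already been reduced to one record per plate appearance, holding the state at the start of the event and the runs scored from that point to the end of the inning. The function name and the toy records are hypothetical; the records are made up and are NOT real play-by-play data.

```python
from collections import defaultdict

def run_expectancy(observations):
    """Average runs scored in the remainder of the inning, per state.

    observations: iterable of ((outs, bases), runs_rest_of_inning)
    pairs, one per plate appearance.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, runs in observations:
        totals[state] += runs
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}

# Hypothetical toy records, invented for illustration only.
toy = [
    ((0, "1st"), 0), ((0, "1st"), 2), ((0, "1st"), 1),
    ((2, "Empty"), 0), ((2, "Empty"), 0), ((2, "Empty"), 1),
]
for state, value in run_expectancy(toy).items():
    print(state, round(value, 2))
```

With several seasons of real events, the same averaging would fill in all 24 cells of a table like the one above.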


    When should you steal a base?

    (0,1st) = 0.95

    (0,2nd) = 1.19

    (1,none) = 0.30

    Success is worth +0.24 runs (1.19 - 0.95)

    Failure is worth -0.65 runs (0.30 - 0.95)

    Failure loses 2.71 times what success gains

    When is it worth the risk? When you succeed at least 2.71 times as often as you fail (about a 73% success rate).
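
The break-even arithmetic can be checked directly from the three run expectancy values quoted on this slide. The sketch below assumes the only goal is to maximize expected runs in the inning, and that a failed steal always costs an out and the runner; the function name is illustrative.

```python
# Run expectancy values quoted on the slide, keyed by (outs, runners).
RE = {(0, "1st"): 0.95, (0, "2nd"): 1.19, (1, "none"): 0.30}

def steal_break_even(before, success, failure, re=RE):
    """Success rate p at which attempting the steal changes nothing:
    p * gain == (1 - p) * loss."""
    gain = re[success] - re[before]   # +0.24 runs
    loss = re[before] - re[failure]   # 0.65 runs given up
    return loss / (gain + loss)

p = steal_break_even((0, "1st"), (0, "2nd"), (1, "none"))
print(f"loss/gain = {0.65 / 0.24:.2f}, break-even success rate = {p:.1%}")
```

The same function applies to any other steal situation once the corresponding before, success, and failure states are read from a run expectancy table.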


    Outline

    4. Related problems in sports statistics


    Other sports and other applications

    Other sports
        Football
            College: BCS ranking system; how do you determine a best team with very little data?
            NFL: the New England Patriots have won the Super Bowl in three of the last four seasons and are known to be very analytical in their game calling
            When to punt on fourth down, as a function of: time left in game, score, field position, number of time outs left, etc.
        Basketball
            Dean Oliver (Basketball on Paper), worked for the Seattle SuperSonics
            http://www.82games.com
            Mark Cuban & Jeff Sagarin
        Tennis
            Ranking & Tournament Scheduling

    Other applications
        Fantasy sports
            Constructing a team, but with a different scoring system
        Gambling
            Betting against the line-makers; when to bet against the line or popular demand.


    Outline

    5. Some other examples of mathematics in industry


    Other applications of mathematics in industry

    Yahoo
        Ad targeting and pricing

    E-commerce
        Calculating product affinities (up-sells and cross-sells)
        Finding what you want
        Search algorithms
        Site testing
        Web Traffic Arbitrage

    Netflix
        Supply and demand modeling
        Product recommendation

    Entelos
        Mathematical modeling of disease
        Simulations of human response to disease


    Outline

    6. Getting here


    Outline

    7. Future directions