Sports Analytics in the Era of Big Data and Data Science

Post on 12-Apr-2017

SPORTS ANALYTICS IN THE ERA OF BIG DATA AND DATA SCIENCE KONSTANTINOS PELECHRINIS

@kpelechrinis https://412sportsanalytics.wordpress.com

DATA-DRIVEN COACHES?

DATA-DRIVEN FRONT OFFICES?

WHY NOW?

➤ Data analysis and the use of statistics are not new in sports!

➤ Now we have the technology to collect much more detailed information about the game

➤ Detailed box score

➤ Play-by-play data

➤ Player tracking

TRACKING

RESOURCES

Some of the examples are taken from this book

SPORT MARKETS

➤ A typical business or firm operates with the objective of profit maximization

➤ This might not be the case for the owner of a professional sports team!!

➤ For profit year by year

➤ Maximize wins

➤ Capital appreciation

SPORT MARKETS

➤ Becoming the dominant player is not the goal in the sports industry

➤ If a team were assured of victory in almost every competition, the whole league would be of little, if any, interest

➤ Competitive balance

➤ Salary cap!

➤ Draft!

SPORT MARKETS

[Figure: scatter plot of the 30 MLB teams (LAA, BAL, WSN, ..., HOU). x-axis: Team Payroll (Millions of Dollars), 50 to 250; y-axis: Percentage of Games Won, 40 to 60.]

Correlation coefficient = 0.26, p-value = 0.16!

Only about 6 to 7% of the variation in win/loss percentage is explained by payroll differences (r² = 0.26² ≈ 0.07)!
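As a rough check of the numbers on this slide, the sketch below squares the correlation coefficient to get the share of variance explained and computes the t-statistic used to test for zero correlation. It assumes n = 30 teams (the number of team labels in the plot); the function names are illustrative and not from the talk.

```python
import math

def variance_explained(r):
    """Share of variance explained by a simple linear fit (r squared)."""
    return r * r

def correlation_t_statistic(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

r = 0.26   # correlation coefficient reported on the slide
n = 30     # assumption: one data point per team in the plot

r_squared = variance_explained(r)
t_stat = correlation_t_statistic(r, n)

print("r^2 = %.4f" % r_squared)
print("t   = %.3f with %d degrees of freedom" % (t_stat, n - 2))
```

A t-statistic of this size with 28 degrees of freedom is well short of the usual 5% significance threshold, which is consistent with the reported p-value of 0.16.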

RANKING TEAMS

➤ Team performance is central to sports data science

➤ Ratings and rankings

➤ Challenges

➤ Imbalance in team schedules

➤ Win/loss percentage does not account for strength of schedule

RANKING TEAMS

➤ Network-based solution

➤ Win/loss directed network

➤ PageRank
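The network idea can be sketched in a few lines: every game adds a directed edge from the loser to the winner, and PageRank is run on the resulting graph, so beating a strong team counts for more than beating a weak one. This is a minimal sketch with a plain power iteration; the teams and results are made up, and the damping factor of 0.85 is a common default rather than a value taken from the talk.

```python
from collections import defaultdict

def pagerank_from_results(results, damping=0.85, iterations=100):
    """Rank teams with PageRank on a win/loss network.

    results: list of (winner, loser) pairs; each game adds an edge
    loser -> winner, so rank flows towards the teams that win.
    """
    teams = sorted({team for game in results for team in game})
    n = len(teams)

    # out_weights[loser][winner] = number of losses to that winner
    out_weights = {team: defaultdict(float) for team in teams}
    for winner, loser in results:
        out_weights[loser][winner] += 1.0

    rank = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Undefeated teams have no outgoing edges; spread their rank evenly.
        dangling = sum(rank[t] for t in teams if not out_weights[t])
        new_rank = {t: (1.0 - damping) / n + damping * dangling / n
                    for t in teams}
        for loser in teams:
            total = sum(out_weights[loser].values())
            for winner, weight in out_weights[loser].items():
                new_rank[winner] += damping * rank[loser] * weight / total
        rank = new_rank
    return rank

# Hypothetical results: (winner, loser)
results = [("A", "B"), ("A", "C"), ("A", "D"),
           ("B", "C"), ("B", "D"), ("C", "D")]

scores = pagerank_from_results(results)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these toy results the ordering follows the win counts, but on an unbalanced schedule the two can differ, which is the point of the approach.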

RANKING TEAMS

➤ Unidimensional scaling

➤ Matrix of how many times each team beats the other

➤ Transform to proportions, average across rows or columns, and standardize

➤ Automatic adjustment for schedule strength
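The steps listed on this slide can be sketched as follows. This is a simplified illustration, not the exact procedure from the talk: the win counts are made up, proportions are averaged across rows only, and standardization is a plain z-score.

```python
from statistics import mean, pstdev

def scaling_scores(wins):
    """wins[a][b] = number of times team a beat team b."""
    teams = sorted(wins)
    avg_prop = {}
    for team in teams:
        props = []
        for opp in teams:
            if opp == team:
                continue
            won = wins[team].get(opp, 0)
            lost = wins[opp].get(team, 0)
            if won + lost > 0:
                # Proportion of head-to-head games won against this opponent
                props.append(won / (won + lost))
        # Row average: each opponent counts once, however often they met
        avg_prop[team] = mean(props)

    # Standardize the row averages to z-scores
    mu = mean(list(avg_prop.values()))
    sigma = pstdev(list(avg_prop.values()))
    return {team: (avg_prop[team] - mu) / sigma for team in teams}

# Hypothetical head-to-head win counts
wins = {
    "A": {"B": 3, "C": 2},
    "B": {"A": 1, "C": 3},
    "C": {"A": 2, "B": 1},
}

scores = scaling_scores(wins)
for team in sorted(scores, key=scores.get, reverse=True):
    print(team, round(scores[team], 3))
```

Because every opponent contributes one proportion to the row average, a team is not rewarded simply for playing a weak opponent many times.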

RANKING TEAMS

[Figure: ranking score (y-axis: Ranking Score, 0 to 600) for each of the 30 NBA teams (x-axis: ATL, BKN, BOS, ..., WAS).]

COACHING DECISIONS

➤ Evidence-based coaching

➤ Go for the 4th down or not?

➤ Go for the 2-point conversion or kick the extra point?

➤ Shoot for three to win or shoot for two to tie the game?

➤ …

➤ We can now quantify the rationality of coaches!

COACHING DECISIONS

E[p] = 2 * Pr(2-point conversion succeeds) - 1 * Pr(extra-point kick succeeds)

[Figure: expected point gain (y-axis: Expected Point Gain, -0.50 to 0.50) for each NFL team (x-axis: ARI, ATL, BAL, ..., WAS), with a count annotated next to each team.]
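The 2-point question reduces to comparing expected points. Below is a minimal sketch, assuming the expected point gain E[p] is defined as 2 times the 2-point conversion success rate minus 1 times the extra-point success rate. The success rates in the example are hypothetical placeholders, not values from the talk.

```python
def expected_point_gain(p_two_point, p_extra_point):
    """Expected points from going for two minus expected points from kicking.

    p_two_point:   probability that a 2-point conversion succeeds
    p_extra_point: probability that the extra-point kick succeeds
    """
    return 2.0 * p_two_point - 1.0 * p_extra_point

# Hypothetical success rates, for illustration only
gain = expected_point_gain(p_two_point=0.48, p_extra_point=0.94)
print("Expected point gain: %+.2f" % gain)
```

A positive value means going for two is worth more points on average; a negative value favors the kick. Late-game situations may still call for a different choice, since what matters then is win probability rather than average points.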

COACHING DECISIONS

[Figure: expected points gained (y-axis, -2 to 3) versus distance to the goal line on 4th down (x-axis, 0 to 100); the touchback position is marked.]

COMPUTATIONAL GAME MODELS

[Figure: ratio r (y-axis, -1.0 to 1.0) by quarter (x-axis: Q1 to Q4).]

[Figure: turnover density (y-axis, 0.00 to 0.04) over game time in minutes (x-axis, 0 to 60), one curve per quarter (Q1 to Q4).]

COMPUTATIONAL GAME MODELS

[Diagram components: Historical game data; Bootstrap (B samples); Correlation matrix; Logistic Regression Model; samples x_11, ..., x_B1 and x_12, ..., x_B2; probabilities P1 and P2; hypothesis test H0: P1 = P2 versus H1: P1 ≠ P2; p-value.]

[Figure: accuracy (y-axis, 0.00 to 1.00) by week (x-axis, weeks 8 to 17) for the 2014 and 2015 seasons. Annotated mean accuracies: 0.627, 0.787, 0.517 and 0.6.]

LEAGUE CHANGES

➤ Can we predict and/or evaluate the impact of a rule change?

➤ What if we move the three-point line further away?

➤ What was the impact of the new PAT rule?

➤ Will the new touchback rule give an advantage to the offense?

LEAGUE CHANGES

Should the 3-point line be moved further away?

SPORTS MARKETING

➤ Sports are part of the entertainment market

➤ Marketing decisions can always benefit from good data!

➤ What price should the ticket have?

➤ What team-branded merchandise should you sell?

➤ Does a swag promotion justify a higher ticket price?

➤ What is the best strategy for national branding?

➤ …

SPORTS MARKETING

➤ Case study: Consumer preferences for Dodger Stadium seating

➤ Conjoint analysis

➤ Product profiles

➤ Consumers rank the products

➤ Ranking reveals their preference

SPORTS MARKETING

Part-worths (i.e., regression coefficients) reflect the strength of consumer preferences for each level of each product attribute.

SPORTS MARKETING

➤ Can we use these results to assess a consumer's willingness to pay for tickets?

➤ $20 tickets have a part-worth of 3.25, while $95 tickets have a part-worth of -3.50.

➤ The difference in part-worth is 3.25 - (-3.50) = 6.75, which corresponds to a price difference of $75

➤ One part-worth unit is therefore worth $75 / 6.75 ≈ $11.11 to the consumer

➤ For this consumer we see that the part-worth differential between a loge seat and a field seat is 2.75

➤ This consumer is willing to spend 2.75 * 11.11 = $30.55 more for a field seat than for a loge seat
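The willingness-to-pay arithmetic can be written out as a short calculation. The sketch below uses only the numbers quoted on this slide; the function and variable names are illustrative.

```python
def dollars_per_part_worth(price_low, worth_low, price_high, worth_high):
    """Dollar value of one part-worth unit, from two price levels."""
    price_gap = price_high - price_low   # $95 - $20 = $75
    worth_gap = worth_low - worth_high   # 3.25 - (-3.50) = 6.75
    return price_gap / worth_gap

rate = dollars_per_part_worth(price_low=20.0, worth_low=3.25,
                              price_high=95.0, worth_high=-3.50)

# Part-worth differential between a field seat and a loge seat
seat_differential = 2.75
premium = seat_differential * rate

print("One part-worth unit is worth $%.2f" % rate)
print("Field seat premium over loge seat: $%.2f" % premium)
```

Carrying full precision gives a premium of about $30.56; the $30.55 on the slide comes from rounding the rate to $11.11 before multiplying.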

PROMOTING BRANDS & PRODUCTS

= a* + b* + c* + d

DATA SOURCES

➤ There are various websites where you can get data

➤ Mainly aggregate statistics, box scores, etc.

DATA SOURCES

➤ More flexibility: play-by-play data

➤ Major leagues provide an API

➤ Sports enthusiasts have created libraries to access them

Case study: nflgame in Python

https://github.com/BurntSushi/nflgame

DATA SOURCES

>>> import nflgame
>>> games = nflgame.games(2015, week=1, kind='REG')
>>> games
[<nflgame.game.Game object at 0x107652210>,
 <nflgame.game.Game object at 0x107652310>,
 <nflgame.game.Game object at 0x107652410>,
 <nflgame.game.Game object at 0x107652510>,
 <nflgame.game.Game object at 0x107652610>,
 <nflgame.game.Game object at 0x107652710>,
 <nflgame.game.Game object at 0x107652810>,
 <nflgame.game.Game object at 0x107652910>,
 <nflgame.game.Game object at 0x107652a10>,
 <nflgame.game.Game object at 0x107652b10>,
 <nflgame.game.Game object at 0x107652c10>,
 <nflgame.game.Game object at 0x107652d10>,
 <nflgame.game.Game object at 0x107652e10>,
 <nflgame.game.Game object at 0x107652f10>,
 <nflgame.game.Game object at 0x107d02050>,
 <nflgame.game.Game object at 0x107d02150>]
>>> games[0].home
u'NE'
>>> games[0].away
u'PIT'
>>> games[0].score_home
28
>>> games[0].score_away
21

DATA SOURCES

>>> for i in games[0].drives:
...     print(i)
...
PIT (Start: Q1 15:00, End: Q1 09:40) Missed FG
NE (Start: Q1 09:40, End: Q1 07:41) Punt
PIT (Start: Q1 07:41, End: Q1 03:14) Punt
NE (Start: Q1 03:14, End: Q2 11:11) Touchdown
PIT (Start: Q2 11:11, End: Q2 08:38) Missed FG
NE (Start: Q2 08:38, End: Q2 04:01) Touchdown
PIT (Start: Q2 04:01, End: Q2 00:03) Field Goal
NE (Start: Q2 00:03, End: Q2 00:00) End of Half
NE (Start: Q3 15:00, End: Q3 10:37) Touchdown
PIT (Start: Q3 10:37, End: Q3 06:43) Touchdown
NE (Start: Q3 06:43, End: Q3 04:15) Punt
PIT (Start: Q3 04:15, End: Q4 11:39) Field Goal
NE (Start: Q4 11:39, End: Q4 09:20) Touchdown
PIT (Start: Q4 09:20, End: Q4 08:29) Punt
NE (Start: Q4 08:29, End: Q4 07:29) Punt
PIT (Start: Q4 07:29, End: Q4 07:00) Interception
NE (Start: Q4 07:00, End: Q4 02:59) Punt
PIT (Start: Q4 02:59, End: Q4 00:02) Touchdown
NE (Start: Q4 00:02, End: Q4 00:00) End of Game
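Once the drives are available as text, a few lines of standard-library Python are enough to summarize them. The sketch below parses the first six drive lines shown above with a regular expression and tallies drive outcomes per team. The pattern and the helper name are assumptions made for illustration; they are not part of nflgame.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: "TEAM (Start: Qn mm:ss, End: Qn mm:ss) Result"
DRIVE_PATTERN = re.compile(
    r"^(?P<team>\w+) \(Start: (?P<start>Q\d \d{2}:\d{2}), "
    r"End: (?P<end>Q\d \d{2}:\d{2})\) (?P<result>.+)$"
)

# First six drives of the game, copied from the printed output
drive_lines = [
    "PIT (Start: Q1 15:00, End: Q1 09:40) Missed FG",
    "NE (Start: Q1 09:40, End: Q1 07:41) Punt",
    "PIT (Start: Q1 07:41, End: Q1 03:14) Punt",
    "NE (Start: Q1 03:14, End: Q2 11:11) Touchdown",
    "PIT (Start: Q2 11:11, End: Q2 08:38) Missed FG",
    "NE (Start: Q2 08:38, End: Q2 04:01) Touchdown",
]

def tally_drive_results(lines):
    """Count drive outcomes per team; lines that do not match are skipped."""
    tally = defaultdict(Counter)
    for line in lines:
        match = DRIVE_PATTERN.match(line.strip())
        if match is None:
            continue
        tally[match.group("team")][match.group("result")] += 1
    return tally

tally = tally_drive_results(drive_lines)
for team in sorted(tally):
    print(team, dict(tally[team]))
```

The same tally run over a full season of drives gives per-team rates for punts, touchdowns and turnovers.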

DATA SOURCES

>>> plays = nflgame.combine_plays(games)
>>> for p in plays:
...     print(p)
...
(NE, NE 35, Q1) S.Gostkowski kicks 65 yards from NE 35 to end zone, Touchback.
(PIT, PIT 20, Q1, 1 and 10) (15:00) De.Williams right tackle to PIT 38 for 18 yards (D.Hightower).
(PIT, PIT 38, Q1, 1 and 10) (14:21) B.Roethlisberger pass short right to A.Brown pushed ob at PIT 47 for 9 yards (D.Hightower).
(PIT, PIT 47, Q1, 2 and 1) (14:04) De.Williams right guard to NE 49 for 4 yards (J.Collins; M.Brown).
(PIT, NE 49, Q1, 1 and 10) (13:26) B.Roethlisberger pass short right to H.Miller to NE 35 for 14 yards (J.Mayo).
(PIT, NE 35, Q1, 1 and 10) (12:42) (Shotgun) De.Williams right guard to NE 24 for 11 yards (J.Collins).
(PIT, NE 24, Q1, 1 and 10) (12:05) A.Brown sacked at NE 32 for -8 yards (M.Brown).
(PIT, NE 32, Q1, 2 and 18) (11:20) (Shotgun) De.Williams right end pushed ob at NE 28 for 4 yards (D.Hightower). PENALTY on PIT-M.Gilbert, Offensive Holding, 10 yards, enforced at NE 32 - No Play.
(PIT, NE 42, Q1, 2 and 28) (10:53) W.Johnson right guard to NE 36 for 6 yards (R.Ninkovich). NE-D.Easley was injured during the play. He is Out.
(PIT, NE 36, Q1, 3 and 22) (10:28) (Shotgun) B.Roethlisberger pass short right to H.Miller to NE 26 for 10 yards (P.Chung; M.Butler).
(PIT, NE 26, Q1, 4 and 12) (9:44) J.Scobee 44 yard field goal is No Good, Wide Right, Center-G.Warren, Holder-J.Berry.
(NE, NE 34, Q1, 1 and 10) (9:40) (Shotgun) T.Brady pass short left to J.Edelman pushed ob at NE 47 for 13 yards (W.Gay). PENALTY on NE-N.Solder, Unnecessary Roughness, 15 yards, enforced between downs.
(NE, NE 32, Q1, 1 and 10) (9:14) (Shotgun) T.Brady pass short left to D.Lewis to NE 44 for 12 yards (J.Harrison).
(NE, NE 44, Q1, 1 and 10) (9:00) (No Huddle, Shotgun) T.Brady pass short left to D.Lewis ran ob at PIT 43 for 13 yards.
(NE, PIT 43, Q1, 1 and 10) (8:31) (No Huddle, Shotgun) T.Brady pass incomplete short right to R.Gronkowski.
(NE, PIT 43, Q1, 2 and 10) (8:27) T.Brady pass incomplete deep right to D.Amendola.
(NE, PIT 43, Q1, 3 and 10) (8:22) (Shotgun) T.Brady sacked at PIT 43 for 0 yards (B.Dupree).
(NE, PIT 43, Q1, 4 and 10) (7:48) R.Allen punts 36 yards to PIT 7, Center-J.Cardona, fair catch by A.Brown.
(PIT, PIT 7, Q1, 1 and 10) (7:41) De.Williams left guard to PIT 13 for 6 yards (A.Branch; G.Grissom).
(PIT, PIT 13, Q1, 2 and 4) (7:07) De.Williams left tackle to PIT 12 for -1 yards (C.Jones).
(PIT, PIT 12, Q1, 3 and 5) (6:26) (Shotgun) B.Roethlisberger pass short left to A.Brown pushed ob at PIT 22 for 10 yards (D.McCourty).
(PIT, PIT 22, Q1, 1 and 10) (5:54) De.Williams right guard to PIT 26 for 4 yards (R.Ninkovich). PENALTY on PIT-K.Beachum, Illegal Formation, 5 yards, enforced at PIT 22 - No Play.
(PIT, PIT 17, Q1, 1 and 15) (5:29) (Shotgun) B.Roethlisberger pass short right to A.Brown to PIT 20 for 3 yards (J.Collins).
(PIT, PIT 20, Q1, 2 and 12) (4:48) B.Roethlisberger sacked at PIT 14 for -6 yards (D.Hightower).
(PIT, PIT 14, Q1, 3 and 18) (4:03) (Shotgun) B.Roethlisberger pass deep left to H.Miller to PIT 31 for 17 yards (D.McCourty; T.Brown).
(PIT, PIT 31, Q1, 4 and 1) (3:25) J.Berry punts 50 yards to NE 19, Center-G.Warren. D.Amendola to NE 34 for 15 yards (V.Williams). PENALTY on NE-M.Slater, Illegal Block Above the Waist, 10 yards, enforced at NE 20.
(NE, NE 10, Q1, 1 and 10) (3:14) D.Lewis left tackle to NE 18 for 8 yards (W.Allen).
(NE, NE 18, Q1, 2 and 2) (2:40) D.Lewis up the middle to NE 19 for 1 yard (M.Mitchell).
(NE, NE 19, Q1, 3 and 1) (2:05) T.Brady up the middle to NE 20 for 1 yard (L.Timmons; S.McLendon).
(NE, NE 20, Q1, 1 and 10) (1:14) D.Lewis left end pushed ob at NE 25 for 5 yards (L.Timmons). PENALTY on NE-N.Solder, Offensive Holding, 10 yards, enforced at NE 20 - No Play.
(NE, NE 10, Q1, 1 and 20) (:45) (Shotgun) T.Brady pass short left to A.Dobson to NE 19 for 9 yards (W.Gay).
(NE, NE 19, Q1, 2 and 11) (:12) (Shotgun) T.Brady pass short left to J.Edelman to NE 28 for 9 yards (C.Allen).
…

What does all this mean for me?

Work = Fun

BUT…

➤ A good understanding of the fundamentals of statistics and probability

➤ Ability to work with APIs and data

➤ Python, R, MySQL

➤ And, of course, domain knowledge