Quantitative Analysis Report

7/30/2019 Quantitative Analysis Report


STATISTICAL INFERENCES AND

REGRESSION ANALYSIS IN CRICKET

SUBMITTED BY

GAGANDEEP SINGH - 12PGP015

MANOJ H - 12PGP026

NIKESH AGARWAL - 12PGP030

SOURAV MONDAL - 12PGP042

VIJAYKRISHNAN G - 12PGP016



ABSTRACT

Cricket is a sport that makes extensive use of statistical tools for the representation and analysis of data. In this project, we set out to find how the impact of the toss on match results differs between day and day-night matches. For this inference, we used hypothesis testing for two populations to compare the means of the day and day-night populations. The findings showed that the impact of the toss on the result differs very little between day and day-night matches. We also estimated, with ninety percent confidence, the likely interval for runs scored by the Indian team while chasing against Pakistan, using single population estimation. This was done using the population of all matches in which India faced Pakistan and batted second. In addition, we studied the compensation of IPL players and tried to establish the relationship between the players' skill, measured by their statistical attributes, and the compensation they are paid, using simple linear regression and multiple linear regression analysis.

GAGANDEEP SINGH - 12PGP015

VIJAY KRISHNAN G - 12PGP016

MANOJ H - 12PGP026

NIKESH AGARWAL - 12PGP030

SOURAV MONDAL - 12PGP042



ACKNOWLEDGEMENT

We would like to sincerely thank Prof. Naval Bajpai, Indian Institute of Management

Raipur for his valuable guidance in this project right from the conception till the completion of

the same.

We would also like to thank our beloved Prof. B.S. Sahay, Director of Indian Institute of

Management Raipur, for rendering his support during the entire project period.

We also thank all the anonymous referees for their valuable comments on the report.

Last but not least, we thank our classmates for their encouragement and support.



TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 INTRODUCTION
1.1 CRICKET
1.2 STATISTICS IN CRICKET
1.2.1 INDIVIDUAL STATISTICS
1.2.2 TEAM STATISTICS
1.3 APPLICATION OF TOOLS
1.3.1 PIE CHART
1.3.2 WAGON-WHEEL
1.3.3 WORM GRAPH
1.3.4 MANHATTAN CHART
1.4 OBJECTIVE OF THE PROJECT
1.5 STATISTICAL TOOLS EMPLOYED
1.5.1 CHARTS AND GRAPHS
1.5.2 SINGLE POPULATION ESTIMATION
1.5.3 HYPOTHESIS TESTING FOR TWO POPULATION
1.5.4 SIMPLE LINEAR REGRESSION
1.5.5 MULTIPLE LINEAR REGRESSION

CHAPTER 2 LITERATURE REVIEW

CHAPTER 3 RESEARCH METHODOLOGY
3.1 WINNING PERCENTAGE USING PIE CHART
3.1.1 OBJECTIVE
3.1.2 POPULATION
3.1.3 PIE CHART
3.1.4 INFERENCES
3.2 CAPTAINCY RECORD CALCULATION USING BAR CHART
3.2.1 OBJECTIVE
3.2.2 POPULATION
3.2.3 INFERENCES
3.3 ACHIEVABLE SCORE AT THE END OF 50 OVERS
3.3.1 POPULATION AND SAMPLING
3.3.2 TECHNIQUE EMPLOYED
3.4 DIFFERENCE IN IMPACT OF TOSS BETWEEN DAY AND DAY-NIGHT MATCHES
3.4.1 POPULATION AND SAMPLING
3.4.2 TECHNIQUE EMPLOYED
3.5 VALUATION OF PLAYERS IN IPL
3.5.1 REGRESSION

CHAPTER 4 STATISTICAL ANALYSIS AND INTERPRETATION
4.1 ESTIMATION OF SINGLE POPULATION
4.1.1 SET NULL AND ALTERNATE HYPOTHESIS
4.1.2 DETERMINE APPROPRIATE STATISTICAL TEST
4.1.3 LEVEL OF SIGNIFICANCE
4.1.4 SET THE DECISION RULE
4.1.5 COLLECTION OF DATA
4.1.6 ANALYZE THE DATA
4.1.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
4.2 HYPOTHESIS TESTING FOR TWO POPULATION
4.2.1 SET NULL AND ALTERNATE HYPOTHESIS
4.2.2 DETERMINE APPROPRIATE STATISTICAL TEST
4.2.3 LEVEL OF SIGNIFICANCE
4.2.4 SET THE DECISION RULE
4.2.5 COLLECTION OF DATA
4.2.6 ANALYZE THE DATA
4.2.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
4.3 REGRESSION ANALYSIS OF IPL VALUATION OF PLAYERS
4.4 REGRESSION ANALYSIS
4.4.1 AMOUNT VERSUS STRIKE RATE
4.4.2 ANALYSIS OF VARIANCE
4.4.3 AMOUNT VERSUS WICKETS, STRIKE RATE
4.5 DESCRIPTION OF STATISTICS OF BATSMAN
4.5.1 AMOUNT (IN US DOLLARS) VERSUS RUNS
4.5.2 ANALYSIS OF VARIANCE
4.5.3 AMOUNT (IN US DOLLARS) VERSUS RUNS, AVERAGE

CHAPTER 5 DISCUSSIONS
5.1 BOWLERS
5.2 BATSMEN
5.2.1 REASONS FOR NON-EXPLANATION

CHAPTER 6 CONCLUSION
6.1 LIMITATIONS
6.2 FUTURE SCOPE

REFERENCES



LIST OF FIGURES

FIGURE 3.1 PIE CHART FOR WINNING PERCENTAGE
FIGURE 4.1 RESIDUAL PLOTS FOR BOWLERS
FIGURE 4.2 RESIDUAL PLOTS FOR AMOUNT

LIST OF TABLES

TABLE 3.1 POPULATION DATA
TABLE 3.2 INDIA'S WINNING RECORD UNDER MS DHONI
TABLE 3.3 MS DHONI'S CAPTAINCY RECORD
TABLE 4.1 DISTRIBUTION PLOT
TABLE 4.2 DESCRIPTION OF VARIABLES
TABLE 4.3 DESCRIPTION OF STATISTICS OF BOWLERS
TABLE 4.4 BATSMAN STATISTICS



CHAPTER 1 INTRODUCTION

1.1 CRICKET

The game of cricket has fascinated many statisticians because of the sheer amount and variety of statistics it generates. Individual statistics are recorded for each player during a match and aggregated over a career for batting and bowling across formats. Team statistics are recorded and maintained separately for each team in the different formats of cricket, such as Test matches, One Day Internationals, Twenty20s, First-Class matches and List-A matches. Test matches are the international variant of First-Class matches, so the corresponding statistics are included in the First-Class statistics of an individual or team. Similarly, One Day Internationals are a variant of List-A matches, so the corresponding statistics are included in the List-A statistics of an individual or team.

1.2 STATISTICS IN CRICKET

The applications of statistics in cricket are diverse, ranging from analysis of a team's or player's performance in a particular match or over a period of time to a comprehensive study of how various aspects of the game have evolved. For example, with the help of the game's statistics, one can estimate the impact of a particular player on the outcome; taken over a period of time, this serves as a performance indicator for the player. From general statistics across the different formats of cricket, venue-based and team-based statistics can be derived, and an in-depth analysis of these reveals many clues about how the game has evolved over the years.

1.2.1 INDIVIDUAL STATISTICS

These are generally calculated for each individual player, either for a certain set of matches or aggregated over his career.

o Matches Played

o Runs Scored

o Highest Score

o Batting/Bowling Averages

o Centuries, Strike Rate

o Maiden Overs

o Economy Rate

o Best Bowling

o Wickets

o Partnerships

o Catches & Stumpings

o Captaincy Statistics



1.2.2 TEAM STATISTICS

These are generally calculated for the team as a whole, taking all the individual players' statistics into account.

o Match Results

o Result Margins

o Series Results

o Innings Totals

o Match Scores

o Run Rate

o Extras etc.

1.3 APPLICATION OF TOOLS

Of late, television coverage has had a profound impact on the sport and has provided a huge impetus to develop interesting forms of statistical representation for viewers. Television networks have thus pioneered several new ways of presenting cricket statistics. Some of the most widely used forms of statistical representation include:

1.3.1 PIE CHART

The pie chart is one of the most widely used methods of representing cricket statistics. It is a circular chart subdivided into sectors, where the size of each sector is proportional to the share of the total quantity it represents. For example, extras can be presented as a pie chart with the sectors representing leg-byes, no-balls, wides, etc.

1.3.2 WAGON-WHEEL

It displays a 2D or 3D plot of the shots played or runs scored by a player or team on an overhead view of the cricket field.

1.3.3 WORM GRAPH

This is used to represent the runs scored and wickets taken during an innings, plotted against the time or balls bowled during a match.

1.3.4 MANHATTAN CHART

This is used to represent the runs scored and wickets taken in each over during a match. It is a variant of the bar graph/histogram and is named the Manhattan Chart because of its resemblance to the Manhattan skyline.

With the help of various tools like the ones mentioned above, the purpose is to make the viewer

understand clearly the impact of statistics on the game of cricket. Thereafter, many methods are

devised by the cricket pundits to perform analysis of the statistics, and then to use statistical

inferences to arrive at estimations and predictions about the game.



1.4 OBJECTIVE OF THE PROJECT

The main objective of this project is to illustrate the application of statistical inference and regression analysis in cricket. The case considered is an India-Pakistan cricket match. For a pre-match analysis, all the One Day Internationals between India and Pakistan that have ended in a result so far are taken into account; the results are represented using a pie chart, and the proportion of results in each team's favor is interpreted.

Since the data represented in the pie chart comes from matches spread over a long period of time, another statistic is also considered. The wins, losses and other results achieved by Team India under the leadership of MS Dhoni are represented using a bar chart. This shows the extremely high win-loss ratio of MS Dhoni and suggests that Pakistan's head-to-head advantage would not have a significant say in the outcome of the game.

The prediction of the outcome of the game is done in two stages:

a) In the pre-match analysis, two-population hypothesis testing is used to check whether the impact of the toss differs between day and day-night matches.

b) During the innings break, an achievable target score range for India is estimated with a confidence level of ninety percent.

Then, a regression analysis is carried out to determine if the pricing of the players in the IPL

auction is explained fully by the various parametric statistics of the individual players or whether

the pricing is influenced by other factors as well.

1.5 STATISTICAL TOOLS EMPLOYED

1.5.1 CHARTS AND GRAPHS

A chart is a graphical representation of data, in which the data is represented by symbols, such as bars in a bar chart, lines in a line chart, or slices in a pie chart. A chart can represent tabular numeric data, functions or some kinds of qualitative structures. Charts are often used to ease understanding of large quantities of data and the relationships between parts of the data. Charts can usually be read more quickly than the raw data that they are produced from.

1.5.2 SINGLE POPULATION ESTIMATION

The Z statistic can be used in the calculation of prediction intervals. A prediction interval, consisting of a lower endpoint and an upper endpoint, is an interval such that a future observation X will lie in the interval with high probability.

1.5.3 HYPOTHESIS TESTING FOR TWO POPULATION

A statistical hypothesis test is a method of making decisions using data, whether from a controlled experiment or an observational study. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level.



1.5.4 SIMPLE LINEAR REGRESSION

In statistics, simple linear regression is the least squares estimator of a linear regression model with a single explanatory variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that the sum of squared residuals of the model is as small as possible.
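The definition above can be written out as a short worked form. The notation is our own assumption rather than the report's: x_i is the explanatory variable, y_i the response, and b_0 and b_1 the intercept and slope, named to match the b0 and b1 used in the regression specifications of Chapter 4. Minimising the sum of squared residuals gives the familiar closed-form estimates:

```latex
\min_{b_0,\,b_1} \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr)^2
\quad\Longrightarrow\quad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```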

1.5.5 MULTIPLE LINEAR REGRESSION

Multiple linear regression extends simple linear regression to more than one explanatory variable; the coefficients are again estimated by least squares.



CHAPTER 2 LITERATURE REVIEW

Estenson et al (1994) and Bennett and Flueck (1983) studied players' compensation in the game of baseball. Auction results showed that salaries matched marginal revenue products and that the open auction exhibited the declining price anomaly found in real-world auctions. Similarly, Dobson and Goddard (1998) and Kahn (1992) considered the compensation of players in football.

Jones and Walsh (1988) made similar studies in ice hockey and concluded that skills are the principal determinant of salaries at all positions. Berri (1999) addresses the question of measuring the productivity of an individual participating in a team sport by linking players' statistics in the National Basketball Association (NBA) to team wins. An economic model is employed to measure each player's marginal product. Such a study is useful in answering the question posed in its title, as well as a broader list of questions raised by both industry insiders and other interested observers.

Only a few studies deal with the game of cricket. Barr and Kantor (2004) set out to determine the important skill set for a batsman in one-day cricket. The batting average has been used to assess the worth of a batsman. However, in the one-day game, limits on the number of balls bowled have introduced a very important additional dimension to performance. Assessing batting performance in the one-day game requires at least a two-dimensional measurement approach because of the time dimension imposed on limited-overs cricket. They used a new graphical representation with strike rate on one axis and the probability of getting out on the other, akin to the risk-return framework used in portfolio analysis, to obtain useful, direct and comparative insights into batting performance, particularly in the context of the one-day game. However, we have not come across any study that links compensation to player attributes.

Rosen (1974) based his model of product differentiation on the hypothesis that goods are valued

for their utility generating attributes. According to him, while making a purchase decision,

consumers evaluate product quality attributes, and pay the sum of implicit prices for each quality

attribute, which is reflected in the observed market price. Hence, price of a product is nothing but

the summation of the prices of all quality attributes.

Shapiro (1983) presented a theoretical framework to examine the halo effect on prices.

Developing an equilibrium price-quality schedule for high-quality products, under the

assumption of competitive markets and imperfect information, he showed that reputation

facilitates a price premium; hence, reputation building can be considered as an investment good.

Weemaes and Riethmuller (2001) studied the role of quality attributes on preferences for fruit

juices. The study involved market valuation of various attributes of fruit juice. It did not consider

consumers' preferences, but generated quality attributes from the product label. The study

revealed that consumers paid a premium for nutrition, convenience, and information. In a similar

study on tea, Deodhar and Intodia (2004) showed that colour and aroma were the two important

attributes of a prepared tea.

Extending the analogy to cricket, a cricket player is valued for his on-the-field (and perhaps off-the-field) performance. We propose that a cricket player sells his cricketing skills for the IPL tournament. The franchise team owners bid for the players' services because they would like to maximize their utility, and player performance is an important argument of their utility function. In equilibrium, the final bid price of a player must be a function of the valuation of the player's winning attributes.



CHAPTER 3 RESEARCH METHODOLOGY

3.1 WINNING PERCENTAGE USING PIE CHART

3.1.1 OBJECTIVE

To give a clear representation of the ODI matches between India and Pakistan that have ended in a result so far. We consider all the matches played so far, which gives 117 matches after excluding four that ended in no result.

3.1.2 POPULATION

Table 3.1 Population Data

Total Matches    Won by India    Won by Pakistan
117              48              69

3.1.3 PIE CHART

The above data is best represented using a pie chart.

Figure 3.1 Pie Chart for Winning Percentage (India 41%, Pakistan 59%; total matches: 117)

3.1.4 INFERENCES

The above pie chart shows that, of the 117 matches, Pakistan won the larger share with a win percentage of 59%, while India won 41%.
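The percentages quoted in the inference follow directly from the counts in Table 3.1. The snippet below is a small sketch of that arithmetic; the variable names are our own.

```python
# Figures from Table 3.1: ODIs between India and Pakistan that ended in a result.
wins = {"India": 48, "Pakistan": 69}
total_matches = sum(wins.values())

# Each team's share of the decided matches, as a percentage.
win_pct = {team: 100 * count / total_matches for team, count in wins.items()}

for team, pct in win_pct.items():
    print(f"{team}: {pct:.0f}%")
```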



3.2 CAPTAINCY RECORD CALCULATION USING BAR CHART

3.2.1 OBJECTIVE

The objective is to represent the captaincy record of MS Dhoni in the best possible way. The matches won or lost by India under the captaincy of Mahendra Singh Dhoni are taken as the population, and a graph is made for the same.

3.2.2 POPULATION

Table 3.2 India's Winning Record under MS Dhoni

Total Matches    Won    Lost    Tied    No result
117              80     32      2       3

For the above data, a bar chart is the best tool to represent the data.

Table 3.3 MS Dhoni's Captaincy Record

3.2.3 INFERENCES

From the above bar chart we can see that, under the captaincy of Mahendra Singh Dhoni, India played a total of 117 matches, of which India won 80, lost 32 and tied 2. Three of the matches had no result.



3.3 ACHIEVABLE SCORE AT THE END OF 50 OVERS

In cricket, the team batting second wins the chase only if it scores more than the team batting first. We therefore set out to estimate the runs that the Indian team could score while chasing a target against Pakistan.

3.3.1 POPULATION AND SAMPLING

The scores made by India while chasing against Pakistan were taken as the population.

3.3.2 TECHNIQUE EMPLOYED

Estimation of a single population mean was applied to get the intended result. We were able to estimate the mean with a confidence level of 90%.

3.4 DIFFERENCE IN IMPACT OF TOSS BETWEEN DAY AND DAY-NIGHT MATCHES

We set out to study how the impact of the toss differs between day and day-night matches played between India and Pakistan.

3.4.1 POPULATION AND SAMPLING

We used the data of matches played between India and Pakistan as the population. From this population we applied random sampling and arrived at a sample size of 38 for each of the two populations, day and day-night matches.

3.4.2 TECHNIQUE EMPLOYED

Hypothesis testing for two populations was applied to study the difference between the two population means.

3.5 VALUATION OF PLAYERS IN IPL

Next, our objective was to find whether the valuation of players in the IPL matches their skills, or whether they are over- or undervalued for their skill.

3.5.1 REGRESSION

We developed a regression model to find the relationship between players' compensation and their skills. We chose a sample consisting of 8 batsmen and 8 bowlers and developed the regression.



CHAPTER 4 STATISTICAL ANALYSIS AND INTERPRETATION

4.1 ESTIMATION OF SINGLE POPULATION

4.1.1 SET NULL AND ALTERNATE HYPOTHESIS

In this step, we try to predict whether India will be able to successfully chase a total of 245 runs in 50 overs. From the data, we estimate the single population mean with an assumed standard deviation of 53.

4.1.2 DETERMINE APPROPRIATE STATISTICAL TEST

As the sample size (64) is greater than 30, we use the z-test for a single population mean. We calculate the interval estimate using the formula $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.

4.1.3 LEVEL OF SIGNIFICANCE

Alpha = 0.10

4.1.4 SET THE DECISION RULE

For alpha = 0.10, the critical value of z from the z distribution table is ±1.645. The null hypothesis will be rejected if the computed value of z is outside ±1.645.

4.1.5 COLLECTION OF DATA

Sample size (Runs): 64
Standard Deviation: 52.71
Mean of Sample: 243.68

4.1.6 ANALYZE THE DATA

Z          -0.95
P          0.340
90% CI     (232.78, 254.58)
SE Mean    6.63

4.1.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION

The target of 245 lies within the 90% confidence interval (232.78, 254.58) for India's mean score while chasing, so we conclude that a total of 245 in 50 overs is within India's reach.
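The interval reported in Section 4.1.6 can be reproduced from the summary figures. The following is a minimal sketch, assuming the interval was computed as mean ± z·σ/√n with the assumed standard deviation of 53 rather than the sample value of 52.71; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

# Summary figures from Section 4.1 (sigma is the assumed population value).
n = 64
sample_mean = 243.68
sigma = 53.0
alpha = 0.10

# Two-sided critical value z_{alpha/2}; about 1.645 for alpha = 0.10.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Standard error of the mean and the resulting confidence interval.
se_mean = sigma / sqrt(n)
ci_lower = sample_mean - z_crit * se_mean
ci_upper = sample_mean + z_crit * se_mean

print(f"SE Mean = {se_mean:.2f}")
print(f"90% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```

With these inputs the sketch gives a standard error of about 6.63 and an interval of roughly (232.78, 254.58), in line with the figures reported in Section 4.1.6.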



Table 4.1 Distribution Plot

4.2 HYPOTHESIS TESTING FOR TWO POPULATION

4.2.1 SET NULL AND ALTERNATE HYPOTHESIS

Null Hypothesis: $H_0: \mu_1 - \mu_2 = 0$ (no significant difference in runs scored)

Alternate Hypothesis: $H_1: \mu_1 - \mu_2 \neq 0$

4.2.2 DETERMINE APPROPRIATE STATISTICAL TEST

As both samples are larger than 30 and independent, and the population variances are unknown, we use the z-test for the difference between two population means.

4.2.3 LEVEL OF SIGNIFICANCE

Alpha = 0.10

4.2.4 SET THE DECISION RULE

For alpha = 0.10, the critical value of z from the z distribution table is ±1.645. The null hypothesis will be rejected if the computed value of z is outside ±1.645.

4.2.5 COLLECTION OF DATA

Sample size 1: 38
Sample size 2: 38
Variance of sample 1: 2370.775
Variance of sample 2: 3119.37
Mean of sample 1: 7.539473684
Mean of sample 2: 5.039473684



4.2.6 ANALYZE THE DATA

Z 0.207988776

P (Z
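The Z value above can be checked from the figures in Section 4.2.5. This is a minimal sketch, assuming the statistic is the usual large-sample z for a difference of means, with the sample variances standing in for the unknown population variances; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample figures from Section 4.2.5 (day and day-night matches).
n1, n2 = 38, 38
var1, var2 = 2370.775, 3119.37
mean1, mean2 = 7.539473684, 5.039473684
alpha = 0.10

# z statistic for H0: mu1 - mu2 = 0.
std_error = sqrt(var1 / n1 + var2 / n2)
z_stat = (mean1 - mean2) / std_error

# Two-sided critical value and the decision at the chosen alpha.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z_stat) > z_crit

print(f"z = {z_stat:.6f}, critical value = {z_crit:.3f}, reject H0: {reject_null}")
```

Since the computed z is well inside ±1.645, the sketch does not reject the null hypothesis, which agrees with the abstract's finding that the impact of the toss differs very little between day and day-night matches.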



The data sources include the official website of the IPL and two other websites, Cricinfo and Wikipedia. For the sake of convenience we have considered only 8 Indian players in each category, i.e. bowlers and batsmen. While we have taken the final bidding price as the dependent variable, there is a wealth of data available on the cricketing attributes of IPL players hypothesized above. We have data on the individual performances of these 16 players across all the IPL seasons played to date.

1) Batsmen: For the multiple regression analysis we have taken 2 important independent variables which are the prime determinants of a player's performance in the long run: the total runs scored and the batting average.

2) Bowlers: For the multiple regression analysis of bowlers we have likewise taken 2 important independent variables which are the prime determinants of a player's performance in the long run: the wickets taken and the strike rate.

The relevant variables are drawn from observations on skills that are considered important for the Twenty20 form of the game. For example, in this shorter version of the game, no one is likely to make centuries frequently. However, a player who contributes many runs on a consistent basis and has a high batting average would be an asset for the team. While the IPL is a batsman's game, a wicket-taking bowler could put a lot of pressure on the opposition, and hence would be considered quite useful.

In short, the estimated variable coefficients should have the right signs and be statistically significant, the equation should have a reasonably high (adjusted) R-square and maintain parsimony, and there should be sufficient degrees of freedom. Based on these guidelines, the variables chosen for estimating equation (1) and their descriptions are reported in Table 4.2. We considered which combination offered the best goodness of fit in terms of R-square, adjusted R-square, correct signs of the coefficients, t-statistics, and F-statistics. The exact specification of the regression is given below in Equation (2).

P (Batsmen) = b0 + b1 (Runs) + b2 (Average)

P (Bowlers) = b0 + b1 (Wickets) + b2 (Strike Rate)

Table 4.2 Description of Variables

Variable       Description
P              Final bid price of a player.
Runs           Total runs scored over a span of 5 IPLs.
Average        Average runs scored in the same period.
Wickets        Total number of wickets taken by a bowler in 5 IPLs.
Strike Rate    Strike rate, i.e. balls per wicket.



Table 4.3 Description of statistics of bowlers

Name of the bowler    Wickets    Strike rate    Amount (in US Dollars)
Harbhajan Singh       54         18.88          1300000
Ishant Sharma         36         29.33          450000
Munaf Patel           70         21.22          700000
Pragyan Ojha          69         25.63          500000
Praveen Kumar         53         22.35          800000
R. Ashwin             49         20.66          850000
R.P. Singh            74         19.78          500000
Zaheer Khan           65         19.22          900000

4.4 REGRESSION ANALYSIS

4.4.1 AMOUNT VERSUS STRIKE RATE

The regression equation is

Amount = 1899656 - 51941 Strike Rate

Predictor      Coef       SE Coef    T        P
Constant       1899656    529535     3.59     0.012
Strike Rate    -51941     23649      -2.20    0.070

S = 226442   R-Sq = 44.6%   R-Sq(adj) = 35.3%
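The fitted line above can be recomputed from the bowler data in Table 4.3. The sketch below uses the closed-form least squares estimates for a single predictor; it is an illustration with our own variable names, not the software the report's output came from.

```python
# Bowler data from Table 4.3: strike rate (balls per wicket) and bid amount in US dollars.
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]

n = len(amount)
x_bar = sum(strike_rate) / n
y_bar = sum(amount) / n

# Sums of squares and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in strike_rate)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(strike_rate, amount))
s_yy = sum((y - y_bar) ** 2 for y in amount)

# Least squares slope and intercept, and the share of variance explained.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
r_squared = s_xy ** 2 / (s_xx * s_yy)

print(f"Amount = {intercept:.0f} + ({slope:.0f}) * Strike Rate")
print(f"R-Sq = {100 * r_squared:.1f}%")
```

With the Table 4.3 figures this gives a slope close to -51941, an intercept close to 1899656 and an R-Sq of about 44.6%, matching the output above.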

4.4.2 ANALYSIS OF VARIANCE

Source            DF    SS             MS             F       P
Regression        1     2.47345E+11    2.47345E+11    4.82    0.070
Residual Error    6     3.07655E+11    51275908679
Total             7     5.55000E+11

Durbin-Watson statistic = 1.42023

dl = 0.76   du = 1.33   4-du = 2.67   4-dl = 3.24

Since the statistic lies between du and 4-du, there is no autocorrelation.
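The Durbin-Watson figure can be recomputed from the residuals of the strike-rate regression. This is a sketch under the assumption that the observations are ordered as listed in Table 4.3; the bound du = 1.33 is the value quoted in the report, and the variable names are ours.

```python
# Bowler data from Table 4.3, in the order listed there.
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]

# Fit the simple regression of amount on strike rate by least squares.
n = len(amount)
x_bar = sum(strike_rate) / n
y_bar = sum(amount) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(strike_rate, amount))
         / sum((x - x_bar) ** 2 for x in strike_rate))
intercept = y_bar - slope * x_bar

# Residuals in observation order.
residuals = [y - (intercept + slope * x) for x, y in zip(strike_rate, amount)]

# Durbin-Watson: squared successive differences over the residual sum of squares.
dw_numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
sse = sum(e ** 2 for e in residuals)
durbin_watson = dw_numerator / sse

# Upper bound quoted in the report; no autocorrelation if du < DW < 4 - du.
d_u = 1.33
no_autocorrelation = d_u < durbin_watson < 4 - d_u

print(f"Durbin-Watson = {durbin_watson:.5f}, no autocorrelation: {no_autocorrelation}")
```

The residual sum of squares computed along the way also agrees with the Residual Error row of the analysis of variance table.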

4.4.3 AMOUNT VERSUS WICKETS, STRIKE RATE

The regression equation is

Amount = 3190691 - 13138 Wickets - 75398 Strike Rate

Predictor      Coef       SE Coef    T        P
Constant       3190691    716164     4.46     0.007
Wickets        -13138     5955       -2.21    0.078
Strike Rate    -75398     21286      -3.54    0.017

S = 176571   R-Sq = 71.9%   R-Sq(adj) = 60.7%
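The two-predictor fit can also be recomputed from Table 4.3. The sketch below solves the centred normal equations for two predictors with Cramer's rule; this is one convenient way to obtain the least squares coefficients by hand and is not necessarily how the report's software computed them. The helper names are hypothetical.

```python
# Bowler data from Table 4.3.
wickets = [54, 36, 70, 69, 53, 49, 74, 65]
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]

n = len(amount)


def mean(values):
    """Arithmetic mean of a list of n values."""
    return sum(values) / n


def cross(u, v):
    """Sum of cross-products of u and v about their means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


# Centred normal equations for two predictors:
#   s_ww * b1 + s_ws * b2 = s_wy
#   s_ws * b1 + s_ss * b2 = s_sy
s_ww = cross(wickets, wickets)
s_ss = cross(strike_rate, strike_rate)
s_ws = cross(wickets, strike_rate)
s_wy = cross(wickets, amount)
s_sy = cross(strike_rate, amount)

# Solve the 2x2 system by Cramer's rule, then recover the constant.
det = s_ww * s_ss - s_ws ** 2
b_wickets = (s_wy * s_ss - s_ws * s_sy) / det
b_strike = (s_ww * s_sy - s_ws * s_wy) / det
b_const = mean(amount) - b_wickets * mean(wickets) - b_strike * mean(strike_rate)

# R-squared from the explained sum of squares.
ss_total = cross(amount, amount)
ss_regression = b_wickets * s_wy + b_strike * s_sy
r_squared = ss_regression / ss_total

print(f"Amount = {b_const:.0f} + ({b_wickets:.0f}) Wickets + ({b_strike:.0f}) Strike Rate")
print(f"R-Sq = {100 * r_squared:.1f}%")
```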


Source            DF    SS             MS             F       P
Regression        2     3.99114E+11    1.99557E+11    6.40    0.042
Residual Error    5     1.55886E+11    31177190382
Total             7     5.55000E+11

Source         DF    Seq SS
Wickets        1     7940674349
Strike Rate    1     3.91173E+11


Hence there is no autocorrelation.

Figure 4.1 Residual Plots for Bowlers (Residual Plots for Amount; panels: Normal Probability Plot, Versus Fits, Histogram, Versus Order)



4.5 DESCRIPTION OF STATISTICS OF BATSMEN

4.5.1 AMOUNT (IN US DOLLARS) VERSUS RUNS

Table 4.4 Batsman Statistics

Name of the Batsman Runs Average Amount(in US Dollars)

M.S.Dhoni 1782 37.12 1300000

S.K.Raina 2254 33.64 1800000

V.Sehwag 1879 30.3 1800000

SR Tendulkar 2047 37.9 2400000

V.Kohli 1639 28.25 500000

R.G.Sharma 1975 31.35 2000000

G.Gambhir 2065 33.31 1800000

R.Dravid 1703 27.91 1800000

The regression equation is

Amount (in US Dollars) = - 1663273 + 1740 Runs

Predictor Coef SE Coef T P

Constant -1663273 1649219 -1.01 0.352

Runs 1740.5 855.5 2.03 0.088

S = 467406 R-Sq = 40.8% R-Sq(adj) = 31.0%
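The T column in the output is the coefficient divided by its standard error. As a rough check, the sketch below recomputes the slope for Runs, its standard error and the t-ratio from the batsman data in Table 4.4. It is an illustration in plain Python, assuming the simple model Amount = b0 + b1(Runs); the variable names are chosen for this example only.

```python
# Batsman data transcribed from Table 4.4.
runs = [1782, 2254, 1879, 2047, 1639, 1975, 2065, 1703]
amount = [1300000, 1800000, 1800000, 2400000,
          500000, 2000000, 1800000, 1800000]

n = len(runs)
x_bar = sum(runs) / n
y_bar = sum(amount) / n
sxx = sum((x - x_bar) ** 2 for x in runs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(runs, amount))

b1 = sxy / sxx                          # slope for Runs
b0 = y_bar - b1 * x_bar                 # intercept

# Residual standard error with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(runs, amount))
s = (sse / (n - 2)) ** 0.5

se_b1 = s / sxx ** 0.5                  # standard error of the slope
t_b1 = b1 / se_b1                       # t-ratio tested against zero

print(f"b1 = {b1:.1f}, SE = {se_b1:.1f}, T = {t_b1:.2f}")
```

A t-ratio of about 2 on six degrees of freedom corresponds to the reported p-value of 0.088, so the slope is significant at the 10% level but not at the 5% level.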

4.5.2 ANALYSIS OF VARIANCE

Source DF SS MS F P

Regression 1 9.04188E+11 9.04188E+11 4.14 0.088

Residual Error 6 1.31081E+12 2.18469E+11

Total 7 2.21500E+12


Thus there is no autocorrelation.



4.5.3 AMOUNT (IN US DOLLARS) VERSUS RUNS, AVERAGE

The regression equation is

Amount (in US Dollars) = - 1946333 + 1562 Runs + 19235 Average


Predictor Coef SE Coef T P

Constant -1946333 1992016 -0.98 0.373

Runs 1562 1080 1.45 0.207

Average 19235 59656 0.32 0.760

S = 506777 R-Sq = 42.0% R-Sq(adj) = 18.8%


Source DF SS MS F P

Regression 2 9.30888E+11 4.65444E+11 1.81 0.256

Residual Error 5 1.28411E+12 2.56822E+11

Total 7 2.21500E+12

Source DF Seq SS

Runs 1 9.04188E+11

Average 1 26699560534


Thus there is no autocorrelation.
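How little Average adds once Runs is in the model can be read off the sums of squares printed above. The sketch below is a hedged illustration that uses only figures quoted in the output (total SS, the sequential SS for Runs and Average, and the residual mean square of the full model); the function name adjusted_r2 is an assumption made for this example.

```python
# Figures copied from the batsman regression output above.
N = 8                                   # number of batsmen
SST = 2.21500e12                        # total sum of squares
SEQ_SS_RUNS = 9.04188e11                # sequential SS for Runs
SEQ_SS_AVERAGE = 26699560534            # sequential SS for Average given Runs
MSE_FULL = 2.56822e11                   # residual mean square, full model


def adjusted_r2(ssr, sst, n, k):
    """Adjusted R-squared for a model with k predictors and n observations."""
    r2 = ssr / sst
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


adj_simple = adjusted_r2(SEQ_SS_RUNS, SST, N, 1)
adj_multiple = adjusted_r2(SEQ_SS_RUNS + SEQ_SS_AVERAGE, SST, N, 2)

# Partial F statistic for adding Average to a model that already has Runs.
partial_f = SEQ_SS_AVERAGE / MSE_FULL

print(f"R-Sq(adj), Runs only:        {100 * adj_simple:.1f}%")
print(f"R-Sq(adj), Runs and Average: {100 * adj_multiple:.1f}%")
print(f"Partial F for Average:       {partial_f:.3f}")
```

The partial F is roughly the square of the t-ratio reported for Average (0.32), and the drop in adjusted R-Sq shows that the extra predictor costs more in degrees of freedom than it explains.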

[Figure: Residual Plots for Amount (in US Dollars), four panels: Normal Probability Plot (Percent against Residual), Versus Fits (Residual against Fitted Value), Histogram (Frequency against Residual), Versus Order (Residual against Observation Order)]

Figure 4.2 Residual Plots for Amount



CHAPTER 5 DISCUSSIONS

5.1 BOWLERS

The coefficient of determination for the simple linear regression, with strike rate as the only factor, is low at 44.6%. In other words, only around 44.6% of the variation in the amount paid to a bowler in IPL is explained by his strike rate; the rest is unexplained. The coefficient of determination of the multiple regression model, which adds wickets, is higher at 71.9%, but the adjusted value is only 60.7%, and the coefficient of wickets is negative, which runs against the expectation that taking more wickets raises the bid price. The overall F-test of the multiple regression model gives a p-value of 0.042, which lies in the rejection region at the 5% level of significance.

H0: b1 = b2 = 0, i.e. the key performance indicators (wickets and strike rate) do not explain the amount paid to bowlers in IPL.

H1: At least one of b1 and b2 is not zero.

Since the null hypothesis is rejected, wickets and strike rate taken together do have a statistically significant relationship with the bid price. Even so, with only eight bowlers in the sample and more than a quarter of the variation left unexplained, the performance factors cannot be said to be solely at the helm of the bid price. A variety of other reasons, listed in Section 5.2.1, also contribute to the very high amounts of money paid to bowlers.

5.2 BATSMEN

The coefficient of determination for runs is quite low at 40.8%, which indicates a weak relationship between the runs scored by a batsman and the amount paid to him in IPL. Only 40.8% of the variation in amount is explained by the runs scored by the batsman in the T-20 format; the rest is unexplained. The coefficient of determination of the multiple regression model is 42%, which is also low, and the adjusted value falls from 31.0% to 18.8% when average is added. The overall F-test of the multiple regression model gives a p-value of 0.256, which does not lie in the rejection region at the 5% level of significance.

H0: b1 = b2 = 0, i.e. the key performance indicators (runs and average) do not explain the amount paid to batsmen in IPL.

H1: At least one of b1 and b2 is not zero.

Thus the null hypothesis cannot be rejected, so the data give no evidence that runs and average determine the amount paid. This is consistent with the view that various other factors are in operation which may be responsible for the amount of money being so high.

These high premiums, over and above the compensation for their cricketing attributes, seem to be a reflection of the players' ability to draw huge crowds nationally due to their charismatic association with film stars, the racial controversies surrounding them, etc.

5.2.1 REASONS FOR NON-EXPLANATION

Some of the reasons which may account for the unexplained part of the relation are as follows:

1. Iconic Value.

2. Glamour.

3. Controversy.

4. Age.

5. Popularity.



CHAPTER 6 CONCLUSION

6.1 LIMITATIONS

The limitations of our study are:

1. The usage of the pie chart and the bar chart to represent the statistics for earlier India-Pakistan matches was appropriate, but predictions about the current match cannot be made reliably from those representations because of the inherent unpredictability of the game of cricket.

2. While estimating the achievable target score with a ninety percent confidence interval, we take into account only the matches already played between the two teams, without considering other factors such as the difference in the set of players between those games and the current match, the current form of the individual players, the nature of the pitch, weather conditions, etc. This might result in an incorrect range estimate.

3. In determining the difference in the impact of the toss, we calculate the net run-rate difference between the teams batting first and second, and arrive at two populations, one each for the day and day-night matches. But the net run-rate difference is calculated across the maximum overs for all the matches, so the event of a team chasing down a target easily without losing wickets is not captured by our populations.

4. In developing a regression model for the pricing of an IPL player based on his statistical attributes, many intangible attributes of an individual player are left out. For example, a player's brand value, image and relevance to the franchise are all taken into account when his price is determined, but these aspects are completely ignored in our regression model. This probably explains the low correlation between the independent variables and the pricing of the player.

6.2 FUTURE SCOPE

Single population estimation could be used to estimate, at a stated level of confidence, the likely scores of players or teams based on their previous performances. This could help the teams in formulating strategies against the opponent.

The regression model could be applied to fix a player's compensation based on his skill set. This could help the team franchise to fix a ceiling price on each player before going in for the auction, spend its money accordingly and thus achieve the maximum return on its money.
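The ceiling-price idea can be illustrated with the fitted bowler equation of Section 4.4.3. The sketch below is only an example of how such a rule might be applied: the function name predicted_bowler_amount is hypothetical, the coefficients are the rounded values from the regression output, and a model fitted on eight players would need far more data before being used in a real auction.

```python
def predicted_bowler_amount(wickets, strike_rate):
    """Bid price (US dollars) implied by the fitted equation of Section 4.4.3."""
    return 3190691 - 13138 * wickets - 75398 * strike_rate


# Example: Harbhajan Singh's row from Table 4.3.
actual = 1300000
fitted = predicted_bowler_amount(wickets=54, strike_rate=18.88)

# Amount paid over and above what the performance statistics imply.
premium = actual - fitted

print(f"Fitted price: {fitted:,.0f} USD")
print(f"Premium paid: {premium:,.0f} USD")
```

A franchise could treat the fitted price as a starting point for its ceiling and decide separately how much premium the intangible factors of Section 5.2.1 are worth.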



