Quantitative Analysis Report
7/30/2019 Quantitative Analysis Report
STATISTICAL INFERENCES AND
REGRESSION ANALYSIS IN CRICKET
SUBMITTED BY
GAGANDEEP SINGH - 12PGP015
MANOJ H - 12PGP026
NIKESH AGARWAL - 12PGP030
SOURAV MONDAL - 12PGP042
VIJAYKRISHNAN G - 12PGP016
ABSTRACT
Cricket is a sport that makes extensive use of statistical tools for representing and analysing
data. In this project, we set out to find how the impact of the toss on the result differs between
day and day-night matches. For this inference, we used two-population hypothesis testing to
compare the means of the day and day-night populations. The findings showed that the impact of
the toss on the result differs very little between day and day-night matches. We also estimated,
with ninety percent confidence, the likely interval for runs scored by the Indian team while
chasing against Pakistan, using single-population estimation. The population comprised all the
matches in which India faced Pakistan and batted second. In addition, we studied the compensation
of IPL players and tried to establish the relationship between the players' skill, as captured by
their statistical attributes, and the compensation they are paid, using simple linear regression
and multiple linear regression analysis.
GAGANDEEP SINGH - 12PGP015
VIJAY KRISHNAN G - 12PGP016
MANOJ H - 12PGP026
NIKESH AGARWAL - 12PGP030
SOURAV MONDAL - 12PGP042
ACKNOWLEDGEMENT
We would like to sincerely thank Prof. Naval Bajpai, Indian Institute of Management
Raipur, for his valuable guidance in this project, from its conception to its completion.
We would also like to thank our beloved Prof. B.S. Sahay, Director of the Indian Institute of
Management Raipur, for his support during the entire project period.
We also thank all the anonymous referees for their valuable comments on the report.
Last but not least, we thank our classmates for their encouragement and support.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 CRICKET
1.2 STATISTICS IN CRICKET
1.2.1 INDIVIDUAL STATISTICS
1.2.2 TEAM STATISTICS
1.3 APPLICATION OF TOOLS
1.3.1 PIE CHART
1.3.2 WAGON-WHEEL
1.3.3 WORM GRAPH
1.3.4 MANHATTAN CHART
1.4 OBJECTIVE OF THE PROJECT
1.5 STATISTICAL TOOLS EMPLOYED
1.5.1 CHARTS AND GRAPHS
1.5.2 SINGLE POPULATION ESTIMATION
1.5.3 HYPOTHESIS TESTING FOR TWO POPULATION
1.5.4 SIMPLE LINEAR REGRESSION
1.5.5 MULTIPLE LINEAR REGRESSION
CHAPTER 2 LITERATURE REVIEW
CHAPTER 3 RESEARCH METHODOLOGY
3.1 WINNING PERCENTAGE USING PIE CHART
3.1.1 OBJECTIVE
3.1.2 POPULATION
3.1.3 PIE CHART
3.1.4 INFERENCES
3.2 CAPTAINCY RECORD CALCULATION USING BAR CHART
3.2.1 OBJECTIVE
3.2.2 POPULATION
3.2.3 INFERENCES
3.3 ACHIEVABLE SCORE AT THE END OF 50 OVERS
3.3.1 POPULATION AND SAMPLING
3.3.2 TECHNIQUE EMPLOYED
3.4 DIFFERENCE IN IMPACT OF TOSS BETWEEN DAY AND DAY-NIGHT MATCHES
3.4.1 POPULATION AND SAMPLING
3.4.2 TECHNIQUE EMPLOYED
3.5 VALUATION OF PLAYERS IN IPL
3.5.1 REGRESSION
CHAPTER 4 STATISTICAL ANALYSIS AND INTERPRETATION
4.1 ESTIMATION OF SINGLE POPULATION
4.1.1 SET NULL AND ALTERNATE HYPOTHESIS
4.1.2 DETERMINE APPROPRIATE STATISTICAL TEST
4.1.3 LEVEL OF SIGNIFICANCE
4.1.4 SET THE DECISION RULE
4.1.5 COLLECTION OF DATA
4.1.6 ANALYZE THE DATA
4.1.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
4.2 HYPOTHESIS TESTING FOR TWO POPULATION
4.2.1 SET NULL AND ALTERNATE HYPOTHESIS
4.2.2 DETERMINE APPROPRIATE STATISTICAL TEST
4.2.3 LEVEL OF SIGNIFICANCE
4.2.4 SET THE DECISION RULE
4.2.5 COLLECTION OF DATA
4.2.6 ANALYZE THE DATA
4.2.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
4.3 REGRESSION ANALYSIS OF IPL VALUATION OF PLAYERS
4.4 REGRESSION ANALYSIS
4.4.1 AMOUNT VERSUS STRIKE RATE
4.4.2 ANALYSIS OF VARIANCE
4.4.3 AMOUNT VERSUS WICKETS, STRIKE RATE
4.4.4 ANALYSIS OF VARIANCE
4.5 DESCRIPTION OF STATISTICS OF BATSMAN
4.5.1 AMOUNT (IN US DOLLARS) VERSUS RUNS
4.5.2 ANALYSIS OF VARIANCE
4.5.3 AMOUNT (IN US DOLLARS) VERSUS RUNS, AVERAGE
4.5.4 ANALYSIS OF VARIANCE
CHAPTER 5 DISCUSSIONS
5.1 BOWLERS
5.2 BATSMEN
5.2.1 REASONS FOR NON-EXPLANATION
CHAPTER 6 CONCLUSION
6.1 LIMITATIONS
6.2 FUTURE SCOPE
REFERENCES
LIST OF FIGURES
FIGURE 3.1 PIE CHART FOR WINNING PERCENTAGE
FIGURE 4.1 RESIDUAL PLOTS FOR BOWLERS
FIGURE 4.2 RESIDUAL PLOTS FOR AMOUNT
LIST OF TABLES
TABLE 3.1 POPULATION DATA
TABLE 3.2 INDIA'S WINNING RECORD UNDER MS DHONI
TABLE 3.3 MS DHONI'S CAPTAINCY RECORD
TABLE 4.1 DISTRIBUTION PLOT
TABLE 4.2 DESCRIPTION OF VARIABLES
TABLE 4.3 DESCRIPTION OF STATISTICS OF BOWLERS
TABLE 4.4 BATSMAN STATISTICS
CHAPTER 1 INTRODUCTION
1.1 CRICKET
The game of cricket has fascinated many statisticians because of the sheer
amount and variety of statistics it generates. Individual statistics are recorded for each player
during a match and aggregated over a career for batting and bowling across formats. Team
statistics are recorded and maintained separately for each team in the different formats of
cricket: Test matches, One Day Internationals, Twenty20s, First-Class matches and List-A
matches. Test matches are the international variant of First-Class matches, and hence the
corresponding statistics are included in the First-Class statistics of an individual or team.
Similarly, One Day Internationals are a variant of List-A matches, and hence the
corresponding statistics are included in the List-A statistics of an individual or team.
1.2 STATISTICS IN CRICKET
The applications of statistics in cricket are diverse, ranging from analysis of a
team's or player's performance in a particular match or over a period of time, to a comprehensive
study of how various aspects of the game have evolved. For example, with the help of the game's
statistics, one can estimate the impact of a particular player on the outcome; taken over a
period of time, this serves as a performance indicator for the player. From the analysis of
general statistics across the different formats of cricket, venue-based and team-based statistics
can be derived, and an in-depth analysis of these reveals a lot about
how the game has evolved over the years.
1.2.1 INDIVIDUAL STATISTICS
These are generally calculated for each individual player, either for a certain set of matches or
aggregated over his career.
o Matches Played
o Runs Scored
o Highest Score
o Batting/Bowling Averages
o Centuries, Strike Rate
o Maiden Overs
o Economy Rate
o Best Bowling
o Wickets
o Partnerships
o Catches & Stumpings
o Captaincy Statistics
1.2.2 TEAM STATISTICS
These are generally calculated for the whole team taken together, taking all the individual
players' statistics into account.
o Match Results
o Result Margins
o Series Results
o Innings Totals
o Match Scores
o Run Rate
o Extras etc.
1.3 APPLICATION OF TOOLS
Of late, television coverage has had a profound impact on the sport, and it has provided a
huge impetus to develop interesting forms of statistical representation for viewers. The
television networks have thus pioneered several innovative ways of
presenting cricket statistics. Some of the most widely used new forms of statistical representation
include:
1.3.1 PIE CHART
The pie chart is one of the most widely used methods of representing cricket statistics. It is
a circular chart subdivided into sectors, where the size of each sector is
proportional to the share of the total quantity it represents. For example, the extras can be
presented as a pie chart with the different sectors representing leg-byes, no-balls,
wides, etc.
1.3.2 WAGON-WHEEL
It displays a 2D or 3D plot of the shots played or runs scored by a player or team, drawn on an
overhead view of the cricket field.
1.3.3 WORM GRAPH
This is used to represent the runs scored and wickets taken during an innings, plotted against
time or balls bowled during a match.
1.3.4 MANHATTAN CHART
This is used to represent the runs scored and wickets taken in each over during a match. It is a
variant of the bar graph/histogram, and it is called the Manhattan chart because of its
resemblance to the Manhattan skyline.
Tools like the ones mentioned above are meant to help the viewer
understand clearly the impact of statistics on the game of cricket. Beyond presentation, cricket
pundits have devised many methods to analyse the statistics and to use statistical
inference to arrive at estimates and predictions about the game.
1.4 OBJECTIVE OF THE PROJECT
The main objective of this project is to illustrate the application of statistical inference and
regression analysis in cricket. The case considered is an India-Pakistan cricket match. For the
pre-match analysis, all the One Day Internationals between India and Pakistan that have ended in
a result so far are taken into account; the results are represented using a pie chart, and the
proportion of results in each team's favor is interpreted.
Since the data represented in the pie chart come from matches spread across a long
period of time, another statistic is also considered. The wins,
losses and other results achieved by Team India under the leadership of MS Dhoni are
represented using a bar chart, which shows the extremely
high win-loss ratio of MS Dhoni and suggests that Pakistan's head-to-head advantage
would not have a significant say in the outcome of the game.
The prediction of the outcome of the game is done in two stages:
a) In the pre-match analysis, we test whether the impact of the toss differs between
day and day-night matches, using two-population hypothesis testing.
b) During the innings break, we estimate an achievable target score range for India
at a confidence level of ninety percent.
Then, a regression analysis is carried out to determine whether the pricing of players in the IPL
auction is fully explained by the statistics of the individual players or whether
the pricing is influenced by other factors as well.
1.5 STATISTICAL TOOLS EMPLOYED
1.5.1 CHARTS AND GRAPHS
A chart is a graphical representation of data, in which the data is represented by symbols, such as
bars in a bar chart, lines in a line chart, or slices in a pie chart. A chart can represent tabular
numeric data, functions or some kinds of qualitative structures. Charts are often used to ease
understanding of large quantities of data and the relationships between parts of the data. Charts
can usually be read more quickly than the raw data that they are produced from.
1.5.2 SINGLE POPULATION ESTIMATION
The Z statistic can be used in the calculation of prediction intervals. A prediction interval
[L, U], consisting of a lower endpoint designated L and an upper endpoint designated U, is an
interval such that a future observation X will lie in the interval with high probability.
1.5.3 HYPOTHESIS TESTING FOR TWO POPULATION
A statistical hypothesis test is a method of making decisions using data, whether from a
controlled experiment or an observational study. In statistics, a result is called statistically
significant if it is unlikely to have occurred by chance alone, according to a pre-determined
threshold probability, the significance level.
1.5.4 SIMPLE LINEAR REGRESSION
In statistics, simple linear regression is the least squares estimator of a linear regression model
with a single explanatory variable. In other words, simple linear regression fits a straight line
through the set of n points in such a way that the sum of squared residuals of the model is as
small as possible.
1.5.5 MULTIPLE LINEAR REGRESSION
Multiple linear regression uses more than one explanatory variable in the least squares
fit.
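As a sketch of what the least squares criterion means for a single explanatory variable, the fitted line can be written as follows. The symbols x_i, y_i, b_0 and b_1 are notation assumed here for illustration; they follow the b0, b1 naming used later in the report but are not defined in this section.

```latex
\min_{b_0,\,b_1} \sum_{i=1}^{n} \bigl(y_i - b_0 - b_1 x_i\bigr)^2,
\qquad
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

With a second explanatory variable, the same criterion is minimised over b_0, b_1 and b_2 together.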
CHAPTER 2 LITERATURE REVIEW
Estenson et al (1994) and Bennett and Flueck (1983) studied player compensation
in the game of baseball. Auction results showed that salaries matched
marginal revenue products and that the open auction showed the declining price anomaly found
to exist in real-world auctions. Similarly, Dobson and Goddard (1998) and Kahn (1992)
considered the compensation of players in football.
Jones and Walsh (1988) made similar studies in ice hockey and concluded that skills are the
principal determinant of salaries at all positions. Berri (1999) addresses the question of measuring
the productivity of an individual participating in a team sport by linking players' statistics in
the National Basketball Association (NBA) to team wins. An economic model is employed to
measure each player's marginal product. Such a study is useful in answering the question
offered in its title, or a broader list of questions, for both industry insiders and other interested
observers.
A few such studies deal with the game of cricket. Barr and Kantor (2004)
set out to determine the important skill set for a batsman in one-day cricket. The batting
average statistic has traditionally been used to assess the worth of a batsman. However, in the
one-day game, limits on the number of balls bowled have introduced a very important additional
dimension to performance. Assessing batting performance in the one-day game requires at
least a two-dimensional measurement approach because of the time dimension imposed on
limited-overs cricket. They used a new graphical representation with strike rate on one axis
and the probability of getting out on the other, akin to the risk-return framework used in portfolio
analysis, to obtain useful, direct and comparative insights into batting performance, particularly
in the context of the one-day game. However, we have not come across any study that links
compensation to player attributes.
Rosen (1974) based his model of product differentiation on the hypothesis that goods are valued
for their utility-generating attributes. According to him, while making a purchase decision,
consumers evaluate product quality attributes and pay the sum of implicit prices for each quality
attribute, which is reflected in the observed market price. Hence, the price of a product is
the sum of the prices of all its quality attributes.
Shapiro (1983) presented a theoretical framework to examine the halo effect on prices.
Developing an equilibrium price-quality schedule for high-quality products, under the
assumption of competitive markets and imperfect information, he showed that reputation
facilitates a price premium; hence, reputation building can be considered an investment good.
Weemaes and Riethmuller (2001) studied the role of quality attributes in preferences for fruit
juices. The study involved market valuation of various attributes of fruit juice. It did not consider
consumers' preferences, but generated quality attributes from the product label. The study
revealed that consumers paid a premium for nutrition, convenience, and information. In a similar
study on tea, Deodhar and Intodia (2004) showed that colour and aroma were the two important
attributes of a prepared tea.
Extending the analogy to cricket, a cricket player is valued for his on-the-field (and perhaps
off-the-field) performance. We propose that a cricket player sells his cricketing skills for the
IPL tournament. The franchisee team owners bid for the players' services because they would like to
maximize their utility, and player performance is an important argument of their utility function.
In equilibrium, the final bid price of a player must be a function of the valuation of the winning
attributes of that player.
CHAPTER 3 RESEARCH METHODOLOGY
3.1 WINNING PERCENTAGE USING PIE CHART
3.1.1 OBJECTIVE
To give a clear representation of the ODI matches between India and Pakistan that have ended in
a result. We consider all the matches played so far, which gives a population of
117 matches, excluding four matches that ended in no result.
3.1.2 POPULATION
Table 3.1 Population Data
Total Matches    Won by India    Won by Pakistan
117              48              69
3.1.3 PIE CHART
A pie chart is the best way to represent the above data.
Figure 3.1 Pie Chart for Winning Percentage
[Pie chart: India 41%, Pakistan 59%; Total Matches: 117]
3.1.4 INFERENCES
The pie chart shows that, of the 117 matches, Pakistan won the larger share,
with a win percentage of 59%, while India won 41%.
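The percentages in Figure 3.1 follow directly from Table 3.1. A minimal sketch of that arithmetic, assuming only the win counts from the table (the variable names are illustrative):

```python
# Win counts taken from Table 3.1 (decided India-Pakistan ODIs).
wins = {"India": 48, "Pakistan": 69}
total = sum(wins.values())

# Share of decided matches won by each side, as a percentage.
shares = {team: 100 * count / total for team, count in wins.items()}

for team, pct in shares.items():
    print(f"{team}: {pct:.0f}% of {total} matches")
```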
3.2 CAPTAINCY RECORD CALCULATION USING BAR CHART
3.2.1 OBJECTIVE
The objective is to find the best way to represent the captaincy record of MS Dhoni. The
matches won or lost by India under the captaincy of Mahendra Singh Dhoni are
taken as the population, and the chart is drawn for the same.
3.2.2 POPULATION
Table 3.2 India's Winning Record under MS Dhoni
Total Matches    Won    Lost    Tied    No result
117              80     32      2       3
A bar chart is the best tool to represent the above data.
Table 3.3 MS Dhoni's Captaincy Record
[Bar chart: Won 80, Lost 32, Tied 2, No Result 3]
3.2.3 INFERENCES
From the bar chart we can see that, under the captaincy of Mahendra Singh Dhoni, India
played a total of 117 matches, of which it won 80, lost 32 and tied 2.
The remaining 3 matches ended in no result.
3.3 ACHIEVABLE SCORE AT THE END OF 50 OVERS
In cricket, the chasing team wins when it exceeds the score set by the
opponent; for a successful chase, the team batting second must score more
than the team batting first. Thus, we set out to estimate the runs that the Indian team could
score while chasing a target against Pakistan.
3.3.1 POPULATION AND SAMPLING
The data of the Indian team while chasing against Pakistan was taken as the population.
3.3.2 TECHNIQUE EMPLOYED
Estimation of a single population mean was applied to get the intended result. We were able to
estimate the mean at a confidence level of 90%.
3.4 DIFFERENCE IN IMPACT OF TOSS BETWEEN DAY AND DAY-NIGHT MATCHES
We set out to study how the impact of the toss differs between day and day-night matches played
between India and Pakistan.
3.4.1 POPULATION AND SAMPLING
We used the data of matches played between India and Pakistan as the population. From this
population, we drew random samples of size 38 from each of
the day and day-night match populations.
3.4.2 TECHNIQUE EMPLOYED
Hypothesis testing for two populations was applied to study the difference between the two
population means.
3.5 VALUATION OF PLAYERS IN IPL
Next, our objective was to find whether the valuation of players in the IPL matches their
skills, or whether they are over- or undervalued for their skill.
3.5.1 REGRESSION
We developed a regression model to find the relationship between players' compensation
and their skills. We chose a sample consisting of 8 batsmen and 8 bowlers and developed
the regression.
CHAPTER 4 STATISTICAL ANALYSIS AND INTERPRETATION
4.1 ESTIMATION OF SINGLE POPULATION
4.1.1 SET NULL AND ALTERNATE HYPOTHESIS
In this step, we try to predict whether India will be able to successfully chase a
total of 245 runs in 50 overs. From the data given, we estimate the single
population mean with an assumed standard deviation of 53.
4.1.2 DETERMINE APPROPRIATE STATISTICAL TEST
As the sample size (64) is greater than 30, we use the z-test for a single
population mean. We calculate the estimate using the formula
$\bar{x} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$
4.1.3 LEVEL OF SIGNIFICANCE
Alpha = 0.10
4.1.4 SET THE DECISION RULE
For alpha = 0.10, the two-tailed critical value of z from the z distribution table is ±1.645. The
null hypothesis will be rejected if the computed value of z falls outside ±1.645.
4.1.5 COLLECTION OF DATA
Sample size (Runs): 64
Standard Deviation: 52.71
Mean of Sample: 243.68
4.1.6 ANALYZE THE DATA
Z: -0.95
P: 0.340
90% CI: (232.78, 254.58)
SE Mean: 6.63
4.1.7 STATISTICAL CONCLUSION AND BUSINESS IMPLICATION
At the 90% confidence level, India's mean score while chasing against Pakistan lies in the
range (232.78, 254.58). Since the total of 245 falls within this range, it is an achievable
target for India in 50 overs.
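The figures in 4.1.6 can be reproduced from the data in 4.1.5 with a short standard-library sketch. It assumes the standard error uses the assumed standard deviation of 53 (which gives the reported SE of 6.63) and a two-sided test. The reported Z of -0.95 and P of 0.340 are reproduced with a hypothesized mean of 250, not 245; that value is an inference from the output and is not stated in the report.

```python
import math
from statistics import NormalDist

n = 64            # sample size reported in 4.1.5
mean = 243.68     # sample mean reported in 4.1.5
sigma = 53        # assumed population standard deviation (4.1.1)
alpha = 0.10      # level of significance (4.1.3)

se = sigma / math.sqrt(n)                      # standard error of the mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value, about 1.645
ci = (mean - z_crit * se, mean + z_crit * se)  # 90% confidence interval for the mean


def z_and_p(mu0):
    """Two-sided z test of the sample mean against a hypothesized mean mu0."""
    z = (mean - mu0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


print(f"SE mean = {se:.3f}")
print(f"90% CI  = ({ci[0]:.2f}, {ci[1]:.2f})")
print("mu0 = 245 (target in the text):", z_and_p(245))
print("mu0 = 250 (assumed, matches reported Z and P):", z_and_p(250))
```

Either hypothesized mean gives |z| below 1.645, so the decision rule in 4.1.4 does not reject the null hypothesis in both cases.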
Table 4.1 Distribution Plot
4.2 HYPOTHESIS TESTING FOR TWO POPULATION
4.2.1 SET NULL AND ALTERNATE HYPOTHESIS
Null Hypothesis: $H_0: \mu_1 - \mu_2 = 0$ (No significant difference in runs scored)
Alternate Hypothesis: $H_1: \mu_1 - \mu_2 \neq 0$
4.2.2 DETERMINE APPROPRIATE STATISTICAL TEST
As both samples are larger than 30 and independent, and the
population variances are unknown, we use the z-test for the difference between two population means.
4.2.3 LEVEL OF SIGNIFICANCE
Alpha = 0.10
4.2.4 SET THE DECISION RULE
For alpha = 0.10, the two-tailed critical value of z from the z distribution table is ±1.645. The
null hypothesis will be rejected if the computed value of z falls outside ±1.645.
4.2.5 COLLECTION OF DATA
Sample size 1: 38
Sample size 2: 38
Variance of sample 1: 2370.775
Variance of sample 2: 3119.37
Mean of sample 1: 7.539473684
Mean of sample 2: 5.039473684
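The two-sample z statistic for the figures above can be sketched as follows. This assumes the usual large-sample form, in which the sample variances stand in for the unknown population variances; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Sample summaries as listed in 4.2.5.
n1, n2 = 38, 38
var1, var2 = 2370.775, 3119.37
mean1, mean2 = 7.539473684, 5.039473684
alpha = 0.10

# Standard error of the difference between two independent sample means.
se_diff = math.sqrt(var1 / n1 + var2 / n2)
z = (mean1 - mean2) / se_diff

# Decision rule from 4.2.4: reject the null hypothesis if |z| exceeds the critical value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
reject_null = abs(z) > z_crit

print(f"z = {z:.6f}, critical value = {z_crit:.3f}, reject H0: {reject_null}")
```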
4.2.6 ANALYZE THE DATA
Z: 0.207988776
P (Z
The data sources include the official website of the IPL and two other websites, Cricinfo and
Wikipedia. For the sake of convenience, we have considered only 8 Indian players in each
category, i.e. bowlers and batsmen. We have taken the final bidding price as the
dependent variable; a wealth of data is available on the cricketing attributes of IPL players
hypothesized above. We have data on the individual performances of these 16 players
across all the IPL seasons held to date.
1) Batsmen: For the multiple regression analysis we have taken 2 important independent
variables that are the prime determinants of a player's performance in the long
run. The two variables are the total runs scored and the batting average.
2) Bowlers: For the multiple regression analysis of bowlers we have also taken 2 important
independent variables that are the prime determinants of a player's performance
in the long run. These are the wickets taken and the strike rate.
The relevant variables are drawn from observations on skills that are considered important for
the Twenty20 form of the game. For example, in this shorter version of the game, no one is likely to
make centuries frequently. However, a player who contributes many runs consistently and
has a high batting average would be an asset for the team. While the IPL is a batsman's game, a
wicket-taking bowler can put a lot of pressure on the opposition, and hence he would be
considered quite useful.
To paraphrase, the estimated variable coefficients should have the right signs and be
statistically significant, the equation should have a reasonably high (adjusted) R-square and
maintain parsimony, and there should be sufficient degrees of freedom. Based on such guidelines,
the variables chosen for estimating equation (1) and their descriptions are reported in Table 4.2.
We considered which combination offered the best goodness of fit in terms of R-square,
adjusted R-square, correct signs of the coefficients, t-statistics, and F-statistics. The exact
specification of the regression is given below in Equation (2).
P(Batsman) = b0 + b1(Runs) + b2(Average)
P(Bowler) = b0 + b1(Wickets) + b2(Strike Rate)
Table 4.2 Description of Variables
Variable       Description
P              Final bid price of a player.
Runs           Total runs scored over a span of 5 IPLs.
Average        Average runs scored in the same period.
Wickets        Total number of wickets taken by a bowler in 5 IPLs.
Strike Rate    Strike rate, i.e. balls per wicket.
Table 4.3 Description of statistics of bowlers
Name of the bowler    Wickets    Strike rate    Amount (in US Dollars)
Harbhajan Singh       54         18.88          1300000
Ishant Sharma         36         29.33          450000
Munaf Patel           70         21.22          700000
Pragyan Ojha          69         25.63          500000
Praveen Kumar         53         22.35          800000
R. Ashwin             49         20.66          850000
R.P. Singh            74         19.78          500000
Zaheer Khan           65         19.22          900000
4.4 REGRESSION ANALYSIS
4.4.1 AMOUNT VERSUS STRIKE RATE
The regression equation is
Amount = 1899656 - 51941 Strike Rate
Predictor      Coef       SE Coef    T        P
Constant       1899656    529535     3.59     0.012
Strike Rate    -51941     23649      -2.20    0.070
S = 226442   R-Sq = 44.6%   R-Sq(adj) = 35.3%
4.4.2 ANALYSIS OF VARIANCESource DF SS MS F P
Regression 1 2.47345E+11 2.47345E+11 4.82 0.070
Residual Error 6 3.07655E+11 51275908679
Total 7 5.55000E+11
Durbin-Watson statistic = 1.42023
dL = 0.76, dU = 1.33, 4 - dU = 2.67, 4 - dL = 3.24
Since the statistic lies between dU and 4 - dU, there is no evidence of autocorrelation.
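The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below recomputes it for the strike-rate regression, assuming the observations are taken in the row order of Table 4.3 (the statistic depends on that order). The function name `durbin_watson` is hypothetical, and the bounds 0.76 and 1.33 are the ones quoted in the report.

```python
# Bowler data copied from Table 4.3, in the order the rows are listed.
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]


def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Fit Amount = b0 + b1 * Strike Rate by ordinary least squares.
n = len(strike_rate)
x_bar = sum(strike_rate) / n
y_bar = sum(amount) / n
s_xx = sum((x - x_bar) ** 2 for x in strike_rate)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(strike_rate, amount))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(strike_rate, amount)]
d = durbin_watson(residuals)

# Bounds quoted in the report for this model.
d_lower, d_upper = 0.76, 1.33
no_autocorrelation = d_upper < d < 4 - d_upper
print(f"d = {d:.5f}, no evidence of autocorrelation: {no_autocorrelation}")
```

The test was designed for observations with a natural ordering such as time, so for a list of players the result should be read with some caution.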
4.4.3 AMOUNT VERSUS WICKETS, STRIKE RATE
The regression equation is
Amount = 3190691 - 13138 Wickets - 75398 Strike Rate
Predictor Coef SE Coef T P
Constant 3190691 716164 4.46 0.007
Wickets -13138 5955 -2.21 0.078
Strike Rate -75398 21286 -3.54 0.017
S = 176571 R-Sq = 71.9% R-Sq(adj) = 60.7%
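With two predictors, the least-squares coefficients can be obtained by solving the two normal equations written in terms of deviations from the means. The following sketch reproduces the coefficients above from Table 4.3; the helper names `cross` and `two_predictor_ols` are illustrative assumptions, not part of the report.

```python
# Bowler data copied from Table 4.3.
wickets = [54, 36, 70, 69, 53, 49, 74, 65]
strike_rate = [18.88, 29.33, 21.22, 25.63, 22.35, 20.66, 19.78, 19.22]
amount = [1300000, 450000, 700000, 500000, 800000, 850000, 500000, 900000]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of products of deviations of u and v from their means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))


def two_predictor_ols(x1, x2, y):
    """Solve the 2x2 normal equations for y = b0 + b1*x1 + b2*x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


b0, b1, b2 = two_predictor_ols(wickets, strike_rate, amount)
print(f"Amount = {b0:.0f} + ({b1:.0f}) Wickets + ({b2:.0f}) Strike Rate")
```

Note that the coefficient on wickets comes out negative, which is not the sign one would expect for a performance measure.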
4.4.4 ANALYSIS OF VARIANCE
Source DF SS MS F P
Regression 2 3.99114E+11 1.99557E+11 6.40 0.042
Residual Error 5 1.55886E+11 31177190382
Total 7 5.55000E+11
Source DF Seq SS
Wickets 1 7940674349
Strike Rate 1 3.91173E+11
Durbin-Watson statistic = 1.39505
Hence there is no autocorrelation.
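R-Sq, adjusted R-Sq and the F statistic all follow from the sums of squares in the analysis of variance table. The sketch below recomputes them for the two-predictor bowler model, using the sums of squares printed above; the function name `anova_summary` is hypothetical.

```python
def anova_summary(ss_regression, ss_error, n, k):
    """Return (r_squared, adjusted_r_squared, f_statistic).

    n is the number of observations, k the number of predictors.
    """
    ss_total = ss_regression + ss_error
    df_error = n - k - 1
    r_squared = ss_regression / ss_total
    # Adjusted R-square penalises each extra predictor through the degrees of freedom.
    adjusted = 1 - (ss_error / df_error) / (ss_total / (n - 1))
    f_statistic = (ss_regression / k) / (ss_error / df_error)
    return r_squared, adjusted, f_statistic


# Sums of squares from the analysis of variance table for Amount versus
# Wickets and Strike Rate: 8 bowlers, 2 predictors.
r2, adj_r2, f_stat = anova_summary(3.99114e11, 1.55886e11, n=8, k=2)
print(f"R-Sq = {100 * r2:.1f}%, R-Sq(adj) = {100 * adj_r2:.1f}%, F = {f_stat:.2f}")
```

The gap between R-Sq and adjusted R-Sq is large here because only five error degrees of freedom remain after fitting two predictors to eight observations.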
Figure 4.1 Residual Plots for Bowlers
[Residual plots for Amount, four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order)]
4.5 DESCRIPTION OF STATISTICS OF BATSMAN
4.5.1 AMOUNT (IN US DOLLARS) VERSUS RUNS
Table 4.4 Batsman Statistics
Name of the Batsman    Runs    Average    Amount (in US Dollars)
M.S.Dhoni              1782    37.12      1300000
S.K.Raina              2254    33.64      1800000
V.Sehwag               1879    30.3       1800000
SR Tendulkar           2047    37.9       2400000
V.Kohli                1639    28.25      500000
R.G.Sharma             1975    31.35      2000000
G.Gambhir              2065    33.31      1800000
R.Dravid               1703    27.91      1800000
The regression equation is
Amount (in US Dollars) = - 1663273 + 1740 Runs
Predictor Coef SE Coef T P
Constant -1663273 1649219 -1.01 0.352
Runs 1740.5 855.5 2.03 0.088
S = 467406 R-Sq = 40.8% R-Sq(adj) = 31.0%
4.5.2 ANALYSIS OF VARIANCE
Source DF SS MS F P
Regression 1 9.04188E+11 9.04188E+11 4.14 0.088
Residual Error 6 1.31081E+12 2.18469E+11
Total 7 2.21500E+12
Durbin-Watson statistic = 2.59499
Thus there is no autocorrelation
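The SE Coef and T columns for Runs can be derived from Table 4.4: the standard error of the slope is the square root of the residual mean square divided by the sum of squared deviations of Runs, and T is the slope divided by that standard error. A minimal sketch, with the hypothetical function name `slope_t_statistic`:

```python
import math

# Batsman data copied from Table 4.4 (amount in US dollars).
runs = [1782, 2254, 1879, 2047, 1639, 1975, 2065, 1703]
amount = [1300000, 1800000, 1800000, 2400000, 500000, 2000000, 1800000, 1800000]


def slope_t_statistic(x, y):
    """Return (slope, standard_error, t) for the simple regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares and its mean square on n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    standard_error = math.sqrt(mse / s_xx)
    return slope, standard_error, slope / standard_error


slope, se, t = slope_t_statistic(runs, amount)
print(f"Runs coefficient = {slope:.1f}, SE = {se:.1f}, T = {t:.2f}")
```

In a simple regression the square of this T value equals the F statistic in the analysis of variance table, which is why both carry the same P value of 0.088.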
4.5.3 AMOUNT (IN US DOLLARS) VERSUS RUNS, AVERAGE
The regression equation is
Amount (in US Dollars) = - 1946333 + 1562 Runs + 19235 Average
Predictor Coef SE Coef T P
Constant -1946333 1992016 -0.98 0.373
Runs 1562 1080 1.45 0.207
Average 19235 59656 0.32 0.760
S = 506777 R-Sq = 42.0% R-Sq(adj) = 18.8%
4.5.4 ANALYSIS OF VARIANCE
Source DF SS MS F P
Regression 2 9.30888E+11 4.65444E+11 1.81 0.256
Residual Error 5 1.28411E+12 2.56822E+11
Total 7 2.21500E+12
Source DF Seq SS
Runs 1 9.04188E+11
Average 1 26699560534
Durbin-Watson statistic = 2.39651
Thus there is no autocorrelation.
Figure 4.2 Residual Plots for Amount
[Residual plots for Amount (in US Dollars), four panels: Normal Probability Plot (Percent vs Residual), Versus Fits (Residual vs Fitted Value), Histogram (Frequency vs Residual), Versus Order (Residual vs Observation Order)]
CHAPTER 5 DISCUSSIONS
5.1 BOWLERS
In the simple linear regression, strike rate alone gives a coefficient of determination of only
44.6%. In other words, only around 44.6% of the variation in the amount paid to a bowler in the
IPL is explained by his strike rate; the rest is unexplained. The multiple regression model on
wickets and strike rate raises the coefficient of determination to 71.9% (60.7% adjusted), which
still leaves a sizeable share of the variation unexplained with only eight observations. The
overall F-test of this model gives a p-value of 0.042, which lies in the rejection region at the
5% level of significance.
H0: The key performance indicators (wickets and strike rate) have no linear relationship with
the amount paid to bowlers in IPL.
H1: At least one of the key performance indicators is linearly related to the amount paid to the
bowlers.
Since the null hypothesis is rejected, the performance indicators do have some bearing on the
bid price. However, the coefficient on wickets is negative, contrary to expectation, and the
unexplained variation indicates that performance alone does not determine the bid price. A
variety of other factors, listed in Section 5.2.1, are likely to contribute to the very high
amounts of money paid to bowlers.
5.2 BATSMEN
The coefficient of determination for runs is quite low at 40.8%: only 40.8% of the variation in
the amount paid to a batsman in IPL is explained by the runs he scored in the T-20 format, and
the rest is unexplained. Adding the batting average raises the coefficient of determination only
to 42.0%, while the adjusted R-square falls from 31.0% to 18.8%. The overall F-test of the
multiple regression model gives a p-value of 0.256, which lies outside the rejection region at
any conventional level of significance.
H0: The key performance indicators (runs and average) have no linear relationship with the
amount paid to batsmen in IPL.
H1: At least one of the key performance indicators is linearly related to the amount paid to the
batsmen.
Thus the null hypothesis cannot be rejected, driving home the point that the performance factors
are not the key determinants and that various other factors may be responsible for the amount of
money being so high.
These high premiums, over and above the compensation for their cricketing attributes, seem to
be a reflection of the players' ability to draw huge crowds nationally due to their charismatic
association with film stars, the racial controversies surrounding them, etc.
5.2.1 REASONS FOR NON-EXPLANATION
Some of the reasons which may account for non-explanation of the relation might be as follows:
1. Iconic Value.
2. Glamour.
3. Controversy.
4. Age.
5. Popularity.
CHAPTER 6 CONCLUSION
6.1 LIMITATIONS
The limitations of our study are:
1. Pie charts and bar charts were appropriate for representing the statistics of earlier
India-Pakistan matches, but predictions about the current match cannot be made reliably from
them because of the inherent unpredictability in the game of cricket.
2. While estimating the achievable target score with a ninety percent confidence interval, we
take into account only the matches already played between the two teams, without considering
other factors such as the difference in the set of players between those games and the current
match, the current form of the individual players, the nature of the pitch, weather conditions
etc. This might result in an incorrect range estimate.
3. In determining the difference in the impact of the toss, we calculate the net run-rate
difference between the teams batting first and second, and arrive at two populations, one each
for the Day and Day-night matches. But the net run-rate difference is calculated across the
maximum overs for all the matches, so the event of teams chasing down targets easily without
losing wickets is not captured by our populations.
4. The regression model for determining the price of an IPL player is based only on his
statistical attributes, whereas a player also has many intangible attributes. For example, a
player's brand value, image and relevance to the franchise are all taken into account while
determining his price. These aspects are completely ignored in our regression model, which
probably explains the low correlation between the independent variables and the price of the
player.
6.2 FUTURE SCOPE
Single population estimation could be used to estimate, with a stated level of confidence, the
likely scores of teams or players based on their previous performances. This could help teams
formulate strategies against their opponents.
A regression model could be applied to fix a player's compensation based on his skill set. This
could help a team franchise fix a ceiling price for each player before going in for the auction,
so that it can spend its money accordingly and achieve the maximum return on it.
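As a rough illustration of the ceiling-price idea, the fitted bowler equation from Section 4.4.3 can be evaluated for a given player. The sketch below uses Harbhajan Singh's figures from Table 4.3 as input; the function name `predicted_bowler_price` is hypothetical, and with only eight observations behind the equation the result is an indication rather than a reliable valuation.

```python
def predicted_bowler_price(wickets, strike_rate):
    """Evaluate the report's fitted equation from Section 4.4.3:
    Amount = 3190691 - 13138 Wickets - 75398 Strike Rate (US dollars).
    """
    return 3190691 - 13138 * wickets - 75398 * strike_rate


# Input taken from Table 4.3: Harbhajan Singh, 54 wickets, strike rate 18.88.
ceiling = predicted_bowler_price(wickets=54, strike_rate=18.88)
actual = 1300000  # amount actually paid, from Table 4.3

print(f"Model price: {ceiling:.0f} USD, actual bid: {actual} USD")
print(f"Premium over model price: {actual - ceiling:.0f} USD")
```

The difference between the actual bid and the model price is one way to express the premium a franchise paid over and above the measured cricketing attributes.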
REFERENCES
Armstrong, J and Willis, R J (1993). Scheduling the Cricket World Cup: A Case Study, The
Journal of the Operational Research Society, 44(11), 1067-1072.
Barr, G D I and Kantor, B S (2004). A Criterion for Comparing and Selecting Batsmen in
Limited Overs Cricket, Journal of the Operational Research Society, 55(12), 1266-1274.
Bennett, J M and Flueck, J A (1983). An Evaluation of Major League Baseball Offensive
Performance Models, The American Statistician, 37(1), 76-82.
Berri, D J (1999). Who Is Most Valuable? Measuring the Player's Production of Wins in the
National Basketball Association, Managerial and Decision Economics, 20(8), 411-427.
Cricinfo. http://www.cricinfo.com/, as on September 13, 2012.
Estenson, P S (1994). Salary Determination in Major League Baseball: A Classroom Exercise,
Managerial and Decision Economics, 15(5), 537-541.
Jones, J C H and Walsh, W D (1988). Salary Determination in the National Hockey League:
The Effects of Skills, Franchise Characteristics, and Discrimination, Industrial and
Labor Relations Review, 41(4), 592-604.
Rastogi, S K (2009). Player Pricing and Valuation of Cricketing Attributes: Exploring the IPL
Twenty20 Vision, Vikalpa, 34 (April-June), 15-23.