Vivek Sports
8/13/2019 Vivek Sports
1/13
Fundamentals of Data Analytics Assignment
Submitted To: Assistant Professor J. Balaji
By: Vivek Narayanan, PGDM Number: 13061
1/9/13
Statistics In Sports
Chapter 1: Introduction to Descriptive Statistics
Scales of Measurement
The four generally used scales of measurement are nominal, ordinal, interval and ratio.
The nominal scale is used for data that identify an attribute or category. The values can be expressed using either a numeric code or a nonnumeric label, and they carry no order.
The ordinal scale is used when the data are classified by a meaningful order or rank, although the gaps between ranks need not be equal.
The interval scale is used for numeric data measured in fixed, equal units, where differences are meaningful but the zero point is arbitrary.
Finally, the ratio scale is an interval scale with a true zero, so ratios between values are meaningful.
Let us take an example related to sports and explain the above scales.
The table below lists the football statistics of 4 teams in World Cup history.
Team Name   Ranking   World Cups Won   Goals Scored in World Cup
Italy       6         4                70
Germany     2         3                65
Spain       1         1                63
England     14        1                55
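Here the team names (Italy, Germany, Spain and England) are on the nominal scale. The rankings alongside these names are on the ordinal scale. The number of World Cups won and the number of goals scored are on the ratio scale: zero means none, so the ratio of one team's goals to another's is meaningful (Italy has scored 70/55, or about 1.27 times as many goals as England).
None of the columns above is on the interval scale. An example would be the calendar year in which a World Cup was played: the difference between two years is meaningful, but the ratio of two years is not. Counting the goals scored in the periods 0-20 mins, 20-40 mins, 40-60 mins and 60-90 mins groups the data into class intervals, but the counts themselves remain ratio data.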
Percentiles and Quartiles
A percentile is the value below which a certain percentage of a data set falls. Percentiles are used to observe how much of a given data set falls within a certain percentage range.
Let us designate a percentile as Pm, where m represents the percentile we are finding; for the tenth percentile, for example, m would be 10. Let N be the total number of elements in the data set.
The term quartile is derived from the word quarter, which means one fourth of something. Thus a quartile marks off a certain fourth of a data set. When you arrange a data set in increasing order from the lowest to the highest and then divide it into four equal groups, the three values that separate the groups are the quartiles.
The data below represents the highest earnings of footballers in 2013.
Name of the Player   Amount in Million Pounds
Lionel Messi         30
C. Ronaldo           25.7
Samuel Eto'o         20.5
Neymar               17.1
Wayne Rooney         15.4
Let us find the 40th percentile and the 3 quartiles of the world's top 5 earnings of footballers.
40th percentile = 19.14 million pounds
First Quartile = 17.1 million pounds
Second Quartile = 20.5 million pounds
Third Quartile = 25.7 million pounds
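The percentile and quartile values above can be reproduced with a short sketch. It assumes linear interpolation between the ranks of the sorted data (the rule used by common spreadsheet percentile functions); the function name `percentile` is hypothetical, and other percentile conventions would give slightly different answers.

```python
def percentile(values, m):
    """Return the m-th percentile using linear interpolation between ranks."""
    ordered = sorted(values)
    # Fractional zero-based rank of the percentile within the sorted data
    # (assumed convention: rank = m/100 * (N - 1)).
    rank = (m / 100) * (len(ordered) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


earnings = [30, 25.7, 20.5, 17.1, 15.4]  # million pounds, from the table

p40 = percentile(earnings, 40)
q1, q2, q3 = (percentile(earnings, m) for m in (25, 50, 75))
print(round(p40, 2), q1, q2, q3)
```

With five values, the 25th, 50th and 75th percentiles land exactly on the 2nd, 3rd and 4th sorted values, which is why the quartiles coincide with entries in the table.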
Measures of Central Tendency, Variability, Skewness and Kurtosis
Measures of central tendency include mean, median and mode.
We can use the same table above and determine the mean, median and mode.
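The Mean for the given dataset is 21.74 million pounds.
The Median is 20.5 million pounds.
The data has no Mode, because every value occurs exactly once.
The Standard Deviation of the data is 6.065723 million pounds (sample standard deviation).
The Skewness of the data set is 0.502247. The positive value means the distribution has a longer tail on the high side: the top earnings sit further above the mean than the lowest earnings sit below it.
The Kurtosis is -1.55252 (excess kurtosis), which indicates lighter tails than a normal distribution.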
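The summary figures for the earnings data can be checked with the sketch below. It assumes the sample standard deviation (n - 1 divisor) and the bias-adjusted sample formulas for skewness and excess kurtosis that spreadsheet functions commonly use; under those assumptions it reproduces the values quoted above.

```python
import statistics

earnings = [30, 25.7, 20.5, 17.1, 15.4]  # million pounds
n = len(earnings)

mean = statistics.mean(earnings)
median = statistics.median(earnings)
stdev = statistics.stdev(earnings)  # sample standard deviation (n - 1 divisor)

# Standardised deviations from the mean.
z = [(x - mean) / stdev for x in earnings]

# Bias-adjusted sample skewness and excess kurtosis (assumed formulas).
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# A mode needs a repeated value; here every value is unique.
has_mode = len(set(earnings)) < n

print(round(mean, 2), median, round(stdev, 6), round(skewness, 6), round(kurtosis, 5))
```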
Histogram and Frequency Polygon
The number of pins down in a game of bowling is given in the table below:
Pins Down Frequency
0 2
1 1
2 2
3 0
4 2
5 4
6 9
7 11
8 13
9 8
10 8
The Histogram for the above data is displayed below:
[Figure: Histogram of frequency against pins down]
The Frequency Polygon for the same data can be represented as follows:
[Figure: Frequency polygon of frequency against pins down]
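As a rough illustration, the bowling table can be turned into a text histogram and into the points of a frequency polygon. This is only a sketch: the variable names are hypothetical, and a plotting library would normally be used to draw the actual charts.

```python
pins_down = list(range(11))                       # classes: 0 to 10 pins
frequency = [2, 1, 2, 0, 2, 4, 9, 11, 13, 8, 8]   # from the bowling table

# Histogram: one bar of '#' characters per class.
for pins, count in zip(pins_down, frequency):
    print(f"{pins:>2} | {'#' * count} {count}")

# Frequency polygon: the (x, y) points that would be joined by straight lines.
polygon_points = list(zip(pins_down, frequency))

total_observations = sum(frequency)
modal_pins = pins_down[frequency.index(max(frequency))]
print(total_observations, modal_pins)
```

The tallest bar marks the modal class, and the polygon joins the tops of the bars at each class value.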
Methods of displaying data
Data can be displayed in the form of pie-charts, bar char ts, frequency polygonand
ogive.
The data below represents the highest earnings of footballers in 2013.
Name of the Player   Amount in Million Pounds
Lionel Messi         30
C. Ronaldo           25.7
Samuel Eto'o         20.5
Neymar               17.1
Wayne Rooney         15.4
[Figure: Bar chart representing players and their earnings; salary in million pounds for Messi, Ronaldo, Eto'o, Neymar and Rooney]
The following data shows the world's most popular sports in percentage.
Sport      Popularity Percentage
Football   51%
Cricket    28%
Tennis     15%
Others     6%
The sport popularity data above can be represented in the form of a pie chart.
Sachin Tendulkar's scores in the last few matches can be seen in the table below.
Interval Of Runs   Frequency   Cumulative Frequency
10 < n < 20        5           5
20 < n < 30        7           12
30
Exploratory Data Analysis
Stem and leaf displays
Exploratory data analysis (EDA) is an approach to analysing data sets to summarize their
main characteristics, often with visual methods. Here we show you the Stem and leaf displays
and the box plot.
Example: The following are the scores made by a cricket player in a season.
23 53 4 24 55 73 34 64
45 30 75 121 116 56 78 39
We can represent the above data in the stem and leaf form as shown below:
Low outlier: 0 | 4
Stem | Leaf
   2 | 3 4
   3 | 0 4 9
   4 | 5
   5 | 3 5 6
   6 | 4
   7 | 3 5 8
High outliers: 11 | 6, 12 | 1
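A stem and leaf display can also be built programmatically. The sketch below assumes the stem is the score with its last digit removed and the leaf is that last digit; it lists every stem, including the outliers, without treating them separately.

```python
from collections import defaultdict

scores = [23, 53, 4, 24, 55, 73, 34, 64,
          45, 30, 75, 121, 116, 56, 78, 39]

# Group each score's last digit (leaf) under its leading digits (stem).
stem_leaf = defaultdict(list)
for score in sorted(scores):
    stem, leaf = divmod(score, 10)
    stem_leaf[stem].append(leaf)

for stem in sorted(stem_leaf):
    leaves = " ".join(str(leaf) for leaf in stem_leaf[stem])
    print(f"{stem:>2} | {leaves}")
```

Sorting the scores first keeps the leaves within each stem in increasing order, which makes the shape of the distribution easy to read.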
Another method of representing data is to summarize the data in a Box and Whisker Plot or
Box Plot. This method uses the smallest value, the largest value, the median and the upper
and lower quartile values. This is often referred to as a five-number summary.
The scores of a batsman are given below:
11 12 12 13 15 15 15 16 17 20 21 21
21 22 22 22 23 24 26 27 27 27 28 29
29 30 31 32 34 35 37 41 41 42 45 47
50 52 53 56 60 62
The Box Plot can be represented as shown below:
[Figure: Box plot]
Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
11              21            27       40            62
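The five values of the box plot can be computed as sketched below. The sketch assumes the quartiles are found by linear interpolation that includes both end points (`method="inclusive"` in Python's `statistics.quantiles`, available from Python 3.8), which reproduces the upper hinge of 40. Taking the median of the upper half of the data instead would give 41, so the choice of convention matters.

```python
import statistics

scores = [11, 12, 12, 13, 15, 15, 15, 16, 17, 20, 21, 21,
          21, 22, 22, 22, 23, 24, 26, 27, 27, 27, 28, 29,
          29, 30, 31, 32, 34, 35, 37, 41, 41, 42, 45, 47,
          50, 52, 53, 56, 60, 62]

# Quartiles by inclusive linear interpolation (assumed convention).
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")

five_number_summary = (min(scores), q1, median, q3, max(scores))
print(five_number_summary)
```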
Chapter 2: Probability
Probability of Events
In probability theory, an event is a set of outcomes of an experiment (a subset of the sample
space) to which a probability is assigned. A single outcome may be an element of many
different events, and different events in an experiment are usually not equally likely, since
they may include very different groups of outcomes.
Example: In a class of 36 learners in a boys' school, 20 play cricket, 26 play rugby and 4 do not play cricket or rugby.
If a learner is chosen at random, calculate the probability that he:
1. Plays rugby and cricket
2. Plays cricket only
3. Does not play cricket or rugby
4. Plays cricket or rugby
5. Does not play rugby
Answer: n(S) = 36
Event C = plays cricket
Event R = plays rugby
These events are not mutually exclusive.
Since 4 learners play neither sport, n(C U R) = 36 - 4 = 32, so n(R and C) = 20 + 26 - 32 = 14.
1. P(R and C) = n(R and C)/n(S)
= 14/36
Hence the probability that he plays rugby and cricket = 7/18
2. n(cricket only) = n(C) - n(R and C) = 20 - 14 = 6
P(cricket only) = n(cricket only)/n(S)
= 6/36
Probability that he plays cricket only = 1/6
3. Probability that he does not play cricket or rugby = 4/36 or 1/9
4. P(C U R) = P(C) + P(R) - P(C and R)
= 20/36 + 26/36 - 14/36
= 32/36
Probability that he plays cricket or rugby = 8/9
5. P(R') = 1 - P(R)
= 1 - 26/36
= 10/36
Probability that he does not play rugby = 5/18
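The five answers can be verified with exact fractions. This is a minimal sketch; the variable names are hypothetical, and the count of learners who play both sports is derived from the counts given in the example.

```python
from fractions import Fraction

n_total, n_cricket, n_rugby, n_neither = 36, 20, 26, 4

n_either = n_total - n_neither             # plays cricket or rugby
n_both = n_cricket + n_rugby - n_either    # inclusion-exclusion
n_cricket_only = n_cricket - n_both

p_both = Fraction(n_both, n_total)
p_cricket_only = Fraction(n_cricket_only, n_total)
p_neither = Fraction(n_neither, n_total)
p_either = Fraction(n_cricket, n_total) + Fraction(n_rugby, n_total) - p_both
p_not_rugby = 1 - Fraction(n_rugby, n_total)

print(p_both, p_cricket_only, p_neither, p_either, p_not_rugby)
```

Using `Fraction` keeps every result in lowest terms, so the output can be compared directly with the hand calculation.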
Mutually Exclusive Events
Two events are 'mutually exclusive' if they cannot occur at the same time. For mutually exclusive events A and B, the probability that either occurs is P(A U B) = P(A) + P(B).
Example - In a class there are 50 students; twenty students like playing cricket and ten students like playing football (assume that no student likes both). Find the probability that a randomly selected student likes playing cricket or football.
Answer: P(C U F) = P(C) + P(F)
= 20/50 + 10/50
= 30/50
= 3/5
Hence the probability that a randomly selected student likes playing cricket or football is 3/5 or 60%.
Conditional Probability
In probability theory, a conditional probability is the probability that an event will occur,
when another event is known to occur or to have occurred. If the events are A and B
respectively, this is said to be "the probability of A given B".
Example - At a middle school, 18% of all students play football and basketball and 32% of all
students play football. What is the probability that a student plays basketball given that the
student plays football?
Answer: P(Football and Basketball) = 18%
P(Football) = 32%
P(Basketball | Football) = P(F and B)/P(F) = 18/32 = 56%
Independence of Events
In probability theory, to say that two events are independent (alternatively statistically
independent, marginally independent or absolutely independent) means that the occurrence of
one does not affect the probability of the other.
Example - Russell is playing in a cricket match and a game of football at the weekend.
The probability that his team will win the cricket match is 0.7, and the probability of winning the football game is 0.9. What is the probability that his team will win both matches?
Answer: Assuming the two results are independent, the Multiplication Law gives:
P(win both matches) = P(win cricket AND win football) = P(win cricket) x P(win football)
= 0.7 x 0.9 = 0.63
Hence, the probability that his team will win both matches = 0.63
Bayes Theorem
In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) is
a result that is of importance in the mathematical manipulation of conditional probabilities. It
is a result that derives from the more basic axioms of probability.
Example - The cricket DRS system is claimed to be around 95% accurate in giving a batsman out if, in fact, the batsman is really out. Suppose the DRS also yields false positive (F+) results for just 1% of the bowler reviews, i.e. it gives a batsman 'out' when he is really 'not out', just as the umpire originally said. If 10% of the batsmen subject to bowler reviews are actually out, what is the probability that a batsman is actually out given that the DRS overturns the umpire's decision to say he is out?
Solution: Let OUT be the event that the batsman reviewed is actually out (its complementary event is NOTOUT), and RED the event that the DRS gave him out. The desired probability P(OUT|RED) is obtained using the Bayes formula:
P(OUT|RED) = P(OUT and RED)/P(RED)
Expanding the terms, we can write this as
= [P(RED|OUT) x P(OUT)] / [P(RED|OUT) x P(OUT) + P(RED|NOTOUT) x P(NOTOUT)]
= [0.95 x 0.1] / [0.95 x 0.1 + 0.01 x 0.9]
= 0.095/0.104 = 0.913, or about 91%
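The Bayes calculation can be written as a small reusable function. This is a sketch with hypothetical names; the three inputs are the figures stated in the example.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(hypothesis | positive result) by Bayes' theorem."""
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / evidence


p_out = 0.10              # P(OUT): batsman is actually out
p_red_given_out = 0.95    # P(RED|OUT): DRS says out when he is out
p_red_given_notout = 0.01 # P(RED|NOTOUT): false positive rate

p_out_given_red = posterior(p_out, p_red_given_out, p_red_given_notout)
print(round(p_out_given_red, 3))
```

Changing the prior shows how strongly the answer depends on it: the same DRS accuracy gives a lower posterior when fewer reviewed batsmen are actually out.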
Chapter 3: Random Variables
In probability and statistics, a random variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). As opposed to other mathematical variables, a random variable conceptually does not have a single, fixed value (even if unknown); rather, it can take on a set of possible different values, each with an associated probability.
Binomial Distribution
The Binomial probability formula is given by P(r) = nCr x p^r x q^(n-r), where n is the number of trials, r is the number of successes, p represents the probability of success and q = 1 - p represents the probability of failure.
Example - The probability that a batsman scores a century in a cricket match is 1/3. Find the probability that out of 4 matches, he scores a century
(1) in exactly 3 matches
(2) in exactly one of the matches
Solution: Here "success" is denoted by "scoring a century".
The given probability that a batsman scores a century in a cricket match is 1/3. That is, p = 1/3.
"Failure" is denoted by "not scoring a century". We know that
q = 1 - p = 1 - 1/3 = 2/3.
Total number of matches n = 4.
The Binomial probability formula is given by nCr x p^r x q^(n-r)
(1) We have to find the probability that he scores a century in exactly three matches.
That is, r = 3.
P(scoring a century in exactly 3 matches)
= 4C3 x (1/3)^3 x (2/3)^(4-3)
= 4C3 x (1/3)^3 x (2/3)^1
= 4 x (1/27) x (2/3)
= 8/81
P(scoring a century in exactly 3 matches) = 8/81
(2) We have to find the probability that he scores a century in exactly one of the matches.
That is, r = 1.
P(scoring a century in exactly one of the matches)
= 4C1 x (1/3)^1 x (2/3)^(4-1)
= 4C1 x (1/3)^1 x (2/3)^3
= 4 x (1/3) x (8/27)
= 32/81
P(scoring a century in exactly one of the matches) = 32/81
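The binomial formula is easy to evaluate exactly. The sketch below uses n = 4 matches and p = 1/3 as stated in the problem; the function name `binomial_pmf` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)


p_century = Fraction(1, 3)
n_matches = 4

p_exactly_three = binomial_pmf(n_matches, 3, p_century)
p_exactly_one = binomial_pmf(n_matches, 1, p_century)
print(p_exactly_three, p_exactly_one)

# The probabilities over all possible r must add up to 1.
total = sum(binomial_pmf(n_matches, r, p_century) for r in range(n_matches + 1))
```

Summing the probabilities over every possible number of successes is a quick sanity check that the value of n used in the formula matches the number of trials.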
Chapter 4: Normal Distribution
Transformation of Normal Random Variable
In probability theory, the normal (or Gaussian) distribution is a very commonly occurring continuous probability distribution: a function that gives the probability of a number in some context falling between any two real numbers.
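Example: The number of goals Manchester United score in a Barclays Premier League season is assumed to be normally distributed with a mean of 100 and a standard deviation of 15. Manchester United need 115 goals to create the record for the most goals in a single season.
Transforming X to the standard normal variable Z = (X - 100)/15, the value 115 corresponds to Z = 1.
Probability that Manchester United will score less than 115 goals is
P(X < 115) = P(Z < 1) = 0.8413
so the probability that they will score more than 115 goals is
P(X > 115) = 1 - 0.8413 = 0.1587
Probability that Manchester United will score between 70 and 120 goals is
P(70 < X < 120) = P(-2 < Z < 1.33) = 0.8860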
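These normal probabilities can be checked without tables. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the mean and standard deviation given in the example, and computes both the lower-tail and upper-tail probabilities at 115 goals.

```python
from statistics import NormalDist

goals = NormalDist(mu=100, sigma=15)

# Standardise 115 goals: Z = (X - mean) / standard deviation.
z_record = (115 - goals.mean) / goals.stdev

p_less_than_115 = goals.cdf(115)        # lower tail, P(X < 115)
p_more_than_115 = 1 - p_less_than_115   # upper tail, P(X > 115)
p_between_70_and_120 = goals.cdf(120) - goals.cdf(70)

print(z_record, round(p_less_than_115, 4),
      round(p_more_than_115, 4), round(p_between_70_and_120, 4))
```

The lower and upper tails at 115 goals add up to 1, so it is worth checking which of the two a quoted probability refers to.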