Vivek Sports


  • 8/13/2019 Vivek Sports


    Fundamentals of Data Analytics Assignment

    Submitted To Assistant Professor J. Balaji

    By Vivek Narayanan,

    PGDM Number: 13061

    1/9/13 Statistics In Sports


    Chapter 1: Introduction to Descriptive Statistics

    Scales of Measurement

    The four generally used scales of measurement are nominal, ordinal, interval and ratio.

    The nominal scale is used for data that identifies an attribute or category. It can be expressed using either a numeric code or a non-numeric label.

    The ordinal scale is used to classify information according to a meaningful order or rank.

    The interval scale is used for numeric data measured in fixed, equal units, so that differences between values are meaningful, but there is no true zero point.

    Finally, the ratio scale is used for numeric data that also has a true zero, so that ratios between values are meaningful.

    Let us take an example related to sports and explain the above scales.

    The table below lists the football statistics of 4 teams in World Cup history.

    Team Name   Ranking   World Cups Won   Goals Scored in World Cup
    Italy       6         4                70
    Germany     2         3                65
    Spain       1         1                63
    England     14        1                55

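    Here the team names (Italy, Spain, Germany and England) are on the nominal scale. The rankings listed with these names are on the ordinal scale. The ratio of the number of goals one team has scored to the number another team has scored, for example Italy's 70 to England's 55, illustrates the ratio scale.

    The goals can also be grouped by the time interval in which they were scored (0-20 min, 20-40 min, 40-60 min and 60-90 min). Note that goal counts and elapsed minutes both have a true zero, so a stricter example of the interval scale is a quantity such as temperature in degrees Celsius, where differences are meaningful but ratios are not.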


    Percentiles and Quartiles

    A percentile is the value below which a certain percentage of a data set falls. Percentiles are used to observe how much of a given data set falls within a certain percentage range.

    Let P_m denote the m-th percentile, where m is the percentile we are finding (for the tenth percentile, m would be 10), and let N be the total number of elements in the data set.

    The term quartile is derived from the word quarter, which means one fourth of something. Thus a quartile marks a certain fourth of a data set. When you arrange a data set in increasing order from the lowest to the highest and then divide it into four equal groups, the values at the boundaries between the groups are the quartiles.

    The data below represents the highest earnings of footballers in 2013.

    Name of the Player   Amount in Million Pounds
    Lionel Messi         30
    C. Ronaldo           25.7
    Samuel Eto'o         20.5
    Neymar               17.1
    Wayne Rooney         15.4

    Let us find the 40th percentile and the 3 quartiles of the world's top 5 footballer earnings.

    40th percentile = 19.14 million pounds
    First quartile = 17.1 million pounds
    Second quartile = 20.5 million pounds
    Third quartile = 25.7 million pounds
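The percentile and quartile values quoted above are consistent with linear interpolation between the sorted values (the rule used by common spreadsheet percentile functions). That the author used this rule is an assumption; the function name `percentile` below is illustrative only. A minimal sketch:

```python
def percentile(values, m):
    """Return the m-th percentile (0 <= m <= 100) by linear interpolation."""
    ordered = sorted(values)
    # Fractional 0-based position of the percentile within the sorted data.
    position = (len(ordered) - 1) * m / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


earnings = [30, 25.7, 20.5, 17.1, 15.4]  # million pounds, from the earnings table

p40 = percentile(earnings, 40)
q1, q2, q3 = (percentile(earnings, m) for m in (25, 50, 75))
print(round(p40, 2), q1, q2, q3)
```

With five values, the 40th percentile falls 60% of the way from the second smallest value (17.1) to the third (20.5), which is where 19.14 comes from.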

    Measures of Central Tendency, Variability, Skewness and Kurtosis

    Measures of central tendency include mean, median and mode.

    We can use the same earnings table and determine these measures.

    The mean for the given data set is 21.74 million pounds.
    The median is 20.5 million pounds.
    The data set has no mode, since every value occurs exactly once.
    The standard deviation of the data is 6.065723.
    The skewness of the data set is 0.502247. The positive value means the distribution has a longer right tail: the highest earnings pull the mean (21.74) above the median (20.5).
    The kurtosis is -1.55252. The negative value means the distribution is flatter, with lighter tails, than a normal distribution.
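The figures above can be reproduced with Python's standard library. This sketch assumes the sample (n - 1) standard deviation and the sample-adjusted skewness and excess-kurtosis formulas used by common spreadsheet functions; the source does not state which formulas were used, but these match the quoted values.

```python
import statistics

earnings = [30, 25.7, 20.5, 17.1, 15.4]  # million pounds
n = len(earnings)

mean = statistics.mean(earnings)
median = statistics.median(earnings)
modes = statistics.multimode(earnings)  # all five values: each occurs once
stdev = statistics.stdev(earnings)      # sample standard deviation (n - 1)

# Standardised deviations used by the adjusted skewness and kurtosis formulas.
z = [(x - mean) / stdev for x in earnings]
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(mean, median, stdev, skewness, kurtosis)
```

Because `multimode` returns every value when all frequencies are equal, a result with as many entries as the data set indicates that there is no single mode.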


    Histogram and Frequency Polygon

    The number of pins down in a game of bowling is given in the table below:

    Pins Down   Frequency
    0           2
    1           1
    2           2
    3           0
    4           2
    5           4
    6           9
    7           11
    8           13
    9           8
    10          8
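A frequency table like this one maps directly onto a histogram: one bar per class, with bar length equal to the frequency. As a rough illustration, the sketch below renders the bars as text; the helper name `text_histogram` is hypothetical and is not part of the original assignment.

```python
pins_frequency = {0: 2, 1: 1, 2: 2, 3: 0, 4: 2, 5: 4,
                  6: 9, 7: 11, 8: 13, 9: 8, 10: 8}


def text_histogram(frequency):
    """Return one line per class: the class label and a bar of '#' marks."""
    return [f"{value:>2} | {'#' * count} ({count})"
            for value, count in frequency.items()]


lines = text_histogram(pins_frequency)
print("\n".join(lines))

total_observations = sum(pins_frequency.values())
most_frequent = max(pins_frequency, key=pins_frequency.get)
print(total_observations, most_frequent)
```

Joining the tops of the bars at each class value with straight lines gives the frequency polygon for the same data.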

    The histogram for the above data is displayed below:

    [Figure: Histogram]

    The frequency polygon for the same data can be represented as follows:

    [Figure: Frequency polygon]


    Methods of displaying data

    Data can be displayed in the form of pie charts, bar charts, frequency polygons and ogives.

    The data below represents the highest earnings of footballers in 2013.

    Name of the Player   Amount in Million Pounds
    Lionel Messi         30
    C. Ronaldo           25.7
    Samuel Eto'o         20.5
    Neymar               17.1
    Wayne Rooney         15.4

    The following data shows the world's most popular sports by percentage.

    Sport      Popularity Percentage
    Football   51%
    Cricket    28%
    Tennis     15%
    Others     6%

    [Figure: Bar chart representing players and their earnings; x-axis: Messi, Ronaldo, Eto'o, Neymar, Rooney; y-axis: salary in million pounds]


    The sports popularity data above can be represented in the form of a pie chart.

    Sachin Tendulkar's scores in the last few matches can be seen in the table below:

    Interval of Runs   Frequency   Cumulative Frequency
    10 < n < 20        5           5
    20 < n < 30        7           12
    30


    Exploratory Data Analysis: Stem-and-Leaf Displays

    Exploratory data analysis (EDA) is an approach to analysing data sets to summarize their main characteristics, often with visual methods. Here we show the stem-and-leaf display and the box plot.

    Example: Following are the cricket scores scored by a player in a season.

    23 53 4 24 55 73 34 64

    45 30 75 121 116 56 78 39

    We can represent the above data in the stem and leaf form as shown below:

    Low outlier: 0 | 4

    Stem | Leaf
    2    | 3 4
    3    | 0 4 9
    4    | 5
    5    | 3 5 6
    6    | 4
    7    | 3 5 8

    High outliers: 11 | 6, 12 | 1
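A stem-and-leaf display can be built directly from the sixteen scores listed in the example by splitting each score into its leading digits (the stem) and its last digit (the leaf). The following is a minimal sketch; the function name `stem_and_leaf` is illustrative, and it lists every stem without separating outliers.

```python
from collections import defaultdict

scores = [23, 53, 4, 24, 55, 73, 34, 64,
          45, 30, 75, 121, 116, 56, 78, 39]


def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return dict(groups)


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Stems 0, 11 and 12 correspond to the scores 4, 116 and 121, which the display treats as outliers.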

    Another method of representing data is to summarize it in a box-and-whisker plot, or box plot. This method uses the smallest value, the largest value, the median and the upper and lower quartile values. This is often referred to as a five-number summary.

    The scores of a batsman are given below:

    11 12 12 13 15 15 15 16 17 20 21 21

    21 22 22 22 23 24 26 27 27 27 28 29

    29 30 31 32 34 35 37 41 41 42 45 47

    50 52 53 56 60 62

    The Box Plot can be represented as shown below:

    Lower Whisker   Lower Hinge   Median   Upper Hinge   Upper Whisker
    11              21            27       40            62
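The five values behind the box plot can be checked with the standard library. This sketch assumes the hinges were computed as quartiles with inclusive linear interpolation, which reproduces the quoted 21 and 40; a "median of each half" rule would give an upper hinge of 41 instead.

```python
import statistics

scores = [11, 12, 12, 13, 15, 15, 15, 16, 17, 20, 21, 21,
          21, 22, 22, 22, 23, 24, 26, 27, 27, 27, 28, 29,
          29, 30, 31, 32, 34, 35, 37, 41, 41, 42, 45, 47,
          50, 52, 53, 56, 60, 62]

# Quartiles by inclusive linear interpolation (an assumption, see above).
lower_hinge, median, upper_hinge = statistics.quantiles(
    scores, n=4, method="inclusive")

five_number_summary = (min(scores), lower_hinge, median, upper_hinge, max(scores))
print(five_number_summary)
```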


    Chapter 2: Probability

    Probability of Events

    In probability theory, an event is a set of outcomes of an experiment (a subset of the sample

    space) to which a probability is assigned. A single outcome may be an element of many

    different events, and different events in an experiment are usually not equally likely, since

    they may include very different groups of outcomes.

    Example: In a class of 36 learners in a boys' school, 20 play cricket, 26 play rugby and 4 do not play cricket or rugby.

    If a learner is chosen at random, calculate the probability that he:

    1. Plays rugby and cricket
    2. Plays cricket only
    3. Does not play cricket or rugby
    4. Plays cricket or rugby
    5. Does not play rugby

    Answer: n(S) = 36
    Event C = plays cricket
    Event R = plays rugby
    These events are not mutually exclusive.

    1. Since 36 - 4 = 32 learners play at least one of the two sports, n(R and C) = 20 + 26 - 32 = 14.
    P(R and C) = n(R and C)/n(S) = 14/36
    Hence the probability that he plays rugby and cricket = 14/36 or 7/18

    2. n(cricket only) = 20 - 14 = 6
    P(cricket only) = n(cricket only)/n(S) = 6/36
    Probability that he plays cricket only = 1/6

    3. Probability that he does not play cricket or rugby = 4/36 or 1/9

    4. P(C U R) = P(C) + P(R) - P(C and R)
    = 20/36 + 26/36 - 14/36
    = 32/36
    Probability that he plays cricket or rugby = 8/9

    5. P(R') = 1 - P(R)
    = 1 - 26/36
    = 10/36
    Probability that he does not play rugby = 5/18
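The five answers all follow from the counts given in the example once the overlap between the two sports is found by inclusion-exclusion. A short sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

total = 36
cricket, rugby, neither = 20, 26, 4

# Inclusion-exclusion: learners who play at least one sport, then both.
at_least_one = total - neither
both = cricket + rugby - at_least_one
cricket_only = cricket - both

p_both = Fraction(both, total)
p_cricket_only = Fraction(cricket_only, total)
p_neither = Fraction(neither, total)
p_either = Fraction(cricket, total) + Fraction(rugby, total) - p_both
p_not_rugby = 1 - Fraction(rugby, total)

print(p_both, p_cricket_only, p_neither, p_either, p_not_rugby)
```

Using `Fraction` keeps every result in lowest terms, so the output can be compared directly with the hand calculation.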


    Mutually Exclusive Events

    Two events are "mutually exclusive" if they cannot occur at the same time. For mutually exclusive events A and B, P(A U B) = P(A) + P(B).

    Example: In a class there are 50 students; twenty students like playing cricket and ten students like playing football. Assuming no student likes both, find the probability that a randomly selected student likes playing cricket or football.

    Answer: P(C U F) = P(C) + P(F)
    = 20/50 + 10/50
    = 3/5

    Hence the probability that a randomly selected student likes playing cricket or football is 3/5 or 60%.

    Conditional Probability

    In probability theory, a conditional probability is the probability that an event will occur,

    when another event is known to occur or to have occurred. If the events are A and B

    respectively, this is said to be "the probability of A given B".

    Example - At a middle school, 18% of all students play football and basketball and 32% of all

    students play football. What is the probability that a student plays basketball given that the

    student plays football?

    Answer: P(Football and Basketball) = 18%
    P(Football) = 32%
    P(Basketball | Football) = P(F and B)/P(F) = 18/32 = 56%
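The key point in this calculation is that the denominator is the probability of the conditioning event (playing football). A small sketch, with a hypothetical helper name, makes that explicit:

```python
from fractions import Fraction


def conditional_probability(p_joint, p_condition):
    """Return P(A | B) = P(A and B) / P(B)."""
    if p_condition == 0:
        raise ValueError("the conditioning event must have non-zero probability")
    return p_joint / p_condition


p_football_and_basketball = Fraction(18, 100)
p_football = Fraction(32, 100)

p_basketball_given_football = conditional_probability(
    p_football_and_basketball, p_football)
print(p_basketball_given_football, float(p_basketball_given_football))
```

The exact value is 9/16, or 0.5625, which rounds to the 56% quoted in the answer.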

    Independence of Events

    In probability theory, to say that two events are independent (alternatively statistically

    independent, marginally independent or absolutely independent) means that the occurrence of

    one does not affect the probability of the other.

    Example: Russell is playing in a cricket match and a game of football at the weekend.

    The probability that his team will win the cricket match is 0.7, and the probability of winning the football game is 0.9. Assuming the two results are independent, what is the probability that his team will win both matches?

    Answer: Using the multiplication law we get:

    P(win both matches) = P(win cricket AND win football) = P(win cricket) x P(win football)
    = 0.7 x 0.9 = 0.63

    Hence, the probability that his team will win both matches = 0.63


    Bayes' Theorem

    In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) is

    a result that is of importance in the mathematical manipulation of conditional probabilities. It

    is a result that derives from the more basic axioms of probability.

    Example: The cricket DRS system is claimed to be around 95% accurate in giving a batsman out if, in fact, the batsman is really out. Suppose the DRS also yields F+ (false positive) results for just 1% of the bowler reviews, i.e. it gives a batsman "out" when he is really "not out", just as the umpire originally said. If 10% of the batsmen subject to bowler reviews are actually out (as obtained in the previous paragraph), what is the probability that a batsman is actually out given that the DRS overturns the umpire's decision to say he is out?

    Solution: Let OUT be the event that the batsman reviewed is actually out (its complementary event is NOTOUT), and RED the event that DRS gave him out. The desired probability P(OUT|RED) is obtained using the Bayes formula by:

    P(OUT|RED) = P(OUT and RED)/P(RED)

    Expanding the terms, we can write this as

    = [P(RED|OUT) x P(OUT)] / [P(RED|OUT) x P(OUT) + P(RED|NOTOUT) x P(NOTOUT)]
    = [0.95 x 0.1] / [0.95 x 0.1 + 0.01 x 0.9]
    = 0.095/0.104 = 91%
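The same Bayes calculation can be written as a small function of the three inputs in the example: the prior P(OUT), the true positive rate P(RED|OUT) and the false positive rate P(RED|NOTOUT). The function and parameter names below are illustrative.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(OUT | RED) from Bayes' theorem."""
    # Total probability of the DRS saying "out": P(RED).
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1 - prior))
    return true_positive_rate * prior / evidence


p_out_given_red = posterior(prior=0.10,
                            true_positive_rate=0.95,
                            false_positive_rate=0.01)
print(round(p_out_given_red, 4))
```

Changing the prior shows how strongly the answer depends on how often reviewed batsmen are really out: the rarer that is, the more the 1% false positives weigh on the result.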


    Chapter 3: Random Variables

    In probability and statistics, a random variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). As opposed to other mathematical variables, a random variable conceptually does not have a single, fixed value (even if unknown); rather, it can take on a set of possible different values, each with an associated probability.

    Binomial Distribution

    The binomial probability formula is given by P(X = r) = nCr p^r q^(n-r), where n is the number of trials, r is the number of successes, p represents the probability of success and q = 1 - p represents the probability of failure.

    Example: The probability that a batsman scores a century in a cricket match is 1/3. Find the probability that out of 5 matches, he may score a century

    (1) in exactly 3 matches
    (2) in exactly one of the matches

    Solution: Here "success" is denoted by "scoring a century".

    The given probability that a batsman scores a century in a cricket match is 1/3. That is, p = 1/3.

    "Failure" is denoted by "not scoring a century". We know that
    q = 1 - p = 1 - 1/3 = 2/3.

    Total number of matches n = 5.

    The binomial probability formula is given by nCr p^r q^(n-r).

    (1) We have to find the probability that he scores a century in exactly three matches. That is, r = 3.

    P(scoring a century in exactly 3 matches)
    = 5C3 (1/3)^3 (2/3)^(5-3)
    = 5C3 (1/3)^3 (2/3)^2
    = 10 x (1/27) x (4/9)
    = 40/243

    P(scoring a century in exactly 3 matches) = 40/243

    (2) We have to find the probability that he scores a century in exactly one of the matches. That is, r = 1.

    P(scoring a century in one of the matches)
    = 5C1 (1/3)^1 (2/3)^(5-1)
    = 5C1 (1/3)^1 (2/3)^4
    = 5 x (1/3) x (16/81)
    = 80/243

    P(scoring a century in one of the matches) = 80/243
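The two results can be checked with exact arithmetic. This sketch uses n = 5, the number of matches used in the worked calculation, and p = 1/3; the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, r, p):
    """Return P(X = r) = nCr * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


p_century = Fraction(1, 3)

exactly_three = binomial_pmf(5, 3, p_century)
exactly_one = binomial_pmf(5, 1, p_century)
print(exactly_three, exactly_one)

# The probabilities over all possible values of r must sum to 1.
total_probability = sum(binomial_pmf(5, r, p_century) for r in range(6))
```

Summing the formula over r = 0 to 5 is a quick sanity check that the values of n, p and q have been entered consistently.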


    Chapter 4: Normal Distribution

    Transformation of Normal Random Variable

    In probability theory, the normal (or Gaussian) distribution is a very commonly occurring continuous probability distribution: a function that gives the probability of a number in some context falling between any two real numbers.

    Example: The number of goals Manchester United score in a Barclays Premier League season is assumed to be normally distributed with a mean of 100 and a standard deviation of 15. Manchester United need 115 goals to create the record for the highest number of goals in a single season.

    The variable is transformed to the standard normal variable Z = (X - 100)/15, so X = 115 corresponds to Z = 1.

    The probability that Manchester United will score more than 115 goals is
    P(X > 115) = P(Z > 1) = 0.1587

    The probability that Manchester United will score between 70 and 120 goals is
    P(70 < X < 120) = 0.8860
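These probabilities can be checked with `statistics.NormalDist` from the Python standard library, assuming goals are normally distributed with mean 100 and standard deviation 15 as stated in the example. Note that 0.1587 is the upper-tail probability beyond 115 goals; the probability of fewer than 115 goals is its complement.

```python
from statistics import NormalDist

goals = NormalDist(mu=100, sigma=15)

# Transformation to the standard normal variable: Z = (X - mean) / sd.
z_115 = (115 - goals.mean) / goals.stdev

p_less_than_115 = goals.cdf(115)
p_more_than_115 = 1 - p_less_than_115
p_between_70_and_120 = goals.cdf(120) - goals.cdf(70)

print(z_115, round(p_more_than_115, 4), round(p_between_70_and_120, 4))
```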

    ***************************************************************************