Statistics crash course


  • 7/30/2019 Statistics crash course


    Probability and statistics crash course

    http://www.comp.leeds.ac.uk/hannah/mathsclub

    Probability 1 (for dummies:-)

    Stats 1 (averages and deviations)

    Probability 2 (Trials and distributions)

    Stats 2 (significance)

    Stats 3 (errors)


    Preliminaries

    So what is statistics?

    Applied branch of mathematics

    Concerning data and its representation

    Descriptive Statistics (today) are concerned with representing and summarising data

    Analytical Statistics (in a few weeks) are concerned with drawing conclusions from data

    "... probability theory enables us to find the consequences of a given ideal world, while statistical theory enables us to measure the extent to which our world is ideal" (Skiena, 2001).


    Descriptive statistics: Why?

    Summarising data.

    32 7 16 33

    33 10 13 35

    22 11 15 34

    21 13 17 32

    23 16 15 24

    Max, Min, Mean(s), Median, Mode, Variance, Standard Deviation, Interquartile range, ...

    All ways of presenting numerical data in such a way that we learn something of its spread and tendency and deviation.


    What is an average?

    "Average" originally meant "financial loss incurred through damage to goods in transit", from the Italian avaria, a word from 12c. Mediterranean maritime trade. Sometimes traced to Arabic awariya, "damaged merchandise", but this is less certain.

    Later, the meaning of the word shifts to "equal sharing of such loss by the interested parties".


    Measures of central tendency

    Arithmetic Mean (often what we think of when we say the word "Average"). Add 'em all up and divide by the number there are.

    $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
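
    A minimal Python sketch of this formula, applied to the first column of the example data from the "Descriptive statistics: Why?" slide. The function name arithmetic_mean is chosen here for illustration and does not come from the slides.

```python
def arithmetic_mean(values):
    """Add 'em all up and divide by the number there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


column = [32, 33, 22, 21, 23]  # first column of the example data
print(arithmetic_mean(column))  # prints 26.2
```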


    An aside about samples and populations

    Often we can't measure an entire population, and instead have to measure a subset (a sample). The mean on the previous slide, $\bar{x}$, is, strictly speaking, a sample mean. The population mean is usually referred to as $\mu$, and the size of the whole population as $N$.

    $$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$


    The other two

    Median = put them all in order, and choose the middle one. If there is an even number of values, then there are two middle ones, so use the number halfway between these.

    Mode = choose the most frequent one.
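
    The two recipes above can be sketched in Python as follows. The function names median and modes are illustrative. Returning every value tied for most frequent is an assumption made here, since the slides do not say what to do when there is a tie.

```python
from collections import Counter


def median(values):
    """Put the values in order and choose the middle one."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    # Even count: use the number halfway between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Return all values that share the highest frequency."""
    if not values:
        raise ValueError("the mode of an empty dataset is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(median([16, 13, 15, 17, 15]))  # prints 15
print(median([1, 2, 3, 4]))          # prints 2.5
print(modes([16, 13, 15, 17, 15]))   # prints [15]
```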


    Symmetricity/Skewness

    I am just going to mention this in passing today, but...

    [Figure 1: A skewed dataset. Histogram titled "A fictitious but nastily skewed dataset"; x-axis "Number" (0 to 70), y-axis "Count" (0 to 700).]

    This dataset has a mean of 21.8, a median of 12 and a mode of 12.
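
    To see how a long tail drags the mean away from the median and mode, here is a small sketch using Python's statistics module. The numbers are made up for illustration; they are not the dataset behind the figure.

```python
import statistics

# Hypothetical right-skewed data: most values sit near 12,
# with two large values forming a long tail.
skewed = [10, 11, 12, 12, 12, 13, 14, 60, 70]

mean = statistics.mean(skewed)
median = statistics.median(skewed)
mode = statistics.mode(skewed)

print(round(mean, 1), median, mode)  # prints 23.8 12 12
```

    The two tail values pull the mean well above the median and mode, which barely notice them.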


    An aside about types of data

    There are various types of data we can consider within statistics. Not all measures of central tendency apply to all of these.

    Data type            Description                                   Average
    Nominal              Categories or names                           Mode
    Ordinal              Orderings (e.g., First, Second, Third ...)    Median
    Interval and Ratio   Proper numbers                                Mean (symmetrical), Median (skewed)


    And now over to my sequinned assistant...


    To conclude the average bit

    Arithmetic Mean; Median; Geometric Median; Mode; Geometric Mean; Harmonic Mean; Quadratic Mean (or RMS); Generalised Mean (like quadratic mean but with different powers); Weighted Mean (some matter more than others); Truncated Mean (leave out the tricky outliers); Interquartile Mean (uses the interquartile range, of which more later); Midrange ((max + min)/2); Winsorized Mean (like truncated but not quite); Annualization (to do with finance stuff).

    All of these have their own Wikipedia page, so you know where to start!
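
    A few of these can be tried directly in Python. This sketch assumes Python 3.8 or later (for statistics.geometric_mean) and uses three made-up values; the quadratic mean and midrange are written out by hand from their usual definitions.

```python
import math
import statistics

data = [1, 4, 16]  # hypothetical values, chosen so the results are easy to check

# Geometric mean: the n-th root of the product of the values.
geometric = statistics.geometric_mean(data)

# Harmonic mean: n divided by the sum of the reciprocals.
harmonic = statistics.harmonic_mean(data)

# Quadratic mean (RMS): square root of the mean of the squares.
quadratic = math.sqrt(sum(x * x for x in data) / len(data))

# Midrange: halfway between the largest and smallest value.
midrange = (max(data) + min(data)) / 2

print(round(geometric, 3), round(harmonic, 3), round(quadratic, 3), midrange)
```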


    Boring practical bit

    32 7 16 33

    33 10 13 35

    22 11 15 34

    21 13 17 32

    23 16 15 24


    Boring practical bit: answers

            32    7     16    33
            33    10    13    35
            22    11    15    34
            21    13    17    32
            23    16    15    24

    Mean    26.2  11.4  15.2  31.6
    Median  23    11    15    33
    Mode    ?     ?     15    ?
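
    These answers can be checked with a short script. The sketch below assumes Python 3.8 or later (for statistics.multimode). The helper unique_mode is a name invented here: it returns None when no single value is the most frequent, which corresponds to the "?" entries in the table.

```python
import statistics

rows = [
    [32, 7, 16, 33],
    [33, 10, 13, 35],
    [22, 11, 15, 34],
    [21, 13, 17, 32],
    [23, 16, 15, 24],
]

# Transpose the rows so that each tuple holds one column of the table.
columns = list(zip(*rows))


def unique_mode(values):
    """Return the single most frequent value, or None if there is a tie."""
    candidates = statistics.multimode(values)
    return candidates[0] if len(candidates) == 1 else None


means = [round(statistics.mean(col), 1) for col in columns]
medians = [statistics.median(col) for col in columns]
mode_values = [unique_mode(col) for col in columns]

print(means)        # prints [26.2, 11.4, 15.2, 31.6]
print(medians)      # prints [23, 11, 15, 33]
print(mode_values)  # prints [None, None, 15, None]
```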


    Deviation

    As well as knowing some kind of average of a particular sample, you might want to know something of its spread.

    [Figure 2: Three datasets with the same mean but different spreads. Histogram titled "More fictitious data"; x-axis "Number", y-axis "Count" (x 10^4).]


    The really simple one

    The range is the simplest way of describing the spread of data: find the max, find the min, subtract the min from the max, there you go.


    Deviation

    The deviation of a sample is measured with reference to some measure of central tendency: you want to know how much the sample deviates from something. With average deviation, variance, and standard deviation, this is the mean $\mu$ or the sample mean $\bar{x}$.


    Measures of deviation

    Average deviation $= \frac{\sum |x - \mu|}{N}$

    Variance $= \sigma^2 = \frac{\sum (x - \mu)^2}{N}$

    Standard deviation $= \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

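    For reasons you will now be familiar with, when considering samples, $\sigma$ becomes $s$, and $\mu$ becomes $\bar{x}$. To account for bias, the sum of squared deviations in the sample variance (and hence in the sample standard deviation) is divided by $n - 1$ rather than $n$.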
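
    A minimal sketch of these measures in Python, with a flag for the n - 1 sample correction. The function names, the sample parameter, and the eight data values are all illustrative choices made here, not something from the slides.

```python
import math


def average_deviation(values):
    """Mean absolute distance of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


def variance(values, sample=False):
    """Mean squared distance from the mean; divide by n - 1 for a sample."""
    n = len(values)
    if sample and n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    divisor = n - 1 if sample else n
    return sum((x - mean) ** 2 for x in values) / divisor


def standard_deviation(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

print(average_deviation(data))   # prints 1.5
print(variance(data))            # prints 4.0
print(standard_deviation(data))  # prints 2.0
print(variance(data, sample=True) > variance(data))  # prints True
```

    Dividing by the smaller number n - 1 always makes the sample figure a little larger than the population figure for the same data.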


    Worked example

    This example [a] involves the rainfall in Liberia [b].

    J   F   M   A   M   J   J   A   S   O   N   D
    1   2   4   6   18  37  31  16  28  24  9   4

    The mean of this data is

    $$\frac{1 + 2 + 4 + 6 + 18 + 37 + 31 + 16 + 28 + 24 + 9 + 4}{12} = 15$$

    The range of this data is 36 (max - min, or 37 - 1).

    [a] Taken from Sternstein's Statistics.
    [b] No, I've never been there either.
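
    The same arithmetic as a quick Python check. The variable names are illustrative; the twelve values are the monthly rainfall figures from the table, January to December.

```python
# Monthly rainfall, January to December, from the table above.
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

mean_rainfall = sum(rainfall) / len(rainfall)
rainfall_range = max(rainfall) - min(rainfall)

print(mean_rainfall)   # prints 15.0
print(rainfall_range)  # prints 36
```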


    Average deviation

    The average deviation

    $$= \frac{|1 - 15| + |2 - 15| + |4 - 15| + |6 - 15| + |18 - 15| + \dots}{12}$$

    $$= \frac{14 + 13 + 11 + 9 + 3 + 22 + 16 + 1 + 13 + 9 + 6 + 11}{12} \approx 10.7 \text{ inches}$$
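
    A short Python sketch that reproduces the individual deviations and their average for the rainfall data; the variable names are chosen here for illustration.

```python
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]  # inches per month

mean_rainfall = sum(rainfall) / len(rainfall)  # 15.0

# Absolute distance of each month from the mean.
deviations = [abs(x - mean_rainfall) for x in rainfall]
average_deviation = sum(deviations) / len(deviations)

print(deviations)
print(round(average_deviation, 1))  # prints 10.7
```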


    Variance and standard deviation

    The variance

    $$= \frac{14^2 + 13^2 + 11^2 + 9^2 + 3^2 + 22^2 + 16^2 + 1^2 + 13^2 + 9^2 + 6^2 + 11^2}{12} \approx 143.7 \text{ inches squared}$$

    AND the standard deviation is the square root of the variance, so...

    $$\sigma = \sqrt{143.7} \approx 12.0$$

    and the units of the standard deviation are... the same as the units of measurement.
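
    Python's statistics module offers both flavours: pvariance and pstdev divide by N, as in the working above, while stdev divides by n - 1. A sketch on the rainfall data, with illustrative variable names:

```python
import statistics

rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]  # inches per month

# Population versions: divide by N, matching the slide's working.
population_variance = statistics.pvariance(rainfall)
population_sd = statistics.pstdev(rainfall)

# Sample version: divides by n - 1 instead.
sample_sd = statistics.stdev(rainfall)

print(round(population_variance, 1))  # prints 143.7
print(round(population_sd, 1))        # prints 12.0
print(round(sample_sd, 1))            # prints 12.5
```

    Treating the twelve months as a sample rather than the whole population nudges the standard deviation up from about 12.0 to about 12.5 inches.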


    Interquartile range

    One final measure of deviation is the interquartile range. This is related to the median, and the first thing you do is place your data in order.

    Discard the lowest and the highest 1/4 of your data, and use the range of what remains. This is much more robust to outliers.
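
    The recipe above, followed literally, can be sketched in Python as below. The function name interquartile_range_simple is invented here, and rounding the quarter down when the count is not a multiple of four is an assumption. Statistics libraries usually compute quartiles by interpolation instead, so their interquartile range may differ slightly from this one.

```python
def interquartile_range_simple(values):
    """Sort, drop the lowest and highest quarter, take the range of the rest."""
    if not values:
        raise ValueError("the interquartile range of an empty dataset is undefined")
    ordered = sorted(values)
    quarter = len(ordered) // 4  # assumption: round the quarter down
    middle = ordered[quarter:len(ordered) - quarter]
    return middle[-1] - middle[0]


rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]
print(interquartile_range_simple(rainfall))  # prints 20

# Swap the wettest month for a wild outlier: the plain range explodes,
# but the interquartile range does not move.
with_outlier = [1, 2, 4, 6, 18, 370, 31, 16, 28, 24, 9, 4]
print(max(with_outlier) - min(with_outlier))     # prints 369
print(interquartile_range_simple(with_outlier))  # prints 20
```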


    And to finish

    If your data is normally distributed (of which more next week), knowing the standard deviation tells you all sorts of useful stuff.

    Figure 3: Another graph stolen from Wikipedia
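
    The slide does not spell out which useful stuff it has in mind. One commonly quoted example is the fraction of normally distributed data lying within one, two, or three standard deviations of the mean. A sketch using statistics.NormalDist (Python 3.8 or later); the helper fraction_within is a name invented here.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```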
