Date posted: 04-Apr-2018
Category: Documents


Transcript of Statistics crash course

• 7/30/2019 Statistics crash course

Probability and statistics crash course

http://www.comp.leeds.ac.uk/hannah/mathsclub

Probability 1 (for dummies:-)

Stats 1 (averages and deviations)

Probability 2 (Trials and distributions)

Stats 2 (significance)

Stats 3 (errors)

Preliminaries

So what is statistics?

Applied branch of mathematics

Concerning data and its representation

Descriptive Statistics (today) are concerned with representing and summarising data

Analytical Statistics (in a few weeks) are concerned with drawing conclusions from data

"... probability theory enables us to find the consequences of a given ideal world, while statistical theory enables us to measure the extent to which our world is ideal" (Skiena, 2001).

Descriptive statistics: Why?

Summarising data.

32 7 16 33

33 10 13 35

22 11 15 34

21 13 17 32

23 16 15 24

Max, Min, Mean(s), Median, Mode, Variance, Standard Deviation, Interquartile range, ...

All ways of presenting numerical data in such a way that we learn something of its spread and tendency and deviation.

What is an average?

Average originally meant "financial loss incurred through damage to goods in transit", from the Italian avaria, a word from 12c. Mediterranean maritime trade. Sometimes traced to Arabic arwariya, "damaged merchandise", but this is less certain.

Later, the meaning of the word shifts to "equal sharing of such loss by the interested parties".

Measures of central tendency

Arithmetic Mean (often what we think of when we say the word "Average"). Add 'em all up and divide by the number there are.

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
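As a minimal sketch of "add 'em all up and divide", here is the arithmetic mean in Python. The function name `arithmetic_mean` is made up for illustration; the values are the first column of the table on the "Descriptive statistics: Why?" slide.

```python
def arithmetic_mean(values):
    """Add them all up and divide by how many there are."""
    values = list(values)
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)


first_column = [32, 33, 22, 21, 23]
print(arithmetic_mean(first_column))  # 131 / 5
```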

An aside about samples and populations

Often we can't measure an entire population, and instead have to measure a subset (a sample). The mean on the previous slide, $\bar{x}$, is, strictly speaking, a sample mean. The population mean is usually referred to as $\mu$, and the size of the whole population as $N$.

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

The other two

Median = put them all in order, and choose the middle one. If there is an even number, then there are two middle ones, so use the number halfway between these.

Mode = choose the most frequent one.
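The two recipes above can be sketched in a few lines of Python. The function names `median` and `modes` are illustrative choices, and returning every tied value for the mode is an assumption (the slide does not say what to do with ties).

```python
from collections import Counter


def median(values):
    """Put the values in order and pick the middle one,
    or halfway between the two middle ones if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes(values):
    """Return every value that ties for most frequent, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


print(median([7, 10, 11, 13, 16]))  # odd count: the middle value
print(median([7, 10, 11, 13]))      # even count: halfway between 10 and 11
print(modes([16, 13, 15, 17, 15]))  # 15 appears twice
```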

Symmetricity/Skewness

I am just going to mention this in passing today, but...

[Histogram: "A fictitious but nastily skewed dataset"; axes: Number and Count]

Figure 1: A skewed dataset

This dataset has a mean of 21.8, a median of 12 and a mode of 12.
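The data behind Figure 1 isn't given, so the numbers below are invented purely to show the same effect on a small scale: a couple of large values drag the mean upwards, while the median and mode stay put. This sketch uses Python's standard `statistics` module.

```python
import statistics

# Invented, right-skewed data: most values sit near 12, two are far out.
skewed = [10, 11, 12, 12, 12, 13, 14, 60, 70]

mean = statistics.mean(skewed)
median = statistics.median(skewed)
mode = statistics.mode(skewed)

print(f"mean={mean:.1f} median={median} mode={mode}")
```

Here the mean lands well above the median, which is the pattern reported for the dataset in Figure 1.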

An aside about types of data

There are various types of data we can consider within statistics. Not all measures of central tendency apply to all of these.

Data type            Description                                   Average
Nominal              Categories or names                           Mode
Ordinal              Orderings (e.g., First, Second, Third ...)    Median
Interval and Ratio   Proper numbers                                Mean (symmetrical), Median (skewed)

And now over to my sequinned assistant...

To conclude the average bit

Arithmetic Mean; Median; Geometric Median; Mode; Geometric Mean; Harmonic Mean; Quadratic Mean (or RMS); Generalised Mean (like quadratic mean but with different powers); Weighted Mean (some matter more than others); Truncated Mean (leave out the tricky outliers); Interquartile Mean (uses the interquartile range, of which more later); Midrange ((max + min)/2); Winsorized Mean (like truncated but not quite); Annualization (to do with finance stuff).

All of these have their own Wikipedia page, so you know where to start!
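A few of these are easy to try out. The sketch below uses Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later) on the fourth column of the earlier table. How many values a truncated mean drops is a free choice; dropping one from each end is an assumption made here.

```python
import math
import statistics

data = [33, 35, 34, 32, 24]  # fourth column of the table; 24 is the odd one out

arithmetic = statistics.mean(data)
geometric = statistics.geometric_mean(data)  # n-th root of the product
harmonic = statistics.harmonic_mean(data)    # n divided by the sum of reciprocals
quadratic = math.sqrt(sum(x * x for x in data) / len(data))  # root mean square

midrange = (max(data) + min(data)) / 2

# Truncated mean: drop the single lowest and single highest value, then average.
trimmed = sorted(data)[1:-1]
truncated = sum(trimmed) / len(trimmed)

for name, value in [
    ("harmonic", harmonic),
    ("geometric", geometric),
    ("arithmetic", arithmetic),
    ("quadratic", quadratic),
    ("midrange", midrange),
    ("truncated", truncated),
]:
    print(f"{name:>10}: {value:.2f}")
```

For positive data the first four always come out in the order harmonic, geometric, arithmetic, quadratic, from smallest to largest.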

Boring practical bit

32 7 16 33

33 10 13 35

22 11 15 34

21 13 17 32

23 16 15 24

Mean    26.2  11.4  15.2  31.6

Median  23    11    15    33

Mode    ?     ?     15    ?
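If you'd rather not do the boring practical bit by hand, the column summaries can be checked with a short Python sketch using the standard `statistics` module (`multimode` needs Python 3.8 or later). Treating a tie for most frequent as "no mode" is an assumption that matches the "?" entries above.

```python
import statistics

rows = [
    [32, 7, 16, 33],
    [33, 10, 13, 35],
    [22, 11, 15, 34],
    [21, 13, 17, 32],
    [23, 16, 15, 24],
]

summary = []
for column in zip(*rows):  # transpose: one tuple per column
    candidates = statistics.multimode(column)
    # If several values tie for most frequent (here: every value appears once),
    # there is no single mode, which the table shows as "?".
    mode = candidates[0] if len(candidates) == 1 else None
    summary.append(
        (round(statistics.mean(column), 1), statistics.median(column), mode)
    )

for mean, median, mode in summary:
    print(mean, median, "?" if mode is None else mode)
```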

Deviation

As well as knowing some kind of average of a particular sample, you might want to know something of its spread.

[Plot: "More fictitious data"; axes: Number and Count]

Figure 2: Three datasets with the same mean but

The really simple one

The range is the simplest way of describing the spread of data - find the max, find the min, subtract the min from the max, there you go.

Deviation

The deviation of a sample is measured with reference to some measure of central tendency: you want to know how much the sample deviates from something. With average deviation, variance, and standard deviation, this is the mean $\mu$ or the sample mean $\bar{x}$.

Measures of deviation

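Average deviation $= \frac{\sum |x - \mu|}{N}$

Variance $= \sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Standard deviation $= \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

For reasons you will now be familiar with, when considering samples, $\sigma$ becomes $s$, and $\mu$ becomes $\bar{x}$. To account for bias, the sample variance divides the sum of squared deviations by $n - 1$ rather than $n$, and the sample standard deviation is the square root of that.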
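To make the difference between dividing by $N$ and by $n - 1$ concrete, here is a minimal Python sketch of the formulas above. All function names are made up for illustration, and the data is the third column of the earlier table.

```python
import math


def average_deviation(values):
    """Mean absolute distance from the mean."""
    mu = sum(values) / len(values)
    return sum(abs(x - mu) for x in values) / len(values)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Same sum of squares, but divided by n - 1 to account for bias."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [16, 13, 15, 17, 15]
print(average_deviation(data))
print(population_variance(data), math.sqrt(population_variance(data)))
print(sample_variance(data), math.sqrt(sample_variance(data)))
```

The sample versions always come out a little larger than the population versions, and the gap shrinks as the sample grows.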

Worked example

This example [a] involves the rainfall in Liberia [b].

J   F   M   A   M   J   J   A   S   O   N   D
1   2   4   6   18  37  31  16  28  24  9   4

The mean of this data is

$$\frac{1 + 2 + 4 + 6 + 18 + 37 + 31 + 16 + 28 + 24 + 9 + 4}{12} = 15$$

The range of this data is 36 (max - min, or 37 - 1).

[a] Taken from Sternstein's Statistics.
[b] No, I've never been there either.
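The mean and range of the rainfall figures can be checked with a couple of lines of Python. The variable names and three-letter month keys are illustrative; the numbers are the ones in the table above.

```python
rainfall = {
    "Jan": 1, "Feb": 2, "Mar": 4, "Apr": 6, "May": 18, "Jun": 37,
    "Jul": 31, "Aug": 16, "Sep": 28, "Oct": 24, "Nov": 9, "Dec": 4,
}

values = list(rainfall.values())
mean = sum(values) / len(values)         # 180 / 12
value_range = max(values) - min(values)  # 37 - 1

print(mean, value_range)
```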

Average deviation

The average deviation

$$= \frac{|1 - 15| + |2 - 15| + |4 - 15| + |6 - 15| + |18 - 15| + \dots}{12}$$

$$= \frac{14 + 13 + 11 + 9 + 3 + 22 + 16 + 1 + 13 + 9 + 6 + 11}{12} \approx 10.7 \text{ inches}$$
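The same sum, spelled out as a short Python sketch so that each absolute deviation can be inspected (variable names are illustrative):

```python
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

mean = sum(rainfall) / len(rainfall)

# Distance of each month from the mean, ignoring the sign.
absolute_deviations = [abs(x - mean) for x in rainfall]
average_deviation = sum(absolute_deviations) / len(absolute_deviations)

print(absolute_deviations)
print(round(average_deviation, 1))
```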

Variance and standard deviation

The variance

$$= \frac{14^2 + 13^2 + 11^2 + 9^2 + 3^2 + 22^2 + 16^2 + 1^2 + 13^2 + 9^2 + 6^2 + 11^2}{12} \approx 143.7 \text{ inches squared}$$

AND the standard deviation is the square root of the variance, so...

$$\sigma = \sqrt{143.7} \approx 12.0$$

and the units of the standard deviation are... the same as the units of measurement.
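These figures divide by 12, so they treat the twelve months as the whole population. As a hedged sketch, Python's standard `statistics` module offers both flavours: `pvariance` and `pstdev` divide by $N$, while `variance` and `stdev` divide by $n - 1$ and so come out slightly larger.

```python
import statistics

rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

population_variance = statistics.pvariance(rainfall)  # 1724 / 12
population_sd = statistics.pstdev(rainfall)

sample_variance = statistics.variance(rainfall)       # 1724 / 11
sample_sd = statistics.stdev(rainfall)

print(round(population_variance, 1), round(population_sd, 1))
print(round(sample_variance, 1), round(sample_sd, 1))
```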

Interquartile range

One final measure of deviation is the interquartile range.

This is related to the median, and the first thing you do is discard the lowest and the highest 1/4 of the data. The interquartile range is the range of what remains. This is much more robust to outliers.
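A minimal Python sketch of that recipe, taken literally: sort, drop a quarter of the values from each end, and take the range of the rest. The function name is made up, and rounding the quarter down when the count is not a multiple of 4 is an assumption. Library routines usually interpolate between values to find the quartiles, so they can give slightly different answers.

```python
def interquartile_range(values):
    """Range of the data after discarding the lowest and highest quarter."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("interquartile range of an empty dataset is undefined")
    quarter = len(ordered) // 4  # assumption: round down
    kept = ordered[quarter:len(ordered) - quarter]
    return kept[-1] - kept[0]


rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]
# Same data with one wildly wrong reading, to show the effect of an outlier.
with_outlier = [370 if x == 37 else x for x in rainfall]

print(max(rainfall) - min(rainfall), interquartile_range(rainfall))
print(max(with_outlier) - min(with_outlier), interquartile_range(with_outlier))
```

The plain range jumps when the outlier is introduced, while the interquartile range does not move.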

And to finish

If your data is normally distributed (of which more next week), knowing the standard deviation tells you all sorts of useful stuff.

Figure 3: Another graph stolen from Wikipedia
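One example of the "useful stuff", assuming the data really is normal: the fraction of values lying within one, two or three standard deviations of the mean is fixed. The sketch below works this out with `statistics.NormalDist` (Python 3.8 or later); whether this is what Figure 3 showed is a guess.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Fraction of a normal population within k standard deviations of the mean.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, fraction in within.items():
    print(f"within {k} standard deviation(s): {fraction:.1%}")
```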
