INVESTIGATION Data Colllection Data Presentation Tabulation Diagrams Graphs Descriptive Statistics...

Post on 05-Jan-2016

221 views 0 download

Tags:

Transcript of INVESTIGATION Data Colllection Data Presentation Tabulation Diagrams Graphs Descriptive Statistics...

INVESTIGATIONINVESTIGATION

Data Colllection

Data Presentation

TabulationDiagramsGraphs

Descriptive Statistics

Measures of LocationMeasures of Dispersion

Measures of Skewness & Kurtosis

Inferential Statistiscs

Estimation Hypothesis TestingPonit estimateInteval estimate

Univariate analysis

Multivariate analysis

EXAMPLE:

(1) 7,8,9,10,11 n=5, x=45, =45/5=9

(2) 3,4,9,12,15 n=5, x=45, =45/5=9

(3) 1,5,9,13,17 n=5, x=45, =45/5=9

S.D. : (1) 1.58 (2) 4.74 (3) 6.32

x

x

x

3

Measures of Dispersion

Or

Measures of variability

measures of dispersion summarize differences in the data, how the numbers differ from one another.

Measures of DispersionMeasures of Dispersion

5

Series I: 70 70 70 70 70 70 70 70 70 70

Series II: 66 67 68 69 70 70 71 72 73 74

Series III: 1 19 50 60 70 80 90 100 110 120

6

Measures of Variability

• A single summary figure that describes the spread of observations within a distribution.

7

MEASURES OF DESPERSION

• RANGE

• INTERQUARTILE RANGE

• VARIANCE

• STANDARD DEVIATION

8

Measures of Variability

• Range– Difference between the smallest and largest observations.

• Interquartile Range– Range of the middle half of scores.

• Variance– Mean of all squared deviations from the mean.

• Standard Deviation– Rough measure of the average amount by which observations deviate from

the mean. The square root of the variance.

9

Range

• The difference between the lowest and highest values in the data set.

• The range can be misleading with outliers

data: 2,4,5,2,5,6,1,6,8,25,2

Sorted data: 1,2,2,2,3,4,5,6,6,8,25

24125 MinimumMaximumRange

10

Variability Example: Range

• Hotel Rates

52, 76, 100, 136, 186, 196, 205, 150, 257, 264, 264, 280, 282, 283, 303, 313, 317, 317, 325, 373, 384, 384, 400, 402, 417, 422, 472, 480, 643, 693, 732, 749, 750, 791, 891

• Range: 891-52 = 839

11

Measures of Position

Quartiles, Deciles,

Percentiles

12

Q1, Q2, Q3 divides ranked scores into four equal parts

Quartiles

25% 25% 25% 25%

Q3Q2Q1(minimum) (maximum)

(median)

13

Quartiles: 1 = Q n+1

th4

2 = Q 2(n+1) n+1 = th

4 2

3 = Q 3(n+1) th

4Inter quartile : IQR = Q3 – Q1

14

Inter quartile Range

• The inter quartile range is Q3-Q1

• 50% of the observations in the distribution are in the inter quartile range.

• The following figure shows the interaction between the quartiles, the median and the inter quartile range.

15

Inter quartile Range

16

Sample Number Unsorted Values1 252 27 3 20 4 235 266 247 198 169 2510 1811 3012 2913 3214 2615 2416 2117 2818 2719 2020 1621 14

17

Sample Number Unsorted Values Ranked Values1 25 142 27 163 20 164 23 185 26 196 24 20 7 19 208 16 219 25 2310 18 2411 30 2412 29 2513 32 2514 26 2615 24 2616 21 2717 28 2718 27 2819 20 2920 16 3021 14 32

18

Sample Number Unsorted Values Ranked Values1 25 14 Minimum2 27 163 20 164 23 185 26 196 24 20 LQ or Q1

7 19 208 16 219 25 2310 18 2411 30 24 Md or Q2

12 29 2513 32 2514 26 2615 24 2616 21 27 UQ or Q3

17 28 2718 27 2819 20 2920 16 3021 14 32 Maximum

19

D1, D2, D3, D4, D5, D6, D7, D8, D9

divides ranked data into ten equal parts

Deciles

10% 10% 10% 10% 10% 10% 10% 10% 10% 10%

D1 D2 D3 D4 D5 D6 D7 D8 D9

20

Quartiles

Q1 = P25

Q2 = P50

Q3 = P75

D1 = P10

D2 = P20

D3 = P30

• • •

D9 = P90

Deciles

21

Quartiles, Deciles, Percentiles

Fractiles

(Quantiles)partitions data into approximately equal parts

22

Percentiles and Quartiles

• Maximum is 100th percentile: 100% of values lie at or below the maximum

• Median is 50th percentile: 50% of values lie at or below the median

• Any percentile can be calculated. But the most common are 25th (1st Quartile) and 75th (3rd Quartile)

23

Locating Percentiles in a Frequency Distribution

• A percentile is a score below which a specific percentage of the distribution falls(the median is the 50th percentile.

• The 75th percentile is a score below which 75% of the cases fall.

• The median is the 50th percentile: 50% of the cases fall below it

• Another type of percentile :The quartile lower quartile is 25th percentile and the upper quartile is the 75th percentile

24

NUMBER OF CHILDREN

260 26.6 26.6 26.6

161 16.4 16.5 43.1

260 26.6 26.6 69.7

155 15.8 15.9 85.6

70 7.2 7.2 92.7

31 3.2 3.2 95.9

21 2.1 2.1 98.1

11 1.1 1.1 99.2

8 .8 .8 100.0

977 99.8 100.0

2 .2

979 100.0

0

1

2

3

4

5

6

7

EIGHT OR MORE

Total

Valid

NAMissing

Total

Frequency PercentValid

PercentCumulative

Percent

50th percentile

80th percentile

50% included here

80% included

here

25th percentile

25% included here

25

26

27

Five Number Summary

• Minimum Value

• 1st Quartile

• Median

• 3rd Quartile

• Maximum Value

28

VARIANCE:

Deviations of each observation from the mean, then averaging the sum of squares of these deviations.

STANDARD DEVIATION:

“ ROOT- MEANS-SQUARE-DEVIATIONS”

29

Variance

• The average amount that a score deviates from the typical score.

– Score – Mean = Difference Score

– Average of Difference Scores = 0

– In order to make this number not 0, square the difference scores (no negatives to cancel out the positives).

30

Variance: Computational Formula

• Population • Sample

2

222

)(

N

XXN

2

222

)(

n

XXnS

31

Standard Deviation

• To “undo” the squaring of difference scores, take the square root of the variance.

• Return to original units rather than squared units.

32

Quantifying Uncertainty

• Standard deviation: measures the variation of a variable in the sample.

– Technically,

s x xN ii

N

1

12

1

( )

33

Standard Deviation

• Population • Sample

Rough measure of the average amount by which observations deviate on either side of the mean. The square root of the variance.

2 s s2

(X )N

2 2)(

n

XXS

N X2 ( X )

N2

22

2

2 )(

n

XXnS

Example:

-1 1

3 9

-2 4

-3 9

2 4

1 1

Data: X = {6, 10, 5, 4, 9, 8}; N = 6

Total: 42 Total: 28

Standard Deviation:

76

42

N

XX

Mean:

Variance:2

2 ( ) 284.67

6

X Xs

N

16.267.42 ss

XX 2)( XX X

6

10

5

4

9

8

35

Calculating a Mean and a Standard Deviation

Absolute SquaredData Deviation Deviation Deviation

x x - Mean |x - Mean| (x-Mean)²10 -20 20 40020 -10 10 10030 0 0 040 10 10 10050 20 20 400

Sums 150 0 60 1000Means 30 0 12 200

Variance

14.1421356Standard deviation = Variance

36

Example of SD with discrete data

• Marks achieved by 7 students: 3, 4, 6, 2, 8, 8, 5

• Mean of these marks = 36/7 = 5.14

• Deviations from mean…

x x-x

3 3 - 5.14= -2.14

4 4 - 5.14= -1.14

6 6 - 5.14= 0.86

2 2 - 5.14= -3.14

8 8 - 5.14= 2.86

8 2.86

5 5 - 5.14= -0.14

Total = 0Problem! The sum of the deviations is always going to be 0!

Solution! Square them to get rid of the negatives…

(x – x)2

37

Example of SD with discrete data

• Marks achieved by 7 students: 3, 4, 6, 2, 8, 8, 5

• Mean of these marks = 36/7 = 5.14

• Deviations from mean…

x x-x

3 3 - 5.14= -2.14 4.59

4 4 - 5.14= -1.14 1.31

6 6 - 5.14= 0.86 0.73

2 2 - 5.14= -3.14 9.88

8 8 - 5.14= 2.86 8.16

8 2.86 8.16

5 5 - 5.14= -0.14 0.02

Total = 0

(x – x)2

Total = 32.85

Variance = 32.85 / 7 = 4.69

SD = √4.69 = 2.17

38

Variability Example: Standard Deviation

Mean: 6

Standard Deviation: 20.2

0.4

100

36004000

10

)60()400(102

2

S

S

S

S

0.210

40

10

)69()68()68()67()67()66()64()64()64()63(

)(

2222222222

2

S

S

n

XXS

2

2

2 )(

n

XXnS

X X2

3 94 164 164 166 367 497 498 648 649 81

Sum: 60 Sum: 400

39

40

41

42

Mean and Standard Deviation

• Using the mean and standard deviation together:– Is an efficient way to describe a distribution with just two numbers.

– Allows a direct comparison between distributions that are on different scales.

43

WHICH MEASURE TO USE ?

DISTRIBUTION OF DATA IS SYMMETRIC

---- USE MEAN & S.D.,

DISTRIBUTION OF DATA IS SKEWED

---- USE MEDIAN & QUARTILES

44

Mean, Median and Mode

45

Distributions

• Bell-Shaped (also known as symmetric” or “normal”)

• Skewed:– positively (skewed to the right) –

it tails off toward larger values

– negatively (skewed to the left) – it tails off toward smaller values

46

15 30 45 60 75

95% Confidence Interval for Mu

28 33 38 43

95% Confidence Interval for Median

Variable: Age

A-Squared:P-Value:

MeanStDevVarianceSkewnessKurtosisN

Minimum1st QuartileMedian3rd QuartileMaximum

32.3851

13.3380

28.0000

0.9620.014

36.450015.7356247.608

0.6796268.51E-02

60

11.000025.000031.500046.750079.0000

40.5149

19.1921

42.0000

Anderson-Darling Normality Test

95% Confidence Interval for Mu

95% Confidence Interval for Sigma

95% Confidence Interval for Median

Descriptive Statistics

47

ANY QUESTIONS