Statistics I Chapter 2: Analysis of univariate data


Chapter 2: Analysis of univariate data

Contents

1. Representations and graphs

- Frequency tables.
- Bar and pie charts, pictograms, histograms, frequency polygons. Other graphs. Lying with graphs.

2. Numerical summary:

- Central tendency (mean, median, mode)
- Location (quartiles and percentiles). Box plots.
- Spread (variance, standard deviation, range, IQR, coefficient of variation)
- Shape (coefficients of asymmetry and kurtosis)


Chapter 2: Analysis of univariate data

Recommended reading

- Peña, D., Romo, J. Introducción a la Estadística para las Ciencias Sociales (1997).
  - Chapters 2, 3, 4 and 5.
- Newbold, P. Statistics for Business and Economics (2008).
  - Chapters 1 and 2.

PID: Mini-Vídeos de autoformación por Internet y teléfonos móviles (self-study mini-videos for the Internet and mobile phones)

- Descriptiva: Una variable cualitativa (Descriptive statistics: one qualitative variable)
- Descriptiva: Una variable cuantitativa (Descriptive statistics: one quantitative variable), except Chebyshev's inequality


Description of qualitative variables

- Sample: 46 professionals at a computer company in the United States.
- Variable: EDUC: education level (1 = High School; 2 = College; 3 = Advanced Degree)
- Variable: MGT: position of responsibility (1 = yes; 0 = no)

To extract information from the data: how can we summarize the raw data in a more useful way that allows a quick visual interpretation?


Description of qualitative variables: frequency tables and bar charts

Education level    Number of employees    Proportion of employees
High School        14                     0.304
College            19                     0.413
Advanced Degree    13                     0.283
Total              46                     1


Description of qualitative variables: frequency tables and bar charts

Position of responsibility    Number of employees    Proportion of employees
No                            26                     0.565
Yes                           20                     0.435
Total                         46                     1


Description of qualitative variables: general outline of a frequency table

Class, x_i    Absolute freq., n_i    Relative freq., f_i
x_1           n_1                    f_1 = n_1/n
x_2           n_2                    f_2 = n_2/n
...           ...                    ...
x_k           n_k                    f_k = n_k/n
Total         n                      1

Note:

- n_i = number of times x_i appears in the sample, f_i = n_i/n
- 0 ≤ f_i ≤ 1
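The outline of a frequency table can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` is hypothetical, and the sample is rebuilt from the counts of the education-level table (14, 19 and 13), since the raw data are not listed in the slides.

```python
from collections import Counter

# Sample rebuilt from the counts in the education-level table
# (14 High School, 19 College, 13 Advanced Degree); the order is arbitrary.
sample = ["High School"] * 14 + ["College"] * 19 + ["Advanced Degree"] * 13


def frequency_table(values):
    """Return {class: (n_i, f_i)} with f_i = n_i / n."""
    n = len(values)
    counts = Counter(values)
    return {x: (n_i, n_i / n) for x, n_i in counts.items()}


table = frequency_table(sample)
for x, (n_i, f_i) in table.items():
    print(f"{x:<16} {n_i:>3} {f_i:.3f}")
```

The relative frequencies add up to 1 by construction, which is a quick sanity check for any hand-built table.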


Description of qualitative variables: the bar chart

- Bars have the same width and are equally spaced, with heights corresponding to the frequencies
- There are gaps between the bars
- Bars are labeled with class names


Other graphics: the pie chart

- Sample: among the 1100 visitors to the art exhibition "Turner and the Masters" (Prado Museum, June 22 to September 19, 2010), those who bought their tickets online (20.3%). Source: Institute for Tourism Studies
- Variable: main reason for buying the ticket online

Source: Instituto de Estudios Turísticos, "Turner y los Maestros" (tables translated from Spanish)

Table 7. Visitors by whether they had to wait to enter the exhibition
Filter: bought the ticket at the box office

                                  %
Had to wait                       12.1
Did not have to wait              87.9
Total                             100.0

Table 8. Visitors by activities carried out while waiting to enter the exhibition
Filter: bought the ticket at the box office and had to wait between buying the ticket and entering the exhibition

                                                               %
Visit the Museum collection                                    16.6
Visit or stay in the Museum café                               7.7
Visit the Museum shop                                          28.1
Stay in or visit other Museum areas that hold no collection    33.0
Wait outside the Museum                                        27.5

Table 9. Visitors by main reason for buying the ticket online
Filter: buys the ticket online

                                                   %
Convenience                                        60.5
Speed                                              10.1
I can choose the day and time of the visit         14.0
I do not have to queue at the box office           9.5
Because the ticket is cheaper                      4.3
24-hour availability                               1.2
I had heard good things about the service          0.4
Total                                              100.0


Other graphics: the pie chart

- Each slice is a fraction of the total size of the pie
- Many software programs rank slices alphabetically
- Although "pretty", pie charts are harder to read than bar charts
- Avoid 3D pie charts: the area in the background appears smaller than the area in the foreground


Other graphics: the Pareto chart

- Bar chart in which the categories of the variable are ranked in order of frequency.
- Applies only to nominal qualitative variables.
- Useful for detecting the most significant "reasons" (a few options account for almost all the purchases)
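The ordering behind a Pareto chart can be sketched as follows, using the percentages of Table 9 (Tabla 9) of the source report. The English labels and the helper name `pareto_order` are assumptions made for this illustration; the cumulative column is an addition that makes the "few options account for most purchases" reading explicit.

```python
# Percentages from Table 9 (reasons for buying the ticket online);
# labels are shortened English translations used only in this sketch.
reasons = {
    "Convenience": 60.5,
    "Speed": 10.1,
    "Can choose day and time": 14.0,
    "No queue at the box office": 9.5,
    "Cheaper ticket": 4.3,
    "24-hour availability": 1.2,
    "Heard good things about the service": 0.4,
}


def pareto_order(freqs):
    """Sort categories by decreasing frequency and attach cumulative totals."""
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for label, value in ranked:
        running += value
        rows.append((label, value, round(running, 1)))
    return rows


pareto = pareto_order(reasons)
for label, value, cumulative in pareto:
    print(f"{label:<38} {value:>5.1f} {cumulative:>6.1f}")
```

Plotting the first column as bars in this order, optionally with the cumulative column as a line, gives the usual Pareto display.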


Other graphics: pictograms

- Sample: 70 university students from Madrid
- Variable: preferred political party

Preferred political party    Number of students    Proportion of students
PSOE                         23                    0.33
PP                           15                    0.21
Unidos Podemos               20                    0.29
Ciudadanos                   7                     0.10
Others                       5                     0.07
Total                        70                    1

The area of the graph is proportional to the frequency.


Exercise

Results from a survey of 15- to 20-year-olds about their favorite leisure activity

- What is the variable of interest, and who are the individuals?
- For what percentage of young people is reading the preferred leisure activity?


Exercise

From a test taken by a group of students, graded between 1 and 8, the following table has been obtained (cells marked ? are blank):

Grade, x_i    n_i    f_i
1             4      0.08
2             4      ?
3             ?      0.16
4             7      0.14
5             5      ?
6             10     ?
7             7      0.14
8             ?      ?

- How many students have taken the test?
- What percentage of students have obtained a grade greater than or equal to 6?
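One possible way to check an answer to this exercise is sketched below. It assumes the flattened table reads as the (grade, n_i, f_i) triples listed in the code, with None for blank cells; that parsing is an interpretation of the slide, not something the source states explicitly.

```python
# Rows read from the exercise table; None marks a blank cell.
# Assumption: the table parses as (grade, n_i, f_i) as listed here.
rows = [
    (1, 4, 0.08), (2, 4, None), (3, None, 0.16), (4, 7, 0.14),
    (5, 5, None), (6, 10, None), (7, 7, 0.14), (8, None, None),
]

# Any row with both n_i and f_i gives the sample size, since f_i = n_i / n.
n = round(next(n_i / f_i for _, n_i, f_i in rows if n_i and f_i))

# Fill in n_i where only f_i is known.
counts = {}
for grade, n_i, f_i in rows:
    if n_i is None and f_i is not None:
        n_i = round(f_i * n)
    counts[grade] = n_i

# A single remaining blank must hold whatever is left of the total.
missing = [grade for grade, c in counts.items() if c is None]
known_total = sum(c for c in counts.values() if c is not None)
if len(missing) == 1:
    counts[missing[0]] = n - known_total

share_6_or_more = sum(c for grade, c in counts.items() if grade >= 6) / n
print(n, counts, f"{share_6_or_more:.0%}")
```

The same two facts do all the work by hand: f_i = n_i / n fixes n, and the absolute frequencies must add up to n.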


Exercise

In a survey about health habits, 30 randomly chosen students have been asked about the sport they usually practice. The results are shown in the following table:

Sport, x_i    n_i    f_i
Basketball    12     0.4
Swimming      3      0.1
Football      9      0.3
None          6      0.2
Total         30     1

To which of the following bar charts do the data correspond?


Exercise

[Figure: four candidate bar charts, a) to d), each titled "Deporte" (Sport), with a vertical axis from 0 to 14 and the categories Baloncesto (Basketball), Natación (Swimming), Fútbol (Football) and Ningún deporte (No sport) on the horizontal axis.]


Description of discrete quantitative variables: the table of frequencies

- Sample: 50 plants treated with a new pesticide
- Variable: the number of leaves attacked by a pest.

x_i      Absolute freq., n_i    Relative freq., f_i    Cumulative absolute freq., N_i    Cumulative relative freq., F_i
0        6                      0.12                   6                                 0.12
1        10                     0.20                   16                                0.32
2        12                     0.24                   28                                0.56
3        8                      0.16                   36                                0.72
4        5                      0.10                   41                                0.82
5        4                      0.08                   45                                0.90
6        3                      0.06                   48                                0.96
8        1                      0.02                   49                                0.98
10       1                      0.02                   50                                1
Total    50                     1


Description of discrete quantitative variables: the table of frequencies

- What percentage of the sampled plants had exactly 4 leaves attacked?
- How many plants had no more than 3 leaves attacked?
- How many plants had at least 5 leaves attacked?
- What percentage of plants had between 3 and 5 leaves attacked?
- What percentage of plants had at least 8 leaves attacked?
- What percentage of plants had at most 3 leaves attacked?
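Questions of this kind can be read off the table or checked with a short script. The sketch below uses the values and absolute frequencies of the pesticide table; the helper `count_where` is a hypothetical name, and "between 3 and 5" is assumed to include both end values.

```python
from itertools import accumulate

# Values and absolute frequencies from the pesticide table (n = 50 plants).
x = [0, 1, 2, 3, 4, 5, 6, 8, 10]
n_i = [6, 10, 12, 8, 5, 4, 3, 1, 1]

n = sum(n_i)
N_i = list(accumulate(n_i))      # cumulative absolute frequencies
f_i = [c / n for c in n_i]       # relative frequencies
F_i = [c / n for c in N_i]       # cumulative relative frequencies


def count_where(condition):
    """Number of plants whose value of x satisfies the condition."""
    return sum(c for value, c in zip(x, n_i) if condition(value))


exactly_4 = count_where(lambda v: v == 4) / n
at_most_3 = count_where(lambda v: v <= 3)
at_least_5 = count_where(lambda v: v >= 5)
between_3_and_5 = count_where(lambda v: 3 <= v <= 5) / n   # inclusive (assumed)
at_least_8 = count_where(lambda v: v >= 8) / n
share_at_most_3 = at_most_3 / n

print(exactly_4, at_most_3, at_least_5, between_3_and_5, at_least_8, share_at_most_3)
```

"At most" questions correspond directly to the cumulative columns, while "at least" questions use the complement, for example n minus the cumulative count just below the threshold.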


Description of discrete quantitative variables: the bar chart

Bar charts can also be created for discrete data if there are not too many different values.


Description of discrete quantitative variables: the general format of the table

Class, x_i    Absolute freq., n_i    Relative freq., f_i    Cumulative absolute freq., N_i    Cumulative relative freq., F_i
x_1           n_1                    f_1 = n_1/n            N_1 = n_1                         F_1 = f_1
x_2           n_2                    f_2 = n_2/n            N_2 = N_1 + n_2                   F_2 = F_1 + f_2
...           ...                    ...                    ...                               ...
x_k           n_k                    f_k = n_k/n            N_k = n                           F_k = 1
Total         n                      1

Note:

- x_1 < x_2 < ... < x_k
- n_i = number of times x_i appears in the sample, f_i = n_i/n
- N_i = N_{i-1} + n_i, F_i = F_{i-1} + f_i
- 0 ≤ f_i, F_i ≤ 1
- F_i and N_i also make sense for qualitative ordinal variables


Qualitative ordinal variables: cumulative frequencies

We can also include cumulative frequencies in the table.

- Sample: 901 employees.
- Variable: level of satisfaction (VU = very unsatisfied, U = unsatisfied, S = satisfied, VS = very satisfied)

Class    Absolute freq.    Relative freq.    Cumulative absolute freq.    Cumulative relative freq.
VU       62                0.07              62                           0.07
U        108               0.12              170                          0.19
S        319               0.35              489                          0.54
VS       412               0.46              901                          1
Total    901               1


Qualitative ordinal variables: bar charts with cumulative frequencies

Attention! Many software programs sort the categories of a qualitative variable alphabetically. For an ordinal variable, the categories must be ranked in ascending order.
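The effect of the ordering can be seen in a small sketch based on the satisfaction table. The explicit list `ordinal` is an assumption about how one might encode the ascending order by hand; real plotting or statistics packages usually have their own mechanism for ordered categories.

```python
from itertools import accumulate

# Absolute frequencies from the satisfaction table (n = 901 employees).
counts = {"S": 319, "VS": 412, "U": 108, "VU": 62}

# Alphabetical order, which many tools apply by default to qualitative data.
alphabetical = sorted(counts)

# Ascending level of satisfaction, stated explicitly.
ordinal = ["VU", "U", "S", "VS"]

n = sum(counts.values())
cumulative = dict(zip(ordinal, accumulate(counts[c] for c in ordinal)))
cumulative_rel = {c: round(N / n, 2) for c, N in cumulative.items()}

print(alphabetical)
print(cumulative)
print(cumulative_rel)
```

Cumulative frequencies are only meaningful in the second ordering: accumulating over the alphabetical list would add "satisfied" before "unsatisfied" and give totals with no interpretation.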


Bar charts for discrete data

- Sample: 46 professionals at a computer company in the United States.
- Variable: EXPRNC: number of years working in the company

Experience, x_i    Absolute freq., n_i    Relative freq., f_i
1                  5                      0.109
2                  4                      0.087
3                  4                      0.087
4                  4                      0.087
5                  3                      0.065
6                  4                      0.087
7                  1                      0.022
8                  4                      0.087
10                 4                      0.087
11                 2                      0.043
12                 2                      0.043
13                 2                      0.043
14                 1                      0.022
15                 1                      0.022
16                 3                      0.065
17                 1                      0.022
20                 1                      0.022
Total              46                     1


Description of discrete quantitative variables: the bar chart

Too many different values.


Description of continuous quantitative variables

- Sample: 46 professionals at a computer company in the United States.
- Variable: EXPRNC: years of experience
- Variable: SALARY: annual gross income (in US dollars)


Grouping by class intervals: continuous (or discrete) data

Class interval    Midpoint                   n_i    f_i    N_i    F_i
[l_0, l_1]        x_1 = (l_0 + l_1)/2        n_1    f_1    N_1    F_1
(l_1, l_2]        x_2 = (l_1 + l_2)/2        n_2    f_2    N_2    F_2
...               ...                        ...    ...    ...    ...
(l_{k-1}, l_k]    x_k = (l_{k-1} + l_k)/2    n_k    f_k    n      1
Total                                        n      1

Note:

- By convention (the one used in Excel), the left end-point is excluded and the right end-point is included
- The reverse end-point convention can also be applied; check how your software defines it
- Useful for tabulating discrete data if X takes many values


Grouping by class intervals

- Very often class intervals have the same width
- Determine the width w of each interval by

  w = (largest number - smallest number) / (number of desired intervals)

- How many intervals? Roughly between 5 and 20. More specifically:
  - k ≈ √n if n is "small"
  - k ≈ 1 + 3.322 log10(n) if n is "large" (Sturges' rule, equivalent to k ≈ 1 + log2(n))
- Intervals never overlap
- Round up the interval width to get convenient interval end-points


Grouping by class intervals: the histogram and the frequency polygon

- Find the range: 20 − 1 = 19
- Select the number of classes: say k = √46 = 6.78 ≈ 7
- Compute the interval width: 19/7 = 2.71 ⇒ 3
- Determine the end-points (beginning before the first value and ending after the last one): [0, 3], (3, 6], . . . , (18, 21]
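The four steps can be reproduced in Python as a rough sketch. It uses the EXPRNC frequency table from the earlier slide; starting the first interval one unit below the minimum is an assumption chosen to match the end-points above, and `class_index` is a hypothetical helper that applies the right-closed convention.

```python
import math

# EXPRNC values and absolute frequencies from the experience table (n = 46).
freq = {1: 5, 2: 4, 3: 4, 4: 4, 5: 3, 6: 4, 7: 1, 8: 4, 10: 4, 11: 2,
        12: 2, 13: 2, 14: 1, 15: 1, 16: 3, 17: 1, 20: 1}

n = sum(freq.values())
data_range = max(freq) - min(freq)      # 20 - 1
k = round(math.sqrt(n))                 # square-root rule
width = math.ceil(data_range / k)       # round the width up

# Assumption: start one unit below the minimum, as in the slide's [0, 3].
start = min(freq) - 1
edges = [start + j * width for j in range(k + 1)]


def class_index(value):
    """Index of the right-closed interval holding value; the first is [l_0, l_1]."""
    if value == edges[0]:
        return 0
    return math.ceil((value - edges[0]) / width) - 1


grouped = [0] * k
for value, count in freq.items():
    grouped[class_index(value)] += count

for left, right, count in zip(edges, edges[1:], grouped):
    print(f"({left}, {right}]  {count}")
```

The printed labels show every interval as left-open for brevity; only the first one is closed on both sides. Checking that the grouped counts still add up to n guards against values falling outside the chosen end-points.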


Description of quantitative variables: the histogram and the frequency polygon

- There are no gaps between the bars/bins
- Bin widths = widths of class intervals (identical); class boundaries are marked on the horizontal axis
- Bin heights = frequencies (here, absolute)
- Bin areas are proportional to the frequencies
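These properties can be checked numerically. The sketch below assumes the intervals [0, 3], (3, 6], ..., (18, 21] and the absolute frequencies obtained by grouping the EXPRNC table into them (13, 11, 5, 8, 4, 4, 1, computed for this illustration). It also lists the vertices of the frequency polygon, taken here as the midpoints of the bar tops.

```python
# Class boundaries and grouped absolute frequencies for EXPRNC
# (the counts are computed for this sketch from the experience table).
edges = [0, 3, 6, 9, 12, 15, 18, 21]
counts = [13, 11, 5, 8, 4, 4, 1]

widths = [b - a for a, b in zip(edges, edges[1:])]
midpoints = [(a + b) / 2 for a, b in zip(edges, edges[1:])]

# With equal widths, height = frequency gives area = width * frequency,
# so every bar has the same area per unit of frequency.
areas = [w * c for w, c in zip(widths, counts)]
area_per_unit_frequency = {a / c for a, c in zip(areas, counts)}

# Frequency polygon: join the midpoints of the tops of the bars.
polygon = list(zip(midpoints, counts))

print(area_per_unit_frequency)
print(polygon)
```

If the class widths were unequal, heights would have to be frequency divided by width for the areas to stay proportional to the frequencies.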


Quantitative variables: the histogram


Description of quantitative variables: the histogram and the frequency polygon


Other graphics: cartograms (INE, Encuesta de Turismo de Residentes)

Average travel expenses per person (third quarter, 2016)

Average expenditure on excursions per person (third quarter, 2016)


Other graphics: pictograms


Other graphics: time series

INE, Encuesta de Población Activa


How to lie with pictograms

Published in "La Voz de Galicia" on October 24, 2010.

- Making the height proportional to the frequency gives a false impression: when the whole figure is scaled, its area grows faster than the frequency.
- Is there anything else you don't like?
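The size of the distortion can be illustrated with a little arithmetic. The sketch below does not use the newspaper's figures, which are not reproduced here; instead it borrows two counts from the earlier pictogram example (PSOE 23, Ciudadanos 7) and assumes the icon keeps its aspect ratio when it is enlarged.

```python
# Counts for two parties from the pictogram example (PSOE 23, Ciudadanos 7).
larger, smaller = 23, 7

frequency_ratio = larger / smaller

# Height scaled by frequency with a fixed aspect ratio (assumed):
# the width grows too, so the drawn area grows with the square of the ratio.
height_ratio = frequency_ratio
area_ratio_height_scaled = height_ratio ** 2

# Scaling each side by the square root keeps area proportional to frequency.
side_ratio_area_scaled = frequency_ratio ** 0.5
area_ratio_area_scaled = side_ratio_area_scaled ** 2

print(f"frequency ratio:            {frequency_ratio:.2f}")
print(f"area ratio, height-scaled:  {area_ratio_height_scaled:.2f}")
print(f"area ratio, area-scaled:    {area_ratio_area_scaled:.2f}")
```

A frequency about three times larger ends up drawn with an area more than ten times larger, which is why the area, not the height, should follow the frequency.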


Lying with graphs

Improper use of scales: the coordinate origin is not 0


Lying with graphs


Lying with graphs

The scale is upside down