
Lesson 2: Descriptive Statistics

Ka-fu WONG

September 15, 2004

There are many ways to describe the data collected. Graphical descriptions are often used. They are easy to learn but difficult to master. Nowadays, standard data management programmes, such as Excel, allow us to produce graphics on the fly. For this reason, we will not spend time on graphical presentations such as histograms and stem-and-leaf plots. Most introductory business statistics textbooks have a chapter on the topic. If you are interested in learning more about graphical presentation of data, I recommend two books:

1. Cohn, Victor (1989): News & Numbers: A guide to reporting statistical claims and controversies in health and other fields, Iowa State University Press.

2. Spirer, Herbert F., Louise Spirer, and A.J. Jaffe (1998): Misused Statistics, Marcel Dekker, Inc.

Here, we shall focus on selected descriptive statistics that we often use in the Economics and Finance discipline. In the following discussion, we do not distinguish between the population statistics and their sample counterparts. In most cases, population statistics and their sample counterparts use exactly the same formula. The approach of using the sample counterparts to estimate the population statistics is known as “analog estimation methods”.¹ In the following discussion, the only exception to this rule is the sample variance formula.

Definition 1 (Population Parameter): A population parameter is a number calculated from all the population measurements that describes some aspect of the population. Population parameters are generally unknown and need to be estimated. Often population parameters are denoted by Greek letters (α, β, θ, etc.). For example, the population mean, often denoted by µ, is a population parameter and is the average of the population measurements.

¹ Additional discussion may be found in a book by Charles F. Manski: Analog estimation methods in econometrics. New York: Chapman and Hall, 1988.

Ka-fu WONG, September 15, 2004. ECON1003 Lesson 2: Descriptive Statistics

Definition 2 (Estimator): An estimator is a formula or a rule that takes a set of data and returns an estimate of the quantity we are interested in.

$$\theta(x_1, x_2, \ldots, x_n)$$

Generally, the formula will return two different numbers for two different samples.
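As a minimal sketch of this idea, the snippet below treats the sample mean as an estimator and applies it to two different samples. The population of the integers 1 to 100 and the two hand-picked samples are hypothetical, chosen only so the arithmetic is easy to check.

```python
def sample_mean(sample):
    """An estimator: a rule that maps any sample to an estimate of the population mean."""
    return sum(sample) / len(sample)

# Hypothetical population (an assumption for illustration): 1, 2, ..., 100.
population = list(range(1, 101))
mu = sum(population) / len(population)  # population parameter, usually unknown in practice

# Two different samples of ten observations each.
sample_a = population[0::10]  # 1, 11, ..., 91
sample_b = population[5::10]  # 6, 16, ..., 96

print(mu, sample_mean(sample_a), sample_mean(sample_b))  # 50.5 46.0 51.0
```

The same rule returns 46.0 for one sample and 51.0 for the other, and neither equals the population value of 50.5.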

Definition 3 (Point Estimate): A point estimate is a one-number estimate of the value of a

population parameter. In other words, a point estimate is a realization of the estimator based

on a sample. Often point estimates are denoted by English letters (a, b, c, etc.).

Definition 4 (Sample Statistic): A sample statistic is a number calculated using sample measurements that describes some aspect of the sample. Thus, a sample statistic could be mechanically computed and need not be a point estimate of a population parameter we are interested in. Of course, a statistic is often a point estimate of a population parameter we are interested in.

1 Measure of central tendency

There are three major statistics that measure the central tendency: mean, median and mode. They help us

to answer questions like:

1. What is the likely income of a fresh university graduate?

2. What is the likely age of a first-year university student?

3. What is the likely stock return for a hedge fund?

In essence, we want to find a single number $b$ such that the difference between a randomly drawn observation and this number is small on average, based on some criterion. Let $x$ be the randomly drawn observation. Different criteria lead to different measures of central tendency. Historically, when computing power was limited, we often considered the squared difference, i.e., $(x - b)^2$. Then, the measure of central tendency is the solution to the minimization problem

$$\min_b \sum_{i=1}^{n} (x_i - b)^2.$$

It turns out that solving this minimization problem is relatively undemanding in computing resources.


Recently, as computing power has become readily available, we are willing to consider an alternative criterion: the absolute difference, i.e., $|x - b|$, where $|\cdot|$ denotes “the absolute value of”. Then, the measure of central tendency is the solution to the minimization problem

$$\min_b \sum_{i=1}^{n} |x_i - b|.$$

Although solving this minimization problem costs substantial computing resources, the solution has some desirable properties, as discussed below.

The major reason for the dominance of the “squared difference” criterion over the “absolute difference” criterion is that the former minimization problem is smooth, i.e., differentiable. Thus, we may use differentiation to reduce its demand on computing resources. We may also use differentiation to deduce some properties of the “estimator”.
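As a sketch of how differentiation helps, setting the derivative of the squared-difference objective with respect to $b$ to zero gives the solution in closed form. The symbol $b^*$ for the minimizer is introduced here only for illustration.

```latex
\begin{align*}
\frac{d}{db} \sum_{i=1}^{n} (x_i - b)^2 &= -2 \sum_{i=1}^{n} (x_i - b) = 0 \\
\Longrightarrow \quad \sum_{i=1}^{n} x_i &= n\, b^* \\
\Longrightarrow \quad b^* &= \frac{1}{n} \sum_{i=1}^{n} x_i .
\end{align*}
```

The second derivative is $2n > 0$, so this stationary point is a minimum. The absolute-difference objective offers no such shortcut, because $|x_i - b|$ is not differentiable at $b = x_i$.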

We will see these criteria again in our study of Econometrics (Economic Statistics) later.

These three statistics have different properties. Depending on the question we are asking, the likely

population distribution and the computing resources we have, we may want to focus on one of them or

report all of them. Often, we report all of them unless we are short of computing resources.

Definition 5 (Mean): The mean of a collection of $n$ observations (say, $\{x_1, x_2, \ldots, x_n\}$) is defined by

$$m = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} \frac{1}{n} x_i$$

Sometimes, we want to emphasize that the mean depends on the $n$ observations and write $m(x_1, x_2, \ldots, x_n)$ or $m(s_n)$, where $s_n \equiv \{x_1, x_2, \ldots, x_n\}$. We note that $m$ is the solution to the minimization problem

$$\min_b \sum_{i=1}^{n} (x_i - b)^2$$

and has historically been used often because of its computational convenience.

Mean is a very important concept in Statistics and Economics. Population mean is also known as

“expected value” to students in Economics and Finance. Sample mean is often used to estimate

the population mean.


Example 1 (Simple Mean): Suppose that a sample of 10 observations has been collected: 3, 5, 3, 4, 2, 1, 0, 6, 9, 7. The simple mean is

$$(3 + 5 + 3 + 4 + 2 + 1 + 0 + 6 + 9 + 7)/10 = 40/10 = 4$$
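The calculation in Example 1 can be sketched in a few lines of Python. The helper names `mean` and `sum_sq` are illustrative choices, not part of the lesson. The second helper evaluates the squared-difference criterion, so we can check that moving away from the mean raises it.

```python
def mean(xs):
    """Simple mean: sum of the observations divided by their count."""
    return sum(xs) / len(xs)

def sum_sq(xs, b):
    """Squared-difference criterion evaluated at a candidate value b."""
    return sum((x - b) ** 2 for x in xs)

data = [3, 5, 3, 4, 2, 1, 0, 6, 9, 7]
m = mean(data)
print(m)  # 4.0

# Moving b away from the mean in either direction increases the criterion.
print(sum_sq(data, m), sum_sq(data, m - 0.5), sum_sq(data, m + 0.5))  # 70.0 72.5 72.5
```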

Challenge 1 (Mean of repeated observations): Consider a collection of 781 observations which take only three values, as follows:

Value                    3     5     7
No. of observations    234   200   347

Can you compute its mean?

For a large number of repeated observations, the following weighted mean formula is handy.

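Definition 6 (Weighted Mean): Suppose we have the following collection of repeated observations:

Value                  $x_1$   $x_2$   $x_3$   ...   $x_n$
No. of observations    $w_1$   $w_2$   $w_3$   ...   $w_n$

The weighted mean is defined by

$$m = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \sum_{i=1}^{n} \frac{w_i}{\sum_{j=1}^{n} w_j} x_i = \sum_{i=1}^{n} w_i^* x_i$$

where $w_i$ is called the positive weight for observation $x_i$ and $w_i^* = w_i / \sum_{j=1}^{n} w_j$ is the corresponding normalized weight. It may be demonstrated that $\sum_{i=1}^{n} w_i^* = 1$. The simple mean is the special case with equal weights ($w_i^* = 1/n$) for all observations.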
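The two claims about the normalized weights follow in one line each. Writing $w_i^* = w_i / \sum_{j=1}^{n} w_j$, and using $c$ as an illustrative symbol for a common positive weight:

```latex
\begin{align*}
\sum_{i=1}^{n} w_i^* &= \sum_{i=1}^{n} \frac{w_i}{\sum_{j=1}^{n} w_j}
                      = \frac{\sum_{i=1}^{n} w_i}{\sum_{j=1}^{n} w_j} = 1, \\
w_1 = \cdots = w_n = c > 0 \quad &\Longrightarrow \quad w_i^* = \frac{c}{n c} = \frac{1}{n}.
\end{align*}
```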

Example 2 (Mean of repeated observations): Consider a collection of 781 observations which take only three values, as follows:

Value                    3     5     7
No. of observations    234   200   347

The mean can be computed as

$$\frac{3 \times 234 + 5 \times 200 + 7 \times 347}{234 + 200 + 347} = \frac{4131}{781} \approx 5.29$$


Note that we are not explicitly asked to compute a weighted mean, but the weighted mean formula is convenient.
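A minimal sketch of the weighted mean formula, applied to the 781 observations of Example 2. The function name `weighted_mean` is an illustrative choice.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight

values = [3, 5, 7]
counts = [234, 200, 347]  # number of observations taking each value

m = weighted_mean(values, counts)
print(round(m, 2))  # 5.29
```

Using the counts as weights avoids writing out all 781 observations individually.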

Example 3 (Proportions interpreted as mean (I)): The proportion of observations with a certain characteristic in a collection of observations may be computed using the mean formula. Suppose we are interested in the proportion of males in a class of 40. We can define an indicator variable for each person $i$ in the class:

$$x_i = \begin{cases} 1 & \text{if the student is male} \\ 0 & \text{otherwise} \end{cases}$$

The proportion is simply

$$m = \frac{x_1 + x_2 + \cdots + x_{40}}{40} = \frac{\sum_{i=1}^{40} x_i}{40}$$

Interpreting a proportion as a mean greatly simplifies our calculation. Often, survey results are coded into numbers. For example, in census data, the sex variable is often coded 1 for male and 2 for female. While the simple average of this variable is slightly difficult to interpret, the simple average after a “re-coding” to 1 for male and 0 for female may be interpreted as the proportion of males. How should we code the data if we want to compute the proportion of females in a class?

Example 4 (Proportions interpreted as mean (II)): Suppose the following are the grades of the 13 students in a class: {A, B, A−, C, D, B, B−, A−, A, A+, B+, B, B}. What is the proportion of students who get a B grade or above?

First, we convert the grades into an indicator:

$$x_i = \begin{cases} 1 & \text{if the student gets a B grade or above} \\ 0 & \text{otherwise} \end{cases}$$

In terms of indicators, we have {1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1}. The sample proportion is simply

$$m = \frac{x_1 + x_2 + \cdots + x_{13}}{13} = \frac{10}{13} \approx 0.77$$

Thus, about 77% of the students in the class got a B grade or above.
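The indicator coding of Example 4 can be sketched as follows. The grades are typed with an ASCII hyphen for the minus sign, and the set `B_OR_ABOVE` is an assumption about which grades count as "B or above" (B− is treated as below B, as in the example).

```python
grades = ["A", "B", "A-", "C", "D", "B", "B-", "A-", "A", "A+", "B+", "B", "B"]

# Grades counted as "B or above"; B- is treated as below B.
B_OR_ABOVE = {"A+", "A", "A-", "B+", "B"}

# Indicator variable: 1 if the grade is B or above, 0 otherwise.
indicators = [1 if g in B_OR_ABOVE else 0 for g in grades]

# The proportion is the simple mean of the indicators.
proportion = sum(indicators) / len(indicators)
print(indicators)
print(round(proportion, 2))  # 0.77
```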


Definition 7 (Percentile): The $p$-th percentile of a set of data is the number such that $p\%$ of the data is less than that number. Operationally, determining the $p$-th percentile for a collection of $n$ observations takes two steps.

1. Rank the data in ascending order (from the smallest to the largest).

2. The $p$-th percentile is the $k$-th observation in ascending order, where $k = \mathrm{ceil}(n \times p/100)$ and $\mathrm{ceil}(\cdot)$ means rounding up to the nearest integer.

Example 5 (Percentile): Suppose we have a collection of 1000 observations, arranged in ascending order: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, ..., 99.7, 99.8, 99.9, 100.0}. What is the 75-th percentile?

The 75-th percentile is the 750-th observation, since $k = \mathrm{ceil}(1000 \times 75/100) = 750$. The 750-th observation is 75.0.
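The two-step rule can be sketched directly in Python. The function name `percentile` is illustrative. Note that statistical packages often use interpolation rules that differ from this one.

```python
import math

def percentile(data, p):
    """p-th percentile by the two-step rule: rank the data, then take the
    k-th observation with k = ceil(n * p / 100), counting from 1."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    ranked = sorted(data)
    k = math.ceil(len(ranked) * p / 100)
    return ranked[k - 1]  # convert the 1-based position to a 0-based index

observations = [i / 10 for i in range(1, 1001)]  # 0.1, 0.2, ..., 100.0
print(percentile(observations, 75))  # 75.0
```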

Challenge 2 (Percentile): Suppose we have a collection of 2000 observations, arranged in ascending order: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, ..., 199.7, 199.8, 199.9, 200.0}. What is the 75-th percentile?

Definition 8 (Median): The median of a set of data is the number such that 50% of the data is less than that number. In other words, the sample median is the middle observation of the ordered data. Suppose the data $\{x_1, x_2, \ldots, x_n\}$ have been ordered ascendingly, $x_1 \le x_2 \le \cdots \le x_n$. Then

• If the number of observations, $n$, is an odd number, the median is the $[(n+1)/2]$-th ordered observation.

• Otherwise, if the number of observations, $n$, is an even number, the median is the point halfway between the $(n/2)$-th observation and the $(n/2+1)$-th observation in the ordered list. Note that this will not be an observation, unless the two observations in question are equal.

The median is also known as the second quartile, or the 50th percentile.

We note that the median is a solution to the minimization problem

$$\min_b \sum_{i=1}^{n} |x_i - b|.$$
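The odd/even rule and the minimization property can be checked with a short sketch. The helper names and the coarse grid of candidate values are illustrative assumptions. The data are the ten observations used in Example 1.

```python
def median(xs):
    """Median by the odd/even rule applied to the ordered data."""
    ranked = sorted(xs)
    n = len(ranked)
    if n == 0:
        raise ValueError("xs must not be empty")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # the [(n + 1) / 2]-th observation
    return (ranked[mid - 1] + ranked[mid]) / 2   # halfway between the two middle ones

def sum_abs(xs, b):
    """Absolute-difference criterion evaluated at a candidate value b."""
    return sum(abs(x - b) for x in xs)

data = [3, 5, 3, 4, 2, 1, 0, 6, 9, 7]
med = median(data)

# Compare against a grid of candidates 0.0, 0.5, ..., 9.0.
candidates = [i / 2 for i in range(19)]
best = min(sum_abs(data, b) for b in candidates)
print(med, sum_abs(data, med), best)  # 3.5 22.0 22.0
```

With an even number of observations, every value between the two middle observations (here 3 and 4) attains the same minimum, which is why the median is one solution rather than the only one.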


Example 6 (Median): Consider a collection of 10 observations: 3, 5, 3, 4, 2, 1, 0, 6, 9, 7. To find the median, we first rank the data from the smallest to the largest: {0, 1, 2, 3, 3, 4, 5, 6, 7, 9}. Because we have an even number of observations, the median is the average of the two middle values:

$$(3 + 4)/2 = 3.5$$

Example 7 (Effect of an extreme observation on median and mean): Two samples of five executives received the following bonuses last year (in thousands of dollars):

sample #1    15   17   16   15   200
sample #2    15   17   16   15    18

The executive bonus of 200 in the first sample appears to be an extreme observation. The following are the sample mean and median of the first and second samples.

           sample #1   sample #2
mean           52.60       16.20
median         16.00       16.00

Another way to see the impact of an extreme observation is to compute the sample mean and median of the first sample with and without the extreme observation.

            with   without
mean       52.60     15.75
median     16.00     15.50

Thus, the mean can be greatly affected by extreme observations, whereas the median is affected much less.
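The figures in Example 7 can be reproduced with Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

sample_1 = [15, 17, 16, 15, 200]  # contains the extreme bonus of 200
sample_2 = [15, 17, 16, 15, 18]
sample_1_without = [x for x in sample_1 if x != 200]  # drop the extreme observation

for label, sample in [("sample #1", sample_1),
                      ("sample #2", sample_2),
                      ("sample #1 without 200", sample_1_without)]:
    print(f"{label}: mean = {mean(sample):.2f}, median = {median(sample):.2f}")
```

Dropping the single extreme value moves the mean from 52.60 to 15.75 but the median only from 16.00 to 15.50.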

Example 8 (Monthly income from main employment 2001): Possibly because the median is less sensitive to extreme observations, the Census and Statistics Department has chosen to report the median income of employees surveyed in 2001. The following table shows the median monthly income from main employment of employees by education in 2001, extracted from the 2001 Population Census Main Report.


Education                         Male    Female
No schooling/Kindergarten        8,000     4,600
Primary                          9,500     5,500
Lower secondary                 10,000     6,500
Upper secondary                 12,000    10,000
Matriculation                   14,500     8,800
Tertiary: Non-degree course     20,000    15,300
Tertiary: Degree course         26,250    18,000

From the table, one can easily observe that, at every education level, the median income of female employees is lower than that of male employees.

The following table shows the median monthly income from main employment of employees by industry in 2001, extracted from the 2001 Population Census Main Report.

Industry                                                                Male    Female
Manufacturing                                                         12,000     8,500
Construction                                                          10,000     8,600
Wholesale, retail and import/export trades, restaurants and hotels    11,000     8,000
Transport, storage and communications                                 10,500    10,000
Financing, insurance, real estate and business services               15,000    13,000
Community, social and personal services                               15,000     6,200
Others                                                                13,048     8,800

The tables have many potential uses. What conclusions could we draw if we were analysts for

1. The University Admissions Tutor, who wants to persuade secondary school students to pursue a degree course?

2. The Admissions Tutor of the Economics and Finance programmes, who wants to persuade students to choose his programmes?

3. The Equal Opportunities Commission, which wants to see whether there is any violation of equal opportunities law?

Definition 9 (First Quartile): The first quartile of a set of data is the number such that 25%

of the data is less than that number. Thus, first quartile is also known as 25th percentile.


Definition 10 (Third Quartile): The third quartile of a set of data is the number such that 75%

of the data is less than that number. Thus, third quartile is also known as 75th percentile.

Note that the operational definitions of the first quartile and third quartile may vary slightly across textbooks and software packages.

Definition 11 (Mode): Mode of a set of data is the most common value found in the set of data.

A set of data can have more than one modal value.
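A minimal sketch of finding every modal value, using `collections.Counter` from the standard library. The function name `modes` is an illustrative choice.

```python
from collections import Counter

def modes(xs):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)

print(modes([3, 5, 3, 4, 2, 1, 0, 6, 9, 7]))  # [3]
print(modes([1, 1, 2, 2, 3]))                 # [1, 2], i.e. two modal values
```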

2 Measuring Dispersion

The accuracy of central tendency measures (i.e., how close the point estimate is to the parameter we are interested in) depends on how spread out the data are, i.e., on their dispersion. If the data are very concentrated, we will be more confident that the central tendency measure is an accurate answer to the question we have in mind, say, the likely income earned by a fresh university graduate. If the data are very dispersed, we will be less confident that the central tendency measure is an accurate answer to the question we have in mind.

Definition 12 (Minimum, Maximum and Range): Rank the data in ascending order. The first

observation in the ranked data is the minimum. The last observation in the ranked data is the

maximum. The difference between the largest and the smallest value in the dataset is the range.

Suppose the ranked data are: 0, 1, 2, 3, 3, 4, 5, 6, 7, 9. The first observation 0 is the minimum.

The last observation 9 is the maximum. The range is 9− 0 = 9.

Definition 13 (Variance and Standard Deviation): Both variance and standard deviation are measures of how spread out the data are. The variance of a collection of $n$ observations is computed as the average squared deviation of each number from the mean:

$$v = \frac{(x_1 - m)^2 + (x_2 - m)^2 + \cdots + (x_n - m)^2}{n} = \frac{\sum_{i=1}^{n} (x_i - m)^2}{n}$$

where $m$ is the mean of the $n$ observations.

The standard deviation is the square root of the variance:

$$s = \sqrt{v}.$$


Note that the above variance formula, when applied to a sample of data as an estimator for the population variance, is generally biased. The unbiased estimator for the population variance turns out to be

$$v^* = \frac{n}{n-1} v = \frac{\sum_{i=1}^{n} (x_i - m)^2}{n-1}$$

where $m$ is the sample mean. We divide by “$n - 1$” instead of “$n$” because $m$ has to be estimated from the same sample. It is often said that “a degree of freedom” is lost in the computation of the sample variance. The difference between $v$ and $v^*$ will be negligible when $n$ is large.
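The two variance formulas can be compared with a short sketch. The function name `variance` and its `unbiased` flag are illustrative choices. The data are the ten observations used in Example 1.

```python
import math

def variance(xs, unbiased=False):
    """Average squared deviation from the mean.
    Divides by n by default (v), or by n - 1 when unbiased=True (v*)."""
    n = len(xs)
    if n == 0 or (unbiased and n < 2):
        raise ValueError("not enough observations")
    m = sum(xs) / n
    total = sum((x - m) ** 2 for x in xs)
    return total / (n - 1 if unbiased else n)

data = [3, 5, 3, 4, 2, 1, 0, 6, 9, 7]
v = variance(data)                       # 70 / 10
v_star = variance(data, unbiased=True)   # 70 / 9
print(v, round(v_star, 4), round(math.sqrt(v), 4))
```

Here the sum of squared deviations is 70, so $v = 7$ and $v^* = 70/9 \approx 7.78$. The gap shrinks as $n$ grows because $n/(n-1)$ approaches 1.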

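Variance and standard deviation are convenient measures of dispersion. They differ mainly in the units of measurement: the standard deviation is in the same units as the data, whereas the variance is in squared units. A larger standard deviation means the data are more dispersed.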

Example 9 (Dispersion of asset returns): Dispersion is used as a measure of risk. Consider two assets with the same expected (mean) return of 2%.

Asset    bad time    normal time    good time
A           0%           2%            4%
B          -2%           2%            6%

The dispersion of returns of the second asset is larger than that of the first. If we were to compute the standard deviation for the population, we would get a bigger value for the second asset. Thus, the second asset is riskier. Knowledge of dispersion is therefore essential for investment decisions, and so is knowledge of expected (mean) returns.
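As a sketch, the standard deviations for the two assets can be computed under the assumption that the three states are equally likely. This assumption is not stated in the example, but it is consistent with both assets having a mean return of 2%.

```python
import math

def population_std(xs):
    """Square root of the average squared deviation from the mean (divides by n)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# Returns in percent; the three states are assumed to be equally likely.
returns = {"A": [0.0, 2.0, 4.0], "B": [-2.0, 2.0, 6.0]}

for asset, r in returns.items():
    print(asset, sum(r) / len(r), round(population_std(r), 2))
```

Under this assumption asset B's standard deviation is exactly twice asset A's, since each of its deviations from the mean is twice as large.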

The following two theorems relate the standard deviation to dispersion.

Theorem 1 (Chebyshev’s theorem): For any set of observations, the proportion of the values that lie within $k$ standard deviations of the mean is at least

$$1 - \frac{1}{k^2}$$

where $k$ is any constant greater than 1. The following table shows the relationship between $k$ and the minimum proportion of data covered (i.e., coverage). At $k = 1$ the bound is zero and therefore uninformative.

k             1        2        3        4        5        6
Coverage    0.00%   75.00%   88.89%   93.75%   96.00%   97.22%
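The coverage table can be reproduced with a short sketch, which also checks the bound on one data set. The function names are illustrative, and the check uses the first bonus sample from Example 7 with the standard deviation computed by dividing by $n$.

```python
def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the bound is only meaningful for k >= 1")
    return 1 - 1 / k ** 2

for k in range(1, 7):
    print(k, f"{chebyshev_bound(k):.2%}")

def share_within(xs, k):
    """Observed proportion of xs within k standard deviations of the mean."""
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return sum(1 for x in xs if abs(x - m) <= k * s) / len(xs)

bonuses = [15, 17, 16, 15, 200]
print(share_within(bonuses, 2) >= chebyshev_bound(2))  # True
```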


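Chebyshev’s theorem holds for any set of observations. The empirical rule, in contrast, applies only to symmetrical, bell-shaped distributions, but gives much tighter coverage.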

Theorem 2 (Empirical rule): For any symmetrical, bell-shaped distribution:

1. About 68% of the observations will lie within one standard deviation of the mean.

2. About 95% of the observations will lie within two standard deviations of the mean.

3. Virtually all the observations will be within three standard deviations of the mean.

The empirical rule is also known as the normal rule.
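For an exactly normal distribution, these percentages can be computed from the error function in Python's `math` module. This is a sketch for the normal case only. The function name `normal_coverage` is illustrative.

```python
import math

def normal_coverage(k):
    """P(|Z| < k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{normal_coverage(k):.2%}")
```

The printed values are roughly 68.27%, 95.45% and 99.73%, well above the Chebyshev bounds of 0%, 75% and 88.89% for the same $k$.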

Definition 14 (Coefficient of variation): The coefficient of variation is the ratio of the standard deviation ($s$) to the arithmetic mean ($m$), expressed as a percentage:

$$CV = \frac{s}{m} \times 100\%$$

The coefficient of variation is often used to measure relative dispersion.
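A minimal sketch of the coefficient of variation, applied to the two bonus samples of Example 7. The function name is illustrative, and the standard deviation is computed with the divide-by-$n$ variance, which is an assumption since the definition does not say which variance formula to use.

```python
import math

def coefficient_of_variation(xs):
    """Standard deviation divided by the mean, expressed as a percentage."""
    n = len(xs)
    m = sum(xs) / n
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / n)  # assumption: divide by n
    return s / m * 100

cv_1 = coefficient_of_variation([15, 17, 16, 15, 200])
cv_2 = coefficient_of_variation([15, 17, 16, 15, 18])
print(round(cv_1, 1), round(cv_2, 1))
```

Because the CV is unit-free, it allows the dispersion of data sets with very different means to be compared.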

In finance, under some conditions the well-known Sharpe Ratio can be written as the inverse of CV.² If $x$ is the return of an investment strategy in excess of the market portfolio, $m$ will be the expected excess return and $s$ will be the standard deviation of the excess return. Of course, an investment strategy with a higher Sharpe Ratio is preferred.

Definition 15 (Skewness): Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

$$\text{skewness} = \frac{\sum_{i=1}^{n} (x_i - m)^3}{(n-1)\, s^3}$$

where $m$ is the sample mean and $s$ is the standard deviation.

A positive skewness implies the distribution is skewed to the right; typically, mode < median < mean.

A negative skewness implies the distribution is skewed to the left; typically, mean < median < mode.

A symmetric distribution has zero skewness; if it is also unimodal, mean = median = mode.
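The skewness formula can be sketched as follows. The definition does not say whether $s$ uses the $n$ or the $n - 1$ divisor, so the $n - 1$ version is an assumption here. The function name is illustrative.

```python
def skewness(xs):
    """Sum of cubed deviations from the mean divided by (n - 1) * s**3."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # assumption: s uses the n - 1 divisor
    if s2 == 0:
        raise ValueError("skewness is undefined when all observations are equal")
    return sum((x - m) ** 3 for x in xs) / ((n - 1) * s2 ** 1.5)

print(skewness([1, 2, 3, 4, 5]))            # 0.0 for a symmetric data set
print(skewness([15, 17, 16, 15, 200]) > 0)  # True: one large value stretches the right tail
```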

Skewness measures the degree of asymmetry in risk: upside risk versus downside risk. A right-skewed distribution of asset returns implies higher upside risk than downside risk. A left-skewed distribution of asset returns implies higher downside risk than upside risk.

² The Sharpe ratio is a measure of the performance of investment strategies, with an adjustment for risk. For additional discussion of the Sharpe ratio, please refer to http://www.stanford.edu/~wfsharpe/art/sr/sr.htm.


Definition 16 (Kurtosis): Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak.

$$\text{kurtosis} = \frac{\sum_{i=1}^{n} (x_i - m)^4}{(n-1)\, s^4}$$

where $m$ is the mean and $s$ is the standard deviation.

Excess kurtosis is defined as

$$\text{kurtosis} - 3.$$

It is defined such that a normal distribution has an excess kurtosis of zero. A positive excess kurtosis implies the distribution has fatter tails than the normal distribution.
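A minimal sketch of the kurtosis and excess kurtosis formulas. As with skewness, computing $s$ with the $n - 1$ divisor is an assumption, and the function names are illustrative.

```python
def kurtosis(xs):
    """Sum of fourth-power deviations from the mean divided by (n - 1) * s**4."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)  # assumption: s uses the n - 1 divisor
    if s2 == 0:
        raise ValueError("kurtosis is undefined when all observations are equal")
    return sum((x - m) ** 4 for x in xs) / ((n - 1) * s2 ** 2)  # s**4 equals s2**2

def excess_kurtosis(xs):
    """Kurtosis minus 3, so that a normal distribution scores zero."""
    return kurtosis(xs) - 3

print(kurtosis([1, 2, 3, 4, 5]))  # 1.36
print(excess_kurtosis([1, 2, 3, 4, 5]) < 0)  # True: flatter than a normal distribution
```

For the evenly spread data 1 to 5, the fourth-power deviations sum to 34 and $s^2 = 2.5$, giving $34 / (4 \times 6.25) = 1.36$.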

Among all the descriptive statistics discussed above, we are often most interested in the mean and standard deviation. As we will show in later chapters, under some regularity conditions, the sample mean is approximately normally distributed, which allows us to compute confidence intervals and to conduct hypothesis tests. The simple sample mean can also be extended to allow the mean to vary with some other variables. For instance, we may compute the mean income from a sample. We may also allow the mean income to vary with education, similar to the table of median income by education reported in one of the earlier examples.
