Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson1-1 Lesson 1: Analysis of Economic Data...


Page 1: Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson1-1 Lesson 1: Analysis of Economic Data is difficult but intuitive.

Lesson1-1 Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data

Lesson 1:

Analysis of Economic Data is difficult but intuitive


Outline

Capture-Recapture experiment

Estimator

Simulations

What is Statistics?

Sampling

How to estimate unemployment rate?


Capture/Re-capture

Goal:

1. Illustrate how to estimate the population size when the cost of counting all individuals is prohibitive.

2. Illustrate how easy and intuitive statistics can be. Statistics need not be completely deep, murky, and mysterious. Our common sense can help us to negotiate our way through the course.


Counting the stones

We are interested in knowing the number of black stones in the box.

We only need to obtain a reasonable estimate of the number of stones in the box, allowing for errors of counting or estimation.


Two examples

Example #1: The box contains only a small number of stones.

Example #2: The box contains a lot of stones that will take days to count.


History and examples of capture / recapture method

Capture-recapture methods were originally developed in wildlife biology to monitor bird, fish, and insect populations (where counting all individuals is prohibitive). More recently, these methods have been used considerably in disease and event monitoring.

http://www.pitt.edu/~yuc2/cr/history.htm


The fish example

Estimating the number of fish in a lake or pond:

C fish are caught, tagged, and returned to the lake.

Later on, R fish are caught and checked for tags. Say T of them have tags.

The numbers C, R, and T are used to estimate the fish population.


Stones in a box

The objective is to estimate the number of fish (represented by black stones) in a box.

Capture one handful of fish (black stones). Count them and call it C. Mark the fish by replacing the black stones with red stones. Put them back into the box.

Capture another handful of fish (stones). Count the total number of fish or stones (R) and the number of marked fish or marked stones (T).

Based on this information, how can we obtain a reasonable estimate of the number of fish or stones in the box?


Stones in a box

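The share of marked fish in the box (C/N, where N is the unknown total) should be roughly the share of marked fish in the second handful (T/R). That is, C/N ≈ T/R. Hence, a simple estimate of N is

CR/T

C = the number of fish or stones captured in the first round.

R = the total number of fish or stones captured in the second round.

T = the number of marked fish or marked stones captured in the second round.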
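As a minimal sketch in Python (the lecture itself works in MATLAB; the function name and the counts below are illustrative assumptions, not figures from the course), the simple estimate follows directly from C/N ≈ T/R:

```python
def simple_estimate(c, r, t):
    """Estimate the population size N from C/N ≈ T/R, i.e. N ≈ C*R/T."""
    if t == 0:
        # With no marked individuals in the second handful, CR/T is undefined.
        raise ValueError("T is zero, so CR/T cannot be computed")
    return c * r / t


# Hypothetical counts: 50 marked in round one, 40 caught in round two,
# 4 of which carry a mark.
print(simple_estimate(50, 40, 4))  # 500.0
```

The explicit check for T = 0 is a design choice of this sketch; it simply makes the undefined case visible instead of letting a division error surface.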


Simulations to see the properties of this proposed estimator

How good is the proposed estimator?

To see the properties of this proposed estimator, I have used MATLAB to simulate our capture-recapture experiment with different numbers of captures (C) and different numbers of recaptures (R), relative to the total number of fish in the pond.

Throughout, N = 500 and each setting is simulated 1000 times.


Definition: Estimator

An estimator is a formula or a rule that takes a set of data and returns an estimate of the population quantity (also known as the population parameter) we are interested in.

θ(x1,x2,...,xn)

Example: An estimator for the population mean

If we are interested in the population mean, a very intuitive estimator of the population mean based on a sample (x1,x2,...,xn) is

θ(x1,x2,...,xn)= (x1+x2+...+xn)/n


Simulating the properties of a sample mean estimator

If we were to study the properties of the following two estimators for the population mean:

θ(x1,x2,...,xn)= (x1+x2+...+xn)/n

versus θ(x1,x2,...,xn)= (x1+x2+...+xn + 1)/n, i.e., the sample sum plus one, divided by n.

With some basic computing skills, we may perform Monte Carlo simulations to compare their properties.


Simulating the properties of a sample mean estimator

1. We will need to define a population. Suppose the population consists of 10 balls numbered from 1 to 10 in a bag. We know that the population mean is (1+2+3+4+5+6+7+8+9+10)/10 = 5.5.

2. We will need to define the sampling process. Suppose we draw a sample of size 5 with replacement. For the sample, compute the two sample mean estimates of the population mean.

3. We will need to decide on the number of repetitions. Suppose we repeat the process 10,000 times.

4. After repeating the sampling process 10,000 times, we will have 10,000 sample means for each of the two estimators, each of which is an estimate of the population mean based on its respective sample.
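The four steps above can be sketched in Python as follows. This is only an illustration: the lecture's simulation was run in MATLAB, and the function names, the seed, and the use of Python's `random` and `statistics` modules are assumptions of this sketch.

```python
import random
import statistics


def estimator_a(sample):
    """Ordinary sample mean: (x1 + ... + xn) / n."""
    return sum(sample) / len(sample)


def estimator_b(sample):
    """Alternative rule: (x1 + ... + xn + 1) / n."""
    return (sum(sample) + 1) / len(sample)


def simulate(repetitions=10_000, sample_size=5, seed=0):
    rng = random.Random(seed)              # fixed seed, for repeatability only
    population = list(range(1, 11))        # step 1: balls numbered 1 to 10
    results_a, results_b = [], []
    for _ in range(repetitions):           # step 3: repeat many times
        # step 2: draw with replacement, then apply both estimators
        sample = rng.choices(population, k=sample_size)
        results_a.append(estimator_a(sample))
        results_b.append(estimator_b(sample))
    # step 4: summarise the estimates from each estimator
    return statistics.mean(results_a), statistics.mean(results_b)


mean_a, mean_b = simulate()
print(round(mean_a, 4), round(mean_b, 4))
```

Because the second rule adds 1 to the sum before dividing by n = 5, its estimates sit 1/5 = 0.2 above the ordinary sample mean in every repetition, whatever the random draws turn out to be.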


Simulating the properties of a sample mean estimator

The above simulation is performed using MATLAB. The means of the 10,000 sample means of the two estimators are

5.4990 and 5.6990.


Which estimator is more desirable?

[Figure: chart not recoverable from the transcript; only the axis ticks survive (one axis from 0 to 0.45, the other from 0 to 20).]


Simulation design – via MATLAB

Individual simulation experiment:

Create 500 “black” fish, labelled 1 to 500.

Capture a random sample of C fish and mark them by converting their label to zero (i.e., red fish).

Capture another random sample of R fish. Count the number of marked fish in the sample. Call it T.

Compute the estimate as CR/T. If T=0, we are in trouble. Such experiments with T=0 are dropped.

Repeat this experiment 1000 times. Hence, we have up to 1000 estimates (fewer when experiments with T=0 are dropped).

Compute the mean and standard deviation of these estimates.
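The design above can be sketched in Python (the lecture's own MATLAB code is not shown in the slides). The sketch assumes that both handfuls are drawn without replacement, and it tracks marked fish in a set instead of relabelling them to zero; the function names, default arguments, and seed are illustrative. Its output need not match figures produced by the original MATLAB runs.

```python
import random
import statistics


def one_experiment(n, c, r, rng):
    """Run one capture-recapture experiment; return CR/T, or None if T == 0."""
    fish = list(range(1, n + 1))           # fish labelled 1..n
    marked = set(rng.sample(fish, c))      # first capture (assumed without replacement)
    recaptured = rng.sample(fish, r)       # second capture (assumed without replacement)
    t = sum(1 for f in recaptured if f in marked)
    return None if t == 0 else c * r / t


def simulate(n=500, c=100, r=100, runs=1000, seed=0):
    """Return (S, mean, std): kept runs, and mean and std of their estimates."""
    rng = random.Random(seed)
    outcomes = (one_experiment(n, c, r, rng) for _ in range(runs))
    estimates = [e for e in outcomes if e is not None]   # drop runs with T == 0
    return len(estimates), statistics.mean(estimates), statistics.stdev(estimates)


s, mean, std = simulate(c=40, r=40)
print(s, round(mean, 2), round(std, 2))
```

A quick sanity check on the sketch: when C equals N every fish is marked, so T = R in every run and the estimate is exactly N.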


Properties of our estimator: Increasing C and R

N     C     R     S      Mean     Std
500   40    40    971    640.76   401.57
500   60    60    1000   579.22   321.54
500   80    80    1000   533.61   154.67
500   100   100   1000   522.85   104.29
500   120   120   1000   513.82   77.41
500   140   140   1000   507.04   60.98
500   250   250   1000   500.64   22.93
500   500   500   1000   500.00   0.00

• N = total number of fish in the pond.
• C = number of captured fish.
• R = number of re-captured fish.
• S = number of simulations with at least one marked fish in recapture.


Properties of our estimator: Constant C and increasing R

N     C     R     S      Mean     Std
500   120   40    1000   507.86   75.07
500   120   60    1000   513.40   79.55
500   120   80    1000   508.19   73.56
500   120   100   1000   511.24   74.55
500   120   120   1000   510.93   75.41
500   120   140   1000   511.21   75.63
500   120   250   1000   510.49   74.04
500   120   500   1000   507.47   77.32

• N = total number of fish in the pond.
• C = number of captured fish.
• R = number of re-captured fish.
• S = number of simulations with at least one marked fish in recapture.


Properties of our estimator: Increasing C and constant R

N     C     R     S      Mean     Std
500   40    120   961    646.59   405.72
500   60    120   1000   582.17   327.97
500   80    120   1000   533.28   142.23
500   100   120   1000   512.28   95.40
500   120   120   1000   508.78   78.75
500   140   120   1000   507.50   60.61
500   250   120   1000   500.86   22.38
500   500   120   1000   500.00   0.00

• N = total number of fish in the pond.
• C = number of captured fish.
• R = number of re-captured fish.
• S = number of simulations with at least one marked fish in recapture.


Conclusion from the simulations

The proposed estimator generally overestimates the number of fish in the pond, i.e., the estimate is larger than the true number of fish in the pond. That is, there is a bias.

Holding R constant, increasing the number of captures (C) helps:
• Bias is reduced, i.e., the mean is closer to the true population size.
• The estimator is more precise, i.e., the standard deviation of the estimator is smaller.

In these simulations, holding C constant, increasing the number of recaptures (R) does not help:
• Bias is more or less unchanged.
• The precision of the estimator is more or less unchanged.


Additional issues

Our proposed estimator is good enough but it can be better. Alternative estimators have been developed to reduce or eliminate the bias of estimating N.

For instance, Seber (1982, p.60) suggests the following estimator of N:

(C+1)(R+1)/(T+1) - 1

(Note that our proposed formula is CR/T.)

Seber, G. (1982): The Estimation of Animal Abundance and Related Parameters, second edition, Charles.
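A minimal Python sketch of the modified formula follows; the function name and the counts are illustrative assumptions. Unlike CR/T, this formula still returns a number when no marked fish are recaptured (T = 0), because the denominator is T + 1.

```python
def modified_estimate(c, r, t):
    """Modified estimate of N: (C+1)(R+1)/(T+1) - 1."""
    return (c + 1) * (r + 1) / (t + 1) - 1


# Hypothetical counts: 49 marked, 39 recaptured, 4 of them marked.
print(modified_estimate(49, 39, 4))  # 399.0

# The same hypothetical handfuls with no marked fish in the recapture.
print(modified_estimate(49, 39, 0))  # 1999.0
```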


Simulations to see the properties of this modified estimator

How good is the modified estimator?

To see the properties of this modified estimator, we repeat the above simulation exercise with this new formula:

(C+1)(R+1)/(T+1) - 1


Properties of modified estimator: Increasing C and R

N     C     R     S      Mean     Std
500   40    40    1000   488.60   271.05
500   60    60    1000   504.39   202.16
500   80    80    1000   498.88   121.47
500   100   100   1000   501.72   91.20
500   120   120   1000   498.10   72.01
500   140   140   1000   501.14   58.44
500   250   250   1000   498.60   21.72
500   500   500   1000   500.00   0.00

• N = total number of fish in the pond.
• C = number of captured fish.
• R = number of re-captured fish.
• S = number of simulations with non-zero marked fish in recapture.


Properties of modified estimator: Constant C and increasing R

N     C     R     S      Mean     Std
500   120   40    1000   498.55   67.38
500   120   60    1000   500.05   71.54
500   120   80    1000   495.58   69.22
500   120   100   1000   497.01   71.14
500   120   120   1000   498.45   71.05
500   120   140   1000   495.17   67.46
500   120   250   1000   500.41   75.29
500   120   500   1000   496.73   74.27

• N = total number of fish in the pond.
• C = number of captured fish.
• R = number of re-captured fish.
• S = number of simulations with non-zero marked fish in recapture.


Properties of modified estimator: Increasing C and constant R

N     C     R     S      Mean     Std
500   40    120   1000   491.84   291.00
500   60    120   1000   499.33   216.81
500   80    120   1000   496.51   117.05
500   100   120   1000   493.50   87.53
500   120   120   1000   503.24   73.65
500   140   120   1000   498.59   56.30
500   250   120   1000   499.76   22.58
500   500   120   1000   500.00   0.00

• N = total number of fish in the pond.
• C = number of captured fish.
• R = number of re-captured fish.
• S = number of simulations with non-zero marked fish in recapture.


Conclusion from the simulations

The modified estimator performs better than the original estimator:
• There is no apparent bias.
• The estimator is more precise.

Holding R constant, increasing the number of captures (C) helps:
• The estimator is more precise, i.e., the standard deviation of the estimator is smaller.

In these simulations, holding C constant, increasing the number of recaptures (R) does not help:
• The precision of the estimator is more or less unchanged.


What is Meant by Statistics?

Statistics is the science of 1. collecting, 2. organizing, 3. presenting, 4. analyzing, and 5. interpreting numerical data to assist in making more effective decisions.


Who Uses Statistics?

Statistical techniques are used extensively by economists, marketers, accountants, quality-control specialists, consumers, professional sports people, hospital administrators, educators, politicians, physicians, etc.


Who Uses Statistics?

As economists,
• We must verify our models with data.
• We need to provide forecasts of the economy (GDP growth).
• We need quantitative estimates of how individual decisions are influenced by policy variables (such as unemployment benefits or education subsidies) in order to forecast the impact of public policies.
• We need quantitative estimates of how macro policies (government expenditure) will affect output.


Who Uses Statistics?

In the business community, managers must make decisions based on what will happen to such things as demand, costs, and profits.

These decisions are an effort to shape the future of the organization.

If the managers make no effort to look at the past and extrapolate into the future, the likelihood of achieving success is slim.


Why do we need to understand Statistics?

We are constantly deluged with statistics in the media (newspapers, magazines, journals, text books, etc.).

We need to have a means to condense large quantities of information into a few facts or figures.

We need to predict what will likely occur given what has occurred in the past.

We need to generalize what we have learned in specific situations to the more general case.


We are users of statistics

We do not want to become professors of statistics.

We do not want to develop advanced statistical theory.

We are users of statistics.

To be effective users, we need to have a good grip of basic statistical theory.

We need to practice using the tools.

This course will give you the basics, enough for you to move on to your next Econometrics class.


Populations and Samples

A population is a collection of all possible individuals, objects, or measurements of interest.

A sample is a portion, or part, of the population of interest.


Populations and Samples

[Figure: diagram with the labels "Population" and "Sample".]


Sampling a Population of Existing Units

Random Sampling

A procedure for selecting a subset of the population units in such a way that every unit in the population has an equal chance of selection.

Sampling with replacement

When a unit is selected as part of the sample, its value is recorded and the unit is placed back into the population for possible reselection.

Sampling without replacement

Units are not placed back into the population after selection.
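The two sampling schemes can be illustrated with Python's standard `random` module. The population of ten numbered units and the seed are assumptions made only for this sketch.

```python
import random

rng = random.Random(0)                 # fixed seed, for repeatability only
population = list(range(1, 11))        # hypothetical units numbered 1 to 10

# With replacement: each draw is made from the full population,
# so the same unit may appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected unit is not returned,
# so all sampled units are distinct.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```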


Approximate Random Samples

Frame

A list of all population units. Required for random sampling, but not for approximate random sampling methods like systematic and voluntary response sampling.

Systematic Sample

Every k-th element of the population is selected for the sample

Voluntary Response Sample

Sample units are self-selected (as in radio/TV surveys)
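A systematic sample can be sketched in Python with list slicing. The function name, the `start` offset, and the list of 20 units are hypothetical and serve only to show what "every k-th element" looks like in practice.

```python
def systematic_sample(units, k, start=0):
    """Select every k-th unit, beginning at position `start` (0-based)."""
    return units[start::k]


units = list(range(1, 21))             # hypothetical ordered list of 20 units
print(systematic_sample(units, 5))     # [1, 6, 11, 16]
print(systematic_sample(units, 5, 2))  # [3, 8, 13, 18]
```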


How to estimate the unemployment rate

First, survey a large number of individuals (say, 1000).

Are you aged 15 or over? If not, you are definitely not in the labor force.

If you are aged 15 or over: Have you worked for pay or profit during the seven days before enumeration, or do you have a formal job attachment? If yes, you are counted as employed.

If not employed: Have you been available for work during the seven days before enumeration? And have you sought work during the 30 days before enumeration? If yes to both questions, you are counted as unemployed.


How to estimate the unemployment rate

The unemployment rate is computed as

#unemployed / (#unemployed + #employed)

Note that the estimate of the unemployment rate is based on a random subset (which we call a sample) of the individuals of an economy -- not all individuals in an economy.
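The survey questions and the formula can be put together in a short Python sketch. The function names and the six respondents are hypothetical, and the sketch assumes that anyone who is neither employed nor unemployed is treated as outside the labor force, which the slides imply but do not state.

```python
def classify(age, worked_or_attached, available, sought_work):
    """Classify one respondent from the survey answers."""
    if age < 15:
        return "not in labor force"
    if worked_or_attached:
        return "employed"
    if available and sought_work:
        return "unemployed"
    # Assumption of this sketch: everyone else is outside the labor force.
    return "not in labor force"


def unemployment_rate(statuses):
    """#unemployed / (#unemployed + #employed)."""
    unemployed = statuses.count("unemployed")
    employed = statuses.count("employed")
    if unemployed + employed == 0:
        raise ValueError("no respondent is in the labor force")
    return unemployed / (unemployed + employed)


# Hypothetical respondents: (age, worked or job attachment, available, sought work)
respondents = [
    (30, True, False, False),
    (25, False, True, True),
    (40, True, True, False),
    (12, False, False, False),
    (50, False, False, False),
    (22, True, False, False),
]
statuses = [classify(*person) for person in respondents]
print(unemployment_rate(statuses))  # 0.25
```

Here three respondents are employed and one is unemployed, so the rate is 1 / (1 + 3) = 0.25; the two respondents outside the labor force do not enter the formula.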


Simulation: An estimation of the unemployment rate

The process of estimating the unemployment rate may be simulated at home or in a classroom with a bag of black and white stones (as in a game of Go).

Suppose black stones stand for unemployed and white stones stand for employed individuals. A random selection of 20 individuals is like randomly grabbing 20 stones from the bag.

We ask each selected individual whether they are white (employed) or black (unemployed). The unemployment rate may then be computed as #black / (#black + #white), matching the formula #unemployed / (#unemployed + #employed).
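The bag-of-stones exercise can be mimicked in Python. The bag composition (50 black and 950 white stones, a true rate of 5%), the seed, and the function name are assumptions chosen purely for illustration.

```python
import random


def estimate_rate(bag, sample_size, rng):
    """Grab `sample_size` stones without replacement; return the share of black stones."""
    sample = rng.sample(bag, sample_size)
    black = sample.count("black")      # unemployed
    white = sample.count("white")      # employed
    return black / (black + white)


rng = random.Random(0)                         # fixed seed, for repeatability only
bag = ["black"] * 50 + ["white"] * 950         # hypothetical bag, true rate 5%

# Each grab of 20 stones gives one estimate; different grabs give different estimates.
estimates = [estimate_rate(bag, 20, rng) for _ in range(5)]
print(estimates)
```

With only 20 stones per grab, each estimate is a multiple of 0.05, so individual estimates can sit well away from the true rate even though the procedure is sound.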


What to take away today

Statistics could be easy and intuitive.

Statistics need not be completely deep, murky, and mysterious.

Our common sense can help us to negotiate our way through the course.


- END -
