
Point estimation

PUBH 7401: Fundamentals of Biostatistical Inference

Eric F. Lock

UMN Division of Biostatistics, SPH

[email protected]

10/23/2018


General Framework of Statistical Inference

Experimental set-up is usually as follows:

1. Identify the population of interest

2. Take a sample from that population

3. Calculate a statistic

4. Use that statistic to infer something about the population, often a parameter

If I repeated steps 2-4, I would obtain a different sample and calculate a different value for the statistic, which may affect my inferences in step 4.


Parameters

- Any summary measure of a population of interest

- Seen this before in the context of studying probability mass functions and probability density functions

- In that context, a parameter was some unknown quantity in the pmf/pdf. For example, µ and σ² for the normal distribution or α and β for the gamma distribution

- Functions of those unknown quantities in a pdf/pmf are also parameters, e.g. E(X) = αβ if X follows a gamma distribution

- Usually use Greek letters to denote parameters.


Point Estimator versus Point Estimate

Definition of Point Estimator

A point estimator is a procedure or method for obtaining an estimate of the parameter (θ) from sample data.

- The point estimate is the estimate computed from a sample (a number).

- As any function of the sample data is a statistic, a point estimator is just a statistic.

- Use θ̂ to denote the estimate for parameter θ: θ̂ = h(X).

- Use the sample mean to estimate the population mean µ:

$$\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$


Example - Point Estimator versus Point Estimate

- A nutritionist is interested in estimating the average number of calories an undergraduate student consumes through alcohol in a week during the school year, µ

- Let X be the number of calories consumed via alcohol by a random student. Observed sample X1, ..., X6:

780, 100, 250, 1080, 0, 300

- Use the sample mean as the point estimator:

$$\hat{\mu} = h(X) = \frac{1}{n}\sum_{i=1}^{n} X_i$$

- In this case, the point estimate is µ̂ = 418.3 calories.
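The estimator/estimate distinction can be sketched in a few lines of Python: the function plays the role of the estimator h, and the number it returns for this sample is the estimate. The function name `sample_mean` is only an illustrative choice.

```python
# Observed calories from alcohol for the six sampled students (from the slide).
calories = [780, 100, 250, 1080, 0, 300]


def sample_mean(xs):
    """Point estimator: a rule that maps any sample to a single number."""
    return sum(xs) / len(xs)


# Point estimate: the value the estimator takes on this particular sample.
mu_hat = sample_mean(calories)
print(round(mu_hat, 1))
```

Applying the same function to a different sample of six students would give a different estimate, while the estimator itself stays the same.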


Example - Different Possible Estimators

We tend to think that there is one estimation procedure for each type of data scenario, but really there are many possibilities. Consider the accompanying 20 observations on dielectric breakdown voltage for pieces of epoxy resin. We are interested in estimating the population mean dielectric breakdown voltage (µ). Consider the following estimators and resulting estimates for µ.

24.46 25.61 26.25 26.42 26.66 27.15 27.31 27.54 27.74 27.94
27.98 28.04 28.28 28.49 28.50 28.87 29.11 29.13 29.50 30.88

1. Estimator: Sample mean X̄; Estimate: 27.793

2. Estimator: Sample median X̃; Estimate: 27.960

3. Estimator: The midrange X̄e = {max(Xi) + min(Xi)}/2; Estimate: 27.670

4. Estimator: The 10% trimmed mean (discard the smallest and largest 10% of the sample and compute the average); Estimate: 27.838

5. Estimator: The number 27; Estimate: 27
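As a check on the estimates listed above, here is a minimal Python sketch that applies each estimator to the 20 voltages. The helper names `midrange` and `trimmed_mean` are illustrative, and the trimmed mean is assumed to drop 10% of the observations from each tail (two from each end here).

```python
import statistics

# The 20 dielectric breakdown voltages from the slide.
voltages = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
            27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]


def midrange(xs):
    """Average of the smallest and largest observations."""
    return (max(xs) + min(xs)) / 2


def trimmed_mean(xs, proportion=0.10):
    """Drop the smallest and largest `proportion` of the sample, then average."""
    k = round(len(xs) * proportion)  # number of observations dropped per tail
    kept = sorted(xs)[k:len(xs) - k]
    return sum(kept) / len(kept)


estimates = {
    "mean": statistics.mean(voltages),
    "median": statistics.median(voltages),
    "midrange": midrange(voltages),
    "trimmed_mean_10": trimmed_mean(voltages),
    "constant_27": 27,  # ignores the data entirely
}

for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

All five are valid estimators in the sense that each is a rule producing a number; whether any of them is a good estimator is a separate question.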


Different Possible Estimators

Consider the linear regression model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad (1)$$

- In the context of estimating parameters in a linear regression model, we typically use the values of the parameters which minimize the sum of squared errors:

$$\text{Minimize} \quad \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$$

- Could have chosen some other error to minimize or used an entirely different procedure
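A small Python sketch of this idea, using made-up data: the closed-form least squares solution minimizes the sum of squared errors, and a different criterion (here the sum of absolute errors, chosen only as an example) could be evaluated on the same fit. The data values and function names are hypothetical and not from the lecture.

```python
# Hypothetical (x, y) pairs, made up purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sse(b0, b1):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


def sae(b0, b1):
    """Sum of absolute errors: one alternative criterion that could be minimized."""
    return sum(abs(y - b0 - b1 * x) for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Closed-form minimizer of the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0_hat, b1_hat = least_squares(xs, ys)
print("least squares fit:", b0_hat, b1_hat)
print("SSE at the fit:", sse(b0_hat, b1_hat))
print("SSE after nudging the slope:", sse(b0_hat, b1_hat + 0.1))
print("SAE at the fit:", sae(b0_hat, b1_hat))
```

Minimizing `sae` instead of `sse` would define a different estimator of β0 and β1, which in general gives different estimates from the same data.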


Need Some Tools to Evaluate Possible Estimators

- Want some way to compare the performance of a method over many different applications of the method (i.e. many different samples)

- We don't care how well the method does in one particular sample

- For example, suppose that we are comparing different procedures for shooting a free throw in basketball (e.g., overhand, underhand, backwards, etc.). You would not judge the performance of the METHOD based on the result of one shot

- Just because you miss once does not mean that the METHOD is bad


General Framework of Statistical Inference

Experimental set-up is usually as follows:

1. Identify the population of interest

2. Take a sample from that population

3. Calculate a statistic - an estimate

4. Use that statistic to infer something about the population - the parameter

When we are comparing different statistics/estimators, we are interested in comparing different properties of the sampling distribution of the statistic/estimator!


Three General Metrics for Comparing Different Estimators

- Bias: E(θ̂ − θ) → How much does the center of the sampling distribution of θ̂ differ from the true value of the parameter?

- Variance: V(θ̂) → How variable is the sampling distribution of θ̂?

- Mean squared error (MSE): E{(θ̂ − θ)²} → Combination of bias and variance.

Note: MSE = Bias² + Var
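The decomposition in the note can be verified by adding and subtracting E(θ̂) inside the square. The steps below are a standard derivation written out for reference; the cross term vanishes because E{θ̂ − E(θ̂)} = 0.

```latex
\begin{aligned}
E\{(\hat{\theta} - \theta)^2\}
  &= E\{(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2\} \\
  &= E\{(\hat{\theta} - E(\hat{\theta}))^2\}
     + 2\{E(\hat{\theta}) - \theta\}\, E\{\hat{\theta} - E(\hat{\theta})\}
     + \{E(\hat{\theta}) - \theta\}^2 \\
  &= V(\hat{\theta}) + \{E(\hat{\theta}) - \theta\}^2 \\
  &= \text{Var} + \text{Bias}^2 .
\end{aligned}
```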


Example: Estimating Health Care Costs

- Recall that we were interested in estimating the average healthcare costs for workers at a large company.

- The population distribution was Gamma(2, 1000)

- This company planned to take a sample of 100 people. Let's consider two possible estimators of E(X) = µ:

  - $\hat{\mu}_1 = \frac{1}{100}\sum_{i=1}^{100} X_i$: sample mean (what we considered last time)

  - $\hat{\mu}_2 = \frac{1}{90}\sum_{i \in D} X_i$, where D is the set of observations in the middle 90% of the sample (trimmed mean)


Example: Estimating Health Care Costs

[Figure: Histogram of µ̂1 over 1000 different samples. x-axis: Mean Cost in Different Samples (1500 to 2500); y-axis: Density (0 to 0.0025)]


Example: Estimating Health Care Costs

[Figure: Histogram of µ̂2 over 1000 different samples. x-axis: Mean Cost in Different Samples (1500 to 2500); y-axis: Density (0 to 0.0025)]


Example: Estimating Health Care Costs

Mean of sampling distribution of µ̂1

## [1] 2000.584

SD of sampling distribution of µ̂1

## [1] 141.3011

Mean of sampling distribution of µ̂2

## [1] 1887.656

SD of sampling distribution of µ̂2

## [1] 137.8121


Example: Estimating Health Care Costs

MSE of sampling distribution of µ̂1

## [1] 19946.38

MSE of sampling distribution of µ̂2

## [1] 31594.4
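The simulation behind these numbers is not shown in the slides. A rough Python sketch of how such a comparison could be run is below, assuming Gamma(2, 1000) means shape 2 and scale 1000 (so E(X) = 2000), and that the trimmed mean drops the 5 smallest and 5 largest of the 100 observations. The seed and all names are arbitrary choices, so the output will not match the slide values exactly.

```python
import random
import statistics

random.seed(7401)  # arbitrary seed, used only so the sketch is reproducible

ALPHA, BETA = 2, 1000    # assumed shape and scale of the gamma population
N, REPS = 100, 1000      # sample size and number of simulated samples
TRUE_MEAN = ALPHA * BETA  # E(X) = alpha * beta for a gamma distribution


def trimmed_mean(xs, drop_each_side=5):
    """Average of the middle observations after dropping both tails."""
    kept = sorted(xs)[drop_each_side:len(xs) - drop_each_side]
    return sum(kept) / len(kept)


mean_estimates, trimmed_estimates = [], []
for _ in range(REPS):
    sample = [random.gammavariate(ALPHA, BETA) for _ in range(N)]
    mean_estimates.append(statistics.mean(sample))
    trimmed_estimates.append(trimmed_mean(sample))


def summarize(estimates):
    """Simulated center, SD, and MSE of an estimator's sampling distribution."""
    center = statistics.mean(estimates)
    sd = statistics.stdev(estimates)
    mse = statistics.mean([(e - TRUE_MEAN) ** 2 for e in estimates])
    return center, sd, mse


mean_summary = summarize(mean_estimates)
trimmed_summary = summarize(trimmed_estimates)
print("sample mean  (center, SD, MSE):", mean_summary)
print("trimmed mean (center, SD, MSE):", trimmed_summary)
```

Because the gamma distribution is right-skewed, trimming both tails removes more from the top than from the bottom, which is why the trimmed mean is centered below 2000. Its slightly smaller SD does not make up for that bias in the MSE.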


Note on comparing properties of estimators

- The properties of various estimators can depend on the true value of the parameter and the distribution of the population.

- The properties we simulated in the previous example hold if the population follows a gamma distribution with α = 2 and β = 1000.

- In other examples we may want to derive the properties for our estimator under more general conditions.


Example: Estimating a Proportion

Suppose we want to estimate the proportion of fifth grade students who read at grade level in Minnesota, which we denote as θ. Having a limited budget, we randomly sample only n students in Minnesota. Let Xi equal 1 if the i-th student reads at grade level and equal 0 otherwise. We consider two different estimators for θ given below:

$$\hat{\theta}_1(X_1, \ldots, X_n) = \frac{X_1 + \cdots + X_n}{n}$$

$$\hat{\theta}_2(X_1, \ldots, X_n) = \frac{X_1 + \cdots + X_n + 1}{n + 2}$$

Find the bias of each of the estimators and compare.

Find the variance of each of the estimators and compare.

Find the MSE of each of the estimators and compare.
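One way to work this out, assuming the Xi are independent Bernoulli(θ) so that S = X1 + ... + Xn is Binomial(n, θ) with E(S) = nθ and V(S) = nθ(1 − θ):

```latex
\begin{aligned}
\text{Bias}(\hat{\theta}_1) &= \frac{n\theta}{n} - \theta = 0,
&\quad V(\hat{\theta}_1) &= \frac{\theta(1-\theta)}{n},
&\quad \text{MSE}(\hat{\theta}_1) &= \frac{\theta(1-\theta)}{n}, \\[4pt]
\text{Bias}(\hat{\theta}_2) &= \frac{n\theta + 1}{n + 2} - \theta = \frac{1 - 2\theta}{n + 2},
&\quad V(\hat{\theta}_2) &= \frac{n\theta(1-\theta)}{(n+2)^2},
&\quad \text{MSE}(\hat{\theta}_2) &= \frac{n\theta(1-\theta) + (1 - 2\theta)^2}{(n+2)^2}.
\end{aligned}
```

So the first estimator is unbiased, the second has smaller variance for every θ in (0, 1), and which one has the smaller MSE depends on θ.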


Example #1 Plot of Bias (n=20)

[Figure: Comparison of Bias. x-axis: True Proportion (0.0 to 1.0); y-axis: Bias (−0.04 to 0.04); legend: Estimator 1, Estimator 2]


Example #1 Plot of Bias Squared (n=20)

[Figure: Comparison of Bias Squared. x-axis: True Proportion (0.0 to 1.0); y-axis: Squared Bias (0 to 0.0020); legend: Estimator 1, Estimator 2]


Example #1 Plot of Variance (n=20)

[Figure: Comparison of Variance. x-axis: True Proportion (0.0 to 1.0); y-axis: Variance (0 to 0.012); legend: Estimator 1, Estimator 2]


Example #1 Plot of MSE (n=20)

[Figure: Comparison of MSE. x-axis: True Proportion (0.0 to 1.0); y-axis: MSE (0 to 0.012); legend: Estimator 1, Estimator 2]


Example #1 Plot of Ratio of MSE (n=20)

[Figure: Ratio of MSE. x-axis: True Proportion (0.0 to 1.0); y-axis: MSE1/MSE2 (0 to 1.2)]


Example: Estimating a Proportion Part 2

Find the bias, variance, and MSE of the above problem for a general n. How does the relationship between the two estimators change as n increases?
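To explore how the comparison changes with n, the following Python sketch evaluates closed-form bias, variance, and MSE for both estimators. It assumes the Xi are independent Bernoulli(θ), so the sum of the Xi is Binomial(n, θ); the function names are illustrative.

```python
def properties_theta1(theta, n):
    """Bias, variance, MSE of (X1 + ... + Xn) / n under Bernoulli(theta) sampling."""
    bias = 0.0
    var = theta * (1 - theta) / n
    return bias, var, bias ** 2 + var


def properties_theta2(theta, n):
    """Bias, variance, MSE of (X1 + ... + Xn + 1) / (n + 2)."""
    bias = (1 - 2 * theta) / (n + 2)
    var = n * theta * (1 - theta) / (n + 2) ** 2
    return bias, var, bias ** 2 + var


for n in (20, 100):
    for theta in (0.1, 0.3, 0.5):
        mse1 = properties_theta1(theta, n)[2]
        mse2 = properties_theta2(theta, n)[2]
        print(f"n={n:3d} theta={theta:.1f} "
              f"MSE1={mse1:.5f} MSE2={mse2:.5f} MSE1/MSE2={mse1 / mse2:.3f}")
```

Both the bias of the second estimator and the gap between the two variances shrink as n grows, so the MSE ratio moves toward 1 for θ away from 0 and 1.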


Example #2 Plot of MSE (n=100)

[Figure: Comparison of MSE. x-axis: True Proportion (0.0 to 1.0); y-axis: MSE (0 to 0.0025); legend: Estimator 1, Estimator 2]


Example #2 Plot of Ratio of MSE (n=100)

[Figure: Ratio of MSE. x-axis: True Proportion (0.0 to 1.0); y-axis: MSE2/MSE1 (0 to 1.0)]


A Note on Unbiasedness

- A point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ for every possible value of θ.

- If θ̂ is not unbiased, the difference E(θ̂) − θ is called the bias of θ̂.

- Is the sample mean X̄ an unbiased estimator for µ?
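A short check of the last question, assuming only that each Xi has mean µ (by linearity of expectation, independence is not needed for this step):

```latex
E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
           = \frac{1}{n} \cdot n\mu
           = \mu .
```

So the bias E(X̄) − µ is zero whatever the value of µ.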


Robustness

- We have seen that the sampling distribution of an estimator θ̂ depends on the distribution of the population

- Note: when I say distribution of the population I mean, if we collect data X1, ..., Xn, the distribution of Xi

- Therefore, the bias, variance, and MSE of θ̂ will depend on the distribution of the population

- Of course, in real data analysis we do not know the distribution of Xi

- An estimator which gives reasonably “good” results under a variety of distributions of the population is said to be robust


Standard Error and Estimated Standard Error

Definition of Standard Error

The standard error of an estimator θ̂ is the standard deviation of the sampling distribution of θ̂:

$$\sigma_{\hat{\theta}} = \sqrt{V(\hat{\theta})}$$

Definition of Estimated Standard Error

If the standard error itself involves unknown parameters whose values can be estimated, substitution of these estimates into σθ̂ yields the estimated standard error.

Will come back to this...


Standard error example – proportion

- We want to estimate the proportion of fifth grade students who read at grade level in Minnesota: θ. We sample n students in Minnesota. Let Xi equal 1 if the i-th student reads at grade level and equal 0 otherwise. Consider the sample mean estimate for θ:

$$\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

- What is the standard error for θ̂?

- What is a reasonable estimated standard error for θ̂?

- For n = 50 students, 30 read at grade level.

  - What is θ̂ and its estimated standard error?
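A possible numerical answer to the last question, sketched in Python. It assumes the Xi are independent Bernoulli(θ), so the standard error of θ̂ is sqrt(θ(1 − θ)/n), and it plugs θ̂ in for the unknown θ to get the estimated standard error. Variable names are illustrative.

```python
import math

n = 50                 # students sampled
at_grade_level = 30    # students in the sample who read at grade level

# Point estimate: the sample proportion.
theta_hat = at_grade_level / n

# Estimated standard error: substitute theta_hat for theta in sqrt(theta * (1 - theta) / n).
estimated_se = math.sqrt(theta_hat * (1 - theta_hat) / n)

print("theta_hat:", theta_hat)
print("estimated standard error:", round(estimated_se, 4))
```

The true standard error depends on the unknown θ, which is why only the plug-in version can actually be computed from the data.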


Consistency

- An estimator is consistent if it will always converge to the true parameter as n → ∞

- If X1, X2, X3, ..., Xn are independent from a distribution with parameter θ,

  - θ̂ = h(X1, ..., Xn) is a consistent estimator of θ if

$$\text{MSE}(\hat{\theta}) = E\{(\hat{\theta} - \theta)^2\} \to 0$$

    as n → ∞.

- The sample mean X̄ is a consistent estimator for µ.
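To see why the last statement holds under the MSE criterion, assume the Xi are independent with mean µ and finite variance σ² (the finite-variance assumption is added here for the calculation). The sample mean is unbiased, so its MSE equals its variance:

```latex
\text{MSE}(\bar{X}) = \{E(\bar{X}) - \mu\}^2 + V(\bar{X})
                    = 0 + \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
                    = \frac{\sigma^2}{n} \;\to\; 0
\quad \text{as } n \to \infty .
```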
