Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.


Transcript of Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Page 1: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Inferences from sample data

• Confidence Intervals

• Hypothesis Testing

• Regression Model

Page 2: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Confidence Intervals

• How well does the xbar represent the true population mean µ ?

• Can use CIs to determine “how close” we are to the true mean

• General form of a confidence interval:
  – sample statistic ± (multiplier based on confidence level) x (standard error of statistic)
  – sampling distribution based on the central limit theorem

Page 3: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Confidence Interval for Mean

General expression: xbar ± t (alpha/2, df) x s/(square root of n)

t (alpha/2, df) -- multiplier from the Student t distribution, with df = n - 1

s/(square root of n) -- standard error (n = sample size)

margin of error -- the ± term; how close we are likely (based on the confidence level) to be to the population parameter

confidence level -- how confident we are that the population parameter will be in our interval; a 95% level means alpha is .05.

Page 4: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

A Confidence Interval Example:

Suppose xbar = 100, s = 5 and n = 25. Construct a 95% confidence interval and interpret the interval.

100 ± t (.025, 24) x 5 / (square root of 25)
= 100 ± 2.064 x 1
= (97.94, 102.06)

We are 95% confident that the true mean is in our interval of (about) 98 to 102.
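The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch: the function name `t_interval` is made up for illustration, and the multiplier 2.064 is simply typed in from a t table rather than computed.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Return (lower, upper) for xbar ± t_crit * s / sqrt(n)."""
    standard_error = s / math.sqrt(n)
    margin = t_crit * standard_error
    return xbar - margin, xbar + margin

# Values from the example; 2.064 is the tabled t for alpha/2 = .025 and df = 24.
lower, upper = t_interval(xbar=100, s=5, n=25, t_crit=2.064)
print(f"{lower:.2f} to {upper:.2f}")  # prints 97.94 to 102.06
```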

Page 5: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

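[Table: critical values of the Student t distribution by degrees of freedom (df), for upper-tail areas .100, .050, .025, .010 and .005]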

Page 6: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Confidence Interval Questions :

1. If we took another sample, would we get the same confidence interval ?

2. How does the confidence level relate to the margin of error ?

3. What can be done to reduce the margin of error ?
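Question 1 can be explored by simulation. The sketch below is only an illustration under stated assumptions: it pretends the population is normal with mean 100 and standard deviation 5 (borrowed from the earlier example, not real data), draws many samples of n = 25, and counts how often the 95% interval built from each sample contains the true mean. The function name `coverage` and the seed are arbitrary.

```python
import math
import random
import statistics

def coverage(mu=100.0, sigma=5.0, n=25, t_crit=2.064, reps=2000, seed=1):
    """Fraction of simulated t intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        s = statistics.stdev(sample)          # sample standard deviation
        half_width = t_crit * s / math.sqrt(n)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())  # each sample gives a different interval; about 95% of them should cover mu
```

Each sample produces its own xbar and s, so the interval moves from sample to sample; the confidence level describes the long-run share of intervals that capture the mean.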

Page 7: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Conceptual view of confidence intervals:

Page 8: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

CONFIDENCE(alpha,standard_dev,size)

Alpha is the significance level used to compute the confidence level. The confidence level equals 100*(1 - alpha)%, or in other words, an alpha of 0.05 indicates a 95 percent confidence level.

Standard_dev is the population standard deviation for the data range and is assumed to be known.

Size is the sample size.

If we assume alpha equals 0.05, we need to calculate the area under the standard normal curve that equals (1 - alpha), or 95 percent. This value is ± 1.96. The confidence interval is therefore: xbar ± 1.96 x standard_dev / (square root of size)

Page 9: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Example

Suppose we observe that, in our sample of 50 commuters, the average length of travel to work is 30 minutes with a population standard deviation of 2.5. With alpha = .05, CONFIDENCE(.05, 2.5, 50) returns 0.69295. The corresponding confidence interval is then 30 ± 0.69295 = approximately [29.3, 30.7].

For any population mean, μ0, in this interval, the probability of obtaining a sample mean further from μ0 than 30 is more than 0.05. Likewise, for any population mean, μ0, outside this interval, the probability of obtaining a sample mean further from μ0 than 30 is less than 0.05.
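For readers working outside Excel, the same margin can be sketched in Python. This is an assumption-laden stand-in, not Excel's implementation: the function `confidence` below just mirrors the argument order of CONFIDENCE and uses the standard library's normal distribution for the multiplier.

```python
import math
from statistics import NormalDist

def confidence(alpha, standard_dev, size):
    """Half-width of a z-based interval: z(alpha/2) * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z * standard_dev / math.sqrt(size)

margin = confidence(0.05, 2.5, 50)
print(round(margin, 4))                 # margin of error, roughly 0.693
print(round(30 - margin, 1), round(30 + margin, 1))  # interval end points
```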

Page 10: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Hypothesis Testing

• A study and assessment of data to examine two hypotheses: null and alternative.

• Six step process

1. state hypotheses, decision making alternatives and consequences of wrong decisions
2. select the appropriate test statistic
3. sketch sampling distribution and identify rejection region
4. collect data, compute statistics
5. test the null hypothesis and state conclusions
6. state managerial decision

Page 11: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Hypothesis test example:

Our engineering staff claims we will obtain an average catapult launch of more than 110 inches. We will not market the catapult unless this is true.

1. State hypotheses

Ho = ‘statement of no effect’
Ho: mean launch is less than or equal to 110 inches
Null action = don’t market.

Ha = ‘there is an effect or difference’
Ha: mean launch is greater than 110 inches
Alternative action = market catapult; ‘launch’ marketing campaign.

Page 12: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Type 1 and Type 2 Errors

Page 13: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

With sample data, there is always a chance of making an incorrect decision, so we set a significance level. For a type 1 error, alpha is the maximum risk we are willing to take of making this type of error.

Rules of thumb from Harvey Brightman:

1. Type 1 error costly and type 2 is not -- set alpha low -- .05 or less

2. Type 2 error costly and type 1 is not -- set alpha higher -- perhaps .25 or above

3. Both errors costly -- set alpha low and increase sample size

We are going to set alpha = .01

Page 14: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

2. Select test statistic

For large samples, use Z and for small samples use t

Z = (xbar - mu) / (sigma / (square root of n))

-- for the t test statistic, substitute s for sigma

Page 15: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

3. Sketch sampling distribution and rejection region

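[Figure: Student t sampling distribution with 19 degrees of freedom; the rejection region is the upper tail beyond the critical value t (.01, 19) = 2.5395]

Critical value in Excel: =TINV(2*0.01,19)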

Page 16: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

TINV(probability,degrees_freedom)

Probability is the probability associated with the two-tailed Student's t-distribution.

Degrees_freedom is the number of degrees of freedom with which to characterize the distribution.

Remarks

• A one-tailed t-value can be returned by replacing probability with 2*probability. For a probability of 0.05 and degrees of freedom of 10, the two-tailed value is calculated with TINV(0.05,10), which returns 2.228139. The one-tailed value for the same probability and degrees of freedom can be calculated with TINV(2*0.05,10), which returns 1.812462.
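To see what a two-tailed inverse like TINV is doing, here is a rough standard-library sketch. It is not Excel's algorithm: the helper names `t_pdf`, `t_upper_tail` and `tinv` are invented for this example, the tail area is found by Simpson's rule, and the critical value by bisection, which is slow but easy to follow.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution (math.gamma limits df to a few hundred)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: one half minus the Simpson's-rule area on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def tinv(probability, df):
    """Value t such that P(|T| > t) = probability (two-tailed, as in TINV)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection: halve the bracket until it is tiny
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, df) > probability:
            lo = mid   # tail area still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(tinv(0.05, 10), 4))      # two-tailed, alpha = .05, df = 10
print(round(tinv(2 * 0.05, 10), 4))  # one-tailed, alpha = .05, df = 10
print(round(tinv(2 * 0.01, 19), 4))  # one-tailed, alpha = .01, df = 19
```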

EXCEL

Page 17: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

4. Collect data and compute statistics

Let xbar = 115, s = 8 and n = 20

t* = (115 - 110) / (8 / (square root of 20))

t* = 2.795

5. Statistical decision

Since t* = 2.795 exceeds the critical value of 2.5395, t* is in the rejection region; we reject the null and accept the alternative hypothesis.
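Steps 4 and 5 can be reproduced in a short Python sketch. The function name `one_sample_t` is made up for illustration, and the critical value is copied from step 3 rather than computed.

```python
import math

def one_sample_t(xbar, mu0, s, n):
    """t statistic for a test of the mean: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / math.sqrt(n))

t_star = one_sample_t(xbar=115, mu0=110, s=8, n=20)
t_crit = 2.5395  # one-tailed critical value from step 3 (alpha = .01, df = 19)

# Upper-tail test: reject Ho when the statistic falls beyond the critical value.
reject_null = t_star > t_crit
print(round(t_star, 3), reject_null)  # prints 2.795 True
```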

Page 18: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Observations from Two Populations

[Figure: Comparing Two Catapults. Histogram of launch distances (80 to 130) against frequency (0 to 6)]

[Figure: Cumulative Percentages, Two Catapults. Cumulative probabilities (0 to 1) against distances (80 to 130)]

Page 19: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Summary Statistics for the Two Groups

Statistic                  Group 1     Group 2
Mean                       113.950      96.050
Standard Error               1.375       1.562
Median                     115.000      96.000
Mode                       117.000      85.000
Standard Deviation           6.151       6.985
Variance                    37.839      48.787
Kurtosis                    -0.838      -1.213
Skewness                    -0.267      -0.034
Range                       21.000      22.000
Minimum                    104.000      85.000
Maximum                    125.000     107.000
Sum                       2279.000    1921.000
Count                       20.000      20.000
Confidence Level(0.95)       2.696       3.061

Page 20: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Using the t test statistic to test for a difference in the two means:

Hypothesis test:

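$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} $$

where the pooled variance is

$$ s^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2} $$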

Test statistic for hypothesis test on difference in two means

Page 21: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

In our case:
n1 = 20, s1 = 6.15
n2 = 20, s2 = 6.98
s = 6.58 (pooled standard deviation)

t* = 17.9 / 2.08 = 8.6 (approximately)

Comparing to a t at .05, 38 = 1.686 (df = n1 + n2 - 2 = 38)

Would conclude the two means are statistically different
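As a cross-check, the pooled test statistic can be computed directly from the summary statistics listed for the two groups. This is a sketch under the equal-variance assumption built into the pooled formula; the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Pooled-variance t statistic, its degrees of freedom, and the pooled s."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    s_pooled = math.sqrt(pooled_var)
    std_err = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / std_err, df, s_pooled

# Means, standard deviations and counts taken from the summary statistics table.
t_star, df, s_pooled = pooled_t(113.95, 6.151, 20, 96.05, 6.985, 20)
print(round(t_star, 2), df, round(s_pooled, 2))
```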

the Numbers

Page 22: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

EXCEL Example

Page 23: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.

Tools | Data Analysis |

=TINV(0.05,16)

=TINV(0.05*2,16)

Page 24: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.


Linear Regression Model

Linear regression form:

$$ Y_t = \beta_0 + \beta_1 X_{1t} + \varepsilon_t $$

β0 + β1 X1t: regression function (linear function of time); the systematic variation in the time series

εt: error term; represents unsystematic or random variation

Page 25: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.


Tools |Data Analysis |Regression

Page 26: Inferences from sample data Confidence Intervals Hypothesis Testing Regression Model.


=375.17+92.6255*C3

note: column C is time period

Our regression model
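The spreadsheet formula above translates directly into a small Python function. The sketch below also includes a plain least-squares fit to show where such coefficients come from; since the original data are not in the slides, the fit is run on points generated from the fitted line itself, purely to show that it recovers the same intercept and slope. The names `predict` and `fit_line` are illustrative.

```python
def predict(period, intercept=375.17, slope=92.6255):
    """Fitted value for a time period, mirroring =375.17+92.6255*C3."""
    return intercept + slope * period

def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Points generated from the fitted line (not the original data set).
periods = [1, 2, 3, 4, 5]
values = [predict(t) for t in periods]
intercept, slope = fit_line(periods, values)
print(round(intercept, 2), round(slope, 4))
```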