STAT 5372: Experimental Statistics

Wayne Woodward

Office: 143 Heroy
Phone: (214)768-2457
e-mail: [email protected]
URL: faculty.smu.edu/waynew
Hours: 2:00 - 3:00 MWF, 3:00 - 4:00 Th - others by appointment


Transcript of STAT 5372: Experimental Statistics

Page 1: STAT 5372: Experimental Statistics

STAT 5372: Experimental Statistics

Wayne Woodward

Office: 143 Heroy
Phone: (214)768-2457
e-mail: [email protected]
URL: faculty.smu.edu/waynew
Hours: 2:00 - 3:00 MWF, 3:00 - 4:00 Th - others by appointment

Page 2: STAT 5372: Experimental Statistics

On a sheet of paper:

• Name

• Major (undergraduate/graduate)

• Previous stat courses:

– STAT 5371?

– STAT/CSE/EMIS 4340?

– other – describe briefly

• Have you used SAS?

Page 3: STAT 5372: Experimental Statistics

Review

• Sampling Distributions

• Statistical Inference

– Confidence Intervals

– Hypothesis Tests

Page 4: STAT 5372: Experimental Statistics

Sampling / Sampling Distributions

• Population -- totality of all observations of interest

• Random Variable (rv) -- a characteristic that can take on different values from object to object

• Sample -- subset of a population

– random sample: observations made independently and at random

– Y1, Y2, … , Yn -- typical notation for a random sample

• Parameter -- a characteristic of a population

– population mean (μ), standard deviation (σ), …

Page 5: STAT 5372: Experimental Statistics

Random Variables

• Discrete – you can count the possible outcomes

– Discrete distributions: binomial, Poisson, …

• Continuous – possible values fall along a continuum

– Continuous distributions: normal (Gaussian), chi-square, t, F, …

Page 6: STAT 5372: Experimental Statistics


Normal Curve:

-- symmetric, bell-shaped

-- for this particular distribution:

- data concentrated about 60

- very few data values above 100 or less than 20

Page 7: STAT 5372: Experimental Statistics

Standard Normal (Z-score)

$$Z = \frac{X - \mu}{\sigma}$$

Normal table gives P[Z ≤ z]

Has mean zero and standard deviation 1

Graph of standard normal is symmetric about 0

Page 8: STAT 5372: Experimental Statistics


Page 9: STAT 5372: Experimental Statistics


Page 10: STAT 5372: Experimental Statistics

Find: P[Z ≤ 2.5]

P[Z > 1.6]

Suppose μ = 50 and σ = 10.

Find: P[X ≤ 45]

P[X > 70]
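
As a check on table lookups, the four probabilities can be sketched with Python's standard library. This assumes X is normal with mean 50 and standard deviation 10; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal Z, and X ~ Normal(mean 50, sd 10) as assumed on the slide.
Z = NormalDist(mu=0, sigma=1)
X = NormalDist(mu=50, sigma=10)

p_z_le_2_5 = Z.cdf(2.5)       # P[Z <= 2.5]
p_z_gt_1_6 = 1 - Z.cdf(1.6)   # P[Z > 1.6]
p_x_le_45 = X.cdf(45)         # P[X <= 45], same as P[Z <= -0.5]
p_x_gt_70 = 1 - X.cdf(70)     # P[X > 70],  same as P[Z > 2.0]

for label, p in [("P[Z <= 2.5]", p_z_le_2_5), ("P[Z > 1.6]", p_z_gt_1_6),
                 ("P[X <= 45]", p_x_le_45), ("P[X > 70]", p_x_gt_70)]:
    print(f"{label} = {p:.4f}")
```

Standardizing by hand, z = (45 - 50)/10 = -0.5 and z = (70 - 50)/10 = 2.0, should give the same answers from the normal table.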

Page 11: STAT 5372: Experimental Statistics

Examples of Statistics:

Statistic - function of random variables

- typically used to estimate parameters

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i \quad \text{(sample mean)}$$

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \quad \text{(sample variance)}$$
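
A minimal sketch of the two statistics, the sample mean and the sample variance with divisor n - 1, computed directly from their definitions. The data values are made up for illustration.

```python
from statistics import mean, variance

# Hypothetical sample; these values are not from the course.
y = [4.0, 7.0, 5.0, 8.0, 6.0]
n = len(y)

# Sample mean: sum of the observations divided by n.
y_bar = sum(y) / n

# Sample variance: sum of squared deviations divided by n - 1.
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)

print(y_bar, s2)

# The standard library functions use the same definitions.
print(mean(y), variance(y))
```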

Page 12: STAT 5372: Experimental Statistics

Key Concept

Statistics are random variables and have their own distributions

- called sampling distributions

Page 13: STAT 5372: Experimental Statistics

Sampling Distribution of the Sample Mean

IF:

• Data are Normally distributed

• Observations are independent

Then:

The Sample Mean has a Normal Probability Distribution with

-- Mean μ

-- Standard Error σ/√n

$$Z = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}$$

has a standard normal distribution

Page 14: STAT 5372: Experimental Statistics

Suppose μ = 50 and σ = 10 for a normal population and suppose further that a random sample of size n = 25 is taken.

Find: [ 45]P X

[ 47.5]P X
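
A sketch of the calculation for the sample mean, which here is normal with mean 50 and standard error 10/√25 = 2. The inequality signs in the two probabilities are not legible in this transcript, so both tails are reported at each cutoff; which tail is wanted is left to the reader.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50.0, 10.0, 25
se = sigma / sqrt(n)                      # standard error of the sample mean
x_bar_dist = NormalDist(mu=mu, sigma=se)  # sampling distribution of the mean

lower_45 = x_bar_dist.cdf(45)      # P[sample mean <= 45]
lower_47_5 = x_bar_dist.cdf(47.5)  # P[sample mean <= 47.5]

for cutoff, lower in [(45, lower_45), (47.5, lower_47_5)]:
    print(f"cutoff {cutoff}: lower tail {lower:.4f}, upper tail {1 - lower:.4f}")
```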

Page 15: STAT 5372: Experimental Statistics

Central Limit Theorem

IF:

• Independent Observations

• Sample Size is Sufficiently Large

Then:

The Sample Mean (Ȳ) is Approximately Normally Distributed with

-- Mean μ

-- Standard Error σ/√n

$$Z = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}$$

has an approximate Standard Normal distribution
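
The theorem can be illustrated with a small simulation. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a clearly non-normal choice made only for illustration), samples of size 50, and a fixed seed so the run is repeatable.

```python
import random
from statistics import fmean, stdev

random.seed(5372)  # arbitrary seed, for repeatability only

MU, SIGMA = 1.0, 1.0   # mean and sd of an exponential(rate 1) population
N, REPS = 50, 5000     # sample size and number of simulated samples

# Draw REPS samples of size N and record each sample mean.
means = [fmean([random.expovariate(1.0) for _ in range(N)]) for _ in range(REPS)]

se = SIGMA / N ** 0.5
mean_of_means = fmean(means)
sd_of_means = stdev(means)

# Share of sample means within 1.96 standard errors of the population mean;
# roughly 0.95 is expected if the normal approximation is adequate.
within = sum(abs(m - MU) <= 1.96 * se for m in means) / REPS

print(f"mean of sample means: {mean_of_means:.3f} (population mean {MU})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```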

Page 16: STAT 5372: Experimental Statistics

Suppose μ = 50 and σ = 10 for a non-normal population and suppose further that a random sample of size n = 50 is taken.

Find: [ 52]P X

Page 17: STAT 5372: Experimental Statistics

Distribution of Sample Mean - σ Unknown

IF:

• Data Values are Normally Distributed

• Observations are Independent

Then:

$$t = \frac{\bar{Y} - \mu}{S / \sqrt{n}}$$

has a Student’s t distribution with n - 1 df

Page 18: STAT 5372: Experimental Statistics

t-distribution -- Figure 5.16, page 229

Page 19: STAT 5372: Experimental Statistics


Page 20: STAT 5372: Experimental Statistics

t_α Notation

$$t_{.05,20} \qquad t_{.01,15} \qquad t_{.9,18}$$

$$z_{.05} \qquad z_{.025}$$

z_α is obtained from bottom (inf.) row of t-table
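
The two z values can be checked without a table. The sketch below assumes the usual convention that z_α cuts off an upper-tail area of α under the standard normal curve; the t values depend on the degrees of freedom and are read from the t-table.

```python
from statistics import NormalDist

Z = NormalDist()

def z_upper(alpha):
    """Value with upper-tail area alpha under the standard normal curve."""
    return Z.inv_cdf(1 - alpha)

z_05 = z_upper(0.05)
z_025 = z_upper(0.025)
print(f"z_.05 = {z_05:.3f}, z_.025 = {z_025:.3f}")
```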

Page 21: STAT 5372: Experimental Statistics

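(1-α)x100% Confidence Intervals for μ

Setting:

• Data are Normally Distributed

• Observations are Independent

• We want an interval that probably contains the population mean

Case 1: σ known

$$\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

Case 2: σ unknown

$$\bar{X} - t_{\alpha/2} \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2} \frac{s}{\sqrt{n}} \qquad (n - 1 \text{ df})$$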

Page 22: STAT 5372: Experimental Statistics

CI Example

An insurance company is concerned about the number and magnitude of hail damage claims it received this year. A random sample of 20 of the thousands of claims it received this year showed an average claim amount of $6,500 and a standard deviation of $1,500. (You can assume that claims have a normal distribution.)

Find a 95% confidence interval on the mean claim damage amount.

Suppose that company actuaries believe the company does not need to increase insurance rates for hail damage if the mean claim damage amount is no greater than $7,000. Use the above information to make a recommendation regarding whether rates should be raised.
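
A sketch of the interval calculation for this example, using the σ-unknown (t) form with n - 1 = 19 df. The critical value 2.093 is the t-table entry for upper-tail area .025 and 19 df; it is hard-coded because the Python standard library has no t quantile function.

```python
from math import sqrt

n = 20
x_bar = 6500.0   # sample mean claim amount ($)
s = 1500.0       # sample standard deviation ($)
t_crit = 2.093   # t-table value for area .025 in the upper tail, 19 df

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"95% CI: (${lower:,.0f}, ${upper:,.0f})")
print("interval extends above $7,000:", upper > 7000)
```

One way to use the result: because the interval reaches slightly above $7,000, these data alone do not rule out a mean claim amount greater than $7,000.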

Page 23: STAT 5372: Experimental Statistics

Last time we found 95% CI to be:

($5798, $7202)

What does this mean?

“There is a .95 probability that the population mean (μ) is between $5798 and $7202”?

Not exactly.

Page 24: STAT 5372: Experimental Statistics

Interpretation of 95% Confidence Interval

100 different 95% CIs plotted in the case for which the true mean is 80

i.e. about 95% of these confidence intervals should “cover” the true mean
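
The coverage idea can be sketched by simulation. The true mean of 80 comes from the slide; the population standard deviation (10), the sample size (25), and the use of the σ-known interval are assumptions made only for this illustration. Many more than 100 intervals are generated so that the observed coverage is stable.

```python
import random
from statistics import NormalDist, fmean

random.seed(5372)  # arbitrary seed, for repeatability only

TRUE_MEAN = 80.0   # from the slide
SIGMA = 10.0       # assumed population standard deviation
N = 25             # assumed sample size
REPS = 10_000      # number of simulated confidence intervals

z = NormalDist().inv_cdf(0.975)        # z_.025
half_width = z * SIGMA / N ** 0.5      # sigma-known interval half-width

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = fmean(sample)
    # Count the interval if it contains the true mean.
    if x_bar - half_width <= TRUE_MEAN <= x_bar + half_width:
        covered += 1

coverage = covered / REPS
print(f"share of intervals covering the true mean: {coverage:.3f}")
```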

Page 25: STAT 5372: Experimental Statistics

Last time we found 95% CI to be:

($5798, $7202)

What does this mean?

“There is a .95 probability that the population mean (μ) is between $5798 and $7202”?

Not exactly.

A better statement:

“About 95% of confidence intervals obtained in this manner will cover the true mean.”

We say: “we are 95% confident that the mean falls in the interval … ”

Page 26: STAT 5372: Experimental Statistics

Concern has been mounting that SAT scores are falling.

• 3 years ago -- National AVG = 955

• Random Sample of 200 graduating high school students this year (sample average = 935) (each year the standard deviation is about 100)

Question: Have SAT scores dropped?

Procedure: Determine how “extreme” or “rare” our sample AVG of 935 is if population AVG really is 955.

Page 27: STAT 5372: Experimental Statistics


If Population average = 955, what is the probability of getting a sample average (from a sample of size 200) that is less than or equal to 935?
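
A sketch of this probability, treating the standard deviation of 100 as known and relying on the normal approximation for the mean of 200 scores.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 955.0     # population average if scores have not changed
sigma = 100.0   # standard deviation, treated as known
n = 200
x_bar = 935.0   # observed sample average

se = sigma / sqrt(n)            # standard error of the sample mean
z = (x_bar - mu0) / se          # how many standard errors below 955
p = NormalDist().cdf(z)         # P[sample mean <= 935] if the mean is 955

print(f"z = {z:.3f}, probability = {p:.4f}")
```

A sample average this low would be rare if the population average were still 955, which is the sense in which 935 is "extreme".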

Page 28: STAT 5372: Experimental Statistics

We must decide:

• The sample came from population with population AVG = 955 and just by chance the sample AVG is “small.”

OR

• We are not willing to believe that the pop. AVG this year is really 955. (Conclude SAT scores have fallen.)

Page 29: STAT 5372: Experimental Statistics

Hypothesis Testing Terminology

Statistical Hypothesis

- statement about the parameters of one or more populations

Null Hypothesis (H0)

- hypothesis to be “tested” (standard, traditional, claimed, etc.)

- hypothesis of no change, effect, or difference (usually what the investigator wants to disprove)

Alternative Hypothesis (Ha)

- null is not correct (usually the hypothesis the investigator suspects or wants to show)

Page 30: STAT 5372: Experimental Statistics

Basic Hypothesis Testing Question:

Do the Data provide sufficient evidence to refute the Null Hypothesis?

Test Statistic - measures how far the observed statistic is from the hypothesized parameter (under H0)

Example: H0: μ = 50

Test statistic:

$$t = \frac{\bar{X} - 50}{s / \sqrt{n}}$$

Page 31: STAT 5372: Experimental Statistics

Hypothesis Testing (cont.)

Critical Region (Rejection Region)

- region of test statistic that leads to rejection of null (i.e. t > c, etc.)

Critical Value

- endpoint of critical region

Significance Level

- probability that the test statistic will be in the critical region if null is true

- probability of rejecting H0 when it is true

Page 32: STAT 5372: Experimental Statistics

Types of Hypotheses

One-Sided Tests

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu > \mu_0$$

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu < \mu_0$$

Two-sided Tests

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \ne \mu_0$$

Page 33: STAT 5372: Experimental Statistics

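Rejection Regions for One- and Two-Sided Alternatives

$$H_0: \mu = \mu_0 \ \text{vs.} \ H_a: \mu < \mu_0 \qquad \text{Reject } H_0 \text{ if } t < -t_\alpha \quad (\text{critical value } -t_\alpha)$$

$$H_0: \mu = \mu_0 \ \text{vs.} \ H_a: \mu > \mu_0 \qquad \text{Reject } H_0 \text{ if } t > t_\alpha$$

$$H_0: \mu = \mu_0 \ \text{vs.} \ H_a: \mu \ne \mu_0 \qquad \text{Reject } H_0 \text{ if } |t| > t_{\alpha/2}$$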
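
The three decision rules can be written as one small function. This is only a sketch: the function name, the labels for the alternatives, and the example values are illustrative, and the critical value is assumed to be looked up from the t-table (t_α for a one-sided test, t_{α/2} for a two-sided test).

```python
def reject_null(t, alternative, crit):
    """Return True if the observed t falls in the rejection region.

    alternative: "less", "greater", or "two-sided"
    crit: positive critical value from the t-table
          (t_alpha for one-sided tests, t_{alpha/2} for two-sided tests)
    """
    if alternative == "less":
        return t < -crit
    if alternative == "greater":
        return t > crit
    if alternative == "two-sided":
        return abs(t) > crit
    raise ValueError(f"unknown alternative: {alternative!r}")

# Hypothetical use with 20 df: t_.05,20 = 1.725 and t_.025,20 = 2.086.
print(reject_null(-2.39, "less", 1.725))       # lower-tailed test
print(reject_null(1.00, "greater", 1.725))     # upper-tailed test
print(reject_null(-2.39, "two-sided", 2.086))  # two-sided test
```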

Page 34: STAT 5372: Experimental Statistics

A Standard Hypothesis Test Write-up

1. State the null and alternative

2. Give significance level, test statistic, and the rejection region

3. Show calculations

4. State the conclusion

- statistical decision

- give conclusion in language of the problem

Page 35: STAT 5372: Experimental Statistics

Hypothesis Testing Example 1

A solar cell requires a special crystal. If properly manufactured, the mean weight of these crystals is .4g. Suppose that 25 crystals are selected at random from a batch of crystals and it is calculated that for these crystals, the average is .41g with a standard deviation of .02g. At the α = .01 level of significance, can we conclude that the batch is bad?
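
A sketch of the calculation for Example 1. It assumes a two-sided alternative (a batch is "bad" if the mean weight differs from .4g in either direction), which the problem statement does not spell out. The critical value 2.797 is the t-table entry for upper-tail area .005 with 24 df.

```python
from math import sqrt

mu0 = 0.40      # mean weight (g) if properly manufactured
x_bar = 0.41    # sample average (g)
s = 0.02        # sample standard deviation (g)
n = 25
t_crit = 2.797  # t-table value: area .005 in the upper tail, 24 df

t = (x_bar - mu0) / (s / sqrt(n))   # test statistic
reject = abs(t) > t_crit            # two-sided rule at alpha = .01 (assumed)

print(f"t = {t:.3f}, reject H0: {reject}")
```

Under these assumptions the statistic does not reach the rejection region, so the data do not show at the .01 level that the batch is bad.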

Page 36: STAT 5372: Experimental Statistics

Hypothesis Testing Example 2

A box of detergent is designed to weigh on the average 3.25 lbs per box. A random sample of 18 boxes taken from the production line on a single day has a sample average of 3.238 lbs and a standard deviation of 0.037 lbs. Test whether the boxes seem to be underfilled.
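
A sketch of the calculation for Example 2 as a lower-tailed test (underfilled means the mean is below 3.25 lbs). The problem gives no significance level, so α = .05 is assumed here; 1.740 is the t-table entry for upper-tail area .05 with 17 df.

```python
from math import sqrt

mu0 = 3.25      # designed mean weight (lbs)
x_bar = 3.238   # sample average (lbs)
s = 0.037       # sample standard deviation (lbs)
n = 18
t_crit = 1.740  # t-table value: area .05 in the upper tail, 17 df (alpha assumed)

t = (x_bar - mu0) / (s / sqrt(n))   # test statistic
reject = t < -t_crit                # lower-tailed rule

print(f"t = {t:.3f}, reject H0: {reject}")
```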

Page 37: STAT 5372: Experimental Statistics

Errors in Hypothesis Testing

                                 Actual Situation
Conclusion            Null is True                Null is False
Do Not Reject H0      Correct Decision (1 - α)    Type II Error (β)
Reject H0             Type I Error (α)            Correct Decision (1 - β) (Power)

Page 38: STAT 5372: Experimental Statistics

Note: There are many ways that H0 can be false

Example: H0: μ = 50

This null hypothesis is “false” if:

(a) (b) (c)

If (c) is the actual situation, then the “power” of the test will probably be large

In the case of (a), the “power” will likely be small
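
The way power depends on how far the true mean is from 50 can be sketched numerically. Everything below other than the null value 50 is a made-up illustration: an upper-tailed z test with σ = 10 known, n = 25, α = .05, and three hypothetical true means.

```python
from statistics import NormalDist

Z = NormalDist()

MU0 = 50.0     # null value from the slide
SIGMA = 10.0   # assumed known standard deviation
N = 25         # assumed sample size
ALPHA = 0.05   # assumed significance level

se = SIGMA / N ** 0.5
z_alpha = Z.inv_cdf(1 - ALPHA)

def power(true_mean):
    """P[reject H0] for the upper-tailed z test when the mean is true_mean."""
    return 1 - Z.cdf(z_alpha - (true_mean - MU0) / se)

for m in (51, 53, 56):   # hypothetical true means, increasingly far from 50
    print(f"true mean {m}: power = {power(m):.3f}")
```

The farther the true mean is from the null value, the larger the power; when the true mean equals 50 the rejection probability is just α.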

Page 39: STAT 5372: Experimental Statistics

p-Value

- the probability of an observation as extreme or more extreme than the one observed when the null is true

$$H_0: \mu = \mu_0 \ \text{vs.} \ H_a: \mu < \mu_0 \qquad \text{Reject } H_0 \text{ if } t < -t_\alpha$$

Note: “Large negative values” of t make us believe alternative is true

Suppose t = -2.39 is observed from data for test above

p-value: area to the left of -2.39 (the observed value of t)

Page 40: STAT 5372: Experimental Statistics

Note:

-- if p-value is less than or equal to α then we reject null at the α significance level

-- the p-value is the smallest level of significance at which the null hypothesis would be rejected

Page 41: STAT 5372: Experimental Statistics


Find the p-values for Examples 1 and 2
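
A sketch of one way to get these p-values without a table. The Python standard library has no t distribution, so the tail area is approximated by integrating the t density with Simpson's rule. Example 1 is treated as two-sided and Example 2 as lower-tailed; both choices are assumptions about how the examples are meant to be set up.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P[T > t] for t >= 0 by Simpson's rule on [0, t] (steps even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3   # half the area lies above 0 by symmetry

# Example 1 (crystals): 24 df, assumed two-sided.
t1 = (0.41 - 0.40) / (0.02 / math.sqrt(25))
p1 = 2 * t_upper_tail(abs(t1), 24)

# Example 2 (detergent): 17 df, lower-tailed; P[T <= t2] = P[T > |t2|] by symmetry.
t2 = (3.238 - 3.25) / (0.037 / math.sqrt(18))
p2 = t_upper_tail(abs(t2), 17)

print(f"Example 1: t = {t1:.3f}, p-value = {p1:.4f}")
print(f"Example 2: t = {t2:.3f}, p-value = {p2:.4f}")
```

With a t-table the same p-values can only be bracketed between adjacent column headings rather than computed exactly.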