Stats for Engineers Lecture 10
description
Transcript of Stats for Engineers Lecture 10
![Page 1: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/1.jpg)
Stats for Engineers Lecture 10
![Page 2: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/2.jpg)
Recap: Linear regression
Linear regression: fitting a straight line to the mean value of as a function of
403020100
250
200
150
100
x
y
We measure a response variable at various values of a controlled variable
![Page 3: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/3.jpg)
Least-squares estimates and :
οΏ½ΜοΏ½=ππ₯π¦
ππ₯π₯β§οΏ½ΜοΏ½=π¦β οΏ½ΜοΏ½π₯
ππ₯π₯=βππ₯ π2β
(βπ π₯π)2
π =βπ
(π₯πβπ₯ )2
ππ₯π¦=βππ₯ π π¦ πβ
βππ₯ πβ
ππ¦ π
π =βπ
(π₯πβπ₯) ( π¦ πβ π¦ )β
Sample means
οΏ½ΜοΏ½=οΏ½ΜοΏ½+οΏ½ΜοΏ½ π₯Equation of the fitted line is
![Page 4: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/4.jpg)
Estimating : variance of y about the fitted line
ΒΏππ¦π¦ βοΏ½ΜοΏ½ππ₯π¦
πβ2
Quantifying the goodness of the fit
Residual sum of squares
![Page 5: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/5.jpg)
Predictions
403020100
250
200
150
100
x
y
For given of interest, what is mean ?
Predicted mean value: .
It can be shown that
Confidence interval for mean y at given x
What is the error bar?
![Page 6: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/6.jpg)
y 240 181 193 155 172 110 113 75 94x 1.6 9.4 15.5 20.0 22.0 35.5 43.0 40.5 33.0
Example: The data y has been observed for various values of x, as follows:
Fit the simple linear regression model using least squares.
οΏ½ΜοΏ½=234.1β3.509 π₯
![Page 7: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/7.jpg)
Example: Using the previous data, what is the mean value of at and the 95% confidence interval?
π¦=234.1β3.509 π₯Recall fit was
Need
95% confidence for Q=0.975
Confidence interval is ,
β .
Hence confidence interval for mean is
β129Β±17
![Page 8: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/8.jpg)
Extrapolation: predictions outside the range of the original data
π₯πππ€
π¦ πππ€
What is the prediction for mean at ?
π¦ πππ€=οΏ½ΜοΏ½+οΏ½ΜοΏ½ π₯πππ€
![Page 9: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/9.jpg)
Extrapolation: predictions outside the range of the original data
π₯πππ€
π¦ πππ€
What is the prediction for mean at ?
Looks OK!
π¦ πππ€=οΏ½ΜοΏ½+οΏ½ΜοΏ½ π₯πππ€
![Page 10: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/10.jpg)
Extrapolation: predictions outside the range of the original data
π₯πππ€
π¦ πππ€
What is the prediction for mean at ?
Quite wrong!
Extrapolation is often unreliable unless you are sure straight line is a good model
![Page 11: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/11.jpg)
We previously calculated the confidence interval for the mean: if we average over many data samples of at , this tells us the interval we expect the average to lie in.
What about the distribution of future data points themselves?
Confidence interval for a prediction
Two effects:
- Variance on our estimate of mean at
- Variance of individual points about the mean
Confidence interval for a single response (measurement of at ) is
οΏ½ΜοΏ½ 2( 1π+(π₯βπ₯ )2
ππ₯π₯ )οΏ½ΜοΏ½ 2
Example: Using the previous data, what is the 95% confidence interval for a new measurement of at Answer
β129Β±50
![Page 12: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/12.jpg)
A linear regression line is fit to measured engine efficiency as a function of external temperature (in Celsius) at values . Which of the following statements is most likely to be incorrect?
1 2 3 4
14%
32%
27%27%
1. The confidence interval for a new measurement of at is narrower than at
2. Adding a new data at would decrease the confidence interval width at
3. If and accurately have a linear regression model, adding more data points at and would be better than adding more at and
4. The mean engine efficiency at T= -20 will lie within the 95% confidence interval at T=-20 roughly 95% of the time
![Page 13: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/13.jpg)
Confidence interval for mean y at given x
Confidence interval for a single response (measurement of at )
- Confidence interval narrower in the middle (
- Adding new data decreases uncertainty in fit, so confidence intervals narrower ( larger)
- If linear regression model accurate, get better handle on the slope by adding data at both ends(bigger smaller confidence interval)
- Extrapolation often unreliable β e.g. linear model may well not hold at below-freezing temperatures. Confidence interval unreliable at T=-20.
AnswerThe confidence interval for a new measurement of at is narrower than at
The mean engine efficiency at T= -20 will lie within the 95% confidence interval at T=-20 roughly 95% of the time
Adding a new data at would decrease the confidence interval width at
If and accurately have a linear regression model, adding more data points at and would be better than adding more at and
0 3015
![Page 14: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/14.jpg)
Correlation
Regression tries to model the linear relation between mean y and x.
Correlation measures the strength of the linear association between y and x.
20100
60
50
40
30
20
10
x
y
A
20100
60
50
40
30
20
10
x
y
B
Weak correlation Strong correlation
- same linear regression fit (with different confidence intervals)
![Page 15: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/15.jpg)
If x and y are positively correlated:
- if x is high ( y is mostly high ()
- if x is low () y is mostly low ()
on average is positive
If x and y are negatively correlated:
on average is negative
- if x is high ( y is mostly low ()
- if x is low () y is mostly high ()
can use to quantify the correlation
20100
60
50
40
30
20
10
x
y
B
![Page 16: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/16.jpg)
More convenient if the result is independent of units (dimensionless number).
π=ππ₯π¦
βππ₯π₯ππ¦π¦
r = 1: there is a line with positive slope going through all the points; r = -1: there is a line with negative slope going through all the points; r = 0: there is no linear association between y and x.
Range :
Pearson product-moment.
Define
If , then is unchanged ( Similarly for - stretching plot does not affect .
![Page 17: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/17.jpg)
Example: from the previous data:
Hence
Notes:
- magnitude of r measures how noisy the data is, but not the slope
- finding only means that there is no linear relationship, and does not imply the variables are independent
![Page 18: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/18.jpg)
Correlation
A researcher found that r = +0.92 between the high temperature of the day and the number of ice cream cones sold in Brighton. What does this information tell us?
1 2 3 4
17%
72%
6%6%
1. Higher temperatures cause people to buy more ice cream.
2. Buying ice cream causes the temperature to go up.
3. Some extraneous variable causes both high temperatures and high ice cream sales
4. Temperature and ice cream sales have a strong positive linear relationship.
Question from Murphy et al.
![Page 19: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/19.jpg)
![Page 20: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/20.jpg)
Correlation r
error
- not easy; possibilities include subdividing the points and assessing the spread in r values.
Error on the estimated correlation coefficient?
Causation? does not imply that changes in x cause changes in y - additional types of evidence are needed to see if that is true.
J Polit Econ. 2008; 116(3): 499β532.http://www.journals.uchicago.edu/doi/abs/10.1086/589524
![Page 21: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/21.jpg)
Strong evidence for a 2-3% correlation.
- this doesnβt mean being tall causes you earn more (though it could)
![Page 22: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/22.jpg)
1.
Correlation
Which of the follow scatter plots shows data with the most negative correlation ?
1 2 3 4
57%
0%
33%
10%
1. No correlation2. Correct3. Not large4. positive
2.
3. 4.
![Page 23: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/23.jpg)
Acceptance Sampling
Situation: large batches of items are produced. We want to sample a small proportion of each batch to check that the proportion of defective items is sufficiently low.
One-stage sampling plans
Sample items number of defective items in the sample
Reject batch if , accept if
How do we choose and ?
![Page 24: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/24.jpg)
Operating characteristic (OC): probability of accepting the batch
Define proportion of defective items in the batch (typically small).
Then if the population the samples are drawn from is large.
N=100, c=3
![Page 25: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/25.jpg)
Testing 100 samples and rejecting if more than 3 are faulty gives the OC curve on the right. Which of the following is the curve for testing 100 samples and rejecting if more than 2 are faulty?
1 2 3
50%
23%27%
1. C=2 correct2. C=53. Wrong height
1. 2. 3.
Rejecting more than 2, rather than more than 3 makes it more likely to reject the batch (for any ). is higher.
is lower, lower
![Page 26: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/26.jpg)
For standard acceptance sampling, Producer and Consumer must decide on the following:
Acceptable quality level:
(consumer happy, want to accept with high probability)
Unacceptable quality level:
(consumer unhappy, want to reject with high probability)
Ideally:
- always accept batch if - always reject batch if
i.e. and
- but canβt do this without inspecting the entire batch
![Page 27: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/27.jpg)
Use a sampling scheme
Producerβs Risk: reject a batch that has acceptable quality
ΒΏ1βπΏ(π1)
Consumerβs Risk: accept a batch that has unacceptable quality
ΒΏπΏ(π2)
Want to minimize:
![Page 28: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/28.jpg)
π2π1
Operating characteristic curve
Consumerβs risk (probability of accepting when unacceptable quality )
Producerβs risk (probability of rejecting when acceptable quality )
If consumer and producer agree on - can then calculate and .
π½
![Page 29: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/29.jpg)
Acceptance Sampling Tables: give for and
![Page 30: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/30.jpg)
![Page 31: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/31.jpg)
Example
In planning an acceptance sampling scheme, the Producer and Consumer have agreed that the acceptable quality level is 2% defectives and the unacceptable level is 6%. Each is prepared to take a 10% risk. What sample size is required and under what circumstances should the batch be rejected?
Answer
,
βπ=153 ,π=5
Should sample 153 items and reject if the number of defective items is greater than 5.
![Page 32: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/32.jpg)
In planning an acceptance sampling scheme, the Producer and Consumer have agreed that the acceptable quality level is 1% defectives and the unacceptable level is 3%. Each is prepared to take a 5% risk. What is the best plan?
1 2 3 4
6% 6%
56%
33%
β’ sample 308 items and reject if the number of defective items is greater than 5
β’ sample 308 items and reject if the number of defective items is 5 or more
β’ sample 521 items and reject if the number of defective items is 9 or more
β’ sample 521 items and reject if the number of defective items is 10 or more
![Page 33: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/33.jpg)
In planning an acceptance sampling scheme, the Producer and Consumer have agreed that the acceptable quality level is 1% defectives and the unacceptable level is 3%. Each is prepared to take a 5% risk. What is the best plan?
Sample 521 and reject if more than 9 (i.e. 10 or more)
![Page 34: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/34.jpg)
Example β calculating the risks
It has been decided to sample 100 items at random from each large batch and to reject the batch if more than 2 defectives are found. The acceptable quality level is 1% and the unacceptable quality level is 5%. Find the Producer's and Consumer's risks.
, 5Answer
1. For the Producer's Risk: want probability of reject batch when
π (Reject batchwhenπ=0.01 )=1βπ (accept batch)=1βπΏ (0.01 )
ΒΏ1βπ ( π=0 )β π ( π=1 )βπ (π=2)
ΒΏ1βπΆ0100 0.010Γ0.99100βπΆ1
1000.01Γ0.9999βπΆ21000.012Γ0.9998
- 0.3660 - 0.3697 - 0.1849 = 0.079.
πβΌ π΅(100,0.01)
![Page 35: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/35.jpg)
Example β calculating the risks
It has been decided to sample 100 items at random from each large batch and to reject the batch if more than 2 defectives are found. The acceptable quality level is 1% and the unacceptable quality level is 5%. Find the Producer's and Consumer's risks.
, 5Answer
2. For the Consumerβs Risk: want probability of accepting batch when
π (Accept whenπ=0.05 )=πΏ (0.05 )
ΒΏπ ( π=0 )+π (π=1 )+π (π=2)
ΒΏπΆ0100 0.050Γ0.95100+πΆ1
1000.05Γ0.9599+πΆ2100 0.052Γ0.9598
ΒΏ0.118
πβΌ π΅(100,0.05)
![Page 36: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/36.jpg)
It has been decided to sample 100 items at random from each large batch and to reject the batch if more than 2 defectives are found. The acceptable quality level is 1% and the unacceptable quality level is 5%.
Which of the following would increase the Consumerβs Risk?
1 2 3
55%
18%
27%
1. Increasing the acceptable quality level to 2%
2. Decreasing the unacceptable quality level to 4%
3. Rejecting if more than 1 defectives are found
![Page 37: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/37.jpg)
It has been decided to sample 100 items at random from each large batch and to reject the batch if more than 2 defectives are found. The acceptable quality level is 1% and the unacceptable quality level is 5%.
Which of the following would increase the Consumerβs Risk?
Increasing the acceptable quality level
Decreasing the unacceptable quality level
Rejecting if more than 1 defectives are found
NO β Consumerβs Risk depends on the unacceptable quality level
YES βe.g. then more likely to accept when the defect probability is compared to
NO β more likely to get 1 or more, so less likely to accept batch lower Consumerβs Risk
![Page 38: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/38.jpg)
Two-stage sampling planIdea: test some, reject if clearly bad, accept if clearly good, if not clear investigate further
1. Sample items, number of defectives in the sample
2. Accept batch if , reject if (where )
3. If , sample a further items; let number of defectives in 2nd sample
4. Accept batch if , otherwise reject batch.
Advantage: can require fewer samples than single-stage plan (for similar )
Distadvantage: more complicated, need to choose
![Page 39: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/39.jpg)
Example
A two-stage sampling plan for a quality control procedure is as follows: Sample 75 items, accept if less than 2 defectives, reject if more than 3 defectives;otherwise sample 120 more and reject if more than 4 defectives in the new batch
Find the probability that a batch is rejected under this plan if the probability of any particular item being faulty is 2.
Answer
Let be number faulty in the first batch, be number faulty in second batch (if taken)
π 1β€1
2,3
π 1>3
Accept
Reject
π 2β€ 4
π 2>4
Accept
Reject
![Page 40: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/40.jpg)
π (reject )=π (rejectβfirst stage )+π (π 1=2,3 )π (rejectβsecond stage)
ΒΏπ (π 1>3 )+[π (π 1=2 )+π ( π1=3 ) ]π (π 2>4)
ΒΏ1ββπ=0
3
π (π 1=π)+[π (π 1=2 )+π (π 1=3 ) ][1ββπ=0
4
π (π 2=π)]
-+β¦
099
defectives out of 75 defectives out of 120 more
π 1β€1
2,3
π 1>3
Accept
Reject
π 2β€ 4
π 2>4
Accept
Reject
![Page 41: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/41.jpg)
Example (as before)
In planning an acceptance sampling scheme, the Producer and Consumer have agreed that the acceptable quality level is 2% defectives and the unacceptable level is 6%. Each is prepared to take a 10% risk. What sample size is required and under what circumstances should the batch be rejected?
Answer: single stage plan
,
π=153 ,π=5Sample 153 items and reject if the number of defective items is greater than 5.
Alternative answer: two-stage plan, as last example
π1=75 ,π2=120 ,π1=1 , c2=3 , c3=4
- Always take 153 samples
- Sometimes takes only 75 samples, sometimes 120+75=195
- Mean number: (depending on ); more efficient!
![Page 42: Stats for Engineers Lecture 10](https://reader036.fdocuments.net/reader036/viewer/2022062501/568168f7550346895de00832/html5/thumbnails/42.jpg)
Two-stage plan can have very similar OC curve, but require fewer samples
BUT: - not obvious how to choose ; example not optimal
Variation: better to include first sample with second sample for final decision
- less parallelizable (e.g. might care if testing is cheap but takes a long time)