asg1soln


description

stat 235 lab solution


SOLUTIONS TO THE LAB 1 ASSIGNMENT

Question 1

Excel produces the following histogram of pull strengths for the 100 resistors:

[Figure: Histogram of Pull Strengths (lb). Horizontal axis: pull strength, bins from 59 to 75 lb. Vertical axis: frequency, 0 to 25.]

(a) The histogram is one-peaked, bell-shaped, and approximately symmetric. Given the relatively small spread, there is one observation (between 74 and 75) lying far above the main body of the data. This observation may be considered an outlier. We will verify in Question 3 that the single observation is indeed an outlier in a formal sense. The tails of the distribution are relatively short.

(b) The center of the distribution is at approximately 65 pounds. As the distribution is approximately symmetric, we expect the values of the mean and the median to be very similar, and close to 65.

(c) If all 100 PST values were overestimated by approximately the same small positive amount because of a poorly calibrated measuring device, the histogram of the accurate values would have approximately the same shape as the histogram of the overestimated values. However, its center (peak) would be shifted to the left by the difference between the overestimated values and the accurate values. The mean and the median would also be shifted to the left by that difference, but the standard deviation and the interquartile range would not be affected (they would be the same as the values obtained for the overestimated PST values).
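The claim in part (c) can be checked numerically. The sketch below is only an illustration: the eight readings and the 0.5 lb bias are hypothetical, not the lab data, and the quartiles come from Python's `statistics.quantiles` with the "inclusive" method, which may differ slightly from Excel's output on other data.

```python
import statistics

def iqr(values):
    # Interquartile range using the "inclusive" quartile method
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical overestimated readings (lb) and a hypothetical constant bias
measured = [62.0, 63.5, 64.0, 64.5, 65.0, 66.0, 67.5, 70.0]
bias = 0.5
accurate = [x - bias for x in measured]

# Measures of center move by the bias ...
mean_shift = statistics.mean(measured) - statistics.mean(accurate)
median_shift = statistics.median(measured) - statistics.median(accurate)

# ... while measures of spread do not change
sd_change = statistics.stdev(measured) - statistics.stdev(accurate)
iqr_change = iqr(measured) - iqr(accurate)

print(mean_shift, median_shift, sd_change, iqr_change)
```

Subtracting the same constant from every value moves each observation and the mean together, so the deviations from the mean, and hence the standard deviation, stay the same.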

Question 2

(a) The summary statistics for the pull strengths obtained with the Descriptive Statistics tool are displayed below:

Summary Statistics

Mean                  64.859
Standard Error        0.29214323
Median                64.45
Mode                  64.3
Standard Deviation    2.921432297
Sample Variance       8.534766667
Kurtosis              0.566577167
Skewness              0.282186648
Range                 16.3
Minimum               58.2
Maximum               74.5
Sum                   6485.9
Count                 100

(b) The Paste Function feature applied to our data returns the following values of the first quartile, the third quartile, and the interquartile range:

First Quartile Q1 = 63.175
Third Quartile Q3 = 66.800
Interquartile range = 3.625

(c) As the distribution of pull strengths is approximately symmetric, the mean and standard deviation are appropriate measures of center and variation. The median and the interquartile range are used for skewed distributions.
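As a quick sanity check, the reported summaries in Question 2 can be tested against one another using the standard relationships: mean = sum / count, standard error = standard deviation / sqrt(count), variance = standard deviation squared, and IQR = Q3 - Q1. The sketch below uses only the values reported above; the variable names are illustrative.

```python
import math

# Values copied from the Summary Statistics table and the quartiles in part (b)
n = 100
total = 6485.9
sd = 2.921432297
variance = 8.534766667
std_error = 0.29214323
q1, q3 = 63.175, 66.800

mean_from_sum = total / n            # should agree with the reported mean, 64.859
se_from_sd = sd / math.sqrt(n)       # should agree with the reported standard error
variance_from_sd = sd ** 2           # should agree with the reported sample variance
iqr_from_quartiles = q3 - q1         # should agree with the reported IQR, 3.625

print(mean_from_sum, se_from_sd, variance_from_sd, iqr_from_quartiles)
```

Each recomputed value agrees with the reported one up to rounding.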

Question 3

According to the 1.5*IQR criterion, an outlier is any data point that lies below Q1-1.5*IQR or above Q3+1.5*IQR. Using the lower and upper quartiles and the interquartile range obtained in Question 2, an outlier is any value below 57.7375 or above 72.2375. Only one observation satisfies the condition: the value 74.5, the largest observation in the data set.

The outlier 74.5 lies far above the main body of the data. Thus we expect that the mean and the standard deviation of the remaining 99 observations would decrease. We do not expect a significant change in the value of the median. The summary statistics for the data without the outlier are displayed below:

Summary Statistics (Outlier Removed)

Mean                  64.76161616
Standard Error        0.278230661
Median                64.4
Mode                  64.3
Standard Deviation    2.768360123
Sample Variance       7.66381777
Kurtosis              -0.109386988
Skewness              0.002956345
Range                 13.4
Minimum               58.2
Maximum               71.6
Sum                   6411.4
Count                 99

The table confirms the conclusions we have reached before.
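The 1.5*IQR fences from Question 3 can be reproduced from the reported quartiles. The following is a minimal sketch using only values that appear in the solution; the function name `is_outlier` is an illustrative choice.

```python
# Quartiles reported in Question 2
q1, q3 = 63.175, 66.800
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr   # about 57.7375
upper_fence = q3 + 1.5 * iqr   # about 72.2375

def is_outlier(x):
    # A point is flagged if it falls outside either fence
    return x < lower_fence or x > upper_fence

# Extremes of the full data set and the maximum after removing 74.5
minimum, maximum, max_without_outlier = 58.2, 74.5, 71.6

# Mean of the remaining 99 values, from the reported sum of all 100
mean_without_outlier = (6485.9 - 74.5) / 99

print(is_outlier(maximum), is_outlier(minimum), is_outlier(max_without_outlier))
print(mean_without_outlier)
```

Since neither the minimum (58.2) nor the next-largest maximum (71.6) crosses a fence, 74.5 is the only flagged value, and the recomputed mean agrees with the 64.7616 in the table.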

Question 4

In order to convert all 100 PST measurements to kilograms, it is necessary to multiply each value in the column PST by 0.454. As a consequence, the new mean and the new median can also be obtained by multiplying the mean and the median of the measurements expressed in pounds by 0.454. Moreover, given the formula for the standard deviation, the new standard deviation can be obtained by multiplying the standard deviation of the original data by 0.454. Likewise, the interquartile range for the data in kilograms is equal to the interquartile range on the original scale of measurement multiplied by 0.454. The histogram for the data expressed in kilograms will have the same shape as the histogram obtained in Question 1. The peak of the new histogram will be at approximately 65*0.454 = 29.51.

Question 5

In order to answer the question whether the new ozone-friendly cleaning process produces similarly strong or stronger solder-joints, on the average, we look at the summary statistics for the distribution. The mean of the pull strengths obtained (with the outlier excluded) is 64.761616, almost identical to the mean of pull strengths for the old technology (64.8). The small difference is due to sampling variability. Thus the new technology produces solder-joints of similar strength, on the average.

Now we compare the variability of the two processes. The standard deviation for the old technology is 2.25 lb. This value is smaller than the value of 2.7683 lb obtained in Question 3 (after excluding the outlier). Given the large sample size that the new standard deviation is based on (99), it is safe to conclude that the new process results in slightly higher variability than the old process. More advanced statistical methods are required to determine whether the difference is statistically significant. The new process can be examined thoroughly to determine whether some sources of extra variation can be eliminated.
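Returning to the unit conversion in Question 4, the scaling rule can be illustrated in a few lines. This sketch makes two assumptions: the eight readings are hypothetical (not the lab data), and the pound-based summaries are the full-data values reported in Question 2.

```python
import statistics

LB_TO_KG = 0.454  # conversion factor used in Question 4

# Hypothetical readings in pounds, used only to illustrate the scaling rule
pounds = [62.0, 63.5, 64.0, 64.5, 65.0, 66.0, 67.5, 70.0]
kilograms = [x * LB_TO_KG for x in pounds]

# Each ratio should equal the conversion factor (up to rounding)
mean_ratio = statistics.mean(kilograms) / statistics.mean(pounds)
median_ratio = statistics.median(kilograms) / statistics.median(pounds)
sd_ratio = statistics.stdev(kilograms) / statistics.stdev(pounds)

# Summaries reported in Question 2 (in pounds), converted to kilograms
summary_lb = {"mean": 64.859, "median": 64.45, "sd": 2.921432297, "iqr": 3.625}
summary_kg = {name: value * LB_TO_KG for name, value in summary_lb.items()}

print(mean_ratio, median_ratio, sd_ratio)
print(summary_kg)
```

Multiplying every value by a positive constant multiplies both the measures of center and the measures of spread by that constant, which is why the histogram keeps its shape on the new scale.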

Question 6

The histogram of electrical resistance for the 100 boards is displayed below:

[Figure: Histogram of Electrical Resistances. Horizontal axis: electrical resistance, bins from 0.2 to 3 plus a "More" bin. Vertical axis: frequency, 0 to 25.]

(a) The histogram is one-peaked and skewed to the right. Most of the observations lie between 0 and 1, but there are several observations outside the range. The right tail is longer than the left tail of the distribution. There is one outlier.

(b) As the distribution is skewed, the median and interquartile range are appropriate measures of center and spread, respectively.

Question 7

The scatterplot of electrical resistance (RES) versus pull strength (PST) displays the relationship between the two variables. It allows you to assess the type of relationship (linear, nonlinear), its direction (positive, negative), and its strength.

(a) The scatterplot for the data is displayed below:

[Figure: Scatterplot of RES vs. PST. Horizontal axis: Pull Strength (in pounds), 55 to 75. Vertical axis: Electrical Resistance (in teraohms), 0.00 to 3.50.]

(b) There is no clear pattern in the plot. The points appear to be randomly scattered. However, it is worth noting a substantial difference between the variation of pull strength values at low electrical resistance values and the variation at high electrical resistance values. There are no obvious outliers in the plot.

LAB 1 ASSIGNMENT MARKING SCHEMA

Proper Header and appearance: 10 points

1. Correctly formatted histogram: 6 points
(a) Analysis of the shape of the histogram: 3 points
(b) Center (estimates of the mean and the median): 2 points
(c) Histogram of accurate measurements: 2 points
    Mean, Median, standard deviation and IQR of accurate values: 2 points

2. Summary Statistics:
(a) Descriptive Statistics output (mean, median, standard deviation, IQR): 4 points
(b) First Quartile, Third Quartile, IQR: 3 points
(c) Discussion of appropriateness: 2 points

3. Determining the lower and upper range for outliers: 2 points
   Identifying the outlier: 2 points
   Effect of removing the outlier on some summary statistics: 3 points

4. Effect of expressing the PST values in kilograms on summaries: 2 points
   Effect of expressing the PST values in kilograms on histogram: 2 points

5. Comparing the average strength of resistors: 2 points
   Comparing the variability of the two processes: 2 points

6. Correctly formatted histogram: 6 points
(a) Analysis of the shape of the histogram: 3 points
(b) Numerical measures to describe typical resistance and the spread: 2 points

7. Relationship between pull strengths and resistance
(a) Discussion of the pattern in the scatterplot: 3 points
    Outliers: 1 point
(b) Correctly formatted scatterplot: 6 points

TOTAL = 70