Ch26 Answers


  • 7/28/2019 Ch26 Answers

    1/25

    9/27/2006

    26 Answers

    Mix and Match

    1. i  2. c  3. a  4. g  5. b  6. e  7. j  8. f  9. d  10. h

    True/False

    11. True. It's only possible confounding. The lurking variable must also be related to the response.

    12. False. It's another name for regression to patch up for the absence of randomization, but it's not the same thing.

    13. True

    14. True

    15. True

    16. True

    17. False. The purpose of an interaction is to allow the slopes to differ. Without an interaction, the slopes match.

    18. True

    19. False. We'd have to do this for every conceivable lurking factor, and we've not measured them all. Confounding is always possible without randomization.

    20. True

    21. False. It is helpful if the sizes of the two groups are similar, but this is not assumed by the model.


    22. False. Use comparison boxplots of the residuals grouped by the categorical variable.

    Think About It

    23. Is this data from a randomized experiment? If not, do we know that the sales agents sell comparable products that produce similar revenue streams? Do we know the costs for the agents in the two groups are comparable, with similar supporting budgets, such as comparable levels of advertising and internal staff support? Without such balance, there are many sources of confounding that could explain the differences that we see in the figure. The lurking factor might also explain the slight difference in variation that we see in the summary.

    24. The relevant lurking factor that ought to come to mind is inflation in the cost of products bought by this firm. If the prices of these purchases have risen 10% over the year, then this should be taken into account. Similarly, has the nature of the business changed over this time period? Perhaps the invoices in the 2006 year are for more expensive types of purchases or in larger quantity than those bought in 2005.

    25. We combine them in order to compare the intercepts and compare the slopes. The multiple regression that combines them includes one coefficient that is the difference in the intercepts and another that is the difference between the slopes. These both come with standard errors, and hence allow us to test whether the observed differences (which are the same with either approach) are statistically significant.
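    A minimal sketch of this idea, using made-up data (the numbers and the helper `fit_line` are illustrative assumptions, not part of the exercise). Fitting each group separately gives two intercepts and two slopes; the dummy and interaction coefficients of the combined regression equal their differences.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a simple regression."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: group 0 follows y = 1 + 2x, group 1 follows y = 4 + 3x.
x0, y0 = [1, 2, 3, 4], [3, 5, 7, 9]
x1, y1 = [1, 2, 3, 4], [7, 10, 13, 16]

intercept0, slope0 = fit_line(x0, y0)
intercept1, slope1 = fit_line(x1, y1)

# Coefficients the combined model would report for the dummy and interaction.
dummy_coef = intercept1 - intercept0      # difference in intercepts
interaction_coef = slope1 - slope0        # difference in slopes
print(dummy_coef, interaction_coef)
```

    What the separate fits do not supply is a standard error for either difference, which is the reason for combining them.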

    26. The assumption of equal error variance in the two SRMs. You can have the SRM work in each subset, but with different error variances. When combined, the difference in error variances violates the similar variances condition of the MRM. To check this condition, we should look at the residuals grouped by the dummy variable.

    27. In general, one should always try an interaction unless you have strong reason to know that the slopes are parallel in the two groups. In this context, it seems clear that the model needs an interaction. Union labor in auto plants makes more than nonunion labor, and the slope is the estimate for the cost per hour of labor. We'd expect it to be higher in the union shop.

    28. The intercept is the start-up time and the slope is the time per unit (fixed and variable costs, respectively). If the robots function as before, then the main change will be the reduction in the intercept (smaller fixed costs). One might also expect less variation after the change; it might have been the case that start-up times varied widely seeing that they ran on for 20 hours. The slopes should be about the same, though we'd still check for an interaction.

    29. a) The intercept is the mean salary for Group = 0, namely the women ($140,467). The slope is the difference in salaries, with men marginally making $3,644 more than women overall (i.e., ignoring the effect of managerial grade level).
    b) These match (almost). The slope in the simple regression is the difference in mean salaries, so regression assigns this estimate almost the same level of significance found in the two-sample t-test.
    c) That the variances in the two groups are the same. The regression approach is comparable to a two-sample t-test that requires equal variances. The t-test introduced in Chapter 18 does not require this assumption.
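    The equivalence in parts (b) and (c) can be checked numerically. The sketch below uses hypothetical salaries (not the exercise data) and shows that the slope on a 0/1 dummy is the difference in means, and that its standard error matches the pooled, equal-variance two-sample standard error.

```python
from math import sqrt

# Hypothetical salaries (in $1,000s): group 0 = women, group 1 = men.
women = [138.0, 140.0, 142.0]
men = [141.0, 143.0, 145.0, 147.0]

def mean(v):
    return sum(v) / len(v)

def sum_sq(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v)

n0, n1 = len(women), len(men)

# Regression of salary on the 0/1 dummy.
x = [0] * n0 + [1] * n1
y = women + men
mx, my = mean(x), mean(y)
sxx = sum((xi - mx) ** 2 for xi in x)
slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
intercept = my - slope * mx
sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
se_slope = sqrt(sse / (n0 + n1 - 2)) / sqrt(sxx)

# Two-sample comparison with a pooled (equal-variance) standard error.
diff = mean(men) - mean(women)
pooled_var = (sum_sq(women) + sum_sq(men)) / (n0 + n1 - 2)
se_pooled = sqrt(pooled_var * (1 / n0 + 1 / n1))

print(intercept, slope, se_slope)
print(mean(women), diff, se_pooled)
```

    The two printed lines agree up to floating-point rounding, which is why the regression inherits the equal-variance assumption.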

    30. a) The intercept is the average number of mailings (about 30) for companies that were not aware. The slope is the difference on average between those that were not aware and those that are (12 more for those that are aware).
    b) On average, those that were aware sent statistically significantly more packages in the next month.
    c) The difference in variation could be due to a lurking variable. The larger variation could be due to the role of the hours variable. If a wider range of hours were given to those that are aware, then this could explain the visible differences in variation. That is, a lurking variable could spread out those that were aware more than those that were not.

    31. a) About 2. Focus on the green points. At x = 0, the average seems to be about 0. At x = 4, the average of these is near 8. 8/4 = 2. A similar calculation applies to the red points and gives a similar slope near 2.
    b) The slope will be much flatter, closer to zero. It seems like it might be positive, but will be considerably less than 2.

    32. a) The intercept is set-up time, time to configure the robots used in the assembly.
    b) The slope is the minutes per item produced, basically the rate at which the process churns out items once started.
    c) Need the interaction. The two fits evidently cross over near about 40 to 50 units.

    33. a) Yes, the fits appear parallel because the coefficient of the interaction (D * x), which measures the difference in the slopes of the two groups, is not statistically significant (its t-statistic is within 1 of zero).
    b) Remove the interaction term to reduce the collinearity and force the slopes to be precisely parallel.

    34. a) The slope for D tells you the difference in the fits of the two equations when Units = 0. In the context of this problem, the slope 52.82 means that the green employees (those with training) take about 50 minutes longer to get the production line set up for producing units. Once they get it set up, however, they are more efficient with smaller time costs for additional units.
    b) We need to find the point at which the two regression lines cross (they are not parallel in this example). As you can see in the following figure that shows the fits, the two lines cross near 45 units:


    [Figure: Minutes versus Units, showing the two fitted lines]

    That's probably good enough in practice, but if we want to be thorough, we need to find the units such that

        26.783 + 2.062 units = (26.783 + 52.816) + (2.062 - 1.277) units

    Since the baseline terms are common to both sides, we can drop these and solve for units in this equation:

        0 = 52.816 - 1.277 units, so units = 52.816/1.277 ≈ 41.36
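    The same arithmetic as a short sketch, using the four coefficients quoted above (the variable names are illustrative). It also confirms that both fitted lines give the same predicted minutes at the crossover.

```python
# Coefficients quoted in the answer.
intercept = 26.783         # baseline intercept
slope = 2.062              # baseline minutes per unit
dummy_coef = 52.816        # shift in intercept for the trained group
interaction_coef = -1.277  # shift in slope for the trained group

# The lines cross where dummy_coef + interaction_coef * units = 0.
crossover_units = -dummy_coef / interaction_coef

baseline_fit = intercept + slope * crossover_units
trained_fit = (intercept + dummy_coef) + (slope + interaction_coef) * crossover_units

print(round(crossover_units, 2))  # 41.36
```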

    You Do It

    35. Emerald diamonds
    (a) In order to be a confounding variable, the weight has to be related to the price (we know this is true from previous study of these data, and common sense) and the weight has to be related to the group indicator. That is, diamonds of one clarity grade have to have different weights than those of the other. If the two groups have comparable weights, then the effect of weight is balanced between the two. A two-sample comparison of weight by clarity shows that the average weight is almost the same in the two groups. Weight is unlikely to be a confounding effect in this analysis.

    Level   Number   Mean       Std Dev
    VS1     90       0.413556   0.054408
    VVS1    54       0.408148   0.053661

    (b) The two-sample t-test finds a statistically significant difference, with VVS1 costing on average about $110 more than VS1 diamonds.

    VS1-VVS1, allowing unequal variances

    Difference      -112.30    t Ratio      -2.88504
    Std Err Dif       38.93    DF           103.4548
    Upper CL Dif     -35.11    Prob > |t|     0.0048
    Lower CL Dif    -189.50    Prob > t       0.9976
    Confidence         0.95    Prob < t       0.0024
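    As a quick consistency check on this output (a sketch that only reuses the rounded numbers printed in the table), the t ratio is the difference divided by its standard error, and the half-width of the 95% interval divided by the standard error recovers a multiplier close to 2.

```python
# Values copied from the two-sample output for VS1 - VVS1.
difference = -112.30
std_err = 38.93
lower, upper = -189.50, -35.11

t_ratio = difference / std_err
# Multiplier implied by the reported 95% interval (half-width / SE).
implied_multiplier = (upper - lower) / 2 / std_err

print(round(t_ratio, 2), round(implied_multiplier, 2))  # -2.88 1.98
```

    The small gap between this t ratio and the tabled -2.88504 comes from using the rounded difference and standard error.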


    (c) Because the interaction is not statistically significant, we'll remove it and refit the model without this term. Evidently, the cost of either type of diamond rises at the same rate with weight.

    Term               Estimate     Std Error   t Ratio   Prob>|t|
    Intercept          -52.53705    131.9049    -0.40     0.6910
    Weight (carats)    2863.4963    316.2582     9.05

    Term               Estimate     Std Error   t Ratio   Prob>|t|
    Intercept          -20.44887    105.1595    -0.19     0.8461
    Weight (carats)    2785.9054    250.9129    11.10


    36. Convenience shopping
    (a) Previous analysis of this data has shown that volume of gasoline is related to sales in the convenience store. So, volume meets one of the conditions for a confounding variable: it's related to the response. To be a confounder, it also has to differ between the two groups. In this data, the following summary shows that Site 1 is busier. Gasoline sales (traffic) confounds the comparison of the sales. Site 1 sells about 20% more gasoline, a statistically significant amount.

    Site 1-Site 2, allowing unequal variances

    Difference      659.251    t Ratio      13.81833
    Std Err Dif      47.708    DF           506.2273
    Upper CL Dif    752.982    Prob > |t|     0.0000
    Lower CL Dif    565.520    Prob > t       0.0000
    Confidence         0.95    Prob < t       1.0000

    (b) A two-sample t-test finds a statistically significant difference in sales, with Site 1 selling on average about $700 more than Site 2.

    Site 1-Site 2, allowing unequal variances

    Difference      727.208    t Ratio      28.71723
    Std Err Dif      25.323    DF           551.4839
    Upper CL Dif    776.949    Prob > |t|     0.0000
    Lower CL Dif    677.466    Prob > t       0.0000
    Confidence         0.95    Prob < t       1.0000

    (c) The initial analysis finds no statistically significant interaction, so we'll remove this term and refit the model. Evidently, gasoline sales produce comparable sales in both convenience stores.

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    688.06922    84.96066    8.10


    The estimated range from the multiple regression is shorter because the regression removes the variation from the response due to variation in gasoline sales. The bigger difference, however, is the shift of about $200. When adjusted for differences in traffic volume, Site 1 is still doing better, but not so much as suggested by the initial comparison.

    (e) The model meets the similar variances condition. In this example, we can identify both groups in the plot of the residuals on the fitted values. Color-coding makes the boxplots unnecessary in this case, but it would probably be best to do both.

    [Figure: Sales (Dollars) Residual versus Sales (Dollars) Predicted]

    (f) Yes. By pooling, the slope is inflated, making it look as though gasoline sales have a bigger impact on sales in the convenience store. This simple regression suggests that each gallon of gas sold generates $0.51 in convenience store sales. In fact, the slope at either location is only $0.31/gallon.

    [Figure: Sales (Dollars) versus Volume (Gallons)]

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    310.44032    65.31151    4.75
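    The inflation described in part (f) is easy to reproduce. The sketch below uses invented data, not the exercise data: both sites share a slope of 0.3, but Site 1 has both higher volume and a higher intercept, so a single pooled line is steeper than either within-site line. The helper `fit_slope` is an illustrative name.

```python
def fit_slope(x, y):
    """Least-squares slope for a simple regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical sites: same slope (0.3), different intercepts and volumes.
volume_site2 = [2000, 2500, 3000, 3500]
sales_site2 = [300 + 0.3 * v for v in volume_site2]
volume_site1 = [3000, 3500, 4000, 4500]
sales_site1 = [900 + 0.3 * v for v in volume_site1]

slope_site1 = fit_slope(volume_site1, sales_site1)
slope_site2 = fit_slope(volume_site2, sales_site2)
pooled_slope = fit_slope(volume_site1 + volume_site2, sales_site1 + sales_site2)

print(slope_site1, slope_site2, pooled_slope)
```

    Adding the site dummy to the regression is what separates the shift between sites from the effect of volume.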


    Level   Number   Mean      Std Dev
    NP      40       56.9500   25.7014

    b) The two-sample t-test finds a very statistically significant difference in the performance of the software from the two vendors. On average, the software labeled MS transfers files in about 5.5 fewer seconds. (The variance is substantially larger for the files sent using the NP software.)

    MS-NP, allowing unequal variances

    Difference      -5.5350    t Ratio      -2.52682
    Std Err Dif      2.1905    DF           58.79005
    Upper CL Dif    -1.1515    Prob > |t|     0.0142
    Lower CL Dif    -9.9185    Prob > t       0.9929
    Confidence         0.95    Prob < t       0.0071

    c) The interaction in the model is statistically significant, meaning that the two types of software have different rates of transfer (different MB per second).

    R2    0.752229
    se    5.138168
    n     80

    Term              Estimate     Std Error   t Ratio   Prob>|t|
    Intercept         4.8929786    1.995934     2.45     0.0165
    File Size (MB)    0.4037229    0.032012    12.61


    d) The two-sample comparison finds an average difference of 5.5 seconds (range 1 to 10 seconds), with MS transferring files faster. The analysis of covariance also identifies MS as faster, but shows that the gap becomes progressively wider as the file size increases. NP transfers files (once started) at a rate of about 0.4 sec/MB compared to 0.4 sec/MB for MS. The mean of the two-sample comparison is an average gap ignoring the size of the files.

    e) No. You can see hints of a problem in the color-coded plot of residuals on fitted values (with MS shown in red). Similarly, the boxplots of residuals show different variances.

    [Figure: Transfer Time (sec) Residual versus Transfer Time (sec) Predicted]

    [Figure: boxplots of Residual Transfer Time (sec) by Vendor (MS, NP)]

    38. Production costs
    a) Material costs could be a confounding variable because they are related to the average cost per unit. The material costs per unit are, however, very similar in the two plants. Hence, material costs per unit are not going to confound the comparison using the two-sample test.

    NEW-OLD, allowing unequal variances

    Difference      -0.22241    t Ratio      -1.28719
    Std Err Dif      0.17279    DF           156.2608
    Upper CL Dif     0.11889    Prob > |t|     0.1999
    Lower CL Dif    -0.56372    Prob > t       0.9000
    Confidence          0.95    Prob < t       0.1000

    b) The average cost per unit is slightly lower in the new plant, by about $1.10, but the difference found by the two-sample test is not statistically significant.

    NEW-OLD, allowing unequal variances

    Difference      -1.1133    t Ratio      -0.877
    Std Err Dif      1.2694    DF           166.1142
    Upper CL Dif     1.3930    Prob > |t|     0.3818
    Lower CL Dif    -3.6195    Prob > t       0.8091
    Confidence         0.95    Prob < t       0.1909

    c) Neither the interaction nor the dummy variable is statistically significant in the model with both.

    Term         Estimate    Std Error   t Ratio   Prob>|t|
    Intercept    32.83758    1.650956    19.89


    Term                     Estimate     Std Error   t Ratio   Prob>|t|
    Plant Dummy              1.1116877    2.776285     0.40     0.6893
    Dummy * Mat Cost/Unit    -0.716403    1.094844    -0.65     0.5137

    After removing the interaction term, the effect of the plant dummy alone remains not statistically significant. The final model is just the original simple regression.

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    33.372891    1.431873    23.31


    [Figure: Price/Sq Ft versus 1/Sq Ft]

    b) The model requires both a dummy variable and interaction.

    R2    0.762904
    se    0.037308
    n     36

    Term         Estimate    Std Error   t Ratio   Prob>|t|
    Intercept    0.155721    0.019713    7.90


    inappropriate in this analysis. We need to have separate estimates of the error variance for the two realtors. We can interpret the fit, but not use the tools for inference.

    40. Leases
    a) The locations are very distinct, with those in the city (shown as red dots) costing more than those in the suburbs (green crosses).

    [Figure: Cost per Sq Foot versus 1/Sq Feet]

    b) The fitted model uses both the dummy variable and interaction. Both appear statistically significant, though we need to check the conditions before going further with inference.

    R2    0.615452
    se    1.092205
    n     223

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    15.817545    0.117467    134.66


    [Figure: Cost per Sq Foot Residual versus Cost per Sq Foot Predicted]

    [Figure: boxplots of Residual Cost per Sq Foot by Location (City, Suburbs)]

    d) In the baseline model (for the suburbs, coded by 0 in the dummy variable), the variable costs are about $15.82 per square foot with about $1900 in fixed costs. For the city, the variable costs are higher by about $1.54 with higher fixed costs (about $5150 more).

    e) Yes, because this model meets the conditions for the MRM, we can build confidence intervals and tests. For example, we can estimate that the premium for locating in the city costs roughly

        [1.5369 - 2 * 0.1874, 1.5369 + 2 * 0.1874] ≈ $1.16 to $1.91 per square foot

    more than a comparable location in the suburbs.
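    The interval arithmetic, as a short sketch using the estimate and standard error quoted above (the multiplier 2 is the rough 95% rule used in the answer; variable names are illustrative):

```python
estimate = 1.5369    # city premium per square foot, from the answer
std_error = 0.1874   # its standard error, from the answer

lower = estimate - 2 * std_error
upper = estimate + 2 * std_error

print(round(lower, 2), round(upper, 2))  # 1.16 1.91
```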

    41. R&D expenses
    (a) The two look very similar with the colors evenly mixed. A simple regression fit to both years seems reasonable.

    [Figure: Log R&D Expense versus Log Assets]

    R2    0.807597
    se    0.896963
    n     985

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    -1.192587    0.062477    -19.09


    problem. As you can tell from the normal quantile plot, the combined data are not nearly normal, but since we are working with the slopes (which are averages) we can continue on thanks to the CLT. There's a more serious problem, however, not seen in these plots: do you really think that the two data values from AMD or Intel, for example, are independent of each other? Or does it seem more likely that the data are dependent? We're voting for dependent, calling into question any notion of using the usual formulas for standard errors.

    [Figure: Log R&D Expense Residual versus Log R&D Expense Predicted]

    [Figure: boxplots of Residual Log R&D Expense by Year (2003, 2004)]

    [Figure: histogram of residuals (Count) with Normal Quantile Plot]

    (c) Here's the summary of the multiple regression. Neither added variable is statistically significant and the R2 has hardly moved from the simple regression.

    R2    0.8077
    se    0.897636
    n     985

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    -1.184021    0.091473    -12.94


    The incremental F test that measures the change in R2 that comes with adding two explanatory variables is

        F = (0.8077 - 0.807597)/(1 - 0.8077) * (985 - 1 - 3)/2 ≈ 0.26

    which is not statistically significant. This agrees with the visual impression conveyed by the original scatterplot: the relationship appears to be the same in both years.

    (d) Overall, a common regression model captures the relationship. The elasticity of R&D expenses with respect to assets is about 0.8: on average each 1% increase in assets comes with a 0.8% increase in R&D expenses. I have serious questions, however, about the independence of the residuals in the two years, since I have a pair of measurements on each company. It's hard to think of these as independent.
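    The incremental F statistic in part (c) can be wrapped in a small helper. This is a sketch: the function name `incremental_f` and its argument names are illustrative, and the inputs are the R2 values and sample size quoted in the answer.

```python
def incremental_f(r2_full, r2_reduced, n, k_full, k_added):
    """F statistic for the change in R^2 from adding k_added explanatory variables.

    k_full is the number of explanatory variables in the larger model.
    """
    numerator = (r2_full - r2_reduced) / k_added
    denominator = (1 - r2_full) / (n - 1 - k_full)
    return numerator / denominator

# Exercise 41: adding the year dummy and its interaction to the simple regression.
f_rd = incremental_f(0.8077, 0.807597, n=985, k_full=3, k_added=2)
print(round(f_rd, 2))  # 0.26
```

    The same helper applies to the incremental F-tests in the later exercises, with their own R2 values and sample sizes.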

    42. Cars
    a) The color-coded scatterplot shows that cars from European companies (red dots) tend to be more expensive, given their HP, than cars from domestic manufacturers (green crosses).

    [Figure: Log 10 Price versus Log 10 HP]

    As a result, the shown simple regression splits the difference between the two, compromising the slope and intercept to blend the two into one.

    R2    0.767004
    se    0.104323
    n     132

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    1.2125378    0.157681    7.69


    Term               Estimate     Std Error   t Ratio   Prob>|t|
    Import Dummy       -0.271731    0.245108    -1.11     0.2697
    Import * Log HP    0.1777922    0.105291     1.69     0.0937

    The initial scatterplot of the data seems straight enough in both groups, and the plot of the residuals on fitted values suggests no problems, though perhaps a slight increase in variation as the car prices increase. The boxplots indicate similar variances, and the normal quantile plot confirms that the data are nearly normal (albeit with outliers such as the exotic Panoz on the high end of the scale and the cheap-for-its-power Ford Cobra on the low side).

    [Figure: Log 10 Price Residual versus Log 10 Price Predicted]

    [Figure: boxplots of Residual Log 10 Price by Location (Europe, US)]

    [Figure: histogram of residuals (Count) with Normal Quantile Plot]

    (c) The incremental F-test gives the value

        F = (0.869241 - 0.767004)/(1 - 0.869241) * (132 - 1 - 3)/2 ≈ 50 >> 4

    The incremental F shows that an increase in R2 from 77% to 87% by the addition of two explanatory variables is highly statistically significant.

    (d) The estimates of the MRM show that neither coefficient is statistically significant taken separately. That's collinearity at work! The VIFs for these estimates are larger than 300! Because of the collinearity, neither one appears statistically significant taken individually. As a pair, however, the combination brings a statistically significant improvement to the fit of the model.

    43. Movies
    a) Adult movies (red dots) appear to have consistently higher subsequent sales at a given box-office gross than family movies. The fits to the two groups look linear (on this log scale) with a fringe of outliers. A common simple regression splits the difference between the two groups. Here's the simple regression.

    [Figure: Log 10 Subsequent Purchase versus Log 10 Gross]

    R2    0.648668
    se    0.253298
    n     224

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    -1.305742    0.063479    -20.57


    [Figure: Log 10 Subsequent Purchase Residual versus Log 10 Subsequent Purchase Predicted]

    [Figure: boxplots of Residual Log 10 Subsequent Purchase by Audience (Adult, Family)]

    [Figure: histogram of residuals (Count) with Normal Quantile Plot]

    (c) The incremental F-test uses the change in R2 to measure the statistical significance of adding the two predictors (dummy and interaction). The test statistic is

        F = (0.75236 - 0.648668)/(1 - 0.75236) * (224 - 1 - 3)/2 ≈ 46

    which is very statistically significant. We'd reject H0 that the added variables both have slope zero.

    (d) The interaction is highly statistically significant, but the slope for the dummy is not. Only one predictor seems useful. The F-test reaches a more impressive view of the value of adding these two predictors because it is not concerned about the substantial collinearity between them. The VIFs for these explanatory variables are almost 15, reducing the size of the shown t-statistic for each by a factor of about 4.
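    The "about 4" follows from how a VIF acts on standard errors: the variance of an estimate is inflated by the VIF, so its standard error grows, and its t-statistic shrinks, by the square root of the VIF. A short sketch with the approximate VIF quoted above:

```python
from math import sqrt

vif = 15  # approximate VIF quoted in the answer

# Standard errors grow by sqrt(VIF); t-statistics shrink by the same factor.
t_shrink_factor = sqrt(vif)

print(round(t_shrink_factor, 2))  # 3.87
```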

    (e) The estimates show that as the box-office gross increases, movies intended for adult audiences sell statistically significantly better. Each 1% increase in the box-office gross for a family movie fetches about a 0.74% increase in after-market sales. For adult movies, the elasticity jumps to about 0.74 + 0.24 = 0.98. As the success at the box office grows, the gap opens up with adult movies doing better.

    44. Hiring
    (a) The black and white view shows a cluster of points separated from the main body of the data. These all joined an existing office (red). The simple regression fit to all of the data has a smaller slope than seems appropriate to employees in a new office (green).

    [Figure: Log Profit versus Log Accounts]

    R2    0.176184
    se    0.717014
    n     464

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    8.9444533    0.100374    89.11


    [Figure: Log Profit Residual versus Log Profit Predicted]

    [Figure: boxplots of Residual Log Profit by Office (Existing, New)]

    [Figure: histogram of residuals (Count) with Normal Quantile Plot]

    (c) The incremental F-test judges the change in R2 to be statistically significant, as you would guess since both estimates are statistically significant by wide margins and the sample size is rather large (n = 464).

        F = (0.29671 - 0.176184)/(1 - 0.29671) * (464 - 1 - 3)/2 ≈ 39.4

    (d) These agree strongly in this example. Part of the reason for the agreement is that both the slope and intercept differ in the two groups. Also, there's less collinearity than in many cases (such as the other exercises). The VIFs are about 10: large, but not devastating.

    (e) The following plot shows the fits implied by the multiple regression. The statistically significant interaction suggests that a one-approach-for-all placement procedure is not going to be the best solution. Hires that are able to generate lots of new accounts appear to do much better in new offices. Hires that do not open so many accounts appear more suited to starting work in an existing office. The cross-over point in the two fits occurs where log of accounts is approximately (ratio of coefficient of dummy to the interaction)

        0.897/0.433 ≈ 2.07

    or about exp(2.07) ≈ 7.9, or say 8 accounts.
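    A sketch of the back-transformation, assuming (as the use of exp in the answer implies) that accounts are on a natural-log scale. The two coefficients are the magnitudes quoted above; their signs are not shown in this transcript, so only the ratio is used.

```python
from math import exp

dummy_magnitude = 0.897        # size of the dummy coefficient, from the answer
interaction_magnitude = 0.433  # size of the interaction coefficient, from the answer

# Crossover on the log scale, then converted back to a count of accounts.
log_accounts = dummy_magnitude / interaction_magnitude
accounts = exp(log_accounts)

print(round(log_accounts, 2), round(accounts, 1))  # 2.07 7.9
```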


    [Figure: Log Profit versus Log Accounts, showing the fits implied by the multiple regression]

    45. Promotion
    (a) A simple regression that combines the data from both locations makes a serious mistake, one that vastly overstates the effect/benefit of detailing. By fitting one line to both groups, rather than within each, the higher sales in Boston (red dots) inflate the slope.

    [Figure: Market Share versus Detail Voice]

    R2    0.310936
    se    0.038406
    n     78

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    0.0917039    0.015423    5.95


    se    0.007923
    n     78

    Term         Estimate     Std Error   t Ratio   Prob>|t|
    Intercept    0.1212103    0.004774    25.39


    which rounds to 0.06 to 0.24. Rather than get a 1% gain in market share with each 1% increase in detailing voice, the model estimates a far smaller return on this promotion. By ignoring the effects of the two groups, the analyst inflated the effect of promotion.

    46. iTunes
    (a) The scatterplot makes it clear that you need to distinguish the formats. There's a clear interaction, with the AAC files (red dots) occupying much less space than the AIFF files (green +) for a given time duration. Makes you wonder why anyone would prefer AIFF format unless it sounds a lot better.

    [Figure: Megabytes (MB) versus Time (seconds)]

    (b) The estimated model with dummy variable (1 for AIFF and 0 for AAC) is darn near perfect, with an R2 that's about off the charts. The only error appears to be when a song does not quite fill the allocated space. These t-statistics are about as large as they come unless you have a data set with millions of cases.

    R2    0.999996
    se    0.043304
    n     596

    Term              Estimate     Std Error   t Ratio   Prob>|t|
    Intercept         0.0110338    0.007804      1.41    0.1579
    Time (seconds)    0.0154106    0.000026    603.76    0.0000
    Format Dummy      0.0715807    0.009385      7.63


    [Figure: Megabytes (MB) Residual versus Megabytes (MB) Predicted]

    [Figure: boxplots of Residual Megabytes (MB) by Format (AAC, AIFF)]

    Because of these differences in variation, the standard errors of the slopes are probably not precise. Just the same, whatever effect this violation of the similar variance condition has on the SEs, it's not enough to change those t-statistics to make the estimates of the slopes for the compression rates not statistically significant.

    (c) The estimates show that songs recorded using the AAC format take about 0.01541 megabytes per second of recording time. Those recorded using the AIFF format require about 0.1528 MB additional space per second (more than 10 times the space used by AAC). Moreover, the fixed space needed by AAC (regardless of the length of the song) is about 0.011 MB, whereas AIFF requires an additional 0.072 MB to get started.

    (d) Because the errors do not meet the similar variances condition and the fits are so good that we don't need to borrow strength, let's just fit two separate regression lines, one for each format. We already know the fit for AAC, but now we also get the appropriate se for the errors. We also discover, now that we can get more details, that the data have a slight kink and are skewed, definitely not normal. No prediction interval for these!

    AAC: Megabytes (MB) = 0.0110338 + 0.0154106 Time (seconds)

    se = 0.02115

    [Figure: AAC Residual versus Time (seconds)]

    [Figure: histogram of AAC residuals (Count) with Normal Quantile Plot]

    For the songs stored in AIFF format, the equation is

    AIFF: Megabytes (MB) = 0.0826145 + 0.1682346 Time (seconds)


    se = 0.0484

    The SD of these residuals is about twice that for the songs coded using the AAC procedure. These residuals seem more typical and symmetric about zero, but the distribution does not tail off as one would expect for a normal distribution. They appear uniformly distributed.

    [Figure: AIFF Residual versus Time (seconds)]

    [Figure: histogram of AIFF residuals (Count) with Normal Quantile Plot]

    Where does this leave us? We can come within 0.10 MB guaranteed for the AIFF format song. None of the residuals is larger than that. So, we'd say the song would take

        0.0826145 + 0.1682346 * 240 = 40.4589185 ± 0.1 MB

    for the AIFF format with about 100% coverage. For the AAC format, we get a much smaller estimate, but it's not so easy to set the range. Perhaps we might be able to use a range like this:

        0.0110338 + 0.0154106 * 240 = 3.7095778 MB, which might overestimate by 0.06 or underestimate by 0.04, about half the size of the interval for the other format.
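    Both predictions come from plugging a 240-second song into the two fitted equations given earlier. A minimal sketch (the helper name `predicted_mb` is illustrative; the coefficients are those of the separate AAC and AIFF fits):

```python
def predicted_mb(intercept, slope, seconds):
    """Predicted file size in MB from a fitted line in Time (seconds)."""
    return intercept + slope * seconds

seconds = 240
aiff_mb = predicted_mb(0.0826145, 0.1682346, seconds)
aac_mb = predicted_mb(0.0110338, 0.0154106, seconds)

print(round(aiff_mb, 2), round(aac_mb, 2))  # 40.46 3.71
```

    The ratio of the two predictions is close to 11, in line with the earlier remark that AIFF needs more than 10 times the space of AAC.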