MATERI GEOSTAT


  • P1: FXS/ABE P2: FXS

    9780521740494c23.xml CUAU033-EVANS September 12, 2008 11:42

    Chapter 23: Investigating the relationship between two numerical variables

    Objectives

    To use scatterplots to display bivariate (numerical) data

    To identify patterns and features of sets of data from scatterplots

    To identify positive, negative or no association between variables from a scatterplot

    To introduce the q-correlation coefficient to measure the strength of the relationship between two variables

    To introduce Pearson's product-moment correlation coefficient r to measure the strength of the linear relationship between two variables

    To fit a straight line to data by eye, and using the method of least squares

    To interpret the slope of a regression line and its intercept, if appropriate

    To predict the value of the dependent (response) variable from an independent (explanatory) variable, using a linear equation

    In Chapter 22 statistics of one variable were discussed. Sometimes values of a variable for

    more than one group have been examined, such as age of mothers and age of fathers, but only

    one variable was considered for each individual at a time.

    When two variables are observed for each subject, bivariate data are obtained. For example,

    it might be interesting to record the number of hours spent studying for an exam by each

    student in a class and the mark they achieved in the exam. If each of these variables were

    considered separately the methods discussed earlier would be used. It may be of more interest

    to examine the relationship between the two variables, in which case new bivariate techniques

    are required. When exploring bivariate data, questions arise such as, 'Is there a relationship

    between two variables?' or 'Does knowing the value of one of the variables tell us anything

    about the value of the other variable?'

    Cambridge University Press Uncorrected Sample Pages 978-0-521-61252-4 2008 Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard


    Consider the relationship between the number of cigarettes smoked per day and blood

    pressure. Since one opinion might be that varying the number of cigarettes smoked may affect

    blood pressure, it is necessary to distinguish between blood pressure, which is called the

    dependent or response variable, and the number of cigarettes, which is called the

    independent or explanatory variable. In this chapter some techniques are introduced which

    enable questions concerning the nature of the relationship between such variables to be

    answered.

    23.1 Displaying bivariate data

    As with data concerning one variable, the most important first step in analysing bivariate data

    is the construction of a visual display. When both of the variables of interest are numerical then

    a scatterplot (or bivariate plot) may be constructed. This is the single most important tool in

    the analysis of such bivariate data, and should always be examined before further analysis is

    undertaken. The pairs of data points are plotted on the cartesian plane, with each pair

    contributing one point to the plot. Using the normal convention, the variable plotted

    horizontally is denoted as x, and the variable plotted vertically as y. The following example

    examines the features of the scatterplot in more detail.

    Example 1

    The number of hours spent studying for an examination by each member of a class, and the

    marks they were awarded, are given in the table.

    Student   1   2   3   4   5   6   7   8   9  10
    Hours     4  36  23  28  25  11  18  13   4   8
    Mark     27  87  67  84  66  52  61  43  38  52

    Student  11  12  13  14  15  16  17  18  19  20
    Hours     4  19   6  19   1  29  33  36  28  15
    Mark     41  54  57  62  23  65  75  83  65  55

    Construct a scatterplot of these data.

    Solution

    The first decision to be made when preparing this scatterplot is whether to show Mark

    or Hours on the horizontal (x) axis. Since a student's mark is likely to depend on the

    hours that they spend studying, in this case Hours is the independent variable and

    Mark is the dependent variable. By convention, the independent variable is plotted on

    the horizontal (x) axis, and the dependent variable on the vertical (y) axis, giving the

    scatterplot shown.


    556 Essential Advanced General Mathematics

    [Scatterplot: Mark (y, axis marked 0 to 80) against Hours (x, axis marked 0 to 40)]

    From this scatterplot, a general trend can be seen of increasing marks with increasing hours of

    study. There is said to be a positive association between the variables.

    Two variables are positively associated when larger values of y are associated with larger

    values of x, as shown in the previous scatterplot.

    Examples of variables which exhibit positive association are height and weight, foot size and

    hand size, and number of people in the family and household expenditure on food.
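The scatterplot from Example 1 can also be approximated in code. The following is a minimal sketch, assuming only the Python standard library; the function name `text_scatter` and the grid dimensions are illustrative choices, not part of the text, and a graphing tool would normally be used instead. Hours go on the horizontal axis and Mark on the vertical axis, following the convention described above.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return text lines that roughly approximate a scatterplot of (xs, ys).

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # One character cell per grid position; row 0 is the top of the plot.
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * width)
        row = round((y - y_min) / (y_max - y_min) * height)
        grid[height - row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * (width + 1))
    return lines


# Data from Example 1: independent variable first, dependent variable second.
hours = [4, 36, 23, 28, 25, 11, 18, 13, 4, 8,
         4, 19, 6, 19, 1, 29, 33, 36, 28, 15]
marks = [27, 87, 67, 84, 66, 52, 61, 43, 38, 52,
         41, 54, 57, 62, 23, 65, 75, 83, 65, 55]

plot_lines = text_scatter(hours, marks)
print("\n".join(plot_lines))
```

The printed points drift from the lower left to the upper right of the grid, which is the pattern described as positive association.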

    Example 2

    The age, in years, of several cars and their advertised price in a newspaper are given in the

    following table.

    Age (years)       4       6       5       7       4       2       3       3
    Price ($)    13 000   9 800  11 000   8 300  10 500  15 800  14 300  13 800

    Age (years)       7       6       4       6       4       8       6
    Price ($)     9 700   9 500  13 200  10 000  11 800   8 000  12 200

    Construct a scatterplot to display these data.

    Solution

    In this case the independent variable is the age of the car, which is plotted on the

    horizontal axis. The dependent variable, price, is plotted on the vertical axis.

    [Scatterplot: Price ($) (y, axis marked 8000 to 16000) against Age (years) (x, axis marked 1 to 8)]


    From the scatterplot a general trend of decreasing price with increasing age of car can be seen.

    There is said to be a negative association between the variables.

    Two variables are negatively associated when larger values of y are associated with smaller

    values of x, as shown in the scatterplot above.

    Examples of other variables which exhibit negative association are weight and number of

    weeks spent on a diet program, hearing ability and age, and number of cold rainy days per

    week and sales of ice creams.

    The third alternative is that a scatterplot shows no particular pattern, indicating no

    association between the variables.

    [Scatterplot: y (axis marked 0 to 8) against x (axis marked 1 to 8), showing no particular pattern]

    There is no association between two variables when the values of y are not related to the

    values of x, as shown in the preceding scatterplot.

    Examples of variables which show no association are height and IQ for adults, price of cars

    and fuel consumption, and size of family and number of pets.

    When one point, or a few points, do not seem to fit with the rest of the data they are called

    outliers. Sometimes a point is an outlier, not because its x value or its y value is in itself

    unusual, but rather because this particular combination of values is atypical. Consequently such

    an outlier cannot always be detected from single variable displays, such as stem-and-leaf plots.

    For example, consider this scatterplot. While the

    variable plotted on the horizontal axis takes values

    from 1 to 8 and the variable plotted on the vertical

    axis takes values from 2 to 8, the combination (2, 8)

    is clearly an outlier.

    [Scatterplot: y (axis marked 0 to 8) against x (axis marked 1 to 8), with an outlier at (2, 8)]


    Using the TI-Nspire

    The calculator can be used to construct a scatterplot of statistical data. The procedure is illustrated using the age and price of car data from Example 2.

    The data is most easily entered in a Lists & Spreadsheet application ( 3). First, use the up/down arrows to name the first column age and the second column price.

    Then enter the data as shown.

    Open a Data & Statistics application (5) to graph the data. At first the data displays as shown.

    Specify the x variable by selecting Add X Variable from the Plot Properties (b24) and selecting age.

    Specify the y variable by selecting Add Y Variable from the Plot Properties (b26) and selecting price.

    The data now displays as shown.


    Using the Casio ClassPad

    The table represents the results of 12 students in two tests.

    Test 1 score  10  18  13   6   8   5  12  15  15
    Test 2 score  12  20  11   9   6   6  12  13  17

    Enter the data into list1 (x) and list2 (y) in the module. Tap SetGraph, Setting . . . and select the tab for Graph3. (Note: Following on from the types of graphs in univariate statistics, this allows the Scatterplot settings to be remembered and called upon when required.)

    Ensure that all other graphs are de-selected and tap to produce the graph shown in the full screen.

    Select the graph window (bold border) and tap Analysis, Trace to scroll from point to point and display the coordinates at the bottom of the graph.

    Exercise 23A

    Note: Save your data for Questions 1 to 4 in named lists as they will be needed for later exercises.

    1 The amount of a particular pain relief drug given to each patient and the time taken for the

    patient to experience pain relief are shown.

    Patient 1 2 3 4 5 6 7 8 9 10

    Drug dose (mg) 0.5 1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 4.0

    Response time (min) 65 35 15 10 22 16 10 18 70 20

    a Plot the response time against drug dose.

    b From the scatterplot, describe any association between the two variables.

    c Identify outliers, if any, and interpret.

    2 The proprietor of a hairdressing salon recorded the amount spent advertising in the local

    paper and the business income for each month of a year, with the following results.


    Month   Advertising ($)   Business ($)
      1          350              9 450
      2          450             10 070
      3          400              9 380
      4          500              9 110
      5          250              5 220
      6          150              3 100
      7          350              8 060
      8          300              7 030
      9          550             11 500
     10          600             12 870
     11          550             10 560
     12          450              9 850

    a Plot the business income against the advertising expenditure.

    b From the scatterplot, describe any association between the two variables.

    c Identify outliers, if any, and interpret.

    3 The number of passenger seats on the most commonly used commercial aircraft, and the

    airspeeds of these aircraft, in km/h, are shown in the following table.

    Number of seats 405 296 288 258 240 230 193 188

    Airspeed (km/h) 830 797 774 736 757 765 760 718

    Number of seats 148 142 131 122 115 112 103 102

    Airspeed (km/h) 683 666 661 378 605 620 576 603

    a Plot the airspeed against the number of seats.

    b From the scatterplot, describe any association between the two variables.

    c Identify outliers, if any, and interpret.

    4 The price and age of several secondhand caravans are listed in the table.

    Age (years)   Price ($)
         7          4 800
         7          3 900
         8          4 275
         9          3 900
         4          6 900
         8          6 500
         1         11 400
        10          8 700
         9          1 950
         9          3 300
        11          1 650
         3          9 600
         4          8 400
         7          6 600

    a Plot the price of the caravans against their age.

    b From the scatterplot, describe any association between the two variables.

    c Identify outliers, if any, and interpret.

    23.2 The q-correlation coefficient

    If the plot of a bivariate data set shows a basic trend, apart from some randomness, then it is

    useful to provide a numerical measure of the strength of the relationship between the two

    variables. Correlation is a measure of strength of a relationship which applies only to


    numerical variables. Thus it is sensible, for example, to calculate the correlation between the

    heights and weights for a group of students, but not between height and gender, as gender is

    not a numerical variable. There are many different numerical measures of correlation, and each

    has different properties. In this section the q-correlation coefficient will be introduced.

    Consider the scatterplot of the number of hours spent by each member of a class when

    studying for an examination, and the mark they were awarded, from Example 1. This shows a

    positive association. To calculate the q-correlation coefficient, first find the median value for

    each of the variables separately. This can be done from the data, but it is usually simpler to

    calculate directly from the plot. There are 20 data points, and the median values are halfway

    between the 10th and 11th points, both vertically and horizontally. A vertical line is then drawn

    through the median x value, and a horizontal line through the median y value. The effect of this

    is to divide the plot into four regions, as shown.

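    [Scatterplot: Marks (y) against Hours (x) from Example 1, divided into four regions by a vertical line through the median x value and a horizontal line through the median y value]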

    Each of the four regions which have been created in this way is called a quadrant, and it can

    be noticed immediately that most of the points in this plot are in the upper right and lower left

    quadrants. In fact, wherever there is a positive association between variables this will be the

    case.

    Consider the scatterplot of the age of cars and the advertised price from Example 2, which

    shows negative association. Again the median value for each of the variables is found

    separately. There are 15 data points, giving the median values at the 8th points, both vertically

    and horizontally. A vertical line is then drawn through the median x value, and a horizontal line

    through the median y value. In this particular case they are coordinates of the same point, but

    this need not be so.

    [Scatterplot: Price ($) (y) against Age (years) (x) from Example 2, divided into four regions by the two median lines]


    It can be seen in this example that most points are in the upper left and the lower right

    quadrants, and this is true whenever there is a negative association between variables.

    These observations lead to a definition of the q-correlation coefficient.

    The q-correlation coefficient can be determined from the scatterplot as follows.

    [Diagram: the x-y plane divided by the two median lines into four quadrants, labelled A (upper right), B (upper left), C (lower left) and D (lower right)]

    Find the median of all the x values in the data set, and draw a vertical line through this value.

    Find the median of all the y values in the data set, and draw a horizontal line through this value.

    The plane is now divided into four quadrants. Label the quadrants A, B, C and D as shown in the diagram.

    Count the number of points in each of the quadrants A, B, C and D. Any points which lie on the median lines are omitted.

    Let a, b, c, d represent the number of points in each of the quadrants A, B, C and D respectively. Then the q-correlation coefficient is given by

    $$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$

    Example 3

    Use the scatterplot from Example 1 to determine the q-correlation coefficient for the number

    of hours each member of a class spent studying for an examination and the mark they were

    awarded.

    Solution

    There are nine points in quadrant A, one in quadrant B, nine in quadrant C and one in

    quadrant D.

    Thus

    $$q = \frac{(a + c) - (b + d)}{a + b + c + d} = \frac{(9 + 9) - (1 + 1)}{9 + 1 + 9 + 1} = \frac{18 - 2}{20} = \frac{16}{20} = 0.8$$


    Example 4

    Use the scatterplot from Example 2 to determine the q-correlation coefficient for the age of

    cars and their advertised price.

    Solution

    There is one point in quadrant A, six in quadrant B, one in quadrant C and six in

    quadrant D.

    $$q = \frac{(a + c) - (b + d)}{a + b + c + d} = \frac{(1 + 1) - (6 + 6)}{1 + 6 + 1 + 6} = \frac{2 - 12}{14} = \frac{-10}{14} \approx -0.71$$
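The counting procedure used in Examples 3 and 4 can be sketched in code. This is an illustrative sketch only: the function name `q_correlation` is an assumption, and the quadrant positions (A upper right, B upper left, C lower left, D lower right) are inferred from the counts quoted in the two worked solutions. Points lying on either median line are omitted, as the definition requires.

```python
from statistics import median


def q_correlation(xs, ys):
    """q-correlation coefficient from quadrant counts about the two medians."""
    mx, my = median(xs), median(ys)
    a = b = c = d = 0
    for x, y in zip(xs, ys):
        if x == mx or y == my:
            continue  # points on a median line are omitted
        if x > mx and y > my:
            a += 1  # quadrant A: upper right
        elif x < mx and y > my:
            b += 1  # quadrant B: upper left
        elif x < mx and y < my:
            c += 1  # quadrant C: lower left
        else:
            d += 1  # quadrant D: lower right
    return ((a + c) - (b + d)) / (a + b + c + d)


# Example 1 data (hours studied, mark awarded)
hours = [4, 36, 23, 28, 25, 11, 18, 13, 4, 8,
         4, 19, 6, 19, 1, 29, 33, 36, 28, 15]
marks = [27, 87, 67, 84, 66, 52, 61, 43, 38, 52,
         41, 54, 57, 62, 23, 65, 75, 83, 65, 55]

# Example 2 data (age of car in years, advertised price in dollars)
ages = [4, 6, 5, 7, 4, 2, 3, 3, 7, 6, 4, 6, 4, 8, 6]
prices = [13000, 9800, 11000, 8300, 10500, 15800, 14300, 13800,
          9700, 9500, 13200, 10000, 11800, 8000, 12200]

q_study = q_correlation(hours, marks)
q_cars = q_correlation(ages, prices)
print(round(q_study, 2), round(q_cars, 2))  # prints: 0.8 -0.71
```

For the car data the medians are 5 years and $11 000, which are the coordinates of a single data point, so that point is dropped and 14 points are counted.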

    From Examples 3 and 4 it can be seen that q-correlation coefficients may take both positive

    and negative values. Consider the situation when all the points are in the quadrants A and C.

    $$q = \frac{(a + c) - (b + d)}{a + b + c + d} = \frac{a + c}{a + c} = 1 \quad \text{(since } b \text{ and } d \text{ are both equal to zero)}$$

    Thus the maximum value the q-correlation coefficient may take is 1, and this indicates a measure of strong positive association.

    Suppose all the points are in the quadrants B and D.

    $$q = \frac{(a + c) - (b + d)}{a + b + c + d} = \frac{-(b + d)}{b + d} = -1 \quad \text{(since } a \text{ and } c \text{ are both equal to zero)}$$

    Thus the minimum value the q-correlation coefficient may take is −1, and this indicates a measure of strong negative association.


    When the same number of points are in each of the quadrants A, B, C and D then:

    $$q = \frac{(a + c) - (b + d)}{a + b + c + d} = \frac{0}{a + b + c + d} = 0 \quad \text{(since } a = b = c = d\text{)}$$

    This value of the q-correlation coefficient clearly indicates that no association exists.

    q-correlation coefficients can be classified as follows:

       −1 ≤ q ≤ −0.75     strong negative relationship
    −0.75 < q ≤ −0.50     moderate negative relationship
    −0.50 < q ≤ −0.25     weak negative relationship
    −0.25 < q < 0.25      no relationship
     0.25 ≤ q < 0.50      weak positive relationship
     0.50 ≤ q < 0.75      moderate positive relationship
     0.75 ≤ q ≤ 1         strong positive relationship
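The classification table can be expressed as a small lookup. This is a sketch under the assumption that the bands partition the interval from −1 to 1, with each boundary value belonging to the stronger of its two neighbouring bands; the function name `classify_q` is hypothetical.

```python
def classify_q(q):
    """Return the verbal classification of a q-correlation coefficient."""
    if not -1 <= q <= 1:
        raise ValueError("q must lie between -1 and 1")
    if q <= -0.75:
        return "strong negative relationship"
    if q <= -0.50:
        return "moderate negative relationship"
    if q <= -0.25:
        return "weak negative relationship"
    if q < 0.25:
        return "no relationship"
    if q < 0.50:
        return "weak positive relationship"
    if q < 0.75:
        return "moderate positive relationship"
    return "strong positive relationship"


print(classify_q(0.8))    # value found in Example 3
print(classify_q(-0.71))  # value found in Example 4
```

With the values from the worked examples, 0.8 falls in the strong positive band and −0.71 in the moderate negative band.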

    Exercise 23B

    1 Use the table of q-correlation coefficients to classify the following.

    a q = 0.20    b q = 0.30    c q = 0.85    d q = 0.33

    e q = 0.95    f q = 0.75    g q = 0.75    h q = 0.24

    i q = 1       j q = 0.25    k q = 1       l q = 0.50

    2 Calculate the q-correlation coefficient for each pair of variables shown in the following

    scatterplots.

    a [Scatterplot: y (axis marked 12 to 36) against x (axis marked 15.0 to 35.0)]


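    b [Scatterplot: y (axis marked 140 to 280) against x (axis marked 120 to 280)]

    c [Scatterplot: y (axis marked 2 to 8) against x (axis marked 1 to 8)]

    d [Scatterplot: y (axis marked 20 to 80) against x (axis marked 10 to 40)]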

    3 (See Example 4.) The amount of a particular pain relief drug given to each patient and the time taken for the

    patient to experience pain relief are shown.

    Patient 1 2 3 4 5 6 7 8 9 10

    Drug dose (mg) 0.5 1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 4.0

    Response time (min) 65 35 15 10 22 16 10 18 70 20

    a Use your scatterplot from 1, Exercise 23A to find the q-correlation coefficient for

    response time and drug dosage.


    b Classify the strength and direction of the relationship between response time and drug

    dosage according to the table given.

    4 (See Example 3.) The proprietor of a hairdressing salon recorded the amount spent advertising in the local

    paper and the business income for each month of a year, with the following results.

    Month   Advertising ($)   Business ($)
      1          350              9 450
      2          450             10 070
      3          400              9 380
      4          500              9 110
      5          250              5 220
      6          150              3 100
      7          350              8 060
      8          300              7 030
      9          550             11 500
     10          600             12 870
     11          550             10 560
     12          450              9 850

    a Use your scatterplot from 2, Exercise 23A to find the q-correlation coefficient for

    advertising expenditure and total business conducted.

    b Classify the strength and direction of the relationship between advertising expenditure

    and business income according to the table given.

    5 The number of passenger seats on the most commonly used commercial aircraft, and the

    airspeeds of these aircraft, in km/h, are shown in the following table.

    Number of seats 405 296 288 258 240 230 193 188

    Airspeed (km/h) 830 797 774 736 757 765 760 718

    Number of seats 148 142 131 122 115 112 103 102

    Airspeed (km/h) 683 666 661 378 605 620 576 603

    a Use your scatterplot from 3, Exercise 23A to find the q-correlation coefficient for the

    number of seats on an airline and the air speed.

    b Classify the strength and direction of the relationship between the number of seats on

    an airline and the air speed according to the table given.

    6 The price and age of several secondhand caravans are listed in the table.

    Age (years)   Price ($)
         7          4 800
         7          3 900
         8          4 275
         9          3 900
         4          6 900
         8          6 500
         1         11 400
        10          8 700
         9          1 950
         9          3 300
        11          1 650
         3          9 600
         4          8 400
         7          6 600


    a Use your scatterplot from 4, Exercise 23A to find the q-correlation coefficient for price

    and age of secondhand caravans.

    b Classify the strength and direction of the relationship between price and age of

    secondhand caravans according to the table given.

    23.3 The correlation coefficient

    When a relationship is linear the most commonly used measure of strength of the relationship is Pearson's product-moment correlation coefficient, r. It gives a numerical measure of the degree to which the points in the scatterplot tend to cluster around a straight line.

    Pearson's product-moment correlation is defined to be

    $$r = \frac{\text{degree to which the variables vary together}}{\text{degree to which the two variables vary separately}}$$

    Formally, if we call the two variables x and y and we have n observations then Pearson's product-moment correlation for this set of observations is

    $$r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

    where $\bar{x}$ and $s_x$ are the mean and standard deviation of the x scores and $\bar{y}$ and $s_y$ are the mean and standard deviation of the y scores.
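The formula can be rewritten in a form that is often more convenient for hand calculation. The step below assumes that the standard deviations are the sample standard deviations, that is, $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ and similarly for $s_y$, which is consistent with the factor $\frac{1}{n-1}$ in the definition. Under that assumption $(n - 1)\, s_x s_y = \sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}$, and so

```latex
\begin{aligned}
r &= \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \\
  &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} \\
  &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\end{aligned}
```

The numerator measures how the two variables vary together and the denominator how they vary separately, matching the verbal definition.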

    There are two key assumptions made in calculating Pearson's correlation coefficient r. They are

    the data is numerical

    the relationship being described is linear.

    Pearson's correlation coefficient r has the following properties:

    If there is no linear relationship, r = 0.

    [Scatterplot: no linear relationship, r = 0]

    For a perfect positive linear relationship, r = +1.

    [Scatterplot: perfect positive linear relationship, r = +1]

    For a perfect negative linear relationship, r = −1.

    [Scatterplot: perfect negative linear relationship, r = −1]


    Otherwise, −1 ≤ r ≤ +1.

    Pearson's correlation coefficient r can be classified as follows:

       −1 ≤ r ≤ −0.75     strong negative linear relationship
    −0.75 < r ≤ −0.50     moderate negative linear relationship
    −0.50 < r ≤ −0.25     weak negative linear relationship
    −0.25 < r < 0.25      no linear relationship
     0.25 ≤ r < 0.50      weak positive linear relationship
     0.50 ≤ r < 0.75      moderate positive linear relationship
     0.75 ≤ r ≤ 1         strong positive linear relationship

    The following scatterplots show linear relationships of various strengths together with the corresponding value of Pearson's product-moment correlation coefficient.

    [Scatterplot: CO level against Traffic volume] Carbon monoxide level in the atmosphere and traffic volume: r = 0.985

    [Scatterplot: Age against Testosterone level] Age first convicted and testosterone (a male hormone) level of a group of prisoners: r = 0.814

    [Scatterplot: Mortality against Smoking ratio] Mortality rate due to lung cancer and smoking ratio (100 = average): r = 0.716

    [Scatterplot: Score against Age 1st word] Score on aptitude test (taken later in life) and age (in months) when first word spoken: r = 0.445

    [Scatterplot: Verbal against Mathematics] Scores on standardised tests of verbal and mathematical ability: r = 0.275

    [Scatterplot: Calf against Age] Calf measurement and age of adult males: r = 0.005


    Using the TI-Nspire

    The value of Pearson's product-moment correlation coefficient, r, can be calculated using the calculator. This will be illustrated using the age and price of car data from Example 2.

    With the data entered as the two lists age and price respectively, we now open a Calculator application ( 1) to calculate Pearson's product-moment correlation coefficient.

    Use Linear Regression (mx + b) from the Stat Calculations submenu of the Statistics menu (b613) and complete the dialog box as shown.

    Press enter to obtain the regression information including the value for r as shown.

    Using the Casio ClassPad

    Consider the following set of data.

    x  1  3  5  4  7
    y  2  5  7  2  9

    Enter the data into list1 (x) and list2 (y).

    Tap Calc, Linear Reg and select the settings shown. Tap OK to produce the results shown in the second screen.

    The value of r is shown in the answer box.

    Tap OK to produce a scatterplot.
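The value of r reported by a calculator can be cross-checked by applying the definition directly. The sketch below assumes the Python standard library and uses the five data pairs from the ClassPad instructions; the function name `pearson_r` is illustrative, and `statistics.stdev` is used because it computes the sample standard deviation (divisor n − 1), matching the factor in the formula.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson's r as the average product of standardised scores (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


x = [1, 3, 5, 4, 7]
y = [2, 5, 7, 2, 9]

r = pearson_r(x, y)
print(round(r, 3))  # prints: 0.834
```

A value of about 0.834 would be classified as a strong positive linear relationship according to the table.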


    After the mean and standard deviation, Pearson's product-moment correlation is one of the most frequently computed descriptive statistics. It is a powerful tool but it is also easily misused. The presence of a linear relationship should always be confirmed with a scatterplot before Pearson's product-moment correlation is calculated. And, like the mean and the standard deviation, Pearson's correlation coefficient r is very sensitive to the presence of outliers in the sample.
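The sensitivity to outliers can be illustrated numerically. In this sketch the five data pairs are those from the ClassPad instructions above, while the sixth point (10, 0) is a hypothetical outlier invented purely for illustration; the function name `pearson_r` is also an assumption. The coefficient is computed here from sums of products of deviations, which is algebraically equivalent to the standardised-score definition.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from sums of squared and cross deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 3, 5, 4, 7]
y = [2, 5, 7, 2, 9]

r_original = pearson_r(x, y)
# Hypothetical outlier (10, 0): a large x value paired with a very small y value.
r_with_outlier = pearson_r(x + [10], y + [0])

print(round(r_original, 3), round(r_with_outlier, 3))
```

A single atypical point is enough to move the coefficient from the strong positive band to a value close to zero, which is why the scatterplot should be examined before r is interpreted.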

    Exercise 23C

    1 Use the table of Pearson's correlation coefficients r to classify the following.

    a r = 0.20    b r = 0.30    c r = 0.85    d r = 0.33

    e r = 0.95    f r = 0.75    g r = 0.75    h r = 0.24

    i r = 0.50    j r = 0.25    k r = 1       l r = 1

    2 By comparing the plots given with the example scatterplots shown earlier in this section, estimate the value of Pearson's correlation coefficient r.

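    a [Scatterplot: y (axis marked 12 to 36) against x (axis marked 15.0 to 35.0)]

    b [Scatterplot: y (axis marked 140 to 280) against x (axis marked 120 to 280)]

    c [Scatterplot: y (axis marked 2 to 8) against x (axis marked 1 to 8)]

    d [Scatterplot: y (axis marked 20 to 80) against x (axis marked 10 to 40)]

    e [Scatterplot: y (axis marked 8000 to 14000) against x (axis marked 1 to 8)]

    f [Scatterplot: y (axis marked 2 to 8) against x (axis marked 1 to 8)]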


    3 The amount of a particular pain relief drug given to each patient and the time taken for the

    patient to experience relief are shown.

    Patient 1 2 3 4 5 6 7 8 9 10

    Drug dose (mg) 0.5 1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 4.0

    Response time (min) 65 35 15 10 22 16 10 18 70 20

    a Determine the value of Pearson's correlation coefficient.

    b Classify the relationship between drug dose and response time according to the table

    given.

    4 The proprietor of a hairdressing salon recorded the amount spent on advertising in the

    local paper and the business income for each month for a year, with the following results.

    Month   Advertising ($)   Business ($)
    1       350               9 450
    2       450               10 070
    3       400               9 380
    4       500               9 110
    5       250               5 220
    6       150               3 100
    7       350               8 060
    8       300               7 030
    9       550               11 500
    10      600               12 870
    11      550               10 560
    12      450               9 850

    a Determine the value of Pearson's correlation coefficient.

    b Classify the relationship between the amount spent on advertising and business income

    according to the table given.

    5 The number of passenger seats on the most commonly used commercial aircraft, and the

    airspeeds of these aircraft, in km/h, are shown in the following table.

    Number of seats 405 296 288 258 240 230 193 188

    Airspeed (km/h) 830 797 774 736 757 765 760 718

    Number of seats 148 142 131 122 115 112 103 102

    Airspeed (km/h) 683 666 661 378 605 620 576 603

    a Determine the value of Pearson's correlation coefficient.

    b Classify the relationship between the number of passenger seats and airspeed

    according to the table given.


    6 The price and age of several secondhand caravans are listed in the table.

    Age (years)   Price ($)
    7             4 800
    7             3 900
    8             4 275
    9             3 900
    4             6 900
    8             6 500
    1             11 400
    10            8 700
    9             1 950
    9             3 300
    11            1 650
    3             9 600
    4             8 400
    7             6 600

    a Determine the value of Pearson's correlation coefficient.

    b Classify the relationship between price and age according to the table given.

    7 The following are the scores for a group of twelve students who each had two attempts at a
    test (out of 70).

    Attempt 1 53 56 57 49 44 69 66 40 53 43 68 64

    Attempt 2 63 66 67 58 54 70 70 55 63 53 70 70

    a Construct a scatterplot of these data, and describe the relationship between scores on

    attempt 1 and attempt 2.

    b Is it appropriate to calculate the value of Pearson's correlation coefficient for these

    data? Give reasons for your answer.

    8 This table represents the results of two different tests for a group of students.

    Student   Test 1   Test 2
    1         214      216
    2         281      270
    3         212      281
    4         324      326
    5         240      243
    6         208      213
    7         303      311
    8         278      290
    9         311      320

    a Construct a scatterplot of these data, and describe the relationship between scores on
    Test 1 and Test 2.

    b Is it appropriate to calculate the value of Pearson's correlation coefficient for these
    data? Give reasons for your answer.

    c Determine the values of the q-correlation coefficient and Pearson's correlation
    coefficient r.

    d Classify the relationship between Test 1 and Test 2 using both the q-correlation
    coefficient and Pearson's correlation coefficient r, and compare.

    e It turns out that when the data were entered into the student records, the result for
    Test 2, Student 9 was entered as 32 instead of 320.

    i Recalculate the values of the q-correlation coefficient and Pearson's correlation
    coefficient r with this new data value.

    ii Compare these values with the ones calculated in c.


    23.4 Lines on scatterplots

    If a linear relationship exists between two variables it is possible to predict the value of the
    dependent variable from the value of the independent variable. The stronger the relationship
    between the two variables, the better the prediction that is made. To make the prediction it is
    necessary to determine an equation which relates the variables, and this is achieved by fitting a
    line to the data. Fitting a line to data is often referred to as regression, which comes from the
    Latin word regressum, meaning "moved back".

    The simplest equation relating two variables x and y is a linear equation of the form

    y = a + bx

    where a and b are constants. This is similar to the general equation of a straight line, where a
    represents the coordinate of the point where the line crosses the y axis (the y axis intercept),
    and b represents the slope of the line. In order to summarise any particular (x, y) data set,
    numerical values for a and b are needed that will make the line pass close to the data. There are
    several ways in which the values of a and b can be found. The simplest is to place a ruler on
    the scatter diagram and draw, by eye, a line which appears to follow the general trend of the data.

    Example 6

    The following table gives the gold medal winning distance, in metres, for the men's long jump
    for the Olympic games for the years 1896 to 2004. (Some years are missing owing to the two
    world wars.)

    Find a straight line which fits the general trend of the data, and use it to predict the winning

    distance in the year 2008.

    Year 1896 1900 1904 1908 1912 1920 1924 1928 1932 1936 1948 1952 1956

    Distance (m) 6.35 7.19 7.34 7.49 7.59 7.16 7.44 7.75 7.65 8.05 7.82 7.57 7.82

    Year 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004

    Distance (m) 8.13 8.08 8.92 8.26 8.36 8.53 8.53 8.72 8.67 8.50 8.55 8.59

    Solution

    [Figure: scatterplot of Distance (m) against Year, with a straight line fitted by eye.]


    Note that this scatterplot does not start at the origin. Since the values of the

    coordinates that are of interest on both axes are a long way from zero, it is sensible to

    plot the graph for that range of values only. In fact, any values less than 1896 on the

    horizontal axis are meaningless in this context.

    The line shown on the scatterplot is only one of many which could be drawn. To

    enable the line to be used for prediction it is necessary to find its equation. To do this,

    first determine the coordinates of any two points through which it passes on the

    scatterplot. Appropriate points are (1932, 7.65) and (1976, 8.36). The equation of the

    straight line is then found by substituting in the formula which gives the equation for a

    straight line between two points.

    $$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$$

    $$\frac{y - 7.65}{x - 1932} = \frac{8.36 - 7.65}{1976 - 1932} = \frac{0.71}{44} = 0.016$$

    $$y - 7.65 = 0.016(x - 1932)$$

    $$y = 0.016x - 23.26$$

    or distance = −23.26 + 0.016 × year

    The intercept for this equation is −23.26 m. In theory, this is the winning distance
    for the year 0! In practice, there is no meaningful interpretation for the y axis intercept
    in this situation. But the same cannot be said about the slope. A slope of 0.016 means
    that on average the gold medal winning distance increases by about 1.6 centimetres per
    year, or roughly 6.4 centimetres between games held four years apart.

    Using this equation the gold medal winning distance for the long jump in 2008
    would be predicted as

    y = −23.26 + 0.016 × 2008 = 8.87 m

    Obviously, attempting to project too far into the future may give us answers which are not

    sensible. When using an equation for prediction, derived from data, it is sensible to use values

    of the explanatory variable which are within a reasonable range of the data. The relationship

    between the variables may not be linear if we move too far from the known values.
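
    The two-point calculation of Example 6, and the caution about predicting outside the data,
    can be sketched in Python as follows. This is only an illustration: the function names
    line_through and predict, and the idea of returning an extrapolation flag, are assumptions
    introduced here. The slope is rounded to three decimal places before the intercept is
    found, as in the worked solution.

```python
def line_through(p1, p2):
    """Return (intercept a, slope b) of the line y = a + b*x through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b

# Two points read from the line fitted by eye in Example 6.
a, b = line_through((1932, 7.65), (1976, 8.36))

# Follow the worked solution: round the slope first, then find the intercept.
b_rounded = round(b, 3)
a_rounded = round(7.65 - b_rounded * 1932, 2)

def predict(year, first_year=1896, last_year=2004):
    """Predicted distance, plus a flag that is True when `year` lies outside the data."""
    distance = a_rounded + b_rounded * year
    extrapolating = not (first_year <= year <= last_year)
    return distance, extrapolating

distance_2008, outside = predict(2008)
print(f"distance = {a_rounded} + {b_rounded} x year")
print(f"2008: {distance_2008:.2f} m (outside the data range: {outside})")
```

    The flag is a reminder rather than a rule: 2008 is only just beyond the last observed
    year, whereas a year far from the data would deserve much less trust.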


    Example 7

    The following table gives the alcohol consumption per head (in litres) and the hospital
    admission rate for each of the regions of Victoria in 1994-95.

    Region                   Per capita consumption   Hospital admissions
                             (litres of alcohol)      per 1000 residents
    Loddon-Mallee            9.0                      42.0
    Grampians                8.4                      44.7
    Barwon                   8.7                      38.6
    Gippsland                9.1                      44.7
    Hume                     10.0                     41.0
    Western Metropolitan     9.0                      40.4
    Northern Metropolitan    6.7                      36.2
    Eastern Metropolitan     6.2                      32.3
    Southern Metropolitan    8.1                      43.0

    Find a straight line which fits the general trend of the data, and interpret the intercept and slope.

    Solution

    [Figure: scatterplot of Admissions against Alcohol consumption.]

    One possible line passes through the points (7, 36) and (9, 42). Thus

    $$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$$

    $$\frac{y - 36}{x - 7} = \frac{42 - 36}{9 - 7} = \frac{6}{2} = 3$$

    $$y - 36 = 3(x - 7)$$

    $$y = 3x + 15$$

    or admission rate = 15 + 3 × alcohol consumption

    The intercept for this equation is 15, implying that we predict a hospital admission
    rate of 15 per 1000 residents for a region with 0 alcohol consumption. While
    this is interpretable, it would be a brave prediction as it is well out of the range of the
    data. A slope of 3 means that on average the admission rate rises by 3 per 1000
    residents for each additional litre of alcohol consumed per capita.

    Exercise 23D

    1 Plot the following set of data points on graph paper. (Example 6)

    x 0 1 2 3 4 5 6 7 8

    y 1 3 6 7 7 11 13 18 17

    Draw a straight line which fits the data by eye, and find an equation for this line.

    2 Plot the following set of points on graph paper.

    x −3 −2 −1 0 1 2 3 4

    y −5 −2 0 6 7 11 13 20

    Draw a straight line which fits the data by eye, and find an equation for this line.

    3 The numbers of burglaries during two successive years for various districts in one state are
    given in the following table. (Example 7)

    District Year 1 (x) Year 2 (y)

    A 3233 2709

    B 2363 2208

    C 4591 3685

    D 4317 4038

    E 2474 2792

    F 3679 3292

    G 5016 4402

    H 6234 5147

    I 6350 5555

    J 4072 4004

    K 2137 1980

    a Make a scatterplot of the data.

    b Find the equation of a straight line which

    relates the two variables.

    c Describe the trend in burglaries in this state.

    4 The following data give a girl's height (in cm) between the ages of 36 months and
    60 months.

    Age (x) 36 40 44 52 56 60

    Height (y) 84 87 90 92 94 96

    a Make a scatterplot of the data.

    b Find the equation of a straight line which relates the two variables.

    c Interpret the intercept and slope, if appropriate.

    d Use your equation to estimate the girl's height at age

    i 42 months ii 18 years

    e How reliable are your answers to d?


    5 The following table gives the adult heights (in cm) of ten pairs of mothers and daughters.

    Mother (x) 170 163 157 165 175 160 164 168 152 173

    Daughter (y) 178 175 165 173 168 152 163 168 160 178

    a Make a scatterplot of the data.

    b Find the equation of a straight line which relates the two variables.

    c Estimate the adult height of a girl whose mother is 170 cm tall.

    6 The manager of a company which manufactures MP3 players keeps a weekly record of the

    cost of running the business and the number of units produced. The figures for a period of

    eight weeks are shown in the table.

    Number of MP3 players produced (x) 100 160 80 100 220 150 170 200

    Cost in $000s (y) 2.5 3.3 2.4 2.6 4.1 3.1 3.5 3.8

    a Make a scatterplot of the data.

    b Find the equation of a straight line which relates the two variables.

    c What is the manufacturer's fixed cost for operating the business each week?

    d What is the cost of production of each unit, over and above this fixed operating cost?

    7 The amount of a particular pain relief drug given to each patient and the time taken for the

    patient to experience pain relief are shown.

    Patient 1 2 3 4 5 6 7 8 9 10

    Drug dose (mg) 0.5 1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 4.0

    Response time (min) 65 35 15 10 22 16 10 18 70 20

    a Find the equation of a straight line which relates the two variables.

    b Interpret the intercept and slope if appropriate.

    c Use your equation to predict the time taken for the patient to experience pain relief if 6

    mg of the drug is given. Is this answer realistic?

    8 The proprietor of a hairdressing salon recorded the amount spent on advertising in the
    local paper and the business income for each month for a year, with the results shown.

    Month   Advertising ($)   Business ($)
    1       350               9 450
    2       450               10 070
    3       400               9 380
    4       500               9 110
    5       250               5 220
    6       150               3 100
    7       350               8 060
    8       300               7 030
    9       550               11 500
    10      600               12 870
    11      550               10 560
    12      450               9 850

    a Find the equation of a straight line which relates the two variables.

    b Interpret the intercept and slope if appropriate.

    c Use your equation to predict the business income which would be attracted if the
    proprietor of the salon spent the following amounts on advertising:

    i $1000 ii $0


    23.5 The least squares regression line

    Fitting a line to a scatterplot by eye is not generally the best way of modelling a relationship.
    What is required is a method which uses a more objective criterion. A simple method is to
    divide the data set into two halves on the basis of the median x value, and to fit a line which
    passes through the mean x and y values of the lower half, and the mean x and y values of the
    upper half. This is called the two-mean line, and while easy to determine, it is not very widely
    used. The most common procedure is the method of least squares. The least squares
    regression line is the line for which the sum of squares of the vertical deviations from the data
    to the line (as indicated in the diagram) is a minimum. These deviations are called the
    residuals.

    [Figure: scatterplot of y against x showing the line y = a + bx and the vertical deviation
    from a data point (xi, yi) to the line.]
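
    The two-mean line described above can be sketched in Python. The text does not say how a
    point at the median should be treated, so this sketch makes an assumption: the points are
    sorted by x and split into two equal halves, and with an odd number of points the middle
    point is left out. The function name two_mean_line and the data are hypothetical.

```python
from statistics import mean

def two_mean_line(points):
    """Return (a, b) for the two-mean line y = a + b*x.

    Assumption: points are sorted by x and split into a lower and an upper
    half of equal size; with an odd number of points the middle one is dropped.
    """
    pts = sorted(points)
    half = len(pts) // 2
    lower, upper = pts[:half], pts[-half:]
    x_lo, y_lo = mean(x for x, _ in lower), mean(y for _, y in lower)
    x_hi, y_hi = mean(x for x, _ in upper), mean(y for _, y in upper)
    b = (y_hi - y_lo) / (x_hi - x_lo)   # slope between the two mean points
    a = y_lo - b * x_lo                 # intercept so the line passes through both
    return a, b

# Hypothetical data scattered about y = 3 + 2x.
data = [(1, 5.2), (2, 6.8), (3, 9.1), (4, 11.0), (5, 12.9), (6, 15.2)]
a, b = two_mean_line(data)
print(f"two-mean line: y = {a:.2f} + {b:.2f}x")
```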

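    Consider the line y = a + bx. We would like to find a and b such that the sum of the
    residuals is zero. That is,

    $$\sum_{i=1}^{n} (y_i - a - b x_i) = 0 \qquad (1)$$

    and the sum of the squares of the residuals is as small as possible. That is,

    $$\sum_{i=1}^{n} (y_i - a - b x_i)^2 \text{ is a minimum} \qquad (2)$$

    We will use the symbol S to denote $\sum_{i=1}^{n} (y_i - a - b x_i)^2$.

    From (1),

    $$\sum_{i=1}^{n} (y_i - a - b x_i) = 0$$

    $$\sum_{i=1}^{n} y_i - na - b \sum_{i=1}^{n} x_i = 0$$

    Dividing by n gives

    $$\bar{y} - a - b\bar{x} = 0 \quad \text{so} \quad a = \bar{y} - b\bar{x} \qquad (3)$$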


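    Substituting this relationship in (2),

    $$S = \sum_{i=1}^{n} [y_i - (\bar{y} - b\bar{x}) - b x_i]^2$$

    $$= \sum_{i=1}^{n} [(y_i - \bar{y}) - b(x_i - \bar{x})]^2$$

    $$= \sum_{i=1}^{n} [(y_i - \bar{y})^2 - 2b(x_i - \bar{x})(y_i - \bar{y}) + b^2 (x_i - \bar{x})^2]$$

    This can be thought of as a quadratic expression in b.

    In order to find the value of b which minimises S, we will differentiate with respect to b and
    set the derivative equal to zero.

    $$\frac{dS}{db} = -2 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) + 2b \sum_{i=1}^{n} (x_i - \bar{x})^2 = 0$$

    Simplifying gives

    $$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (4)$$

    Equations 3 and 4 can then be used to calculate the least squares estimates of the y axis
    intercept and the slope.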
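
    As a check on the derivation, the two formulas can be coded directly. The sketch below
    computes the slope from the ratio of sums of deviations and the intercept from the two
    means, then confirms that the residuals sum to zero. The function name least_squares and
    the five data points are hypothetical and chosen only to keep the arithmetic small.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope: sum of products of deviations over sum of squared x deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the line passes through the point of means (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data scattered about y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = least_squares(xs, ys)

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(f"y = {a:.3f} + {b:.3f}x")
print(f"sum of residuals: {sum(residuals):.2e}")
```

    The residual sum is zero up to floating-point rounding, which is the first condition
    imposed on the line.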

    Using the TI-Nspire

    The calculator can be used to construct the least squares regression line. The procedure
    is illustrated using the age and price of car data from Example 2.

    It has previously been illustrated how to enter the data as two lists, age and price
    respectively, and from these construct the scatterplot in a Data & Statistics
    application, resulting in the data displayed as shown.


    Now use Show Linear (mx + b) from the Regression submenu of the Analyze menu
    (b461) to place the regression line on the scatterplot as shown.

    Using the Casio ClassPad

    The following data give the heights (in cm) and weights (in kg) of 11 people.

    Height (x) 177 182 167 178 173 184 162 169 164 170 180

    Weight (y) 74 75 62 63 64 74 57 55 56 68 72

    Enter the data into list1 (x) and list2 (y).

    Tap Calc, Linear Reg and select the settings shown. Tap OK to produce the results
    shown in the second screen. The format of the formula, y = ax + b, is shown at the
    top and the values of a, b are shown in the answer box.

    Tap OK to produce a scatterplot showing the regression line.

    Note: The formula can be automatically copied into a selected entry line if
    desired by selecting a graph number, e.g. y1, in the Copy Formula line.

    After the equation of the least squares line has been determined, we can interpret the

    intercept and slope in terms of the problem at hand, and use the equation to make predictions.

    The method of least squares is also sensitive to any outliers in the data.
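
    The effect of an outlier on the fitted line can be illustrated with the height and weight
    data used in the ClassPad instructions. In this Python sketch the line is fitted twice:
    once to the data as given, and once after a hypothetical data-entry slip in which the
    weight of the 184 cm person is recorded as 7.4 instead of 74. The slip and the function
    name least_squares are assumptions made for illustration only.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

heights = [177, 182, 167, 178, 173, 184, 162, 169, 164, 170, 180]
weights = [74, 75, 62, 63, 64, 74, 57, 55, 56, 68, 72]

a, b = least_squares(heights, weights)

# Hypothetical slip: the sixth weight (height 184 cm) entered as 7.4, not 74.
weights_bad = weights.copy()
weights_bad[5] = 7.4
a_bad, b_bad = least_squares(heights, weights_bad)

print(f"as given:     weight = {a:.1f} + {b:.3f} x height")
print(f"with outlier: weight = {a_bad:.1f} + {b_bad:.3f} x height")
```

    Because the altered point sits at the largest x value, it has a great deal of leverage:
    one wrong entry is enough to reverse the sign of the slope.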


    Example 8

    Consider again the gold medal winning distance, in metres, for the men's long jump for
    the Olympic games for the years 1896 to 2004.

    Find the equation of the least squares regression line for these data, and use it to predict
    the winning distance for the year 2008.

    Year 1896 1900 1904 1908 1912 1920 1924 1928 1932 1936 1948 1952 1956

    Distance (m) 6.35 7.19 7.34 7.49 7.59 7.16 7.44 7.75 7.65 8.05 7.82 7.57 7.82

    Year 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004

    Distance (m) 8.13 8.08 8.92 8.26 8.36 8.53 8.53 8.72 8.67 8.50 8.55 8.59

    Solution

    Using a calculator or computer the equation is found to be

    distance = −23.87 + 0.0163 × year

    which is quite similar to the equation of the line fitted by eye.

    The predicted distance for the year 2008 is

    distance = −23.87 + 0.0163 × 2008 = 8.86 m
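
    The "calculator or computer" step of Example 8 can be reproduced with a few lines of
    Python. This is a sketch rather than the method used by the authors: the function name
    least_squares is an assumption, and the coefficients are rounded to the precision quoted
    in the solution before the prediction is made.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b

years = [1896, 1900, 1904, 1908, 1912, 1920, 1924, 1928, 1932, 1936,
         1948, 1952, 1956, 1960, 1964, 1968, 1972, 1976, 1980, 1984,
         1988, 1992, 1996, 2000, 2004]
distances = [6.35, 7.19, 7.34, 7.49, 7.59, 7.16, 7.44, 7.75, 7.65, 8.05,
             7.82, 7.57, 7.82, 8.13, 8.08, 8.92, 8.26, 8.36, 8.53, 8.53,
             8.72, 8.67, 8.50, 8.55, 8.59]

a, b = least_squares(years, distances)

# Prediction using coefficients rounded as in the worked solution.
pred_rounded = round(a, 2) + round(b, 4) * 2008
# Prediction using the unrounded coefficients.
pred_full = a + b * 2008

print(f"distance = {a:.2f} + {b:.4f} x year")
print(f"2008 (rounded coefficients):   {pred_rounded:.2f} m")
print(f"2008 (unrounded coefficients): {pred_full:.2f} m")
```

    The two predictions differ slightly because the intercept is the difference of two
    large numbers, so small rounding of the slope is magnified when it is multiplied by a
    year near 2000.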

    Example 9

    Consider again the data from Example 7 which related alcohol consumption per head (in litres)
    and the hospital admission rate for each of the regions of Victoria in 1994-95.

    Region                   Per capita consumption   Hospital admissions
                             (litres of alcohol)      per 1000 residents
    Loddon-Mallee            9.0                      42.0
    Grampians                8.4                      44.7
    Barwon                   8.7                      38.6
    Gippsland                9.1                      44.7
    Hume                     10.0                     41.0
    Western Metropolitan     9.0                      40.4
    Northern Metropolitan    6.7                      36.2
    Eastern Metropolitan     6.2                      32.3
    Southern Metropolitan    8.1                      43.0

    Find the equation of the least squares regression line which fits these data.

    Solution

    Using a calculator or computer the equation is found to be

    admissions = 19.9 + 2.45 × alcohol

    which is slightly different from the line fitted by eye.
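
    One way to compare the two lines for these data is to compute the sum of squared
    residuals for each. The Python sketch below does this for the line fitted by eye in
    Example 7 (intercept 15, slope 3) and for the least squares line with the rounded
    coefficients quoted in the solution. The function name sum_squared_residuals is an
    assumption introduced for illustration.

```python
alcohol = [9.0, 8.4, 8.7, 9.1, 10.0, 9.0, 6.7, 6.2, 8.1]
admissions = [42.0, 44.7, 38.6, 44.7, 41.0, 40.4, 36.2, 32.3, 43.0]

def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical deviations of the data from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Line fitted by eye in Example 7.
ssr_eye = sum_squared_residuals(15, 3, alcohol, admissions)
# Least squares line, using the rounded coefficients from the solution.
ssr_ls = sum_squared_residuals(19.9, 2.45, alcohol, admissions)

print(f"sum of squared residuals, line by eye:   {ssr_eye:.2f}")
print(f"sum of squared residuals, least squares: {ssr_ls:.2f}")
```

    The least squares line gives the smaller total, as the definition requires, although
    the line fitted by eye is not far behind for these data.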


    Correlation and causation

    The existence of even a strong linear relationship between two variables is not, in itself,

    sufficient to imply that altering one variable causes a change in the other. It only implies that

    this might be the explanation. It may be that both the measured variables are affected by a third

    and different variable. For example, if data about crime rates and unemployment in a range of

    cities were gathered, a high correlation would be found. But could it be inferred that high

    unemployment causes a high crime rate? The explanation could be that both of these variables

    are dependent on other factors, such as home circumstances, peer group pressure, level of

    education or economic conditions, all of which may be related to both unemployment and

    crime rates. These two variables may vary together, without one being the direct cause of the

    other.

    Exercise 23E

    1 The following data give a girl's height (in cm) between the ages of 36 months and
    60 months. (Example 8)

    Age (x) 36 40 44 52 56 60

    Height (y) 84 87 90 92 94 96

    a Using the method of least squares find the equation of a straight line which relates the

    two variables.

    b Interpret the intercept and slope, if appropriate.

    c Use your equation to estimate the girl's height at age

    i 42 months ii 18 years

    d How reliable are your answers to part c?

    2 The number of burglaries during two

    successive years for various districts in

    one state are given in the following table.

    District Year 1 (x) Year 2 (y)

    A 3 233 2 709

    B 2 363 2 208

    C 4 591 3 685

    D 4 317 4 038

    E 2 474 2 792

    F 3 679 3 292

    G 5 016 4 402

    H 6 234 5 147

    I 6 350 5 555

    J 4 072 4 004

    K 2 137 1 980

    Using the method of least squares find

    the equation of a straight line which

    relates the two variables.


    3 The following table gives the adult heights (in cm) of ten pairs of mothers and daughters. (Example 9)

    Mother (x) 170 163 157 165 175 160 164 168 152 173

    Daughter (y) 178 175 165 173 168 152 163 168 160 178

    a Using the method of least squares find the equation of a straight line which relates the

    two variables.

    b Interpret the slope in this context.

    c Estimate the adult height of a girl whose mother is 170 cm tall.

    4 The manager of a company which manufactures MP3 players keeps a weekly record of the

    cost of running the business and the number of units produced. The figures for a period of

    eight weeks are:

    Number of MP3 players produced (x) 100 160 80 100 220 150 170 200

    Cost in $000s (y) 2.5 3.3 2.4 2.6 4.1 3.1 3.5 3.8

    a Using the method of least squares find the equation of a straight line which relates the

    two variables.

    b What is the manufacturer's fixed cost for operating the business each week?

    c What is the cost of production of each unit, over and above this fixed operating cost?

    5 The amount of a particular pain relief drug given to each patient and the time taken for the

    patient to experience pain relief are shown.

    Patient 1 2 3 4 5 6 7 8 9 10

    Drug dose (mg) 0.5 1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 4.0

    Response time (min) 65 35 15 10 22 16 10 18 70 20

    a Using the method of least squares find the equation of a straight line which relates the

    two variables.

    b Interpret the intercept and slope if appropriate.

    c Use your equation to predict the time taken for the patient to experience pain relief

    if 6 mg of the drug is given. Is this answer realistic?

    6 The proprietor of a hairdressing salon recorded the amount spent on advertising
    in the local paper and the business income for each month for a year, with the
    results shown.

    Month   Advertising ($)   Business ($)
    1       350               9 450
    2       450               10 070
    3       400               9 380
    4       500               9 110
    5       250               5 220
    6       150               3 100
    7       350               8 060
    8       300               7 030
    9       550               11 500
    10      600               12 870
    11      550               10 560
    12      450               9 850

    a Using the method of least squares find the equation of a straight line which
    relates the two variables.

    b Interpret the intercept and slope if appropriate.


    c Use your equation to predict the business income which would be attracted if the
    proprietor of the salon spent the following amounts on advertising.

    i $1000 ii $0

    Using a CAS calculator with statistics II

    How to construct a scatterplot

    Construct a scatterplot for the following set of test scores.
    Treat Test 1 as the independent (x-) variable.

    Test 1 score 10 18 13 6 8 5 12 15 15

    Test 2 score 12 20 11 9 6 6 12 13 17

    Enter the data into your calculator using the Stats/List

    Editor.

    Type the data into list1 and list2.

    Set up the calculator to plot a statistical graph.

    a Press F2 to access the Plots menu.

    b Press ENTER to select 1:Plot Setup.

    c Press F1 to Define Plot1.

    d Complete the dialogue box as follows:
      For Plot Type: select 1:Scatter.
      Leave Mark: as Box.
      For x, type in list1. For y, type in list2.

    Pressing ENTER confirms your selection and returns you to the Plot Setup menu.

    Pressing F5 (Zoom Data) in Plot Setup automatically plots the scatterplot in a
    properly scaled window.


    How to calculate the correlation coefficient r

    Use a calculator to calculate the correlation coefficient r for the following data.

    x 1 3 5 4 7

    y 2 5 7 2 9

    Give the answer correct to two decimal places.

    Enter the data into your calculator using the

    Stats/List Editor.

    Type the data into list1 and list2.

    Calculate the correlation coefficient.

    a Press F4 to access the Calculate menu.

    b Move the cursor down to Regressions and across to the Regressions menu.

    c Press ENTER to select 1: LinReg(a+bx). This will

    take you to the LinReg(a+bx) dialogue box.

    d Complete the dialogue box. For Xlist: type in list1. For Ylist: type in list2.

    e Press ENTER to obtain the results.

    How to determine the equation of a least squares regression

    The following data give the heights (in cm) and weights (in kg) of 11 people.

    Height (x) 177 182 167 178 173 184 162 169 164 170 180

    Weight (y) 74 75 62 63 64 74 57 55 56 68 72

    Use a graphics calculator to determine the equation of the least squares regression line

    that will enable weight to be predicted from height.

    Enter the data into your calculator using the Stats/List Editor.


    Type the data into list1 and list2.

    a Press F4 to access the Calculate menu.

    b Move the cursor down to Regressions and across to the Regressions menu.

    c Press ENTER to select 1: LinReg(a+bx). This will take

    you to the LinReg(a+bx) dialogue box.

    d Complete the dialogue box. For Xlist: type in list1. For Ylist: type in list2.

    e Press ENTER to obtain the results.


    Chapter summary

    Bivariate data arise when measurements on two variables are collected for each subject.

    A scatterplot is an appropriate visual display of bivariate data if both of the variables are

    numerical.

    A scatterplot of the data should always be constructed to assist in the identification of

    outliers and illustrate the association (positive, negative or none).

    Two variables are positively associated when larger values of y are associated with larger

    values of x. Two variables are negatively associated when larger values of y are associated

    with smaller values of x. There is no association between two variables when the values of

    y are not related to the values of x.

    When constructing the scatterplot, the independent or explanatory variable is plotted on the

    horizontal (x) axis, and the dependent or response variable is plotted on the vertical (y) axis.

    If a linear relationship is indicated by the scatterplot, a measure of its strength can be found

    by calculating the q-correlation coefficient, or Pearson's product-moment correlation

    coefficient, r.

    If the values on a scatterplot are divided by lines representing the median of x and the

    median of y into four quadrants A, B, C and D, with a, b, c, d representing the number of

    points in each quadrant respectively, then the q-correlation coefficient is given by

    $$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$

    Pearson's product-moment correlation coefficient, r, is a measure of the strength of the linear relationship between two variables, x and y. If we have n observations, then for this set of observations

    $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

    where $\bar{x}$ and $s_x$ are the mean and standard deviation of the x scores and $\bar{y}$ and $s_y$ are the mean and standard deviation of the y scores.

    For these correlation coefficients

    $$-1 \le q \le 1 \qquad \text{and} \qquad -1 \le r \le 1$$

    with values close to ±1 indicating strong correlation, and those close to 0 indicating little correlation.

    If a linear relationship is indicated from the scatterplot a straight line may be fitted to the

    data, either by eye or using the least squares regression method.

    The least squares regression line is the line for which the sum of squares of the vertical

    deviations from the data to the line is a minimum.

    The value of the slope (b) gives the extent of the change in the dependent variable

    associated with a unit change in the independent variable.

    Once found, the equation of the straight line may be used to predict values of the response

    variable (y) from the explanatory variable (x). The accuracy of the prediction depends on

    how closely the straight line fits the data, and an indication of this can be obtained from the

    correlation coefficient.
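The two correlation coefficients summarised above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: the quadrant labelling (A upper right, B upper left, C lower left, D lower right) is assumed from the form of the q formula, points lying exactly on a median line are assumed to be left out of the count, and the function names and data are hypothetical.

```python
from statistics import mean, median, stdev


def q_correlation(xs, ys):
    """q = ((a + c) - (b + d)) / (a + b + c + d) from quadrant counts."""
    mx, my = median(xs), median(ys)
    a = b = c = d = 0
    for x, y in zip(xs, ys):
        if x == mx or y == my:
            continue  # assumption: points on a median line are not counted
        if x > mx and y > my:
            a += 1    # quadrant A: upper right (assumed labelling)
        elif x < mx and y > my:
            b += 1    # quadrant B: upper left
        elif x < mx and y < my:
            c += 1    # quadrant C: lower left
        else:
            d += 1    # quadrant D: lower right
    total = a + b + c + d
    if total == 0:
        raise ValueError("no points fall strictly inside a quadrant")
    return ((a + c) - (b + d)) / total


def pearson_r(xs, ys):
    """Pearson's r as the sum of products of standardised scores over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data (illustrative only)
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
print(f"q = {q_correlation(xs, ys):.3f}")
print(f"r = {pearson_r(xs, ys):.3f}")
```

For these six points the quadrant counts are a = 2, b = 1, c = 2 and d = 1, so q = (4 − 2)/6, while r is computed from the standardised scores exactly as in the formula for r given in the summary.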


    Multiple-choice questions

    1 For which one of the following pairs of variables would it be appropriate to construct a

    scatterplot?

    A eye colour (blue, green, brown, other) and hair colour (black, brown, blonde, red, other)

    B score out of 100 on a test for a group of Year 9 students and a group of Year 11 students

    C political party preference (Labor, Liberal, Other) and age in years

    D age in years and blood pressure in mm Hg

    E height in cm and gender (male, female)

    2 For which one of the following plots would it be appropriate to calculate the value of the

    q-correlation coefficient?

    [Five scatterplots, labelled A to E]

    3 A q-correlation coefficient of 0.32 would describe a relationship classified as

    A weak positive B moderate positive C strong positive

    D close to zero E moderately strong

    4 The scatterplot shows the relationship between age and the number of alcoholic drinks

    consumed on the weekend by a group of people.

    The value of the q-correlation coefficient is closest to

    A 1   B 7/9   C 5/6   D 7/9   E 1

    [Scatterplot: No. of drinks (0 to 20) against Age (10 to 70)]


    5 The following scatterplot shows the relationship between height and weight for a group of

    people.

    The value of Pearson's product-moment correlation coefficient r is closest to

    A 1 B 0.8 C 0.5 D 0.3 E 0

    [Scatterplot: Weight (80 to 220) against Height (cm) (140 to 200)]

    Questions 6 and 7 relate to the following information.

    The weekly income and weekly expenditure on food for a group of 10 university students are

    given in the following table.

    Weekly income ($)             150  250  300  600  300  380  950  450  850  1000

    Weekly food expenditure ($)    40   60   70  120  130  150  200  260  460   600

    6 The value of the Pearson product-moment correlation coefficient r for these data is closest

    to

    A 0.2 B 0.4 C 0.6 D 0.7 E 0.8

    7 The least squares regression line which would enable expenditure on food to be predicted

    from weekly income is closest to

    A 0.482 + 42.864 × weekly income

    B 0.482 − 42.864 × weekly income

    C 42.864 + 0.482 × weekly income

    D 239.868 + 1.355 × weekly income

    E 1.355 + 239.868 × weekly income


    Questions 8 and 9 relate to the following information.

    Suppose that the least squares regression line which would enable expenditure on

    entertainment (in dollars) to be predicted from weekly income is given by

    Weekly expenditure on entertainment = 40 + 0.10 × Weekly income

    8 Using this rule the expenditure on entertainment by an individual with an income of $600

    per week is predicted to be

    A $40 B $24 060 C $100 D $46 E $240

    9 From this rule which of the following statements is correct?

    A On average for each extra dollar of income an extra 10 cents is spent on entertainment

    B On average for each extra 10 cents in income an extra $1 is spent on entertainment

    C On average for each extra dollar of income an extra 40 cents is spent on entertainment

    D On average people spend $40 per week on entertainment

    E On average people spend $50 per week on entertainment

    10 For the scatterplot shown the line of best fit would

    have a slope closest to:

    [Scatterplot: horizontal axis from 10 to 22, vertical axis from 50 to 250]

    A 0.1   B 0.1   C 10   D 10   E 200

    Short-answer questions

    Technology is required to answer some of the following questions.

    1 The following table gives the number of times

    the ball was inside the 50 metre line in an AFL

    football game, and the team's score in that game.

    Inside 50 Score (points)

    64 90

    57 134

    34 76

    61 92

    51 93

    52 45

    53 120

    51 66

    64 105

    55 108

    58 88

    71 133

    a Plot the score against the number of Inside 50s.

    b From the scatterplot, describe any association

    between the two variables.

    2 Use the scatterplot constructed in 1 to determine

    the q-correlation between the score and the number of

    Inside 50s.


    3 The distance traveled to work and the time taken for a group of company employees are

    given in the following table. Determine the value of the Pearson product-moment

    correlation r for these data.

    Distance (km)   12  50  40  25  45  20  10   3  10  30

    Time (min)      15  75  50  50  80  50  10   5  10  35

    4 The following scatterplot shows the relationship between height and weight for a group of

    people. Draw a straight line which fits the data by eye, and find an equation for this line.

    [Scatterplot: weight (80 to 220) against height (cm) (140 to 200)]

    5 The time taken to complete a task, and the number of

    errors on the task, were recorded for a sample of 10

    primary school children. Determine the equation of

    the least squares regression line which fits these data.

    Time (seconds) Errors

    22.6 2

    21.7 3

    21.7 3

    21.3 4

    19.3 5

    17.6 5

    17.0 7

    14.6 7

    14.0 9

    8.8 9

    6 For the data in 5:

    a Interpret the intercept and slope of the least

    squares regression line.

    b Use the least squares regression line to predict

    the number of errors which would be observed

    for a child who took 10 seconds to complete the

    task.


    Extended-response questions

    1 A marketing company wishes to predict the likely number of new clients each of its

    graduates will attract to the business in their first year of employment, by using their scores

    on a marketing exam in the final year of their course.

    Exam score   Number of new clients

    65 7

    72 9

    68 8

    85 10

    74 10

    61 8

    60 6

    78 10

    70 5

    82 11

    a Which is the independent variable

    and which is the dependent variable?

    b Construct a scatterplot of these data.

    c Describe the association between the

    Number of new clients and Exam

    score.

    d Determine the value of the q-correlation

    coefficient for these data, and classify

    the strength of the relationship.

    e Determine the value of the Pearson

    product-moment correlation coefficient

    for these data and classify the strength

    of the relationship.

    f Determine the equation for the least squares regression line and write it down in terms of

    the variables Number of new clients and Exam score.

    g Interpret the intercept and slope of the least squares regression line in terms of the

    variables in the study.

    h Use your regression equation to predict to the nearest whole number the Number of new

    clients for a person who scored 100 on the exam.

    i How reliable is the prediction made in h?

    2 To investigate the relationship between marks on an assignment and the final examination

    mark a sample of 10 students was taken. The table indicates the marks for the assignment

    and the final exam mark for each individual student.

    Assignment mark (max = 80)   Final exam mark (max = 90)

    80 83

    77 83

    71 79

    78 75

    65 68

    80 84

    68 71

    64 69

    50 66

    66 58

    a Which is the independent variable

    and which is the dependent variable?

    b Construct a scatterplot of these data.

    c Describe the association between the

    assignment mark and exam mark.

    d Determine the value of the q-correlation

    coefficient for these data, and classify the

    strength of the relationship.

    e Determine the value of the Pearson

    product-moment correlation coefficient

    for these data and classify the strength of

    the relationship.


    f Use your answer to d to comment on the statement: "Good final exam marks are the

    result of good assignment marks."

    g Determine the equation for the least squares regression line and write it down in terms

    of the variables Final exam mark and Assignment mark.

    h Interpret the intercept and slope of the least squares regression line in terms of the

    variables in the study.

    i Use your regression equation to predict the Final exam mark for a student who scored 50

    on the assignment.

    j How reliable is the prediction made in i?

    3 A marketing firm wanted to investigate the relationship between airplay and CD sales (in the

    following week) of newly released songs. Data was collected on a random sample of 10 songs.

    No. of times the song was played   Weekly sales of the CD

    47 3950

    34 2500

    40 3700

    34 2800

    33 2900

    50 3750

    28 2300

    53 4400

    25 2200

    46 3400

    a Which is the independent variable and which

    is the dependent variable?

    b Construct a scatterplot of these data.

    c Describe the association between the number

    of times the song was played and weekly sales.

    d Determine the value of the q-correlation

    coefficient for these data, and classify the strength

    of the relationship.

    e Determine the value of the Pearson

    product-moment correlation coefficient for these

    data and classify the strength of the relationship.

    f Determine the equation for the least squares

    regression line and write it down in terms of the

    variables Number of times the song was played and Weekly sales.

    g Interpret the intercept and slope of the least squares regression line in terms of the

    variables in the study.

    h Use your regression equation to predict the weekly sales for a song which was played 60

    times.

    i How reliable is the prediction made in h?
