CHAPTER 6_correlation and Regression


  • 8/3/2019 CHAPTER 6_correlation and Regression


    CHAPTER 6 : CORRELATION - REGRESSION

    6.1 Introduction

    So far we have considered only univariate distributions. The averages, dispersion and skewness of a distribution give us a complete idea of its structure. Many a time, however, we come across problems that involve two or more variables. If we carefully study the figures of rainfall and production of paddy, of accidents and motor cars in a city, of demand and supply of a commodity, or of sales and profit, we may find that there is some relationship between the two variables. On the other hand, if we compare the figures of rainfall in America and the production of cars in Japan, we may find that there is no relationship between the two variables. If there is a relation between two variables, i.e. when one variable changes the other also changes in the same or in the opposite direction, we say that the two variables are correlated.

    W. J. King: If it is proved that in a large number of instances two variables tend always to fluctuate in the same or in the opposite direction, then it is established that a relationship exists between the variables. This is called a "Correlation."

    6.2 Correlation

    It means the study of the existence, magnitude and direction of the relation between two or more variables. Correlation is very important in technology and in statistics. The famous astronomer Bravais, Prof. Sir Francis Galton, Karl Pearson (who used this concept in biology and genetics), Prof. Neiswanger and many others have contributed to this subject.

    6.3 Types of Correlation

    1. Positive and negative correlation

    2. Linear and non-linear correlation

    A) If two variables change in the same direction (i.e. if one increases the other also increases, or if one decreases the other also decreases), the correlation is called a positive correlation. For example: advertising and sales.

    B) If two variables change in opposite directions (i.e. if one increases the other decreases, and vice versa), the correlation is called a negative correlation. For example: T.V. registrations and cinema attendance.

    Linear and non-linear correlation: The nature of the graph gives us an idea of whether the correlation between two variables is linear. If the graph is a straight line, the correlation is called a "linear correlation"; if the graph is not a straight line, the correlation is non-linear or curvilinear.


    For example, if variable x changes by a constant quantity, say 20, and y also changes by a constant quantity, say 4, the ratio between the two changes always remains the same (4/20 = 1/5 in this case). In the case of a curvilinear correlation this ratio does not remain constant.
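    A quick numerical check of the constant-ratio idea, as a minimal Python sketch. The two series are hypothetical (not from the text): `linear` follows y = 10 + x/5, so every change of 20 in x gives a change of 4 in y, while `curved` follows y = x²/400. The helper `change_ratios` is an illustrative name.

```python
def change_ratios(xs, ys):
    """Return (change in y) / (change in x) for each pair of consecutive points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

xs = [0, 20, 40, 60, 80]             # x changes by a constant 20
linear = [10 + x / 5 for x in xs]    # hypothetical linear relation: y changes by 4 each step
curved = [x * x / 400 for x in xs]   # hypothetical curvilinear relation: y changes by 1, 3, 5, 7

print(change_ratios(xs, linear))  # [0.2, 0.2, 0.2, 0.2]  (constant ratio 1/5)
print(change_ratios(xs, curved))  # [0.05, 0.15, 0.25, 0.35]  (ratio keeps changing)
```

    For the linear series every ratio equals 1/5, matching the example in the text; for the curved series the ratio drifts, which is what "curvilinear" means here.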

    6.4 Degrees of Correlation

    Through the coefficient of correlation we can measure the degree (extent) of the correlation between two variables, and its sign tells us whether the correlation is positive or negative.

    1. Perfect correlation: If two variables change in the same direction and in the same proportion, the correlation between the two is perfect positive. According to Karl Pearson, the coefficient of correlation in this case is +1. On the other hand, if the variables change in opposite directions and in the same proportion, the correlation is perfect negative, and its coefficient of correlation is -1. In practice we rarely come across these types of correlation.

    2. Absence of correlation: If two series of two variables exhibit no relation between them, i.e. a change in one variable does not lead to a change in the other, then there is no correlation between the two variables. In such a case the coefficient of correlation is 0.

    3. Limited degrees of correlation: If two variables are neither perfectly correlated nor completely uncorrelated, we term the correlation limited. It may be positive or negative, and it lies within the limits ±1.

    High degree, moderate degree and low degree are the three categories of this kind of correlation. The following table shows the degree of correlation for different values of the coefficient of correlation.

    Degree                    Positive           Negative
    Absence of correlation    0                  0
    Perfect correlation       +1                 -1
    High degree               +0.75 to +1        -0.75 to -1
    Moderate degree           +0.25 to +0.75     -0.25 to -0.75
    Low degree                0 to +0.25         0 to -0.25
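    The table can be turned into a small lookup, sketched below in Python. The function name `degree_of_correlation` is hypothetical, and because adjacent ranges in the table share their endpoints (0.25 and 0.75), the sketch assumes that a value lying exactly on a boundary belongs to the higher category.

```python
def degree_of_correlation(r):
    """Describe r using the ranges in the table of Section 6.4.

    Assumption: the table's ranges share endpoints (0.25, 0.75); here a value
    exactly on a boundary is put in the higher category.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "absence of correlation"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size == 1:
        return f"perfect {direction} correlation"
    if size >= 0.75:
        degree = "high degree"
    elif size >= 0.25:
        degree = "moderate degree"
    else:
        degree = "low degree"
    return f"{degree} of {direction} correlation"

print(degree_of_correlation(0.6))   # moderate degree of positive correlation
print(degree_of_correlation(-0.8))  # high degree of negative correlation
```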

    6.5 Methods Of Determining Correlation

    We shall consider the following most commonly used methods: (1) Scatter plot, (2) Karl Pearson's coefficient of correlation, (3) Spearman's rank correlation coefficient.

    1) Scatter Plot (scatter diagram or dot diagram): In this method the values of the two variables are plotted on graph paper, one along the horizontal axis (x-axis) and the other along the vertical axis (y-axis). Plotting the data gives points (dots) on the graph which are generally scattered, hence the name scatter plot.

    The manner in which these points are scattered suggests the degree and the direction of correlation. The degree of correlation is denoted by r, and its direction is given by its sign, positive or negative.

    i) If all points lie on a rising straight line, the correlation is perfectly positive and r = +1 (see fig.1).

    ii) If all points lie on a falling straight line, the correlation is perfectly negative and r = -1 (see fig.2).

    iii) If the points lie in a narrow strip rising upwards, the correlation is positive and of a high degree (see fig.3).

    iv) If the points lie in a narrow strip falling downwards, the correlation is negative and of a high degree (see fig.4).

    v) If the points are spread widely over a broad strip rising upwards, the correlation is positive and of a low degree (see fig.5).

    vi) If the points are spread widely over a broad strip falling downwards, the correlation is negative and of a low degree (see fig.6).

    vii) If the points are scattered without any specific pattern, there is no correlation, i.e. r = 0 (see fig.7).


    Although this method is simple and gives a rough idea of the existence and the degree of correlation, it is not reliable. As it is not a mathematical method, it cannot measure the degree of correlation numerically.

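    2) Karl Pearson's coefficient of correlation: It gives a numerical measure of correlation and is denoted by r. The magnitude of r gives the strength of the correlation and its sign gives the direction. Writing $x = x_i - \bar{x}$ and $y = y_i - \bar{y}$ for the deviations from the means, it is defined as

    $$ r = \frac{\sum xy}{N\,\sigma_x\,\sigma_y} $$

    where

    N = number of pairs of observations, and $\sigma_x$, $\sigma_y$ = standard deviations of x and y.

    Note: r is also known as the product-moment coefficient of correlation.

    OR $$ r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} $$

    OR $$ r = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{N\sum x_i^2 - \left(\sum x_i\right)^2}\;\sqrt{N\sum y_i^2 - \left(\sum y_i\right)^2}} $$

    Now the covariance of x and y is defined as

    $$ \operatorname{cov}(x, y) = \frac{1}{N}\sum (x_i - \bar{x})(y_i - \bar{y}) $$

    so that $$ r = \frac{\operatorname{cov}(x, y)}{\sigma_x\,\sigma_y} $$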

    Example: Calculate the coefficient of correlation between the heights of father and his son for the following data.

    Height of father (cm):  165  166  167  168  167  169  170  172
    Height of son (cm):     167  168  165  172  168  172  169  171

    Solution: N = 8 (pairs of observations)

    Father xi   Son yi   x = xi - x̄   y = yi - ȳ   xy   x²   y²
    165         167         -3           -2         6    9    4
    166         168         -2           -1         2    4    1
    167         165         -1           -4         4    1   16
    167         168         -1           -1         1    1    1
    168         172          0            3         0    0    9
    169         172          1            3         3    1    9
    170         169          2            0         0    4    0
    172         171          4            2         8   16    4
    Σxi = 1344  Σyi = 1352   Σx = 0       Σy = 0    Σxy = 24   Σx² = 36   Σy² = 44

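    Calculation:

    $$ \bar{x} = \frac{1344}{8} = 168, \qquad \bar{y} = \frac{1352}{8} = 169 $$

    Now,

    $$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{24}{\sqrt{36 \times 44}} = \frac{24}{39.80} \approx 0.60 $$

    Since r is positive and about 0.6, the correlation is positive and of a moderate degree (i.e. direct and reasonably good).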
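    The calculation can be reproduced with a short Python sketch of the deviation form $r = \sum xy / \sqrt{\sum x^2 \sum y^2}$, using the father and son heights from the example. The function name `pearson_r` is illustrative; Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means:
    r = sum(x*y) / sqrt(sum(x^2) * sum(y^2)), where x = xi - mean(x), y = yi - mean(y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [xi - x_bar for xi in xs]          # deviations x = xi - x_bar
    dy = [yi - y_bar for yi in ys]          # deviations y = yi - y_bar
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / sqrt(sxx * syy)

father = [165, 166, 167, 168, 167, 169, 170, 172]
son = [167, 168, 165, 172, 168, 172, 169, 171]
print(round(pearson_r(father, son), 3))  # 0.603
```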

    Example: From the following data, compute the coefficient of correlation between x and y.

    Example: The covariance between x and y is 12.3, and the variances of x and y are 16.4 and 13.8 respectively. Find the coefficient of correlation between them.

    Solution: Given: covariance cov(x, y) = 12.3

    Variance of x ($\sigma_x^2$) = 16.4

    Variance of y ($\sigma_y^2$) = 13.8

    Now,

    $$ r = \frac{\operatorname{cov}(x, y)}{\sigma_x\,\sigma_y} = \frac{12.3}{\sqrt{16.4}\,\sqrt{13.8}} = \frac{12.3}{\sqrt{226.32}} = \frac{12.3}{15.04} \approx 0.82 $$
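    The same computation as a tiny Python sketch; `r_from_covariance` is a hypothetical helper name, and its inputs are the covariance and the two variances (not standard deviations), exactly as given in the example.

```python
from math import sqrt

def r_from_covariance(cov_xy, var_x, var_y):
    """r = cov(x, y) / (sigma_x * sigma_y), where each sigma is the square root of a variance."""
    return cov_xy / (sqrt(var_x) * sqrt(var_y))

r = r_from_covariance(12.3, 16.4, 13.8)
print(round(r, 2))  # 0.82
```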

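    Example: Find the number of pairs of observations from the following data:

    r = 0.25, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 60$, $\sigma_y = 4$, $\sum (x_i - \bar{x})^2 = 90$.

    Solution: Given r = 0.25. Since $\sigma_x = \sqrt{\sum (x_i - \bar{x})^2 / N} = \sqrt{90/N}$,

    $$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N\,\sigma_x\,\sigma_y} \;\Rightarrow\; 0.25 = \frac{60}{N\sqrt{90/N} \times 4} = \frac{15}{\sqrt{90N}} $$

    so $\sqrt{90N} = 60$, $90N = 3600$ and N = 40.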

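    If the values of x and y are very large, the calculation becomes tedious. If we change the variable x to $u = \frac{x - x_0}{c}$ and y to $v = \frac{y - y_0}{d}$, where $x_0$ and $y_0$ are the assumed means of x and y and c, d are positive scale factors (common factors or class widths), then $r_{xy} = r_{uv}$.

    The formula for r can then be simplified as

    $$ r = \frac{N\sum uv - \sum u \sum v}{\sqrt{N\sum u^2 - \left(\sum u\right)^2}\;\sqrt{N\sum v^2 - \left(\sum v\right)^2}} $$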
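    To see that shifting by assumed means and dividing by positive scale factors leaves r unchanged, the sketch below applies the simplified formula to the father and son heights twice: once on the raw values (x0 = y0 = 0, c = d = 1) and once with x0 = 168, c = 2, y0 = 170, d = 5. Those particular x0, c, y0, d values, and the function name `r_step_deviation`, are arbitrary choices for illustration.

```python
from math import sqrt

def r_step_deviation(xs, ys, x0, c, y0, d):
    """Pearson's r computed on u = (x - x0)/c and v = (y - y0)/d with c, d > 0, using
    r = (N*Σuv - Σu*Σv) / sqrt((N*Σu² - (Σu)²) * (N*Σv² - (Σv)²))."""
    if c <= 0 or d <= 0:
        raise ValueError("c and d must be positive so that r_xy = r_uv")
    u = [(x - x0) / c for x in xs]
    v = [(y - y0) / d for y in ys]
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(a * b for a, b in zip(u, v))
    su2 = sum(a * a for a in u)
    sv2 = sum(b * b for b in v)
    return (n * suv - su * sv) / sqrt((n * su2 - su ** 2) * (n * sv2 - sv ** 2))

father = [165, 166, 167, 168, 167, 169, 170, 172]
son = [167, 168, 165, 172, 168, 172, 169, 171]

# The choice of assumed means and positive scale factors should not change r.
print(round(r_step_deviation(father, son, 0, 1, 0, 1), 3))      # 0.603 (raw values)
print(round(r_step_deviation(father, son, 168, 2, 170, 5), 3))  # 0.603
```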

    Example: Marks obtained by two brothers FRED and TED in 10 tests are as follows:

    Find the coefficient of correlation between the two.

    Solution: Here x0 = 60, c = 4, y0 = 60 and d = 3


    Calculation:

    6.6 Coefficient Of Correlation For Bivariate Grouped Data

    When the number of observations is very large, we need to arrange the data into classes, which may be either discrete or continuous. Items whose values fall in a particular class are placed together, and likewise for every other class. The whole data is thus divided into horizontal rows and vertical columns, with one variable placed horizontally and the other vertically. The table so obtained is a two-way frequency distribution table, called the correlation table or bivariate frequency distribution table. The formula for calculating r for a bivariate distribution is

    $$ r = \frac{n\sum fuv - \sum fu \sum fv}{\sqrt{n\sum fu^2 - \left(\sum fu\right)^2}\;\sqrt{n\sum fv^2 - \left(\sum fv\right)^2}} $$

    where $n = \sum f$ is the total frequency, and u, v are the coded mid-points of the x and y classes.

    STEPS:

    1. First write down the mid-points of x along a horizontal row and those of y along a vertical column.

    2. Find $u = \frac{x - x_0}{c}$ and $v = \frac{y - y_0}{d}$ for these mid-points.

    3. Multiply each frequency by the corresponding value of u and then by the corresponding value of v to get fuv. Write these numbers at the top of the same box (cell).

    4. Add the frequencies horizontally and write down the totals. Similarly, add the frequencies vertically and write down their totals.

    5. Multiply the totals for x by u to get fu.

    6. Multiply the totals for y by v to get fv.

    7. Multiply these frequencies by the square of u to get fu².

    8. Multiply these frequencies by the square of v to get fv².

    9. Add horizontally (or vertically) the top numbers fuv written in each box (cell).

    10. Write down Σfu, Σfu², Σfv, Σfv² and Σfuv, and then use the above formula.
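    The steps above amount to accumulating five sums from the table and substituting them into the formula. Below is a minimal Python sketch under stated assumptions: `freq[i][j]` holds the frequency in row i (a y class) and column j (an x class), and `u`, `v` are the already-coded mid-points from step 2. The function name `r_grouped` and the two small check tables are hypothetical, chosen so that the answer is known in advance (all frequency on the rising diagonal gives r = +1, on the falling diagonal r = -1).

```python
from math import sqrt

def r_grouped(freq, u, v):
    """Pearson's r for a bivariate frequency table.

    freq[i][j]: frequency in row i (y class) and column j (x class)
    u[j]: coded mid-point of x class j; v[i]: coded mid-point of y class i
    """
    n = sum(sum(row) for row in freq)                             # total frequency
    col_tot = [sum(row[j] for row in freq) for j in range(len(u))]
    row_tot = [sum(row) for row in freq]
    sfu = sum(f * uj for f, uj in zip(col_tot, u))                # Σfu
    sfu2 = sum(f * uj * uj for f, uj in zip(col_tot, u))          # Σfu²
    sfv = sum(f * vi for f, vi in zip(row_tot, v))                # Σfv
    sfv2 = sum(f * vi * vi for f, vi in zip(row_tot, v))          # Σfv²
    sfuv = sum(freq[i][j] * u[j] * v[i]                           # Σfuv over all cells
               for i in range(len(v)) for j in range(len(u)))
    den = sqrt((n * sfu2 - sfu ** 2) * (n * sfv2 - sfv ** 2))
    if den == 0:
        raise ValueError("r is undefined when either variable has no spread")
    return (n * sfuv - sfu * sfv) / den

codes = [-1, 0, 1]
rising = [[2, 0, 0], [0, 3, 0], [0, 0, 2]]   # hypothetical table
falling = [[0, 0, 2], [0, 3, 0], [2, 0, 0]]  # hypothetical table
print(r_grouped(rising, codes, codes))   # 1.0
print(r_grouped(falling, codes, codes))  # -1.0
```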

    Example: Calculate the coefficient of correlation for the following data.

                            Age of wife (years)
    Age of husband (years)  10-20   20-30   30-40   40-50   50-60   Total
    10 - 25                   5       3       -       -       -        8
    25 - 35                   3      15      11       -       -       29
    35 - 45                   -      11      14       7       -       32
    45 - 55                   -       -       7      12       3       22
    55 - 65                   -       -       -       3       6        9
    Total                     8      29      32      22       9      100


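    Inserting Σfuv = 94, n = 100, Σfu = -5, Σfv = -5, Σfu² = 119 and Σfv² = 119 in

    $$ r = \frac{n\sum fuv - \sum fu \sum fv}{\sqrt{n\sum fu^2 - \left(\sum fu\right)^2}\;\sqrt{n\sum fv^2 - \left(\sum fv\right)^2}} = \frac{100 \times 94 - (-5)(-5)}{\sqrt{100 \times 119 - 25}\;\sqrt{100 \times 119 - 25}} = \frac{9375}{11875} \approx 0.79 $$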

    6.7 Probable Error


    The probable error (P.E.) is used to judge the reliability of Karl Pearson's coefficient of correlation r. Note that r depends on random sampling and its conditions. It is given by

    $$ \text{P.E.} = 0.6745\,\frac{1 - r^2}{\sqrt{n}} $$

    where n is the number of pairs of observations.

    i. If the value of r is less than P.E., there is no evidence of correlation, i.e. r is not significant.

    ii. If r is more than 6 times P.E., the existence of correlation is practically certain, i.e. r is significant.

    iii. By adding P.E. to r and subtracting it from r, we get the upper and lower limits within which the correlation coefficient of the population can be expected to lie. Symbolically,

    $$ \rho = r \pm \text{P.E.} $$

    where ρ = correlation coefficient of the population.

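    Example: If r = 0.6 and n = 64, find the probable error of the coefficient of correlation.

    Solution:

    $$ \text{P.E.} = 0.6745\,\frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - 0.36}{\sqrt{64}} = 0.6745 \times \frac{0.64}{8} = 0.6745 \times 0.08 \approx 0.054 $$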
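    The probable-error rules translate directly into code. In this Python sketch the function names are hypothetical, the comparisons use |r| so that negative coefficients are handled too, and the "inconclusive" label for values between P.E. and 6 P.E. is an assumption, since the text gives no rule for that range.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of Pearson's r: P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

def interpret(r, n):
    """Apply the rules of thumb from the text; return (P.E., population limits, verdict)."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        verdict = "not significant"
    elif abs(r) > 6 * pe:
        verdict = "significant"
    else:
        verdict = "inconclusive by these rules"   # assumption: the text is silent here
    return pe, (r - pe, r + pe), verdict

pe, limits, verdict = interpret(0.6, 64)
print(round(pe, 3), verdict)              # 0.054 significant
print(tuple(round(x, 3) for x in limits)) # (0.546, 0.654)
```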

    6.8 Spearman's Rank Correlation Coefficient

    This method is based on the ranks of the items rather than on their actual values. Its advantage over the other methods is that it can be used even when the actual values of the items are unknown. For example, if you want to know the correlation between the honesty and the wisdom of the boys in your class, you can use this method by ranking the boys. It can also be used to find the degree of agreement between the judgements of two examiners or two judges. The formula is:


    $$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

    where R = rank correlation coefficient,

    D = difference between the ranks of the two items,

    N = the number of observations.

    Note: $-1 \le R \le 1$.

    i) When R = +1: perfect positive correlation, or complete agreement in the same direction.

    ii) When R = -1: perfect negative correlation, or complete agreement in the opposite direction.

    iii) When R = 0: no correlation.

    Computation:

    i. Rank the values of the items. Generally the item with the highest value is ranked 1, and the others are given ranks 2, 3, 4, ... in decreasing order of value.

    ii. Find the difference D = R1 - R2, where R1 = rank of x and R2 = rank of y. Note that ΣD = 0 (always).

    iii. Calculate D² for each pair and then find ΣD².

    iv. Apply the formula.

    Note: In some cases there is a tie between two or more items. In such a case each tied item is given the average of the ranks they would otherwise occupy. For example, if two items would have ranks 4 and 5, each is given rank (4 + 5)/2 = 4.5. If three items are tied at rank 4, each is given rank (4 + 5 + 6)/3 = 5. If m is the number of items with equal ranks, the factor $\frac{m^3 - m}{12}$ is added to ΣD². If there is more than one such case, this factor is added once for each case, and then

    $$ R = 1 - \frac{6\left[\sum D^2 + \sum \frac{m^3 - m}{12}\right]}{N(N^2 - 1)} $$
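    A Python sketch of the whole procedure, including average ranks for ties and the $\frac{m^3 - m}{12}$ correction for each tied group in either series. The function names are hypothetical, and the quadratic-time ranking is chosen for readability rather than speed. The usage lines use the marks in Stats and English from the example later in this section.

```python
def average_ranks(values):
    """Rank values in decreasing order (largest = rank 1); tied values
    share the average of the ranks they occupy."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1               # first rank position taken by v
        count = order.count(v)                   # number of items tied at this value
        ranks.append(first + (count - 1) / 2)    # average of first .. first + count - 1
    return ranks

def spearman_r(x, y):
    """R = 1 - 6 * [ΣD² + Σ(m³ - m)/12] / (N(N² - 1)), with a correction term per tied group."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 items")
    n = len(x)
    r1, r2 = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    correction = 0.0
    for series in (x, y):
        for value in set(series):
            m = series.count(value)
            if m > 1:
                correction += (m ** 3 - m) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

stats = [40, 42, 45, 35, 36, 39]
english = [46, 43, 44, 39, 40, 43]
print(average_ranks(english))               # [1.0, 3.5, 2.0, 6.0, 5.0, 3.5]
print(round(spearman_r(stats, english), 2)) # 0.77
```

    Ranks can also be passed in directly, as in the Maths and Stats example: reversing both rankings leaves every D² unchanged, so R comes out the same.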


    Example: Calculate R from the following data.

    Student No.:    1   2   3   4   5   6   7   8   9   10
    Rank in Maths:  1   3   7   5   4   6   2   10  9   8
    Rank in Stats:  3   1   4   5   6   9   7   8   10  2

    Solution:

    Student No.  Rank in Maths (R1)  Rank in Stats (R2)  D = R1 - R2   D²
    1            1                   3                   -2             4
    2            3                   1                    2             4
    3            7                   4                    3             9
    4            5                   5                    0             0
    5            4                   6                   -2             4
    6            6                   9                   -3             9
    7            2                   7                   -5            25
    8            10                  8                    2             4
    9            9                   10                  -1             1
    10           8                   2                    6            36
    N = 10                                               ΣD = 0        ΣD² = 96

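    Calculation of R:

    $$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 96}{10 \times 99} = 1 - \frac{576}{990} \approx 0.42 $$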


    Example: Calculate R for 6 students from the following data.

    Marks in Stats:    40  42  45  35  36  39
    Marks in English:  46  43  44  39  40  43

    Solution:

    Marks in Stats  R1   Marks in English  R2    D = R1 - R2   D²
    40              3    46                1      2             4
    42              2    43                3.5   -1.5           2.25
    45              1    44                2     -1             1
    35              6    39                6      0             0
    36              5    40                5      0             0
    39              4    43                3.5    0.5           0.25
    N = 6                                         ΣD = 0        ΣD² = 7.50

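    Here m = 2, since in the series of marks in English the value 43 is repeated twice. Hence

    $$ R = 1 - \frac{6\left[\sum D^2 + \frac{m^3 - m}{12}\right]}{N(N^2 - 1)} = 1 - \frac{6\left[7.5 + \frac{8 - 2}{12}\right]}{6 \times 35} = 1 - \frac{48}{210} \approx 0.77 $$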


    Example: The value of Spearman's rank correlation coefficient for a certain number of pairs of observations was found to be 2/3. The sum of the squares of the differences between the corresponding ranks was 55. Find the number of pairs.

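    Solution: By Spearman's formula, $R = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$. Substituting R = 2/3 and $\sum d^2 = 55$:

    $\dfrac{2}{3} = 1 - \dfrac{6 \times 55}{n(n^2 - 1)} \;\Rightarrow\; \dfrac{330}{n(n^2 - 1)} = \dfrac{1}{3} \;\Rightarrow\; n(n^2 - 1) = 990 = 10 \times 99$

    Hence n = 10, i.e. there are 10 pairs of observations.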
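    The rearrangement in this solution can be checked with a short Python sketch. It assumes the no-ties form of Spearman's formula used above; the helper names `spearman_rho` and `pairs_from_rho` are hypothetical, and the search simply tries n = 2, 3, ... until n(n^2 - 1) matches exactly.

```python
from fractions import Fraction


def spearman_rho(sum_d2, n):
    """Spearman's rank correlation for n pairs with no tied ranks:
    R = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))


def pairs_from_rho(rho, sum_d2, n_max=1000):
    """Return the number of pairs n consistent with rho and sum(d^2), or None.

    Rearranging the formula gives n * (n^2 - 1) = 6 * sum(d^2) / (1 - rho),
    so we search for an integer n that satisfies it exactly (rho must not be 1).
    """
    target = Fraction(6 * sum_d2) / (1 - Fraction(rho))
    for n in range(2, n_max + 1):
        if n * (n * n - 1) == target:
            return n
    return None  # no integer n reproduces the given values


n = pairs_from_rho(Fraction(2, 3), 55)
print(n)                    # 10
print(spearman_rho(55, n))  # 2/3
```

    Exact `Fraction` arithmetic is used so that the equality test n(n^2 - 1) == 990 is not disturbed by floating-point rounding.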

    6.9 Linear Regression

    Correlation measures the magnitude and direction of the relationship between correlated variables. It is natural to look next for a method that estimates the value of one variable when the other is known. Note also that correlation does not imply causation: the fact that the variables x and y are correlated does not necessarily mean that x causes y or vice versa. For example, the number of schools in a town is correlated with the number of accidents in the town. The schools do not cause the accidents; both numbers increase with a third factor, the population of the town.

    A statistical procedure called regression is concerned with cause and effect in a relationship among variables: it assesses the contribution of one or more variables, called causing or independent variables, to the variable being caused, called the dependent variable. When there is only one independent variable and the relationship is expressed by a straight line, the procedure is called simple linear regression.

    Regression can be defined as a method that estimates the value of one variable when the value of the other variable is known, provided the variables are correlated. The dictionary meaning of regression is "to go backward." The term was first used by Sir Francis Galton in his research paper "Regression towards mediocrity in hereditary stature."

    Lines of Regression: In a scatter plot, we have seen that if the variables are highly correlated, the points (dots) lie in a narrow strip. If the strip is nearly straight, we can draw a straight line such that all points are close to it on both sides. Such a line can be taken as an ideal representation of the variation. The line that minimizes the sum of the squares of the deviations of all data points from it is called the line of best fit.

    This line is called the line of regression. Prediction is now easy: we only need to extend the line and read off the value. Thus, to obtain a line of regression, we need a line of best fit. However, statisticians do not measure the distances by dropping perpendiculars from the points onto the line. They measure the deviations (or errors, or residuals, as they are called) (i) vertically and (ii) horizontally. Thus we get two lines of regression, as shown in figures (1) and (2).

    (1) Line of regression of y on x

    Its form is y = a + b x

    It is used to estimate y when x is given

    (2) Line of regression of x on y

    Its form is x = a + b y

    It is used to estimate x when y is given.

    They are obtained (i) graphically, by a scatter plot, or (ii) mathematically, by the method of least squares.

    (ii) Let $y = a + bx$ ..... (1), where a and b are given by the normal equations

    $\sum y = na + b\sum x$ ..... (2)

    $\sum xy = a\sum x + b\sum x^2$ ..... (3)

    where n is the number of pairs of values of x and y.
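    The normal equations (2) and (3) form a 2 x 2 linear system in a and b. The Python sketch below solves it by Cramer's rule; the function name and the data are made up for illustration. Calling the same function with the roles of x and y swapped fits the horizontal residuals instead, which gives the line of regression of x on y, in general a different line.

```python
def least_squares_line(xs, ys):
    """Fit y = a + b*x by solving the normal equations
         sum(y)  = n*a      + b*sum(x)
         sum(xy) = a*sum(x) + b*sum(x^2)
    with Cramer's rule. Returns (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # zero only if all x values are equal
    a = (sy * sxx - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b


# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_yx, b_yx = least_squares_line(xs, ys)  # regression of y on x (vertical residuals)
a_xy, b_xy = least_squares_line(ys, xs)  # regression of x on y (horizontal residuals)
print(f"y = {a_yx:.2f} + {b_yx:.2f} x")  # y = 2.20 + 0.60 x
print(f"x = {a_xy:.2f} + {b_xy:.2f} y")  # x = -1.00 + 1.00 y
```

    The two fitted lines differ (slopes 0.6 and 1.0 here); the product of their coefficients, 0.6, equals r squared for this data.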


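    Dividing (2) by n gives $\bar{y} = a + b\bar{x}$, so the line passes through the point of means $(\bar{x}, \bar{y})$. Subtracting this from (1), and solving (2) and (3) for b, gives

    $y - \bar{y} = b_{yx}(x - \bar{x})$, where $b_{yx} = \dfrac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{\mathrm{cov}(x, y)}{\sigma_x^2} = r\dfrac{\sigma_y}{\sigma_x}$ ..... (6)

    Equation (6) is the equation of the line of regression of y on x. The quantity $b_{yx} = r\sigma_y/\sigma_x$ is called the coefficient of regression of y on x; it is the slope of this line. Interchanging x and y in equation (6), the equation of the line of regression of x on y is

    $x - \bar{x} = b_{xy}(y - \bar{y})$

    Naturally, $b_{xy}$ is the slope of this line (the change in x per unit change in y), which is equal to $r\sigma_x/\sigma_y$.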
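    As a numerical check of $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$, the Python sketch below computes the means, the standard deviations (with divisor n) and r for a small made-up data set, forms both coefficients, and uses equation (6) to estimate y at x = 6. The function name `describe` is illustrative.

```python
import math


def describe(xs, ys):
    """Means, standard deviations (divisor n) and Pearson's r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return mean_x, mean_y, sd_x, sd_y, cov / (sd_x * sd_y)


# Hypothetical sample data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y, sd_x, sd_y, r = describe(xs, ys)
b_yx = r * sd_y / sd_x  # slope of the line of regression of y on x
b_xy = r * sd_x / sd_y  # slope (dx/dy) of the line of regression of x on y

# Equation (6): y - mean_y = b_yx * (x - mean_x); estimate y at x = 6
print(round(b_yx, 4), round(b_xy, 4))          # 0.6 1.0
print(round(mean_y + b_yx * (6 - mean_x), 4))  # 5.8
```

    Multiplying the two coefficients cancels the standard deviations, so $b_{yx} b_{xy} = r^2$; this is the relation used to recover r from a pair of regression equations.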


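    Example: A panel of two judges, A and B, graded dramatic performances by independently awarding marks as follows:

    Solution:

    Taking x as the marks awarded by A and y as the marks awarded by B, the equation of the line of regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$.

    Inserting x = 38, we get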


    y - 33 = 0.74 (38 - 33)

    y - 33 = 0.74 × 5

    y - 33 = 3.7

    y = 3.7 + 33

    y = 36.7 ≈ 37

    Therefore, Judge B would have given about 37 marks to the 8th performance.
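    The substitution into the regression line is a single formula. A minimal Python version, using only the means (33 and 33) and the coefficient 0.74 that appear in the solution (the function name `estimate_y` is hypothetical):

```python
def estimate_y(x, mean_x, mean_y, b_yx):
    """Line of regression of y on x: y - mean_y = b_yx * (x - mean_x)."""
    return mean_y + b_yx * (x - mean_x)


# Values from the worked example: both means 33, b_yx = 0.74, A's mark 38
marks_b = estimate_y(38, mean_x=33, mean_y=33, b_yx=0.74)
print(round(marks_b, 1))  # 36.7
print(round(marks_b))     # 37
```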

    Example: The two regression equations of the variables x and y are

    x = 19.13 - 0.87 y and y = 11.64 - 0.50 x

    Find (1) the mean of the x's,

    (2) the mean of the y's, and

    (3) the correlation coefficient between x and y.

    Solution:

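    1. Calculation of the means

    Both lines of regression pass through the point of means $(\bar{x}, \bar{y})$, so the means are found by solving the two equations simultaneously. Substituting y = 11.64 - 0.50 x into x = 19.13 - 0.87 y gives x = 19.13 - 10.1268 + 0.435 x, that is, 0.565 x = 9.0032, so x = 9.0032 / 0.565 ≈ 15.935. Then y = 11.64 - 0.50 × 15.935 ≈ 3.67.

    Therefore, mean of x's ≈ 15.93 and mean of y's ≈ 3.67.

    2. Calculation of r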


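    From x = 19.13 - 0.87 y, the regression coefficient of x on y is $b_{xy} = -0.87$ ..... (3)

    From y = 11.64 - 0.50 x, the regression coefficient of y on x is $b_{yx} = -0.50$ ..... (4)

    From (3) and (4), $r^2 = b_{xy} b_{yx} = (-0.87)(-0.50) = 0.435$, so |r| = √0.435 ≈ 0.66.

    Since both regression coefficients are negative, r is negative as well:

    r = -0.66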
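    Both steps of this solution (intersecting the two lines to get the means, then $r = \pm\sqrt{b_{xy} b_{yx}}$) fit in a small Python sketch. The function name `means_and_r` is illustrative; it assumes the equations are already written as x = c1 + b_xy·y and y = c2 + b_yx·x, as in this example.

```python
import math


def means_and_r(c1, b_xy, c2, b_yx):
    """Given x = c1 + b_xy*y and y = c2 + b_yx*x, return (mean_x, mean_y, r).

    Both lines pass through (mean_x, mean_y), so substituting the second
    equation into the first gives mean_x; r^2 = b_xy * b_yx, and r takes the
    common sign of the two regression coefficients.
    """
    product = b_xy * b_yx
    if not 0 <= product < 1:
        raise ValueError("need 0 <= b_xy * b_yx < 1 to locate the means")
    mean_x = (c1 + b_xy * c2) / (1 - product)
    mean_y = c2 + b_yx * mean_x
    r = math.copysign(math.sqrt(product), b_yx)
    return mean_x, mean_y, r


mean_x, mean_y, r = means_and_r(19.13, -0.87, 11.64, -0.50)
print(f"{mean_x:.2f} {mean_y:.2f} {r:.2f}")  # 15.93 3.67 -0.66
```

    The guard rejects coefficient pairs of opposite sign or with a product of 1 or more: the first cannot come from real data, and when the product is 1 the two lines coincide, so they do not fix a single point of means.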

    Example: In a partially destroyed laboratory record of an analysis of correlation data, the following results are legible:

    Variance of x = 9

    Regression equations: 8 x - 10 y + 66 = 0

    40 x - 18 y = 214

    What are (1) the means of the x's and y's, (2) the coefficient of correlation between x and y, and (3) the standard deviation of y?

    Solution:

    1. Means: Both lines of regression pass through $(\bar{x}, \bar{y})$, so we solve

    8 x - 10 y = -66 ----- (1)

    40 x - 18 y = 214 ----- (2)

    simultaneously. Multiplying (1) by 5:

    40 x - 50 y = -330 ----- (1')

    40 x - 18 y = 214 ----- (2)

    Subtracting (2) from (1'): -32 y = -544, so y = 17.

    Mean of y's = 17

    Substituting y = 17 in (1), we get 8x - 10 × 17 = -66, or 8x = 104, so x = 13.

    Mean of x's = 13

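    2. Coefficient of correlation between x and y

    Take the first equation as the line of regression of y on x and the second as the line of regression of x on y. From 40 x = 18 y + 214, x = 0.45 y + 5.35, so $b_{xy} = 0.45$. Also, from -10 y = -8 x - 66, y = 0.8 x + 6.6, so $b_{yx} = 0.8$.

    Therefore, $r^2 = b_{xy} b_{yx} = 0.45 \times 0.8 = 0.36$, and since both coefficients are positive, r = 0.6. (The opposite assignment would give $b_{yx} = 40/18$ and $b_{xy} = 10/8$, whose product exceeds 1; this is impossible because $r^2 \le 1$.)

    3. Standard deviation of y: The variance of x is $\sigma_x^2 = 9$, so $\sigma_x = 3$.

    Now $b_{yx} = r\sigma_y/\sigma_x$, so $0.8 = 0.6 \times \sigma_y / 3$, which gives

    $\sigma_y = 4$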
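    The three steps of this solution (intersecting the two lines, deciding which equation is the regression of y on x, then using $b_{yx} = r\sigma_y/\sigma_x$) can be reproduced in a short Python sketch. The tuple encoding of each line and the helper names are illustrative assumptions, not part of the original example; exact `Fraction` arithmetic keeps the means at exactly 13 and 17.

```python
import math
from fractions import Fraction as F

# Each line p*x + q*y = c is stored as (p, q, c); this encoding is illustrative.
eq1 = (F(8), F(-10), F(-66))   # 8x - 10y + 66 = 0
eq2 = (F(40), F(-18), F(214))  # 40x - 18y = 214
var_x = 9


def intersection(e1, e2):
    """Solve the two lines simultaneously (Cramer's rule): gives (mean_x, mean_y)."""
    (p1, q1, c1), (p2, q2, c2) = e1, e2
    det = p1 * q2 - p2 * q1
    return (c1 * q2 - c2 * q1) / det, (p1 * c2 - p2 * c1) / det


def coefficients(y_on_x, x_on_y):
    """b_yx from the line solved for y, b_xy from the line solved for x."""
    p1, q1, _ = y_on_x
    p2, q2, _ = x_on_y
    return -p1 / q1, -q2 / p2


mean_x, mean_y = intersection(eq1, eq2)

# Try both assignments; keep the one with 0 <= b_yx * b_xy <= 1, as r^2 must be.
for first, second in ((eq1, eq2), (eq2, eq1)):
    b_yx, b_xy = coefficients(first, second)
    if 0 <= b_yx * b_xy <= 1:
        break
else:
    raise ValueError("neither assignment gives a valid r^2")

r = math.copysign(math.sqrt(float(b_yx * b_xy)), float(b_yx))
sd_x = math.sqrt(var_x)
sd_y = float(b_yx) * sd_x / r  # from b_yx = r * sd_y / sd_x

print(mean_x, mean_y)                     # 13 17
print(f"r = {r:.2f}, sd_y = {sd_y:.2f}")  # r = 0.60, sd_y = 4.00
```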

    Example: From 10 observations of the price x and supply y of a commodity, the following results were obtained:

    $\sum x = 130$, $\sum y = 220$, $\sum x^2 = 2288$, $\sum xy = 3467$

    Compute the line of regression of y on x and interpret the result. Estimate the supply when the price is 16 units.

    Solution: The equation of the line of regression of y on x is

    y = a + b x

    From the normal equations

    $\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$

    we get

    220 = 10 a + 130 b ..... (1)

    3467 = 130 a + 2288 b ..... (2)

    Multiplying (1) by 13 and writing (2) below it:

    2860 = 130 a + 1690 b

    3467 = 130 a + 2288 b

    On subtraction,

    607 = 598 b, so b = 607/598 ≈ 1.015

    Putting b ≈ 1.015 in 220 = 10 a + 130 b, we get a ≈ 8.804.

    Hence the equation of the line of regression of y on x is

    y = 8.804 + 1.015 x

    The slope means that each one-unit increase in price is associated with an increase of about 1.015 units in supply.

    When x = 16, we get

    y = 8.804 + 1.015 × 16

    y = 25.044 ≈ 25.04
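    Solving the two normal equations directly from the summary sums avoids rounding the intermediate steps. The Python sketch below (the function name is illustrative) applies Cramer's rule to the same system; it reproduces b = 607/598 and a = 5265/598, and with these unrounded coefficients the estimate at a price of 16 is about 25.045.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve  sum_y = n*a + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2  for (a, b)."""
    det = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Summary figures from the price-supply example (n = 10 observations)
a, b = solve_normal_equations(10, 130, 220, 2288, 3467)
print(f"y = {a:.3f} + {b:.3f} x")  # y = 8.804 + 1.015 x
print(f"{a + b * 16:.3f}")         # 25.045 (supply when the price is 16)
```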

    Example: If θ is the acute angle between the two regression lines in the case of two variables x and y, show that

    $\tan\theta = \dfrac{1 - r^2}{|r|} \cdot \dfrac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$

    with the usual meanings. Explain the significance when r = 0 and r = ±1.

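    Solution: Written with y as a function of x, the slopes of the two regression lines are $m_1 = r\sigma_y/\sigma_x$ (y on x) and $m_2 = \sigma_y/(r\sigma_x)$ (x on y). Hence

    $\tan\theta = \left|\dfrac{m_2 - m_1}{1 + m_1 m_2}\right| = \left|\dfrac{\sigma_y(1 - r^2)/(r\sigma_x)}{(\sigma_x^2 + \sigma_y^2)/\sigma_x^2}\right| = \dfrac{1 - r^2}{|r|} \cdot \dfrac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$

    since $1 - r^2 \ge 0$.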

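    If r = 0, then tan θ is infinite, so θ = π/2: the two regression lines are perpendicular and there is no linear relationship between the two variables, i.e. they are uncorrelated.

    If r = ±1, then tan θ = 0, so θ = 0. The two regression lines coincide (both pass through $(\bar{x}, \bar{y})$, so they cannot be distinct parallel lines), and the correlation is perfect.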
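    The identity can be checked numerically. The Python sketch below computes θ once from the closed-form expression and once from the two slopes, for illustrative values r = 0.6, σx = 3, σy = 4 chosen only as an example; the function names are hypothetical. The two routes agree, and the limiting cases r = 0 and r = 1 give π/2 and 0.

```python
import math


def angle_by_formula(r, sd_x, sd_y):
    """Acute angle (radians) between the two regression lines, from
    tan(theta) = (1 - r^2)/|r| * sd_x*sd_y / (sd_x^2 + sd_y^2)."""
    if r == 0:
        return math.pi / 2  # lines y = mean_y and x = mean_x are perpendicular
    return math.atan((1 - r * r) / abs(r) * sd_x * sd_y / (sd_x ** 2 + sd_y ** 2))


def angle_by_slopes(r, sd_x, sd_y):
    """Same angle from the slopes m1 (y on x) and m2 (x on y, solved for y); r != 0."""
    m1 = r * sd_y / sd_x
    m2 = sd_y / (r * sd_x)
    return math.atan(abs((m2 - m1) / (1 + m1 * m2)))


# Illustrative values: r = 0.6, sd_x = 3, sd_y = 4
print(math.degrees(angle_by_formula(0.6, 3, 4)))
print(math.degrees(angle_by_slopes(0.6, 3, 4)))
print(angle_by_formula(1, 3, 4))  # 0.0
```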

    **********