Research Methodology-Chapter 14

CHAPTER-14 INTRODUCTION TO CORRELATION & REGRESSION ANALYSIS By DR. PRASANT SARANGI



Page 1: Research Methodology-Chapter 14

CHAPTER-14

INTRODUCTION TO CORRELATION & REGRESSION ANALYSIS

By DR. PRASANT SARANGI

Page 2: Research Methodology-Chapter 14

Key concepts:

• Introduction to Correlation Analysis
• Rank Correlation
• Linear Regression Analysis
• Multiple Regression Analysis

Page 3: Research Methodology-Chapter 14

CORRELATION ANALYSIS

• Positive Correlation
• Negative Correlation
• Linear Correlation
• Non-linear Correlation

Page 4: Research Methodology-Chapter 14

Positive Correlation

• Two variables are said to be positively correlated when a movement in one variable is accompanied by a movement of the other variable in the same direction.

• There exists a direct relationship between the two variables.

Page 5: Research Methodology-Chapter 14

Negative Correlation

• Correlation between two variables is said to be negative when a movement in one variable is accompanied by a movement of the other variable in the opposite direction.

• Here there exists an inverse relationship between the two variables.
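The two cases can be told apart by the sign of the sum of products of deviations from the means. The following is a minimal sketch only: the function name `co_movement` and the price, supply and demand figures are made up for illustration and do not come from the chapter.

```python
def co_movement(xs, ys):
    """Sum of products of deviations from the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
price = [1, 2, 3, 4, 5]
supply = [10, 12, 15, 19, 24]   # rises as price rises
demand = [50, 44, 39, 35, 30]   # falls as price rises

print(co_movement(price, supply) > 0)  # same direction: positive correlation
print(co_movement(price, demand) < 0)  # opposite direction: negative correlation
```

A positive sum means the variables tend to sit on the same side of their means together; a negative sum means they tend to sit on opposite sides.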

Page 6: Research Methodology-Chapter 14

Linear Correlation

• The correlation between two variables is said to be linear when the points, plotted on a graph, lie along a straight line.

• Non-linear Correlation: A relationship between two variables is said to be non-linear if a unit change in one variable does not produce a constant change in the other. If X is changed, the corresponding values of Y do not change in the same proportion.

Page 7: Research Methodology-Chapter 14

Methods of Measuring Correlation

• The Graphical Method

The correlation can be shown graphically by using scatter diagrams. A scatter diagram reveals two useful pieces of information. Firstly, one can observe the pattern between the two variables, which indicates whether some association exists between them.

Secondly, if an association is found, the diagram makes it easy to identify the nature of the relationship (whether the two variables are linearly or non-linearly related).

Page 8: Research Methodology-Chapter 14

• Karl Pearson’s Coefficient of Correlation

Karl Pearson’s coefficient of correlation (developed in 1896) measures the linear relationship between the two variables under study. Because the relationship is assumed to be linear, the two variables are taken to change in a fixed proportion. The measure expresses the degree of relationship as a real number, independent of the units in which the variables are expressed, and also indicates the direction of the correlation.

Page 9: Research Methodology-Chapter 14

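• Direct Method

$$r_{XY} = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \, \sum y_i^2}}$$

where $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are deviations from the means.

• Assumed Mean Method

$$r = \frac{n \sum d_X d_Y - \sum d_X \sum d_Y}{\sqrt{n \sum d_X^2 - \left(\sum d_X\right)^2}\,\sqrt{n \sum d_Y^2 - \left(\sum d_Y\right)^2}}$$

where $d_X$ and $d_Y$ are deviations of X and Y from their assumed means.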
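Both methods can be checked against each other numerically. The sketch below assumes the direct method uses deviations from the actual means and the assumed mean method uses deviations from arbitrary reference values. The function names and the five data points are illustrative assumptions, not part of the chapter.

```python
import math


def pearson_direct(xs, ys):
    """Direct method: deviations are taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_xx = sum(a * a for a in dx)
    sum_yy = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


def pearson_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Assumed mean method: deviations are taken from arbitrary reference values."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    numerator = n * sum(a * b for a, b in zip(dx, dy)) - sum(dx) * sum(dy)
    root_x = math.sqrt(n * sum(a * a for a in dx) - sum(dx) ** 2)
    root_y = math.sqrt(n * sum(b * b for b in dy) - sum(dy) ** 2)
    return numerator / (root_x * root_y)


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_direct = pearson_direct(xs, ys)
r_assumed = pearson_assumed_mean(xs, ys, assumed_x=2, assumed_y=5)
print(round(r_direct, 4), round(r_assumed, 4))
```

Any choice of assumed means gives the same coefficient as the direct method; the assumed mean form exists only to simplify hand calculation.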

Page 10: Research Methodology-Chapter 14

• Grouped Data

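$$r = \frac{n \sum f d_X d_Y - \sum f d_X \sum f d_Y}{\sqrt{n \sum f d_X^2 - \left(\sum f d_X\right)^2}\,\sqrt{n \sum f d_Y^2 - \left(\sum f d_Y\right)^2}}$$

where f is the cell frequency, $n = \sum f$ is the total number of observations, and $d_X$, $d_Y$ are deviations of the class mid-values from their assumed means.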
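For grouped data, each pair of class mid-values is weighted by its frequency. The sketch below assumes the data are held as a bivariate frequency table keyed by (X mid-value, Y mid-value). The function name `pearson_grouped`, the table layout and the figures are hypothetical.

```python
import math


def pearson_grouped(table, assumed_x=0.0, assumed_y=0.0):
    """Correlation from a bivariate frequency table.

    table maps (x_mid_value, y_mid_value) -> frequency f.
    Deviations are measured from the assumed means.
    """
    n = sum(table.values())
    s_dx = s_dy = s_dxdy = s_dx2 = s_dy2 = 0.0
    for (x, y), f in table.items():
        dx = x - assumed_x
        dy = y - assumed_y
        s_dx += f * dx
        s_dy += f * dy
        s_dxdy += f * dx * dy
        s_dx2 += f * dx * dx
        s_dy2 += f * dy * dy
    numerator = n * s_dxdy - s_dx * s_dy
    denominator = math.sqrt(n * s_dx2 - s_dx ** 2) * math.sqrt(n * s_dy2 - s_dy ** 2)
    return numerator / denominator


# Hypothetical table: 8 observations spread over four cells.
table = {(10, 5): 3, (10, 15): 1, (20, 5): 1, (20, 15): 3}

print(pearson_grouped(table))                              # assumed means at zero
print(pearson_grouped(table, assumed_x=15, assumed_y=10))  # different assumed means
```

Both calls return the same value, since the choice of assumed mean does not affect the coefficient.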

Page 11: Research Methodology-Chapter 14

Properties of the Coefficient of Correlation

1. The value of the coefficient of correlation lies between -1 (minus one) and +1 (plus one).

2. The value of the coefficient of correlation is independent of a change of origin and a change of scale of measurement.

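$$r_{XY} = \frac{n \sum h_i k_i - \sum h_i \sum k_i}{\sqrt{n \sum h_i^2 - \left(\sum h_i\right)^2}\,\sqrt{n \sum k_i^2 - \left(\sum k_i\right)^2}}$$

Here $h_i$ and $k_i$ are taken to be the values of X and Y after the change of origin and scale.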
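The second point can be verified numerically. This sketch assumes the transformed values take the form h = (X - a)/c and k = (Y - b)/d with positive scale factors c and d. The constants and data are arbitrary illustrations.

```python
import math


def pearson_r(xs, ys):
    """Raw-sum form: n*Sum(xy) - Sum(x)*Sum(y), divided by the two root terms."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Change of origin (subtract a constant) and of scale (divide by a positive constant).
hs = [(x - 3) / 2 for x in xs]
ks = [(y - 4) / 10 for y in ys]

r_original = pearson_r(xs, ys)
r_transformed = pearson_r(hs, ks)
print(abs(r_original - r_transformed) < 1e-12)
```

If exactly one of the two scale factors is negative, the magnitude of the coefficient is unchanged but its sign flips.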

Page 12: Research Methodology-Chapter 14

Rank Correlation Coefficient

Spearman’s rank correlation coefficient is applied in three different situations:

• When the ranks of both variables are given
• When the ranks of both variables are not given
• When the ranks of two or more observations in a series are equal

Page 13: Research Methodology-Chapter 14

• When Ranks of Both the Variables are Given

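$$R_{XY} = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \quad \text{or} \quad R_{XY} = 1 - \frac{6 \sum d^2}{n^3 - n}$$

where d is the difference between the two ranks of an observation and n is the number of observations.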
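When the ranks are already given, the coefficient needs only the rank differences. A minimal sketch follows; the function name `spearman_from_ranks` and the two judges' rankings are hypothetical.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient, 1 - 6*Sum(d^2) / (n^3 - n), for untied ranks."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Hypothetical rankings of five candidates by two judges.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

print(spearman_from_ranks(judge_a, judge_b))
```

Identical rankings give +1 and exactly reversed rankings give -1, which is a quick sanity check on any implementation.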

When Ranks of both the Variables are not Given

• In such cases, each observation in the series has to be ranked first.

• The direction of ranking is left to the researcher.

• In other words, whether the highest value or the lowest value is ranked 1 (one) depends on the researcher's decision, provided the same rule is applied to both series.

Page 14: Research Methodology-Chapter 14

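• When Ranks between Two or More Observations in a Series are Equal

• Each tied observation is assigned the average of the ranks that the tied observations would have received had they differed from each other.

$$R_{XY} = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \dots\right]}{n(n^2 - 1)}$$

where $m_1, m_2, m_3, \dots$ are the numbers of observations sharing each tied rank.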
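Average ranking and the tie correction can be sketched together. The code assumes the highest value is ranked 1 and adds one correction term (m^3 - m)/12 for every group of m tied values in either series. The function names and the data are illustrative only.

```python
def average_ranks(values):
    """Rank values with 1 = largest; tied values share the average of their ranks.

    Returns the ranks and the size m of every group of tied values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    tie_sizes = []
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for j in range(pos, end + 1):
            ranks[order[j]] = shared_rank
        if end > pos:
            tie_sizes.append(end - pos + 1)
        pos = end + 1
    return ranks, tie_sizes


def spearman_tied(xs, ys):
    """Spearman's coefficient with the (m^3 - m)/12 correction for ties."""
    rank_x, ties_x = average_ranks(xs)
    rank_y, ties_y = average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = sum((m ** 3 - m) / 12 for m in ties_x + ties_y)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


# Hypothetical scores; the two 60s tie for ranks 2 and 3 and each get 2.5.
xs = [50, 60, 60, 70]
ys = [1, 2, 3, 4]

print(average_ranks(xs))
print(spearman_tied(xs, ys))
```

With no ties the correction sum is zero and the function reduces to the untied formula.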

Page 15: Research Methodology-Chapter 14

Simple Linear Regression Model

Page 16: Research Methodology-Chapter 14

What do we use regression models for:

1. Estimate a relationship among economic variables, such as y = f(x).

2. Test hypotheses

3. Forecast or predict the value of one variable, y, based on the value of another variable, x.

Page 17: Research Methodology-Chapter 14

Dependent and Independent Variables

Dependent variable - the variable we are trying to explain

Independent (or explanatory) variables - variables that we think cause movements in the dependent variable

Page 18: Research Methodology-Chapter 14

Simple Regression Model

Y = dependent variable
X = independent variable

Model is: Y = α + βX

α is the intercept or constant
β is the slope coefficient

Page 19: Research Methodology-Chapter 14

Linearity

Models that are linear in the variables and in the coefficients: Y = α + βX

Models that are nonlinear in the variables but linear in the coefficients: Y = α + βX²

Page 20: Research Methodology-Chapter 14

Models that are nonlinear in the variables and in the coefficients: Y = α + X^β

Some models that are nonlinear can be made linear in the coefficients:

Y = e^α X^β

Take logs: ln Y = α + β ln X
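The effect of taking logs can be checked numerically. This sketch assumes the multiplicative model Y = e^alpha * X^beta, with beta as the slope coefficient, and uses made-up parameter values. After the transformation, every pair of points in (ln X, ln Y) space should show the same slope.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
alpha, beta = 0.5, 2.0

xs = [1.0, 2.0, 4.0, 8.0]
ys = [math.exp(alpha) * x ** beta for x in xs]   # nonlinear in X

log_x = [math.log(x) for x in xs]
log_y = [math.log(y) for y in ys]

# Slope between consecutive points in log space; constant if the relation is linear.
slopes = [
    (log_y[i + 1] - log_y[i]) / (log_x[i + 1] - log_x[i])
    for i in range(len(xs) - 1)
]
intercept = log_y[0] - slopes[0] * log_x[0]

print(slopes)
print(intercept)
```

The constant slope recovers beta and the intercept recovers alpha, which is why the transformed model can be handled as a linear regression of ln Y on ln X.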

Page 21: Research Methodology-Chapter 14

[Figure: An example showing income and average expenditure. Horizontal axis: X (income). Vertical axis: average expenditure, E(Y|X). The line E(Y|X) = α + βX has slope β = ΔE(Y|X)/ΔX.]

Page 22: Research Methodology-Chapter 14

Error Term

Y is a random variable composed of two parts:

I. Systematic component: E(Y | X) = α + βX. This is the mean of Y for a given X.

II. Random component: u = Y - E(Y | X) = Y - α - βX

u is called the stochastic or random error.

Together E(Y | X) and u form the model: Y = α + βX + u
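The decomposition can be illustrated by simulation. The sketch assumes a slope coefficient beta alongside the intercept alpha, picks arbitrary "true" values for both, and draws normally distributed errors. The parameter values, income figures and error distribution are all hypothetical.

```python
import random

# Hypothetical "true" parameters; in practice these are unknown.
ALPHA, BETA = 10.0, 0.8


def systematic(x):
    """Systematic component: the mean of Y for a given X."""
    return ALPHA + BETA * x


rng = random.Random(0)                       # fixed seed for repeatability
incomes = [100, 200, 300, 400]
errors = [rng.gauss(0, 5) for _ in incomes]  # random component u

# Observed Y = systematic component + random error.
expenditures = [systematic(x) + u for x, u in zip(incomes, errors)]

# Subtracting the systematic component recovers the error term.
recovered = [y - systematic(x) for x, y in zip(incomes, expenditures)]
print(all(abs(r - u) < 1e-9 for r, u in zip(recovered, errors)))
```

In real data only X and Y are observed, so u cannot be recovered this way; it has to be approximated from an estimated line.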

Page 23: Research Methodology-Chapter 14

Sources of error term

• Dependent variable measured with error
• Model left out relevant variables
• Wrong functional form
• Inherent randomness of behaviour

Page 24: Research Methodology-Chapter 14

True Relationship

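[Figure: The line E(Y) = α + βX plotted against X, with observations Y1 to Y4 at X1 to X4. The errors u1 to u4 are the vertical distances between each observation and the line.]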

Page 25: Research Methodology-Chapter 14

The Estimated Model

We use the data on Y and X to come up with guesses for α and β. These estimated parameters or coefficients are written $\hat{\alpha}$ and $\hat{\beta}$ ("α hat" and "β hat").

Page 26: Research Methodology-Chapter 14

Our estimated, or “fitted”, model gives the predicted value of Y for any given X:

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$

The residual is the difference between the actual or observed value of Y and the predicted value:

$$\hat{u}_i = Y_i - \hat{Y}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$$
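The slides up to this point do not say how the estimates are obtained. The sketch below assumes ordinary least squares, the usual choice for a simple linear regression, and then computes fitted values and residuals as defined above. The function name `fit_line` and the data are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates (an assumed estimator) of intercept and slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta_hat = sxy / sxx
    alpha_hat = mean_y - beta_hat * mean_x
    return alpha_hat, beta_hat


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

alpha_hat, beta_hat = fit_line(xs, ys)

# Fitted values and residuals (observed minus predicted).
fitted = [alpha_hat + beta_hat * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]

print(round(alpha_hat, 4), round(beta_hat, 4))
print([round(u, 4) for u in residuals])
```

With an intercept in the model, least-squares residuals sum to zero, which gives a simple check on the arithmetic.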