EC339: Lecture 3 Chapter 2: Correlation and univariate analysis: A step back.


Page 2

Histograms
Histograms are way more important than you might think! Look at the data’s range.

Example: if the sample has ages 18 to 64, divide that range into bins.

There are formulas for choosing the bins, but for all intents and purposes, just make the histogram look nice. Bin widths should be round numbers if possible.

Histograms are easier to make in SPSS than in Excel (first, create a new dataset).

Go through the ‘SATHist’ and ‘3DHist’ sheets.
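As a rough illustration of the binning advice, here is a minimal Python sketch. It assumes Sturges' rule (k = ⌈log2 n⌉ + 1) as the "formula" and then rounds the resulting width up to a round number (1, 2, 5 or 10 times a power of ten). The ages list is hypothetical, not taken from the course files.

```python
import math
from collections import Counter

def sturges_bins(n):
    """Sturges' rule: a suggested number of bins for n observations."""
    return math.ceil(math.log2(n)) + 1

def nice_width(raw_width):
    """Round a raw bin width up to 1, 2, 5 or 10 times a power of ten."""
    power = 10 ** math.floor(math.log10(raw_width))
    for step in (1, 2, 5):
        if step * power >= raw_width:
            return step * power
    return 10 * power

def bin_counts(data, width):
    """Count observations per bin [edge, edge + width), with edges at multiples of width."""
    counts = Counter(math.floor(x / width) * width for x in data)
    return dict(sorted(counts.items()))

# Hypothetical ages in the 18-64 range (not from the course data)
ages = [18, 22, 25, 31, 34, 38, 41, 45, 47, 52, 55, 58, 61, 64]
k = sturges_bins(len(ages))                       # 5 bins suggested
width = nice_width((max(ages) - min(ages)) / k)   # 46 / 5 = 9.2, rounded up to 10
for edge, count in bin_counts(ages, width).items():
    print(f"{edge}-{edge + width - 1}: {'*' * count}")
```

The formula only gives a starting point; rounding the width to 10 years is the "make it look nice" step.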

Page 3

Correlation data in SPSS
The Excel histograms are in the ‘SATHist’ sheet. Create an SPSS *.sav file with this data, then use Chart Builder to create a histogram. Double-click on the result, and go to “Elements > Add distribution curve”.

Page 4

This histogram was made with “Histogram” in SPSS; the next two were made with “Histogram Percent”, which relates much better to the normal curve. We will be using the normal curve repeatedly.
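One way to see why a percent scale lines up with the normal curve: under a normal model with mean μ and SD σ, the expected percent in a bin [a, b) is 100·(Φ(b) − Φ(a)), and for narrow bins this is close to 100 × width × f(midpoint). The sketch below assumes Python's `statistics.NormalDist` and a hypothetical mean of 550 and SD of 100 (not the actual SAT figures).

```python
from statistics import NormalDist

def expected_percents(mean, sd, edges):
    """Percent of a normal population expected in each bin [edges[i], edges[i+1])."""
    dist = NormalDist(mean, sd)
    return [100 * (dist.cdf(b) - dist.cdf(a)) for a, b in zip(edges, edges[1:])]

def curve_percents(mean, sd, edges):
    """Normal curve height at each bin midpoint, rescaled to percent per bin: 100 * width * f(mid)."""
    dist = NormalDist(mean, sd)
    return [100 * (b - a) * dist.pdf((a + b) / 2) for a, b in zip(edges, edges[1:])]

# Hypothetical mean 550 and SD 100, with bins 50 points wide
edges = list(range(200, 951, 50))
exact = expected_percents(550, 100, edges)
approx = curve_percents(550, 100, edges)
for a, e, c in zip(edges, exact, approx):
    print(f"{a}-{a + 50}: {e:5.2f}% in bin, {c:5.2f}% from curve height")
```

The two columns agree closely because the bins are narrow relative to the SD. With a count-scaled histogram you would also have to multiply by the number of observations.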

Page 5

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

You only need to know the mean (μ) and standard deviation (σ) to plot ANY normal curve.
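A minimal Python sketch of plotting points from the normal equation, using only a mean and an SD (the values 500 and 100 are hypothetical). In Excel, `NORM.DIST(x, mean, standard_dev, FALSE)` returns the same density; the function below writes the equation out by hand instead.

```python
import math

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def curve_points(mu, sigma, n_sd=3, steps=60):
    """Evenly spaced (x, f(x)) pairs from mu - n_sd*sigma to mu + n_sd*sigma."""
    lo = mu - n_sd * sigma
    step = 2 * n_sd * sigma / steps
    return [(lo + i * step, normal_pdf(lo + i * step, mu, sigma)) for i in range(steps + 1)]

# Only the mean and SD are needed; these values are hypothetical
points = curve_points(mu=500, sigma=100)
for x, fx in points[::10]:
    print(f"{x:6.1f}  {fx:.6f}")
```

Changing `mu` slides the curve left or right; changing `sigma` widens or narrows it. Nothing else about the shape can change.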

Page 6

Page 7

Points were created in Excel using the normal equation. See my version of ‘SATScatter’.

Page 8

Open Correlation.xls, SATHist tab: it compares Verbal and Math SAT entrance scores for Wabash College. Do you think there should be a relationship?

Positive or negative?

Which has the higher average? What does the standard deviation tell you?

Page 9

Correlation

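r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)

(s_x and s_y are the standard deviations of x and y, computed with n − 1 in the denominator.)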

Using this equation, open the “Wine_Exercise.xls” spreadsheet and complete the table. With this data, you should also create a scatter plot, show the trendline with its regression equation, and calculate the SD line slope and intercept. We will calculate the regression line slope and intercept when we cover Chapter 4; for now, do what the spreadsheet says, and we will come back to it later. See the [Correlation.xls]corr sheet to walk through another example of calculating correlation coefficients.

Wine Exercise
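The correlation equation can be checked by hand with a short Python sketch: convert x and y to standard units, multiply, add up, and divide by n − 1. The five data pairs are hypothetical, not the Wine_Exercise values; Excel's `CORREL` should give the same r.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Standard deviation with an n - 1 divisor (like Excel's STDEV.S)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n - 1) times the sum of products of x and y in standard units."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mx, my, sx, sy = mean(xs), mean(ys), sample_sd(xs), sample_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data (not the Wine_Exercise values)
x_vals = [1, 2, 3, 4, 5]
y_vals = [2, 4, 5, 4, 5]
print(round(correlation(x_vals, y_vals), 4))  # 0.7746
```

If you use SDs computed with n instead of n − 1, you must also divide the sum by n; the two conventions give the same r.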

Page 10

SATScatter sheet: What does the picture tell you about the relationship?

SD (standard deviation) line
Moving along the SD line, an increase of one SD in x goes with an increase of one SD in y (a decrease of one SD when r is negative). Its slope has the sign of the correlation coefficient, and it passes through the point of averages, where the average-x and average-y lines meet.

Remember the “point-slope” method to find the equation of a line:

\text{intercept} = \bar{y} - \text{slope}\cdot\bar{x}
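A minimal Python sketch of the SD line: the slope is ±SD(y)/SD(x) with the sign of r, and the intercept comes from the point-slope form through the point of averages. The data are hypothetical, not the SAT scores; `sd_line` is an illustrative helper name.

```python
import math
from statistics import mean, stdev

def sd_line(xs, ys):
    """Slope and intercept of the SD line for paired data xs, ys."""
    mx, my = mean(xs), mean(ys)
    # The sign of r is the sign of the sum of cross-products of deviations
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = math.copysign(stdev(ys) / stdev(xs), cross)  # +/- SD(y)/SD(x)
    intercept = my - slope * mx  # point-slope form through the point of averages
    return slope, intercept

# Hypothetical data (not the SAT scores)
x_vals = [1, 2, 3, 4, 5]
y_vals = [2, 4, 5, 4, 5]
slope, intercept = sd_line(x_vals, y_vals)
print(f"SD line: y = {intercept:.4f} + {slope:.4f} x")
```

Sample SDs (n − 1) and population SDs (n) give the same ratio, so either works here. The regression line (Chapter 4) has slope r·SD(y)/SD(x), so it is flatter than the SD line whenever |r| < 1.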

Page 11

Extreme sheet: -1 ≤ r ≤ 1
Correlation is BOUNDED between -1 and 1. The sign of r gives the direction of the relationship (positive or negative); the size of r tells you how strong the relationship is. Look for a “cigar-shaped” cloud.

Use the Patterns sheet to play with different correlation coefficients. Note that you don’t get a pronounced “cigar-shaped” cloud until r is 0.9 or higher.
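To experiment outside the spreadsheet, here is a minimal Python sketch that generates clouds with a chosen correlation. It uses a standard construction (y = ρ·x + √(1 − ρ²)·noise, with x and noise independent standard normals), which is an assumption and not necessarily how the Patterns sheet builds its data.

```python
import math
import random

def simulate_pairs(rho, n, seed=0):
    """Draw n (x, y) pairs from a standard bivariate normal with correlation rho (-1 <= rho <= 1)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Mixing x with independent noise in these proportions gives corr(x, y) = rho
        y = rho * x + math.sqrt(1 - rho ** 2) * noise
        pairs.append((x, y))
    return pairs

def sample_r(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

for rho in (0.3, 0.6, 0.9, 0.95):
    pairs = simulate_pairs(rho, n=500, seed=1)
    print(f"target rho = {rho}: sample r = {sample_r(pairs):.3f}")
```

Scatter-plotting the pairs for each target value shows the same pattern as the sheet: the cloud only starts to look like a narrow cigar as r approaches 0.9.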

Page 12

3D Histogram: Multivariate Analysis

Page 13

Correlation: r measures the degree of linear association between two variables, but it is not perfect. A high r should never be used to infer causation, and r may do a poor job of summarizing the relationship.

r is the sample estimate of the population value ρ (“rho”).

Association is not causation.

Page 14

Correlation Dangers
Twice the r does not mean twice as much clustering.
r doesn’t tell you about the slope of a relationship.
r can be a misleading summary, in exactly the same way that the average and SD are sometimes not enough to describe a list of numbers (e.g., an asymmetric histogram or outliers). See the Patterns sheet.

View misleading correlations in the corr sheet.

Page 15

Misleading r: Patterns worksheet. Show misleading values of r and cycle through different r. You can change the parameters yourself to see how these values might change.

Page 16

SD line: It has a positive slope if r is positive (negative if r is negative). In this case the slope is std(math)/std(verbal); in general its size is std(y)/std(x), with the sign of r.

The SD line goes THROUGH the intersection of the means.

SPSS Version of SATScatter

Page 17

Costa Rica Example (CRExample): In this example, what does the SDLine column signify? Does this SD line look like a good ‘fit’ of the data?

Page 18

Aggregation Problem (a.k.a. Ecological Correlation)

Example: If you average data by some grouping, you (almost always) obtain a different correlation than if you take the correlation of individuals.

Correlation at the group level suppresses individual variation.

See [EcolCorr.xls] and [EcolCorrCPS.xls]. Walk through the live sheet (F9, which recalculates the sheet in Excel, is the key here).

Page 19

EcolCorrCPS

Data averaged at the state level

Summary Statistics
            Educ     Age    Earnings
Average    13.50   43.45       32540
SD          0.33    0.58        4775
Max        14.22   45.38       43079
Min        12.88   41.91       23290

Correlation Table
            Educ     Age    Earnings
Educ        1
Age        -0.17     1
Earnings    0.63    -0.30      1

Individual Data

Summary Statistics
            Educ     Age    Earnings
Average    13.49   43.38       33356
SD          2.53    8.18       43753
Max           46      60      361315
Min           31      30       -9999

Correlation Table
            Educ     Age    Earnings
Educ        1
Age         0.003    1
Earnings    0.336    0.040     1

[Figure: Earnings in 2001 ($); vertical axis from -$50,000 to $400,000, horizontal axis from 30 to 60]

Note the vastly different correlations between earnings and education (0.63 with state averages vs. 0.336 for individuals) and especially between age and earnings (-0.30 vs. 0.040). Grouping your data can be VERY misleading.

Page 20

Correlation Lab: Open CorrelationLab.doc and use the associated files StockReturns.xls and Hitters1999.xls.