Multivariate Methods
![Page 1: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/1.jpg)
Multivariate Methods
LIR 832
![Page 2: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/2.jpg)
Multivariate Methods: Topics of the Day
A. Isolating Interventions in a Multi-Causal World
B. Multivariate Probability Distributions
C. The Building Block: Covariance
D. The Next Step: Correlation
![Page 3: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/3.jpg)
A Multivariate World
Isolating Interventions in a Multi-Causal World
A. Examples of the problem:
• Evaluate a program to reduce absences from a plant.
• Is there age discrimination?
B. Types of data:
• Experimental
• Quasi-experimental
• Non-experimental
C. We need multivariate analysis to sort out causal relationships.
![Page 4: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/4.jpg)
Bi-Variate Relations: A First Run at Multivariate Methods
A. Many of the issues we are interested in are essentially about the relationship between two variables.
B. Bi-variate relationships can be generalized to multivariate relationships.
C. We learn bi-variate methods formally and make more intuitive reference to multivariate methods.
D. What do we mean by a bi-variate relationship?
![Page 5: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/5.jpg)
Bi-Variate Example
Our firm has formed teams of engineers, accountants and general managers at all plants to work on several issues that are considered important in the firm. The firm has long been committed to gender diversity, and we are interested in the distribution of gender among our managerial classifications. We are particularly concerned about the distribution of gender on these teams, especially among engineers. Consider the distribution of two statistics about these three-person teams:
a. Gender of the team members (X: x = number of men)
b. Is the engineer a woman? (Y: 0 = man, 1 = woman)
![Page 6: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/6.jpg)
Bi-Variate Example (cont.)
![Page 7: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/7.jpg)
Bi-Variate Example (cont.)
![Page 8: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/8.jpg)
Bi-Variate Example (cont.)
![Page 9: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/9.jpg)
Bi-Variate Example (cont.)
![Page 10: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/10.jpg)
Bi-Variate Example (cont.)
We can also use this information to build conditional probabilities: What is the likelihood that the engineer is a woman, given that there is exactly one man on the team?
![Page 11: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/11.jpg)
Bi-Variate Example (cont.)
What is the likelihood that the engineer is a woman, given that there is exactly one man on the team?
P(Y = 1 | X = 1) = P(Y = 1 & X = 1) / P(X = 1) = (2/8) / (3/8) = 2/3
Note: P(Y = 1 | X = 2) is read "the probability that Y is equal to 1 given that X = 2" or "the probability that Y = 1 conditional on X = 2".
![Page 12: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/12.jpg)
Bi-Variate Example (cont.)
What is the likelihood that there is only one man, given that the engineer is a woman?
P(X = 1 | Y = 1) = P(Y = 1 & X = 1) / P(Y = 1) = (2/8) / (4/8) = 1/2
![Page 13: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/13.jpg)
Bi-Variate Example (cont.)
What is the likelihood that the engineer is a woman? P(Y = 1) = 1/2
But if we know that there are two men, we can improve our estimate:
P(Y = 1 | X = 2) = P(Y = 1 & X = 2) / P(X = 2) = (1/8) / (3/8) = 1/3
What about calculating the likelihood of two men given the engineer is a woman?
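The conditional probabilities in this example can be checked by brute-force enumeration. The sketch below assumes, as the eighths on the slides suggest, that each of the three roles is equally likely to be filled by a man or a woman, giving 8 equally likely teams. The helper names (`prob`, `cond_prob`, `num_men`) are illustrative, not from the slides.

```python
from fractions import Fraction
from itertools import product

# Assumption: 8 equally likely teams; position 0 is the engineer,
# 1 the accountant, 2 the general manager. "M" = man, "W" = woman.
teams = list(product("MW", repeat=3))

def prob(event):
    """Probability of an event (a predicate on a team) under equal weights."""
    return Fraction(sum(1 for t in teams if event(t)), len(teams))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda t: event(t) and given(t)) / prob(given)

def num_men(team):            # X: number of men on the team
    return team.count("M")

def engineer_is_woman(team):  # Y = 1 when the engineer is a woman
    return team[0] == "W"

p_y1 = prob(engineer_is_woman)
p_y1_given_x1 = cond_prob(engineer_is_woman, lambda t: num_men(t) == 1)
p_x1_given_y1 = cond_prob(lambda t: num_men(t) == 1, engineer_is_woman)
p_y1_given_x2 = cond_prob(engineer_is_woman, lambda t: num_men(t) == 2)
p_x2_given_y1 = cond_prob(lambda t: num_men(t) == 2, engineer_is_woman)

print(p_y1, p_y1_given_x1, p_x1_given_y1, p_y1_given_x2, p_x2_given_y1)
# prints: 1/2 2/3 1/2 1/3 1/4
```

Under this assumption the last value answers the question just posed: the likelihood of two men given that the engineer is a woman is 1/4.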
![Page 14: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/14.jpg)
Example: Gender Distribution
![Page 15: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/15.jpg)
Example: Gender Distribution
Working with Conditional Probability:
P(female) = 50.91%
P(female | LRHR) = P(female & LRHR) / P(LRHR) = 0.36% / 0.55% ≈ 65%
P(LRHR) = 0.55%
P(LRHR | female) = P(LRHR & female) / P(female) = 0.36% / 50.91% ≈ 0.7%
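The arithmetic here is easy to get wrong because the figures are percentages of very different sizes. A minimal sketch, assuming the 0.36 in the slide's calculation is P(female & LRHR) expressed in percent, like the other two figures:

```python
# Figures quoted on the slide, converted from percent to proportions.
# Assumption: 0.36 is the joint probability P(female & LRHR) in percent.
p_female = 50.91 / 100
p_lrhr = 0.55 / 100
p_female_and_lrhr = 0.36 / 100

# Conditional probability = joint probability / probability of the condition.
p_female_given_lrhr = p_female_and_lrhr / p_lrhr
p_lrhr_given_female = p_female_and_lrhr / p_female

print(f"P(female | LRHR) = {p_female_given_lrhr:.1%}")
print(f"P(LRHR | female) = {p_lrhr_given_female:.2%}")
```

The same joint probability gives very different conditional probabilities depending on which event is taken as given.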
![Page 16: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/16.jpg)
Independence Defined
Now that we know a bit about bi-variate relationships, we can define what it means, in a statistical sense, for two events to be independent.
If events are independent, then:
• Their conditional probability is equal to their unconditional probability: P(X | Y) = P(X).
• The probability of the two independent events both occurring is P(X, Y) = P(X) * P(Y).
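The product rule gives a direct test for independence. The sketch below applies it to the three-person team example, again assuming 8 equally likely teams. The function `independent` is an illustrative helper, not something from the slides.

```python
from fractions import Fraction
from itertools import product

# Assumption: 8 equally likely teams; position 0 = engineer,
# 1 = accountant, 2 = general manager. "M" = man, "W" = woman.
teams = list(product("MW", repeat=3))

def prob(event):
    """Probability of an event (a predicate on a team) under equal weights."""
    return Fraction(sum(1 for t in teams if event(t)), len(teams))

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(lambda t: a(t) and b(t)) == prob(a) * prob(b)

def engineer_is_woman(team):
    return team[0] == "W"

def accountant_is_woman(team):
    return team[1] == "W"

def two_men(team):
    return team.count("M") == 2

eng_vs_acct = independent(engineer_is_woman, accountant_is_woman)
eng_vs_two_men = independent(engineer_is_woman, two_men)
print(eng_vs_acct, eng_vs_two_men)  # prints: True False
```

The engineer's gender tells us nothing about the accountant's, but it does change the likelihood of there being two men on the team, which is why P(Y = 1 | X = 2) differs from P(Y = 1).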
![Page 17: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/17.jpg)
Importance of Independence
Why is independence important?
• If events are independent, then we are getting unique information from each data point.
• If events are not independent, then we are not getting unique information from each data point.
A practical example: running a survey on employee satisfaction within an establishment.
![Page 18: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/18.jpg)
Example: Employee Satisfaction
![Page 19: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/19.jpg)
Covariance
Covariance: Building Block of Multivariate Analysis
All very nice, but what we are looking for is a means of expressing and measuring the strength of association of two variables. How closely do they move together? Is variable A a good predictor of variable B?
We now move to a slightly more complex world: no more two- and three-category variables.
![Page 20: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/20.jpg)
Example: Age and Income Data
![Page 21: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/21.jpg)
Example: Age and Income Data
![Page 22: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/22.jpg)
Example: Age and Income Data
![Page 23: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/23.jpg)
Example: Age and Income Data
Descriptive Statistics: age, annual income

| Variable | N | Mean | Median | StDev | SE Mean |
|---|---|---|---|---|---|
| age | 23 | 24.565 | 23.000 | 4.251 | 0.886 |
| annual income | 23 | 17174 | 10000 | 15712 | 3276 |

| Variable | Minimum | Maximum | Q1 | Q3 |
|---|---|---|---|---|
| age | 22.000 | 42.000 | 22.000 | 26.000 |
| annual income | 0 | 65000 | 7000 | 25000 |
![Page 24: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/24.jpg)
Example: Age and Income Data
![Page 25: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/25.jpg)
Example: Age and Income Data
•Adding some info to the graph…
![Page 26: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/26.jpg)
Covariance and Correlation Defined
Define covariance and correlation for a random sample of data:
Let our data be composed of pairs (X_i, Y_i), where X has mean x̄ and Y has mean ȳ. Then the covariance, the co-movement of X and Y around their means, is defined as:
![Page 27: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/27.jpg)
Example: Covariance
We observe the relationship between the number of employees at work at a plant and the output for five days in a row:

| Attendance | Output |
|---|---|
| 8 | 40 |
| 3 | 28 |
| 2 | 20 |
| 6 | 39 |
| 4 | 28 |

What is the covariance of attendance and output?
![Page 28: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/28.jpg)
Example: Covariance (cont.)
The covariance is positive. This suggests that when attendance is above its mean, output also tends to be above its mean. Similarly, when attendance is below its mean, output tends to be below its mean.
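The calculation for the five days of attendance and output data can be sketched as follows. It assumes the usual sample definition of covariance: the sum of the products of deviations from the means, divided by n − 1. Dividing by n instead would give a smaller value with the same sign.

```python
# Five days of data from the attendance/output example.
attendance = [8, 3, 2, 6, 4]
output = [40, 28, 20, 39, 28]

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1.

    Assumption: the n - 1 (sample) denominator is the one intended.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)

cov_xy = sample_covariance(attendance, output)
print(round(cov_xy, 2))  # prints: 19.25
```

Here the means are 4.6 for attendance and 31 for output, and every day's pair of deviations has the same sign, so each product contributes positively to the total.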
![Page 29: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/29.jpg)
![Page 30: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/30.jpg)
Example: Overtime Hours and Productivity
![Page 31: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/31.jpg)
Example: Overtime Hours and Productivity
![Page 32: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/32.jpg)
Example: Overtime Hours and Productivity
Covariances: prod-avg, week

| | prod-avg | week |
|---|---|---|
| prod-avg | 113.7292 | |
| week | -49.5667 | 22.6667 |
![Page 33: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/33.jpg)
Example: Overtime Hours and Productivity (cont.)
![Page 34: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/34.jpg)
Example: Overtime Hours and Productivity (cont.)
![Page 35: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/35.jpg)
Example: Overtime Hours and Productivity (cont.)
Covariances: prod-avg, week, week-hours

| | prod-avg | week | week-hours |
|---|---|---|---|
| prod-avg | 233.3345 | | |
| week | -51.8706 | 21.3986 | |
| week-hours | -89.0777 | 0.0000 | 99.3069 |
![Page 36: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/36.jpg)
Example: Overtime Hours and Productivity (cont.)
![Page 37: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/37.jpg)
Example: Overtime Hours and Productivity (cont.)
![Page 38: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/38.jpg)
Example: Overtime Hours and Productivity (cont.)
![Page 39: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/39.jpg)
Correlation vs. Covariance
A limitation of covariance is that it is difficult to interpret: its units are the product of the units of the two variables, so its size depends on how the variables are measured.
Thus, we need a measure that is more readily interpreted and tells us about the strength of association.
Correlation: the population correlation is defined as:
![Page 40: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/40.jpg)
Correlation = 1.00
![Page 41: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/41.jpg)
Correlation = 0.94
![Page 42: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/42.jpg)
Correlation = 0.604
![Page 43: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/43.jpg)
Correlation = 0.198
![Page 44: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/44.jpg)
Correlation: Previous Examples
![Page 45: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/45.jpg)
Correlation: Previous Examples
![Page 46: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/46.jpg)
![Page 47: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/47.jpg)
Correlation: Previous Examples
![Page 48: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/48.jpg)
Correlation: Previous Examples
Overtime-Productivity: Limit to 5 days, 10 hours:
Correlations: prod-avg, week, week-hours

| | prod-avg | week |
|---|---|---|
| week | -0.734 (0.000) | |
| week-hours | -0.585 (0.000) | 0.000 (1.000) |

Cell contents: Pearson correlation (p-value)
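These correlations can be reproduced from the three-variable covariance matrix reported earlier for the overtime example, using the standard relation that a correlation is the covariance divided by the product of the two standard deviations. A minimal sketch, with the variances and covariances copied from that output:

```python
import math

# Variances (diagonal) and covariances (off-diagonal) copied from the
# "Covariances: prod-avg, week, week-hours" output.
var_prod, var_week, var_hours = 233.3345, 21.3986, 99.3069
cov_prod_week = -51.8706
cov_prod_hours = -89.0777
cov_week_hours = 0.0

def correlation(cov_xy, var_x, var_y):
    """Covariance scaled by the product of the two standard deviations."""
    return cov_xy / math.sqrt(var_x * var_y)

r_prod_week = correlation(cov_prod_week, var_prod, var_week)
r_prod_hours = correlation(cov_prod_hours, var_prod, var_hours)
r_week_hours = correlation(cov_week_hours, var_week, var_hours)

print(round(r_prod_week, 3), round(r_prod_hours, 3), round(r_week_hours, 3))
# prints: -0.734 -0.585 0.0
```

Unlike the covariances, these values are unit-free and bounded between -1 and 1, so the two associations with productivity can be compared directly.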
![Page 49: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/49.jpg)
Correlations: Previous Examples
![Page 50: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/50.jpg)
Example: Correlation
![Page 51: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/51.jpg)
Example: Correlation
![Page 52: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/52.jpg)
Example: Correlation
What about some real data? Consider the relationship between age, gender, and weekly earnings among human resource managers (admin associated occupations).
![Page 53: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/53.jpg)
Example: Correlation
Descriptive Statistics: Female, age, weekearn

| Variable | N | N* | Mean | Median | TrMean | StDev |
|---|---|---|---|---|---|---|
| Female | 55158 | 0 | 0.50471 | 1.00000 | 0.50524 | 0.49998 |
| age | 55158 | 0 | 42.357 | 42.000 | 42.103 | 11.662 |
| weekearn | 47576 | 7582 | 894.53 | 769.23 | 846.16 | 562.22 |

| Variable | SE Mean | Minimum | Maximum | Q1 | Q3 |
|---|---|---|---|---|---|
| Female | 0.00213 | 0.00000 | 1.00000 | 0.00000 | 1.00000 |
| age | 0.050 | 15.000 | 90.000 | 33.000 | 51.000 |
| weekearn | 2.58 | 0.01 | 2884.61 | 519.00 | 1153.00 |
![Page 54: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/54.jpg)
Example: Correlation
Tabulated Statistics: Female
Rows: Female

| | weekearn Mean | weekearn StDev |
|---|---|---|
| Male | 1085.4 | 622.1 |
| Female | 727.2 | 440.5 |
| All | 894.5 | 562.2 |
![Page 55: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/55.jpg)
Example: Correlation
Tabulated Statistics: Female
Rows: Female

| | weekearn Mean | age Mean | weekearn StDev | age StDev |
|---|---|---|---|---|
| Male | 1085.4 | 43.256 | 622.1 | 11.856 |
| Female | 727.2 | 41.475 | 440.5 | 11.399 |
| All | 894.5 | 42.357 | 562.2 | 11.662 |
![Page 56: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/56.jpg)
Example: Correlation
Covariances: age, weekearn, Female

| | age | weekearn | Female |
|---|---|---|---|
| age | 135.99 | | |
| weekearn | 1119.24 | 316094.42 | |
| Female | -0.45 | -89.17 | 0.25 |
![Page 57: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/57.jpg)
Example: Correlation
Correlations: age, Female, weekearn

| | age | Female |
|---|---|---|
| Female | -0.076 (p = 0.000) | |
| weekearn | 0.174 | -0.318 |
![Page 58: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/58.jpg)
Example: Correlation
![Page 59: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/59.jpg)
Example: Non-Linearity
![Page 60: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/60.jpg)
Correlation and Covariance
So covariance and correlation are measures of linear association, but not measures of association in general (or of non-linear association).
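A small illustration of this point, using made-up values rather than data from the slides: below, y is completely determined by x (y = x²), yet the Pearson correlation is zero because the relationship is not linear.

```python
import math

# Hypothetical illustration data: x is symmetric around zero, y = x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

def pearson(xs, ys):
    """Pearson correlation: cross-deviation sum over the root of the
    product of the two squared-deviation sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

r = pearson(xs, ys)
print(r)
```

The positive and negative halves of x contribute cross-products that cancel exactly, so a correlation of zero here says nothing about how strongly y depends on x.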
![Page 61: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/61.jpg)
Correlation and Covariance
What if we do not have data on individuals but data on distributions? For example, we have plant-level data, but plants vary widely in employment. We want to give greater weight to plants with more employees.
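One common way to do this is to weight each plant's deviations by its employment when computing means, covariance, and correlation. The sketch below is an assumption about the approach, not necessarily the formula used on the following slides, and the plant data (employment, overtime hours, productivity) are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical plant-level data (not from the slides).
employees = [100, 300, 600]      # weights: employment at each plant
overtime = [2.0, 4.0, 6.0]       # average overtime hours per worker
productivity = [10.0, 9.0, 5.0]  # average output per worker

def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def weighted_cov(xs, ys, weights):
    """Weighted average of cross-deviations around the weighted means."""
    mean_x = weighted_mean(xs, weights)
    mean_y = weighted_mean(ys, weights)
    total = sum(w * (x - mean_x) * (y - mean_y)
                for x, y, w in zip(xs, ys, weights))
    return total / sum(weights)

def weighted_corr(xs, ys, weights):
    """Weighted covariance scaled by the weighted standard deviations."""
    var_x = weighted_cov(xs, xs, weights)
    var_y = weighted_cov(ys, ys, weights)
    return weighted_cov(xs, ys, weights) / math.sqrt(var_x * var_y)

w_mean_x = weighted_mean(overtime, employees)
w_cov = weighted_cov(overtime, productivity, employees)
w_corr = weighted_corr(overtime, productivity, employees)
print(round(w_mean_x, 2), round(w_cov, 2), round(w_corr, 3))
```

With these weights the largest plant dominates the means, so its deviations count six times as much as those of the smallest plant.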
![Page 62: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/62.jpg)
Correlation and Covariance
![Page 63: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/63.jpg)
Correlation and Covariance
![Page 64: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/64.jpg)
Correlation and Covariance
![Page 65: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/65.jpg)
Correlation and Covariance
![Page 66: Multivariate Methods](https://reader036.fdocuments.net/reader036/viewer/2022062302/56814c4b550346895db953ad/html5/thumbnails/66.jpg)
Correlation and Covariance