March 19 - King's Faculty (dkerr.kingsfaculty.ca/dkerr/assets/lecture11_3306_2015...) · 2016. 2....
March 19th
Tamara Stallard; Cynthia Wheeler; Chris Tuskan; Daniella Odesa; Kristine Petitta; Taylor Hood; Michelle Janulis; Diana Rusin
March 26th
Nicole Bullock; Kelly Robertson; Daniel Wood; Steph Boak; Pouya Moghaddam; Joanna Lee; Nicole Corbo; Misghana Ghebredingle
April 2nd (last class)
Blake Huggins; Jamie-Lee Bossenberry; Ryan O’Quinn; Sarah Stewart; Brittany Jenkins; Ryan Higgins; Greg Morrow; Baley Gofton; Cynthia Hill; Brandi Spitzig
Class presentations (10 minutes each):
• Brief introduction to your topic:
  Dependent variable & independent variables
  Sample (what’s your target population?)
  What are you anticipating? (optional: causal diagram)
  What has previous literature suggested that you can expect?
  Findings (optional)
• 1-page handout and/or PowerPoint (for my records)
• Assignment 3: due next week, March 19th
• Do note: in this assignment I also ask you to run the model with the “WEIGHTED” data.
• Today I will begin by talking a bit about this (sampling and sample weights),
• to be followed by tips on creating models, regardless of whether we are working with OLS or logistic regression.
In Soc 2206, you learned a bit about sampling strategies.
1. Simple random sampling (SRS): everyone in the target population has an equal chance of being selected.
2. Stratified samples:
First divide your sampling frame into strata (e.g. provinces, or anything else), and then use SRS within each stratum.
Proportionate: the number sampled within each stratum is proportionate to its share of the population.
Disproportionate: the number sampled within each stratum is not proportionate to its share of the population.
Typical of Stats Can: samples are stratified by province, over-representing some provinces and under-representing others.
3. Cluster sampling (involving clusters, which can be almost anything: geographies, institutions, social groups, etc.)
Example: a multistage sample of Canadian households (more than 10 million in Canada):
  1. Randomly select metropolitan areas across Canada (35 CMAs currently).
  2. Within those selected CMAs, randomly select census tracts (typically hundreds).
  3. Within those selected census tracts, randomly select city blocks (typically dozens within each).
  4. Within those selected city blocks, randomly select households (typically hundreds).
Sample size is important, BUT:
beyond a certain size (e.g. 5,000), there are limited returns in further reducing sampling error.
ANOTHER IMPORTANT POINT:
In predicting “sampling error”, it is the “SAMPLE SIZE” that counts and not the size of the “targeted POPULATION”.
For example, to get a margin of error of +/- 2.5%:
you require a sample size of 1,534 persons for a population of 1,000,000;
you require a sample of roughly the same size for a population of 100,000.
In other words, in Stats Can’s surveys, a sample of roughly a couple of thousand in each province would give estimates of roughly equivalent sampling error for all provinces, regardless of the size of their respective populations.
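The 1,534 figure can be reproduced with the standard sample-size formula for a proportion, assuming 95% confidence (z = 1.96), the most conservative proportion p = 0.5, and a finite population correction. These assumptions are ours; the slide does not state them. A minimal sketch:

```python
def required_sample_size(population, margin=0.025, z=1.96, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    Assumes 95% confidence (z = 1.96), p = 0.5 and a finite
    population correction; these defaults are illustrative choices.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # size for an "infinite" population
    n = n0 / (1 + (n0 - 1) / population)      # finite population correction
    return round(n)

for pop in (100_000, 1_000_000, 35_540_419):
    print(f"population {pop:>12,}: n = {required_sample_size(pop):,}")
```

Under these assumptions the required sample is 1,513 for 100,000 people, 1,534 for 1,000,000 and 1,537 for all of Canada: the population size barely moves the answer.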
Assume we take a simple random sample (SRS) of Canada:
Every individual is given an equal chance of selection in the final sample.
Let’s assume that we interview 2000 Canadians in our sample (n = 2000), out of a population of P = 35,540,419.
Probability of selection: n/P, i.e. 2000/35,540,419 = 0.000056274
Each person in this sample has the same “weight” (the inverse of the probability of selection):
W = P/n = 35,540,419/2,000 = 17,770.2

                             Population (2014)   SRS sample (n)
Canada                            35,540,419          2000
Newfoundland and Labrador            526,977            30
Prince Edward Island                 146,283             8
Nova Scotia                          942,668            53
New Brunswick                        753,914            42
Quebec                             8,214,672           462
Ontario                           13,678,740           770
Manitoba                           1,282,043            72
Saskatchewan                       1,125,410            63
Alberta                            4,121,692           232
British Columbia                   4,631,302           261

In an SRS (n = 2000) of Canadians, each person in our “unweighted” sample would represent 17,770 persons.
It is possible to “weight your sample” by making each case in your sample represent 17,770.2 cases (your weighted sample would then look like your population).
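The arithmetic behind the SRS table can be sketched as follows: one weight W = P/n for every case, and each province receiving about n × (provincial population / P) cases. This is an illustration that reproduces the sample column above; the names are ours. Note that the provinces sum to slightly less than the national total (presumably the territories, which the table omits).

```python
# 2014 population figures from the table above.
POPULATION = {
    "Newfoundland and Labrador": 526_977,
    "Prince Edward Island": 146_283,
    "Nova Scotia": 942_668,
    "New Brunswick": 753_914,
    "Quebec": 8_214_672,
    "Ontario": 13_678_740,
    "Manitoba": 1_282_043,
    "Saskatchewan": 1_125_410,
    "Alberta": 4_121_692,
    "British Columbia": 4_631_302,
}
CANADA = 35_540_419   # national total used on the slide
N_SAMPLE = 2000

selection_prob = N_SAMPLE / CANADA   # n / P, the same for everyone in an SRS
weight = CANADA / N_SAMPLE           # W = P / n, the same for every case

def expected_cases(province_pop, n=N_SAMPLE, total=CANADA):
    """Expected number of sampled cases from a province under SRS."""
    return round(n * province_pop / total)

expected = {name: expected_cases(pop) for name, pop in POPULATION.items()}

for name, count in expected.items():
    print(f"{name:<27}{count:>5}")
print(f"probability = {selection_prob:.9f}, weight = {weight:,.1f}")
```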
While this sample works well at the national level (n = 2000), we cannot draw inferences for specific provinces (for example, PEI = 8 cases).
For this reason, all Stats Can surveys make sure that they have at least a few thousand cases for every province.
Why? To obtain reasonable-quality statistical estimates for all provinces.
The GIS, NLSCY and Health Survey are all stratified samples (disproportionate to size):
1. Divide Canada up into provinces.
2. Take a sufficient sample from each province to get good estimates.
3. Take a simple random sample within provinces.
Note: the probability of selection differs by province:
PEI: 1500/146,283
Ontario: 4000/13,678,740
Similarly, weights differ by province:
PEI: 146,283/1500 = 97.5
Ontario: 13,678,740/4000 = 3419.7
In the unweighted sample here, roughly 6.0 per cent (1500/25,000) of the sample is in PEI. In the population it is 0.41 per cent (146,283/35,540,419).
NOTE: WEIGHTED RESULTS WILL HAVE THE SAME DISTRIBUTION AS THE POPULATION.
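A short sketch of the stratum weights above, using only the PEI and Ontario figures from the slide. It also shows why weighted results match the population distribution: n cases, each carrying weight N/n, add back up to exactly N people in that stratum. The function name is ours.

```python
def stratum_weight(stratum_pop, stratum_n):
    """Design weight: inverse of the probability of selection within the stratum."""
    return stratum_pop / stratum_n

# (population, sample size) for the two provinces quoted on the slide.
strata = {
    "Prince Edward Island": (146_283, 1500),
    "Ontario": (13_678_740, 4000),
}

for name, (pop, n) in strata.items():
    w = stratum_weight(pop, n)
    # n cases, each weighted by w, represent the whole stratum population.
    print(f"{name}: p = {n / pop:.6f}, weight = {w:.1f}, weighted n = {n * w:,.0f}")
```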
In Assignment 3, I have you run your “model results” with the appropriate “weights” (easy to do).
Weighting corrects for potential biases due to your sampling strategy.
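To see the kind of bias weighting corrects, here is a sketch with hypothetical low-income rates (10% of the 1,500 PEI cases, 15% of the 4,000 Ontario cases; these rates are invented for illustration). Because PEI is over-represented in the sample, the unweighted estimate leans towards PEI's rate; the weighted estimate reflects each province's actual population size.

```python
def weighted_proportion(cases):
    """Share of cases with outcome 1; cases is a list of (outcome, weight) pairs."""
    total_weight = sum(w for _, w in cases)
    return sum(y * w for y, w in cases) / total_weight

# Stratum weights from the slide (population / sample size).
pei_w = 146_283 / 1500
ont_w = 13_678_740 / 4000

# Hypothetical outcomes: 1 = low income, 0 = not low income.
cases = ([(1, pei_w)] * 150 + [(0, pei_w)] * 1350
         + [(1, ont_w)] * 600 + [(0, ont_w)] * 3400)

unweighted = weighted_proportion([(y, 1.0) for y, _ in cases])
weighted = weighted_proportion(cases)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

The same idea carries over to regression: each case enters the model with its weight rather than counting once.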
e.g. with the Census, I focus on the “likelihood of low income” as my dependent variable.
e.g. assume my research interests relate to the higher-than-average incidence of low income among immigrants in Canada.
Immigrant status and the likelihood of low income will be my primary emphasis.
Other variables? What’s important in the literature?
I recommend either the Social Science Citation Index or Sociological Abstracts.
Relevant in this context:
What sub-sample, if any, should be selected for your research?
Therefore I ask you about this in Assignment 3 (a necessary step).
In my example there is no need to focus on a specific sub-sample: I will be comparing immigrants with other Canadians (the full population will be involved).
Find 5 studies that explicitly focus on your topic (here: why immigrants are more likely to experience low income).
Note: your literature review is brief, so stick to research directly related to your topic.
http://www.kings.uwo.ca/academics/sociology/resources-and-information/sociology-department-academic-awards/
Example:
This study hypothesizes:
Canadian immigrants are more likely to experience low income than other Canadians.
This relationship is expected to be partially explained by “length of residence in Canada”.
The incidence of low income is expected to be highest among recent immigrants and lowest among well-established immigrants.
Yet the disadvantage of being an immigrant is expected to persist, even among more established immigrants, and even after controlling for other relevant variables (sex, language, age and education).
Relevant in developing hypotheses:
Types of Multivariate Relationships
With contingency tables, we covered:
1. Spuriousness (not likely in your paper)
2. Causal chains
3. Multiple causes (independent effects)
4. Suppressor variables
5. Interaction effects
With regression, we can test for all of these.
1. Spurious relationships
• Initially an association is documented, yet with a control, the initial relationship disappears
Evidence in regression:
• Initial bivariate regression has a statistically significant slope or odds ratio
• When the control variable(s) are introduced, the coefficient is no longer significant
Bivariate:
X1 → Y
Multivariate:
X2 → X1
X2 → Y
(X2 is a common cause of both X1 and Y; once X2 is controlled, the association between X1 and Y disappears.)
Example: We conduct research on a sample of FORD assembly line workers and document a positive relationship between “Salary” and “Absenteeism”.
One might speculate that this relationship is spurious:
Age → Salary
Age → Absenteeism
(age affects both salary and absenteeism)
2. Causal chains (chain relationships)
• A relationship exists between X1 and Y at the bivariate level, which is modified with the addition of control variable(s).
• Consider:
Bivariate: X1 → Y
Causal chain: X1 → X2 → Y
In this case, X1 is said to be the antecedent variable in the causal chain, whereas X2 is referred to as an “intervening” variable.
Consider the following example:
We consider gender and income; what sorts of intervening variables might explain the initial relationship?
SEX → ? → income
• Assume we are examining the relevance of “sex” to “market earnings”.
• Sex (0 = female; 1 = male) → Market income
• Run a linear regression:
Men are earning $18,111 more than women.
The beta (.154) suggests a moderate effect.
Why such a gap?
How about the simple fact that women are more likely to work part-time?
Sex (0 = female; 1 = male) → # hrs worked (weekly) → Market income
Run a regression with both independent variables
The difference persists! Women are now making about $12,000 less, yet the effect is not as strong.
Part of the initial relationship between sex and income is explained by the intervening variable (hours worked). Sex (as an antecedent variable) continues to be important, even after controlling for hours worked.
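One common way to summarize how much of the gap the intervening variable accounts for is the proportional reduction in the sex coefficient between the two models. This is a rough sketch using the slide's figures; the $12,000 is rounded, so the result is approximate:

```latex
\frac{b_{\text{bivariate}} - b_{\text{controlled}}}{b_{\text{bivariate}}}
  \approx \frac{18{,}111 - 12{,}000}{18{,}111} \approx 0.34
```

In other words, hours worked accounts for roughly a third of the initial gap, and roughly two thirds remains after the control.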
• If we control for X2, there are various possibilities for the initial relationship (X1 and Y):
  - the initial effect of X1 on Y might disappear completely;
  - the initial effect of X1 on Y is weakened (this is the most common outcome);
  - the initial effect of X1 on Y can even get larger (rarer, but it can happen).
• Note: the results in multiple regression can sometimes look the same as with spuriousness (i.e. the initial relationship disappears); the major difference is in interpretation.
• Also: although the direct effect of X1 on Y might disappear, X1 is still involved in our “causal explanation” as an “indirect” cause (through X2).
• Another example:
• Return to my initial hypothesis:
• Canadian immigrants are more likely to experience low income than other Canadians.
• This relationship is expected to be partially explained by “length of residence in Canada”.
• Length of residence in Canada → Low income
We create a variable for this purpose, to be run in logistic regression:
YR OF IMMIGRATION → Low income
0 – Canadian born (set as reference category)
1 – immigrated prior to 1980
2 – immigrated 1980-1999
3 – immigrated 2000 or later (recent immigrant)
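A categorical variable like this enters a regression as a set of 0/1 indicator ("dummy") variables, one per category except the reference. Statistical packages typically build these for you once a variable is declared categorical; the sketch below, with names of our own choosing, only shows the coding itself.

```python
# Category codes for YR OF IMMIGRATION as defined above; 0 (Canadian born) is the reference.
CATEGORIES = {
    0: "Canadian born",
    1: "immigrated prior to 1980",
    2: "immigrated 1980-1999",
    3: "immigrated 2000 or later",
}
REFERENCE = 0

def dummy_code(value, categories=CATEGORIES, reference=REFERENCE):
    """Return one 0/1 indicator per non-reference category.

    The reference category is the case where every indicator is 0, so each
    coefficient compares one immigrant group with the Canadian born.
    """
    if value not in categories:
        raise ValueError(f"unknown category code: {value}")
    return [1 if value == code else 0
            for code in sorted(categories) if code != reference]

for code, label in CATEGORIES.items():
    print(f"{code} ({label}): {dummy_code(code)}")
```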
Immigrants who came to Canada in 2000 or later are the most disadvantaged: 314.1 per cent higher odds of being poor relative to our reference category, the Canadian born: (4.141 – 1.0) × 100.
Immigrants who came in 1980-1999 still have higher odds: 79 per cent higher odds of poverty, (1.790 – 1.0) × 100.
Interestingly, the immigrants who arrived prior to 1980 have lower odds: (0.744 – 1.0) × 100 = –25.6, i.e. 25.6 per cent lower.
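The conversion used above, from an odds ratio (the exponentiated logistic coefficient, exp(b)) to a percentage change in the odds, is a one-line function. A small sketch applied to the three odds ratios from the slide:

```python
def pct_change_in_odds(odds_ratio):
    """Percent change in the odds relative to the reference category."""
    return (odds_ratio - 1.0) * 100

# Odds ratios for each period of immigration, relative to the Canadian born.
for label, odds_ratio in [("2000 or later", 4.141),
                          ("1980-1999", 1.790),
                          ("prior to 1980", 0.744)]:
    print(f"{label}: {pct_change_in_odds(odds_ratio):+.1f}%")
```

This prints +314.1%, +79.0% and -25.6%: odds ratios above 1 mean higher odds than the reference group, below 1 mean lower odds.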
YR OF IMMIGRATION → Low income
• Might “Knowledge of Official Language” be relevant in this context, as an important intervening variable? Is this why recent immigrants are struggling?
YR OF IMMIGRATION → Language → Low income
• Might an important reason why recent immigration matters so much in explaining “poverty” be the simple fact that immigrants are less likely to converse in English and/or French (Canada’s official languages)?
YR OF IMMIGRATION → Language → Low income
Language:
0 – English (our reference category)
1 – French
2 – English and French
3 – No knowledge of English or French

Odds of low income relative to the Canadian born:
                                  Without language   With language
                                  control            control
1 – immigrated prior to 1980      25.6% lower        24.9% lower
2 – immigrated 1980-1999          79.0% higher       75.4% higher
3 – immigrated 2000 or later      314.1% higher      303.3% higher

The immigrant effects are only slightly weaker after introducing the language control.
Persons with no knowledge of either English or French have 67.3 per cent higher odds of poverty.
3. Multiple causes (independent effects)
• Occurs when independent variables have separate effects on the dependent variable
• The introduction of controls has little influence on initial bivariate associations
X1 → Y
X2 → Y
X3 → Y
X4 → Y
• The slope in simple regression is very similar to that obtained in multiple regression.
• e.g. age, sex, urban/rural residence and province all have independent effects on Y.
NOTE: in the context of my research on immigration and low income:
Immigrant → low income
I would want to control for age and sex as relevant background variables:
Immigrant → low income, controlling for Sex and Age
Do immigrant status, age and sex all have independent effects on low income?
Of these variables, how does each impact low income when controlling for the others? (Which seems to be most important?)
Or are immigrants more likely to be poor merely because they are much younger? Or because there are more women than men among them? Are the effects “not independent”?
Run initial logistic regression:
Dependent variable (0 = not low income; 1 = low income)
Independent variable (0 = not immigrant; 1 = immigrant)
Canadian immigrants are more likely to experience low income than other Canadians.
Odds of low income are 95.6 per cent higher for immigrants relative to other Canadians.
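With a single 0/1 predictor, exp(b) from a logistic regression equals the odds ratio from the corresponding 2×2 table. The 95.6 per cent above therefore corresponds to an odds ratio of 1.956 (b ≈ 0.671). The sketch below fits such a model by Newton-Raphson on grouped data; the counts are hypothetical, and a real analysis would use a statistics package.

```python
import math

def fit_logit_one_dummy(groups, iterations=50, tol=1e-12):
    """Fit logit(P(Y=1)) = a + b*x by Newton-Raphson on grouped data.

    groups: list of (x, y, count) with x and y coded 0/1.
    Returns (a, b); exp(b) is the odds ratio for x = 1 versus x = 0.
    """
    a = b = 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y, count in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = count * (y - p)        # score (gradient) contribution
            g_a += resid
            g_b += resid * x
            w = count * p * (1.0 - p)      # information (curvature) contribution
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab ** 2
        step_a = (h_bb * g_a - h_ab * g_b) / det   # inverse-information times score
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if max(abs(step_a), abs(step_b)) < tol:
            break
    return a, b

# Hypothetical counts: x = 1 immigrant, y = 1 low income.
groups = [(1, 1, 300), (1, 0, 700), (0, 1, 180), (0, 0, 820)]
a, b = fit_logit_one_dummy(groups)
table_or = (300 / 700) / (180 / 820)   # odds for immigrants / odds for others
print(f"exp(b) = {math.exp(b):.4f}, 2x2 odds ratio = {table_or:.4f}")
```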
What of the relevance of age/sex?
Run a second regression (with controls for age/sex)
RESULTS of MULTIPLE REGRESSION:
Initial regression: odds of low income are 95.6 per cent higher for immigrants relative to other Canadians.
We introduce controls for age and sex: sex (0 = female; 1 = male); age (5-year age groups).
After controlling for age and sex, odds of low income are now 90.7 per cent higher for immigrants relative to other Canadians.
There is a slight decline in the effect of immigrant status when controlling for these variables.
The effects seem to be largely independent here.
4. Suppressor variables
• Initially we find no relationship between two variables (i.e., a non-significant slope);
• after introducing control variable(s), the slope becomes significant.
• OR
• Initially we find a relationship between two variables;
• the relationship gets stronger after the control variable is introduced.
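For two predictors, the standardized coefficient of X1 can be written in terms of the three pairwise correlations (a standard textbook result): β1 = (r_y1 − r_y2·r_12) / (1 − r_12²). It shows how the control variable's correlations decide which pattern appears. If r_y2·r_12 equals r_y1 the effect vanishes (spuriousness). If r_12 = 0 the slope is unchanged (independent effects). If r_y2·r_12 is negative the slope grows (suppression). This would fit the education example below if men are less likely to have attended university while education raises income. The correlations in this sketch are hypothetical.

```python
def standardized_slope_x1(r_y1, r_y2, r_12):
    """Standardized coefficient of X1 when Y is regressed on X1 and X2.

    r_y1, r_y2: correlations of Y with X1 and X2; r_12: correlation of X1 with X2.
    """
    if abs(r_12) >= 1:
        raise ValueError("X1 and X2 must not be perfectly correlated")
    return (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)

# Hypothetical correlations illustrating three patterns
print(standardized_slope_x1(0.30, 0.60, 0.50))   # spurious: effect drops to zero
print(standardized_slope_x1(0.10, 0.30, -0.20))  # suppressor: grows beyond r_y1 = 0.10
print(standardized_slope_x1(0.25, 0.40, 0.0))    # independent: equals r_y1
```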
Example: Consider persons aged 18-39 in the 2006 Census Public Use File.
Recall earlier: for all persons 18+, the difference was $18,000.
Sex (0 = female, 1 = male): among those aged 18-39, men are still earning $11,587 more than women.
What’s going on here?
What do you predict if we control for education?
Are young women paid less because they are less educated?
The effect of sex is even greater when we control for whether or not someone attended university!
Education does not explain the lower income of women! In fact, education serves as a suppressor variable (the effect of gender gets even stronger)!
5. Interaction effects
• When you enter a statistical control, the original bivariate
association differs by category of the control variable
• Previous examples with contingency tables:
Contingency Table Relating Education, Income and Place of Birth

             Foreign born                              Canadian born
Education    High income   Low income    Total        High income   Low income    Total
High         125 (35.7%)   225 (64.3%)   350          125 (83.3%)    25 (16.7%)   150
Low           65 (34.2%)   125 (65.8%)   190           80 (39.0%)   125 (61.0%)   205
Total        190           350           540          205           150           355
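The interaction in this table can be summarized by computing the education gap in high income separately for each place of birth. A short sketch using the counts above:

```python
# Counts from the table above: (high income, low income) by place of birth and education.
TABLE = {
    "Foreign born": {"High": (125, 225), "Low": (65, 125)},
    "Canadian born": {"High": (125, 25), "Low": (80, 125)},
}

def pct_high_income(high, low):
    """Row percentage with high income."""
    return 100 * high / (high + low)

def education_gap(group):
    """Percentage-point difference in high income, high vs low education."""
    rows = TABLE[group]
    return pct_high_income(*rows["High"]) - pct_high_income(*rows["Low"])

for group in TABLE:
    print(f"{group}: {education_gap(group):.1f} percentage points")
```

Education makes a gap of about 1.5 points among the foreign born but about 44.3 points among the Canadian born. The effect of education differs by category of the control variable, which is what an interaction means.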
Interaction Effects
• Interaction can also be tested with regression when working with quantitative data (interval/ratio)
• When testing for interaction using multiple regression, we are testing whether the effect of one independent variable (X1) on Y differs across the values or categories of another independent variable (X2)
• Regression tests for this by introducing “interaction terms” into the multiple regression
Interaction Effects
Modeling Interaction effects:
• If variables interact, we can improve the fit of our
model by introducing “cross-product” terms (also
referred to as “interaction terms”)
• A “cross-product” term is an artificial variable created
by multiplying two variables together
• To test for an interaction between X1 and X2 in
explaining Y, include a new variable that is simply the
product of X1 and X2:
$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$$
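A minimal sketch of building the cross-product term by hand and fitting Ŷ = a + b1X1 + b2X2 + b3X1X2 by ordinary least squares. It assumes NumPy and uses simulated data with made-up coefficients (2.0, 1.5, -0.5, 0.8); because the simulated Y has no error term, least squares recovers these values exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Hypothetical "true" coefficients, chosen only for illustration.
a, b1, b2, b3 = 2.0, 1.5, -0.5, 0.8
y = a + b1 * x1 + b2 * x2 + b3 * x1 * x2   # noise-free so the fit is exact

# The cross-product (interaction) term is just X1 multiplied by X2.
x1_x2 = x1 * x2

# Design matrix: constant, both main effects, and the interaction term.
X = np.column_stack([np.ones(n), x1, x2, x1_x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 3))
```

Statistical packages do the same thing when you create the product variable yourself and enter it alongside X1 and X2.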
Interaction Effects
• If the slope of our interaction term (b3) is statistically significant, we have evidence of an interaction between X1 and X2 in explaining Y
• If b3 is not significant, it is best to drop this “interaction term” from the regression
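To judge whether b3 is significant, regression software divides each coefficient by its standard error (t = b / SE) and compares the result to a t distribution with n - k degrees of freedom (roughly |t| > 2 for p < .05 in large samples). The sketch below computes these quantities by hand with NumPy on simulated data; the function name `ols_with_t` and the data-generating values are assumptions for illustration.

```python
import numpy as np

def ols_with_t(X, y):
    """OLS coefficients, standard errors and t-values (t = b / SE)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance matrix of the estimates
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

# Hypothetical data: the true interaction coefficient is 0.8, plus random error.
rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + 0.8 * x1 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
coef, se, t = ols_with_t(X, y)
print(f"b3 = {coef[3]:.3f}, SE = {se[3]:.3f}, t = {t[3]:.2f}")
```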
• Example: 2 variables used to predict “income”
• -> immigrant (0 = no; 1 = yes)
• -> university (0 = no; 1 = yes)
• As in the cross-tab, we will test whether immigrant status and education interact in explaining market income
• Using linear regression (interval/ratio dependent variable)
In our initial regression,
we include both variables:
University (0 = no; 1 = yes)
Immigrant (0 = no; 1 = yes)
University grads make more (+$33,548) when controlling for immigrant status
Immigrants make less (-$7,879) when controlling for education
• We want to determine whether there’s an
interaction effect.
• Does the effect of having a university
education differ for immigrants relative to
the Canadian born?
In this case, we clearly have an interaction effect:
the interaction term is significant!
We can conclude that the effect of education is not the same for immigrants and the Canadian born.
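With two 0/1 dummies and their product, the regression has one coefficient per group, so it reproduces the four group means exactly: a is the mean for Canadian-born non-graduates, b_UNIV is the degree effect among the Canadian born, b_IMMIG is the immigrant gap among non-graduates, and the interaction coefficient is how much the degree effect for immigrants differs from that for the Canadian born. A sketch with hypothetical group means (not the lecture's results), assuming NumPy:

```python
import numpy as np

# Hypothetical mean incomes for the four groups, keyed by (UNIV, IMMIG).
cell_means = {(0, 0): 30000, (1, 0): 70000, (0, 1): 25000, (1, 1): 50000}

univ, immig, income = [], [], []
for (u, m), mean in cell_means.items():
    for dev in (-5000, 0, 5000):        # three people per group, centred on the mean
        univ.append(u)
        immig.append(m)
        income.append(mean + dev)

univ, immig, income = map(np.array, (univ, immig, income))
X = np.column_stack([np.ones(len(income)), univ, immig, univ * immig])
a, b_univ, b_immig, b_inter = np.linalg.lstsq(X, income, rcond=None)[0]

print(f"a = {a:,.0f}, b_UNIV = {b_univ:,.0f}, "
      f"b_IMMIG = {b_immig:,.0f}, b_UN_IMM = {b_inter:,.0f}")
```

Here the interaction coefficient is (50,000 - 25,000) - (70,000 - 30,000) = -15,000: in this made-up example a degree is worth $15,000 less for immigrants.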
Interaction Effects
Example (pages 406-407 of text):
• Examining the relationship between:
Y: mental impairment score (index)
X1: life events (number of stressful events)
X2: SES (index of socioeconomic status)
• Our research hypotheses: X1 has a positive effect on Y;
X2 has a negative effect on Y
• As # of stressful events ↑, mental impairment ↑
• As SES ↑, mental impairment ↓
Interaction Effects
Results:
Coefficients (Model 1, unstandardized)

| Variable | B | Std. Error | t | Sig. |
|---|---|---|---|---|
| (Constant) | 28.2298 | 2.1742 | 12.984 | .0001 |
| LIFE | 0.1033 | 0.0325 | 3.177 | .0030 |
| SES | -0.0975 | 0.0291 | -3.351 | .0019 |
Interaction Effects
• Assume that we hypothesize an interaction: upper-SES
persons are better able to handle stressful life events
than are lower-SES persons
• That is, LIFE events are expected to interact with SES:
as SES increases, the effect of LIFE events on mental
impairment decreases
• Our results with the interaction term:
Interaction Effects
Coefficients (Model 1 with interaction term, unstandardized)

| Variable | B | Std. Error | t | Sig. |
|---|---|---|---|---|
| (Constant) | 26.0366 | 3.9488 | 6.594 | .0001 |
| LIFE | 0.1559 | 0.0853 | 1.826 | .0761 |
| SES | -0.0604 | 0.0627 | -0.965 | .3409 |
| LIFE*SES | -0.0008 | 0.0013 | -0.668 | .5087 |
The interaction term is not significant (p = .5087), and the
direct effects of SES and LIFE are no longer significant.
Thus, we don’t have support for the hypothesis that SES
and LIFE interact to affect mental impairment.
Interaction Effects
• What if the interaction of LIFE * SES were significant?
• The negative slope would suggest that the effect of Life
Events on Mental Impairment gets weaker as SES gets
higher
• That is, Life Events are more likely to lead to Mental
Impairment for those of lower socio-economic status
• If the coefficient were positive? Life Events would have a
stronger impact on Mental Impairment for people of
higher SES
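Rearranging the fitted equation shows why the sign of the interaction term matters: Ŷ = a + b2·SES + (b1 + b3·SES)·LIFE, so the slope of LIFE at a given SES is b1 + b3·SES. The sketch plugs in the coefficients from the interaction model (0.1559 for LIFE, -0.0008 for LIFE*SES); the SES values 0, 50 and 100 are arbitrary illustrative points, and since the interaction was not significant this is only a what-if calculation.

```python
# Unstandardized coefficients from the model with the LIFE*SES term.
b_life = 0.1559
b_life_ses = -0.0008

def life_effect(ses):
    """Slope of LIFE at a given SES value: b_LIFE + b_LIFE*SES x SES."""
    return b_life + b_life_ses * ses

# Arbitrary SES values chosen only to show how the slope changes.
for ses in (0, 50, 100):
    print(f"SES = {ses:3d}: effect of one more life event = {life_effect(ses):.4f}")
```

The slope falls from 0.1559 at SES = 0 to 0.0759 at SES = 100: with a negative interaction coefficient, each additional life event matters less as SES rises.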
Interaction Effects
• Again, if the interaction effect is not significant, drop it from the model
• You MUST ALWAYS include both variables you hypothesize to be interacting (called the “main effects”), along with the interaction term, in your regression model
• Do not introduce every potential interaction term into your multivariate model just to see what turns out significant; before testing for an interaction effect, you should be able to justify it theoretically
• If an interaction effect really exists, then it makes no sense to interpret the main effects on their own: each main-effect coefficient then describes the effect of that variable only when the other interacting variable equals 0
• PART C. Model Building
Examining the likelihood of low income among Canadians, with a
specific emphasis on the experience of immigrants.
The incidence of low income is expected to be highest among recent
immigrants and lowest among well-established immigrants.
Yet the disadvantage of being an immigrant is expected to persist,
even among more established immigrants (even after controlling
for other relevant variables: language, education, sex and age)
• Model 1: year of immigration -> incidence of low income
The disadvantage of being an immigrant is expected to
persist, even among more established immigrants, even
after controlling for other relevant variables (sex, language,
age and education).
You can simply plug in values to help with interpretation if you wish.
The above results give us the following equation:
Y = 27864.481 + 38215.784 (UNIV) - 4349.417 (IMMIG) - 15237.378 (UN_IMM)
Recall that the 2 variables used to predict “income” are coded as follows:
-> UNIV (0 = no; 1 = yes)
-> IMMIG (0 = no; 1 = yes)
Note: UN_IMM = UNIV × IMMIG, so if either UNIV = 0 or IMMIG = 0, then UN_IMM must be 0 as well
What is the effect of having a university education for the Canadian born (IMMIG = 0)?
Y = 27864.481 + 38215.784 (UNIV) - 4349.417 (0) - 15237.378 (0)
It means a gain of $38,215.78 for those with a degree
What is the effect of having a university education for an immigrant (IMMIG = 1, so UN_IMM = UNIV)?
Y = 27864.481 + 38215.784 (UNIV) - 4349.417 (1) - 15237.378 (UNIV)
It means a gain of $22,978.40 (38,215.78 - 15,237.38) for those
with a degree
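The same plug-in logic can be scripted. This sketch (plain Python; the function name `predicted_income` is hypothetical) uses the coefficients from the equation above, computes the predicted income with and without a degree for each group, and takes the difference.

```python
# Coefficients from the fitted equation above.
A, B_UNIV, B_IMMIG, B_UN_IMM = 27864.481, 38215.784, -4349.417, -15237.378

def predicted_income(univ, immig):
    """Plug 0/1 codes into Y = A + B_UNIV*UNIV + B_IMMIG*IMMIG + B_UN_IMM*UN_IMM."""
    un_imm = univ * immig  # interaction term: 1 only for immigrants with a degree
    return A + B_UNIV * univ + B_IMMIG * immig + B_UN_IMM * un_imm

gains = {}
for immig, label in ((0, "Canadian born"), (1, "Immigrant")):
    # Effect of a degree = predicted income with degree minus without, holding IMMIG fixed.
    gains[label] = predicted_income(1, immig) - predicted_income(0, immig)
    print(f"{label}: a degree adds about ${gains[label]:,.0f}")
```

The IMMIG coefficient cancels out of each difference, which is why only b_UNIV and the interaction coefficient determine the effect of a degree.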
NOTE: