March 19 - King's Facultydkerr.kingsfaculty.ca/dkerr/assets/lecture11_3306_2015... · 2016. 2....


Page 1: March 19 - King's Facultydkerr.kingsfaculty.ca/dkerr/assets/lecture11_3306_2015... · 2016. 2. 18. · Logistic Regression.. In Soc 2206, you learned a bit about sampling strategies..

March 19th
Tamara Stallard; Cynthia Wheeler; Chris Tuskan; Daniella Odesa; Kristine Petitta; Taylor Hood; Michelle Janulis; Diana Rusin

March 26th
Nicole Bullock; Kelly Robertson; Daniel Wood; Steph Boak; Pouya Moghaddam; Joanna Lee; Nicole Corbo; Misghana Ghebredingle

April 2nd (last class)
Blake Huggins; Jamie-Lee Bossenberry; Ryan O’Quinn; Sarah Stewart; Brittany Jenkins; Ryan Higgins; Greg Morrow; Baley Gofton; Cynthia Hill; Brandi Spitzig

Class presentation (10 minutes each)
• Brief introduction to your topic
• Dependent variable & independent variables
• Sample (what’s your target population?)
• What are you anticipating? (optional: causal diagram)
• What has previous literature suggested that you can expect?
• Findings: optional
• 1-page handout and/or PowerPoint (for my records)

• Assignment 3: due next week, March 19th
• Do note: in this assignment I also get you to run the model with the “WEIGHTED” data
• Today I will begin by talking a bit about this (sampling and sample weights)
• To be followed by:
• Tips on creating models, regardless of whether we are working with OLS or Logistic Regression

In Soc 2206, you learned a bit about sampling strategies:

1. Simple random sampling (SRS): everyone in the target population has an equal chance of being selected.

2. Stratified samples: first divide your sampling frame into strata (e.g. provinces, or anything else), and then use SRS within each stratum.
Proportionate: the number sampled within each stratum is proportionate to its population
Disproportionate: the number sampled within each stratum is not proportionate to its population
Typical of Stats Can: stratified by province, over-representing some provinces and under-representing others in their samples.

3. Cluster sampling (involving clusters, which can be almost anything: geographies, institutions, social groups, etc.)
Example: multistage sample of Canadian households (more than 10 million in Canada)
1. Randomly select metropolitan areas across Canada (35 CMAs currently)
2. Within the selected CMAs, randomly select census tracts (typically 100s)
3. Within the selected census tracts, randomly select city blocks (typically dozens within each)
4. Within the selected city blocks, randomly select households (typically 100s)

Sample size is important, BUT:
Beyond a certain size (e.g. 5000), there are limited returns on reducing sampling error

ANOTHER IMPORTANT POINT:
In predicting “sampling error”, it is “SAMPLE SIZE” that counts and not the size of the “targeted POPULATION”
In the above example, to get a margin of error of +/- 2.5 %,
you require a sample size of 1,534 persons for a population of 1,000,000
you require a roughly equal sample size for a population of 100,000
In other words, in Stats Can’s surveys, a sample of roughly a couple of thousand in each province would give estimates with roughly equivalent sampling error for all provinces, regardless of the size of their respective populations.
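One way to check these figures is the usual sample-size formula for a proportion, with a finite population correction. The slide does not say how 1,534 was obtained, so treat this as a sketch under assumptions: a 95% confidence level (z = 1.96), the worst-case proportion p = 0.5, and the function name `required_sample_size` are all chosen here for illustration.

```python
def required_sample_size(population, margin=0.025, z=1.96, p=0.5):
    """Sample size for estimating a proportion within +/- margin.

    Assumptions (not stated on the slide): 95% confidence (z = 1.96),
    worst-case proportion p = 0.5, finite population correction applied.
    """
    # Sample size needed for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction: shrinks n0 slightly for smaller populations
    return n0 / (1 + (n0 - 1) / population)


for pop in (1_000_000, 100_000):
    print(pop, round(required_sample_size(pop)))
```

Under these assumptions the two required sample sizes differ by only about 20 persons, even though one population is ten times the size of the other.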

Assume we take a simple random sample (SRS) of Canada:
Every individual is given an equal chance of selection in the final sample
Let’s assume that we interview 2000 Canadians in our sample (n = 2000)
Probability of selection: n/P, i.e. 2000/35,540,419 = 0.000056274
Each person in this sample has the same “weight” (inverse of the probability of selection):
W = P/n = 35,540,419/2,000 = 17,770.2

2014 population (P) and “unweighted” sample (n)
                              P             n
Canada                        35,540,419    2000
Newfoundland and Labrador        526,977      30
Prince Edward Island             146,283       8
Nova Scotia                      942,668      53
New Brunswick                    753,914      42
Quebec                         8,214,672     462
Ontario                       13,678,740     770
Manitoba                       1,282,043      72
Saskatchewan                   1,125,410      63
Alberta                        4,121,692     232
British Columbia               4,631,302     261

In an SRS (n = 2000) of Canadians, each person in our “unweighted” sample would represent 17,770 persons…
It is possible to “weight your sample” by making each case in your sample represent 17,770.2 cases (your weighted sample would then look like your population)
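The arithmetic on this slide can be reproduced in a few lines. This is only an illustrative sketch: the population figures are the 2014 values from the table, while the variable names and the choice to round the expected provincial counts are assumptions made here.

```python
# 2014 population figures taken from the slide (a subset of provinces)
population = {
    "Canada": 35_540_419,
    "Prince Edward Island": 146_283,
    "Quebec": 8_214_672,
    "Ontario": 13_678_740,
}
n = 2000  # size of the simple random sample

# Every person has the same probability of selection in an SRS
p_select = n / population["Canada"]
# The weight is the inverse of that probability: persons represented per case
weight = population["Canada"] / n

# Expected number of sampled cases per province in a national SRS
expected = {
    prov: round(pop * p_select)
    for prov, pop in population.items()
    if prov != "Canada"
}

print(f"probability of selection: {p_select:.9f}")
print(f"weight: {weight:.1f}")
print(expected)
```

The expected counts match the “unweighted” sample column, which is why a national SRS leaves so few cases in a small province such as PEI.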

While this sample works well at the national level (n = 2000), we cannot draw inferences for specific provinces (for example, PEI = 8 cases)
For this reason: all of Stats Can’s surveys make sure that they have at least a few 1000 cases for “all Provinces”
Why? To get reasonable quality statistical estimates for all provinces…

The GIS, NLSCY and Health Survey are all:
Stratified Samples (Disproportionate to size)
1. Divide Canada up into provinces
2. Take a sufficient sample from each province to get good estimates
3. Simple random sample within provinces

Unweighted sample
Note: Probability of selection differs by province
PEI: 1500/146,283
Ontario: 4000/13,678,740
Similarly, weights differ by province
PEI: 146,283/1500 = 97.5
Ontario: 13,678,740/4000 = 3419.7
Unweighted sample here: roughly 6.00 per cent (1500/25,000) of the sample is in PEI. In the population it is 0.41 per cent (146,283/35,540,419).
NOTE: WEIGHTED RESULTS WILL HAVE THE SAME DISTRIBUTION AS THE POPULATION
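A small sketch of how the provincial weights work, using only the two provinces quoted on the slide. The dictionary layout and variable names are assumptions for illustration, and the shares are computed within these two provinces only, not within the full sample.

```python
# Population and sample sizes quoted on the slide for two strata
strata = {
    "PEI": {"population": 146_283, "sample": 1500},
    "Ontario": {"population": 13_678_740, "sample": 4000},
}

# Weight = inverse of the probability of selection within each province
weights = {
    name: s["population"] / s["sample"] for name, s in strata.items()
}

# Weighted count = sample size x weight, which recovers the population
weighted_counts = {
    name: strata[name]["sample"] * weights[name] for name in strata
}

# PEI's share of the two-province total, before and after weighting
unweighted_share = strata["PEI"]["sample"] / sum(
    s["sample"] for s in strata.values()
)
weighted_share = weighted_counts["PEI"] / sum(weighted_counts.values())

print(weights)
print(f"unweighted PEI share: {unweighted_share:.1%}")
print(f"weighted PEI share:   {weighted_share:.1%}")
```

After weighting, PEI's share falls back to its share of the combined population of the two provinces, which is the sense in which weighted results mirror the population.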

In assignment 3, I have you run your “model results” with the appropriate “weights” (easy to do)
This corrects for potential biases due to your sampling strategy.

e.g. with the Census, I focus on “likelihood of low income” as my dependent variable
e.g. assume my research interests relate to the higher than average incidence of low income among immigrants in Canada
Immigrant status and the likelihood of low income will be my primary emphasis
Other variables? What’s important in the literature?

Recommend either Social Science Citation Index or Sociological Abstracts.

Relevant in this context:
What, if any, sub-sample should be selected for your research…
Therefore I ask you in Assignment 3 (necessary step):
No need to focus on a specific subsample here: I will be comparing immigrants with other Canadians (the full population will be involved)

Find 5 studies that explicitly focus on your topic: why immigrants are more likely to experience low income
Note: your literature review is brief, so stick to research directly related to your research.
http://www.kings.uwo.ca/academics/sociology/resources-and-information/sociology-department-academic-awards/

Example:
This study hypothesizes:
Canadian immigrants are more likely to experience low income than other Canadians.
This relationship is expected to be partially explained by “length of residence in Canada”.
The incidence of low income is expected to be highest among recent immigrants and lowest among well-established immigrants.
Yet the disadvantage of being an immigrant is expected to persist, even among more established immigrants, and even after controlling for other relevant variables (sex, language, age and education).


Relevant in developing hypotheses:

Types of Multivariate Relationships

With contingency tables, we covered:

1. Spuriousness (not likely in your paper)

2. Causal chains

3. Suppressor variables

4. Multiple causes (independent effects)

5. Interaction effects

With regression, we can test for all of these


1. Spurious relationships
• Initially an association is documented, yet with a control, the initial relationship disappears
Evidence in regression:
• Initial bivariate regression has a statistically significant slope or odds ratio
• When the control variable(s) are introduced, the coefficient is no longer significant
Bivariate: X1 -> Y
Multivariate: X2 -> X1 and X2 -> Y (X2 influences both)

Types of Multivariate Relationships
Example: We conduct research on a sample of FORD assembly line workers and document a positive relationship between “Salary” and “Absenteeism”:
One might speculate a spurious relationship:
Age -> Salary
Age -> Absenteeism

Types of Multivariate Relationships
2) Chain relationships
• A relationship exists between X1 and Y at the bivariate level, which is modified with the addition of control variable(s)
• Consider:
X1 -> Y
X1 -> X2 -> Y
In this case, X1 is said to be the antecedent variable in the causal chain, whereas X2 is referred to as an “intervening” variable.
Consider the following example:
We consider gender and income; what sorts of intervening variables might explain the initial relationship?
SEX -> ? (X2) -> income

• Assume we are examining the relevance of “sex” to “market earnings”
• Sex -> Market income
• (0-female; 1-male)
• Run a linear regression:
Men are earning $18,111 more than women
The beta suggests a moderate effect (.154)


Why such a gap?
How about the simple fact that women are more likely to work part-time?
Sex -> # hrs worked (weekly) -> Market income
(0-female; 1-male)
Run a regression with both independent variables:
The difference persists!! Women are now making $12,000 less, yet the effect is not as strong.
Part of the initial relationship between sex and income is explained by the intervening variable (hours worked). Sex (as an antecedent variable) continues to be important, even after controlling for hours worked.
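The pattern described here (a sex coefficient that shrinks but does not vanish once hours worked is added) can be sketched with a toy dataset. Everything below is hypothetical: the eight cases, the income formula and the helper names are invented for illustration and are not the Census results shown in class. The slopes use the standard closed-form OLS formulas for one and two predictors.

```python
def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Hypothetical data: women (0) work fewer hours on average than men (1)
sex = [0, 0, 0, 0, 1, 1, 1, 1]
hours = [20, 30, 40, 40, 40, 40, 40, 50]
# Invented income formula: a direct effect of sex plus an effect of hours
income = [1000 + 500 * h + 5000 * s for s, h in zip(sex, hours)]

# Bivariate regression: income on sex only
b_sex_bivariate = cross(sex, income) / cross(sex, sex)

# Multiple regression: income on sex and hours (two-predictor formulas)
den = cross(sex, sex) * cross(hours, hours) - cross(sex, hours) ** 2
b_sex_controlled = (
    cross(sex, income) * cross(hours, hours)
    - cross(hours, income) * cross(sex, hours)
) / den
b_hours = (
    cross(hours, income) * cross(sex, sex)
    - cross(sex, income) * cross(sex, hours)
) / den

print(b_sex_bivariate, b_sex_controlled, b_hours)
```

In this toy example half of the bivariate gap runs through hours worked (the intervening variable), and the other half remains as a direct effect of sex.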

Types of Multivariate Relationships
2) Chain relationships
• A relationship exists between X1 and Y at the bivariate level, which is modified with the addition of control variable(s)
• Consider:
X1 -> Y
X1 -> X2 -> Y
• If we control for X2, there are various possibilities for the initial relationship (X1 and Y):
• - the initial effect of X1 on Y might disappear completely
• - the initial effect of X1 on Y is weakened (this is the most common outcome)
• - the initial effect of X1 on Y can even get larger (rarer, but it can happen)
• Note: the results in multiple regression can sometimes look the same as with spuriousness (i.e. the initial relationship disappears). The major difference is in interpretation
• Also: although the effect of X1 on Y might disappear, X1 is still involved in our “causal explanation” as an “indirect” cause

• Another example:
• Return to my initial hypothesis:
• Canadian immigrants are more likely to experience low income than other Canadians.
• This relationship is expected to be partially explained by “length of residence in Canada”.
• Length of residence in Canada -> Low income
We create a variable for this purpose, to be run in logistic regression:
YR OF IMMIGRATION -> Low income
0 – Canadian born (set as reference category)
1 – immigrated prior to 1980
2 – immigrated 1980-1999
3 – immigrated 2000 or later (recent immigrant)

Immigrants who came to Canada since 2000 are the most disadvantaged:
314.1 per cent higher odds of being poor relative to our reference category, the Canadian born: (4.141 – 1.0) * 100 = 314.1
Immigrants who came 1980-1999 still have higher odds:
79 per cent higher odds of poverty: (1.790 – 1.0) * 100 = 79.0
Interestingly, the immigrants who arrived prior to 1980 have lower odds:
(0.744 – 1.0) * 100 = -25.6, i.e. 25.6 % lower
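The conversion from an odds ratio to a “per cent higher/lower odds” statement is the same calculation each time. A minimal sketch, using the three odds ratios quoted above (the function name and dictionary labels are chosen here for illustration):

```python
def pct_change_in_odds(odds_ratio):
    """Convert an odds ratio into a percentage change in the odds,
    relative to the reference category (odds ratio = 1.0)."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios quoted on the slide (reference category: Canadian born)
odds_ratios = {
    "immigrated prior to 1980": 0.744,
    "immigrated 1980-1999": 1.790,
    "immigrated 2000 or later": 4.141,
}

for group, ratio in odds_ratios.items():
    change = pct_change_in_odds(ratio)
    direction = "higher" if change > 0 else "lower"
    print(f"{group}: {abs(change):.1f} % {direction} odds of low income")
```

An odds ratio below 1.0 gives a negative value, which is reported as “lower” odds.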

YR OF IMMIGRATION -> Low income
• Might “Knowledge of Official Language” be relevant in this context, as an important intervening variable? Is this why recent immigrants are struggling?
YR OF IMMIGRATION -> Language -> Low income
• Might an important reason why recent immigration matters so much in explaining “poverty” be the simple fact that recent immigrants are less likely to converse in English and/or French (Canada’s official languages)?


YR OF IMMIGRATION -> Language -> Low income

Odds of low income relative to the Canadian born, before and after controlling for language:

YR OF IMMIGRATION                                  Before control    After control
0 – Canadian born (set as reference category)      (reference)       (reference)
1 – immigrated prior to 1980                       25.6 % lower      24.9 % lower
2 – immigrated 1980-1999                           79.0 % higher     75.4 % higher
3 – immigrated 2000 or later (recent immigrant)    314.1 % higher    303.3 % higher

Language:
0 – English (our reference category)
1 – French
2 – English and French
3 – NO KNOWLEDGE

Immigrants are only slightly less likely to be poor after introducing the control
Persons with no knowledge of either Eng/French have 67.3 % higher odds of poverty


Types of Multivariate Relationships
3. Multiple causes (independent effects)
• Occurs when independent variables have separate effects on the dependent variable
• The introduction of controls has little influence on initial bivariate associations
X1 -> Y
X2 -> Y
X3 -> Y
X4 -> Y
• Slope in simple regression is very similar to that obtained in multiple regression
• e.g. age, sex, urban/rural residence and province all have independent effects on Y

NOTE: in the context of my research on immigration and low income
Immigrant -> low income
I would want to control for age/sex as relevant background variables:
Immigrant -> low income
Sex -> low income
Age -> low income
Do all these variables have independent effects on low income?
Of these variables, how does each impact low income when controlling for the others? (Which seems to be most important?)
Or are immigrants more likely to be poor merely because they are much younger? Or because there are more women than men among them? Are the effects “not independent”?

Run initial logistic regression:
Dependent variable (0-not low income; 1- low income)
Independent variable (0-not immigrant; 2- immigrant)

Canadian immigrants are more likely to experience low income than other Canadians.
Odds of low income are 95.6 per cent higher for immigrants relative to other Canadians.
What of the relevance of age/sex??

Run a second regression (with controls for age/sex)
We introduce controls for “age and sex”: sex (0-female; 1-male); age (5 year age groups)
Initial regression: odds of low income are 95.6 per cent higher for immigrants relative to other Canadians.
RESULTS of MULTIPLE REGRESSION: odds of low income are now 90.7 per cent higher for immigrants relative to other Canadians, after controlling for age/sex.
Slight decline in the effect of Immigrant when controlling for these variables.
The effects seem to be largely independent here.

Types of Multivariate Relationships
4. Suppressor variables
• Initially we find no relationship between two variables (i.e., non-significant slope)
• After introducing control variable(s), the slope becomes significant
• OR
• Initially we find a relationship between two variables, and the relationship gets stronger after the control variable is introduced

Example: Consider women aged 18-39 in the 2006 Census Public Use File
Recall earlier: for all persons 18+, the difference was $18,000
Sex (0-female, 1-male): men are still earning $11,587 more than women
What’s going on here?
What do you predict if we control for education?
Are young women paid less because they are less educated?

The effect of sex is even greater when we control for whether or not someone attended university!
Education does not explain the lower income of women! In fact, education serves as a suppressor variable (the effect of gender gets even stronger)!


Types of Multivariate Relationships
5. Interaction effects
• When you enter a statistical control, the original bivariate association differs by category of the control variable
• Previous examples with contingency tables:

Contingency Table Relating Education, Income and Place of Birth

                  Foreign born                               Canadian born
Education    High income    Low income     Total       High income    Low income     Total
High         125 (35.7%)    225 (64.3%)    350         125 (83.3%)     25 (16.7%)    150
Low           65 (34.2%)    125 (65.8%)    190          80 (39.0%)    125 (61.0%)    205
Total        190            350            540         205            150            355
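To see the interaction in this table, compare the education gap in the percentage with high income within each place-of-birth group. A short sketch using the cell counts from the table (the data layout and function name are assumptions made here):

```python
# Cell counts from the table: (high income, low income) by education level
table = {
    "Foreign born": {"High": (125, 225), "Low": (65, 125)},
    "Canadian born": {"High": (125, 25), "Low": (80, 125)},
}


def pct_high_income(high, low):
    """Row percentage: share of the row with high income."""
    return 100.0 * high / (high + low)


# Education gap (in percentage points) within each place-of-birth group
gaps = {}
for group, rows in table.items():
    high_edu = pct_high_income(*rows["High"])
    low_edu = pct_high_income(*rows["Low"])
    gaps[group] = high_edu - low_edu
    print(f"{group}: {high_edu:.1f}% vs {low_edu:.1f}% (gap {gaps[group]:.1f})")
```

The gap is about 1.5 percentage points among the foreign born and about 44.3 among the Canadian born: the association between education and income differs by category of the control variable, which is what an interaction means.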


Interaction Effects

• Can also test for interaction with regression when working with quantitative data (interval/ratio)

• When testing for interaction using multiple regression, we are testing whether the effect of an independent variable (X1) on Y differs by category of another independent variable (X2)

• Regression can test for this by introducing “interaction terms” into the multiple regression

Interaction Effects
Modeling interaction effects:
• If variables interact, we can improve the fit of our model by introducing “cross-product” terms (also referred to as “interaction terms”)
• A “cross-product” term is an artificial variable created by multiplying two variables together
• If we want to test for an interaction between X1 and X2 in explaining Y, we include a new variable which is simply the product of X1 and X2:

$$\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$$


Interaction Effects

• If the slope of our interaction term (b3) is found to be significant, we find evidence of a significant interaction between X1 and X2 in explaining Y

• If b3 is not significant, it is best to drop this “interaction term” from the regression
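A minimal sketch of what “introducing a cross-product term” means in practice. The data are invented and noise-free (two 0/1 predictors, two cases per cell), and the small `ols` helper solves the normal equations in plain Python; in the course you would get the same coefficients, plus the significance test for b3, from your statistics package. The sketch only estimates b3; it does not test whether it is significant.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y,
    solved with Gauss-Jordan elimination. Fine for tiny examples only."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for c in range(k):
        # Partial pivoting: move the largest entry in column c to the diagonal
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


# Hypothetical cases: x1 and x2 are 0/1 variables, two cases in each cell
cases = [(x1, x2) for x1 in (0, 1) for x2 in (0, 1) for _ in range(2)]
# Invented outcome: the effect of x2 is smaller when x1 = 1 (b3 is negative)
y = [30000 - 2000 * x1 + 35000 * x2 - 15000 * x1 * x2 for x1, x2 in cases]

# Design matrix: constant, X1, X2 and the cross-product term X1*X2
X = [[1, x1, x2, x1 * x2] for x1, x2 in cases]
a, b1, b2, b3 = ols(X, y)
print(a, b1, b2, b3)
```

Here the effect of X2 on Y is b2 when X1 = 0 and b2 + b3 when X1 = 1, so a non-zero b3 is exactly the statement that the effect of one variable differs by category of the other.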

• Example: 2 variables used to predict “income”
• -> immigrant (0-no; 1-yes)
• -> university (0-no; 1-yes)
• As in the cross tab, we will test whether immigrant status and education interact in explaining market income
• Using linear regression (interval/ratio dependent variable)

In our initial regression, we include both variables:
University (0-no; 1-yes)
Immigrant (0-no; 1-yes)
University grads make more (+$33,548) when controlling for immigrant status
Immigrants make less (-$7,879) when controlling for education

• We want to determine whether there’s an interaction effect
• Does the effect of having a university education differ for immigrants relative to the Canadian born?

In this case, we clearly have an interaction effect…
Interaction term is significant!!
CAN CONCLUDE:
The effect of education is not the same for immigrants and the Canadian born…

Interaction Effects
Example (pages 406-407 of text):
• Examining the relationship between:
Y: mental impairment score (index)
X1: life events (number of stressful events)
X2: SES (index of socioeconomic status)
• Our research hypothesis: X1 has a positive effect on Y; X2 has a negative effect on Y
• As # of stressful events ↑, mental impairment ↑
• As SES ↑, mental impairment ↓

Interaction Effects
Results:

Coefficients(a)
                   Unstandardized Coefficients
Model              B           Std. Error    t          Sig.
1   (Constant)     28.2298     2.1742        12.984     .0001
    LIFE            0.1033     0.0325         3.177     .0030
    SES            -0.0975     0.0291        -3.351     .0019

Page 59: March 19 - King's Facultydkerr.kingsfaculty.ca/dkerr/assets/lecture11_3306_2015... · 2016. 2. 18. · Logistic Regression.. In Soc 2206, you learned a bit about sampling strategies..

Interaction Effects

• Assume that we hypothesize an interaction: upper-SES persons are better able to handle stressful life events than are lower-SES persons

• That is, the effect of LIFE events is expected to interact with SES: as SES increases, the effect of LIFE events on mental impairment decreases

• Our results with the interaction term:


Interaction Effects

Coefficients(a) (Unstandardized Coefficients)

Model 1        B          Std. Error   t         Sig.
(Constant)     26.0366    3.9488        6.594    .0001
LIFE            0.1559    0.0853        1.826    .0761
SES            -0.0604    0.0627       -0.965    .3409
LIFE*SES       -0.0008    0.0013       -0.668    .5087

The interaction term is not significant (p = .5087), and the direct effects of SES and LIFE are no longer significant.

Thus, we don’t have support for the hypothesis that SES and LIFE interact to affect mental impairment.


Interaction Effects

• What if the interaction of LIFE * SES were significant?

• The negative slope suggests that the effect of Life Events on Mental Impairment gets weaker as SES gets higher

• That is, Life Events is more likely to lead to Mental Impairment for those of lower socio-economic status

• If the coefficient were positive? Life Events would have a stronger impact on Mental Impairment for people of higher SES
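In a model with a LIFE*SES term, the slope of LIFE is no longer a single number: it equals B(LIFE) + B(LIFE*SES) x SES. The short Python sketch below plugs in the two coefficients reported for the interaction model. The SES values 0, 50 and 100 are arbitrary assumptions chosen for illustration, and since the interaction term was not significant in that model, the output only shows the mechanics of the interpretation.

```python
# Coefficients from the model that includes the LIFE*SES term.
B_LIFE = 0.1559       # coefficient on LIFE
B_LIFE_SES = -0.0008  # coefficient on the LIFE*SES product


def life_slope(ses):
    """Effect of one additional life event on mental impairment at a given SES."""
    return B_LIFE + B_LIFE_SES * ses


if __name__ == "__main__":
    # SES values below are illustrative assumptions, not taken from the text.
    for ses in (0, 50, 100):
        print(ses, round(life_slope(ses), 4))
```

With a negative product coefficient the slope shrinks as SES rises; flipping the sign of `B_LIFE_SES` would make it grow instead.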


Interaction Effects

• Again, if the interaction effect is not significant, then drop it from the model

• You MUST ALWAYS include both variables you hypothesize to be interacting (called the “main effects”), along with the interaction term in your regression model

• You cannot simply introduce all potential interaction terms into your multivariate model to see what is significant. Prior to testing for interaction effects, you should be able to justify them theoretically

• If an interaction effect really exists, then it makes no sense to interpret the main effects
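The rule about keeping the main effects can be made concrete: the interaction term is just the product of the two predictors, and it enters the model alongside both of them. The Python sketch below builds the predictor rows for such a model. The helper names `design_row` and `design_matrix` are hypothetical, and the four observations are made-up 0/1 dummy combinations used only for illustration.

```python
def design_row(x1, x2):
    """One regression row: intercept, both main effects, and their product."""
    return [1, x1, x2, x1 * x2]


def design_matrix(observations):
    """Build one row per (x1, x2) pair."""
    return [design_row(x1, x2) for x1, x2 in observations]


if __name__ == "__main__":
    # Hypothetical observations: every combination of two 0/1 dummy variables.
    for row in design_matrix([(0, 0), (1, 0), (0, 1), (1, 1)]):
        print(row)
```

For two dummy variables the product column is 1 only when both dummies are 1, which is why the interaction coefficient applies to that group alone.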


• PART C. Model Building

Examining the likelihood of low income among Canadians, with a specific emphasis on the experience of immigrants.

The incidence of low income is expected to be highest among recent immigrants and lowest among well-established immigrants.

Yet the disadvantage of being an immigrant is expected to persist, even among more established immigrants, and even after controlling for other relevant variables (language, education, sex and age).


• Model 1.

• Yr of immigration -> incidence of low income


The disadvantage of being an immigrant is expected to persist, even among more established immigrants, and even after controlling for other relevant variables (sex, language, age and education).


Can merely plug in values, for interpretation if you wish:

The above results give us the following equation:

Y = 27864.481 + 38215.784 (UNIV) - 4349.417 (IMMIG) - 15237.378 (UN_IMM)

Remember that the two variables used to predict “income” are coded as follows:

-> UNIV (0 = no; 1 = yes)

-> IMMIG (0 = no; 1 = yes)

Note: UN_IMM is the product UNIV x IMMIG, so if either UNIV=0 or IMMIG=0, then UN_IMM must be 0 as well

What is the effect of having a university education for the Canadian born?

Y = 27864.481 + 38215.784 (UNIV) - 4349.417 (0) - 15237.378 (0)

It means a gain of $38215.78 for those with a degree

What is the effect of having a university education for an immigrant?

Y = 27864.481 + 38215.784 (UNIV) - 4349.417 (1) - 15237.378 (UNIV)

It means a gain of $22978.40 (38215.78 - 15237.38) for those with a degree
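The plug-in exercise can be checked for all four groups at once. The Python sketch below uses the coefficients from the equation and the 0/1 coding of UNIV and IMMIG given in the slides. The function names are illustrative assumptions, and UN_IMM is computed as the product of the two dummies.

```python
# Coefficients from the fitted income equation.
B0 = 27864.481          # constant
B_UNIV = 38215.784      # university degree (0 = no, 1 = yes)
B_IMMIG = -4349.417     # immigrant (0 = no, 1 = yes)
B_UN_IMM = -15237.378   # interaction term UNIV x IMMIG


def predicted_income(univ, immig):
    """Predicted income for one combination of the two dummy variables."""
    un_imm = univ * immig  # the interaction term is 1 only if both dummies are 1
    return B0 + B_UNIV * univ + B_IMMIG * immig + B_UN_IMM * un_imm


def university_effect(immig):
    """Gain from holding a degree, for the Canadian born (0) or immigrants (1)."""
    return predicted_income(1, immig) - predicted_income(0, immig)


if __name__ == "__main__":
    for immig in (0, 1):
        for univ in (0, 1):
            print(f"IMMIG={immig} UNIV={univ}: {predicted_income(univ, immig):.2f}")
    print(f"Degree effect, Canadian born: {university_effect(0):.2f}")
    print(f"Degree effect, immigrants:    {university_effect(1):.2f}")
```

Under these coefficients the gain from a degree works out to about $38,216 for the Canadian born and about $22,978 for immigrants; the difference between the two is exactly the interaction coefficient.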

NOTE: