Xiaohan Zeng research

Post on 26-May-2015

Xiaohan Zeng

Nov. 12, 2013

Research Presentation

Gender Differences in Publication Rate and Impact

The Leaky Pipeline

[Figure stages: Bachelor, Master, Ph.D., Faculty*]

The Leaky Pipeline

• Women are under-represented in the STEM disciplines, especially in later career stages

• We lose women along the career path

The productivity gap

[Figure legend: Male, Female]

Puzzles

• What factors cause gender differences in publication patterns?

• Are these factors specific to the discipline?

The data

• 437,787 publications by 4,292 scientists

• Disambiguation

• Disciplines: Chemistry, Chemical Engineering, Ecology, Industrial Engineering, Material Science, Molecular Biology, Psychology

The data

• Small fraction of women

• Resource requirements

• Career risks

Measuring the productivity gap

[Figure legend: Male, Female, Average]

Women get fewer resources

• Women have lower average salary¹

1. Ginther, DK, 2005

Resource requirement

Discipline               Average annual expenditure per PI [K$]
Chemical Engineering     490
Chemistry                515
Ecology                  -
Industrial Engineering   94
Material Science         612
Molecular Biology        1,897
Psychology               256

Women get fewer resources

• Women have lower average salary¹

• Women receive smaller grants²

1. Ginther, DK, 2005

2. NIH, http://report.nih.gov/nihdatabook/

Resource and productivity

• Could research resource requirements affect the gender gap in publication patterns?

• A researcher cannot produce as much if given fewer resources

R² = 0.72, P < 0.04

• Women publish at lower rates compared to men

• In disciplines where resource need is higher, the gap is larger

Publication impact

• h-index

h ~ n^0.53

[Figure legend: Male, Female]
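For reference, the h-index used on this slide is the largest h such that an author has at least h papers with at least h citations each. A minimal sketch of that standard definition follows; the citation counts are hypothetical and are not taken from the talk's data.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's papers.
example_citations = [10, 8, 5, 4, 3]
print(h_index(example_citations))
```

Sorting first keeps the logic easy to check by hand: walk down the ranked list until a paper has fewer citations than its rank.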

Publication impact

• z-score of h-index

[Figure legend: Male, Female, Average]
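The slides do not say which reference distribution the z-score is taken against. As a rough sketch only, the snippet below assumes each h-index is compared with the mean and standard deviation of a reference group of h-indices; the function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values against their own mean and population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: no spread, so every z-score is zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical h-indices for a reference group of scientists.
reference_h = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(reference_h))
```

A z-score makes h-indices comparable across groups with different typical values, which is presumably why the talk switches to it before comparing disciplines.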

Difference in publication impact

Women take fewer risks

• Psychological studies:

• Byrnes et al. 1999

• Harris et al. 2006

Women take fewer risks

• Does this affect their career choice and publication impact?

• Gender difference in publication impact

D = D(R)

D = D(T, P, A)

Career risk

Discipline               Salary premium of         Time to career         Frac. graduates
                         non-academic jobs, P-1    independence, T [y]    going to academia, A
Chemical Engineering     0.39                      5.4                    0.21
Chemistry                0.61                      6.2                    0.32
Ecology                  0.38                      8.2                    0.71
Industrial Engineering   0.41                      6.1                    0.56
Material Science         0.38                      6.6                    0.20
Molecular Biology        0.62                      7.3                    0.57
Psychology               0.47                      8.2                    0.42

• Career risk R = R(T, P, A)

D = D(T, P, A)

R² = 0.96, P = 0.001

• Women publish with higher impact

• In riskier disciplines, the gender gap becomes larger

• Self-selection among females

Conclusions

Higher resource requirement → Lower productivity of women

Higher career risk → Higher publication impact of women

Acknowledgements

• NSF SBE 0624318, IIS 0830388

• Spanish DGICYT FIS2010-18639

Jordi Duch

Marta Sales-Pardo

Filippo Radicchi

Teresa Woodruff

Luis Amaral

Time to career independence

Author disambiguation using community detection and explicit semantic analysis

Author disambiguation is important

Michael Jordan

Current approaches

• Machine learning
  • K-nearest neighbors
  • Support vector machine
  • Etc.

How to tell them apart

• Essentially a clustering problem

Papers should be similar

• Research area

• Affiliation

• Co-authors

• Journal

• Publication year

Complex network approach

• Community detection

• Similarity network

Single network

Compute similarity

Detect communities

• Modularity
  • Minimize connections across communities

• Infomap
  • Minimize the amount of information needed to describe a random walk across communities
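To make the modularity criterion concrete, here is a minimal sketch of the standard Newman modularity score Q for an undirected, unweighted graph. It compares the fraction of links inside each community with what a random graph with the same degrees would give, so a higher Q means fewer links across communities. The toy graph and labels are hypothetical, and the talk's networks are weighted, which this sketch ignores.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> label.
    """
    m = len(edges)
    internal = defaultdict(int)    # links with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles joined by a single link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(toy_edges, toy_partition))
```

Community detection by modularity searches for the partition that maximizes this score; the sketch only evaluates a given partition and does not perform the search.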

Multiplex network

Find agreement

• Clustering

• Conjunction

• Logistic regression

• Neural network
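The slides do not spell out how the conjunction option works. One plausible reading, sketched below as an assumption, is that two papers end up in the same final cluster only if every network layer places them in the same community. The layer names, paper ids, and labels are hypothetical.

```python
from collections import defaultdict


def conjunction(partitions):
    """Group papers that share a community in every layer.

    partitions: list of dicts, one per layer, mapping paper id -> label.
    Returns a sorted list of sorted groups of paper ids.
    """
    groups = defaultdict(list)
    for paper in partitions[0]:
        # Papers agree across all layers iff their label tuples are equal.
        key = tuple(layer[paper] for layer in partitions)
        groups[key].append(paper)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical community labels from two layers of the multiplex network.
affiliation_layer = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}
coauthor_layer = {"p1": "a", "p2": "a", "p3": "b", "p4": "b"}
print(conjunction([affiliation_layer, coauthor_layer]))
```

Under this reading the conjunction is the strictest of the listed options: it can only split clusters, never merge them, so it favors precision over recall.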

Explicit semantic analysis (ESA)

• Find Wikipedia articles that are most related to the text
  • Term frequency – inverse document frequency (tf-idf)
  • In-degree of article
  • Z-score
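The tf-idf step can be illustrated with a toy version of ESA: each "article" gets tf-idf weights for its words, and a text is scored against every article by summing the weights of the words it contains. This is a simplified sketch, not the talk's pipeline; the three tiny articles are hypothetical stand-ins for Wikipedia, and the in-degree and z-score steps are left out because the slides do not say how they are applied.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict name -> list of tokens. Returns name -> {term: weight}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def esa_vector(text_tokens, concept_vectors):
    """Score each concept by summing tf-idf weights of the text's terms."""
    return {
        concept: sum(weights.get(tok, 0.0) for tok in text_tokens)
        for concept, weights in concept_vectors.items()
    }


# Hypothetical miniature "articles" standing in for Wikipedia concepts.
concepts = {
    "network theory": ["network", "graph", "node", "link"],
    "metabolism": ["metabolic", "enzyme", "network", "cell"],
    "basketball": ["team", "game", "player", "score"],
}
scores = esa_vector(["metabolic", "network"], tfidf_vectors(concepts))
print(max(scores, key=scores.get))
```

The resulting dictionary of concept scores plays the role of the "ESA vector of concepts" that later slides compare between papers.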

Affiliation

• String of affiliation

• ESA vector of concepts

• Cosine similarity, “dot product”
  • {t1:0.5, t2:0.5}, {t1:0.2, t2:0.8}
  • 0.326
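A minimal sketch of plain cosine similarity between two sparse concept vectors stored as dictionaries is shown below. For the two example vectors on this slide, the plain formula gives about 0.857 rather than the quoted 0.326, so the quoted value likely comes from a different normalization or weighting than this sketch assumes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors given as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# The two example vectors from the slide.
v1 = {"t1": 0.5, "t2": 0.5}
v2 = {"t1": 0.2, "t2": 0.8}
print(round(cosine_similarity(v1, v2), 3))
```

Storing vectors as dictionaries keeps the computation cheap when each affiliation string maps to only a handful of Wikipedia concepts.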

Affiliation

• "Northwestern University, Evanston IL 60201"
  • "evanston illinois": 64.14
  • "northwestern university": 53.60
  • "201213 uic flames men basketball team": 50.44
  • "howard s becker": 50.38

Affiliation

• "Rovira i Virgili University, Tarragona, Spain"
  • "rovira i virgili university": 73.31
  • "josep lluis carod rovira": 63.33
  • "gaspar cervantes de gaeta": 62.63
  • "tarragona": 61.09

Co-authors

• Set of co-authors

• Jaccard index
  • |Intersection| / |Union|

• ['Michelangelo', 'Leonardo'], ['Michelango', 'Leonardo', 'Raphael']
  • 1/4
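The Jaccard example above can be checked with a few lines of code. This sketch assumes names are compared by exact string match, so the two spellings 'Michelangelo' and 'Michelango' count as different co-authors; that is what makes the slide's result 1/4.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty co-author sets share nothing
    return len(set_a & set_b) / len(union)


# The co-author lists from the slide, spellings kept as given.
paper_1 = ["Michelangelo", "Leonardo"]
paper_2 = ["Michelango", "Leonardo", "Raphael"]
print(jaccard(paper_1, paper_2))
```

Only 'Leonardo' is shared, and the union has four distinct strings, so the index is 1/4 = 0.25.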

Topics

• String of title, abstract

• Vector of topics

• Cosine similarity, “dot product”

Topics

• "Use of a global metabolic network to curate organismal metabolic networks"
  • "metabolic network modelling": 346.9
  • "bioinformatics": 323.2
  • "metagenomics": 306.1
  • "biochemical cascade": 301.9
  • "genomics": 257.4

Topics

• "Assortative mixing in networks"
  • "network theory": 238.4
  • "complex network": 195.5
  • "network science": 158.0
  • "social network": 133.0
  • "smallworld network": 115.5

Example

• 2,298 publications from 6 authors
  • Similar names
  • Similar topics
  • Similar affiliations

• 2.6 million pairs of papers

• Filter the links
  • Network backbone model (Serrano et al.)
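The figure of 2.6 million pairs follows from counting unordered pairs among 2,298 papers, n(n - 1)/2. A quick check, assuming every pair of papers is a candidate link before filtering:

```python
import math

n_papers = 2298

# Number of unordered pairs of distinct papers: n * (n - 1) / 2.
n_pairs = math.comb(n_papers, 2)
print(n_pairs)  # 2639253, i.e. about 2.6 million
```

Because the number of candidate links grows quadratically with the number of papers, some link filtering is needed before community detection is practical on larger name groups.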

Example

• Node: Paper

• Link: Similarity, weighted

• Color: Author

Affiliation

Topic

Coauthor

Journal

Supervised learning

Next steps

• Time of publication

• Non-local information

• Combine information across networks

• Pruning

• Validation & Application

Acknowledgements

Andrea Lancichinetti

Luis Amaral

Arnau Gavalda