Zeng Zeng, Bharadwaj, IEEE TRANSACTIONS ON COMPUTERS, VOL. 55, NO. 11, NOVEMBER 2006
Xiaohan Zeng
Nov. 12, 2013
Research Presentation
Gender Differences in Publication Rate and Impact
The Leaky Pipeline
Bachelor
Master
Ph.D.
Faculty*
• Women are under-represented in the STEM disciplines, especially in later career stages
• We lose women along the career path
The productivity gap
Male
Female
Puzzles
• What factors cause gender differences in publication patterns?
• Are these factors specific to the discipline?
The data
• 437,787 publications by 4,292 scientists
• Disambiguation
Chemistry
Chemical Engineering
Ecology
Industrial Engineering
Material Science
Molecular Biology
Psychology
The data
• Small fraction of women
• Resource requirements
• Career risks
Measuring the productivity gap
Male
Female
Average
Resource requirement
Discipline                Average annual expenditure per PI [K$]
Chemical Engineering      490
Chemistry                 515
Ecology                   -
Industrial Engineering    94
Material Science          612
Molecular Biology         1,897
Psychology                256
Women get fewer resources
• Women have lower average salary [1]
• Women receive smaller grants [2]
1. Ginther, DK, 2005
2. NIH, http://report.nih.gov/nihdatabook/
Resource and productivity
• Could the resource requirements of research affect the gender gap in publication patterns?
• Researchers cannot produce as much when given fewer resources
R² = 0.72, P < 0.04
• Women publish at lower rates compared to men
• In disciplines where resource need is higher, the gap is larger
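The fit reported above (R² of 0.72) relates the size of the gender gap to the resource requirement of each discipline. The sketch below shows one way such a coefficient of determination could be computed, assuming a simple least-squares line with one point per discipline. The function name `r_squared` is hypothetical, and the per-discipline gap values are not in this transcript, so they are left as an input rather than filled in.

```python
def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    # For a single predictor, R^2 equals the squared Pearson correlation.
    return sxy ** 2 / (sxx * syy)


# Expenditure per PI [K$] from the Resource requirement table
# (Ecology has no value there, so it is omitted).
expenditure = {
    "Chemical Engineering": 490,
    "Chemistry": 515,
    "Industrial Engineering": 94,
    "Material Science": 612,
    "Molecular Biology": 1897,
    "Psychology": 256,
}
# gap = {...}  # measured productivity gap per discipline (not given in the slides)
# r_squared([expenditure[d] for d in gap], [gap[d] for d in gap])
```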
Publication impact
• h-index
h ~ n^0.53
Male
Female
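For reference, the h-index used above is the largest h such that a scientist has h papers with at least h citations each. A minimal sketch of that definition follows; the function name `h_index` and the citation counts in the usage line are illustrative only.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```

The scaling h ~ n^0.53 on the slide describes how this value grows with the number of publications n.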
Publication impact
• z-score of h-index
Male
Female
Average
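The slides do not say which reference group the z-score is taken against. The sketch below assumes the standard definition, z = (h - mean) / standard deviation, computed within a comparison group (for example, scientists in the same discipline with a similar number of publications). The function name `z_scores` and the grouping are assumptions.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores of `values` relative to their own mean and spread."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the group
    if sigma == 0:
        return [0.0 for _ in values]  # no spread: every value sits at the mean
    return [(v - mu) / sigma for v in values]
```

A positive z-score then means an h-index above what is typical for the comparison group, which makes impact comparable across scientists with very different publication counts.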
Difference in publication impact
Women take fewer risks
• Psychological studies:
  • Byrnes et al. 1999
  • Harris et al. 2006
Women take fewer risks
• Does this affect their career choice and publication impact?
• Gender difference in publication impact
D = D(R)
Career risk
Discipline                Salary premium of non-academic jobs, P-1    Time to career independence, T [y]    Frac. graduates going to academia, A
Chemical Engineering      0.39                                        5.4                                   0.21
Chemistry                 0.61                                        6.2                                   0.32
Ecology                   0.38                                        8.2                                   0.71
Industrial Engineering    0.41                                        6.1                                   0.56
Material Science          0.38                                        6.6                                   0.20
Molecular Biology         0.62                                        7.3                                   0.57
Psychology                0.47                                        8.2                                   0.42
• Career risk R = R(T, P, A)
D = D(T, P, A)
R² = 0.96, P = 0.001
• Women publish with higher impact
• In riskier disciplines, the gender gap becomes larger
• Self-selection among females
Conclusions
Higher resource requirement → Lower productivity of women
Higher career risk → Higher publication impact of women
Acknowledgements
• NSF SBE 0624318, IIS 0830388
• Spanish DGICYT FIS2010-18639
Jordi Duch
Marta Sales-Pardo
Filippo Radicchi
Teresa Woodruff
Luis Amaral
Time to career independence
Author disambiguation using community detection and explicit semantic analysis
Author disambiguation is important
Michael Jordan
Current approaches
• Machine learning
  • K-nearest neighbors
  • Support vector machine
  • Etc.
How to tell them apart
• Essentially a clustering problem
Papers should be similar
• Research area
• Affiliation
• Co-authors
• Journal
• Publication year
Complex network approach
• Community detection
• Similarity network
Single network
Compute similarity
Detect communities
Detect communities
• Modularity
  • Minimize connections across communities
• Infomap
  • Minimize the amount of information needed to describe a random walk across communities
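To make the modularity criterion concrete, the sketch below evaluates the standard (Newman) modularity Q of a given partition of an unweighted, undirected graph: the fraction of links inside communities minus the fraction expected if links were placed at random with the same degrees. The function name `modularity` and the toy graph are illustrative and not taken from the slides. A community detection method would search for the partition that maximizes Q, which is not shown here.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition.

    edges: list of (u, v) pairs, undirected and unweighted, no duplicates.
    community: dict mapping each node to its community label.
    """
    m = len(edges)
    degree_sum = {}  # total degree of the nodes in each community
    internal = {}    # number of links with both ends in the community
    for u, v in edges:
        for node in (u, v):
            c = community[node]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}
print(modularity(edges, split) > modularity(edges, merged))  # True
```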
Multiplex network
Find agreement
• Clustering
• Conjunction
• Logistic regression
• Neural network
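Of the options listed, conjunction is the simplest to sketch. The interpretation assumed here is that two papers are attributed to the same author only if they fall in the same community in every layer of the multiplex network (affiliation, topic, co-author, and so on). The slides do not spell out the rule, and the function name `conjunction` and the paper labels are hypothetical.

```python
def conjunction(partitions):
    """Group papers that share a community in every layer.

    partitions: list of dicts, one per layer, mapping paper -> community label.
    All layers are assumed to cover the same set of papers.
    """
    groups = {}
    for paper in partitions[0]:
        # The tuple of labels across layers identifies the joint community.
        key = tuple(layer[paper] for layer in partitions)
        groups.setdefault(key, []).append(paper)
    return sorted(sorted(group) for group in groups.values())


affiliation = {"p1": 0, "p2": 0, "p3": 0, "p4": 1}
topic = {"p1": "x", "p2": "x", "p3": "y", "p4": "y"}
print(conjunction([affiliation, topic]))  # [['p1', 'p2'], ['p3'], ['p4']]
```

Conjunction is strict: a single disagreeing layer splits a group, which is presumably why softer combinations such as logistic regression also appear in the list.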
Explicit semantic analysis (ESA)
• Find the Wikipedia articles that are most related to the text
  • Term frequency – inverse document frequency (tf-idf)
• In-degree of article
• Z-score
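A minimal sketch of the tf-idf step follows, assuming the usual ESA construction: each Wikipedia article is a concept, each word gets a tf-idf weight within each article, and a text is scored against a concept by summing the weights of the words it contains. The three tiny "articles" are hypothetical stand-ins for Wikipedia, and the in-degree and z-score steps from the slide are not modelled.

```python
import math
from collections import Counter

# Hypothetical stand-ins for Wikipedia articles (concept -> article text).
articles = {
    "network theory": "network graph node link network",
    "bioinformatics": "gene sequence genome network",
    "tarragona": "city spain catalonia coast",
}


def tfidf_index(docs):
    """Return {concept: {term: tf-idf weight}} for a small corpus."""
    counts = {name: Counter(text.split()) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many articles each term appears.
    doc_freq = Counter(term for c in counts.values() for term in c)
    index = {}
    for name, c in counts.items():
        total = sum(c.values())
        index[name] = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        }
    return index


def esa_vector(text, index):
    """Score a text against every concept by summing matching tf-idf weights."""
    words = Counter(text.lower().split())
    return {
        name: sum(words[term] * weight for term, weight in weights.items())
        for name, weights in index.items()
    }


index = tfidf_index(articles)
vector = esa_vector("graph link", index)
print(max(vector, key=vector.get))  # network theory
```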
Affiliation
• String of affiliation
• ESA vector of concepts
• Cosine similarity, “dot product”
  • {t1:0.5, t2:0.5}, {t1:0.2, t2:0.8}
  • 0.326
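The comparison of two concept vectors can be sketched as below, assuming sparse vectors stored as dictionaries and the textbook cosine similarity (dot product divided by the product of the vector lengths). For the two example vectors the plain dot product is 0.5 and the normalized cosine is about 0.857. Neither reproduces the 0.326 on the slide, so that figure presumably comes from a weighting or normalization that the transcript does not show.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


a = {"t1": 0.5, "t2": 0.5}
b = {"t1": 0.2, "t2": 0.8}
print(round(cosine_similarity(a, b), 3))  # 0.857
```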
Affiliation
• "Northwestern University, Evanston IL 60201"
  • "evanston illinois": 64.14
  • "northwestern university": 53.60
  • "201213 uic flames men basketball team": 50.44
  • "howard s becker": 50.38
Affiliation
• "Rovira i Virgili University, Tarragona, Spain"
  • "rovira i virgili university": 73.31
  • "josep lluis carod rovira": 63.33
  • "gaspar cervantes de gaeta": 62.63
  • "tarragona": 61.09
Co-authors
• Set of co-authors
• Jaccard index
  • |Intersection| / |Union|
  • ['Michelangelo', 'Leonardo'], ['Michelango', 'Leonardo', 'Raphael']
  • 1/4
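The Jaccard index above can be checked with a few lines of code. Names are compared as exact strings, so 'Michelangelo' and 'Michelango' count as two different co-authors. That leaves one shared name out of four distinct names, which gives the 1/4 on the slide. The function name `jaccard` is illustrative.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |intersection| / |union|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty co-author sets share nothing
    return len(set_a & set_b) / len(union)


paper_1 = ["Michelangelo", "Leonardo"]
paper_2 = ["Michelango", "Leonardo", "Raphael"]
print(jaccard(paper_1, paper_2))  # 0.25
```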
Topics
• String of title, abstract
• Vector of topics
• Cosine similarity, “dot product”
Topics
• "Use of a global metabolic network to curate organismal metabolic networks"
  • "metabolic network modelling": 346.9
  • "bioinformatics": 323.2
  • "metagenomics": 306.1
  • "biochemical cascade": 301.9
  • "genomics": 257.4
Topics
• "Assortative mixing in networks"
  • "network theory": 238.4
  • "complex network": 195.5
  • "network science": 158.0
  • "social network": 133.0
  • "smallworld network": 115.5
Example
• 2,298 publications from 6 authors
  • Similar names
  • Similar topics
  • Similar affiliations
• 2.6 million pairs of papers
• Filter the links
  • Network backbone model (Serrano et al.)
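The pair count follows from choosing 2 papers out of 2,298. The filtering step is sketched below under the assumption that "network backbone model (Serrano et al.)" refers to the disparity filter: a link is kept if its weight is statistically significant for at least one of its endpoints, compared with a null model that splits the node's total weight at random among its links. The function name, the significance level `alpha=0.05`, and the toy network are assumptions.

```python
from collections import defaultdict
from math import comb


def disparity_backbone(edges, alpha=0.05):
    """Keep links whose weight is significant for at least one endpoint.

    edges: dict {(u, v): weight} describing an undirected weighted network.
    """
    strength = defaultdict(float)  # total link weight per node
    degree = defaultdict(int)      # number of links per node
    for (u, v), w in edges.items():
        for node in (u, v):
            strength[node] += w
            degree[node] += 1

    def p_value(node, w):
        k = degree[node]
        if k < 2:
            return 1.0  # a node with a single link gives no evidence
        # Probability of a normalized weight this large under the null model.
        return (1.0 - w / strength[node]) ** (k - 1)

    return {
        (u, v): w
        for (u, v), w in edges.items()
        if min(p_value(u, w), p_value(v, w)) < alpha
    }


print(comb(2298, 2))  # 2639253 pairs, about 2.6 million

# Toy network: paper "a" has one strong link and five weak ones.
toy = {("a", "b"): 100.0}
toy.update({("a", leaf): 1.0 for leaf in "cdefg"})
print(disparity_backbone(toy))  # {('a', 'b'): 100.0}
```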
Example
• Node: Paper
• Link: Similarity, weighted
• Color: Author
Affiliation
Topic
Coauthor
Journal
Supervised learning
Next steps
• Time of publication
• Non-local information
• Combine information across networks
• Pruning
• Validation & Application
Acknowledgements
Andrea Lancichinetti
Luis Amaral
Arnau Gavalda