ERACEP: IDENTIFYING EMERGING RESEARCH AREAS IN … · ERACEP: IDENTIFYING EMERGING RESEARCH AREAS...
-
Upload
phungnguyet -
Category
Documents
-
view
223 -
download
4
Transcript of ERACEP: IDENTIFYING EMERGING RESEARCH AREAS IN … · ERACEP: IDENTIFYING EMERGING RESEARCH AREAS...
© Fraunhofer ISI
European Research Council
F i n a l W o r k s h o p D B F & E R A C E P C S A s
2 0 / 2 1 F e b r u a r y 2 0 1 3 , B r u s s e l s
ERACEP: IDENTIFYING EMERGING
RESEARCH AREAS IN THE ERC RESEARCH
PROPOSALS
©P
ark
er
/ S
PL
/ A
ge
ntu
r F
ocu
s
Dr. Thomas Reiss
Etienne Vignola-Gagné
Piret Kukk
Fraunhofer ISI Karlsruhe
Prof. Dr. Wolfgang Glänzel
Dr. Bart Thijs
KU Leuven
© Fraunhofer ISI
Seite 2
European Research Council
2
Objectives:
Identification of topically emerging research areas and analysis to
what extent the activities supported by the ERC cover and
contribute to these research areas
Concept:
Combination of two perspectives:
From the ‘landscape of science’
From ERC funding procedures
In t roduct ion
© Fraunhofer ISI
Seite 3
European Research Council
3
Concept
© Fraunhofer ISI
Seite 4
European Research Council
Landscape of Science
40 dynamic fields
Selection of 20 most dynamic fields
for fine mapping
Pilot with 6 fields
composed of 46 clusters
In-depth analysis
of 10 emerging clusters
Expert
validation
Ident i f icat ion of emerging f ie lds
Matching of
grant
applications
© Fraunhofer ISI
Seite 5
European Research Council
5
Aims:
• Explore how ERC grant applications can be mapped to dynamic
fields and clusters identified via bibliometric analyses and expert
validation
• Analyse to which extent emerging topics are addressed by ERC grant
applications
• Find which proposals from the 2009 Starting Grants Call (932
proposals, 165 successful) can be classified within the emerging
topics previously identified
These emerging topics should be present in the proposals made to ERC
calls.
Matching
© Fraunhofer ISI
Seite 6
European Research Council
6
2009 Starting Grants Call
• 932 proposals, 165 successful
• 888 could be used for analysis
Matching sample
© Fraunhofer ISI
Seite 7
European Research Council
7
Two approaches:
1. Identify journals mentioned in grant applications and map to dynamic fields
defined by WoS subject categories
2. Text matching between proposals and thematic clusters via text mining
Possible value for users
Ex ante:
Classify applications into structure inherent to landscape of science
Preselect proposals: emerging – established
Provide bibliometric mirror for qualitative expert assessment
Ex post:
Reflection on selection procedure
Matching approach
© Fraunhofer ISI
Seite 8
European Research Council
8
Matching 1: WoS Subject Categor ies
ass ignment to proposals procedure
Proposal
99999
Journal of
Optics
Angewandte
Chemie Int.
1. Optics
2. Physics, applied
Text Mining
3. Physics, Atomic,
Molecular & Chemical
Fictive
example
4. Chemistry, Physical
5. Chemistry, Inorganic &
Nuclear
© Fraunhofer ISI
Seite 9
European Research Council
9
Opportunities:
Allows automated construction of a sample of proposals belonging to
selected WoS Subject Categories (and eventually, emerging topics)
Limitations:
False positives means additional qualitative work is required to
finalise the sample (mitigated by low rate of false negatives);
Text mining also captures keywords contained in journal titles but
that may not refer to articles in the proposal;
Previous publications in the proposals do not always indicate the
content of the project presented therein;
Publications cited in principle might tend towards mainstream
Focus on second matching approach
Matching 1 – oppor tun i t ies and
l imi ta t ions
© Fraunhofer ISI
Seite 10
European Research Council
10
The text matching is built in a vector space model. This ‘Vector Space
Model’ (VSM) is most appropriate for representing the content in
documents (Salton, 1975).
Each document is a vector with its own position in a space. Each
vector has as many dimensions as there are terms. If a term occurs in
a document, its value is not equal to zero.
These values are defined as
TF-iDF = term frequency * inverse document frequency.
This allows down-weighting of common words as they will appear in
many documents.
Similarities between proposals and papers are calculated as the
cosine of the angles between the vectors representing each document
in the term-based vector space and expressed by TF-iDF.
Matching 2 – Text Matching of
Appl icat ions and Scient i f ic Papers
© Fraunhofer ISI
Seite 11
European Research Council
11
The mapping of the proposals to topics or detected clusters is based on the
average similarity that the applications have to all the linked papers in one
cluster. A paper is linked to a proposal if they share at least one term. So the
average is taken over all non-zero similarities.
The threshold to define a mapping between an application and a cluster is set
at least 0.025.
298 of all processed applications (888) are matched with at least one cluster.
This is one third. 173 applications have multiple assignments.
Example: Application 241161 entitled : ‘Intergenerational correlations of
schooling, income and health: an investigation of the underlying mechanisms
is mapped with 4 clusters:
Quality of Life
Health Policy
Tobacco
Gender and Family
Matching 2 – Text Matching of
Appl icat ions and Scient i f ic Papers
© Fraunhofer ISI
Seite 12
European Research Council
12
no assignment 66%
unique assignments
14%
multiple assignments
20%
1/3 of all proposals
matched at least to one
cluster within 6 selected
dynamic fields (out of 20)
Matching 2 – resu l ts
© Fraunhofer ISI
Seite 13
European Research Council
13
Ex ante:
Emerging topics (*) addressed well by proposals
More conventional topics (e.g. “oil”, “battery”, “coal”) less frequent in
proposals
Ex post:
Highest success rate for “renewables”
However, also proposals in emerging topics* highly successful
Matching 2 resul ts – Energy and fue ls
Evaluation
Cluster Label Absolute Relative (%) Share of Success
Oil, Bio Oil 4 3,2 0%
Batteries, Storage 5 4,0 0%
Renewable Energy 14 11,3 43%
BioFuel * 21 16,9 14%
Kinetics 22 17,7 9%
Fuel Cells * 30 24,2 20%
Coal 10 8,1 0%
Combustion 18 14,5 11%
Matched Applications
© Fraunhofer ISI
Seite 14
European Research Council
14
Ex ante:
Strong focus on “scaffolds” and “nanostructuring”
Emerging topics (*) addressed rarely
Ex post:
By far highest success rate for imaging approaches
Proposal in emerging topics* not successful, small number effect?
Matching 2 resul ts – B iomedica l
Engineer ing
Cluster Label Absolute Relative (%) Share of Success
Catheter Ablation 3 1,5 0%
Brain Computer * 4 2,0 0%
Nanostructured 45 22,3 18%
Cartilage 18 8,9 17%
Tomography 32 15,8 28%
Scaffolds 60 29,7 15%
Bone Cement 22 10,9 9%
Cardiac 13 6,4 15%
Kinematics * 5 2,5 0%
Matched Applications
© Fraunhofer ISI
Seite 15
European Research Council
15
Ex ante:
Strong focus on waste water and bentic zone issues
Emerging topics (*) addressed rarely
Ex post:
Proposals for emerging topic “radiation” highly successful
Matching 2 resul ts – Env i ronmenta l
Sc iences
Cluster Label Absolute Relative (%) Share of Success
Diversity 17 15,6 35%
Radiation * 5 4,6 80%
Nanomaterials * 4 3,7 0%
Waste Water 20 18,3 10%
Bentic Zone 23 21,1 17%
Sediments 16 14,7 13%
Air Pollution 10 9,2 10%
Soil 14 12,8 21%
Matched Applications
© Fraunhofer ISI
Seite 16
European Research Council
16
Ex ante:
Strong focus on emerging topic “State-Region”
Ex post:
High success rate in all topics
Proposals in emerging topic “State-Region” highly successful
Matching 2 resul ts – Geography
Cluster Label Absolute Relative (%) Share of Success
Living 13 10,0 31%
Natural Resources 23 17,7 35%
Community 26 20,0 31%
State-Region * 43 33,1 28%
Diversity 25 19,2 20%
Matched Applications
© Fraunhofer ISI
Seite 17
European Research Council
17
Ex ante:
Applications focusing largely on cancer issues
Emerging topic (*) not covered by applications
Ex post:
Numbers too low
Matching 2 resul ts – Obstet r ics
Cluster Label Absolute Relative (%) Share of Success
Prenatal Diagnosis * 1 2,4 0%
Cesarean 4 9,5 25%
Pregnancy Complications 1 2,4 0%
Menopause 3 7,1 33%
Surgery: Incontinence 0 0,0
Infertility 5 11,9 0%
Cancer 28 66,7 18%
Matched Applications
© Fraunhofer ISI
Seite 18
European Research Council
18
Ex ante:
Emerging topics (*) not addressed very well by applications
Ex post:
Rather high success rate for emerging topic “AIDS, Africa”, however only
low absolute numbers
Matching 2 resul ts – Publ ic Heal th
Cluster Label Absolute Relative (%) Share of Success
Living Environment * 1 2,0 0%
Aids, Africa * 6 11,8 17%
Qualtiy of Life 8 15,7 13%
Health Policy 17 33,3 12%
Tobacco 7 13,7 14%
Gender and Family 12 23,5 17%
Matched Applications
© Fraunhofer ISI
Seite 19
European Research Council
19
Advantage of the applied methodology:
Text-based links do not require parsing of documents to journal references or
individual papers.
Links can be quantified.
Links can be calculated as soon as the applications are submitted.
Possible problems:
At this moment the proposal contains all the text that has been submitted: Summary,
CV of PI, publication list, detailed description.
Some common terms (e.g. cell, migration) can distort classification.
In public health we found several proposal that were dealing with a disease from a
purely medical perspective while the topic deals with the societal impact of the
disease. Of course, they share a lot of common terms and are even in the same
context.
Possible improvements:
Separated sections or even files for the proposals.
Increase the weight of phrases in the calculation of similarities.
Manual validation cannot be eliminated.
Matching 2 – d iscuss ion of method
© Fraunhofer ISI
Seite 20
European Research Council
20
Ex ante
1. ERC grant applications can be classified according to a dynamically evolving inherent
structure of the landscape of science.
2. Evaluation panels can be organised along such structures and adjusted dynamically.
3. Thereby the structuring of grant applications and setting up of evaluation panels are
driven mainly by inherent dynamics of the science landscape and not by static external
classifications. This may allow better adjustments of procedures to the evolution of
science.
4. ERC grant applications can be mapped to dynamic fields within the landscape of
sciences.
5. ERC grant applications can be matched with thematic clusters (emerging or
established) within dynamic fields. Thereby proposals can be identified that map to
emerging topics.
6. This could be used for supporting a pre-selection of applications and the assignment
of proposals to evaluation panels.
Conclus ions f rom the user perspect ive
© Fraunhofer ISI
Seite 21
European Research Council
21
Ex post
1. Self-evaluation of selections procedures, not in a sense of simple “good” or
“bad” judgments. Rather,
facilitates the identification of differences in procedures between scientific
fields based on coverage of emerging topics and success rates for all
topics;
points to topics, where reasons for selection/non-selection may need
additional consideration.
2. Provides hints for improving procedures, e.g. organization of structure and
content of proposals in a way that facilitates routine bibliometric analysis
(separate files for scientific/technical descriptions and information related
to PI).
Conclus ions f rom the user perspect ive
© Fraunhofer ISI
Seite 22
European Research Council
Coverage of emerging
topic
Success rate in
emerging topics
Example
high high energy and fuels,
geography
high low ?
low high environmental sciences
low low biomedical engineering
Sel f evaluat ion - example
Observations:
Differences between scientific fields concerning coverage and success
rates, reasons?
If emerging topics are addressed well, success rates of those applications
are high (reverse not correct)