ERACEP: IDENTIFYING EMERGING RESEARCH AREAS IN THE ERC RESEARCH PROPOSALS

Final Workshop DBF & ERACEP CSAs, 20/21 February 2013, Brussels
European Research Council

Dr. Thomas Reiss, Etienne Vignola-Gagné, Piret Kukk (Fraunhofer ISI Karlsruhe)
Prof. Dr. Wolfgang Glänzel, Dr. Bart Thijs (KU Leuven)

© Fraunhofer ISI. Image: © Parker / SPL / Agentur Focus

Introduction

Objectives:
Identification of topically emerging research areas, and analysis of the extent to which the activities supported by the ERC cover and contribute to these research areas.

Concept:
Combination of two perspectives:
• From the ‘landscape of science’
• From ERC funding procedures

Concept

Identification of emerging fields

Landscape of Science
• 40 dynamic fields
• Selection of 20 most dynamic fields for fine mapping
• Pilot with 6 fields composed of 46 clusters
• In-depth analysis of 10 emerging clusters
• Expert validation
• Matching of grant applications

Matching

Aims:
• Explore how ERC grant applications can be mapped to the dynamic fields and clusters identified via bibliometric analyses and expert validation
• Analyse the extent to which emerging topics are addressed by ERC grant applications
• Find which proposals from the 2009 Starting Grants Call (932 proposals, 165 successful) can be classified within the emerging topics previously identified

These emerging topics should be present in the proposals made to ERC calls.

Matching sample

2009 Starting Grants Call
• 932 proposals, 165 successful
• 888 could be used for analysis

Matching approach

Two approaches:
1. Identify journals mentioned in grant applications and map them to dynamic fields defined by WoS subject categories
2. Text matching between proposals and thematic clusters via text mining

Possible value for users

Ex ante:
• Classify applications into the structure inherent to the landscape of science
• Preselect proposals: emerging or established
• Provide a bibliometric mirror for qualitative expert assessment

Ex post:
• Reflection on selection procedure

Matching 1: Procedure for assigning WoS Subject Categories to proposals

Fictive example (diagram): Proposal 99999

Journals found in the proposal by text mining:
• Journal of Optics
• Angewandte Chemie Int.

WoS subject categories assigned to the proposal:
1. Optics
2. Physics, Applied
3. Physics, Atomic, Molecular & Chemical
4. Chemistry, Physical
5. Chemistry, Inorganic & Nuclear
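The assignment procedure in the fictive example can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the lookup table `JOURNAL_CATEGORIES`, its journal-to-category assignments and the function name are hypothetical, and a real run would need the actual WoS journal classification.

```python
# Hypothetical lookup table: journal title -> WoS subject categories.
# The assignments are illustrative only and are NOT taken from WoS.
JOURNAL_CATEGORIES = {
    "Journal of Optics": ["Optics", "Physics, Applied"],
    "Angewandte Chemie Int.": ["Chemistry, Physical"],
}


def categories_for_proposal(text, journal_categories):
    """Return the sorted subject categories of every journal title found in text."""
    lowered = text.lower()
    found = set()
    for journal, categories in journal_categories.items():
        # Plain substring search: simple, but it also fires when the title
        # words appear outside a reference, which produces false positives.
        if journal.lower() in lowered:
            found.update(categories)
    return sorted(found)


proposal_text = "Selected publication: Smith, J. (2008). Journal of Optics 10, 1-5."
print(categories_for_proposal(proposal_text, JOURNAL_CATEGORIES))
```

The plain substring search is deliberately naive: it matches a journal title wherever the words occur, whether or not they belong to a cited article.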

Matching 1 – opportunities and limitations

Opportunities:
• Allows automated construction of a sample of proposals belonging to selected WoS Subject Categories (and eventually, emerging topics)

Limitations:
• False positives mean that additional qualitative work is required to finalise the sample (mitigated by a low rate of false negatives)
• Text mining also captures keywords that are contained in journal titles but may not refer to articles in the proposal
• Previous publications listed in the proposals do not always indicate the content of the project presented therein
• In principle, the publications cited might tend towards the mainstream

Focus on second matching approach

Matching 2 – Text Matching of Applications and Scientific Papers

The text matching is built on a vector space model. This ‘Vector Space Model’ (VSM) is most appropriate for representing the content of documents (Salton, 1975).

Each document is a vector with its own position in a space. Each vector has as many dimensions as there are terms. If a term occurs in a document, the corresponding value in that document's vector is non-zero.

These values are defined as
TF-IDF = term frequency * inverse document frequency.
This allows down-weighting of common words, as they appear in many documents.

Similarities between proposals and papers are calculated as the cosine of the angle between the TF-IDF-weighted vectors that represent each document in the term-based vector space.
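A minimal sketch of TF-IDF weighting and cosine similarity, for illustration only. The slides do not state which TF-IDF variant was used; the smoothed form idf = 1 + ln(N / df) is an assumption chosen here so that every occurring term keeps a non-zero weight. The function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        term_freq = Counter(tokens)
        # Assumed variant: tf * (1 + ln(N / df)); common terms get lower weight.
        vectors.append(
            {t: tf * (1.0 + math.log(n_docs / doc_freq[t])) for t, tf in term_freq.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data: one "proposal" and two "papers" (made up for this sketch).
docs = [
    ["fuel", "cell", "membrane"],
    ["fuel", "cell", "catalyst"],
    ["gender", "family", "income"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shares two terms: positive similarity
print(cosine(vecs[0], vecs[2]))  # shares no term: zero similarity
```

Documents that share no term have a similarity of exactly zero, which is why the mapping step can treat "shares at least one term" and "non-zero similarity" as the same condition.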

Matching 2 – Text Matching of Applications and Scientific Papers (continued)

The mapping of the proposals to topics or detected clusters is based on the average similarity that an application has to all the linked papers in one cluster. A paper is linked to a proposal if they share at least one term, so the average is taken over all non-zero similarities.

The threshold for mapping an application to a cluster is set at an average similarity of at least 0.025.

298 of the 888 processed applications (about one third) are matched with at least one cluster. 173 applications have multiple assignments.

Example: Application 241161, entitled ‘Intergenerational correlations of schooling, income and health: an investigation of the underlying mechanisms’, is mapped to 4 clusters:
• Quality of Life
• Health Policy
• Tobacco
• Gender and Family
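The mapping rule (average over non-zero similarities, threshold of 0.025) could be sketched as below. Only the threshold value comes from the slide; the function name, the data layout and the similarity values are made up for illustration.

```python
THRESHOLD = 0.025  # minimum average similarity, as stated on the slide


def map_to_clusters(similarities_by_cluster, threshold=THRESHOLD):
    """Map one application to clusters.

    similarities_by_cluster: {cluster label: [cosine similarity to each paper]}.
    Returns {cluster label: average similarity} for clusters passing the threshold.
    """
    matched = {}
    for cluster, sims in similarities_by_cluster.items():
        # A paper is "linked" when it shares at least one term with the
        # application, i.e. when its similarity is non-zero.
        linked = [s for s in sims if s > 0]
        if not linked:
            continue
        average = sum(linked) / len(linked)
        if average >= threshold:
            matched[cluster] = average
    return matched


# Invented similarity values for a single application.
example = {
    "Health Policy": [0.0, 0.04, 0.06],
    "Tobacco": [0.0, 0.01, 0.02],
    "Coal": [0.0, 0.0],
}
print(map_to_clusters(example))
```

Because zero similarities are excluded from the average, a cluster with a few closely related papers can pass the threshold even if most of its papers share no term with the application.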

Matching 2 – results

Share of proposals by number of cluster assignments (pie chart):
• No assignment: 66%
• Unique assignment: 14%
• Multiple assignments: 20%

One third of all proposals matched at least one cluster within the 6 selected dynamic fields (out of 20).
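As a quick arithmetic check, the shares can be recomputed from the counts reported for the matching (888 processed applications, 298 matched to at least one cluster, 173 of them with multiple assignments). The variable names are illustrative; the reported percentages agree with these values up to rounding.

```python
processed = 888   # applications that could be used for analysis
matched = 298     # matched with at least one cluster
multiple = 173    # matched with more than one cluster

unique = matched - multiple      # exactly one cluster
unmatched = processed - matched  # no cluster at all

shares = {
    "no assignment": round(100 * unmatched / processed, 1),
    "unique assignment": round(100 * unique / processed, 1),
    "multiple assignments": round(100 * multiple / processed, 1),
}
for label, share in shares.items():
    print(f"{label}: {share}%")
```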

Matching 2 results – Energy and fuels

Ex ante:
• Emerging topics (*) addressed well by proposals
• More conventional topics (e.g. “oil”, “battery”, “coal”) less frequent in proposals

Ex post:
• Highest success rate for “renewables”
• However, proposals in emerging topics (*) are also highly successful

Cluster label             Matched applications      Evaluation
                          Absolute   Relative (%)   Share of success
Oil, Bio Oil                     4            3.2   0%
Batteries, Storage               5            4.0   0%
Renewable Energy                14           11.3   43%
BioFuel *                       21           16.9   14%
Kinetics                        22           17.7   9%
Fuel Cells *                    30           24.2   20%
Coal                            10            8.1   0%
Combustion                      18           14.5   11%
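The "Relative (%)" column appears to be each cluster's share of all cluster assignments within the field. This reading is inferred from the numbers, not stated on the slide. A short check with the Energy and fuels counts reproduces the published column:

```python
# Absolute numbers of matched applications per cluster (Energy and fuels).
absolute = {
    "Oil, Bio Oil": 4,
    "Batteries, Storage": 5,
    "Renewable Energy": 14,
    "BioFuel *": 21,
    "Kinetics": 22,
    "Fuel Cells *": 30,
    "Coal": 10,
    "Combustion": 18,
}

total = sum(absolute.values())
# Assumed definition: cluster count divided by the sum over all clusters.
relative = {label: round(100 * n / total, 1) for label, n in absolute.items()}

for label, share in relative.items():
    print(f"{label}: {share}")
```

Since an application can be assigned to several clusters, the total counts assignments rather than distinct applications.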

Matching 2 results – Biomedical Engineering

Ex ante:
• Strong focus on “scaffolds” and “nanostructuring”
• Emerging topics (*) addressed rarely

Ex post:
• By far the highest success rate for imaging approaches
• Proposals in emerging topics (*) not successful; possibly a small-number effect?

Cluster label             Matched applications      Evaluation
                          Absolute   Relative (%)   Share of success
Catheter Ablation                3            1.5   0%
Brain Computer *                 4            2.0   0%
Nanostructured                  45           22.3   18%
Cartilage                       18            8.9   17%
Tomography                      32           15.8   28%
Scaffolds                       60           29.7   15%
Bone Cement                     22           10.9   9%
Cardiac                         13            6.4   15%
Kinematics *                     5            2.5   0%

Matching 2 results – Environmental Sciences

Ex ante:
• Strong focus on waste water and benthic zone issues
• Emerging topics (*) addressed rarely

Ex post:
• Proposals for emerging topic “radiation” highly successful

Cluster label             Matched applications      Evaluation
                          Absolute   Relative (%)   Share of success
Diversity                       17           15.6   35%
Radiation *                      5            4.6   80%
Nanomaterials *                  4            3.7   0%
Waste Water                     20           18.3   10%
Benthic Zone                    23           21.1   17%
Sediments                       16           14.7   13%
Air Pollution                   10            9.2   10%
Soil                            14           12.8   21%

Matching 2 results – Geography

Ex ante:
• Strong focus on emerging topic “State-Region”

Ex post:
• High success rate in all topics
• Proposals in emerging topic “State-Region” highly successful

Cluster label             Matched applications      Evaluation
                          Absolute   Relative (%)   Share of success
Living                          13           10.0   31%
Natural Resources               23           17.7   35%
Community                       26           20.0   31%
State-Region *                  43           33.1   28%
Diversity                       25           19.2   20%

Matching 2 results – Obstetrics

Ex ante:
• Applications focusing largely on cancer issues
• Emerging topic (*) not covered by applications

Ex post:
• Numbers too low

Cluster label             Matched applications      Evaluation
                          Absolute   Relative (%)   Share of success
Prenatal Diagnosis *             1            2.4   0%
Cesarean                         4            9.5   25%
Pregnancy Complications          1            2.4   0%
Menopause                        3            7.1   33%
Surgery: Incontinence            0            0.0   -
Infertility                      5           11.9   0%
Cancer                          28           66.7   18%

Matching 2 results – Public Health

Ex ante:
• Emerging topics (*) not addressed very well by applications

Ex post:
• Rather high success rate for emerging topic “AIDS, Africa”, but only low absolute numbers

Cluster label             Matched applications      Evaluation
                          Absolute   Relative (%)   Share of success
Living Environment *             1            2.0   0%
AIDS, Africa *                   6           11.8   17%
Quality of Life                  8           15.7   13%
Health Policy                   17           33.3   12%
Tobacco                          7           13.7   14%
Gender and Family               12           23.5   17%

Matching 2 – discussion of method

Advantages of the applied methodology:
• Text-based links do not require parsing documents for journal references or individual papers.
• Links can be quantified.
• Links can be calculated as soon as the applications are submitted.

Possible problems:
• At present, the proposal text used contains everything that was submitted: summary, CV of the PI, publication list and detailed description.
• Some common terms (e.g. cell, migration) can distort classification.
• In public health, we found several proposals that dealt with a disease from a purely medical perspective, while the topic deals with the societal impact of the disease. Naturally, the two share many terms and even appear in the same context.

Possible improvements:
• Separate sections or even separate files for the parts of a proposal.
• Increase the weight of phrases in the calculation of similarities.

Manual validation cannot be eliminated.

Conclusions from the user perspective

Ex ante
1. ERC grant applications can be classified according to a dynamically evolving inherent structure of the landscape of science.
2. Evaluation panels can be organised along such structures and adjusted dynamically.
3. Thereby the structuring of grant applications and the setting up of evaluation panels are driven mainly by the inherent dynamics of the science landscape and not by static external classifications. This may allow better adjustment of procedures to the evolution of science.
4. ERC grant applications can be mapped to dynamic fields within the landscape of science.
5. ERC grant applications can be matched with thematic clusters (emerging or established) within dynamic fields. Thereby proposals that map to emerging topics can be identified.
6. This could be used to support a pre-selection of applications and the assignment of proposals to evaluation panels.

Conclusions from the user perspective (continued)

Ex post
1. Self-evaluation of selection procedures, not in the sense of simple “good” or “bad” judgments. Rather, it
   • facilitates the identification of differences in procedures between scientific fields, based on coverage of emerging topics and success rates for all topics;
   • points to topics where the reasons for selection or non-selection may need additional consideration.
2. Provides hints for improving procedures, e.g. organising the structure and content of proposals in a way that facilitates routine bibliometric analysis (separate files for scientific/technical descriptions and for information related to the PI).

Self-evaluation: example

Coverage of emerging topics   Success rate in emerging topics   Example
high                          high                              energy and fuels, geography
high                          low                               ?
low                           high                              environmental sciences
low                           low                               biomedical engineering

Observations:
• Differences between scientific fields concerning coverage and success rates; reasons?
• If emerging topics are addressed well, success rates of those applications are high (the reverse does not hold)