A New Approach for Automated Author Discipline Categorization and Evaluation of Cross-Disciplinary Collaborations for Grant Programs

Ilya Ponomarev¹, Pawel Sulima¹, Jodi Basner¹, Unni Jensen¹, Joshua Schnell¹, Karen Jo², and Nicole Moore²

ilya.ponomarev@thomsonreuters.com

¹Custom Analytics, Rockville, MD
²National Cancer Institute, Bethesda, MD

10/16/2013 5:30 PM

Why Cross-disciplinary Research?


“Interdisciplinary research can be one of the most productive and inspiring of human pursuits”

Facilitating Interdisciplinary Research, National Academy of Sciences, 2005

• Innovation increasingly occurs at the boundaries of disciplines
• Complex “puzzles” require diverse backgrounds
• The data avalanche from multiple sources requires fusion of information
• Convergent technologies require integration across disciplines

US Government Funding of Cross-disciplinary R&D


DOD, DOE, NSF, NIH, NASA

How to Measure the Success of a Cross-disciplinary Program?

THIS TALK:

1. To measure cross-disciplinarity, define disciplines as accurately as possible

2. A general approach for automatically assigning grant-specific categories to papers and people

3. Application to classification in the NCI PS-OC grant program

See also J. Basner, “Evaluating Collaboration and Outcomes of Health Research”, Friday, 10/18/2013, 11:00am, Gunston East Room

NCI Physical Sciences-Oncology Centers


12 centers, 250 researchers

09/2009 to present

Institute Facilitate Generate

Evaluation: Bird's-Eye View

1. Use publications as a proxy for outcomes

2006-2008: 3,367 pubs

2009-2012: 601 reported pubs

2. Compare the baseline data set (2006-2008) with the ongoing-research data set (2009-2012)

Web of Science + Medline

166 active PS-OC investigators
202,000 references
4,199 journal titles

Productivity, impact, collaboration, field convergence

J. Basner, Friday, 10/18/2013

Evaluation: Bird's-Eye View

Approach:

PS-OC 3 broad categories: Oncology, Physical Sciences, Life Sciences

266 Web of Science Journal Subject Categories

• Has an Oncology SC
• Multiple SCs per journal (up to 7)
• A Multidisciplinary SC (meaningless for categorization, but it holds “Science” and “Nature”)
• Some SCs are already interdisciplinary
• LS dominates after aggregation

22 ESI Subject Categories

• One SC per journal
• No Oncology SC
• A Multidisciplinary SC also exists
• Clinical Medicine?
• LS dominates after aggregation

Mapping Challenges

Approach:

1. Intermediate mapping onto 6 extended broad categories

2. Paper-level SC assignment based on references

Figure: Web of Science 266 journal SCs (has an Oncology SC; multiple SCs per journal) and Web of Science 22 broad ESI categories (one SC per journal; no Oncology SC) versus the PS-OC 3 broad categories (Oncology, Physical Sciences, Life Sciences)

Step 1. Introduce 6 Intermediate PS-OC Categories for Better Selection:


PS – Physical Sciences

LS – Life Sciences

OC – Oncology

MED – Medicine

OTH – Others

MULT – Multidisciplinary

(MED journals are very often closer to OC than to LS)

(Will be dropped at the final stage)

Step 2. Map 265 WoS Journal SCs to 6 PS-OC Categories:

Examples:

a) Obvious: Acoustics → PS; Chemistry, Analytical → PS

Oncology → OC; Management → OTH

b) Dominant: Biophysics → PS

c) Dominant: Physics, Multidisciplinary → PS

d) Meaningless: Multidisciplinary → MULT (articles usually published in “Nature”, “Science”, or “PNAS”)

Meaningless for PS-OC category assignment: an article published in a MULT journal can be about PS, LS, or OC, and it is usually not an interdisciplinary article. The article’s research field must therefore be reclassified based on its references.
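The Step 2 lookup can be sketched as a plain dictionary. This is a minimal Python illustration, not the authors' code: the table SC_TO_PSOC and the function map_subject_categories are hypothetical, and the table holds only the example SCs named on these slides, whereas the real mapping covers every WoS subject category.

```python
# Hypothetical excerpt of the Step 2 lookup table. Only the examples named
# on the slides are included; the full table covers all WoS subject categories.
SC_TO_PSOC = {
    "Acoustics": "PS",                   # obvious
    "Chemistry, Analytical": "PS",       # obvious
    "Oncology": "OC",                    # obvious
    "Management": "OTH",                 # obvious
    "Biophysics": "PS",                  # dominant field
    "Physics, Multidisciplinary": "PS",  # dominant field
    "Multidisciplinary": "MULT",         # meaningless; resolved later via references
    "Biology": "LS",
    "Radiology, NM": "PS",
}


def map_subject_categories(subject_categories):
    """Map a journal's WoS subject categories to PS-OC categories.

    Raises KeyError for an SC that is missing from this partial table.
    """
    return [SC_TO_PSOC[sc] for sc in subject_categories]


print(map_subject_categories(["Biology", "Biophysics", "Radiology, NM"]))
# ['LS', 'PS', 'PS']
```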

Step 3. Assign PS-OC Categories Weights to Each Journal

(Journals in WoS can have 1, 2, 3, … or even 7 SCs)

Examples:

Journal “Radiation Research” – 3 SCs:

Biology → LS; Biophysics → PS; Radiology, NM → PS

Map, then select distinct PS-OC categories: LS, PS

Count the total (denominator): 2

Weights: LS = 1/2, PS = 1/2, OC = 0, MED = 0, MULT = 0, OTH = 0

Each journal should be counted equally
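A minimal sketch of the Step 3 weighting, assuming the journal's SCs have already been mapped to PS-OC categories; the function name journal_weights is hypothetical. Duplicate categories are collapsed before counting, so each journal's weights sum to 1 and every journal counts equally.

```python
from fractions import Fraction

PSOC_CATEGORIES = ("LS", "PS", "OC", "MED", "MULT", "OTH")


def journal_weights(psoc_categories):
    """Step 3 sketch: equal weight for each distinct PS-OC category of a journal."""
    distinct = set(psoc_categories)      # "select distinct PS-OC categories"
    denominator = len(distinct)          # "count total (denominator)"
    return {
        cat: Fraction(1, denominator) if cat in distinct else Fraction(0)
        for cat in PSOC_CATEGORIES
    }


# "Radiation Research": Biology -> LS, Biophysics -> PS, Radiology, NM -> PS
w = journal_weights(["LS", "PS", "PS"])
print({k: str(v) for k, v in w.items()})
# {'LS': '1/2', 'PS': '1/2', 'OC': '0', 'MED': '0', 'MULT': '0', 'OTH': '0'}
```

Exact fractions are used here only so the 1/2 weights from the slide print as written; floats would work equally well.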

Step 4. Calculate combined J-R weights for publications:

Example:

Coffey D., Getzenberg R., JAMA, 2006: 1 journal category (MED = 1), 26 references

Journal weights: LS = 0, PS = 0, MED = 1, OC = 0, MULT = 0, OTH = 0

Average reference weights: LS = 0.23, PS = 0.04, MED = 0.17, OC = 0.36, MULT = 0.19, OTH = 0

Combined weights, ½ (Journal + Refs): LS = 0.12, PS = 0.019, MED = 0.58, OC = 0.18, MULT = 0.1, OTH = 0

Better assignment of a paper’s field, based on information about what the paper cites
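Step 4 as a sketch: a paper's combined J-R weights are the mean of its journal weights and the category-wise average of its references' weights. The function names are hypothetical, and the fallback to journal weights for a paper with no references is an assumption the slides do not address. The usage feeds in the slide's already-averaged reference weights for the JAMA example, which reproduces the slide's combined weights up to rounding.

```python
PSOC_CATEGORIES = ("LS", "PS", "OC", "MED", "MULT", "OTH")


def average_weights(weight_dicts):
    """Category-wise mean of a non-empty list of PS-OC weight dicts."""
    n = len(weight_dicts)
    return {c: sum(w.get(c, 0.0) for w in weight_dicts) / n for c in PSOC_CATEGORIES}


def combined_jr_weights(journal_w, reference_ws):
    """Step 4 sketch: 1/2 * (journal weights + average reference weights)."""
    if not reference_ws:
        # Assumption: with no references, fall back to the journal weights.
        return {c: journal_w.get(c, 0.0) for c in PSOC_CATEGORIES}
    refs = average_weights(reference_ws)
    return {c: 0.5 * (journal_w.get(c, 0.0) + refs[c]) for c in PSOC_CATEGORIES}


# JAMA example: one journal category (MED) plus the averaged weights of its
# 26 references as reported on the slide, passed as a single pre-averaged entry.
journal = {"MED": 1.0}
refs_avg = {"LS": 0.23, "PS": 0.04, "MED": 0.17, "OC": 0.36, "MULT": 0.19, "OTH": 0.0}
combined = combined_jr_weights(journal, [refs_avg])
for cat, value in combined.items():
    print(f"{cat} = {value:.3f}")
```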

Step 5. Collect all publications for each investigator, calculate average weights, and rank PS-OC categories:


Example.

David A, 8 pubs:

Averaged J-R weights: LS = 0.32, PS = 0.04, MED = 0.23, OC = 0.41, OTH = 0.01

Person Inter-disciplinarity

Ranks: LS = 2, PS = 4, MED = 3, OC = 1, OTH = 5

3
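Step 5 can be sketched as averaging the combined weights over an investigator's papers and ranking the categories, with rank 1 for the largest average. The function name is hypothetical and tie-breaking by category name is an assumption. Because David A's per-paper weights are not shown, the usage passes the slide's averaged weights as a single entry just to illustrate the ranking.

```python
def investigator_profile(paper_weights):
    """Step 5 sketch: average per-paper J-R weights and rank the categories.

    paper_weights: non-empty list of dicts mapping PS-OC category -> weight.
    Returns (averages, ranks), where rank 1 is the largest average weight.
    """
    categories = sorted({c for w in paper_weights for c in w})
    n = len(paper_weights)
    averages = {c: sum(w.get(c, 0.0) for w in paper_weights) / n for c in categories}
    # Assumption: ties are broken alphabetically by category name.
    order = sorted(categories, key=lambda c: (-averages[c], c))
    ranks = {c: position for position, c in enumerate(order, start=1)}
    return averages, ranks


# Slide values for "David A" (already averaged over 8 papers).
david_a = {"LS": 0.32, "PS": 0.04, "MED": 0.23, "OC": 0.41, "OTH": 0.01}
avg, ranks = investigator_profile([david_a])
print(ranks)  # {'OC': 1, 'LS': 2, 'MED': 3, 'PS': 4, 'OTH': 5}
```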

Step 6. Redistribute MED and OTH weights among OC, LS, and PS

Before: LS = 0.32, PS = 0.04, MED = 0.23, OC = 0.41, OTH = 0.01

After: LS = 0.4, PS = 0.05, OC = 0.55
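The slides do not state how the MED and OTH weights are redistributed. One plausible rule, assumed here, is to drop MED and OTH and renormalize the LS, PS, and OC weights to sum to 1 (equivalent to proportional redistribution when all weights sum to 1). Applied to David A's averages it gives LS ≈ 0.42, PS ≈ 0.05, OC ≈ 0.53, close to but not identical to the slide's values, so the actual rule may differ. The function name redistribute is hypothetical.

```python
def redistribute(weights, keep=("LS", "PS", "OC")):
    """Step 6 sketch (assumed rule): drop MED/OTH and renormalize LS, PS, OC.

    Each kept weight is divided by the total of the kept weights, so the
    result sums to 1.
    """
    kept_total = sum(weights.get(c, 0.0) for c in keep)
    if kept_total == 0:
        raise ValueError("no LS/PS/OC weight to redistribute onto")
    return {c: weights.get(c, 0.0) / kept_total for c in keep}


david_a = {"LS": 0.32, "PS": 0.04, "MED": 0.23, "OC": 0.41, "OTH": 0.01}
result = redistribute(david_a)
print({c: round(v, 2) for c, v in result.items()})  # {'LS': 0.42, 'PS': 0.05, 'OC': 0.53}
```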

Validation

At the beginning of the program, investigators self-nominated as oncologists or physicists
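A sketch of a per-class precision-recall check of predicted disciplines against the self-nominations. The labels below are toy data, not program results; how a predicted label is derived from an investigator's weights (for example, whichever of OC and PS is larger) is an assumption, and the function name precision_recall is hypothetical.

```python
def precision_recall(self_labels, predicted_labels, positive):
    """Per-class precision and recall of predicted labels vs. self-nominations."""
    pairs = list(zip(self_labels, predicted_labels))
    tp = sum(1 for s, p in pairs if p == positive and s == positive)
    fp = sum(1 for s, p in pairs if p == positive and s != positive)
    fn = sum(1 for s, p in pairs if p != positive and s == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical): self-nominated vs. predicted discipline per investigator.
self_labels = ["OC", "OC", "PS", "PS"]
predicted = ["OC", "PS", "PS", "PS"]
print(precision_recall(self_labels, predicted, "OC"))  # (1.0, 0.5)
```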

Applications: how publication patterns change


Future Development

Figure: network of PS-OC Network Investigators and Outside Network Co-authors; node types: Physical Scientist, Oncologist, Life Scientist

Conclusions


• Automated approach for decomposing scientific publications into grant-specific discipline categories
• Multi-step method with intermediate mapping
• Weighted SC assignment based on the article’s and its references’ SCs
• Precision-recall validation based on investigators’ self-categorizations
• Oncologists within the NCI’s PS-OC program are publishing more physical sciences research, and physical scientists are publishing more oncology or life sciences research, during their years of program participation.

ilya.ponomarev@thomsonreuters.com

Thomson Reuters, Custom Analytics

Rockville, MD

SUPPORTING SLIDES