
Transcript

Toward Alternative Measures for Ranking Venues: A Case of Database Research Community

Su Yan - IBM Almaden Research Lab
Dongwon Lee - The Pennsylvania State University

Su Yan, Dongwon Lee, Toward Alternative Measures for Ranking Venues: A Case of Database Research Community, ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2007, pp. 235-244

Introduction

Publication venue ranking asks questions such as: “How good is a journal X?” “Is a conference X better than Y?”

Publication venue ranking is often closely related to important issues:
Evaluating the contributions of individual scholars and research groups
Subscription decision making in libraries

Motivation 1/2 – Citation free?

Do we have to use citation analysis for venue ranking? Various other kinds of metadata exist.
Citation metadata are harder to extract and parse, and contain more errors.
Metadata such as author names also convey important information.

Goal: define metrics without using citation analysis
Easy to implement
Metadata quality is trustworthy, enabling large-scale venue ranking using automatically extracted, clean metadata

Motivation 2/2 – Enhance citation

Existing citation-based methods tend to consider only explicit citation relationships, yet the decision to cite a reference depends on many factors.

Most citation-based methods focus on ranking journals, but conferences are becoming more important, e.g., in CS.

Citation patterns differ between journals and conferences, so directly applying existing journal-analysis methodology to conferences may not be appropriate.

Conferences Are Becoming Important in CS

The NRC recently ranked all Ph.D. programs in the US.

For computer science, the CRA convinced the NRC to adopt new data sources: originally relying on citations from the Thomson Reuters Web of Science database, the ranking now counts publications in journals and conferences, based on CVs submitted by faculty members.

http://www.cra.org/resources/crn-online-view/reinvigorating_the_field/

Motivation 2/2 – Enhance citation


Yearly distribution of the number of ACM TODS vs. SIGMOD papers cited in 2002

Motivation 2/2 – Enhance citation

Recap: existing citation-based methods tend to consider only explicit citation relationships; most citation-based methods focus on ranking journals; and citation patterns differ between journals and conferences.

Goals:
1. Enhance citation analysis with other types of metadata, exploring latent citation relationships
2. Define new metrics that do not differentiate journals from conferences: a unified framework to evaluate diverse publication venues

Ranking as a Top-K Problem

It is difficult to rank venues in a total order.

In practice, people are more interested in the question: “What are the top-k venues in the field f?”

The question can be answered if two sub-questions can be answered:
S1: What is the set of good articles, Seedp?
S2: What are the top-k venues that are most similar in quality to Seedp?

Evaluate a Venue

Goodness of a venue = an aggregate (sum, avg, max, etc.) of the goodness of the articles published in it.
E.g., a venue a is “better” than a venue b if a has more “good” articles than b has.
Various definitions of the goodness of an article can be adopted (a small sketch follows below).

S1: What is the set of good articles, Seedp?
S2: What are the top-k venues that are most similar in quality to Seedp?
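A minimal Python sketch of this aggregation, assuming per-article goodness scores are already computed; the scores, venues, and the venue_goodness helper are illustrative, not the paper's implementation.

```python
from statistics import mean

def venue_goodness(article_scores, aggregate="sum"):
    """Aggregate per-article goodness scores into a single venue score."""
    if not article_scores:
        return 0.0
    if aggregate == "sum":
        return float(sum(article_scores))
    if aggregate == "avg":
        return float(mean(article_scores))
    if aggregate == "max":
        return float(max(article_scores))
    raise ValueError(f"unknown aggregate: {aggregate}")

# Venue a is "better" than venue b if its aggregated score is higher.
a_scores = [1, 1, 0, 1]  # hypothetical per-article goodness scores in venue a
b_scores = [1, 0, 0]     # hypothetical per-article goodness scores in venue b
print(venue_goodness(a_scores) > venue_goodness(b_scores))  # True
```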

Sub-question S1

Find the initial collection of good articles, Seedp.

Possible solutions:
The user provides the seed
Use accumulated citation count information (sketched below)

Hypothesis 1: There are a number of good articles in each subject field that most people agree on (denoted as Seedp).

S1: What is the set of good articles, Seedp ?
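For the citation-count option, forming Seedp could look like the sketch below; the select_seed helper, the input format, and the cutoff n are assumptions for illustration, not values from the paper.

```python
def select_seed(citation_counts, n=100):
    """Pick the n most-cited articles as the seed set Seedp.

    citation_counts: dict mapping article id -> accumulated citation count.
    """
    ranked = sorted(citation_counts, key=citation_counts.get, reverse=True)
    return set(ranked[:n])

# Hypothetical usage:
# seed_p = select_seed({"p1": 320, "p2": 15, "p3": 210}, n=2)  # {"p1", "p3"}
```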

Sub-question S2

Two types of solutions to S2:

1. Seed-based measures, using author-name metadata, which is easier to extract and is cleaner and more trustworthy

2. Browsing-based measure: model readers' paper-browsing behavior and rank venues from the readers' perspective

S2: What are the top-k venues that are most similar in quality to Seedp?

1. Seed-based measure

Goodness of an article (see the sketch after this list):
Naïve: good if authored by a SeedA member
Fair: better if authored by more SeedA members
Unfair: better if authored by more “productive” SeedA members (unfair to amateur authors)

Hypothesis 2: Authors of the seed articles Seedp are authoritative authors (denoted as SeedA) and are likely to produce good-quality articles.
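A hedged Python sketch of the three seed-based article scores; the function names and the productivity dictionary are illustrative stand-ins, not the paper's exact definitions.

```python
def naive_score(article_authors, seed_authors):
    """Naïve: an article is good (1) if at least one author is in SeedA."""
    return 1.0 if set(article_authors) & seed_authors else 0.0

def fair_score(article_authors, seed_authors):
    """Fair: an article is better if it is authored by more SeedA members."""
    return float(len(set(article_authors) & seed_authors))

def unfair_score(article_authors, seed_authors, productivity):
    """Unfair: weight each SeedA co-author by a productivity measure
    (hence unfair to amateur authors)."""
    return float(sum(productivity.get(a, 0.0)
                     for a in set(article_authors) & seed_authors))
```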

2. Browsing-based measure
Model readers' article-browsing behavior.

After reading an interesting article pj, how does a reader find the next article to read?
1. Pick one from the references of pj
2. Pick another article by the same author as pj
3. In both cases, an article by a SeedA member is picked with higher probability

The article browsing model (a simplified sketch follows below)
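A simplified sketch of the browsing model as a single biased browsing step; the 50/50 split between the two cases and the seed_boost weight are assumptions for illustration, not the paper's parameters.

```python
import random

def next_article(pj, references, by_author, authors, seed_authors, seed_boost=2.0):
    """Pick the next article a reader browses to after reading pj.

    references: dict article -> list of articles it cites
    by_author:  dict author  -> list of articles they wrote
    authors:    dict article -> list of its authors
    seed_authors: the SeedA set; seed_boost biases the pick toward their articles.
    """
    if random.random() < 0.5:        # case 1: follow one of pj's references
        candidates = list(references.get(pj, []))
    else:                            # case 2: another article by one of pj's authors
        candidates = [p for a in authors.get(pj, [])
                      for p in by_author.get(a, []) if p != pj]
    if not candidates:
        return None
    # case 3: in both cases, articles with a SeedA author get a higher weight
    weights = [seed_boost if set(authors.get(p, [])) & seed_authors else 1.0
               for p in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]
```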

Experiment Setup

To measure the performance of the metrics, we need:

A baseline method to compare with:
ISI impact factor, IFx(2003) = A / B, where
A = # of times that articles published in 2001-2002 were cited in indexed journals during 2003
B = total # of articles published in 2001-2002
(a small computation sketch follows below)

A clean data set to work with – the DBLP-ACM clean dataset:
Link DBLP and ACM records using titles (and ISBNs if available)
Remove conflicting authors and venues
Hand-picked database-related publication venues
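The baseline impact factor is a simple ratio; a small sketch follows, with hypothetical counts rather than the paper's data.

```python
def impact_factor_2003(citations_in_2003, articles_2001_2002):
    """ISI-style impact factor: IFx(2003) = A / B."""
    if articles_2001_2002 == 0:
        return 0.0
    return citations_in_2003 / articles_2001_2002

# e.g. 480 citations in 2003 to 200 articles published in 2001-2002 -> IF = 2.4
print(impact_factor_2003(480, 200))
```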

Venue ranking results 1

Ranking results (Seed = VLDB conference)

Venue ranking results 2

Ranking results (Seed = SIGMOD conference)

Significance test against IF’

Significance test against the modified IF measure

(α = 0.01, ρP = 0.354, t = 2.396)

(56 venues, t-test using Pearson's critical value)

H0: There is no strong positive rank-order relationship between the naïve/fair/unfair seed-based or browsing-based measure results and the impact factor results.
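The paper reports a t-test against Pearson's critical value on 56 venues; as an illustrative stand-in, the sketch below uses SciPy's Spearman rank correlation (a rank-order correlation with a t-based p-value) on placeholder rankings, not the paper's data.

```python
from scipy.stats import spearmanr

# Placeholder rankings over the same venues (not the paper's data).
proposed_rank = [1, 2, 3, 5, 4, 6, 8, 7]  # ranks by one of the new measures
if_rank       = [2, 1, 3, 4, 5, 7, 6, 8]  # ranks by the (modified) impact factor

rho, p_value = spearmanr(proposed_rank, if_rank)
alpha = 0.01
# Reject H0 (no strong positive rank-order relationship) when the correlation
# is positive and significant at level alpha.
print(f"rho = {rho:.3f}, p = {p_value:.4f}, reject H0: {rho > 0 and p_value < alpha}")
```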

Conclusion

It is possible to evaluate venues with easier-to-obtain metadata, such as author names

Citation analysis can be enhanced by using other types of meta data

Conferences are becoming more important and should be treated as such

Fundamental differences in citation patterns may exist between journals and conferences

New metrics can be defined that provide a unified framework to evaluate diverse publication venues

THANK YOU!

Su Yan, Dongwon Lee, Toward Alternative Measures for Ranking Venues: A Case of Database Research Community, ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2007, pp. 235-244

Venue ranking results 3

Venue ranking by IF measures

Venue ranking by the IF measure
Venue ranking by the modified IF measure

Significance test against the EIC ranking
Estimated Impact of Conference (EIC):

http://www.cs-conference-ranking.org/

Top 20 venues by the EIC measure

Significance test against the EIC ranking

(critical value of Spearman's rank correlation coefficient ρs = 0.534 for 20 pairs)
