1. Toward Alternative Measures for Ranking Venues: A Case of Database Research Community
Su Yan - IBM Almaden Research Lab
Dongwon Lee - The Pennsylvania State University
Su Yan, Dongwon Lee, "Toward Alternative Measures for Ranking Venues: A Case of Database Research Community", ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), 2007, pp. 235-244

2. Introduction
Publication venue ranking
How good is a journal X?
Is a conference X better than Y?
Publication venue ranking is often closely related to important issues:
Evaluating the contribution of individual scholars/research groups
Subscription decision making in libraries
3. Motivation 1/2: Citation-free?
Do we have to use citation analysis in venue ranking?
Various types of metadata exist
Citation metadata are harder to extract and parse, and contain more errors
Metadata such as author names also convey important information
Goal: define metrics without using citation analysis
Easy to implement
Trustworthy metadata quality: enables large-scale venue ranking using automatically extracted, clean metadata
4. Motivation 2/2: Enhance citation
Existing citation-based methods tend to consider only the explicit citation relationship
The decision to cite a paper depends on many factors
Most citation-based methods focus on the ranking of journals
Conferences are becoming more important, e.g., in CS
Citation patterns differ between journals and conferences
Directly applying existing journal analysis methodology to conferences may not be appropriate
5. Conferences are becoming important in CS
http://www.cra.org/resources/crn-online-view/reinvigorating_the_field/
The NRC recently ranked all Ph.D. programs in the US
For computer science:
The CRA convinced the NRC to adopt new data sources
Originally: citations from the Thomson Reuters Web of Science database
Now: publications in journals and conferences, counted from CVs submitted by faculty members
7. Yearly distribution of the number of ACM TODS vs. SIGMOD papers cited in 2002
8. Motivation 2/2: Enhance citation
Existing citation-based methods tend to consider only the explicit citation relationship
Most citation-based methods focus on the ranking of journals
Citation patterns differ between journals and conferences
Goal:
Enhance citation with other types of metadata
Explore latent citation relationships
Define new metrics that don't differentiate journals from conferences
A unified framework to evaluate diverse publication venues
9. Ranking as a Top-K Problem
It is difficult to rank venues in total order
In practice, people are more interested in the question:
What are the top-k venues in field f?
The question can be answered if two sub-questions can be answered:
S1: What is the set of good articles, Seed_p?
S2: What are the top-k venues whose quality is most similar to that of Seed_p?
10. Evaluate a Venue
S1: What is the set of good articles, Seed_p?
S2: What are the top-k venues whose quality is most similar to that of Seed_p?
Goodness of a venue
= an aggregate (sum, avg, max, etc.) of the goodness of the articles in it
E.g., venue a is better than venue b if a has more good articles than b
Various definitions of the goodness of an article can be adopted
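The venue-goodness definition above can be sketched in Python as follows. This is a minimal illustration under assumptions: article goodness is given as a precomputed score per article, the aggregation function (sum by default, or an average or max) is passed in, and the names `venue_goodness` and `top_k` are hypothetical rather than taken from the paper.

```python
from typing import Callable, Dict, List, Tuple


def venue_goodness(
    venue_articles: Dict[str, List[str]],
    article_goodness: Dict[str, float],
    agg: Callable[[List[float]], float] = sum,
) -> Dict[str, float]:
    """Goodness of each venue = agg(goodness of its articles)."""
    scores: Dict[str, float] = {}
    for venue, articles in venue_articles.items():
        # Articles without a known goodness score count as 0.0.
        values = [article_goodness.get(a, 0.0) for a in articles]
        # Empty venues score 0.0 so that max() or an average never sees an empty list.
        scores[venue] = float(agg(values)) if values else 0.0
    return scores


def top_k(scores: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """The k highest-scoring venues; ties are broken alphabetically by venue name."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Swapping `agg` changes what a "good venue" means: `sum` rewards venues with many good articles (the example on the slide), while an average favors small venues whose articles are consistently good.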
11. Sub-question S1
S1: What is the set of good articles, Seed_p?
Find the initial collection of good articles, Seed_p
Possible solutions:
The user provides the seed
Use accumulated citation count information
Hypothesis 1:
There are a number of good articles in each subject field that most people agree on (denoted Seed_p).
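One of the listed options, using accumulated citation counts, could be sketched as below. The cutoff (take the `n` most-cited articles) and the function name `seed_articles_by_citations` are assumptions for illustration; the slides do not say how the seed size is chosen.

```python
from typing import Dict, Set


def seed_articles_by_citations(citation_counts: Dict[str, int], n: int) -> Set[str]:
    """Hypothetical Seed_p: the n most-cited articles (ties broken by article id)."""
    ranked = sorted(citation_counts, key=lambda a: (-citation_counts[a], a))
    return set(ranked[:n])
```

A threshold on the count (e.g., "cited at least c times") would be an equally plausible reading of the slide.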
12. Sub-question S2
S2: What are the top-k venues whose quality is most similar to that of Seed_p?
Two types of solutions to S2:
Seed-based measures using author-name metadata
Easier to extract
Cleaner and more trustworthy
Browsing-based measure
Model readers' paper-browsing behavior
Rank venues from the readers' perspective
13. 1. Seed-based measure
Hypothesis 2:
Authors of the seed articles Seed_p are authoritative authors (denoted Seed_A) and are likely to produce good-quality articles.
Goodness of an article:
Naïve: good if authored by Seed_A
Fair: better if authored by more Seed_A authors
Unfair: better if authored by more productive Seed_A authors (unfair to amateur authors)
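Hypothesis 2 and the three article-goodness variants can be sketched as follows. Assumptions: Seed_A is every author of a Seed_p article, and an author's "productivity" is taken to be the number of Seed_p articles they wrote; the paper may define productivity differently. The names `seed_authors` and `article_goodness` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def seed_authors(seed_articles: Set[str], authors_of: Dict[str, List[str]]) -> Counter:
    """Seed_A, with each author's number of Seed_p articles as an (assumed) productivity count."""
    counts: Counter = Counter()
    for article in seed_articles:
        counts.update(set(authors_of.get(article, [])))
    return counts


def article_goodness(article: str, authors_of: Dict[str, List[str]],
                     seed_a: Counter, variant: str = "naive") -> float:
    """Goodness of one article under the naive, fair, or unfair seed-based measure."""
    seed_coauthors = {a for a in authors_of.get(article, []) if a in seed_a}
    if variant == "naive":   # good (1) or not (0)
        return 1.0 if seed_coauthors else 0.0
    if variant == "fair":    # one point per distinct seed author
        return float(len(seed_coauthors))
    if variant == "unfair":  # each seed author weighted by their productivity
        return float(sum(seed_a[a] for a in seed_coauthors))
    raise ValueError(f"unknown variant: {variant}")
```

Per-article scores like these can then be aggregated per venue as described on slide 10.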
14. 2. Browsing-based measure
Model readers' article-browsing behavior
While reading an interesting article p_j, how does the reader find the next article to read?
Pick one from its references
Pick another one by the same author as p_j
Higher probability of picking one by Seed_A in both cases
The article browsing model
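The browsing model can be approximated by a random walk over articles, sketched below. This is a PageRank-style reading of the slide, not the paper's exact model: `alpha` (share of moves that follow a reference), `boost` (extra weight for targets with a Seed_A author), and the `damping` restart are all hypothetical parameters, and a venue's score is assumed to be the total visit probability of its articles.

```python
from typing import Dict, List, Set


def browsing_scores(
    articles: List[str],
    references: Dict[str, List[str]],
    authors_of: Dict[str, List[str]],
    seed_authors: Set[str],
    alpha: float = 0.5,     # assumed: share of moves that follow a reference
    boost: float = 1.0,     # assumed: extra weight for targets with a Seed_A author
    damping: float = 0.85,  # assumed: PageRank-style restart so the walk converges
    iters: int = 100,
) -> Dict[str, float]:
    """Stationary visit probability of each article under a biased browsing walk."""
    n = len(articles)
    known = set(articles)
    by_author: Dict[str, List[str]] = {}
    for a in articles:
        for au in authors_of.get(a, []):
            by_author.setdefault(au, []).append(a)

    def weight(t: str) -> float:
        # Targets written by at least one seed author are more likely to be picked.
        return 1.0 + boost if seed_authors & set(authors_of.get(t, [])) else 1.0

    rows: Dict[str, Dict[str, float]] = {}
    for a in articles:
        refs = sorted({r for r in references.get(a, []) if r in known})
        same = sorted({b for au in authors_of.get(a, []) for b in by_author[au] if b != a})
        # Keep only channels that have candidates and a positive share, then renormalize.
        live = [(p, c) for p, c in ((alpha, refs), (1.0 - alpha, same)) if c and p > 0]
        mass = sum(p for p, _ in live)
        row: Dict[str, float] = {}
        for p, cands in live:
            total = sum(weight(t) for t in cands)
            for t in cands:
                row[t] = row.get(t, 0.0) + (p / mass) * weight(t) / total
        rows[a] = row  # an empty row marks a dangling article

    rank = {a: 1.0 / n for a in articles}
    for _ in range(iters):
        new = {a: (1.0 - damping) / n for a in articles}
        for a, row in rows.items():
            if row:
                for t, q in row.items():
                    new[t] += damping * rank[a] * q
            else:
                # Dangling article: spread its mass uniformly over all articles.
                for t in articles:
                    new[t] += damping * rank[a] / n
        rank = new
    return rank


def venue_scores(rank: Dict[str, float], venue_of: Dict[str, str]) -> Dict[str, float]:
    """Venue score = total visit probability of the venue's articles."""
    scores: Dict[str, float] = {}
    for a, r in rank.items():
        scores[venue_of[a]] = scores.get(venue_of[a], 0.0) + r
    return scores
```

The restart term keeps the walk well defined even when an article has no references and no other papers by its authors; without it, such articles would trap or leak probability mass.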
15. Experiment Setup
To measure the performance of the metrics, we need:
A baseline method to compare with
ISI impact factor: IF_x(2003) = A / B
A: number of times that articles of venue x published in 2001-2002 were cited in indexed journals during 2003
B: total number of articles of venue x published in 2001-2002
A clean data set to work with: the DBLP-ACM clean dataset
Link DBLP and ACM using titles (ISBN if available)
Remove conflicting authors and venues
Hand-picked database-related publication venues
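The baseline and the data preparation can be sketched as follows. `impact_factor` follows the IF_x(2003) = A / B definition, generalized to any year, and assumes the citation list already contains only citations from indexed venues. `link_records` is one possible reading of "link DBLP and ACM using titles (ISBN if available)": titles are normalized, ISBNs must agree when both sides carry one, and titles that occur more than once on either side are skipped as conflicts. All names and record fields (`id`, `title`, `isbn`) are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple


def impact_factor(venue: str, year: int,
                  articles: Iterable[Tuple[str, str, int]],
                  citations: Iterable[Tuple[str, str, int]]) -> float:
    """IF_x(year) = A / B over the two preceding publication years.

    articles:  (article_id, venue, publication_year)
    citations: (citing_id, cited_id, citing_year), assumed to come from indexed venues only
    """
    window = {year - 1, year - 2}
    recent = {aid for aid, v, y in articles if v == venue and y in window}
    b = len(recent)
    a = sum(1 for _, cited, cy in citations if cy == year and cited in recent)
    return a / b if b else 0.0


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", title.lower()).split())


def link_records(dblp: List[Dict[str, str]],
                 acm: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Pair DBLP and ACM records whose normalized titles match unambiguously."""
    def group(records: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
        groups: Dict[str, List[Dict[str, str]]] = {}
        for rec in records:
            groups.setdefault(normalize_title(rec["title"]), []).append(rec)
        return groups

    d_idx, a_idx = group(dblp), group(acm)
    pairs = []
    for key, d_recs in d_idx.items():
        a_recs = a_idx.get(key, [])
        if len(d_recs) != 1 or len(a_recs) != 1:
            continue  # missing on one side, or a conflicting duplicate title
        d, a = d_recs[0], a_recs[0]
        if d.get("isbn") and a.get("isbn") and d["isbn"] != a["isbn"]:
            continue  # both have an ISBN and they disagree
        pairs.append((d["id"], a["id"]))
    return sorted(pairs)
```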
16. Venue ranking results 1
Ranking results (Seed = VLDB conference)
17. Venue ranking results 2
Ranking results (Seed = SIGMOD conference)
18. Significance test against IF
H0: There is no strong positive rank-order relationship between the naïve/fair/unfair seed-based/browsing-based measure results and the impact factor results.
Significance test against the modified IF measure
(α = 0.01, P = 0.354, t = 2.396)
(56 venues; t test using Pearson's critical value)
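The test on this slide can be sketched as follows: correlate the rank positions produced by two measures over their common venues, derive t = r * sqrt(n - 2) / sqrt(1 - r^2), and compare r against a critical value. The critical value is passed in (0.354 on the slide) rather than computed, and feeding rank positions into Pearson's r is an assumption about how the test was run; with no tied ranks this equals Spearman's ρ. The names `pearson` and `rank_order_test` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_order_test(rank_a: Dict[str, int], rank_b: Dict[str, int],
                    r_crit: float) -> Tuple[float, float, bool]:
    """Correlate two venue rankings and test for a strong positive relationship.

    Returns (r, t, reject_h0), where reject_h0 means r exceeds the critical value.
    """
    venues = sorted(set(rank_a) & set(rank_b))
    n = len(venues)
    r = pearson([rank_a[v] for v in venues], [rank_b[v] for v in venues])
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)  # perfect agreement or disagreement
    else:
        t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return r, t, r > r_crit
```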
19. Conclusion
It is possible to evaluate venues with easier-to-obtain metadata, such as author names
Citation analysis can be enhanced by using other types of metadata
Conferences are becoming more important and should be treated as such
Fundamental differences in citation patterns may exist between journals and conferences
Define new metrics that provide a unified framework to evaluate diverse publication venues
20. Thank You!
21. Venue ranking results 3
22. Venue ranking by IF measures
Venue ranking by IF measure
Venue ranking by modified IF measure
23. Significance test against EIC ranking
Estimated Impact of Conference (EIC)
http://www.cs-conference-ranking.org/
Significance test against the EIC ranking
(critical value of Spearman's rank correlation coefficient: ρ_s = 0.534 for 20 pairs)
Top 20 venues by the EIC measure
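The Spearman comparison against the EIC top 20 can be sketched as below, using ρ_s = 1 - 6 Σd² / (n(n² - 1)) for tie-free rankings. The sketch assumes both rankings list exactly the same venues; how the paper aligned its ranking with the EIC list is not stated on the slide. `spearman_rho` is a hypothetical name.

```python
from typing import List


def spearman_rho(order_a: List[str], order_b: List[str]) -> float:
    """Spearman's rho for two tie-free rankings of the same items (best first)."""
    if (len(order_a) != len(order_b) or set(order_a) != set(order_b)
            or len(order_a) != len(set(order_a))):
        raise ValueError("rankings must be permutations of the same distinct items")
    pos_b = {item: i for i, item in enumerate(order_b)}
    n = len(order_a)
    # Sum of squared differences between each item's positions in the two rankings.
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For 20 venues, a result above the slide's critical value of 0.534 would indicate a significant positive agreement with the EIC ranking.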
24. Significance test against IF
H0: There is no strong positive rank-order relationship between the naïve/fair/unfair seed-based/browsing-based measure results and the impact factor results.
Significance test against the IF measure
(α = 0.01, P = 0.354, t = 2.396)
(56 venues; t test using Pearson's critical value)