Post on 09-Jul-2020

PAGERANK-RELATED METHODS FOR ANALYZING CITATION NETWORKS

Authors: Ludo Waltman and Erjia Yan
Presenter: Erjia Yan

Boğaziçi University, Istanbul
ISSI, June 29

• Objectives
– understanding PageRank
– applications of PageRank in informetric research
– tutorial: extracting journal citation networks from bibliographic data
– tutorial: computing PageRank for journals in journal citation networks using Sci2 and MATLAB

Objectives | 2

NON-RECURSIVE

• journal impact factor
• h-index
• accumulative number of citations
• accumulative number of publications
• …

RECURSIVE

• PageRank and its variants
– AuthorRank (Liu et al., 2005)
– Y-factor (Bollen et al., 2006)
– CiteRank (Walker et al., 2007)
– FutureRank (Sayyadi & Getoor, 2009)
– Eigenfactor (Bergstrom & West, 2008)
– SCImago (SCImago, 2007)
– weighted PageRank (Ding, 2011; Yan & Ding, 2011)
– …

A comparison | 3

[Figure: NON-RECURSIVE vs. RECURSIVE comparison]

A comparison | 4

• Observations
– non-recursive methods take into account only the local structure of a citation network; thus, a citation originating from Nature or Science has the same weight as a citation originating from an obscure journal
• Motivations
– use recursive methods to take into account the global structure of a citation network, so that citations originating from highly cited nodes are given more weight than those originating from rarely cited nodes

Observations and motivations | 5

• Basics of PageRank
– the concept was first proposed by Pinski and Narin in 1976 (influence weight); PageRank was introduced as a method for ranking web pages by Brin and Page in 1998
• Formulation

$$p_i = \alpha \sum_{j \in B_i} \frac{p_j}{m_j} + (1 - \alpha) \frac{1}{n}$$

– where $p_i$ denotes the PageRank value of web page $i$, $\alpha$ denotes the damping factor parameter, $B_i$ denotes the set of all web pages that link to web page $i$, $m_j$ denotes the number of web pages to which web page $j$ links, and $n$ denotes the total number of web pages to be ranked.

Basics of PageRank | 6
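The PageRank formulation can be tried out on a toy example. The following is a minimal Python sketch, not part of the original tutorial: the function name `pagerank_update` and the three-page link structure are hypothetical, and the example assumes every page has at least one outgoing link.

```python
def pagerank_update(p, links, alpha):
    """Apply p_i = alpha * sum_{j in B_i} p_j / m_j + (1 - alpha) / n once."""
    n = len(p)
    new_p = []
    for i in range(n):
        # B_i is the set of pages j that link to page i;
        # m_j is the number of pages to which page j links.
        inflow = sum(p[j] / len(links[j]) for j in range(n) if i in links[j])
        new_p.append(alpha * inflow + (1 - alpha) / n)
    return new_p


# Hypothetical web of three pages:
# page 0 links to pages 1 and 2, page 1 links to page 2, page 2 links to page 0.
links = {0: [1, 2], 1: [2], 2: [0]}

# Start from equal values and repeat the update until the values settle.
p = [1 / 3] * 3
for _ in range(100):
    p = pagerank_update(p, links, alpha=0.85)
```

In this toy network page 2 receives links from both other pages and ends up with the highest value, while page 1, which receives only half of page 0's weight, ends up with the lowest.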

• In other words…
– the larger the number of web pages that link to web page i, the higher the PageRank value of web page i
– the higher the PageRank values of the web pages that link to web page i, the higher the PageRank value of web page i
– for those web pages that link to web page i, the smaller the number of other web pages to which these web pages link, the higher the PageRank value of web page i
– the closer the damping factor parameter α is set to 1, the stronger the above effects

PageRank meanings | 7

• On the damping factor
– 1: PageRank may not converge
– just below 1 (e.g., 0.9999): extremely sensitive to small changes in the network of links
– 0.5: according to Chen et al. (2007), 0.5 is preferred for citation networks, based on the assumption that authors on average browse as far as two degrees of references (references and the references' cited references, thus 1 - 1/2 = 0.5)
– 0.85: the default (coincides with the "six degrees of separation": 1 - 1/6 ≈ 0.85)

Damping factor | 8
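The arithmetic behind the 0.5 and 0.85 values can be checked directly. This is a small Python sketch, not part of the original slides; the helper name `damping_from_degrees` is hypothetical, and it only restates the relation α = 1 - 1/k for an assumed browsing depth of k degrees.

```python
def damping_from_degrees(k):
    """Damping factor alpha = 1 - 1/k for an assumed browsing depth of k degrees."""
    return 1 - 1 / k


# Two degrees of references, the assumption attributed to Chen et al. (2007).
alpha_citation = damping_from_degrees(2)

# Six degrees of separation gives about 0.833, which is close to the default 0.85.
alpha_web = damping_from_degrees(6)

# Read in the other direction, the default 0.85 corresponds to roughly 6.7 degrees.
implied_degrees = 1 / (1 - 0.85)
```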

• Applications
– Analyzing journal citation networks
  • Y-factor; Eigenfactor; SCImago Journal Rank (SJR)
– Analyzing author citation networks
  • SARA (science author rank algorithm)
– Analyzing document citation networks
  • CiteRank

Applications | 9

TUTORIALS

Tutorials | 10

• Tools we need
– Sci2: https://sci2.cns.iu.edu/user/index.php
– Sci2 plugins: http://wiki.cns.iu.edu/display/SCI2TUTORIAL/3.2+Additional+Plugins
– MATLAB or Octave: http://www.gnu.org/software/octave/
• Data materials
– http://www.pages.drexel.edu/~ey86/p/tutorial/

Tools and materials | 11

Steps 1-5 | 12

• Step 6: merge individually downloaded files
– on Windows systems, a command such as copy *.txt merged_data.txt can be entered in the Command Prompt tool
– in the resulting file, make sure to remove all lines ‘FN Thomson Reuters Web of Knowledge VR 1.0’ except for the first one and all lines ‘EF’ except for the last one
• Step 7: change file extension
– change the extension of the text file that contains your bibliographic data from .txt to .isi

Steps 6-7 | 13
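Steps 6 and 7 can also be scripted instead of done by hand. The following Python sketch is an alternative to the copy command, not the procedure shown in the tutorial. The function name `merge_isi_files` and the directory name in the usage comment are hypothetical, and the sketch assumes that header lines start with the field tags `FN` and `VR` and that each downloaded file ends with a line containing only `EF`.

```python
from pathlib import Path


def merge_isi_files(input_dir, output_path):
    """Merge downloaded .txt files, keeping one file header and one final EF line."""
    merged = []
    headers_seen = set()
    for path in sorted(Path(input_dir).glob("*.txt")):
        # utf-8-sig drops a byte order mark if the download contains one.
        for line in path.read_text(encoding="utf-8-sig").splitlines():
            if line.rstrip() == "EF":
                continue  # drop every end-of-file marker; one is re-added below
            if line.startswith(("FN ", "VR ")):
                tag = line[:2]
                if tag in headers_seen:
                    continue  # keep only the first occurrence of each header line
                headers_seen.add(tag)
            merged.append(line)
    merged.append("EF")
    Path(output_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(merged)


# Usage (hypothetical directory name); writing the result with the .isi
# extension also takes care of Step 7:
# merge_isi_files("wos_downloads", "merged_data.isi")
```

Writing the output with the .isi extension keeps it from being picked up by the `*.txt` pattern if the script is run again in the same directory.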

Steps 8-9 | 14

Steps 10-12 | 15

Step 13 | 16

Steps 14-19 | 17

Step 19 | 18

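function p = calc_PageRank(C, alpha, n_iterations)
% C(i, j): number of citations from node i to node j (square matrix).
% alpha: damping factor; n_iterations: number of power-method iterations.

% Take care of dangling nodes (rows without outgoing citations):
% let them cite every node equally.
m = sum(C, 2);
C(m == 0, :) = 1;

% Create a row-normalized matrix.
n = size(C, 1);
m = sum(C, 2);
C = spdiags(1 ./ m, 0, n, n) * C;

% Apply the power method, starting from the uniform distribution.
p = repmat(1 / n, [1 n]);
for i = 1:n_iterations
    p = alpha * p * C + (1 - alpha) / n;
end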

Steps 20-21 | 19
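For readers without MATLAB or Octave, the same three steps (dangling nodes, row normalization, power method) can be sketched in plain Python. This is an illustrative port under stated assumptions, not the authors' code: the name `calc_pagerank` and the three-journal citation matrix are hypothetical, and the matrix is a list of rows in which `C[i][j]` counts citations from journal i to journal j.

```python
def calc_pagerank(C, alpha, n_iterations):
    """Plain-Python port of the power-method PageRank computation."""
    n = len(C)

    # Take care of dangling nodes: a journal that cites nobody
    # is treated as citing every journal equally.
    rows = [list(row) if sum(row) > 0 else [1.0] * n for row in C]

    # Create a row-normalized matrix.
    normalized = []
    for row in rows:
        total = sum(row)
        normalized.append([c / total for c in row])

    # Apply the power method, starting from the uniform distribution.
    p = [1.0 / n] * n
    for _ in range(n_iterations):
        p = [
            alpha * sum(p[i] * normalized[i][j] for i in range(n)) + (1 - alpha) / n
            for j in range(n)
        ]
    return p


# Hypothetical citation counts for three journals; journal 2 cites nobody.
C = [
    [0, 3, 1],
    [2, 0, 2],
    [0, 0, 0],
]
p = calc_pagerank(C, alpha=0.85, n_iterations=100)
```

In this made-up example journal 2 is cited by both other journals and obtains the highest score, even though it cites no one itself.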

The resulting PageRank scores for the journals

• PageRank calculations for author and document citation networks can be carried out by extracting the corresponding networks in Sci2

Other citation network types | 20

• Questions?

• Any further questions can be directed to:
– Erjia Yan, ey86@drexel.edu, or
– Ludo Waltman, waltmanlr@cwts.leidenuniv.nl

Thank you | 21