Link analysis as a social science technique Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK http://cybermetrics.wlv.ac.uk/

Page 1: Link analysis as a social science technique Mike Thelwall Statistical Cybermetrics Research Group University of Wolverhampton, UK

Link analysis as a social science technique

Mike Thelwall

Statistical Cybermetrics Research Group

University of Wolverhampton, UK

http://cybermetrics.wlv.ac.uk/

Page 2:

Link Analysis Manifesto

Links are:

A wonderful new source of information about relationships between people, organisations and information

An easy-to-collect data source

But:

Results should be interpreted with care

Page 3:

Talk Structure

Part 1: Academic link analysis, mainly from an information science perspective

Part 2: Software demonstration

Part 3: A social science link analysis methodology

Page 4:

Link Analysis: Motivation

Individual hyperlinks reflect concrete creation reasons, such as connections between web page contents or creators

Counts of large numbers of hyperlinks may reflect wider underlying social processes

Links may reflect phenomena that have previously been difficult to study, opening up new research areas

E.g. informal scholarly communication

Page 5:

Part 1: Academic Hyperlink Analysis

To map patterns of communication between researchers in a country based upon university web sites

Patterns of communication are also mapped based upon journal citations or journal title words

Provides useful information about the structure and evolution of research fields

Can identify previously unknown field connections

Web analysis could illustrate wider and more current patterns

Page 6:

Data Collection

Web crawler

AltaVista advanced queries, e.g. links from Wolves Uni. to Oxford Uni.:

domain:wlv.ac.uk AND linkdomain:ox.ac.uk

Google link queries

Find links to specific URLs, e.g. links to the Institute home page:

link:www.oii.ox.ac.uk

Page 7:
Page 8:
Page 9:

Types of link count

Direct link counts: inter-site links only

Co-inlink counts: B and C are co-inlinked

Co-outlink counts: D and E are co-outlinked

[Figure: link diagram of the sites A, B, C, D, E and F]
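The three counts above can be sketched in a few lines of Python. This is only an illustration: the link list is hypothetical, chosen so that B and C are co-inlinked and D and E are co-outlinked as the slide states, and the function names are invented for this example rather than taken from any tool mentioned in the talk.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical inter-site links as (source, target) pairs. They are chosen to
# match the slide's statements, not copied from its figure.
links = [("A", "B"), ("A", "C"), ("D", "F"), ("E", "F")]


def direct_link_counts(links):
    """Count links for each ordered (source, target) pair, inter-site only."""
    counts = defaultdict(int)
    for source, target in links:
        if source != target:  # ignore links within a single site
            counts[(source, target)] += 1
    return dict(counts)


def _shared_neighbour_counts(neighbours):
    """Count, for each unordered pair, how many neighbour sets contain both."""
    counts = defaultdict(int)
    for members in neighbours.values():
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return dict(counts)


def co_inlink_counts(links):
    """Pairs of sites that are both linked to by the same third site."""
    targets_of = defaultdict(set)
    for source, target in links:
        if source != target:
            targets_of[source].add(target)
    return _shared_neighbour_counts(targets_of)


def co_outlink_counts(links):
    """Pairs of sites that both link to the same third site."""
    sources_of = defaultdict(set)
    for source, target in links:
        if source != target:
            sources_of[target].add(source)
    return _shared_neighbour_counts(sources_of)


print(direct_link_counts(links))
print(co_inlink_counts(links))   # B and C share the inlinking site A
print(co_outlink_counts(links))  # D and E share the outlink target F
```

Co-inlink and co-outlink counts are defined on unordered pairs, so each pair is stored once in sorted order.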

Page 10:

Alternative Document Models

A method to ignore multiple similar links

E.g., domain ADM: count links between domains instead of pages

[Figure: pages P1 to P6 in the domains www.scit.wlv.ac.uk and www.oii.ox.ac.uk]
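A minimal sketch of the domain ADM idea, assuming that a "domain" is simply the host name of a URL. The page URLs and the helper names `domain_of` and `count_links` are hypothetical; only the two host names come from the slide.

```python
from urllib.parse import urlsplit


def domain_of(url):
    """Return the host name of a URL, used here as its 'domain'."""
    return urlsplit(url).hostname


def count_links(page_links, unit=lambda url: url):
    """Count distinct links between different units (pages by default)."""
    pairs = {(unit(source), unit(target)) for source, target in page_links}
    # Links whose source and target fall in the same unit are not counted.
    return len({(s, t) for s, t in pairs if s != t})


# Hypothetical page-level links from one site to the other.
page_links = [
    ("http://www.scit.wlv.ac.uk/p1.html", "http://www.oii.ox.ac.uk/p4.html"),
    ("http://www.scit.wlv.ac.uk/p2.html", "http://www.oii.ox.ac.uk/p5.html"),
    ("http://www.scit.wlv.ac.uk/p3.html", "http://www.oii.ox.ac.uk/p6.html"),
]

print(count_links(page_links))                  # page model
print(count_links(page_links, unit=domain_of))  # domain ADM
```

With these example links the page model counts three links, while the domain ADM collapses them into a single link between the two domains, which is how multiple similar links are ignored.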

Page 11:

Some Inter-University Hyperlink Patterns

Mainly for the UK and Europe

Page 12:

Citation-Style Hyperlink Analysis

Citation counts are known to be reasonable indicators of research quality, but is the same true for inlink counts?

Counts of links to universities within a country can correlate significantly with measures of research productivity

The significance of this result is in giving ‘permission’ to investigate the use of inter-university links for researching scholarly communication
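A correlation test of this kind can be sketched as follows. The slides do not say which statistic was used; a rank (Spearman) correlation is assumed here because link counts tend to be highly skewed. All figures below are invented purely for illustration.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data for five universities (not from the talk).
inlink_counts = [120, 450, 300, 900, 60]
research_scores = [10, 35, 40, 80, 5]

print(spearman(inlink_counts, research_scores))
```

A high coefficient on real data would only show association; as the following slides stress, it does not by itself say what the links mean.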

Page 13:

Most links are only loosely related to research

90% of links between UK university sites have some connection with scholarly activity, including teaching and research

But less than 1% are equivalent to citations

So link counts do not measure research dissemination but are more a natural by-product of scholarly activity

Cannot use link counts to assess research

Can use link counts to track an aspect of communication

Page 14:

Links to UK universities against their research productivity

The reason for the strong correlation is the quantity of Web publication, not its quality

This is different to citation analysis

Page 15:

Universities tend to link to neighbours

Page 16:

Universities cluster geographically

Page 17:

Language is a factor in international interlinking

English is the dominant language for Web sites in the Western EU

In a typical country, 50% of pages are in the national language(s) and 50% in English

Non-English speaking countries extensively interlink in English

(Research with Rong Tang & Liz Price)

Page 18:

Can map patterns of international communication

Counts of links between EU universities in Swedish are represented by arrow thickness.

Page 19:

Counts of links between EU universities in French are represented by arrow thickness.

Page 20:

Which language???

Page 21:

Which language???

Page 22:

Linking patterns vary enormously by discipline

No evidence of a significant geographic trend

Disciplinary differences in the extent of interlinking: e.g., History Web use is very low, Chemistry is very high

Individual research projects can have an enormous impact upon individual departments

E.g. Arts web sites are often for specific exhibitions or for digital media projects

Links not frequent enough to reliably reveal patterns of interdisciplinarity

Page 23:

The next slide is a (Kamada-Kawai) network of the interlinking of the “top” 5 universities in AEAN countries (Asia and Europe) with arrows representing at least 100 links and universities not connected removed.

(Research with Han Woo Park)
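The filtering described on this slide (arrows only for at least 100 links, unconnected universities removed) could be sketched as below. The site names and counts are placeholders, and `threshold_network` is a hypothetical helper. The Kamada-Kawai layout itself is not implemented here; it would be left to a graph library, for example the `kamada_kawai_layout` function in NetworkX.

```python
def threshold_network(link_counts, min_links=100):
    """Keep arrows with at least `min_links` links and drop unconnected sites."""
    edges = {
        (source, target): n
        for (source, target), n in link_counts.items()
        if source != target and n >= min_links
    }
    # Only sites that still appear on some arrow are kept as nodes.
    nodes = sorted({site for pair in edges for site in pair})
    return nodes, edges


# Hypothetical link counts between four placeholder university sites.
link_counts = {
    ("uni_a", "uni_b"): 250,
    ("uni_b", "uni_a"): 40,
    ("uni_c", "uni_a"): 100,
    ("uni_d", "uni_b"): 12,
}

nodes, edges = threshold_network(link_counts)
print(nodes)
print(edges)
```

In this example uni_d disappears from the network because its only link falls below the threshold.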

Page 24:
Page 25:

Clustering using links

Page 26:

Background: Power laws in Academic Webs

Academic Webs have a topology dominated by power laws, including:

Counts of links to pages (inlink counts)

Counts of links from pages (outlink counts)

Groups of interconnected pages

Power laws mean that:

Link creation obeys the ‘rich get richer’ law

“Communities” of pages or sites are rarely pure but tend to multiply overlap
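One rough way to check for a power law is to plot the frequency of each link count on log-log axes and look for a straight line. The sketch below fits the slope of that line by least squares. This is an assumption about how one might check, not the method used in the research, and a log-log fit is known to be only a crude estimate. The data are synthetic, built so that the frequency is proportional to the count raised to the power -2.

```python
import math
from collections import Counter


def loglog_slope(link_counts):
    """Least-squares slope of log10(frequency) against log10(link count)."""
    freq = Counter(c for c in link_counts if c > 0)  # log needs counts > 0
    if len(freq) < 2:
        raise ValueError("need at least two distinct positive counts")
    xs = [math.log10(count) for count in freq]
    ys = [math.log10(n) for n in freq.values()]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic inlink counts: 10000 pages with 1 inlink, 100 with 10, 1 with 100.
synthetic_counts = [1] * 10000 + [10] * 100 + [100]

print(loglog_slope(synthetic_counts))
```

A slope close to a negative constant across several orders of magnitude is consistent with a power law: a few pages attract very many links while most attract very few.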

Page 27:

Page Outlinks

Page 28:

Topological component sizes: “pure link communities”
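The slide does not define its components. One possible reading, assumed here, is that a "pure link community" is a connected component: a set of pages joined to each other by links (ignoring link direction) and to nothing else. Under that assumption, component sizes could be computed with a small union-find structure. The page names are hypothetical.

```python
from collections import Counter


def component_sizes(links):
    """Sizes of connected components, treating links as undirected."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target in links:
        root_s, root_t = find(source), find(target)
        if root_s != root_t:
            parent[root_s] = root_t  # merge the two components

    sizes = Counter(find(node) for node in list(parent))
    return sorted(sizes.values(), reverse=True)


# Hypothetical page-level links forming two separate groups.
example_links = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(component_sizes(example_links))
```

The frequency of each component size could then be plotted on log-log axes, as in the slide's figure.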

Page 29:

Community Identification Algorithm: “Impure communities”

Can apply to pages, directories and domains

Gives complementary results: a “layered approach”

[Figure: two log-log plots of frequency against community size, one for the directory model (k = 32) and one for the page model (k = 32)]

Page 30:

Stretching links further: co-inlinks, co-outlinks

More interlinked does not imply more similar

For the UK academic Web, about 42% of domains connected by links alone host similar disciplines, and about 43% of domains connected by links, co-inlinks and co-outlinks

Can use any type of link to look for similar sites

Over 100 times more domains are co-inlinked or co-outlinked than are directly linked

Links in any form are less than 50% reliable as indicators of subject similarity

Page 31:

Summary

Studies of the relatively restricted subdomain of university web sites:

Produce direct research results

For Web Information Retrieval (e.g. search engines), they also:

Help refine methodologies

Help build intuition about web structure

Page 32:

Part 2: Software Demonstration

SocSciBot: Web crawler for social sciences research

SocSciBot Tools: Link analyser for SocSciBot data

Cyclist: Search engine with some corpus linguistics capability (e.g. word frequency lists for each site)

http://socscibot.wlv.ac.uk/

Page 33:

Part 3: A General Social Science Link Analysis Methodology

A general framework for using link counts in social sciences research:

For research into link creation, or

Together with other sources, for research into other online or offline phenomena

Applicable when there are enough links relevant to the research question to count:

For collections of large web sites, or

For large collections of small web sites

Page 34:

Nine stages for a research project

1. Formulate an appropriate research question, taking into account existing knowledge of web structure

2. Conduct a pilot study

3. Identify web pages or sites that are appropriate to address the research question

Page 35:

Nine stages for a research project

4. Collect link data from a commercial search engine or a personal crawler, taking appropriate accuracy safeguards

5. Apply data cleansing techniques to the links, if possible, and select an appropriate counting method

6. Partially validate the link count results through correlation tests, if possible

Page 36:

Nine stages for a research project

7. Partially validate the interpretation of the results through a link classification exercise

8. Report results with an interpretation consistent with the link classification exercise, including either a detailed description of the classification or exemplars to illustrate the categories

9. Report the limitations of the study and parameters used in data collection and processing

Page 37:

Interpreting link counts

For most research, need to be able to place an interpretation on link counts

E.g. A links to B more than C, therefore…

A is inlinked more than B, therefore…

Do links ‘measure’ visibility, luminosity, authority, information exports/imports, communication, impact, online impact, quality, importance, interpersonal communication, nothing, random actions,…?

Page 38:

Interpreting link counts

Classifying random samples of links can help decide how to interpret them

E.g. Links predominantly reflect…

Correlation tests are also useful as a form of triangulation

E.g. Link counts associate with…
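The classification step could be supported by a small script that draws a reproducible random sample of links for a human to classify and then summarises the labels. This is a sketch only: the sample size, the seed, the category names and the placeholder URLs are all invented for illustration, and the labels would come from manual inspection of each link.

```python
import random
from collections import Counter


def sample_links(links, size, seed=0):
    """Draw a reproducible simple random sample of links to classify by hand."""
    rng = random.Random(seed)
    return rng.sample(links, min(size, len(links)))


def category_shares(labels):
    """Return each category's share of the classified sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical collected links as (source URL, target URL) pairs.
all_links = [
    (f"http://site{i}.example/", f"http://site{i + 1}.example/")
    for i in range(50)
]
to_classify = sample_links(all_links, size=5, seed=1)

# Hypothetical labels assigned by a human classifier to a sample of four links.
labels = ["research", "teaching", "research", "other"]
print(category_shares(labels))
```

Fixing the seed means the same sample can be redrawn later, which helps when reporting the parameters used in data collection and processing.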

Page 39:

The theoretical perspective for link counting

In order to be able to reliably interpret link counts, all links should be created individually and independently, by humans, through equivalent gravity judgments (e.g., about the quality of the information in the target page).

Additionally, links to a site should target pages created by the site owner or somebody else closely associated with the site.

Page 40:

Summary

Link counts are an information source that may reveal new insights into online and offline phenomena

Can be used in conjunction with other data sources to address many research questions

With existing tools, are relatively easy to use in research