Post on 03-Jul-2015
Citation studies in the humanities
Chris Alen Sula
School of Information & Library Science
Pratt Institute
#DH2013 / Citation studies in the humanities / @chrisalensula @thisismmiller
Matt Miller
NYPL Labs
New York Public Library
Background
‣ scholarly communication: the processes by which scholars share their findings, both formally (e.g., articles) and informally (e.g., tweets, letters, blogs)
‣ bibliometrics: methods for analyzing citation behaviors
‣ Bibliometrics is largely based on studies of scientific and technical corpora (Hérubel and Buchanan, 1994; Lamont, 2000), with relatively few studies in the humanities (cf. Ardanuy, 2013).
(Yan & Ding, 2012)
Citation networks
Rosvall & Bergstrom (2007)
Bibliometrics & humanities: Why so little?
‣ lack of data (Linmans, 2010), especially for
‣ monographs (Hammarfelt, 2011), which still form the backbone of humanities work (Larivière et al., 2006)
‣ older sources, which humanists cite with greater frequency than scientists (Heinzkill, 1980)
‣ lack of citations, comparatively speaking
‣ Humanists cite each other less frequently than scientists (Heinzkill, 1980; Swales, 1990; Hellqvist, 2010).
‣ Multi-authored articles are rare (Price, 1966; Pao, 1981, 1982; Sievert and Sievert, 1989; Wiberley, 1989), averaging around 1.06 authors per article from 1980–2007 (Linmans, 2010).
‣ Humanists do cite and co-author (Leydesdorff, Hammarfelt & Salah, 2011), and DHers have done citation studies (Smith, 2009).
‣ Humanities discourse differs from scientific discourse.
‣ more integral references, in which authors associate their own views with those they reference (Swales, 1990; Hyland, 1999; Harwood, 2008)
‣ more negative references, which object to other authors’ claims (Meadows, 1974; Brooks, 1985; Cano, 1989).
‣ The mere fact that one humanist cites another says nothing about the type or significance of their relationship.
‣ Understanding and tracking these relationships would give us a richer, more nuanced view of the humanities. Part of that data can come from reference contexts, part from extra-citational information (mentions, likes, real-world relationships, etc.).
Reference context
(Chubin & Moitra, 1975)
(Frost, 1979)
‣ two example schemas
Code at http://github.com/thisismattmiller/dh2013-humanities-citation
Our tool: extraction
‣ Layout recognition used to extract citations and surrounding context (usually 1–2 sentences)
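As a rough illustration of the "citation plus surrounding context" idea, the sketch below pulls parenthetical author-year citations out of plain text together with the citing sentence and one preceding sentence. It is not the tool's layout-recognition code: it assumes the text has already been extracted from the PDF, and the regular expressions, the `citation_contexts` helper, and the sample text are all hypothetical.

```python
import re

# Assumed citation style: parenthetical author-year, e.g. "(Swales, 1990)".
# Real articles use many other styles (footnotes, numeric markers, etc.).
CITATION = re.compile(r"\(([^()]*\b\d{4}[a-z]?)\)")

# Naive sentence splitter: break after ., ! or ? followed by whitespace.
# Abbreviations such as "et al." would be split incorrectly.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def citation_contexts(text, window=1):
    """Return (citation, context) pairs, where context is the citing
    sentence plus up to `window` sentences before it."""
    sentences = SENTENCE_END.split(text.strip())
    pairs = []
    for i, sentence in enumerate(sentences):
        for match in CITATION.finditer(sentence):
            start = max(0, i - window)
            context = " ".join(sentences[start:i + 1])
            pairs.append((match.group(1), context))
    return pairs


# Hypothetical sample text, written for this illustration only.
sample = (
    "Humanists rely heavily on monographs. "
    "This point is extensively discussed by earlier work (Hammarfelt, 2011). "
    "Other accounts fail to account for older sources (Heinzkill, 1980)."
)
contexts = citation_contexts(sample)
for citation, context in contexts:
    print(citation, "->", context)
```

The `window` parameter controls how many preceding sentences are kept, which roughly matches the 1–2 sentence context mentioned above.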
Our tool: classification
‣ Naïve Bayes classifier using NLTK
sample from positive training set
‣ extensively discussed by
‣ useful discussion
‣ indebted to
‣ groundbreaking work
‣ result confirms the hypothesis
sample from negative training set
‣ contra
‣ appears to overlook
‣ fail to account for
‣ problematic
‣ is unable to
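To make the classification step concrete, here is a minimal bag-of-words Naïve Bayes sketch trained only on the sample phrases listed above. It is a hand-rolled standard-library stand-in, not the classifier from the tool itself; the `train` and `classify` helpers, the add-one smoothing, and the choice to ignore unseen words are assumptions made for illustration.

```python
import math
from collections import Counter

# Training phrases are the samples shown above; the full training
# sets are not reproduced here.
POSITIVE = ["extensively discussed by", "useful discussion", "indebted to",
            "groundbreaking work", "result confirms the hypothesis"]
NEGATIVE = ["contra", "appears to overlook", "fail to account for",
            "problematic", "is unable to"]


def train(labeled_phrases):
    """Count word occurrences per label and phrases per label."""
    word_counts = {}
    label_counts = Counter()
    for label, phrase in labeled_phrases:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(phrase.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(label_counts[label] / total)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train([("positive", p) for p in POSITIVE] +
              [("negative", p) for p in NEGATIVE])

print(classify("a useful discussion of the topic", model))
print(classify("this account appears to overlook the evidence", model))
```

With so few training phrases the model is only a toy; any realistic use would need far larger labeled sets and proper tokenization.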
Data & results
‣ articles sampled for this study
‣ results of citation tool applied to sample set
Polarity results by discipline
Broader patterns?
‣ citation frequency × polarity
Future directions
‣ further manual inspection of articles to determine the reliability of extraction and classification
‣ further training of the sentiment classifier on larger corpora
‣ measures of inter-rater reliability for classification
‣ support for more document layouts
‣ crowdsourced PDF analysis & classifier training
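One common measure of inter-rater reliability for categorical labels is Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The slides do not say which measure is intended, so the sketch below is only one possibility; the `cohens_kappa` helper and the two label lists are hypothetical and do not come from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical polarity labels from two raters for six references.
rater_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
rater_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 3))
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance; more than two raters would call for a different statistic, such as Fleiss' kappa.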
References
‣ All references are available in the conference proceedings at http://dh2013.unl.edu/abstracts/ab-353.html
‣ Additional references:
‣ Jordi Ardanuy (2013). “Sixty Years of Citation Analysis Studies in the Humanities (1951–2010)” Journal of the American Society for Information Science and Technology 64(8): 1751–1755.
‣ Erjia Yan and Ying Ding (2012). “Scholarly Network Similarities: How Bibliographic Coupling Networks, Citation Networks, Cocitation Networks, Topical Networks, Coauthorship Networks, and Coword Networks Relate to Each Other” Journal of the American Society for Information Science and Technology 63(7): 1313–1326.
‣ Code at http://github.com/thisismattmiller/dh2013-humanities-citation