Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Posted on 23-Jan-2015



Description:

An analysis of crowd-sourced "article" creation and user-generated metadata for a digital repository of biodiversity literature

Transcript of Crowd-sourcing the creation of "articles" within the Biodiversity Heritage Library

Crowd-sourcing the creation of “articles” within the Biodiversity Heritage Library

Bianca Crowley, crowleyb@si.edu

Trish Rose-Sandler, trish.rose-sandler@mobot.org

The BHL is…

• A consortium of 13 natural history and botanical libraries and research institutions

• An open access digital library for legacy biodiversity literature.

• An open data repository of taxonomic names and bibliographic information

• An increasingly global effort


Problem: Books vs. Articles

Librarians manage books; users need articles


Solution: “Article-ization”

Creating articles manually, through the help of our users: BHL PDF Generator

Creating articles through automated means: BioStor http://biostor.org/issn/0006-324X


Page, R. (2011). Extracting scientific articles from a large digital archive: BioStor and the Biodiversity Heritage Library. BMC Bioinformatics, 12(187). Retrieved from http://www.biomedcentral.com/1471-2105/12/187


Create-your-own PDF


CiteBank today: http://citebank.org


What is an “article” anyway?


the Good, the Bad, the Ugly

Questions for Data Analysis

• What is the quality, or accuracy, of user-provided metadata?

• What kinds of content are users creating?

• How can we improve the PDF generator interface?


Stats

• Jan 2010 - Apr 2011: approx. 60,000 PDFs created with the PDF Generator

• 40% of those (approx. 24,000) were ingested into CiteBank (PDFs without user-contributed metadata were excluded)

• 5 reviewers analyzed 945 PDFs (approx. 3.9% of the 24,000+ articles going into CiteBank); a quick arithmetic check follows below

**Thanks to reviewers Gilbert Borrego, Grace Costantino, and Sue Graves from the Smithsonian Institution
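A quick check of the arithmetic behind these figures (a sketch; only the 60,000, 40%, and 945 counts come from the slide):

    # Back-of-the-envelope check of the sampling figures quoted above.
    pdfs_created = 60_000                      # PDFs made with the PDF Generator, Jan 2010 - Apr 2011
    ingested = int(pdfs_created * 0.40)        # ~24,000 ingested into CiteBank
    sample = 945                               # PDFs examined by the 5 reviewers

    print(ingested)                            # 24000
    print(round(100 * sample / ingested, 1))   # 3.9 (percent of ingested articles reviewed)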


Methodological approach

• Quantitative – numerical rating system

• Rated titles, authors, and beginning/ending pages

• An item’s “findability” within CiteBank search often determined how it was rated


Ratings System

Title

• 1 = has all characters in the title, letter for letter

• 2 = does not have all characters in the title letter for letter, but is still findable in CiteBank search

• 3 = does not have all characters in the title letter for letter and is NOT findable via CiteBank search


Ratings System

Author

• 1 = has all characters in the author(s) last name(s), letter for letter

• 2 = has at least one author’s last name spelled correctly

• 3 = has no authors, or none of the authors’ last names are spelled correctly


Ratings System

Article beginning & ending pages

• 1 = has all text pages for an article, from start to end

• 2 = a subset of pages from a larger article

• 3 = a set of pages where the intellectual content has been compromised
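As a rough illustration only, here is a minimal Python sketch of how the title and author rubrics above could be applied; the helper names and the findable_in_citebank flag are assumptions for illustration, since the actual ratings were assigned manually by the reviewers.

    # Minimal sketch of the 1/2/3 rubrics for titles and authors (illustrative only).

    def rate_title(user_title: str, true_title: str, findable_in_citebank: bool) -> int:
        """1 = letter-for-letter match, 2 = imperfect but findable, 3 = imperfect and not findable."""
        if user_title == true_title:
            return 1
        return 2 if findable_in_citebank else 3

    def rate_authors(user_last_names: list, true_last_names: list) -> int:
        """1 = all author last names correct, 2 = at least one correct, 3 = none correct or none given."""
        correct = {n for n in user_last_names if n in true_last_names}
        if correct == set(true_last_names):
            return 1
        return 2 if correct else 3

    print(rate_title("On the genus Carex", "On the genus Carex", True))  # 1
    print(rate_authors(["Smith"], ["Smith", "Jones"]))                   # 2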


Analysis steps


Results

Title average: 1.68
Author(s) average: 1.33
Beginning/ending pages average: 1.41
Title & author average: 1.50
Overall average (combines the first three above): 1.47
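The aggregation behind these averages can be sketched as follows; the example ratings below are invented, and only the way the figures are combined mirrors the slide.

    # Sketch of the aggregation behind the averages above (example ratings are made up).
    from statistics import mean

    ratings = [
        {"title": 1, "authors": 1, "pages": 2},
        {"title": 2, "authors": 1, "pages": 1},
        {"title": 3, "authors": 2, "pages": 1},
    ]

    title_avg   = mean(r["title"] for r in ratings)
    authors_avg = mean(r["authors"] for r in ratings)
    pages_avg   = mean(r["pages"] for r in ratings)

    title_author_avg = mean([title_avg, authors_avg])
    overall_avg      = mean([title_avg, authors_avg, pages_avg])  # "combines first 3 above"

    print(title_avg, authors_avg, pages_avg, title_author_avg, overall_avg)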


What did we learn?

• Ratings were better than we expected

• Many users took the time to create decent metadata

• “good enough” is not great but is still “findable”


BHL-Australia’s new portal: http://bhl.ala.org.au/

Other factors

But of course… there’s always room for improvement


Changes we made to the UI so far

• Asking users if they want to contribute their article to CiteBank

• Making article title a required field and validating that it is at least 2 characters long (see the sketch after this list)

•  Review button for users to review page selections and metadata (inspired by BHL-AUS)

• Reduced text and added more intuitive graphics (inspired by BHL-AUS)
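A minimal sketch of the title validation rule mentioned above (required, at least 2 characters); this is illustrative only and not the actual BHL form code.

    # Illustrative check of the "required, at least 2 characters" article-title rule.
    def valid_article_title(title: str) -> bool:
        return len(title.strip()) >= 2

    print(valid_article_title(""))                     # False - title is required
    print(valid_article_title("A"))                    # False - too short
    print(valid_article_title("On the genus Carex"))   # True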


Brief survey of proposed changes

• Overwhelmingly positive response to the proposed changes

But of course… there’s always room for improvement


Success Factors

• Monitor metadata creation to look for patterns in user behavior

• Engage with your users

• Incentivize your users


@BioDivLibrary

/pages/Biodiversity-Heritage-Library/63547246565

/photos/biodivlibrary/sets/

/group/biodiversity-heritage-library

Bianca Crowley, crowleyb@si.edu

Trish Rose-Sandler, trish.rose-sandler@mobot.org

http://biodiversitylibrary.org
