Writing The Encyclopedia Of Life (not EoL.org)

Vincent S. Smith Writing the Encyclopedia of Life

Transcript of Writing The Encyclopedia Of Life (not EoL.org)

Page 1: Writing The Encyclopedia Of Life (not EoL.org)

Vincent S. Smith

Writing the Encyclopedia of Life

Page 2: Writing The Encyclopedia Of Life (not EoL.org)

Background 1: The big picture of biodiversity research

Goal…
• Inventory the Earth’s species
• Document their relationships
• Publish & apply these data

Data set…
• 1.8M described species (10M names)
• 300M pages (over last 250 years)
• 1.5-3B specimens

People…
• 4-6,000 scientists
• 30-40,000 amateurs
• Many more citizen scientists?

Page 3: Writing The Encyclopedia Of Life (not EoL.org)

Background 2: The process of biodiversity research

Parochial…
• Specialised
• Experts
• Fragmented & distributed

Methodological…
• Communities of practice
• Hard to record & update
• High output but low impact

Different…
• Data
• Interpretations
• Methods

How do we integrate the BIG with the small?

Page 4: Writing The Encyclopedia Of Life (not EoL.org)

250 yr progress report
• Up to 87% of life on Earth is still undescribed
• 6% of biodiversity scientists cover 80% of the world’s biodiversity
• At present rates most species will be extinct long before we describe them


The story so far… 250 years and counting!

[Timeline: 1758 to 2008 = 250 yrs; 2008 to 3008 = 1000 yrs!!! ?]

[Chart: described species by group]
• Bacteria: 9,021 spp.
• Archaebacteria: 259 spp.
• Plants: 260k spp.
• Animals: 1.18M spp.
• Other: 193k spp.
• Fungi: 101k spp.

Page 5: Writing The Encyclopedia Of Life (not EoL.org)

Taxonomic effort

1.8 million species
• Bacteria: 9,021 spp.
• Archaebacteria: 259 spp.
• Plants: 260k spp.
• Animals: 1.18M spp.
• Other: 193k spp.
• Fungi: 101k spp.

Page 6: Writing The Encyclopedia Of Life (not EoL.org)

Taxonomic effort

1.8 million species
• Bacteria: 9,021 spp.
• Archaebacteria: 259 spp.
• Plants: 260k spp.
• Animals: 1.18M spp.
• Other: 193k spp.
• Fungi: 101k spp.

• Insects: 0.82M spp.
• Molluscs: 117k
• Crustaceans: 39k
• Fish: 25k
• Flatworms: 13.7k
• Birds: 10k
• Sponges: 10k
• Cnidarians: 9k
• Reptiles: 7.1k
• Mammals: 5k
• Amphib.: 5k
• Rotifers: 1.8k

Page 7: Writing The Encyclopedia Of Life (not EoL.org)

Taxonomic effort

1.8 million species
• Bacteria: 9,021 spp.
• Archaebacteria: 259 spp.
• Plants: 260k spp.
• Animals: 1.18M spp.
• Other: 193k spp.
• Fungi: 101k spp.

• Insects: 0.82M spp.
• Molluscs: 117k
• Crustaceans: 39k
• Fish: 25k
• Flatworms: 13.7k
• Birds: 10k
• Sponges: 10k
• Cnidarians: 9k
• Reptiles: 7.1k
• Mammals: 5k
• Amphib.: 5k
• Rotifers: 1.8k

• Beetles: 370k spp.
• Bees, wasps & ants: 198k spp.
• Butterflies & moths: 165k spp.
• Flies: 85k spp.

0.01 papers per species per year, i.e. 1 paper every 100 years

• Birds: 1 paper per species per yr.
• Mammals: 2 papers per species per yr.
• Elephants: 47 papers per species per yr.

Page 8: Writing The Encyclopedia Of Life (not EoL.org)

“Paper minds”: Traditional publication

1,000’s of journals addressing a common set of questions

• What is a species?
• How many species are there?
• Where are species distributed?
• How have species distributions changed?
• How are species related?
• How have species characters changed?
• To what extent are species relationships predictive?

DATA

Page 9: Writing The Encyclopedia Of Life (not EoL.org)

“Paper minds”: Traditional publication

1,000’s of journals addressing a common set of questions

Mol. Phyl. Evol.: 21,964 pp. since 2000

[Figure: phylogenetic tree; tip labels] Menopon gallinae, Numidicola antennatus, Amyrsidea ventralis, Somaphantus lusius, Menacanthus stramineus, Colimenopon urocolius, Trinoton anserinum, Meromenopon meropis, Gruimenopon longum, Hoazineus armiferus, Copocephalum zebra, Comatomenopon elbeli/elongatum, Psittacomenopon poicephalus, Odoriphila clayae/phoeniculi, Ardeiphilus trochioxus, Cuculiphilus fasciatus, Ciconiphilus quadripustulatus, Eomenopon denticulatum, Piagetiella bursaepelecani, Osborniella crotophagae, Hohorstiella lata, Neomenopon pteroclurus, Machaerilaemus laticorpus/latifrons, Austromenopon crocatum, Eidmanniella pellucida, Holomenopon brevithoracicum, Dennyus hirundinis, Myrsidea victrix, Ancistrona vagelli, Pseudomenopon pilosum, Bonomiella columbae, Chapinia robusta, Plegadiphilus threskiornis, Actornithophilus uniseriatus, MEGAMENOPON, Rediella mirabilis, Latumcephalum lesouefi/macropus, Paraboopia flava, Paraheterodoxus insignis, Boopia tarsata, Therodoxus oweni, Laemobothrion maximum, Ricinus fringillae, Trochiliphagus abdominalis, Trochiloecetes rupununi, Liposcelis bostrychophilus

• What is a species?
• How many species are there?
• Where are species distributed?
• How have species distributions changed?
• How are species related?
• How have species characters changed?
• To what extent are species relationships predictive?

Page 10: Writing The Encyclopedia Of Life (not EoL.org)

“Paper minds”: Traditional publication

1,000’s of journals addressing a common set of questions

• What is a species?
• How many species are there?
• Where are species distributed?
• How have species distributions changed?
• How are species related?
• How have species characters changed?
• To what extent are species relationships predictive?

“Species Name”: The universal linker

RAW DATA > Logically interconnected but presently fragmented by the publication process

Other problems…
• Time & money
• Audience mismatch
• Findability & reusability

Page 11: Writing The Encyclopedia Of Life (not EoL.org)

Looking within a paper: Data mining publications

1. Scan
2. Extract text (OCR)
3. Find keywords
- Taxonomic names
- Author names
- Citations
- Collection data
- Morphological data
- Descriptions
- Identification keys
- Illustrations
- Photographs

Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.

Page 12: Writing The Encyclopedia Of Life (not EoL.org)

Looking within a paper: Data mining publications

1. Scan
2. Extract text (OCR)
3. Find keywords
- Taxonomic names
- Author names
- Citations
- Collection data
- Morphological data
- Descriptions
- Identification keys
- Illustrations
- Photographs
4. Index
5. Annotate online

Palma, R.L., and R.L.C. Pilgrim. 2002. A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae). J. R. Soc. N.Z. 32:7-60.
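Steps 3 and 4 (find keywords, then index) might be sketched roughly as below. This is only an illustration, not the method used in the talk: the genus lookup `KNOWN_GENERA`, the function names and the sample page texts are all assumptions, and the taxon names are borrowed from elsewhere in these slides.

```python
import re

# Hypothetical lookup of known genus names; a real name index would be far larger.
KNOWN_GENERA = {"Menopon", "Ricinus", "Naubates"}

# A capitalised word followed by a lowercase word: the rough shape of a Latin binomial.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]+)\b")


def find_taxon_names(text):
    """Return candidate binomials whose genus is known, in order of first appearance."""
    found = []
    for genus, epithet in BINOMIAL.findall(text):
        name = f"{genus} {epithet}"
        if genus in KNOWN_GENERA and name not in found:
            found.append(name)
    return found


def build_index(pages):
    """Map each taxon name to the sorted page numbers on which it occurs."""
    index = {}
    for page_no, text in pages.items():
        for name in find_taxon_names(text):
            index.setdefault(name, []).append(page_no)
    return {name: sorted(nums) for name, nums in index.items()}


# Made-up OCR output for two scanned pages.
SAMPLE_PAGES = {
    7: "Menopon gallinae is described on this page.",
    8: "Both Menopon gallinae and Ricinus fringillae are figured here.",
}

if __name__ == "__main__":
    print(build_index(SAMPLE_PAGES))
```

A pattern this simple also accepts phrases such as a known genus followed by an ordinary lowercase word, so a practical pipeline would need a stop-word list or a check against a full name index.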

Page 13: Writing The Encyclopedia Of Life (not EoL.org)

How do we bring this all together?

“Publications”, Specimens, People: ?

• Technical issues
• Social issues
• Needs to scale (web)
• Needs to be sustainable

Page 14: Writing The Encyclopedia Of Life (not EoL.org)

Technical issues 1: Data standards

• TDWG (since 1986)
• GBIF
• Bridging computer science & biology
• It’s not science!

“Standards” can mean many things:
• Data exchange standards (e.g. Darwin Core)
• Common restricted vocabularies (Sp.2000 classification)
• Programming standards
• Data quality
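As a rough illustration of a data exchange standard, the sketch below writes specimen records to CSV using a handful of Darwin Core term names as column headers. The choice of columns, the function name `to_dwc_csv` and the sample record are assumptions made for the example; the record values are invented.

```python
import csv
import io

# A small subset of Darwin Core terms used as column names (the selection is an assumption).
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def to_dwc_csv(records):
    """Serialise records to CSV text, rejecting any record that lacks a field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        missing = [field for field in DWC_FIELDS if field not in rec]
        if missing:
            raise ValueError(f"record is missing fields: {missing}")
        writer.writerow({field: rec[field] for field in DWC_FIELDS})
    return buf.getvalue()


# Invented example record, for illustration only.
SAMPLE_RECORD = {
    "occurrenceID": "example:1",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Menopon gallinae",
    "eventDate": "2008-01-01",
    "decimalLatitude": "51.5",
    "decimalLongitude": "-0.1",
}

if __name__ == "__main__":
    print(to_dwc_csv([SAMPLE_RECORD]))
```

Agreeing on the column names is what lets two independently built databases exchange such a file without custom mapping code.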

Page 15: Writing The Encyclopedia Of Life (not EoL.org)

Technical issues 2: Platforms

• Generic databases with custom interfaces (MySQL, Oracle) (e.g. Species 2000, IPNI)
• Bespoke (usually commercial) databases (e.g. KE EMu, Biota)
• Content Management Systems & blogging platforms such as Drupal, Plone, WordPress etc. (e.g. EOL’s LifeDesks, GBIF websites)
• Wikis such as MediaWiki, Semantic MediaWiki (e.g. Wikipedia, iTaxon)


Page 16: Writing The Encyclopedia Of Life (not EoL.org)


Technical issues 2: Platforms

Technology moves fast!

Page 17: Writing The Encyclopedia Of Life (not EoL.org)

Technical issues 2: Platforms - common design considerations

Need scalable and flexible platforms that support:

1) large numbers of users as passive readers and active contributors
2) editorial hierarchies serving individual and community needs
3) the epistemological richness and diversity of all contributors
4) flexible data models that can be modified or added by contributors
5) automated integration of third party content
6) automated semantic enrichment of contributed and 3rd party content
7) content workflows and curation tools
8) content archival and citation
9) content licensing and a conditions of use framework
10) web services
11) ease of use

Page 18: Writing The Encyclopedia Of Life (not EoL.org)

Technical issues 3: Web services (integration hacks)

Module name, description and API:

• bhl: Searches the Biodiversity Heritage Library for printed pages held within their archives that have a reference to a specific taxon name. API: http://www.biodiversitylibrary.org/Tools.aspx
• flickr: Searches the Flickr image database for pictures that have taxon name metadata associated with them. API: http://www.flickr.com/services/api/
• gbifmap: Displays maps of the world that geolocate biological occurrence records from the GBIF database. API: http://ispecies.blogspot.com/2007/08/maps-and-google-tweak.html
• morphbank: Searches the morphbank image database for pictures that have taxon name metadata associated with them. API: http://services.morphbank.net/mb
• ncbi: Searches the NCBI database for nucleotide sequences, protein sequences and related links. API: http://www.programmableweb.com/api/ncbi-entrez
• wikipedia: Displays the initial section of a Wikipedia article for the taxon name, if the page exists. API: http://en.wikipedia.org/wiki/Special:Export/
• yahooimages: Similar to Flickr, but for Yahoo! Images. API: http://developer.yahoo.com/search/image/V1/imageSearch.html
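Each module above takes a taxon name and turns it into a request against a third-party service. A minimal sketch of that first step for the wikipedia module is shown here, using the Special:Export address listed on the slide. The function names and the `BUILDERS` registry are hypothetical, and no request is actually sent.

```python
from urllib.parse import quote

# Base address as listed for the wikipedia module above.
WIKIPEDIA_EXPORT = "http://en.wikipedia.org/wiki/Special:Export/"


def wikipedia_export_url(taxon_name):
    """Build the export URL for the article titled after the taxon name."""
    # Wikipedia article titles use underscores in place of spaces.
    title = taxon_name.strip().replace(" ", "_")
    return WIKIPEDIA_EXPORT + quote(title)


# Hypothetical registry: module name -> function that builds its request URL.
BUILDERS = {"wikipedia": wikipedia_export_url}


def module_urls(taxon_name, builders=BUILDERS):
    """Return the request URL each registered module would use for this taxon."""
    return {module: build(taxon_name) for module, build in builders.items()}


if __name__ == "__main__":
    print(module_urls("Menopon gallinae"))
```

Keeping one small URL builder per module means a new service can be added by registering another function, without touching the page that displays the results.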

Page 19: Writing The Encyclopedia Of Life (not EoL.org)

Social issues 1: The community

• Taxonomy as a team sport (Community size and the community of one)
• Networking effects (quality, multi-disciplinarity and utility of data)
• The rise and rise of the “amateur”
• Cost of professionals
• Top down and bottom up organization (how to partition the community)
• Bottom up benefits, low transaction costs (social information flows, motivation and relations self organize the group)
• Support epistemological richness
• Collaborative output, peer review, credit (incentives)

Page 20: Writing The Encyclopedia Of Life (not EoL.org)

Social issues 2: Nationalism / Politics

• Convention on Biological Diversity, 1992
• Biodiversity does not respect national boundaries
• Biodiversity questions do not respect national boundaries
• Funding is (usually) national / regional
• Benefits are expected to be national
• Often don’t match the questions we want to address
• Politics amongst researchers and institutions (e.g. EDIT and Lifewatch)
• Good politicians are not always good scientists

Page 21: Writing The Encyclopedia Of Life (not EoL.org)

Social issues 3: Incentives

• Article citation (most common method of peer recognition)
• Influences authors’ employment, reputation and research opportunities
• Traditional metrics of scholarly activity (no. papers, impact factor, H-Index)
• Taxonomy is not usually high impact, but has a long half life
• High cost of traditional publication (unaffordable to authors & libraries)
• Lessons from Zootaxa (low cost, high volume) and Wikipedia (highly linked)

Page 22: Writing The Encyclopedia Of Life (not EoL.org)

Social issues 4: Licensing

• Mickey Mouse, copyright and 1923
• Copyright transfer agreements
• As of 2009 half of all taxonomic treatments are in copyright

[Figure: Publications on ants]

Page 23: Writing The Encyclopedia Of Life (not EoL.org)

Social issues 4: Licensing

• Who owns your work (your employer?)
• Branding and credit
• Creative Commons
• Open Access
• Open Science (making science more accountable)

Page 24: Writing The Encyclopedia Of Life (not EoL.org)

Social issues 5: Human Computer Interactions

Page 25: Writing The Encyclopedia Of Life (not EoL.org)

Technical solutions & social models: Current options for writing the Encyclopedia of Life

1) “New” scholarly publishing (semantic enrichment of publications)
2) One database to rule them all - the Common Data Model (CDM)
3) EOL.org, ToL.org & related initiatives
4) Wikipedia / Wikispecies
5) Scratchpads / LifeDesks

Page 26: Writing The Encyclopedia Of Life (not EoL.org)

Encyclopedia of Life (EOL): “A web page for every species”

http://www.eol.org/

• A web page for all 1.8M species
• Multi-institution collaboration
• $50m funding (5 years)
  - MacArthur and Sloan Foundations
• Megascience mashup
  - Aggregating data from the web
• Multiple audiences
  - Science & outreach
• 10 years to complete
  - First draft 2008, “finished” 2017!

Page 27: Writing The Encyclopedia Of Life (not EoL.org)
Page 28: Writing The Encyclopedia Of Life (not EoL.org)

Encyclopedia of Life (EOL): “A web page for every species”

• Huge interest
  - 11.5 million hits in first 5 hours
  - 500+ press articles
  - Pages unavailable for first two days!
• First draft 27 Feb. 2008
  - 24 “exemplar” pages
  - 30,000 detailed pages (fish & amphib.)
  - 1 million “stubs” (names & links)
  - Growth (needs 1,000 spp. per day)
• Much praise but growing criticism
  - Quality vs. quantity of information
  - Authoritative “vetting” process
  - Credit for “authors”
• Eight more years to go
  - Get more content online
  - Better tools to engage more people

Page 29: Writing The Encyclopedia Of Life (not EoL.org)

What is a Scratchpad? A website for you & your community

1. Your data
2. Uploaded & tagged
3. Published & reviewed on your site

Page 30: Writing The Encyclopedia Of Life (not EoL.org)

What is a Scratchpad? A website for you & your community

1. Your data
2. Uploaded & tagged
3. Published & reviewed on your site

Fast, intuitive, fit for use

Page 31: Writing The Encyclopedia Of Life (not EoL.org)

What can Scratchpads do? Import, manage, search & browse:

• DNA & Phylogenies
• Specimens
• Literature
• Images

Page 32: Writing The Encyclopedia Of Life (not EoL.org)

What can Scratchpads do? Integration & connectivity within & between sites

• DNA & Phylogenies
• Specimens
• Literature
• Images
• Taxonomy

Page 33: Writing The Encyclopedia Of Life (not EoL.org)

What can Scratchpads do? In summary:

+ Administration
  - Change your site information
  - Change your front page
  - Change your logo
  - Activity and access logs
+ Backup
  - Backing up your data
  - Restoring your data
+ Bibliography
  - Creating a record
  - Importing from a ref. manager
  - Exporting to a reference manager
+ Blog
  - Creating and adding a blog
+ Custom Content
  - Defining a CCK
  - Importing from a spreadsheet
  - Creating a custom view
+ Fileshare
  - Creating and using a fileshare
+ Forum
  - Altering the forum settings
  - Creating a container for a forum
  - Creating a new forum
  - Creating a new topic inside a forum
+ Groups
  - Creating a group
  - Subscribing to a group
+ Image
  - Uploading & basic annotation
  - Linking image & location records
  - Linking image & specimen records
  - Linking image & publication records
  - Overlay annotations on images
+ Layout
  - Change your theme
  - Menus
  - Blocks and sidebars
+ Locations
  - Creating a record
  - Importing from a spreadsheet
+ Pages
  - Creating, editing, cloning & deleting
  - Configuring the panels template
+ Panels
  - Adding & configuring content
  - Creating a new panel
  - Citing a Panels page
+ Phylogeny
  - Adding a phylogenetic tree
+ Specimens
  - Creating a record
  - Importing from a spreadsheet
  - Linking specimen & location records
  - Linking specimen & pub. records
+ Tasks
  - Creating a tasklist
+ Taxonomy
  - Importing from a spreadsheet
  - Importing from ClassificationBank
  - Starting from scratch
  - Taxonomy manager
  - Displaying a classification
  - Adding names
  - Deleting names
  - Taxonomy & panels
+ Users
  - Your settings
  - Adding a new user
  - User roles and permissions
  - Adding and editing user profile fields
  - Logging in
+ Webform
  - Creating and using webforms

Page 34: Writing The Encyclopedia Of Life (not EoL.org)

What can Scratchpads do? Visual taskguide

Page 35: Writing The Encyclopedia Of Life (not EoL.org)

Current Scratchpads

Ants, Bees, Beetles, Big-headed flies, Birds, Blackflies, Ciliates, Cockroaches, Dragon Trees, Dung Beetles, False Buttonweed, Flat worms, Flies, Foraminifera, Fossil Insects, Fungus Gnats, Holometabola, Leaf-miner Flies, Lice, Lichens of Bermuda, Malvaceae, Megalastrum ferns, Milichiid flies, Mosquitoes, Mosses, Nannotax fossils, Nepticuloid moths, Palms, Pearl oysters, Polychaete worms, Scaleworms, Termites, Triticid grasses, Weevils, Wood Ferns, Sulawesi Ferns, Stick insects

• Sites: 130+
• Users: 1500+
• Pages: 170k
• Since March 2007

Page 36: Writing The Encyclopedia Of Life (not EoL.org)

Scratchpad applications: A multipurpose, flexible technology

eBooks
4th Edition Howard & Moore, Birds of the world (fact checking, data compilation, 2010, funding)

Page 37: Writing The Encyclopedia Of Life (not EoL.org)

Scratchpad applications: A multipurpose, flexible technology

eJournals
European Mosquito Bulletin (ISSN 1460-6127), Phasmid Studies (ISSN 0966-0011) (submission, review, & dissemination of articles)

Page 38: Writing The Encyclopedia Of Life (not EoL.org)

Scratchpad applications: A multipurpose, flexible technology

Image galleries
Nanno fossils, Cockroaches, Stick insects, Flatworms, Grasses, Lichens & many more… (rapid upload, annotation, & display of images)

Page 39: Writing The Encyclopedia Of Life (not EoL.org)

Scratchpad applications: A multipurpose, flexible technology

Societies & Organizations
GBIF, Zootaxa, Threatened Plants of the World (Kew), BarCoVer (DNA Barcoding) & more (space for data collection, services, discussion, & organization)

[Figure: ZOOTAXA, a rapid international journal for animal taxonomists. ISSN 1175-5326 (Print Edition) & ISSN 1175-5334 (Online Edition)]

Page 40: Writing The Encyclopedia Of Life (not EoL.org)

How do Scratchpads work? Getting a Scratchpad

Requirements
• Biological focus
• Agree to T&C’s (click-thru)
• CC license “by-nc-sa”

Application
• Maintainer
• Scope/Mission/API Keys
• (Sub)domain name

Content
• Unrestricted (overlapping)
• No branding (focus on authors)
• Value added

http://scratchpads.eu/apply

Page 41: Writing The Encyclopedia Of Life (not EoL.org)

How do Scratchpads work? Using a Scratchpad

Management
• User categories (maintainer, ed. contrib.)
• Public / private content (flexible groups)
• Admin. page (site settings & behavior)

Data Input
• Content types (biblio, maps, “page” etc)
• Forms, managers, Excel, EndNote etc
• Custom content (add or extend data types)

Tagging (indexing)
• Taxonomy terms (2M +)
• Multiple classifications
• Auto-tagging

Page 42: Writing The Encyclopedia Of Life (not EoL.org)

AutotaggingIndexing data to make it findable

1. Create content (e.g. reference)
2. Find terms (Autotag)
3. Submit (Index)

Journal citation mentions taxon name

Page 43: Writing The Encyclopedia Of Life (not EoL.org)

Autotagging: Indexing data to make it findable

1. Create content (e.g. reference)
2. Find terms (Autotag)
3. Submit (Index)

Matches taxonomy term (Drag & Drop)

Page 44: Writing The Encyclopedia Of Life (not EoL.org)

Autotagging: Indexing data to make it findable

1. Create content (e.g. reference)
2. Find terms (Autotag)
3. Submit (Index)

Page tagged (indexed) with taxon name
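The autotagging idea (find taxonomy terms mentioned in a piece of content, then index the content under them) can be sketched as below. This is an illustrative guess at the logic, not the Scratchpads code: the function name `autotag`, the small term list and the whole-word matching rule are all assumptions. The content is the citation title from the data-mining slides.

```python
import re


def autotag(text, taxonomy_terms):
    """Return the taxonomy terms that occur in the text as whole words, in taxonomy order."""
    tags = []
    for term in taxonomy_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            tags.append(term)
    return tags


# Hypothetical slice of a site's taxonomy; a real one may hold millions of terms.
TAXONOMY = ["Insecta", "Phthiraptera", "Philopteridae", "Naubates", "Menopon"]

# 1. Create content (e.g. a reference).
REFERENCE = "A revision of the genus Naubates (Insecta: Phthiraptera: Philopteridae)"

# 2. Find terms (autotag).
TAGS = autotag(REFERENCE, TAXONOMY)

# 3. Submit: index the content under each matched term.
INDEX = {term: [REFERENCE] for term in TAGS}

if __name__ == "__main__":
    print(TAGS)
```

Scanning every term against every document would be slow with 2M+ terms, so a real implementation would more likely tokenise the text once and look the tokens up in an index of names.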

Page 45: Writing The Encyclopedia Of Life (not EoL.org)

How do Scratchpads work? Indexing data to make it findable

• Tagged data can be presented differently
• For example as part of a traditional bibliography
• Or as small windows or “panels” of data

Page 46: Writing The Encyclopedia Of Life (not EoL.org)

How do Scratchpads work? Integrating data & “publishing” in a Scratchpad

Types of Scratchpad Panel… Built with “tagged data”
• Taxonomic hierarchies
• Files and documents
• Phylogenetic trees
• Customized content
• Specimen records
• Photographs & illustrations
• Personalized instructions
• Common names
• Bibliographic literature
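One way to picture panels being built from tagged data is the sketch below: every item tagged with a taxon is grouped by its content type, and each group becomes one panel on that taxon's page. The item structure, the function name `build_species_page` and the sample items are assumptions made for illustration only.

```python
from collections import defaultdict


def build_species_page(taxon, items):
    """Group the titles of all items tagged with the taxon into panels by content type."""
    panels = defaultdict(list)
    for item in items:
        if taxon in item["tags"]:
            panels[item["type"]].append(item["title"])
    return dict(panels)


# Invented content items; "type" reuses panel names from the list above.
ITEMS = [
    {"type": "Bibliographic literature", "title": "Palma & Pilgrim 2002", "tags": {"Naubates"}},
    {"type": "Photographs & illustrations", "title": "naubates_dorsal.jpg", "tags": {"Naubates"}},
    {"type": "Specimen records", "title": "Specimen 001", "tags": {"Menopon"}},
]

if __name__ == "__main__":
    print(build_species_page("Naubates", ITEMS))
```

Because the page is assembled at request time from whatever carries the tag, newly uploaded and tagged content appears on the relevant page without anyone editing that page.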

Page 47: Writing The Encyclopedia Of Life (not EoL.org)

Dynamically built species pages

Integrating data & “publishing” in a Scratchpad

How do Scratchpads work?

Page 48: Writing The Encyclopedia Of Life (not EoL.org)

Browsed through a taxonomy

Integrating data & “publishing” in a Scratchpad

How do Scratchpads work?

Page 49: Writing The Encyclopedia Of Life (not EoL.org)

Including 3rd party content

Integrating data & “publishing” in a Scratchpad

How do Scratchpads work?

Page 50: Writing The Encyclopedia Of Life (not EoL.org)

With data curation tools

Integrating data & “publishing” in a Scratchpad

How do Scratchpads work?

Page 51: Writing The Encyclopedia Of Life (not EoL.org)

Listing all “authors”

Integrating data & “publishing” in a Scratchpad

How do Scratchpads work?

Page 52: Writing The Encyclopedia Of Life (not EoL.org)

Dated, permanent & citable

Integrating data & “publishing” in a Scratchpad

How do Scratchpads work?

Page 53: Writing The Encyclopedia Of Life (not EoL.org)

Choose which panels to display

Adjusting the panels layout

How do Scratchpads work?

Page 54: Writing The Encyclopedia Of Life (not EoL.org)

How do Scratchpads work?

An example based on the Catalogue of Life classification

• 2 million taxon pages
• Open curation at http://catlife.myspecies.info

Page 55: Writing The Encyclopedia Of Life (not EoL.org)

Questions?

Page 56: Writing The Encyclopedia Of Life (not EoL.org)
Page 57: Writing The Encyclopedia Of Life (not EoL.org)

Scratchpad management: Scalable & sustainable technology

Hardware, software & user support
Virtual machine, open-source software, self-archiving, backed-up, multi-site configuration (easy to move & upgrade, secure & reliable, citable, screencasts, low admin., low marginal costs)

Page 58: Writing The Encyclopedia Of Life (not EoL.org)