iCollections, Mass Digitisation of British & Irish Lepidoptera Adrian Hine, Natural History Museum, London

iCollections Background

• iCollections began in March 2013 as a three-year project, using 8 full-time digitisers plus existing staff.

• Aim: digitise the British Lepidoptera (butterflies & moths), ca. ½ million specimens (5,000 drawers).

• A pilot project for the mass digitisation of pinned insects.

• The main aim of digitisation is to capture the label data, not the specimen image per se.

• Developing the workflow for the Digital Collections Programme (DCP) – a Digital Museum.
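
To give a rough sense of scale, the figures above can be turned into a back-of-envelope throughput estimate. This is only a sketch: the 220 working days per year is an assumption, not a project figure, and the calculation ignores the contribution of existing staff.

```python
# Back-of-envelope throughput for iCollections, using figures from the slides.
SPECIMENS = 500_000   # ca. half a million British Lepidoptera specimens
DRAWERS = 5_000       # ca. 5,000 drawers
DIGITISERS = 8        # full-time digitisers
YEARS = 3             # project length

# Assumption (not stated in the slides): ~220 working days per digitiser per year.
WORKING_DAYS_PER_YEAR = 220

specimens_per_drawer = SPECIMENS / DRAWERS
digitiser_days = DIGITISERS * YEARS * WORKING_DAYS_PER_YEAR
specimens_per_digitiser_day = SPECIMENS / digitiser_days

print(f"{specimens_per_drawer:.0f} specimens per drawer")              # 100
print(f"{digitiser_days} digitiser-days available")                     # 5280
print(f"~{specimens_per_digitiser_day:.0f} specimens per digitiser per day")  # ~95
```

Under these assumptions each digitiser would need to handle on the order of 95 specimens per working day across all digitiser stages; this illustrates the scale of the task rather than reporting a measured rate.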

Digitisation Benefits

• Three top-level themes:

• Research

• Collections

• Public engagement

• Collections have to be chosen carefully to make the most of a limited budget. British Lepidoptera tick all three boxes!

Research

• Large, powerful dataset with temporal & spatial coverage (50% usable).

• Climate change, distributional changes, migration, morphometrics.

• Occurrence records for the National Biodiversity Network (NBN).
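
The slides do not say what format the occurrence records were exchanged in. Assuming a Darwin Core export (a widely used biodiversity data standard), the sketch below shows how a digitised specimen might be mapped onto standard occurrence terms, keeping only records that have both a date and coordinates. The `SpecimenRecord` fields and the "usable" rule are hypothetical, not the EMu schema or the project's definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecimenRecord:
    # Hypothetical fields for a digitised specimen; not the actual EMu schema.
    specimen_number: str
    taxon_name: str
    event_date: Optional[str]   # ISO 8601 date string, e.g. "1921-07-14"
    latitude: Optional[float]
    longitude: Optional[float]


def is_usable(rec: SpecimenRecord) -> bool:
    """Assumed rule: a record is usable only if it has a date and coordinates."""
    return (
        rec.event_date is not None
        and rec.latitude is not None
        and rec.longitude is not None
    )


def to_darwin_core(rec: SpecimenRecord) -> dict[str, str]:
    """Map a usable record onto standard Darwin Core occurrence terms."""
    return {
        # Simplification: a real export would use a globally unique occurrenceID.
        "occurrenceID": rec.specimen_number,
        "catalogNumber": rec.specimen_number,
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": rec.taxon_name,
        "eventDate": rec.event_date,
        "decimalLatitude": f"{rec.latitude:.4f}",
        "decimalLongitude": f"{rec.longitude:.4f}",
    }
```

In practice `to_darwin_core` would be applied only to records for which `is_usable` returns True.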

Better Collections

• Better curation, preservation & access.

Public Engagement

• Lepidoptera are a charismatic group, attracting a lot of public interest.

• Explain our science: Science Uncovered, Nature Live, TV, radio.

Data Workflow

• Data quality is at the heart of the digitisation process. We wish to control the quality of data going into EMu, the Museum's collections management system.

• We didn’t want simply to push large quantities of unchecked data into EMu and then have to deal with it at a later stage.

• Consistent, systematic approach to data capture.

• Every stage of the digitisation process followed written protocols.

• Each specimen is given a unique specimen number, printed as both a Data Matrix barcode and human-readable text.
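
A minimal sketch of the kind of check that keeps unchecked data out of EMu: a batch of transcribed records is held back unless every record has the required fields and every specimen number is unique. The field names and the rule set are hypothetical; the project's written protocols are not reproduced in the slides.

```python
from collections import Counter

# Hypothetical required fields for a transcribed record.
REQUIRED_FIELDS = ("specimen_number", "drawer_number", "taxon_name")


def check_batch(records: list[dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the batch may proceed."""
    problems = []
    # Each specimen number must occur exactly once in the batch.
    counts = Counter(r.get("specimen_number", "") for r in records)
    for number, n in counts.items():
        if number and n > 1:
            problems.append(f"duplicate specimen number {number} ({n} records)")
    # Every required field must be present and non-blank.
    for i, rec in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not rec.get(field, "").strip():
                problems.append(f"record {i}: missing {field}")
    return problems
```

Rejecting a whole batch, rather than silently dropping bad rows, keeps problems visible at the point of capture instead of surfacing later in EMu.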

Data Workflow

• Opted for data capture outside EMu:

– poor-quality data in EMu (sites, taxonomy, parties) makes databasing directly into EMu difficult;

– it lets us build a highly streamlined data entry interface for the transcription phase;

– it lets us build harmonisation tools to control the data going into EMu (reducing duplication).

• Developing an RDA for the future.

• The biggest challenge is harmonisation with existing data within EMu (taxonomy, sites, parties, specimens).

Digitisation Workflow

• Specimen Preparation (Digitiser)

• Imaging (Digitiser)

• Transcription (Digitiser)

• Taxonomy Harmonisation (Taxonomist)

• Georeferencing (Georeferencer)

• Import into EMu (Data Manager)

Specimen Preparation

Imaging

Ingestion into Transcription Database

• A script uses the application Barcodefiler to search each image for a barcode. If one is found, the script renames the image file with the specimen number.

• It then creates a stub record in the rapid data capture system (SQL back end) with three core data fields:

– specimen number (from the barcode)

– drawer number (from the folder name)

– taxon name (from the folder name)

• Using the ImageMagick libraries, it creates a cropped label derivative image.
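
The ingestion steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's script: Barcodefiler is a separate application, so its decoded value is passed in as `specimen_number`; SQLite stands in for the SQL back end; and the folder-naming convention `<drawer number>_<Genus>_<species>`, the `specimen_stub` table and the `_label` file suffix are all hypothetical.

```python
import re
import sqlite3
from pathlib import Path
from typing import Optional

# Hypothetical folder-naming convention "<drawer number>_<Genus>_<species>",
# e.g. "0042_Pieris_rapae"; the slides only say both values come from the folder name.
FOLDER_PATTERN = re.compile(r"^(?P<drawer>\d+)_(?P<taxon>[A-Za-z]+(?:_[A-Za-z]+)*)$")


def create_schema(db: sqlite3.Connection) -> None:
    """Stub-record table holding the three core fields."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS specimen_stub ("
        " specimen_number TEXT PRIMARY KEY,"
        " drawer_number TEXT NOT NULL,"
        " taxon_name TEXT NOT NULL)"
    )


def parse_folder(folder_name: str) -> tuple[str, str]:
    """Split a folder name into (drawer number, taxon name)."""
    m = FOLDER_PATTERN.match(folder_name)
    if m is None:
        raise ValueError(f"unrecognised folder name: {folder_name!r}")
    return m.group("drawer"), m.group("taxon").replace("_", " ")


def ingest_image(
    image: Path, specimen_number: Optional[str], db: sqlite3.Connection
) -> Optional[Path]:
    """Create a stub record and rename the image after its specimen number.

    `specimen_number` stands in for the value a barcode reader decoded from
    the image; None means no barcode was found.
    """
    if specimen_number is None:
        return None  # leave the file as-is for manual checking
    drawer, taxon = parse_folder(image.parent.name)
    target = image.with_name(f"{specimen_number}{image.suffix}")
    with db:  # one transaction: the insert is rolled back if the rename fails
        db.execute(
            "INSERT INTO specimen_stub (specimen_number, drawer_number, taxon_name)"
            " VALUES (?, ?, ?)",
            (specimen_number, drawer, taxon),
        )
        renamed = image.rename(target)
    return renamed


def label_crop_command(image: Path, geometry: str) -> list[str]:
    """Build an ImageMagick 7 command that writes a cropped label derivative.

    `geometry` (e.g. "1200x600+0+0") is an assumed label position in the frame.
    """
    derivative = image.with_name(f"{image.stem}_label{image.suffix}")
    return ["magick", str(image), "-crop", geometry, "+repage", str(derivative)]
```

Where ImageMagick is installed, the returned command could be run with `subprocess.run(cmd, check=True)`. Putting the stub insert and the rename in one transaction means a failed insert (for example a duplicate specimen number) leaves the image file untouched.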

Transcription

Data Harmonisation

• The biggest challenge is how to harmonise data with existing EMu data.

• We wish to use appropriate records where they already exist in EMu, rather than create additional duplicates.

• Data concepts to harmonise with existing EMu records:

– Taxonomy (determination)

– Parties (collectors)

– Locations (drawers)

• Data concepts to create as new:

– Sites
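
The split between harmonised and newly created concepts can be sketched as a lookup-or-create rule. `ToyCatalogue` below is a hypothetical in-memory stand-in for EMu (not its API) and its irn numbers are made up. Harmonised concepts only ever link to an existing record, and a missing match returns None so that a person reviews it rather than a duplicate being created; sites are always created new, assuming one call per site master record.

```python
from itertools import count
from typing import Optional

# Concepts matched against existing records vs. concepts always created new.
HARMONISED = {"taxonomy", "parties", "locations"}
CREATE_NEW = {"sites"}


class ToyCatalogue:
    """In-memory stand-in for EMu; the irn numbers here are made up."""

    def __init__(self, existing: dict[tuple[str, str], int], first_new_irn: int):
        self.existing = existing                      # (concept, value) -> irn
        self.created: list[tuple[str, str, int]] = []  # new records made here
        self._irns = count(first_new_irn)

    def resolve(self, concept: str, value: str) -> Optional[int]:
        """Return the irn to link to, or None if a person must review the value."""
        if concept in HARMONISED:
            # Reuse an existing record; never create a duplicate automatically.
            return self.existing.get((concept, value))
        if concept in CREATE_NEW:
            irn = next(self._irns)
            self.created.append((concept, value, irn))
            return irn
        raise ValueError(f"unknown data concept: {concept!r}")
```

Exact-string lookup is deliberately naive here; the harmonisation tools described in the following slides exist precisely because real names and sites rarely match exactly.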

Taxonomy Harmonisation

• EMu taxonomy is still a mess! For UK butterflies there are thousands of names: duplicates, erroneous names, different combinations.

• We did not have the time to clean the EMu taxonomy for UK Lepidoptera. We have to live with the mess!

• Taxonomic expertise is needed to match each iCollections name to the correct concept in EMu.

• Digitisers make typos and other errors when entering names.

• We can’t rely on the EMu import algorithms, as matching taxon names is too complex. Human intervention is needed.

• Built a mapping tool to map each iCollections taxon name to an existing EMu name.
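
One way a mapping tool could support the taxonomist is by proposing candidate EMu names for each name a digitiser entered, using fuzzy string matching from Python's standard `difflib`. This is an assumption for illustration: the slides do not describe how the NHM tool ranks names, and the 0.8 cutoff is arbitrary. The function only suggests; a person still confirms the correct concept.

```python
import difflib


def suggest_emu_names(
    entered: str, emu_names: list[str], n: int = 3, cutoff: float = 0.8
) -> list[str]:
    """Rank existing EMu names that look like the name a digitiser typed.

    Suggestions only: a taxonomist still confirms which concept is correct.
    """
    normalised = " ".join(entered.split())  # collapse stray whitespace
    return difflib.get_close_matches(normalised, emu_names, n=n, cutoff=cutoff)
```

A typo such as "Peiris rapae" would surface "Pieris rapae" as the top suggestion; the confirmed pair would then be stored as a mapping, so each distinct entered name only needs a decision once.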

Taxonomy Harmonisation Tool

Sites Harmonisation

• Messy data makes databasing directly into EMu difficult. The existing Sites data are of poor quality: very few records are usable, and data have been captured very inconsistently (diverse data sources).

• Mapping site variants to a site master record, e.g. the master record Box Hill (51.254 N, 0.308 W) with variants:

– Box Hill; Surrey

– Box Hill; Kent

– Box Hill; Surrey; UK;

– Box Hill; near Dorking

– Box Hill, Dorking

• Out of 181,000 specimens, there are just 9,681 unique site variants.
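
Working per variant rather than per specimen cuts the number of items to check by a factor of roughly 19 (181,000 / 9,681 ≈ 18.7). A minimal sketch of that idea, assuming the variant-to-master mapping is curated by hand as in the Box Hill example; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def unique_variants(site_strings: list[str]) -> Counter:
    """Count distinct site strings as transcribed (only outer whitespace trimmed)."""
    return Counter(s.strip() for s in site_strings)


def link_to_masters(
    variants: Iterable[str], variant_to_master: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Resolve each variant through a curated variant -> master mapping.

    Returns the resolved pairs and a sorted list of variants still awaiting a decision.
    """
    variants = list(variants)
    resolved = {v: variant_to_master[v] for v in variants if v in variant_to_master}
    unresolved = sorted(v for v in variants if v not in variant_to_master)
    return resolved, unresolved
```

Under this arrangement georeferencing could be done once per master record, with every specimen whose variant maps to that master picking up the same coordinates.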

Sites Harmonisation & Georeferencing

Sites Georeferencing

Import into EMu

• Import is phased:

1) Images. KE have built a back-end script to ingest multimedia server-side. It reports out a CSV file with the EMu irn (internal record number) and file name identifier of each image.

2) Specimen records (taxonomy, drawer location & multimedia).

3) Georeferenced collection event data.
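
The CSV reported in phase 1 is what lets phase 2 link each specimen record to its image. A minimal sketch, assuming the report has columns named `irn` and `identifier` (hypothetical names) and that the identifier is the image file name produced by the barcode renaming step, so its stem equals the specimen number.

```python
import csv
from pathlib import PurePath
from typing import Iterable


def multimedia_irns(report_lines: Iterable[str]) -> dict[str, str]:
    """Map specimen number -> multimedia irn from the phase 1 ingest report."""
    reader = csv.DictReader(report_lines)
    return {PurePath(row["identifier"]).stem: row["irn"] for row in reader}


def specimen_rows(
    stubs: list[dict[str, str]], media: dict[str, str]
) -> tuple[list[dict[str, str]], list[str]]:
    """Attach the multimedia irn to each phase 2 row; report specimens without an image."""
    rows, missing = [], []
    for stub in stubs:
        irn = media.get(stub["specimen_number"])
        if irn is None:
            missing.append(stub["specimen_number"])
            continue
        rows.append({**stub, "multimedia_irn": irn})
    return rows, missing
```

Listing the specimens with no matching image, instead of importing them without one, keeps phase 2 from creating records that would later need their multimedia attached by hand.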

Issues

• Barcodes not read, or misread.

• Printing quality of barcodes.

• Multiple specimens on one pin.

• Conflicting data.

• Data difficult to interpret.

• Specimens with old-style specimen number labels (non-barcode).

• Specimen records that already exist in EMu.

Digitisation Progress

iCollections Team

The success is due to the project having a strong team ethic, pulling together museum staff from a wide variety of disciplines.

• Chair: Gordon Paterson

• Project manager: Victoria Carter

• Quality assurance: Darrell Siebert

• Digitisers: Peter Wing, Elisa Cane, Flavia Toloni, Jo Durant, Lyndsey Douglas, Sara Albuquerque, Jasmin Perera, Sophie Ledger, Gerrardo Mazzetta

• Collections management: Geoff Martin, Martin Honey, Blanca Huertas, Theresa Howard

• Research: Steve Brooks, Angela Self, Ian Kitching

• Georeferencing: Malcolm Penn, Liz Duffell, Caitlin McLaughlin

• Database & interface design: Mike Sadka

• Data workflow: Adrian Hine

• Database: Chris Sleep

• Image workflow: Vladimir Blagoderov, Steve Cafferty

Questions?