Sharing re-usable phylogenetic data: we're not there yet
Ross Mounce
@rmounce
http://orcid.org/0000-0002-3520-2046
A talk of two halves
1.) Outlining the extent of the problem
(lack of) sharing, standards, care (?)
2.) What I'm trying to do about it:
Digging data out of PDFs
Re-releasing as
Just ~4% of published phylogenetic studies in 2010 publicly archived their supporting phylo data in
Stoltzfus A, O'Meara B, Whitacre J, Mounce R, Gillespie E, Kumar S, Rosauer D, & Vos R. 2012 Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis
BMC Research Notes 10.1186/1756-0500-5-574
Where's the data?
Check our data yourself on Dryad here: 10.5061/dryad.h6pf365t
Scientists cannot be relied upon to share published data upon request
This has been known for a while now
e.g. (in Psychology) Wicherts et al. 2006
But has been confirmed to be true for phylogenetics too:
Drew et al 2013 'Lost Branches in the Tree of Life'
report that just ~16% of researchers contacted supplied
the requested ('published') phylo data.
My own experience tallies with this – I soon stopped bothering to try and ask people via email for a copy of their published data. It's a waste of time.
The (Single) Supplementary Data File was a Y2K solution – a dump
ResearchData
Many legacy journal supplementary data systems bury data and leave it there to decompose
Often not in a re-usable form, e.g. a lazy PDF
Sometimes 'typeset', corrupting the data
A jumble of words & data where the bit you want is on page 92 (no programmatic access)
BURIED and really not very discoverable
Do reviewers even look at it? I think not tbh
I wasted too much of my PhD trying to get usable data to re-analyze
This is what I felt like... So I tried to do something about it...
www.supportpalaeodatarchiving.co.uk
An open letter in support of palaeontology data archiving
Which was picked up by Nature News
Which, in turn, got me in touch with:
Part 2
Since few will help you to re-use their data
You've got to dig it out and
make it re-usable yourself
AND re-release it openly
so no-one else wastes their time doing this
It's not just phylogenetics.
I learned from the Open Knowledge Conference (Berlin 2011) that a lot of different academic fields also seem to struggle to make re-usable published data available.
If it's a common, shared problem... why not seek a shared, cross-disciplinary solution?
AMI (Amanuensis)
Building upon tools first developed in computational chemistry by the Murray-Rust lab
e.g.
ChemicalTagger → PhyloTagger (Entity tagging)
(Chem)PubCrawler → (Phylo)PubCrawler
(to get 10,000+ PDFs to work on)
https://bitbucket.org/nickday/pub-crawler
http://www-ucc.ch.cam.ac.uk/products/software/chemicaltagger
Open Source
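The entity-tagging idea can be illustrated with a minimal sketch. This is not PhyloTagger's or ChemicalTagger's actual code: the regular expression, the tiny genus lexicon, and the function names are all assumptions made for illustration. It finds candidate "Genus species" pairs in plain text, keeps only those whose genus is in a known list, and tallies mentions per genus.

```python
import re
from collections import Counter

# Assumed pattern: a capitalised word followed by a lower-case word of 3+ letters.
# On its own this over-matches (e.g. sentence-initial words), hence the lexicon below.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")

# Hypothetical mini-lexicon; a real tagger would use a full taxonomic name list.
KNOWN_GENERA = {"Turdus", "Taeniopygia", "Pavo"}

def tag_binomials(text):
    """Return (genus, epithet) pairs whose genus appears in KNOWN_GENERA."""
    return [(g, s) for g, s in BINOMIAL.findall(text) if g in KNOWN_GENERA]

def genus_counts(text):
    """Count how often each known genus is mentioned as part of a binomial."""
    return Counter(genus for genus, _ in tag_binomials(text))

# Invented sample sentence, not taken from any paper.
sample = ("Plumage data were taken from Turdus iliacus and Pavo cristatus; "
          "Turdus iliacus was sampled twice.")

tags = tag_binomials(sample)
counts = genus_counts(sample)
```

The lexicon filter is what stops "Plumage data" from being tagged as a species name; the regular expression alone would accept it.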
BBSRC grant approved
“PLUTo: Phyloinformatic Literature Unlocking Tools”
Software for making published phyloinformatic data discoverable, open, and reusable
...I just need to get my PhD viva done & rubber-stamped
Instructions for getting the current working setup here: (multiple repositories, dependencies & requirements!)
http://rossmounce.co.uk/2013/10/06/setting-up-ami2-on-windows/
Evolution of ultraviolet vision in the largest avian radiation - the passerines Anders Ödeen 1* , Olle Håstad 2,3 and Per Alström 4
HTML
Styles, superscripts
And diåcritics preserved!
AMI
Turdus iliacus, Taeniopygia guttata, Serinus canaria, Lanius excubitor, Melopsittacus undulatus, Pavo cristatus, Sturnus vulgaris, Dolichonyx oryzivorus, Ficedula hypoleuca, Vaccinium myrtillus, Falco tinnunculus
Turdus, Pomatostomus, Leothrix, Amytornis, Acanthisitta, Orthonyx x 2, Malurus, Cnemophilus x 4, Philesturnus x 2, Motacilla x 2, Toxorhampus x 2
Typical phylo tree: 60 nodes, complex and minuscule annotation, vertical text, hyphenation and valuable branch lengths. AMI extracts ALL
Acanthisittidae Acanthizidae Acrocephalidae Callaeidae Campephagidae Cnemophilidae Corvidae
0.84 0.91 0.93 0.95
Acanthisitta Acrocephalus Ailuroedus Ailuroedus Amytornis Camptostoma
AMI
23.12 34.54 37.21 38.55
Posterior probability
Branch lengths
NexML
Genus Family
HTML
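To show what "re-usable" means for the extracted tip names, branch lengths and posterior probabilities, here is a minimal sketch of holding them in a tree structure and writing them out in a machine-readable form. It uses Newick rather than NexML purely for brevity, and it is not AMI's output code. The `Node` class and `to_newick` function are hypothetical; the tip names are borrowed from the slide, but the topology and the pairing of numbers with branches are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One tree node: tips carry a name, internal nodes may carry a support value."""
    name: str = ""
    length: Optional[float] = None    # branch length leading to this node
    support: Optional[float] = None   # e.g. posterior probability of the clade
    children: List["Node"] = field(default_factory=list)

def to_newick(node: Node) -> str:
    """Serialise a node and its descendants as a Newick fragment (no trailing ';')."""
    if node.children:
        inner = ",".join(to_newick(child) for child in node.children)
        label = "" if node.support is None else f"{node.support:g}"
        text = f"({inner}){label}"
    else:
        text = node.name
    if node.length is not None:
        text += f":{node.length:g}"
    return text

# Invented three-tip topology; the values are illustrative only.
tree = Node(children=[
    Node(name="Acanthisitta", length=23.12),
    Node(support=0.91, length=34.54, children=[
        Node(name="Amytornis", length=37.21),
        Node(name="Acrocephalus", length=38.55),
    ]),
])

newick = to_newick(tree) + ";"
print(newick)
```

Once a tree is in a form like this, any standard phylogenetics library can read it, which is the point of re-releasing data in NexML or a similar open format instead of leaving it as an image in a PDF.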
Acknowledgements & Thanks
For travel & accommodation support, without which I couldn't possibly attend TDWG
For the Panton Fellowship, inspiration and support
To the organisers of both the session: Nico, Hilmar, Rutger
and the conference as a whole!
My main collaborators on PLUTo: Matthew Wills and Peter Murray-Rust