DSpace at ILRI: A semi-technical overview of “CGSpace”
Alan Orth, June 2015
History of DSpace at ILRI
● 2009: ILRI launches Mahider (“repository” in Amharic)
● 2010: Other CGIAR centers and programs join our platform and share hard / soft costs
● 2011: Rebranded as “CGSpace”
● 2015: 9 CGIAR centers, ~50,000 items, ~250k hits/month
“CGSpace” in June, 2015
How we use DSpace
● Content people embedded in each department help capture results (presentations, papers, brochures, etc.)
● Primary location for institutional outputs!
● No posting PDFs on the corporate website!
● Integrate with website and blogs via RSS feeds
● Direct ALL traffic to DSpace!
● For data sets, videos, etc., we make a metadata-only accession with a link to, e.g., YouTube
● Communities, sub-communities, and collections
● Tempting to model after organization hierarchy!
● (we did)
● … but organization hierarchies change!
DSpace hierarchies
Mostly organized by output type now...
Metadata
● Standard Dublin Core is available
● No AGROVOC
● You can create custom controlled vocabularies in arbitrary namespaces, e.g. cg.subject.ilri
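DSpace names metadata fields as schema.element with an optional qualifier, so a custom field like cg.subject.ilri sits in its own cg schema next to standard Dublin Core fields such as dc.title. The Python sketch below only splits such names into their parts; MetadataField and parse_field are hypothetical helpers for illustration, not part of DSpace.

```python
from typing import NamedTuple, Optional


class MetadataField(NamedTuple):
    """A DSpace-style field name: schema.element[.qualifier] (hypothetical helper type)."""
    schema: str
    element: str
    qualifier: Optional[str]


def parse_field(name: str) -> MetadataField:
    """Split a dotted field name such as 'cg.subject.ilri' into its parts."""
    parts = name.split(".")
    # Exactly two or three non-empty parts are accepted.
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a schema.element[.qualifier] name: {name!r}")
    qualifier = parts[2] if len(parts) == 3 else None
    return MetadataField(parts[0], parts[1], qualifier)


# Standard Dublin Core fields and a custom ILRI field side by side.
for field_name in ["dc.title", "dc.identifier.uri", "cg.subject.ilri"]:
    print(parse_field(field_name))
```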
Custom metadata in ILRI report
Not AGROVOC!
“Discovery” facets
● Context-aware metadata summaries
● Side effect: helps spot metadata inconsistencies!
● … Open Access, Open access, open Access, etc.
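The same inconsistencies can be hunted down outside the web interface. A minimal sketch, assuming you have exported the values of one metadata field as a list of strings (the sample values are hypothetical): it counts raw values the way a facet does, then groups values that differ only in letter case or whitespace, which is exactly the Open Access / Open access / open Access pattern.

```python
from collections import Counter


def facet_counts(values):
    """Count raw values exactly as stored, like a Discovery facet would."""
    return Counter(values)


def inconsistent_groups(values):
    """Group raw values that differ only in case or whitespace.

    Returns {normalized_key: sorted distinct raw spellings}, keeping only
    keys with more than one spelling (likely metadata inconsistencies).
    """
    groups = {}
    for raw in values:
        # Collapse internal whitespace and fold case to build the comparison key.
        key = " ".join(raw.split()).casefold()
        groups.setdefault(key, set()).add(raw)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical exported values of an access-status field.
values = ["Open Access", "Open access", "open Access", "Limited Access", "Open Access"]
print(facet_counts(values))
print(inconsistent_groups(values))
```

Fixing the stored values (for example with a batch metadata edit) then collapses the facet into a single entry.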
Search engine optimization (SEO)
Help Google Scholar consume your content!
● XML sitemaps
● Consistent domain name, e.g. cgspace.cgiar.org
● Persistent links for resources
● Website speed and HTTPS are both a plus
● Sign up for Google Webmaster Tools to submit your sitemap, control indexing, see stats, etc.
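DSpace can generate its own sitemaps, so there is normally no need to write one by hand; this Python sketch only illustrates what the sitemaps.org XML format looks like. The build_sitemap helper and the lastmod date are hypothetical, and the item URL simply reuses the CGSpace handle shown later in this deck.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemaps.org <urlset> from (url, lastmod) pairs.

    lastmod is a W3C date such as '2015-06-01', or None to omit it.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # default_namespace (Python 3.8+) writes xmlns="..." instead of an ns0: prefix.
    return ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)


print(build_sitemap([("https://cgspace.cgiar.org/handle/10568/67073", "2015-06-01")]))
```

A single sitemap file may list at most 50,000 URLs, so a repository of CGSpace's size needs several files tied together by a sitemap index.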
Sitemap view in Google Webmaster Tools
Importance of persistent links
● Website addresses change…
● mahider.ilri.org -> cgspace.cgiar.org
● But resources stay the same!
http://hdl.handle.net/10568/67073
● “Handle” service from handle.net
● Everything under prefix 10568 is CGSpace
● Default DSpace handle prefix is 123456789!
dc.identifier.uri specifies an item’s persistent uniform resource identifier (URI)
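The point of the handle is that the URI is built from the prefix/suffix pair and the global resolver, never from the repository's own hostname. A minimal Python sketch of that mapping; handle_uri and handle_from_uri are hypothetical helpers, and rejecting the default 123456789 prefix is an illustrative safety check, not DSpace behaviour.

```python
HANDLE_RESOLVERS = ("http://hdl.handle.net/", "https://hdl.handle.net/")
DEFAULT_DSPACE_PREFIX = "123456789"  # placeholder prefix in a stock DSpace install


def handle_uri(handle):
    """Return the persistent resolver URI for a handle such as '10568/67073'."""
    prefix, sep, suffix = handle.partition("/")
    if not (prefix and sep and suffix):
        raise ValueError(f"expected 'prefix/suffix', got {handle!r}")
    if prefix == DEFAULT_DSPACE_PREFIX:
        # Hypothetical check: these links would not resolve globally.
        raise ValueError("still using the default 123456789 prefix; register your own")
    return HANDLE_RESOLVERS[0] + handle


def handle_from_uri(uri):
    """Recover 'prefix/suffix' from an http or https hdl.handle.net URI."""
    for base in HANDLE_RESOLVERS:
        if uri.startswith(base):
            return uri[len(base):]
    raise ValueError(f"not a hdl.handle.net URI: {uri!r}")


uri = handle_uri("10568/67073")
print(uri)
print(handle_from_uri(uri).split("/")[0] == "10568")  # True: a CGSpace handle
```

Whether the site lives at mahider.ilri.org or cgspace.cgiar.org, the value stored in dc.identifier.uri stays the same.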
Getting data INTO DSpace
● Day-to-day submission is manual, by a small army of editors
● One-time batch uploads of items from other systems in CSV format (InMagic!)
● OAI-PMH for metadata only
● OAI-ORE for metadata + bitstreams (e.g. from another DSpace or SharePoint, etc.)
● SWORD (haven't tried)
● REST API (DSpace 5+, haven't tried)
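For the one-time CSV batch uploads, most of the work is mapping the legacy export onto DSpace field names. A minimal Python sketch, assuming DSpace's batch metadata editing CSV layout (an id column where + marks a new item, a collection column holding the target collection's handle, and || between repeated values of one field); check those conventions against the documentation for your DSpace version. The records and the 10568/1 collection handle are hypothetical.

```python
import csv
import io

# Assumed DSpace batch-metadata CSV conventions (verify for your version).
NEW_ITEM_ID = "+"              # id value marking a row as a new item
MULTI_VALUE_SEPARATOR = "||"   # joins repeated values of one field


def to_dspace_csv(records, collection_handle):
    """Write records (dicts of field name -> str or list of str) as batch import CSV."""
    fields = sorted({field for rec in records for field in rec})
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["id", "collection", *fields])
    for rec in records:
        row = [NEW_ITEM_ID, collection_handle]
        for field in fields:
            value = rec.get(field, "")  # missing fields become empty cells
            if isinstance(value, (list, tuple)):
                value = MULTI_VALUE_SEPARATOR.join(value)
            row.append(value)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical records, e.g. after mapping a legacy (InMagic) export to DSpace fields.
legacy = [
    {"dc.title": "Example report", "cg.subject.ilri": ["LIVESTOCK", "DAIRYING"]},
    {"dc.title": "Another example"},
]
print(to_dspace_csv(legacy, "10568/1"))
```

The resulting file would then be loaded with DSpace's batch metadata import; check the command-line options for your version before running it against production.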
Getting data OUT OF DSpace
● REST API for structured JSON or XML
● OAI-PMH for metadata
● OAI-ORE for metadata + bitstreams (PDFs, etc.)
● RSS feeds for websites / blogs
● XML sitemaps for search engines*
*Google discontinued the use of OAI-PMH for discovering site content in 2008! http://googlewebmastercentral.blogspot.com/2008/04/retiring-support-for-oai-pmh-in.html
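Harvesting metadata over OAI-PMH is a loop of ListRecords requests, following the resumptionToken until it comes back empty. A minimal Python sketch using only the standard library; it builds request URLs and parses one response page but does no network I/O. The /oai/request endpoint path is an assumption about the DSpace OAI setup, and the sample response is hypothetical and trimmed (placeholder title and OAI identifier).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords URL for the first page or a follow-up page."""
    if resumption_token:
        # Follow-up requests carry only verb and resumptionToken.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)  # None for deleted records
        records.append({
            "oai_id": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in dc.iterfind("dc:title", NS)] if dc is not None else [],
            "identifiers": [e.text for e in dc.iterfind("dc:identifier", NS)] if dc is not None else [],
        })
    # An empty or missing token means this was the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:10568/67073</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example title</dc:title>
          <dc:identifier>http://hdl.handle.net/10568/67073</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://cgspace.cgiar.org/oai/request"))
print(parse_list_records(SAMPLE))
```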
CCAFS website, driven by Drupal + DSpace APIs
“Latest outputs” on project blog populated via RSS, links to CGSpace
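A “Latest outputs” box only needs the titles and links from a repository feed. A minimal Python sketch with the standard library, assuming an RSS 2.0 feed (DSpace can also serve other feed formats); the feed contents are hypothetical, and fetching the feed from your repository is left out. Titles are HTML-escaped because they come straight from repository metadata.

```python
import xml.etree.ElementTree as ET
from html import escape


def latest_outputs(rss_xml, limit=5):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 feed, in feed order."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        raise ValueError("no <channel> element; is this RSS 2.0?")
    pairs = []
    for item in channel.findall("item")[:limit]:
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        pairs.append((title, link))
    return pairs


def render_list(pairs):
    """Render (title, link) pairs as an HTML list, escaping feed text."""
    rows = [f'  <li><a href="{escape(link)}">{escape(title)}</a></li>' for title, link in pairs]
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"


SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Latest outputs</title>
  <item><title>Example output A &amp; B</title><link>http://hdl.handle.net/10568/67073</link></item>
  <item><title>Another example output</title><link>https://example.org/item/2</link></item>
</channel></rss>"""

print(render_list(latest_outputs(SAMPLE_FEED, limit=2)))
```

Because the links point at handles, readers land on CGSpace, in line with directing all traffic to DSpace.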
Open source workflow on GitHub
https://github.com/ilri/DSpace
Skills needed in your organization
Besides content people(!)...
● Prioritize Linux systems administration experience (Tomcat, httpd, PostgreSQL, DNS, SSH, git)
● General: computer science background
● Web developers are a diverse bunch...
● Java development experience doesn't hurt
Extra considerations
● Item mapping
● Maintenance tasks (background batch jobs)
● Backups of assetstore and PostgreSQL!
● Altmetrics tracks social media mentions
● Separate production / development environments
● CGSpace server is $80/month
● ~20GB of PDFs, ~8GB of Solr data
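The backup bullet is worth automating. A minimal Python sketch, assuming a database named dspace and the paths shown (all hypothetical): it only builds the pg_dump and rsync command lines, using pg_dump's custom archive format (restorable with pg_restore) and rsync's archive mode, so they can be reviewed before each is run with subprocess.run(argv, check=True) from cron or similar.

```python
import datetime as dt
import shlex
from pathlib import Path


def backup_commands(db_name, assetstore, dest, today=None):
    """Return argv lists for a dated PostgreSQL dump and an assetstore copy.

    Nothing is executed here; db_name and all paths are hypothetical.
    """
    today = today or dt.date.today()
    dest = Path(dest)
    dump_file = dest / f"{db_name}-{today.isoformat()}.dump"
    pg_dump = ["pg_dump", "--format=custom", f"--file={dump_file}", db_name]
    # Trailing slash on the source: copy the assetstore's contents into dest/assetstore.
    rsync = ["rsync", "-a", f"{Path(assetstore)}/", str(dest / "assetstore")]
    return [pg_dump, rsync]


for argv in backup_commands("dspace", "/dspace/assetstore", "/backups", dt.date(2015, 6, 1)):
    print(shlex.join(argv))
```

With ~20GB of PDFs, rsync's incremental copying keeps nightly runs short; copying the backups to a second machine is still advisable.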
Getting help
● “DSpace Tech” mailing list
● “dspace” tag on Stack Overflow
● [email protected]