Data Wrangling at Rice University Denis Galvin Rice University MetaArchive Annual Membership Meeting...
Data Wrangling at Rice University
Denis Galvin, Rice University
MetaArchive Annual Membership Meeting
Houston Texas
ETDs at Rice
• DSpace
• Collection in a database, driven by programming
• 42,581 G
• Brief and full records
ETD Structure
• Brief: http://scholarship.rice.edu/handle/1911/13401
• Full: http://scholarship.rice.edu/handle/1911/13401?show=full
• PDFs: http://scholarship.rice.edu/bitstream/handle/1911/13401/1338793.PDF?sequence=1
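All three URLs above are derived from the item's DSpace handle (1911/13401). The sketch below rebuilds them from a handle; the helper name `etd_urls` is hypothetical, and the PDF file name has to be supplied because it is item-specific and cannot be derived from the handle.

```python
BASE = "http://scholarship.rice.edu/"


def etd_urls(handle, pdf_name, sequence=1):
    """Build the brief-record, full-record and PDF URLs for one ETD.

    handle   -- DSpace handle such as "1911/13401"
    pdf_name -- bitstream file name such as "1338793.PDF" (read from the item page)
    sequence -- bitstream sequence number, as in the ?sequence=1 example
    """
    brief = f"{BASE}handle/{handle}"
    return {
        "brief": brief,
        "full": f"{brief}?show=full",  # same page with every metadata field shown
        "pdf": f"{BASE}bitstream/handle/{handle}/{pdf_name}?sequence={sequence}",
    }


urls = etd_urls("1911/13401", "1338793.PDF")
for kind, url in urls.items():
    print(kind, url)
```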
Testing
• All testing done on CentOS using VMware
• Plugin tool testing
• Run One Daemon
• Copying other sites' plugins
Manifest Page
Dublin Core
request?verb=ListRecords&metadataPrefix=oai_dc&set=hdl_1911_8299
Sub-Manifest Page
• Links to ETDs within DSpace
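A sub-manifest page is essentially a list of links into DSpace that gives the crawler a starting point for one slice of the collection. The sketch below renders such a page from a list of handles; the function name, the `part` numbering and the page layout are hypothetical, not the markup used at Rice.

```python
import html


def sub_manifest_page(base_url, part, handles):
    """Render a minimal HTML sub-manifest page linking to each ETD's handle page.

    base_url -- site root ending in '/', e.g. "http://scholarship.rice.edu/"
    part     -- which slice of the collection this page covers (hypothetical numbering)
    handles  -- DSpace handles such as "1911/13401"
    """
    links = "\n".join(
        '    <li><a href="{0}handle/{1}">{1}</a></li>'.format(
            html.escape(base_url), html.escape(h))
        for h in handles
    )
    return (
        "<html>\n"
        f"<head><title>ETD sub-manifest, part {part}</title></head>\n"
        "<body>\n  <ul>\n"
        f"{links}\n"
        "  </ul>\n</body>\n</html>\n"
    )


page = sub_manifest_page("http://scholarship.rice.edu/", 1, ["1911/13401"])
print(page)
```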
Plugin
• Configuration parameters: Base URL
• For the sub-manifest pages: Part (integer)
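With these two parameters, each archival unit is identified by a base URL plus a part number, and the part selects which sub-manifest page the crawl starts from. The sketch below assumes a printf-style URL template in the style LOCKSS plugin definitions use; the `lockss/etd_manifest_%d.html` path and the function name are hypothetical.

```python
def sub_manifest_url(base_url, part):
    """Start URL for one AU, built from the Base URL and Part parameters."""
    if not base_url.endswith("/"):
        raise ValueError("base_url is expected to end with '/'")
    if part < 1:
        raise ValueError("part must be a positive integer")
    # printf-style template; the path itself is a hypothetical example.
    return "%slockss/etd_manifest_%d.html" % (base_url, part)


# One AU per part number.
for part in (1, 2, 3):
    print(sub_manifest_url("http://scholarship.rice.edu/", part))
```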
Crawl Rules
Crawl rules explained
• Include master manifest page
• Include sub-manifest page
• Include items under /bitstream
• Include OAI-PMH link
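Crawl rules decide which discovered URLs are collected. The sketch below models the four rules above as an ordered list of regular expressions where the first match wins and a URL matching nothing is not collected. It is a simplified, include-only model (real LOCKSS crawl rules can also exclude), and every pattern, including the manifest page paths, is a hypothetical example.

```python
import re

BASE = re.escape("http://scholarship.rice.edu/")

# Ordered, include-only rules mirroring the bullets above (patterns are hypothetical).
CRAWL_RULES = [
    ("master manifest", re.compile(rf"^{BASE}lockss/etd_manifest\.html$")),
    ("sub-manifest", re.compile(rf"^{BASE}lockss/etd_manifest_\d+\.html$")),
    ("bitstream item", re.compile(rf"^{BASE}bitstream/")),
    ("OAI-PMH", re.compile(rf"^{BASE}dspace-oai/request\?verb=ListRecords")),
]


def should_collect(url):
    """Return the name of the first rule matching url, or None if none matches."""
    for name, pattern in CRAWL_RULES:
        if pattern.search(url):
            return name
    return None


print(should_collect(
    "http://scholarship.rice.edu/bitstream/handle/1911/13401/1338793.PDF?sequence=1"))
```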
Crawl rules explained
• Include full record
• OAI-PMH link on manifest master
• Pulls in Dublin Core: http://scholarship.rice.edu/dspace-oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=hdl_1911_8299
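The OAI-PMH link is a standard ListRecords request for the oai_dc (unqualified Dublin Core) format, restricted to one set. The sketch below rebuilds that request URL and pulls identifiers, titles and creators out of a response using the standard OAI-PMH and Dublin Core namespaces; the function names are hypothetical, and it only parses XML you have already fetched. Large sets are paged with a resumptionToken, which OAI-PMH requires to be sent with the verb alone.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_ENDPOINT = "http://scholarship.rice.edu/dspace-oai/request"
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(set_spec, resumption_token=None):
    """ListRecords URL for the first page of a set, or for a follow-up page."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": set_spec}
    return OAI_ENDPOINT + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records have a header but no metadata
            continue
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in dc.findall("dc:title", NS)],
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
        })
    # An empty resumptionToken element marks the last page.
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


print(list_records_url("hdl_1911_8299"))
```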
Collection Sizes
• Recommended AU size: between 1 GB and 10 GB
• 5 AUs, each between 7 GB and 10 GB
• Create new AUs as the collection grows
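Keeping AUs under the 10 GB ceiling while the collection grows can be done by filling the newest AU and opening another when the next item would push it over the cap. The sketch below is a simple next-fit rule (a swapped-in technique, not necessarily how Rice split its collection); the function name is hypothetical, and the newest AU may sit below the 1 GB floor until more items arrive.

```python
def plan_aus(item_sizes_gb, max_au_gb=10.0):
    """Assign items, in arrival order, to AUs capped at max_au_gb (next-fit)."""
    aus = []  # each AU is a list of item sizes in GB
    for size in item_sizes_gb:
        if size > max_au_gb:
            raise ValueError(f"single item of {size} GB exceeds the AU cap")
        if not aus or sum(aus[-1]) + size > max_au_gb:
            aus.append([])  # current AU would overflow: open a new one
        aus[-1].append(size)
    return aus


for i, au in enumerate(plan_aus([4, 4, 3, 5, 2]), start=1):
    print(f"AU {i}: {sum(au)} GB")
```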
Tips
• Don’t trust testing with the plugin tool
• Read the documentation
• Test with Run One Daemon
• Test on the caches
• Use expert mode to write the plugin
Questions?