EPrints and the Cloud

EPrints Cloud Visions


EPrints capabilities in the Cloud; a presentation at the EduServ "Repositories and the Cloud" event. For more info see http://repcloud.eventbrite.com/

Transcript of EPrints and the Cloud

Page 1: EPrints and the Cloud

EPrints Cloud Visions

Page 2: EPrints and the Cloud

What is EPrints For?

EPrints offers a safe, open and useful place to store, share and manage material in the pursuit of research and educational agendas.

administrative reporting, collaboration, data sharing, digital profile enhancement, e-learning, e-publishing, e-research, marketing, open access, preservation, publicity, research assessment, research management, scholarly collections

Page 3: EPrints and the Cloud

Research Curation, Researcher Support

Researchers’ environment supported by repository

Research data managed by repository

Research community assisted by repository

Page 4: EPrints and the Cloud

What is a Repository?

Safe, secure, persistent, managed storage for files

Safe, secure, persistent management of shareable FRBR works

Safe, secure, persistent management of scholarly & scientific working

Leading to…

Science 2.0 / The Fourth Paradigm / Data Intensive Science

The challenge is not cloud computing but cloud thinking

Page 5: EPrints and the Cloud

Bio-Diversity

Page 6: EPrints and the Cloud

Current EPrints Cloud Capabilities

Amazon Elastic Compute Machine Images (AMIs)
• Small (Single Core / 1.7 GB)
• Large (64 Bit / Quad Core / 7.5 GB)
• Extra Large (64 Bit / 8 Core / 15 GB)

EPrints 3.2 is 64 Bit Enabled

Persistent Database & Storage

Really Excited - Super Fast / Cheap / Easy!

Page 7: EPrints and the Cloud

Cloud to Desktop Storage

Data can be stored on multiple storage services

Local disk, SAN, NAS, Honeycomb, Cloud

Researchers can mount repository objects as a networked filesystem

Service usage and preservation risks can be monitored and analysed.

Page 8: EPrints and the Cloud

Hybrid Storage In EPrints

A single storage solution has drawbacks.

Cost vs. Speed vs. Reliability

Repositories need to be agile: able to use new platforms and to migrate to them

Leverage the benefits of each solution without losing control of your digital objects.

Page 9: EPrints and the Cloud

Local Disk Storage

• No local bandwidth costs
• Hard to expand
• Locally Managed
• High overhead costs
• Requires space and cooling
• Tied closely to the software

(Diagram: Storage Ecosystem)

Page 10: EPrints and the Cloud

Local Archival Storage

• Specialist
• Expensive to purchase
• Locally Managed
• Space and running costs
• Expandable

(Diagram: Storage Ecosystem)

Page 11: EPrints and the Cloud

Cloud Storage

• Scalable
• Externally controlled
• Known Costings
• Unclear retention policy
• Re-Useable (using simple APIs)
• Global Scale

(Diagram: Storage Ecosystem)

Page 12: EPrints and the Cloud

But Clouds Blow Away

Recently: Yahoo Briefcase, XDrive, AOL Pictures, HP Upline, Sony Image Station

Source: Tom Spring - PCWorld

Page 13: EPrints and the Cloud

Why use Hybrid Storage?

Use the best features of each storage type

Performance
• Scaling-up bandwidth

Optimisation
• Large-file handling
• Multimedia streaming

Localised Delivery
• Local delivery from the cloud

Page 14: EPrints and the Cloud

EPrints Storage Controller

• The storage controller decides where to put a file.

• Rule-based policy defined by XML configuration file

• Large binary files of scientific data (raw machine result data) can be stored in a large disk (slower access) system and sent to a tape company for long term storage.

• Processed results can be stored locally and in the cloud ready for rapid delivery to end points.

Page 15: EPrints and the Cloud

Architecture Diagram

Page 16: EPrints and the Cloud

Controller Ruleset

<choose>
  <when test="datasetid = 'document'">
    <choose>
      <when test="$parent{relation_type} = 'isVolatileVersionOf'">
        <plugin name="Local"/>
      </when>
      <otherwise>
        <plugin name="AmazonS3"/>
      </otherwise>
    </choose>
  </when>
  <otherwise>
    <plugin name="Local"/>
  </otherwise>
</choose>
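The decision this ruleset encodes can be sketched in a few lines of Python. This is only an illustration of the logic, not EPrints code: the function name `choose_storage_plugin` and its parameters are assumptions made for the example, and only the plugin names and test values come from the ruleset itself.

```python
def choose_storage_plugin(datasetid, parent_relation_type=None):
    """Return the plugin name the ruleset would select (illustrative only).

    datasetid            -- value tested by the outer <when>
    parent_relation_type -- value of $parent{relation_type}, or None if unset
    """
    if datasetid == "document":
        # Inner <choose>: volatile versions stay on local disk.
        if parent_relation_type == "isVolatileVersionOf":
            return "Local"
        # Inner <otherwise>: all other documents go to Amazon S3.
        return "AmazonS3"
    # Outer <otherwise>: everything that is not a document stays local.
    return "Local"
```

Calling `choose_storage_plugin("document")` selects `AmazonS3`, while a document whose parent relation is `isVolatileVersionOf`, or any object from another dataset, selects `Local`.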

Page 17: EPrints and the Cloud

EPrints Storage Manager

Page 18: EPrints and the Cloud

Amazon S3 Localisation (1)

Page 19: EPrints and the Cloud

Amazon S3 Localisation (2)

Page 20: EPrints and the Cloud

Preservation Services

Object Classification

Risk Analysis

Mitigation and Migration

Page 21: EPrints and the Cloud

EPrints Forthcoming Development

Page 22: EPrints and the Cloud

EPrints Cloud Services

Web based repository setup
• Much like getting started with a blog.
• Fill in a form and obtain a repository.
• Coming to EPrints core in next major release.

Enterprise Support for Cloud Solutions
• Full Setup & Configuration
• Global Distribution
• Auto Upgrade & Patching
• Trusted Backup

Page 23: EPrints and the Cloud

EPrints 3.2

Plug-ins / Modules

Everything builds on the core layer

Major part of v3.2 is strengthening the core and adding more abstraction layers

• Improved data model
• Enhanced data facilities
• Enhanced metadata facilities
• Improved programming & API

Page 24: EPrints and the Cloud

EPrints 3.2 Structure

Page 25: EPrints and the Cloud

Community Driven Development

There are many abstraction layers.
• Display Manipulation
• Upload Handlers
• Custom Datasets
• Import / Export Plug-ins
• Transcoding Plug-ins
• Database Plug-ins
• Storage Plug-ins

One API

Page 26: EPrints and the Cloud

Storage Plug-ins

• Local
• NFS
• Amazon S3
• Sun Cloud Storage Service
• Microsoft Azure
• Any others based on the S3 API… (the last 3 all are)

5-call API (about 30 minutes to write a plug-in)
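To give a feel for how small a five-call storage interface can be, here is a minimal Python sketch. The slides do not name the five calls, so the method names below (`store`, `retrieve`, `delete`, `exists`, `size`) are assumptions chosen for illustration, and the in-memory backend is a stand-in for a real service; EPrints itself is not written in Python and its actual plug-in API may differ.

```python
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Hypothetical five-call storage interface (method names are assumed)."""

    @abstractmethod
    def store(self, key: str, data: bytes) -> None:
        """Save data under the given key."""

    @abstractmethod
    def retrieve(self, key: str) -> bytes:
        """Return the data stored under the given key."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Remove the object stored under the given key, if any."""

    @abstractmethod
    def exists(self, key: str) -> bool:
        """Report whether an object is stored under the given key."""

    @abstractmethod
    def size(self, key: str) -> int:
        """Return the stored object's length in bytes."""


class InMemoryStorage(StoragePlugin):
    """Toy backend that keeps objects in a dict; stands in for disk or cloud."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def retrieve(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)

    def exists(self, key: str) -> bool:
        return key in self._objects

    def size(self, key: str) -> int:
        return len(self._objects[key])
```

A new backend would only need to implement the same five methods, which is the kind of narrow surface that makes a short plug-in writing time plausible.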

Page 27: EPrints and the Cloud

Our Development Vision

Empower the Community with a simple API

API in 3.2

Give the community a platform to test their code

Use the Cloud!

Give the community a distribution mechanism

The EPrints Bazaar (beta)

Page 28: EPrints and the Cloud

EPrints Bazaar

Similar in concept to Apple’s App Store

Every install of EPrints will have access to the Bazaar

Single click install/uninstall of plug-ins

EPrints Services Approved Plug-ins

Enterprise support for limited 3rd party plug-ins

Page 29: EPrints and the Cloud

Summary

EPrints provides a professional, enterprise-level application for resource management

Including cloud support at many levels
• Repository-in-the-cloud
• Storage-in-the-cloud
• Services-in-the-cloud