iRODS DGMS (www.irods.org)

Towards Data Grid Standard Implementations

Arun Jagatheesan

San Diego Supercomputer Center

Open Grid Forum 19, January 31, 2007, Session II

Outline

• Community introduction: OGF-GFS
• User perspective
• Developer/vendor perspective
• Need for a standard community implementation
• Community implementation process
• GFS-WG community architecture sketch
• Follow-up actions

Motivation

• Global namespace for unstructured data storage
• Collaboration amongst multiple partners/teams
• Long-term management of unstructured data
  • Files, collection-based digital entities

NIH BIRN Data Grid

World Wide Datagrid

Used or Required by

• Large-scale academic projects
• Federal agencies (NARA, LoC, …)
• Fortune 500, Forbes Global 2000, …

DGMS Concept-wise

• Large-scale logical file system: File System + Database System + Grid Computing = Data Grid Management System (DGMS)

• Core concepts
  • Logical shared collections
  • Logical shared resources
  • Collaborative communities

Problem solved / Requirements –1

• Collaborative logical namespace
  • Global collaborations of multiple teams
  • Collaborations of multiple organizations
  • Avoid multiple mount points, as they restrict the scalability of the collaboration
• Coordinated data sharing at any level of granularity (data, metadata, annotations, …)
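The collaborative-namespace requirement can be made concrete with a small in-memory sketch. It assumes a hypothetical catalog that maps each logical path to a storage resource and a physical path, so partners from different organizations share names rather than mount points, and it keeps permissions per object so sharing can be as fine-grained as a single file. The class names, the `/grid/...` path layout, the `sdsc-disk` resource, and the users are illustrative assumptions, not iRODS or GFS-WG interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """One entry in a hypothetical shared logical namespace."""
    resource: str                             # logical name of the storage resource
    physical_path: str                        # where the bytes live on that resource
    acl: dict = field(default_factory=dict)   # user -> "read" or "write"


class LogicalNamespace:
    """Every partner uses the same logical path; no site-specific mount points."""

    def __init__(self):
        self._objects = {}

    def register(self, logical_path, resource, physical_path, owner):
        # The owner starts with write access; nobody else has any access yet.
        self._objects[logical_path] = DataObject(resource, physical_path, {owner: "write"})

    def share(self, logical_path, user, level="read"):
        # Sharing is granted per object, so it can be as fine-grained as one file.
        self._objects[logical_path].acl[user] = level

    def resolve(self, logical_path, user):
        # Map a logical name to (resource, physical path) if the user has access.
        obj = self._objects[logical_path]
        if user not in obj.acl:
            raise PermissionError(f"{user} has no access to {logical_path}")
        return obj.resource, obj.physical_path


ns = LogicalNamespace()
path = "/grid/home/teamA/scan01.dat"
ns.register(path, "sdsc-disk", "/vault/a/scan01.dat", owner="alice")
ns.share(path, "bob")  # bob works for a partner organization
print(ns.resolve(path, "bob"))  # ('sdsc-disk', '/vault/a/scan01.dat')
```

Because users only ever see the logical path, the physical location can move between resources without breaking anyone's references.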

Problem solved / Requirements –2

• Data distribution
  • Multi-site replicas reduce access times
  • Replicas have the same logical name everywhere in the enterprise (a big plus for users)
  • Concepts of replica, copy, and cache
  • Replicas controlled by the user, the administrator, or the system (automated or policy-based)
  • Reduce WAN latency (chattiness)
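A minimal sketch of "same logical name everywhere" follows, assuming a hypothetical replica catalog and latency figures measured by the client. Choosing the lowest-latency site is only one possible selection policy, picked here for illustration; the slide leaves open whether replicas are placed and chosen by the user, the administrator, or an automated policy. Site names and paths are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    site: str            # hypothetical site name
    physical_path: str   # location of this copy at that site


class ReplicaCatalog:
    """Hypothetical catalog: all replicas of an object share one logical name."""

    def __init__(self):
        self._replicas = {}

    def add_replica(self, logical_name, site, physical_path):
        self._replicas.setdefault(logical_name, []).append(Replica(site, physical_path))

    def replicas(self, logical_name):
        return list(self._replicas.get(logical_name, []))

    def nearest(self, logical_name, latency_ms):
        """Return the replica at the site with the lowest measured latency.

        latency_ms maps site name -> round-trip time seen by this client;
        unmeasured sites count as infinitely far away.
        """
        candidates = self._replicas.get(logical_name)
        if not candidates:
            raise KeyError(logical_name)
        return min(candidates, key=lambda r: latency_ms.get(r.site, float("inf")))


catalog = ReplicaCatalog()
name = "/grid/projectX/subject42/scan.nii"
catalog.add_replica(name, "san-diego", "/disk1/scan.nii")
catalog.add_replica(name, "boston", "/archive/7/scan.nii")
best = catalog.nearest(name, {"san-diego": 80.0, "boston": 12.0})
print(best.site, best.physical_path)  # boston /archive/7/scan.nii
```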

Problem solved / Requirements –3

• Data classification and discovery
  • Major advantage for Global 2000 companies
  • Tag data with any arbitrary metadata schema
  • Each team can organize its data based on user-defined attributes
  • Multiple teams can have different metadata attributes on the same data
  • Query, discover, and access data without knowing the path or the protocol to be used
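Per-team, user-defined metadata and path-free discovery can be sketched as follows. The catalog layout (logical path, then team, then attribute/value pairs), the team names, and the attributes are hypothetical; the point is that two teams tag the same object with different schemas and query it without naming a path or protocol.

```python
from collections import defaultdict


class MetadataCatalog:
    """Hypothetical catalog of user-defined attributes, kept separately per team."""

    def __init__(self):
        # logical path -> team -> {attribute: value}
        self._tags = defaultdict(lambda: defaultdict(dict))

    def tag(self, logical_path, team, **attributes):
        # Each team may attach its own schema to the same object.
        self._tags[logical_path][team].update(attributes)

    def query(self, team, **conditions):
        """Return logical paths whose attributes for `team` match every condition.

        The caller supplies no physical path or protocol; turning the returned
        logical names into bytes is left to the namespace and replica layers.
        """
        return sorted(
            path
            for path, by_team in self._tags.items()
            if all(by_team.get(team, {}).get(k) == v for k, v in conditions.items())
        )


meta = MetadataCatalog()
meta.tag("/grid/home/lab/scan01.dat", "radiology", modality="MRI", subject="42")
meta.tag("/grid/home/lab/scan01.dat", "genomics", cohort="B")
meta.tag("/grid/home/lab/scan02.dat", "radiology", modality="CT", subject="42")
print(meta.query("radiology", subject="42"))  # both scans
print(meta.query("genomics", cohort="B"))     # ['/grid/home/lab/scan01.dat']
```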

User Perspective

• Designed to be off the shelf
  • I don't want to assemble it (or DIY)
  • But I want to be able to customize the solution
• One point of contact or responsibility
  • If it does not work, I have one mailing list or number to call

Vendor/developer perspective

• “OGF-GFS compatible”
  • OGF-GFS data grid applications
  • OGF-GFS data grid appliances
• Ease of standard evolution
  • Avoid unnecessary dependencies on multiple interfaces for operations at the same level of granularity
• Ability to collaborate, learn, and compete
  • An end-to-end solution with a common interface
  • Additional capabilities that add value to the solution

Lessons Learnt

• Software vs. specification
  • Build a software implementation to engage and collaborate as we define the standards (unless everyone wants to invest in software development from scratch)
• Make both users and vendors/developers happy
  • Users are confident enough to share requirements and to demand the standards from vendors/developers
  • Vendors/developers know it is a real thing that can be implemented around their existing products or software

The scope (from GFS Architecture)

• A single interface
• Protocols
  • A hybrid of XML and a byte-level protocol
  • XML: command channel for operations
  • Byte-level: data movement
• Possible functionalities
  • File namespace and file operations (read, write, …)
  • Metadata operations (user-defined metadata, search)
  • Data Grid Language for policies, rules, etc.
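The hybrid protocol on this slide can be illustrated with Python's standard `xml.etree.ElementTree` and `struct` modules. The slide specifies only that XML carries commands and a byte-level protocol moves data; the element names (`command`, `path`, `param`), the `put` operation, and the 4-byte big-endian length prefix on each data frame are assumptions for this sketch, not a GFS-WG wire format.

```python
import struct
import xml.etree.ElementTree as ET


def build_command(operation, logical_path, **params):
    """Encode one operation for the (hypothetical) XML command channel."""
    cmd = ET.Element("command", op=operation)
    ET.SubElement(cmd, "path").text = logical_path
    for name, value in params.items():
        ET.SubElement(cmd, "param", name=name).text = str(value)
    return ET.tostring(cmd, encoding="utf-8")


def parse_command(raw):
    """Decode a command message back into (operation, path, params)."""
    cmd = ET.fromstring(raw)
    params = {p.get("name"): p.text for p in cmd.findall("param")}
    return cmd.get("op"), cmd.findtext("path"), params


def frame_chunks(data, chunk_size=64 * 1024):
    """Split raw bytes into length-prefixed frames for the byte-level data channel."""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield struct.pack("!I", len(chunk)) + chunk


def read_frames(stream):
    """Reassemble the payload from concatenated length-prefixed frames."""
    payload, offset = bytearray(), 0
    while offset < len(stream):
        (length,) = struct.unpack_from("!I", stream, offset)
        offset += 4
        payload += stream[offset:offset + length]
        offset += length
    return bytes(payload)


raw = build_command("put", "/grid/home/teamA/scan01.dat", resource="sdsc-disk", size=10)
op, path, params = parse_command(raw)
print(op, params)  # put {'resource': 'sdsc-disk', 'size': '10'}
payload = b"0123456789"
stream = b"".join(frame_chunks(payload, chunk_size=4))
print(read_frames(stream) == payload)  # True
```

Keeping bulk bytes out of the XML avoids encoding binary data as text and parsing it, which is one plausible reason to split the channels this way.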

What could be the right high-level picture?

Diagram: clients reach a DGMS through an XML-command protocol and a byte-level data protocol (object transfer); caption: "Facilitate SOA"

What could be the right high-level picture?

Diagram: several DGMS servers connected to one another by the XML-command protocol and the byte-level data protocol

User perspective

Diagram callouts: logical resources; multiple replicas; users from different organizations; user-defined metadata for data discovery; "secret recipe"

So what will we be doing (products?)

• Definition
  • Concepts (data grid namespace, resource namespace, …)
  • Initial functionalities (DGMS operations to be targeted)
  • Namespaces (files, metadata, resources, policy rules)
• XML protocol
  • XML handshake and message transfer between the DGMS client and the DGMS server
• Most importantly…
  • Software as a common framework for the evolution, adoption, and growth of the standard and of DGMS concepts
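The "XML handshake" between DGMS client and server is not specified further on the slide. The sketch below assumes a hypothetical exchange in which the client sends a `hello` listing the protocol versions it speaks and the server answers with `welcome` (carrying the highest common version) or `error`; element names, attribute names, and version strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

SERVER_VERSIONS = {"1.0", "1.1"}  # hypothetical protocol versions this server speaks


def version_key(v):
    # "1.10" must sort above "1.9", so compare numeric parts, not strings.
    return tuple(int(part) for part in v.split("."))


def client_hello(user, versions):
    """Client side: announce the user and every protocol version it supports."""
    hello = ET.Element("hello", user=user)
    for v in versions:
        ET.SubElement(hello, "version").text = v
    return ET.tostring(hello)


def server_reply(raw_hello):
    """Server side: accept the highest common version, or refuse the session."""
    hello = ET.fromstring(raw_hello)
    offered = {v.text for v in hello.findall("version")}
    common = offered & SERVER_VERSIONS
    if not common:
        return ET.tostring(ET.Element("error", reason="no common protocol version"))
    chosen = max(common, key=version_key)
    return ET.tostring(ET.Element("welcome", user=hello.get("user"), version=chosen))


reply = ET.fromstring(server_reply(client_hello("alice", ["1.1", "2.0"])))
print(reply.tag, reply.get("version"))  # welcome 1.1
```

Negotiating a version up front is one way to let the standard evolve without breaking older clients, in line with the "ease of standard evolution" goal listed earlier.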

So how will we do it? (process)

• Community-based open design (OPEN FORUM)
  • Design discussions as a community
  • Code contributed by multiple parties, to keep both the vendor/developer community and the user community engaged
• Community-based open standard (OPEN STDS)
  • Specs written using a wiki and other mechanisms
  • Community-based spec for OGF
  • Interoperability workshops, and joint workshops with other relevant organizations such as SNIA or DMTF

How can you get started?

• Initial requirements
  • Can you delete email? (Sign up for our mailing list.)
  • Got bandwidth and a browser? (Visit our group page.)
  • Can you scream, shout, or smile? (Join our WG sessions.)
• Are you a user, consumer, or researcher?
  • Tell us what is needed.
  • What needs to be in place for you to put this open-source software/standard into production?
• Are you a vendor/developer?
  • Have your engineers or developers talk to us (we will convert them into DGMS developers or DGMS gurus).
  • We are developing an open standard: take advantage of it and develop a value-added solution around it.

When do we get started?

• Right now (hmm… we actually started a long time back)
  • Conference calls every other week
    • Mostly Wednesdays
    • Attend by phone, Skype, or Polycom video conference (anything you like)
  • Discussions that influence the design requirements
• Face-to-face meetings
  • Once every quarter (planned), plus OGF sessions

Suggestions, comments, critiques

• TO DO
  • Standard operations based on policies/rules
  • Take advantage of OGF standards where possible
  • Other commercial or "magic" tools could be used below the standard

• NOT TO DO
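One reading of "standard operations based on policies/rules" is that the server, rather than each client, decides what follows an operation. The sketch below assumes a hypothetical event name (`post_put`), a hypothetical `tape-resource`, and rules written as Python callables; the talk does not define a rule syntax beyond mentioning a Data Grid Language for policies and rules.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    event: str                           # hypothetical event name, e.g. "post_put"
    condition: Callable[[dict], bool]    # does this rule apply to the operation?
    action: Callable[[dict], str]        # returns a description of the server-side action


@dataclass
class PolicyEngine:
    rules: list = field(default_factory=list)

    def add(self, rule):
        self.rules.append(rule)

    def fire(self, event, context):
        """Run every matching rule, in registration order, and collect the results."""
        return [r.action(context) for r in self.rules
                if r.event == event and r.condition(context)]


engine = PolicyEngine()
# Hypothetical policy: anything stored under /grid/archive/ gets a tape replica.
engine.add(Rule(
    event="post_put",
    condition=lambda ctx: ctx["path"].startswith("/grid/archive/"),
    action=lambda ctx: f"replicate {ctx['path']} -> tape-resource",
))
print(engine.fire("post_put", {"path": "/grid/archive/2007/report.pdf"}))
print(engine.fire("post_put", {"path": "/grid/scratch/tmp.dat"}))  # []
```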

Conclusions

• Data grids
  • Data Grid Management Systems (DGMS)
  • Strong user need in academic and non-academic settings
  • Need for standards framed by the Grid File System WG
• Software-included spec strategy