Databases and applications in a distributed GRID environment
Grid Access to Databases
SAN DIEGO SUPERCOMPUTER CENTER
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
Grid Database Access
Vladimir Veytser, [email protected]
NPACI Summer Institute
Grid Database Access
• The ability to access a database through the Grid Security Infrastructure (GSI).
• GSI is based on public-key encryption, X.509 certificates, and the SSL communication protocol; it is implemented by the Globus Toolkit.
• Our goal is to integrate existing databases into Globus/GSI-based Grids. From the user’s perspective, a database should be just another Grid resource.
• This integration should not degrade performance.
Grid Database Access Methods
• We see two methods for accessing databases on the Grid:
– Through Grid Middleware (e.g. WebServices, SRB, SpitFire)
– Direct access, i.e., you can open a connection directly to a database as long as you have a “Grid” certificate.
• The access method will depend on the type of application and whether or not a particular method is supported by a platform.
Presentation Outline
• Currently available technology
• Industry efforts
• Our wish list
• Q&A
• A small lab using DB2 and Globus
Current Database technology
• Currently, databases do not support direct Globus authentication.
• An alternative: Grid-enabled middleware services
– SRB: a client can GSI-authenticate itself to a server
– SpitFire: a webservice in front of the database to which a client can GSI-authenticate
– OGSA-DAIS: data access and integration service. This spec is based on OGSA WebServices.
– A customized GRAM (part of Globus) jobmanager that can convert an RSL string into “native” SQL.
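As a rough illustration of the last bullet, the sketch below shows how a customized jobmanager might turn an RSL string into DB2 statements. The `database` and `query` attributes, the function names, and the simplified flat RSL grammar are all assumptions made for this example; they are not part of GRAM or of any existing jobmanager.

```python
import re

# Matches one (name=value) or (name="quoted value") relation.
# Assumption: a flat, conjunctive RSL string with no nested relations.
_RELATION = re.compile(r'\(\s*(\w+)\s*=\s*(?:"([^"]*)"|([^()"]*?))\s*\)')


def parse_rsl(rsl):
    """Parse a simplified RSL string into a dict of lower-cased attributes."""
    text = rsl.strip().lstrip("&")
    return {
        name.lower(): (quoted or bare)
        for name, quoted, bare in _RELATION.findall(text)
    }


def rsl_to_statements(rsl):
    """Translate the hypothetical 'database' and 'query' attributes into
    the statements a DB2 command line session would run."""
    attrs = parse_rsl(rsl)
    database = attrs["database"]
    # Reject anything that is not a plain identifier before building SQL text.
    if not re.fullmatch(r"\w+", database):
        raise ValueError(f"invalid database name: {database!r}")
    return [f"CONNECT TO {database}", attrs["query"], "CONNECT RESET"]


if __name__ == "__main__":
    rsl = '&(database=sample)(query="SELECT COUNT(*) FROM staff")'
    for statement in rsl_to_statements(rsl):
        print(statement)
```

A real jobmanager would also have to map the authenticated Grid identity to a database account before running the statements; that step is left out here.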
Technology in Detail: SRB and SpitFire
• SRB: Storage Resource Broker
– Client GSI authenticates to a server.
– Server runs a query using the native DB2 client.
– Results are returned to the client.
• SpitFire (http://edg-wp2.web.cern.ch/edg-wp2/spitfire/)
– Designed to give quick and easy access to (meta)data where the access patterns are simple
– Front end: Grid Web Service
– Back end: JDBC to the DBMS
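The request flow that both services share (authenticate, run the query on the server, return the rows) can be sketched as follows. This is only an illustration under stated assumptions: SQLite stands in for DB2 or a JDBC back end, the subject names and table are invented, and the GSI handshake itself is not performed.

```python
import sqlite3

# Hypothetical certificate subject names allowed to use the service.
AUTHORIZED_SUBJECTS = {"/O=Grid/OU=Example/CN=Alice Example"}


def make_backend():
    """Create an in-memory SQLite database standing in for the real DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, site TEXT)")
    conn.executemany("INSERT INTO runs (site) VALUES (?)", [("SDSC",), ("NCSA",)])
    conn.commit()
    return conn


def handle_request(conn, subject_dn, sql, params=()):
    """Server side: check the caller, run the query, return the rows."""
    # Assumption: subject_dn was already verified by a GSI handshake,
    # which this sketch does not perform.
    if subject_dn not in AUTHORIZED_SUBJECTS:
        raise PermissionError(f"subject not authorized: {subject_dn}")
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    backend = make_backend()
    rows = handle_request(
        backend,
        "/O=Grid/OU=Example/CN=Alice Example",
        "SELECT site FROM runs ORDER BY id",
    )
    print(rows)
```

The client never needs a database client library or a database password; only the middleware holds the connection.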
Technology in Detail: OGSA-DAIS
• Working group in the Global Grid Forum. Based in Edinburgh, UK.
• Defines OGSI-compliant webservices.
– Data Resource Manager: provides a handle to an actual resource manager and exposes its capabilities/features.
– Data Resource: provides a handle to an actual data source (file system, db) and exposes its object types (DBSchema, storedProcedure, userDefinedTypes, triggers, etc.).
OGSA-DAIS cont.
• OGSA-DAIS services, cont.:
– Data Activity Session (DAS): provides the context for data request operations. Created dynamically by the Data Resource (DR).
– DataSet (DS): populated by the DAS. The client receives a handle or a data value. The handle can be:
  • Synchronous: not returned until the DS is created and populated
  • Asynchronous: returned as soon as the DS is created but before it is populated
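The difference between the two kinds of handle can be illustrated with a small sketch. It uses Python's `concurrent.futures` as a stand-in for the service machinery; the function names and the fake `populate_dataset` step are assumptions for this example and do not come from the OGSA-DAIS specification.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One background worker plays the role of the Data Activity Session.
_executor = ThreadPoolExecutor(max_workers=1)


def populate_dataset(row_count):
    """Pretend to run a query and fill a DataSet (simulated with a delay)."""
    time.sleep(0.05)
    return list(range(row_count))


def request_synchronous(row_count):
    """Return only after the DataSet has been created and populated."""
    return populate_dataset(row_count)


def request_asynchronous(row_count):
    """Return a handle immediately; the DataSet is populated in the background."""
    return _executor.submit(populate_dataset, row_count)


if __name__ == "__main__":
    print(request_synchronous(3))   # blocks until the data is ready

    handle = request_asynchronous(3)
    # The caller is free to do other work here before collecting the data.
    print(handle.result(timeout=5))  # blocks only at this point
```

With the asynchronous handle the client decides when to wait, which matters when a query is long-running and the client has other work to do.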
Access via “middle man”
• Pros
– Exists today and does not require changes to the existing databases
– Does not require you to have database clients
– Makes it easier for a user by automating many of the details (transfer, staging, etc.)
– Allows for DB roles (SpitFire: base, admin, info)
– Dual functionality: also a good place to store metadata (schemas, stored procedures, etc.)
– Works well in cluster environments where you submit batch jobs
Access via “middle man” cont.
• Cons
– Can affect performance
– Admins want more flexibility and a familiar interface
– Stores database passwords
– SpitFire: designed for simple access patterns
– OGSA-DAI: too heavyweight for simple data access (4 services)
– Hard to do auditing
– Another thing to maintain
Industry efforts: Oracle
• Oracle 9i
– Does not have direct GSI authentication
– Oracle tools can be invoked using the Globus Resource Allocation Manager (GRAM). Oracle claims to have a toolkit that can do this (OGDK), but I could not find it.
• Oracle 10i
– Will have support for GSI authentication and other Globus services.
– Should be in beta very soon
Industry efforts: DB2
• DB2 V 8.1
– Users can either write a custom jobmanager or specify the necessary parameters through the RSL string (this method will be used in today’s lab).
– The Emerging Technologies Toolkit supports Grid WebServices. For more information see: http://www.alphaworks.ibm.com/tech/ettk
• DB2 V 8.2
– Will be out next year
– Unlike Oracle, will have an interface into which you can plug in your own security model.
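To make the RSL route for DB2 V 8.1 more concrete, here is a minimal sketch of assembling an RSL string that runs the DB2 command line processor. The install path, instance name, and script name are hypothetical, and the quoting rule (doubling embedded quote characters) is stated as an assumption about RSL rather than verified against a particular Globus release.

```python
def build_rsl(executable, arguments=(), environment=None):
    """Assemble a conjunctive RSL string from an executable, its
    arguments, and optional environment variables."""
    parts = [f"(executable={executable})"]
    if arguments:
        # Assumption: RSL escapes a quote inside a quoted value by doubling it.
        quoted = " ".join('"' + arg.replace('"', '""') + '"' for arg in arguments)
        parts.append(f"(arguments={quoted})")
    if environment:
        pairs = "".join(f"({name} {value})" for name, value in environment.items())
        parts.append(f"(environment={pairs})")
    return "&" + "".join(parts)


if __name__ == "__main__":
    rsl = build_rsl(
        "/home/db2inst1/sqllib/bin/db2",      # hypothetical install path
        ["-tvf", "query.sql"],                # run the statements in a script file
        {"DB2INSTANCE": "db2inst1"},          # hypothetical instance name
    )
    print(rsl)
```

Passing the connection details this way keeps the database unchanged, at the cost of pushing DB2-specific knowledge into every job description.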
Industry efforts: MySQL
• MySQL version 4.1
– Closest to having GSI authentication
– Supports SSL (via the OpenSSL library)
– It should be possible to modify OpenSSL to support Globus certificates
– OpenSSH has already done it: GSISSH (NCSA)
– Efforts are currently under way at SDSC and Brookhaven National Lab
Our Goals
• GSI authentication pushed into the database
• Role-based authentication
– Admin: power users who can insert/drop/delete.
– Power: read privileges and write access to temp tables/views.
– Info: users with read-only privileges.
  • Most common.
• Many-to-one mapping (many database users to a single database).
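One possible shape for the role scheme above is a lookup from certificate subject to role, followed by a privilege check. The subject names, the privilege labels, and the assumption that Admin can also read are invented for this sketch; nothing here reflects an existing database feature.

```python
# Privileges per role, following the three roles listed above.
# Assumption: Admin also has read access, which the list does not state.
ROLE_PRIVILEGES = {
    "admin": {"select", "insert", "delete", "drop"},
    "power": {"select", "write_temp"},
    "info": {"select"},
}

# Many-to-one mapping: several Grid identities share one role.
# All subject names are hypothetical.
GRID_MAP = {
    "/O=Grid/OU=Example/CN=Alice Example": "admin",
    "/O=Grid/OU=Example/CN=Bob Example": "info",
    "/O=Grid/OU=Example/CN=Carol Example": "info",
}


def role_for(subject_dn):
    """Return the role mapped to a certificate subject, or None."""
    return GRID_MAP.get(subject_dn)


def is_allowed(subject_dn, operation):
    """Check whether the subject's role includes the requested operation."""
    role = role_for(subject_dn)
    return role is not None and operation in ROLE_PRIVILEGES[role]


if __name__ == "__main__":
    print(is_allowed("/O=Grid/OU=Example/CN=Bob Example", "select"))
    print(is_allowed("/O=Grid/OU=Example/CN=Bob Example", "drop"))
```

Because most users fall into the Info role, the mapping table stays small even when the number of certificate holders grows.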
Globus Jobs at TeraGridSDSC
• A user submits a Globus job (e.g. with globus-job-submit) from a workstation to tg-login.sdsc.teragrid.org
• The jobmanager at tg-login converts the RSL string into a PBS job
• PBS schedules the job on our DTF cluster (dual, 128-node Itanium2 cluster)
• For more info: http://teragrid.org/docs/user-guide.htm
• We want jobs that require DB2 access to work the same way.
– Requires DB2 client installation, which is not there yet.
![Page 16: Grid Access to Databases](https://reader036.fdocuments.net/reader036/viewer/2022082808/55504fd0b4c9058f768b53fc/html5/thumbnails/16.jpg)
SAN DIEGO SUPERCOMPUTER CENTER
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
LAB Outline:
Goal: As close to the real world as possible
• Log in to BlueHorizon (NPACI grid)
• Get a Grid certificate
• Create a proxy certificate
• Submit a Globus DB2 job to the ctf19 login/compute node (the only node with DB2 clients)
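The last two steps of the outline can be sketched as a pair of command lines built in Python. This is not the lab handout: the script path is a placeholder, and obtaining the certificate itself is not covered. Only `grid-proxy-init` and `globus-job-submit` with a contact, an executable, and its arguments are used, with no additional options assumed.

```python
import shlex


def lab_commands(contact, executable, args=()):
    """Return the proxy-creation and job-submission commands as argument lists."""
    return [
        ["grid-proxy-init"],                                # create a proxy certificate
        ["globus-job-submit", contact, executable, *args],  # submit the DB2 job
    ]


if __name__ == "__main__":
    # "/path/to/db2_job.sh" is a placeholder for whatever script runs the query.
    for command in lab_commands("ctf19", "/path/to/db2_job.sh"):
        print(shlex.join(command))
```

Building argument lists rather than shell strings keeps quoting problems out of the picture if the commands are later run with `subprocess.run`.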
LAB cont.
• I will give out the lab instructions during class.
• A copy of the lab instructions can be found at:
– http://www.sdsc.edu/~veytser/db2globuslab.html
Thank You
• Questions?