Component-Based Portals for Grid Computing
OGCE Consortium
Marlon Pierce, Community Grids Lab
Indiana University
NSF NMI Project for Reusable Portal Components: Who We Are
• University of Chicago
– Gregor von Laszewski
• Indiana University
– Marlon Pierce, Dennis Gannon, Geoffrey Fox, and Beth Plale
• University of Michigan
– Charles Severance, Joseph Hardin
• NCSA/UIUC
– Jay Alameda, Joe Futrelle
• Texas Advanced Computing Center
– Mary Thomas
What Is Grid Computing?
• Grid Computing provides an overlay infrastructure that can be used to bind computing and data resources from multiple organizations into “virtual organizations”.
– Security, information services, resource access protocols, file transfer, etc.
• Open Grid Services Architecture recasts Grid capabilities as Web Services
– WSDL descriptive conventions, advanced features for transient services, etc.
– Service hosting environments manage service lifecycles, interactions with requestor agents.
• But what about the clients?
– And what about user-centric services?
Towards A Common Grid Client Hosting Environment
Grid portal background and emerging common frameworks
What Is a Computing Portal?
• Browser-based user interface for accessing grid and other services
– “Live” dynamic pages for accessing grid services
– Use(d) Java/Perl/Python COGs
– Manage credentials, launch jobs, manage files, etc.
– Hide Grid complexities
– Can run from anywhere
– Unlike user desktop clients, connections go through the portal server, so they overcome firewall/NAT issues
• Combine “Science Grid” with traditional web capabilities
– Get web pages for news feeds
– Post and share documents
– And other more traditional web page features
• Customizable interfaces and user roles/views
Let 10000 Flowers Bloom
• Many portal projects have been launched since the late ’90s.
– HotPage from SDSC, NCSA efforts, DOD, DOE Portals, NASA IPG
– 2002 Special Issue of Concurrency and Computation
• Continue to be an important component of many large projects
– NEESGrid, DOE SciDAC projects, NASA, NSF, many international efforts
• Global Grid Forum’s Grid Computing Environments Research Group
– Community forum
Three-Tiered Architecture
[Figure labels: Portal User Interface; Portal Client Stubs; Grid and Web Protocols; Grid Resource Broker Service; Information and Data Services; Database Service; HPC or Compute Cluster; Grid Information Services, SRB; Database; JDBC, Local, or Remote Connection]
Three-tiered architecture is the accepted standard for accessing Grid and other services
Problem with Portals
• GCE revealed two things
– Everyone was doing the same thing
• Not quite, but significant overlap
• Everyone builds secure logins, remote file manipulation, command execution, access to info servers.
• Everyone would at least like support for multiple user roles (administrators, users) and customization
– No one could share components with other groups
• No well-defined way of sharing UI components or making services interoperate.
• No well-defined interfaces to portal services.
• A research opportunity!
– Two levels of integration: user interfaces and services
• Our challenges
– Stop reinventing things and provide ways for groups to reuse components.
– Provide a portal marketplace for competing (advanced) services.
– Provide APIs for service integration
A Solution based on components
• A software component is an object defined by
– A precise public interface
– A semantics that includes a set of “standard” behaviors.
• A software component architecture is:
– A set of rules for component behavior
– A framework in which components can be easily installed and interoperate.
• The component architecture of choice for the Portal community is the one based on portlets
– Java components that generate content, make local and remote connections to services.
– Portal containers manage portlet lifecycles
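The component idea above can be sketched in a few lines. This is an illustration only, written in Python for brevity rather than against the Jetspeed API: the names Portlet, NewsPortlet, and PortalContainer are hypothetical, and the init/render/destroy lifecycle is an assumed simplification of what a portal container does.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Portlet(ABC):
    """Public interface every component implements (hypothetical names)."""

    def init(self, config: dict) -> None:
        # Standard behavior: remember the configuration given by the container.
        self.config = config

    @abstractmethod
    def render(self, user: str) -> str:
        """Return the markup fragment for this portlet's window."""

    def destroy(self) -> None:
        self.config = None


class NewsPortlet(Portlet):
    def render(self, user: str) -> str:
        return f"<div>{self.config['title']} for {user}</div>"


class PortalContainer:
    """Installs portlets and drives their lifecycle: init, render, destroy."""

    def __init__(self) -> None:
        self._portlets: Dict[str, Portlet] = {}

    def install(self, name: str, portlet: Portlet, config: dict) -> None:
        portlet.init(config)
        self._portlets[name] = portlet

    def render_page(self, user: str) -> List[str]:
        return [p.render(user) for p in self._portlets.values()]

    def shutdown(self) -> None:
        for portlet in self._portlets.values():
            portlet.destroy()
        self._portlets.clear()


container = PortalContainer()
container.install("news", NewsPortlet(), {"title": "Grid news"})
page = container.render_page("beth")
```

The container only relies on the public interface, so any component that honors it can be installed without changes to the container.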
A Portlet Approach to Grid Services
• A Portlet is a portal server component that provides basic services rendered in a user-configurable window in a portal pane.
[Figure labels: Portal Server hosting Portlet 1 through Portlet 6; MyProxy Server; Metadata Directory Service(s); Directory & Index Services; Application Factory Services; Messaging and group collaboration; Event and logging Services]
The Grid Portal
• Provides Portlets for
– Management of user proxy certificates
– Remote file management via GridFTP
– News/Message systems
• for collaborations
– Grid Event/Logging service
– Access to OGSA services
– Access to directory services
– Specialized Application Factory access
• Distributed applications
• Workflow
– Access to Metadata Index tools
• User searchable index
OGCE Foundations: Portal Containers and Grid Access
Portlet Component and Container Technologies
• Jakarta Jetspeed
– Open source Java portlet project
– Jetspeed is both a framework and reference implementation
– Defines portlets, portal service APIs (login, authorization, customization, etc.)
• CHEF from University of Michigan
– Uses Jetspeed as a framework
• Reimplements many of the core classes
– Basis for UM CourseTools
– NEESGrid portal
– CMCS Portal
Background
• CHEF is organized around groups of users
• Portals in CHEF are group based (a group can consist of only one person!)
• A user sees the Portals for each group of which that person is a member
• The Portal is a collection of Portal pages
• Each Portal page contains one or more teamlets
CHEF Architecture
• Portal Engine: Jetspeed, Velocity, CHEF
• Teamlets
– Written in Java
– Responsible for GUI
– Operate in the context of a session.
– Rely on services for any persistent or “cross-user” information.
• Services
– Persistent
– System-wide
– Multiple implementations of services
– Configurable as to what implementation provides what service
• Servlets: access services outside of the portal engine (AccessServlet and WebDavServlet)
• Services API
• Web Server: Tomcat, Turbine
• Non-HTTP Components (i.e. E-Mail)
What is a Teamlet?
• A teamlet is a portal-like presentation of information and possible user actions
• It can be placed in multiple places within a portal or across multiple portals; each placement is independent
• Each placement is configurable
• Each placement belongs to a portal and is therefore associated with a group
Design Process - Elements
• The design of a teamlet consists of three elements
– A service (the Java class or classes that implement the interface to a source/store of information)
– An action (the Tool in CHEF; one or more Java classes that present information to the user and respond to user actions)
– The GUI (usually a set of Velocity templates)
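The three elements can be illustrated with a small sketch. It is written in Python rather than Java, Python's string.Template stands in for a Velocity template, and the class names AnnouncementService and AnnouncementAction are hypothetical; none of this is CHEF code.

```python
from string import Template  # stand-in for a Velocity template


class AnnouncementService:
    """Service: the interface to a source/store of information."""

    def __init__(self):
        self._by_group = {}

    def post(self, group, text):
        self._by_group.setdefault(group, []).append(text)

    def list_for(self, group):
        return list(self._by_group.get(group, []))


class AnnouncementAction:
    """Action: responds to user actions and prepares data for the view."""

    def __init__(self, service):
        self.service = service

    def do_post(self, group, text):
        self.service.post(group, text)

    def build_context(self, group):
        items = self.service.list_for(group)
        return {"group": group, "count": len(items), "items": "; ".join(items)}


# GUI: a template that only formats what the action hands it.
VIEW = Template("$group has $count announcement(s): $items")

service = AnnouncementService()
action = AnnouncementAction(service)
action.do_post("lead", "Portal demo on Thursday")
html = VIEW.substitute(action.build_context("lead"))
```

Keeping persistence in the service means the action and template hold no cross-user state, which matches the split described above.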
Java CoG Kit
• Provides interfaces to elementary Grid functionality
– Copy a file from here to there
– Execute a remote job on the Grid
– Authenticate to the Grid
• Provides interfaces to more advanced Grid functionality such as simple job queues and task graphs
• Provides a convenient API-level interface that protects you from many changes in the Grid, such as GT1.x to GT2.x to GT3.x to GT4.x
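How an API-level interface can shield callers from back-end changes is sketched below. This is not the Java CoG Kit API: JobProvider, GT2Provider, SSHProvider, and run_job are hypothetical names, and the submit methods return placeholder strings instead of contacting a Grid.

```python
class JobProvider:
    """Hypothetical provider interface; one subclass per Grid back end."""

    def submit(self, host, executable):
        raise NotImplementedError


class GT2Provider(JobProvider):
    def submit(self, host, executable):
        return f"gt2://{host}/{executable}"  # placeholder for a real GT2 call


class SSHProvider(JobProvider):
    def submit(self, host, executable):
        return f"ssh://{host}/{executable}"  # placeholder for a real SSH call


PROVIDERS = {"gt2": GT2Provider, "ssh": SSHProvider}


def run_job(provider_name, host, executable):
    """Caller-facing function: the back end is selected by name only."""
    provider = PROVIDERS[provider_name]()
    return provider.submit(host, executable)
```

A call such as run_job("gt2", "hpc.example.org", "a.out") stays the same when a new back end is registered in PROVIDERS, which is the point of the abstraction.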
What does the user see?
[Figure labels: Portal Interface; Portlet; Java CoG Kit High-Level interface; Java CoG Kit Low-Level interface; GT2, GT3, GT4, Condor, SSH]
Portal Capabilities
A survey of current portal capabilities.
Portal Capability | Description
Grid Proxy Certificate Manager | Get MyProxy certs after logging in.
Schedule | Interactive individual and group calendars
Discussion | Persistent topic-based discussion for groups
Chat | Live chat services and interfaces
Document managers | WebDAV-based document system for group file sharing
MDS/LDAP Browsers | Basic Globus MDS browsing and navigating
GridContext Portlets | Access context services for managing metadata
GRAM Job Submission | Run simple executables on remote hosts
GridFTP | Upload, download, crossload remote files.
GPIR Portlets | View, interact with HPC status, job, etc. information.
Anabas | Access to Anabas shared display applets
Newsgroups and citation portlets | Post topics to newsgroup, manage group references and citations with access controls
User Portlets
Grid Portlet Examples
• We’ll next give an overview of several portal capabilities.
• Jetspeed/CHEF acts as a clearing house for portal capabilities
– User interface components can be added in well-defined ways.
– First level of integration
• All Grid access goes through the Java CoG.
Example Capability: Portals for Users
• The MyProxy Manager
– The user contacts the portal server and asks it to do “grid” things on behalf of the user.
– To make this possible the server needs a “Proxy Certificate”
• The user has previously stored a proxy cert in a secure MyProxy Server, protected with a temporary password.
• The user gives the portal server the password, and the portal server contacts the MyProxy server and loads the proxy.
• The portal server will hold the proxy for the user for a “short amount of time” in the user’s session state.
[Figure labels: User “Beth”; Portal Server with MyProxy Portlet and Java CoG; MyProxy Server; 1. “Load my Proxy Certificate!”; 2. “Give me Beth’s proxy certificate”; 3. “I am Beth’s Proxy”]
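The hold-for-a-short-time behavior can be sketched as a session object with an expiry. FakeMyProxyServer is a stand-in that does not speak the MyProxy protocol, PortalSession is a hypothetical name, and the 7200-second default lifetime is an arbitrary assumption.

```python
import time


class FakeMyProxyServer:
    """Stand-in for a MyProxy server; not the real protocol."""

    def __init__(self):
        self._store = {}

    def put(self, user, password, proxy):
        self._store[user] = (password, proxy)

    def get(self, user, password):
        stored_password, proxy = self._store[user]
        if password != stored_password:
            raise PermissionError("bad MyProxy password")
        return proxy


class PortalSession:
    """Holds the retrieved proxy only for a limited lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._proxy = None
        self._expires_at = 0.0

    def load_proxy(self, server, user, password, lifetime_s=7200):
        # The portal server fetches the proxy on the user's behalf.
        self._proxy = server.get(user, password)
        self._expires_at = self._clock() + lifetime_s

    def proxy(self):
        if self._proxy is None or self._clock() >= self._expires_at:
            self._proxy = None  # drop expired credentials
            return None
        return self._proxy
```

Passing the clock in as a parameter keeps the expiry logic easy to exercise without waiting for real time to pass.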
Example Capability: File Management
• GridFTP portlet
– Allows the user to manage remote file spaces
– Uses stored proxy for authentication
– Upload and download files
– Third-party file transfer
• Request that GridFTP server A send a file to GridFTP server B
• Does not involve traffic through portal server
[Figure labels: User “Beth”; Portal Server with GridFTP portlet; GridFTP Server A; GridFTP Server B]
Example Capability: Grid Context Service
• Users want to be able to use the portal to keep track of lots of things
– Application and experiment records
• File metadata, execution parameters, workflow scripts
– “Favorite” services
• Useful directory services, indexes, links to important resources
– Notes and annotations
• “Scientific Notebooks”
XDirectory: A Grid Context Service
• XDirectory is itself a Grid Service that is accessed by the portal.
– An index over a relational database
– Each node is either a “directory node” or a leaf.
– Leaf nodes are XML elements which contain metadata as well as HTML annotations.
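The directory/leaf structure can be sketched as a small tree. This in-memory version is an assumption made for illustration: it does not use a relational database, and the class names, the lookup function, and the XML element names (record, param, note) are all hypothetical.

```python
import xml.etree.ElementTree as ET


class DirectoryNode:
    """A node that only holds named children."""

    def __init__(self, name):
        self.name = name
        self.children = {}

    def add(self, child):
        self.children[child.name] = child
        return child


class LeafNode:
    """A node holding one XML element with metadata and an annotation."""

    def __init__(self, name, xml_text):
        self.name = name
        self.element = ET.fromstring(xml_text)


def lookup(root, path):
    """Walk a slash-separated path from the root to a node."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children[part]
    return node


root = DirectoryNode("")
runs = root.add(DirectoryNode("experiments"))
runs.add(LeafNode("run42", "<record><param name='dt'>0.01</param>"
                           "<note>First stable run</note></record>"))
leaf = lookup(root, "/experiments/run42")
```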
Portlet Interfaces to Grid Context Services
• A Remote Service Directory Interface
– Holds references and metadata about application services.
• User selects interface to application service from the directory browser.
• Examples: (near completion)
– Select a link to a DAGMan document and invoke the Condor service on the script.
– Same for GridAnt/Ogre or BPEL workflow script.
– Factory services for any grid apps that have interactive user interfaces.
[Figure labels: Portal Server; Remote Grid Application Service; Remote Service Directory Service]
Example Capability: Topic Based Messaging Systems
• Indiana University has implemented an XML metadata system based on messages.
• Newsgroups
– Topic-based posting and administration
• Citation/reference browsers
– Topic-based, export/import BibTeX
• Portlets sit atop a JMS-based message system.
User Privileges for Group Channels
• Users request access to specific topics/channels.
– Granted by administrator for that topic
• Can request
– Read/write by browser
– Read/write by email (newsgroups)
– Receive/don’t receive attachments.
• Topic admin can edit these requests.
• Super admins can manage the administrators of topics
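The request-then-grant flow can be sketched with two small data classes. The names AccessRequest and Topic and their fields are hypothetical; the sketch covers only browser access and leaves out email delivery and super admins.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    user: str
    browser: bool = False      # read/write by browser
    email: bool = False        # read/write by email
    attachments: bool = False  # receive attachments
    granted: bool = False


@dataclass
class Topic:
    name: str
    admins: set = field(default_factory=set)
    requests: dict = field(default_factory=dict)

    def request(self, req):
        # A user asks for access; nothing is allowed until an admin grants it.
        self.requests[req.user] = req

    def grant(self, admin, user):
        if admin not in self.admins:
            raise PermissionError(f"{admin} does not administer {self.name}")
        self.requests[user].granted = True

    def can_use_browser(self, user):
        req = self.requests.get(user)
        return bool(req and req.granted and req.browser)
```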
GPIR Data
• Load: aggregated CPU
• Downtime data for a machine
• Jobs: aggregated queue
• MOTD
• Nodes: job usage for each machine node
• NWS: based on VO and Click model
• Grid Monitoring
– Based on TACC GMS System
– Custom providers
– Plans to include MDS3.0 and INCA data underway
• Expanding to include:
– queuing system
– application profiles
– performance data
– Doc links
• Model allows generic inclusion of any XML data from any recognized source
– Need schema
– Need query
Grid Portal Information Repository (GPIR 1.1)
GPIR Components
• Web Services Ingestor
– Web Services Ingestor and clients
– XML Schemas (can be changed)
• Data Repository
– Local Cache
– Archival --> PostgreSQL
• Web Service Query
– Retrieve data
– XML Queries
– Retrieving current snapshot and archived data
• Clients
– GridPort services
– Portal/Web Interface (Portlets, servlets, JSP)
– Command line
– Any that speak web services
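The ingest, cache, archive, and query roles can be sketched together in one class. This is not GPIR code: the Repository class is hypothetical, the one-element XML report format is invented for the example, and an in-memory SQLite database stands in for PostgreSQL.

```python
import sqlite3
import xml.etree.ElementTree as ET


class Repository:
    """Current snapshot in a dict; history in SQL (SQLite stands in for PostgreSQL)."""

    def __init__(self):
        self.cache = {}
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE archive (machine TEXT, kind TEXT, value TEXT)")

    def ingest(self, xml_text):
        """Ingestor: accept one XML report, update the cache, append to the archive."""
        report = ET.fromstring(xml_text)
        key = (report.get("machine"), report.tag)
        self.cache[key] = report.text
        self.db.execute("INSERT INTO archive VALUES (?, ?, ?)", (*key, report.text))

    def current(self, machine, kind):
        """Query: latest value from the local cache."""
        return self.cache.get((machine, kind))

    def history(self, machine, kind):
        """Query: all archived values in insertion order."""
        rows = self.db.execute(
            "SELECT value FROM archive WHERE machine = ? AND kind = ? ORDER BY rowid",
            (machine, kind))
        return [value for (value,) in rows]
```

A client would call ingest with a report such as `<load machine="blanco">0.42</load>` and later read either the snapshot or the history.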
Future Capabilities
Major Theme: Grid Application Support
• Current portal’s job submission capabilities are vanilla
– Type desired machine, executable, output file
– Generates RSL, runs command
• Actual job management requires more
– Integration of information, scheduling services, file transfers, job sequencers, events
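The "generates RSL" step can be sketched as simple string construction. The function names are hypothetical, and the output follows the GT2-style form of a conjunction of (attribute=value) relations; the quoting rule used here is a simplification and should be checked against the RSL specification before real use.

```python
def rsl_value(value):
    """Quote a value if it contains characters treated as special here (simplified)."""
    text = str(value)
    if any(ch in text for ch in ' ()"=&|+'):
        return '"' + text.replace('"', '""') + '"'
    return text


def build_rsl(executable, stdout, arguments=(), count=1):
    """Build an RSL string from the fields a user types into the portal form."""
    relations = [("executable", rsl_value(executable))]
    if arguments:
        relations.append(("arguments", " ".join(rsl_value(a) for a in arguments)))
    relations.append(("stdout", rsl_value(stdout)))
    relations.append(("count", rsl_value(count)))
    return "&" + "".join(f"({name}={value})" for name, value in relations)
```

The desired machine is not part of the string in this sketch; it would be passed separately to whatever submits the job.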
Capability: Job Sequencer Portlets
[Figure labels: Portal; Sequencer; GPIR]
• Portal: the user uses the Portal to generate an XML description of the sequence.
• Currently, sequence steps can consist of File Transfers and Job Submissions to the CSF meta scheduler
• GPIR: the XML is then decomposed and persisted to GPIR, where the status information of each step in the sequence and of the sequence as a whole can be stored
• Sequencer: GridPort returns a Sequence ID to the Portal immediately and then begins executing the Sequence to completion or to error. Status information can be obtained at any time with the Sequence ID
Example sequence XML (incomplete in the source):
" xsi:schemaLocation="http://grids.tacc.utexas.edu/schemas/sequencer/jobSequenceC:\DOCUME~1\Maytal\Desktop\Maytal\Work\GP-IR\GP-IRX~1\motd.xsd">
<
  <Status>New</Status>
  <Step>
    <Status>Unscheduled</Status>
    <Type>CSFJob</Type>
    <Parameter name="jobFactoryServiceHandle">http://129.116.218.36:15080/ogsa/services/metascheduler/JobFactoryService</Parameter>
    <Parameter name="queue">normal</Parameter>
    <Parameter name="executable">pam</Parameter>
    <Parameter name="arguments">-g 1 mpichp4_wrapper /home/monitor/mpi_jobs/mpimd_5</Parameter>
    <Parameter name="directory">/home/monitor/mpi_jobs</Parameter>
    <Parameter name="count">4</Parameter>
    <Parameter name="stdIn">/dev/null</Parameter>
    <Parameter name="stdOut">/home/monitor/mpi_jobs/tomislavSequencerJobOut</Parameter>
    <Parameter name="stdErr">/home/monitor/mpi_jobs/tomislavSequencerJobErr</Parameter>
  </Step>
  <Step>
    <Status></Status>
    <Type>GridFTP</Type>
    <Parameter name="fromHost">[Previous]</Parameter>
    <Parameter name="toHost">blanco.tacc.utexas.edu:2811</Parameter>
    <Parameter name="fromFileFullName">/home/monitor/mpi_jobs/tomislavSequencerJobOut</Parameter>
    <Parameter name="toFileFullName">/home/monitor/mpi_jobs/tomislavSequencerJobOutCopied</">/home/monitor/mpi_jobs/tomislavSequencerJobErr</Parameter>
    <Parameter name="toFileFullName">/home/monitor/mpi_jobs/tomislavSequencerJobErrCopied</Parameter>
  </Step>
</JobSequence>
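The "return an ID immediately, then run to completion or error" behavior can be sketched with a background thread. This is not GridPort code: the Sequencer class is hypothetical, the XML is a shortened sequence that reuses element names from the slide, status is kept in a dictionary rather than in GPIR, and the "Done" and "Error" state names are assumptions.

```python
import threading
import uuid
import xml.etree.ElementTree as ET

SEQUENCE_XML = """
<JobSequence>
  <Step><Type>CSFJob</Type><Parameter name="executable">pam</Parameter></Step>
  <Step><Type>GridFTP</Type><Parameter name="toHost">blanco.tacc.utexas.edu:2811</Parameter></Step>
</JobSequence>
"""


class Sequencer:
    def __init__(self, handlers):
        self.handlers = handlers  # step type -> callable(params)
        self.status = {}          # sequence ID -> list of step states
        self.threads = {}         # sequence ID -> worker thread

    def submit(self, xml_text):
        # Decompose the XML into (type, parameters) pairs, one per step.
        steps = [(s.findtext("Type"),
                  {p.get("name"): p.text for p in s.findall("Parameter")})
                 for s in ET.fromstring(xml_text).findall("Step")]
        seq_id = uuid.uuid4().hex
        self.status[seq_id] = ["Unscheduled"] * len(steps)
        worker = threading.Thread(target=self._run, args=(seq_id, steps))
        self.threads[seq_id] = worker
        worker.start()
        return seq_id  # returned without waiting for the steps

    def _run(self, seq_id, steps):
        for i, (step_type, params) in enumerate(steps):
            try:
                self.handlers[step_type](params)
                self.status[seq_id][i] = "Done"
            except Exception:
                self.status[seq_id][i] = "Error"
                break  # stop at the first failing step
```

A caller polls `status[seq_id]` with the returned ID at any time; steps after a failure remain "Unscheduled".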
Capability: Community Scheduling Framework Portlets
CSF Use Case
• Researcher submits job through User Portal
• User Portal uses GridPort to
– authenticate user
– optionally make advance reservation to visualization system
– submit job to CSF
• CSF selects compute cluster with best fit and forwards job
• GridPort sends results to visualization system
[Figure labels: User Workstation; User Portal with GridPort; CSF; Visualization System; clusters Bandera, Blanco, Buda]
O.G.R.E.: A Job Management Engine
• See Thursday Demo
• O.G.R.E. = Open Grid Computing Environments Runtime Engine
• What Ant lacked, but we needed:
– Broader conditional execution
• Ant: based on write-once String properties.
– A general “loop” structure for Task execution.
– Data communication between Tasks (and with their containers).
– Specialized tasks
• File reading and writing
• Local and remote file management (GridFTP)
• Web service related tasks
• Event- and process-monitoring tasks
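The three missing features (conditions on mutable state, a general loop, and data passed between tasks) can be sketched with tasks that share one context dictionary. This is an illustration of the idea only, not O.G.R.E. syntax or code; run_tasks, loop, when, and poll_job are hypothetical names.

```python
def run_tasks(tasks, context):
    """Run tasks in order; each task reads and writes the shared context."""
    for task in tasks:
        task(context)
    return context


def loop(condition, body):
    """A general loop task: repeat body while condition(context) holds."""
    def task(context):
        while condition(context):
            run_tasks(body, context)
    return task


def when(condition, body):
    """Conditional task evaluated against the current, mutable context."""
    def task(context):
        if condition(context):
            run_tasks(body, context)
    return task


def poll_job(context):
    context["polls"] += 1
    if context["polls"] >= 3:  # pretend the job finishes on the third poll
        context["state"] = "DONE"


workflow = [
    loop(lambda c: c["state"] != "DONE", [poll_job]),
    when(lambda c: c["state"] == "DONE", [lambda c: c.update(fetched=True)]),
]
result = run_tasks(workflow, {"state": "RUNNING", "polls": 0})
```

Because the context is mutable, a later task sees what an earlier task wrote, which write-once properties do not allow.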
Data and Metadata Management
• When the job is through…
• Simulations, experiments generate both data and metadata
– Metadata includes code input parameters, host machines, data formats, owners of data, generators of data, …
• NEESGrid metadata system will be integrated into the portal release.
• Another example of integrated Grid services
– GridFTP, CAS and other services
Metadata Repository Capabilities
• Data store
– Files
– Logical naming
– Format translation
• Metadata store
– Structured (RDF-like schemas)
– Random-access (tuple store)
– Version control
• Archiving
– Mass store
– “nar” archive format
• Security
– Single sign-on
– Secure reliable file transfer with GridFTP
– Authorization via CAS
• Grid service interfaces
– NFMS: NEESgrid File Management Service
– NMDS: NEESgrid Metadata Service
– Repo. service (Façade)
– Secure remote access by applications
Repository architecture
[Figure labels: user; Portal with Repo browser and File xfer servlet; repository with Repo. service (Façade), NFMS, NMDS, NMDS DB, CAS, CAS DB, GridFTP, Filesystem; connection labels HTTPS; GSDL, GridFTP; Java API, GridFTP; JDBC, File I/O]
Access Grid and Related Portlets
Architectural Upgrades
Portlet standards, service managers, event standards
“A Bag of Portlets…”
• Portlet/container systems provide a simple level of user interface integration.
– A clearing house for pluggable components of all sorts
• User interfaces are actually to a diverse set of backend services.
– A mixture of UIs to Web services, grid services, communication/collaboration services, …
• We are a portlet marketplace…
• But we need closer integration
OGCE Initial Architecture
[Figure labels: Portal; Local Portlets; Teamlets; Proxy Portlets; Jetspeed Internal Services; Java CoG API; Java CoG Kit; Grid Protocols (GRAM, MDS-LDAP, MyProxy); Grid Services; Service API; CHEF Services; Remote Interfaces; CoG Stubs; HTTP; Grid Services; SOAP; Other Services]
Initial architecture aggregates multiple services into a single portal using portlet containers
Integration Points and Service Abstractions
• Internal portal service abstractions
– Service layer abstractions to define how to interact with in-memory proxy certificates.
• Authorization
– Internal and external roles need to be integrated.
• Events
– Share events between services
– Job submissions should automatically update the calendar, for example
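The job-submission-updates-calendar example can be sketched with a small publish/subscribe bus. EventBus, Calendar, and the event name "job.submitted" are hypothetical; the slide names the goal, not a mechanism, so this is one possible shape under that assumption.

```python
from collections import defaultdict


class EventBus:
    """Lets services share events without calling each other directly."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)


class Calendar:
    def __init__(self):
        self.entries = []

    def on_job_submitted(self, job):
        self.entries.append(f"{job['user']}: {job['name']} on {job['host']}")


bus = EventBus()
calendar = Calendar()
bus.subscribe("job.submitted", calendar.on_job_submitted)
# The job submission service publishes; it does not know about the calendar.
bus.publish("job.submitted", {"user": "beth", "name": "mpimd_5", "host": "blanco"})
```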
TeraGrid Integrated Architecture
Diagram demonstrates how existing software projects (such as GridPort) can be adapted to support the NMI Portals software system
[Figure labels: Portal; Portlets and Teamlets; Jetspeed Internal Services; Local Portal Services; Service API; Grid Service Stubs; Remote Content Services; Remote Content Servers (HTTP); Grid Services; Java CoG Kit; GridPort Toolkit; Web Services]
Portlet Standards
• Current portal uses Jetspeed portlet API
• Other portlet systems available
– WebSphere -> GridSphere
• Portlet standard: JSR 168
– A common API for all next-generation portlet systems.
– Compliant portlet components may be shared between systems.
• Open source implementation (Pluto) is available
– We will be adopting this; it will be part of our SC2004 release
– Will leverage education portal work from CHEF team.
OGCE Portals in Action
Some early applications
New Starts: TeraGrid Portal
• Access to TeraGrid Services
– Version 0: Collecting Initial Services
• Public information about resources
• Private information for the developers.
– Version 1: A user-centered portal (Q2 2004)
• HotPage/GridPort-style access to user accounts, credentials, job submission & management.
– Version 2: Portals for Science Collaborations (Q3 2004)
• Shared spaces, whiteboards, AG access, group authorization, shared application services
LEAD Portal
OGCE Collaboratory
We Can’t Do It All
We’ll Take Credit For It, Though
OGCE Collaboratory
• Hopefully, we have convinced you not to rebuild portals from scratch.
– Time to use pluggable components in consistent frameworks.
• Our award is not just to release our own software.
– We want to foster the portal community
– Contributed third-party components will be sought.
• Initial contributions will be from similar projects
– CMCS and other SciDAC projects are closely allied
Additional Information
• OGCE Web site: www.ogce.org
– Download the portal software
– Join news lists, get announcements
• OGCE Demo Portal: www.collab-ogce.org
– See our demo Thursday night
• Contact us
– [email protected]