Open Science Grid
The OSG Accounting System: GRATIA
by Philippe Canal (FNAL) & Matteo Melani (SLAC)
Mumbai, India CHEP2006
CHEP2006 Philippe Canal (Fermilab) & Matteo Melani (SLAC) 2
What is Accounting? (in the Grid context)
Grid accounting is the process of maintaining a (consistent) Grid-wide
view of VO members' resource utilization.[1]
[1] Accounting in Grid Environments, by Peter Gardfjäll, Department of Computing Science, Umeå University, Sweden
Why do we want an accounting system?
Resource providers (SLAC, Fermilab…) want to perform cost-benefits analysis
Resource providers want to improve planning
Resource providers want better security
Resource providers want to improve QoS (priorities, debugging…)
Support a Grid “Economic Model”
What is the real problem (solution)?
Nobody talked about “Grid economy”
Do we really want an Accounting system?
Or maybe a monitoring system will do?
Let's look at accounting and monitoring
Accounting vs. Monitoring
A monitoring system:
Purpose: monitoring system health, debugging, system profiling
Gathers state information about the system resources
Collects system events. It works like a DAQ system: as close as possible to the system, as unintrusive as possible
Quasi-real-time to real-time
An accounting system:
It keeps track of resource usage
It links a user's service requests with the resources consumed to satisfy those requests
It has accounts, banks, “currency” and supports an economic model (policies)
It works “after the fact”
For Example: Monitoring at SLAC
What do we monitor:
Network: switch and router status, Internet Mbytes/sec in/out
Computer clusters: batch systems, NFS and AFS servers, database servers
Storage space: disk usage, HPSS
Some metrics we use:
CPU utilization, memory, disk usage, disk I/O
Various networking metrics (Mbytes in/out of switches, routers, servers…)
Some primitive job submission results (LSF)
We use a lot of monitoring tools and infrastructure: Ganglia, Nagios, OpenView, SNMP tools, MonALISA…
For Example: Accounting at SLAC?
The monitoring system cannot link resource usage to users/groups
Maybe it could be done by looking into the logs and correlating the events, but that is a lot of work
Accounting infrastructures and tools à la Ganglia or Nagios do not exist
Basically we cannot (yet) fully link a user name with a precise set of computing resource usage metrics
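The log-correlation idea mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the event fields (`kind`, `job_id`, `user`, `cpu_seconds`) are hypothetical and do not reflect the format of LSF or of any SLAC log.

```python
from collections import defaultdict

# Hypothetical log events: a "submit" event names the user,
# a "finish" event reports what the job consumed.
events = [
    {"kind": "submit", "job_id": 1, "user": "alice"},
    {"kind": "submit", "job_id": 2, "user": "bob"},
    {"kind": "finish", "job_id": 1, "cpu_seconds": 120.0},
    {"kind": "finish", "job_id": 2, "cpu_seconds": 30.0},
    {"kind": "submit", "job_id": 3, "user": "alice"},
    {"kind": "finish", "job_id": 3, "cpu_seconds": 60.0},
]


def cpu_seconds_per_user(events):
    """Join finish events to submit events by job_id and sum CPU time per user."""
    owner = {e["job_id"]: e["user"] for e in events if e["kind"] == "submit"}
    totals = defaultdict(float)
    for e in events:
        if e["kind"] == "finish" and e["job_id"] in owner:
            totals[owner[e["job_id"]]] += e["cpu_seconds"]
    return dict(totals)


usage = cpu_seconds_per_user(events)
```

Even in this toy form the join depends on every log carrying a shared job identifier, which is part of why doing it across real, heterogeneous logs is a lot of work.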
What I think we should track
Job submission: priority in the batch queue, CPU time, wall-clock time, memory usage
Storage: disk usage, tape storage usage, storage class (to be defined)
Network data transfer: network speed, quantity of data transferred
Special software usage, operator/administrator services… maybe later
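The quantities listed above could be captured in simple record types. The sketch below is an assumption for illustration only: the class names, field names, and units are hypothetical and are not the record schema used by the project.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class JobUsage:
    user: str
    queue_priority: int
    cpu_seconds: float
    wall_seconds: float
    memory_mb: float


@dataclass
class StorageUsage:
    user: str
    disk_gb: float
    tape_gb: float
    storage_class: Optional[str] = None  # "to be defined" in the text


@dataclass
class NetworkUsage:
    user: str
    bytes_transferred: int
    duration_seconds: float

    def mean_speed_bytes_per_s(self) -> float:
        """Average transfer speed; 0.0 when no duration was recorded."""
        if self.duration_seconds <= 0:
            return 0.0
        return self.bytes_transferred / self.duration_seconds


# Records convert to plain dictionaries, which are easy to serialize.
job_record = asdict(JobUsage("alice", 5, 120.0, 300.0, 512.0))
transfer = NetworkUsage("alice", 1000, 4.0)
```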
Goals
Track services and resources usage per grid user after the fact
Focus on quality, integrity and security of the information
Accounting information easily available to people (web interface) and to applications (Web Services)
Build a system that is simple to manage (install, configure and upgrade) and to extend (well-defined APIs)
Based on well-proven, standard (industrial-strength) technologies
However, we do not cover (but keep in mind):
User charging system
Resource or service pricing
Support for an economic model for resource allocation
System Properties
Interoperability
The Accounting System should leverage existing standards to maximize interoperability with other Grids and Accounting Services.
Fault Tolerance
Reduce and flag data loss.
Resilient to communication failures over LAN and WAN.
Resilient to the failure of one of its components.
Security
Guarantees integrity and non-repudiation of the accounting records at the site level.
Uses secure communication channels (mutual authentication, message integrity, confidentiality) and access control lists.
Scalability and Performance
Not really an issue
Other
Leverage existing tools and infrastructures to solve related problems.
Simple Domain Model
Design Direction
We are currently focused on getting the infrastructure right rather than on the specific metrics used to measure resource usage
Open: we give APIs
Distributed: Meters are distributed objects
Based on open source standard technologies: Web Services, Java
Platform, Tomcat, Axis, Hibernate
Same idea as GUMS and JClarens: the service is an independent
Tomcat Application (JClarens for authentication)
Ensure interoperability with OSG partners (LCG, TeraGrid…)
Architecture Overview
Meter
A Meter is responsible for:
Gathering all the data about a Grid service usage
Gathering all the data about the resources used by that Grid service
Assembling a Service Usage record
Logically there is one Meter entity per Grid Service
Each Meter is composed of one or more Probes and one Assembler (plus some other components for management functions)
A Grid Service uses resources distributed across the Resource Provider’s LAN, therefore the Meter is also distributed
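The Meter composition described above (one or more Probes feeding one Assembler) can be sketched as below. This is a minimal illustration, not the project's design: modeling a Probe as a plain callable and a Service Usage record as a dictionary are assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: a Probe is modeled as a callable returning a partial measurement.
Probe = Callable[[], Dict[str, float]]


@dataclass
class Assembler:
    service: str

    def assemble(self, measurements: List[Dict[str, float]]) -> Dict[str, object]:
        """Merge the partial measurements into one Service Usage record."""
        record: Dict[str, object] = {"service": self.service}
        for measurement in measurements:
            record.update(measurement)
        return record


@dataclass
class Meter:
    assembler: Assembler
    probes: List[Probe] = field(default_factory=list)

    def collect(self) -> Dict[str, object]:
        """Query every Probe, then hand the results to the Assembler."""
        return self.assembler.assemble([probe() for probe in self.probes])


meter = Meter(
    Assembler("batch"),
    [lambda: {"cpu_seconds": 42.0}, lambda: {"disk_gb": 1.5}],
)
record = meter.collect()
```

In a real deployment the Probes would run on different hosts across the LAN; here they are in-process functions purely to show the data flow.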
Meter Logical View
Meter’s Probe and Assembler
Probes use a secure channel (mutual authentication, data integrity) to send usage information to the Assemblers.
Usage information is packaged in ProbeEvents that are sent to the Assemblers through a Web Service interface.
Each ProbeEvent object has a standard header and a payload in XML format.
Probes use an “at-least-once semantics” technique to send ProbeEvents to the Assemblers (communication is resilient to failure)
Assemblers can choose synchronous or asynchronous processing of ProbeEvents
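A rough sketch of at-least-once delivery of ProbeEvents follows. Everything here is an assumption made for illustration: the header fields, the XML element names, the in-memory `FlakyChannel` standing in for the Web Service transport, and in particular the duplicate filtering on the Assembler side, which the slides do not describe but which retransmission generally makes necessary.

```python
import uuid
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ProbeEvent:
    probe_id: str      # hypothetical header field
    payload_xml: str   # payload in XML format
    event_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Assembler:
    def __init__(self):
        self.seen = set()
        self.values = []

    def receive(self, event):
        """Process an event once; acknowledge duplicates without reprocessing."""
        if event.event_id not in self.seen:
            root = ET.fromstring(event.payload_xml)
            self.values.append(float(root.findtext("cpu_seconds")))
            self.seen.add(event.event_id)
        return True


class FlakyChannel:
    """Stand-in transport: delivers every message but loses the first acks."""

    def __init__(self, assembler, drop_acks):
        self.assembler = assembler
        self.drop_acks = drop_acks
        self.deliveries = 0

    def send(self, event):
        self.deliveries += 1
        ack = self.assembler.receive(event)
        if self.drop_acks > 0:
            self.drop_acks -= 1
            return False  # the acknowledgement was lost on the way back
        return ack


def send_at_least_once(channel, event, max_attempts=5):
    """Resend until acknowledged; the receiver may see the event several times."""
    for _ in range(max_attempts):
        if channel.send(event):
            return True
    return False  # caller should keep the event queued for a later retry


assembler = Assembler()
channel = FlakyChannel(assembler, drop_acks=2)
event = ProbeEvent("probe-1", "<usage><cpu_seconds>12.5</cpu_seconds></usage>")
delivered = send_at_least_once(channel, event)
```

The event reaches the Assembler three times in this run, yet the usage value is recorded once because the Assembler keys on the event identifier.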
Collector
Main functionalities:
Hosting the Meters' components (the Assemblers) that are responsible for
assembling Service Usage Records
Monitoring the Meters' components called Probes
Communication between Probes and Assemblers: routing of ProbeEvents to the proper Assembler
Communication between Assemblers and Data Store
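The routing role of the Collector can be illustrated with a small dispatch table. This is a sketch under assumptions: keying the route on a `service` field, representing Assemblers as callables, and setting aside unroutable events are all hypothetical choices, not the project's actual interface.

```python
class Collector:
    def __init__(self):
        self._assemblers = {}   # service name -> Assembler callable
        self.unroutable = []    # events with no registered Assembler

    def register(self, service, assembler):
        self._assemblers[service] = assembler

    def route(self, event):
        """Pass the event to the Assembler registered for its service."""
        handler = self._assemblers.get(event["service"])
        if handler is None:
            self.unroutable.append(event)
            return False
        handler(event)
        return True


# Two in-memory lists stand in for real Assemblers.
batch_events, storage_events = [], []
collector = Collector()
collector.register("batch", batch_events.append)
collector.register("storage", storage_events.append)

results = [
    collector.route(e)
    for e in [
        {"service": "batch", "cpu_seconds": 10.0},
        {"service": "storage", "disk_gb": 2.0},
        {"service": "unknown"},
    ]
]
```

Keeping unroutable events rather than dropping them is in the spirit of the "reduce and flag data loss" property stated earlier.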
Collector Logical View
Data Store Component
Accountant
This component is intended for future use.
Main functionalities:
further process the Service Usage Records to apply economic policy
(pricing & billing)
Deployment View
Deployed as a Tomcat application: can take advantage of Tomcat clustering features for scalability and availability
Collector and Publisher can run on two different Tomcat instances
Can use the most popular database implementations; the database server can be on the same host as Tomcat or on a different host
Probes can run anywhere on the LAN
Deployment Diagram
Conclusion
More Information
Project Charter, Requirements and Design Documents
OSG Accounting Twiki page and
Mailing list: [email protected]
Any Questions, Comments, etc?
SPARE SLIDES
[Figure: Overview diagram. Sites shown: Resource Provider Site, Grid Operation Center, VO Center. Component labels: Probe, Collector, Repository of Accounting Records, Data Store Access Layer, WSAPI, Web Presenter, Statistical Analyzer]