GridSafe Overview

Stephen Booth, EPCC

s.booth@ed.ac.uk


Grid-SAFE

• JISC-funded project to build a general-purpose accounting/monitoring solution.

– http://gridsafe.forge.nesc.ac.uk/

• Builds on accounting subsystem from SAFE user administration system used by UK national facilities HPCx/HECToR


Challenges

• Need to work with different HPC technologies

– Different batch systems

– Different middleware

• Need to work with a wide variety of different local policies.

• Need to work with both grids and local HPC resources.

• One solution won’t fit all potential users

– Build kit of parts

– Pre-built solutions for common deployment scenarios.

• Key aims

– Modular design, individual functions can be deployed independently

– Behaviour can be customised using plug-ins to implement different service policies.


Overview


Data Formats

• System can consume accounting data in a variety of formats.

• Each format has a plug-in parser module

• New formats can be supported by writing additional parser plug-ins.

• Data is stored in an SQL database.

• Additional policy plug-ins can augment the parser to customise behaviour.

[Diagram labels: Raw Data, Parser, DB, Policy (x3)]
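The plug-in arrangement described on this slide can be sketched roughly as follows. This is an illustration only, not Grid-SAFE's actual API: the names `register_parser`, `parse_keyvalue`, `load` and the "keyvalue" input format are all hypothetical, and the records are returned as a list where a real deployment would write them to the SQL database.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Parser = Callable[[Iterable[str]], Iterator[Record]]
Policy = Callable[[Record], Record]

# Registry of parser plug-ins, keyed by format name (hypothetical design).
PARSERS: Dict[str, Parser] = {}


def register_parser(name: str) -> Callable[[Parser], Parser]:
    """Decorator that adds a parser plug-in to the registry."""
    def decorator(func: Parser) -> Parser:
        PARSERS[name] = func
        return func
    return decorator


@register_parser("keyvalue")
def parse_keyvalue(lines: Iterable[str]) -> Iterator[Record]:
    """Made-up format: one record per line, space-separated key=value pairs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield dict(pair.split("=", 1) for pair in line.split())


def load(format_name: str, lines: Iterable[str],
         policies: Iterable[Policy] = ()) -> List[Record]:
    """Parse raw data, then let each policy plug-in augment every record."""
    policies = list(policies)
    records = []
    for record in PARSERS[format_name](lines):
        for policy in policies:
            record = policy(record)
        records.append(record)  # a real system would insert into SQL here
    return records
```

Supporting a new format then means registering one more parser function; the loading code and the policies stay unchanged.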


Parser

• System can support multiple input formats at the same time.

• Currently supported parsers

– OGF-UR XML

– SGE accounting logfile

– PBS accounting logfile

– EGEE JobManager logfile

– Etc.

• New parsers easy to generate


OGF-UR support

• OGF-UR XML is supported as an interchange format

– Parser plug-in to parse OGF-UR

– Export module to format internal data as OGF-UR

• Grids may want to use only this format for central accounting.

– Local instances could use raw data and generate UR for central processing.

• Various grid communities seem to interpret OGF-UR differently and/or make additional requirements beyond that in the schema

– Required fields

– Different charging models

– Different global username models

– OGF-UR spec allows extensions.

– Specification will also evolve over time.

• Parser/exporter highly configurable to support variations/extensions.
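To illustrate what a configurable usage-record parser might look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the Grid-SAFE parser. The sample document and its `urn:example:usage` namespace are placeholders, and the code ignores namespaces so that it does not depend on any particular schema version. The `wanted` parameter stands in for the per-community configuration of which fields to extract.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def parse_usage_records(xml_text, wanted=("LocalUserId", "StartTime",
                                          "EndTime", "Processors")):
    """Return one dict per UsageRecord element, keeping only wanted fields."""
    root = ET.fromstring(xml_text)
    records = []
    for element in root.iter():
        if local_name(element.tag) != "UsageRecord":
            continue
        record = {}
        for child in element.iter():
            name = local_name(child.tag)
            if name in wanted and child.text:
                record[name] = child.text.strip()
        records.append(record)
    return records


# Illustrative input only; the namespace URI is a placeholder.
SAMPLE = """<UsageRecords xmlns="urn:example:usage">
  <UsageRecord>
    <UserIdentity><LocalUserId>alice</LocalUserId></UserIdentity>
    <StartTime>2009-01-01T00:00:00Z</StartTime>
    <EndTime>2009-01-01T01:00:00Z</EndTime>
    <Processors>4</Processors>
  </UsageRecord>
</UsageRecords>"""

records = parse_usage_records(SAMPLE)
```

A community that requires extra or different fields would pass its own `wanted` tuple rather than needing a new parser.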


Use in the grid

[Diagram labels: Grid accounting, Site accounting, Independent UR Generator, XML (x2)]


Report generation module

• Reports can be generated on demand from web interface

• Grid-SAFE uses XML templates to define reports

– Can generate unified reports over multiple data tables containing different types of data

– Tables/charts

– Parameterised reports (e.g. to select user or project).

• Supports reports in multiple output formats

– PDF, HTML, CSV, XML


Report generation speed

• Performance of report generation a particular issue

• Number of database records key to this.

– Need to utilise database effectively. Not acceptable to read all records into memory.

• ~1,000,000 record database table not a problem.

– Current national HPC systems within this range.

– Throughput clusters often have significantly larger record counts due to large numbers of small short jobs.

• Old data can be moved to separate tables.

• Support for daily aggregates via policy plug-in

– Builds secondary accounting table combining similar records.

– For ECDF 51 million records -> 35 thousand aggregates
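The point about using the database effectively can be shown with a small sketch: the summing is pushed into SQL with `GROUP BY`, and result rows are streamed from the cursor rather than loading every raw record into memory. The `job` table and its columns are invented for this example, and SQLite is used only because it ships with Python; the slides do not say which database Grid-SAFE uses.

```python
import sqlite3

# Hypothetical accounting table with a few demo rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (user TEXT, day TEXT, cpu_seconds INTEGER)")
conn.executemany(
    "INSERT INTO job VALUES (?, ?, ?)",
    [
        ("alice", "2009-01-01", 3600),
        ("alice", "2009-01-01", 1800),
        ("bob", "2009-01-01", 600),
        ("alice", "2009-01-02", 7200),
    ],
)


def daily_usage(connection):
    """Yield (user, day, job_count, total_cpu_seconds), one row at a time.

    The database does the grouping and summing; Python only ever holds
    the current result row.
    """
    query = (
        "SELECT user, day, COUNT(*), SUM(cpu_seconds) "
        "FROM job GROUP BY user, day ORDER BY user, day"
    )
    for row in connection.execute(query):
        yield row


report_rows = list(daily_usage(conn))
```

Here four raw records collapse into three report rows; the reduction quoted for ECDF is the same idea at much larger scale.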


Policy plug-ins

• Allow behaviour to be customised to local requirements

• Generate new properties

– E.g. charge values

• Trigger additional processing

– Decrement charging allocations

– Generate aggregate records

– Etc.

• New policies can be written for specific requirements


Aggregation Policy

• Generates Aggregated records

– Each time a new record is loaded

– Corresponding aggregate is located/created

– Aggregate values updated

• The raw data is also kept and can be used in reports if required.

• Aggregate data can be regenerated if required.
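The locate/create/update cycle above might look something like the following sketch. The class name `DailyAggregatePolicy`, its methods and the field names are assumptions made for illustration, and the aggregates are held in a dictionary where the real system would use a secondary database table.

```python
class DailyAggregatePolicy:
    """Illustrative policy: keep one running aggregate per (user, day)."""

    def __init__(self, key_fields=("user", "day"), sum_fields=("cpu_seconds",)):
        self.key_fields = key_fields
        self.sum_fields = sum_fields
        self.aggregates = {}  # stand-in for the secondary accounting table

    def record_loaded(self, record):
        """Called each time a new raw record is loaded."""
        key = tuple(record[field] for field in self.key_fields)
        aggregate = self.aggregates.get(key)      # locate ...
        if aggregate is None:                     # ... or create
            aggregate = {"jobs": 0, **{field: 0 for field in self.sum_fields}}
            self.aggregates[key] = aggregate
        aggregate["jobs"] += 1                    # update values
        for field in self.sum_fields:
            aggregate[field] += record[field]

    def regenerate(self, raw_records):
        """Rebuild all aggregates from the raw data, which is always kept."""
        self.aggregates.clear()
        for record in raw_records:
            self.record_loaded(record)
```

Because the raw records are never discarded, `regenerate` can rebuild the aggregates at any time, for example after the grouping keys change.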


ClassificationPolicy

• Converts selected fields from raw accounting data into references to a separate database table.

– Reduces data footprint.

– Augmenting information can be added to these tables.

• Example:

[Diagram labels: URRecord, DailyAggregate, User, UnixGroup, Site, Institution]
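As a rough illustration of classification, the sketch below replaces a user name in each raw record with a reference to a row in a separate `User` table, creating that row on first sight. The table layouts and the `classify` helper are assumptions, not the Grid-SAFE schema. The table name is interpolated into the SQL text, so it must come from trusted configuration and never from the input data.

```python
import sqlite3


def classify(conn, table, name):
    """Return the id of `name` in classifier `table`, creating the row if needed.

    `table` is inserted directly into the SQL and must be a trusted identifier.
    """
    row = conn.execute(
        f"SELECT id FROM {table} WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE URRecord (user_id INTEGER REFERENCES User(id), "
    "cpu_seconds INTEGER)"
)

# Each raw record stores a small integer reference instead of the user name.
for user, cpu_seconds in [("alice", 10), ("bob", 20), ("alice", 30)]:
    conn.execute(
        "INSERT INTO URRecord VALUES (?, ?)",
        (classify(conn, "User", user), cpu_seconds),
    )
```

Augmenting information, such as a user's institution, could then be added as extra columns on `User` without touching the raw records.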


DerivedPolicy

• Defines new properties as expressions over existing properties

• E.g. (EndTime-StartTime)*CPUs

• These expressions can then be used in reports.
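One way such derived properties could be evaluated is sketched below: a small arithmetic evaluator built on Python's `ast` module that resolves names against a record. This shows the idea only; the slides do not describe how Grid-SAFE evaluates expressions. The sketch assumes `StartTime` and `EndTime` are already numeric (for example seconds since the epoch).

```python
import ast
import operator

# Only basic arithmetic is allowed in a derived-property expression.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression, record):
    """Evaluate an arithmetic expression whose names are record properties."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            return record[node.id]  # look up an existing property
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval").body)


# Assumed numeric times in seconds; the property names follow the slide.
job = {"StartTime": 100, "EndTime": 160, "CPUs": 4}
cpu_seconds = evaluate("(EndTime-StartTime)*CPUs", job)
```

Rejecting everything except names, numbers and the four operators keeps configured expressions from running arbitrary code.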


LinkPolicy

• Merge data from different sources

– E.g. Batch system logs and middleware logs.

• Each data source is parsed to its own table.

– Primary table parsed first.

– LinkPolicy added to secondary data source.

– Locates corresponding primary record,

– Adds cross reference or copies additional properties to primary
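The linking steps above can be sketched as a simple join on a shared key. The function `link_records`, the `job_id` key and the `grid_user` property are hypothetical names for this illustration, and lists of dictionaries stand in for the primary and secondary database tables.

```python
def link_records(primary, secondary, key="job_id", copy_fields=("grid_user",)):
    """Copy selected properties from secondary records onto matching primaries.

    Returns the secondary records for which no primary record was found.
    """
    index = {record[key]: record for record in primary}
    unmatched = []
    for record in secondary:
        target = index.get(record[key])  # locate corresponding primary record
        if target is None:
            unmatched.append(record)
            continue
        for field in copy_fields:
            if field in record:
                target[field] = record[field]  # copy additional property
    return unmatched


# Primary source: batch system log. Secondary source: middleware log.
batch = [{"job_id": 1, "cpu_seconds": 3600}, {"job_id": 2, "cpu_seconds": 60}]
middleware = [
    {"job_id": 1, "grid_user": "/C=UK/CN=alice"},
    {"job_id": 9, "grid_user": "/C=UK/CN=carol"},
]
leftover = link_records(batch, middleware)
```

Parsing the primary table first matters here: a secondary record can only be linked if its primary record already exists.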


Web Services

• RUPI

– Current proposal from OGF RUS-WG

– Web service for the upload of XML usage record.

– Grid-SAFE has an implementation of the current upload service (RUPI).

• RUQI

– Currently working on a proposal for a query specification

– Aims

– Easy to implement in different code bases.

– Provide sufficient functionality for efficient report generation.

– Long term aim to provide reporting portal that can query any system that implements this interface.