All Hands Meeting: D. Colling, Imperial College London, for the GridPP Project



Hans Hoffman has described the scale of the problems that we are facing; I will try to describe how we are trying to solve them…

Developing an operational Grid


GridPP is involved in two projects:

• EU DataGrid Project

• SAMGrid Project

These projects are built on top of common products such as the Globus Toolkit and CondorG. I will describe some aspects of the middleware developed in these projects and how we are deploying it.



With only 15 minutes, this will have to be a very brief overview. For more information, see the many posters we have here, talk to the people on the booth, or look at the GridPP website:

http://www.gridpp.ac.uk/



The DataGrid Project

[Figure: the DataGrid architecture, with layers labelled Applications, Middleware and Physical Fabric. The boxes shown are HEP Apps, EO Apps, Bio Apps, Workload Management, Data Management, Information and Monitoring Services, Globus Middleware, Network Services, Fabric Management, Computing Fabric, Storage Element, Mass Storage Management and Networking Fabric. A legend marks the boxes with major UK involvement.]

I will only talk about a few of these boxes.


Resource Management:

• The user describes their jobs using a set of Condor ClassAds. The job is then submitted to a resource broker from any User Interface (UI) machine.
• The Resource Broker (RB) is at the centre of resource management. The RB matches the requirements of the job to the available resources, using the Condor ClassAd libraries.
• Information about available resources is cached by the Information Index (II), which the RB queries.
• The II in turn acquires its information by interrogating individual GRISes and national GIISes.
• Information about the location of data is stored in a replica catalogue.


An example:

Executable = "WP1testF";
StdOutput = "sim.out";
StdError = "sim.err";
InputSandbox = {"/home/datamat/sim.exe", "/home/datamat/DATA/*"};
OutputSandbox = {"sim.out","sim.err","testD.out"};
Rank = other.TotalCPUs * other.AverageSI00;
Requirements = other.LRMSType == "PBS" \
&& (other.OpSys == "Linux RH 6.1" || other.OpSys == "Linux RH 6.2");
RetryCount = 2;
Arguments = "file1";
InputData = "LF:test10099-1001";
ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
DataAccessProtocol = "gridftp";
OutputSE = "grid001.cnaf.infn.it";
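The matching step that the RB performs with the Requirements and Rank expressions can be illustrated with a small Python sketch. This is not the Condor ClassAd library or the RB code. It only mimics the idea: filter the resource descriptions with the job's Requirements, then order the survivors by Rank. The attribute names come from the job description above; the resource names and values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ResourceAd:
    """A toy stand-in for the attributes a resource advertises (hypothetical values)."""
    name: str
    LRMSType: str
    OpSys: str
    TotalCPUs: int
    AverageSI00: int


def requirements(other: ResourceAd) -> bool:
    # Mirrors the Requirements expression of the example job.
    return other.LRMSType == "PBS" and other.OpSys in ("Linux RH 6.1", "Linux RH 6.2")


def rank(other: ResourceAd) -> int:
    # Mirrors the Rank expression of the example job: higher is better.
    return other.TotalCPUs * other.AverageSI00


def match(resources):
    """Keep the resources that satisfy the requirements, best rank first."""
    candidates = [r for r in resources if requirements(r)]
    return sorted(candidates, key=rank, reverse=True)


# Illustrative resource descriptions; none of these hosts are real.
resources = [
    ResourceAd("ce-a.example.org", "PBS", "Linux RH 6.2", 20, 400),   # rank 8000
    ResourceAd("ce-b.example.org", "LSF", "Linux RH 6.2", 100, 500),  # wrong batch system
    ResourceAd("ce-c.example.org", "PBS", "Linux RH 6.1", 50, 300),   # rank 15000
    ResourceAd("ce-d.example.org", "PBS", "Linux RH 7.2", 80, 450),   # wrong operating system
]

ranked = match(resources)
print([r.name for r in ranked])
```

In this sketch the job would go to the first entry of `ranked`; an empty list corresponds to the case where no resource matches.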



If the RB is able to match the job to a resource it then passes the job over to the Job Submission Service (JSS), which then submits the job to the selected resource. The JSS is based on CondorG.

Logging information is kept at each stage.

All user interaction is via the UI, from which users can list the resources that match their requirements, submit jobs, examine the status of submitted jobs, access all logging information about their jobs, and cancel jobs.


Information Services (R-GMA):

The design and implementation are based on the Grid Monitoring Architecture (GMA) of the GGF, with the term "directory" replaced by "registry" to avoid any implied structure.


The current implementation uses servlet technology, with APIs in Java, C++, C, Perl and Python.


R-GMA gives the impression of one RDBMS per VO.

Currently there are two types of producer:

• A circular buffer producer: no RDBMS is used and SQL queries are handled by the code. A consumer may miss records if it is too slow.
• A database producer: uses a database to hold the data, so data is never lost. However, it is slower and requires a clean-up strategy to avoid indefinite growth.

More producers are being implemented at the request of the users.
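Why a slow consumer can miss records with a circular buffer producer is easiest to see in a toy model. The Python sketch below is not the R-GMA API. The class and method names are hypothetical, and it ignores SQL handling entirely. It only shows a fixed-size buffer overwriting its oldest records.

```python
class CircularBufferProducer:
    """Toy fixed-size buffer: new records overwrite the oldest ones."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = [None] * capacity
        self.next_seq = 0  # sequence number of the next record to be published

    def publish(self, record):
        # Overwrite the slot of the oldest record once the buffer is full.
        self.buffer[self.next_seq % self.capacity] = record
        self.next_seq += 1

    def read_since(self, seq):
        """Return (records, missed, new_seq) for a consumer that has read up to seq."""
        oldest = max(0, self.next_seq - self.capacity)  # oldest record still held
        missed = max(0, oldest - seq)                   # records already overwritten
        start = max(seq, oldest)
        records = [self.buffer[i % self.capacity] for i in range(start, self.next_seq)]
        return records, missed, self.next_seq


producer = CircularBufferProducer(capacity=3)
for i in range(5):
    producer.publish(f"load-sample-{i}")

# A consumer that has not read anything yet arrives too late for the first two records.
records, missed, position = producer.read_since(0)
print(records, missed, position)
```

A database producer avoids the loss by keeping every record, which is where the need for a clean-up strategy comes from.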


An example: CPU load at various sites. Note that all information is timestamped.

SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'RAL'

This query would return the output of producer 1.
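The effect of such a query can be tried out locally with an ordinary in-memory database. The sketch below uses Python's sqlite3 module as a stand-in for the single virtual table that R-GMA presents. It says nothing about how R-GMA itself routes the query to producers. Only the table name and the Country and Site columns come from the query above; the LoadAvg and MeasuredAt columns and all row values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CPULoad (Country TEXT, Site TEXT, LoadAvg REAL, MeasuredAt TEXT)"
)

# Made-up rows, as if published by three different producers.
rows_in = [
    ("UK", "RAL", 0.3, "2002-01-01T10:00:00"),
    ("UK", "RAL", 1.6, "2002-01-01T10:05:00"),
    ("UK", "GLA", 0.4, "2002-01-01T10:00:00"),
    ("CH", "CERN", 0.5, "2002-01-01T10:00:00"),
]
conn.executemany("INSERT INTO CPULoad VALUES (?, ?, ?, ?)", rows_in)

# The query from the slide: only the rows for the RAL site in the UK are returned.
rows = conn.execute(
    "SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'RAL'"
).fetchall()

for row in rows:
    print(row)
```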


Network Monitoring:

Network monitoring is essential if network information is to be used in brokering. SLAC’s IEPM uses PingER to measure round-trip time; iperf, bbftp and bbcp to measure TCP throughput; and UDPmon to measure UDP throughput.

Sample results from IEPM monitoring between SLAC and Daresbury.


The network monitoring information can then be published via an LDAP service.


Installation and configuration (LCFG):

Configuration of large numbers of different machines can be very troublesome. DataGrid uses LCFG.

Each machine has its own profile, which can include general site profiles and individual configuration options.

The profile is then published in XML.
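The idea of combining site-wide settings with per-machine options and publishing the result as XML can be sketched in a few lines of Python. This is not LCFG and does not reproduce its XML schema. The element names, the resource keys and the site default for the network module are all hypothetical; only the host name gw31 and the eepro100 module appear in the talk.

```python
import xml.etree.ElementTree as ET


def build_profile(site_defaults, host_overrides):
    """Start from the site-wide settings and let host-specific options win."""
    profile = dict(site_defaults)
    profile.update(host_overrides)
    return profile


def profile_to_xml(hostname, profile):
    """Serialise a profile to a simple, made-up XML layout."""
    root = ET.Element("profile", {"host": hostname})
    for key in sorted(profile):
        resource = ET.SubElement(root, "resource", {"name": key})
        resource.text = str(profile[key])
    return ET.tostring(root, encoding="unicode")


# Hypothetical site defaults and one host-specific override.
site = {"role": "WorkerNode", "nic_module": "3c59x"}
host = {"nic_module": "eepro100"}

xml_text = profile_to_xml("gw31", build_profile(site, host))
print(xml_text)
```

The design point is that a machine's description stays short: it names the shared profiles it includes and lists only what differs.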


/*
   gw31
   ==============================================
   BARRY'S WN
*/

/* Host specific definitions */
#define HOSTNAME gw31

/* Some useful macros */
#include "macros-cfg.h"

/* Site specific definitions */
#include "site-cfg-farm.h.ic"

/* Linux default resources */
#include "linuxdef-cfg.h"

/* LCFG client specific resources */
#include "client_testbed-cfg.h"

/* Well, obviously, if you read the title !!!!!!! */
#include "WorkerNode-cfg.h"

/* Specific NIC */
+update.modlist label
+update.mod_label alias eth0 eepro100

XML published profile


How do we know what is working?

We monitor each site.


SAMGrid

SAM is used by the DØ experiment, so SAM is operational, but is it a Grid?

Currently SAM is mainly a data management tool. Locations of replicas are stored in a central database. Files are moved to a running job as and when needed, currently at a rate of 1 TB/day. SAM can only submit jobs to local resources.

SAM has been modified to use gridftp. It is being updated for remote submission using CondorG, which will be ready very soon (it is in testing now).


Real Data files from FNAL

MC files from NIKHEF


Developing an operational Grid

Conclusions

DataGrid is being deployed across Europe and is just starting to be used, for example by the ATLAS experiment last week.

SAM is already being used by many users and is being modified for remote submission and to use transfer protocols such as gridftp.


[Figure: SAM architecture. Components shown: Database Server(s) (Central Database), Name Server, Global Resource Manager(s), Log Server, Station 1 Servers, Station 2 Servers, Station 3 Servers, Station n Servers and Mass Storage System(s). Regions are labelled Shared Globally, Local and Shared Locally. Arrows indicate control and data flow.]