Current Status and Plan on Grid at KEK/CRC


Transcript of Current Status and Plan on Grid at KEK/CRC

Page 1: Current Status and Plan on Grid at KEK/CRC

HIGH ENERGY ACCELERATOR RESEARCH ORGANIZATION

KEK

Current Status and Plan on Grid at KEK/CRC

Go Iwai, KEK/CRC, on behalf of the KEK Data Grid Team

Links to SAGA are additionally listed in the reference slide (end of the presentation file). The institutes list is slightly revised (in slide: Institutes).

Page 2: Current Status and Plan on Grid at KEK/CRC

Outline
▸ Introduction
▹ Deployment
▹ VO-specific operational statistics
▸ Recent activities
▹ Unified user interfaces using SAGA*
▸ Summary

March 17, 2009, 2nd Open Meeting of the SuperKEKB Collaboration @ KEK

* SAGA: A Simple API for Grid Applications

Page 3: Current Status and Plan on Grid at KEK/CRC


Introduction
▸ Our role
▸ Deployment status
▸ VO scale
▹ Resources
▹ Institutes
▹ Members
▸ Ops stats
▹ Number of jobs
▹ CPU consumption

Page 4: Current Status and Plan on Grid at KEK/CRC

[Map of sites: Tohoku Univ., KEK, Univ. of Tsukuba, Nagoya Univ., Kobe Univ., Hiroshima IT]

Introduction
▸ Major HEP projects:
▹ Belle, J-PARC, ATLAS: ongoing projects
▹ ILC, Super-Belle: future projects
▸ Also covering:
▹ Material science, bio-chemistry, and so on, using the synchrotron light and neutron sources
▹ RT (radiotherapy) technology transfer
▸ We have a role in supporting university groups in these fields,
▹ including Grid deployment/operation.

Page 5: Current Status and Plan on Grid at KEK/CRC

KEK's Contribution in EGEE-III
▸ TSA1.1: Grid management
▹ Interoperability and collaboration
▸ TSA1.2: Grid operations and support
▹ 1st-line support for operations problems
▹ Middleware deployment and support
▪ a) Coordination of middleware deployment and support for problems.
▪ b) Regional certification of middleware releases if needed (outside of PPS and SA3 involvement). This is anticipated to be very rare and will require specific justification.

Page 6: Current Status and Plan on Grid at KEK/CRC

Logical Map Focused on Network Connection for Grid

[Network diagram: SINET connects the university Grid sites (Univ. Grid) to KEK. Components shown: CA, VOMS, HPSS-DSI, SRB-DSI, KEK-1, NAREGI, KEK-2, DMZ, CC-GRID, GRID-LAN, UI, Intranet, LFC.]

Page 7: Current Status and Plan on Grid at KEK/CRC

VO Component


Central Services for Belle (Virtual Organization):
▹ LFC: looking up logical file names
▹ VOMS: VO logon and privilege management
▹ RB: requesting/booking computing resources
▹ SRM: unified interface to the storage equipment
▹ CE: unified interface to the LRMS
▹ IS: service/resource discovery

Belle VO-specific architecture: SRM and GridFTP reach the HSM inside the B-Computing System through SRB-DSI and SRB. Directory access is by TURL, not using any logical namespace so far.

Page 8: Current Status and Plan on Grid at KEK/CRC

Brief Summary of LCG Deployment

JP-KEK-CRC-01 (KEK-1)
▸ Production in GOC since Nov 2005
▸ Mainly operated by KEK staff
▸ Site role:
▹ Practical operation for KEK-2
▹ Getting started for university groups
▸ Resource and component:
▹ SL-3x or SL-4x
▹ gLite-3.X
▹ CPU: 14
▹ Storage: ~7 TB of disk, and DSI for the HSM (HPSS)
▹ Fully functional services
▸ Supported VOs:
▹ belle apdg ail g4med dteam ops ppj ilc calice naokek

JP-KEK-CRC-02 (KEK-2)
▸ Production in GOC since early 2006
▸ Site operation: Manabu and Kohki
▸ Site role:
▹ More stable services based on KEK-1 experience
▸ Resource and component:
▹ SL-3x or SL-4x
▹ gLite-3.X
▹ CPU: 48
▹ Storage: ~1 TB of disk
▹ Fully functional services
▸ Supported VOs:
▹ belle apdg ail g4med dteam ops ppj ilc calice naokek

Upgrade plans:
▹ In Q1 of FY2009: 10 WNs x 8 CPUs x ~4 kSI2K; storage capacity on a demand basis
▹ HPSS virtually works as the backend disk of the SE (~200 USD/TB)
▹ VM (Xen) technologies are widely supported across the whole site, for higher availability and more robustness
▹ Old blade servers (B-Comp, 250 x 2 CPUs) are now being integrated with KEK-2; fully upgraded and up soon

Page 9: Current Status and Plan on Grid at KEK/CRC

LCG Infrastructure: Deployment Status (as of Dec 2008)
▸ 55 countries
▸ 265 sites
▸ 88K CPUs
▸ 130M SI2K
▸ 480 PB available
▸ 580 PB in use

[Annotation on figure: ~10M SI2K]

Page 10: Current Status and Plan on Grid at KEK/CRC

Resource Deployment over belle VO

[Chart: resources by site; annotation: ~10M SI2K at FZK-LCG2. Note: still missing KEK-2.]

▸ 18M SI2K / 9k CPUs
▹ ~10% of the whole production resources
▸ Storage through SRMs:
▹ 27 TB available
▹ 83 GB in use
▸ HSM storage in the KEK Belle Computing System through SRB:
▹ ~100 TB (ask Nakazawa-san for details)

Page 11: Current Status and Plan on Grid at KEK/CRC

Institutes
▸ IFJ PAN (CYFRONET) (Poland)
▸ Univ. of Melbourne (Australia)
▸ KEK (Japan)
▸ National Central Univ. (Taiwan)
▸ ASGC (Taiwan)
▸ Nagoya University (Japan)
▸ KISTI (Korea)
▸ Univ. of Karlsruhe (Germany)
▸ Jozef Stefan Institute (Slovenia)
▸ Panjab Univ. (India)
▸ Virginia Polytechnic Inst. State Univ. (US)
▸ Univ. of Hawaii (US)
▸ Wayne State University (US)
▸ Korea Univ. (Korea)
▸ Univ. of Sydney (Australia)


The VO scale is being expanded slowly.
Federation done.

Page 12: Current Status and Plan on Grid at KEK/CRC

Members
▸ /C=JP/O=KEK/OU=CRC/OU=KEK/CN=Nishida Shohei
▸ /C=JP/O=KEK/OU=CRC/OU=KEK/CN=Yoshimi Iida
▸ /C=JP/O=KEK/OU=CRC/CN=Go Iwai
▸ /C=JP/O=KEK/OU=CRC/OU=Nagoya/CN=Yuko Nishio
▸ /C=AU/O=APACGrid/OU=The University of Melbourne/CN=Glenn R. Moloney
▸ /C=JP/O=KEK/OU=CRC/OU=Korea University/CN=Hyuncheong Ha
▸ /C=JP/O=KEK/OU=CRC/CN=Yoshiyuki WATASE
▸ /C=JP/O=KEK/OU=CRC/CN=Hideyuki Nakazawa
▸ /C=JP/O=KEK/OU=CRC/CN=YAMADA Kenji
▸ /C=JP/O=KEK/OU=CRC/OU=Nagoya university HEPL/CN=kenji inami
▸ /C=JP/O=KEK/OU=CRC/OU=Nagoya university HEPL/CN=Mitsuhiro Kaga
▸ /C=JP/O=KEK/OU=CRC/CN=Jun Ebihara
▸ /C=JP/O=KEK/OU=CRC/OU=Korea University/CN=Soohyung Lee
▸ /C=JP/O=KEK/OU=CRC/CN=Manabu Matsui
▸ /C=SI/O=SiGNET/O=IJS/OU=F9/CN=Marko Bracko
▸ /C=JP/O=KEK/OU=CRC/CN=Kenn Sakai
▸ /C=JP/O=KEK/OU=CRC/CN=Yugawa Takahiro
▸ /C=JP/O=KEK/OU=CRC/CN=Yamada Chisato
▸ /O=GermanGrid/OU=Uni Karlsruhe/CN=Thomas Kuhr
▸ /O=GermanGrid/OU=FZK/CN=Dimitri Nilsen
▸ /C=KR/O=KISTI/O=GRID/O=KISTI/CN=84035421 Beob Kyum Kim
▸ /C=JP/O=KEK/OU=CRC/CN=Shunsuke Takahashi


90% of the 22 members are ops staff.

Page 13: Current Status and Plan on Grid at KEK/CRC

Ops Stats: JFY2006 & JFY2007


[Charts, Apr 2006 to Mar 2008: number of submitted jobs (x1000) and CPU consumption (x1000 kSI2K x hours) at KEK-1 & KEK-2, per VO (ppj, ops, ilc, g4med, dteam, calice, belle, atlas, apdg) and in total: roughly 100 kJobs submitted and 300 kHrs consumed.]

Page 14: Current Status and Plan on Grid at KEK/CRC

Ops Stats: JFY2008


[Charts, Apr 2008 to Mar 2009 (to date): number of submitted jobs (x1000) and CPU consumption (x1000 kSI2K x hours), per site (TW-FTT, SiGNET, KR-KISTI-GCRT-01, JP-KEK-CRC-01, FZK-LCG2, CYFRONET-LCG2, Australia-ATLAS) and in total: roughly 18 kJobs submitted and 170 kHrs consumed.]

Page 15: Current Status and Plan on Grid at KEK/CRC

Service Availability, Jan-Dec 2008

▸ 127 H / 4 SD; 12 tickets were opened, but all were solved
▸ 931 H / 12 SD; 13 tickets were opened, but all were solved

More than 90% availability in 2008!
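A quick sanity check (our arithmetic, assuming H counts downtime hours over the year): 2008 had 8784 hours, so

  availability = (8784 - 127) / 8784 ≈ 98.6%
  availability = (8784 - 931) / 8784 ≈ 89.4%

which is roughly the 90% headline figure.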

Page 16: Current Status and Plan on Grid at KEK/CRC

Recent Activities
SAGA: A Simple API for Grid Applications
▸ Motivation
▸ Goal
▸ Current Status

Page 17: Current Status and Plan on Grid at KEK/CRC

Grid Deployment at KEK: Middleware/Experiment Matrix

              gLite      NAREGI       Gfarm      SRB      iRODS
Belle         Using      Planning                Using    Using
Atlas         Using
Radiotherapy  Using      Developing                       Planning
ILC           Using      Planning                         Planning
J-PARC        Planning   Planning     Planning            Testing
Super-Belle   To be decided by 2010

▸ Most experiments and federations commonly use gLite as the Grid middleware.
▸ The NAREGI middleware is being deployed as the general-purpose e-science infrastructure in Japan.
▹ Difficulties: e.g. human costs, time differences
▹ Interoperability between the two middlewares is mandatory for us (next few slides)
▪ to provide higher availability and reliability, and to keep production quality


Page 18: Current Status and Plan on Grid at KEK/CRC

Issues with Multi-Middleware Apps
▸ For site admins:
▹ Dedicated HW is deployed for each middleware
▪ LRMS
▪ OS
▸ For end users:
▹ In the ordinary way, the same app must be developed separately for each middleware to be enabled on the Grid
▹ They have to know which middleware they are using.

[Diagram: gLite and NAREGI each with dedicated CPUs and storage, SRB and iRODS each with dedicated storage, and a separate app per middleware. Users must be aware of the underlying middleware layer and the hardware deployed.]

Page 19: Current Status and Plan on Grid at KEK/CRC

Motivation
▸ We need to operate multiple Grid middlewares at the same time.
▹ Resource sharing among them is mandatory
▪ We are also contributing to GIN
▸ Virtualization of Grid middleware is our wish
▹ The best scenario for the application developers

Today's topic: SAGA-NAREGI

[Diagram: applications sit on the SAGA engine and its SAGA adaptors, which front gLite, NAREGI, SRB, and iRODS (CPUs, storage) as well as cloud and LRMS (LSF/PBS/SGE/...); the GIN/PGI multi-middleware layer fair-shares resources among the middlewares.]
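To make the "virtualization of Grid middleware" point concrete, here is a minimal sketch (ours, not from the slides) of how a SAGA application selects the backend purely by the resource-manager URL. The fork:// and naregi:// endpoints below are illustrative assumptions, as is the helper name run_hostname.

// Illustrative sketch only: the same SAGA application code targets
// different middlewares purely through the service URL, whose scheme
// selects the adaptor. Hosts below are hypothetical examples.
#include <string>
#include <saga/saga.hpp>

namespace sja = saga::job::attributes;

// Run /bin/hostname through whichever middleware the URL selects;
// the application code itself stays middleware-agnostic.
void run_hostname(std::string const& rm_url) {
  saga::job::description jd;
  jd.set_attribute(sja::description_executable, "/bin/hostname");

  saga::job::service js(saga::url(rm_url)); // adaptor picked by URL scheme
  saga::job::job j = js.create_job(jd);
  j.run();
  j.wait(); // block until the job reaches a final state
}

int main() {
  run_hostname("fork://localhost");          // local execution, for testing
  run_hostname("naregi://nrgvms.cc.kek.jp"); // NAREGI adaptor (this talk)
  return 0;
}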

Page 20: Current Status and Plan on Grid at KEK/CRC

Project Goal (funded collaboration with NII)

[Diagram: services and apps use the SAGA engine through the C++ interface and the Python binding; SAGA adaptors front gLite, NAREGI, SRB, and iRODS (CPUs, storage) as well as cloud and LRMS (LSF/PBS/SGE/...) for local clusters; alongside sits an RNS FC service based on the OGF standard.]

1. Middleware-transparent layer (GIN/PGI: multi-middleware layer)
2. Middleware-independent services (SAGA supports cloud & LRMS for local clusters)
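As an illustration of the middleware-independent idea on the data side, here is a sketch (ours, assuming file adaptors exist for the URL schemes involved) of data access through the SAGA file package. The source URL reuses the gfarm.cc.kek.jp host from the next slide; the local target path is invented.

// Hypothetical sketch: middleware-independent data access through the
// SAGA file package. URLs and paths are illustrative assumptions.
#include <saga/saga.hpp>

int main() {
  // Open a remote file; the adaptor is selected by the gsiftp:// scheme.
  saga::filesystem::file f(saga::url("gsiftp://gfarm.cc.kek.jp/my/file.in"));

  // Copy it to a local path. The call is identical no matter which
  // storage middleware actually serves the source URL.
  f.copy(saga::url("file://localhost/tmp/file.in"),
         saga::filesystem::Overwrite);
  return 0;
}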

Page 21: Current Status and Plan on Grid at KEK/CRC

Current Status: SAGA-NAREGI Adaptor

▸ Only the job adaptor so far
▸ Succeeded in submitting a job to NAREGI and retrieving the results

#include <iostream>
#include <string>
#include <vector>
#include <unistd.h> // sleep()
#include <saga/saga.hpp>

namespace sja = saga::job::attributes;

// Job description
saga::job::description jd;
jd.set_attribute(sja::description_executable, "/bin/hostname");
jd.set_attribute(sja::description_working_directory, "/some/where/work/dir");
jd.set_attribute(sja::description_output, "std.out");
jd.set_attribute(sja::description_error, "std.err");

// File stage-in/out
std::vector<std::string> ft;
ft.push_back("gsiftp://gfarm.cc.kek.jp/my/file.in > file.in");
ft.push_back("gsiftp://gfarm.cc.kek.jp/my/file.out < file.out");
jd.set_vector_attribute(sja::description_file_transfer, ft);

// Job submission
saga::job::service js("naregi://nrgvms.cc.kek.jp");
saga::job::job j = js.create_job(jd);
j.run();
while (j.get_state() != saga::job::Done) {
  std::cout << j.get_attribute("JobID") << std::endl;
  sleep(1);
}
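One caveat with the loop above: it spins forever if the job ends in a state other than Done. A possible refinement (our sketch, reusing the same j object from the snippet above) polls until any final state:

// Poll until any final state, not only Done.
saga::job::state s = j.get_state();
while (s != saga::job::Done &&
       s != saga::job::Failed &&
       s != saga::job::Canceled) {
  std::cout << j.get_attribute("JobID") << std::endl;
  sleep(1);
  s = j.get_state();
}
if (s != saga::job::Done)
  std::cerr << "Job finished unsuccessfully" << std::endl;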


Page 22: Current Status and Plan on Grid at KEK/CRC


Summary


Page 23: Current Status and Plan on Grid at KEK/CRC

Summary
▸ The Belle VO has been expanding:
▹ 9 institutes and 22 users
▹ 18M SI2K / 9k CPUs
▹ 27 TB available, 83 GB in use through SRMs
▪ The HSM in the KEK Belle Computing System is used through SRB.
▹ KEK-2 will come up very soon
▪ In the final state of passing the certification process
▸ SAGA-NAREGI is ready for use
▹ Only the job adaptor currently
▹ SAGA-PBS is now being developed and will be released soon (in March 2009)
▹ This project has been funded for 3.5 years and ends in March 2012.

Page 24: Current Status and Plan on Grid at KEK/CRC

References
▸ SAGA
▹ http://saga.cct.lsu.edu/
▹ http://forge.ogf.org/sf/projects/saga-rg
▸ VOMS endpoint
▹ https://voms.kek.jp:8443/voms/belle/webui/request/user/create
▸ VO setting parameters
▹ https://voms.kek.jp:8443/voms/belle/webui/config
▸ VO ID card
▹ https://cic.gridops.org/index.php?section=vo&page=homepage&subpage=&vo=belle
▸ VOMS certificate
▹ http://voms.kek.jp/voms.kek.jp.10684
▹ https://cic.gridops.org/downloadRP.php?section=database&rpname=certificate&vo=belle&vomsserver=voms.kek.jp
▸ voms.kek.jp will move to voms.cc.kek.jp at the next power cut, in August 2009.