ALHAD .G. APTE HEAD, COMPUTER DIVISION, BHABHA …EU-IndiaGrid... · HEAD, COMPUTER DIVISION,...

“DAE GRID” (Grid Computing Activities in Department of Atomic Energy, India) ALHAD .G. APTE HEAD, COMPUTER DIVISION, BHABHA ATOMIC RESEARCH CENTER MUMBAI - INDIA


Page 1:

“DAE GRID” (Grid Computing Activities in Department of Atomic Energy, India)

ALHAD .G. APTE

HEAD, COMPUTER DIVISION,

BHABHA ATOMIC RESEARCH CENTER

MUMBAI - INDIA

Page 2:

PRESENTATION OUTLINE

• INTEGRATED PROBLEM SOLVING ENVIRONMENT AT BARC, MUMBAI

• DAE GRID AND REGIONAL DAE-WLCG

• CHALLENGING ISSUES IN DAE GRID

• INTERNATIONAL COLLABORATION

• PRODUCTS DEVELOPED AND DEPLOYED AT LCG

• PARTICIPATION IN EU-INDIA GRID

• CONCLUSIONS

Page 3:

INTEGRATED PROBLEM SOLVING ENVIRONMENT at BARC, Mumbai

Page 4:

INTEGRATED PROBLEM SOLVING ENVIRONMENT at BARC

• HIGH PERFORMANCE COMPUTING SYSTEMS

• PARALLEL FILE SYSTEM

• HIGH RESOLUTION CLUSTER BASED VISUALIZATION SYSTEMS

• SCATTERED RESOURCES BELONGING TO DIFFERENT ADMIN DOMAINS

• GOAL TO PROVIDE SEAMLESS ACCESS TO ALL THESE RESOURCES TO ENSURE OPTIMAL UTILIZATION

Page 5:

HPC Clusters

• ANUPAM – Ameya

512 (256*2) CPUs, 3.6 GHz, Gigabit Ethernet network

HPL Benchmark: 1.73 TFLOPS

• ANUPAM – Ajeya

1152 (288*4) cores, 2.66 GHz, InfiniBand (4x DDR = 20 Gbps)

HPL Benchmark: 9 TFLOPS
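The HPL figures for the two clusters can be put in context by comparing them with a theoretical peak. The sketch below is illustrative only: the flops-per-cycle values (2 for the Ameya CPUs, 4 for the Ajeya cores) are assumptions that do not appear on the slide, and the function names are hypothetical.

```python
def theoretical_peak_tflops(units, clock_ghz, flops_per_cycle):
    """Peak = processing units x clock (GHz) x flops per cycle, in TFLOPS."""
    return units * clock_ghz * flops_per_cycle / 1000.0


def hpl_efficiency(measured_tflops, units, clock_ghz, flops_per_cycle):
    """Fraction of the theoretical peak achieved by the HPL run."""
    return measured_tflops / theoretical_peak_tflops(units, clock_ghz, flops_per_cycle)


if __name__ == "__main__":
    # flops_per_cycle values below are ASSUMED, not taken from the slide.
    ameya = hpl_efficiency(1.73, units=512, clock_ghz=3.6, flops_per_cycle=2)
    ajeya = hpl_efficiency(9.0, units=1152, clock_ghz=2.66, flops_per_cycle=4)
    print(f"Ameya HPL efficiency (assumed peak): {ameya:.0%}")
    print(f"Ajeya HPL efficiency (assumed peak): {ajeya:.0%}")
```

Under these assumptions the InfiniBand cluster reaches a noticeably higher fraction of its peak than the Gigabit Ethernet one, which is the usual pattern for HPL on a faster interconnect.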

Page 6:

Monitoring & Accounting Tools

Page 7:

Rendering Cluster & High Resolution Display

Rendering Cluster

• 1 Master Client

• 18 Servers

• 3.2 GHz dual processor, 2 GB DDR-II RAM

• Graphics Cards

High Resolution Display

• Tiled 6 x 6 LCD Panels

• 47 million pixel resolution

[Figure: a Master Client (Win32/Xlib, Chromium OpenGL, graphics hardware) connected over the NETWORK to Rendering Server 1 through Rendering Server 36, each with graphics hardware driving a Projector/Monitor]
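The total resolution of a tiled display is the panel count multiplied by the per-panel resolution. The slide does not state the panel resolution; the sketch below assumes 1280 x 1024 panels purely for illustration, which happens to give a figure close to 47 million pixels for a 6 x 6 wall.

```python
def wall_pixels(cols, rows, panel_width_px, panel_height_px):
    """Total pixel count of a tiled display wall."""
    return cols * rows * panel_width_px * panel_height_px


if __name__ == "__main__":
    # 1280 x 1024 per panel is an ASSUMPTION, not a figure from the slide.
    total = wall_pixels(6, 6, 1280, 1024)
    print(f"{total / 1e6:.1f} million pixels")
```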

Page 8:

AnuVi

AnuVi is a scientific visualization framework and set of tools for simulation.

– Cross platform

– Scalar / Vector / Tensor / Volume Visualization

– Streamlines

– Contours / Isosurface

– Auto Fill Geometry

– Additional Component Generation

– Geometry Extraction

– Animation

– Scripting

– Image / Movie support

– Compatibility with CGNS / Plot3D / VTK and many more data formats

Page 9:

PILOT DAE-GRID of DEPARTMENT OF ATOMIC ENERGY, INDIA

Page 10:

The DAE-Grid

• DAE units are involved in collaborative activities and need to share expertise and resources

• The DAE-Grid project was initiated by BARC to provide the grid infrastructure to meet the demanding computing needs of scientific researchers

– Enables organizations to share their hardware and software resources

• Four major DAE institutes

– BARC-Mumbai

– RRCAT-Indore

– IGCAR-Kalpakkam

– VECC-Kolkata

• LCG/gLite is the grid middleware used. LCG-2.4 was the initial grid middleware; gLite 3.1 is used now. Around 350 processors are connected through DAE-Grid.

• Low network bandwidth due to high costs; applications use batch job submission.
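To see why low bandwidth pushes the grid towards batch job submission, it helps to estimate how long data movement takes on the links mentioned in this presentation (4 Mbps between DAE sites, 100 Mbps and 1 Gbps on the international links). The sketch below is a back-of-the-envelope calculation; the 1 GB data size and the `utilisation` parameter are illustrative assumptions.

```python
def transfer_seconds(size_gb, link_mbps, utilisation=1.0):
    """Time to move size_gb (decimal gigabytes) over a link of link_mbps.

    utilisation is the fraction of the nominal bandwidth actually achieved
    (an assumed parameter; real links rarely reach 1.0).
    """
    if link_mbps <= 0 or not 0 < utilisation <= 1:
        raise ValueError("link_mbps must be > 0 and utilisation in (0, 1]")
    bits = size_gb * 8e9
    return bits / (link_mbps * 1e6 * utilisation)


if __name__ == "__main__":
    for mbps in (4, 100, 1000):
        minutes = transfer_seconds(1.0, mbps) / 60
        print(f"1 GB over {mbps} Mbps: {minutes:.1f} minutes")
```

At 4 Mbps even a modest input sandbox takes tens of minutes to move, so jobs that ship small inputs and run unattended are a better fit than interactive use.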

Page 11:

DAE-Grid Setup

Resource sharing and coordinated problem solving in dynamic, multiple R&D units

• BARC: Computing

• IGCAR: Data and S/W Repositories

• VECC: Scientific Instruments

• CAT: Storage

• 4 Mbps Links

• Uses WLCG tools

Page 12:

Middleware Services

• Central services, deployed only at BARC

– Certification Authority (CA)

– Virtual Organization Membership Service (VOMS)

– Resource Broker (RB) + MyProxy Server + Top BDII

– LCG File Catalogue (LFC)

– Monitoring & Accounting Server

• All sites deploy the site services, namely

– Computing Element (one CE for every cluster)

– Worker Nodes

– User Interface

[Figure: the User Interface (UI, interface for using the GRID) with Certificates from the Certifying Authority; VOMS (Virtual Organization Membership Server); Resource Broker + MyProxy Server + Top BDII (Workload Management, Proxy renewal, Information System); LFC (File Catalog); Monitoring & Accounting Server; a Storage Element (SE); and a Computing Element (CE) with Worker Nodes (WN)]
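The split between central and site services can be written down as a small lookup table. The sketch below is only a way of restating the slide as data; the names `CENTRAL_SITE`, `SITES` and `services_at` are hypothetical and are not part of LCG/gLite.

```python
CENTRAL_SITE = "BARC"  # central services are deployed only here

CENTRAL_SERVICES = [
    "Certification Authority",
    "VOMS",
    "Resource Broker",
    "MyProxy Server",
    "Top BDII",
    "LCG File Catalogue",
    "Monitoring & Accounting Server",
]

SITE_SERVICES = ["Computing Element", "Worker Nodes", "User Interface"]

SITES = ["BARC", "RRCAT", "IGCAR", "VECC"]


def services_at(site):
    """Return the services expected at a site, per the slide's layout."""
    if site not in SITES:
        raise ValueError(f"unknown site: {site}")
    services = list(SITE_SERVICES)
    if site == CENTRAL_SITE:
        services = CENTRAL_SERVICES + services
    return services


if __name__ == "__main__":
    for site in SITES:
        print(site, "->", ", ".join(services_at(site)))
```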

Page 13:

DAE Grid Site Services

• Computing Element

– Gatekeeper Service (accepts job requests from the Grid)

– PBS (Cluster Resource Management System)

– GRIS, Site BDII (Information System)

– FMON (Monitoring Agents)

• Worker Nodes

– PBS Client

• Storage Element

– DPM Services

– GridFTP, RFIO Services

– FMON Server (Monitoring Server)

• User Interface

[Figure: site layout showing the User Interface (Certificates); the Computing Element (Gatekeeper, Site BDII, GRIS, Information Providers, PBS, FMON agent); the Worker Nodes (PBS client); the Storage Element (GridFTP, RFIO and FMON agent); and the FMON Server]

Page 14:

Grid Portal

Page 15:

DAE-Grid Certification Authority (CA)

• One CA is set up at BARC (LCG/gLite uses GSI).

• The roles defined:

a) CA Administrator b) Registration Authority (RA)

c) Site Manager d) User

• There are two servers, namely the offline CA and the online CA.

• The online CA provides a web interface for users to upload the CSR (Certificate Signing Request) and download the signed certificates.

• The offline CA is used for issuing and revoking certificates and for generating the CRL (Certificate Revocation List).

• DAE-Grid presently has one VO, “DAEGRID”.
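The division of labour between the online and offline CA can be pictured with a toy model. The sketch below performs no real cryptography and is not the DAE-Grid implementation: the class names, the use of integer serial numbers, and the way signed certificates reach the online server are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OnlineCA:
    """Web-facing server: accepts CSRs and serves signed certificates."""
    pending: dict = field(default_factory=dict)  # subject -> CSR text
    signed: dict = field(default_factory=dict)   # subject -> certificate serial

    def upload_csr(self, subject, csr_text):
        self.pending[subject] = csr_text

    def download_certificate(self, subject):
        # Returns None while the request is still waiting to be signed.
        return self.signed.get(subject)


@dataclass
class OfflineCA:
    """Offline server: issues and revokes certificates, generates the CRL."""
    next_serial: int = 1
    revoked: set = field(default_factory=set)

    def issue(self, online, subject):
        online.pending.pop(subject)  # KeyError if no CSR was uploaded
        serial = self.next_serial
        self.next_serial += 1
        online.signed[subject] = serial  # stand-in for a real signed certificate
        return serial

    def revoke(self, serial):
        self.revoked.add(serial)

    def generate_crl(self):
        return sorted(self.revoked)


if __name__ == "__main__":
    online, offline = OnlineCA(), OfflineCA()
    online.upload_csr("CN=example user", "placeholder CSR")
    serial = offline.issue(online, "CN=example user")
    offline.revoke(serial)
    print("CRL serials:", offline.generate_crl())
```

Keeping the signing step on a machine that users never reach directly is the point of the two-server arrangement; the model only captures that separation.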

Page 16:

Monitoring & Accounting Server

• An in-house developed monitoring service that gives the complete state of the grid in a single page (Services, File System, PBS MOM alerts etc.)

• Gives the status of the jobs in queue.

• Job records are collected and graphs generated by the server from APEL (Accounting Processor for Event Logs), which runs on every cluster
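The accounting side amounts to aggregating per-job records into per-site totals that can then be graphed. The sketch below shows that aggregation step only; the record fields (`site`, `cpu_seconds`) are hypothetical and do not reflect the actual APEL record format.

```python
from collections import defaultdict


def summarise(records):
    """Aggregate job records into per-site job counts and CPU hours."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_hours": 0.0})
    for rec in records:
        entry = totals[rec["site"]]
        entry["jobs"] += 1
        entry["cpu_hours"] += rec["cpu_seconds"] / 3600.0
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"site": "BARC", "cpu_seconds": 3600},
        {"site": "BARC", "cpu_seconds": 1800},
        {"site": "VECC", "cpu_seconds": 7200},
    ]
    for site, entry in sorted(summarise(sample).items()):
        print(site, entry["jobs"], "jobs,", round(entry["cpu_hours"], 2), "CPU hours")
```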

Page 17:

High Availability RB

• The RB service failure has the following impact

– New jobs cannot be submitted

– Status of existing jobs cannot be queried

– Jobs which have finished will not be shown as completed until the RB service has been recovered

– Output data from jobs may be lost, since the jobs cannot copy their results to the output sandbox on the RB (a job retries a few times, waiting a random period between attempts, and then gives up)

[Figure: RB Master and RB Standby connected through a Switch; labels: HA Status Packets, State, broker.barc.daegrid.gov.in]
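A master/standby pair that exchanges status packets is commonly driven by a timeout: the standby takes over when it has not heard from the master for some interval. The sketch below illustrates that general idea and is not the actual DAE-Grid HA setup; the class name and the timeout value are assumptions.

```python
class StandbyMonitor:
    """Decides whether the standby RB should be active, based on status packets."""

    def __init__(self, timeout_s, start_time):
        self.timeout_s = timeout_s
        self.last_seen = start_time  # time the master was last heard from
        self.active = False

    def on_status_packet(self, now):
        """Record a status packet received from the master at time `now`."""
        self.last_seen = now

    def tick(self, now):
        """Re-evaluate the standby's role; call this periodically."""
        self.active = (now - self.last_seen) > self.timeout_s
        return self.active


if __name__ == "__main__":
    monitor = StandbyMonitor(timeout_s=10, start_time=0)  # timeout is hypothetical
    monitor.on_status_packet(now=5)
    print("standby active at t=12:", monitor.tick(now=12))
    print("standby active at t=16:", monitor.tick(now=16))
```

Time is passed in explicitly so the logic can be exercised without real clocks or sockets; a deployment would also need to move the service address (here broker.barc.daegrid.gov.in) to the standby.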

Page 18:

Multiple Resource Brokers

[Figure: map of the four sites (RRCAT, Indore, M.P.; VECC, Kolkata, W.B.; IGCAR, Kalpakkam; BARC, Mumbai), each with a Cluster and an RB; legend: Links not being used]

• The current gLite version has inherent support for using multiple resource brokers for a single VO.

• The user job will be directed to the resource broker that is up and running at the time of submission.
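Directing a job to whichever broker is up can be sketched as "try each broker in order and take the first that answers". This is an illustration of the idea, not how gLite implements it: `choose_broker` and `tcp_probe` are hypothetical helpers, the second hostname is a placeholder, and the service port is left as a parameter because it is not given in the slides.

```python
import socket


def tcp_probe(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def choose_broker(brokers, is_up):
    """Return the first broker for which is_up(broker) is true."""
    for host in brokers:
        if is_up(host):
            return host
    raise RuntimeError("no resource broker is reachable")


if __name__ == "__main__":
    brokers = ["broker.barc.daegrid.gov.in", "rb2.example.org"]  # second is hypothetical
    down = {"broker.barc.daegrid.gov.in"}  # pretend the main RB is down
    print(choose_broker(brokers, lambda host: host not in down))
```

In real use `is_up` could be `lambda host: tcp_probe(host, port)` with the broker's actual service port supplied by the site.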

Page 19:

Issues in preparing the backup broker..

• Currently queued jobs need to be shifted

– The Resource Broker maintains the state in a MySQL database and Condor-G maintains the queue

– Putting the jobs in the backup RB's Condor-G queue is not possible.

– Instead, take the state of the jobs and the sandbox directories on the main RB and give them to the backup RB

– The backup RB copies the sandbox directories and separately maintains the state of the jobs that were initially submitted to the main RB

• The DNS mapping needs to be changed for the main RB.

• A client (backup RB) – server (CEs) method is used by the backup RB to regularly update the status of these jobs

• Job management commands such as job status and job cancel need to take this into account automatically.

Page 20:

Issues in preparing the backup broker..

• Main RB comes up

– Get the state of all the jobs (shifted from this RB) from the backup RB

– Update the MySQL state tables and sandbox directories accordingly

• Remove the DNS mapping from the main RB

• This has been tested and is very useful in ensuring the smooth running of jobs in the event of a scheduled switching off of a Resource Broker owing to A/C maintenance or scheduled electrical maintenance.

Issues in automating the above solution (we are currently working on this)

• Maintaining the same Global User DN -> Local User mapping across RBs is difficult.

• Rethinking is needed to completely automate this process (automation, for example of the DNS mapping, is not there yet).
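The hand-over and hand-back described on these two slides can be sketched with plain dictionaries standing in for the MySQL state tables and the sandbox directories. This is a simplified model under stated assumptions, not the tested procedure itself: the data layout and the function names are hypothetical, and the DNS change and user mapping are left out.

```python
import copy


def shift_to_backup(main_rb, backup_rb):
    """Give the main RB's job states and sandboxes to the backup RB."""
    shifted = list(main_rb["jobs"])
    for job_id in shifted:
        backup_rb["shifted_jobs"][job_id] = main_rb["jobs"][job_id]
        # The backup keeps its own copy of each sandbox directory.
        backup_rb["sandboxes"][job_id] = copy.deepcopy(main_rb["sandboxes"][job_id])
    return shifted


def return_to_main(main_rb, backup_rb):
    """When the main RB comes up, update its state from the backup RB."""
    for job_id in list(backup_rb["shifted_jobs"]):
        main_rb["jobs"][job_id] = backup_rb["shifted_jobs"].pop(job_id)
        main_rb["sandboxes"][job_id] = backup_rb["sandboxes"].pop(job_id)


if __name__ == "__main__":
    main_rb = {"jobs": {"job-1": "Running"}, "sandboxes": {"job-1": ["input.dat"]}}
    backup_rb = {"shifted_jobs": {}, "sandboxes": {}}
    shift_to_backup(main_rb, backup_rb)
    # While the main RB is down, the backup tracks progress reported by the CEs.
    backup_rb["shifted_jobs"]["job-1"] = "Done"
    backup_rb["sandboxes"]["job-1"].append("output.dat")
    return_to_main(main_rb, backup_rb)
    print(main_rb)
```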

Page 21:

Regional DAE-WLCG Tier-2 in India

Page 22:

Regional DAE-WLCG Tier-2 in India

[Figure: network map; labels: TIFR: CMS Tier II; VECC/SINP: ALICE Tier II; Tier III, 2 Mbps; 4 Mbps Links; 100 Mbps Link; 34 Mbps Link to Geant; 0.3/1 Gbps link to CERN/Geant; IPLC; NLC]

BARC, IOPB and 14 Universities have been operational since 2007

Page 23:

Tier2: LCG - IndiaCMS - TIFR, Mumbai, India

Network

Network bandwidth was recently upgraded to 1 Gbps

The link between TIFR and CERN has been tested successfully and commissioned. Full utilization is underway

Storage

Current: 50 TB (raw) disk space. HP EVA 8000 system

DPM using SRM is used to provide storage services

Page 24:

ALICE TIER-2 Centre at KOLKATA

Running Services

Service               Processor                  RAM
VOBOX / ALIEN / LFC   Dual Intel Xeon 2.4 GHz    4 GB
CE                    Dual Intel Xeon 3.0 GHz    4 GB
SE                    Dual Intel Xeon 3.0 GHz    4 GB
13 WN                 Dual Intel Xeon 3.0 GHz    4 GB

Bandwidth Status

The 100 Mbps link has been running fine since 14/01/2008 and will soon be upgraded to 155 Mbps.

Page 25:

Participated in development & deployment of Tools in LCG

LEMON Architecture

GRIDVIEW

QUATTOR Architecture

SHIVA

CC Tracker

Page 26:

GridView Architecture

GRIDVIEW: VO-Wise Data Transfer

Page 27:

GridView Screen

GRIDVIEW: Job Status

Page 28:

DAE participation in EU-India Grid

Page 29:

EU-India Grid Objectives

• Quickly set up Grid Infrastructure

– Use production Grid-WLC and connect Indian Grids

– Interoperate with EGEE Euro-Grids

– Contribute to Grid standardisation efforts

• Support applications from diverse communities

– High Energy Physics: DAE units

– Condensed Matter Physics: Pune Univ, TIFR, BARC

– Bio-Sciences: NCBS

– Earth Sciences: CDAC-Pune

– Pilot clusters to users: INFN & VECC

• Business

– E-governance interested business partners: NIC

+ Disseminate knowledge about the Grid through training

Page 30:

DAE AS A PARTNER IN EU-INDIAGRID

• CLOSE INVOLVEMENT IN PROJECT PROCESSES

• COLLABORATION WITH CERN FOR LCG

• DEVELOPMENT IN GRID COMPUTING WITH CONFORMANCE TO gLite

• LEADING ROLE IN EUROPE-INDIA CONNECTIVITY

• COORDINATING WITH GOVERNMENT OF INDIA AGENCIES IN DECISION MAKING PROCESS

• PARTNER IN INDIA’S NATIONAL GRID “GARUDA”

Page 31:

To SUMMARISE

• DAE HAS BEEN ACTIVE IN GRID COMPUTING SINCE 2004

• A PILOT DAE GRID IS OPERATIONAL AND IS BEING ENHANCED

• PARTICIPATION IN WLCG HAS GENERATED MANPOWER EXPERTISE: PRODUCTS DESIGNED, DEVELOPED AND DEPLOYED AT LCG, CERN

• CLOSE PARTICIPATION IN EU-INDIAGRID with Deputy Director from DAE

• IN FUTURE, WE EXPECT TO ACTIVELY PARTICIPATE IN HIGH-END APPLICATIONS DEVELOPMENT AND TAKE UP MIDDLEWARE PROBLEMS

Page 32:

Acknowledgements

P. S. DHEKNE, DY. DIR, EU-INDIAGRID PROJECT, EX-ASSOCIATE-DIRECTOR (E&IG), BARC

& THE DRIVE

R. S. MUNDADA, B. S. JAGADEESH, S. K. BOSE, K. RAJESH

TEAM LEADERS IN SUPERCOMPUTING, GRID COMPUTING & VISUALISATION

R. SHARMA, K. BHATT, PHOOLCHAND, C.S.R.C. MURTHY, DINESH, SONAVANE, VAIBHAV, VINOD

YOUNG DEVELOPERS

Page 33:

LINKED SLIDES

Page 34:

LEMON architecture

Page 35:

QUATTOR ARCHITECTURE

Configuration Management Infrastructure

Node (Cluster) Management

Page 36:

GridView Architecture

[Figure: GridView architecture diagram. Components shown: SAM DB; GRIDVIEW DB; Service Nodes; SAM tests and SAM test results; SAM Framework; Publishing Web Service; R-GMA Archiver Module; Web Service Archiver Module; SAM XSQL Export Module; RBs and SEs (gridftp) with WS Client, RB job logs and GridFTP logs; Fabric Monitoring System at Site (LEMON / Nagios) with HTTP/XML availability metrics; GOCDB and GOCDB Sync Module; Data Analysis & Summarization Module; Visualization Module; Graphs & Reports]