CSCS STORAGE INFRASTRUCTURE - HPC Advisory...

Page 1: CSCS STORAGE INFRASTRUCTURE - HPC Advisory …hpcadvisorycouncil.com/events/2012/Switzerland-Workshop/... · CSCS STORAGE INFRASTRUCTURE CSCS HPS Storage System Engineer Stefano Claudio

CSCS STORAGE INFRASTRUCTURE

CSCS HPS

Storage System Engineer

Stefano Claudio Gorini

CSCS GPFS FS

/users & /apps        /project                   /store
Small size            Very large size            Extreme size
Quota by user         Quota by group             Quota by consortium
As a user exits       Duration of project        As contractually agreed
@ CSCS + 6 months     + 6 months
Normal bandwidth      High bandwidth             High bandwidth
                                                 (if file on disk)
Backed up             Backed up                  HSM
100 GB per user       Capacity requested         Capacity by contract;
                      and justified in a         either matching funds
                      project proposal           or fully paid by customer

PROJECT FS – HW

Data disks, ~2.4 PB (~1.4 PB + ~1 PB):
• 480 SATA 2TB disks
• 480 SATA 2TB disks
• 420 SATA 2TB disks
• 420 SATA 2TB disks

Metadata disks: ~1 TB on SSD card

TSM Storage Agent

Labels in the diagram:
• GLOBAL 118-119, GLOBAL 123-124, GLOBAL 112-113, GLOBAL 116-117
• BERNINA15, BERNINA03, BERNINA04, BERNINA16, BERNINA14, BERNINA13, BERNINA01, BERNINA05, BERNINA02, BERNINA22, BERNINA23, BERNINA25, BERNINA05
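
As a rough cross-check of the disk counts on this slide, the raw capacity of the data disks can be summed and compared with the ~2.4 PB figure. This is only a sketch: it assumes decimal units (1 PB = 1000 TB) and that ~2.4 PB is the usable capacity after redundancy overhead. The slide does not state the RAID layout, and the function name is hypothetical.

```python
# Disk groups as listed on the slide: (number of disks, size in TB).
disk_groups = [(480, 2), (480, 2), (420, 2), (420, 2)]

def raw_capacity_tb(groups):
    """Sum raw capacity in TB over (disk_count, disk_size_tb) groups."""
    return sum(count * size_tb for count, size_tb in groups)

raw_tb = raw_capacity_tb(disk_groups)
raw_pb = raw_tb / 1000   # assumption: decimal units, 1 PB = 1000 TB
usable_pb = 2.4          # "~2.4 PB" from the slide, assumed to be usable space
usable_fraction = usable_pb / raw_pb

print(f"raw: {raw_pb:.2f} PB, usable/raw: {usable_fraction:.0%}")
```

Under these assumptions roughly two thirds of the raw space is available for data, which would be plausible for a parity-protected layout with spares and file system overhead.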

HOME & APPS FS – HW

Data disks: 60 of 120 SATA 2TB disks

Metadata disks: 4 of 64 FC 500GB disks

TSM Storage Agent

Labels in the diagram: GLOBAL 114-115, BERNINA11, BERNINA10, BERNINA24

STORE FS – HW

Data disks, ~2.1 PB:
• 300 SATA 3TB disks
• 300 SATA 3TB disks
• 300 SATA 3TB disks

Metadata disks: ~500 GB on SSD card

TSM Storage Agent

Labels in the diagram:
• ADULA05, ADULA06
• MEDEL01, MEDEL02, MEDEL03, MEDEL04, MEDEL05, MEDEL06, MEDEL07, MEDEL08, MEDEL09, MEDEL10, MEDEL11, MEDEL12, MEDEL13, MEDEL14
• RAMSAN1

GPFS - CNFS

~# mmremotefs show all
Local Name  Remote Name  Cluster name        Mount Point  Mount Options  Automount  Drive  Priority
global      global       global.cscs.ch      /global      rw             yes        -      0
apps        apps         globalhome.cscs.ch  /apps        rw             yes        -      0
users       users        globalhome.cscs.ch  /users       rw             yes        -      0
store       archive      store.cscs.ch       /store       rw             yes        -      0

/global *.cscs.ch(rw,async,no_root_squash)
/users *.cscs.ch(rw,async,no_root_squash)
/apps *.cscs.ch(rw,async,no_root_squash)
/store *.cscs.ch(rw,async,no_root_squash)

Alias used by CNFS: nfs01.cscs.ch, nfs02.cscs.ch, nfs03.cscs.ch, nfs04.cscs.ch

Node labels in the diagram: BERNINA20, BERNINA07, BERNINA21, BERNINA08
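
The remote file system listing above lends itself to simple scripted checks, for example grouping mount points by owning cluster. The following is a minimal sketch, assuming the listing is available as whitespace-separated text and that no field value contains a space. The column keys and the function name `parse_remotefs` are hypothetical and not part of GPFS.

```python
# Column keys derived from the header line of the listing on the slide.
COLUMNS = ["local_name", "remote_name", "cluster_name", "mount_point",
           "mount_options", "automount", "drive", "priority"]

SAMPLE = """\
Local Name Remote Name Cluster name Mount Point Mount Options Automount Drive Priority
global global global.cscs.ch /global rw yes - 0
apps apps globalhome.cscs.ch /apps rw yes - 0
users users globalhome.cscs.ch /users rw yes - 0
store archive store.cscs.ch /store rw yes - 0
"""

def parse_remotefs(text):
    """Parse data rows into dicts; assumes no field contains whitespace."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the header: its labels contain spaces,
        # so it splits into more tokens than there are columns.
        if len(fields) != len(COLUMNS):
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows

remote = parse_remotefs(SAMPLE)

# Group mount points by the cluster that owns the file system.
by_cluster = {}
for row in remote:
    by_cluster.setdefault(row["cluster_name"], []).append(row["mount_point"])

print(by_cluster)
```

The grouping makes it easy to see that /apps and /users are served by the same owning cluster, while /global and /store each come from their own.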

QFS/SAM-FS to GPFS TSM Migration

QFS/SAM-FS to GPFS + TSM/HSM

QFS/SAM-FS to GPFS TSM Migration

1. Snapshot and migration of the metadata

2. Production stays on the old system

3. Bulk migration of data from the snapshot:

• Read tar file from SAM-FS/QFS

• Transfer data over the network using a parallel copy tool

• Data integrity verification after the network transfer, done by checksums that had been taken from the old system before the start of the migration

• Untar to the new GPFS location

4. Transition to production after final synchronization of data

5. After that, clean GPFS/TSM on production without access to the old tapes

Labels in the diagram: SAM-FS, GPFS, HMK tool
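
The integrity check in step 3 can be sketched as follows: checksums recorded on the old system are compared with checksums computed on the transferred copies. This is an illustration only. The slides do not name the checksum algorithm or the tooling, so SHA-256 and the function names `file_checksum` and `verify_transfer` are assumptions.

```python
import hashlib
import os
import tempfile

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large tar files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_transfer(recorded, copied_paths):
    """Return the names whose checksum after transfer differs from the record.

    recorded:     {name: checksum taken on the old system before migration}
    copied_paths: {name: path of the transferred copy}
    """
    return [name for name, path in copied_paths.items()
            if file_checksum(path) != recorded.get(name)]

# Small self-contained demonstration using temporary files.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "archive.tar")
    good_copy = os.path.join(workdir, "good.tar")
    bad_copy = os.path.join(workdir, "bad.tar")
    for path, payload in [(source, b"data"), (good_copy, b"data"), (bad_copy, b"dat4")]:
        with open(path, "wb") as handle:
            handle.write(payload)

    recorded = {"good": file_checksum(source), "bad": file_checksum(source)}
    mismatches = verify_transfer(recorded, {"good": good_copy, "bad": bad_copy})
    print(mismatches)
```

Taking the checksums before the migration starts, as the slide describes, means a corrupted transfer is detected before the tar file is unpacked onto GPFS.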

QFS/SAM-FS to GPFS TSM Migration

FROM 02/09/2011 TO 10/12/2011, MIGRATED:

• ~26M files

• ~650 TB

• Average speed ~7 TB/day

PERFORMANCE WAS DRIVEN BY DEVICE SPEED:

• Tape drive speed (T10000, max. 100 MB/s)

• Network speed (3 Gb/s, limited by a PCI-X card)

• GPFS performance (4 GB/s on the file system used for the migration)
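
The average speed can be cross-checked from the dates and the volume given on this slide, and the device limits can be put into one unit for comparison. The sketch below assumes the dates are written day/month/year and that units are decimal (1 TB = 1,000,000 MB, 1 Gb/s = 125 MB/s).

```python
from datetime import date

# Figures from the slide; dates read as day/month/year (assumption).
start = date(2011, 9, 2)
end = date(2011, 12, 10)
migrated_tb = 650

days = (end - start).days
tb_per_day = migrated_tb / days

# Assumption: decimal units, 1 TB = 1,000,000 MB; 86,400 seconds per day.
avg_mb_per_s = tb_per_day * 1_000_000 / 86_400

# Device limits from the slide, all expressed in MB/s.
tape_drive_mb_per_s = 100           # T10000 maximum, per drive
network_mb_per_s = 3 * 1000 / 8     # 3 Gb/s
gpfs_mb_per_s = 4 * 1000            # 4 GB/s

print(f"{days} days, {tb_per_day:.1f} TB/day, {avg_mb_per_s:.0f} MB/s average")
```

Under these assumptions the daily rate comes out slightly below 7 TB/day, consistent with the rounded figure on the slide, and the sustained average is below the stated maximum of a single tape drive.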

DATA TOPOLOGY

[Bar chart: data volume in GB (axis from 0 to 140,000) per last access time range X (X < 1 month, 1 month < X < 3 months, 3 months < X < 1 year, X > 1 year), split by file size range Y (Y < 1 MB, 1 MB < Y < 100 MB, 100 MB < Y < 1 GB, 1 GB < Y < 10 GB, Y > 10 GB)]

File size range summarized in GB (Y) as a function of last access time (X)
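
A breakdown like the one in this chart can be produced by assigning each file to a size range and a last-access range and summing the volume per pair. The following is a minimal sketch, not the method used for the slide: the function names are hypothetical, units are assumed decimal, a month is taken as 30 days and a year as 365 days, and values that fall exactly on a boundary are placed in the upper range.

```python
from collections import defaultdict

MB, GB = 10**6, 10**9      # assumption: decimal units
DAY = 86_400               # seconds per day

def size_bucket(size_bytes):
    """Map a file size to one of the size ranges (Y) used in the chart."""
    if size_bytes < MB:
        return "Y<1MB"
    if size_bytes < 100 * MB:
        return "1MB<Y<100MB"
    if size_bytes < GB:
        return "100MB<Y<1GB"
    if size_bytes < 10 * GB:
        return "1GB<Y<10GB"
    return "Y>10GB"

def age_bucket(age_seconds):
    """Map time since last access to one of the ranges (X) used in the chart."""
    days = age_seconds / DAY
    if days < 30:          # assumption: 1 month = 30 days
        return "X<1 Month"
    if days < 90:
        return "1 Month<X<3 Months"
    if days < 365:         # assumption: 1 year = 365 days
        return "3 Months<X<1 Year"
    return "X>1 Year"

def data_topology(files, now):
    """Sum volume in GB per (age range, size range).

    files: iterable of (size_bytes, last_access_epoch_seconds)
    """
    totals = defaultdict(float)
    for size_bytes, atime in files:
        key = (age_bucket(now - atime), size_bucket(size_bytes))
        totals[key] += size_bytes / GB
    return dict(totals)

# Hypothetical sample data: three files with made-up sizes and access times.
now = 1_000_000_000
sample = [(500_000, now - 10 * DAY),
          (5 * GB, now - 400 * DAY),
          (20 * GB, now - 400 * DAY)]
result = data_topology(sample, now)
print(result)
```

On a real file system the sizes and access times would come from a metadata scan rather than a hand-written list.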

TSM/HSM

• 3 TSM Servers + 1 Spare
• 6 TSM Storage Agents
• 24 LTO Tape Drives
• 5,719 LTO5 Slots & 5,719 Cartridges: 8.58 PB uncompressed
• Backup / Restore Capacity: 20 x 100 MB/s = 2000 MB/s = 7.2 TB/h
• + 4 drives for Data Management: Reclaim, Copy, Move, DB Backup

Labels in the diagram: HSM & Backup Clients on /project, /store and /users & /apps; GPFS Storage Agents; TSM Servers (Active Libr. Manager, Spare); TSM DB

TSM 6.3
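
The capacity and throughput figures on this slide can be reproduced with a few lines of arithmetic. The sketch assumes decimal units and uses the LTO-5 native capacity of 1.5 TB per cartridge, which is not stated on the slide. The 20 backup drives are the 24 drives minus the 4 reserved for data management.

```python
cartridges = 5719
lto5_native_tb = 1.5              # LTO-5 native (uncompressed) capacity per cartridge
library_pb = cartridges * lto5_native_tb / 1000   # assumption: 1 PB = 1000 TB

total_drives = 24
data_management_drives = 4        # Reclaim, Copy, Move, DB Backup
backup_drives = total_drives - data_management_drives

drive_mb_per_s = 100
aggregate_mb_per_s = backup_drives * drive_mb_per_s
# 3600 seconds per hour; assumption: 1 TB = 1,000,000 MB
aggregate_tb_per_h = aggregate_mb_per_s * 3600 / 1_000_000

print(f"{library_pb:.2f} PB, {aggregate_mb_per_s} MB/s, {aggregate_tb_per_h} TB/h")
```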

Open Issue on GPFS/TSM

MMBACKUP, the GPFS utility that drives backup using the file system policy, does not yet completely cover the TSM warning/error catalog:

“Cannot reconcile shadow database.
Unable to compensate for all TSM errors in new shadow database.
Preserving previous shadow database.
Run next mmbackup with -q to synchronize shadow database. exit 12”

~10 hours to rebuild shadow database

Future Plans

Add a NEW TIERED GPFS “STAGE”:

– SSD & FC DISK

– Data moved across disk groups by GPFS policy

Deploy a complete TSM Replica (TSM 6.3 feature)

Thanks!
