
San Diego Super Computing (SDSC) Testing

Summary and Recommendations

Physical Media Working Group

March 2008


2

PDS MC Policies on Media, Integrity and Backup

• Data Delivery Policy
– Data producers shall deliver one copy of each archival volume to the appropriate Discipline Node using means/media that are mutually acceptable to the two parties. The Discipline Node shall declare the volume delivery complete when the contents have been validated against PDS Standards and the transfer has been certified error free.
– The receiving Discipline Node is then responsible for ensuring that three copies of the volume are preserved within PDS. Several options for "local back-up" are allowed, including use of RAID or other fault-tolerant storage, a copy on separate backup media at the Discipline Node, or a separate copy elsewhere within PDS. The third copy is delivered to the deep archive at NSSDC by means/media that are mutually acceptable to the two parties. (Adopted by PDS MC October 2005)

• Archive Integrity Policy
– Each node is responsible for periodically verifying the integrity of its archival holdings based on a schedule approved by the Management Council. Verification includes confirming that all files are accounted for, are not corrupted, and can be accessed regardless of the medium on which they are stored. Each node will report on its verification to the PDS Program Manager, who will report the results to the Management Council. (Adopted by MC November 2006)
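The verification duties above (all files accounted for, not corrupted, accessible) amount to a periodic manifest-and-checksum sweep. A minimal sketch in Python, assuming a hypothetical manifest mapping relative paths to MD5 digests (the policy does not prescribe a format, so the names here are illustrative):

```python
import hashlib
from pathlib import Path

def md5sum(path, chunk=1 << 20):
    """Stream a file through MD5 so large archive files need little memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify_holdings(root, manifest):
    """Compare on-disk files against a {relative_path: md5_digest} manifest.

    Returns (missing, corrupt) lists for the verification report.
    """
    missing, corrupt = [], []
    for rel, expected in manifest.items():
        p = Path(root) / rel
        if not p.is_file():
            missing.append(rel)       # file not accounted for
        elif md5sum(p) != expected:
            corrupt.append(rel)       # file present but corrupted
    return missing, corrupt
```

Each node could run such a sweep on the MC-approved schedule and fold the two lists into its report to the Program Manager.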


3

Background

• Presented repository survey to MC (August 2007)
– MC resolution to move all data online
– Identified the need for a geographically separate repository as an operational backup
• Began evaluating San Diego Super Computing Center (SDSC)
– Currently managing data storage for a number of science-related programs
– Provides high-speed data exchange and mass storage management
– Very inexpensive at $450/TB/year for near-line storage
– Explore as a secondary storage option for PDS
– SDSC agreed to let PDS evaluate using their beta iRODS software
– Determine if 500 GB/day is a realistic goal for moving data to a secondary repository
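For context, the 500 GB/day goal implies only a modest sustained transfer rate. A quick back-of-the-envelope check (decimal units assumed throughout):

```python
GOAL_BYTES = 500e9        # 500 GB per day, decimal units
SECONDS_PER_DAY = 86_400

# Sustained rate needed to move 500 GB in 24 hours
required_mb_per_s = GOAL_BYTES / SECONDS_PER_DAY / 1e6
print(f"sustained rate needed: {required_mb_per_s:.1f} MB/s")  # ~5.8 MB/s
```

At roughly 5.8 MB/s sustained, the goal sits within the single-digit-MB/s rates reported later in this deck, provided transfers run around the clock.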


4

Timeline

• Fall 2007
– Evaluated iRODS beta software between JPL and SDSC
– Good performance results in moving data
– Captured metrics for different scenarios:
• Size of files transferred
• Number of files transferred
• Time when files are transferred
• Network speed
• Network connection (e.g., 10/100 vs. GigE)
– Minor bugs found
• Decision to wait for a more stable release
• Winter 2008
– New release of iRODS client (February 2008)
– Testing between JPL and SDSC
• Excellent performance (e.g., ~10-30 MBytes/sec)
– Some network problems encountered between JPL and SDSC which required resolution
– Extended test to PPI/UCLA with good results (e.g., transferred 1/2 terabyte over a 15-hour period using real PDS data)
– Extended test to GEO


5

Summary of Testing Results

• Reliability
– Transferring data between JPL and SDSC has been only partially successful, as checksum failures appear randomly
• This appears to be a network routing issue which is being resolved by our network administrators
– Transferring data between PPI and SDSC has not shown any problems
• Performance (using iRODS)
– JPL to SDSC – 0.5 to 5 GByte files = between 6 and 16 MBytes/sec
– PPI to SDSC – 0.4 to 3 GByte files = between 7 and 8 MBytes/sec
– SDSC to JPL – 0.5 to 5 GByte files = ~8 MBytes/sec
– SDSC to PPI – 0.4 to 3 GByte files = ~5 MBytes/sec
• Usability
– Installation and configuration have been straightforward
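One way to read the rates above is as a fraction of raw link capacity (100 Mbit/s ≈ 12.5 MB/s and GigE ≈ 125 MB/s, in decimal units). A small hypothetical helper:

```python
def utilization(rate_mb_s, link_mbit_s):
    """Fraction of raw link capacity an observed transfer rate uses.

    rate_mb_s:    observed transfer rate in MB/s (decimal)
    link_mbit_s:  nominal link speed in Mbit/s
    """
    capacity_mb_s = link_mbit_s / 8   # Mbit/s -> MB/s, decimal units
    return rate_mb_s / capacity_mb_s

# An 8 MB/s transfer against a 100 Mbit link:
print(f"{utilization(8, 100):.0%}")   # 64%
```

By this measure, ~8 MB/s fills about two-thirds of a 100 Mbit link, and any rate above ~12.5 MB/s (such as the 16 MBytes/sec JPL-to-SDSC figure) would imply the GigE path was in use; this is an inference, not something the slides state.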


6

Recommendations

• As more testing builds confidence in using iRODS s/w:
– Bring more Nodes into testing (May 2008):
• More diverse testing environments
• More opportunity to identify PDS-wide systemic problems / areas of concern
– Make final recommendation on using SDSC and options for making it operational (June 2008)


7

Backup Material


8

JPL/EN System Configuration

• JPL/EN Server
– Memory: 2073072k
– CPU: dual 2200.204 MHz processors
– O/S: Red Hat Enterprise Linux ES release 4 (Nahant)
– Hard drive: 73 GB Ultra-160 SCSI drive
– Ethernet card 0: negotiated 100baseTx-FD

• Network Bandwidth
– 100 MBits/sec
– 1000 MBits/sec


9

JPL/PO.DAAC System Configuration

• PO.DAAC Server
– Sun X4100
– 2 x Dual Core AMD Opteron(tm) Processor 285
– 16 GB of memory
– Linux kernel 2.6.9-66.0.12ELsmp
– Red Hat Enterprise Linux ES release 4 (Nahant Update 6)
– Westwood+ turned on

• Network Bandwidth
– 100 MBits/sec
– 1000 MBits/sec


10

PDS/PPI System Configuration

• PDS/PPI Server
– Intel Pentium D 2.8 GHz
– 2 GB RAM
– Kernel ver. 2.6.9-42.ELsmp
– Red Hat Enterprise WS 4 update 4

• Network Bandwidth
– 1000 MBits/sec


11

Repository Directories / Files

• Directory: /tsraid1/ (the 96 directories listed below)
• Size: ~1.0036E12 bytes (~1 TB)

clem1-1-rss-1-bsr-v1.0_s clem1-1-rss-5-bsr-v1.0_s clem1-l_e_y-a_b_u_h_l_n-2-edr-v1.0_s clem1-l-h-5-dim-mosaic-v1.0_s clem1-l-u-5-dim-basemap-v1.0_s clem1-l-u-5-dim-uvvis-v1.0_s eso-j-irspec-3-rdr-sl9-v1.0 eso-j-s-n-u-spectrophotometer-4-v2.0_s eso-j-susi-3-rdr-sl9-v1.0 go-a_c-ssi-2-redr-v1.0_s go-a_e-ssi-2-redr-v1.0_s go-j_jsa-ssi-2-redr-v1.0_s go-j_jsa-ssi-4-redr-v1.0 go-j-nims-2-edr-v2.0 go-v_e-ssi-2-redr-v1.0_s go-v-rss-1-tdf-v1.0_s group_clem_xxxx_m group_dmgsm_100x_m group_dmgsm_200x_m group_go_00xx_m group_go_100x_m group_go_1101_m group_go_110x23_m group_go_110x_m group_gp_0001_m

group_hal_0024_m group_hal_0025_m group_hal_0026_m group_hal_00xx_m group_lp_00xx_m group_mg_0xxx_m group_mg_2401_m group_mg_5201_m group_mgs_0001_m group_mgs_100x_m group_mgsa_0002_m group_mgsl_000x_m group_mgsl_20xx_m group_sl9_0001_m group_sl9_0004_m hst-j-wfpc2-3-sl9-impact-v1.0_s hst-s-wfpc2-3-rpx-v1.0_s ihw-c-lspn-2-didr-crommelin-v1.0 ihw-c-lspn-2-didr-halley-v1.0 irtf-j_c-nsfcam-3-rdr-sl9-v1.0_s iue-j-lwp-3-edr-v1.0 lp-l-rss-5-gravity-v1.0_s lp-l-rss-5-los-v1.0_s mer1-m-pancam-2-edr-sci-v1.0_s

mer1-m-pancam-3-radcal-rdr-v1.0_s mgn-v-rdrs-5-cdr-alt_rad-v1.0_s mgn-v-rdrs-5-dim-v1.0_s mgn-v-rdrs-5-gvdr-v1.0_s mgn-v-rdrs-5-midr-n-polar-stereogr-v1.0 mgn-v-rdrs-5-midr-s-polar-stereogr-v1.0 mgn-v-rss-5-losapdr-l2-v1.0_s mgn-v-rss-5-losapdr-l2-v1.13_s mgs-m-accel-0-accel_data-v1.0_s mgs-m-accel-2-edr-v1.1_s mgs-m-accel-5-profile-v1.2 mgs-m-moc-na_wa-2-dsdp-l0-v1.0_s mgs-m-moc-na_wa-2-sdp-l0-v1.0_s mgs-m-mola-1-aedr-10-v1.0_s mgs-m-mola-3-pedr-ascii-v1.0 mgs-m-mola-3-pedr-l1a-v1.0_s mgs-m-rss-1-cru-v1.0_s mgs-m-rss-1-ext-v1.0_s mgs-m-rss-1-map-v1.0_s mgs-m-rss-1-moi-v1.0_s mgs-m-rss-5-sdp-v1.0_s mgs-m-tes-3-tsdr-v1.0_s mgs-m-tes-3-tsdr-v2.0_s mpfl-m-imp-2-edr-v1.0_s mr9-m-iris-3-rdr-v1.0_s

mr9_vo1_vo2-m-iss_vis-5-cloud-v1.0_s mssso-j-caspir-3-rdr-sl9-stds-v1.0 mssso-j-caspir-3-rdr-sl9-v1.0 mssso-j-caspir-3-rdr-sl9-v1.0_s near-a-grs-3-edr-erosorbit-v1.0 near-a-mag-2-edr-cruise1-v1.0 near-a-mag-2-edr-cruise2-v1.0 near-a-mag-2-edr-cruise3-v1.0 near-a-mag-2-edr-cruise4-v1.0 near-a-mag-2-edr-earth-v1.0 near-a-mag-2-edr-erosflyby-v1.0 near-a-mag-2-edr-erosorbit-v1.0 near-a-mag-2-edr-erossurface-v1.0 near-a-mag-3-rdr-cruise2-v1.0 near-a-mag-3-rdr-cruise3-v1.0 near-a-mag-3-rdr-cruise4-v1.0 near-a-mag-3-rdr-earth-v1.0 near-a-mag-3-rdr-erosflyby-v1.0 near-a-mag-3-rdr-erosorbit-v1.0 vg1-s-rss-1-rocc-v1.0_s vg1-ssa-rss-1-rocc-v1.0_s vg2-s-rss-1-rocc-v1.0_s


12

PDS / SDSC Testing

• Timeline
– 09/07 – JPL/EN writes Test Plan (for testing iRODS s/w)
• Identifies / documents PDS/SDSC architecture
• Identifies the set of parameters to be varied:
– Size of files transferred (MBytes to GBytes)
– Number of files transferred (1 to hundreds)
– Time when files are transferred (peak / low network access periods)
– Network speed (Mbits / Gbits)
– Any other parameters that might affect reliability / transfer speed
• Identifies the set of parameters to be measured:
– Transfer speed (MBytes/sec)
– Reliability (% of transmission failures)
– 09/07 – JPL/EN tested pre-production version of iRODS s/w
• EN testing shows checksum errors on file transfer
• EN & SDSC agree to halt testing until SDSC can provide stable s/w
– 10/07 – JPL/EN captured test results in Test Report
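The two measured parameters in the Test Plan, transfer speed and failure rate, fall directly out of a transfer log. A sketch with a hypothetical record type (`Transfer` and `summarize` are illustrative names, not part of the Test Plan or iRODS):

```python
from dataclasses import dataclass

@dataclass
class Transfer:
    nbytes: int        # size of the file transferred, in bytes
    seconds: float     # wall-clock transfer time
    checksum_ok: bool  # did the post-transfer checksum match?

def summarize(log):
    """Return (mean MB/s over successful transfers, failure fraction)."""
    ok = [t for t in log if t.checksum_ok]
    rate = sum(t.nbytes for t in ok) / sum(t.seconds for t in ok) / 1e6 if ok else 0.0
    failure = 1 - len(ok) / len(log)
    return rate, failure
```

Aggregating over runs that vary one parameter at a time (file size, file count, time of day, link speed) yields exactly the metrics tables reported on the later slides.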


13

PDS/SDSC Testing Configuration

[Architecture diagram, "Base Configuration / Phase 1". Components: PDS Server (32-bit; a rebuild of Starburst) running the iRODS client, with the PDS Archive / PDS Repository mounted; SDSC Data Central Server (Starbase) with the iCAT and Sam-QFS storage in the SDSC Repository; PDS Dev accounts. Links: Internet2/OC12 between JPL and SDSC; local links labeled ~100 MB/s and ~1 GB/s.]

First Tests
1. iRODS client installed at JPL
2. iRODS metadata catalog (iCAT) running on PostgreSQL at SDSC
3. iRODS-managed data transfer from JPL to Sam-QFS at SDSC. We would use parallel I/O to do the transfer, with the goal of moving the terabyte of data within a day. In effect, we would use iRODS to move a file from a disk at JPL to storage at SDSC.
4. iRODS checksums used to validate data integrity
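Steps 3 and 4 combine parallel I/O with checksum validation. A local simulation of that pattern, with plain file copies standing in for iRODS transfers (names such as `parallel_transfer` are illustrative, not the iRODS API):

```python
import hashlib
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def md5sum(path):
    """MD5 of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def transfer_one(src, dst_dir):
    """'Transfer' one file (here, a local copy) and verify its checksum."""
    dst = Path(dst_dir) / Path(src).name
    shutil.copy(src, dst)
    return Path(src).name, md5sum(src) == md5sum(dst)

def parallel_transfer(files, dst_dir, threads=4):
    """Move files concurrently and report per-file checksum results."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(lambda f: transfer_one(f, dst_dir), files))
```

In the real setup, iRODS performs the parallel I/O and checksumming itself; this sketch only shows the shape of the workflow being tested.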



14

PDS / SDSC Testing

• Timeline
– 03/08 – JPL/PO.DAAC begins testing production version of iRODS s/w:
• PO.DAAC varied parameters:
– Server separate from JPL/EN servers
– Same JPL network and network speeds
– File sizes varied from 0.5 MBytes to 17 GBytes
– Single-file transfer; multi-file transfer
– Single-thread transfer; multi-thread (up to 16 threads)
– Network speed (100 Mbits, 1 Gbit)
• PO.DAAC testing indicates random data corruption
• Transfer rates (using 64-bit system):
– PO.DAAC to SDSC; 0.5 GByte file: 20-25 MBytes/sec
– PO.DAAC to SDSC; 17 GByte file: 30 MBytes/sec
– 03/08 – JPL/PO.DAAC tests using iperf s/w:
• Tested iperf between JPL and SDSC (errors detected)
• Tested iperf between Raytheon and SDSC (no errors)
• Tested iperf between UCLA and SDSC (no errors)
• Tested iperf between JPL and UCLA (random errors detected)
• Tested iperf between JPL and SDSC (errors detected)
• Tested iperf between JPL/EN and JPL/PO.DAAC (errors detected)
– 03/08 – Testing indicates a problem within the JPL network
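The inference in the last bullet, that every failing iperf path touches JPL while every clean path avoids it, can be framed as simple set logic over the tested pairs. A sketch, treating JPL/EN and JPL/PO.DAAC as JPL-internal endpoints:

```python
def suspect_sites(results):
    """Sites present in every failing path and absent from every passing one.

    results: iterable of ((site_a, site_b), passed) pairs from link tests.
    """
    failing = [set(pair) for pair, passed in results if not passed]
    passing = [set(pair) for pair, passed in results if passed]
    if not failing:
        return set()
    common = set.intersection(*failing)           # in every failing path
    seen_ok = set().union(*passing) if passing else set()
    return common - seen_ok                       # ...and in no passing path

tests = [
    (("JPL", "SDSC"), False),
    (("Raytheon", "SDSC"), True),
    (("UCLA", "SDSC"), True),
    (("JPL", "UCLA"), False),
    (("JPL", "JPL"), False),   # JPL/EN to JPL/PO.DAAC, both inside JPL
]
print(suspect_sites(tests))    # {'JPL'}
```

The heuristic only narrows the fault to paths through JPL; pinpointing the errant router still took the network administrators' work described on the next slides.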


15

PDS / SDSC Testing

• Timeline
– 03/08 – PDS/PPI begins testing production version of iRODS s/w
• PPI varied parameters:
– Server separate from JPL servers
– UCLA network
• PPI testing shows no data corruption
• Transfer rates:
– From PPI to SDSC (0.5 TBytes; 300 transfers): ~7-8 MBytes/sec
– From SDSC to PPI: TBD MBytes/sec
• Impressions of using iRODS s/w:
– Easy to install and configure
– Decent transfer rates


16

PDS / SDSC Testing

• Timeline
– 03/27/08 – JPL/JPL.NET:
• Identifies a router, outside of JPL and between JPL and SDSC, that is dropping bits
• Re-routes traffic to bypass the errant router
– 03/27/08 – JPL/PO.DAAC re-tests data transfer from JPL to SDSC: no data corruption
– 03/27/08 – PDS/EN asks GEO and IMG/USGS to participate in testing production version of iRODS s/w:
• Emailed GEO and IMG/USGS the SDSC contact information and start-up procedures
– GEO will provide a baseline for iRODS operating on Windows


17

PDS / SDSC Testing

• Timeline
– 02/08 – SDSC releases production version of iRODS s/w
• Version 1.0; more robust / stable version
• Concurrently being tested by other SDSC clients:
– Maryland (not SBN), Wisconsin, and UCLA (not PPI)
– 02/08 – JPL/EN begins testing production version of iRODS s/w
• EN varied parameters:
– 2 different servers using different H/W and OS
– File sizes varied from 500 MBytes to 2 GBytes
– Single-file transfer; multi-file transfer
– Single-thread transfer; multi-thread (up to 16 threads)
– Network speed (100 Mbits, 1 Gbit)
• EN testing shows checksum errors on file transfer
• Transfer rates:
– Using 32-bit system on 2 GB file:
» From JPL to SDSC: ~7.2 MBytes/sec
» From SDSC to JPL: ~8.3 MBytes/sec
– Using 64-bit system on 2 GB file:
» From JPL to SDSC: ~11.5 MBytes/sec
» From SDSC to JPL: ~26.2 MBytes/sec
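Against the Test Plan's goal of moving a terabyte within a day, the 64-bit rates above work out as follows (decimal terabyte assumed; sustained-rate arithmetic only, ignoring per-file overhead):

```python
def hours_per_terabyte(mb_per_s):
    """Hours to move 1 TB (decimal, 1e12 bytes) at a sustained MB/s rate."""
    return 1e12 / (mb_per_s * 1e6) / 3600

print(f"{hours_per_terabyte(11.5):.1f} h")  # ~24.2 h at the 64-bit JPL-to-SDSC rate
print(f"{hours_per_terabyte(26.2):.1f} h")  # ~10.6 h at the 64-bit SDSC-to-JPL rate
```

On these numbers, the 64-bit upload rate sits right at the one-day target, while the download direction clears it comfortably.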