Experience of a low-maintenance distributed data management system
W. Takase (1), Y. Matsumoto (1), A. Hasan (2), F. Di Lodovico (3), Y. Watase (1), T. Sasaki (1)
1. High Energy Accelerator Research Organization (KEK), Japan
2. University of Liverpool, UK
3. Queen Mary, University of London, UK
Contents
• KEK iRODS system
  – Running in production for over 2 years
  – Rules enable files to be stored efficiently
  – Federation with QMUL
• iRODS applications
  – SCALA: Visualization tool for iRODS
  – iRODS XOR-based backup
• Summary
iRODS overview
• Distributed data management system
• Client-server architecture
• Allows data management policies to be enforced on the server side
• Provides an interface to many different types of storage
• Clients can access iRODS via
  – i-commands: command-line utilities
  – iRODS Browser: web interface
KEK iRODS Systems
• 4 iRODS servers
  – RHEL 5.6
  – iRODS 2.5 ⇒ 3.2
  – PostgreSQL 9.1.1
  – In operation for over 2 years
• iRODS zones
  – KEK-T2K
  – KEK-MLF
  – KEKZone
  – demoKEKZone
• Storage resources
  – HPSS (High Performance Storage System)
  – Disk system
Data Management for T2K
• Tokai to Kamioka (T2K) neutrino experiment group
• The experimental data are stored on KEK storage
• The group needed an easy way to quickly access the collected data from outside KEK in order to evaluate its quality
• iRODS provided the solution
http://t2k-experiment.org/wp-content/uploads/t2kmap.gif
Data Management for T2K
• The KEK-T2K zone for the experimental group began operation in October 2010
• Detected data are processed and then transferred to KEK iRODS
• Group members became able to access the stored data easily and quickly via
  – i-commands
  – iRODS Browser
iRODS Rules for KEK-T2K Zone
• Bundle and replicate the data
  – Each experimental data file is small (around several MB)
  – HPSS prefers large files
[Diagram: Client, T2K data server, iRODS server (rodsweb), DB, disk, disk system, HPSS; labels: file, tar file]
iRODS Rules for KEK-T2K Zone
• Response to request
[Diagram: Client, iRODS server (rodsweb), DB, disk, disk system, HPSS, T2K data server; labels: request, tar file, file]
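The two rule slides above describe bundling many small files into a tar file for HPSS and serving a file back on request. The following is a minimal Python sketch of that idea only; it is not the iRODS rule used at KEK, and the function names `bundle` and `extract_one` and the file names are hypothetical.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle(files, tar_path):
    """Pack many small files into one uncompressed tar archive."""
    with tarfile.open(str(tar_path), "w") as tar:
        for f in files:
            # Store each file under its base name inside the archive.
            tar.add(str(f), arcname=Path(f).name)
    return Path(tar_path)


def extract_one(tar_path, name, dest_dir):
    """Pull a single member back out of the bundle, as on a client request."""
    with tarfile.open(str(tar_path), "r") as tar:
        src = tar.extractfile(tar.getmember(name))
        out = Path(dest_dir) / name
        out.write_bytes(src.read())
    return out


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Three small stand-in data files (hypothetical names and contents).
    small_files = []
    for i in range(3):
        p = tmp / f"run{i}.dat"
        p.write_bytes(bytes([i]) * 1024)
        small_files.append(p)

    tar_path = bundle(small_files, tmp / "bundle.tar")
    with tarfile.open(str(tar_path), "r") as tar:
        member_names = sorted(tar.getnames())

    out_dir = tmp / "out"
    out_dir.mkdir()
    restored_bytes = extract_one(tar_path, "run1.dat", out_dir).read_bytes()

print(member_names)
print(len(restored_bytes))
```

The archive is left uncompressed so that a single member can be read back without unpacking the rest, which matches the request path shown in the second diagram.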
Federation with QMUL
• Data replication between the 2 sites
• Each site shares its data with the other
[Diagram: KEK-T2K (experimental data) and QMULZone (analytical data), connected by federation]
SCALA: Visualization tool for iRODS
• Statistical Charts And Log Analyzer
• iRODS lacked an interface for usage statistics and for debugging problems
• We developed a web interface that visualizes an overview of iRODS status
  – Statistical Charts page
  – Log Analyzer page
• SCALA has been installed on KEK iRODS
SCALA Overview
• Input: iRODS outputs (resource usage, log files)
• Output: daily system status visualized as charts
[Diagram: iRODS (resource usage, log files) feeds SCALA (Parse, Summarize, Display), with a parsed table and a summarized table held in a database]
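The parse and summarize stages of the SCALA overview can be sketched as follows. This is an illustration only, not SCALA's code: the log line format `<YYYY-MM-DD> <LEVEL> <message>` is a simplified assumption and does not reproduce the real iRODS log format, and the names `parse` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Assumed, simplified line format: "<YYYY-MM-DD> <LEVEL> <message>".
# Real iRODS server logs differ; adapt the pattern before any real use.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (ERROR|NOTICE) (.*)$")


def parse(lines):
    """Parse step: turn raw log lines into (date, level, message) rows."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the assumed format are skipped
            rows.append(match.groups())
    return rows


def summarize(rows):
    """Summarize step: count ERROR rows per day, ready to chart."""
    return Counter(date for date, level, _ in rows if level == "ERROR")


sample = [
    "2012-05-01 ERROR connection refused",
    "2012-05-01 NOTICE agent started",
    "2012-05-01 ERROR file not found",
    "2012-05-02 ERROR connection refused",
    "unparseable line",
]

parsed_rows = parse(sample)
daily_errors = summarize(parsed_rows)
for day in sorted(daily_errors):
    print(day, daily_errors[day])
```

Keeping the parsed rows separate from the daily summary mirrors the two tables in the diagram: the summary drives the charts, while the parsed rows remain available for drilling down to individual messages.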
Log Analyzer
• Provides an error debugging tool
1. User clicks a bar
2. Error detail is displayed
3. User clicks an error message
4. Related log is displayed
iRODS XOR-based backup
• Full file replication
  – The current method for reliable data storage is to replicate the data
  – If a disk or server fails, a copy still exists
  – Requires much storage space
  – If a portion of a file becomes corrupt, the full file must be replaced
• XOR-based backup
  – Reduces the space required with the same robustness
  – Splits a file into blocks and creates parity blocks
  – If a block becomes corrupt, only the corrupted block must be recreated
XOR-based backup: 100% recovery when any 2 servers fail
Full-file replication uses 3 servers and needs 300 GB
XOR-based backup uses 4 servers but needs only 200 GB
An iRODS rule enables automatic processing
Server1 Server2 Server3 Server4
A B C D
E = B + C
F = C + D
G = A + D
H = A + B
(+ denotes XOR)
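The storage figures on this slide are consistent with a 100 GB file. That file size is an assumption, since the slide does not state it. Writing S for the file size:

```latex
\begin{align*}
\text{Full-file replication (3 copies):}\quad
  & 3S = 3 \times 100\,\text{GB} = 300\,\text{GB} \\
\text{XOR-based backup (4 data blocks + 4 parity blocks, each } S/4\text{):}\quad
  & 8 \times \frac{S}{4} = 2S = 200\,\text{GB}
\end{align*}
```

Under the same assumption, each of the 4 servers holds one data block and one parity block, that is 50 GB per server.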
XOR-based backup: Decoding flow
Server1 Server2 Server3 Server4
A B C D
E = B + C
F = C + D
G = A + D
H = A + B
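The decoding flow can be illustrated with a short Python sketch. It assumes the column layout suggested by the diagram, in which Server1 holds A and E, Server2 holds B and F, Server3 holds C and G, and Server4 holds D and H, and it reads "+" as bytewise XOR. The function names are hypothetical and this is not the iRODS rule itself.

```python
from itertools import combinations

# Each parity block is the XOR of two data blocks, as on the slide.
PARITY = {"E": ("B", "C"), "F": ("C", "D"), "G": ("A", "D"), "H": ("A", "B")}


def xor(x, y):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(a ^ b for a, b in zip(x, y))


def encode(data):
    """Split data into blocks A-D and place data and parity on 4 servers."""
    if len(data) % 4 != 0:
        raise ValueError("data length must be a multiple of 4")
    n = len(data) // 4
    a, b, c, d = (data[i * n:(i + 1) * n] for i in range(4))
    # Assumed placement: one data block and one parity block per server.
    return [
        {"A": a, "E": xor(b, c)},
        {"B": b, "F": xor(c, d)},
        {"C": c, "G": xor(a, d)},
        {"D": d, "H": xor(a, b)},
    ]


def recover(surviving_servers):
    """Rebuild the original data from the blocks on the surviving servers."""
    known = {}
    for server in surviving_servers:
        known.update(server)
    progress = True
    while progress and not all(k in known for k in "ABCD"):
        progress = False
        for parity, (x, y) in PARITY.items():
            if parity not in known:
                continue
            # parity = x XOR y, so a missing operand is parity XOR the other.
            if x in known and y not in known:
                known[y] = xor(known[parity], known[x])
                progress = True
            elif y in known and x not in known:
                known[x] = xor(known[parity], known[y])
                progress = True
    if not all(k in known for k in "ABCD"):
        raise ValueError("not enough surviving blocks to recover the data")
    return b"".join(known[k] for k in "ABCD")


data = bytes(range(16))
servers = encode(data)
for failed in combinations(range(4), 2):
    surviving = [s for i, s in enumerate(servers) if i not in failed]
    print(failed, recover(surviving) == data)
```

The loop at the end tries every pair of failed servers. With this placement, each lost data block can be rebuilt from one surviving parity block and one surviving or already rebuilt data block.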
Summary
• The KEK iRODS system has been running in production for over 2 years
• iRODS gives a way to quickly and easily access data from outside KEK
• The rule that bundles and replicates the data allows files to be stored efficiently
• Federation with QMUL enables the two sites to share their data and back it up
• SCALA is a visualization tool that has been installed on KEK iRODS
  – It leads to better management of the overall iRODS service
• XOR-based backup provides data reliability at lower storage cost than replication
  – An iRODS rule enables automatic processing