UKI-SouthGrid Overview
Pete Gronbech, SouthGrid Technical Coordinator
GridPP 25 - Ambleside, 25th August 2010
Seven(-teen) Sisters
SouthGrid, August 2010
UK Tier 2 reported CPU – Historical View to present

[Chart: monthly reported CPU, Jun-09 to Jul-10, in kSI2K hours ("K SPEC int 2000 hours"), by Tier 2: UK-London-Tier2, UK-NorthGrid, UK-ScotGrid, UK-SouthGrid.]

[Chart: SouthGrid reported CPU over the same period, by site: JET, BHAM, BRIS, CAM, OX, RALPPD.]
SouthGrid Sites: Accounting as reported by APEL
• Sites upgrading to SL5, and recalibration of published SI2K values
Site Resources

Site         HEPSPEC06   CPU (kSI2K)*   Storage (TB)
EDFA-JET           1772            442            1.5
Birmingham         3344            836            166
Bristol            1836            459            110
Cambridge          2268            567            120
Oxford             3564            891            199
RALPPD            12928           3232            633
Totals            25712           6248         1181.5

* kSI2K figures converted from HEPSPEC06 benchmarks.
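Most rows in the table above follow the commonly quoted benchmark conversion of 1 kSI2K ≈ 4 HEP-SPEC06 (i.e. 1 HS06 = 250 SI2K); a minimal sketch of that conversion, using the site figures from the table, might look like this (small rounding differences remain for some rows):

```python
# Sketch of the HEP-SPEC06 -> kSI2K conversion implied by the table above,
# using the widely used factor 1 kSI2K ~= 4 HEP-SPEC06.
# Site HS06 figures are taken from the table; rounding to whole kSI2K is assumed.

HS06_PER_KSI2K = 4.0

def hs06_to_ksi2k(hs06):
    """Convert a HEP-SPEC06 figure to a (rounded) kSI2K figure."""
    return round(hs06 / HS06_PER_KSI2K)

sites_hs06 = {
    "EDFA-JET": 1772, "Birmingham": 3344, "Bristol": 1836,
    "Cambridge": 2268, "Oxford": 3564, "RALPPD": 12928,
}

for site, hs06 in sites_hs06.items():
    print(f"{site:12s} {hs06:6d} HS06 -> {hs06_to_ksi2k(hs06):5d} kSI2K")
```

For example, RALPPD's 12928 HS06 converts to 12928 / 4 = 3232 kSI2K, matching the table.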
GridPP3 hardware generated MoU for 2010, 2011 and 2012

Storage (TB)     2010    2011    2012
bham              179      95     124
bris               22      27      35
cam               108     135     174
ox                203     255     328
RALPPD            364     440     583

CPU (HS06)       2010    2011    2012
bham             1450    2119    2724
bris              661    1173    1429
cam              1148    1445    1738
ox               2034    2483    2974
RALPPD           6499   13109   16515
JET
• Stable operation (SL5 WNs)
• Could handle more opportunistic LHC work
• 1772 HS06, 1.5 TB
Birmingham
• Just purchased 40 TB of storage
– Total storage rises to 10 TB + 6×20 TB + 2×40 TB = 210 TB in a week or two
• Two new 64-bit servers
– (SL5) Site BDII + monitoring VMs
– (SL5) DPM head node
• Everything (except mon) is SL5
• Both clusters have dual lcg-CE/CreamCE front ends
• Sluggish response/instabilities with GPFS on the shared cluster
– Installed a 4 TB NFS-mounted file server for experiment software/middleware/user areas
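The storage arithmetic quoted above can be checked directly; a trivial sketch:

```python
# Quick check of Birmingham's projected storage total:
# existing 10 TB, six 20 TB servers, and two newly purchased 40 TB servers.
existing_tb = 10
server_tb = [20] * 6 + [40] * 2  # capacities of the individual servers, in TB
total_tb = existing_tb + sum(server_tb)
print(total_tb, "TB")  # 210 TB
```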
Birmingham
[Machine-room photo. Taken on someone else's proprietary (non-SL5) smart phone. He couldn't get signal in there either.]
Bristol
• LCG StoRM SE with GPFS, 102 TB, 90% full of CMS data
• StoRM developers are finishing testing 1.5.4 on SL5 64-bit, and plan to provide 1.5.4 for both SLC4 ia32 and SL5 x86_64 to Early Adopters this month (August). Bristol is waiting for a stable, well-tested StoRM v1.5 SL5 64-bit release. In the meantime, Bristol's StoRM v1.3 (32-bit on SL4) is working very well!
• On a 1 Gbps network, getting good bandwidth utilisation; servers (StoRM & GridFTP) very responsive despite load
• Prior WNs: Intel Xeon 2.0 GHz; Dec 2009 new WNs: AMD 2.4 GHz. Each AMD WN has 2 × 1 TB drives, with part of one disk used as WN space
• Dr Metson is experimenting with HDFS using the rest of that disk plus the second disk, and working with INFN on the possibility of StoRM on top of HDFS
• Also experimenting with using Hadoop to process CMS data

In Other News...
• Swingeing IT staff cuts being planned at U Bristol (and downgrades for those few remaining)
• Started planning that SouthGrid will take over Bristol LCG site admin from April 2011
• Consolidate & reduce PP servers so Astro admin can inherit; PP staff will best-effort support the Bristol AFS server (IS won't)

[Diagram: HDFS with StoRM]
Bristol
• Plan to run the CEs and other control nodes on virtual machines, using an identical setup to Oxford, to enable remote management.
• The StoRM SE on GPFS will be run by Bob Cregan on site.
Cambridge
• 32 cores of CPU installed April 2010, bought from GridPP3 tranche 2.
• Server to host several virtual machines (BDII, Mon, etc.) just delivered.
• Network upgraded last November to provide gigabit Ethernet to all grid systems.
• Storage is still 140 TB; CPU will be increased by the purchase in the first point.
• ATLAS production is the main VO running on this site.
• Investigating current under-utilisation; possible accounting issues?
RALPP
• We believe we are now through all the messing about with air conditioning, with our machine room now running on the refurbished/upgraded AC plant. Happy days, all except for the leaks shortly after they turned it on!
• We've been running well below nominal capacity for most of this year, but are pretty much back now.
• Joining with the Tier 1 for the tender process.
• Testing Argus and glexec.
• R-GMA and site BDII now moved to SL5 VMs.
• Working on setting up a test instance of dCache, working with the Tier 1, using Tier 2 hardware.
Oxford
• For the last 6 months the cluster has been running with very high utilisation.
• Completed the tender for new kit and placed orders in July. Unfortunately the orders had to be cancelled due to manufacturing delays on the particular motherboard we ordered and a pricing problem. Now re-evaluating all suppliers with updated quotes.
• New Argus server installed. (Report by Kashif)
– 'Installing Argus was easy and configuring was also OK once I understood the basic concept of policies, but it took me a considerable time because of a bug in Argus which is partly due to the old style of host certificate issued by the UK CA. The same issue was responsible for the GridPP VOMS server problem. I have reported this to the UK CA.'
– Argus uses glexec on the WN; the glexec installed on t2wn41 is being tested.
– Details on the GridPP wiki: http://www.gridpp.ac.uk/wiki/Oxford
• Oxford has become an early adopter for CREAM and Argus.
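As a rough illustration of the 'basic concept of policies' mentioned in Kashif's report: Argus site policies are written in a simplified policy language of resource/action/rule blocks. The fragment below is only a sketch of that shape; the URIs and VO names are illustrative, not Oxford's actual configuration.

```
resource "http://authz-interop.org/xacml/resource/resource-type/wn" {
    obligation "http://glite.org/xacml/obligation/local-environment-map" {}

    action "http://glite.org/xacml/action/execute" {
        rule permit { vo = "atlas" }
        rule permit { vo = "ops" }
    }
}
```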
Grid Cluster setup: CREAM CE & pilot setup

[Diagram: two CREAM CEs, t2ce02 and t2ce06 (gLite 3.2 on SL5), feeding worker nodes t2wn40-87, with t2wn41 glexec-enabled and authorisation via the t2argus02 Argus server.]
Oxford Dashboard
[Screenshot: Oxford's cluster dashboard. Thanks to Glasgow for the idea / code.]

[Screenshot: Oxford's ATLAS dashboard.]
Conclusions
• SouthGrid sites' utilisation generally improving
• Many sites had recent hardware upgrades using the GridPP3 second tranche; others are putting out tenders, with some delays following vendor issues at Oxford
• RALPPD back to full strength following the AC upgrade
• Monitoring for production running improving
• Concerns over reduced manpower at sites as we move into GridPP4
Future Meetings
• Look forward to GridPP 26 in Sheffield next April
• If you look in the right places, the views are as good as here in the Lakes.