Factors affecting ANALY_MWT2 performance

MWT2 team, August 28, 2012




Factors to check

• Storage servers
• Internal UC network
• Internal IU network
• WAN network
• Effect of dCache-locality caching versus WAN direct access
• IU analysis nodes specifically


Individual storage servers

• We have previously measured the performance of each storage node individually with various “blessing tests”
  – Nodes are uct2-s[14] and iut2-s[6]
  – Note xxt2-s[3] are first gen; xxt2-s[4-14] are SAS2 H800
• Each storage node is over-provisioned for CPU and memory (96G), even while running dCache services and Xrootd-overlay
• Each node has a single 10G NIC, a potential bottleneck
  – Some of the s-nodes have an additional 10G port that could be cabled and bonded
• Currently only UC and IU have storage, and since all accesses are local, no analysis jobs run at UIUC at present
  – UIUC will add 300TB this fall


Typical Storage node


Storage Network utilization at UC

An hour sample. IO is nicely spread over the servers, with no obvious bottlenecks or hot spots.


Storage Network utilization UC (week)

Over the past week. More or less the same: good spread, 100-300 MB/s continuously per system.
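As a rough sanity check on the per-system figure, the sketch below converts the 100-300 MB/s range into a fraction of a single 10G NIC. It assumes decimal units (1 Gb/s = 1000 Mb/s, 8 bits per byte) and ignores protocol overhead; the function name is illustrative and not part of any MWT2 tooling.

```python
def nic_utilization(throughput_mb_per_s, nic_gbit_per_s=10.0):
    """Return throughput as a fraction of the NIC line rate.

    Assumes decimal units and no protocol overhead.
    """
    line_rate_mb_per_s = nic_gbit_per_s * 1000.0 / 8.0  # 10 Gb/s -> 1250 MB/s
    return throughput_mb_per_s / line_rate_mb_per_s


# The two ends of the range quoted on the slide.
for rate in (100, 300):
    print(f"{rate} MB/s is {nic_utilization(rate):.0%} of a 10G NIC")
```

Under these assumptions the sustained load is roughly 8% to 24% of line rate per server. This is an average picture only and says nothing about short bursts.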


UC Network

This link is now 2x10G


UC Network – Bottlenecks (1)

• PC8024F and PC6248 stack
  – Cacti (guest/cacti): http://www.mwt2.org/cacti/graph.php?action=view&local_graph_id=3581&rra_id=all
  – Last week: there are moments of saturation (3)


UC Network (2)

• The link between the PC6248 and the Cisco 6509 is 2x10G bonded
  – http://www.mwt2.org/cacti/graph.php?action=view&local_graph_id=3757&rra_id=all
• Last week: looks fine, mostly below a single 10G


WAN network from UC to IU, UIUC, and BNL


WAN Network

Last week’s IO to UC: green is usually FTS transfers from BNL; blue is IO mostly to IU, with some to UIUC. This is for one of the 10G NICs from the 6509 to the campus core. (There is a second NIC to the campus core, and a bonded plot, neither of which I can find in our Cacti hierarchy at the moment.)


Local network @ IU



IU local network

• Storage nodes are each connected via a 10G link to the 6248 switch stack
• Compute nodes are connected to the same stack via 1G connections
• The 6248 switch has a dual-10G uplink to rtsw2, the 100G Brocade switch
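The link speeds above imply a simple oversubscription picture, sketched below. It assumes the worst case in which every compute node drives its 1G link at full rate, which real analysis jobs rarely do; the function name is illustrative.

```python
def clients_to_saturate(link_gbit_per_s, client_gbit_per_s=1.0):
    """Number of full-rate clients whose combined traffic fills a link."""
    return link_gbit_per_s / client_gbit_per_s


# One storage node's 10G NIC versus 1G compute nodes.
per_storage_nic = clients_to_saturate(10.0)

# The dual-10G uplink from the 6248 stack, assuming ideal bonding.
per_uplink = clients_to_saturate(2 * 10.0)

print(f"{per_storage_nic:.0f} compute nodes can fill one storage NIC")
print(f"{per_uplink:.0f} compute nodes can fill the bonded uplink")
```

In this worst case, 10 compute nodes reading from the same storage server would fill its NIC, and 20 would fill the uplink. Whether that happens in practice depends on how jobs are spread over servers.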


IU-centric WAN picture

This picture does not show connectivity to the other MWT2 sites; to UC, the connection is currently through MREN.


IU WAN network

• WAN traffic last month (peaks up to 8 Gbps)

There are 100 Gbps links to Chicago available, though not all the way to UIUC and UC


WAN direct access versus dCache-locality mode caching

• We believe that turning on dCache-locality caching has reduced the load on the WAN

dCache-locality was turned on circa 8/6/2012 (~ Week 31)


dCache locality mode

• Cache hit rate is about 75%
• Individual files are used an average of 4 times
• The average transfer reads 25% of the file
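The first two numbers are consistent with each other under a simple model, sketched below. The model is an assumption, not something stated on the slide: each file is fetched over the WAN exactly once, stays in the cache, and every later use is a hit.

```python
def expected_hit_rate(uses_per_file):
    """Hit rate when only the first access to each file is a miss.

    Assumes no eviction and that every file is used the same number
    of times; real access patterns will be more uneven.
    """
    if uses_per_file < 1:
        raise ValueError("a cached file must be used at least once")
    return 1.0 - 1.0 / uses_per_file


# Four uses per file means one miss followed by three hits.
print(f"{expected_hit_rate(4):.0%}")
```

With an average of 4 uses per file, this model gives 3 hits out of 4 accesses, which matches the reported hit rate of about 75%.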


dCache Site Caches

Figure: IU pools, UC pools, and cached data


Low efficiency at IU

• Slow jobs are not associated with a particular data server
• According to strace, most system time is spent in the munmap system call
• Jobs are slow even on a completely empty node
• Data is cached at IU
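One way to get a per-syscall breakdown like the one referred to above is the summary table printed by `strace -c`. The sketch below parses that table and ranks syscalls by share of system time. The slides do not say how strace was actually run, and the sample table is synthetic, made up purely to exercise the parser; it is not measured MWT2 data.

```python
def parse_strace_summary(text):
    """Parse `strace -c` output into (syscall, pct_time, seconds, calls) rows,
    sorted by percentage of time, largest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank lines and anything too short to be a data row
        try:
            pct = float(fields[0])
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            continue  # header and separator lines
        name = fields[-1]
        if name == "total":
            continue  # skip the summary row
        rows.append((name, pct, seconds, calls))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Synthetic sample in the `strace -c` layout (values are invented).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.600000         300      2000           munmap
 25.00    0.250000          50      5000           read
 15.00    0.150000          75      2000        10 open
------ ----------- ----------- --------- --------- ----------------
100.00    1.000000                  9000        10 total
"""

for name, pct, seconds, calls in parse_strace_summary(SAMPLE):
    print(f"{name:10s} {pct:6.2f}% {seconds:.6f}s {calls} calls")
```

Comparing such a ranking between a slow IU node and a normally performing node would show whether munmap dominates only on the slow one.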


Summary and improvements

• Add second 10G to 8024F-6248 link at UC and bond
• Cable up second 10G port for uct2-s[11-14]
  – Will need to add another 8024F, which will also require further trunking rearrangement
• Adding additional storage nodes will increase the number of IO channels, decreasing single-node contention