Report from HEPiX 2012: Network, Security and Storage. david.gutierrez@cern.ch, Geneva, November 16th.

Page 1:

Report from HEPiX 2012:

Network, Security and Storage

david.gutierrez@cern.ch, Geneva, November 16th

Page 2:

Network and Security

Network traffic analysis

Updates on DC Networks

IPv6

Cyber-security updates

Federated Identity Management for HEP

Page 3:

Network Traffic Analysis (i)

• At IHEP: Developed a solution for large-scale network flow data analysis that benefits from the parallel processing power of Hadoop.

Probes analyze network traffic at the border routers for HEP experiments and produce flow information: who talks to whom, which protocols, for how long, how many bytes exchanged… Around 20 GB of netflow data per day.

Data processing, analysis and storage are based on Hadoop; data representation is based on RRD and Highstock.
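The core of such a flow-analysis job is a map/reduce aggregation: map each flow record to a conversation key, then sum bytes per key. A minimal pure-Python sketch of that pattern (the record fields and values are invented here, not taken from the IHEP system):

```python
from collections import defaultdict

# Illustrative flow records: (src, dst, protocol, bytes). These are made-up
# sample values, not real netflow data.
FLOWS = [
    ("10.0.0.1", "10.0.0.2", "tcp", 1200),
    ("10.0.0.1", "10.0.0.2", "tcp", 800),
    ("10.0.0.3", "10.0.0.2", "udp", 500),
]

def map_phase(record):
    """Emit (key, value) pairs, as a Hadoop mapper would."""
    src, dst, proto, nbytes = record
    yield (src, dst), nbytes

def reduce_phase(pairs):
    """Sum bytes per (src, dst) conversation, as a Hadoop reducer would."""
    totals = defaultdict(int)
    for key, nbytes in pairs:
        totals[key] += nbytes
    return dict(totals)

totals = reduce_phase(pair for rec in FLOWS for pair in map_phase(rec))
print(totals)  # {('10.0.0.1', '10.0.0.2'): 2000, ('10.0.0.3', '10.0.0.2'): 500}
```

In a real Hadoop deployment the same mapper/reducer pair runs in parallel over HDFS-resident flow files; the per-day 20 GB volume is what makes that parallelism worthwhile.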

Page 4:

Network Traffic Analysis (ii)

• At CNRS/IN2P3: Developed ZNeTS, which records all network flow information in a database and analyzes it to detect and report resource misuse (p2p), scans, attacks, etc. Flow data is used to produce traffic statistics and views that zoom from global network usage down to host detail.

Why? French legislation requires connectivity providers to store, for one year, the data needed to identify users.

ZNeTS is free for French institutions (21 instances inside IN2P3, 50 outside)

Page 5:

Network Traffic Analysis (iii)

• At DESY: Use of Open Source tools to help system administrators monitor, troubleshoot and detect anomalies.

nfdump tools to capture and store network flow information. nfsen and nfsight to process, analyze and visualize flow data.

Netdisco: extracts information from network devices about hosts, switch connections, ARP tables, etc., and uses it for troubleshooting and host inventory.

Page 6:

Updates on DC Networks

• At CERN: Changes to the DC network to accommodate the expected bandwidth demand growth, new services for Wigner and the LHC upgrade:

Migration from Force10 to Brocade, with an upgrade to a 5.2 Tbps switching fabric for the LCG
Upgrade to 100 Gbps of the LCG router interconnects (60 ports)
Firewall capacity doubled: 60 Gbps total, 16 Gbps stateful
Géant access upgraded from 20 Gbps to 40 Gbps
Network architecture for Wigner

Page 7:

HEPiX IPv6 Working Group

HEP Site IPv6 Survey (WLCG and other HEP sites, 42 replies):

15 sites IPv6 enabled, providing DNS, web, email, CAs, Windows domain
10 sites plan deployment within the next 12 months
Other sites: one proposed a new, simpler architecture for IPv6!
“So far, there have been no reported requirements or requests from experiments or collaborations for IPv6”
In general, end systems and core networks are ready; applications, tools and sites are not

HEPiX IPv6 Testing:

IPv6 readiness will be documented and maintained for all “assets” used by sites and the LHC experiments. Results of this summer's tests:
GridFTP, globus_url_copy, FTS and DPM
OpenAFS, dCache, UberFTP, SLURM, Torque…

Future plans: tests on the production infrastructure involving Tier 1 centres; plan HEP IPv6 days

Observations: MUCH work to be done, and effort (volunteers) difficult to find

Page 8:

IPv6 Updates (i)

• Deployment at IHEP: Strong forces driving the IPv6 deployment: in China, IPv6 has better available bandwidth and is free. One example: tunnelling IPv4 over IPv6 with USTC for HEP traffic

Dual-stack campus network and 10 Gbps to CNGI
Infrastructure monitoring with Cacti and a patched Nagios
Address assignment: DHCPv6 for the DC, SLAAC for users
Open Source firewall and IDS ready; working on traffic analysis and anomaly detection

• Observations: IPv6 traffic is mostly video/IPTV; data transfers are IPv4. Moving HEP traffic to IPv6 is in the work plan
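With SLAAC, a host derives its interface identifier from its MAC address (modified EUI-64) and prepends the advertised prefix. A minimal sketch of that derivation, using an invented example MAC and the RFC 3849 documentation prefix rather than real IHEP addresses:

```python
import ipaddress

def slaac_address(prefix: str, mac: str) -> str:
    """Build the modified EUI-64 IPv6 address SLAAC would derive from a MAC."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                                # flip the universal/local bit
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]   # insert ff:fe in the middle
    iid = int.from_bytes(bytes(eui64), "big")
    net = ipaddress.IPv6Network(prefix)
    return str(net.network_address + iid)

# Example MAC and documentation prefix, not real IHEP values.
print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
# → 2001:db8::211:22ff:fe33:4455
```

This predictability from the MAC is also why many deployments prefer DHCPv6 (as IHEP does for the DC): addresses are then assigned and logged centrally.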

Page 9:

IPv6 Updates (ii)

• Deployment at CERN

Testing of network devices: completed
IPv6 testbed for CERN users: available
New LANDB schema: in production
Addressing plan in LANDB: in production
Provisioning tools: ongoing
Network configuration: ongoing
User interface (network.cern.ch): ongoing
Network services (DNS, DHCPv6, Radius, NTP): ongoing
User training
IPv6 service ready for production: 2013Q2

(The slide shows these as a timeline from 2011Q2 to today, with milestones at 2011Q3 and 2012Q1 and production targeted for 2013Q2.)

Page 10:

IPv6 Updates (iii)

• Testbed at FZU: Monitoring of the dual-stack infrastructure using two Nagios instances: one IPv4-only and one IPv6-only

Smokeping used to measure GridFTP latency and RTT between FZU and the HEPiX testbed, with similar results for IPv4 and IPv6
PXE over IPv6 not supported by manufacturers
Network equipment supports IPv6 hardware switching, but very few devices support management via IPv6
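A Smokeping-style comparison boils down to summarizing repeated RTT probes per address family into a median and a loss fraction. A toy version of that summary, over invented sample data (not the actual FZU/HEPiX measurements):

```python
from statistics import median

def summarize(samples):
    """Return (median RTT in ms, loss fraction) for a list of probe results,
    where None marks a lost probe - the two numbers Smokeping plots."""
    lost = samples.count(None)
    ok = [s for s in samples if s is not None]
    return median(ok), lost / len(samples)

# Invented RTT samples (ms); None = lost probe. Not real measurement data.
ipv4 = [21.0, 20.5, 22.1, 20.9, None]
ipv6 = [21.3, 20.8, 22.0, 21.1, 21.2]

m4, loss4 = summarize(ipv4)
m6, loss6 = summarize(ipv6)
print(f"IPv4 median {m4:.2f} ms, loss {loss4:.0%}")
print(f"IPv6 median {m6:.2f} ms, loss {loss6:.0%}")
```

"Similar results for IPv4 and IPv6" in the FZU report means exactly this: medians and loss rates of the two families track each other over time.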

Page 11:

Cyber-security update (i)

(Comic from http://www.bizarrocomics.com)

Page 12:

Cyber-security update (ii)

• Our full dependence on digital services, and the fact that we use interconnected accounts (Apple, Google, Amazon, …), make the security of our data depend on the weakest account.

• Vulnerability market shift: it is more profitable to sell vulnerabilities on the black market than to publish them or sell them to vendors. And you can get an offer from a government.

• Windows, Linux or Mac OS? All of them are more or less equally affected by malware.

• Latest vulnerabilities:

– Java ‘0-day’ in 1.6 and 1.7 on various OS (CVE-2012-4681, patched)
• Disable Java in your browser if you don’t need it

– Internet Explorer 6 to 9 (CVE-2012-4969, patched)
• Ummm, write your own browser?

Page 13:

Federated Identity Management for HEP

“A framework to provide researchers with unique electronic identities, authenticated in multiple administrative domains and across national boundaries, that can be used together with community-defined attributes to authorize access to digital resources”

• A collaboration called Federated IdM for Research (FIM4R) was started in 2011. Requirements have been documented and prioritized.

• Plan: establish a pilot as a proof of concept for the architecture design and integration with WLCG

– The WLCG FIM pilot project started in Oct 2012, led by Romain Wartel (CERN), to build a service enabling access to WLCG resources using federated credentials issued by home institutes.

Page 14:

Storage

The Lustre File System

Cloud Storage and S3

Tier1 Storage

Page 15:

The Lustre File System (i)

• At IHEP: 3 PB Lustre FS for detector raw data, reconstruction data, analysis results, public group data and personal user storage

50 OSSs, 500 OSTs, 8k-core cluster, 10 Gbit Ethernet
1k clients, 0.2 billion files

• At GSI: Phasing out the old cluster in favour of a new 1.4 PB system for HPC

50 OSSs, 200 OSTs, cluster of 8.6k cores, QDR InfiniBand
500 clients, 110M files
Teralink project: outside institutes connect to the storage via IB-Ethernet LNET gateways

Page 16:

The Lustre File System (ii)

Pros:

HEP jobs follow I/O patterns preferred by Lustre
I/O performance and linear scalability with the number of OSSs
Stability

Cons:

Central metadata server limits scalability
Difficult to back up and recover metadata and data
Poor performance with lots of small files
Some recurring bugs requiring upgrades

Page 17:

Cloud Storage and S3 (i)

• CERN Cloud Storage Evaluation. Points of interest:

Can we run cloud storage systems to complement or consolidate existing storage services?
Are price, performance and scalability comparable to current CERN services?
The S3 (Simple Storage Service) protocol could be a standard interface for access, placement or federation of data, allowing storage services to be provided without changes to user applications

Focus on two S3 implementations at PB scale: OpenStack/Swift, and an Openlab collaboration with Huawei

Preliminary results:

Client performance of local S3-based storage solutions looks comparable with current production solutions
Achieved the expected stability and aggregated performance (Huawei)
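Part of what makes S3 a candidate standard interface is that the protocol, including its request signing, is specified independently of any one backend. As a sketch, here is a pre-signed GET URL built with the Signature Version 2 query-string scheme in use in the 2012 era; the endpoint, bucket and credentials are placeholders, not real CERN or Huawei values:

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def presign_v2(endpoint, bucket, key, access_key, secret_key, expires):
    """Build an S3 pre-signed GET URL using Signature Version 2
    (query-string authentication): sign 'GET\\n\\n\\n<expires>\\n/<bucket>/<key>'
    with HMAC-SHA1 and attach the base64 signature as a query parameter."""
    string_to_sign = f"GET\n\n\n{expires}\n/{bucket}/{key}"
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return (f"{endpoint}/{bucket}/{key}"
            f"?AWSAccessKeyId={access_key}&Expires={expires}"
            f"&Signature={quote(signature, safe='')}")

# Placeholder endpoint and credentials for illustration only.
url = presign_v2("https://s3.example.org", "lhc-data", "run1/events.root",
                 "AKEXAMPLE", "secret", 1350000000)
print(url)
```

Any client that can fetch a URL can then read the object, which is the "no change to user applications" property the evaluation highlights.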

Page 18:

Cloud Storage and S3 (ii)

• Huawei Massive Storage: nano-scale servers: cell-phone (ARM) processors with one disk each

Spread the data to scale performance linearly
Map S3 onto a distributed hash table of disk keys stored in the nodes
Data chunked into MBs, protected with erasure coding (EC) and stored at pseudo-random locations
1 EB design goal. Current status: a 384-node, 0.8 PB system at CERN
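The key-to-disk DHT described above can be sketched as a consistent-hash ring: each chunk key hashes to a point on the ring and lands on the first node clockwise from it, which spreads chunks pseudo-randomly and scales with the node count. This is a generic illustration of the technique, not Huawei's actual placement algorithm; the node names and chunk keys are invented:

```python
import hashlib
from bisect import bisect

class HashRing:
    """Toy consistent-hash ring: each chunk key maps to the first node
    clockwise from its hash point. Virtual nodes smooth the distribution."""
    def __init__(self, nodes, vnodes=64):
        self.ring = sorted(
            (self._h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.points = [p for p, _ in self.ring]

    @staticmethod
    def _h(s):
        # 64-bit position on the ring, derived from SHA-1
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    def node_for(self, key):
        i = bisect(self.points, self._h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing([f"node{i}" for i in range(8)])      # invented node names
chunks = [f"run1/events.root:chunk{i}" for i in range(4)]
print({c: ring.node_for(c) for c in chunks})          # pseudo-random placement
```

Because placement is a pure function of the key, any node can locate any chunk without a central metadata server, which is what makes the EB-scale design goal plausible.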

• Mucura: bringing cloud storage to your desk. Exploratory project by IN2P3 and IHEP to develop an open source software system that provides personal space in a cloud

Interaction with remote files is the same as with local files
The system provides significantly more storage than is locally available on your personal computer (starting with a few hundred GB)
Targeted at the HEP user community, excluding I/O-intensive applications

Page 19:

Next Generation T1 Storage at RAL

Today CASTOR is used for tape and disk-only storage. RAL is evaluating alternatives for disk-only, aiming for production for the 2014 data run:

This would be the only remaining disk-only use case and depends on CASTOR development; CERN is moving towards EOS
Dependence on (expensive) Oracle
The Nameserver is a SPoF
IPv6 not on the roadmap

Based on a long list of MUSTs and SHOULDs, selected for evaluation: dCache, CEPH, HDFS, OrangeFS and Lustre

Tests include IOzone, read/write throughput (file/GridFTP/xroot), deletion, draining and fault tolerance.

Tests are ongoing; so far CASTOR (well tuned!) is the most performant in some of the tests and not far off in the others.
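The kind of sequential-throughput number IOzone reports can be sketched as a timed write and read of a file. This is a toy stand-in for one such test, not the RAL methodology: the file size here is tiny so the sketch runs anywhere, whereas real benchmarks use files larger than RAM to defeat page-cache effects:

```python
import os
import tempfile
import time

def throughput_mb_s(size_mb=8, block_kb=256):
    """Time a sequential write (fsync'd) and read of a temporary file;
    return (write MB/s, read MB/s). The read may be served from cache."""
    block = os.urandom(block_kb * 1024)
    nblocks = size_mb * 1024 // block_kb
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        t0 = time.perf_counter()
        for _ in range(nblocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())          # include the flush to disk in the timing
        write_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass
    read_s = time.perf_counter() - t0
    os.unlink(path)
    return size_mb / write_s, size_mb / read_s

w, r = throughput_mb_s()
print(f"write {w:.0f} MB/s, read {r:.0f} MB/s")
```

The RAL comparison runs this class of measurement per candidate filesystem, alongside the GridFTP/xroot, deletion, draining and fault-tolerance tests listed above.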

Page 20:

Thank you