Cyberinfrastructure and Networks: The Advanced Networks and Services Underpinning the Large-Scale Science of DOE's Office of Science

William E. Johnston, ESnet Manager and Senior Scientist, Lawrence Berkeley National Laboratory

Transcript of the presentation (50 slides)

Page 1: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

1

Cyberinfrastructure and Networks: The Advanced Networks and Services Underpinning the Large-Scale Science of DOE's Office of Science

William E. Johnston ESnet Manager and Senior Scientist

Lawrence Berkeley National Laboratory

Page 2: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

ESnet Provides Global High-Speed Internet Connectivity for DOE Facilities and Collaborators (ca. Summer, 2005)

(Network map showing the ESnet IP core – a Qwest packet-over-SONET optical ring and hubs – the ESnet Science Data Network (SDN) core, the end user sites, the peering points, and the international connections.)

• 42 end user sites: Office of Science sponsored (22), NNSA sponsored (12), joint sponsored (3), laboratory sponsored (6), other sponsored (NSF LIGO, NOAA)

• Link types range from 45 Mb/s and less, OC3 (155 Mb/s), OC12 ATM (622 Mb/s), and OC12 / GigEthernet, through the 2.5 Gb/s IP core, up to the 10 Gb/s IP core, the 10 Gb/s SDN core, MAN rings (≥ 10 Gb/s), and high-speed international links

• Commercial and R&E peering points include MAE-E, PAIX-PA, Equinix, Starlight, and the Chicago NAP, plus high-speed peering points with Internet2/Abilene and the MAX and SoX GigaPoPs

• International connections include CERN (USLHCnet, CERN+DOE funded), GEANT (France, Germany, Italy, UK, etc.), SINet (Japan), CA*net4 (Canada), GLORIAD (Russia, China), Kreonet2 (Korea), AARNet (Australia), TANet2/ASCC (Taiwan), SingAREN, Russia (BINP), MREN, the Netherlands, and StarTap

Page 3: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

3

DOE Office of Science Drivers for Networking

• The role of ESnet is to provide networking for the Office of Science Labs and their collaborators

• The large-scale science that is the mission of the Office of Science is dependent on networks for

o Sharing of massive amounts of data

o Supporting thousands of collaborators world-wide

o Distributed data processing

o Distributed simulation, visualization, and computational steering

o Distributed data management

• These issues were explored in two Office of Science workshops that formulated networking requirements to meet the needs of the science programs (see refs.)

Page 4: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

CERN / LHC High Energy Physics Data Provides One of Science's Most Challenging Data Management Problems (CMS is one of several experiments at LHC)

(Diagram of the tiered data model, courtesy Harvey Newman, CalTech: the CMS detector feeds the online system at ~PByte/sec; event reconstruction at the Tier 0+1 center at CERN receives ~100 MBytes/sec; Tier 1 regional centers – the FermiLab, USA regional center and the French, German, and Italian regional centers – receive data at 2.5-40 Gbits/sec; Tier 2 centers, which handle event simulation and analysis, connect at ~0.6-2.5 Gbps; Tier 3 institutes (~0.25 TIPS) connect at 100 - 1000 Mbits/sec; Tier 4 consists of workstations, physics data caches, and human analysts.)

• CERN LHC CMS detector: 15m x 15m x 22m, 12,500 tons, $700M

• 2000 physicists in 31 countries are involved in this 20-year experiment in which DOE is a major player

• Grid infrastructure spread over the US and Europe coordinates the data analysis

Page 5: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

5

LHC Networking

• This picture represents the MONARC model – a hierarchical, bulk data transfer model

• Still accurate for Tier 0 (CERN) to Tier 1 (experiment data centers) data movement

• Probably not accurate for the Tier 2 (analysis) sites

Page 6: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

6

Example: Complicated Workflow – Many Sites

Page 7: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

7

Distributed Workflow

• Distributed / Grid based workflow systems involve many interacting computing and storage elements that rely on “smooth” inter-element communication for effective operation

• The new LHC Grid based data analysis model will involve networks connecting dozens of sites and thousands of systems for each analysis “center”

Page 8: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

8

Carbon Assimilation

CO2 CH4

N2O VOCsDust

HeatMoistureMomentum

ClimateTemperature, Precipitation,Radiation, Humidity, Wind

ChemistryCO2, CH4, N2O

ozone, aerosols

MicroclimateCanopy Physiology

Species CompositionEcosystem StructureNutrient Availability

Water

DisturbanceFiresHurricanesIce StormsWindthrows

EvaporationTranspirationSnow MeltInfiltrationRunoff

Gross Primary ProductionPlant RespirationMicrobial RespirationNutrient Availability

Ecosystems

Species CompositionEcosystem Structure

WatershedsSurface Water

Subsurface WaterGeomorphology

Biogeophysics

En

erg

y

Wa

ter

Ae

ro-

dyn

am

ics

Biogeochemistry

MineralizationDecomposition

Hydrology

So

il W

ate

r

Sn

ow

Inte

r-ce

pte

dW

ate

r

Phenology

Bud Break

Leaf Senescence

HydrologicCycle

VegetationDynamics

Min

utes-T

o-H

ou

rsD

ays-To

-Week

sY

ears-T

o-C

en

turies

Example: Multidisciplinary Simulation

(Courtesy Gordon Bonan, NCAR: Ecological Climatology: Concepts and Applications. Cambridge University Press, Cambridge, 2002.)

A “complete” approach to

climate modeling

involves many interacting

models and data that are provided

by different groups at different locations

(Tim Killeen, NCAR)

Page 9: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

9

Distributed Multidisciplinary Simulation

• Distributed multidisciplinary simulation involves integrating computing elements at several remote locations

o Requires co-scheduling of computing, data storage, and network elements

o Also requires Quality of Service (e.g. bandwidth guarantees)

o There is not a lot of experience with this scenario yet, but it is coming (e.g. the new Office of Science supercomputing facility at Oak Ridge National Lab has a distributed computing elements model)

Page 10: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

10

Projected Science Requirements for Networking
Science Areas considered in the Workshop [1] (not including Nuclear Physics and Supercomputing)

• High Energy Physics: today's end-to-end throughput 0.5 Gb/s; 5-year documented requirement 100 Gb/s; 5-10 year estimate 1000 Gb/s. Remarks: high bulk throughput with deadlines (Grid based analysis systems require QoS)

• Climate (Data & Computation): today 0.5 Gb/s; 5 years 160-200 Gb/s; 5-10 years N x 1000 Gb/s. Remarks: high bulk throughput

• SNS NanoScience: today not yet started; 5 years 1 Gb/s; 5-10 years 1000 Gb/s. Remarks: remote control and time critical throughput (QoS)

• Fusion Energy: today 0.066 Gb/s (500 MB/s burst); 5 years 0.198 Gb/s (500 MB / 20 sec. burst); 5-10 years N x 1000 Gb/s. Remarks: time critical throughput (QoS)

• Astrophysics: today 0.013 Gb/s (1 TBy/week); 5 years N*N multicast; 5-10 years 1000 Gb/s. Remarks: computational steering and collaborations

• Genomics (Data & Computation): today 0.091 Gb/s (1 TBy/day); 5 years 100s of users; 5-10 years 1000 Gb/s. Remarks: high throughput and steering (the conversion from the quoted data volumes to Gb/s is sketched below)
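The "today" figures follow directly from the quoted data volumes. A minimal conversion sketch in Python (decimal units assumed; the volumes are the ones in the table above):

```python
# Minimal sketch: convert per-period data volumes into average throughput in Gb/s.
# Decimal units assumed: 1 TBy = 10^12 bytes.

SECONDS_PER_DAY = 86_400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY

def avg_gbps(terabytes: float, seconds: float) -> float:
    """Average rate needed to move `terabytes` within `seconds`, in Gb/s."""
    bits = terabytes * 1e12 * 8
    return bits / seconds / 1e9

# Astrophysics today: 1 TBy/week -> ~0.013 Gb/s
print(f"1 TBy/week ~ {avg_gbps(1, SECONDS_PER_WEEK):.3f} Gb/s")
# Genomics today: 1 TBy/day -> ~0.093 Gb/s (the table quotes 0.091)
print(f"1 TBy/day  ~ {avg_gbps(1, SECONDS_PER_DAY):.3f} Gb/s")
```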

Page 11: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

11

Observed Drivers for the Evolution of ESnet

ESnet Monthly Accepted Traffic, Feb., 1990 – May, 2005 (chart; vertical axis in TBytes/Month)

ESnet is currently transporting about 530 Terabytes/mo. and this volume is increasing exponentially
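"Increasing exponentially" can be made concrete as a doubling time. A minimal sketch for estimating it from two monthly totals; the sample numbers below are hypothetical, not ESnet measurements:

```python
# Minimal sketch: doubling time implied by exponential growth between two monthly totals.
import math

def doubling_time_months(vol_start: float, vol_end: float, months: float) -> float:
    """Doubling time (months) implied by growth from vol_start to vol_end over `months`."""
    growth_per_month = math.log(vol_end / vol_start) / months
    return math.log(2) / growth_per_month

# e.g. hypothetical growth from 130 TB/mo to 530 TB/mo over 24 months:
print(f"doubling time ~ {doubling_time_months(130, 530, 24):.1f} months")
```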

Page 12: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

12

Who Generates ESnet Traffic?
ESnet Inter-Sector Traffic Summary, Jan 03 / Feb 04 / Nov 04

(Diagram showing, for the three sampling dates, traffic exchanged between the DOE sites and the commercial sector, the R&E sector (mostly universities), and international collaborators (almost entirely R&E sites) via the ESnet peering points. Traffic coming into ESnet is shown in green, traffic leaving ESnet in blue, along with traffic between ESnet sites; percentages are of total ingress or egress traffic. The largest components (72/68/62% and 53/49/50%) are labeled "DOE collaborator traffic, inc. data".)

Note
• more than 90% of the ESnet traffic is OSC traffic
• less than 20% of the traffic is inter-Lab

DOE is a net supplier of data because DOE facilities are used by universities and commercial entities, as well as by DOE researchers

Page 13: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

A Small Number of Science Users Account for a Significant Fraction of all ESnet Traffic

ESnet Top 100 Host-to-Host Flows, Feb., 2005 (chart; vertical axis in TBytes/Month). The flows fall into four classes:
• Class 1: DOE Lab-International R&E
• Class 2: Lab-U.S. (domestic) R&E
• Class 3: Lab-Lab (domestic)
• Class 4: Lab-Comm. (domestic)

• Top 100 flows = 84 TBy
• Total ESnet traffic Feb., 2005 = 323 TBy in approx. 6,000,000,000 flows
• All other flows < 0.28 TBy/month each

Notes: 1) This data does not include intra-Lab (LAN) traffic (ESnet ends at the Lab border routers, so science traffic on the Lab LANs is invisible to ESnet). 2) Some Labs have private links that are not part of ESnet; that traffic is not represented here.

Page 14: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

14

Source and Destination of the Top 30 Flows, Feb. 2005 (bar chart; vertical axis in Terabytes/Month, 0-12)

(The top 30 host-to-host flows are dominated by high energy physics and other large-science sites. The endpoint pairs include Fermilab (US) with WestGrid (CA), MIT (US), Karlsruhe (DE), IN2P3 (FR), SDSC (US), U. Texas, Austin (US), UC Davis (US), U. Toronto (CA), Johns Hopkins (US), and CERN (CH); SLAC (US) with INFN CNAF (IT), RAL (UK), IN2P3 (FR), and Karlsruhe (DE); BNL (US) with LLNL (US) and CERN (CH); LBNL (US) with NERSC (US) and U. Wisc. (US); and LIGO (US) - Caltech (US), LLNL (US) - NCAR (US), DOE/GTN (US) - JLab (US), and Qwest (US) - ESnet (US). Flows are categorized as DOE Lab-International R&E, Lab-U.S. R&E (domestic), Lab-Lab (domestic), and Lab-Comm. (domestic).)

Page 15: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

15

Observed Drivers for ESnet Evolution

• The observed combination of
o exponential growth in ESnet traffic, and
o large science data flows becoming a significant fraction of all ESnet traffic

shows that the projections of the science community are reasonable and are being realized

• The current predominance of international traffic is due to high-energy physics

o However, all of the LHC US tier-2 data analysis centers are at US universities

o As the tier-2 centers come on-line, the DOE Lab to US university traffic will increase substantially

• High energy physics is several years ahead of the other science disciplines in data generation

o Several other disciplines and facilities (e.g. climate modeling and the supercomputer centers) will contribute comparable amounts of additional traffic in the next few years

Page 16: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

16

DOE Science Requirements for Networking

1) Network bandwidth must increase substantially, not just in the backbone but all the way to the sites and the attached computing and storage systems

2) A highly reliable network is critical for science – when large-scale experiments depend on the network for success, the network must not fail

3) There must be network services that can guarantee various forms of quality-of-service (e.g., bandwidth guarantees) and provide traffic isolation

4) A production, extremely reliable, IP network with Internet services must support Lab operations and the process of small and medium scale science

Page 17: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

17

ESnet’s Place in U. S. and International Science

• ESnet and Abilene together provide most of the nation's transit networking for basic science
o Abilene provides national transit networking for most of the US universities by interconnecting the regional networks (mostly via the GigaPoPs)
o ESnet provides national transit networking for the DOE Labs

• ESnet differs from Internet2/Abilene in that
o Abilene interconnects regional R&E networks – it does not connect sites or provide commercial peering
o ESnet serves the role of a tier 1 ISP for the DOE Labs

- Provides site connectivity

- Provides full commercial peering so that the Labs have full Internet access

Page 18: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

18

ESnet and GEANT

• GEANT plays a role in Europe similar to Abilene and ESnet in the US – it interconnects the European National Research and Education Networks, to which the European R&E sites connect

• GEANT currently carries essentially all ESnet traffic to Europe (LHC use of LHCnet to CERN is still ramping up)

Page 19: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

19

Ensuring High Bandwidth, Cross Domain Flows

• ESnet and Abilene have recently established high-speed interconnects and cross-network routing

• Goal is that DOE Lab ↔ Univ. connectivity should be as good as Lab ↔ Lab and Univ. ↔ Univ. connectivity

• Constant monitoring is the key

• US LHC Tier 2 sites need to be incorporated

• The Abilene-ESnet-GEANT joint monitoring infrastructure is expected to become operational over the next several months (by mid-fall, 2005)

Page 20: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

20

ESnetAbilene

DENDEN

ELPELP

ALBALB

DCDC

DOE Labs w/ monitorsUniversities w/ monitorsnetwork hubshigh-speed cross connects: ESnet ↔ Internet2/Abilene ( scheduled for FY05)

Monitoring DOE Lab ↔ University Connectivity• Current monitor infrastructure (red&green) and target infrastructure• Uniform distribution around ESnet and around Abilene• All US LHC tier-2 sites will be added as monitors

Japan

Japan

EuropeEurope

SDGSDG

Japan

CHICHI

AsiaPacSEASEA

NYCNYC

HOUHOU

KCKC

LALA

ATLATL

INDIND

SNVSNV

Initial site monitors

SDSC

LBNL

FNAL

NCS*

BNL

OSU

ESnet

Abilene

CERNCERN

*intermittent

Page 21: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

21

One Way Packet Delays Provide a Fair Bit of Information

• Normal: a fixed delay from one site to another that is primarily a function of geographic separation (a sketch of the delay computation follows)

• Example anomaly: the result of a congested tail circuit to FNAL

• Example anomaly: the result of problems with the monitoring system at CERN, not the network
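One-way delay is just the receiver's arrival timestamp minus the sender's departure timestamp, which is why OWAMP-style measurement needs synchronized clocks at both ends. A minimal sketch of the computation; the probe record format here is hypothetical:

```python
# Minimal sketch: compute one-way delays from timestamped probe records.
# Assumes sender and receiver clocks are synchronized (e.g. GPS/NTP disciplined);
# any residual clock offset appears directly as a shift in the measured delay.
from dataclasses import dataclass

@dataclass
class Probe:                 # hypothetical record format
    seq: int
    sent_s: float            # sender timestamp, seconds since epoch
    received_s: float        # receiver timestamp, seconds since epoch

def one_way_delays_ms(probes: list[Probe]) -> list[float]:
    """One-way delay per probe, in milliseconds."""
    return [(p.received_s - p.sent_s) * 1e3 for p in probes]

probes = [Probe(1, 0.000, 0.0312), Probe(2, 1.000, 1.0309), Probe(3, 2.000, 2.0951)]
delays = one_way_delays_ms(probes)
print([f"{d:.1f} ms" for d in delays])   # a jump in delay suggests queuing/congestion
print(f"baseline (min) = {min(delays):.1f} ms, max = {max(delays):.1f} ms")
```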

Page 22: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

22

Strategy For The Evolution of ESnet

A three part strategy for the evolution of ESnet:

1) Metropolitan Area Network (MAN) rings to provide

- dual site connectivity for reliability

- much higher site-to-core bandwidth

- support for both production IP and circuit-based traffic

2) A Science Data Network (SDN) core for

- provisioned, guaranteed bandwidth circuits to support large, high-speed science data flows

- very high total bandwidth

- multiply connecting MAN rings for protection against hub failure

- an alternate path for production IP traffic

3) A high-reliability IP core (e.g. the current ESnet core) to address

- general science requirements

- Lab operational requirements

- backup for the SDN core

- a vehicle for science services

Page 23: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

Strategy For The Evolution of ESnet: Two Core Networks and Metro. Area Rings

(Map of the planned architecture: the production IP core (10-20 Gbps) and the Science Data Network core (SDN, 30-50 Gbps, NLR circuits), interconnected at hubs in Seattle, Sunnyvale, San Diego, LA, Albuquerque, Chicago, Atlanta, New York, and Washington, DC (IP core hubs, new hubs, and SDN/NLR hubs are distinguished); Metropolitan Area Networks (20+ Gbps) and Lab supplied links (10+ Gbps) connecting the primary DOE Labs; and international connections (10-40 Gbps) to CERN, GEANT (Europe), Asia-Pacific, and Australia.)

Page 24: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

24

ESnet MAN Architecture (e.g. Chicago)

(Diagram: a MAN ring of 2-4 x 10 Gbps channels connects the site gateway routers and site equipment at ANL and FNAL to two ESnet core routers (T320) at the Starlight and Qwest hubs, using switches that manage multiple lambdas. Each site receives the ESnet production IP service and ESnet managed λ / circuit services (tunneled through the IP backbone), with connections onward to the ESnet production IP core, the ESnet SDN core, R&E peerings, and international peerings. Monitors at each site feed ESnet management and monitoring, and each site's Site LAN hangs off its gateway router.)

Page 25: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

25

First Two Steps in the Evolution of ESnet

1) The SF Bay Area MAN will provide to the five OSC Bay Area sites
o Very high speed site access – 20 Gb/s
o Fully redundant site access

2) The first two segments of the second national 10 Gb/s core – the Science Data Network – are San Diego to Sunnyvale to Seattle

Page 26: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

ESnet SF Bay Area MAN Ring (Sept., 2005)

(Map: a ring of 10 Gb/s optical channels (λ1 production IP, λ2 SDN/circuits, λ3 and λ4 future) connecting the Qwest/ESnet hub and a Level 3 hub with SLAC, SNLL, LLNL, the Joint Genome Institute, LBNL, NERSC, and NASA Ames, with connections to the IP core to Chicago (Qwest), the IP core to El Paso, SDN to Seattle (NLR), SDN to San Diego, the DOE Ultra Science Net (research net), and the existing ESnet MAN ring (Qwest circuits).)

• 2 λs (2 x 10 Gb/s channels) in a ring configuration, delivered as 10 Gigabit Ethernet circuits

- 10-50x current site bandwidth

• Dual site connection (independent "east" and "west" connections) to each site

• Will be used as a 10 Gb/s production IP ring and 2 x 10 Gb/s paths (for circuit services) to each site

• Qwest contract signed for two lambdas 2/2005, with options on two more

• Project completion date is 9/2005

Page 27: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

27

SF Bay Area MAN – Typical Site Configuration

(Diagram: each site is served by an ESnet 6509 switch with 4 x 10 GE line cards (a maximum of two 10G connections on any line card, to avoid switch limitations) and 24 x 1 GE line cards. The switch has "west" and "east" connections to the SF BAMAN, each carrying λ1 and λ2. The site connects via 1 or 2 x 10 GE provisioned circuits (via VLANs) and n x 1 GE or 10 GE IP connections to the Site LAN. Per-site traffic: 0-10 Gb/s pass-through IP, 0-10 Gb/s drop-off IP, 0-10 Gb/s pass-through VLAN, and 0-20 Gb/s VLAN traffic.)

Page 28: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

Evolution of ESnet – Step One: SF Bay Area MAN and West Coast SDN

(Map: the production IP core (Qwest) and the Science Data Network core (NLR circuits), with hubs at Seattle, Sunnyvale, San Diego, LA, Albuquerque, El Paso, Chicago, Atlanta, New York, and Washington, DC; Metropolitan Area Rings, Lab supplied links, and international connections to CERN, GEANT (Europe), Asia-Pacific, and Australia. The SF Bay Area MAN and the Seattle - Sunnyvale - San Diego SDN segments are marked as in service by Sept., 2005; the remainder is planned.)

Page 29: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

29

ESnet Goal – 2009/2010

(Map of the target architecture: the production IP ESnet core (≥ 10 Gbps) and the ESnet Science Data Network (2nd core, 30-50 Gbps, National Lambda Rail), interconnected at hubs (SEA, SNV, SDG, ALB, ELP, CHI, ATL, DC, NYC, DEN, plus new ESnet hubs); Metropolitan Area Rings serving the major DOE Office of Science sites with 10 Gbps enterprise IP traffic and 40-60 Gbps circuit based transport; Lab supplied links; high-speed cross connects with Internet2/Abilene; and major international connections (CERN / Europe links in the 10-40 Gb/s range, Japan, AsiaPac, Australia).)

Page 30: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

30

Near-Term Needs for LHC Networking

• The data movement requirements of the several experiments at the CERN/LHC are considerable

• Original MONARC model (CY2000; Models of Networked Analysis at Regional Centres for LHC Experiments – Harvey Newman's slide, above) predicted

o Initial need for 10 Gb/s dedicated bandwidth for LHC startup (2007) to each of the US Tier 1 Data Centers

- By 2010 the number is expected to grow to 20-40 Gb/s per Center

o Initial need for 1 Gb/s from the Tier 1 Centers to each of the associated Tier 2 centers

Page 31: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

31

Near-Term Needs for LHC Networking

• However, with the LHC commitment to Grid based data analysis systems, the expected bandwidth and network service requirements for the Tier 2 centers are much greater than in the MONARC bulk data movement model

o MONARC still probably holds for the Tier 0 (CERN) – Tier 1 transfers

o For widely distributed Grid workflow systems, QoS is considered essential

- Without a smooth flow of data between workflow nodes, the overall system would likely be very inefficient due to stalling of the computing and storage elements

• Both high bandwidth and QoS network services must be addressed for LHC data analysis

Page 32: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

32

Proposed LHC high-level architecture: T0/T1/T2 Interconnectivity

(Diagram from the LHC Network Operations Working Group, LHC Computing Grid Project: the Tier 0 at CERN connects to the Tier 1 centers – ASCC, TRIUMF, Fermilab, Brookhaven, RAL, IN2P3, GridKa, CNAF, SARA, PIC, and Nordic – over dedicated 10 Gbit links; the Tier 2 centers and Tier 1 centers are inter-connected by the general purpose research networks, and any Tier-2 may access data at any Tier-1.)

Page 33: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

33

Near-Term Needs for North American LHC Networking

• Primary data paths from LHC Tier 0 to Tier 1 Centers will be dedicated 10 Gb/s circuits

• Backup paths must be provided
o About a day's worth of data can be buffered at CERN
o However, unless both the network and the analysis systems are over-provisioned, it may not be possible to catch up even when the network is restored (see the arithmetic sketch below)

• Three level backup strategy
o Primary: Dedicated 10G circuits provided by CERN and DOE
o Secondary: Preemptable 10G circuits (e.g. ESnet's SDN, NSF's IRNC links, GLIF, CA*net4)
o Tertiary: Assignable QoS bandwidth on the production networks (ESnet, Abilene, GEANT, CA*net4)
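The catch-up constraint can be made concrete: if the steady data rate is R and an outage leaves T hours of data buffered, a restored path of capacity B > R needs roughly R·T/(B - R) hours to drain the backlog. A minimal sketch using the slide's 10 Gb/s circuit rate and one day of buffered data as inputs:

```python
# Minimal sketch: time to drain a backlog after an outage.
# steady_gbps: continuous data rate; outage_h: hours of data buffered (e.g. at CERN);
# path_gbps: capacity of the restored or backup path.

def catch_up_hours(steady_gbps: float, outage_h: float, path_gbps: float) -> float:
    """Hours needed to clear the backlog; infinite if the path has no headroom."""
    headroom_gbps = path_gbps - steady_gbps
    if headroom_gbps <= 0:
        return float("inf")                        # can never catch up
    backlog_gb = steady_gbps * 3600 * outage_h     # gigabits buffered during the outage
    return backlog_gb / (headroom_gbps * 3600)

# A 10 Gb/s steady flow with 24 hours of buffered data:
for path in (10.0, 12.0, 20.0):
    print(f"path {path:4.0f} Gb/s -> catch up in {catch_up_hours(10.0, 24, path):.1f} h")
```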

Page 34: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

34

Proposed LHC high-level architecture

(Diagram: the Tier 0, the Tier 1s, and the Tier 2s are linked by main connections and backup connections over the Layer 3 backbones.)

Page 35: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

35

LHC Networking and ESnet, Abilene, and GEANT

• USLHCnet (CERN+DOE funded) supports US participation in the LHC experiments

o Dedicated high bandwidth circuits from CERN to the U.S. transfer LHC data to the US Tier 1 data centers (FNAL and BNL)

• ESnet is responsible for getting the data from the trans-Atlantic connection points for the European circuits (Chicago and NYC) to the Tier 1 sites

o ESnet is also responsible for providing backup paths from the trans-Atlantic connection points to the Tier 1 sites

• Abilene is responsible for getting data from ESnet to the Tier 2 sites

• The new ESnet architecture (Science Data Network) is intended to accommodate the anticipated 20-40 Gb/s from LHC to US (both US tier 1 centers are on ESnet)

Page 36: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

36

ESnet Lambda Infrastructure and LHC T0-T1 Networking

(Map: the ESnet production IP core (10-20 Gbps) and the ESnet Science Data Network core (10G/link, incremental upgrades 2007-2010) laid over the NLR footprint – Seattle, Boise, Sunnyvale, LA, San Diego, Phoenix, Denver, Albuquerque, El Paso - Las Cruces, KC, Tulsa, Dallas, San Antonio, Houston, Baton Rouge, Pensacola, Jacksonville, Atlanta, Raleigh, Pittsburgh, Cleveland, Chicago, New York, and Wash DC – together with other NLR links, NLR PoPs, ESnet IP core hubs, new hubs, SDN/NLR hubs, and cross connects with Internet2/Abilene. CERN/DOE supplied circuits (10G/link, labeled CERN-1, CERN-2, and CERN-3) and international IP connections (10G/link, including GEANT-1 and GEANT-2) land at the eastern hubs; CANARIE (via Vancouver and Toronto) connects TRIUMF. The Tier 1 centers FNAL, BNL, and TRIUMF are marked.)

Page 37: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

37

Abilene* and LHC Tier 2, Near-Term Networking

(Map: the same ESnet / NLR / Abilene infrastructure as on the previous slide – ESnet production IP core (10-20 Gbps), ESnet Science Data Network core (10G/link, incremental upgrades 2007-2010), other NLR links, CERN/DOE supplied circuits (10G/link), international IP connections (10G/link), cross connects with Internet2/Abilene, NLR PoPs, and CANARIE – with the Tier 1 centers (FNAL, BNL, TRIUMF), the USLHCnet nodes, and the US LHC Tier 2 centers marked. The Tier 2 sites have < 10G connections to Abilene; 10G connections go to USLHC or ESnet.)

Atlas Tier 2 Centers:
• University of Texas at Arlington
• University of Oklahoma Norman
• University of New Mexico Albuquerque
• Langston University
• University of Chicago
• Indiana University Bloomington
• Boston University
• Harvard University
• University of Michigan

CMS Tier 2 Centers:
• MIT
• University of Florida at Gainesville
• University of Nebraska at Lincoln
• University of Wisconsin at Madison
• Caltech
• Purdue University
• University of California San Diego

Page 38: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

38

QoS - New Network Service

• New network services are critical for ESnet to meet the needs of large-scale science like the LHC

• Most important new network service is dynamically provisioned virtual circuits that provide

o Traffic isolation

- will enable the use of high-performance, non-standard transport mechanisms that cannot co-exist with commodity TCP based transport (see, e.g., Tom Dunigan's compendium http://www.csm.ornl.gov/~dunigan/netperf/netlinks.html)

o Guaranteed bandwidth

- the only way we currently have to address deadline scheduling – e.g. where fixed amounts of data have to reach sites on a fixed schedule so that the processing does not fall so far behind that it could never catch up; very important for experiment data analysis (see the sizing sketch below)
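The deadline-scheduling arithmetic is simple: the reserved rate must be at least the data volume divided by the time remaining to the deadline. A minimal sketch (the numbers and the overhead factor are illustrative, not parameters of any ESnet service):

```python
# Minimal sketch: size a guaranteed-bandwidth reservation for a deadline transfer.
# volume_tb: data to deliver (decimal terabytes); hours_to_deadline: time available.

def required_gbps(volume_tb: float, hours_to_deadline: float, efficiency: float = 0.9) -> float:
    """Minimum reserved rate in Gb/s, padded for protocol and retransmission overhead."""
    bits = volume_tb * 1e12 * 8
    return bits / (hours_to_deadline * 3600) / 1e9 / efficiency

# e.g. 100 TB that must arrive within 24 hours:
rate = required_gbps(100, 24)
print(f"need a ~{rate:.1f} Gb/s guaranteed circuit")   # ~10.3 Gb/s
```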

Page 39: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

39

OSCARS: Guaranteed Bandwidth Service

• Must accommodate networks that are shared resources

o Multiple QoS paths

o Guaranteed minimum level of service for best effort traffic

o Allocation management

- There will be hundreds of contenders with different science priorities

Page 40: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

40

OSCARS: Guaranteed Bandwidth Service

• Virtual circuits must be set up end-to-end across ESnet, Abilene, and GEANT, as well as the campuses

o There are many issues that are poorly understood

o To ensure compatibility the work is a collaboration with the other major science R&E networks

- code is being jointly developed with Internet2's Bandwidth Reservation for User Work (BRUW) project, part of the Abilene HOPI (Hybrid Optical-Packet Infrastructure) project

- close cooperation with the GEANT virtual circuit project ("lightpaths", the Joint Research Activity 3 project)

Page 41: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

41

OSCARS: Guaranteed Bandwidth Service

(Diagram of the end-to-end path for a guaranteed bandwidth circuit: user system 1 at site A and user system 2 at site B are connected across several network domains, each with its own resource manager; an allocation manager and a bandwidth broker arbitrate requests, and authorization, policers, and a shaper sit at the domain boundaries.)

• To address all of the issues is complex

- There are many potential restriction points

- There are many users that would like priority service, which must be rationed (a simple admission-control sketch follows)

Page 42: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

Between ESnet, Abilene, GEANT, and the connected regional R&E networks, there will be dozens of lambdas in production networks that are shared between thousands of users who want to use virtual circuits.

(Diagram of the US R&E environment; a similar situation exists in Europe.)

Page 43: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

43

Federated Trust Services

• Remote, multi-institutional identity authentication is critical for distributed, collaborative science in order to permit sharing computing and data resources, and other Grid services

• Managing cross site trust agreements among many organizations is crucial for authentication in collaborative environments

o ESnet assists in negotiating and managing the cross-site, cross-organization, and international trust relationships to provide policies that are tailored to collaborative science

• The form of the ESnet trust services is driven entirely by the requirements of the science community and direct input from the science community

Page 44: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

44

ESnet Public Key Infrastructure

• ESnet provides Public Key Infrastructure and X.509 identity certificates that are the basis of secure, cross-site authentication of people and Grid systems

• These services (www.doegrids.org) provide

o Several Certification Authorities (CAs) with different uses and policies that issue certificates after validating requests against policy

This service was the basis of the first routine sharing of HEP computing resources between the US and Europe

Page 45: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

45

ESnet Public Key Infrastructure

• ESnet provides Public Key Infrastructure and X.509 identity certificates that are the basis of secure, cross-site authentication of people and Grid systems

• The characteristics and policy of the several PKI certificate issuing authorities are driven by the science community, and policy oversight (the Policy Management Authority – PMA) is provided by the science community + ESnet staff

• These services (www.doegrids.org) provide

o Several Certification Authorities (CAs) with different uses and policies that issue certificates after validating certificate requests against policy (a certificate-validation sketch follows)

This service was the basis of the first routine sharing of HEP computing resources between the US and Europe
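As an illustration of what "basis of secure, cross-site authentication" means in practice, a relying party checks that a presented certificate was issued by a CA it trusts (e.g. the DOEGrids CA) and is within its validity period. A minimal sketch using a recent version of the Python cryptography package, assuming an RSA-signed certificate; the PEM file names are hypothetical:

```python
# Minimal sketch: check that a Grid user/host certificate was issued by a trusted CA.
# Assumes RSA signatures, PEM files, and cryptography >= 42 (for the *_utc properties).
import datetime
from cryptography import x509
from cryptography.hazmat.primitives.asymmetric import padding

with open("trusted-ca.pem", "rb") as f:          # hypothetical file names
    ca_cert = x509.load_pem_x509_certificate(f.read())
with open("user-cert.pem", "rb") as f:
    user_cert = x509.load_pem_x509_certificate(f.read())

# 1) The certificate must name the trusted CA as its issuer.
assert user_cert.issuer == ca_cert.subject, "issuer does not match the trusted CA"

# 2) The CA's public key must verify the certificate's signature
#    (raises InvalidSignature if it does not).
ca_cert.public_key().verify(
    user_cert.signature,
    user_cert.tbs_certificate_bytes,
    padding.PKCS1v15(),
    user_cert.signature_hash_algorithm,
)

# 3) The certificate must be within its validity period.
now = datetime.datetime.now(datetime.timezone.utc)
assert user_cert.not_valid_before_utc <= now <= user_cert.not_valid_after_utc

print("certificate chains to the trusted CA and is currently valid")
```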

Page 46: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

46

ESnet Public Key Infrastructure

• Root CA is kept off-line in a vault

• Subordinate CAs are kept in locked, alarmed racks in an access controlled machine room and have dedicated firewalls

• CAs with different policies as required by the science community

o DOEGrids CA has a policy tailored to accommodate international science collaboration

o NERSC CA policy integrates CA and certificate issuance with NIM (NERSC user accounts management services)

o FusionGrid CA supports the FusionGrid roaming authentication and authorization services, providing complete key lifecycle management

(Diagram: the ESnet root CA with subordinate CAs: DOEGrids CA, NERSC CA, FusionGrid CA, and others.)

Page 47: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

47

DOEGrids CA (one of several CAs) Usage Statistics

Certificates issued (report as of Jun 15, 2005):

• User Certificates: 1999
• Host & Service Certificates: 3461
• ESnet SSL Server CA Certificates: 38
• DOEGrids CA 2 CA Certificates (NERSC): 15
• FusionGRID CA certificates: 76
• Total No. of Certificates: 5479
• Total No. of Requests: 7006

(Chart of cumulative counts over time – user certificates, service certificates, expired (+revoked) certificates, total certificates issued, and total certificate requests – with the vertical axis "No. of certificates or requests" running 0-7500. Production service began in June 2003.)

Page 48: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

48

DOEGrids CA Usage - Virtual Organization Breakdown

DOEGrids CA Statistics (5479 certificates):

• *Others: 41.2%
• iVDGL: 18.8%
• PPDG: 15.2%
• FNAL: 8.9%
• FusionGRID: 4.8%
• ANL: 3.5%
• NERSC: 3.2%
• LBNL: 1.2%
• ESG: 0.8%
• ORNL: 0.6%
• LCG: 0.6%
• PNNL: 0.4%
• ESnet: 0.4%
• DOESG: 0.3%
• NCC-EPA: 0.1%

*DOE-NSF collab.

"Other" is mostly auto renewal certs (via the Replacement Certificate interface) that do not provide VO information

Page 49: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

49

North American Policy Management Authority

• The Americas Grid, Policy Management Authority

• An important step toward regularizing the management of trust in the international science community

• Driven by European requirements for a single Grid Certificate Authority policy representing scientific/research communities in the Americas

• Investigate Cross-signing and CA Hierarchies support for the science community

• Investigate alternative authentication services

• Peer with the other Grid Regional Policy Management Authorities (PMAs)
o European Grid PMA [www.eugridpma.org]
o Asian Pacific Grid PMA [www.apgridpma.org]

• Started in Fall 2004 [www.TAGPMA.org]

• Founding members
o DOEGrids (ESnet)
o Fermi National Accelerator Laboratory
o SLAC
o TeraGrid (NSF)
o CANARIE (Canadian national R&E network)

Page 50: William E. Johnston  ESnet Manager and Senior Scientist Lawrence Berkeley National Laboratory

50

References – DOE Network Related Planning Workshops

1) High Performance Network Planning Workshop, August 2002. http://www.doecollaboratory.org/meetings/hpnpw

2) DOE Science Networking Roadmap Meeting, June 2003. http://www.es.net/hypertext/welcome/pr/Roadmap/index.html

3) DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003. http://www.csm.ornl.gov/ghpn/wk2003

4) Science Case for Large Scale Simulation, June 2003. http://www.pnl.gov/scales/

5) Workshop on the Road Map for the Revitalization of High End Computing, June 2003. http://www.cra.org/Activities/workshops/nitrd ; http://www.sc.doe.gov/ascr/20040510_hecrtf.pdf (public report)

6) ASCR Strategic Planning Workshop, July 2003. http://www.fp-mcs.anl.gov/ascr-july03spw

7) Planning Workshops - Office of Science Data-Management Strategy, March & May 2004. http://www-conf.slac.stanford.edu/dmw2004