Some Techniques Used by ESnet to Mitigate Network Emergencies · PDF fileESnet to Mitigate...
Transcript of Some Techniques Used by ESnet to Mitigate Network Emergencies · PDF fileESnet to Mitigate...
1
Some Techniques Used by ESnet to Mitigate Network
Emergencies
JET Roadmap WorkshopApril 14, 2004Joe Burrescia
ESnet
2
What Is an Emergency?
An emergency is any unplanned event that can cause deaths or significant injuries to employees, customers or the public; or that can shut down your business, disrupt operations, cause physical or environmental damage, or threaten the facility's financial standing or public image. Obviously, numerous events can be "emergencies," including:
– Fire– Hazardous materials incident – Flood or flash flood – Hurricane – Tornado – Winter storm – Earthquake – Communications failure– Radiological accident – Civil disturbance– Loss of key supplier or customer – Explosion
From FEMA Emergency Management Guide For Business & Industry
3
What Are Some Network Emergencies That Can be Anticipated ?
• Primary NOC incapacitated• Massive DoS attacks against sites• Disruption of backbone routers• Self recoverable communication outages• Communication outages requiring external
assistance
4
TWC
JGISNLL
LBNL
SLAC
YUCCA MT
BECHTEL
PNNLLIGO
INEEL
LANL
SNLAKCP-
KirklandPANTEX
KCP
NOAA
OSTIORAU
SRS
ORNLJLAB
PPPL
ANL-DCINEEL-DCORAU-DC
LLNL/LANL-DC
MIT
ANL
BNL
FNALAMES
4xLAB-DCNERSC NREL
ALBHUB
LLNL
GA
LamontSDSC
Japan
GTN&NNSA
International (high speed)OC192 (10G/s optical)OC48 (2.5 Gb/s optical)Gigabit Ethernet (1 Gb/s)OC12 ATM (622 Mb/s)OC12 OC3 (155 Mb/s)T3 (45 Mb/s)T1-T3T1 (1 Mb/s)
Office Of Science Sponsored (22)NNSA Sponsored (12)Joint Sponsored (3)Other Sponsored (NSF LIGO, NOAA)Laboratory Sponsored (6)
QWESTATM
42 end user sites
ESnet IP
GEANT- Germany- France- Italy- UK- etc Sinet (Japan)Japan – Russia(BINP)
CA*net4CERNMRENNetherlandsRussiaStarTapTaiwan(ASCC)
CA*net4KDDI (Japan)FranceSwitzerlandTaiwan(TANet2)
AustraliaCA*net4Taiwan(TANet2)
Singaren
ELP HUB
SNV HUB CHI HUB
AOA HUB
ATL HUB
DC HUB
Peering points
MAE-EStarlightChi NAP
Fix-WPAIX-W
MAE-W
NY-NAP
PAIX-E
PNWG
SEA HUB
ESnet Connects DOE Facilities and Collaborators
Hubs
Equinix
Abile
ne Abile
ne
Abilene(future)
Abilene
Equinix
DOE-ALB
5
What Are Some Network Emergencies That Can be Anticipated ?
• Primary NOC incapacitated– Multiply distributed NOC hardware– Geographically dispersed personnel
• Massive DoS attacks against sites• Failures of physical circuits • Self recoverable communication outages• Communication outages requiring external
assistance
7
Distributed NOC Functionalities
– Load Sharing, Online always• DNS• Network Monitoring System (Aprisma’s Spectrum)• Mail• Trouble Ticket System• Main Web Server
– Backed up services, Offline till primary unavailable• Engineering private web server
– Router RANCID database– Circuit database– Contact database– DNS backend database
• PKI server
8
Geographically Distributed Staff
• 1 primary location– Berkeley Lab
• 3 remote locations– Livermore Telework Center– Ames Lab in Iowa– Brookhaven National Lab in New York
9
LBNLPPPL
BNL
AMES
Remote Engineers• partial duplicate infrastructure
DNS
Remote Engineer• partial duplicate
infrastructure
TWCRemoteEngineer
Dispersed Network Resources
ATL HUB
SEA HUB
ALBHUB
NYC HUBS
DC HUB
ELP HUB
CHI HUB
SNV HUB Duplicate InfrastructureCurrently deploying full replication of the NOC databases and servers and Science Services databases in the NYC Qwest carrier hub
Engineers, 24x7 Network Operations Center, generator backed power
• Spectrum (net mgmt system)• DNS (name – IP address
translation)• Engineering databases• Public and private Web• E-mail (server and archive)• PKI cert. repository and
revocation lists• collaboratory authorization
service
10
What Are Some Network Emergencies That Can be Anticipated ?
• Primary NOC incapacitated• Massive DoS attacks against sites
– Phased Security Architecture • Disruption of backbone routers• Self recoverable communication outages• Communication outages requiring external
assistance
11
Maintaining Science Mission Critical Infrastructurein the Face of Cyberattack
• A Phased Security Architecture is being implemented to protects the network and the ESnet sites
• The phased response ranges from blocking certain site traffic to a complete isolation of the network which allows the sites to continue communicating among themselves in the face of the most virulent attacks– Separates ESnet core routing functionality from external Internet
connections by means of a “peering” router that can have a policy different from the core routers
– Provide a rate limited path to the external Internet that will insure site-to-site communication during an external denial of service attack
– Provide “lifeline” connectivity for downloading of patches, exchange of e-mail and viewing web pages (i.e.; e-mail, dns, http, https, ssh, etc.) with the external Internet prior to full isolation of the network
13
Phased Response to Cyberattack
LBNL
ESnet
router
router
borderrouter
X
peeringrouter
Lab
Lab
gatewayrouter
ESnet second response – filter traffic from outside of ESnet
Lab first response – filter incoming traffic at their ESnet gateway router
ESnet third response – shut down the main peering paths and provide only limited bandwidth paths for specific
“lifeline” services
Xpeeringrouter
gatewayrouter
border router
router
attack trafficX
ESnet first response –filters to assist a site
14
Gartner WeeklyFLASH, 06 February 2004
*EVENT:* On 1 March 2004, MCI announced that it is now offering a denial-of-service (DoS) service-level agreement (SLA) designed to help customers defend against Internet attacks. The new SLA - believed to be the first offered by an ISP - guarantees that all MCI Internet customers will have immediate access to MCI's security staff, and that MCICustomer Support will respond to a suspected DoS attack within 15 minutes of a customer-generated trouble ticket. If MCI fails to meet the terms of the SLA, the customer will, at its request, be credited one day's prorated MCI charges for the affected service. The new SLA applies across all MCI IP services, and the performance guarantees are automatically extended to all customers at no additional cost.
15
What Are Some Network Emergencies That Can be Anticipated ?
• Primary NOC incapacitated• Massive DoS attacks against sites• Disruption of backbone routers
– Additional router hardening• Self recoverable communication outages• Communication outages requiring external
assistance
16
Additional Router Hardening
• Protocols– Considering using Authenticated protocols
• IGP– ISIS
» Side benefit of not running over IP– OSPFv2– iBGP with MD5 hash
• EGP– eBGP with MD5 hash
17
Additional Router Hardening
• Limit accepted routes– Sites
• Specific ACLs– Peers
• Specific ACLs where possible• “Bogon” ACL’s otherwise
• Limit accepted packets– Peers
• Reverse Path Forwarding (RPF) checks
18
Additional Router Hardening
• Router configurations verified daily– Downloaded from routers and compared to
known good versions.• Verify router OS (Juniper)
– Get a baseline MD5 checksum on critical system files for each router. Then generate an MD5 checksum on the actual files on the router and compare.
19
Router Hardening Resources
• http://nsa2.www.conxion.com• http://www.cymru.com/Documents/secure-ios-template.html• http://www.qorbit.net/documents/junos-template.pdf• http://www.cymru.com/Documents/secure-bgp-template.html
20
What Are Some Network Emergencies That Can be Anticipated ?
• Primary NOC incapacitated• Massive DoS attacks against sites• Disruption of backbone routers• Self recoverable communication outages
– Redundant topology• Communication outages requiring external
assistance
21
Site gateway routersite equip. Site gateway router
Qwest hub
Vendor neutraltelecom facility
ESnet core
ANLFNAL
New ESnet Architecture – Chicago MAN as ExampleCERN
(DOE funded link)
monitor monitor
site equip.
ESnet production IP service
T320
StarLight
other international
peerings
Site LAN Site LAN
all interconnects from the sites
back to the core ring are high
bandwidth and have full module
redundancy
No single point failure can
disrupt
Current approach of
point-to-point tail circuits from hub
to site
22
ESnet/QwestNLRORNL
DENDENSNVSNV
ELPELP
ALBALB ATLATL
DCDC
MANs
High-speed cross connects with Internet2/Abilene
Major DOE Office of Science Sites
Production IP: Long-Term ESnet Connectivity Goal• Connecting MANs with two cores to ensure against hub failure (for example, NLR is shown as the second core below)
Japan
Japan
CERN/Europe
Europe
SDGSDG
Japan
CHICHI
AsiaPacSEASEA
NYCNYC
23
What Are Some Network Emergencies That Can be Anticipated ?
• Primary NOC incapacitated• Massive DoS attacks against sites• Disruption of backbone routers• Self recoverable communication outages• Communication outages requiring external
assistance– Networks helping networks
24
Network Cooperation During Emergencies
• Layer 1– Make use of (take over?) other network’s
“spare” physical links• Layer 2
– “Tunnel” across other network’s infrastructure• Layer 3
– (Pre)configure backup route announcements