
CC* Integration:
SANDIE: SDN-Assisted NDN for Data Intensive Experiments

Edmund Yeh
NSF Campus Cyberinfrastructure and Cybersecurity Innovation for Cyberinfrastructure PI Workshop
September 25, 2018 | College Park, MD

CHALLENGES
▪ The LHC program in HEP is the world's largest data intensive application: handling one exabyte by ~2018 at hundreds of sites
▪ Global data distribution, processing, access, analysis; large but limited computing, storage, network resources

APPROACH
▪ Use Named Data Networking (NDN) to redesign the LHC HEP network; optimize workflow

SOLUTIONS + Deliverables
▪ Deploy NDN edge caches with SSDs and 40G/100G network interfaces at 7 sites; combine with larger core caches
▪ Simultaneously optimize caching ("hot" datasets), forwarding, and congestion control in both the network core and site edges
▪ Develop a naming scheme and attributes for fast access and efficient communication in HEP and other fields

SCIENTIFIC and BROADER IMPACT
▪ Lay the groundwork for an NDN-based data distribution and access system for data-intensive science fields
▪ Benefit the user community through lowered costs, faster data access, and standardized naming structures
▪ Engage the next generation of scientists in emerging concepts of future Internet architectures for data-intensive applications
▪ Advance, extend, and test the NDN paradigm to encompass the most data-intensive research applications of global extent

TEAM
▪ Northeastern, Caltech, Colorado State
▪ In partnership with other LHC sites and the NDN project team

PI: Edmund Yeh
co-PIs: Harvey Newman, Christos Papadopoulos
Program Area: CC* Integration
Award Number: 1659403

Edmund Yeh, Professor, Northeastern University (eyeh@ece.neu.edu)
Christos Papadopoulos, Professor, Colorado State University (christos@colostate.edu)
Harvey Newman, Professor of Physics, Caltech (newman@hep.caltech.edu)

Project Title: SANDIE: SDN-Assisted NDN for Data Intensive Experiments
NSF Campus Cyberinfrastructure and Cybersecurity Innovation for Cyberinfrastructure PI Workshop
September 25, 2018 | College Park, MD

Outline

• Named Data Networking (NDN)
• Optimized NDN algorithms
• Optimizing CMS workflows: simulations
• Global testbed
• Integration with Current Workflow: NDN-based filesystem plugin for XRootD

Named Data Networking (NDN)

• Leading Information-Centric Networking (ICN) architecture.
• Launched in 2010 with a $7.9M NSF Future Internet Architecture grant.
• Names data instead of endpoints: a paradigm shift.
• Pull-based data distribution architecture with stateful forwarding (illustrated in the sketch after this list).
• Native support for caching and multicast.
• Natural fit for data-intensive science.
• Website: http://named-data.net
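To make the Interest/Data exchange concrete, here is a minimal consumer sketch using the ndn-cxx library; the name /ndn/xrootd/demo/file is a placeholder, and error handling is reduced to log messages. Data may be returned by the producer or by any in-network cache along the path.

```cpp
// Minimal NDN consumer sketch (ndn-cxx): request a named Data packet.
// The name below is a placeholder; Data may come from the producer or from
// any in-network cache along the path.
#include <ndn-cxx/face.hpp>
#include <iostream>

int main()
{
  ndn::Face face;  // connects to the local NFD forwarder

  ndn::Interest interest(ndn::Name("/ndn/xrootd/demo/file"));
  interest.setMustBeFresh(true);
  interest.setInterestLifetime(ndn::time::seconds(2));

  face.expressInterest(interest,
    [] (const ndn::Interest&, const ndn::Data& data) {
      std::cout << "Received " << data.getName()
                << " (" << data.getContent().value_size() << " bytes of content)\n";
    },
    [] (const ndn::Interest&, const ndn::lp::Nack&) {
      std::cout << "Nack received\n";
    },
    [] (const ndn::Interest&) {
      std::cout << "Timeout\n";
    });

  face.processEvents();  // run the event loop until satisfied, Nacked, or timed out
  return 0;
}
```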

Optimized NDN Algorithms

• State-of-the-art, theoretically grounded, high-performance optimization frameworks for NDN.
• Joint optimization of caching, forwarding, and congestion control.
• VIP (Virtual Interest Packet) framework (Yeh et al., 2014); a toy illustration follows this list.
• Adaptive caching and routing framework (Ioannidis & Yeh, 2016, 2017).
• Algorithms have optimality guarantees.
• Algorithms are distributed, adaptive, dynamic.
• Superior performance in delay, cache hits, and cache evictions.
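The VIP framework itself is specified in Yeh et al. (2014); as a rough illustration of its flavor only (per-object demand counters driving the caching decision), a toy sketch follows. The class name, decay factor, and capacity model are hypothetical, and this is not the published algorithm.

```cpp
// Toy sketch: per-object virtual-interest counters driving a
// "cache the highest-count objects" rule. Illustration of the general idea
// only; not the actual VIP algorithm of Yeh et al. (2014).
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

class VipLikeCache {
public:
  explicit VipLikeCache(std::size_t capacityObjects)
    : capacity_(capacityObjects) {}

  // Called whenever an Interest for `object` arrives at this node.
  void onInterest(const std::string& object) {
    vipCount_[object] += 1.0;
  }

  // Periodically decay the counters and refresh the cached set.
  void periodicUpdate(double decay = 0.9) {
    std::vector<std::pair<double, std::string>> ranked;
    for (auto& [name, count] : vipCount_) {
      count *= decay;                 // older demand fades out
      ranked.emplace_back(count, name);
    }
    std::sort(ranked.rbegin(), ranked.rend());   // highest counters first

    cached_.clear();
    for (std::size_t i = 0; i < ranked.size() && i < capacity_; ++i)
      cached_.insert(ranked[i].second);          // keep the "hottest" objects
  }

  bool isCached(const std::string& object) const {
    return cached_.count(object) > 0;
  }

private:
  std::size_t capacity_;                              // number of objects that fit
  std::unordered_map<std::string, double> vipCount_;  // virtual-interest counters
  std::set<std::string> cached_;                      // currently cached objects
};
```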

Optimizing CMS Workflows: Simulations

• Long-term goal: optimize dynamic caching, distribution of datasets to REPOS, and distribution of workflow jobs.
• First goal: given CMS workflow statistics, use simulation to estimate the performance improvement with the optimal VIP caching and forwarding algorithm (Yeh et al., 2014); a toy version of the accounting is sketched after this list.
• Collected dataset and request statistics from PhEDEx and Elasticsearch over 2 months at datablock-level granularity (10s-100s of GB).
• Measured CMS network topology and link capacities using perfSONAR.
• Simulation: apply the VIP algorithm to the set of the 500 most popular datablocks.
• 6 TB caches at all US Tier 1 and Tier 2 sites.
• Result: average data retrieval delays reduced by ~50% for off-site workflow.
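A toy version of the trace-driven accounting behind such a simulation is sketched below: replay datablock requests per site, serve hits from a fixed-size edge cache, and charge a larger delay for off-site retrievals. The trace entries, delay values, and simple insert-until-full cache policy are illustrative placeholders, not the VIP policy or the actual CMS statistics.

```cpp
// Toy trace-driven accounting: estimate average retrieval delay with a
// per-site edge cache versus fetching every datablock off-site.
// Cache policy, delay values, and trace entries are placeholders only.
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

struct Request {
  std::string site;       // requesting Tier 1 / Tier 2 site
  std::string datablock;  // datablock name (10s-100s of GB each)
  double sizeGB;
};

int main()
{
  const double cacheCapacityGB = 6000.0;  // ~6 TB edge cache per site, as in the slide
  const double localDelay = 1.0;          // relative delay for a cache hit
  const double remoteDelay = 10.0;        // relative delay for an off-site fetch

  // Placeholder trace; in the study the statistics came from PhEDEx / Elasticsearch.
  std::vector<Request> trace = {
    {"T2_US_Caltech", "/store/blockA", 150.0},
    {"T2_US_Caltech", "/store/blockA", 150.0},
    {"T1_US_FNAL",    "/store/blockB", 300.0},
  };

  // Per-site cache state: resident datablocks and occupied capacity.
  std::unordered_map<std::string, std::unordered_map<std::string, bool>> resident;
  std::unordered_map<std::string, double> usedGB;

  double withCache = 0.0, withoutCache = 0.0;
  for (const auto& r : trace) {
    withoutCache += remoteDelay;      // baseline: every request goes off-site
    if (resident[r.site][r.datablock]) {
      withCache += localDelay;        // edge cache hit
    } else {
      withCache += remoteDelay;       // miss: fetch off-site, then cache if it fits
      if (usedGB[r.site] + r.sizeGB <= cacheCapacityGB) {
        resident[r.site][r.datablock] = true;
        usedGB[r.site] += r.sizeGB;
      }
    }
  }

  std::cout << "avg delay with edge caches:    " << withCache / trace.size() << "\n"
            << "avg delay without edge caches: " << withoutCache / trace.size() << "\n";
  return 0;
}
```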

CMS Simulation Topology

• Topology includes all Tier 1 and Tier 2 sites in the US (see the sketch below):
  – perfSONAR: average throughput between sites
  – 10-100 Gbps typically
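For reference, the measured topology can be represented as a capacity-weighted graph; a minimal sketch with placeholder sites and throughput values follows.

```cpp
// Minimal representation of the simulation topology: average inter-site
// throughput (Gbps) as measured with perfSONAR. Sites and values are placeholders.
#include <iostream>
#include <map>
#include <string>
#include <utility>

using SitePair = std::pair<std::string, std::string>;

int main()
{
  // Average throughput between site pairs, typically 10-100 Gbps.
  const std::map<SitePair, double> linkGbps = {
    {{"T1_US_FNAL",    "T2_US_Caltech"},  40.0},
    {{"T1_US_FNAL",    "T2_US_Nebraska"}, 100.0},
    {{"T2_US_Caltech", "T2_US_UCSD"},     10.0},
  };

  for (const auto& [link, gbps] : linkGbps)
    std::cout << link.first << " <-> " << link.second << ": " << gbps << " Gbps\n";
  return 0;
}
```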

Global Testbed

• Expand the Caltech SDN testbed and the CSU climate testbed.
• Deploy a few more sites in the USA and abroad (Northeastern, UCSD, U. Nebraska, ...).
• Caches, each with:
  ⚫ Several terabytes of SSDs
  ⚫ 40-60 terabytes of SAS disks
  ⚫ 10G to 100G network interfaces
  ⚫ NDN and XRootD software systems

Integration with Current Workflow

• Goals: increased efficiency, reduced complexity for both applications and the network.
• Integrate with XRootD, the de facto data management software.
• First step: implement an NDN-based filesystem plugin for XRootD and a suitable NDN Producer.

Architecture and Flow Diagram

NDN Consumer for XRootD Plugin

• Supports the open, fstat, read, and close system calls.
• Composes Interests for these system calls over NDN:

  /ndn/xrootd/……./root/path/for/ndn/xrd/foo?ndn.MustBeFresh=true
  (NDN prefix | path to the file of interest | Interest-specific parameters)

  /ndn/xrootd/......./root/test/path/ndn/xrd/foo/%00%00?ndn.MustBeFresh=true
  (NDN prefix | path to the file of interest | segment number | Interest-specific parameters)

• For the read system call, it breaks the request down into segments: a read of (offset in file, buffer length) maps onto a sequence of Interests, one per ~7 KB NDN packet, from the first to the last segment (see the sketch below).
• Handles Interest validation, timeouts, and Nacks.
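A sketch of how a consumer might map a read(offset, length) onto segment Interests with ndn-cxx follows; the /ndn/xrootd prefix and the ~7 KB packet size follow the slides, while the function name, Interest lifetime, and path handling are illustrative assumptions.

```cpp
// Sketch: map a read(offset, length) onto NDN segment Interests (ndn-cxx).
// The /ndn/xrootd prefix and ~7 KB packets follow the slides; the function
// name, Interest lifetime, and path handling are illustrative assumptions.
#include <ndn-cxx/interest.hpp>
#include <ndn-cxx/name.hpp>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

static const std::string kPrefix = "/ndn/xrootd";
static const uint64_t kSegmentSize = 7000;   // ~7 KB of payload per NDN packet

// Build the segment Interests covering bytes [offset, offset + length) of the file.
std::vector<ndn::Interest>
makeReadInterests(const std::string& filePath, uint64_t offset, uint64_t length)
{
  std::vector<ndn::Interest> interests;
  if (length == 0)
    return interests;

  uint64_t firstSeg = offset / kSegmentSize;
  uint64_t lastSeg  = (offset + length - 1) / kSegmentSize;

  for (uint64_t seg = firstSeg; seg <= lastSeg; ++seg) {
    ndn::Name name(kPrefix);
    name.append(ndn::Name(filePath));   // path to the file of interest
    name.appendSegment(seg);            // segment number component (%00%00-style)

    ndn::Interest interest(name);
    interest.setMustBeFresh(true);
    interest.setInterestLifetime(ndn::time::seconds(4));
    interests.push_back(interest);
  }
  return interests;
}

int main()
{
  // Example: a 20000-byte read starting at byte offset 5000 of a hypothetical file.
  for (const auto& i : makeReadInterests("/root/path/for/ndn/xrd/foo", 5000, 20000))
    std::cout << i.getName() << "\n";
  return 0;
}
```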

NDN Producer for XRootD Plugin

• Registers prefixes with NFD (a minimal sketch follows this list).
• Performs open, fstat, read, and close system calls as required.
• Keeps track of opened files.
• Sends data as strings and non-negative integers.

  /ndn/xrootd/……..../root/path/for/ndn/xrd/foo/00/?ndn.MustBeFresh=true&ndn.Nonce=825012545
  (NDN prefix | path to the file of interest | Data-specific parameters)
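A minimal producer-side sketch with ndn-cxx is shown below: it registers a prefix with NFD and answers each segment Interest by reading the corresponding ~7 KB slice of the named file from disk. The prefix parsing, per-Interest file open (instead of tracking opened files), and signing setup are simplified assumptions, and the setContent overload varies slightly across ndn-cxx releases.

```cpp
// Sketch of an NDN producer for file data (ndn-cxx): register a prefix with
// NFD and answer each segment Interest with a ~7 KB Data packet read from disk.
// Prefix parsing, per-Interest file opening, and signing are simplified here.
#include <ndn-cxx/face.hpp>
#include <ndn-cxx/security/key-chain.hpp>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

static const uint64_t kSegmentSize = 7000;

class XrdNdnProducerSketch {
public:
  void run() {
    // Register the prefix with the local NFD forwarder.
    face_.setInterestFilter("/ndn/xrootd",
      [this] (const ndn::InterestFilter&, const ndn::Interest& interest) {
        onInterest(interest);
      },
      [] (const ndn::Name& prefix, const std::string& reason) {
        std::cerr << "Failed to register " << prefix << ": " << reason << "\n";
      });
    face_.processEvents();
  }

private:
  void onInterest(const ndn::Interest& interest) {
    const ndn::Name& name = interest.getName();
    if (name.size() < 3 || !name.get(-1).isSegment())
      return;                                        // expect .../<path>/<segment>

    uint64_t segment = name.get(-1).toSegment();
    // Components between the 2-component /ndn/xrootd prefix and the segment
    // number form the path of the requested file.
    std::string path = name.getSubName(2, name.size() - 3).toUri();

    std::ifstream file(path, std::ios::binary);      // real plugin tracks opened files
    if (!file)
      return;                                        // real plugin would signal an error
    file.seekg(static_cast<std::streamoff>(segment * kSegmentSize));
    std::vector<char> buf(kSegmentSize);
    file.read(buf.data(), static_cast<std::streamsize>(buf.size()));
    std::streamsize n = file.gcount();

    ndn::Data data(name);
    data.setFreshnessPeriod(ndn::time::seconds(10));
    data.setContent(reinterpret_cast<const uint8_t*>(buf.data()),
                    static_cast<std::size_t>(n));    // overload differs across versions
    keyChain_.sign(data);
    face_.put(data);
  }

  ndn::Face face_;
  ndn::KeyChain keyChain_;
};

int main()
{
  XrdNdnProducerSketch producer;
  producer.run();
  return 0;
}
```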

Demo

• Consumer tracing: using the xrdcp tool to copy a file (screen capture).

Demo

• Producer tracing (screen capture).

Conclusions

• NDN is a natural fit for the LHC and other large-scale scientific data and computation networks.
• Fully exploits bandwidth, storage, and computation resources.
• Distributed data access by name, with high-performance caching, forwarding, and congestion control.
• Works well with overlay networks with SDN-assisted reserved bandwidth.
• Synergistic with the ongoing move to edge computing, SDN, network function virtualization, and network slicing.
• Possible model for the National Research Platform (NRP).