CS 644: Introduction to Big Data Chapter 8. Enabling Big-data...

62
CS 644: Introduction to Big Data Chapter 8. Enabling Big-data Scientific Workflows in High-performance Networks 1 Chase Wu New Jersey Institute of Technology

Transcript of CS 644: Introduction to Big Data Chapter 8. Enabling Big-data...

Page 1: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

CS 644: Introduction to Big Data

Chapter 8. Enabling Big-data Scientific Workflows in High-performance Networks

1

Chase Wu

New Jersey Institute of Technology

Page 2: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Outline

• Introduction• Challenges & Objectives• A Three-layer Architecture Solution

- Enabling Technologies: networking and computing• Networking for Big Data

- Software-defined Networking (SDN)- High-performance Networking (HPN)

• Computing for Big Data- Workflow Management and Optimization

• Simulation/Experimental Results• Conclusion

2

Page 3: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Introduction• Supercomputing for big-data science

3

AstrophysicsComputational biology

Climate research

Flow dynamics

Computational materialsFusion simulation

Neutron sciences

Nanoscience

Page 4: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Terascale Supernova Initiative (TSI)

4

Visualization channel

Visualization control channelComputation steering channel

• Collaborative project- Supernova explosion

• TSI simulation- 1 terabyte a day with a

small portion of parameters

- From TSI to PSI• Transfer to remote sites

- Interactive distributed visualization

- Collaborative data analysis

- Computation monitoring- Computation steering

Supercomputer or ClusterClient

Page 5: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Challenges for Extreme-scale Scientific Applications• A typical way of research conduct

- Run simulation code on supercomputer in a batch modeØ One day: 1 terabyte datasets

- Move datasets to HPSSØ About 8 hours

- Transfer datasets to remote sites over the InternetØ TCP-based transfer tools: up to one week

- Filter out data of interest- Partition dataset for parallel processing- Extract geometry data- Generate images on rendering engine- Display results on desktop, laptop, powerwall, etc.

}5

Visualization:

several hours or days

Start over if any parameter values are not set appropriately!

Page 6: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Challenges in Modern Sciences

• BIG DATA: from T to P, to E, to Z, to Y, and beyond…- Simulation

• Astrophysics, climate modeling, combustion research, etc.

- Experimental• Spallation Neutron Source, Large Hadron Collider, etc.

- Observational• Large-scale sensor networks, astronomical image data (Dark

Energy Camera), etc.

No matter which type of data is considered, we needan end-to-end workflow solution

for data transfer, processing, and analysis!

6

Page 7: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Big-data Scientific Workflows• Require massively distributed resources

- Hardware• Computing facilities, storage systems, special rendering engines, display devices

(tiled display, powerwall, etc.), network infrastructure, etc.

- Software• Domain-specific data analytics/processing tools, programs, etc.

- Data type• Real-time, archival

• Feature different complexities- Simple case: linear pipeline (a special case of DAG)- Complex case: DAG-structured graph

• Support different application types- Interactive: minimize total end-to-end delay for fast response- Streaming: maximize frame rate to achieve smooth data flow

7

Page 8: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

8

- Support distributed workflows in heterogeneous environments

- Optimize workflow performance to meet various user requirementsØ Delay, throughput, reliability, etc.Ø Remote visualization, online computational monitoring

and steering, etc.- Make the best use of computing and networking

resources

Ultimate Goals

Page 9: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Solution: A Three-layer Architecture

9

Page 10: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Enabling Technologies

10

• Three layers- Top: Abstract scientific workflow- Middle: Virtual overlay network (grid, cloud)- Bottom: Physical high-performance network

• Top and bottom layers meet at middle layer- From bottom to middle: resource abstraction

• Bandwidth scheduling• Performance modeling and prediction

- From top to middle: workflow mapping• Optimization: where to execute modules?

• Workflow execution- Actual data transfer: transport control- Actual module running: job scheduling

Page 11: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Networking Requirements• Provision dedicated channels to meet

different transport objectives- High bandwidths

• Multiples of 10Gbps to terabits networking• Support bulk data transfers

- Stable bandwidths• 100s of Mbps• Support interactive control operations

• Why not the Internet?- Only backbone has high bandwidths (last mile)- Packet-level resource sharing- Best-effort IP routing- TCP: hard to sustain 10s Gbps or to stabilize

11

Page 12: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

12

An Overview of TCP/IP Stack

Page 13: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Software-Defined Networking

• The Concept of Virtualization• Virtualization of Computing• Virtualization of Networking• Software-Defined Network• Possible Directions

13

Page 14: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Concept of Virtualization

• Decoupling HW/SW• Abstraction and layering• Using, demanding, but not owning or

configuring• Resource pool: flexible to slice, resize,

combine, and distribute• A degree of automation by software

14

Page 15: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Benefits of Virtualization

• An analogy: owning a huge house- Real estate, immovable property- Does not generate cash and income

• How to gain more profit?- Divide this huge house into suites, and RENT to

people!- Renting suites: using but not owning- Transform a static investment into cash

generators!!!

15

Page 16: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Virtualization of Computing• Partitioning one physical machine

- Virtual instances• Running concurrently, sharing resources

• Hypervisor: Virtual Machine Monitor (VMM)- A software layer presents abstraction of physical resources

Key Factor of Virtualization

16

Page 17: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

NetworksareHardtoManage

• Operatinganetworkisexpensive- Morethanhalfthecostofanetwork- Yet,operatorerrorcausesmostoutages

• Buggysoftwareintheequipment- Routerswith20+millionlinesofcode- Cascadingfailures,vulnerabilities,etc.

• Thenetworkis“intheway”- Especiallyaproblemindatacenters- …andhomenetworks

17

Page 18: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

TraditionalComputerNetworks

Collect measurements and configure the equipment

Management plane:

18

Page 19: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

SoftwareDefinedNetworking(SDN)

API to the data plane(e.g., OpenFlow)

Logically-centralized control

Switches

Smart, slow

Dumb, fast

19

Page 20: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

OpenFlow Protocol

NETWORK OPERATING SYSTEM

Bandwidth - on -

Demand

DynamicOptical Bypass

Unified Recovery

UnifiedControl Plane

Switch Abstraction

Networking Applications

VIRTUALIZATION (SLICING) PLANE

Underlying DataPlane Switching

Traffic Engineering

Application-Aware

QoS

Provide Choices

Packet Switch

Packet Switch

Wavelength Switch

Time-slotSwitch

Multi-layerSwitch

Packet & Circuit Switch

Packet & Circuit Switch

20

Page 21: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Architecture

Onix / Network OS

Logical Forwarding Plane

Control Plane / Applications

Network Hypervisor

Real States

Logical States Abstractions

Mapping

Control Commands

Distributes, Configures

Network Info Base

API

Distributed System

Abstraction

Provides

Provides

OpenFlow

21

Page 22: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Switch Forwarding Pipeline

Logical Forwarding Plane

As packets/flows traverse the network: moving both in logical and physical forwarding plane → logical context

22

Page 23: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Data-Plane:SimplePacketHandling

• Simplepacket-handlingrules- Pattern:matchpacketheaderbits- Actions:drop,forward,modify,sendtocontroller- Priority:disambiguateoverlappingpatterns- Counters:#bytesand#packets

1. src=1.2.*.*,dest=3.4.5.*à drop2. src=*.*.*.*,dest=3.4.*.*à forward(2)3.src=10.1.2.3,dest=*.*.*.*à sendtocontroller

23

Page 24: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

UnifiesDifferentKindsofBoxes• Router

- Match:longestdestinationIPprefix

- Action:forwardoutalink• Switch

- Match:destinationMACaddress

- Action:forwardorflood

• Firewall- Match:IPaddressesandTCP/UDPportnumbers

- Action:permitordeny• NAT

- Match:IPaddressandport- Action:rewriteaddressandport

24

Page 25: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Controller:Programmability

25

Network OS

Controller Application

Events from switchesTopology changes,Traffic statistics,Arriving packets

Commands to switches(Un)install rules,Query statistics,Send packets

Page 26: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

ExampleOpenFlowApplications

• Dynamicaccesscontrol• Seamlessmobility/migration• Serverloadbalancing• Networkvirtualization• Usingmultiplewirelessaccesspoints• Energy-efficientnetworking• Adaptivetrafficmonitoring• Denial-of-Serviceattackdetection

See http://www.openflow.org/videos/26

Page 27: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Example:DynamicAccessControl

• Inspectfirstpacketofaconnection• Consulttheaccesscontrolpolicy• Installrulestoblockorroutetraffic

27

Page 28: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

SeamlessMobility/Migration

• Seehostsendtrafficatnewlocation• Modifyrulestoreroutethetraffic

28

Page 29: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

ServerLoadBalancing• Pre-installload-balancingpolicy• SplittrafficbasedonsourceIP

29

src=0*

src=1*

Page 30: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

NetworkVirtualization

30

Partition the space of packet headers

Controller #1 Controller #2 Controller #3

Page 31: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

OpenFlowintheWild

• OpenNetworkingFoundation- Google,Facebook,Microsoft,Yahoo,Verizon,DeutscheTelekom,andmanyothercompanies

• CommercialOpenFlowswitches- HP,NEC,Quanta,Dell,IBM,Juniper,…

• Networkoperatingsystems- NOX,Beacon,Floodlight,Nettle,ONIX,POX,Frenetic

• Networkdeployments- Eightcampuses,andtworesearchbackbonenetworks- Commercialdeployments(e.g.Googlebackbone)

31

Page 32: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

ControllerDelayandOverhead

• Controllerismuchslowerthantheswitch• Processingpacketsleadstodelayandoverhead• Needtokeepmostpacketsinthe“fastpath”

32

packets

Page 33: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

DistributedController

33

Network OS

Controller Application

Network OS

Controller Application

For scalability and reliability

Partition and replicate state

Page 34: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

High-performance Networks• Production and testbed networks

- UltraScience Net- ESnet OSCARS

• Offers MPLS tunnels and VLAN virtual circuits- Internet2 ION

• Offers MPLS tunnels and VLAN virtual circuits- UCLP

• User Controlled Light Paths- CHEETAH

• Circuit-switched High-speed End-to-End Transport Architecture

- DRAGON• Dynamic Resource Allocation via GMPLS Optical

Networks34

Page 35: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

UltraScience Net – In a Nutshell• Experimental Network Research Testbed

35

Page 36: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Control Plane

36

Responsible for1. Reserving link

bandwidths2. Setting up end-to-

end network paths3. Releasing

resources when tasks are completed

Page 37: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Bandwidth Scheduling

37

Page 38: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Bandwidth Scheduler

• Central component- Computes paths and allocate link bandwidths

• Scheduling in USN- Fixed slot: start time, end time, target BW

• Extension of Dijkstra’s Algorithm- All slots: duration, target BW

• Extension of Bellman-Ford Algorithm- Both are solvable by polynomial-time algorithms

38

Page 39: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Network Model for Bandwidth Scheduling

• Graph: G(V, E)• Link bandwidth

- Segmented constant functions of time- Time-bandwidth (TB): (t[i], t[i+1], b[i])

• Aggregated TB for all links

39

Res

idua

l Ban

dwid

th

[0]t [2]t[1]t [3]t

Page 40: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

4 Types of Scheduling Problems (TON’13)• Given: G = (E, V), ATB, vs , vd , data size δ• Objective: minimize data transfer end time• Fixed Path with Fixed Bandwidth (FPFB)

- Compute a fixed path from vs to vd with a constant (fixed) bandwidth

• Fixed Path with Variable Bandwidth (FPVB)- Compute a fixed path from vs to vd with varying bandwidths

across multiple time slots• Variable Path with Fixed Bandwidth (VPFB)

- Compute a set of paths from vs to vd at different time slots with the same (fixed) bandwidth

• Variable Path with Variable Bandwidth (VPVB)- Compute a set of paths from vs to vd at different time slots with

varying bandwidths across multiple time slots

40

Page 41: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

VPFB & VPVB

• Multiple paths are used in a sequential order• Path switching incurs a delay (overhead) τ

- Path switching delay is negligible (τ = 0)• VPFB-0, VPVB-0

- Path switching delay is not negligible (τ > 0)• VPFB-1, VPVB-1

• When τ is large enough- VPFB-1 -> FPFB, VPVB-1 -> FPVB

41

Page 42: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Problem Features• FPFB is the most stringent• VPVB is the most flexible• FPFB and VPFB restrict the BW

- Not always optimal to start data transfer immediately

- Suited for transport methods with fixed rate• FRTP, PLUT, Tsunami, Hurricane, RBUDP

• FPVB and VPVB use variable BW- Always start immediately- Suited for transport methods with dynamically

adapted rate• SABUL, RAPID, RUNAT, improved TCP

42

Page 43: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Complexity and Algorithm

• An optimal algorithm for each problem• A heuristic for each NPC problem

43

Page 44: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Workflow Mapping

44

Page 45: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Cost ModelsA computing workflow consist of modules: .

A computer network modeled an arbitrary directed graph, where computing nodes are

interconnected by directed communication links.

Objectives: map modules to nodes to achieve minimum end-to-end delay (MED) or maximum frame rate (MFR).

0 1 1, ,... mw w w -

, 1, , 1jw j m= -!

( )jw

l × 1jw +1jz -

Computationalcomplexity jz

1jw -

Module

Recv data : Send data :

( , )G V E= | |V n= 0 1 1, , , nv v v -!

kvLink bandwidth: ,h kbhv

Min link delay: ,h kd Node processing power kpNetwork

m

,,

,

( )Message transmission time : , Node computing time : i iw wi j

i ji j

zzd

b pl

+

45

Link failure rate: ,h kf Node Failure rate

Workflow

kf

Page 46: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

• Workflow mapping and execution conditions- Single-node mapping

• If w is mapped to v,• Otherwise,

- Module execution precedence

- Data transfer precedence

1,c

wv wv Vx w V

Î

= " Îå

, ,, ,j i j

s fw e j w i j wt t w V e E³ " Î Î

, ,, ,i j i

s fe w i w i j wt t w V e E³ " Î Î

1wvx =0wvx =

46

Page 47: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

- Fair node sharing

- Fair link sharing

comp( ) ( , )( , ) , ,w w

w cv

z A w vT w v w V v Vp

l ×= " Î Î

where and( , ) ( )fw

sw

t

vtA w v t dta= ò

:( )( ) 0

( )f s

w w w

v wvw V t t t t

t xaÎ - - ³

= å

, , ,tran , , , , ,

,

( , )( , ) , ,i j i j h ki j h k h k i j w h k c

h k

z B e lT e l d e E l E

= + " Î Î

where and,

,,

, ,( , ) ( )fei j

s h kei j

t

i j h k ltB e l t dtb= ò ,

, ,,:( )( ) 0

( )h k i h j k

f si j w ee i ji j

l w v w ve E t t t t

t x xbÎ - - ³

= ×å

47

Page 48: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

• Performance Metrics- End-to-end delay (ED)

1

ED1 2

comp tran ( , )0 0

1[ ]

0 , 1 [ ]

, 1 1 [ ], [ 1][ ], [ 1]

[ ], [ 1]

(CP mapped to a path of nodes)

( ) ( , )

( ) ( ( , ), )

i i i

j j

i

q q

g e g gi i

qw w j P i

i j g j P i

i i i i P i P iP i P i

i P i P i

T P q

T T T T

z A w v

p

z g g B e g g ld

b

l

+

- -

= =

-

= Î ³

+ + ++

= +

= + = +

×æ ö= ç ÷ç ÷

è øæ ö×

+ +ç ÷ç ÷è ø

å å

å å2

0

q-

å

48

Page 49: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

- Frame rate: inverse of global bottleneck (BN)

- Overall failure rate

, ,' ', ' ' ', '

BN

'

comp ' '

, ,, , ', 'tran , ', ', ,

', '', '

( mapped to a network )( ) ( , )

( , )max max

( , )( , )

i i

i w j k w i w j k wi c j k i c j k

w c

w w i i

i i i

w V e E w V e Ej k j k j kj k j kv V l Ec v V l Ec

j kj k

T G Gz A w v

T w v pz B e lT e l

db

l

Î Î Î ÎÎ Î Î Î

×æ öç ÷æ ö ç ÷= =ç ÷ç ÷ ç ÷×è ø +ç ÷çè ø

÷

, ,

, ,

,mappedon mappedon

,,

1 (1 ) (1 )i j h k i h

i w h ci j w h k c

h k he l w v

w V v Ve E l E

f fÎ ÎÎ Î

æ ö æ öç ÷ ç ÷= - - × -ç ÷ ç ÷ç ÷ç ÷ è øè ø

Õ ÕF

49

Page 50: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

• Objective Functions- Minimum End-to-end Delay (MED)

- Maximum Frame Rate (MFR)

EDall possible mappingsmin ( ), such thatT £ FF

BNall possible mappingsmin ( ), such thatT £F F

50

Page 51: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Pipeline Mapping-- A special case of DAG

• Problem categories- MED

Ø No Node Reuse (MED-NNR)Ø Contiguous Node Reuse (MED-CNR)Ø Arbitrary Node Reuse (MED-ANR)

- MFRØ No Node Reuse or Share (MFR-NNR)Ø Contiguous Node Reuse and

Share (MFR-CNR)Ø Arbitrary Node Reuse and Share

(MFR-ANR)

51Arbi t rary node reuse

Source Destination

No node reuse

Cont i guous node reuse

M0 M1 Mn-5 Mn-4 Mn-2

Mn-1

Mn-3

Vn-2

M2

Vp+1

Vq+1

Vp

Vq

V1

V5

Vs Vd

Source Destination

M0 M1 Mn-5 Mn-4 Mn-2

Mn-1

Mn-3

Vn-2

M2

Vp+1

Vq+1

Vp

Vq

V1

V5

Vs Vd

Source Destination

M0 M1 Mn-5 Mn-4 Mn-2

Mn-1

Mn-3

Vn-2

M2

Vp+1

Vq+1

Vp

Vq

V1

V5

Vs Vd

Page 52: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Problem Category and Complexity

52

NoNodeReuse

ContiguousNodeReuse

ArbitraryNodeReuse

ObjectiveFunctionConstraints

MinimumEnd-to-endDelay

NP-complete

NP-complete

Polynomial(Dyn.Prog.)

MaximumFrameRate

NP-complete

NP-complete

NP-complete

• MED-ANR is polynomially solvable- Dynamic Programming -based solution

• MED/MFR-NNR/CNR are NP-complete- Reduce from DISJOINT-CONNECTING-PATH (DCP)

• MFR-ANR is NP-complete- Reduce from Widest-path with Linear Capacity Constraints

(WLCC)

Page 53: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

53

• Mapping algorithm for MED:- Recursive Critical Path for MED with Failure Rate

Constraint

RCP-F

From special to general: DAG Mapping (TC’14)

Page 54: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

54

• Mapping algorithm for MFR:• Layer-oriented DP algorithm for MFR with Failure Rate

Constraint- Sort a DAG-structured workflow in a topological order - Map computing modules to network nodes on a layer-by-layer

basis, considering:• Module dependency in the workflow• Network connectivity• Node and link failure rate

LDP-F

( ( ))

,

, , ,, ,

,

, ( ),1to 1, 0 1( ( ))

,

( , )max 1 (1 ) (1 ) (1 ) if

( ) ( , )max

( ) ( , )max

j ju j

iv V pre wmapping j

j j

h u

u j u j h ih i j u h i i

h i

w w j ii j w pre wj m h nv suc v i

h u

w w j h

h

Tz B e l

d f f h ibz A w vT

p

Tz A w v

p

l

l

" Î

Î= - £ £ -

Î

æ öç ÷ç ÷ç ÷×ç ÷+ Ù = - - × - × - £ ¹ç ÷ç ÷

×= ç ÷ç ÷è ø

×

!

F F F

1 (1 ) (1 ) ifj u if h i

æ öç ÷ç ÷ç ÷ç ÷ç ÷ç ÷ç ÷ç ÷ç ÷ç ÷æ öç ÷ç ÷

Ù = - - × - £ =ç ÷ç ÷ç ÷ç ÷ç ÷è øè ø

F F F

Page 55: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

55

The DP table with separated layers in .

Layered workflow in a topological sorting.

vs=v0

vd=vn-1

v1v2v3v4

...

w0 w1 w2 w3 w4 w5 …... ws wm-1…...…...

…...…...

…...

...…...layer 0

vs=v0

vd=vn-1

v1v2v3v4

...

w0 w1 w2 w3 w4 w5 …... ws wm-1…...…...

…...…...

...…...layer 1

vs=v0

vd=vn-1

v1v2v3v4...

w0 w1 w2 w3 w4 w5 …... ws wm-1…...…...

…...…...…...

...…...layer 2

vs=v0

vd=vn-1

v1

vyvpvq

...

w0 w1 w2 w3 wl wt ws wm-1…...…...

…...…...

…...

...

…...layer l-2

vs=v0

vd=vn-1

w0 w1 w2 w3 wswm-1…...…...

…...…...

...

…...layer l-1

…...

T0,0

T1,1T2,2

T2,4T3,3

T4,5

v1

vyvpvq...

…...

Tq,s

Tn-1,m-1

…...

…...

…... wl wt

Ty,lTp,t

LDP-F

vs

v3

v1

vdvq

vx

vp

…...

…...

w0 wm-1

w2

wl

wt…...

…...

…...

Workflow

Computer network

w1

ws

vy

v2

w3 …...

w5

w4

layer 0 layer 1 layer 2 …... layer l-1layer l-2

v4

v5

Page 56: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

Jet air flow dynamics (pressure, raycasting)

TSI explosion (density, raycasting)

A Prototype System:Distributed Remote Intelligent Visualization Environment (DRIVE)

56

Two Examples in the Visualization of Large-scale Scientific Applications

Page 57: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

A Production System (JOGC’13):Scientific Workflow Automation and Management Platform (SWAMP)

57

Page 58: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

58

Use Case: Spallation Neutron Source (SNS)

Page 59: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

SNS Workflow (abstract)

59

Page 60: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

SNS Workflow (concrete)

60

Page 61: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

61

Page 62: CS 644: Introduction to Big Data Chapter 8. Enabling Big-data ...chasewu/Courses/Spring2019/CS644BigData/Sli… · -Enabling Technologies: networking and computing ... -Support distributed

SNS Workflow Results

62