Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network...

44
Lark: Software Defined Networking Integration with HTCondor and More Zhe Zhang University of Nebraska-Lincoln 1

Transcript of Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network...

Page 1: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Lark: Software Defined Networking Integration with

HTCondor and MoreZhe Zhang

University of Nebraska-Lincoln

1

Page 2: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Physical Switch

WAN

Worker Node 1 Worker Node 2 Worker Node 3

Figure 1: Traditional network topology for a cluster

What’s the problem here for HTCondor?2

Page 3: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

eth0Physical Network

Device

job 1 job 2 job 3

worker node

Problem & Motivation• At worker node level, HTCondor

cannot manage networks similarly like CPU, memory and disk.

• At network layer, HTCondor cannot see differences among jobs come from the same host. Thus the granularity is per-host but not per-job.

• What we want is to improve this granularity to be per-job. Figure 2: Zoom in view for a worker node

3

Page 4: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Previous Work• The Lark project aims to tackle this problem.

• Related Presentations and talks from previous years’ HTCondor weeks: Todd & Garhan@HTCondor Week13; Brian@HTCondor Week12.

• This talk will report our latest progress and development.

4

Page 5: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Partitionable Slot• Allows HTCondor to use the resources of a slot in a dynamic

way; these slots may be partitioned according to the jobs running as opposed at configuration time.

• Slots have a fixed set of resources which include the cores, memory and disk space.

• They spawn a dynamic slot based on the resources the job needs.

• By partitioning the slot at runtime, we can become more flexible about resource usage.

5

Page 6: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Partitionable slot examples• Partitionable slot resource cpu = 10 memory = 10240 disk = 102400

• Job A allocates to this slot with resource requirementcpu = 3 memory = 1024 disk = 10240

• After Job A occupies a dynamic slot, remaining resource:cpu = 7 memory = 9216 disk = 92160

Page 7: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Can we do this for network bandwidth resource?

• Yes, we can! By using extensible machine resource mechanism.

• In HTCondor configuration file, add: MACHINE_RESOURCE_INVENTORY_BANDWIDTH = /path/to/executable

• Executable to check available bandwidth resource and attributes are inserted to machine ClassAd.

Page 8: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

• Now machine slot broadcasts resource:cpu = 10 memory = 10240 disk = 102400 bandwidth = 1000

• How does user request for bandwidth resource?Universe = vanilla Executable = /usr/bin/curl Output = test.out Error = test.err Log = test.log request_bandwidth = 10 Arguments = -4 -O http://hcc-lark03.unl.edu/1GB.zipQueue

Page 9: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Not enough…

• We still need to make sure each job can only use the amount of bandwidth it requests.

• We utilize network namespaces + Open vSwitch: network namespaces give us network isolation and Open vSwitch provides host-level QoS and bandwidth limiting.

Page 10: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Open vSwitch• A production quality,

multilayer virtual switch (a piece of software runs in the kernel which performs like a hardware switch).

• Enable network automation through programmatic extensions.

• We utilize QoS functionality and OpenFlow support.

10

Figure 3: Open vSwitch

Page 11: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Worker Node

System Network Namespace

External Network

Physical Network Device

192.168.0.1

Initial Configuration

condor_starter

Network namespace with Open vSwitch Illustration - Step 1

Page 12: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Worker Node

System Network Namespace

External Network

Physical Network Device

192.168.0.1

Starter creates network pipes

condor_starter

Network Pipe Device

Network Pipe Device

Network namespace with Open vSwitch Illustration - Step 2

Page 13: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Network namespace with Open vSwitch Illustration - Step 3

Worker Node

System Network Namespace

External Network

Physical Network Device

Create Open vSwitch bridgeAssign IP address from physical device to bridge

condor_starter

Network Pipe Device

Network Pipe Device

Open vSwitch bridge

192.168.0.1

Page 14: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Network namespace with Open vSwitch Illustration - Step 4

Worker Node

System Network Namespace

External Network

Physical Network Device

Add physical interface and one end of virtual Ethernet device pair to bridge

condor_starter

Network Pipe Device

Network Pipe Device

Open vSwitch bridge

192.168.0.1

Page 15: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Network namespace with Open vSwitch Illustration - Step 5

Worker Node

System Network Namespace

External Network

Physical Network Device

Starter forks new process with new network namespace

condor_starter

Network Pipe Device

Network Pipe Device

Open vSwitch bridge

192.168.0.1

Job-Private Network Namespace

condor_starter

Page 16: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Network namespace with Open vSwitch Illustration - Step 6

Worker Node

System Network Namespace

External Network

Physical Network Device

Parent starter passes one end of network pipe to network namespace,Child starter configures IP address via DHCP or static allocation

condor_starter

Network Pipe Device

Open vSwitch bridge192.168.0.1

Job-Private Network Namespace

condor_starter

Network Pipe Device

192.168.0.2Network Pipe

Page 17: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Network namespace with Open vSwitch Illustration - Step 7

Worker Node

System Network Namespace

External Network

Physical Network Device

Final Configuration

condor_starter

Network Pipe Device

Open vSwitch bridge192.168.0.1

Job-Private Network Namespace

condor_starter

Network Pipe Device

192.168.0.2Network Pipe

Network Calls

Page 18: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration

Worker Node

System Network Namespace

Private Network Namespace

Private Network Namespace

Private Network Namespace

Open vSwitch

eth0

veth1external

veth3 external

veth2 external

veth1 internal

veth2 internal

veth3 internal

Job 1 Job 2 Job 3

Physical Switch

WAN

Worker Node 1 Worker Node 2 Worker Node 3

Page 19: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Host level bandwidth management

• Now each job owns an unique network device, we can apply Open vSwitch QoS on the port that connects to the external end of the network pipe.

• Combine these two aspects: extensible machine resource + Open vSwitch integration, we can have a basic host level bandwidth management feature for HTCondor jobs.

• Let’s consider the previous job again.

Page 20: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Test HTCondor job• Download a file of 1GB from hcc-lark03.unl.edu, job requests

bandwidth 10Mbps.request_bandwidth = 10

hcc-lark03

HTCondor Worker Node

Open vSwitch

Job 1

veth internal

veth external

eth0

QoS applied

Figure 4: Test job network topology

Figure 5: Traffic rate for test HTCondor job

10Mbps

See backup slides for more!

Page 21: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Now What?• It is nice to have host-level bandwidth management functionality,

but it is only useful at host level.

• Most sites really want to do WAN bandwidth management for HTCondor, which is more expensive and hence a more common bottleneck.

• To enable this capability, we need the network hardware to be able to associate application information (HTCondor) with network traffic.

• We seek the help from Software Defined Networking (SDN).

Page 22: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

What is SDN?“Software-Defined Networking (SDN) is an emerging architecture that is dynamic, manageable, cost-effective, and adaptable, making it

ideal for the high-bandwidth, dynamic nature of today's applications. This architecture decouples the network control and

forwarding functions enabling the network control to become directly programmable and the underlying infrastructure to be

abstracted for applications and network services. The OpenFlow™ protocol is a foundational element for building SDN solutions.”

!— Definition from Open Network Foundation (ONF)!

Page 23: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Figure 8: Regular switch v.s. OpenFlow enabled switch

Control Plane

Data Plane

Regular Switch

• Control Plane becomes a software runs on commodity hardware as a controller.

• There are flow table on switch, which stores the OpenFlow rules used for packet forwarding.

• When packet hits switch, first iterate flow table for match; otherwise it is sent to controller.

• Controller determines what to do with the packet according to the controller program and then installs rule for this packet match to the flow table on switch.

Control Plane (SDN Controller)

Data Plane

OpenFlow enabled switch

OpenFlow protocol

App App App

Page 24: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Figure 9: SDN-enabled cluster network topology

HTCondor Worker Node

Open vSwitch

OpenFlow Controller

Job 1 Job 2

veth1 internal

veth2 internal

veth1 external

veth2 external

eth0

Core Switch

HTCondor Worker Node

Open vSwitch

Job 1 Job 2

veth1 internal

veth2 internal

veth1 external

veth2 external

eth0

WAN

Page 25: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

HTCondor + SDN• A condor_starter plugin sends job and machine ClassAd

(which includes the job IP address) to an OpenFlow controller before the job executes.

• This uniquely identifies the traffic coming from a job.

• The controller classifies observed network traffic using this information; for example, it may classify the traffic by the job’s Owner attribute.

• The controller applies policy to the packet by installing an OpenFlow rule in the correspond switch.

Page 26: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Hardware Switch• On the network, we usually don’t care about individual users

- but what groups or projects they work on.

• Hence, we create a QoS queue with different bandwidth allocations (prioritization) for different projects on the WAN port.

• The controller classifies the network traffic by project associated with that job and writes a new hardware rule.

• The switch enqueues packets into the correct QoS queue.26

Page 27: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Core Switch Level - Example• Two HTCondor jobs, each of

which uploads a file of 200MB to FTP servers.

• Belong to different projects, e.g. CMS vs NonCMS, with different WAN bandwidth allocated. (8Mbps vs 4Mbps)

• OpenFlow controller associates project with HTCondor traffic and do project-based QoS at core switch.

27

FTP Server 1

FTP Server 2

HTCondor Worker Node 1

Open vSwitch

OpenFlow Controller

CMS job

veth1 internal

veth1 external

eth0

Core Switch

HTCondor Worker Node 2

Open vSwitch

NonCMS job

veth1 internal

veth1 external

eth0

Figure 10: Network setup for HTCondor jobs

Page 28: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

28

Figure 11: WAN bandwidth management for HTCondor jobs

8Mbps

4Mbps

Experiment result

Page 29: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

OpenFlow-enabled GridFTP• There are other users of WAN bandwidth besides

HTCondor jobs - GridFTP is often a culprit!

• We have implemented a GridFTP callout to provide the controller with an association of application-level file transfer info (such as transfer type, filename, username) with a TCP flow.

• The controller can then prioritize GridFTP traffic in a manner similar to HTCondor traffic.

29

Page 30: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

GridFTP example•Take three GridFTP clients, each downloading a file of 200MB from the GridFTP server. !

•Client1 -> /test1/200MBClient2 -> /test2/200MBClient3 -> /test3/200MB !

•We prioritize different access based on directory:/test1/* -> 4Mbps /test2/* -> 2Mbps /test3/* -> 1Mbps

OpenFlow Controller

WAN

GridFTP Client 3

Core Switch

GridFTP Server

GridFTP Client 1 GridFTP Client 2

Figure 12: GridFTP experiment setup

Page 31: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Experiment Result

Figure 13: GridFTP file transfers when downloading files in different directories.

31

4Mbps

2Mbps

1Mbps

Page 32: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Combining the examples• Each project likely has network traffic from multiple

applications (HTCondor + GridFTP).

• We want to direct network traffic for all the supported applications in one project to its associated QoS queue.

• Traffic competes according to normal TCP rules intra-project but bandwidth is protected inter-project.

32

Page 33: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Figure 14: HTCondor jobs + GridFTP file transfer33

FTP Server 1 FTP

Server 2

HTCondor Worker Node 1

Open vSwitch

OpenFlow Controller

CMS job

veth1 internal

veth1 external

eth0

Core Switch

HTCondor Worker Node 2

Open vSwitch

NonCMS job

veth1 internal

veth1 external

eth0

GridFTP CMS Client

GridFTP NonCMS Client

WAN

GridFTP Sever

• We combine previous HTCondor and GridFTP example together.

• HTCondor job: upload 200MB file to FTP server in WAN. (CMS and NonCMS)

• GridFTP clients download file of 200MB from GridFTP server in cluster. (also CMS and NonCMS users)

Page 34: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

• CMS project has a queue with bandwidth 8Mbps.

• NonCMS project has a queue with bandwidth 4Mbps.

• Within each project, GridFTP traffic grabs most of the bandwidth when both of the two applications are running.

• Across projects, their assigned total bandwidth are reserved.

34

8Mbps

4Mbps

Figure 15: WAN bandwidth management for multiple applications

Page 35: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Conclusion

• HTCondor has no mechanism to do network resource management. With network namespace, extensible machine resource and Open vSwitch, we can manage network at host level.

• Further by integrating SDN with this framework, we can manage WAN bandwidth resource for HTCondor and other applications.

35

Page 36: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Future Work

• Support more cluster computing softwares to investigate network traffic characteristics within project and across project.

• Get current work deployed in production cluster and do real-world measurements.

36

Page 37: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Acknowledgement

• Nebraska: Brian Bockelman, Garhan Attebury

• Wisconsin: Alan DeSmet, Dale W. Carder, Todd Tannenbaum

• NSF Funding ACI-1245864

37

Page 38: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Thanks! Questions?

38

Page 39: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Backup Slides

Page 40: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

A more complicated example for host-level bandwidth management

• Enforce total bandwidth resource at a worker node to be 10Mbps.

• Two users submit several curl jobs. Each of them downloads a 200MB file with bandwidth requirement.

• User A requests for 1Mbps; User B requests for 9Mbps.

40

Figure : Worker node network topology for multiple jobs

HTTP Server 1

HTTP Server 2

HTCondor Worker Node

Open vSwitch

Job 1 Job 2

veth1 internal

veth2 internal

veth1 external

veth2 external

eth0

1Mbps9Mbps

Page 41: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

• Due to host-level bandwidth management, two jobs can run at the same time (to saturate the total bandwidth resource), and jobs get their requested bandwidth.

• Without bandwidth resource requirement, when two jobs from different users are running, each of the job is expected to get 5Mbps bandwidth on average (equal competition).

41

Figure : HTCondor job traffic for two users

1Mbps

9Mbps

Page 42: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Attributes Meaningin_port Switch port number packet arrives dl_src Ethernet source addressdl_dst Ethernet destination addressdl_vlan VLAN ID

dl_vlan_pcp VLAN prioritydl_type Ethertype/length (e.g. 0x0800=IPv4)nw_tos IP TOS/DS bits

nw_proto IP protocol (e.g., 6=TCP)nw_src IP source addressnw_dst IP destination addresstp_src TCP/UDP source porttp_dst TCP/UDP destination port

Table 1: Packet match field

Page 43: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

OpenFlow Actions Class

Output ofp_action_output

Enqueue ofp_action_enqueue

Set VLAN ID ofp_action_vlan_vid

Set VLAN priority ofp_action_vlan_pcp

Set Ethernet src or dst address ofp_action_dl_addr

Set IP src or dst address ofp_action_nw_addr

Set IP Type of Service ofp_action_nw_tos

Set TCP/UDP src or dst port ofp_action_tp_port

Table 2: Available OpenFlow actions (1.0)

Page 44: Lark: Software Defined Networking Integration with HTCondor ... · Figure 4: Worker node network configurations with Open vSwitch integration v.s. LAN configuration Worker Node

Example Network Policies for HTCondor

• Block network traffic from specific HTCondor users.

• Isolate network traffic among HTCondor users.

• Block specific HTCondor users to communicate with outside network.

• And, of course, manage WAN traffic!

44