Wedding convenience and control with RemoteCondor

Post on 15-Jan-2015

This presentation explains why Condor is not suitable for use on user-owned machines, and why RemoteCondor is the best available solution to the problem.

Transcript of Wedding convenience and control with RemoteCondor

Apr 2012 Remote Condor 1

UCSD HEP Group Trainings

Wedding convenience and control with RemoteCondor

by Igor Sfiligoi
RemoteCondor co-developed with J. Dost

UC San Diego

The Condor Batch System

● Condor is a Workload Management System
  ● i.e. a batch system
● Strong points
  ● Fault tolerant
  ● Robust feature set
  ● Flexible
● Large community base
  ● Both commercial and scientific

http://research.cs.wisc.edu/condor/

Condor Architecture

● Clearly separates
  ● Resource providers: machines (aka worker nodes) offering CPUs, memory, IO, ...
  from
  ● Resource consumers: job queues (aka submit nodes) holding jobs submitted by users
● Each has a daemon process to represent it
  ● Startd for resource providers
  ● Schedd for resource consumers
● A central service connects them all
  ● Managed by a Collector/Negotiator pair
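As a rough illustration of the provider/consumer split described above, the toy Python model below lets machines and job queues advertise to a central service, which then pairs idle jobs with free machines. All class and field names are invented for this sketch, and the memory-only matching rule is an assumption made for brevity; real Condor matches ClassAds using much richer requirement and rank expressions.

```python
from dataclasses import dataclass, field


@dataclass
class MachineAd:
    """What a Startd would advertise (heavily simplified, hypothetical)."""
    name: str
    memory_mb: int
    busy: bool = False


@dataclass
class JobAd:
    """What a Schedd would advertise for one idle job (hypothetical)."""
    job_id: str
    memory_mb: int


@dataclass
class CentralService:
    """Toy stand-in for the Collector/Negotiator pair."""
    machines: list = field(default_factory=list)
    jobs: list = field(default_factory=list)

    def advertise_machine(self, ad: MachineAd) -> None:
        self.machines.append(ad)

    def advertise_job(self, ad: JobAd) -> None:
        self.jobs.append(ad)

    def negotiate(self) -> dict:
        """Pair each job with the first free machine that has enough memory."""
        matches = {}
        for job in self.jobs:
            for machine in self.machines:
                if not machine.busy and machine.memory_mb >= job.memory_mb:
                    machine.busy = True
                    matches[job.job_id] = machine.name
                    break
        return matches
```

The point of the sketch is only that neither side talks to the other directly until the central service has produced a match.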

Condor Architecture in a picture

[Diagram: several Schedds and Startds, each connected to the central Collector/Negotiator pair]

The truth about submit nodes

● Corollary
  ● The submit node is a server!
● There is no real “Condor client”
  ● The cmdline tools are just a convenience to talk to the daemon process

[Diagram: condor_submit and condor_q talk to the Schedd on the submit node; Collector/Negotiator; Startd]

Implications

● Being a server has several implications
  ● Security implications
    ● Will have incoming connectivity
    ● All security configuration on the submit node
    ● Submit node controls user authentication and authorization
    High exploit risk
    Requires high trust between all nodes in the cluster
  ● Unfriendly to non-dedicated hardware
    ● Requires always-on operation
    ● Must be on a public & static IP address
    Impossible to use on a laptop

Not suitable for an unmanaged user machine

What are the alternatives?

● Out of the box, Condor provides
  ● Remote submission
  ● Condor-C
  So what is wrong with these?
● In the contrib section, you can find
  ● RemoteCondor
  This presentation argues that this is the best solution

Remote submission

● Essentially, connecting to a remote Schedd
  ● condor_submit -remote … + condor_transfer_data
  and
  ● condor_q -name ..., condor_rm -name ..., …
● So no daemon processes on the submit node
  ● A true client solution!

[Diagram: condor_submit, condor_q and condor_transfer_data on the submit node connect, with authentication, to the Schedd on a separate Schedd node; Collector/Negotiator; Startd]

http://research.cs.wisc.edu/condor/manual/v7.6/condor_submit.html

http://research.cs.wisc.edu/condor/manual/v7.6/condor_transfer_data.html
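To make the remote submission workflow concrete, here is a minimal Python sketch that assembles the three command lines a user would typically run: submit, monitor, then fetch the output. The host name, file name and cluster id are placeholders invented for the example. In practice the cluster id is reported by condor_submit rather than known up front, and the use of -name with condor_transfer_data should be checked against the manual page linked above.

```python
def remote_submit_commands(schedd_name, submit_file, cluster_id):
    """Return argv lists for a submit / monitor / fetch-output cycle.

    schedd_name, submit_file and cluster_id are caller-supplied
    placeholders; nothing here contacts a real Schedd.
    """
    cluster = str(cluster_id)
    return [
        # Spool the job and its input files to the remote Schedd.
        ["condor_submit", "-remote", schedd_name, submit_file],
        # Poll the remote queue; there is no local user log to watch.
        ["condor_q", "-name", schedd_name, cluster],
        # Pull the output sandbox back once the job has completed.
        ["condor_transfer_data", "-name", schedd_name, cluster],
    ]
```

Each list can be passed as-is to `subprocess.run(argv, check=True)`; the sketch deliberately stops short of running anything.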

So, what's the problem?

● No local user log file
  ● Must use condor_q to monitor progress
    ● Annoying at best
    ● High monitoring load
    ● And it does not work with DAGMan
● Fully Condor-based user authentication
  ● While rich, not what users expect (e.g. no user/password)
  ● Hard to tie into campus-wide auth
● Staged input data not shared
  ● Could be a problem with large datasets

Condor-C

● Based on the Grid paradigm
  ● Submit locally, then delegate to remote Schedd
● Still running a daemon process
  ● But requires no incoming connections
    ● Secure
    ● Laptop friendly

[Diagram: condor_submit and condor_q talk to a local Schedd on the submit node, which delegates, with authentication, to the Schedd on the Schedd node; Collector/Negotiator; Startd]

http://research.cs.wisc.edu/condor/manual/v7.6/5_3Grid_Universe.html#sec:Condor-C

What are the drawbacks?

● Awkward syntax
  ● At least compared to Vanilla universe
  ● See the Condor manual for examples
● Has scalability problems
  ● Could likely be improved, but this is the current state-of-the-art
● Fully Condor-based user authentication
● Staged input data not shared
  (both same as remote submissions)

Can be mitigated with Job Router (but adds another layer of complexity)
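To give a feel for the syntax difference mentioned above, the Python sketch below renders a plain Vanilla universe submit description next to a Condor-C one. The attribute names (`grid_resource = condor ...`, `+remote_jobuniverse = 5`) follow the Condor-C examples in the Condor manual as best recalled here; treat them as an assumption and check the manual for the exact syntax. Function names, host names and file names are hypothetical.

```python
def vanilla_submit(executable):
    """Submit description for a local Vanilla universe job."""
    return "\n".join([
        "universe = vanilla",
        f"executable = {executable}",
        "log = job.log",
        "queue",
    ])


def condor_c_submit(executable, remote_schedd, remote_collector):
    """Submit description delegating the same job through Condor-C.

    remote_schedd and remote_collector name the Schedd that will
    finally run the job and the Collector it reports to.
    """
    return "\n".join([
        "universe = grid",
        f"executable = {executable}",
        "log = job.log",
        f"grid_resource = condor {remote_schedd} {remote_collector}",
        # Universe the job should have on the remote side (5 = vanilla).
        "+remote_jobuniverse = 5",
        "queue",
    ])
```

Even in this stripped-down form, the user has to name two remote services and express the real universe indirectly, which is the kind of overhead the slide calls awkward.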

Introducing RemoteCondor

What's the big idea?

● Let the users log in to a remote machine
  ● And run the cmdline tools there

True client approach

Advantages:
● True local Condor experience
● Standard system authentication and authorization
  ● No admin privileges for the users
● Trust based on “central” Schedd admin skills
● Can regulate and transform Condor submissions

● Minimize security risk
● Central handling
● Familiar to users

No exceptions

Big deal! Where's the news?

● … while preserving the local look-and-feel
● RemoteCondor provides
  ● Wrappers around major Condor cmdline tools
  ● Integration with sshfs

https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor

RemoteCondor wrappers

● Provide wrappers that use ssh under the hood
  ● Users (almost) unaware of the trick
● But may be prompted for a password
  ● Works best with public key authentication

[Diagram: the condor_submit and condor_q wrappers on the submit node connect, with authentication, to sshd on the Schedd node, where the real condor_submit and condor_q talk to the Schedd; Collector/Negotiator; Startd]
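The wrapper idea can be sketched in a few lines of Python. This is not the RemoteCondor implementation, only an illustration of the technique under stated assumptions: the function name, user name and host name are made up, and the only thing taken from the slides is that a local command is turned into the same command run over ssh on the Schedd node.

```python
import shlex


def build_ssh_command(user, host, tool, args):
    """Build the argv that runs a Condor tool on the Schedd node via ssh.

    ssh hands the remote command to a shell on the far side, so every
    argument is quoted to survive that extra round of parsing.
    """
    remote_cmd = " ".join(shlex.quote(part) for part in [tool, *args])
    return ["ssh", f"{user}@{host}", remote_cmd]
```

A wrapper script named `condor_q` could call this with its own arguments and pass the result to `subprocess.run`. With public key authentication in place the user would see no password prompt, which is why the slide recommends it.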

RemoteCondor and sshfs

● But being able to talk to Condor is not enough
  ● Users must be able to create and read data!
● Using sshfs solves the problem
  ● Schedd-local disk mounted on submit node (disk local to Schedd for maximum performance)
  ● Using ssh as a tunnel
  ● All in user space (FUSE)
● RemoteCondor will properly convert paths (within certain limits)

http://fuse.sourceforge.net/sshfs.html
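The two pieces described above, mounting the Schedd-local disk and converting paths, might look roughly like the following Python sketch. The function names and the example directories are hypothetical, and the conversion rule (strip the local mount point, prepend the remote directory) is an assumption about what "properly convert paths" means; the slides do not spell out RemoteCondor's actual rules or limits.

```python
from pathlib import PurePosixPath


def sshfs_mount_command(user, host, remote_dir, mount_point):
    """argv for mounting a remote directory locally: sshfs user@host:dir mountpoint."""
    return ["sshfs", f"{user}@{host}:{remote_dir}", mount_point]


def to_remote_path(local_path, mount_point, remote_dir):
    """Map a path under the local sshfs mount to its path on the Schedd node.

    Raises ValueError when local_path is not below mount_point, since
    such a file would not be visible on the Schedd node at all.
    """
    relative = PurePosixPath(local_path).relative_to(mount_point)
    return str(PurePosixPath(remote_dir) / relative)
```

A wrapper would apply `to_remote_path` to file arguments such as the submit file before handing the command to ssh, so that the remote tool sees a path that exists on its own disk.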

[Diagram: the real disk on the Schedd node is exported through sshd and mounted via sshfs on the submit node, with authentication; Schedd; Collector/Negotiator; Startd]

Using RemoteCondor

● Distributed in the Condor src tarball
  ● In the Contrib section
● Requires a “make install”
  ● To put the proper files in place
● Plus minimal configuration
  ● Where is the remote Schedd node?
  ● What username to use?
  ● Where to mount the sshfs partition?

https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=RemoteCondor
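The three configuration questions above map naturally onto three settings. The Python sketch below parses them from a simple `key = value` text; the file format and the key names (`schedd_host`, `username`, `mount_point`) are invented for illustration and are not RemoteCondor's real configuration syntax, which is documented on the wiki page above.

```python
from dataclasses import dataclass

REQUIRED_KEYS = ("schedd_host", "username", "mount_point")


@dataclass(frozen=True)
class RemoteCondorSettings:
    """Hypothetical container for the three answers a user must supply."""
    schedd_host: str   # where is the remote Schedd node?
    username: str      # what username to use?
    mount_point: str   # where to mount the sshfs partition?


def parse_settings(text):
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'key = value' line: {line!r}")
        values[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if k not in values]
    if missing:
        raise ValueError(f"missing settings: {missing}")
    return RemoteCondorSettings(*(values[k] for k in REQUIRED_KEYS))
```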

Summary

● Traditional Condor not suitable for user machines
● Keeping Schedd nodes professionally maintained highly desirable
  ● To minimize security risks and control job flow
● RemoteCondor allows this operation mode while preserving the local look-and-feel
  ● Requires minimal local install

Acknowledgements

This work is partially sponsored by
● the US National Science Foundation under Grants No. OCI-0943725 (STCI) and PHY-0612805 (CMS Maintenance & Operations), and
● the US Department of Energy under Grant No. DE-FC02-06ER41436 subcontract No. 647F290 (OSG).