Dynamic DAGMan with ClassAds
Himani Apte
Condor Team Member
Computer Sciences Department
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
Outline
› DAGMan workflow management
› Motivation for dynamic DAGMan
› ClassAds
› Putting together: DAGMan + ClassAds
› Looking ahead
DAGMan
› Directed Acyclic Graph Manager
› Meta-scheduler for Condor
› DAG: set of jobs with dependencies
› Manages submission of DAG jobs
› Enforces execution order
› DAGMan itself is a Condor job!
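Example DAG
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
Parent A Child B C
Parent B C Child D
Script PRE A input.sh
Script POST D output.sh
(Diagram: A at the top; B and C below it as children of A; D at the bottom as the child of both B and C.)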
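To show how the example file pins down the execution order, here is a minimal Python sketch (not DAGMan's own code) that parses the Job/Parent/Child/Script lines and derives one valid order with Kahn's topological sort. The parsing rules and the helper names `parse_dag` and `execution_order` are assumptions for illustration.

```python
from collections import defaultdict, deque

DAG_TEXT = """\
Job A A.condor
Job B B.condor
Job C C.condor
Job D D.condor
Parent A Child B C
Parent B C Child D
Script PRE A input.sh
Script POST D output.sh
"""

def parse_dag(text):
    """Return (jobs, children, scripts) from a simplified DAG description."""
    jobs, scripts = {}, {}
    children = defaultdict(set)
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        keyword = tokens[0].upper()  # keywords treated case-insensitively here
        if keyword == "JOB":
            jobs[tokens[1]] = tokens[2]  # node name -> submit file
        elif keyword == "PARENT":
            split = [t.upper() for t in tokens].index("CHILD")
            for parent in tokens[1:split]:
                children[parent].update(tokens[split + 1:])
        elif keyword == "SCRIPT":
            # Script PRE|POST <node> <script>
            scripts[(tokens[1].upper(), tokens[2])] = tokens[3]
    return jobs, children, scripts

def execution_order(jobs, children):
    """Kahn's algorithm: a node becomes ready once all of its parents are done."""
    pending = {name: 0 for name in jobs}
    for kids in children.values():
        for kid in kids:
            pending[kid] += 1
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for kid in sorted(children[node]):
            pending[kid] -= 1
            if pending[kid] == 0:
                ready.append(kid)
    if len(order) != len(jobs):
        raise ValueError("cycle detected: not a DAG")
    return order

jobs, children, scripts = parse_dag(DAG_TEXT)
print(execution_order(jobs, children))  # ['A', 'B', 'C', 'D']
print(scripts)  # {('PRE', 'A'): 'input.sh', ('POST', 'D'): 'output.sh'}
```

The linear order is only one valid serialization: once A is done, B and C have no dependency on each other and can be submitted concurrently.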
Simplified state diagram of a DAG node
(Diagram: node states Waiting, Pre-running, Submitted, Post-running, Done and Failed.)
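The node life cycle can be modeled as a table of allowed state transitions. The slide shows only the state names, so the transitions below are an assumption: a node without a PRE script goes straight from Waiting to Submitted, a node without a POST script goes from Submitted to Done, and a failing script or job moves the node to Failed.

```python
from enum import Enum

class NodeState(Enum):
    WAITING = "Waiting"
    PRE_RUNNING = "Pre-running"
    SUBMITTED = "Submitted"
    POST_RUNNING = "Post-running"
    DONE = "Done"
    FAILED = "Failed"

# Assumed transitions; only the state names come from the slide.
TRANSITIONS = {
    NodeState.WAITING: {NodeState.PRE_RUNNING, NodeState.SUBMITTED},
    NodeState.PRE_RUNNING: {NodeState.SUBMITTED, NodeState.FAILED},
    NodeState.SUBMITTED: {NodeState.POST_RUNNING, NodeState.DONE, NodeState.FAILED},
    NodeState.POST_RUNNING: {NodeState.DONE, NodeState.FAILED},
    NodeState.DONE: set(),
    NodeState.FAILED: set(),
}

class Node:
    def __init__(self, name):
        self.name = name
        self.state = NodeState.WAITING

    def advance(self, new_state):
        """Move to new_state, rejecting transitions not in the table."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state

a = Node("A")  # node A in the example DAG has a PRE script
for step in (NodeState.PRE_RUNNING, NodeState.SUBMITTED, NodeState.DONE):
    a.advance(step)
print(a.state.value)  # Done
```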
DAGMan: important properties
› Monitors job state using Condor logs
› Simple and clean recovery model
• Rescue DAG: saves state at failure
• Restart: reconstruct internal state
› Scripts allow “lazy” planning
› Throttling parameters
Motivation for dynamic DAGMan
› DAG: specifies the complete execution order in advance
› Flexibility to make run-time decisions
• Which subset of DAG nodes should execute?
• When should node X execute?
› Conditional DAGs
• Associate a condition with DAG edges
• Simplest condition: successful completion of parent nodes
Conditional DAG: examples
Example 1: node A has children B and C, with the condition A.x == true; if yes, B runs; if no, C runs.
Example 2: nodes P1 and P2 are both parents of C, with the condition P1.x OR P2.x.
Motivation for dynamic DAGMan
› Scripts can be leveraged for lazy planning
• For simple conditions, e.g. the exit value of a job
• To modify the DAG structure, e.g. convert the branch not taken into a no-op (empty) node
› We want a generic solution
› Supported by “Dynamic DAGMan”
ClassAds
› Classified advertisements
› Used extensively in Condor
• Define jobs, machines, resources
• Define conditions, triggers, requirements
• Maintain internal state
ClassAds
› List of attribute-value pairs
• Simple value types: integers, strings
• Complex types: lists, expressions, ClassAds
› Matchmaking framework
• Tests for a match between two ClassAds
• Uses the “Requirements” expression
› Great fit for Dynamic DAGMan: node conditions can be written as Requirements expressions
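Two-way matchmaking can be sketched in a few lines of Python. This is a simplification, not the ClassAd library: an ad is a dict, the Requirements expression is a Python callable taking (my, other), a missing Requirements is assumed to accept anything, and a reference to a missing attribute is treated as "no match" as a stand-in for ClassAd UNDEFINED. The attribute values are hypothetical.

```python
from typing import Any, Callable, Dict

# Simplified model: attribute -> value; "Requirements" is a callable (my, other) -> bool.
ClassAd = Dict[str, Any]

def requirements_hold(my: ClassAd, other: ClassAd) -> bool:
    """Evaluate my Requirements with 'other' bound to the candidate ad."""
    req: Callable[[ClassAd, ClassAd], Any] = my.get("Requirements", lambda m, o: True)
    try:
        return req(my, other) is True
    except KeyError:
        # Missing attribute: treated as "no match" (stand-in for UNDEFINED).
        return False

def match(a: ClassAd, b: ClassAd) -> bool:
    """Two-way match: each ad's Requirements must accept the other ad."""
    return requirements_hold(a, b) and requirements_hold(b, a)

job = {
    "Owner": "alice",  # hypothetical values
    "ImageSize": 512,
    "Requirements": lambda my, other: other["OpSys"] == "LINUX"
    and other["Memory"] >= my["ImageSize"],
}
machine = {
    "OpSys": "LINUX",
    "Memory": 1024,
    "Requirements": lambda my, other: other["Owner"] != "guest",
}
small_machine = {"OpSys": "LINUX", "Memory": 256}

print(match(job, machine))        # True
print(match(job, small_machine))  # False: 256 < 512
```

The same pattern carries over to Dynamic DAGMan: a node's condition plays the role of Requirements, and a parent node's ad plays the role of the other ad.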
Putting together: DAGMan + ClassAds
› Dynamic DAGMan research project
• Work in progress
• Not yet available in Condor
› DAG nodes have associated ClassAds
› Basic node attributes
• Job identifier, name, type
• Status (Waiting, Submitted, Done, etc.)
Dynamic DAGMan: attributes
› Execution characteristics of the job
• Exit value
• Wall-clock time
• CPU utilization (local and remote)
• Network statistics (bytes sent / received)
• Information about files transferred (for vanilla universe)
› Attributes maintained by Condor for a job
Dynamic DAGMan: conditions
› Requirements expression
• Defines the trigger condition for the node
• Arbitrarily complex expression
• Defined on the attributes of parent nodes
› Use matchmaking to determine if a node can be submitted
Dynamic DAGMan: example
(Diagram: node A with the condition x == true; if yes, B runs; if no, C runs.)
Job A A.condor
Job B B.condor
Job C C.condor
Parent A Child B \
COND [ ( other.job == A &&
other.x == true ) ]
Parent A Child C \
COND [ ( other.job == A &&
other.x == false ) ]
Dynamic DAGMan: example
Job P1 P1.condor
Job P2 P2.condor
Job C C.condor
Parent P1 P2 Child C \
COND [ (other.job == P1 &&
other.x == true) ||
(other.job == P2 &&
other.x == true) ]
(Diagram: P1 and P2 are both parents of C, with the condition P1.x OR P2.x.)
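The slides do not spell out how a COND expression is evaluated across several parents. One plausible rule, sketched in Python as an assumption: wait until every parent is Done, then evaluate COND against each parent's ad (bound as `other`) and submit the child if at least one parent satisfies it. Under that rule, the Example 2 expression reduces to P1.x OR P2.x. The attribute `x`, the ad contents and the name `can_submit` are hypothetical.

```python
from typing import Any, Callable, Dict, List

ClassAd = Dict[str, Any]
Cond = Callable[[ClassAd], bool]  # stands in for COND [ ... ]; the argument is "other"

def can_submit(cond: Cond, parent_ads: List[ClassAd]) -> bool:
    """Assumed rule: all parents Done, then any parent ad satisfies COND."""
    if not all(ad.get("status") == "Done" for ad in parent_ads):
        return False  # still waiting on at least one parent
    return any(cond(other) for other in parent_ads)

# Example 1: branch on A.x. "is True"/"is False" mirror "== true"/"== false";
# a missing x (None) satisfies neither branch.
cond_b: Cond = lambda other: other.get("job") == "A" and other.get("x") is True
cond_c: Cond = lambda other: other.get("job") == "A" and other.get("x") is False

ad_a = {"job": "A", "status": "Done", "x": True}
print(can_submit(cond_b, [ad_a]), can_submit(cond_c, [ad_a]))  # True False

# Example 2: C runs if P1.x OR P2.x.
cond_c2: Cond = lambda other: (
    (other.get("job") == "P1" and other.get("x") is True)
    or (other.get("job") == "P2" and other.get("x") is True)
)

ad_p1 = {"job": "P1", "status": "Done", "x": False}
ad_p2 = {"job": "P2", "status": "Done", "x": True}
ad_p2_false = {"job": "P2", "status": "Done", "x": False}
print(can_submit(cond_c2, [ad_p1, ad_p2]))        # True
print(can_submit(cond_c2, [ad_p1, ad_p2_false]))  # False
```

Waiting for all parents is a conservative choice; an OR condition could in principle fire as soon as one satisfying parent finishes.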
Dynamic DAGMan
› Recovery model is still the same
• Rescue DAG: saves node state at failure
• ClassAd attribute values can be regenerated from Condor logs
› Flexibility to make run-time decisions
• Which subset of nodes in the DAG should be executed?
• When should node X be executed?
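Because attribute values come from Condor's job logs, a restarted DAGMan could rebuild them by re-reading those logs. Here is a minimal Python sketch that recovers exit values, assuming the classic text user-log format, where a "005 ... Job terminated." event header is followed by a line such as `(1) Normal termination (return value 0)`. The sample log, the cluster-to-node mapping and the `exit_value` attribute name are hypothetical.

```python
import re
from typing import Dict

# Hypothetical excerpt in the style of the classic Condor user log.
SAMPLE_LOG = """\
000 (101.000.000) 10/15 14:20:01 Job submitted from host: <10.0.0.1:9618>
...
005 (101.000.000) 10/15 14:25:42 Job terminated.
    (1) Normal termination (return value 0)
...
005 (102.000.000) 10/15 14:31:07 Job terminated.
    (1) Normal termination (return value 3)
...
"""

EVENT_HEADER = re.compile(r"^(\d{3}) \((\d+)\.\d+\.\d+\)")
NORMAL_EXIT = re.compile(r"\(1\) Normal termination \(return value (-?\d+)\)")

def exit_values(log_text: str) -> Dict[int, int]:
    """Map cluster id -> return value for jobs that terminated normally."""
    results: Dict[int, int] = {}
    terminated = None  # cluster id of the most recent 005 event, if any
    for line in log_text.splitlines():
        header = EVENT_HEADER.match(line)
        if header:
            code, cluster = header.group(1), int(header.group(2))
            terminated = cluster if code == "005" else None
            continue
        if terminated is not None:
            found = NORMAL_EXIT.search(line)
            if found:
                results[terminated] = int(found.group(1))
                terminated = None
    return results

# Hypothetical mapping from Condor cluster id to DAG node name.
cluster_to_node = {101: "A", 102: "B"}
node_ads = {node: {"job": node} for node in cluster_to_node.values()}
for cluster, value in exit_values(SAMPLE_LOG).items():
    node_ads[cluster_to_node[cluster]]["exit_value"] = value
print(node_ads)  # {'A': {'job': 'A', 'exit_value': 0}, 'B': {'job': 'B', 'exit_value': 3}}
```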
Looking ahead
› DAG with only implicit edges
• Parent-child relations embedded in ClassAds
• Nodes specify a trigger condition and preferences for which child nodes should run
• On-the-fly dependency formation based on previous node execution
› DAGMan works with Quill
• Retrieve node attributes from persistent storage
Looking ahead
› Allow a job to modify or add its own attributes
• Determine what happens after the job exits
› Global state control
• Throttling expressions/parameters
› Global DAG ClassAd
• Statistics on running, successful and failed jobs
• E.g. if (#failed jobs > N) run a cleanup node
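The global DAG ClassAd idea can be sketched as counters aggregated over the node states, plus a trigger evaluated on them. In this hypothetical Python sketch, the attribute names `NumRunning`, `NumDone` and `NumFailed`, the helpers `summarize` and `should_run_cleanup`, and the choice to count Submitted nodes as running are all illustrative assumptions; N is a user-chosen threshold.

```python
from collections import Counter
from typing import Dict

def summarize(node_states: Dict[str, str]) -> Dict[str, int]:
    """Build a global DAG-level ad from per-node states (hypothetical attribute names)."""
    counts = Counter(node_states.values())  # missing states count as 0
    return {
        "NumRunning": counts["Submitted"],
        "NumDone": counts["Done"],
        "NumFailed": counts["Failed"],
    }

def should_run_cleanup(dag_ad: Dict[str, int], max_failed: int) -> bool:
    """Trigger: if (#failed jobs > N) run the cleanup node."""
    return dag_ad["NumFailed"] > max_failed

states = {"A": "Done", "B": "Failed", "C": "Failed", "D": "Submitted"}
dag_ad = summarize(states)
print(dag_ad)  # {'NumRunning': 1, 'NumDone': 1, 'NumFailed': 2}
print(should_run_cleanup(dag_ad, max_failed=1))  # True
```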
Thank you
We welcome your suggestions!