TaskFlow Y! + HP brownbag
Transcript of TaskFlow Y! + HP brownbag
TaskFlow and OpenStack
Joshua Harlow (Yahoo!)
Min Pae (HP)
● Joshua Harlow
  ○ Yahoo! dev. for ~7 years
  ○ OpenStack dev. for ~3.5 years
  ○ Master Trouble-maker
  ○ Oslo, kazoo, anvil, taskflow, cloudinit… more …
● Min Pae
  ○ HP dev. for ~7 months
  ○ OpenStack dev. for ~7 months
  ○ Lead spell checker
  ○ Cue, taskflow, automaton… period ...
Who are we
- Distributed systems are complex
  - Scale out, resumption, resiliency, HA, visibility into active work … are not easily solvable problems (some learn this the hard way)
- Understanding your states and workflows (and managing, transitioning and running them) is key to solving many of these complex problems
The problem
- Declarative workflows
- Persisted execution state (checkpoints)
- Automatic migration of workflows/jobs
- Horizontal scalability
- Magic!
Taskflow does ...
- Atom (task and retry execution units)
- Flow (composition unit)
- Engine (work execution <-> persistence)
- Job / Jobboard (work discovery/ownership unit)
- Conductor (‘conducts’ automated discovery/ownership, flow construction and execution)
Taskflow is ...
- Execution unit
- Has
  - dependencies (“requires”)
  - data (“provides”)
- Defines
  - execute(...) - business logic
  - revert(...) - exception handler
Taskflow - Atom:Task
import sys
import time

from taskflow import task


class TakeABottleDown(task.Task):
    def execute(self, bottles_left):
        sys.stdout.write('Take one down, ')
        sys.stdout.flush()
        # TAKE_DOWN_DELAY is a delay in seconds, defined elsewhere.
        time.sleep(TAKE_DOWN_DELAY)
        return bottles_left - 1

    def revert(self, **kwargs): …

class PassItAround(task.Task): …

class Conclusion(task.Task): ...
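To make the execute/revert pairing concrete, here is a minimal sketch in plain Python. It does not use the TaskFlow library: the `Task` class, the `run_all` helper and the `log` list are all hypothetical names invented for this illustration, and the revert policy shown (undo the failing task, then the completed ones in reverse order) is an assumption of the toy rather than a statement of what any engine does.

```python
class Task:
    """Toy stand-in for a task atom (NOT taskflow.task.Task)."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail

    def execute(self, log):
        # Business logic; here it either records itself or blows up.
        if self.fail:
            raise RuntimeError(f"{self.name} failed")
        log.append(f"execute {self.name}")

    def revert(self, log):
        # Exception handler; here it only records that it was called.
        log.append(f"revert {self.name}")


def run_all(tasks):
    """Run tasks in order; on failure, revert in reverse order and stop."""
    log, done = [], []
    for t in tasks:
        try:
            t.execute(log)
            done.append(t)
        except RuntimeError:
            # Assumption of this toy: the failing task is reverted first,
            # then every task that had already completed, newest first.
            t.revert(log)
            for prev in reversed(done):
                prev.revert(log)
            break
    return log


log = run_all([Task("take_down"), Task("pass_around", fail=True), Task("conclude")])
print(log)
```

The third task never runs, and the log shows the undo steps happening in the opposite order from the work.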
- Controls retry semantics of associated flow (and subflows and …)
- Has
  - dependencies (“requires”)
  - data (“provides”)
- Defines
  - execute(...) - business logic
  - revert(...) - exception handler
  - on_failure(...) - decision maker that affects retry semantics
Taskflow - Atom:Retry
class Retry1(retry.Retry):
    def execute(self, param1):
        print(param1)
        return param1 + ' printed'

    def revert(self, **kwargs):
        print("reverting...")

    def on_failure(self, **kwargs):
        if self.attempts < 5:
            return retry.RETRY
        else:
            return retry.REVERT_ALL
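The decision-maker role of on_failure(...) can be sketched without the library. In the toy below, `LimitedRetry`, `run_with_retry`, `flaky` and the two string constants are hypothetical names made up for this example; only the idea (a controller that answers "retry" or "revert everything" after each failure) comes from the slide.

```python
RETRY, REVERT_ALL = "RETRY", "REVERT_ALL"  # toy constants, not taskflow's


class LimitedRetry:
    """Toy retry controller: allow a fixed number of attempts."""

    def __init__(self, max_attempts):
        self.max_attempts = max_attempts
        self.attempts = 0

    def on_failure(self):
        # Decide what should happen after the most recent failure.
        return RETRY if self.attempts < self.max_attempts else REVERT_ALL


def run_with_retry(controller, work):
    """Call work(attempt_number) until it succeeds or the controller gives up."""
    while True:
        controller.attempts += 1
        try:
            return work(controller.attempts)
        except RuntimeError:
            if controller.on_failure() == REVERT_ALL:
                raise  # give up; a real engine would revert here


def flaky(attempt):
    # Hypothetical unit of work that only succeeds on its third attempt.
    if attempt < 3:
        raise RuntimeError("not yet")
    return f"succeeded on attempt {attempt}"


result = run_with_retry(LimitedRetry(5), flaky)
print(result)
```

Keeping the decision in the controller rather than in the work itself means the same work can be rerun under a different policy without being edited.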
- Composition of Tasks
- Defines transitions between Tasks
- Allows implicit and explicit dependencies
- Required methods(?)
  - add(...) - add (and link) task(s), flow(s)
  - iter_links(...) - iterator over the created links (links are created during add)
Taskflow - Flow
s = linear_flow.Flow('bottle-song')
take_bottle = TakeABottleDown(...)
pass_it = PassItAround(...)
next_bottles = Conclusion(...)
s.add(take_bottle, pass_it, next_bottles)
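As a rough picture of "links are created during add", the following plain-Python sketch models a linear flow. `LinearFlow` here is a hypothetical toy class, not `taskflow.patterns.linear_flow.Flow`, items are plain strings rather than tasks, and links are simplified to (from, to) pairs.

```python
class LinearFlow:
    """Toy linear flow: each added item is linked to the previous one."""

    def __init__(self, name):
        self.name = name
        self._items = []
        self._links = []

    def add(self, *items):
        for item in items:
            if self._items:
                # Linear pattern: the new item depends on the last one added.
                self._links.append((self._items[-1], item))
            self._items.append(item)
        return self  # allow chaining

    def iter_links(self):
        # Iterate over the links that add() created.
        return iter(self._links)


flow = LinearFlow("bottle-song").add("take_bottle", "pass_it", "next_bottles")
links = list(flow.iter_links())
print(links)
```

Other patterns would differ only in which links add() creates; an unordered pattern, for instance, would create none.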
- Run flows (and associated tasks) to completion
- Decompose flows into a DAG
  - Edge dependencies mandated by flow(s) patterns are always retained
- Prepare persistence layer
- Run tasks/retries as they are ready
  - Optionally in parallel (and/or remotely)...
- Save and fetch results from persistence layer and run next tasks/retries (and repeat)
- State machine based:
  - http://docs.openstack.org/developer/taskflow/states.html#engine
Taskflow - Engine(s)
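The engine loop described on this slide (DAG, run what is ready, store results, repeat) can be approximated with the standard library's `graphlib` (Python 3.9+). This is a sketch under assumptions: `run_engine`, the `tasks`/`deps` dictionaries and the in-memory `storage` dict standing in for a persistence layer are all invented for the example, and everything runs serially.

```python
from graphlib import TopologicalSorter


def run_engine(tasks, deps, storage):
    """Run callables in dependency order, saving each result in storage.

    tasks: name -> callable taking the storage dict
    deps:  name -> set of names that must finish first
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # validates the graph (raises on cycles)
    order = []
    while sorter.is_active():
        # Everything returned by get_ready() could run in parallel;
        # this toy simply runs the ready tasks one after another.
        for name in sorter.get_ready():
            storage[name] = tasks[name](storage)  # save the result
            order.append(name)
            sorter.done(name)  # unblocks successors
    return order


tasks = {
    "take_down": lambda s: 99 - 1,
    "pass_around": lambda s: s["take_down"],
    "conclude": lambda s: f"{s['pass_around']} bottles left",
}
deps = {
    "take_down": set(),
    "pass_around": {"take_down"},
    "conclude": {"pass_around"},
}
storage = {}
order = run_engine(tasks, deps, storage)
print(order, storage["conclude"])
```

Because tasks only communicate through storage, swapping the serial inner loop for a thread pool would not require touching the tasks themselves.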
- Place where work can be placed by producer entities and consumed/owned (and worked on) by other consumer entities
- Similar to a job queue but builds in liveness semantics/capabilities (and semantics expect single ownership via a claim concept)
  - If an owner of a unit of work dies, the claim on the work they are performing is automatically lost and freed up for others
- Typically tied to a unit of work (being a flow) and its optional persistence location (so that prior work can be resumed)
Taskflow - Job(s) & Jobboard(s)
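The claim-with-liveness idea can be illustrated with a lease that expires unless renewed. This is a stand-in technique chosen for the sketch, not a description of how any particular jobboard backend tracks liveness; `JobBoard`, its methods and the explicit `now` argument (used to keep the example deterministic) are hypothetical.

```python
class JobBoard:
    """Toy jobboard: single ownership via claims that expire."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self._claims = {}  # job name -> (owner, expiry time) or None

    def post(self, job):
        self._claims[job] = None  # posted and unclaimed

    def claim(self, job, owner, now):
        current = self._claims[job]
        if current is not None and current[1] > now:
            return False  # somebody else holds a live claim
        self._claims[job] = (owner, now + self.lease_seconds)
        return True

    def heartbeat(self, job, owner, now):
        # A live owner keeps extending its claim; a dead one cannot.
        current = self._claims[job]
        if current is None or current[0] != owner or current[1] <= now:
            return False
        self._claims[job] = (owner, now + self.lease_seconds)
        return True


board = JobBoard(lease_seconds=10)
board.post("resize-vm")
first = board.claim("resize-vm", "conductor-a", now=0)    # a takes the job
blocked = board.claim("resize-vm", "conductor-b", now=5)  # a's claim is live
# conductor-a "dies" (stops heartbeating); its claim lapses at t=10.
taken_over = board.claim("resize-vm", "conductor-b", now=11)
print(first, blocked, taken_over)
```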
● Essentially an advanced/specialized job processor
  - Connects to a jobboard
  - Periodically fetches contents of jobboard
  - Attempts to claim a job
  - Constructs job's work (flow, other...)
  - Performs job's work (using engines of various types and persistence backends to enable reliability)
  - Removes job (on completion)
  - (rinse and repeat)
● Expected to be scaled out (run as many conductors as needed/desired)
Taskflow - Conductor
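One pass of the conductor loop listed above might look like the sketch below. Everything here is hypothetical scaffolding: the board is a plain dict mapping job name to owner (or None when unclaimed), `builders` maps a job name to a factory that constructs its work, and `conduct` is a made-up function name.

```python
def conduct(name, board, builders, log):
    """One pass over the board: claim, construct, perform, remove."""
    for job in list(board):            # fetch the board's contents
        if board[job] is not None:
            continue                   # already claimed by someone else
        board[job] = name              # claim the job
        work = builders[job]()         # construct the job's work
        log.append((name, job, work()))  # perform it and record the result
        del board[job]                 # remove the job on completion


board = {"song": None, "busy": "conductor-b"}
builders = {"song": lambda: (lambda: "sung")}
log = []
conduct("conductor-a", board, builders, log)
print(log, board)
```

A real conductor would repeat this pass periodically; scaling out then amounts to starting more processes that run the same loop against the same board.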
Why would you want this?
- Jobs and Jobboards provide work ownership and work discovery
  - Horizontal scaling via conductors
- Automatic migration of work between conductors
  - Persistence of execution state enables resumption and automated ownership transfer
- When a conductor fails, job(s) in progress are picked up (and resumed from the last checkpoint) by the next worker that frees up; no need to wait for the failed worker to come back.
- Turn your software off safely and handle failures gracefully!
Wherefore Taskflow?
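Resuming from the last checkpoint can be sketched with a JSON file playing the part of the persistence layer. The function `run_resumable`, the `crash_after` knob used to simulate a failure, and the step names are all assumptions made for this illustration.

```python
import json
import os
import tempfile


def run_resumable(steps, path, crash_after=None):
    """Run (name, func) steps, checkpointing each result to a JSON file."""
    state = {}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)       # pick up prior results
    executed = []
    for name, func in steps:
        if name in state:
            continue                   # finished in an earlier run
        if crash_after is not None and len(executed) == crash_after:
            raise RuntimeError("simulated crash")
        state[name] = func(state)
        executed.append(name)
        with open(path, "w") as f:
            json.dump(state, f)        # checkpoint after every step
    return executed, state


steps = [
    ("a", lambda s: 1),
    ("b", lambda s: s["a"] + 1),
    ("c", lambda s: s["b"] + 1),
]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_resumable(steps, path, crash_after=2)   # "worker 1" dies before c
except RuntimeError:
    pass

executed, state = run_resumable(steps, path)    # "worker 2" resumes
print(executed, state)
```

The second run executes only the step that had not been checkpointed, which is the property that lets another worker take over without redoing finished work.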
- Declarative definition of work
  - Decouples what (Task, Flow) from how (Engine)
  - Coroutines are not separable from the surrounding code, and cannot be automatically parallelized
- Separation of declaration and execution allows flexibility in execution strategy
  - Engine tracks execution state and transitions
  - Parallel (green)threaded execution…
  - Remote worker execution…
Wherefore Taskflow? (cont.)
- Not strongly tied to Python as a language (for better or worse); concepts are easily transferable to Java/Go/….
- À la carte: use what you want
  - Use the basics until you are ready to use jobboards, or select a local engine until you are ready to run remote workers…
Wherefore Taskflow? (cont.)
Notifications
Remote task workers
Dynamic flow modification
Real time dashboard of atom/flow/job transitions (WIP)
Applications that can be paused
DDOS your favorite site (joke)
The potential is nearly limitless!!
Wherefore Taskflow? (cont.)
DEMO
?? Questions ??
- High level (overview)
  - https://wiki.openstack.org/wiki/TaskFlow
  - https://wiki.openstack.org/wiki/TaskFlow#Big_picture
- Developer oriented (more detail)
  - http://docs.openstack.org/developer/taskflow/
- Extreme!! developer oriented (ultra detail)
  - Freenode
    - #openstack-state-management
    - #openstack-oslo
  - ML: [email protected]
- Moar examples!
More information!