TaskFlow Y! + HP brownbag

TaskFlow and OpenStack

Joshua Harlow, Yahoo!

Min Pae, HP

https://wiki.openstack.org/TaskFlow

http://openstack.org/

● Joshua Harlow
  ○ Yahoo! dev. for ~7 years
  ○ OpenStack dev. for ~3.5 years
  ○ Master Trouble-maker
  ○ Oslo, kazoo, anvil, taskflow, cloudinit… more …

● Min Pae
  ○ HP dev. for ~7 months
  ○ OpenStack dev. for ~7 months
  ○ Lead spell checker
  ○ Cue, taskflow, automaton… period ...

Who are we

- Distributed systems are complex
  - Scale out, resumption, resiliency, HA, visibility into active work … are not easily solvable problems (some learn this the hard way)

- Understanding your states and workflows (and managing, transitioning and running them) is key to solving many of these complex problems

The problem

- Declarative workflows

- Persisted execution state (checkpoints)

- Automatic migration of workflows/jobs

- Horizontal scalability

- Magic!

Taskflow does ...

- Atom (task and retry execution units)

- Flow (composition unit)

- Engine (work execution <-> persistence)

- Job / Jobboard (work discovery/ownership unit)

- Conductor (‘conducts’ automated discovery/ownership, flow construction and execution)

Taskflow is ...

- Execution unit

- Has
  - dependencies (“requires”)
  - data (“provides”)

- Defines
  - execute(...) - business logic
  - revert(...) - exception handler

Taskflow - Atom:Task

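import sys
import time

from taskflow import task

TAKE_DOWN_DELAY = 0.5  # seconds; illustrative value, not given on the slide


class TakeABottleDown(task.Task):

    def execute(self, bottles_left):
        # Business logic: announce the step, pause, return the new count.
        sys.stdout.write('Take one down, ')
        sys.stdout.flush()
        time.sleep(TAKE_DOWN_DELAY)
        return bottles_left - 1

    def revert(self, **kwargs):
        …


class PassItAround(task.Task):
    …


class Conclusion(task.Task):
    ...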
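The execute/revert contract can be sketched without the library. This is a minimal, standard-library-only illustration, not TaskFlow's implementation: `Step` and `run_linear` are hypothetical names, and the choice to revert the failing step along with the completed ones is an assumption of the sketch.

```python
class Step:
    """Hypothetical stand-in for a task: a name plus execute() and revert()."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail

    def execute(self, log):
        if self.fail:
            raise RuntimeError(self.name + " failed")
        log.append("execute " + self.name)

    def revert(self, log):
        log.append("revert " + self.name)


def run_linear(steps):
    """Run steps in order; on failure, revert the failed step and every
    completed step in reverse order (an assumption of this sketch)."""
    log = []
    done = []
    for step in steps:
        try:
            step.execute(log)
        except RuntimeError:
            for previous in reversed(done + [step]):
                previous.revert(log)
            break
        done.append(step)
    return log


log = run_linear([Step("take"), Step("pass"), Step("conclude", fail=True)])
print(log)
```

The point of the split is that each unit carries its own undo logic, so whatever drives the units can unwind partial work without knowing what the work was.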

- Controls retry semantics of associated flow (and subflows and …)

- Has
  - dependencies (“requires”)
  - data (“provides”)

- Defines
  - execute(...) - business logic
  - revert(...) - exception handler
  - on_failure(...) - decision maker that affects retry semantics

Taskflow - Atom:Retry

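from taskflow import retry


class Retry1(retry.Retry):

    def execute(self, param1):
        print(param1)
        return param1 + ' printed'

    def revert(self, **kwargs):
        print("reverting...")

    def on_failure(self, **kwargs):
        # NOTE: self.attempts is assumed to be a counter maintained
        # elsewhere in this class (not shown on the slide).
        if self.attempts < 5:
            return retry.RETRY
        else:
            return retry.REVERT_ALL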
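How an on_failure decision steers execution can be sketched as a plain loop. This is a simplified, library-free illustration: `RETRY` and `REVERT_ALL` here are local string constants, and `UpToFive` and `run_with_retry` are hypothetical names, not TaskFlow classes.

```python
RETRY = "RETRY"
REVERT_ALL = "REVERT_ALL"


class UpToFive:
    """Hypothetical retry controller: allow up to five attempts in total."""

    def __init__(self):
        self.attempts = 0

    def on_failure(self):
        return RETRY if self.attempts < 5 else REVERT_ALL


def run_with_retry(work, controller):
    """Call work() until it succeeds or the controller stops retrying."""
    while True:
        controller.attempts += 1
        try:
            return ("SUCCESS", work())
        except Exception:
            if controller.on_failure() != RETRY:
                return ("REVERTED", None)


calls = []


def flaky():
    """Fail twice, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transient failure")
    return "ok"


outcome = run_with_retry(flaky, UpToFive())
print(outcome, len(calls))
```

The work itself never decides whether to try again; that policy lives in the controller and can be swapped without touching the work.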

- Composition of Tasks

- Defines transitions between Tasks

- Allows implicit and explicit dependencies

- Required methods(?)
  - add(...) - add (and link) task(s), flow(s)
  - iter_links(...) - iterator over the created links (links are created during add)

Taskflow - Flow

from taskflow.patterns import linear_flow

s = linear_flow.Flow('bottle-song')

take_bottle = TakeABottleDown(...)

pass_it = PassItAround(...)

next_bottles = Conclusion(...)

s.add(take_bottle, pass_it, next_bottles)
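What "links are created during add" means for a linear pattern can be sketched as follows. `LinearFlow` is a hypothetical stand-in written for illustration; the shape of the links it yields (plain pairs) is an assumption and may differ from what the library returns.

```python
class LinearFlow:
    """Hypothetical linear composition: each added item is linked to the
    item added just before it."""

    def __init__(self, name):
        self.name = name
        self._items = []
        self._links = []

    def add(self, *items):
        for item in items:
            if self._items:
                # Link the previous item to the new one as it is added.
                self._links.append((self._items[-1], item))
            self._items.append(item)
        return self

    def iter_links(self):
        return iter(self._links)

    def __iter__(self):
        return iter(self._items)


song = LinearFlow("bottle-song").add("take_bottle", "pass_it", "next_bottles")
links = list(song.iter_links())
print(links)
```

Other patterns would differ only in which links `add` creates: an unordered pattern would create none, while a graph pattern would derive them from declared dependencies.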

- Run flows (and associated tasks) to completion
  - Decompose flows into a DAG
    - Edge dependencies mandated by flow(s) patterns are always retained
  - Prepare persistence layer
  - Run tasks/retries as they are ready
    - Optionally in parallel (and/or remotely)...
  - Save and fetch results from persistence layer and run next tasks/retries (and repeat)

- State machine based:
  - http://docs.openstack.org/developer/taskflow/states.html#engine

Taskflow - Engine(s)
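The engine loop described above (decompose into a DAG, run what is ready, save results, repeat) can be sketched with Python's standard `graphlib` module (Python 3.9+). This is an illustrative approximation, not the engine's actual code: `run_engine`, the `tasks`/`requires` dictionaries, and the plain `dict` standing in for the persistence layer are all assumptions.

```python
from graphlib import TopologicalSorter


def run_engine(tasks, requires, storage):
    """Run each task once its dependencies are done, saving results.

    tasks:    name -> function taking the results of its dependencies
    requires: name -> list of dependency names
    storage:  dict standing in for the persistence layer
    """
    order = []
    sorter = TopologicalSorter({name: requires.get(name, []) for name in tasks})
    sorter.prepare()
    while sorter.is_active():
        for name in sorter.get_ready():
            if name not in storage:  # skip work that already has a saved result
                args = [storage[dep] for dep in requires.get(name, [])]
                storage[name] = tasks[name](*args)
                order.append(name)
            sorter.done(name)
    return order


tasks = {
    "bottles": lambda: 99,
    "take_down": lambda bottles: bottles - 1,
    "conclusion": lambda left: "%d bottles of beer on the wall" % left,
}
requires = {"take_down": ["bottles"], "conclusion": ["take_down"]}

storage = {}
order = run_engine(tasks, requires, storage)
print(order, storage["conclusion"])
```

Because results are looked up in `storage` before anything runs, starting the same function with a partly filled `storage` continues from where the saved results end.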

http://docs.openstack.org/developer/taskflow/states.html#engine


- Place where work can be placed by producer entities and consumed/owned (and worked on) by other consumer entities

- Similar to a job queue but builds in liveness semantics/capabilities (and semantics expect single ownership via a claim concept)
  - If an owner of a unit of work dies, the claim on the work they are performing is automatically lost and freed up for others

- Typically tied to a unit of work (being a flow) and its optional persistence location (so that prior work can be resumed)

Taskflow - Job(s) & Jobboard(s)
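Single ownership with liveness can be sketched as an in-memory board where a claim lapses once its owner stops heartbeating. Everything here is hypothetical and for illustration only: the `JobBoard` class, the heartbeat/TTL mechanism, and the explicit `now` argument (used to keep the example deterministic) are assumptions, not how the real jobboard backends work.

```python
class JobBoard:
    """Hypothetical in-memory board: a job has at most one owner, and a
    claim lapses when its owner stops heartbeating."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.jobs = {}        # job name -> details
        self.claims = {}      # job name -> owner
        self.heartbeats = {}  # owner -> time of last heartbeat

    def post(self, name, details):
        self.jobs[name] = details

    def heartbeat(self, owner, now):
        self.heartbeats[owner] = now

    def _expire(self, now):
        for name, owner in list(self.claims.items()):
            if now - self.heartbeats.get(owner, float("-inf")) > self.ttl:
                del self.claims[name]  # owner presumed dead; free the job

    def claim(self, name, owner, now):
        self._expire(now)
        if name not in self.jobs or name in self.claims:
            return False
        self.claims[name] = owner
        self.heartbeat(owner, now)
        return True

    def consume(self, name, owner):
        if self.claims.get(name) != owner:
            raise ValueError("only the current owner may remove a job")
        del self.claims[name]
        del self.jobs[name]


board = JobBoard(ttl=10)
board.post("resize-vm", {"flow": "resize"})
first = board.claim("resize-vm", "conductor-1", now=0)    # claimed
second = board.claim("resize-vm", "conductor-2", now=5)   # refused, still owned
third = board.claim("resize-vm", "conductor-2", now=20)   # conductor-1 went silent
print(first, second, third)
```

Unlike a plain queue, the job stays on the board while it is being worked on, which is what lets another consumer pick it up if the owner disappears.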

● Essentially an advanced/specialized job processor
  - Connects to a jobboard
  - Periodically fetches contents of jobboard
  - Attempts to claim a job
  - Constructs job's work (flow, other...)
  - Performs job's work (using engines of various types and persistence backends to enable reliability)
  - Removes job (on completion)
  - (rinse and repeat)

● Expected to be scaled out (run as many conductors as needed/desired)

Taskflow - Conductor
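One pass of the conductor cycle listed above might look like the sketch below. It is a deliberately small, single-process illustration: `conduct` and its dictionary arguments are hypothetical, and claiming through `dict.setdefault` is only a stand-in for a real claim, which would have to be atomic across processes.

```python
def conduct(name, jobs, claims, flows):
    """One pass of a hypothetical conductor over a shared board.

    jobs:   job name -> (flow name, arguments), shared between conductors
    claims: job name -> owning conductor, shared between conductors
    flows:  flow name -> callable that performs the job's work
    """
    finished = []
    for job in sorted(jobs):                      # fetch the board's contents
        if claims.setdefault(job, name) != name:  # try to claim; skip if owned
            continue
        flow_name, args = jobs[job]               # construct the job's work
        flows[flow_name](*args)                   # perform it
        del jobs[job]                             # remove the job on completion
        del claims[job]
        finished.append(job)
    return finished


sung = []
flows = {"sing": lambda bottles: sung.append(bottles)}
jobs = {"job-1": ("sing", (99,)), "job-2": ("sing", (98,))}
claims = {"job-2": "conductor-b"}  # already owned by another conductor

done = conduct("conductor-a", jobs, claims, flows)
print(done, sung)
```

Scaling out then amounts to running more copies of this loop against the same board; the claim step keeps them from duplicating work.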

Why would u want this?

- Jobs and Jobboards provide work ownership and work discovery
  - Horizontal scaling via conductors
  - Automatic migration of work between conductors

- Persistence of execution state enables resumption and automated ownership transfer
  - When a conductor fails, job(s) in progress are picked up (and resumed from the last checkpoint) by the next worker that frees up, no need to wait for the worker to come back.

- Turn your software off safely and handle failures gracefully!

Wherefore Taskflow?
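Resumption from a checkpoint can be illustrated with a JSON file standing in for the persistence backend. This is a sketch under stated assumptions: `run_steps`, the step names, and the `crash_after` switch used to simulate a dying worker are all hypothetical.

```python
import json
import os
import tempfile


def run_steps(steps, checkpoint_path, crash_after=None):
    """Run named steps in order, saving each result to a JSON checkpoint."""
    results = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            results = json.load(fh)
    executed = []
    for name, func in steps:
        if name in results:
            continue  # finished before the crash; do not redo it
        if crash_after is not None and len(executed) == crash_after:
            raise RuntimeError("simulated conductor crash")
        results[name] = func(results)
        executed.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(results, fh)  # checkpoint after every step
    return executed, results


steps = [
    ("allocate", lambda r: "vol-1"),
    ("attach", lambda r: r["allocate"] + ":attached"),
    ("notify", lambda r: "sent for " + r["attach"]),
]

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run_steps(steps, path, crash_after=1)   # first worker dies after one step
except RuntimeError:
    pass
executed, results = run_steps(steps, path)  # second worker resumes
print(executed)
```

The second call never re-runs `allocate`; it reads the saved result and continues, which is the behaviour the slide describes as resuming from the last checkpoint.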

- Declarative definition of work
  - Decouples what (Task, Flow) from how (Engine)
  - Coroutines are not separable from the surrounding code, and cannot be automatically parallelized

- Separation of declaration and execution allows flexibility in execution strategy
  - Engine tracks execution state and transitions
  - Parallel (green)threaded execution…
  - Remote worker execution…

Wherefore Taskflow? (cont.)
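The payoff of separating declaration from execution can be shown with two interchangeable runners over one declaration. This is a standard-library sketch; `run_serial`, `run_parallel`, and the `work` dictionary are illustrative names and not part of TaskFlow.

```python
from concurrent.futures import ThreadPoolExecutor


def run_serial(work):
    """Execute each unit one after another."""
    return {name: func() for name, func in work.items()}


def run_parallel(work, max_workers=4):
    """Execute the same units on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(func) for name, func in work.items()}
        return {name: future.result() for name, future in futures.items()}


# The declaration: three independent units of work, with no mention of threads.
work = {
    "a": lambda: 1 + 1,
    "b": lambda: 2 * 3,
    "c": lambda: 10 - 4,
}

serial = run_serial(work)
parallel = run_parallel(work)
print(serial == parallel)
```

Since `work` says nothing about how it is run, switching strategy is a change at the call site only, which is the flexibility the slide attributes to engines.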

- Not strongly tied to Python as a language (for better or worse); concepts are easily transferable to Java/Go/…

- À la carte: use what you want
  - Use the basics until you are ready to use jobboards, or select a local engine until you are ready to run remote workers…


Notifications

Remote task workers

Dynamic flow modification

Real time dashboard of atom/flow/job transitions (WIP)

Applications that can be paused

DDOS your favorite site (joke)

The potential is nearly limitless!!


http://docs.openstack.org/developer/taskflow/notifications.html

http://docs.openstack.org/developer/taskflow/workers.html#high-level-architecture

https://github.com/rackerlabs/canary

?? Questions ??

- High level (overview)
  - https://wiki.openstack.org/wiki/TaskFlow
  - https://wiki.openstack.org/wiki/TaskFlow#Big_picture

- Developer oriented (more detail)
  - http://docs.openstack.org/developer/taskflow/

- Extreme!! developer oriented (ultra detail)
  - Freenode
    - #openstack-state-management
    - #openstack-oslo
  - ML: openstack-dev (http://lists.openstack.org/pipermail/openstack-dev/)

- Moar examples!

More information!

https://wiki.openstack.org/wiki/TaskFlow#Big_picture


http://docs.openstack.org/developer/taskflow/

http://freenode.net/

http://lists.openstack.org/pipermail/openstack-dev/

