The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis...

86
The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others

Transcript of The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis...

Page 1: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

The Haxl Project at Facebook

Simon MarlowJon Coens

Louis BrandyJon Purdy& others

Page 2: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

BusinessLogic

service API

Databases

Other back-end services

Page 3: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Use case: fighting spam

BusinessLogic

Databases

Other back-end services

www (PHP)

Is this thing spam?

YES/NO

Page 4: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Use case: fighting spam

BusinessLogic

Databases

Other back-end services

www (PHP)

Site-integrity engineers

push new rules hundreds of times per day

Page 5: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Data dependencies in a computation database

thrift

memcache

Page 6: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Code wants to be structured hierarchically

• abstraction• modularity

database

thrift

memcache

Page 7: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Code wants to be structured hierarchically

• abstraction• modularity

database

thrift

memcache

Page 8: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Code wants to be structured hierarchically

• abstraction• modularity

database

thrift

memcache

Page 9: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Code wants to be structured hierarchically

• abstraction• modularity

database

thrift

memcache

Page 10: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Execution wants to be structured horizontally

• Overlap multiple requests• Batch requests to the same data source• Cache multiple requests for the same data

database

thrift

memcache

Page 11: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Furthermore, each data source has different characteristics• Batch request API?

• Sync or async API?

• Set up a new connection for each request, or keep a pool of connections around?

• Want to abstract away from all of this in the business logic layer

Page 12: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

But we know how to do this!

Page 13: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

But we know how to do this!

• Concurrency.• Threads let us keep our abstractions & modularity while

executing things at the same time.

• Caching/batching can be implemented as a service in the process• as we do with the IO manager in GHC

Page 14: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

But we know how to do this!

• Concurrency.• Threads let us keep our abstractions & modularity while

executing things at the same time.

• Caching/batching can be implemented as a service in the process• as we do with the IO manager in GHC

• But concurrency (the programing model) isn’t what we want here.

Page 15: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

But we know how to do this!

• Concurrency.• Threads let us keep our abstractions & modularity while

executing things at the same time.

• Caching/batching can be implemented as a service in the process• as we do with the IO manager in GHC

• But concurrency (the programing model) isn’t what we want here.

• Example...

Page 16: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• x and y are Facebook users

• suppose we want to compute the number of friends that x and y have in common

• simplest way to write this:

length (intersect (friendsOf x) (friendsOf y))

Page 17: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Brief detour: TAO• TAO implements Facebook’s data model

• most important data source we need to deal with

• Data is a graph• Nodes are “objects”, identified by 64-bit ID• Edges are “assocs” (directed; a pair of 64-bit IDs)

• Objects and assocs have a type• object fields determined by the type

• Basic operations:• Get the object with a given ID• Get the assocs of a given type from a given ID

User A

User B

User C

User D

FRIENDS

Page 18: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Back to our example

length (intersect (friendsOf x) (friendsOf y))

• (friendsOf x) makes a request to TAO to get all the IDs for which there is an assoc of type FRIEND (x,_).

• TAO has a multi-get API; very important that we submit (friendsOf x) and (friendsOf y) as a single operation.

Page 19: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Using concurrency

• This:

length (intersect (friendsOf x) (friendsOf y))

Page 20: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Using concurrency

• This:

• Becomes this:

length (intersect (friendsOf x) (friendsOf y))

dom1 <- newEmptyMVarm2 <- newEmptyMVarforkIO (friendsOf x >>= putMVar m1)forkIO (friendsOf y >>= putMVar m2)fx <- takeMVar m1fy <- takeMVar m2return (length (intersect fx fy))

Page 21: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end
Page 22: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Using the async package:

doax <- async (friendsOf x)ay <- async (friendsOf y)fx <- wait axfy <- wait ayreturn (length (intersect fx fy))

Page 23: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end
Page 24: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Using Control.Concurrent.Async.concurrently:

do(fx,fy) <- concurrently (friendsOf x) (friendsOf y)return (length (intersect fx fy))

Page 25: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end
Page 26: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Why not concurrency?

• friendsOf x and friendsOf y are • obviously independent• obviously both needed• “pure”

Page 27: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Why not concurrency?

• friendsOf x and friendsOf y are • obviously independent• obviously both needed• “pure”

• Caching is not just an optimisation:• if friendsOf x is requested twice, we must get the same

answer both times• caching is a requirement

Page 28: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Why not concurrency?

• friendsOf x and friendsOf y are • obviously independent• obviously both needed• “pure”

• Caching is not just an optimisation:• if friendsOf x is requested twice, we must get the same

answer both times• caching is a requirement

• we don’t want the programmer to have to ask for concurrency here

Page 29: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Could we use unsafePerformIO?

• we could do caching this way, but not concurrency. Execution will stop at the first data fetch.

length (intersect (friendsOf x) (friendsOf y))

friendsOf = unsafePerformIO ( .. )

Page 30: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Central problem

• Reorder execution of an expression to perform data fetching optimally.

• The programming model has no side effects (other than reading)

Page 31: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

What we would like to do:

• explore the expression along all branches to get a set of data fetches

Page 32: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

What we would like to do:

• submit the data fetches

Page 33: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

What we would like to do:

• wait for the responses

Page 34: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

What we would like to do:

• now the computation is unblocked along multiple paths• ... explore again• collect the next batch of data fetches• and so on

Round 0

Round 1

Round 2

Page 35: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Facebook’s existing solution to this problem: FXL

• Lets you write

• And optimises the data fetching correctly.

• But it’s an interpreter, and works with an explicit representation of the computation graph.

Length(Intersect(FriendsOf(X),FriendsOf(Y)))

Page 36: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• We want to run compiled code for efficiency

• And take advantage of Haskell• high quality implementation

• great libraries for writing business logic etc.

• So, how can we implement the right data fetching behaviour in a Haskell DSL?

Page 37: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Start with a concurrency monad

newtype Haxl a = Haxl { unHaxl :: Result a }

data Result a = Done a| Blocked (Haxl a)

instance Monad Haxl wherereturn a = Haxl (Done a)m >>= k = Haxl $case unHaxl m of

Done a -> unHaxl (k a)Blocked r -> Blocked (r >>= k)

Page 38: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Old idea:• Scholz 1995 A Concurrency Monad Based on Constructor

Primitives

• Claessen’s A Poor Man’s Concurrency Monad (JFP 1999)

• Called the “Resumption Monad” in Harrison’s Cheap (But Functional) Threads (JFP 2004)

• Used to good effect in Li & Zdancewic’s PLDI’07 paper Combining events and threads for scalable network services implementation and evaluation of monadic, application-level concurrency primitives

• Nowadays it’s a Free Monad

Page 39: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• The concurrency monad lets us run a computation until it blocks, do something, then resume it

• But we need to know what it blocked on...

• Could add some info to the Blocked constructor

Page 40: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

newtype Haxl a = Haxl { unHaxl :: IO (Result a) }

data Result a = Done a| Blocked (Haxl a)

instance Monad Haxl wherereturn a = Haxl (return (Done a))m >>= k = Haxl $ doa <- unHaxl mcase a of

Done a -> unHaxl (k a)Blocked r -> return (Blocked (r >>= k))

dataFetch :: Request a -> Haxl adataFetch r = doaddRequest rHaxl (return (Blocked (Haxl (getResult r))))

• We choose to put it in IO instead:

Page 41: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Ok so far, but we still get blocked at the first data fetch.

numCommonFriends x y = dofx <- friendsOf xfy <- friendsOf yreturn (length (intersect fx fy))

Blocked here

Page 42: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• To explore multiple branches, we need to use Applicative

instance Applicative Haxl wherepure = returnHaxl f <*> Haxl a = Haxl $ dor <- fcase r ofDone f' -> dora <- a case ra ofDone a' -> return (Done (f' a'))Blocked a' -> return (Blocked (f' <$> a'))

Blocked f' -> dora <- acase ra ofDone a' -> return (Blocked (f' <*> return a'))Blocked a' -> return (Blocked (f' <*> a'))

Page 43: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• This is precisely the advantage of Applicative over Monad: • Applicative allows exploration of the structure of the

computation

• Our example is now written:

• Or:

numCommonFriends x y = length <$> (intersect <$> friendsOf x <*> friendsOf y)

numCommonFriends x y = length <$> common (friendsOf x) (friendsOf y)where common = liftA2 intersect

Page 44: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Note that we still have the Monad!

• The Monad allows us to make decisions based on values when we need to.

• Batching will not explore the then/else branches• exactly what we want.

dofs <- friendsOf xif simon `elem` fs

then ...else ...

Blocked here

Page 45: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• But it does mean the programmer should use Applicative composition to get batching.

• This is suboptimal:

• So our plan is to guide programmers away from this by using a combination of:• APIs that discourage it (e.g. providing lifted operations)

• Code analysis and automatic feedback

• Profiling

dofx <- friendsOf xfy <- friendsOf yreturn (length (intersect fx fy))

Page 46: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• We really want bulk operations to benefit from batching.

• But this doesn’t work: mapM uses Monad rather than Applicative composition.

• This is why traverse exists:

• So in our library, we make mapM = traverse

• Also: sequence = sequenceA

traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)

friendsOfFriends id = concat <$> (mapM friendsOf =<< friendsOf id)

Page 47: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

---- Return the names of all the friends of the user that are-- members of the Functional Programming group.--fpFriends :: Id -> Haxl [Text]fpFriends id = do

fs <- friendsOf idfp <- filterM (memberOfGroup functionalProgramming) fsmapM getName fp

Page 48: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

---- Return the names of all the friends of the user that are-- members of the Functional Programming group.--fpFriends :: Id -> Haxl [Text]fpFriends id = do

fs <- friendsOf idfp <- filterM (memberOfGroup functionalProgramming) fsmapM getName fp

friendsOf

memberOfGroup

getName

Page 49: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

•But we need a different filterM. The standard one:

filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]filterM _ [] = return []filterM p (x:xs) = do

flg <- p xys <- filterM p xsreturn (if flg then x:ys else ys)

Page 50: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

•But we need a different filterM. The standard one:

•and the one that batches the data fetches correctly:

filterM :: (Applicative f, Monad f) => (a -> f Bool) -> [a] -> f [a]filterM pred xs = dobools <- mapM pred xsreturn [ x | (x,True) <- zip xs bools ]

filterM :: (Monad m) => (a -> m Bool) -> [a] -> m [a]filterM _ [] = return []filterM p (x:xs) = do

flg <- p xys <- filterM p xsreturn (if flg then x:ys else ys)

Page 51: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Let’s write the same example in a slightly different way:

nameOfFPFriend :: Id -> Haxl (Maybe Text)nameOfFPFriend id = dob <- memberOfGroup functionalProgramming idif b then Just <$> getName id

else return Nothing

fpFriends2 :: Id -> Haxl [Text]fpFriends2 id = dofs <- friendsOf idname_maybes <- mapM nameOfFPFriend fsreturn (catMaybes name_maybes)

Page 52: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

friendsOf

memberOfGroup

getName

•Before there were obviously 3 rounds.

Page 53: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

friendsOf

memberOfGroup

getName

•Before there were obviously 3 rounds.•But in the new version, we packaged up two fetches in nameOfFPFriend

nameOfFPFriend

Page 54: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

friendsOf

memberOfGroup

getName

•Before there were obviously 3 rounds.•But in the new version, we packaged up two fetches in nameOfFPFriend•The monad still executes everything in 3 rounds.

nameOfFPFriend

Page 55: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

•But maybe we don’t like all this mapM/filterM stuff

Page 56: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

•But maybe we don’t like all this mapM/filterM stuff•A query-style API might be nicer.

Page 57: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

•But maybe we don’t like all this mapM/filterM stuff•A query-style API might be nicer.

newtype HaxlQuery a = HaxlQuery { query :: Haxl [a] }-- basically ListT Haxl, but ListT doesn’t work!-- instances of Monad, MonadPlus

selectFrom :: Haxl [a] -> HaxlQuery aselectFrom = HaxlQuery

liftH :: Haxl a -> HaxlQuery aliftH m = HaxlQuery (return <$> m)

suchThat :: Haxl Bool -> HaxlQuery ()suchThat m = liftH m >>= guard

Page 58: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

•But maybe we don’t like all this mapM/filterM stuff•A query-style API might be nicer.

fpFriends3 :: Id -> Haxl [Text]fpFriends3 id = query [ name | f <- selectFrom (friendsOf id),

_ <- suchThat (memberOfFPGroup f)name <- liftH (getName f) ]

newtype HaxlQuery a = HaxlQuery { query :: Haxl [a] }-- basically ListT Haxl, but ListT doesn’t work!-- instances of Monad, MonadPlus

selectFrom :: Haxl [a] -> HaxlQuery aselectFrom = HaxlQuery

liftH :: Haxl a -> HaxlQuery aliftH m = HaxlQuery (return <$> m)

suchThat :: Haxl Bool -> HaxlQuery ()suchThat m = liftH m >>= guard

Page 59: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

DSL tricks

instance Num a => Num (Haxl a) where(+) = liftA2 (+)(-) = liftA2 (-)-- etc.

simonFriends id =filterM (isCalled “Simon”) =<< friendsOf id

numCoolFriends id = length <$> fpFriends id + length <$> simonFriends id

duplicate data fetching between the two arguments

Page 60: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Implementation

• DataSource abstraction

• Replaying requests

• Scaling

• Hot-code swapping

• Experience

• Status etc.

Page 61: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Data Source Abstraction

• We want to structure the system like this:

• Core code includes the monad, caching support etc.

• Core is generic: no data sources built-in

Core

TAO

Memcache

other service...

Data sources

Page 62: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

How do we arrange this?

• Three ways that a data source interacts with core:• issuing a data fetch request

• persistent state

• fetching the data

• Package this up in a type class

• Let’s look at requests first...

class DataSource req where...

parameterised over the type of

requests

Page 63: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Example Request type

• Core has a single way to issue a request

• Note how the result type matches up.

data ExampleReq a whereCountAardvarks :: String -> ExampleReq IntListWombats :: Id -> ExampleReq [Id]deriving Typeable

it’s a GADT, where the type parameter is the type of the result of this request

dataFetch :: DataSource req => req a -> Haxl a

Page 64: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Fetching requests

• It is Core’s job to keep track of requests submitted via dataFetch

• When the computation is blocked, we have to fetch the batch of requests from the data source

class DataSource req wherefetch :: [BlockedFetch req] -> IO ()

data BlockedFetch req= forall a . BlockedFetch (req a) (MVar a)

Page 65: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

How do we store the requests?

• Core doesn’t know which data sources are in use

• Need to use dynamic typing (Typeable) to implement the store of pending requests and the cache.

class (Typeable1 req, Hashable1 req, Eq1 req) => DataSource req where ...

class Eq1 req whereeq1 :: req a -> req a -> Bool

unfortunate that we need this

Page 66: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Data Source State

• Core keeps track of the state for each data source

• Passes it to fetch

class (Typeable1 req, Hashable1 req, Eq1 req) => DataSource req where

data DataState req

fetch :: DataState req-> [BlockedFetch req]-> IO ()

The state for a data source is an associated datatype

Page 67: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

• Clean data source abstraction

• Means that we can plug in any set of data sources at runtime• e.g. mock data sources for testing

• core code can be built & tested independently

Page 68: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Replayability

• The Haxl monad and the type system give us:• Guarantee of no side effects, except via dataFetch

• Guarantee that everything is cached

• The ability to replay requests...

Page 69: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

user code

Page 70: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

user code Haxl Core

Page 71: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

user code Haxl Core data sources

Page 72: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

user code Haxl Core data sources

Cache

Page 73: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

user code Haxl Core data sources

Cache

Page 74: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

user code Haxl Core data sources

Cache

Page 75: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

user code Haxl Core data sources

Cache

Page 76: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

user code Haxl Core data sources

Cache

• The data sources change over time

• But if we persist the cache, we can re-run the user code and get the same results

• Great for • testing• fault diagnosis• profiling

Page 77: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Scaling• Each server has lots of cores, pounded by requests

from other boxes constantly.

Page 78: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Hot code swapping

• 1-2K machines, new code pushed many times per day

• Use GHC’s built-in linker• Had to modify it to unload code

• GC detects when it is safe to release old code

• We can swap in new code while requests are still running on the old code

Page 79: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Haskell Extensions?

Page 80: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Haskell Extensions?

• Other people on the team are Haskell beginners

Page 81: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Haskell Extensions?

• Other people on the team are Haskell beginners

• Goal: keep the code simple, use few extensions

Page 82: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Haskell Extensions?

• Other people on the team are Haskell beginners

• Goal: keep the code simple, use few extensions

{-# LANGUAGE ExistentialQuantification,TypeFamilies,GADTs,RankNTypes,ScopedTypeVariables,DeriveDataTypeable,StandaloneDeriving,MultiParamTypeClasses,FlexibleInstances

#-}

Page 83: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Haskell Extensions?

• Other people on the team are Haskell beginners

• Goal: keep the code simple, use few extensions

• But we’re doing ok.

{-# LANGUAGE ExistentialQuantification,TypeFamilies,GADTs,RankNTypes,ScopedTypeVariables,DeriveDataTypeable,StandaloneDeriving,MultiParamTypeClasses,FlexibleInstances

#-}

Page 84: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Status

• Prototyped most features (including hot code swapping & scaling)

• Core is written

• We have a few data sources, more in the works

• Busy hooking it up to the infrastructure

• Can play around with the system in GHCi, including data sources

Page 85: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Key points

• Haxl Monad:• Implicit concurrency with Applicative

• Automatic batching of data fetches

• Haskell’s type system guarantees the user can’t break the rules• ... and gives us guaranteed replayability

• Clean separation of core from datasources

• Scaling & hot-code swapping

Page 86: The Haxl Project at Facebook - Haskell · The Haxl Project at Facebook Simon Marlow Jon Coens Louis Brandy Jon Purdy & others. Business Logic service API Databases Other back-end

Questions?