Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Page 1: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Patterns of the Lambda Architecture

Truth and Lies at the Edge of Scale

Page 2: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pattern Set

Page 3: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Tradeoff Rules

PICK ANY TWO

Page 4: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Architecture

Page 5: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Search w/ Update

[Diagram: A Ton of Text, Build Indexes, Historical Index; More Text, Live Indexer, Recent Index; API]

Page 6: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Search w/ Update

[Diagram: A Ton of Text, Build Indexes, Historical Index; More Text, Live Indexer, Recent Index; API]

Main Index

Page 7: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Search w/ Update

[Diagram: A Ton of Text, Build Indexes, Historical Index; More Text, Live Indexer, Recent Index; API]

Updates Index

Page 8: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Search w/ Update

[Diagram: A Ton of Text, Build Indexes, Historical Index; More Text, Live Indexer, Recent Index; API]

Serve Result

Page 9: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Architecture

[Diagram: A Ton of Text, Build Indexes, Historical Index; More Text, Live Indexer, Recent Index; API]

Batch

Speed

Serving

Page 10: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Recommender

[Diagram: Visitor:Product, Train Recommender, Visitor ⇒ History, History ⇒ Alsobuy; Visitor:Product History, Fetch/Update History, Update Recommendation, Visitor ⇒ Alsobuy; Webserver]

Page 11: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Recommender

[Diagram: Visitor:Product, Train Recommender, Visitor ⇒ History, History ⇒ Alsobuy; Visitor:Product History, Fetch/Update History, Update Recommendation, Visitor ⇒ Alsobuy; Webserver]

Build Model

Page 12: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Recommender

[Diagram: Visitor:Product, Train Recommender, Visitor ⇒ History, History ⇒ Alsobuy; Visitor:Product History, Fetch/Update History, Update Recommendation, Visitor ⇒ Alsobuy; Webserver]

Applies Model

Page 13: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Recommender

[Diagram: Visitor:Product, Train Recommender, Visitor ⇒ History, History ⇒ Alsobuy; Visitor:Product History, Fetch/Update History, Update Recommendation, Visitor ⇒ Alsobuy; Webserver]

Serves Result

Page 14: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Recommender

[Diagram: Visitor:Product, Train Recommender, Visitor ⇒ History, History ⇒ Alsobuy; Visitor:Product History, Fetch/Update History, Update Recommendation, Visitor ⇒ Alsobuy; Webserver]

Batch

Speed

Serving

Page 15: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Arch Layers

• Batch layer -- Deep Global Truth -- throughput

• Speed layer -- Relevant Local Truth -- throughput

• Serving layer -- Rapid Retrieval -- latency

Page 16: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Arch: Technology

• Batch layer -- Hadoop, Spark, Batch DB Reports

• Speed layer -- Storm+Trident, Spark Streaming, Samza, AMQP, …

• Serving layer -- Web APIs, Static Assets, RPC, …

Page 17: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Architecture

Batch

Speed Serving

Page 18: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Architecture

Batch

Speed Serving

λ

Page 19: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Architecture

λ(v)

• Pure Function on immutable data

Page 20: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Two Big Ideas

• Fine-grained control over architectural tradeoffs

• Truth lives at the edge, not the middle

Page 21: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Ideal Data System

Page 22: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Ideal Data System

• Capacity -- Can process arbitrarily large amounts of data

• Affordability -- Cheap to run

• Simplicity -- Easy to build, maintain, debug

• Resilience -- Jobs/Processes fail & restart gracefully

• Justification -- Incorporates all relevant data into result

• Comprehensiveness -- Answer questions about any subject

• Accuracy -- Few approximations or avoidable errors

• Responsiveness -- Low latency for delivering results

• Recency -- Promptly incorporates changes in world

Page 23: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Patterns

Page 24: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Recommender

[Diagram: Visitor:Product, Train Recommender, Visitor ⇒ History, History ⇒ Alsobuy; Visitor:Product History, Fetch/Update History, Update Recommendation, Visitor ⇒ Alsobuy; Webserver]

Page 25: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pattern: Train / React

• Model of the world lets you make immediate decisions

• World changes slowly, so we can re-build model at leisure

• Relax: Recency

• Batch layer: Train a machine learning model

• Speed layer: Apply that model

• Examples: most Machine Learning thingies
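As a rough illustration of Train / React, the sketch below trains an "also buy" model from visitor histories in a batch step and applies it on the request path. The co-occurrence counting, the function names, and the data shapes are assumptions made for this example; the slides do not say how the recommender is trained.

```python
from collections import Counter, defaultdict
from itertools import permutations


def train_also_buy(histories, top_n=3):
    """Batch layer: scan every visitor's history and build a
    product -> [also-bought products] model. Slow, but run at leisure."""
    co_counts = defaultdict(Counter)
    for products in histories.values():
        # sorted() keeps the pair order, and so tie-breaking, deterministic
        for a, b in permutations(sorted(set(products)), 2):
            co_counts[a][b] += 1
    return {
        product: [other for other, _ in counts.most_common(top_n)]
        for product, counts in co_counts.items()
    }


def recommend(model, recent_products):
    """Speed layer: apply the (possibly stale) model to what the visitor
    just did. No training happens on the request path."""
    seen = set(recent_products)
    suggestions = []
    for product in recent_products:
        for other in model.get(product, []):
            if other not in seen and other not in suggestions:
                suggestions.append(other)
    return suggestions
```

The relaxed property is Recency: `recommend` keeps answering from the last trained model until the next batch run replaces it.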

Page 26: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Search w/ Update

[Diagram: A Ton of Text, Build Indexes, Historical Index; More Text, Live Indexer, Recent Index; API]

Page 27: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pattern: Baseline / Delta

• Understanding the world takes a long time

• World changes much faster than that, and you care

• Relax: Simplicity, Accuracy

• Batch layer: Process the entire world

• Speed layer: Handle any changes since last big run

• Examples: Real-time Search index; Count Distinct; other Approximate Stream Algorithms
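A minimal sketch of Baseline / Delta for the search example: a historical index built by a full batch pass, a small recent index fed by live updates, and a query that reads both. The class and method names are hypothetical, and the whitespace tokenizer is a deliberate simplification.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


class SearchService:
    def __init__(self, historical_docs):
        # Batch layer: expensive full build over the whole corpus.
        self.historical = build_index(historical_docs)
        # Speed layer: only documents seen since the last full build.
        self.recent = defaultdict(set)

    def add_document(self, doc_id, text):
        """Live indexer: make new text searchable right away."""
        for word in text.lower().split():
            self.recent[word].add(doc_id)

    def search(self, word):
        """Serving layer: baseline plus delta."""
        word = word.lower()
        return self.historical.get(word, set()) | self.recent.get(word, set())

    def rebuild(self, all_docs):
        """Next batch run: fold the delta into a fresh baseline."""
        self.historical = build_index(all_docs)
        self.recent = defaultdict(set)
```

The cost shows up as the relaxed Simplicity: there are two indexes to keep in agreement, and the delta is only as good as the cheaper indexing the speed layer can afford.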

Page 28: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pagerank

[Figure: graph whose nodes are labeled with Pagerank scores]

Page 29: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Guess Using Neighbors

[Figure: the Pagerank graph with an added node marked “?”]

Page 30: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Guess Using Neighbors

12÷3 = 4

24÷5 ≈ 5

9

[Figure: the Pagerank graph with the added node's score guessed from its neighbors]

Page 31: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

…But Don’t Fix Neighbors

meh

[Figure: the Pagerank graph; the neighbors' scores are left unchanged]

Page 32: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Batch Updates Graph

[Figure: the Pagerank graph with every node's score recomputed]

Page 33: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pagerank

[Diagram: Friend Relations, Converge Pagerank, User ⇒ Pagerank; Bob, Retrieve Bob’s Facebook Ntwk, Bob’s Friends’ Pageranks, Estimate Bob’s Pagerank; API]

But don’t bother updating Bob’s Friends (or friends’ friends or …)

Page 34: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pattern: Guess Beats Blank

• You can’t calculate a good answer quickly

• But Comprehensiveness is a must

• Relaxing: Accuracy, Justification

• Batch layer: finds the correct answer

• Speed layer: makes a reasonable guess

• Examples: Any time the sparse-data case is also the most valuable
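A small sketch of the Pagerank guess: if the batch layer has no score for a node yet, the speed layer estimates one from the neighbors' existing scores and leaves those neighbors untouched. Treating each divisor as the neighbor's out-degree, and ignoring damping, are assumptions made here; the function names are hypothetical.

```python
def estimate_rank(neighbors):
    """Speed-layer guess: every linking neighbor contributes
    rank / out_degree. `neighbors` is a list of (rank, out_degree) pairs."""
    return sum(rank / out_degree for rank, out_degree in neighbors)


def serve_rank(node, batch_ranks, neighbors_of):
    """Serving layer: prefer the converged batch answer and fall back
    to a guess rather than returning a blank."""
    if node in batch_ranks:
        return batch_ranks[node]
    return estimate_rank(neighbors_of(node))
```

With neighbors of rank 12 (three outgoing links) and rank 24 (five outgoing links), `estimate_rank([(12, 3), (24, 5)])` adds 4 and 4.8. The next batch run then replaces the guess with a converged value.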

Page 35: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Marine Corps 80% Rule

“Any decision made with more than 80% of the necessary information is hesitation”

— “The Marine Corps Way” Santamaria & Martino

Page 36: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

A Guess Beats a Blank

• You can’t calculate a good answer quickly

• But Comprehensiveness is a must

• Relaxing: Accuracy, Justification

• Batch layer: finds the correct answer

• Speed layer: makes a reasonable guess

• Examples: Any time the sparse-data case is also the most valuable

Page 37: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pattern: World/Local

• Understanding the world needs full graph

• You can tell a little white lie reading immediate graph only

• Relaxing: Accuracy, Justification

• Batch layer: uses global graph information

• Speed layer: just reads immediate neighborhood

• Examples: “Whom to follow”, Clustering, anything at 2nd-degree (friend-of-a-friend)

Page 38: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Network Security

[Diagram: Net Connections, Store Interaction, Connection Counts, Find Potential Evilness, Agents of Interest; Approximate Streaming Agg, Agent of Interest?, Detected Evilnesses; Dashboard]

Page 39: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pattern: Slow Boil/Flash Fire

• Two tempos of data: months vs milliseconds

• Short-term data too much to store

• Long-term data too much to handle immediately

• Often accompanies Baseline / Delta

• Examples:

  • Trending Topics

  • Insider Security

Page 40: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Banking, Oversimplified

[Diagram: Transaction, Update Records, Event Store, Reconcile Accounts, Account Balances]

Page 41: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Banking, Oversimplified

[Diagram: Transaction, Update Records, Event Store, Reconcile Accounts, Account Balances]

nice-to-have

essential

This wins over fast layer

Page 42: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Pattern: C-A-P Tradeoffs

• C-A-P tradeoffs:

• Can’t depend on when data will roll in (Justification)

• Can’t live in ignorance (Comprehensiveness)

• Batch layer: The final answer

• Speed layer: Actionable views

• Examples: Security (Authorization vs Auditing), lots of counting problems
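The banking example can be sketched as follows: the speed layer keeps an actionable balance from whatever transactions have arrived, and the batch reconciliation is the final answer that overrides it. The class, the sequence-number scheme, and the integer amounts are assumptions for illustration only.

```python
from collections import defaultdict


class Balances:
    def __init__(self):
        self.reconciled = {}              # account -> (balance, as_of_seq)
        self.pending = defaultdict(list)  # account -> [(seq, amount)]

    def record_transaction(self, account, seq, amount):
        """Speed layer: note the transaction so views can act on it now."""
        self.pending[account].append((seq, amount))

    def reconcile(self, account, balance, as_of_seq):
        """Batch layer: the final answer. It replaces whatever the speed
        layer believed about transactions up to as_of_seq."""
        self.reconciled[account] = (balance, as_of_seq)
        self.pending[account] = [
            (seq, amount)
            for seq, amount in self.pending[account]
            if seq > as_of_seq
        ]

    def actionable_balance(self, account):
        """Serving layer: reconciled truth plus not-yet-reconciled deltas."""
        balance, _ = self.reconciled.get(account, (0, 0))
        return balance + sum(amount for _, amount in self.pending[account])
```

If reconciliation finds a transaction the speed layer never saw (a late-arriving fee, say), the reconciled figure simply wins, which is the "this wins over fast layer" rule from the banking slide.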

Page 43: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Common Theme

The System Asymptotes to Truth over time

Page 44: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Two Big Ideas

• Fine-grained control over those architectural tradeoffs

• Truth lives at the edge, not the middle

Page 45: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Architecture for a Dinky Little Blog

Page 46: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Blog: Traditional Approach

• Familiar with the ORM Rails-style blog:

• Models: User, Article, Comment

• Views:

• /user/:id (user info, links to their articles and comments);

• /articles (list of articles);

• /articles/:id (article content, comments, author info)

Page 47: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

User
    id            3
    name          joeman
    homepage      http://…
    photo         http://…
    bio           “…”

Article
    id            7
    title         The Crisis
    body          These are…
    author_id     3
    created_at    2014-08-08

Comment
    id            12
    body          “lol”
    article_id    7
    author_id     3

Page 48: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

[Mockups of two pages. user show: Author Name, Author Photo, Author Bio; “Joe has written 2 Articles”: “A Post about My Cat” and “Welcome to my blog”, each with a snippet and “(… read more)”. article show: Article Title, Article Body; Author Name, Author Photo, Author Bio; Comments: "First Post" (8/8/2014 by Commenter 1), "lol" (8/8/2014 by Commenter 2), "No comment" (8/8/2014 by Commenter 3)]

Page 49: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

articles

users

comments

Webserver

Traditional: Assemble on Read

Page 50: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Changes update models

[Diagram: Δarticle, Δuser, Δcom’nt; update article, update user, update comments; models: user, com’nt, article; history]

Page 51: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Models Trigger Reporters

[Diagram: Δarticle, Δuser, Δcom’nt; update article, update user, update comments; models: user, com’nt, article; history; reporters and their report fragments: compact article, expanded article, user’s #articles, user’s #comments, expanded user, sidebar user, micro user, compact comment]

Page 52: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Serve Report Fragments

[Diagram: report fragments (exp’d article, compact article, user’s #articles, exp’d user, sidebar user, user’s #comments, compact comment, micro user) and “show article”, beside the article show mockup: Article Title, Article Body; Author Name, Author Photo, Author Bio; Comments]

Page 53: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

[Mockup: article show page: Article Title, Article Body; Author Name, Author Photo, Author Bio; Comments: "First Post", "lol", "No comment"]

article show rendered

{"title":"Article Title","body":"Article Body Lorem [...]","author":{ ... },"comments": [ {"comment_id":1, "body":"First Post",...}, {"comment_id":2, "body":"lol",...}, ...]}

Page 54: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Serve Report Fragments

[Diagram: report fragments (exp’d article, compact article, user’s #articles, exp’d user, sidebar user, user’s #comments, compact comment, micro user) and “show user”, beside the page mockup: Article Title, Article Body; Author Name, Author Photo, Author Bio; Comments]

Page 55: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Reports are Cheap

[Diagram: Δarticle, Δuser, Δcom’nt; update article, update user, update comments; models: user, com’nt, article; history; report fragments: compact article, expanded article, user’s #articles, user’s #comments, expanded user, sidebar user, micro user, compact comment; pages: list articles, show article, list user’s articles, show user]

Page 56: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Data Engineer: What data model would you like to receive?

Web Engineer: {“title”:”…”, “body”:”…”,…}

• (…hack hack hack…)

/articles/v1/show.json

Web Engineer: j/k lol can I have {“title”:”…”, “body”:”…”, “snippet”:…}

• (…hack hack hack…)

/articles/v2/show.json

Page 57: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Syndicated Data

• Reports are cheap, single-concern, and faithful to the view.

• You start thinking like the customer, not the database

• All pages render in O(1):

  • Your imagination doesn’t have to fit inside a TCP timeout

• Data is immutable, flows are idempotent:

  • Interface change is safe

• Data is always _there_,

  • Asynchrony doesn’t affect consumers

• Everything is decoupled:

  • Way harder to break everything

Page 58: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Syndicated Data

• The Data is always _there_

• …but sometimes it’s more perfect than other times.
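To make the blog example concrete, here is a rough sketch of "models trigger reporters": each model update re-renders the report fragments that depend on it, so serving a page is a single lookup. The `Blog` class, its method names, and the report paths are hypothetical; the field names follow the User / Article / Comment slide.

```python
import json


class Blog:
    def __init__(self):
        self.users, self.articles = {}, {}
        self.comments = {}  # article_id -> list of comment dicts
        self.reports = {}   # path -> pre-rendered JSON string

    def update_user(self, user):
        self.users[user["id"]] = user
        # Re-report every article whose author info just changed.
        for article in self.articles.values():
            if article["author_id"] == user["id"]:
                self._report_article(article["id"])

    def update_article(self, article):
        self.articles[article["id"]] = article
        self.comments.setdefault(article["id"], [])
        self._report_article(article["id"])

    def add_comment(self, comment):
        self.comments.setdefault(comment["article_id"], []).append(comment)
        if comment["article_id"] in self.articles:
            self._report_article(comment["article_id"])

    def _report_article(self, article_id):
        """Reporter: assemble on write instead of on read."""
        article = self.articles[article_id]
        author = self.users.get(article["author_id"], {})
        self.reports[f"/articles/{article_id}"] = json.dumps({
            "title": article["title"],
            "body": article["body"],
            "author": {"name": author.get("name")},
            "comments": [
                {"comment_id": c["id"], "body": c["body"]}
                for c in self.comments[article_id]
            ],
        })

    def show(self, path):
        """Serving: one dictionary lookup, no joins at request time."""
        return self.reports[path]
```

A report may briefly lag a model change in a real asynchronous system, but the page is always there to serve, which is the tradeoff these slides describe.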

Page 59: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

λ Arch: Truth, not Plumbing

• Lambda architecture isn’t about speed layer / batch layer.

• It's about

  • moving truth to the edge, not the center;

  • enabling fine-grained tradeoffs against fundamental limits;

  • decoupling consumer from infrastructure;

  • decoupling consumer from asynchrony

• …with profound implications for how you build your teams

Page 62: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Lambda Architecture Entity Resolution

Page 63: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Scrape Product Web

Page 64: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Intake

[Diagram: Amazon, parse Amazon; eBay, parse eBay; Ma&Pa Electronics, parse Ma&Pa; Bulk, Stream, RPC Callback; Vendor Listing (keywords, mfr & model, ASIN); Listings]

Page 65: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Batch Layer: Resolve/Unify

[Diagram: Listings, Product Resolver, Unify Products, Unified Products]

Page 66: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Improve

[Diagram: Vendor Listing (keywords, mfr & model, ASIN), Fetch Products, Product Resolver; Listings, Unify Products, Unified Products]

Page 67: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Update

[Diagram: Vendor Listing (keywords, mfr & model, ASIN), Fetch Products, Product Resolver, Resolve & Update; Listings, Unify Products, Unified Products]

Page 68: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Cannot have Consistency

[Diagram: Vendor Listing (keywords, mfr & model, ASIN), Fetch Products, Product Resolver, Resolve & Update; Listings, Unify Products, Unified Products]
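One way to picture the speed-layer "Resolve & Update" step is a greedy resolver that matches each incoming vendor listing on ASIN first, then on manufacturer and model, and otherwise opens a new unified product. This is only a sketch under assumptions: the matching order, the dictionary keys, and the class layout are invented here, and keyword matching is left out.

```python
class ProductResolver:
    def __init__(self):
        self.by_asin = {}        # ASIN -> product_id
        self.by_mfr_model = {}   # (mfr, model) -> product_id
        self.products = {}       # product_id -> list of listings
        self._next_id = 1

    @staticmethod
    def _mfr_model_key(listing):
        mfr, model = listing.get("mfr"), listing.get("model")
        if mfr and model:
            return (mfr.strip().lower(), model.strip().lower())
        return None

    def resolve(self, listing):
        """Attach a listing to an existing unified product, or create one.
        Decisions are greedy and never revisited here."""
        asin = listing.get("asin")
        key = self._mfr_model_key(listing)
        product_id = self.by_asin.get(asin) if asin else None
        if product_id is None and key:
            product_id = self.by_mfr_model.get(key)
        if product_id is None:
            product_id = self._next_id
            self._next_id += 1
            self.products[product_id] = []
        if asin:
            self.by_asin.setdefault(asin, product_id)
        if key:
            self.by_mfr_model.setdefault(key, product_id)
        self.products[product_id].append(listing)
        return product_id
```

Because the result depends on the order in which listings arrive, two resolvers fed the same listings in different orders can disagree. A batch "Unify Products" pass over all listings would be the place to repair that.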

Page 69: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Objections

Page 70: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Objections

• Three objections

1. Why hasn't it been done before?

2. Architecture Astronaut

3. I'm not at high scale

• Response

1. Chef/Puppet/Docker/etc

2. Chef/Puppet/Docker/etc

3. Shut Up

Page 72: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Objections

• Two APIs? Really?

• Yes. Guilty. That’s dumb and must be fixed.

  • Spark or Samza, if you’re willing to only drink one flavor of Kool-Aid

  • EZbake.io, a CSC / 42six project to attack this

  • …but we shouldn’t be living at the low level anyhow

Page 73: Patterns of the Lambda Architecture -- 2015 April - Hadoop Summit, Europe

Objections

• Orchestration: “logical plan” (dataflow graph)

• Optimization/Allocation: “physical plan” (what goes where)

• Resource Projector: instantiates infrastructure

  • HTTP listeners, Trident streams, Oozie scheduling, ETL flows, cron jobs, etc

• Transport Machineries:

  • move data around, fulfilling locality/ordering/etc guarantees

• Data Processing: UDFs and operators