Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach
Eric Lubow
@elubow
Message Architectures in Distributed Systems
Message Architectures in Distributed Systems Eric Lubow @elubow
Overview
• SimpleReach
• Why is messaging important?
• Goals
• Explanations
• Questions
Personal Vanity
• CTO of SimpleReach
• Co-author of Practical Cassandra
• Skydiver, mixed martial artist, motorcyclist, dog dad, NY Giants fan
• IronMatt Foundation for Pediatric Brain Tumors (ironmatt.org)
SimpleReach
• Millions of URLs per day
• Over 3.75 billion page views per month
• 7 billion events per day (~80k events/second)
• Auto-scales between 175 and 190 machines depending on traffic
• Built a predictive measurement algorithm for the social web
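The per-second figure on the slide follows from the daily total. A quick check, assuming a 30-day month for the page-view rate (the slide does not say how a month is counted):

```python
SECONDS_PER_DAY = 24 * 60 * 60             # 86,400
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY   # assumption: a 30-day month

events_per_day = 7_000_000_000
page_views_per_month = 3_750_000_000

# Averages over the whole period; traffic peaks will run above these.
events_per_second = events_per_day / SECONDS_PER_DAY
page_views_per_second = page_views_per_month / SECONDS_PER_MONTH

print(f"{events_per_second:,.0f} events/s")
print(f"{page_views_per_second:,.0f} page views/s")
```

Roughly 81,000 events per second matches the slide's "~80k", and it is an average: the auto-scaling range exists because real traffic is not flat.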
Why is Messaging Important?
• Most discussions of large-scale systems only talk about storage
• Direct high volumes of data around your infrastructure
• Control the flow of data through your infrastructure
• Decouple important systems
• Scalability, elasticity, deliverability, and redundancy
• Buffering and asynchronous communication
The database is NOT a transport layer
Data Flow
[Diagram: the App receives an incoming request, persists data synchronously, sends the response, and queues a message asynchronously]
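A minimal sketch of that data flow, assuming a dict stands in for the database and a `queue.Queue` stands in for the message bus (NSQ at SimpleReach). The handler only waits on the database write; downstream systems get their copy from the queue and never poll the database for new rows. All names are hypothetical.

```python
import queue
import threading

# Hypothetical stand-ins: a dict plays the database, a Queue plays the bus.
database = {}
outbound = queue.Queue()
delivered = []

def handle_request(event_id, payload):
    """Persist synchronously, hand off asynchronously, respond right away."""
    database[event_id] = payload                        # sync persist data
    outbound.put({"id": event_id, "payload": payload})  # async queue message
    return {"status": "accepted", "id": event_id}       # send response

def downstream_consumer():
    # Downstream systems read from the queue; they never poll the database.
    while True:
        msg = outbound.get()
        delivered.append(msg["id"])
        outbound.task_done()

threading.Thread(target=downstream_consumer, daemon=True).start()

response = handle_request("evt-1", {"url": "http://example.com/a"})
outbound.join()  # demo only: wait until the consumer has seen the message
print(response, delivered)
```

The request path pays for one durable write and a non-blocking enqueue; how many systems consume the message later does not change the response time.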
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow Patterns
• Enrichment/In-stream Modification Schemes
• Monitoring and Instrumentation
Messaging Systems
• RabbitMQ
• ZeroMQ
• Kafka
• Amazon SQS
• NSQ
• ActiveMQ
• Resque
• Custom
What Did SimpleReach Choose?
NSQ
• Distributed and decentralized topology
• At-least-once delivery guaranteed
• Multicast-style message routing
• Simple to configure and deploy
• Allows maintenance windows with no downtime
• Ephemeral channels for testing
• Channel sampling
github.com/bitly/nsq
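At-least-once delivery means a consumer can see the same message more than once, so handlers should be idempotent. The toy below is an in-memory illustration, not NSQ's implementation: a handler returning `True` plays the role of NSQ's `FIN` (finished) and `False` plays `REQ` (requeue). The "lost acknowledgement" for message `b` is a hypothetical failure injected to force a redelivery.

```python
from collections import deque

class ToyChannel:
    """In-memory channel with at-least-once delivery (illustration, not NSQ)."""

    def __init__(self, max_attempts=5):
        self.pending = deque()
        self.max_attempts = max_attempts

    def publish(self, msg_id, body):
        self.pending.append({"id": msg_id, "body": body, "attempts": 0})

    def run(self, handler):
        while self.pending:
            msg = self.pending.popleft()
            msg["attempts"] += 1
            # Anything not finished goes back on the queue (like REQ).
            if not handler(msg) and msg["attempts"] < self.max_attempts:
                self.pending.append(msg)


deliveries, side_effects, seen = [], [], set()
lost_acks = {"b": 1}  # hypothetical: "b" is processed but not finished once

def idempotent_handler(msg):
    deliveries.append(msg["id"])
    if msg["id"] not in seen:  # duplicates are expected, so deduplicate
        seen.add(msg["id"])
        side_effects.append(msg["body"])
    if lost_acks.get(msg["id"], 0) > 0:
        lost_acks[msg["id"]] -= 1
        return False  # simulate a missing FIN: the work happened, yet it comes back
    return True


channel = ToyChannel()
for msg_id in "abc":
    channel.publish(msg_id, msg_id.upper())
channel.run(idempotent_handler)
print(deliveries, side_effects)
```

Message `b` is delivered twice, but because the handler remembers what it has already applied, its side effect happens only once.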
Topics and Channels
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: an nsqd instance with the topic “event” feeding the channels “metrics”, “enrichment”, and “writer”; each channel is read by consumers on separate hosts]
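The topic/channel split can be modeled in a few lines. This is a toy under stated simplifications, not nsqd: every channel gets a copy of every message published to the topic, while consumers that share one channel split that channel's messages between them. Round-robin is used only to keep the example deterministic; nsqd actually hands messages to consumers according to how many each says it is ready to receive.

```python
from collections import deque
from itertools import cycle

class ToyTopic:
    """A topic fans each message out to every channel (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.channels = {}

    def channel(self, name):
        # Channels spring into existence on first use, echoing NSQ's
        # runtime creation of topics and channels.
        return self.channels.setdefault(name, deque())

    def publish(self, msg):
        # Multicast: every channel receives its own copy of the message.
        # (Real nsqd buffers messages for a topic with no channels yet;
        # this toy simply has nowhere to put them.)
        for channel_queue in self.channels.values():
            channel_queue.append(msg)


def drain(channel_queue, consumers):
    """Consumers sharing one channel split its messages (round-robin here)."""
    received = {c: [] for c in consumers}
    turn = cycle(consumers)
    while channel_queue:
        received[next(turn)].append(channel_queue.popleft())
    return received


event = ToyTopic("event")
for name in ("metrics", "enrichment", "writer"):
    event.channel(name)
for msg in ("m1", "m2", "m3", "m4"):
    event.publish(msg)

metrics_view = drain(event.channels["metrics"], ["A"])
writer_view = drain(event.channels["writer"], ["A", "B"])
print(metrics_view, writer_view)
```

The "metrics" consumer sees all four messages, the two "writer" consumers share the load, and the "enrichment" channel keeps its own untouched backlog.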
Everyone Speaks The Same Language
http:// + {"content-type": "application/json"}
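"HTTP plus JSON" means any language with an HTTP client can publish. A sketch that builds (but does not send) such a request, assuming nsqd's HTTP `/pub?topic=...` endpoint on its default HTTP port 4151; the host, topic, and event fields are hypothetical. nsqd treats the message body as opaque bytes, so JSON and the `Content-Type` header are a convention shared by the clients rather than something nsqd enforces.

```python
import json
import urllib.parse
import urllib.request

def build_publish_request(host, topic, event, port=4151):
    """Build, but do not send, an HTTP publish request for an nsqd instance."""
    query = urllib.parse.urlencode({"topic": topic})
    body = json.dumps(event, separators=(",", ":")).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/pub?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

req = build_publish_request(
    "127.0.0.1", "event", {"type": "pageview", "url": "http://example.com/a"}
)
print(req.get_method(), req.full_url, req.data.decode())
# urllib.request.urlopen(req) would deliver it to a running nsqd.
```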
Goals
• Consistent interfaces between systems
NSQ Tools
• nsqadmin provides a web interface to administer and introspect an NSQ cluster at runtime (and to empty, pause, or delete topics/channels)
• nsq_to_http - utility that helps transport an aggregate stream over HTTP
• nsq_to_file - utility that safely persists an aggregated stream to disk
• nsq_stat - iostat-like utility for a topic/channel
• nsq_tail - tail-like utility for a topic/channel
Right Tool For The Job
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
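How Does It Work?
[Diagram: three hosts, each running an API server with a local nsqd; the API PUBLISHes to its local nsqd, each nsqd REGISTERs with two nsqlookupd instances, and two consumers DISCOVER producers through nsqlookupd and SUBSCRIBE to the nsqd instances]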
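The four arrows on the slide (PUBLISH, REGISTER, DISCOVER, SUBSCRIBE) can be traced with a small in-memory model. These classes are hypothetical stand-ins, not NSQ code: each `ToyNsqd` registers a topic with every lookup node when the topic is first created, and a consumer merges the answers from all lookup nodes, so losing one of them does not hide any producers.

```python
from collections import defaultdict

class ToyLookupd:
    """Stand-in for nsqlookupd: remembers which nsqd hosts carry which topics."""

    def __init__(self):
        self.producers = defaultdict(set)

    def register(self, topic, address):
        self.producers[topic].add(address)

    def lookup(self, topic):
        return set(self.producers.get(topic, set()))


class ToyNsqd:
    """Stand-in for an nsqd running next to an API server."""

    def __init__(self, address, lookupds):
        self.address = address
        self.lookupds = lookupds
        self.topics = defaultdict(list)

    def publish(self, topic, msg):
        # PUBLISH goes to the local nsqd; the first message on a new topic
        # creates it and triggers REGISTER with every lookup node.
        if topic not in self.topics:
            for lookupd in self.lookupds:
                lookupd.register(topic, self.address)
        self.topics[topic].append(msg)


def discover(lookupds, topic):
    """DISCOVER: ask every lookup node and merge the answers."""
    found = set()
    for lookupd in lookupds:
        found |= lookupd.lookup(topic)
    return sorted(found)


lookupds = [ToyLookupd(), ToyLookupd()]
api1 = ToyNsqd("api1:4150", lookupds)
api2 = ToyNsqd("api2:4150", lookupds)
api1.publish("event", "a")
api2.publish("event", "b")
api2.publish("metrics", "c")

print(discover(lookupds, "event"))        # a consumer would SUBSCRIBE to each
print(discover(lookupds[1:], "metrics"))  # still works with one lookup node gone
```

Publishers only ever talk to localhost, and consumers only need the lookup addresses, which is what keeps client-side architecture knowledge to a minimum.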
The Schrute of the Problem
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
Simple Deployment & Automation
• Chef cookbook - github.com/simplereach/chef-nsq
• Written in Go
• Easily distributable binaries
• Deploy lookup nodes
• nsqd instances installed locally
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
Runtime Discovery
[Diagram: a consumer ➊ regularly polls two nsqlookupd instances for topic producers over HTTP requests, then ➋ connects to all producers]
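One polling pass of that loop, sketched under assumptions: the consumer fetches the `/lookup?topic=...` JSON from each nsqlookupd, reads each producer's `broadcast_address` and `tcp_port`, and opens connections only to producers it is not already connected to. Depending on the NSQ version, the response may or may not be wrapped in a `data` envelope, so the sketch accepts both shapes; check this against the version you run. The response bodies below are hypothetical.

```python
import json

def producers_from_lookup(body):
    """Return {(address, tcp_port)} from one nsqlookupd /lookup response body."""
    doc = json.loads(body)
    doc = doc.get("data", doc)  # assumption: some versions wrap the payload
    return {(p["broadcast_address"], p["tcp_port"]) for p in doc.get("producers", [])}


def new_connections(bodies, connected):
    """Merge every lookupd's answer and return producers not yet connected."""
    wanted = set()
    for body in bodies:
        wanted |= producers_from_lookup(body)
    return sorted(wanted - connected)


# Hypothetical responses from the two nsqlookupd nodes on one polling pass.
from_lookupd_1 = json.dumps({"channels": ["writer"], "producers": [
    {"broadcast_address": "10.0.0.1", "tcp_port": 4150, "http_port": 4151}]})
from_lookupd_2 = json.dumps({"status_code": 200, "status_txt": "OK", "data": {
    "channels": ["writer"], "producers": [
        {"broadcast_address": "10.0.0.1", "tcp_port": 4150, "http_port": 4151},
        {"broadcast_address": "10.0.0.2", "tcp_port": 4150, "http_port": 4151}]}})

already_connected = {("10.0.0.1", 4150)}
to_open = new_connections([from_lookupd_1, from_lookupd_2], already_connected)
print(to_open)
```

A consumer would run this on a timer, which is how a newly added nsqd starts receiving consumers without any reconfiguration.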
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
Path of a Packet
[Diagram components: Internet, EC, Internal API, API, Fire Hose, SC, Consumers, Queue, Solr, C* (Cassandra), Mongo, Redis, Vertica]
Controlled Data Flow
[Diagram components: Social Event, Collector, Social Data, NSQ, Broadcast, Batch & Write, Raw Data, Calculate Score, Write, Processed Data]
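The "Batch & Write" stages trade a little latency for far fewer round trips to the datastore. A minimal sketch of such a writer, assuming a hypothetical `write_batch` callable that performs one bulk write; a batch is flushed when it reaches `max_size` or when its oldest message is older than `max_age` seconds (checked as messages arrive).

```python
import time

class BatchWriter:
    """Buffer messages and write them in bulk (a sketch, not SimpleReach's code)."""

    def __init__(self, write_batch, max_size=100, max_age=1.0, clock=time.monotonic):
        self.write_batch = write_batch  # hypothetical bulk-write callable
        self.max_size = max_size
        self.max_age = max_age
        self.clock = clock
        self.buffer = []
        self.oldest = None

    def add(self, msg):
        if not self.buffer:
            self.oldest = self.clock()
        self.buffer.append(msg)
        too_big = len(self.buffer) >= self.max_size
        too_old = self.clock() - self.oldest >= self.max_age
        if too_big or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write_batch(list(self.buffer))  # hand over a copy
            self.buffer.clear()
            self.oldest = None


batches = []
writer = BatchWriter(batches.append, max_size=3, max_age=60)
for i in range(7):
    writer.add(i)
writer.flush()  # e.g. on shutdown or from a periodic timer
print(batches)
```

Because the age check only runs when a message arrives, a real consumer would also call `flush()` from a timer so a quiet stream does not strand a partial batch. With at-least-once delivery, finishing each message only after its batch is written means a crash replays messages rather than losing them.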
Broadcast Importance for Polyglot Persistence
[Diagram: an NSQ Broadcast connecting the Aggregator and the Calculator to one writer per datastore: Mongo Writer, Redis Writer, Cassandra Writer, Solr Writer, Vertica Writer]
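Broadcasting to one channel per datastore writer keeps the stores independent: if one store is down, only its own channel backs up. A toy illustration with in-memory stand-ins for the channels and stores (the Solr outage is hypothetical):

```python
from collections import deque

WRITERS = ("mongo", "redis", "cassandra", "solr", "vertica")
channels = {name: deque() for name in WRITERS}  # one channel per writer
stores = {name: [] for name in WRITERS}         # stand-ins for the datastores

def broadcast(msg):
    for channel_queue in channels.values():
        channel_queue.append(dict(msg))  # each writer gets an independent copy

def pump(name, store_is_up=True):
    """Move everything queued for one writer into its store, if the store is up."""
    if not store_is_up:
        return 0  # messages stay queued on this writer's channel only
    channel_queue, written = channels[name], 0
    while channel_queue:
        stores[name].append(channel_queue.popleft())
        written += 1
    return written

for score in (10, 20, 30):
    broadcast({"score": score})

for name in WRITERS:
    pump(name, store_is_up=(name != "solr"))  # hypothetical Solr outage
solr_backlog = len(channels["solr"])
pump("solr")  # Solr is back; its writer catches up from the backlog
print(solr_backlog, {name: len(rows) for name, rows in stores.items()})
```

Adding another datastore means adding another channel and writer; the producers publishing to the topic do not change.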
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
What Is Enrichment?
A mechanism to add value to a message to enhance processing in your system.
How Do We Enrich
[Diagram components: Raw Event, Consumer A, Consumer B, Consumer C, Enriched Event, NSQ Broadcast]
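A sketch of in-stream enrichment, assuming each consumer stage adds fields to the event and the result is serialized for re-publishing. The stages, field names, and category table are hypothetical; the point is that each stage returns a new dict, so the raw event is never modified and the enriched event carries everything later consumers need.

```python
import json
from urllib.parse import urlparse

# Hypothetical lookup table a consumer might load from a datastore.
CATEGORIES = {"news.example.com": "news"}

def add_domain(event):
    return {**event, "domain": urlparse(event["url"]).netloc}

def add_category(event):
    return {**event, "category": CATEGORIES.get(event["domain"], "unknown")}

def enrich(raw, stages=(add_domain, add_category)):
    """Run the raw event through each stage; the original is never mutated."""
    event = raw
    for stage in stages:
        event = stage(event)
    return event

raw = {"url": "http://news.example.com/story/1", "type": "share"}
enriched = enrich(raw)
payload = json.dumps(enriched, sort_keys=True)  # body to publish back to NSQ
print(payload)
```

Doing the lookups once, upstream, spares every downstream writer from repeating them.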
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
• Enrichment
Monitoring / Instrumentation
• Built-in statsd support
• statsd feeds Graphite, and nsqadmin can draw its graphs from Graphite
• nsqadmin comes with graphs for message processing stats
• Nagios plugins are available for monitoring topic/channel depth
• Average end-to-end latency is calculated on a per-channel basis
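Two small pieces of that monitoring story, sketched under assumptions: formatting a metric in the statsd text protocol (`<name>:<value>|<type>`, sent over UDP, conventionally to port 8125), and classifying a channel's depth with the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL). The metric name and thresholds are hypothetical, not nsqd's own naming.

```python
import socket

def statsd_line(name, value, metric_type):
    """Format one metric in the statsd text protocol: <name>:<value>|<type>."""
    if metric_type not in ("c", "g", "ms"):  # counter, gauge, timer
        raise ValueError(f"unsupported statsd type: {metric_type}")
    return f"{name}:{value}|{metric_type}"

def send_statsd(line, host="127.0.0.1", port=8125):
    # statsd listens on UDP; sends are fire-and-forget with no reply.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

def check_depth(channel, depth, warn, crit):
    """Classify a channel's depth the way a Nagios depth check might."""
    if depth >= crit:
        return CRITICAL, f"CRITICAL - {channel} depth {depth} >= {crit}"
    if depth >= warn:
        return WARNING, f"WARNING - {channel} depth {depth} >= {warn}"
    return OK, f"OK - {channel} depth {depth}"

print(statsd_line("nsq.event.writer.depth", 42, "g"))
print(check_depth("event/writer", 1500, warn=1000, crit=5000))
```

A real Nagios plugin would print the message and exit with the code (`sys.exit(code)`); a growing depth is usually the first sign that a consumer has fallen behind.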
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
• Enrichment
• Monitoring and Instrumentation
Summary
• Large systems are more than just storage
• Abstraction
• Highly Available
• Controlled Data Flow Patterns
• Monitoring & Automation
We’re Hiring
Questions are guaranteed in life.
Answers aren’t.
Eric Lubow
@elubow
Cassandra Day, New York
Thank you.