C* Summit 2013: Big Architectures for Big Data by Eric Lubow
Transcript of C* Summit 2013: Big Architectures for Big Data by Eric Lubow
Eric Lubow
@elubow
Big Architectures for Big Data
Eric Lubow @elubow #Cassandra13
Overview
• SimpleReach
• Goals
• Tools
• Architecture Implementation
The 2 Truths
Even with the right tools, 80% of the work of building a big data system is acquiring and refining
The Real Truth
SimpleReach
• Millions of URLs per day
• Over 1.25 billion page views per month
• 500M events per day (~6k events/second)
• Auto-scale 125-160 machines depending on traffic
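The per-second figure follows from the daily volume. A quick check, assuming the load is spread evenly over the day (real traffic has peaks, which is presumably why the fleet auto-scales):

```python
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400

events_per_day = 500_000_000             # figure from the slide
events_per_second = events_per_day / SECONDS_PER_DAY

# Average rate only; peak rates are not given on the slide.
print(f"{events_per_second:,.0f} events/second on average")
```

The average comes out a little under 6,000 events per second, which matches the rounded "~6k" on the slide.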
And It Goes Like This...
C*
Vertica
Goals
• Consistent non-data storage layer access patterns
• Data accuracy across storage engines
• Minimize downtime/Minimize cost of downtime
• High availability
• Allow access to many toolsets (for all languages, DBs, Engines)
• Clients should have minimal architecture knowledge
Consistent Access Patterns
realtime_score
(‘score’, ‘realtime’)
Authentication, Tracking,
Per service access keys
Track call volume by access key
Prevent internal denial of service
Monitor availability and performance
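A minimal sketch of how per-service access keys, call tracking, and protection against internal denial of service could fit together. This is not SimpleReach's implementation: the class name `AccessKeyGate`, the sliding-window limit, and the key names are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class AccessKeyGate:
    """Hypothetical gatekeeper in front of an internal API."""

    def __init__(self, keys, max_calls, window_seconds, clock=time.monotonic):
        self.keys = dict(keys)              # access key -> owning service
        self.max_calls = max_calls          # calls allowed per window, per key
        self.window = window_seconds
        self.clock = clock                  # injectable so tests can control time
        self.recent = defaultdict(deque)    # access key -> timestamps inside the window
        self.totals = defaultdict(int)      # access key -> accepted calls so far

    def allow(self, key):
        """Return True if the call is authenticated and under its limit."""
        if key not in self.keys:
            return False                    # unknown key: reject
        now = self.clock()
        calls = self.recent[key]
        while calls and now - calls[0] >= self.window:
            calls.popleft()                 # forget calls that left the window
        if len(calls) >= self.max_calls:
            return False                    # one noisy service cannot starve the rest
        calls.append(now)
        self.totals[key] += 1               # call volume tracked per access key
        return True


gate = AccessKeyGate({"key-dashboard": "dashboard"}, max_calls=2, window_seconds=60.0)
print([gate.allow("key-dashboard") for _ in range(3)])   # third call exceeds the limit
print(dict(gate.totals))
```

Keeping the counters keyed by access key rather than by client address means call volume can be attributed to a service even when it runs on many machines.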
Controlled Data Flow
[Diagram labels: Social Event Collector, Social Data, NSQ (multicast), Batch & Write Raw Data, Calculate Score, NSQ, Batch & Write Processed Data, Write]
NSQ by Bit.ly
• Distributed and de-centralized topology
• At-least-once delivery guaranteed
• Multicast style message routing
• Runtime discovery for consumers to find producers
• Allow for maintenance windows with no downtime
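Two of these properties shape consumer code: multicast routing means every channel on a topic receives its own copy of a message, and at-least-once delivery means a consumer can see the same message twice. The toy model below illustrates both in memory. It is not the NSQ client API; `ToyTopic`, the channel names, and the message shape are assumptions for illustration.

```python
from collections import defaultdict, deque


class ToyTopic:
    """In-memory stand-in for a single topic; not the NSQ client API."""

    def __init__(self):
        self.channels = defaultdict(deque)   # channel name -> pending messages

    def subscribe(self, channel):
        self.channels[channel]               # touching the key creates the queue

    def publish(self, message):
        for queue in self.channels.values():
            queue.append(message)            # multicast: one copy per channel

    def deliver(self, channel, handler):
        """Pass the next message to handler; requeue it if the handler fails."""
        queue = self.channels[channel]
        if not queue:
            return False
        message = queue.popleft()
        try:
            handler(message)
        except Exception:
            queue.append(message)            # unacknowledged, so it comes back later
            return False
        return True


topic = ToyTopic()
topic.subscribe("write_raw")
topic.subscribe("calculate_score")
topic.publish({"id": 1, "value": 42})

raw, scored, seen = [], [], set()


def write_raw(msg):
    raw.append(msg)


def flaky_score(msg):
    """Idempotent on msg['id'], so a redelivery does not count twice."""
    if msg["id"] in seen:
        return                               # duplicate: already handled
    seen.add(msg["id"])
    scored.append(msg["value"])
    raise RuntimeError("crashed before acknowledging")


topic.deliver("write_raw", write_raw)
topic.deliver("calculate_score", flaky_score)   # fails, message is requeued
topic.deliver("calculate_score", flaky_score)   # redelivered, recognised as duplicate
print(raw, scored)
```

Both channels receive the message, and the score is recorded once even though the second channel saw it twice.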
Path of a Packet
[Diagram labels: Internet, EC, Internal API, Solr, C*, Mongo, Redis, Vertica, API, Fire Hose, SC, Consumers, Queue]
Evolution Takes Work
• Know your access patterns
• Service Oriented Architecture (Internal API)
• Data accuracy checks: visual and programmatic
• Built framework for testing out engines (Storage, Queueing, etc)
Homogeneous Machines at Base
[Diagram: base image layout for the application group, showing a Producer and a Consumer machine. Both are built on the same Base AMI (Amazon Linux) and Organizational Base (Monitoring, Users, App Config). The Application layer holds Event Collection, NSQ, and Mongos on the producer, and Consumer, NSQ, and Mongos on the consumer.]
DevOps Wizardry
• Extensive use of AWS
• Monitor: Nagios, StatsD, and Graphite
• Manage: Chef, OpsWorks, csshX, Vagrant
• Deployments
Evolving Amazon Tools
• Full-featured API
• OpsWorks
• CloudFormation
• S3 / CloudFront
• Elastic Beanstalk
• Elastic MapReduce
Service
Internal API
Solr
Real-time C*
C*
Vertica
Service Architecture Machines
[Diagram: base image layout for the application group, showing Proxy Machines and Storage Machines. Both are built on the same Base AMI (Amazon Linux) and Organizational Base (Monitoring, Users, App Config). The Application layer holds the iAPI Front End and nginx on proxy machines, and the Data Store on storage machines.]
Anatomy of an Endpoint
[Diagram: Querying Machines serving two endpoints, "hourly content" and "ten minute content". Each endpoint is drawn with Mongo, Vertica, and C* behind it. Client library labels as extracted: Helen, PyMon, PyVertic.]
Endpoint Breakout
• Availability
• Consistent Access Patterns
• Minimal downtime changes
• Smaller code deploys
• Non-monolithic code base
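One way to picture the breakout is a registry in which every endpoint is a small, separately deployable handler behind the same call signature, so callers name an endpoint and never a storage engine. The sketch below is an assumption about the shape of such a layer, not the actual internal API; the decorator, the `call` helper, the endpoint names, and the placeholder return values are hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], List[dict]]
ENDPOINTS: Dict[str, Handler] = {}


def endpoint(name: str):
    """Register a handler under a public endpoint name."""
    def register(func: Handler) -> Handler:
        ENDPOINTS[name] = func
        return func
    return register


@endpoint("hourly_content")
def hourly_content(account_id: str) -> List[dict]:
    # Placeholder body: a real handler would query whichever store holds hourly data.
    return [{"account": account_id, "granularity": "hour"}]


@endpoint("ten_minute_content")
def ten_minute_content(account_id: str) -> List[dict]:
    # Placeholder body: could be backed by a different store without callers noticing.
    return [{"account": account_id, "granularity": "ten_minute"}]


def call(name: str, account_id: str) -> List[dict]:
    """Single entry point with one access pattern for every endpoint."""
    try:
        handler = ENDPOINTS[name]
    except KeyError:
        raise LookupError(f"unknown endpoint: {name}") from None
    return handler(account_id)


print(call("hourly_content", "acct-1"))
```

Because each handler is registered independently, one endpoint can be changed or redeployed without touching the others, which is the point of the smaller deploys and non-monolithic code base listed above.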
Architecture Distribution
US-EAST-1a
MONGO-SHARD-0001-B
MONGO-SHARD-0000-A
CASSANDRA-0001
CASSANDRA-0010
REDIS-0001A
VERTICA-0001
iAPI-0001
US-EAST-1b
MONGO-SHARD-0002-B
MONGO-SHARD-0001-A
CASSANDRA-0002
CASSANDRA-0011
REDIS-0001B
iAPI-0002
US-EAST-1e
MONGO-SHARD-0002-A
MONGO-SHARD-0000-B
CASSANDRA-0003
CASSANDRA-0012
VERTICA-0003
iAPI-0003
VERTICA-0002
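The layout can be checked mechanically: every service should appear in more than one availability zone, and the two replicas of each Mongo shard should sit in different zones. The sketch below uses the hostnames listed above; the parsing rules (service name before the first hyphen, replica suffix after the last) are assumptions, and Vertica is left out because the zone of VERTICA-0002 is not clear from the slide text.

```python
from collections import defaultdict

PLACEMENT = {
    "US-EAST-1a": ["MONGO-SHARD-0001-B", "MONGO-SHARD-0000-A", "CASSANDRA-0001",
                   "CASSANDRA-0010", "REDIS-0001A", "iAPI-0001"],
    "US-EAST-1b": ["MONGO-SHARD-0002-B", "MONGO-SHARD-0001-A", "CASSANDRA-0002",
                   "CASSANDRA-0011", "REDIS-0001B", "iAPI-0002"],
    "US-EAST-1e": ["MONGO-SHARD-0002-A", "MONGO-SHARD-0000-B", "CASSANDRA-0003",
                   "CASSANDRA-0012", "iAPI-0003"],
}


def zones_by_service(placement):
    """Map each service (text before the first hyphen) to the zones it runs in."""
    zones = defaultdict(set)
    for zone, hosts in placement.items():
        for host in hosts:
            zones[host.split("-", 1)[0]].add(zone)
    return dict(zones)


def single_zone_services(placement):
    """Services that would disappear entirely if one zone failed."""
    return sorted(s for s, z in zones_by_service(placement).items() if len(z) < 2)


def mongo_shard_zones(placement):
    """Map each Mongo shard (hostname minus replica suffix) to its zones."""
    shards = defaultdict(set)
    for zone, hosts in placement.items():
        for host in hosts:
            if host.startswith("MONGO-SHARD-"):
                shards[host.rsplit("-", 1)[0]].add(zone)
    return dict(shards)


print(single_zone_services(PLACEMENT))
print({shard: sorted(z) for shard, z in mongo_shard_zones(PLACEMENT).items()})
```

With this placement no service is confined to a single zone, and each shard's A and B replicas land in two different zones.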
Problems?
The Schrute of the Problem
New Service Questions
• Can its host be completely homogeneous?
• Can it accept downtime (and what should downtime look like)?
• Does it fit into an existing service?
• Does it require datacenter distribution?
Summary
• Solutions Require Evolution
• Build, Use, and Integrate Tools
• Abstraction
• Homogeneous Distribution
• Monitoring & Automation
We’re (Ask about Food Coma Fridays)
Questions are guaranteed in life. Answers aren’t.
Eric Lubow
@elubow
Thank you.