Real World Cassandra

Post on 04-Jul-2015


the prospect engine for brands.

Cassandra in Online Advertising: Real Time Bidding

Who are we?

Costa Sevdinoglou & Edward Capriolo

Impressions look like…

A High Level look at RTB

1. Browsers visit Publishers and create impressions.

2. Publishers sell impressions via Exchanges.

3. Exchanges serve as auction houses for the impressions.

4. On behalf of the marketer, m6d bids on the impressions via the auction house. If m6d wins, we display our ad to the browser.

Performance and Data

• Billions and billions of bid requests a day

• A single request can result in multiple Cassandra Operations!

• One cluster is just under 10TB and growing

• Low latency requirement: typically below 120 ms

• Limited data available to m6d via the exchange

Segment Data

Segments are how we assign product or service affinity to a group of users. Users we consider to be like-minded with respect to a given brand will be placed in the same segment.

Segment Data is just one component of our overarching data model.

Segments help to reduce the number of calculations we do in real time.
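As a rough illustration of why precomputed segments reduce real-time work, the sketch below reduces the bid-time decision to a single set-membership check. The store layout, the user ids, the segment names, and the functions `segments_for` and `should_bid` are all hypothetical; the slides do not show m6d's actual data model.

```python
# Hypothetical in-memory stand-in for a segment store keyed by user id.
# In the architecture described here, this lookup would hit Cassandra.
segments_by_user = {
    "user-1": {"brand-a", "brand-b"},
    "user-2": {"brand-b"},
}


def segments_for(user_id, store=segments_by_user):
    """Return the precomputed segments for a user, or an empty set."""
    return store.get(user_id, set())


def should_bid(user_id, campaign_segment, store=segments_by_user):
    """Decide at bid time with one lookup and one membership check."""
    return campaign_segment in segments_for(user_id, store)
```

The expensive step (deciding which users belong to which segment) happens ahead of time, so the request path only reads the result.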

Old Approach for Segment Data

Limitations

• Periodically updated.

• Only a subsection of the data.

• Cluster performance is affected during a data push.

Application Nodes (Tomcat + MySQL)

Event Logs

Hadoop Aggregation

MySQL Data Push

Cassandra Approach for Segment Data

Better!

• Updating in real time now possible

• Distributed not duplicated

• Less complexity to manage

• Storing more information

• We can now bid on users sooner!

Application Nodes (Tomcat + Less MySQL Usage)

Cassandra

One Ring to rule them all

http://askyyy.blog.163.com/blog/static/1234575992010428819399/

Peer to Peer per operation replication

Fail fast, self-healing

Each write goes to all natural endpoints

Hinted handoff if destination is down

Repair on Read

No more: STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;
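The write path on this slide (each write goes to all natural endpoints, with hinted handoff when a destination is down) can be sketched as a toy model. `ToyNode` and `ToyCoordinator` are made-up names for illustration only. Real Cassandra adds timestamps, consistency levels, hint expiry and much more, so treat this as a simplified picture rather than how the server is implemented.

```python
class ToyNode:
    """A replica that is either up or down and holds key/value data."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ToyCoordinator:
    """Sends each write to every replica and keeps hints for down ones."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (target node, key, value) awaiting replay

    def write(self, key, value):
        """Write to all live replicas; return how many acknowledged."""
        acks = 0
        for node in self.replicas:
            if node.up:
                node.data[key] = value
                acks += 1
            else:
                # Destination is down: remember the write for later.
                self.hints.append((node, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to any replica that has come back up."""
        remaining = []
        for node, key, value in self.hints:
            if node.up:
                node.data[key] = value
            else:
                remaining.append((node, key, value))
        self.hints = remaining
```

No operator step is needed to resynchronize the replica that was down, which is the contrast the slide draws with manually restarting a MySQL slave.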

Multi Data Center

No designing and managing complex replication topologies

create keyspace world
with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
and strategy_options={1:3, 2:3, 3:3};

The same process as single data center

No log shipping, or separate processes to run
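To make the `strategy_options` above concrete, the following sketch works out how many copies of each row that keyspace definition implies. The quorum arithmetic is an addition: the slides do not discuss consistency levels, and the helper `quorum` is a hypothetical name, though the formula (half the replicas, rounded down, plus one) is the usual definition.

```python
def quorum(replication_factor):
    """Smallest majority of a replica set: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Mirrors strategy_options={1:3, 2:3, 3:3}: data center name -> replicas.
strategy_options = {"1": 3, "2": 3, "3": 3}

# Every row is stored three times in each of the three data centers.
total_replicas = sum(strategy_options.values())

# Majority within a single data center, and across the whole cluster.
local_quorum = {dc: quorum(rf) for dc, rf in strategy_options.items()}
cluster_quorum = quorum(total_replicas)
```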

Monitoring & Management

Many, many things to monitor with JMX

Nice command line tools

Most values can be tweaked at run time

Capacity Planning

• How many rows and columns

• Average column size

• Latency requirements

• Throughput: reads and writes per second
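The inputs listed above can be combined into a back-of-the-envelope estimate. This is a minimal sketch with made-up example numbers, not m6d's figures. It ignores per-column storage overhead, indexes, and compaction headroom, so real disk usage will be higher. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def estimate_raw_bytes(rows, columns_per_row, avg_column_bytes,
                       replication_factor):
    """Raw payload size across all replicas, before any overhead."""
    return rows * columns_per_row * avg_column_bytes * replication_factor


def ops_per_second(ops_per_day):
    """Average rate; peak traffic will be higher than this."""
    return ops_per_day / SECONDS_PER_DAY


# Illustrative inputs only (assumed, not taken from the slides).
raw = estimate_raw_bytes(
    rows=100_000_000,
    columns_per_row=20,
    avg_column_bytes=50,
    replication_factor=3,
)
average_rate = ops_per_second(1_000_000_000)
```

With these assumed inputs the raw payload comes to 300 GB, and one billion operations a day averages out to roughly 11,574 per second.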

Unit Tests FTW!

Max 2 billion columns per row

Awesome

Unless you accidentally write 2 billion columns to a row key named “null”

Check maxRowSize JMX

Watch logs for messages about compacting large rows
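One way to catch the "null" row key mistake before it reaches the cluster is a small guard that a unit test can exercise. This is a sketch of the idea rather than code from the talk; `validate_row_key` is a hypothetical helper, and the rule that the literal string "null" is always a bug is an assumption about the application.

```python
def validate_row_key(key):
    """Return a cleaned row key, or raise ValueError if it looks wrong."""
    if key is None:
        raise ValueError("row key must not be None")
    text = str(key).strip()
    # An empty key or the literal string "null" usually means a missing
    # value was stringified somewhere upstream.
    if not text or text.lower() == "null":
        raise ValueError(f"suspicious row key: {key!r}")
    return text
```

Calling this before every write turns a silently growing giant row into an immediate, testable failure.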

Local (NYC) Meetups

www.meetup.com/NYC-Cassandra-User-Group/