Real-World Cassandra at ShareThis

Transcript of Real-World Cassandra at ShareThis

Page 1: Real-World Cassandra at ShareThis

Real-World Cassandra at ShareThis

Use Cases, Data Modeling, and Hector

Page 2: Real-World Cassandra at ShareThis

ShareThis + Our Customers: Keys to Unlocking Social

1. DEPLOY SOCIAL TOOLS ACROSS BRANDS (AND DEVICES)

2. TAKE YOUR SOCIAL INVENTORY TO MARKET

3. LEVERAGE SHARETHIS: FOR DIRECT SALES, RESEARCH AND UN-RESERVED INVENTORY

Page 3: Real-World Cassandra at ShareThis

Largest Ecosystem For Sharing and Engagement Across The Web

120 SOCIAL CHANNELS

SHARETHIS ECOSYSTEM

211 MILLION PEOPLE (95.1% of the web)

2.4 MILLION PUBLISHERS

Source: ComScore U.S. January 2013; internal numbers, January 2013

Page 4: Real-World Cassandra at ShareThis

Data Modeling and Why it Matters (Keep it even, Keep it slice-able)

Page 5: Real-World Cassandra at ShareThis

Use Cases

Page 6: Real-World Cassandra at ShareThis

A New Product: SnapSets

3 - x1.large

Page 7: Real-World Cassandra at ShareThis

Use Case: SnapSets, A New Product

Page 8: Real-World Cassandra at ShareThis

Use Case: SnapSets, A New Product (Continued)

CF: Users (userId)
    meta:first_name = Ronald
    meta:last_name = Melencio
    meta:username = ronsharethis
    scrapbook:timestamp:scrapbookId:name = Scrapbook 1
    scrapbook:timestamp:scrapbookId:date_created = Jan 10
    url1:sid:clipID = {LOCATION DATA}
    url1:sid:456 = {LOCATION DATA}

CF: Scrapbooks (scrapbookId)
    clip:timestamp:clipId:url = sharethis.com
    clip:timestamp:clipId:title = Clip 1
    clip:timestamp:clipId:likes = 10

CF: Clip (clipId)
    comment:timestamp:commentId = {"name":"Ronald","timestamp":"jan 10","comment":"hi"}

CF: Stats (user:userId,application,publisher:pubId)
    meta:total_scrapbooks = 1
    meta:total_clips = 100
    meta:total_scrapbook_comments = 100
    scrapbook:timestamp:scrapbookId:total_comments = 10
    scrapbook:timestamp:scrapbookId:clip:timestamp:clipId:likes = 10
    scrapbook:timestamp:scrapbookId:clip:timestamp:clipId:dislikes = 10
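
The colon-separated column names above are what make a row slice-able: columns sort by name, so everything that shares a prefix sits together. The following is a minimal in-memory sketch of that idea, not ShareThis's code. The class `SnapSetsRowSketch`, its methods, and the sample timestamp and IDs are hypothetical, and a `TreeMap` stands in for a row whose columns are sorted by a UTF8 comparator (the two orderings agree for the ASCII names used here).

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** In-memory stand-in for one wide row; the TreeMap keeps columns sorted by name. */
final class SnapSetsRowSketch {
    // Column name -> value for a single row (for example, one userId in the Users CF).
    private final TreeMap<String, String> columns = new TreeMap<>();

    /** Joins name parts with ':' the way the slide writes its column names. */
    static String columnName(String... parts) {
        return String.join(":", parts);
    }

    void put(String name, String value) {
        columns.put(name, value);
    }

    /** Returns every column whose name starts with the given prefix, in sorted order. */
    SortedMap<String, String> sliceByPrefix(String prefix) {
        // '\uffff' sorts after any character used in these names, so it closes the range.
        return columns.subMap(prefix, prefix + '\uffff');
    }

    public static void main(String[] args) {
        SnapSetsRowSketch user = new SnapSetsRowSketch();
        user.put(columnName("meta", "first_name"), "Ronald");
        user.put(columnName("meta", "username"), "ronsharethis");
        // The timestamp and scrapbook ID below are made-up placeholders.
        user.put(columnName("scrapbook", "1357776000", "sb1", "name"), "Scrapbook 1");
        user.put(columnName("scrapbook", "1357776000", "sb1", "date_created"), "Jan 10");
        for (Map.Entry<String, String> e : user.sliceByPrefix("scrapbook:").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Putting the timestamp ahead of the ID in the name means a slice over the `scrapbook:` prefix comes back in time order, which is presumably why the slide orders the parts that way.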

Page 9: Real-World Cassandra at ShareThis

Use Cases

Page 10: Real-World Cassandra at ShareThis

High Velocity Reads and Writes: Count Service

9 – hi1.4xlarge

9 – x1.large

Page 11: Real-World Cassandra at ShareThis

Use Case: Count Service for URLs

● 1 Billion Pageviews per day = 12k pageviews per second

● 60 Million Social Referrals per day = 720 social referrals per second

● 1 Million Shares per day = 12 shares per second

● No expiration of Data* (3bn rows)

● Requires minimum latency possible

● Multiple read requests per page on blogs

● Normalize and Hash the URL for a row key

● Each social channel is a column

● Retrieve the whole row for counts

● Fix it by cheating ^_^ *
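
The "normalize and hash the URL for a row key" step could look roughly like the sketch below. The slides do not say which normalization rules or hash function ShareThis used, so everything here is an assumption: the class `UrlRowKeySketch`, the rules (lower-case scheme and host, drop the fragment and a trailing slash, ignore the port), and the choice of MD5. It also assumes absolute http(s) URLs.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

final class UrlRowKeySketch {
    /** Hypothetical normalization rules; the real service may apply different ones. */
    static String normalize(String url) {
        URI uri = URI.create(url.trim()).normalize();
        String scheme = uri.getScheme() == null ? "http" : uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost() == null ? "" : uri.getHost().toLowerCase(Locale.ROOT);
        String path = uri.getRawPath() == null ? "" : uri.getRawPath();
        if (path.endsWith("/")) {
            path = path.substring(0, path.length() - 1); // treat /blog/ and /blog as the same page
        }
        String query = uri.getRawQuery() == null ? "" : "?" + uri.getRawQuery();
        return scheme + "://" + host + path + query;     // the fragment is dropped on purpose
    }

    /** Hex-encoded MD5 of the normalized URL, used as a fixed-length row key. */
    static String rowKey(String url) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(normalize(url).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }
}
```

Hashing gives every row key the same length and spreads keys evenly around the ring, and with one column per social channel a single row read returns all counts for a URL.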

Page 12: Real-World Cassandra at ShareThis

Use Cases

Page 13: Real-World Cassandra at ShareThis

Insights that Matter – Your Social Analytics Dashboard

Timely Social Analytics

Dive deeper into your most social content

Identify popular articles

Uncover which social channels are driving the most social traffic

Benchmark your social engagement with SQI

Measure social activity on an hourly, daily, weekly & monthly basis.

12 - x1.large

Page 14: Real-World Cassandra at ShareThis

Use Case: Loading Processed Batch Data

● Backend Hadoop stack for processing analytics

● 58 JSON schemas map tabular data to key/value storage for slicing

● MongoDB* did not scale for frequent row-level writes on the same table

● Needed to maintain read throughput during spikes to writes when analytics were finished

● No TTL* - Works daily, doesn't work hourly

● Switching from Astyanax to Hector

● Using a Hector Client through Java APIs

Page 15: Real-World Cassandra at ShareThis

Use Case: Loading Processed Batch Data (continued)

    {
      "schema": [
        {"column_name": "publisher", "column_type": "UTF8Type",  "column_level": "common",           "column_master": ""},
        {"column_name": "domain",    "column_type": "UTF8Type",  "column_level": "common",           "column_master": ""},
        {"column_name": "percenta",  "column_type": "FloatType", "column_level": "composite_slave",  "column_master": "category"},
        {"column_name": "percentb",  "column_type": "FloatType", "column_level": "composite_slave",  "column_master": "category"},
        {"column_name": "sqi",       "column_type": "FloatType", "column_level": "composite_slave",  "column_master": "category"},
        {"column_name": "month",     "column_type": "UTF8Type",  "column_level": "partition",        "column_master": ""},
        {"column_name": "category",  "column_type": "UTF8Type",  "column_level": "composite_master", "column_master": ""}
      ],
      "row_key_format": "publisher:domain:month",
      "column_family_name": "sqi_table"
    }

CF -> Data Type

Row -> Publisher:domain:timestamp

Columns -> master:slave = value (topics, categories, urls, timestamps, etc)
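
One plausible reading of the schema above is that each tabular record becomes a row key built from `row_key_format`, plus one column per `composite_slave` field named `<master value>:<slave name>`. The sketch below illustrates that reading only; the loader itself is not shown in the slides. The class `BatchRowMapperSketch`, its method names, and any sample record values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BatchRowMapperSketch {
    /** Builds the row key by substituting record values into a format such as "publisher:domain:month". */
    static String rowKey(String rowKeyFormat, Map<String, String> record) {
        StringBuilder key = new StringBuilder();
        for (String field : rowKeyFormat.split(":")) {
            if (key.length() > 0) {
                key.append(':');
            }
            key.append(record.get(field));
        }
        return key.toString();
    }

    /** Emits one column per slave field, named "<master value>:<slave name>" (assumed convention). */
    static Map<String, String> columns(String masterField, String[] slaveFields,
                                       Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>();
        String masterValue = record.get(masterField);
        for (String slave : slaveFields) {
            out.put(masterValue + ":" + slave, record.get(slave));
        }
        return out;
    }
}
```

Under this reading, a record for category `sports` would yield columns such as `sports:percenta`, `sports:percentb`, and `sports:sqi`, so a slice on the `sports:` prefix returns every metric for that category in one read.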

Page 16: Real-World Cassandra at ShareThis

Use Cases

Page 17: Real-World Cassandra at ShareThis

Insights that Matter – Your Social Analytics Dashboard

Real Time Social Analytics

Dive deeper into your most social content

Identify trending articles in real-time

Uncover which social channels are driving the most social traffic

Benchmark your social engagement with SQI

Measure social activity on an hourly, daily, weekly & monthly basis.

12 - cc1.4xlarge

Page 19: Real-World Cassandra at ShareThis

Insights that Matter – And aren't accessible

Page 20: Real-World Cassandra at ShareThis

Insights that Matter – And aren't accessible

Page 21: Real-World Cassandra at ShareThis

Insights that Matter – And aren't accessible

● Too many columns – unbounded url / channel sets

● Cascading failure

● Solutions:

– Bigger Boxes – meh...

– Split up the columns – split the rowkeys

● Hash Urls and keep stats separate

– Split up the columns – split the CF

● Move URLs to their own space

– Split up the columns – split the Keyspace

● Keyspace is a timestamp
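
The "split the rowkeys" option can be sketched as bucketing: instead of one unbounded row per publisher, the URL hash picks one of a fixed number of narrower rows. This is an illustrative assumption about what the bullet means, not the scheme ShareThis deployed. The class `BucketedRowKeySketch`, the `baseKey:bucket` key format, and the bucket count are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class BucketedRowKeySketch {
    /** Picks a bucket from the URL's hash so one logical row is spread over several physical rows. */
    static String bucketedKey(String baseKey, String url, int buckets) {
        int bucket = Math.floorMod(url.hashCode(), buckets); // floorMod keeps the result non-negative
        return baseKey + ":" + bucket;
    }

    /** A reader that wants the whole logical row has to fetch every bucket and merge the results. */
    static List<String> allKeys(String baseKey, int buckets) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < buckets; i++) {
            keys.add(baseKey + ":" + i);
        }
        return keys;
    }
}
```

The trade-off is that a write touches exactly one bucket while a full read fans out across all of them, so the bucket count bounds both the row width and the read cost.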

Page 22: Real-World Cassandra at ShareThis

Ask Good Data Modeling Questions

Page 23: Real-World Cassandra at ShareThis

● How many rows will there be?

● How many columns per row will you need?

● How will you slice your data?

● What is the maximum number of rows?

● What is the maximum number of columns?

● Is your data relational?

● How long will your data live?

Page 24: Real-World Cassandra at ShareThis

Hector

https://github.com/hector-client/hector/wiki/User-Guide

Page 25: Real-World Cassandra at ShareThis

Hector Imports

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    import me.prettyprint.cassandra.model.BasicColumnFamilyDefinition;
    import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
    import me.prettyprint.cassandra.serializers.LongSerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.service.ColumnSliceIterator;
    import me.prettyprint.cassandra.service.ThriftCfDef;
    import me.prettyprint.cassandra.service.ThriftKsDef;
    import me.prettyprint.cassandra.service.template.ColumnFamilyResult;
    import me.prettyprint.cassandra.service.template.ColumnFamilyTemplate;
    import me.prettyprint.cassandra.service.template.ThriftColumnFamilyTemplate;

    // Cluster and Keyspace are used by the methods on the following slides.
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.ColumnSlice;
    import me.prettyprint.hector.api.beans.HColumn;
    import me.prettyprint.hector.api.beans.HCounterColumn;
    import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
    import me.prettyprint.hector.api.ddl.ComparatorType;
    import me.prettyprint.hector.api.ddl.KeyspaceDefinition;
    import me.prettyprint.hector.api.exceptions.HectorException;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;
    import me.prettyprint.hector.api.query.ColumnQuery;
    import me.prettyprint.hector.api.query.CounterQuery;
    import me.prettyprint.hector.api.query.QueryResult;
    import me.prettyprint.hector.api.query.SliceCounterQuery;
    import me.prettyprint.hector.api.query.SliceQuery;

Page 26: Real-World Cassandra at ShareThis

Hector: Add a keyspace

    public static Cluster getCluster(String name, String hosts) {
        return HFactory.getOrCreateCluster(name, hosts);
    }

    public static KeyspaceDefinition createKeyspaceDefinition(String keyspaceName, int replication) {
        return HFactory.createKeyspaceDefinition(
            keyspaceName,
            ThriftKsDef.DEF_STRATEGY_CLASS, // "org.apache.cassandra.locator.SimpleStrategy"
            replication,
            null                            // ArrayList of CF definitions
        );
    }

    public static void addKeyspace(Cluster cluster, KeyspaceDefinition ksDef) {
        KeyspaceDefinition keyspaceDef = cluster.describeKeyspace(ksDef.getName());
        if (keyspaceDef == null) {
            cluster.addKeyspace(ksDef, true);
            System.out.println("Created keyspace: " + ksDef.getName());
        } else {
            System.err.println("Keyspace already exists");
        }
    }

Page 27: Real-World Cassandra at ShareThis

Hector: Define a CF

    public static ColumnFamilyDefinition createGenericColumnFamilyDefinition(
            String ksName, String cfName, ComparatorType ctName) {
        BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
        columnFamilyDefinition.setKeyspaceName(ksName);
        columnFamilyDefinition.setName(cfName);
        columnFamilyDefinition.setDefaultValidationClass(ctName.getClassName());
        columnFamilyDefinition.setReplicateOnWrite(true);
        return new ThriftCfDef(columnFamilyDefinition);
    }

    public static ColumnFamilyDefinition createCounterColumnFamilyDefinition(String ksName, String cfName) {
        BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
        columnFamilyDefinition.setKeyspaceName(ksName);
        columnFamilyDefinition.setName(cfName);
        columnFamilyDefinition.setDefaultValidationClass(ComparatorType.COUNTERTYPE.getClassName());
        columnFamilyDefinition.setReplicateOnWrite(true);
        return new ThriftCfDef(columnFamilyDefinition);
    }

Page 28: Real-World Cassandra at ShareThis

Hector: Add a CF

    Keyspace k = HFactory.createKeyspace(nameString, cluster);

    public static void addColumnFamily(Cluster cluster, Keyspace keyspace, ColumnFamilyDefinition cfDef) {
        KeyspaceDefinition ksDef = cluster.describeKeyspace(keyspace.getKeyspaceName());
        if (ksDef != null) {
            List<ColumnFamilyDefinition> list = ksDef.getCfDefs();
            String cfName = cfDef.getName();
            boolean exists = false;
            for (ColumnFamilyDefinition myCfDef : list) {
                if (myCfDef.getName().equals(cfName)) {
                    exists = true;
                    System.err.println("Found Column Family: " + cfName + ". Did not insert.");
                }
            }
            if (!exists) {
                cluster.addColumnFamily(cfDef, true);
                System.out.println("Created column family: " + cfDef.getName());
            }
        } else {
            System.err.println("Keyspace definition is null");
        }
    }

Page 29: Real-World Cassandra at ShareThis

Hector: Insert Column

    public static void insertColumn(
            Cluster cluster, Keyspace keyspace, String cfName,
            String rowKey, String columnName, String columnValue) {
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        // HFactory.createColumn(columnName, columnValue, StringSerializer.get(), StringSerializer.get())
        HColumn<String, String> hCol = HFactory.createStringColumn(columnName, columnValue);
        mutator.insert(rowKey, cfName, hCol);
        mutator.execute();
    }

    public static void incrementCounter(
            Cluster cluster, Keyspace keyspace, String cfName,
            String rowKey, String counterColumnName) {
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.insertCounter(
            rowKey, cfName,
            HFactory.createCounterColumn(counterColumnName, 1, StringSerializer.get()));
        mutator.execute();
    }

Page 30: Real-World Cassandra at ShareThis

Hector: Read Column

    public static String getColumn(
            Cluster cluster, Keyspace keyspace,
            String cfName, String rowKey, String columnName) {
        ColumnQuery<String, String, String> query = HFactory.createColumnQuery(
            keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily(cfName).setKey(rowKey).setName(columnName);
        HColumn<String, String> value = query.execute().get();
        if (value != null) {
            return value.getValue();
        }
        return ""; // column not found
    }

Page 32: Real-World Cassandra at ShareThis

Hector: Read Column

    public static long getCounter(
            Cluster cluster, Keyspace keyspace,
            String cfName, String rowKey, String counterColumnName) {
        CounterQuery<String, String> query =
            HFactory.createCounterColumnQuery(keyspace, StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily(cfName).setKey(rowKey).setName(counterColumnName);
        HCounterColumn<String> counter = query.execute().get();
        if (counter != null) {
            return counter.getValue();
        }
        return 0;
    }

Page 33: Real-World Cassandra at ShareThis

Hector: Read A Slice

    public static Map<String, String> getSlice(
            Cluster cluster, Keyspace keyspace, String cfName, String rowKey,
            String start, String end, boolean reversed, int count) {
        SliceQuery<String, String, String> query = HFactory.createSliceQuery(
            keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        // For counter columns, use HFactory.createCounterSliceQuery instead.
        query.setColumnFamily(cfName);
        query.setKey(rowKey);
        query.setRange(start, end, reversed, count);
        Iterator<HColumn<String, String>> iter = query.execute().get().getColumns().iterator();
        Map<String, String> answer = new HashMap<String, String>();
        while (iter.hasNext()) {
            HColumn<String, String> temp = iter.next();
            answer.put(temp.getName(), temp.getValue());
        }
        return answer;
    }
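
The `start`, `end`, `reversed`, and `count` arguments to `setRange` are easier to reason about with a small in-memory model. The sketch below is not Hector code. It emulates a column slice over a sorted map under the usual Cassandra conventions as I understand them: both bounds inclusive, an empty string meaning "unbounded", and a reversed slice walking from `start` down to `end`. The class `SliceSketch` and its `slice` method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;

final class SliceSketch {
    /** Emulates a column slice over one sorted row; returns columns in the order they are read. */
    static Map<String, String> slice(NavigableMap<String, String> row,
                                     String start, String end, boolean reversed, int count) {
        // A reversed slice reads the same sorted columns from the other end.
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true); // inclusive lower bound in reading order
        }
        if (!end.isEmpty()) {
            view = view.headMap(end, true);   // inclusive upper bound in reading order
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : view.entrySet()) {
            if (out.size() >= count) {
                break;                        // count caps the number of columns returned
            }
            out.put(e.getKey(), e.getValue());
        }
        return out;
    }
}
```

One consequence worth noting: `getSlice` above copies the result into a `HashMap`, which discards the column order, so a caller that cares about order (for example, the newest N columns via a reversed slice) would want an order-preserving map such as `LinkedHashMap`.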

Page 34: Real-World Cassandra at ShareThis

Hector: Read All Columns

    public static Map<String, String> getAllValues(
            Cluster cluster, String keyspace, String cf, String rowkey) {
        HashMap<String, String> values = new HashMap<String, String>();
        Keyspace keyspaceObject = HFactory.createKeyspace(keyspace, cluster);
        SliceQuery<String, String, String> query = HFactory.createSliceQuery(
            keyspaceObject, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        // Empty start and end mean an unbounded range; at most 10000 columns are returned.
        query.setColumnFamily(cf).setKey(rowkey).setRange("", "", true, 10000);
        QueryResult<ColumnSlice<String, String>> result = query.execute();
        Iterator<HColumn<String, String>> iter = result.get().getColumns().iterator();
        while (iter.hasNext()) {
            HColumn<String, String> current = iter.next();
            values.put(current.getName(), current.getValue());
        }
        return values;
    }

Page 35: Real-World Cassandra at ShareThis

Hector: DANGER

    private static void dropAllKeyspaces(Cluster cluster) {
        for (KeyspaceDefinition ksDef : cluster.describeKeyspaces()) {
            if (!(ksDef.getName().equals("system") || ksDef.getName().equals("OpsCenter"))) {
                cluster.dropKeyspace(ksDef.getName(), true);
                System.out.println("Dropped keyspace: " + ksDef.getName());
            }
        }
    }

    private static void dropKeyspace(Cluster cluster, String keyspace) {
        KeyspaceDefinition ksDef = createKeyspaceDefinition(keyspace, Hector.replication);
        cluster.dropKeyspace(ksDef.getName(), true);
        System.out.println("Dropped keyspace: " + ksDef.getName());
    }

    private static void dropColumnFamily(Cluster cluster, String keyspace, String cf) {
        cluster.dropColumnFamily(keyspace, cf);
        System.out.println("Dropped Column Family: " + cf);
    }

Page 36: Real-World Cassandra at ShareThis

Conclusions

● Data Modeling is Important

● Use Cassandra for write throughput

● Keep your ring even and your data slice-able

● Wrap your libraries and switch when you need to
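
"Wrap your libraries" can be read as: keep client-specific calls behind a small interface of your own, so that moving between clients (the deck mentions switching between Astyanax and Hector) only touches one class. The sketch below shows the shape of such a wrapper under that reading. `CountStore` and `InMemoryCountStore` are hypothetical names, and the in-memory class merely stands in for a real Hector- or Astyanax-backed implementation.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical storage interface; application code depends on this, not on a client library. */
interface CountStore {
    void increment(String rowKey, String column, long delta);

    Map<String, Long> readRow(String rowKey);
}

/** In-memory implementation for tests; a client-backed class would implement the same interface. */
final class InMemoryCountStore implements CountStore {
    private final Map<String, Map<String, AtomicLong>> rows = new ConcurrentHashMap<>();

    @Override
    public void increment(String rowKey, String column, long delta) {
        rows.computeIfAbsent(rowKey, k -> new ConcurrentHashMap<>())
            .computeIfAbsent(column, c -> new AtomicLong())
            .addAndGet(delta);
    }

    @Override
    public Map<String, Long> readRow(String rowKey) {
        Map<String, Long> out = new TreeMap<>(); // sorted by column name, like a slice would be
        Map<String, AtomicLong> row = rows.get(rowKey);
        if (row != null) {
            row.forEach((column, value) -> out.put(column, value.get()));
        }
        return out;
    }
}
```

A side benefit of the interface is that services built on it can be unit-tested against the in-memory store without a running cluster.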

Page 37: Real-World Cassandra at ShareThis

● We're hiring: http://www.sharethis.com/about/careers

● Work with REAL big data, billions of requests per day

● Work on products that millions of people see and interact with on a daily basis

● Work with a real-time pipeline, machine learning, complex user models

● #1 fastest growing company in San Francisco

● free lunches

● ... and of course work with a bunch of fun, smart people and PhDs

Page 38: Real-World Cassandra at ShareThis

Thank You!