OSCON TALK: Becoming Friends with Cassandra and Spark


Page 1: OSCON TALK: Becoming Friends with Cassandra and Spark

BECOMING FRIENDS WITH CASSANDRA & SPARK

DANI TRAPHAGEN & JON HADDAD

YOU

SPARK

C*


Page 3: OSCON TALK: Becoming Friends with Cassandra and Spark

HOUSEKEEPING

Page 4: OSCON TALK: Becoming Friends with Cassandra and Spark

RAISE YOUR HAND IF YOU DON’T HAVE THE VM OSCON2016.ZIP

Page 5: OSCON TALK: Becoming Friends with Cassandra and Spark

VM INSTRUCTIONS

1. copy the vm files to a place of your choosing

2. open virtual ovf

Page 6: OSCON TALK: Becoming Friends with Cassandra and Spark

3. import the .ovf as prompted

Page 7: OSCON TALK: Becoming Friends with Cassandra and Spark

3. open the packer ovf in VirtualBox

Page 8: OSCON TALK: Becoming Friends with Cassandra and Spark

4. check out the vm

Page 9: OSCON TALK: Becoming Friends with Cassandra and Spark

LET’S GET STARTED

Page 10: OSCON TALK: Becoming Friends with Cassandra and Spark

WHAT ARE WE GOING TO COVER?

1. CASSANDRA ARCHITECTURE, CQL, DATA MODELING

2. SPARK DATAFRAMES

Page 11: OSCON TALK: Becoming Friends with Cassandra and Spark

RDBMS & YOU

Page 12: OSCON TALK: Becoming Friends with Cassandra and Spark

SMALL DATA

SUCH AS?

SQLITE, PYTHON SCRIPTS, LOG FILES

Page 13: OSCON TALK: Becoming Friends with Cassandra and Spark

MEDIUM DATA

MOST WEB SITES

RDBMS

Page 14: OSCON TALK: Becoming Friends with Cassandra and Spark

CAN RDBMS WORK FOR BIG DATA?

YOU BIG DATA

Page 15: OSCON TALK: Becoming Friends with Cassandra and Spark

VERTICAL SCALE

Page 16: OSCON TALK: Becoming Friends with Cassandra and Spark

VERTICAL SCALE

STARTING MY BUSINESS

YAY!

OH, WHOA, THINGS ARE KICKING UP

Page 21: OSCON TALK: Becoming Friends with Cassandra and Spark

ACID IS A LIE

Page 22: OSCON TALK: Becoming Friends with Cassandra and Spark

ACID IS A LIE

ATOMICITY

CONSISTENCY

ISOLATION

DURABILITY

Page 27: OSCON TALK: Becoming Friends with Cassandra and Spark

ASYNC REPLICATION != CONSISTENCY

Page 28: OSCON TALK: Becoming Friends with Cassandra and Spark

ASYNC REPLICATION != CONSISTENCY

CLIENT

MASTER

SLAVE

REPLICATION LAG

CONSISTENT?

IDK?

LOL NO!
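The replication-lag problem on this slide can be sketched with a toy model. This is only an illustration under stated assumptions: the class and method names are made up for the example, and a real database replicates over the network rather than through an in-memory queue.

```python
from collections import deque


class AsyncReplicatedStore:
    """Toy master/replica pair; every name here is illustrative."""

    def __init__(self):
        self.master = {}
        self.replica = {}
        self.pending = deque()  # writes the replica has not applied yet

    def write(self, key, value):
        # The master accepts the write right away; replication happens later.
        self.master[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        # Drain the backlog, i.e. the replica catches up on its lag.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value


store = AsyncReplicatedStore()
store.write("user:1", "dani")
stale = store.read_from_replica("user:1")  # read during the lag window
store.replicate()
fresh = store.read_from_replica("user:1")  # read after the replica caught up
```

Until `replicate()` runs, a client reading from the replica does not see a write the master already accepted, which is the inconsistency the slide is poking at.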

Page 37: OSCON TALK: Becoming Friends with Cassandra and Spark

THIRD NORMAL FORM DOESN’T SCALE

▸ UNPREDICTABLE

▸ DATA > MEMORY?

▸ DISK SEEKS ALL DAY

▸ USERS = ANGRY

Page 38: OSCON TALK: Becoming Friends with Cassandra and Spark

THIRD NORMAL FORM DOESN’T SCALE

AWFUL

▸ UNPREDICTABLE

▸ DATA > MEMORY?

▸ DISK SEEKS ALL DAY

▸ USERS = ANGRY

Page 39: OSCON TALK: Becoming Friends with Cassandra and Spark

SHARDING

Page 40: OSCON TALK: Becoming Friends with Cassandra and Spark

SHARDING

CLIENT

NIGHTMARE

Page 43: OSCON TALK: Becoming Friends with Cassandra and Spark

AVAILABILITY?

Page 44: OSCON TALK: Becoming Friends with Cassandra and Spark

AVAILABILITY?

NOT WITH THESE KNUCKLEHEADS

Page 45: OSCON TALK: Becoming Friends with Cassandra and Spark

CONCLUSION: SCALING IS HARD

Page 46: OSCON TALK: Becoming Friends with Cassandra and Spark

FRIEND #1: CASSANDRA


Page 48: OSCON TALK: Becoming Friends with Cassandra and Spark

ARCHITECTURE

Page 49: OSCON TALK: Becoming Friends with Cassandra and Spark

ARCHITECTURE

PEER TO PEER

▸ With Cassandra there is no Master/Slave Hierarchy

▸ Every node is the captain of its own ship

▸ Processes within Cassandra make this possible

▸ Replication

▸ Consistency Level

NODE1

NODE2

NODE3

NODE4


Page 53: OSCON TALK: Becoming Friends with Cassandra and Spark

WHAT DOES THIS GET US?

LINEAR SCALABILITY

HIGH AVAILABILITY

Page 54: OSCON TALK: Becoming Friends with Cassandra and Spark

TOPOLOGY

Page 55: OSCON TALK: Becoming Friends with Cassandra and Spark

CLIENT

TOPOLOGY

Page 56: OSCON TALK: Becoming Friends with Cassandra and Spark

CLIENT

TOPOLOGY

OPERATION


Page 60: OSCON TALK: Becoming Friends with Cassandra and Spark

ARCHITECTURE

REPLICATION IS HOW CASSANDRA DISTRIBUTES DATA

▸ Replication factor is the number of replicas/puppies

NODE1

NODE2

NODE3

NODE4

Page 63: OSCON TALK: Becoming Friends with Cassandra and Spark

ARCHITECTURE

HOW DO WE ACKNOWLEDGE REPLICATION?

▸ The coordinator talks to the client, sending an ack for the write

COORDINATOR

NODE1

NODE2

NODE3

NODE4

ack

Page 66: OSCON TALK: Becoming Friends with Cassandra and Spark

ARCHITECTURE

TUNABLE CONSISTENCY LEVELS

NODE1

NODE2

NODE3

NODE4

▸ One

▸ Quorum

▸ All

Page 67: OSCON TALK: Becoming Friends with Cassandra and Spark

ARCHITECTURE

ONE

▸ One replica acks adorable puppy data

NODE1

NODE2

NODE3

NODE4

Page 69: OSCON TALK: Becoming Friends with Cassandra and Spark

ARCHITECTURE

ALL

▸ All replicas ack adorable puppy data

NODE1

NODE2

NODE3

NODE4

Page 72: OSCON TALK: Becoming Friends with Cassandra and Spark

ARCHITECTURE

QUORUM

▸ Quorum = (sum_of_replication_factors / 2) + 1, rounded down to a whole number

▸ How many nodes get puppies if our replication factor is 3, & we want quorum?

NODE1

NODE2

NODE3

NODE4
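The quorum formula is easy to check by hand or in a few lines of Python. The sketch below is not Cassandra code: the function names are hypothetical, and it only covers the three levels from the earlier slide (ONE, QUORUM, ALL) for a single datacenter.

```python
def quorum(replication_factors):
    """Quorum size from the slide's formula, using integer division."""
    return sum(replication_factors) // 2 + 1


def required_acks(level, replication_factor):
    """Replica acks needed at a given level in one datacenter (illustrative)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum([replication_factor])
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


single_dc = quorum([3])    # replication factor 3 in one datacenter
two_dcs = quorum([3, 3])   # assumed example: RF 3 in each of two datacenters
```

With a replication factor of 3, `single_dc` works out to 2, so two replicas must ack. Summing the factors across two datacenters gives 4.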

Page 74: OSCON TALK: Becoming Friends with Cassandra and Spark

MULTI-DC PARAMETERS

▸ Quorum vs. Local_Quorum

▸ One vs. Local_One

US-EAST

US-WEST

Page 75: OSCON TALK: Becoming Friends with Cassandra and Spark

PARTITIONER

CONSISTENT HASHING

Just how is data actually distributed around the cluster?
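One way to picture consistent hashing is a ring of tokens. The sketch below is a simplification under stated assumptions: the `Ring` class and `token` function are made up for this example, each node owns a single token, and MD5 is used only because it ships with Python (Cassandra's default partitioner is Murmur3-based). Replicas are placed on the next nodes clockwise, which is roughly what SimpleStrategy does.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a ring position (MD5 is an arbitrary stand-in hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node, no virtual nodes."""

    def __init__(self, nodes, replication_factor=1):
        self.replication_factor = replication_factor
        # Sort by token so we can walk the ring clockwise.
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key):
        # First node whose token is >= the key's token, wrapping at the end.
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = min(self.replication_factor, len(self.entries))
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(count)
        ]


ring = Ring(["NODE1", "NODE2", "NODE3", "NODE4"], replication_factor=3)
owners = ring.replicas("976de5da-93ae-4bf0-b127-d19eea1c8ea4")
```

The same partition key always hashes to the same token, so any node can work out which replicas own it without asking a master.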

Page 80: OSCON TALK: Becoming Friends with Cassandra and Spark

CASSANDRA DATA MODELING SOUNDS HARD

Page 81: OSCON TALK: Becoming Friends with Cassandra and Spark

CASSANDRA DATA MODELING SOUNDS HARD

NOT REALLY

Page 82: OSCON TALK: Becoming Friends with Cassandra and Spark

GAIN QUERY POWERS WITH CQL

Page 84: OSCON TALK: Becoming Friends with Cassandra and Spark

DATA STRUCTURES IN CASSANDRA

Page 85: OSCON TALK: Becoming Friends with Cassandra and Spark

DATA STRUCTURES IN CASSANDRA

KEYSPACE

TABLE

ROWS

PARTITIONS

Page 92: OSCON TALK: Becoming Friends with Cassandra and Spark

PRIMARY KEY = PARTITION KEY + CLUSTERING COLUMNS

Page 93: OSCON TALK: Becoming Friends with Cassandra and Spark

PARTITION KEY

Page 94: OSCON TALK: Becoming Friends with Cassandra and Spark

PARTITION KEY

THIS IS HOW YOU RETRIEVE A PARTITION

Page 95: OSCON TALK: Becoming Friends with Cassandra and Spark

CLUSTERING COLUMNS

Page 96: OSCON TALK: Becoming Friends with Cassandra and Spark

CLUSTERING COLUMNS

THIS IS HOW YOU GET SORTING, ORDER AND UNIQUE IDENTIFICATION

Page 97: OSCON TALK: Becoming Friends with Cassandra and Spark

WHY ARE CLUSTERING COLUMNS SO COOL?

Page 98: OSCON TALK: Becoming Friends with Cassandra and Spark

HOW DO I USE CQL?

Page 99: OSCON TALK: Becoming Friends with Cassandra and Spark

CQLSH

HOW DO I USE CQL?

Page 100: OSCON TALK: Becoming Friends with Cassandra and Spark

SOME EXAMPLES FROM A MOVIE DB

Page 101: OSCON TALK: Becoming Friends with Cassandra and Spark

CREATE A KEYSPACE

CREATE KEYSPACE movielens_small
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'};

Page 102: OSCON TALK: Becoming Friends with Cassandra and Spark

CREATE A TABLE

CREATE TABLE movies (
    id uuid PRIMARY KEY,
    avg_rating float,
    genres set<text>,
    name text,
    release_date date,
    url text,
    video_release_date date
);

PRIMARY KEY IN WHITE

Page 103: OSCON TALK: Becoming Friends with Cassandra and Spark

CREATE A TABLE

CREATE TABLE ratings_by_movie (
    movie_id uuid,
    user_id uuid,
    rating int,
    ts int,
    PRIMARY KEY (movie_id, user_id)
);

PRIMARY KEY IN WHITE
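In ratings_by_movie, movie_id is the partition key and user_id is the clustering column. A rough mental model, sketched in plain Python rather than anything Cassandra actually runs: a dictionary keyed by partition key, where each partition holds rows sorted by clustering column. The function name and the short string ids below are stand-ins for the example (the real columns are uuids).

```python
import bisect
from collections import defaultdict

# partition key (movie_id) -> rows kept sorted by clustering column (user_id)
partitions = defaultdict(list)


def upsert_rating(movie_id, user_id, rating):
    """Insert a row, or overwrite the row with the same primary key."""
    rows = partitions[movie_id]
    keys = [row[0] for row in rows]
    i = bisect.bisect_left(keys, user_id)
    if i < len(rows) and rows[i][0] == user_id:
        rows[i] = (user_id, rating)        # same primary key: overwrite
    else:
        rows.insert(i, (user_id, rating))  # new row, kept in sorted order


upsert_rating("movie-a", "user-3", 4)
upsert_rating("movie-a", "user-1", 5)
upsert_rating("movie-a", "user-3", 2)  # replaces the earlier rating by user-3
upsert_rating("movie-b", "user-2", 3)

movie_a_rows = partitions["movie-a"]
```

Fetching `partitions["movie-a"]` returns every rating for that movie in one lookup, already ordered by user, and the pair (movie_id, user_id) identifies exactly one row.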

Page 104: OSCON TALK: Becoming Friends with Cassandra and Spark

INSERT STATEMENT EXAMPLE

insert into movies (id, name, genres)
    values (976de5da-93ae-4bf0-b127-d19eea1c8ea4, 'My Awesome Movie (2016)', {'Comedy'});

Page 105: OSCON TALK: Becoming Friends with Cassandra and Spark

THIS ALL LOOKS TOO FAMILIAR, DOESN’T IT?

Page 106: OSCON TALK: Becoming Friends with Cassandra and Spark

BUT REMEMBER…

Page 107: OSCON TALK: Becoming Friends with Cassandra and Spark

THIRD NORMAL FORM DOESN’T SCALE

▸ UNPREDICTABLE

▸ DATA > MEMORY?

▸ DISK SEEKS ALL DAY

▸ USERS = ANGRY

Page 108: OSCON TALK: Becoming Friends with Cassandra and Spark

THIRD NORMAL FORM DOESN’T SCALE

AWFUL

▸ UNPREDICTABLE

▸ DATA > MEMORY?

▸ DISK SEEKS ALL DAY

▸ USERS = ANGRY

Page 109: OSCON TALK: Becoming Friends with Cassandra and Spark

DATA MODELING PRO TIPS

Page 110: OSCON TALK: Becoming Friends with Cassandra and Spark

DATA MODELING PRO TIPS

▸ no joins

▸ query driven methodology, instead

▸ denormalize

▸ disks are cheap

Page 114: OSCON TALK: Becoming Friends with Cassandra and Spark

JON & DANI, I’M STARTING TO GET COLD FEET!

Page 115: OSCON TALK: Becoming Friends with Cassandra and Spark

I MISS THE WARM EMBRACE OF RDBMS

I DIDN’T HAVE TO DENORMALIZE BACK THEN

Page 116: OSCON TALK: Becoming Friends with Cassandra and Spark

CHILL OUT

Page 117: OSCON TALK: Becoming Friends with Cassandra and Spark

& PREPARE TO BE WOWED


Page 119: OSCON TALK: Becoming Friends with Cassandra and Spark

CDM

Page 120: OSCON TALK: Becoming Friends with Cassandra and Spark

ROLL UP YOUR SLEEVES

TYPE STUFF

Page 121: OSCON TALK: Becoming Friends with Cassandra and Spark

REMEMBER THAT VM?

Page 122: OSCON TALK: Becoming Friends with Cassandra and Spark

TRY IT OUT

1. use movielens_small;

2. desc tables;

3. desc movies;

4. select * from movies limit 10;

Page 123: OSCON TALK: Becoming Friends with Cassandra and Spark

YOU SHOULD GET…

Page 124: OSCON TALK: Becoming Friends with Cassandra and Spark

YOUR 10 MOVIES

Page 125: OSCON TALK: Becoming Friends with Cassandra and Spark

ADDING ON

5. select id, name from movies limit 100;

6. PICK YOUR FAVORITE MOVIE

BONUS: CAN YOU FIND THE AVERAGE RATINGS FOR YOUR FAVORITE MOVIE?

Page 126: OSCON TALK: Becoming Friends with Cassandra and Spark

MOVIE ID LIST

Page 127: OSCON TALK: Becoming Friends with Cassandra and Spark

SELECT A MOVIE

Page 128: OSCON TALK: Becoming Friends with Cassandra and Spark

TOP GUN EXAMPLE


Page 130: OSCON TALK: Becoming Friends with Cassandra and Spark

FIFTH ELEMENT BECAUSE OBVIOUSLY


Page 132: OSCON TALK: Becoming Friends with Cassandra and Spark

NICE WORK YOU!

Page 133: OSCON TALK: Becoming Friends with Cassandra and Spark

FRIEND #2: SPARK


Page 135: OSCON TALK: Becoming Friends with Cassandra and Spark

BATCH PROCESSING

LOTS OF DATA?

Page 136: OSCON TALK: Becoming Friends with Cassandra and Spark

STREAMING & REAL TIME AGGREGATION

Page 137: OSCON TALK: Becoming Friends with Cassandra and Spark

MACHINE LEARNING FOR THE INEVITABLE END OF TIMES

Page 138: OSCON TALK: Becoming Friends with Cassandra and Spark

GRAPH ANALYTICS

Page 139: OSCON TALK: Becoming Friends with Cassandra and Spark

2 WAYS OF WORKING

Page 140: OSCON TALK: Becoming Friends with Cassandra and Spark

1. RDD

BASED ON FUNCTIONAL PROGRAMMING

Page 141: OSCON TALK: Becoming Friends with Cassandra and Spark

blah.map( lambda x : x * 2 )
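The same functional style can be tried in plain Python without a Spark installation. In this sketch `blah` is just a list standing in for an RDD's contents (an assumption for illustration). In PySpark, `blah` would be an RDD, and `map` and `filter` would build new RDDs lazily instead of running right away.

```python
from functools import reduce

blah = [1, 2, 3]  # hypothetical stand-in for the data inside an RDD

# Transform every element, as in blah.map(lambda x: x * 2)
doubled = list(map(lambda x: x * 2, blah))

# Keep only the elements that pass a predicate
over_two = list(filter(lambda x: x > 2, doubled))

# Fold the elements down to a single value
total = reduce(lambda a, b: a + b, doubled)
```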

Page 142: OSCON TALK: Becoming Friends with Cassandra and Spark

COOL BUT NOT EASY


Page 144: OSCON TALK: Becoming Friends with Cassandra and Spark

2. DATAFRAMES

Page 145: OSCON TALK: Becoming Friends with Cassandra and Spark

PRETTY EASY

Page 147: OSCON TALK: Becoming Friends with Cassandra and Spark

TODAY WE TALK BATCH WITH DATAFRAMES AND PYTHON

Page 148: OSCON TALK: Becoming Friends with Cassandra and Spark

ROLL UP YOUR SLEEVES

OPEN THE OSCON TUTORIAL ON YOUR DESKTOP

Page 149: OSCON TALK: Becoming Friends with Cassandra and Spark

FRIENDSHIP LEVELS

Page 150: OSCON TALK: Becoming Friends with Cassandra and Spark

OTHER RESOURCES TO LEARN:

1. free courses - www.academy.datastax.com

2. our blogs - www.rustyrazorblade.com & www.dtrapezoid.com

3. our friend’s blog - https://lostechies.com/ryansvihla/

4. datastax blog - http://www.datastax.com/dev/blog

Page 151: OSCON TALK: Becoming Friends with Cassandra and Spark

THANK YOU, MAGICAL HUMANS

@DTRAPEZOID @RUSTYRAZORBLADE