Introduction to NoSQL

NOSQL Yan Cui @theburningmonk

description

A run down on the available NoSQL options and practical examples of using Redis to solve real-world web use cases.

Transcript of Introduction to NoSQL

Page 1: Introduction to NoSQL

NOSQL

Yan Cui @theburningmonk

Page 2: Introduction to NoSQL

Server-side Developer @

Page 3: Introduction to NoSQL

iwi by numbers

• 400k+ DAU

• ~100m requests/day

• 25k+ concurrent users

• 1500+ requests/s

• 7000+ cache ops/s

• 100+ commodity servers (EC2 small instance)

• 75ms average latency

Page 4: Introduction to NoSQL

Sign Posts

• Why NOSQL?

• Types of NOSQL DBs

• NOSQL In Practice

• Q&A

Page 5: Introduction to NoSQL

CURRENT TRENDS

A look at the…

Page 6: Introduction to NoSQL

Digital Universe

[Chart: size of the digital universe by year, 2006 to 2011, on an axis from 0 to 2000; it grows from 161 ExaBytes to 1.8 ZettaBytes]

Page 7: Introduction to NoSQL

Big Data

“…data sets whose size is beyond the ability of commonly used software tools to capture, manage and process within a tolerable elapsed time…”

Page 8: Introduction to NoSQL

Big Data

Unit Symbol Bytes

Kilobyte KB 1024

Megabyte MB 1048576

Gigabyte GB 1073741824

Terabyte TB 1099511627776

Petabyte PB 1125899906842624

Exabyte EB 1152921504606846976

Zettabyte ZB 1180591620717411303424

Yottabyte YB 1208925819614629174706176

PAIN-O-Meter
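The byte counts in the unit table are successive powers of 1024. A minimal sketch that reproduces them, assuming the binary (1024-based) convention the slide uses; the helper name `bytes_in` is made up for illustration:

```python
# Binary units as listed on the slide: each step is a factor of 1024 (2**10).
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def bytes_in(symbol: str) -> int:
    """Return the number of bytes in one unit, e.g. bytes_in("MB") == 1024**2."""
    return 1024 ** (UNITS.index(symbol) + 1)


for symbol in UNITS:
    print(f"{symbol}: {bytes_in(symbol)}")
```

Python integers have arbitrary precision, so the zettabyte and yottabyte values are exact.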

Page 9: Introduction to NoSQL
Page 10: Introduction to NoSQL

Vertical Scaling

Server Cost

PowerEdge T110 II (basic): 8 GB, 3.1 GHz Quad 4T $1,350

PowerEdge T110 II (basic): 32 GB, 3.4 GHz Quad 8T $12,103

PowerEdge C2100: 192 GB, 2 x 3 GHz $19,960

IBM System x3850 X5: 2048 GB, 8 x 2.4 GHz $646,605

Blue Gene/P: 14 teraflops, 4096 CPUs $1,300,000

K Computer (fastest super computer): 10 petaflops, 705,024 cores, 1,377 TB $10,000,000 annual operating cost

Page 11: Introduction to NoSQL

Horizontal Scaling

• Incremental scaling

• Cost grows incrementally

• Easy to scale down

• Linear gains

Page 12: Introduction to NoSQL
Page 13: Introduction to NoSQL

Hardware Vendor

Page 14: Introduction to NoSQL
Page 15: Introduction to NoSQL

INTRODUCING NOSQL

Here’s an alternative…

Page 16: Introduction to NoSQL

NOSQL is …

• No SQL

• Not Only SQL

• A movement away from relational model

• Consists of 4 main types of DBs

Page 17: Introduction to NoSQL

NOSQL is …

• Hard

• A new dimension of trade-offs

• CAP theorem

Page 18: Introduction to NoSQL

CAP Theorem

Availability: Each client can always read and write data

Partition Tolerant: System works despite network partitions

Consistency: All clients have the same view of data

Page 19: Introduction to NoSQL

NOSQL DBs are …

• Specialized for particular use cases

• Non-relational

• Semi-structured

• Horizontally scalable (usually)

Page 20: Introduction to NoSQL

Motivations

• Horizontal Scalability

• Low Latency

• Cost

• Minimize Downtime

Page 21: Introduction to NoSQL

Motivations

Use the right tool for the right job!

Page 22: Introduction to NoSQL

RDBMS

• CAN scale horizontally (via sharding)

• Manual client side hashing

• Cross-server queries are difficult

• Loses ACIDity

• Schema update = PAIN
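"Manual client side hashing" means the application itself decides which database server holds a given row. A minimal sketch of the idea, assuming four hypothetical shard names and an MD5-based hash (neither comes from the talk):

```python
import hashlib

# Hypothetical shard names; a real client would hold connection details here.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key on the client side."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the shard count.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


print(shard_for("user:42"))
```

The same key always maps to the same shard, but a query that spans many keys has to visit several servers, and changing the number of shards remaps most keys. Those are the costs the bullets above point at.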

Page 23: Introduction to NoSQL

TYPES OF NOSQL DBS

Page 24: Introduction to NoSQL

Types Of NOSQL DBs

• Key-Value Store

• Document Store

• Column Database

• Graph Database

Page 25: Introduction to NoSQL

Key-Value Store

morpheus

101110100110101001100110100100100010101011101010101010110000101000110011111010110000101000111110001100000

“key” “value”

Page 26: Introduction to NoSQL

Key-Value Store

• It’s a Hash

• Basic get/put/delete ops

• Crazy fast!

• Easy to scale horizontally

• Membase, Redis, ORACLE…
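The "It's a Hash" point can be sketched with a plain dictionary. This is a toy in-memory model of the get/put/delete interface, not how any of the listed products is implemented; the class name and the stored bytes are made up:

```python
class KeyValueStore:
    """Toy in-memory key-value store: opaque values addressed only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # True if a value was removed, False if the key was not present.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("morpheus", b"\x2e\x9a\x66")  # the store never inspects the value
print(store.get("morpheus"))
```

Because every operation touches exactly one key, keys can be spread across servers without the store needing to understand the values.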

Page 27: Introduction to NoSQL

Document Store

morpheus

{ name : “Morpheus”, rank : “Captain”, occupation: “Total badass”}

“key” “document”

Page 28: Introduction to NoSQL

Document Store

• Document = self-contained piece of data

• Semi-structured data

• Querying

• MongoDB, RavenDB…
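Unlike a key-value store, a document store can look inside the value. A minimal sketch of field-based querying over semi-structured documents, using plain dictionaries; the `find` helper and the "neo" key are hypothetical and do not mirror the MongoDB or RavenDB APIs:

```python
documents = {
    "morpheus": {"name": "Morpheus", "rank": "Captain", "occupation": "Total badass"},
    "neo": {"name": "Thomas Anderson", "age": 29},  # different fields, same collection
}


def find(docs, **criteria):
    """Return the keys of documents whose fields match every criterion."""
    return [
        key
        for key, doc in docs.items()
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(find(documents, rank="Captain"))
```

Documents that lack a queried field simply do not match, which is what "semi-structured" buys you: no shared schema is required.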

Page 29: Introduction to NoSQL

Column Database

Name           Last Name  Age  Rank     Occupation    Version  Language
Thomas         Anderson   29   -        -             -        -
Morpheus       -          -    Captain  Total badass  -        -
Cypher         Reagan     -    -        -             -        -
Agent Smith    -          -    -        -             1.0b     C++
The Architect  -          -    -        -             -        -

Page 30: Introduction to NoSQL

Column Database

• Data stored by column

• Semi-structured data

• Cassandra, HBase, …
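"Data stored by column" can be pictured by regrouping sparse rows into per-column maps. This is a deliberately simplified sketch; Cassandra and HBase organise columns into families and do considerably more, and the `to_columns` helper is made up for illustration:

```python
rows = [
    {"name": "Thomas", "last_name": "Anderson", "age": 29},
    {"name": "Morpheus", "rank": "Captain", "occupation": "Total badass"},
    {"name": "Cypher", "last_name": "Reagan"},
]


def to_columns(rows):
    """Regroup row dicts into {column: {row_index: value}}, skipping absent cells."""
    columns = {}
    for index, row in enumerate(rows):
        for column, value in row.items():
            columns.setdefault(column, {})[index] = value
    return columns


columns = to_columns(rows)
print(columns["last_name"])
```

Empty cells cost nothing in this layout, and reading one column does not touch the others.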

Page 31: Introduction to NoSQL

Graph Database

[Diagram: property graph with six nodes (numbered 1, 2, 3, 5, 7, 9) connected by KNOWS and CODED_BY edges]

Node properties:

name = “Thomas Anderson”, age = 29

name = “Trinity”

name = “Morpheus”, rank = “Captain”, occupation = “Total badass”

name = “Cypher”, last name = “Reagan”

name = “Agent Smith”, version = 1.0b, language = C++

name = “The Architect”

Edge properties shown on KNOWS edges:

age = 3 days

disclosure = public

disclosure = secret, age = 6 months

Page 32: Introduction to NoSQL

Graph Database

• Nodes, properties, edges

• Based on graph theory

• Node adjacency instead of indices

• Neo4j, VertexDB, …
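"Node adjacency instead of indices" means each node points directly at its neighbours, so a traversal follows those links rather than looking anything up in an index. A minimal sketch using an adjacency list; the specific edges below are illustrative and are not read off the slide's diagram:

```python
from collections import deque

# Adjacency list for KNOWS edges; the connections are illustrative only.
knows = {
    "Thomas Anderson": ["Trinity", "Morpheus"],
    "Morpheus": ["Trinity", "Cypher"],
    "Cypher": ["Agent Smith"],
    "Trinity": [],
    "Agent Smith": [],
}


def reachable(graph, start):
    """Breadth-first traversal that follows edges from each node directly."""
    seen, pending, order = {start}, deque([start]), []
    while pending:
        node = pending.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                pending.append(neighbour)
    return order


print(reachable(knows, "Thomas Anderson"))
```

The cost of each step depends on how many neighbours a node has, not on the total size of the graph.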

Page 33: Introduction to NoSQL

NOSQL IN PRACTICE

Real-world use cases for NoSQL DBs...

Page 34: Introduction to NoSQL

Redis

• Remote dictionary server

• Key-Value store

• In-memory, persistent

• Data structures

Page 35: Introduction to NoSQL

Redis

Lists

Sets

Sorted Sets

Hashes

Page 36: Introduction to NoSQL

Redis

Page 37: Introduction to NoSQL

COUNTERS

Redis in Practice #1

Page 38: Introduction to NoSQL

Counters

• Potentially massive numbers of ops

• Valuable data, but not mission critical

Page 39: Introduction to NoSQL

Counters

• Lots of row contention in SQL

• Requires lots of transactions

Page 40: Introduction to NoSQL

Counters

• Redis has atomic incr/decr

INCR Increments value by 1

INCRBY Increments value by given amount

DECR Decrements value by 1

DECRBY Decrements value by given amount
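To show what "atomic" buys you, here is an in-memory stand-in that mimics the semantics of these four commands. It does not talk to Redis; the `Counters` class and the key name are assumptions made for illustration, and a lock plays the role of Redis executing each command atomically:

```python
import threading


class Counters:
    """In-memory stand-in that mimics INCR/INCRBY/DECR/DECRBY semantics."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incrby(self, key, amount=1):
        # The read-modify-write happens under one lock, so no update is lost.
        # As in Redis, a missing key is treated as 0 and the new value is returned.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + amount
            return self._values[key]

    def incr(self, key):
        return self.incrby(key, 1)

    def decr(self, key):
        return self.incrby(key, -1)

    def decrby(self, key, amount):
        return self.incrby(key, -amount)


counters = Counters()
counters.incr("page:views")        # like: INCR page:views
counters.incrby("page:views", 10)  # like: INCRBY page:views 10
views = counters.decr("page:views")  # like: DECR page:views
print(views)
```

Callers never read the value, add to it and write it back themselves, which is the pattern that causes row contention in the SQL version.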

Page 41: Introduction to NoSQL

Image by Mike Rohde

Counters

Page 42: Introduction to NoSQL

RANDOM ITEMS

Redis in Practice #2

Page 43: Introduction to NoSQL

Random Items

• Give user a random article

• SQL implementation

– select count(*) from TABLE

– var n = random.Next(0, count - 1)

– select * from TABLE where primary_key = n

– inefficient, complex

Page 44: Introduction to NoSQL

Random Items

• Redis has built-in randomize operation

SRANDMEMBER Gets a random member from a set

Page 45: Introduction to NoSQL

Random Items

• About sets:

– 0 to N unique elements

– Unordered

– Atomic add
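The random-article case then reduces to keeping article IDs in a set and asking for a random member. A minimal in-memory sketch of SRANDMEMBER-like behaviour with a Python set; the article IDs and the function are hypothetical stand-ins, not a Redis client:

```python
import random

articles = {"article:1", "article:7", "article:42"}  # hypothetical IDs


def srandmember(members, rng=random):
    """Return one random member without removing it, or None for an empty set."""
    if not members:
        return None
    return rng.choice(tuple(members))


print(srandmember(articles))
```

There is no count query and no assumption that IDs are contiguous, which is where the SQL version becomes awkward.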

Page 46: Introduction to NoSQL

Image by Mike Rohde

Random Items

Page 47: Introduction to NoSQL

PRESENCE

Redis in Practice #3

Page 48: Introduction to NoSQL

Presence

• Who’s online?

• Needs to be scalable

• Pseudo-real time

Page 49: Introduction to NoSQL

Presence

• Each user ‘checks-in’ once every 3 mins

00:22am: A, B

00:23am: C, D

00:24am: E

00:25am: A

00:26am: ?

A, C, D & E are online at 00:26am

Page 50: Introduction to NoSQL

Presence

• Redis natively supports set operations

SADD Add item(s) to a set

SREM Remove item(s) from a set

SINTER Intersect multiple sets

SUNION Union multiple sets

SRANDMEMBER Gets a random member from a set

... ...
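One plausible reading of the presence slides is a set per minute, filled on check-in and unioned over the last three minutes. The sketch below models that with Python sets standing in for SADD and SUNION; the bucket layout and function names are assumptions, not the speaker's confirmed implementation:

```python
from collections import defaultdict

WINDOW_MINUTES = 3  # users check in once every 3 minutes

checkins = defaultdict(set)  # minute -> users seen in that minute (one set per minute)


def check_in(user, minute):
    checkins[minute].add(user)  # like SADD on that minute's set


def online_at(minute):
    """Union the sets for the previous WINDOW_MINUTES minutes (like SUNION)."""
    recent = (checkins.get(m, set()) for m in range(minute - WINDOW_MINUTES, minute))
    return set().union(*recent)


# The check-ins from the timeline slide, with minutes past midnight as integers.
for user, minute in [("A", 22), ("B", 22), ("C", 23), ("D", 23), ("E", 24), ("A", 25)]:
    check_in(user, minute)

print(sorted(online_at(26)))
```

B last checked in at 00:22, which falls outside the three-minute window at 00:26, so the result matches the slide: A, C, D and E.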

Page 51: Introduction to NoSQL

Image by Mike Rohde

Presence

Page 52: Introduction to NoSQL

LEADERBOARDS

Redis in Practice #4

Page 53: Introduction to NoSQL

Leaderboards

• Gamification

• Users ranked by some score

Page 54: Introduction to NoSQL

Leaderboards

• About sorted sets:

– Similar to a set

– Every member is associated with a score

– Elements are taken in order

Page 55: Introduction to NoSQL

Leaderboards

• Redis has ‘Sorted Sets’

ZADD Add/update item(s) to a sorted set

ZRANK Get item’s rank in a sorted set (low -> high)

ZREVRANK Get item’s rank in a sorted set (high -> low)

ZRANGE Get range of items, by rank (low -> high)

ZREVRANGE Get range of items, by rank (high -> low)

... ...
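A leaderboard needs three operations: set a score, get a player's rank, and list the top N. The sketch below is an in-memory stand-in that mimics ZADD, ZREVRANK and ZREVRANGE; the class, its simplified signatures and the player names are made up, and it re-sorts on every read where Redis keeps members ordered:

```python
class Leaderboard:
    """In-memory stand-in mimicking a few sorted-set commands."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        # Adds the member or updates its score (Redis: ZADD key score member).
        self._scores[member] = score

    def _ordered(self, reverse):
        # Sort by score; ties are broken by member name.
        return sorted(self._scores, key=lambda m: (self._scores[m], m), reverse=reverse)

    def zrevrank(self, member):
        # 0-based rank from highest score to lowest, or None if absent.
        if member not in self._scores:
            return None
        return self._ordered(reverse=True).index(member)

    def zrevrange(self, start, stop):
        # Inclusive stop, as in Redis; this sketch handles non-negative indexes only.
        return self._ordered(reverse=True)[start:stop + 1]


board = Leaderboard()
board.zadd("alice", 120)
board.zadd("bob", 95)
board.zadd("carol", 150)
print(board.zrevrange(0, 1), board.zrevrank("bob"))
```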

Page 56: Introduction to NoSQL

Image by Mike Rohde

Leaderboards

Page 57: Introduction to NoSQL

QUEUES

Redis in Practice #5

Page 58: Introduction to NoSQL

Queues

• Redis has push/pop support for lists

• Allows you to use list as queue/stack

LPOP Remove and get the 1st item in a list

LPUSH Prepend item(s) to a list

RPOP Remove and get the last item in a list

RPUSH Append item(s) to a list
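The four commands map closely onto a double-ended queue. A small sketch with Python's `collections.deque` as a local stand-in for a Redis list; the job names are hypothetical:

```python
from collections import deque

jobs = deque()
jobs.append("job-1")      # like RPUSH: append to the tail
jobs.append("job-2")      # like RPUSH
jobs.appendleft("job-0")  # like LPUSH: prepend to the head

first = jobs.popleft()    # like LPOP: take from the head
last = jobs.pop()         # like RPOP: take from the tail
print(first, last, list(jobs))
```

Pushing on one end and popping from the other gives a queue (for example RPUSH with LPOP); pushing and popping on the same end gives a stack.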

Page 59: Introduction to NoSQL

Queues

• Redis supports ‘blocking’ pop

• Message queues without polling!

BLPOP Remove and get the 1st item in a list, or block until one is available

BRPOP Remove and get the last item in a list, or block until one is available
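The benefit of a blocking pop is that a consumer can wait for work without a polling loop. The sketch below shows the same pattern locally with Python's `queue.Queue`, whose `get()` blocks until an item arrives; it is an analogy for BLPOP, not a Redis client, and the `None` sentinel is an assumption of this sketch:

```python
import queue
import threading

messages = queue.Queue()
received = []


def worker():
    # get() blocks until an item is available, so the worker never polls.
    while True:
        item = messages.get()
        if item is None:  # sentinel used by this sketch to stop the worker
            break
        received.append(item)


thread = threading.Thread(target=worker)
thread.start()
for text in ["hello", "world"]:
    messages.put(text)
messages.put(None)
thread.join()
print(received)
```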

Page 60: Introduction to NoSQL

Image by Mike Rohde

Queues

Page 61: Introduction to NoSQL

Redis

• Supports data structures

• No built-in clustering

• Master-slave replication

• Redis Cluster is on the way...

Page 62: Introduction to NoSQL

SUMMARIES

Before we go...

Page 63: Introduction to NoSQL

Considerations

• In memory?

• Disk-backed persistence?

• Managed? Database As A Service?

• Cluster support?

Page 64: Introduction to NoSQL

SQL or NoSQL?

• Wrong question

• What’s your problem?

– Transactions

– Amount of data

– Data structure

Page 65: Introduction to NoSQL

http://blog.nahurst.com/visual-guide-to-nosql-systems

Page 66: Introduction to NoSQL

Dynamo DB

• Fully managed

• Provisioned throughput

• Predictable cost & performance

• SSD-backed

• Auto-replicated

Page 67: Introduction to NoSQL

Google BigQuery

• Game changer for Analytics industry

• Analyze billions of rows in seconds

• SQL-like query syntax

• Prediction API

• NOT a database system

Page 68: Introduction to NoSQL

Scalability

• Success can come unexpectedly and quickly

• Not just about the DB

Page 69: Introduction to NoSQL

Thank You!

@theburningmonk