How to choose a NoSQL solution · 2017-11-07 · Emergence of NoSQL •...
How to choose a NoSQL solution
Vikas Sangwan, Intuit
We will be talking about…
• Why use NoSQL
• Some Concepts and Terminology (Consistency, Availability, CAP Theorem)
• Classifications and Survey of NoSQL Databases
• How to choose a NoSQL solution
Characteristics of Typical RDBMS
• Generally not designed to scale horizontally
  o Scale well for reads
  o Hard to scale for writes
• Structured and normalized data
• Well-defined schemas
• Increased performance by denormalizing
• Application-managed sharding
• Distributed memory systems
Emergence of NoSQL
• Dynamic growth of data generated by users, applications and sensors.
• Structured, semi-structured and unstructured data with the rise of Web 2.0, social networking and mobile.
• Distributed computing availability through virtualization and cloud computing.
Scaling massive data set
Characteristics of NoSQL Data Stores
• Designed to run on clusters
• Mostly open source
• Support unstructured, semi-structured and structured data
• Support schema-less data modeling
• Transparent auto-sharding
• Distributed query mechanisms (MapReduce)
Riak
Redis
HBase
Neo4J
Cassandra
MongoDB
CouchDB
InfiniteGraph
Project Voldemort
Amazon Dynamo
Amazon SimpleDB
Google BigTable
Berkeley DB
Hypertable
Memcached
OrientDB
FlockDB
Tokyo Cabinet
RavenDB
KAI
Which NoSQL DB?
Master-Slave Replication
• Useful for scaling reads
• Read resilient
• Lack of consistency
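The trade-off in the bullets above can be sketched with a toy in-memory model. This assumes asynchronous replication; the `Master` and `Replica` classes and the `replicate()` method are hypothetical names for illustration, not any product's API.

```python
class Replica:
    """Read-only copy that changes only when the master ships updates."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Master:
    """Accepts all writes and replicates them to replicas later."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = []  # writes not yet shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Ship pending writes to every replica, in order.
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()


replicas = [Replica(), Replica()]
master = Master(replicas)

master.write("user:1", "Tom")
stale = replicas[0].read("user:1")  # replication has not run yet
master.replicate()
fresh = replicas[0].read("user:1")  # replicas have caught up
```

Reads can be spread over any number of replicas, which is where the read scaling comes from; the window between `write()` and `replicate()` is where a reader sees inconsistent data.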
Peer-to-Peer Replication
• Resilient to node failure
• Can easily add or remove nodes
• Inconsistency
Data Sharding
• Improved performance for writes and reads
• Data distribution
  o User or location based
  o Even load on all nodes
• Not resilient to node failures
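The two distribution strategies above can be sketched as routing functions. This is only an illustration: the node names, the choice of MD5 as a stable hash, and both helper functions are assumptions, not how any particular database places data.

```python
import hashlib

SHARDS = ["node-a", "node-b", "node-c"]  # hypothetical node names


def shard_for(key, shards=SHARDS):
    """Hash-based placement: a stable hash spreads keys evenly over nodes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def shard_for_location(region, region_map, default):
    """Location-based placement: keep a region's data on a designated node."""
    return region_map.get(region, default)


# Route 1000 hypothetical user keys and record where each one lands.
placement = {}
for user_id in range(1000):
    placement.setdefault(shard_for(f"user:{user_id}"), []).append(user_id)
```

Each key lives on exactly one node here, which is why sharding alone is not resilient to node failures. Note also that simple modulo placement moves most keys when the number of nodes changes.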
Combining Replication and Sharding
• Sharding and Master-Slave Replication
• Sharding and Peer-to-Peer Replication
Consistency
• Strong Consistency
  o Highest form of consistency; all data changes are atomic and take effect immediately.
• Eventual Consistency
  o Eventually all updates propagate through the distributed system and all nodes become consistent.
• Weak Consistency
  o No guarantee that all updates propagate to all nodes; the same user may see data out of sync.
CAP Theorem
NoSQL Databases
• Key-Value Store
  o Redis, Riak
• Column Family
  o HBase, Cassandra
• Document
  o MongoDB, CouchDB
• Graph
  o Neo4J, OrientDB
(Figure: NoSQL categories: Key-Value Store, Column Family, Graph Based, Document Oriented)
Key-Value Data Store
• Examples: Redis, Riak, Memcached, Berkeley DB, Amazon Dynamo, LinkedIn Voldemort
• Suitable for storing user session details, preferences, profiles.
• Not suitable for highly correlated data, transactions spanning multiple operations, or queries based on the stored data rather than the key.
Key              Value
Name             Tom
Street Name      Silver Crest
Street Number    15609
City             San Diego
State            California
Phone            858 111 1111
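A minimal sketch of why key lookups suit this model and data-based queries do not. A plain dict stands in for the store; the `put`/`get` helpers, the key names and the second record are hypothetical.

```python
import json

store = {}  # stand-in for a key-value store: opaque values looked up by key


def put(key, value):
    store[key] = json.dumps(value)  # the store sees only an opaque blob


def get(key):
    blob = store.get(key)
    return None if blob is None else json.loads(blob)


put("user:tom", {"Name": "Tom", "City": "San Diego", "State": "California"})
put("user:ann", {"Name": "Ann", "City": "Austin", "State": "Texas"})  # hypothetical record

profile = get("user:tom")  # direct lookup by key

# A query by value has no index to use: it must scan and decode every entry.
in_california = [k for k in store if json.loads(store[k])["State"] == "California"]
```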
Redis
• Key-value store with namespaces, primarily in memory
• Data structure server
• Extensive commands
• Very fast; trades durability for speed
• Configurable durability and replication
• Pub/Sub
• Who is using: GitHub, Craigslist, Engine Yard, Flickr, Yahoo, Stack Overflow
(Figure: a key maps to a data structure: String, Integer, List, Set, Hash, Blob)
Riak
• Key-value pairs stored in buckets; primarily a distributed store.
• Speaks the language of the web (HTTP interface)
• MapReduce and stored functions
• Ad hoc relationship support through links
• Fault tolerant
• Supports quorums for consistency
• Who is using: Comcast, Yammer, Voxer, Boeing, BestBuy, Joyent, Kiip, DotCloud, Formspring
(Figure: a key maps to a value: blob, text, XML, etc.)
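The quorum idea mentioned above can be sketched generically: with N replicas, requiring W acknowledgements per write and R replies per read such that R + W > N forces every read set to overlap the latest write set. The `QuorumStore` class below is a hypothetical toy with caller-supplied version numbers, not Riak's API.

```python
N = 3  # replicas per key (assumed)


class QuorumStore:
    def __init__(self, n=N):
        # Each replica maps key -> (version, value).
        self.replicas = [{} for _ in range(n)]

    def write(self, key, value, version, w, reachable):
        """Write to the reachable replicas; report success if at least w acknowledged."""
        acks = 0
        for i in reachable:
            self.replicas[i][key] = (version, value)
            acks += 1
        return acks >= w

    def read(self, key, r, reachable):
        """Ask r reachable replicas and return the value with the newest version."""
        answers = [self.replicas[i].get(key) for i in list(reachable)[:r]]
        answers = [a for a in answers if a is not None]
        if not answers:
            return None
        return max(answers, key=lambda a: a[0])[1]


store = QuorumStore()
ok_first = store.write("k", "v1", version=1, w=2, reachable=[0, 1, 2])
ok_second = store.write("k", "v2", version=2, w=2, reachable=[0, 1])  # replica 2 missed it

quorum_read = store.read("k", r=2, reachable=[1, 2])  # R + W > N: overlaps the write
single_read = store.read("k", r=1, reachable=[2])     # R + W <= N: may be stale
```

For simplicity the sketch applies a write to reachable replicas even when the quorum is not met; a real system would need to handle that case.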
Column Family
• Examples: HBase, Cassandra, Amazon SimpleDB, Hypertable
• Suitable for blogging systems, log management, visit counters
• Not suitable for data that needs ACID transactions, for queries that combine (aggregate) data, or when the query patterns may change.
Key     Value
Key1    {Column1Key1:ValueA, Column2Key1:ValueB}
Key2    {Column1Key2:ValueD, Column3Key2:ValueE}
Key3    {Column1Key3:ValueF, Column3Key3:ValueG}
Key4    {Column1Key4:ValueH, Column2Key4:ValueI}
Key5    {Column1Key5:ValueJ, Column3Key5:ValueK}
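The table above shows rows that each carry a different set of columns. A rough sketch of that shape using nested dicts, applied to the visit-counter use case; the helper names, row keys and family names are hypothetical.

```python
table = {}  # row key -> column family -> {column: value}


def put(row, family, column, value):
    table.setdefault(row, {}).setdefault(family, {})[column] = value


def get_family(row, family):
    """Return a copy of one column family of one row (empty if absent)."""
    return dict(table.get(row, {}).get(family, {}))


def increment(row, family, column, by=1):
    """Counter-style update of a single column."""
    current = table.get(row, {}).get(family, {}).get(column, 0)
    put(row, family, column, current + by)
    return current + by


put("page:/home", "meta", "title", "Home")
increment("page:/home", "stats", "visits")
increment("page:/home", "stats", "visits")
increment("page:/about", "stats", "visits")  # this row has no "meta" family at all
```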
HBase
• Fault-tolerant, consistent and scalable column family database.
• Built on Hadoop
• Built-in support for versioning, expiration and compression.
• REST, Thrift, Avro and other interfaces
• Uses Regions for scalability.
• Used by Facebook, StumbleUpon, eBay, Yahoo, Ning, Meetup, etc.
(Figure: a row key maps to several column families, each holding its own columns)
  Column1:ValueX, Column2:ValueY
  Column2:ValueX, Column3:ValueY
  Column4:ValueX, Column5:ValueY, Column6:ValueZ
Document Oriented
• Examples: MongoDB, CouchDB, RavenDB
• Suitable for blogging systems, log management, content management systems
• Not suitable for data that needs ACID transactions across systems, or for queries that combine data across aggregates.
Key     Value
Key1    JSON/XML Document1
Key2    JSON/XML Document2
Key3    JSON/XML Document3
Key4    JSON/XML Document4
Key5    JSON/XML Document5
MongoDB
• Stores documents as BSON
• JSON, JavaScript and wire protocol interfaces
• Master-Slave replication, Replica Sets
• Supports auto-sharding for scalability
• Supports indexing and MapReduce
• Data center aware
• Ad hoc query support
• Used by Bit.ly, Foursquare, etc.
(Figure: a key maps to the following document)
{Name: Tom,
 Address: {
   Street: Silver Crest, Street Number: 13568,
   City: San Diego},
 Phone: 858 111 1111, Company: Foo Inc}
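Unlike an opaque key-value blob, a document's nested fields are visible to queries. A small sketch with plain dicts standing in for stored documents; the `find` helper, the dotted-path convention and the second document are assumptions for illustration, not MongoDB's query API.

```python
documents = {
    "key1": {
        "Name": "Tom",
        "Address": {"Street": "Silver Crest", "Street Number": 13568, "City": "San Diego"},
        "Phone": "858 111 1111",
        "Company": "Foo Inc",
    },
    # Hypothetical second document: same collection, different fields (schema-less).
    "key2": {"Name": "Ann", "Address": {"City": "Austin"}},
}


def find(collection, path, expected):
    """Return keys of documents whose nested field at dotted `path` equals `expected`."""
    matches = []
    for key, doc in collection.items():
        node = doc
        for part in path.split("."):
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        if node is not None and node == expected:
            matches.append(key)
    return matches


san_diego = find(documents, "Address.City", "San Diego")
with_company = find(documents, "Company", "Foo Inc")
```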
Graph DB
• Examples: Neo4J, OrientDB, InfiniteGraph
• Good for socially connected data, recommendation and location-based services.
• Not suitable for analytical solutions or disjoint data sets
(Figure: a graph of connected nodes, each holding key-value properties such as Key1: Value1)
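The recommendation use case comes down to walking relationships. A minimal sketch over an adjacency map, with a made-up friendship graph; a graph database would store and traverse these edges natively rather than in application code.

```python
# Hypothetical undirected friendship graph: person -> set of friends.
friends = {
    "tom": {"ann", "bob"},
    "ann": {"tom", "cid"},
    "bob": {"tom", "cid", "dee"},
    "cid": {"ann", "bob"},
    "dee": {"bob"},
}


def recommend(graph, person):
    """Suggest friends-of-friends, ranked by number of mutual friends."""
    counts = {}
    for friend in graph[person]:
        for candidate in graph[friend]:
            if candidate != person and candidate not in graph[person]:
                counts[candidate] = counts.get(candidate, 0) + 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(counts, key=lambda c: (-counts[c], c))


suggestions = recommend(friends, "tom")
```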
Neo4J
• Nodes and relationships, both with properties
• Consistent through transactions
• Supports ACID transactions
• REST interface
• Highly available through replicated slaves
• Used by Adobe, Cisco, Fuseworks, Open Tree of Life.
How do I choose a NoSQL DB?
• Feature-based comparison
• Performance benchmark
• Proof of concept
Feature based comparison
• Access and programming interface
• Query support
• Consistency
• Versioning
• Replication models
• Scaling
• Multi-data-center awareness
• Tools and utilities
• Community support
Performance Benchmark
• Yahoo! Cloud Serving Benchmark (YCSB)
  o 50/50 read and update
  o 95/5 read and update
  o Scalability test
• Basho Bench
• Individual data stores' own benchmarks
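To make the read/update mixes concrete, here is a toy workload generator and runner. It is not YCSB or Basho Bench: the function names, the uniform key choice and the dict used as the store under test are all assumptions for illustration.

```python
import random
import time


def make_workload(n_ops, read_ratio, n_keys, seed=0):
    """Build a list of (operation, key) pairs with the given share of reads."""
    rng = random.Random(seed)
    ops = []
    for _ in range(n_ops):
        op = "read" if rng.random() < read_ratio else "update"
        ops.append((op, f"user{rng.randrange(n_keys)}"))
    return ops


def run(store, ops):
    """Apply the workload to a dict-like store; return op counts and elapsed seconds."""
    counts = {"read": 0, "update": 0}
    start = time.perf_counter()
    for op, key in ops:
        if op == "read":
            store.get(key)
        else:
            store[key] = "x"
        counts[op] += 1
    return counts, time.perf_counter() - start


store = {}
counts_5050, secs_5050 = run(store, make_workload(10_000, 0.50, n_keys=100))
counts_9505, secs_9505 = run(store, make_workload(10_000, 0.95, n_keys=100))
```

Replacing the dict with a thin client wrapper exposing `get` and item assignment would let the same mixes run against a candidate store.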
Proof of Concept
• Test the scenarios that are most important to you.
• Choose the most common use cases and some unique, bizarre ones.
• Baseline the assumptions and volume definitions
• Build a representative test, as close as possible to realistic data
Polyglot Persistence
• Different databases for different data storage needs
• Access data through a data access layer that hides the storage details from business services
• Adds complexity to programming, deployment and operational support.
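A sketch of the data access layer idea: business code depends on an interface, and each storage backend sits behind it. All class names are hypothetical, and the in-memory backend is a stand-in for whichever store is chosen.

```python
from abc import ABC, abstractmethod


class ProfileRepository(ABC):
    """What business services see: no hint of which database sits behind it."""

    @abstractmethod
    def save(self, user_id, profile):
        ...

    @abstractmethod
    def load(self, user_id):
        ...


class InMemoryProfileRepository(ProfileRepository):
    """Stand-in backend; a real one might wrap a document or key-value store."""

    def __init__(self):
        self._rows = {}

    def save(self, user_id, profile):
        self._rows[user_id] = dict(profile)

    def load(self, user_id):
        return self._rows.get(user_id)


class ProfileService:
    """Business service that depends only on the repository interface."""

    def __init__(self, repo):
        self.repo = repo

    def rename(self, user_id, new_name):
        profile = self.repo.load(user_id) or {}
        profile["Name"] = new_name
        self.repo.save(user_id, profile)
        return profile


service = ProfileService(InMemoryProfileRepository())
updated = service.rename("user:1", "Tom")
```

Swapping the backend means writing another `ProfileRepository` subclass; `ProfileService` does not change, though each extra backend still adds deployment and operational work.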
Questions