NoSQL
Or Peles
What is NoSQL
• A collection of various technologies meant to work around RDBMS limitations (mostly performance)
• Not much of a definition...
RDBMS Limitations
• Hard to scale horizontally (for updates)
  – Distributed ACID requires 2 phase commit.
• Schema can be a bitch
  – Hard to change.
  – Data normalization can slow down queries.
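To make the two phase commit cost concrete, here is a minimal in-process sketch in Python. It is not any particular database's protocol implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names introduced only for illustration. The point it shows is that every node must vote and then wait for the coordinator before anything commits.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Participant:
    """One database node in a distributed transaction (hypothetical model)."""
    name: str
    can_commit: bool = True   # simulates whether the local prepare step succeeds
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: the node votes, then has to wait for the coordinator's decision.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Commit on every node or on none. Returns True if the transaction committed."""
    # Phase 1: ask every participant to prepare and collect all votes.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was "yes"; otherwise roll everyone back.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

A single "no" vote aborts the transaction on every node, and no node can finish until the slowest one has voted, which is why updates are hard to spread across many machines this way.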
Web Scale
• Some numbers:
  – YouTube serves over 100 million videos a day.
  – eBay adds over 10TB of storage every week.
  – Facebook holds over 80 billion photos, and serves hundreds of thousands of requests/second.
Ideal System
• Available – can always read and write.
• Consistent – reads always pick up the latest write.
• Partition tolerant – the system can be split across multiple machines and datacenters.
Starbucks doesn’t use two phase commit
• A great example presented here.
• Asynchronous execution
• Correlation
• Exception handling:
  – Write off
  – Retry
  – Compensation
• 2 phase commit would create a choke point.
CAP Theorem
• CAP (Eric Brewer, 2000): Simply put, of the following 3 properties:
  – Consistency
  – Availability
  – Partition tolerance
• Only two can hold in any one system.
CAP in practice
• CA – Two phase commits, works best at a single data center. Scaling issues.
• CP – Sharding. Data may become unavailable if a shard fails.
• AP – May return inaccurate data. DNS is a prime example.
Consistency Types
• Strict
• Eventual
  – Causal
  – Read your writes
  – Session
  – Monotonic read
  – Monotonic write
Concepts
• In memory vs. disk based.
• Shared everything vs. shared nothing.
• Master slave vs. server symmetry.
• Elastic scalability.
• MapReduce.
Sharding
• Split data across machines (database instances).
  – Feature based sharding.
  – Key based sharding.
  – Lookup table.
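Two of the strategies listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shard names in `SHARDS`, the contents of `LOOKUP_TABLE`, and both function names are hypothetical, and feature based sharding (one database per feature) is not shown.

```python
import hashlib

SHARDS = ["db0", "db1", "db2"]   # hypothetical database instances


def shard_by_key(key: str) -> str:
    """Key based sharding: hash the key and take it modulo the shard count."""
    # A stable hash (unlike Python's per-process randomized hash()) keeps the
    # key-to-shard mapping identical across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Lookup table sharding: an explicit directory maps a key to its shard, so a
# single key can be moved without rehashing everything else.
LOOKUP_TABLE = {"alice": "db2", "bob": "db0"}   # hypothetical directory


def shard_by_lookup(key: str) -> str:
    # Keys missing from the directory fall back to key based sharding here;
    # that fallback is a design choice of this sketch, not a general rule.
    return LOOKUP_TABLE.get(key) or shard_by_key(key)
```

The modulo approach is simple but remaps most keys when the shard count changes, while the lookup table trades that for an extra directory that has to be stored and kept available.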
NoSQL Categories
• Key-Value stores
• Document store
• Tabular
Lean & Mean: The Key-Value In-Memory DBs
• In memory DBs are simpler and faster than their on-disk counterparts.
• Key value stores offer a simple interface with no schema.
• Major limitation – data size is limited to RAM size.
• Often used as caches for on-disk DB systems.
Open Source In-Memory DBs
• Memcached/MemcacheDB
• Redis
  – Both are key-value stores that rely on hash partitioning.
  – Memcached is an LRU based cache.
  – Redis is more of a data structure server.
  – Both use a Shared-Nothing architecture.
Memcached
• Really a giant, distributed hash table.
• Advantages:
  – Relatively simple
  – Practically no server to server talk.
  – Linear scalability
• Disadvantages:
  – Doesn’t understand data – no server side operations. The key and value are always strings.
  – It’s really meant to only be a cache – no more, no less.
  – No recovery, limited elasticity.
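The "LRU based cache" behavior can be illustrated with a small Python sketch. This is not Memcached's actual implementation, and the `LRUCache` class and its `capacity` parameter are hypothetical. It only shows the eviction rule: when the cache is full, the entry that was used least recently is dropped, and nothing is recovered afterwards.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """String-to-string cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # maximum number of entries kept
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None                   # cache miss: caller reads the on-disk DB
        self._items.move_to_end(key)      # mark as most recently used
        return self._items[key]

    def set(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict the least recently used entry
```

A miss returns `None` rather than raising, which matches the typical usage as a cache in front of an on-disk database: the caller falls through to the database and then calls `set`.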
Redis
• Like Memcached, it’s a distributed hash in memory.
• Offers support for lists and sets, as well as strings.
• Offers limited server side operations.
• Supports master-slave architecture and data replicas for scalability and high availability.
• Also supports a persistent mode that writes to disk.
Document Stores
• As the name implies, these databases store documents.
• Usually schema-free. The same database can store multiple documents.
• Allow indexing based on document content.
• Prominent examples: CouchDB, MongoDB.
Documents
• A document is just a collection of values, usually serialized in JSON.
• Many implementations offer nesting of documents
• Example:
  { "username" : "bob",
    "address" : { "street" : "123 Main Street",
                  "city" : "Springfield",
                  "state" : "NY" } }
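As a minimal sketch of what "schema-free" and "nesting" mean in practice, the example document can be parsed and extended with Python's standard `json` module. No document database is involved here; the `email` field and its value are hypothetical additions used only to show that a new field needs no schema change.

```python
import json

# The example document from the slide, as a JSON string.
raw = """
{ "username" : "bob",
  "address" : { "street" : "123 Main Street",
                "city" : "Springfield",
                "state" : "NY" } }
"""

doc = json.loads(raw)

# Nested documents are reached by walking the keys.
city = doc["address"]["city"]

# Schema-free: a new field is just another key (hypothetical field and value).
doc["email"] = "bob@example.com"

# Serialize back to JSON, for example before storing the document again.
serialized = json.dumps(doc, sort_keys=True)
```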
CouchDB
• Written in Erlang.
• Offers ACID guarantees based on multi-version concurrency control (MVCC).
• Supports replication, but isn’t a real distributed database.
MongoDB
• Written in C++.
• Atomic operations on single documents only.
• Excellent scalability based on sharding.
• Support for server side JavaScript and MapReduce.
Tabular stores
• The original: Google’s BigTable
  – Proprietary, not open source.
• The open source elephant alternative – Hadoop with HBase.
• A top level Apache Project.
• Large number of users.
• Contains a distributed file system, MapReduce, a database server (HBase), and more.
• Rack aware.
Hadoop components
Hadoop basic components
• At its core, Hadoop is a framework for running MapReduce operations on large data sets.
• The data sets are placed as text files on the distributed file system.
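The MapReduce idea can be sketched as a single-process word count in Python. This does not use Hadoop or its API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real job would run the map and reduce steps in parallel across machines over files on the distributed file system.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one line of text."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key before reducing."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each `map_phase` call looks at one line only and each `reduce_phase` call looks at one key only, both steps can be spread over many machines without coordination between them.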
Hadoop MapReduce Flow
HBase
• A database engine built on top of the Hadoop Distributed File System (HDFS).
• Scales up to billions of rows with millions of columns.
• Has a Java interface for queries.
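A table with millions of possible columns only works if rows are sparse, that is, each row stores just the cells it actually has. The Python sketch below models that idea in memory. It is not the HBase client API: the `SparseTable` class, its methods, and the `family:qualifier` column names in the usage are hypothetical, and timestamps and cell versions are left out.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Tuple


class SparseTable:
    """Toy model of a tabular store: row key -> {column name: value}."""

    def __init__(self) -> None:
        # Only cells that were written take up space; absent columns cost nothing.
        self._rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: str) -> None:
        self._rows[row_key][column] = value

    def get(self, row_key: str, column: str) -> Optional[str]:
        # .get() on the outer dict avoids creating empty rows on reads.
        return self._rows.get(row_key, {}).get(column)

    def scan(self, start: str, stop: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield rows in sorted key order with start <= key < stop."""
        for key in sorted(self._rows):
            if start <= key < stop:
                yield key, self._rows[key]
```

For example, after `table.put("user#bob", "info:city", "Springfield")`, reading `table.get("user#bob", "info:email")` returns `None` because that cell was never written.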
The Tradeoff – SQL vs. NoSQL
• RDBMS:
  – Mature.
  – Standard SQL (but not for DDL, extensions).
  – Robust tools.
• NoSQL:
  – Scale
  – Schemaless
References
• Eventual Consistency (Werner Vogels, CTO, Amazon)
• Starbucks doesn’t use two phase commit.
• Hadoop: The Definitive Guide (O’Reilly)
• MongoDB: The Definitive Guide (O’Reilly)
• Many wiki pages
Questions