NoSQL for great good [hanoi.rb talk]
huy-do
![Page 1: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/1.jpg)
NoSQL database for great good
@huydx hanoi.rb
![Page 2: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/2.jpg)
$> whoami
huy: software developer, based in Tokyo, Ruby/Scala user
nickname: @huydx
![Page 3: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/3.jpg)
Disclaimer
This talk does not go into detail about any particular NoSQL database
I'm going to talk about: when we need to choose a NoSQL database, how should we think about it?
![Page 4: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/4.jpg)
What do people often think about NoSQL?
![Page 5: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/5.jpg)
• As a cache
• As magic that can make "any" web system faster
![Page 6: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/6.jpg)
Your system is slow
Just use NoSQL
RDBMS is shit
![Page 7: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/7.jpg)
RDBMS is not slow
NoSQL is not the cure for everything
![Page 8: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/8.jpg)
RDBMS is awesome
• Can scan 7M rows/sec with an index!
• Can handle very big data (Facebook)
• Has a very flexible query language (SQL)
• Has some awesome analytics features (window functions in PostgreSQL)
• Has ACID properties
https://www.percona.com/blog/2008/04/09/how-fast-can-mysql-process-data/
![Page 9: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/9.jpg)
Why ACID is important
• Atomicity: protects transactions (all or nothing)
• Consistency: protects data correctness
• Isolation: protects data from concurrent access
• Durability: protects data from failure
ACID makes a database something you can TRUST
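A minimal sketch of the "all or nothing" point, using Python's built-in `sqlite3` module. The `accounts` table, the names, and the `transfer` helper are assumptions made up for this illustration; they are not from the talk.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit breaks the CHECK constraint, the credit above is undone too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, bob's balance does not keep the 500 that was credited in the first statement. That is the kind of guarantee that lets you trust the database instead of re-checking everything in application code.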
![Page 10: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/10.jpg)
So
• RDBMS is way better than you thought
• You should learn to do RDBMS the right way
• Learn how to get the best performance from an RDBMS (index tuning, query optimization, data modeling, master-slave replication, monitoring, sharding the right way...)
https://www.percona.com/blog/2014/03/27/a-conversation-with-5-facebook-mysql-gurus/
![Page 11: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/11.jpg)
But this talk is about NoSQL!!!!!!
![Page 12: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/12.jpg)
Where is an RDBMS not a good fit?
• Nature of data: when the data is not row-column style (multidimensional data)
• How your data scales: data sharding (you don't want to shard)
• ACID is great, but it degrades performance
• Single point of failure: a single master
• Data usage: when you need real-time, fast data
https://www.percona.com/blog/2009/08/06/why-you-dont-want-to-shard/
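One way to see why manual sharding hurts is a small sketch of naive hash-modulo sharding. The talk does not name a sharding scheme, so the scheme, the shard names `db0`..`db3`, and the `user:N` keys are all assumptions for illustration.

```python
import hashlib

SHARDS = ["db0", "db1", "db2"]  # hypothetical shard names

def shard_for(key, shards):
    """Pick a shard from a stable hash of the key, modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]

def moved_keys(keys, old_shards, new_shards):
    """Keys that land on a different shard after the shard list changes."""
    return [k for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)]

keys = [f"user:{i}" for i in range(1000)]
moved = moved_keys(keys, SHARDS, SHARDS + ["db3"])
moved_ratio = len(moved) / len(keys)
```

With this naive scheme, growing from three shards to four is expected to relocate roughly three quarters of the keys, and every query that spans users now has to touch several databases. Real systems reduce the data movement with techniques such as consistent hashing, but the operational cost does not disappear.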
![Page 13: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/13.jpg)
Let's talk about NoSQL
![Page 14: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/14.jpg)
We have plenty of players
But when, and how, should we use them?
![Page 15: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/15.jpg)
We have a decent answer: It depends!!
![Page 16: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/16.jpg)
What do you want to store?
• Geospatial data?
• Users' important data? (passwords, payment information...)
• Cache data?
• Real-time analytics data (write/read intensive)?
![Page 17: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/17.jpg)
Where do you want to store?
• In memory?
• On disk?
  • On slow disk (HDD)
  • On fast disk (SSD)
![Page 18: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/18.jpg)
How big is your data?
• Able to fit into memory?
• Able to fit on a single machine?
• Not able to fit even on hundreds of machines?
![Page 19: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/19.jpg)
Are there any factors for categorizing NoSQL databases?
![Page 20: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/20.jpg)
![Page 21: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/21.jpg)
Data Model + Query Model
![Page 22: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/22.jpg)
NoSQL categorized by data model
Document
Pairs each key with a complex data structure known as a document (JSON, BSON).
MongoDB, CouchDB, RethinkDB
Column Family
One row key is paired with many columns (like rows in an RDBMS), which makes block partitioning easy.
Cassandra, HBase, Hypertable
Graph
Stores data as nodes + links between nodes.
Neo4j, FlockDB
KVS
Just a key + a value (a value can be complex, but cannot be as wide as a column family).
Riak, Memcached, Redis, CouchBase
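To make the four shapes concrete, here is a rough sketch that stores the same small user record in each model using plain Python structures. The key names and column names are assumptions for illustration; real databases add their own types, encodings, and limits on top of these shapes.

```python
import json

# Document: each key maps to a nested, self-describing structure.
document_store = {
    "user:huydx": {"name": "huy", "city": "Tokyo", "languages": ["ruby", "scala"]},
}

# Column family: one row key maps to many named columns; rows need not share columns.
column_family = {
    "user:huydx": {
        "profile:name": "huy",
        "profile:city": "Tokyo",
        "language:ruby": "",
        "language:scala": "",
    },
}

# Graph: nodes plus labeled links between nodes.
nodes = {"huydx", "ruby", "scala"}
edges = [("huydx", "USES", "ruby"), ("huydx", "USES", "scala")]

# KVS: a key and a value that the store treats as an opaque blob.
kvs = {"user:huydx": json.dumps({"name": "huy", "city": "Tokyo"}).encode("utf-8")}
```

Note that only the application can look inside the KVS value: the store itself sees bytes, while the document store can see every field.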
![Page 23: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/23.jpg)
What about the merits and demerits
of each data model?
![Page 24: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/24.jpg)
The data model affects how we query data
Users always want the query method to be as flexible as possible
But sometimes, we have to face the trade-off between
flexibility and scalability
![Page 25: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/25.jpg)
• Document: queries can be very flexible because documents are examinable (MongoDB has a very rich query language). The data model can also change very flexibly
• Column Family: just a key-value store with very wide fields, which makes it very fast to look up a bunch of values
• Graph: for very special cases when you need to store and query relationships (followers on Twitter)
• KVS: when you really need high performance and just need to look up simple values
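The four access patterns above can be sketched as four tiny functions. This is a toy model over in-memory Python data, not any database's real API; the function names, the sample users "an" and "binh", and the `tweet:` column prefix are all assumptions for illustration.

```python
def find_documents(collection, **criteria):
    """Document: match on any field, because the store can look inside documents."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]

def column_slice(table, row_key, prefix):
    """Column family: one row-key lookup returns a whole slice of columns."""
    row = table.get(row_key, {})
    return {name: row[name] for name in sorted(row) if name.startswith(prefix)}

def followers_of(edges, user):
    """Graph: answer a relationship question by following links."""
    return sorted(src for src, relation, dst in edges
                  if relation == "FOLLOWS" and dst == user)

def kv_get(store, key):
    """KVS: the only access path is the exact key."""
    return store.get(key)

users = [{"name": "huy", "city": "Tokyo"}, {"name": "an", "city": "Hanoi"}]
timeline = {"huy": {"tweet:001": "hello", "tweet:002": "nosql", "profile:city": "Tokyo"}}
follows = [("an", "FOLLOWS", "huy"), ("binh", "FOLLOWS", "huy"), ("huy", "FOLLOWS", "an")]
cache = {"session:42": "huy"}

in_tokyo = find_documents(users, city="Tokyo")
tweets = column_slice(timeline, "huy", "tweet:")
huy_followers = followers_of(follows, "huy")
session_user = kv_get(cache, "session:42")
```

The trade-off shows up in what each function has to touch: `find_documents` inspects every document unless an index exists, while `kv_get` goes straight to one key and can never answer "who lives in Tokyo?".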
![Page 26: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/26.jpg)
So it really depends, right?
![Page 27: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/27.jpg)
Data model for NoSQL is hard!
So be careful with your selection
![Page 28: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/28.jpg)
Sometimes the borderline between data models is blurred
We need other factors to consider
![Page 29: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/29.jpg)
Scalability
![Page 30: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/30.jpg)
First we need to know about the CAP theorem
![Page 31: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/31.jpg)
http://webpages.cs.luc.edu/~pld/353/gilbert_lynch_brewer_proof.pdf
![Page 32: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/32.jpg)
We can only have two of them!!!!!!!!
NO MORE!!!!
![Page 33: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/33.jpg)
http://blog.flux7.com/blogs/nosql/cap-theorem-why-does-it-matter
![Page 34: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/34.jpg)
Just ask yourself: what do you care about?
• You need very fast writes and reads, and data can be a little bit stale -> A + P
• You need transactions, and everyone must see the same view, but sometimes something must be locked -> C + P
• You don't need a distributed system that is fault-tolerant against network problems -> C + A
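The A + P versus C + P choice can be illustrated with a toy two-replica model. This is a deliberately simplified sketch, not how any real database implements replication: the `Replica` class, its `mode` flag, and the synchronous copy to the peer are all assumptions for illustration.

```python
class Replica:
    """One of two copies of a single value; `mode` is "AP" or "CP"."""

    def __init__(self, mode):
        self.mode = mode
        self.value = None
        self.peer = None
        self.partitioned = False  # True while the link to the peer is down

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than diverge.
                raise RuntimeError("unavailable: cannot reach peer")
            # Availability first: accept locally, so the peer becomes stale.
            self.value = value
            return
        self.value = value
        self.peer.value = value  # healthy network: copy to the peer right away

    def read(self):
        return self.value

def make_pair(mode):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b

def partition(a, b):
    a.partitioned = b.partitioned = True

# A + P: stays writable during the partition, but the other side reads stale data.
ap_a, ap_b = make_pair("AP")
ap_a.write("v1")
partition(ap_a, ap_b)
ap_a.write("v2")
stale_read = ap_b.read()

# C + P: both sides agree, at the price of rejecting writes during the partition.
cp_a, cp_b = make_pair("CP")
cp_a.write("v1")
partition(cp_a, cp_b)
try:
    cp_a.write("v2")
    cp_rejected = False
except RuntimeError:
    cp_rejected = True
```

In the A + P pair the second replica still answers, just with the old value; in the C + P pair every reader sees the same value, but the write is refused until the network heals.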
![Page 35: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/35.jpg)
So we have two factors to think about. What else?
![Page 36: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/36.jpg)
Operation
![Page 37: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/37.jpg)
Programmers may not care, but
infrastructure engineers do
![Page 38: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/38.jpg)
What factors affect operation?
• What is your database's distribution model: how does it shard and replicate (master-slave or p2p)?
• Does your database run on the JVM? (operating a JVM system is waaaayyy more bothersome than a system written in C or C++)
• Does your database have a single point of failure?
• Is your database optimized for SSD only?
![Page 39: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/39.jpg)
Operation is hard
When you fail at operation, you lose your data
So choose what you know very well
![Page 40: NoSQL for great good [hanoi.rb talk]](https://reader031.fdocuments.net/reader031/viewer/2022032300/55cda4ffbb61ebfe3b8b4713/html5/thumbnails/40.jpg)
Conclusion
• It really depends!!!!
• Ask yourself: do you really need to use NoSQL?
• First know your requirements, know your data
• Investigate carefully before choosing any solution (when you choose wrong, you lose your data)