DMDW Extra Lesson - NoSql and MongoDB

State-recognized university of applied sciences. Study and take off.
Author: Dipl.-Inf. (FH) Johannes Hoppe
Date: 06.05.2011


Page 1: DMDW  Extra Lesson - NoSql and MongoDB


Page 2: DMDW  Extra Lesson - NoSql and MongoDB


NoSQL and MongoDB


Page 3: DMDW  Extra Lesson - NoSql and MongoDB

01 Not only SQL

Page 4: DMDW  Extra Lesson - NoSql and MongoDB

Trends

[Figure: data plotted over the years 2002 to 2012]

Page 5: DMDW  Extra Lesson - NoSql and MongoDB

Trends

Data
› Facebook had 60k servers in 2010
› Google had 450k servers in 2006 (speculated)
› Microsoft: between 100k and 500k servers (since Azure)
› Amazon: likely has similar numbers, too (S3)

[Figure: Facebook server footprint]

Page 6: DMDW  Extra Lesson - NoSql and MongoDB

Trends

Trend 1: increasing data sizes

Trend 2: more connectedness (“web 2.0”)

Trend 3: more individualization (less structure)

Page 7: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL


Page 8: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

Database paradigms

› Relational (RDBMS)
› NoSQL
  › Key-value stores
  › Document databases
  › Wide column stores (BigTable and clones)
  › Graph databases
› Other

Page 9: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

Some NoSQL use cases

1. Massive data volumes
  › Massively distributed architecture required to store the data
  › Google, Amazon, Yahoo, Facebook…

2. Extreme query workload
  › Impossible to do joins efficiently at that scale with an RDBMS

3. Schema evolution
  › Schema flexibility (migration) is not trivial at large scale
  › Schema changes can be introduced gradually with NoSQL
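The gradual schema change from use case 3 can be sketched in plain JavaScript. This is only an illustration and is not tied to any particular database; the field names (`name`, `firstName`, `lastName`) and the helper `upgradeCustomer` are hypothetical.

```javascript
// Hypothetical example: old documents store a single "name" field,
// newer documents store "firstName" and "lastName".
function upgradeCustomer(doc) {
  if (doc.firstName !== undefined) {
    return doc; // already in the new shape
  }
  // Split the legacy field on spaces (an assumption made for this sketch).
  var parts = String(doc.name || '').split(' ');
  return {
    _id: doc._id,
    firstName: parts[0],
    lastName: parts.slice(1).join(' ')
  };
}

// Old and new documents live side by side in the same collection.
var customers = [
  { _id: 22, name: 'Norbert Example' },
  { _id: 23, firstName: 'Hans', lastName: 'Example' }
];

// Documents are upgraded lazily, whenever they are read.
var upgraded = customers.map(upgradeCustomer);
console.log(upgraded);
```

No "big bang" migration is needed: each document is converted the first time the application touches it.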

Page 10: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL - CAP theorem

Requirements for distributed systems:

› Consistency
› Availability
› Partition tolerance

Page 11: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL - CAP theorem

Consistency
› The system is in a consistent state after an operation
› All clients see the same data
› Strong consistency (ACID) vs. eventual consistency (BASE)

ACID: Atomicity, Consistency, Isolation and Durability

BASE: Basically Available, Soft state, Eventually consistent
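Eventual consistency can be illustrated with a toy model of one primary and one replica. This is a sketch under simplifying assumptions (a single replica, replication triggered by hand); the names `write`, `replicate` and `pending` are made up for the example.

```javascript
// Toy model: one primary and one replica; replication is applied later.
var primary = {};
var replica = {};
var pending = []; // writes not yet applied to the replica

function write(key, value) {
  primary[key] = value;
  pending.push({ key: key, value: value });
}

function replicate() {
  // Apply all queued writes to the replica, in order.
  pending.forEach(function (w) { replica[w.key] = w.value; });
  pending = [];
}

write('customer_22', 'Norbert');
var staleRead = replica['customer_22']; // replica has not seen the write yet
replicate();
var freshRead = replica['customer_22']; // replica has caught up
console.log(staleRead, freshRead);
```

Between `write` and `replicate`, clients reading from the replica do not see the same data as clients reading from the primary; after replication both agree again.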

Page 12: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL - CAP theorem

Availability
› The system is “always on”, no downtime
› Node failure tolerance: all clients can find some available replica
› Software/hardware upgrade tolerance

Page 13: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL - CAP theorem

Partition tolerance
› The system continues to function even when split into disconnected subsets (by a network disruption)
› Not only for reads, but for writes as well!

Page 14: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

CAP Theorem
› E. Brewer, N. Lynch
› You can satisfy at most 2 out of the 3 requirements

Page 15: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

CAP Theorem: CA
› Single-site clusters (easier to ensure all nodes are always in contact)
› When a partition occurs, the system blocks
› e.g. usable for two-phase commits (2PC), which already require/use blocks

Page 16: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL


Obviously, any horizontal scaling strategy is based on data partitioning; therefore, designers are forced to decide between consistency and availability.


Page 17: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

CAP Theorem: CP
› Some data may be inaccessible (availability sacrificed), but the rest is still consistent/accurate
› e.g. sharded database
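The CP behaviour of a sharded database can be sketched as follows. This is a toy model, not how any real system routes keys: the hash function, the `up` flag and the helpers `route`, `put` and `get` are assumptions made for the example.

```javascript
// Toy hash: sum of character codes modulo the number of shards.
function shardFor(key, shardCount) {
  var sum = 0;
  for (var i = 0; i < key.length; i++) {
    sum += key.charCodeAt(i);
  }
  return sum % shardCount;
}

var shards = [
  { up: true, data: {} },
  { up: true, data: {} }
];

function route(key) {
  var shard = shards[shardFor(key, shards.length)];
  if (!shard.up) {
    throw new Error('shard unavailable for key ' + key);
  }
  return shard;
}

function put(key, value) { route(key).data[key] = value; }
function get(key) { return route(key).data[key]; }

put('aa', 'first');   // lands on shard 0
put('ab', 'second');  // lands on shard 1

shards[1].up = false; // simulate a partition that cuts off shard 1

var reachable = get('aa'); // still answered, and still accurate
var failed = false;
try {
  get('ab');
} catch (e) {
  failed = true;           // this part of the data is inaccessible
}
console.log(reachable, failed);
```

The system refuses to answer for keys on the unreachable shard instead of returning possibly outdated data.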

Page 18: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

CAP Theorem: AP
› System is still available under partitioning, but some of the data returned may be inaccurate
› Need some conflict resolution strategy
› e.g. Master/Slave replication
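One possible conflict resolution strategy is "last write wins". The slides do not name a specific strategy, so this is just one common option, sketched under the assumption that every version carries a writer-supplied timestamp; `resolveLastWriteWins` is a hypothetical helper.

```javascript
// Each version carries the value and a timestamp set by the writer.
function resolveLastWriteWins(a, b) {
  // On a tie, keep `a` (an arbitrary choice for this sketch).
  return b.timestamp > a.timestamp ? b : a;
}

// Two nodes accepted different writes for the same key during a partition.
var fromNodeA = { value: 'Norbert', timestamp: 100 };
var fromNodeB = { value: 'Norbert H.', timestamp: 105 };

var winner = resolveLastWriteWins(fromNodeA, fromNodeB);
console.log(winner.value);
```

The older write is silently discarded, which is the price of staying available during the partition.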

Page 19: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

RDBMS

› Guarantee ACID by CA (two-phase commits)
› SQL
› Mature:

Page 20: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

NoSQL DBMS

› No relational tables
› No fixed table schemas
› No joins
› No risk, no fun!
› CP and AP (and sometimes even AP on top of CP: MongoDB*)

* This is damn cool!

Page 21: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

Key-value

› One key, one value; very fast
› Key: hash (no duplicates)
› Value: binary object (“BLOB”); the DB does not understand your content
› Players: Amazon Dynamo, Memcached…

Page 22: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

key: customer_22
value: ?=PQ)“§VN? =§(Q$U%V§W=(BN W§(=BU&W§$()= W§$(=%

GIVE ME A MEANING!
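A key-value store can be sketched as a map from keys to opaque bytes. This is a minimal in-memory illustration, not the API of Dynamo or Memcached; the helpers `put` and `get` and the JSON encoding are assumptions of the example.

```javascript
// The store only sees keys and raw bytes.
var store = new Map();

function put(key, bytes) { store.set(key, bytes); }
function get(key) { return store.get(key); }

// The application decides how a customer is encoded (here: JSON as UTF-8).
var encoded = new TextEncoder().encode(JSON.stringify({ Name: 'Norbert' }));
put('customer_22', encoded);

// The store can hand the bytes back by key...
var bytes = get('customer_22');

// ...but only the application knows how to give them a meaning.
var customer = JSON.parse(new TextDecoder().decode(bytes));
console.log(customer.Name);
```

There is no way to ask this store for "all customers named Norbert": lookups work by key only.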

Page 23: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

Document databases

› Key-value store, too
› Value is “understood” by the DB
› Querying the data is possible (not just retrieving the key's content)
› Players: Amazon SimpleDB, CouchDB, MongoDB …

Page 24: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

key: customer_22
value: { Type: "Customer", Name: "Norbert", Invoiced: 2222 }

Page 25: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

key: customer_22
value / document:

{
  Type: "Customer",
  Name: "Norbert",
  Invoiced: 2222,
  Messages: [
    { Title: "Hello", Text: "World" },
    { Title: "Second", Text: "message" }
  ]
}
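Because the database understands the document, it can query inside it. The following sketch imitates that in plain JavaScript over an in-memory object; it is not the query API of any of the products listed above. The second document (`customer_23`) and the helper `keysWithMessageTitle` are invented for the example.

```javascript
var documents = {
  customer_22: {
    Type: 'Customer', Name: 'Norbert', Invoiced: 2222,
    Messages: [
      { Title: 'Hello', Text: 'World' },
      { Title: 'Second', Text: 'message' }
    ]
  },
  // Invented second document, only here to have something to filter out.
  customer_23: { Type: 'Customer', Name: 'Hans', Invoiced: 50000, Messages: [] }
};

// Hypothetical helper: keys of all documents containing a message with the given title.
function keysWithMessageTitle(docs, title) {
  return Object.keys(docs).filter(function (key) {
    return docs[key].Messages.some(function (m) { return m.Title === title; });
  });
}

var hits = keysWithMessageTitle(documents, 'Hello');
console.log(hits);
```

The query reaches into the nested `Messages` array, something a pure key-value store cannot do with an opaque BLOB.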

Page 26: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

(Wide) column stores
› Often referred to as “BigTable clones”
› Each key is associated with many attributes (columns)
› NoSQL column stores are actually hybrid row/column stores
  › Different from “pure” relational column stores!
› Players: Google BigTable, Cassandra (Facebook), HBase…

Page 27: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

CustomerId  Name     Invoiced
22          Norbert  22222
23          Hans     50000
24          Franz    44000

Won't be stored as:
22;Norbert;22222
23;Hans;50000
24;Franz;44000

It will be stored as:
22;23;24
Norbert;Hans;Franz
22222;50000;44000
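The difference between the two layouts can be sketched by transposing rows into columns. This is a simplified illustration of column-oriented storage in plain JavaScript, not the storage format of any real product; `toColumns` is a hypothetical helper.

```javascript
// Row-oriented: one record per customer.
var rows = [
  { CustomerId: 22, Name: 'Norbert', Invoiced: 22222 },
  { CustomerId: 23, Name: 'Hans', Invoiced: 50000 },
  { CustomerId: 24, Name: 'Franz', Invoiced: 44000 }
];

// Column-oriented: one array per attribute.
function toColumns(rows) {
  var columns = {};
  rows.forEach(function (row) {
    Object.keys(row).forEach(function (field) {
      (columns[field] = columns[field] || []).push(row[field]);
    });
  });
  return columns;
}

var columns = toColumns(rows);

// Summing one attribute only touches a single column.
var total = columns.Invoiced.reduce(function (a, b) { return a + b; }, 0);
console.log(columns.Name.join(';'), total);
```

Aggregations over one attribute read a single contiguous array instead of every full record.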

Page 28: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

Graph databases

› Multi-relational graphs
› SPARQL query language (W3C Recommendation!)
› Players: Neo4j, InfoGrid …

(Note: graph DBs are special and somewhat the “black sheep” of the NoSQL world; the following PROs/CONs don't apply to them very well.)

Page 29: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

PROs (& Promises)

› Schema-free / semi-structured data
› Massive data stores
› Scaling is easy
› Very, very high availability
› Often simpler to implement (and OR mappers aren't required)
› “Web 2.0 ready”

Page 30: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

CONs

› NoSQL implementations are often “alpha”; no standards
› Data consistency issues, no transactions
› Insufficient access control
› SQL is strong for dynamic, cross-table queries (JOIN)
› Relationships aren't enforced (conventions over constraints, except for graph DBs, of course)
› Premature optimization: scalability (don't build for scalability if you never need it!)

Page 31: DMDW  Extra Lesson - NoSql and MongoDB

02 MongoDB

Page 32: DMDW  Extra Lesson - NoSql and MongoDB

NoSQL

Let's rock!

MongoDB Quick Reference Cards
http://www.10gen.com/reference

Page 33: DMDW  Extra Lesson - NoSql and MongoDB

Basic Deployment
› Create the default data directory in c:\data\db
› Start mongod.exe
› Optionally: mongod.exe --dbpath c:\data\db --port 27017 --logpath c:\data\mongodb.log
› Start the shell: mongo.exe

Page 34: DMDW  Extra Lesson - NoSql and MongoDB

Data Import
› cd c:\dba-training-data\data
› mongoimport -d twitter -c tweets twitter.json

› cd c:\dba-training-data\data\dump\training
› mongorestore -d training -c scores scores.bson

› cd c:\dba-training-data\data\dump
› mongorestore -d digg digg

Page 35: DMDW  Extra Lesson - NoSql and MongoDB


Page 36: DMDW  Extra Lesson - NoSql and MongoDB

MongoDB Documents

(in the shell)
› use digg
› db.stories.findOne();

Page 37: DMDW  Extra Lesson - NoSql and MongoDB

JSON and BSON
All JSON documents are stored in a binary format called BSON. BSON supports a richer set of types than JSON.
http://bsonspec.org
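One type that plain JSON lacks is a date: BSON has a dedicated date type, while JSON can only carry a date as a string. The following sketch shows the JSON side of this using only standard JavaScript; it does not use any BSON library.

```javascript
// A document with a real Date value.
var original = { created: new Date(0) };

// Round trip through plain JSON text.
var roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.created instanceof Date);     // true
console.log(typeof roundTripped.created);          // "string"
console.log(roundTripped.created);                 // "1970-01-01T00:00:00.000Z"
```

After the round trip the application has to know that the string is meant to be a date; a format with a native date type keeps that information in the data itself.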

Page 38: DMDW  Extra Lesson - NoSql and MongoDB

CRUD – Create

(in the shell)
› db.people.save({name: 'Smith', age: 30});

See how the save command works (entering a function name without parentheses prints its source in the shell):
› db.foo.save

Page 39: DMDW  Extra Lesson - NoSql and MongoDB

CRUD – Create
How training.scores was created:

for (var i = 0; i < 1000; i++) {
  ['quiz', 'essay', 'exam'].forEach(function(name) {
    // Random integer score between 50 and 99
    var score = Math.floor(Math.random() * 50) + 50;
    db.scores.save({student: i, name: name, score: score});
  });
}
db.scores.count();

Page 40: DMDW  Extra Lesson - NoSql and MongoDB

CRUD – Read
Queries are specified using a document-style syntax!

› use training
› db.scores.find({score: 50});
› db.scores.find({score: {"$gte": 70}});

Note: find() returns a cursor!
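How a document-style query is matched against documents can be sketched in plain JavaScript. This is a simplified imitation, not MongoDB's implementation: the helper `matches` is hypothetical and only supports equality, `$gte` and `$lt`; the sample scores are made up.

```javascript
// Simplified matcher: every field in the query must match the document.
function matches(doc, query) {
  return Object.keys(query).every(function (field) {
    var cond = query[field];
    var value = doc[field];
    if (cond !== null && typeof cond === 'object') {
      // Operator document, e.g. {$gte: 70}
      return Object.keys(cond).every(function (op) {
        if (op === '$gte') return value >= cond[op];
        if (op === '$lt') return value < cond[op];
        throw new Error('unsupported operator: ' + op);
      });
    }
    return value === cond; // plain equality, e.g. {score: 50}
  });
}

var scores = [
  { student: 0, name: 'quiz', score: 50 },
  { student: 0, name: 'exam', score: 72 },
  { student: 1, name: 'quiz', score: 91 }
];

var exactly50 = scores.filter(function (d) { return matches(d, { score: 50 }); });
var atLeast70 = scores.filter(function (d) { return matches(d, { score: { $gte: 70 } }); });
console.log(exactly50.length, atLeast70.length);
```

The query itself is just data (a document), which is why it can be built, stored and combined like any other object.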

Page 41: DMDW  Extra Lesson - NoSql and MongoDB

Exercises

1. Find all scores less than 65.

2. Find the lowest quiz score. Find the highest quiz score.

3. Write a query to find all digg stories where the view count is greater than 1000.

4. Query for all digg stories whose media type is either 'news' or 'images' and where the topic name is 'Comedy'. (For extra practice, construct two queries using different sets of operators to do this.)

5. Find all digg stories where the topic name is 'Television' or the media type is 'videos'. Skip the first 5 results, and limit the result set to 10.

Page 42: DMDW  Extra Lesson - NoSql and MongoDB

CRUD – Update

› use digg;
› db.people.update({name: 'Smith'}, {'$set': {interests: []}});
› db.people.update({name: 'Smith'}, {'$push': {interests: 'chess'}});
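What `$set` and `$push` do to a document can be sketched on a plain JavaScript object. This is a simplified imitation of the two operators, not MongoDB's implementation; `applyUpdate` is a hypothetical helper.

```javascript
// Apply a (very small) subset of update operators to a document in place.
function applyUpdate(doc, update) {
  Object.keys(update.$set || {}).forEach(function (field) {
    doc[field] = update.$set[field];        // $set: assign or overwrite the field
  });
  Object.keys(update.$push || {}).forEach(function (field) {
    doc[field].push(update.$push[field]);   // $push: append the value as ONE element
  });
  return doc;
}

var smith = { name: 'Smith', age: 30 };
applyUpdate(smith, { $set: { interests: [] } });
applyUpdate(smith, { $push: { interests: 'chess' } });
console.log(smith.interests);
```

Since `$push` appends its argument as a single element, pushing an array value would nest that array inside `interests` instead of adding its items.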

Page 43: DMDW  Extra Lesson - NoSql and MongoDB

Exercises

1. Set the proper 'grade' attribute for all scores. For example, users with scores greater than 90 get an 'A'. Set the grade to 'B' for scores falling between 80 and 90.

2. You're being nice, so you decide to add 10 points to every score on every “final” exam whose score is lower than 60. How do you do this update?

Page 44: DMDW  Extra Lesson - NoSql and MongoDB

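CRUD – Delete

› db.dropDatabase(); (drops the current database)
› db.foo.drop(); (drops the collection foo)
› db.foo.remove(); (removes documents from foo, keeping the collection itself)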

Page 45: DMDW  Extra Lesson - NoSql and MongoDB

“Map Reduce is the Uzi of aggregation tools. Everything described with count, distinct and group can be done with MapReduce, and more.”

Kristina Chodorow, Michael Dirolf in MongoDB – The Definitive Guide

Page 46: DMDW  Extra Lesson - NoSql and MongoDB

MapReduce
To use map-reduce, you first write a map function.

map = function() {
  // Emit one value per story, keyed by user name; each story counts as one post
  emit(this.user.name, {diggs: this.diggs, posts: 1});
}

Page 47: DMDW  Extra Lesson - NoSql and MongoDB

MapReduce
The reduce function then aggregates those docs by key.

reduce = function(key, values) {
  var diggs = 0;
  var posts = 0;
  values.forEach(function(doc) {
    diggs += doc.diggs;
    posts += doc.posts; // sum the counts so that re-reducing partial results stays correct
  });
  return {diggs: diggs, posts: posts};
}

Page 48: DMDW  Extra Lesson - NoSql and MongoDB

MapReduce
Now both are used to perform custom aggregation.

db.stories.mapReduce(map, reduce, {out: 'digg_users'});
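The map-reduce flow can be tried without a database using a small in-memory imitation. This is a sketch, not MongoDB's engine: the driver `mapReduce` is hypothetical, `map` receives the document and an `emit` callback as arguments (instead of `this` and a global `emit` as in the shell), and the sample stories are made up. It assumes that `map` emits `posts: 1` and that `reduce` sums `doc.posts`.

```javascript
// Sample stories; field names follow the slides, values are invented.
var stories = [
  { user: { name: 'alice' }, diggs: 10 },
  { user: { name: 'bob' }, diggs: 3 },
  { user: { name: 'alice' }, diggs: 5 }
];

function map(story, emit) {
  emit(story.user.name, { diggs: story.diggs, posts: 1 });
}

function reduce(key, values) {
  var diggs = 0;
  var posts = 0;
  values.forEach(function (doc) {
    diggs += doc.diggs;
    posts += doc.posts;
  });
  return { diggs: diggs, posts: posts };
}

// Hypothetical driver: group emitted values by key, then reduce each group.
function mapReduce(docs, map, reduce) {
  var groups = {};
  docs.forEach(function (doc) {
    map(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var out = {};
  Object.keys(groups).forEach(function (key) {
    // Like MongoDB, skip reduce for keys with a single emitted value.
    out[key] = groups[key].length === 1 ? groups[key][0] : reduce(key, groups[key]);
  });
  return out;
}

var result = mapReduce(stories, map, reduce);
console.log(result);
```

Because reduce may be skipped for single values and may be applied again to its own output, its result must have the same shape as the emitted values, and counting has to be done by summing `posts` rather than by counting calls.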

Page 49: DMDW  Extra Lesson - NoSql and MongoDB

THANK YOU FOR YOUR ATTENTION