Lecture3: DocumentDatabases–MongoDB– Aggregation,Map … · 2020. 4. 18. · Lecture3:...

Lecture 3: Document Databases – MongoDB – Aggregation, Map-Reduce, and Distributed Operation

Databases (3): NoSQL & Deductive Databases

Martin Homola, Ján Kľuka, Alexander Šimko, Jozef Šiška

Department of Applied Informatics
Faculty of Mathematics, Physics and Informatics

Comenius University in Bratislava

7 Oct 2021

Aggregation

Aggregation in MongoDB

- So far, we have only done CRUD operations
- MongoDB can also perform aggregation operations:
  - single-purpose ones: counting documents and collecting distinct values
  - Map-Reduce processing
  - aggregation pipelines

- Data and the work of aggregation operations can be distributed over many nodes
  - This allows processing of “big data”: data sets too large to fit into or be processed by one machine

Simple Aggregation

- Counting:
  db.telefony.countDocuments( { "casti.cislo": { $gte: 190000 } } )
- Selection of distinct values:
  db.telefony.distinct( "casti.predvolba",
                        { "casti.cislo": { $gte: 190000 } } )
- countDocuments and distinct are actually computed using the much more powerful aggregation pipelines
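To make the semantics of the two calls concrete, here is a plain-JavaScript (Node.js) sketch that evaluates the same filter over an in-memory array. The sample telefony documents and their shape ({ casti: { predvolba, cislo } }) are assumptions for illustration, and the helper functions are hypothetical; the real server also handles arrays in dotted paths, which this sketch ignores. The comments show the aggregation pipelines the two calls roughly correspond to.

```javascript
// Hypothetical sample data: each phone number split into parts (casti).
const telefony = [
  { formatovane: "02/123 456", casti: { predvolba: "02", cislo: 123456 } },
  { formatovane: "02/190 001", casti: { predvolba: "02", cislo: 190001 } },
  { formatovane: "055/200 300", casti: { predvolba: "055", cislo: 200300 } },
  { formatovane: "055/250 000", casti: { predvolba: "055", cislo: 250000 } },
];

// The filter { "casti.cislo": { $gte: 190000 } } as a JS predicate.
const filter = (doc) => doc.casti.cislo >= 190000;

// countDocuments(filter) is roughly
//   aggregate([{ $match: filter }, { $group: { _id: null, n: { $sum: 1 } } }])
function countDocuments(docs, pred) {
  // add 1 per matching document, like { $sum: 1 }
  return docs.filter(pred).reduce((n) => n + 1, 0);
}

// distinct("casti.predvolba", filter) is roughly
//   aggregate([{ $match: filter }, { $group: { _id: "$casti.predvolba" } }])
function distinct(docs, getField, pred) {
  return [...new Set(docs.filter(pred).map(getField))];
}

console.log(countDocuments(telefony, filter)); // 3
console.log(distinct(telefony, (d) => d.casti.predvolba, filter)); // [ '02', '055' ]
```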

Map-Reduce Aggregation

- Map-Reduce is a quite general framework for mass data processing
- A collection is processed in 2 main stages (map and reduce), with grouping in between:
  1. A map JavaScript function takes each document and emit()s zero or more key-value pairs.
  2. Mongo groups the emitted pairs by key.
  3. All values for one key are sent to a reduce JS function to produce a single value.
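The three steps can be simulated in a few lines of plain JavaScript. This is a minimal single-machine sketch, not MongoDB's implementation: the helper runMapReduce and the sample data are hypothetical, and emit is passed to map as an argument instead of being a global as in the mongo shell. As in MongoDB, map runs with the document bound to this, and reduce is skipped for keys that received only one value.

```javascript
// Minimal in-memory Map-Reduce (hypothetical helper, single machine).
function runMapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values

  // 1. map: call map() with `this` bound to each document.
  for (const doc of docs) {
    const emit = (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    };
    map.call(doc, emit);
  }

  // 2. + 3. the pairs are already grouped by key; reduce each group to one value.
  const result = {};
  for (const [key, values] of groups) {
    // Like MongoDB, do not call reduce when a key has a single value.
    result[key] = values.length === 1 ? values[0] : reduce(key, values);
  }
  return result;
}

// Example: count phone numbers per area code (hypothetical data).
const telefony = [
  { casti: { predvolba: "02", cislo: 123456 } },
  { casti: { predvolba: "02", cislo: 190001 } },
  { casti: { predvolba: "055", cislo: 200300 } },
];

const counts = runMapReduce(
  telefony,
  function (emit) { emit(this.casti.predvolba, 1); }, // one pair per document
  function (key, values) { return values.reduce((a, b) => a + b, 0); }
);
console.log(counts); // { '02': 2, '055': 1 }
```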

Map-Reduce Example

Reduce Requirements and Additional Tools

- The reduce function must:
  - produce a value of the same type as all of its input values
  - be associative: reduce(k, [u, reduce(k, [v, w])]) = reduce(k, [u, v, w])
  - be commutative: reduce(k, [u, v]) = reduce(k, [v, u])
  - be idempotent: reduce(k, [reduce(k, vs)]) = reduce(k, vs)
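These requirements matter because MongoDB may call reduce more than once for the same key, feeding an earlier reduce output back in as one of the input values. A quick way to test a candidate reduce is to check the properties on sample values. The helper checkReduce below is a hypothetical sketch that compares results with JSON.stringify and only does a shallow typeof check for the type requirement. It shows that summing emitted counts passes, while the common mistake of returning values.length fails associativity and idempotence.

```javascript
// Hypothetical helper: check the reduce requirements on sample values u, v, w.
function checkReduce(reduce, key, u, v, w) {
  const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);
  const vs = [u, v, w];
  return {
    sameType: typeof reduce(key, vs) === typeof u, // shallow check only
    associative: same(reduce(key, [u, reduce(key, [v, w])]), reduce(key, vs)),
    commutative: same(reduce(key, [u, v]), reduce(key, [v, u])),
    idempotent: same(reduce(key, [reduce(key, vs)]), reduce(key, vs)),
  };
}

// Correct: sum the emitted counts (a value may already be a partial sum).
const sumReduce = (key, values) => values.reduce((a, b) => a + b, 0);
// Wrong: counts array elements, ignoring that a value may be a partial count.
const lengthReduce = (key, values) => values.length;

console.log(checkReduce(sumReduce, "02", 1, 1, 1));    // all four true
console.log(checkReduce(lengthReduce, "02", 1, 1, 1)); // associative and idempotent are false
```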

- MongoDB also allows:
  - selection of documents by a query and a limit (query, limit)
  - pre-sorting (sort)
  - post-processing by a finalize JS function (finalize)
  - outputting the results as a new collection (out)

mapReduce() Example – Average

Let’s compute, for each region, the average number of inhabitants of its towns:

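db.sidla.mapReduce(
  // map: emit the town's population and a count of 1 under its region (kraj)
  function() {
    emit(this.kraj, { po: this.pocet_obyvatelov, pm: 1 })
  },
  // reduce: add up populations (po) and town counts (pm) of one region
  function(key, values) {
    return {
      po: values.reduce( (sum, val) => sum + val.po, 0 ),
      pm: values.reduce( (sum, val) => sum + val.pm, 0 )
    }
  },
  {
    query: { druh: /mesto/ },            // only towns
    // finalize: turn the (sum, count) pair into the average
    finalize: function(key, reducedVal) {
      return reducedVal.po / reducedVal.pm
    },
    out: 'kraje'
  }
)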
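Why does map emit { po, pm } instead of computing averages directly? An average of partial averages is generally wrong, whereas (sum, count) pairs can be re-reduced safely and turned into an average once, in finalize. The plain-JavaScript sketch below reuses the example's reduce and finalize on hypothetical towns, reduces in two batches, and then re-reduces the partial results, roughly as MongoDB may do on large or distributed inputs. (Since MongoDB 5.0, map-reduce is deprecated in favour of aggregation pipelines.)

```javascript
// reduce and finalize from the example, as plain functions.
function reduce(key, values) {
  return {
    po: values.reduce((sum, val) => sum + val.po, 0),
    pm: values.reduce((sum, val) => sum + val.pm, 0),
  };
}
function finalize(key, reducedVal) {
  return reducedVal.po / reducedVal.pm;
}

// Hypothetical towns of one region "X" and what map would emit for them.
const towns = [
  { nazov: "A", kraj: "X", pocet_obyvatelov: 100 },
  { nazov: "B", kraj: "X", pocet_obyvatelov: 200 },
  { nazov: "C", kraj: "X", pocet_obyvatelov: 600 },
];
const emitted = towns.map((t) => ({ po: t.pocet_obyvatelov, pm: 1 }));

// Reduce in two batches, then re-reduce the partial results.
const partial1 = reduce("X", emitted.slice(0, 2)); // { po: 300, pm: 2 }
const partial2 = reduce("X", emitted.slice(2));    // { po: 600, pm: 1 }
const correct = finalize("X", reduce("X", [partial1, partial2])); // 900 / 3

// Naive alternative: averaging the two batch averages.
const naive = (finalize("X", partial1) + finalize("X", partial2)) / 2; // (150 + 600) / 2

console.log(correct, naive); // 300 375
```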

Aggregation Pipeline – aggregate()

- aggregate() processes a collection using a multi-stage pipeline
- Similar to Unix shell pipelines
- Less flexible than Map-Reduce, but without slow JavaScript
- Declarative pipeline description ⇒ can be optimized by the server
- Can also process change streams
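The Unix-pipeline analogy can be made concrete: each stage is a function from a list of documents to a list of documents, and the pipeline is their left-to-right composition. This is only a conceptual sketch in plain JavaScript; the stage helpers match, sortBy, and limit are hypothetical stand-ins for $match, $sort, and $limit, and the data is made up. The server receives a declarative description precisely so that it can reorder or merge stages instead of running them one by one like this.

```javascript
// Hypothetical stage constructors: each returns a (docs -> docs) function.
const match = (pred) => (docs) => docs.filter(pred);
const sortBy = (field, dir = 1) => (docs) =>
  [...docs].sort((a, b) => (a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0));
const limit = (n) => (docs) => docs.slice(0, n);

// Run stages left to right, like `cmd1 | cmd2 | cmd3` in a shell.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const sidla = [
  { nazov: "A", druh: "mesto", pocet_obyvatelov: 5000 },
  { nazov: "B", druh: "obec", pocet_obyvatelov: 300 },
  { nazov: "C", druh: "mesto", pocet_obyvatelov: 42000 },
  { nazov: "D", druh: "mesto", pocet_obyvatelov: 12000 },
];

// The two most populous towns.
const top2 = runPipeline(sidla, [
  match((d) => /mesto/.test(d.druh)),
  sortBy("pocet_obyvatelov", -1), // descending
  limit(2),
]);
console.log(top2.map((d) => d.nazov)); // [ 'C', 'D' ]
```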

Aggregation Pipeline example

aggregate() Example – Match, Group, and Sort

Let’s sort regions by the average number of inhabitants of their towns:

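db.sidla.aggregate( [
  { $match: { druh: /mesto/ } },              // only towns
  { $group: {
      _id: "$kraj",                            // one group per region
      ppo: { $avg: "$pocet_obyvatelov" }       // average population of its towns
  } },
  { $sort: { ppo: 1 } }                        // ascending by the average
] )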
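For comparison with the mapReduce() version, the sketch below spells out in plain JavaScript what the three stages compute: $match filters, $group buckets documents by _id and accumulates the average, and $sort orders the group results. The sample sidla documents are hypothetical. Since $avg takes care of the sum and count bookkeeping itself, the pipeline needs no manual { po, pm } pair and no finalize step.

```javascript
// Hypothetical sample documents.
const sidla = [
  { nazov: "A", druh: "mesto", kraj: "K1", pocet_obyvatelov: 60000 },
  { nazov: "B", druh: "mesto", kraj: "K1", pocet_obyvatelov: 20000 },
  { nazov: "C", druh: "mesto", kraj: "K2", pocet_obyvatelov: 30000 },
  { nazov: "D", druh: "mesto", kraj: "K2", pocet_obyvatelov: 10000 },
  { nazov: "E", druh: "obec", kraj: "K2", pocet_obyvatelov: 500 },
];

// $match: { druh: /mesto/ }
const matched = sidla.filter((d) => /mesto/.test(d.druh));

// $group: { _id: "$kraj", ppo: { $avg: "$pocet_obyvatelov" } }
const acc = new Map(); // kraj -> { sum, count }
for (const d of matched) {
  if (!acc.has(d.kraj)) acc.set(d.kraj, { sum: 0, count: 0 });
  const a = acc.get(d.kraj);
  a.sum += d.pocet_obyvatelov;
  a.count += 1;
}
const grouped = [...acc].map(([kraj, a]) => ({ _id: kraj, ppo: a.sum / a.count }));

// $sort: { ppo: 1 } (ascending)
grouped.sort((x, y) => x.ppo - y.ppo);

console.log(grouped); // K2 (average 20000) comes before K1 (average 40000)
```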

Joins with $lookup

$lookup adds to every document an array of all documents from another collection with a matching value of some field (≈ left outer join):

db.sidla.aggregate( [
  { $match: { druh: /mesto/ } },
  { $lookup: {
      from: "kraje",                 // the other collection
      localField: "kraj",            // field in sidla
      foreignField: "nazov",         // field in kraje
      as: "dataKraja"                // name of the new array field
  } },
  { $project: {
      nazov: 1,
      // the town's share of its region's population
      podielPoctuObyvatelov: {
        $divide: [
          "$pocet_obyvatelov",
          { $arrayElemAt: [ "$dataKraja.pocet_obyvatelov", 0 ] }
        ]
      }
  } }
] )
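The join semantics are easy to get wrong, so here is a plain-JavaScript sketch of what the $lookup and $project stages compute on hypothetical data; the lookup helper is ours, not a MongoDB API. Every town keeps an array dataKraja of all matching regions, and a town whose region has no match gets an empty array instead of being dropped (left outer join). The sketch then yields null for the ratio; that this mirrors $divide on a missing operand is our assumption and worth checking against the manual.

```javascript
// Hypothetical collections.
const kraje = [
  { nazov: "K1", pocet_obyvatelov: 500000 },
  { nazov: "K2", pocet_obyvatelov: 800000 },
];
const sidla = [
  { nazov: "A", druh: "mesto", kraj: "K1", pocet_obyvatelov: 50000 },
  { nazov: "B", druh: "mesto", kraj: "K2", pocet_obyvatelov: 200000 },
  { nazov: "C", druh: "mesto", kraj: "K9", pocet_obyvatelov: 1000 }, // no such kraj
];

// $lookup: attach ALL matching documents of `from` as an array (left outer join).
const lookup = (docs, from, localField, foreignField, as) =>
  docs.map((d) => ({
    ...d,
    [as]: from.filter((f) => f[foreignField] === d[localField]),
  }));

const result = lookup(
  sidla.filter((d) => /mesto/.test(d.druh)), // $match
  kraje, "kraj", "nazov", "dataKraja"
).map((d) => {
  // $project: nazov plus $divide by the first matched region ($arrayElemAt: [..., 0]).
  // ($project would also keep _id by default; these sample documents have none.)
  const kraj = d.dataKraja[0];
  return {
    nazov: d.nazov,
    podielPoctuObyvatelov: kraj ? d.pocet_obyvatelov / kraj.pocet_obyvatelov : null,
  };
});

console.log(result.map((r) => r.podielPoctuObyvatelov)); // [ 0.1, 0.25, null ]
```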

Aggregation Pipeline Stages

Pipeline stages can:
- select documents ($match), limit ($limit) and skip ($skip) them;
- group by a field or expression ($group, $bucket) and in each group accumulate the values of other fields (sum, min, max, avg, collect to an array, ...);
- join documents with documents from a collection ($lookup), also recursively ($graphLookup);
- explode an array field into one document for each element ($unwind);
- project, set, unset, and compute fields ($project, $set, $unset);
- sort documents ($sort);
- merge with an existing collection ($merge) or output to a new collection ($out)...
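Of these, $unwind changes the shape of the data the most. Below is a minimal plain-JavaScript sketch of its default behaviour on hypothetical documents (the unwind helper and the osoby data are ours): one output document per array element, with the array field replaced by that element. With default options, documents whose field is missing, null, or an empty array produce no output (the preserveNullAndEmptyArrays option changes that), and a non-array value is treated as a one-element array; the sketch follows these defaults for a top-level field only.

```javascript
// Hypothetical: people with several phone numbers.
const osoby = [
  { meno: "Jana", telefony: ["02/123 456", "0905 111 222"] },
  { meno: "Peter", telefony: [] },
  { meno: "Eva", telefony: ["055/200 300"] },
];

// Minimal $unwind on a top-level field, default options.
function unwind(docs, field) {
  const out = [];
  for (const d of docs) {
    const v = d[field];
    if (v === undefined || v === null) continue;  // missing or null: no output
    const items = Array.isArray(v) ? v : [v];     // non-array: one element
    for (const item of items) {
      out.push({ ...d, [field]: item });          // empty array: loop body never runs
    }
  }
  return out;
}

const unwound = unwind(osoby, "telefony");
console.log(unwound.length); // 3
```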

Distribution

Distributed Operation

- MongoDB can distribute big collections over multiple server nodes
- A collection can be partitioned into shards: distribution of storage and processing power
- Unsharded collections and shards can be replicated in replica sets: safety, distribution of processing power for reads
- Multi-document transactions are available only in replica sets (and, since MongoDB 4.2, in sharded clusters), not on a standalone server
  (but a single “replica”, i.e., just the original, is enough)
- Aggregation pipelines and map-reduce run distributed over shards
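Running a pipeline over shards works because many stages can be split into a per-shard part and a merging part. The sketch below is a simplified plain-JavaScript model of that idea for a $group with an average, assuming two hypothetical shards: each shard returns partial (sum, count) pairs per key, and a merging step adds them up and divides once. It is the same trick as the { po, pm } values in the map-reduce example; MongoDB's actual split of a pipeline between the shards and the merging node is more involved than this.

```javascript
// Hypothetical data already partitioned across two shards.
const shards = [
  [ { kraj: "K1", pocet_obyvatelov: 60000 }, { kraj: "K2", pocet_obyvatelov: 30000 } ],
  [ { kraj: "K1", pocet_obyvatelov: 20000 }, { kraj: "K2", pocet_obyvatelov: 10000 },
    { kraj: "K2", pocet_obyvatelov: 20000 } ],
];

// Per-shard part: partial sums and counts per group key.
function shardPartial(docs) {
  const part = {};
  for (const d of docs) {
    if (!part[d.kraj]) part[d.kraj] = { sum: 0, count: 0 };
    part[d.kraj].sum += d.pocet_obyvatelov;
    part[d.kraj].count += 1;
  }
  return part;
}

// Merging part: combine the partials, then divide once per key.
function mergeAverages(partials) {
  const total = {};
  for (const part of partials) {
    for (const [key, p] of Object.entries(part)) {
      if (!total[key]) total[key] = { sum: 0, count: 0 };
      total[key].sum += p.sum;
      total[key].count += p.count;
    }
  }
  const avg = {};
  for (const [key, t] of Object.entries(total)) avg[key] = t.sum / t.count;
  return avg;
}

const averages = mergeAverages(shards.map(shardPartial));
console.log(averages); // { K1: 40000, K2: 20000 }
```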

Replication

Replica Set Creation

$ mongod --replSet "rs0" --bind_ip localhost,mongodb0.example.net
...
$ mongo
...
> rs.initiate( {
    _id : "rs0",
    members: [
      { _id: 0, host: "mongodb0.example.net:27017" },
      { _id: 1, host: "mongodb1.example.net:27017" },
      { _id: 2, host: "mongodb2.example.net:27017" }
    ]
  } )
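With more members, the configuration document passed to rs.initiate() can be generated rather than typed by hand, which avoids duplicate or skipped member _ids. The helper makeReplicaSetConfig below is hypothetical (not part of the mongo shell); it produces the same shape as the document above, with consecutive member _ids and an assumed default port of 27017. Since the mongo shell is JavaScript, the function could be pasted there and its result passed to rs.initiate().

```javascript
// Hypothetical helper: build an rs.initiate() config from a list of hosts.
function makeReplicaSetConfig(name, hosts, port = 27017) {
  return {
    _id: name,
    members: hosts.map((host, i) => ({ _id: i, host: `${host}:${port}` })),
  };
}

const config = makeReplicaSetConfig("rs0", [
  "mongodb0.example.net",
  "mongodb1.example.net",
  "mongodb2.example.net",
]);
console.log(JSON.stringify(config));
// In the mongo shell one could then run: rs.initiate(config)
```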

Sharding

Enabling Sharding

The sharding config server, the shard servers, and the routers (mongos) must be started first:

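// Note: since MongoDB 3.6, every shard must be a replica set, normally added as
// "<replica set name>/host:port"; the plain host:port form below is simplified.
sh.addShard("mongo1.example.net:27017")
sh.addShard("mongo2.example.net:27017")
sh.enableSharding("prednaska")
// Shard a collection by a ranged key
sh.shardCollection("prednaska.telefony", { "casti.predvolba": 1 } )
// or by a hashed key
sh.shardCollection("prednaska.telefony", { formatovane: "hashed" } )
// or by a compound key
sh.shardCollection("prednaska.telefony", { "casti.predvolba": 1, "casti.cislo": 1 } )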
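The choice between a ranged key such as { "casti.predvolba": 1 } and a hashed key such as { formatovane: "hashed" } decides how documents are spread over shards. The plain-JavaScript sketch below contrasts the two on hypothetical area codes: ranged routing compares the key with split points, so neighbouring keys land on the same shard, while hashed routing first hashes the key. The split points, the helpers rangeShard and hashedShard, and the 32-bit FNV-1a hash are illustrative assumptions; MongoDB uses its own hash function and splits data into chunks that the balancer moves between shards.

```javascript
// Illustrative 32-bit FNV-1a hash (NOT MongoDB's hash function).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // multiply modulo 2^32, keep unsigned
  }
  return h >>> 0;
}

// Ranged key: shard i holds keys in [splits[i-1], splits[i]).
function rangeShard(key, splits) {
  let i = 0;
  while (i < splits.length && key >= splits[i]) i++;
  return i; // 0 .. splits.length
}

// Hashed key: route by the hash of the key instead of the key itself.
function hashedShard(key, nShards) {
  return fnv1a(String(key)) % nShards;
}

// Hypothetical area codes, three shards split at "03" and "05".
const splits = ["03", "05"];
for (const predvolba of ["02", "033", "034", "055", "056"]) {
  console.log(predvolba,
    "range ->", rangeShard(predvolba, splits),
    "hashed ->", hashedShard(predvolba, splits.length + 1));
}
```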
