De normalised london aggregation framework overview

Chris Harris Email : [email protected] Twitter : cj_harris5 DeNormalised London: Aggregation Framework Overview Wednesday, 21 March 12

Page 1: De normalised london  aggregation framework overview

Chris Harris Email : [email protected]

Twitter : cj_harris5

DeNormalised London: Aggregation Framework Overview

Wednesday, 21 March 12

Page 2: De normalised london  aggregation framework overview

Terminology

RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

Page 3: De normalised london  aggregation framework overview

Here is a “simple” SQL Model

mysql> select * from book;
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | The Demon-Haunted World: Science as a Candle in the Dark |
|  2 | Cosmos                                                   |
|  3 | Programming in Scala                                     |
+----+----------------------------------------------------------+
3 rows in set (0.00 sec)

mysql> select * from bookauthor;
+---------+-----------+
| book_id | author_id |
+---------+-----------+
|       1 |         1 |
|       2 |         1 |
|       3 |         2 |
|       3 |         3 |
|       3 |         4 |
+---------+-----------+
5 rows in set (0.00 sec)

mysql> select * from author;
+----+-----------+------------+-------------+-------------+---------------+
| id | last_name | first_name | middle_name | nationality | year_of_birth |
+----+-----------+------------+-------------+-------------+---------------+
|  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
|  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
|  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
|  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
+----+-----------+------------+-------------+-------------+---------------+
4 rows in set (0.00 sec)

Page 4: De normalised london  aggregation framework overview

The Same Data in MongoDB

{
  "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
  "title" : "Programming in Scala",
  "author" : [
    {
      "first_name" : "Martin",
      "last_name" : "Odersky",
      "nationality" : "DE",
      "year_of_birth" : 1958
    },
    {
      "first_name" : "Lex",
      "last_name" : "Spoon"
    },
    {
      "first_name" : "Bill",
      "last_name" : "Venners"
    }
  ]
}
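One way to picture the move from the three SQL tables to the embedded document is the plain JavaScript sketch below. It is only an illustration: the helper names `toEmbeddedAuthor` and `denormalise` are made up for this example, and the choice to drop NULL columns and the relational ids is an assumption based on what the MongoDB document above happens to omit.

```javascript
// Rows copied from the SQL tables shown earlier.
const books = [
  { id: 1, title: "The Demon-Haunted World: Science as a Candle in the Dark" },
  { id: 2, title: "Cosmos" },
  { id: 3, title: "Programming in Scala" },
];
const bookAuthors = [
  { book_id: 1, author_id: 1 },
  { book_id: 2, author_id: 1 },
  { book_id: 3, author_id: 2 },
  { book_id: 3, author_id: 3 },
  { book_id: 3, author_id: 4 },
];
const authors = [
  { id: 1, last_name: "Sagan", first_name: "Carl", middle_name: "Edward", nationality: null, year_of_birth: 1934 },
  { id: 2, last_name: "Odersky", first_name: "Martin", middle_name: null, nationality: "DE", year_of_birth: 1958 },
  { id: 3, last_name: "Spoon", first_name: "Lex", middle_name: null, nationality: null, year_of_birth: null },
  { id: 4, last_name: "Venners", first_name: "Bill", middle_name: null, nationality: null, year_of_birth: null },
];

// Hypothetical helper: copy an author row without its id and without NULL columns.
function toEmbeddedAuthor(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    if (key !== "id" && value !== null) out[key] = value;
  }
  return out;
}

// Hypothetical helper: resolve the join table once and embed the authors in each book.
function denormalise(books, bookAuthors, authors) {
  const authorById = new Map(authors.map((a) => [a.id, a]));
  return books.map((book) => ({
    title: book.title,
    author: bookAuthors
      .filter((link) => link.book_id === book.id)
      .map((link) => toEmbeddedAuthor(authorById.get(link.author_id))),
  }));
}

const documents = denormalise(books, bookAuthors, authors);
console.log(JSON.stringify(documents[2], null, 2));
```

The join is paid once, when the document is written, so reading a book and its authors later needs no join at all.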

Page 5: De normalised london  aggregation framework overview

What problem are we solving?

• Map/Reduce can be used for aggregation…
  • Currently being used for totaling, averaging, etc
• Map/Reduce is a big hammer
  • Simpler tasks should be easier
• Shouldn’t need to write JavaScript
  • Avoid the overhead of JavaScript engine
• We’re seeing requests for help in handling complex documents
  • Select only matching subdocuments or arrays

Page 6: De normalised london  aggregation framework overview

How will we solve the problem?

• New aggregation framework
  • Declarative framework (no JavaScript)
  • Describe a chain of operations to apply
  • Expression evaluation
    • Return computed values
  • Framework: new operations added easily
  • C++ implementation

Page 7: De normalised london  aggregation framework overview

Aggregation - Pipelines

• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Members of a collection are passed through a pipeline to produce a result
  • ps -ef | grep -i mongod
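The pipeline idea can be sketched in plain JavaScript as a list of functions, each taking an array of documents and returning a new one. This is only an analogy for how stages chain together, not how the server implements them. The names `match`, `project` and `runPipeline` and the process list are invented for this illustration.

```javascript
// Each stage is a function from an array of documents to a new array.
const match = (predicate) => (docs) => docs.filter(predicate);
const project = (shape) => (docs) => docs.map(shape);

// Run the stages left to right, feeding each stage the previous stage's output,
// much as `ps -ef | grep -i mongod` feeds the output of ps into grep.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Made-up sample data standing in for the output of `ps -ef`.
const processes = [
  { pid: 101, cmd: "mongod --dbpath /data/db" },
  { pid: 102, cmd: "sshd" },
  { pid: 103, cmd: "mongos --configdb cfg1" },
];

const result = runPipeline(processes, [
  match((p) => /mongod/i.test(p.cmd)), // keep only matching documents
  project((p) => ({ pid: p.pid })),    // reshape what is left
]);
console.log(result);
```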

Page 8: De normalised london  aggregation framework overview

Example - twitter

{
  "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
  "user" : {
    "friends_count" : 73,
    "location" : "Brazil",
    "screen_name" : "Bia_cunha1",
    "name" : "Beatriz Helena Cunha",
    "followers_count" : 102
  }
}

• Find the number of followers and the number of friends by location

Page 9: De normalised london  aggregation framework overview

Example - twitter

db.tweets.aggregate(
  {$match: {"user.friends_count": { $gt: 0 },
            "user.followers_count": { $gt: 0 } } },
  {$project: { location: "$user.location",
               friends: "$user.friends_count",
               followers: "$user.followers_count" } },
  {$group: {_id: "$location",
            friends: {$sum: "$friends"},
            followers: {$sum: "$followers"} } }
);

Page 12: De normalised london  aggregation framework overview

Example - twitter

db.tweets.aggregate(
  // Predicate
  {$match: {"user.friends_count": { $gt: 0 },
            "user.followers_count": { $gt: 0 } } },
  // Parts of the document you want to project
  {$project: { location: "$user.location",
               friends: "$user.friends_count",
               followers: "$user.followers_count" } },
  // Function to apply to the result set
  {$group: {_id: "$location",
            friends: {$sum: "$friends"},
            followers: {$sum: "$followers"} } }
);

Page 13: De normalised london  aggregation framework overview

Example - twitter

{
  "result" : [
    { "_id" : "Far Far Away", "friends" : 344, "followers" : 789 },
    ...
  ],
  "ok" : 1
}

Page 14: De normalised london  aggregation framework overview

Demo

Demo files are at https://gist.github.com/2036709

Page 15: De normalised london  aggregation framework overview

Projections

• $project can reshape results
  • Include or exclude fields
  • Computed fields
    • Arithmetic expressions
  • Pull fields from nested documents to the top
  • Push fields from the top down into new virtual documents
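To make "pull up" and "push down" concrete, here is a small plain JavaScript imitation of a projection over the tweet document from the Twitter example. It is a sketch of the reshaping idea only. The helpers `getPath` and `project` are hypothetical, handle just "$a.b" field paths and nested specs, and are not the server's implementation.

```javascript
// Resolve a "$a.b.c" field path against a document.
function getPath(doc, path) {
  return path
    .slice(1) // drop the leading "$"
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Build a new document from a spec: strings starting with "$" pull a field
// from anywhere in the input; nested objects create a new sub-document.
function project(doc, spec) {
  const out = {};
  for (const [name, source] of Object.entries(spec)) {
    if (typeof source === "string" && source.startsWith("$")) {
      out[name] = getPath(doc, source);
    } else if (source !== null && typeof source === "object") {
      out[name] = project(doc, source);
    }
  }
  return out;
}

const tweet = {
  user: { friends_count: 73, location: "Brazil", followers_count: 102 },
};

const reshaped = project(tweet, {
  location: "$user.location",            // pulled from the nested document to the top
  counts: {                              // pushed down into a new virtual document
    friends: "$user.friends_count",
    followers: "$user.followers_count",
  },
});
console.log(reshaped);
```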

Page 16: De normalised london  aggregation framework overview

Unwinding

• $unwind can “stream” arrays
  • Array values are doled out one at a time in the context of their surrounding documents
  • Makes it possible to filter out elements before returning
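The effect of unwinding can be imitated in a few lines of plain JavaScript, using the book document from earlier in the deck. The `unwind` helper is hypothetical, and its handling of documents without an array (they are simply dropped) is an assumption of this sketch rather than a statement about the server.

```javascript
// Emit one copy of the surrounding document per array element.
// Documents whose field is not an array produce no output in this sketch.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    for (const value of values) {
      out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const book = {
  title: "Programming in Scala",
  author: [
    { first_name: "Martin", last_name: "Odersky" },
    { first_name: "Lex", last_name: "Spoon" },
    { first_name: "Bill", last_name: "Venners" },
  ],
};

// One document per author, each still carrying the book title.
const perAuthor = unwind([book], "author");

// Individual array elements can now be filtered like ordinary documents.
const withoutSpoon = perAuthor.filter((d) => d.author.last_name !== "Spoon");
console.log(withoutSpoon);
```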

Page 17: De normalised london  aggregation framework overview

Grouping

• $group aggregation expressions
  • Define a grouping key as the _id of the result
  • Total grouped column values: $sum
  • Average grouped column values: $avg
  • Collect grouped column values in an array or set: $push, $addToSet
  • Other functions
    • $min, $max, $first, $last
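A rough plain JavaScript picture of grouping follows: documents are bucketed by a key, and each accumulator is then applied to the bucket. The helper names (`group`, `sum`, `avg`, `addToSet`) only mirror the operators listed above and are invented here. Apart from the first row's counts, which come from the Twitter example, the sample data is made up.

```javascript
// Bucket documents by key, then apply each accumulator to the bucket.
function group(docs, keyOf, accumulators) {
  const buckets = new Map();
  for (const doc of docs) {
    const key = keyOf(doc);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc);
  }
  return [...buckets].map(([key, members]) => {
    const row = { _id: key }; // the grouping key becomes the _id of the result
    for (const [name, accumulate] of Object.entries(accumulators)) {
      row[name] = accumulate(members);
    }
    return row;
  });
}

// Hypothetical accumulators named after the operators they imitate.
const sum = (field) => (docs) => docs.reduce((total, d) => total + d[field], 0);
const avg = (field) => (docs) => sum(field)(docs) / docs.length;
const addToSet = (field) => (docs) => [...new Set(docs.map((d) => d[field]))];

const users = [
  { location: "Brazil", friends: 73, followers: 102, lang: "pt" },
  { location: "Brazil", friends: 27, followers: 98, lang: "pt" },
  { location: "London", friends: 10, followers: 20, lang: "en" },
];

const totals = group(users, (d) => d.location, {
  friends: sum("friends"),
  followers: sum("followers"),
  avgFollowers: avg("followers"),
  langs: addToSet("lang"),
});
console.log(totals);
```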

Page 18: De normalised london  aggregation framework overview

Sorting

• $sort can sort documents
  • Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
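How a specification such as { key1: 1, key2: -1 } orders documents can be shown with a plain JavaScript comparator: keys are compared in the order given, 1 meaning ascending and -1 descending. The `sortBy` helper and the sample data are made up for this sketch, which only handles values that compare with < and >.

```javascript
// Return a sorted copy of docs according to a { key: 1 | -1 } specification.
function sortBy(docs, spec) {
  const keys = Object.entries(spec);
  return [...docs].sort((a, b) => {
    for (const [key, direction] of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
      // equal on this key: fall through to the next one
    }
    return 0;
  });
}

const rows = [
  { location: "London", followers: 20 },
  { location: "Brazil", followers: 98 },
  { location: "Brazil", followers: 102 },
];

// Location ascending, then followers descending within each location.
const sorted = sortBy(rows, { location: 1, followers: -1 });
console.log(sorted);
```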

Page 19: De normalised london  aggregation framework overview

Computed Expressions

• Available in $project operations
• Prefix expression language
  • $add:["$field1", "$field2"]
  • $ifNull:["$field1", "$field2"]
  • Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}]
  • Other functions…
    • $divide, $mod, $multiply
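"Prefix expression language" means the operator comes first and its arguments follow in an array, so nested expressions evaluate from the inside out. The toy evaluator below, in plain JavaScript, illustrates that reading for the operators named above. `evaluate` is a hypothetical function, it only looks up top-level fields, and treating a missing field as null is an assumption of the sketch.

```javascript
// Evaluate a prefix expression against a document.
function evaluate(expr, doc) {
  // "$name" is a reference to a top-level field; missing fields count as null here.
  if (typeof expr === "string" && expr.startsWith("$")) {
    const value = doc[expr.slice(1)];
    return value === undefined ? null : value;
  }
  // Anything that is not an operator object is a literal.
  if (expr === null || typeof expr !== "object") return expr;

  // An operator object has exactly one key: { $op: [arg, arg, ...] }.
  const [[op, args]] = Object.entries(expr);
  const values = args.map((arg) => evaluate(arg, doc)); // inner expressions first
  switch (op) {
    case "$add": return values.reduce((a, b) => a + b, 0);
    case "$multiply": return values.reduce((a, b) => a * b, 1);
    case "$divide": return values[0] / values[1];
    case "$mod": return values[0] % values[1];
    case "$ifNull": return values[0] !== null ? values[0] : values[1];
    default: throw new Error(`unsupported operator: ${op}`);
  }
}

const doc = { field1: 10, field3: 5 }; // field2 is deliberately missing
const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };
const total = evaluate(nested, doc);   // $ifNull falls back to field3, then $add runs
console.log(total);
```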

Page 20: De normalised london  aggregation framework overview

Computed Expressions

• String functions
  • $toUpper, $toLower, $substr
• Date field extraction
  • $year, $month, $day, $hour...
• Date arithmetic
• $ifNull
• Ternary conditional
  • Return one of two values based on a predicate

Page 21: De normalised london  aggregation framework overview

conferences, appearances
http://www.10gen.com/events

and meetups
http://www.meetup.com/London-MongoDB-User-Group

download at mongodb.org

We’re Hiring!

Chris Harris
Email : [email protected]
Twitter : cj_harris5