Back to Basics Webinar 5: Introduction to the Aggregation Framework

34
MongoDB Europe 2016 Old Billingsgate, London 15 th November mongodb.com / europe Register with code JD20 for a 20% discount

Transcript of Back to Basics Webinar 5: Introduction to the Aggregation Framework

Page 1: Back to Basics Webinar 5: Introduction to the Aggregation Framework

MongoDB Europe 2016Old Billingsgate, London

15th November

mongodb.com/europe

Register with code JD20 for a 20% discount

Page 2: Back to Basics Webinar 5: Introduction to the Aggregation Framework

Back to Basics 2016 : Webinar 5

Introduction to the Aggregation Framework

Joe Drumgoole

Director of Developer Advocacy, EMEA

@jdrumgoole

V1.1

Page 3: Back to Basics Webinar 5: Introduction to the Aggregation Framework

3

Recap

• Webinar 1 – Introduction to NoSQL

– The different types of NoSQL databases

– MongoDB is a document database

• Webinar 2 – My First Application

– Creating databases and collections

– CRUD, Indexes and Explain

• Webinar 3 – Schema Design

– Dynamic schema

– Embedding approaches

• Webinar 4 – GeoSpatial and Text Indexes

Page 4: Back to Basics Webinar 5: Introduction to the Aggregation Framework

4

The Aggregation Framework

• An analytics engine for MongoDB

• What is analytics?

• Think of the two types of database, OLTP, OLAP

• OLTP : Online Transaction Processing

– airline booking,

– ATMs,

– Taxi booking

• OLAP : Online Analytical Processing

– Which tickets make us most money?

– When do we need to refill our ATMs?

– How many cabs do we need to service the West End of London?

Page 5: Back to Basics Webinar 5: Introduction to the Aggregation Framework

5

What Does This Look Like?

OLTP OLAP

Page 6: Back to Basics Webinar 5: Introduction to the Aggregation Framework

6

OLAP - There Be (Hadoop) Dragons Here

• OLAP queries are often table scans

• The output of the queries is often stored for future analysis and comparison

• Many customers are looking at Spark and Hadoop for OLAP but:

– Complexity is astronomical

– Focused on algorithmic analysis of data (you gotta write a program)

– Requires some significant knowledge of parallel alogrithms and parallel

processing

• The aggregation framework is a kinder gentler tool

• May do what you want in less time with less anguish

Page 7: Back to Basics Webinar 5: Introduction to the Aggregation Framework

7

The Aggregation Framework – A Processing Pipeline

Match Project Group SortLimit

• Think unix pipeline

• The output of one stage is passed to the input of the next stage

• Each stage performs one job

• Stages can be repeated

• The input is a single collection

Page 8: Back to Basics Webinar 5: Introduction to the Aggregation Framework

8

Pipeline Operators

• $match

Filter documents

• $project

Reshape documents

• $group

Summarize documents

• $out

Create new collections

• $sort

Order documents

• $limit/$skip

Paginate documents

• $lookup

Join two collections togther

• $unwind

Expand an array

Page 9: Back to Basics Webinar 5: Introduction to the Aggregation Framework

9

The MoT Data Set

Page 10: Back to Basics Webinar 5: Introduction to the Aggregation Framework

10

Example Document

{ "_id" : ObjectId("5759ee6e8684975e1098af68"),

"TestID" : 400,

"VehicleID" : "278",

"TestDate" : ISODate("2013-04-23T00:00:00Z"),

"TestClassID" : "4",

"TestType" : "N",

"TestResult" : "P",

"TestMileage" : 99284,

"Postcode" : "E",

"Make" : "AUDI",

"Model" : "A3",

"Colour" : "BLACK",

"FuelType" : "P",

"CylinderCapacity" : 1598,

"FirstUseDate" : ISODate("2003-11-11T00:00:00Z“) }

Page 11: Back to Basics Webinar 5: Introduction to the Aggregation Framework

11

Lets Use The Shell

MongoDB Enterprise > use vosa

switched to db vosa

MongoDB Enterprise > db.results2013.findOne()

{

"_id" : ObjectId("577294020cb23533dfbaac18"),

"TestID" : 17,

"VehicleID" : 28,

"TestDate" : ISODate("2013-05-02T00:00:00Z"),

"TestClassID" : "2",

"TestType" : "N",

"TestResult" : "P",

"TestMileage" : 46414,

"Postcode" : "BN",

"Make" : "SUZUKI",

"Model" : "UNCLASSIFIED",

"Colour" : "GREEN",

"FuelType" : "P",

"CylinderCapacity" : 398,

"FirstUseDate" : ISODate("1993-08-11T00:00:00Z")

}

Page 12: Back to Basics Webinar 5: Introduction to the Aggregation Framework

12

$limit

MongoDB Enterprise > db.results2013.aggregate([ { "$limit" :2 } ] )

{

"_id" : ObjectId("577294020cb23533dfbaac18"),

"TestID" : 17,

"VehicleID" : 28,

"TestDate" : ISODate("2013-05-02T00:00:00Z"),

"TestClassID" : "2",

"TestType" : "N",

"TestResult" : "P",

"TestMileage" : 46414,

"Postcode" : "BN",

"Make" : "SUZUKI",

"Model" : "UNCLASSIFIED",

Page 13: Back to Basics Webinar 5: Introduction to the Aggregation Framework

13

Let’s Make a Small Collection

> db.results2013.aggregate( [ { "$limit" : 10000 }, {"$out" : "results10k" } ] )

> db.results10k.count()

10000

> db.results10k.findOne()

{

"_id" : ObjectId("577294020cb23533dfbaac18"),

"TestID" : 17,

"VehicleID" : 28,

"TestDate" : ISODate("2013-05-02T00:00:00Z"),

"TestClassID" : "2",

"TestType" : "N",

"TestResult" : "P",

"TestMileage" : 46414,

"Postcode" : "BN",

"Make" : "SUZUKI",

"Model" : "UNCLASSIFIED",

"Colour" : "GREEN",

"FuelType" : "P",

"CylinderCapacity" : 398,

"FirstUseDate" : ISODate("1993-08-11T00:00:00Z")

}

Page 14: Back to Basics Webinar 5: Introduction to the Aggregation Framework

14

$match

…aggregate([ { "$limit" :2000 },

{ "$match" : { "FirstUseDate" : { "$ne" : "NULL" }}} ])

{ "_id" : ObjectId("577294020cb23533dfbaac18"), "TestID" : 17, "VehicleID" : 28,

"TestDate" : ISODate("2013-05-02T00:00:00Z"), "TestClassID" : "2", "TestType" :

"N", "TestResult" : "P", "TestMileage" : 46414, "Postcode" : "BN", "Make" :

"SUZUKI", "Model" : "UNCLASSIFIED", "Colour" : "GREEN", "FuelType" : "P",

"CylinderCapacity" : 398, "FirstUseDate" : ISODate("1993-08-11T00:00:00Z") }

{ "_id" : ObjectId("577294020cb23533dfbaac19"), "TestID" : 22, "VehicleID" : 33,

"TestDate" : ISODate("2013-06-07T00:00:00Z"), "TestClassID" : "2", "TestType" :

"N", "TestResult" : "P", "TestMileage" : 15605, "Postcode" : "PE", "Make" :

"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Colour" : "BLACK", "FuelType" : "P",

"CylinderCapacity" : 150, "FirstUseDate" : ISODate("1962-01-01T00:00:00Z") }

{ "_id" : ObjectId("577294020cb23533dfbaac1a"), "TestID" : 44, "VehicleID" : 49,

"TestDate" : ISODate("2013-08-09T00:00:00Z"), "TestClassID" : "4", "TestType" :

"N", "TestResult" : "PRS", "TestMileage" : 72694, "Postcode" : "SO", "Make" :

"UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Colour" : "BLACK", "FuelType" : "P",

"CylinderCapacity" : 998, "FirstUseDate" : ISODate("2001-05-16T00:00:00Z") }

...

Page 15: Back to Basics Webinar 5: Introduction to the Aggregation Framework

15

$project (1 of 2)

ageinusecs = { "$subtract" : [ "$TestDate", "$FirstUseDate" ] }

ageinyears = { "$divide" :[ ageinusecs , (1000*3600*24*365) ] }

floorage = { "$floor" : ageinyears }

ispass = { "$cond" : [{"$eq": ["$TestResult","P"]},1,0]}

project = { "$project" : { "Make” :1,

"Model” :1,

"VehicleID" :1,

"TestResult” :1,

"Age” :floorage,

"pass” :ispass }}

Page 16: Back to Basics Webinar 5: Introduction to the Aggregation Framework

16

$project (2 of 2)

MongoDB Enterprise > db.nonulldates.aggregate( [ project ] )

{ "_id" : ObjectId("577294020cb23533dfbaac18"), "VehicleID" : 28, "TestResult" : "P", "Make" : "SUZUKI", "Model" : "UNCLASSIFIED", "Age" : 19, "pass" : 1 }

{ "_id" : ObjectId("577294020cb23533dfbaac19"), "VehicleID" : 33, "TestResult" : "P", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 51, "pass" : 1 }

{ "_id" : ObjectId("577294020cb23533dfbaac1a"), "VehicleID" : 49, "TestResult" : "PRS", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 0 }

{ "_id" : ObjectId("577294020cb23533dfbaac1b"), "VehicleID" : 54, "TestResult" : "P", "Make" : "NISSAN", "Model" : "MICRA GX", "Age" : 13, "pass" : 1 }

{ "_id" : ObjectId("577294020cb23533dfbaac1c"), "VehicleID" : 54, "TestResult" : "F", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 13, "pass" : 0 }

{ "_id" : ObjectId("577294020cb23533dfbaac1d"), "VehicleID" : 63, "TestResult" : "P", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 1 }

{ "_id" : ObjectId("577294020cb23533dfbaac1e"), "VehicleID" : 63, "TestResult" : "F", "Make" : "UNCLASSIFIED", "Model" : "UNCLASSIFIED", "Age" : 12, "pass" : 0 }

{ "_id" : ObjectId("577294020cb23533dfbaac1f"), "VehicleID" : 93, "TestResult" : "P", "Make" : "BMW", "Model" : "318ti SE COMPACT", "Age" : 12, "pass" : 1 }

Page 17: Back to Basics Webinar 5: Introduction to the Aggregation Framework

17

$group

countMakes = { "$group" : {{ "_id" : "$Make"} , "total" : {"$sum" : 1 }}}

db.nonulldates.aggregate( [ countMakes ])

{ "_id" : "IVECO", "total" : 1 }

{ "_id" : "ISUZU", "total" : 1 }

{ "_id" : "YAMAHA", "total" : 1 }

{ "_id" : "OLDSMOBILE", "total" : 1 }

{ "_id" : "KAWASAKI", "total" : 1 }

{ "_id" : "MASERATI", "total" : 1 }

{ "_id" : "BENELLI", "total" : 1 }

{ "_id" : "BENTLEY", "total" : 3 }

{ "_id" : "AUDI", "total" : 26 }

{ "_id" : "SMART", "total" : 2 }

{ "_id" : "HARLEY-DAVIDSON", "total" : 1 }

Page 18: Back to Basics Webinar 5: Introduction to the Aggregation Framework

18

Summary

• A pipeline of operations

• Select, project, group, sort

• $out must appear last in an aggregation pipeline

• There are a range of accumulators (see the group by

documentation)

• Very powerful way to reshape and analyse data

• Shard aware to gain maxinum performance for large clusters

Page 19: Back to Basics Webinar 5: Introduction to the Aggregation Framework

Q&A

Page 20: Back to Basics Webinar 5: Introduction to the Aggregation Framework
Page 21: Back to Basics Webinar 5: Introduction to the Aggregation Framework

21

• This is slide content

Page 22: Back to Basics Webinar 5: Introduction to the Aggregation Framework
Page 23: Back to Basics Webinar 5: Introduction to the Aggregation Framework
Page 24: Back to Basics Webinar 5: Introduction to the Aggregation Framework
Page 25: Back to Basics Webinar 5: Introduction to the Aggregation Framework

25

Page 26: Back to Basics Webinar 5: Introduction to the Aggregation Framework

26

Page 27: Back to Basics Webinar 5: Introduction to the Aggregation Framework

LOREM

IPSUM

LOREM

IPSUM

LOREM

IPSUM

LOREM

IPSUM

Sollicitudin VenenatisLOREM

IPSUM

LOREM

IPSUM

LOREM

IPSUM

LOREM

IPSUM

Graphic Element Examples

Page 28: Back to Basics Webinar 5: Introduction to the Aggregation Framework

Porta Ultricies

Commodo Porta

Graph Examples

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Category 1 Category 2 Category 3 Category 4

Series 1

Series 2

Page 29: Back to Basics Webinar 5: Introduction to the Aggregation Framework

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

Category 1 Category 2 Category 3 Category 4

Series 1

Series 2

Page 30: Back to Basics Webinar 5: Introduction to the Aggregation Framework

{

_id : ObjectId("4c4ba5e5e8aabf3"),

employee_name: "Dunham, Justin",

department : "Marketing",

title : "Product Manager, Web",

report_up: "Neray, Graham",

pay_band: “C",

benefits : [

{ type : "Health",

plan : "PPO Plus" },

{ type : "Dental",

plan : "Standard" }

]

}

Code/Highlight Example

Page 31: Back to Basics Webinar 5: Introduction to the Aggregation Framework

Aggregation Framework Agility Backup Big Data Briefcase

Buildings Business Intelligence Camera Cash Register Catalog

Chat Checkmark Checkmark Cloud Commercial Contract

Computer Content Continuous Development Credit Card Customer Success

Page 32: Back to Basics Webinar 5: Introduction to the Aggregation Framework

Data Center Data Variety Data Velocity Data Volume Data Warehouse Database

Dialogue Directory Documents Downloads Drivers Dynamic Schema

EDW Integration Faster Time to Market File Transfer Flexible Gear Hadoop

Health Check High Availability Horizontal Scaling Integrating into Infrastructure Internet of Things Iterative Development

Page 33: Back to Basics Webinar 5: Introduction to the Aggregation Framework

Life Preserver Line Graph Lock Log Data Lower Cost Magnifying Glass

Man Mobile Phone Meter Monitoring Music New Apps

New Data Types Online Open Source Parachute Personalization Pin

Platform Certification Product Catalog Puzzle Pieces RDBMS Realtime Analytics Rich Querying

Page 34: Back to Basics Webinar 5: Introduction to the Aggregation Framework

Life Preserver RSS Scalability Scale Secondary Indexing Steering Wheel

Stopwatch Text Search Tick Data Training Transmission Tower Trophy

Woman World