MongoDB World New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB including • MongoDB 2.6 • Sharding • Replication • Aggregation
http://world.mongodb.com Save 25% with discount code 25MarkHelmstetter
What’s New in MongoDB 2.6
Senior Solutions Architect, MongoDB
Mark Helmstetter
@helmstetter
Recent Release History
2.2 (Aug '12) – Aggregation Framework, Multi-Data Center Deployments, Improved Performance and Concurrency
2.4 (Mar '13) – Hash-based Sharding, Text Search, V8 JavaScript engine, Faster counts
2.6 (Apr '14)
What’s New
• Query Engine Improvements
• New Write Operation Protocol
• Update Improvements
• Aggregation Framework Enhancements
• Integrated Text Search
• Geospatial Enhancements
• Enterprise-Grade Security
• Operations Improvements
Query Engine Improvements
Query Engine Improvements
• Index Intersection
• Index Filters
• Query Plan Cache Methods
• count() now supports use of hint()
• maxTimeMS
• parallelCollectionScan
Index Intersection
• Simpler ad-hoc queries
• Existing Indexes can be combined to optimize a query
– Less Index Maintenance – Smaller Working Set – Lower Write Overhead – More Adaptive
Index Intersection
• Limited to 2 indexes
• "Self-intersections" of a single index (up to 10) – Intersect 2 separate index scans of the same index – Example: {tags: {$all: ["mongodb", "java"]}}
• Query + sort: intersection cannot be used when the sort requires an index separate from the query predicates
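As a sketch of how intersection can apply, suppose two single-field indexes already exist (the collection and field names here are illustrative, not from the deck):

```
// Two existing single-field indexes
db.orders.ensureIndex( { qty: 1 } )
db.orders.ensureIndex( { item: 1 } )

// This predicate touches both fields; the optimizer may answer it by
// intersecting the two index scans instead of requiring a compound index
db.orders.find( { item: "abc123", qty: { $gt: 15 } } ).explain()
```

An AND_SORTED or AND_HASH stage in the (verbose) explain output indicates that intersection was used.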
Compound Indexes vs. Index Intersection
• Compound index – almost always faster
• Index intersection more for ad-hoc / varied workload – (a, b) – (b, c) – (c, d) – (b) – …
Index Filters
• Determine which indexes the optimizer evaluates for a query shape (query, sort, projection)
• If an index filter exists for a query shape, the optimizer only considers those indexes specified in the filter
– Ignores hint() – optimizer may still select a collection scan
• Filters exist for the duration of the server process and do not persist after shutdown
• Use the planCacheClearFilters command to remove them manually
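A sketch of setting a filter with the planCacheSetFilter command (collection name, query shape, and index are illustrative):

```
// Restrict the optimizer to the { item: 1 } index for this query shape
db.runCommand( {
    planCacheSetFilter: "orders",
    query: { item: "abc123", qty: { $gt: 15 } },
    indexes: [ { item: 1 } ]
} )

// List the filters currently in effect
db.runCommand( { planCacheListFilters: "orders" } )
```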
Query Plan Cache Methods
• Provide visibility and control over the plan cache
– db.collection.getPlanCache()
– PlanCache.help()
– PlanCache.listQueryShapes()
– PlanCache.getPlansByQuery()
– PlanCache.clearPlansByQuery()
– PlanCache.clear()
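Putting those methods together in the shell (the collection and query shape are illustrative):

```
// Inspect and clear the plan cache for a collection
var planCache = db.orders.getPlanCache()
planCache.listQueryShapes()                        // query shapes with cached plans
planCache.getPlansByQuery( { qty: { $gt: 15 } } )  // cached plans for one shape
planCache.clear()                                  // drop all cached plans
```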
maxTimeMS
• Specifies a cumulative time limit in milliseconds for processing operations on a cursor or command
db.collection.find( { year: 2006 } ).maxTimeMS(50)
maxTimeMS – Java Example
try {
    DBCursor cursor = c.find()
        .maxTime(50, TimeUnit.MILLISECONDS);
    while (cursor.hasNext()) {
        DBObject next = cursor.next();
        // do something
    }
} catch (MongoExecutionTimeoutException t) {
    // handle or ignore?
}
Parallel Collection Scan
• Allows applications to use multiple parallel cursors when reading all documents from a collection
• parallelCollectionScan command returns a document that contains an array of cursor information
List<Cursor> cursors = c.parallelScan(
    ParallelScanOptions.builder()
        .numCursors(10)
        .build());
New Write Operation Protocol
GetLastError
[Diagram: clients send a write request to mongod; the write is executed; mongod must remember what happened with this write until the client issues a separate GLE (getLastError) request]
New Write Commands
• Declare write concern up front
• Write command always gets a response – content depends on writeConcern setting
• Three commands: insert, update, delete
• All support sending multiple documents
• Declare whether multiple ops must be executed in order given or unordered (any order)
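For instance, a raw insert command might look like the following (the collection name and documents are illustrative); the response always comes back, shaped by the declared write concern:

```
db.runCommand( {
    insert: "items",
    documents: [ { _id: 1, x: 0 }, { _id: 2, x: 1 } ],
    ordered: false,                    // ops may execute in any order
    writeConcern: { w: "majority" }    // declared up front, no separate GLE
} )
// The response reports ok, n, and any writeErrors / writeConcernError
```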
mongo shell - WriteResult
> db.items.update( { x: 1, y: 0 }, { $set: { updated: true } } )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Write Operations
Ordered
• Processed serially
• Driver groups adjacent ops only
• stops on first error
• returns that error
db.c.initializeOrderedBulkOp()
Unordered
• Executes ops in any order
• driver groups "same" ops together
• all ops will be tried
• returns all errors
db.c.initializeUnorderedBulkOp()
Bulk.insert()
var bulk = db.items.initializeUnorderedBulkOp();
bulk.insert( { _id: "abc123", x: 0 } );
bulk.insert( { _id: "ijk123", x: 1 } );
bulk.insert( { _id: "mop123", x: 1 } );
bulk.execute();
BulkWriteResult
BulkWriteResult({
    "writeErrors" : [ ],
    "writeConcernErrors" : [ ],
    "nInserted" : 3,
    "nUpserted" : 0,
    "nMatched" : 0,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]
})
Bulk.find()
• Bulk.find.removeOne()
• Bulk.find.remove()
• Bulk.find.replaceOne()
• Bulk.find.updateOne()
• Bulk.find.update()
Bulk.find() Example
var bulk = db.items.initializeUnorderedBulkOp();
bulk.find( { x: 1 } ).remove();
bulk.find( { x: 0 } ).update( { $set: { y: 0 } } );
bulk.execute();
Update Improvements
New Update Operators
• $bit operator now supports the bitwise xor operation
• $min and $max operators perform a conditional update depending on the relative size of the specified value and the current value of a field
• $currentDate operator sets the value of a field to the current date
• $mul operator for multiplicative increments in update operations
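A short sketch combining two of the new operators (the collection and field names are illustrative):

```
// Multiply price by 1.25 and stamp the time of the change
db.products.update(
    { _id: 1 },
    { $mul: { price: 1.25 },
      $currentDate: { lastModified: true } }
)
```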
$min / $max examples
Given: { _id: 1, highScore: 800, lowScore: 200 }
db.scores.update( { _id: 1 }, { $min: { lowScore: 150 } } )
db.scores.update( { _id: 1 }, { $max: { highScore: 950 } } )
Enhanced $push Array Operator
• $push operator has enhanced support for the $sort, $slice, and $each modifiers and supports a new $position modifier.
– $each no longer has to be first – $sort without $each/$slice – $slice from beginning or end of array (not middle yet)
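An illustrative sketch of the enhanced modifiers (collection and field names are assumptions):

```
// Keep only the top 5 scores: append, sort descending, then slice
db.students.update(
    { _id: 1 },
    { $push: { scores: { $each: [ 90, 82 ], $sort: -1, $slice: 5 } } }
)

// New $position modifier: insert at the front of the array
db.students.update(
    { _id: 1 },
    { $push: { scores: { $each: [ 100 ], $position: 0 } } }
)
```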
Updates preserve field order
• MongoDB preserves the order of the document fields following write operations except for the following cases:
– The _id field is always the first field in the document.
– Updates that include renaming of field names may result in the reordering of fields in the document.
Field Name Enforcement
• Updates cannot use update operators (e.g. $set) to target fields with empty field names (i.e. "").
• Updates no longer support saving field names that contain a dot (.) or a field name that starts with a dollar sign ($).
Enforce Index Key Length Limit
• Stronger enforcement of the limit on index key length
• Index creation, inserts, updates, and chunk migration all error when the index key length is exceeded
• Run db.upgradeCheckAllDBs() before upgrading to find existing keys that violate this limit
• Set the failIndexKeyTooLong parameter to false to revert to the 2.4 behavior
Other Improvements
• Set allowed on immutable fields – db.foo.update( { _id: 1 }, { $set: { _id: 1, foo: "bar" } } )
• Cleanup bad data, invalid field names ($unset/$rename etc)
Aggregation Framework Enhancements
Aggregation in 2.4
• Aggregation framework is used to process data and return computed results
– db.collection.aggregate( <pipeline> ) – runCommand(..) – Drivers' aggregate() method
• Result of an aggregation is a single document
– result field with an array containing the results – Limited to the max BSON size of 16MB
Enhancements
• aggregate() can now return a cursor – no more 16MB limit!
• New explain option to aid analysis of aggregation operations
• External-disk-based sorting
New aggregate options
• db.collection.aggregate(pipeline, options)
• Options – explain – allowDiskUse – cursor
New pipeline stages
• $out stage to output to a collection
• $redact stage to allow additional control over access to the data
New Set Operators
• $setIsSubset!
• $setEquals!
• $setDifference!
• $setIntersection!
• $setUnion!
• $allElementsTrue!
• $anyElementTrue!
New / modified operators
• $let and $map operators to allow for the use of variables
• $literal operator and $size operator
• $cond expression now accepts either an object or an array
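A hedged sketch combining a few of these in a $project stage (the collection and field names are illustrative):

```
// $setIsSubset, $size and $literal in a projection
db.surveys.aggregate( [
    { $project: {
        allAnswered: { $setIsSubset: [ [ "q1", "q2" ], "$answered" ] },
        nAnswered:   { $size: "$answered" },
        maxScore:    { $literal: 100 }
    } }
] )
```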
Cursors - recap
• find() queries a collection and returns a cursor
• To access documents, iterate the cursor
• The mongo shell automatically iterates the cursor (up to DBQuery.shellBatchSize times) and prints the results
Aggregation cursors in 2.6
> commandResult = db.runCommand({
    aggregate: "coll",
    pipeline: [ { $match : { } } ],
    cursor: { batchSize: 1 }
});
> db.coll.aggregate(
    [ { $match : { } } ],
    { cursor : { batchSize : 1 } } )
Aggregation cursors in 2.6
> var cursor = new DBCommandCursor(db.getMongo(), commandResult);
> cursor
... 20 documents (including results from cursor.firstBatch) ...
Type "it" for more
$out in 2.6
• Allows specifying the name of a collection to which the result of the aggregation should be written
• A new collection is created if the one specified in $out does not already exist
• If the collection already exists, it is replaced with the new results
$out in 2.6 - examples
> db.runCommand({
    aggregate: "test",
    pipeline: [ { $match : { } }, { $out : "out" } ]
});
> db.coll.aggregate([ { $match : { } }, { $out : "out" } ])
$out in 2.6
The $out collection cannot be sharded or capped.
aggregate failed: {
    "errmsg" : "exception: namespace 'records.users' is sharded so it can't be used for $out",
    "code" : 17017,
    "ok" : 0
}
$out in 2.6
The $out stage must be the last stage in the pipeline.
aggregate failed: {
    "code" : 16991,
    "ok" : 0,
    "errmsg" : "exception: $out can only be the final stage in the pipeline"
}
$out in 2.6 – read preference
The only replica set member that can be used with $out is the primary. With a read preference other than primary, the driver will route the aggregation to the primary and log a warning.
$out explained
• A unique, empty temporary collection is created (prepTempCollection)
• The original output collection's indexes are created on the temporary collection
• The output of the pipeline is written to the temporary collection in batches of 16MB
• Pipeline errors (including write errors, e.g. duplicate _ids) result in destruction of the temporary collection
• On pipeline success, the temporary collection is atomically renamed into place
Aggregation – Java Driver
• aggregate(DBObject firstOp, DBObject... additionalOps) is deprecated
• aggregate(List<DBObject> pipeline, AggregationOptions options)
– Returns Cursor
Aggregation – Java Example
List<DBObject> pipeline = new ArrayList<DBObject>();
pipeline.add(new BasicDBObject("$match",
    new BasicDBObject("countryCode", "USA")));
AggregationOptions options = AggregationOptions.builder()
    .outputMode(AggregationOptions.OutputMode.CURSOR)
    .build();
Cursor c = coll.aggregate(pipeline, options);
while (c.hasNext()) {
    DBObject next = c.next();
    System.out.println(next);
}
Integrated Text Search
Text Search
• Now production-ready, enabled by default
• Integrated with query system – find() – aggregate() – Composable with other query operators
• Multi-language document support
Possibilities with Text Search
• Relevance ranking
• Boolean operators
• Language-specific tokenization and stemming
• Fielded search
• Field-weighted scoring
• Stop words
• Negated query terms (-)
Create a Text Index
• Index specific fields – { "subject" : "text", "content" : "text" }
• Index all fields – { "$**" : "text" }
• Options – Default language – Weights – Language field name
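As an illustrative sketch (the collection, field names, and weight values are assumptions, not from the deck):

```
// Weighted text index over two fields with a default language
db.blog.ensureIndex(
    { content: "text", keywords: "text" },
    { weights: { content: 10, keywords: 5 },
      default_language: "english" }
)
```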
Search by
• Single Word – "coffee"
• Multiple Words – "bake coffee cake"
• Phrase – "\"bake coffee cake\""
• Exclude Terms – "bake coffee -cake"
Search Behavior
• Case-insensitive – Query for "Coffee" matches "coffee"
• Diacritic-sensitive – Except where the root doesn't have the diacritic – Query for "café" matches "cafe" (in French) – Query for "café" doesn't match "cafe" (in English)
• Language-Specific – Stemming and tokenizing – Stop words
find()
db.articles.find(
    { $text : { $search : "dog" } }
)
• Specify the language for the search
db.articles.find(
    { $text : { $search : "chien", $language : "fr" } }
)
aggregate()
db.articles.aggregate( [
    { $match : { $text : { $search : "bake coffee cake" } } }
] )
• Specify the language for the search
db.articles.aggregate( [
    { $match : { $text : { $search : "chien", $language : "fr" } } }
] )
Geospatial Enhancements
2dsphere indexes – Version 2
• Adds support for additional GeoJSON objects: – MultiPoint – MultiLineString – MultiPolygon – GeometryCollection
• Sparse by default
Geospatial Enhancements
• Support for geospatial query clauses in $or expressions
• Stronger validation of geospatial queries
• $uniqueDocs deprecated, geospatial queries no longer return duplicated results when a document matches the query multiple times
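A sketch of a geospatial clause inside $or, now allowed in 2.6 (the collection, field names, and coordinates are illustrative):

```
// Match either by category or by proximity (radius in radians)
db.places.find( {
    $or: [
        { category: "park" },
        { loc: { $geoWithin: {
            $centerSphere: [ [ -73.97, 40.77 ], 5 / 3963.2 ] } } }
    ]
} )
```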
Security
Security Starts with the Database
Database
• Must have security built in
• Must be able to integrate with corporate standards
• Must be general purpose
MongoDB Enterprise-Grade Security
• Authentication: In Database, LDAP*, Kerberos*, x.509 Certificates*
• Authorization: Built-in Roles, User-Defined Roles, Collection/Database Access Control, Field Level Security
• Auditing: Admin Operations*, Queries (via Partner Solutions)
• Encryption: Network – SSL (with FIPS 140-2)*, Disk – Partner Solutions
*Requires MongoDB Enterprise
Security – New in 2.6
• Authentication with LDAP (Enterprise)
• x.509 Certificate Authentication (Enterprise)
• User defined roles
• Auditing (Enterprise)
• Enterprise edition for Windows
• Windows Kerberos Support (Enterprise)
Authentication
• Confirming identity for everything accessing the database
• Create unique credentials for each entity – Clients, admins/devs, software systems, other cluster nodes
• Integrated with the corporate identity management system
• Enforce password policies
Authentication in MongoDB
• Integrate with choice of corporate authentication mechanisms
• LDAP server integration eliminates duplication with the enterprise user directory
• Kerberos services, with support for Active Directory
• x.509 Certificates, for clients and inter-cluster nodes
• Red Hat Identity Management
Authorization
• Defines what an entity can do in the database
• Grant access only to the specific data needed
• Control which actions an entity can perform
• Group common access privileges into roles
Authorization in MongoDB
• User-defined roles assign fine-grained privileges
• Delegate roles across teams
User Defined Roles - Concepts
• User – Identified by a (name, db) pair – Defined in the admin.system.users collection – Users are assigned roles
• Role – A grouping of privileges – May also contain other roles – Defined in the admin.system.roles collection
• Privilege – A set of actions on a given resource
User Defined Roles - Concepts
• Resource – An object that can be acted on – A (db name, collection name) pair (or {cluster: true}) – Either or both of db name and collection name may be wildcarded
• Action – An operation that can be performed in the system – e.g. find, remove, update, insert, commands …
Create a Custom Role
db.createRole({ role: "myClusterWideAdmin",
                privileges: [], roles: [] })
db.grantPrivilegesToRole(
    "myClusterWideAdmin",
    [ { resource: { cluster: true },
        actions: [ "addShard" ] },
      { resource: { db: "test", collection: "" },
        actions: [ "enableSharding", "moveChunk" ] },
      { resource: { db: "users", collection: "users" },
        actions: [ "update", "insert", "remove" ] },
      { resource: { db: "", collection: "" },
        actions: [ "find" ] } ])
Built-In Roles in 2.6
• backup and restore – minimal privileges for the task – backup role suitable for the MMS Backup Agent
• dbOwner – union of the readWrite, dbAdmin and userAdmin roles
read readAnyDatabase clusterMonitor
readWrite readWriteAnyDatabase hostManager
dbAdmin dbAdminAnyDatabase clusterManager
userAdmin userAdminAnyDatabase dbOwner
clusterAdmin backup restore
root
Collection Level Access Control
• Done through user-defined roles – the built-in roles all grant privileges with db-level granularity
• When granting privileges to a role, you must specify the resource:
{ cluster: true } – The cluster resource
{ db: "foo", collection: "" } – Any non-system collection within db foo
{ db: "", collection: "foo" } – Collection foo in any db
{ db: "foo", collection: "bar" } – Collection bar in db foo
{ db: "", collection: "" } – Any resource except system collections and the cluster resource
{ anyResource: true } – Absolutely any resource
Auditing in MongoDB
• Records – Schema operations & database configuration changes – Authentication & authorization activities
• Configurable filters
• Write log to multiple destinations in JSON or BSON
• Partner solutions for capture of read / write activity – IBM Guardium
MongoDB Field Level Security
User 1 - Confidential - Secret
{ _id: 'xyz', field1: { level: [ "Confidential" ], data: 123, blah: "foo" }, field2: { level: [ "Top Secret" ], data: 456, blah: "bar" }, field3: { level: [ "Unclassified" ], data: 789, blah: "baz" } }
User 2 - Top Secret - Secret - Confidential
User 3 - Unclassified
• Enables a single document to store data with multiple security levels
Field Level Security
User 1 - Confidential - Secret
{ _id: 'xyz', field1: { level: [ "Confidential" ], data: 123, blah: "foo" }, field2: { level: [ "Top Secret" ], data: 456 }, field3: { level: [ "Unclassified" ], data: 789, blah: "baz" } }
User 2 - Top Secret - Secret - Confidential
User 3 - Unclassified
Field Level Security
User 1 - Confidential - Secret
{ _id: 'xyz', field1: { level: [ "Confidential" ], data: 123, blah: "foo" }, field2: { level: [ "Top Secret" ], data: 456, blah: "bar" }, field3: { level: [ "Unclassified" ], data: 789, blah: "baz" } }
User 2 - Top Secret - Secret - Confidential
User 3 - Unclassified
Field Level Security
User 1 - Confidential - Secret
{ _id: 'xyz', field1: { level: [ "Confidential" ], data: 123 }, field2: { level: [ "Top Secret" ], data: 456 }, field3: { level: [ "Unclassified" ], data: 789, blah: "baz" } }
User 2 - Top Secret - Secret - Confidential
User 3 - Unclassified
Field Level Security: Implementation
Operational Improvements
Improving Performance and Scalability
• Query Router Connection Pooling
• Bulk Write Operations
• Resource Overload Protection with maxTimeMS
usePowerOf2Sizes now default
• Default allocation strategy for all new collections
• Uses more storage relative to total document size
• Less fragmentation
• More efficient for Insert/Update/Delete workloads
• Fewer document moves
• More predictable sizing
Background index build on secondaries
• If you initiate a background index build on a primary, the secondaries will replicate the index build in the background
Automatic rebuild of interrupted index builds
• If a standalone or a primary shuts down uncleanly during an index build, the index build is restarted when the instance restarts
• After a clean shutdown, or if a user kills the index build, interrupted index builds do not automatically restart when the server restarts
• If a secondary instance terminates during an index build, the index build is restarted when the instance restarts
Enhanced Sharding and Replication Administration
• cleanupOrphaned command to remove orphaned documents from a shard
• mergeChunks command to combine contiguous chunks located on a single shard
• rs.printReplicationInfo() and rs.printSlaveReplicationInfo() methods to provide a formatted report of the status of a replica set
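The two sharding commands above can be sketched from the shell like this (the namespace and chunk bounds are illustrative, not from the deck):

```
// Remove documents that don't belong on this shard
// (run against the shard's primary)
db.adminCommand( { cleanupOrphaned: "test.users" } )

// Merge two contiguous chunks residing on the same shard
// (run against a mongos)
db.adminCommand( {
    mergeChunks: "test.users",
    bounds: [ { x: 0 }, { x: 10 } ]
} )
```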
Networking
• Removed the upward limit on maxIncomingConnections for mongod and mongos
• Connection pools for a mongos instance may be used by multiple MongoDB servers.
• New cursor.maxTimeMS() and corresponding maxTimeMS option for commands to specify a time limit.
Operational Improvements
• Background Secondary Indexing
• Mixed SSL Connections
• Expanded SNMP Support