MongoDB World New York City, June 23-25
#MongoDBWorld
See what’s next in MongoDB including • MongoDB 2.6 • Sharding • Replication • Aggregation
http://world.mongodb.com Save 25% with discount code 25MarkHelmstetter
What’s New in MongoDB 2.6
Senior Solutions Architect, MongoDB
Mark Helmstetter
@helmstetter
Recent Release History
2.2 (Aug '12) – Aggregation Framework, Multi-Data Center Deployments, Improved Performance and Concurrency
2.4 (Mar '13) – Hash-based Sharding, Text Search, V8 JavaScript engine, Faster counts
2.6 (Apr '14)
What’s New
• Query Engine Improvements
• New Write Operation Protocol
• Update Improvements
• Aggregation Framework Enhancements
• Integrated Text Search
• Geospatial Enhancements
• Enterprise-Grade Security
• Operations Improvements
Query Engine Improvements
Query Engine Improvements
• Index Intersection
• Index Filters
• Query Plan Cache Methods
• count() now supports use of hint()
• maxTimeMS
• parallelCollectionScan
Index Intersection
• Simpler ad-hoc queries
• Existing Indexes can be combined to optimize a query
– Less Index Maintenance – Smaller Working Set – Lower Write Overhead – More Adaptive
Index Intersection
• Limited to 2 indexes
• "Self-intersections" of a single index (up to 10) – Intersect 2 separate index scans of the same index – Example: {tags: {$all: ["mongodb", "java"]}}
• Query + sort: intersection cannot be used when the sort requires an index separate from the query predicates
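As a sketch of how intersection can apply, suppose two single-field indexes already exist (the collection and field names here are illustrative, not from the deck):

```
// Two existing single-field indexes
db.orders.ensureIndex( { qty: 1 } )
db.orders.ensureIndex( { item: 1 } )

// This predicate touches both fields; the optimizer may answer it by
// intersecting the two index scans instead of requiring a compound index
db.orders.find( { item: "abc123", qty: { $gt: 15 } } ).explain()
```

An AND_SORTED or AND_HASH stage in the (verbose) explain output indicates that intersection was used.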
Compound Indexes vs. Index Intersection
• Compound index – almost always faster
• Index intersection more for ad-hoc / varied workload – (a, b) – (b, c) – (c, d) – (b) – …
Index Filters
• Determine which indexes the optimizer evaluates for a query shape (query, sort, projection)
• If an index filter exists for a query shape, the optimizer only considers those indexes specified in the filter
– Ignores hint() – optimizer may still select a collection scan
• Filters exist for the duration of the server process and do not persist after shutdown
• Use the planCacheClearFilters command to remove them manually
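A sketch of setting a filter with the planCacheSetFilter command (collection name, query shape, and index are illustrative):

```
// Restrict the optimizer to the { item: 1 } index for this query shape
db.runCommand( {
    planCacheSetFilter: "orders",
    query: { item: "abc123", qty: { $gt: 15 } },
    indexes: [ { item: 1 } ]
} )

// List the filters currently in effect
db.runCommand( { planCacheListFilters: "orders" } )
```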
Query Plan Cache Methods
• Provide visibility and control over the plan cache
– db.collection.getPlanCache()
– PlanCache.help()
– PlanCache.listQueryShapes()
– PlanCache.getPlansByQuery()
– PlanCache.clearPlansByQuery()
– PlanCache.clear()
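Putting those methods together in the shell (the collection and query shape are illustrative):

```
// Inspect and clear the plan cache for a collection
var planCache = db.orders.getPlanCache()
planCache.listQueryShapes()                        // query shapes with cached plans
planCache.getPlansByQuery( { qty: { $gt: 15 } } )  // cached plans for one shape
planCache.clear()                                  // drop all cached plans
```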
maxTimeMS
• Specifies a cumulative time limit in milliseconds for processing operations on a cursor or command
db.collection.find( { year: 2006 } ).maxTimeMS(50)
maxTimeMS – Java Example
try {
    DBCursor cursor = c.find()
        .maxTime(50, TimeUnit.MILLISECONDS);
    while (cursor.hasNext()) {
        DBObject next = cursor.next();
        // do something
    }
} catch (MongoExecutionTimeoutException t) {
    // handle or ignore?
}
Parallel Collection Scan
• Allows applications to use multiple parallel cursors when reading all documents from a collection
• parallelCollectionScan command returns a document that contains an array of cursor information
List<Cursor> cursors = c.parallelScan(
    ParallelScanOptions.builder()
        .numCursors(10)
        .build());
New Write Operation Protocol
GetLastError
[Diagram: clients send a write request to mongod; the write is executed; mongod must remember what happened with this write until the client issues a separate GLE (getLastError) request]
New Write Commands
• Declare write concern up front
• Write command always gets a response – content depends on writeConcern setting
• Three commands: insert, update, delete
• All support sending multiple documents
• Declare whether multiple ops must be executed in order given or unordered (any order)
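For instance, a raw insert command might look like the following (the collection name and documents are illustrative); the response always comes back, shaped by the declared write concern:

```
db.runCommand( {
    insert: "items",
    documents: [ { _id: 1, x: 0 }, { _id: 2, x: 1 } ],
    ordered: false,                    // ops may execute in any order
    writeConcern: { w: "majority" }    // declared up front, no separate GLE
} )
// The response reports ok, n, and any writeErrors / writeConcernError
```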
mongo shell - WriteResult
> db.items.update( { x: 1, y: 0 }, { $set: { updated: true } } )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
Write Operations
Ordered
• Processed serially
• Driver groups adjacent ops only
• stops on first error
• returns that error
db.c.initializeOrderedBulkOp()
Unordered
• Executes ops in any order
• driver groups "same" ops together
• all ops will be tried
• returns all errors
db.c.initializeUnorderedBulkOp()
Bulk.insert()
var bulk = db.items.initializeUnorderedBulkOp();
bulk.insert( { _id: "abc123", x: 0 } );
bulk.insert( { _id: "ijk123", x: 1 } );
bulk.insert( { _id: "mop123", x: 1 } );
bulk.execute();
BulkWriteResult
BulkWriteResult({
    "writeErrors" : [ ],
    "writeConcernErrors" : [ ],
    "nInserted" : 3,
    "nUpserted" : 0,
    "nMatched" : 0,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]
})
Bulk.find()
• Bulk.find.removeOne()
• Bulk.find.remove()
• Bulk.find.replaceOne()
• Bulk.find.updateOne()
• Bulk.find.update()
Bulk.find() Example
var bulk = db.items.initializeUnorderedBulkOp();
bulk.find( { x: 1 } ).remove();
bulk.find( { x: 0 } ).update( { $set: { y: 0 } } );
bulk.execute();
Update Improvements
New Update Operators
• $bit operator now supports the bitwise xor operation
• $min and $max operators perform a conditional update depending on the relative size of the specified value and the current value of a field
• $currentDate operator sets the value of a field to the current date
• $mul operator for multiplicative increments in update operations
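A short sketch combining two of the new operators (the collection and field names are illustrative):

```
// Multiply price by 1.25 and stamp the time of the change
db.products.update(
    { _id: 1 },
    { $mul: { price: 1.25 },
      $currentDate: { lastModified: true } }
)
```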
$min / $max examples
Given: { _id: 1, highScore: 800, lowScore: 200 }
db.scores.update( { _id: 1 }, { $min: { lowScore: 150 } } )
db.scores.update( { _id: 1 }, { $max: { highScore: 950 } } )
Enhanced $push Array Operator
• $push operator has enhanced support for the $sort, $slice, and $each modifiers and supports a new $position modifier.
– $each no longer has to be first – $sort without $each/$slice – $slice from beginning or end of array (not middle yet)
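An illustrative sketch of the enhanced modifiers (collection and field names are assumptions):

```
// Keep only the top 5 scores: append, sort descending, then slice
db.students.update(
    { _id: 1 },
    { $push: { scores: { $each: [ 90, 82 ], $sort: -1, $slice: 5 } } }
)

// New $position modifier: insert at the front of the array
db.students.update(
    { _id: 1 },
    { $push: { scores: { $each: [ 100 ], $position: 0 } } }
)
```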
Updates preserve field order
• MongoDB preserves the order of the document fields following write operations except for the following cases:
– The _id field is always the first field in the document.
– Updates that include renaming of field names may result in the reordering of fields in the document.
Field Name Enforcement
• Updates cannot use update operators (e.g. $set) to target fields with empty field names (i.e. "").
• Updates no longer support saving field names that contain a dot (.) or a field name that starts with a dollar sign ($).
Enforce Index Key Length Limit
• Stronger enforcement of the limit on index key length
• Index creation, inserts, updates, and chunk migration all error when the index key length is exceeded
• Run db.upgradeCheckAllDBs() before upgrading to find existing keys that violate this limit
• Set the failIndexKeyTooLong parameter to false to revert to the 2.4 behavior
Other Improvements
• Set allowed on immutable fields – db.foo.update( { _id: 1 }, { $set: { _id: 1, foo: "bar" } } )
• Cleanup bad data, invalid field names ($unset/$rename etc)
Aggregation Framework Enhancements
Aggregation in 2.4
• Aggregation framework is used to process data and return computed results
– db.collection.aggregate( <pipeline> ) – runCommand(..) – Drivers' aggregate() method
• Result of an aggregation is a single document
– result field with an array containing the results – Limited to the max BSON size of 16MB
Enhancements
• aggregate() can now return a cursor – no more 16MB limit!
• New explain option to aid analysis of aggregation operations
• External-disk-based sorting
New aggregate options
• db.collection.aggregate(pipeline, options)
• Options – explain – allowDiskUse – cursor
New pipeline stages
• $out stage to output to a collection
• $redact stage to allow additional control over access to the data
New Set Operators
• $setIsSubset!
• $setEquals!
• $setDifference!
• $setIntersection!
• $setUnion!
• $allElementsTrue!
• $anyElementTrue!
New / modified operators
• $let and $map operators to allow for the use of variables
• $literal operator and $size operator
• $cond expression now accepts either an object or an array
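A hedged sketch combining a few of these in a $project stage (the collection and field names are illustrative):

```
// $setIsSubset, $size and $literal in a projection
db.surveys.aggregate( [
    { $project: {
        allAnswered: { $setIsSubset: [ [ "q1", "q2" ], "$answered" ] },
        nAnswered:   { $size: "$answered" },
        maxScore:    { $literal: 100 }
    } }
] )
```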
Cursors - recap
• find() queries a collection and returns a cursor
• To access documents, iterate the cursor
• The mongo shell automatically iterates the cursor (up to DBQuery.shellBatchSize times) and prints the results
Aggregation cursors in 2.6
> commandResult = db.runCommand({
    aggregate: "coll",
    pipeline: [ { $match : { } } ],
    cursor: { batchSize: 1 }
});
> db.coll.aggregate(
    [ { $match : { } } ],
    { cursor : { batchSize : 1 } } )
Aggregation cursors in 2.6
> var cursor = new DBCommandCursor(db.getMongo(), commandResult);
> cursor
... 20 documents (including results from cursor.firstBatch) ...
Type "it" for more
$out in 2.6
• Allows specifying the name of a collection to which the result of the aggregation should be written
• A new collection is created if the one specified in $out does not already exist
• If the collection already exists, it is replaced with the new results
$out in 2.6 - examples
> db.runCommand({
    aggregate: "test",
    pipeline: [ { $match : { } }, { $out : "out" } ]
});
> db.coll.aggregate([ { $match : { } }, { $out : "out" } ])
$out in 2.6
The $out collection cannot be sharded or capped.
aggregate failed: {
    "errmsg" : "exception: namespace 'records.users' is sharded so it can't be used for $out",
    "code" : 17017,
    "ok" : 0
}
$out in 2.6
The $out stage must be the last stage in the pipeline.
aggregate failed: {
    "code" : 16991,
    "ok" : 0,
    "errmsg" : "exception: $out can only be the final stage in the pipeline"
}
$out in 2.6 – read preference
The only replica set member that can be used with $out is the primary. With a read preference other than primary, the driver will route the aggregation to the primary and log a warning.
$out explained
• A unique, empty temporary collection is created (prepTempCollection)
• The original output collection's indexes are created on the temporary collection
• The output of the pipeline is written to the temporary collection in batches of 16MB
• Pipeline errors (including write errors, e.g. duplicate _ids) result in destruction of the temporary collection
• On pipeline success, the temporary collection is atomically renamed into place
Aggregation – Java Driver
• aggregate(DBObject firstOp, DBObject... additionalOps) is deprecated
• aggregate(List<DBObject> pipeline, AggregationOptions options)
– Returns Cursor
Aggregation – Java Example
List<DBObject> pipeline = new ArrayList<DBObject>();
pipeline.add(new BasicDBObject("$match",
    new BasicDBObject("countryCode", "USA")));
AggregationOptions options = AggregationOptions.builder()
    .outputMode(AggregationOptions.OutputMode.CURSOR)
    .build();
Cursor c = coll.aggregate(pipeline, options);
while (c.hasNext()) {
    DBObject next = c.next();
    System.out.println(next);
}
Integrated Text Search
Text Search
• Now production-ready, enabled by default
• Integrated with query system – find() – aggregate() – Composable with other query operators
• Multi-language document support
Possibilities with Text Search
• Relevance ranking
• Boolean operators
• Language-specific tokenization and stemming
• Fielded search
• Field-weighted scoring
• Stop words
• Negated query terms (-)
Create a Text Index
• Index specific fields – { "subject" : "text", "content" : "text" }
• Index all fields – { "$**" : "text" }
• Options – Default language – Weights – Language field name
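As an illustrative sketch (the collection, field names, and weight values are assumptions, not from the deck):

```
// Weighted text index over two fields with a default language
db.blog.ensureIndex(
    { content: "text", keywords: "text" },
    { weights: { content: 10, keywords: 5 },
      default_language: "english" }
)
```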
Search by
• Single Word – "coffee"
• Multiple Words – "bake coffee cake"
• Phrase – "\"bake coffee cake\""
• Exclude Terms – "bake coffee -cake"
Search Behavior
• Case-insensitive – Query for "Coffee" matches "coffee"
• Diacritic-sensitive – Except where the root doesn't have the diacritic – Query for "café" matches "cafe" (in French) – Query for "café" doesn't match "cafe" (in English)
• Language-Specific – Stemming and tokenizing – Stop words
find()
db.articles.find(
    { $text : { $search : "dog" } }
)
• Specify the language for the search
db.articles.find(
    { $text : { $search : "chien", $language : "fr" } }
)
aggregate()
db.articles.aggregate( [
    { $match : { $text : { $search : "bake coffee cake" } } }
] )
• Specify the language for the search
db.articles.aggregate( [
    { $match : { $text : { $search : "chien", $language : "fr" } } }
] )
Geospatial Enhancements
2dsphere indexes – Version 2
• Adds support for additional GeoJSON objects: – MultiPoint – MultiLineString – MultiPolygon – GeometryCollection
• Sparse by default
Geospatial Enhancements
• Support for geospatial query clauses in $or expressions
• Stronger validation of geospatial queries
• $uniqueDocs deprecated, geospatial queries no longer return duplicated results when a document matches the query multiple times
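A sketch of a geospatial clause inside $or, now allowed in 2.6 (the collection, field names, and coordinates are illustrative):

```
// Match either by category or by proximity (radius in radians)
db.places.find( {
    $or: [
        { category: "park" },
        { loc: { $geoWithin: {
            $centerSphere: [ [ -73.97, 40.77 ], 5 / 3963.2 ] } } }
    ]
} )
```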
Security
Security Starts with the Database
Database
• Must have security built in
• Must be able to integrate with corporate standards
• Must be general purpose
MongoDB Enterprise-Grade Security
• Authentication: In Database, LDAP*, Kerberos*, x.509 Certificates*
• Authorization: Built-in Roles, User-Defined Roles, Collection/Database Access Control, Field Level Security
• Auditing: Admin Operations*, Queries (via Partner Solutions)
• Encryption: Network – SSL (with FIPS 140-2)*, Disk – Partner Solutions
*Requires MongoDB Enterprise
Security – New in 2.6
• Authentication with LDAP (Enterprise)
• x.509 Certificate Authentication (Enterprise)
• User defined roles
• Auditing (Enterprise)
• Enterprise edition for Windows
• Windows Kerberos Support (Enterprise)
Authentication
• Confirming identity for everything accessing the database
• Create unique credentials for each entity – Clients, admins/devs, software systems, other cluster nodes
• Integrated with the corporate identity management system
• Enforce password policies
Authentication in MongoDB
• Integrate with choice of corporate authentication mechanisms
• LDAP server integration eliminates duplication with the enterprise user directory
• Kerberos services, with support for Active Directory
• x.509 Certificates, for clients and inter-cluster nodes
• Red Hat Identity Management
Authorization
• Defines what an entity can do in the database
• Grant access only to the specific data needed
• Control which actions an entity can perform
• Group common access privileges into roles
Authorization in MongoDB
• User-defined roles assign fine-grained privileges
• Delegate roles across teams
User Defined Roles - Concepts
• User – Identified by a (name, db) pair – Defined in the admin.system.users collection – Users are assigned roles
• Role – A grouping of privileges – May also contain other roles – Defined in the admin.system.roles collection
• Privilege – A set of actions on a given resource
User Defined Roles - Concepts
• Resource – An object that can be acted on – A (db name, collection name) pair (or {cluster: true}) – Either or both of db name and collection name may be wildcarded
• Action – An operation that can be performed in the system – e.g. find, remove, update, insert, commands …
Create a Custom Role
db.createRole({ role: "myClusterWideAdmin",
                privileges: [], roles: [] })
db.grantPrivilegesToRole(
    "myClusterWideAdmin",
    [ { resource: { cluster: true },
        actions: [ "addShard" ] },
      { resource: { db: "test", collection: "" },
        actions: [ "enableSharding", "moveChunk" ] },
      { resource: { db: "users", collection: "users" },
        actions: [ "update", "insert", "remove" ] },
      { resource: { db: "", collection: "" },
        actions: [ "find" ] } ])
Built-In Roles in 2.6
• backup and restore – minimal privileges for the task – backup role suitable for the MMS Backup Agent
• dbOwner – union of the readWrite, dbAdmin and userAdmin roles
read readAnyDatabase clusterMonitor
readWrite readWriteAnyDatabase hostManager
dbAdmin dbAdminAnyDatabase clusterManager
userAdmin userAdminAnyDatabase dbOwner
clusterAdmin backup restore
root
Collection Level Access Control
• Done through user-defined roles – the built-in roles all grant privileges with db-level granularity
• When granting privileges to a role, you must specify the resource:
{ cluster: true } – The cluster resource
{ db: "foo", collection: "" } – Any non-system collection within db foo
{ db: "", collection: "foo" } – Collection foo in any db
{ db: "foo", collection: "bar" } – Collection bar in db foo
{ db: "", collection: "" } – Any resource except system collections and the cluster resource
{ anyResource: true } – Absolutely any resource
Auditing in MongoDB
• Records – Schema operations & database configuration changes – Authentication & authorization activities
• Configurable filters
• Write log to multiple destinations in JSON or BSON
• Partner solutions for capture of read / write activity – IBM Guardium
MongoDB Field Level Security
User 1 - Confidential - Secret
{ _id: 'xyz', field1: { level: [ "Confidential" ], data: 123, blah: "foo" }, field2: { level: [ "Top Secret" ], data: 456, blah: "bar" }, field3: { level: [ "Unclassified" ], data: 789, blah: "baz" } }
User 2 - Top Secret - Secret - Confidential
User 3 - Unclassified
• Enables a single document to store data with multiple security levels
Field Level Security
User 1 - Confidential - Secret
{ _id: 'xyz', field1: { level: [ "Confidential" ], data: 123, blah: "foo" }, field2: { level: [ "Top Secret" ], data: 456 }, field3: { level: [ "Unclassified" ], data: 789, blah: "baz" } }
User 2 - Top Secret - Secret - Confidential
User 3 - Unclassified
Field Level Security
User 1 - Confidential - Secret
{ _id: 'xyz', field1: { level: [ "Confidential" ], data: 123, blah: "foo" }, field2: { level: [ "Top Secret" ], data: 456, blah: "bar" }, field3: { level: [ "Unclassified" ], data: 789, blah: "baz" } }
User 2 - Top Secret - Secret - Confidential
User 3 - Unclassified
Field Level Security
User 1 - Confidential - Secret
{ _id: 'xyz', field1: { level: [ "Confidential" ], data: 123 }, field2: { level: [ "Top Secret" ], data: 456 }, field3: { level: [ "Unclassified" ], data: 789, blah: "baz" } }
User 2 - Top Secret - Secret - Confidential
User 3 - Unclassified
Field Level Security: Implementation
Operational Improvements
Improving Performance and Scalability
• Query Router Connection Pooling
• Bulk Write Operations
• Resource Overload Protection with maxTimeMS
usePowerOf2Sizes now default
• Default allocation strategy for all new collections
• Uses more storage relative to total document size
• Less fragmentation
• More efficient for Insert/Update/Delete workloads
• Fewer document moves
• More predictable sizing
Background index build on secondaries
• If you initiate a background index build on a primary, the secondaries will replicate the index build in the background
Automatic rebuild of interrupted index builds
• If a standalone or a primary shuts down uncleanly during an index build, the index build is restarted when the instance restarts
• After a clean shutdown, or if a user kills the index build, interrupted index builds do not automatically restart when the server restarts
• If a secondary instance terminates during an index build, the index build is restarted when the instance restarts
Enhanced Sharding and Replication Administration
• cleanupOrphaned command to remove orphaned documents from a shard
• mergeChunks command to combine contiguous chunks located on a single shard
• rs.printReplicationInfo() and rs.printSlaveReplicationInfo() methods to provide a formatted report of the status of a replica set
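The two sharding commands above can be sketched from the shell like this (the namespace and chunk bounds are illustrative, not from the deck):

```
// Remove documents that don't belong on this shard
// (run against the shard's primary)
db.adminCommand( { cleanupOrphaned: "test.users" } )

// Merge two contiguous chunks residing on the same shard
// (run against a mongos)
db.adminCommand( {
    mergeChunks: "test.users",
    bounds: [ { x: 0 }, { x: 10 } ]
} )
```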
Networking
• Removed the upward limit on maxIncomingConnections for mongod and mongos
• Connection pools for a mongos instance may be used by multiple MongoDB servers.
• New cursor.maxTimeMS() and corresponding maxTimeMS option for commands to specify a time limit.
Operational Improvements
• Background Secondary Indexing
• Mixed SSL Connections
• Expanded SNMP Support