Agility and Scalability with MongoDB
Transcript of Agility and Scalability with MongoDB
MongoDB Scalability and Agility
2
Data Challenge: “I want my data...”
• Now
• Secure
• All varieties
• Fast and interactive
• Scalable to “Big”
• Agile to develop and deploy operationally
• Cloud and edge
(Image: iStock licensed, pixelfit)
3
Scalability with MongoDB
Metric | Meaning | Examples
Operations per Second | Concurrent reads and writes per second | > 1 million per second
Nodes per Cluster | Horizontal scale-out, distributed to multiple data centers worldwide, with high availability, using inexpensive cloud resources | > 1,000 nodes
Records / Documents | Data objects in any number of schemas or structures | > 10 billion
Data Volume | Total amount of data: documents × size | > 1 petabyte = 10^15 bytes ≈ 2^50
Key Differentiation
5
Operational Database Landscape
6
Document Data Model
Relational | MongoDB
{
  first_name: ‘Paul’,
  surname: ‘Miller’,
  city: ‘London’,
  location: [45.123, 47.232],
  cars: [
    { model: ‘Bentley’, year: 1973, value: 100000, … },
    { model: ‘Rolls Royce’, year: 1965, value: 330000, … }
  ]
}
7
Documents are Rich Data Structures
{
  first_name: ‘Paul’,
  surname: ‘Miller’,
  cell: ‘+447557505611’,
  city: ‘London’,
  location: [45.123, 47.232],
  profession: [‘banking’, ‘finance’, ‘trader’],
  cars: [
    { model: ‘Bentley’, year: 1973, value: 100000, … },
    { model: ‘Rolls Royce’, year: 1965, value: 330000, … }
  ]
}
• Fields have typed values: string, number, geo-coordinates
• Fields can contain arrays
• Fields can contain arrays of sub-documents
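As a minimal sketch, the document above can be held directly in an application's native structures (here a Python dict stands in for the BSON document; values are from the slide):

```python
# A MongoDB document modeled as a plain Python dict (stand-in for BSON).
# Field names and values follow the slide; elided fields are omitted.
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": "+447557505611",
    "city": "London",
    "location": [45.123, 47.232],                    # geo-coordinate pair
    "profession": ["banking", "finance", "trader"],  # array field
    "cars": [                                        # array of sub-documents
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Because the whole object lives in one document, what would be a join
# in a relational schema becomes an in-memory traversal:
total_car_value = sum(car["value"] for car in person["cars"])
print(total_car_value)  # 430000
```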
8
Document Model Benefits
• Agility and flexibility
  – Data model supports business change
  – Rapidly iterate to meet new requirements
• Intuitive, natural data representation
  – Eliminates the ORM layer
  – Developers are more productive
• Reduces the need for joins and disk seeks
  – Programming is simpler
  – Performance delivered at scale
11
Big Data Tech Interest Comparison
j.mp/Ssvpev
Architecture for Availability & Scalability
14
Replica Sets
• Replica Set: two or more copies of the data
• Availability solution
  – High Availability
  – Disaster Recovery
  – Maintenance
• Deployment Flexibility
  – Data locality to users
  – Workload isolation: operational & analytics
• Self-healing shard
[Diagram: application writes through the driver to the Primary, which replicates to two Secondaries]
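The "self-healing" behavior can be sketched as a toy model: when the primary becomes unreachable, a surviving majority promotes a new primary. This is an illustration only, assuming a simple majority-plus-priority rule, not MongoDB's actual election protocol:

```python
# Toy replica-set failover model (illustrative, not the real election
# algorithm): a new primary is chosen only if a majority of members
# survive, preferring the highest-priority surviving member.
def elect_primary(members, down):
    """Return the surviving member that should act as primary, or None."""
    alive = [m for m in members if m not in down]
    if len(alive) <= len(members) // 2:   # no majority -> no primary
        return None
    return max(alive, key=lambda m: m["priority"])

members = [
    {"host": "node-a", "priority": 2},    # initial primary
    {"host": "node-b", "priority": 1},
    {"host": "node-c", "priority": 1},
]

# node-a fails; the two remaining members still form a majority.
new_primary = elect_primary(members, down=[members[0]])
print(new_primary["host"])  # node-b
```

With two of three members down there is no majority, so the model (like a real replica set) refuses to elect a primary rather than risk a split brain.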
16
Global Data Distribution
[Diagram: one Primary and seven Secondaries distributed worldwide, each serving real-time reads]
17
Automatic Sharding
• Sharding types
  – Range
  – Hash
  – Tag-aware
• Elastic increase or decrease in capacity
• Automatic balancing
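The trade-off between range and hash sharding can be sketched with a toy placement model (the chunk boundaries and md5-based hash here are stand-ins, not MongoDB's real chunk logic):

```python
import hashlib

NUM_SHARDS = 4

def range_shard(key):
    # Range sharding: contiguous key ranges map to the same shard.
    # (Toy split: 0-249 -> shard 0, 250-499 -> shard 1, ...)
    return min(key // 250, NUM_SHARDS - 1)

def hash_shard(key):
    # Hashed sharding: a hash of the key spreads neighboring keys out.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Neighboring keys stay together under range sharding (good for range
# queries, bad for write hot spots)...
range_targets = {range_shard(k) for k in range(100, 164)}
# ...but scatter across shards under hashed sharding (even writes,
# scatter-gather range queries).
print(sorted(range_targets))
hash_targets = {hash_shard(k) for k in range(100, 164)}
print(sorted(hash_targets))
```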
18
Query Routing
• Multiple query optimization models
• Each sharding option appropriate for different apps
Performance
20
Drag Strip: straight ahead, quarter-mile, stop
21
Road Race: stay fast, stay agile, continuous
Nürburgring, Germany
MongoDB at Scale
24
• Large data set
CarFax
25
Baseline
• Vehicle History Database
• 11 billion records (growing at 1 billion per year)
• 30-year-old VMS-based RDBMS
• Cumbersome
• Costly

MongoDB Comparison
• In-depth NoSQL evaluation
• Performance: 4x faster than baseline, 10x key-value
• Scale out using inexpensive commodity servers
• Built-in redundancy
• Flexible dynamic schema data model
• Strong consistency
• Analytics/aggregation

Initial Production
• MongoDB is the primary data store
• 50 servers
• 10 shards
• 5-node replica sets per shard
26
• 13 billion+ documents
  – 1.5 billion documents added every year
• 1 vehicle history report is > 200 documents
• 12 shards
• 9-node replica sets
• Replicas distributed across 3 data centers
CARFAX Sharding and Replication
27
CARFAX Replication
28
29
• 50M users
• 6B check-ins to date (growing 6M per day)
• 55M points of interest / venues
• 1.7M merchants using the platform for marketing
• Operations per second: 300,000
• Documents: 5.5B (~16.5B with replication)*
Foursquare
30
• 11 MongoDB clusters
  – 8 are sharded
• Largest cluster is for check-ins
  – 15 shards
  – Shard key: user_id
Foursquare clusters
31
Facebook / parse.com mobile apps
• Persistent database for 270,000 mobile applications
• 200M end-user mobile devices
• 250% annual growth in client apps
• 500% growth in requests
• 1.5M collections
• Key differentiators:
  – Document data model
  – High performance & availability
  – Geospatial queries and indexes
• Charity Majors on operations: j.mp/X3jVRC
  – Understand your database and your data, and build for them.
Scalability Exercises in the Cloud with Amazon Web Services
35
• 27x hs1.8xlarge instances
  – 16x vCPU
  – 24x 2TB SATA drives, RAID 0
  – 8x mongod microshards
• Modified Yahoo Cloud Serving Benchmark (YCSB)
  – Long integer IDs (> 2B)
  – Zipfian-distributed integer fields
  – Aggregation queries
• Load direct to 216 shards, 10 days, $4K
  "objects" : 7,170,648,489,
  "avgObjSize" : 147,438.99952658816,
  "dataSize" : NumberLong("1,057,240,224,818,640")
  (commas added)
Petascale Database
CGroup Memory Segregation
for DB in `seq 0 3`; do
  sudo cgcreate -a mongodb:mongodb -t mongodb:mongodb -g memory:mongodb$DB
  echo 48G | sudo tee /sys/fs/cgroup/memory/mongodb$DB/memory.limit_in_bytes
  cgexec -g memory:mongodb$DB numactl --interleave=all \
    mongod --config ~/mongod$DB.conf
done
37
• Ingest 250-byte stock quotes at 2M/s
• Concurrently run 5 QPS with subsecond, indexed responses on timeStamp, accountId, instrumentId, systemKey
• 5x r3.4xlarge
  – 16x vCPU, 1x 320GB SSD, 122GB RAM, 16x mongod
  – 2.1M inserts/second direct to shards
• 16x c3.8xlarge
  – 32x vCPU, 2x 320GB SSD, 60GB RAM, 16x mongod, 4x mongos
  – 2.1M inserts/second via mongos
Megawrite Ingest
38
• 2 threads on c3.8xlarge
• 264-byte BSON objects, _id index only
• coll.insert(): 15,600 inserts/sec
• coll.insert(List<DBObject>), list size = 64: 118,000 inserts/sec
• Bulk ops API, batch size = 64: 120,000 inserts/sec
Java API comparison
BulkWriteOperation bo = null;
for (a = 0; a < this.items && stayAlive; a++) {
    if (bo == null) {
        bo = collection.initializeUnorderedBulkOperation();
    }
    fillMap(this.m);
    BasicDBObject dbObject = new BasicDBObject(this.m);
    bo.insert(dbObject);
    if ((a + 1) % listsize == 0) {    // flush a full batch
        BulkWriteResult rc = bo.execute();
        bo = null;
    }
}
if (bo != null) {                     // flush any remaining queued inserts
    bo.execute();
}
7x Load with BulkOp
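A speedup of roughly this size is what a fixed-overhead-per-round-trip model predicts. The sketch below uses hypothetical latency numbers (not measurements from the slide) to show how batching amortizes the per-request cost:

```python
# Toy cost model for the batching speedup: each server round trip has a
# fixed overhead, plus a marginal cost per document. The two constants
# below are illustrative assumptions, not measured values.
ROUND_TRIP_MS = 0.056   # hypothetical fixed cost per round trip
PER_DOC_MS = 0.008      # hypothetical marginal cost per document

def throughput(batch_size, total_docs=100_000):
    round_trips = total_docs / batch_size
    total_ms = round_trips * ROUND_TRIP_MS + total_docs * PER_DOC_MS
    return total_docs / (total_ms / 1000)   # documents per second

single = throughput(1)     # one round trip per insert
batched = throughput(64)   # 64 inserts per round trip
print(round(single), round(batched), round(batched / single, 1))
```

With these assumed constants the model lands near the slide's numbers: batching by 64 removes most of the round-trip overhead, for roughly a 7x gain.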
How do I Pick A Shard Key?
41
Shard Key characteristics
• A good shard key has:
  – sufficient cardinality
  – distributed writes
  – targeted reads ("query isolation")
• The shard key should be in every query if possible
  – scatter-gather otherwise
• Choosing a good shard key is important!
  – affects performance and scalability
  – changing it later is expensive
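The targeted-vs-scatter-gather distinction can be sketched with a toy router: a query that includes the shard key goes to one shard, anything else must be broadcast. Shard names and the modulo placement are illustrative:

```python
# Toy mongos routing model (illustrative): queries containing the shard
# key are "targeted" to a single shard; all others "scatter-gather"
# across every shard.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]
SHARD_KEY = "user_id"

def route(query):
    if SHARD_KEY in query:
        # targeted read: one shard owns this key (toy modulo placement)
        return [SHARDS[query[SHARD_KEY] % len(SHARDS)]]
    # scatter-gather: every shard must be consulted
    return list(SHARDS)

print(route({"user_id": 42, "status": "active"}))  # one shard
print(route({"status": "active"}))                 # every shard
```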
42
Hashed shard key
• Pros:
  – Evenly distributed writes
• Cons:
  – Random data (and index) updates can be I/O intensive
  – Range-based queries turn into scatter-gather
[Diagram: mongos spreading writes evenly across Shard 1 through Shard N]
43
Low cardinality shard key
• Induces "jumbo chunks"
• Example: a boolean field
[Diagram: mongos directing all writes into a single chunk [ a, b ) on one shard]
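The jumbo-chunk problem follows directly from how chunks split: splits can only happen on shard-key boundaries, so a low-cardinality key can never produce more chunks than it has distinct values. A toy count makes this concrete:

```python
from collections import Counter

# Toy illustration of jumbo chunks: chunks split only on shard-key
# boundaries, so a boolean shard key caps the cluster at two chunks,
# no matter how many documents pile into each.
docs = [{"is_active": i % 2 == 0, "doc_id": i} for i in range(10_000)]

chunks = Counter(d["is_active"] for d in docs)
print(len(chunks), dict(chunks))  # only 2 chunks, 5,000 docs each
```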
44
Ascending shard key
• Monotonically increasing shard key values cause "hot spots" on inserts
• Examples: timestamps, _id
[Diagram: mongos sending every insert to the last chunk, [ ISODate(…), $maxKey ), on one shard]
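The hot spot can be sketched the same way: under range sharding, every new monotonically increasing key falls in the chunk that ends at $maxKey, so one shard absorbs all inserts. Chunk boundaries below are illustrative:

```python
# Toy illustration of the ascending-shard-key hot spot: with range-based
# chunks, a monotonically increasing key (timestamp, default _id) always
# lands in the highest chunk, so one shard takes every insert.
chunk_lower_bounds = [0, 1000, 2000, 3000]  # chunk i covers [bound, next)
insert_counts = [0, 0, 0, 0]

next_id = 3000
for _ in range(100):
    next_id += 1                            # e.g. a timestamp tick
    # find the chunk whose range contains the new key
    chunk = max(i for i, b in enumerate(chunk_lower_bounds) if next_id >= b)
    insert_counts[chunk] += 1

print(insert_counts)  # all 100 inserts hit the last chunk
```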
Ensuring Success with High Scalability
46
Success Factors
• Storage: random seeks (IOPS)
• RAM: working set based on query patterns
• Query: indexing
• Delete: most expensive operation
• Real-time vs. bulk operations
• Continuity: HA, DR, backup, restore
• Agile process: iterate by powers of 4
• Sharding: shard key and strategy
• Resources: don’t go it alone!