Mongo db ops mug pres
By david-erickson
MongoDB Ops
This is not the title of my talk
Not This
Not This Either
HOW TO BE A DBA
Inspiration
CARFAX replica set architecture. People read this and don’t know how much of it applies to them. Is this a good architecture?
MongoDB Ops Database Resiliency as a Service
Risk Mitigation as a Database
That’s the title of my talk
Topics
• Risk Mitigation
• Proactive and Iterative Ops
• MMS Tools
• Discussion
I went to the National Building Museum’s exhibit “Designing for Disaster.” It was all about understanding threats and designing structures to withstand natural disasters.
This was on the wall and I loved it
This is what we do: we try to drive the value on the left as close to zero as we can with the $$$ we have.
Probability
• Building analogy: likelihood of a problem
• In IT systems:
  – Mean Time Between Failures (MTBF)
  – Know your infrastructure
  – Categorize failure scenarios
• What we can do:
  – Proactively monitor, profile, and feed the results back
  – Perform root cause analysis
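To put a number on “probability,” one common simplification (an assumption here, not something from the talk) is to treat failures as random with a constant rate of 1/MTBF, so the chance of at least one failure in a window of t hours is 1 - e^(-t/MTBF). The sketch below applies that to a single host and then to a replica set, assuming every member votes and members fail independently. The function names and the one-year MTBF are hypothetical.

```python
import math
from math import comb


def failure_probability(window_hours: float, mtbf_hours: float) -> float:
    """Chance of at least one failure within the window.

    Assumes exponentially distributed time between failures,
    i.e. a constant failure rate of 1 / MTBF.
    """
    return 1.0 - math.exp(-window_hours / mtbf_hours)


def majority_loss_probability(members: int, p_member: float) -> float:
    """Chance that a replica set loses its voting majority.

    Assumes every member is a voter and members fail independently,
    each with probability p_member over the same window.
    """
    majority = members // 2 + 1
    tolerable = members - majority  # failures the set can absorb and still elect a primary
    return sum(
        comb(members, k) * p_member**k * (1 - p_member) ** (members - k)
        for k in range(tolerable + 1, members + 1)
    )


# Example: one month of exposure for hosts with a (hypothetical) MTBF of one year.
p_host = failure_probability(window_hours=24 * 30, mtbf_hours=24 * 365)
print(f"single host fails: {p_host:.4f}")
print(f"3-member set loses majority: {majority_loss_probability(3, p_host):.6f}")
```

The independence assumption is the part to challenge: members that share a rack, power feed, or data center tend to fail together, which is why it pays to categorize failure scenarios rather than trust the multiplication.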
Vulnerability
• Building analogy: people and assets in harm’s way
• In IT systems:
  – Impact, severity
  – Mission criticality
• What we can do:
  – Plan for the problem / exposure we actually have
Performance
• Building analogy: integrity of infrastructure during adverse events
• In IT systems:
  – Failover with consistency
  – Mean Time To Recovery (MTTR) (HA vs. DR)
  – Performance (speed)
• What we can do:
  – Ensure HA/DR plans actually accomplish resiliency goals
  – Keep MTTRs low (ideally, recovery is automatic)
  – Actually test DR plans
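The standard steady-state availability formula, A = MTBF / (MTBF + MTTR), shows why keeping MTTR low matters as much as making failures rare. The figures below are hypothetical: the same MTBF with a multi-hour manual recovery versus an automatic failover assumed to take about 30 seconds.

```python
HOURS_PER_YEAR = 24 * 365


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected yearly downtime implied by the availability above."""
    return HOURS_PER_YEAR * (1.0 - availability(mtbf_hours, mttr_hours))


# Hypothetical figures: a failure roughly every 1,000 hours in both cases.
manual = downtime_hours_per_year(mtbf_hours=1000, mttr_hours=4)  # someone gets paged
automatic = downtime_hours_per_year(mtbf_hours=1000, mttr_hours=30 / 3600)  # ~30 s failover
print(f"manual recovery:    {manual:.1f} h/year")
print(f"automatic failover: {automatic * 60:.1f} min/year")
```

This only measures how fast service comes back; it says nothing about whether the failover was consistent, which is the other item on the list.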
Old School Ops
• Make sure hardware is sized correctly
• Make SQL more efficient, slowing down development
• Hook up systems to my enterprise monitoring tools
• Execute the SOPs someone handed me, if they were ever written in the first place
• “It’s your first day … congratulations, you are now the expert”
New School Ops
• Proactive:
  – Monitor your app (“Knowing is half the battle”)
  – Compare expected vs. actual
• Iterative:
  – Include O&M from the beginning of ops planning
  – Continuous Integration / Development
  – Run Dev / Integration like production
  – Automate everything (using Dev)
  – As the mission changes, O&M must change too
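“Compare expected vs. actual” needs a definition of expected. One simple, admittedly crude, option is a baseline from recent history: flag a reading that sits more than k standard deviations from the mean. The threshold k = 3 and the sample connection counts are assumptions for illustration; metrics with daily cycles will fool a flat baseline like this one.

```python
import statistics
from typing import Sequence


def is_abnormal(history: Sequence[float], current: float, k: float = 3.0) -> bool:
    """Return True if `current` is more than k standard deviations from the mean of `history`."""
    if len(history) < 2:
        raise ValueError("need at least two samples to establish a baseline")
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation
    if stdev == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mean
    return abs(current - mean) > k * stdev


# Hypothetical connection counts sampled once a minute.
recent_connections = [410, 395, 402, 420, 398, 405, 411]
print(is_abnormal(recent_connections, 407))   # a normal-looking reading
print(is_abnormal(recent_connections, 1200))  # a connection storm
```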
Status and Profiling
• Heartbeat and status services: I’d make building one a DevOps job interview task
• Low-level tools: mongostat, the system profiler, the oplog, mtools
• Plugins for various monitoring tools: Nagios, SNMP, etc.
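A heartbeat service can start very small. This sketch times a single ping and classifies the result as OK, SLOW, or DOWN. It takes the ping as a plain callable so it runs without a database; with PyMongo you would pass something like `lambda: client.admin.command("ping")`. The threshold and status names are hypothetical.

```python
import time
from typing import Callable


def heartbeat(
    ping: Callable[[], object],
    slow_after_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Run one ping and report {'status': OK | SLOW | DOWN, ...}.

    This does not impose a timeout itself; configure one on the client
    (for example, server selection and socket timeouts) so a hung server
    surfaces as an exception rather than a stalled check.
    """
    start = clock()
    try:
        ping()
    except Exception as exc:  # any failure to answer counts as down
        return {"status": "DOWN", "error": f"{type(exc).__name__}: {exc}"}
    elapsed = clock() - start
    status = "OK" if elapsed <= slow_after_s else "SLOW"
    return {"status": status, "latency_ms": round(elapsed * 1000, 1)}


if __name__ == "__main__":
    print(heartbeat(lambda: {"ok": 1.0}))  # stand-in for a real ping
```

Injecting the clock keeps the check testable; a real service would run it on a schedule and expose the latest result over HTTP for whatever monitoring tool polls it.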
MongoDB Management Service (MMS)
Monitoring • Backup & Recovery • Automation
MMS Monitoring
[Diagram: app data tier with agents; HTTP/S connection to MMS (VLAN / Cloud), running in a Java container; operator with alerts and dashboards; pull and push data flows.]
Monitoring Side Bar: MMS Schema
• Time series data
• Data collection is bucketed: data is captured at minute intervals in hourly documents
• Graphs are rolled up to coarser time resolutions with aggregation queries
• User queries never cause real-time aggregations
• 8 shards run global MMS Monitoring (35k instances)
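The bucketing idea can be sketched with plain documents: one document per host, metric, and hour, with a slot per minute plus running totals so roll-ups never have to read individual minutes. This is a hypothetical layout in the spirit of the slide, not MMS’s actual schema, and the field names are made up. With PyMongo, the filter and update would go to `collection.update_one(filter_, update, upsert=True)`, and the pipeline to `collection.aggregate(pipeline)`.

```python
from datetime import datetime, timezone


def minute_sample_upsert(host: str, metric: str, ts: datetime, value: float):
    """Build the (filter, update) pair that records one per-minute sample.

    The hourly document is created on first write (upsert) and updated in
    place afterwards. Assumes at most one sample per minute; otherwise the
    running sum/count would double count.
    """
    hour = ts.replace(minute=0, second=0, microsecond=0)
    filter_ = {"host": host, "metric": metric, "hour": hour}
    update = {
        "$set": {f"minutes.{ts.minute}": value},
        "$inc": {"sum": value, "count": 1},
    }
    return filter_, update


def daily_rollup_pipeline(host: str, metric: str) -> list:
    """Aggregation pipeline that rolls hourly buckets up into daily averages."""
    return [
        {"$match": {"host": host, "metric": metric}},
        {"$group": {
            "_id": {"$dateToString": {"format": "%Y-%m-%d", "date": "$hour"}},
            "sum": {"$sum": "$sum"},
            "count": {"$sum": "$count"},
        }},
        {"$project": {"avg": {"$divide": ["$sum", "$count"]}}},
        {"$sort": {"_id": 1}},
    ]


ts = datetime(2015, 3, 10, 14, 37, 12, tzinfo=timezone.utc)
print(minute_sample_upsert("db1.example.com:27017", "opcounters.query", ts, 1520.0))
```

To match the point that user queries never trigger real-time aggregations, a roll-up like this would run on a schedule and store its output (for example with an `$out` stage) instead of running per graph request.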
MMS Backup
[Diagram: app data tier with agents; HTTP/S connection to MMS (VLAN / Cloud), running in a Java container; MMS Daemon with HEAD and Blockstore; an operator or script gets a .tar restore point; mongos.]
MMS Automation
[Diagram: app data tier with agents; HTTP/S connection to MMS (VLAN / Cloud), running in a Java container; the operator edits the goal state, and the goal state is applied to the deployment.]
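The edit/apply split is a reconciliation loop: the operator edits a desired (goal) state, and the system repeatedly compares it with what is actually running and does whatever closes the gap. A toy version of the planning step follows. The process names, config fields, and action names are hypothetical, and a real agent would also have to worry about ordering, rolling restarts, and health checks.

```python
def plan_actions(current: dict[str, dict], goal: dict[str, dict]) -> list[tuple[str, str]]:
    """Return the actions that move `current` to `goal`, keyed by process name."""
    actions = []
    for proc in sorted(goal.keys() - current.keys()):
        actions.append(("start", proc))
    for proc in sorted(current.keys() - goal.keys()):
        actions.append(("stop", proc))
    for proc in sorted(goal.keys() & current.keys()):
        if current[proc] != goal[proc]:
            actions.append(("reconfigure", proc))
    return actions


def apply_actions(current: dict[str, dict], goal: dict[str, dict],
                  actions: list[tuple[str, str]]) -> dict[str, dict]:
    """Simulate applying the plan; a real agent would start/stop/restart processes here."""
    new_state = dict(current)
    for action, proc in actions:
        if action == "stop":
            del new_state[proc]
        else:  # start or reconfigure: bring the process to its goal config
            new_state[proc] = dict(goal[proc])
    return new_state


current = {"db1:27017": {"version": "2.6", "replSet": "rs0"}}
goal = {
    "db1:27017": {"version": "3.0", "replSet": "rs0"},
    "db2:27017": {"version": "3.0", "replSet": "rs0"},
}
plan = plan_actions(current, goal)
print(plan)
state = apply_actions(current, goal, plan)
print(plan_actions(state, goal))  # converged: nothing left to do
```

Because planning compares states rather than replaying steps, running it again after convergence is a no-op, which is what makes it safe to apply the goal state repeatedly.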
Things to Monitor
• Determine what is normal
• Failovers (planned / unplanned)
• Recovering hosts
• Replication lag
• Connections
• Oplog window
• Lock %
• RUNNING OUT OF STORAGE!!!
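Three of these are simple arithmetic once you have the raw numbers. Replication lag is the primary’s last applied operation time minus a secondary’s (the `replSetGetStatus` command reports an `optimeDate` per member); the oplog window is the time between the first and last entries in `local.oplog.rs`; and storage runway can be projected from daily usage samples. The linear growth model and the sample figures are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def replication_lag(primary_optime: datetime, secondary_optime: datetime) -> timedelta:
    """How far a secondary trails the primary."""
    return primary_optime - secondary_optime


def oplog_window(first_entry: datetime, last_entry: datetime) -> timedelta:
    """How much history the oplog currently holds."""
    return last_entry - first_entry


def days_until_full(daily_used_gb: Sequence[float], capacity_gb: float) -> Optional[float]:
    """Project days until the disk fills, assuming linear growth.

    `daily_used_gb` holds one sample per day, oldest first.
    Returns None when usage is flat or shrinking.
    """
    if len(daily_used_gb) < 2:
        raise ValueError("need at least two daily samples")
    growth_per_day = (daily_used_gb[-1] - daily_used_gb[0]) / (len(daily_used_gb) - 1)
    if growth_per_day <= 0:
        return None
    return (capacity_gb - daily_used_gb[-1]) / growth_per_day


lag = replication_lag(datetime(2015, 3, 10, 12, 0, 45), datetime(2015, 3, 10, 12, 0, 5))
window = oplog_window(datetime(2015, 3, 9, 6, 0), datetime(2015, 3, 10, 12, 0))
# A secondary that falls further behind than the window can no longer catch up from the oplog.
print(f"lag={lag}, window={window}, headroom={window - lag}")
print(f"days until full: {days_until_full([610, 622, 631, 645], capacity_gb=1000)}")
```

A secondary that falls off the end of the oplog needs a full resync, so the headroom between worst observed lag and the oplog window is worth alerting on, not just the lag itself.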
Things to Know
• Individual document deletion is expensive: plan for your deletion profile
• BSON storage gets fragmented by updates: repair jobs can be run on secondaries
• “Automate” everything: until you’ve scripted something, you don’t know whether it’s going to work
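One way to plan for the deletion profile (an assumption, not something the talk prescribes) is to partition time-ordered data into one collection per month, so expiring old data becomes a cheap collection drop instead of millions of individual deletes. The naming scheme and retention period below are hypothetical; with PyMongo, each returned name would be passed to `db.drop_collection(name)`. MongoDB’s TTL indexes are the built-in alternative, but they still delete expired documents individually in the background.

```python
from datetime import date
from typing import Iterable, List


def collection_for(day: date, prefix: str = "events") -> str:
    """Name of the monthly collection that holds data for `day`."""
    return f"{prefix}_{day.year:04d}_{day.month:02d}"


def _month_index(year: int, month: int) -> int:
    # Count months on one axis so retention math crosses year boundaries.
    return year * 12 + (month - 1)


def collections_to_drop(existing: Iterable[str], today: date,
                        keep_months: int, prefix: str = "events") -> List[str]:
    """Monthly collections older than the retention window (the current month counts as one)."""
    oldest_kept = _month_index(today.year, today.month) - (keep_months - 1)
    to_drop = []
    for name in existing:
        parts = name.rsplit("_", 2)
        if len(parts) != 3 or parts[0] != prefix:
            continue  # not one of our monthly collections
        year, month = parts[1], parts[2]
        if not (year.isdigit() and month.isdigit()):
            continue
        if _month_index(int(year), int(month)) < oldest_kept:
            to_drop.append(name)
    return sorted(to_drop)


existing = ["events_2014_11", "events_2014_12", "events_2015_01",
            "events_2015_02", "events_2015_03", "users"]
print(collection_for(date(2015, 3, 15)))
print(collections_to_drop(existing, today=date(2015, 3, 15), keep_months=3))
```

The trade-off is on the read side: any query spanning several months has to hit several collections and merge the results in the application.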
Thanks! Discussion