Managing RightScale on RightScale
1
Managing RightScale on RightScale
February 1, 2011
2
Your Panel Today
Presenting
• Rafael H. Saavedra – VP, Engineering at RightScale
• Chris Horne – Director, Product Marketing at RightScale
Q&A
• Douglas Johnson – Operations Manager at RightScale
Please use the questions window to ask questions any time!
3
Topics
• Managing RightScale on RightScale (Dev, Staging, Prod & Meta)
• RightScale Meta manages RightScale Production
• Production System Overview
• Monitoring Production – Quis Custodiet Ipsos Custodes
• Our Favorite RightScale Features
• Our Not-so-favorite Features
• Deploying RightScale – Cloud Best Practices
4
Managing RightScale on RightScale
[Diagram labels: RightScale Production, Customer A, Customer B, Customer C, Customer D, RightScale Staging, RightScale Development]
5
RS Production is managed by RS Meta
[Diagram labels: RightScale Meta Production, RightScale Production, RightScale Staging, RightScale Development, Customer A, Customer D]
6
A multitude of RightScale systems
• Meta Production manages the Production system
• Meta currently lives outside the cloud containing production
• Meta is extremely secure, accessible only by a handful of operations folks
• The Production system is my.rightscale.com
• We are reaching 200 servers with a large fraction in EC2 US-East
• Servers are located in every cloud to achieve high availability
• Servers are allocated in well-defined availability zones
• A few staging systems are used for integration and QA
• Ad hoc systems for performance testing, demos, betas, etc.
• Many development systems with simplified configurations
• Development systems are available at the click of a button
7
Significant increase in cloud usage
[Charts: EC2 Usage plotted by month from N-08 to O-10 (November 2008 to October 2010)]
8
Some interesting RightScale numbers
• 2M servers launched by RightScale
• RightScale continuously monitors more than 70k servers
• Every day at RightScale:
• 2,000 array resize actions are executed
• 35,000 alert escalations are triggered
• 20,000 escalation emails are sent to users
• 9.0TB of monitoring data is exchanged with our servers
• 1.6TB of logging data is sent to our servers
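To put the daily figures above in perspective, here is a rough conversion to sustained rates. It is only arithmetic on the numbers quoted on the slide, and it assumes decimal units (1 TB = 10^12 bytes), which the slide does not specify.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440

# Daily figures quoted on the slide.
daily_counts = {
    "array resize actions": 2_000,
    "alert escalations": 35_000,
    "escalation emails": 20_000,
}
daily_terabytes = {
    "monitoring data": 9.0,
    "logging data": 1.6,
}


def per_minute(count_per_day: float) -> float:
    """Average number of events per minute for a daily count."""
    return count_per_day / MINUTES_PER_DAY


def megabytes_per_second(tb_per_day: float) -> float:
    """Sustained MB/s for a daily volume.

    Assumption: 1 TB = 10**12 bytes and 1 MB = 10**6 bytes.
    """
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


for name, count in daily_counts.items():
    print(f"{name}: about {per_minute(count):.1f} per minute")
for name, tb in daily_terabytes.items():
    print(f"{name}: about {megabytes_per_second(tb):.0f} MB/s sustained")
```

These are averages over a full day; actual traffic is presumably burstier than this.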
9
RightScale production (simplified)
[Architecture diagram labels: Front Ends, Main App, dashboard, API, others, daemons, databases (DB Master, DB Slave), mirrors, logging, monitoring]
10
What do our users do?
• Dashboard, API, monitoring graphs & event notifications
• Most requests are monitoring updates: 85% of requests (70% of bandwidth)
• Dashboard and API calls are heavier requests; they represent
7% of requests but 26% of bandwidth
Distribution by requests and by bandwidth:
               Requests  Bandwidth
Monitoring        85%       70%
Notifications      8%        4%
API                6%       15%
Dashboard          1%       11%
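The "heavier requests" point can be made concrete by dividing each category's bandwidth share by its request share. The sketch below uses only the percentages from the slide; the function name is illustrative and not part of any RightScale tool.

```python
# Percentages as shown on the slide's two distributions.
requests_pct = {"Monitoring": 85, "Notifications": 8, "API": 6, "Dashboard": 1}
bandwidth_pct = {"Monitoring": 70, "Notifications": 4, "API": 15, "Dashboard": 11}


def relative_weight(kind: str) -> float:
    """Bandwidth share divided by request share.

    A value of 1.0 means requests of this kind are average-sized;
    larger values mean heavier-than-average requests.
    """
    return bandwidth_pct[kind] / requests_pct[kind]


for kind in requests_pct:
    print(f"{kind}: {relative_weight(kind):.2f}x the average request size")

# Dashboard and API combined, as stated on the slide.
heavy_requests = requests_pct["API"] + requests_pct["Dashboard"]
heavy_bandwidth = bandwidth_pct["API"] + bandwidth_pct["Dashboard"]
print(f"API + Dashboard: {heavy_requests}% of requests, {heavy_bandwidth}% of bandwidth")
```

By this measure a dashboard request carries roughly eleven times the average payload, while a monitoring update is slightly lighter than average.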
11
We eat our own dog food
• Production servers are organized into independent deployments
• Core servers: frontends, core/api servers, databases, daemons
12
We eat our own dog food
• We use security groups extensively to isolate servers
• ServerTemplates are versioned for each major release
• This preserves the ability to launch exact configurations of past versions
13
Monitoring, alerts & escalations
• We monitor as much relevant data as possible and display it
in insightful ways to quickly detect patterns and abnormalities
• We proactively eliminate the conditions that raise critical alerts
• No broken windows policy. No critical alerts can remain unresolved.
[Graphs: API Network Activity; Dashboard Network Activity]
14
How to monitor hundreds of servers?
15
How to monitor hundreds of servers?
• We leverage a monitoring data warehouse to develop heat maps
& stacked graphs
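As a rough illustration of the heat-map idea, the sketch below buckets raw samples into a servers-by-time grid and renders each row as text. The sample tuples, server names, and shading characters are all hypothetical; the slides do not describe how the actual data warehouse or graphs are built.

```python
from collections import defaultdict

SHADES = " .:*#"  # low to high; "?" marks a bucket with no data


def build_heat_map(samples, bucket_seconds=60):
    """Return (servers, buckets, grid).

    samples is an iterable of (server, timestamp_seconds, value).
    grid[i][j] is the mean value for servers[i] in time bucket buckets[j],
    or None when that server reported nothing in that bucket.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for server, timestamp, value in samples:
        key = (server, int(timestamp // bucket_seconds))
        sums[key] += value
        counts[key] += 1
    servers = sorted({server for server, _ in counts})
    buckets = sorted({bucket for _, bucket in counts})
    grid = [
        [sums[(s, b)] / counts[(s, b)] if (s, b) in counts else None for b in buckets]
        for s in servers
    ]
    return servers, buckets, grid


def render_row(row, max_value=100.0):
    """Map each cell to a shade character, scaling values against max_value."""
    chars = []
    for value in row:
        if value is None:
            chars.append("?")
        else:
            index = min(int(value / max_value * len(SHADES)), len(SHADES) - 1)
            chars.append(SHADES[index])
    return "".join(chars)


# Hypothetical CPU-percent samples: (server, seconds since start, value).
samples = [
    ("fe-1", 0, 10.0), ("fe-1", 30, 30.0), ("fe-1", 60, 90.0),
    ("db-1", 5, 50.0),
]
servers, buckets, grid = build_heat_map(samples)
for server, row in zip(servers, grid):
    print(f"{server:>6} |{render_row(row)}|")
```

One row per server and one column per time bucket keeps hundreds of servers on a single screen, which is the property that makes heat maps useful at this scale.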
16
Quis Custodiet Ipsos Custodes?*
• We monitor the monitoring and alerting systems
• We extensively use alerts to monitor the responsiveness of all
RightScale servers
• When you have hundreds of cloud servers, you statistically
see more instance failures. Instance and EBS failures can
cause headaches. Be prepared to grab a new instance.
• The meta & production monitoring and alerting systems are
fully decoupled from each other
* Who watches the watchmen?
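The remark that a larger fleet statistically sees more instance failures can be sketched with basic probability. The per-server failure probability below is a made-up illustrative value, not a RightScale or EC2 statistic, and the model assumes failures are independent.

```python
def prob_at_least_one_failure(servers: int, p_fail: float) -> float:
    """Probability that at least one of `servers` instances fails in a period,
    assuming each fails independently with probability p_fail."""
    return 1.0 - (1.0 - p_fail) ** servers


def expected_failures(servers: int, p_fail: float) -> float:
    """Expected number of failed instances in the same period."""
    return servers * p_fail


# Hypothetical: each instance has a 1% chance of failing in a given period.
P_FAIL = 0.01
for fleet_size in (1, 10, 200):
    chance = prob_at_least_one_failure(fleet_size, P_FAIL)
    expected = expected_failures(fleet_size, P_FAIL)
    print(f"{fleet_size:>3} servers: {chance:.1%} chance of at least one failure, "
          f"{expected:.2f} expected failures")
```

Under these assumptions, a failure that is rare for any single server becomes close to routine for a fleet of a couple of hundred, which is why replacing instances needs to be a practiced operation.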
17
Our favorite RightScale features
• RightImages – Resist the temptation to build custom images.
Leverage pure, base images to avoid introducing surprises.
• Input Inheritance – Makes it easy to keep configurations in
sync for dozens of servers
• ServerTemplates – Makes it very easy to reproduce
configurations across production, staging and development.
You have to fully automate configuration to manage a high
number of servers.
• Component Library – There are always new assets
(RightScripts, ServerTemplates, Macros, etc.) that can be
adapted to our needs
• Monitoring – It’s easy to make collectd plugins to monitor just
about anything
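As an example of how small a custom collectd plugin can be, here is a minimal sketch of a script for collectd's Exec plugin, which reads PUTVAL lines from the script's standard output. The metric (number of files in a spool directory), the plugin name "spool", and the directory path are hypothetical; this is not RightScale's own plugin code.

```python
import os
import time


def format_putval(host, plugin, type_name, type_instance, interval, value):
    """Build one PUTVAL line in collectd's plain-text protocol.

    "N" as the timestamp tells collectd to use the current time.
    """
    identifier = f"{host}/{plugin}/{type_name}-{type_instance}"
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def queue_depth(path):
    """Hypothetical metric: number of entries in a spool directory."""
    return len(os.listdir(path))


def run(spool_dir, iterations=None):
    """Emit one value per interval; loop forever when iterations is None."""
    # The Exec plugin exports these two variables to the script.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = int(float(os.environ.get("COLLECTD_INTERVAL", "10")))
    sent = 0
    while iterations is None or sent < iterations:
        line = format_putval(host, "spool", "gauge", "queue_depth",
                             interval, queue_depth(spool_dir))
        print(line, flush=True)  # collectd reads stdout, so flush each line
        sent += 1
        if iterations is None or sent < iterations:
            time.sleep(interval)


# One-shot demo against the current directory; under collectd you would
# call run() with a real directory and without an iteration limit.
run(".", iterations=1)
```

The "gauge" type is used because it exists in collectd's default types.db; any other type would need to be defined there first.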
18
Our not-so-favorite features
• ServerTemplate Inputs – Powerful, but too many of them
make templates difficult to use. Document them well for others.
• Revision Management – Still a ways to go to make users
aware of new versions and how to update
• Component Library – Finding new resources in the library
is neither easy nor intuitive
• Alerts – They work pretty well but they are not easy to
configure, especially custom ones
19
Best practices for upgrading RightScale
• In the cloud, the cost of duplicating servers is minimal
• Avoid upgrading existing servers (a non-cloud approach).
Launch fresh ones with new software instead (fail forward).
• Old servers can take over in case something goes wrong
• Launch additional slaves to capture recovery points
• One slave continues to replicate in case of master failure
• Another slave is frozen at upgrade point – can roll back by failing over
• Don’t forget to take snapshots in case of major failure
20
Upgrading RightScale Step-by-Step
[Diagram labels: Front Ends, Main App (two sets), Databases, DB Master, DB Slave (two)]
1) Servers with current code
2) Servers with new code
3) Add second slave
4) Cut access to site
5) Stop all access to databases
6) Stop replication
7) Take snapshot at cutoff
8) Update schema
9) Reconnect all servers
10) Open access to site
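The ten steps above can be sketched as an ordered runbook that stops at the first failed step. The step labels are copied from the slide; the runner, the callables, and the simulated failure are hypothetical stand-ins, since the slides do not show how RightScale automates the sequence.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Step labels copied from the slide, in numbered order.
STEP_NAMES = [
    "Servers with current code",
    "Servers with new code",
    "Add second slave",
    "Cut access to site",
    "Stop all access to databases",
    "Stop replication",
    "Take snapshot at cutoff",
    "Update schema",
    "Reconnect all servers",
    "Open access to site",
]


def run_upgrade(
    actions: Dict[str, Callable[[], bool]]
) -> Tuple[List[str], Optional[str]]:
    """Run each step in order; stop at the first one that returns False.

    Returns (completed step names, name of the failed step or None).
    """
    completed: List[str] = []
    for name in STEP_NAMES:
        if not actions[name]():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical actions: every step succeeds except the schema update.
actions = {name: (lambda: True) for name in STEP_NAMES}
actions["Update schema"] = lambda: False

completed, failed = run_upgrade(actions)
for number, name in enumerate(completed, start=1):
    print(f"{number}) done: {name}")
if failed is not None:
    print(f"failed at: {failed}; the old servers, the frozen slave, and the "
          "cutoff snapshot remain available for rollback")
```

Because the schema update happens only after replication has stopped and the snapshot has been taken, a failure at that step still leaves the recovery points described on the previous slide intact.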
21
Upgrading RightScale Step-by-Step
[Diagram labels: Front Ends, Main App (two sets), Databases, DB Master, DB Slave (two), Cutoff Snapshot, Servers with new code, Servers with old code]
22
Q&A / Getting Started
Have a project and want to discuss how RightScale can help?
Contact [email protected] or (866) 720-0208
Ready to get started?
Sign up for our Free Edition: www.RightScale.com/Free
Call us for a VIP trial of our paid editions
Need to learn more?
TCO calculator: www.RightScale.com/tco-calculator
User Conference Videos: www.RightScale.com/conference
Webinar archive: www.RightScale.com/webinars
White papers: www.RightScale.com/whitepapers
23
Thank You!