Architectures for open and scalable clouds
CC Attribution-NoDerivs 3.0 Unported License: usage OK, no modifications, full attribution
February 14, 2012
Randy Bias, CTO & Co-founder
Our Perspective on Cloud Computing
It came from the large Internet players.
A Story of Two Clouds
Tenets of Open & Scalable Clouds
1. Avoid vendor lock-in like bubonic plague
• See also Open Cloud Initiative (opencloudinitiative.org)
2. Simplicity scales, complexity fails
• 10x bigger == 100x more complex
3. TCO matters; measuring ROI is critical to success
4. Security is paramount ... but different
5. Risk acceptance over risk mitigation
6. Agility & iteration over big bang
This is a BIG Topic
• What I am covering today is patterns in:
• Hardware and software
• Networking, storage, and compute
• NOT covered today:
• Cloud operations
• Infrastructure software engineering
• Measuring success through operational excellence
• Security
Open Clouds (briefly)
A Word on ‘Open’
Here we go ...
• Elements:
• Open APIs & protocols
• Open hardware
• Open networking
• Open source software (OSS)
• Combined with:
• Architectural patterns, best practices, & de facto standards
• Operational excellence
Open APIs & Protocols
Open Hardware
Open Networking
Published Networking Blueprints
Open Source Software
Open Cloud OS
Open & Scalable Cloud Patterns
Threads
• Small failure domains have less impact
• Loose-coupling minimizes cascade failures
• Scale-out over scale-up with exceptions
• More AND cheaper
• State synchronization is dangerous (remember CAP)
• Everything has an API
• Automation ONLY works w/ homogeneity & modularity
• Lowest common denominator (LCD) services (LBaaS vs F5aaS)
• People are the number one source of failures
Pattern: Loose coupling
Synchronous, blocking calls mean cascading failures.
Asynchronous, non-blocking calls mean failures stay isolated.
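The contrast can be sketched in a few lines of Python (hypothetical service names, with asyncio standing in for real RPC): when failures are captured per call, one sick dependency does not tear down its siblings.

```python
import asyncio

async def call_service(name: str, healthy: bool) -> str:
    # Stand-in for a network call to a downstream service.
    await asyncio.sleep(0)
    if not healthy:
        raise ConnectionError(f"{name} is down")
    return f"{name}: ok"

async def fan_out() -> list:
    # Each call is independent; return_exceptions=True captures a
    # failure in place instead of cancelling the other calls.
    return await asyncio.gather(
        call_service("billing", healthy=True),
        call_service("search", healthy=False),
        call_service("images", healthy=True),
        return_exceptions=True,
    )

results = asyncio.run(fan_out())
ok = [r for r in results if isinstance(r, str)]
failed = [r for r in results if isinstance(r, Exception)]
```

A synchronous chain (`billing` then `search` then `images`) would instead abort the whole request at the first exception.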
Pattern: Open source software
Excessive software taxation is a thing of the past.
Black boxes create lock-in.
You can always fork.
Pattern: Uptime in software (self-management)
Hardware fails. Software fails.
People fail.
Only software can measure itself & respond to failure in near real-time.
Applications designed for 99.999% uptime can run anywhere
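For scale, a quick back-of-the-envelope on what 99.999% actually buys (a sketch, not from the slides):

```python
def allowed_downtime_minutes(availability: float,
                             period_hours: float = 365 * 24) -> float:
    """Downtime budget, in minutes, for a given availability over a year."""
    return period_hours * 60 * (1 - availability)

five_nines = allowed_downtime_minutes(0.99999)  # roughly 5.3 minutes/year
three_nines = allowed_downtime_minutes(0.999)   # roughly 526 minutes/year
```

A budget that small rules out humans in the loop: only automated measurement and response can act inside it.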
Pattern: Scale-out, not UP
(attrib: Bill Baker, Distinguished Engineer, Microsoft; "Virtual*" added by yours truly)
Scale Up: (Virtual*) servers are like pets.
You name them, and when they get sick, you nurse them back to health.
garfield.company.com
Scale Out: (Virtual*) servers are like cattle.
You number them, and when they get sick, you shoot them.
web001.company.com
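The cattle approach can be sketched as a supervision loop (invented names, not any specific product): failed instances are replaced with fresh numbered ones, never repaired in place.

```python
def probe(instance: dict) -> bool:
    # Stand-in health check; real code would hit an HTTP health endpoint.
    return instance["healthy"]

def supervise(fleet: list) -> list:
    """Replace any instance that fails its probe; never nurse it back."""
    next_id = max(inst["id"] for inst in fleet) + 1
    healed = []
    for inst in fleet:
        if probe(inst):
            healed.append(inst)
        else:
            # Shoot the sick cattle; boot an identical replacement.
            healed.append({"id": next_id, "healthy": True})
            next_id += 1
    return healed

fleet = [{"id": 1, "healthy": True}, {"id": 2, "healthy": False}]
fleet = supervise(fleet)  # instance 2 is gone, instance 3 takes its place
```

Note the prerequisite from the threads slide: this only works when instances are homogeneous and carry no irreplaceable state.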
Pattern: Buy from ODMs
ODMs operate their businesses on 3-10% margins.
AMZN, GOOG, and Facebook buy direct, without a middleman.
Only a few enterprise vendors are pivoting to compete.
Pattern: Less enterprise “value” in x86 servers
Generic servers rule. Full stop. Nothing is better, because nothing else is *generic*.
“... a data center full of vanity free servers ... more efficient ... less expensive to build and run ...” - OCP
Pattern: Flat Networking
The largest cloud operators all run layer-3 routed, flat networks with no VLANs.
Cloud-ready apps don’t need or want VLANs.
Enterprise apps can be supported on open clouds using Software-defined Networking (SDN).
Pattern: Software-defined Networking (SDN)
• x86 server is the new Linecard
• network switch is the new ASIC
• VXLAN (or NVGRE) is the new Chassis
• SDN Controller is the new SUP Engine
“Network Virtualization”
Pattern: Flat Networking + SDNs
Flat + SDN co-exist & thrive together
[Diagram: one availability zone of physical nodes hosting, side by side, a standard security group of VMs on a flat virtual L2 network and a Virtual Private Cloud with its own VPC security group behind a VPC gateway, both reachable from the Internet.]
Pattern: RAIS instead of HA pairs/clusters
• Redundant arrays of inexpensive services (RAIS)
• Load balanced
• No state sharing
• On failure, connections are lost, but failures are rare
• Ridiculously simple & scalable
• Most things retry anyway
• Hardware failures are infrequent & impact only a subset of traffic
• (N-F)/N, where N = total, F = failed
• Cascade failures are unlikely and failure domains are small
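The (N-F)/N figure is trivial to make concrete; a minimal sketch:

```python
def surviving_capacity(total: int, failed: int) -> float:
    """Fraction of traffic a RAIS array still serves after `failed` node losses."""
    if failed > total:
        raise ValueError("cannot lose more nodes than exist")
    return (total - failed) / total

# Losing 1 node of a 10-wide array impacts only 10% of traffic:
surviving_capacity(10, 1)  # 0.9
```

Contrast with an HA pair, where N = 2 and a single failure takes out half the array's capacity (or, with state sharing gone wrong, all of it).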
Service array (RAIS) example:
[Diagram: backbone routers feed cloud access switches and AZ (spine) switches; a RAIS tier (NAT, LB, VPN) announces public IP blocks upstream via OSPF, return traffic flows back via default or source NAT, and the tier is driven by the cloud control plane through its API.]
Pattern: Lots of inexpensive 1RU Switches
1RU: 6K-30K VMs / AZ
Simple spine-and-leaf flat routed network
[Diagram: Racks 1-3 behind 1RU leaf switches]
Modular: 40K-200K VMs / AZ
[Diagram: multiple racks behind each pair of modular switches]
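The quoted VM counts fall out of simple multiplication; a sketch with assumed per-rack numbers (the slides state only the totals):

```python
def vms_per_az(racks: int, servers_per_rack: int, vms_per_server: int) -> int:
    """Rough VM capacity of one availability zone."""
    return racks * servers_per_rack * vms_per_server

# E.g. 20 racks of 40 servers at 30 VMs each lands inside the
# 6K-30K band quoted for the 1RU leaf-spine design:
vms_per_az(20, 40, 30)  # 24000
```

The modular design simply raises the rack count per pod, which is how the same arithmetic reaches the 40K-200K band.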
Pattern: Direct-attached Storage (DAS)
Cloud-ready apps manage their own data replication.
DAS is the smallest failure domain possible with reasonable storage I/O.
SAN == massive failure domain.
SSDs will be the great equalizer.
Pattern: Elastic Block Device Services
EBS/EBD is a crutch for poorly written apps.
It means bigger failure domains (the AWS outage, anyone?), more complexity, and high expectations.
Sometimes you need a crutch. When you do, overbuild the network, and make sure you have a smart scheduler.
Pattern: More Servers == More Storage I/O
>1M writes/second, triple-redundancy w/ Cassandra on AWS
Linear scale-out == linear costs for performance
Pattern: Hypervisors are a commodity
Cloud end-users want their OS of choice, not HVs.
Level up! Managing iron is for mainframe operators.
The hypervisor of the future is open source, easily modifiable, & extensible.