Cloud Computing

Cloud Computing An Introduction and Overview Tom Creighton CTO, Family Search [email protected]

description

Cloud computing is about moving services, computation and/or data (for cost and business advantage) off-site to an internal or external, location-transparent, centralized facility or contractor. By making data available in the cloud, it can be more easily and ubiquitously accessed, often at much lower cost, increasing its value by enabling opportunities for enhanced collaboration, integration, and analysis on a shared common platform.

Transcript of Cloud Computing

Page 1: Cloud Computing

Cloud Computing

An Introduction and Overview

Tom Creighton

CTO, Family Search

[email protected]

Page 2: Cloud Computing

Topic Areas

• Cloud computing overview

• Example large-scale cloud bursting application

• Resource management in an enterprise cloud

Page 3: Cloud Computing

Why Cloud Computing?

• Cost Savings

– Economies of scale

– Use what’s needed; shut down what’s not

• CapEx to OpEx

– Not necessarily an objective;

– Projects move forward without large infrastructure investment

• Scalability, Availability, Flexibility, Security

• Opportunity benefit (maybe the biggest)

Page 4: Cloud Computing

Cloud Computing Definitions

• Essential Characteristics

– On-demand self-service

– Broad network access

– Resource pooling

– Rapid elasticity

– Measured service

• Service Models

– Software as a Service

– Platform as a Service

– Infrastructure as a Service

NIST defines five essential characteristics, three service models, and four deployment models.

Page 5: Cloud Computing

Cloud Computing Definitions

ACM CTO Roundtables

What is Cloud Computing?

Cloud computing is about moving services, computation and/or data—for cost and business advantage—off-site to an internal or external, location-transparent, centralized facility or contractor. By making data available in the cloud, it can be more easily and ubiquitously accessed, often at much lower cost, increasing its value by enabling opportunities for enhanced collaboration, integration, and analysis on a shared common platform.

Three divisions (areas):

• SaaS: WAN-enabled application services (e.g., Google Apps, Salesforce.com, WebEx)

• PaaS: Foundational elements to develop new applications (e.g., Coghead, Google App Engine)

• IaaS: Providing computational and storage infrastructure in a centralized, location-transparent service (e.g., Amazon)

Page 6: Cloud Computing

Cloud Computing Definitions

• The term "Cloud Computing" refers to the on-demand delivery of IT resources via the Internet with pay-as-you-go pricing.

Page 7: Cloud Computing

Cloud Computing Models

• Deployment Models

– Private cloud

– Community cloud

– Public cloud

– Hybrid cloud

NIST cloud computing reference architecture

Page 8: Cloud Computing

Cloud Conceptual Framework

Page 9: Cloud Computing

Cisco Domain Ten Framework

Page 10: Cloud Computing
Page 11: Cloud Computing

AWS Regions and Edge Locations

Page 12: Cloud Computing

Regions and Availability Zones

Page 13: Cloud Computing

Not All Regions Have Same Number of Zones

Page 14: Cloud Computing

Cisco Domain Ten Framework

Page 15: Cloud Computing

AWS Account Management

Page 16: Cloud Computing

Cisco Domain Ten Framework

Page 17: Cloud Computing

AWS Service Catalog

Page 18: Cloud Computing

S3 Management Console

Page 19: Cloud Computing

S3 Objects in a Bucket

Page 20: Cloud Computing

An Example Cloud Use

Page 21: Cloud Computing

Cloudbursting Example

• 1 Trillion pairs to classify

• 500 TB of data

• 10 TB result set

• CPU Bound

– Single core: ~ 1000-1500 classify ops/s

• 11.5 K core-days
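The core-day figure above follows from the pair count and the single-core rate. A minimal sketch of the arithmetic, assuming one classify operation per pair (the slide does not state this explicitly); the function name is illustrative:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def core_days(total_ops: float, ops_per_core_per_sec: float) -> float:
    """Single-core days needed to run total_ops at the given rate."""
    return total_ops / ops_per_core_per_sec / SECONDS_PER_DAY

PAIRS = 1e12  # 1 trillion pairs, from the slide

slow = core_days(PAIRS, 1000)  # lower end of the quoted single-core rate
fast = core_days(PAIRS, 1500)  # upper end of the quoted single-core rate

print(f"{slow:,.0f} core-days at 1000 ops/s")  # 11,574 core-days at 1000 ops/s
print(f"{fast:,.0f} core-days at 1500 ops/s")  # 7,716 core-days at 1500 ops/s
```

The 11.5 K core-days on the slide matches the lower end of the quoted rate.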

Page 22: Cloud Computing

Approach

• AWS c1.xlarge spot instances

– 8 core, 24GB RAM

– $0.075 on AWS spot market

– Best CPU/$ ratio

• HP Blades in familysearch.org datacenter

– 8 core, 24 GB RAM

– 300 servers

Page 23: Cloud Computing

Issues to Address

• Machine Instability

– Machine death from AWS spot instance volatility

– Hardware failure (with 1000s of machines, a certainty)

• Coordination

– Partitioning the work

– Different server clusters

• Different AWS availability zones

• familysearch.org datacenter

– Restart failed processes

Page 24: Cloud Computing

The Solution

• Single Queue Multiple Reader Pattern

– SQS with visibility timeouts to manage retries

– Dead letter queue for messages failing > threshold

• Claim Check Pattern

– AWS S3 as data store

• Homogeneous, Idempotent work units

– 15-20 minute target completion time

– 0 side effects

– Work may get done multiple times
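The interplay of visibility timeouts, a dead letter queue, and idempotent work units can be sketched with a toy in-memory queue. This is not the SQS API and not the code from the talk: `SimQueue`, `run`, the integer message ids, and the squaring "work" are all hypothetical stand-ins, and time is a simple counter.

```python
from collections import deque

class SimQueue:
    """Toy in-memory queue with visibility timeouts (a stand-in, not the SQS API)."""

    def __init__(self, visibility_timeout, max_receives):
        self.visibility_timeout = visibility_timeout
        self.max_receives = max_receives
        self.ready = deque()     # message ids available for delivery
        self.in_flight = {}      # message id -> time it becomes visible again
        self.receive_count = {}  # message id -> deliveries so far
        self.dead_letters = []   # messages that failed too many times

    def send(self, msg_id):
        self.ready.append(msg_id)
        self.receive_count[msg_id] = 0

    def receive(self, now):
        # Undeleted messages whose timeout expired become visible again (retry).
        for msg_id, visible_at in list(self.in_flight.items()):
            if now >= visible_at:
                del self.in_flight[msg_id]
                self.ready.append(msg_id)
        while self.ready:
            msg_id = self.ready.popleft()
            if self.receive_count[msg_id] >= self.max_receives:
                self.dead_letters.append(msg_id)  # over threshold: stop retrying
                continue
            self.receive_count[msg_id] += 1
            self.in_flight[msg_id] = now + self.visibility_timeout
            return msg_id
        return None

    def delete(self, msg_id):
        # A worker calls this only after its result is safely stored.
        self.in_flight.pop(msg_id, None)

def run(queue, results, poison, lost_acks):
    """Drain the queue with one simulated worker; return how many tasks ran."""
    now, runs = 0, 0
    while queue.ready or queue.in_flight:
        msg_id = queue.receive(now)
        if msg_id is None:
            now += 1                       # nothing visible yet; let time pass
            continue
        if msg_id in poison:
            continue                       # worker dies mid-task, never deletes
        results[msg_id] = msg_id * msg_id  # idempotent: a rerun stores the same value
        runs += 1
        if msg_id in lost_acks:
            lost_acks.discard(msg_id)      # worker dies after storing, before deleting
            continue
        queue.delete(msg_id)
    return runs

queue = SimQueue(visibility_timeout=3, max_receives=3)
for i in range(4):
    queue.send(i)
results = {}
runs = run(queue, results, poison={2}, lost_acks={1})
print(results, queue.dead_letters, runs)
```

In this run, message 1 is processed twice because its first worker dies before deleting it, yet the stored result is unchanged. Message 2 never succeeds and lands in the dead letter list after three deliveries. That is why the work units need to be free of side effects: the queue only guarantees that work is done at least once.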

Page 25: Cloud Computing

Claim Check - EIP
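A minimal sketch of the Claim Check pattern, assuming a plain dictionary as a stand-in for an object store such as S3. The `check_in` and `redeem` helpers and the `work/` key prefix are hypothetical names for illustration. The large payload goes to the store, and only a small reference travels through the queue.

```python
import hashlib
import json

blob_store = {}  # stand-in for an object store such as S3: key -> bytes

def check_in(payload: bytes) -> str:
    """Store the payload and return a small claim check to use as the message body."""
    key = "work/" + hashlib.sha256(payload).hexdigest()
    blob_store[key] = payload
    return json.dumps({"key": key, "size": len(payload)})

def redeem(message: str) -> bytes:
    """Fetch the payload that a claim check refers to."""
    claim = json.loads(message)
    return blob_store[claim["key"]]

payload = b"x" * 1_000_000    # far larger than what should ride in a queue message
message = check_in(payload)   # producer side: only this goes on the queue
restored = redeem(message)    # worker side: pull the real data from the store

print(len(message), "bytes on the queue for", len(restored), "bytes of data")
```

Deriving the key from the content hash means that checking in the same payload twice overwrites the same entry, which fits the idempotent work units described on the previous slide.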

Page 26: Cloud Computing
Page 27: Cloud Computing

Results

• Work Completed

• 64 hours

• 1000 machines avg.; ~2500 peak

• $5000

– 6X cost savings over previous Hadoop-based solution

• 300 lines of Java/bash code

• 1 engineer / 1 week
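As a rough cross-check of the cost figure, the sketch below multiplies the average machine count by the run time and the spot price from the Approach slide. It assumes the $0.075 price is per instance-hour and that every machine in the average was billed at that rate. The slides state neither, and the 300 datacenter blades may be part of the average, so this is only an order-of-magnitude estimate.

```python
MACHINES_AVG = 1000  # average machine count, from the Results slide
HOURS = 64           # run time, from the Results slide
SPOT_PRICE = 0.075   # from the Approach slide; assumed to be USD per instance-hour

machine_hours = MACHINES_AVG * HOURS
estimated_cost = machine_hours * SPOT_PRICE

print(f"{machine_hours:,} machine-hours")           # 64,000 machine-hours
print(f"${estimated_cost:,.0f} estimated compute")  # $4,800 estimated compute
```

Under these assumptions the estimate lands close to the $5000 reported on the slide.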

Page 28: Cloud Computing

Next Steps

• Better metrics

– Log aggregation via Splunk

– Performance counters

• Better deployment

– CloudFormation

• AWS Simple Workflow (SWF) for better server lifetime management

• Investigate other OSS possibilities

– Storm

– Hadoop YARN

– Akka

Page 29: Cloud Computing

Conclusion

• Cloud technology offers new opportunities in scale, availability, security and management.

• Properly using these tools is necessary to achieve business goals.

• CapEx to OpEx is one reason to move to cloud computing.

• A more important reason is the tremendous flexibility that the environment offers.

• VPC provides the kinds of governance and security an enterprise needs to be comfortable with so much power under the hood.

• Resource management is crucial in an enterprise environment.

Page 30: Cloud Computing