These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are...

71
WATERLOO CHERITON SCHOOL OF COMPUTER SCIENCE Introduction to Introduction to Cloud Computing Cloud Computing CS 446/646 ECE452 Jul 4 th , 2011 IMPORTANT NOTICE TO STUDENTS These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague and incomplete on purpose to spark class discussions

Transcript of These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are...

Page 1: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Introduction to Introduction to Cloud ComputingCloud Computing

CS 446/646 ECE452Jul 4th, 2011

IMPORTANT NOTICE TO STUDENTS

These slides are NOT to be used as a replacement for student notes.These slides are sometimes vague and incomplete on purpose to spark class discussions

Page 2: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

2WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Grid ComputingDef● “combination of computer resources from multiple

administrative domains applied to a common task” [1]

Characteristics● distributed parallel computation● many different applications● constructed with middleware ● variable size● heterogeneous composition

[1] http://en.wikipedia.org/wiki/Grid_computing

Page 3: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

3WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud versus Grid Computing*Cloud Computing● tightly coupled nodes

(can be dissimilar)● high inter-node

bandwidth● centralized management

& job scheduling● standards are being

developed● virtualized resources

Grid Computing● heterogeneous loosely

coupled nodes● unpredictable inter-node

bandwidth● distributed management

& job scheduling● standardized protocols &

interfaces

*Advanced Topics in Computer Systems: Cloud Computing and Management – 2011 (Raouf Boutaba)

Page 4: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

4WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud & Grid Computing*Similarities● distributed computing● reduced cost● increased reliability● increased flexibility

*Advanced Topics in Computer Systems: Cloud Computing and Management – 2011 (Raouf Boutaba)

Page 5: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

5WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Utility ComputingDef● “The packaging of computing resources (computation,

storage etc.) as a metered service”*

Observation● not a new concept

– "If computers of the kind I have advocated become the computers of the future, then computing may someday be organized as a public utility just as the telephone system is a public utility... The computer utility could become the basis of a new and important industry." John McCarthy, MIT Centennial in 1961

* http://en.wikipedia.org/wiki/Utility_computing

Page 6: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

6WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

What is Cloud ComputingNew Label?● cloud computing → grid computing + utility computing

– yes: “the vision is the same”*● reduce cost, increase reliability & flexibility

– no: “on demand computing, larger amounts of data”*– yes: “fundamentally the problems are the same”*

● management of nodes● consumer driven● parallel computation

● difficult to define**● NIST (National Institute of Standards & Technology)

– “universally” accepted definition

*Cloud computing and grid computing 360-degree compared I. Foster et al. 2008

** “Twenty Experts Define Cloud Computing”, SYS-CON Media Inc 2008.

Page 7: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

7WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Computing – NIST Definition● “Cloud computing is a model for enabling convenient,

on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”*

* http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc

Page 8: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

8WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Computing – NIST Definition● “Cloud computing is a model for enabling convenient,

on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”*

* http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc

Page 9: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

9WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Computing – NIST Definition● “Cloud computing is a model for enabling convenient,

on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”*

* http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc

Page 10: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

10WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Computing – NIST Definition● “Cloud computing is a model for enabling convenient,

on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”*

* http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc

Page 11: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

11WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Computing – NIST Definition● “Cloud computing is a model for enabling convenient,

on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”*

* http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc

Page 12: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

12WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Computing – NIST Definition● “Cloud computing is a model for enabling convenient,

on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.”*

* http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc

Page 13: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

13WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Essential CharacteristicsOn-demand self-service*● a consumer can unilaterally provision computing

capabilities without human interaction with the service provider

● computing capabilities– server time– network storage– number of servers etc.

seems a bit strange to exclude human interaction from this process?

*Draft NIST Working Definition of Cloud Computing – 2009, Peter Mell and Tim Grance

Page 14: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

14WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Essential CharacteristicsUbiquitous network access● capabilities are

– available over the network – accessed through standard mechanisms

● to support & promote– heterogeneous (thin or thick) client platforms

Page 15: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

15WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Essential CharacteristicsMulti-tenancy / Resource pooling● provider’s computing resources are pooled to serve

multiple consumers● computing resources

– storage– processing– memory– network bandwidth & virtual machines

● location independence– no control over the exact location of the resources

Page 16: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

16WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Essential CharacteristicsMulti-tenancy / Resource pooling● has major implications

– performance, scalability, security● example (data & multi-tenancy)

– to discuss in class– considerations (evolution, scalability, management etc.)– design

● non-functional concerns for multi-tenant applications– security & privacy, release management– configurable options (UI, workflows etc.)

Futher Reading: http://msdn.microsoft.com/en-us/library/aa479086.aspx

Page 17: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

17WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Essential CharacteristicsRapid elasticity● capabilities can be rapidly and elastically provisioned● unlimited (virtual) resources● predicting a ceiling is difficult

Page 18: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

18WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Essential CharacteristicsRapid elasticity● capabilities can be rapidly and elastically provisioned● unlimited (virtual) resources● predicting a ceiling is difficult

Page 19: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

19WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Essential CharacteristicsMeasured service● metering capability of service/resource abstractions

– storage– processing– bandwidth– active user accounts

● OK so what happened to utility computing – pay as you go model?– NIST does not talk about $$– more on this later when we discuss deployment models

Page 20: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

20WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Relevant TechnologiesAccess● well defined protocols for communication● broadband / high speed access

Distributed Computing● for data storage and computation

Virtualization● decoupling from the physical computing resources● types:

– hardware, memory, data storage, data schema, network

Page 21: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

21WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Reference Architecture

infrastructure

storagevirtualization

cloud run-time

service

applications

service

service service

Page 22: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

22WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Reference Architecture

infrastructure

storagevirtualization

cloud run-time

service

applications

service

service service

man

agem

ent

secu

rity

mon

itori

ng

met

erin

g

Page 23: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

23WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Reference Architecture

infrastructure

storagevirtualizationIaaS

cloud run-time

service PaaS

applications

service

service service

SaaS

man

agem

ent

secu

rity

mon

itori

ng

met

erin

g

Page 24: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

24WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesSaaS (Software-as-a-Service)● vendor/provider controlled applications accessed over the

network● characteristics

– network based access– multi-tenancy– single software release for all

SaaS Examples– Salesforce.com, Google Docs

Page 25: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

25WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesSaaS & Multi-tenancy● SaaS applications are multi-tenant applications● application data

– Google docs

SaaS Application Design● SaaS applications are 'net native'● configurability, efficiency, and scalability● SOA & SaaS

Page 26: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

26WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesNet Native Application Characteristics● cloud specific design, development & deployment

– multi-tenant data model– builtin metering & management – browser based client & client tools– customization via configuration

Page 27: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

27WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesSaaS Disadvantages● dependency on

– network, cloud service provider● performance

– limited client bandwidth● security

– good: better security than personal computers– bad: CSP is in charge of the data– ugly: user privacy

Page 28: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

28WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesPaaS (Platform-as-a-Service)● vendor provided development environment

– tools & technology selected by vendor– control over data life-cycle

Advantages● rapid development & deployment● scalability & fault-tolerance offered by provider● small start-up cost

– considerations: required skills set, money

Page 29: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

29WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesPaaS – Architectural Characteristics● multi-tenancy

– data● native scalability

– load balancing & fail-over● native integrated management & monitoring

– performance– resource consumption/utilization– load

Page 30: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

30WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesPaaS Disadvantages● inherits all from SaaS● choice of development technology is limited to vendor

provided/supported tools and services

PaaS Examples● Google app engine

– Google Site + Google Docs

Page 31: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

31WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesIaaS (Infrastructure-as-a-Service)● vendor provided and consumer provisioned computing

resources– processing, storage, network, etc.– consumer is provided customized virtual machines– consumer has control over

● OS, memory● storage● servers & deployment configurations● limited control over network resources

Page 32: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

32WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI ServicesAdvantages● infrastructure scalability● native integrated management

– performance, resource consumption/utilization, load● economical cost

– hardware, IT support

IaaS Examples● Amazon Elastic Compute Cloud – EC2

Page 33: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

33WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI Services

Page 34: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

34WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

SPI Services & Control

Network

Storage

Server

VM

APP

Data

Network

Storage

Server

VM

APP

Data

Network

Storage

Server

VM

APP

Data

Network

Storage

Server

Services

APP

Data

Network

Storage

Server

Services

APP

Data

Organizationcontrolled

Organization & service provider share control

Service Providercontrolled

In-houseDeployment

HostedDeployment

IaaSCloud

PaaSCloud

SaaSCloud

[1] Visualizing the Boundaries of Control in the Cloud. Dec 2009. http://kscottmorrison.com/2009/12/01/visualizing-the-boundaries-of-control-in-the-cloud/

Page 35: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

35WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

XaaSXaaS (Everything-as-a-Service)● composite second level services● Security-as-a-Service

– McAfee*● McAfee SaaS Email Archiving● McAfee SaaS Email Inbound Filtering● McAfee Vulnerability Assessment SaaS (PEN Tests)

● CaaS – Communication-as-a-Service– VoIP, private PBX

Page 36: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

36WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Reference Architecture

infrastructure

storagevirtualizationIaaS

cloud run-time

service PaaS

applications

service

service service

SaaS

man

agem

ent

secu

rity

mon

itori

ng

met

erin

g

Page 37: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

37WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Cloud Deployment ModelsPrivate cloud● infrastructure is operated solely for an organization● managed by the organization or by a third party

Community cloud● supports a specific community● infrastructure is shared by several organizations

Page 38: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

38WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Cloud Deployment ModelsPublic cloud● infrastructure is made available to the general public● owned by an organization selling cloud services

Hybrid cloud● infrastructure is a composition of two or more clouds

deployment models● enables data and application portability

Page 39: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

39WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

NIST Cloud Computing Model

Page 40: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

40WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed Storage & Clouds

Page 41: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

41WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed StorageCAP Theorem*● web services cannot ensure all three of the following

properties at once– consistency

● set of operations has occurred all at once

– availability● an operation must terminate in an intended response

– partition tolerance● operations will complete, even if individual components are

unavailable

* Eric Brewer, University of California, Berkeley

Page 42: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

42WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed StorageHorizontal Storage Scaling● “any horizontal scaling strategy is based on data

partitioning”*– forced to decide between consistency & availability

Recall● ACID provides strong data consistency guarantees at the

cost of availability

* BASE: An Acid Alternative, Dan Pritchett http://queue.acm.org/detail.cfm?id=1394128

Page 43: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

43WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Scaling StorageTwo approaches to Scaling● vertical – bigger hardware

● horizontal – more hardware– functional partitioning– sharding

*http://en.wikipedia.org/wiki/Shard_%28database_architecture%29

http://queue.acm.org/detail.cfm?id=1394128

Page 44: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

44WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed StorageBASE – an ACID alternative*● basically available, soft state, eventual consistency● characteristic

– “optimistic and accepts that the database consistency will be in a state of flux”*

– supports partial failures● scalability promise

– “leads to levels of scalability that cannot be obtained with ACID”*

– at the cost of consistency

* BASE: An Acid Alternative, Dan Pritchett http://queue.acm.org/detail.cfm?id=1394128

Page 45: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

45WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed StorageEventual Consistency● consistency across functional groups is easy to relax● we encounter this on daily basis● some scenarios

– update of online user profile– online master card payment– ATM cheque deposit

● idempotent operations– permit partial failures

Page 46: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

46WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed StorageGeneral Characteristics● simplified data model● built on distributed storage systems

– GFS - Google File System– HDFS – Hadoop Distributed File System

● highly available– relaxed consistency

● fault-tolerant via replication and distributed components

Page 47: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

47WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed StorageGeneral Characteristics● eventual consistency

– all replicas will be updated at different times and in different order

● examples– Google BigTable– Yahoo PNUTS– Amazon S3

Page 48: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

48WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Google Bigtable*Introduction● distributed storage for managing structured data● designed to

– scale to very large size– store different types of data (URLs, images)– achieve high availability, low latency & fault-tolerence

*Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. Bigtable: a distributed storage system for structured data. In Proc. USENIX Symposium on Operating System Design and Implementation (OSDI'06), 2006

Page 49: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

49WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Google BigtableData Model● multi-dimensional sorted map (not a relational database)● (rowkey, column key, timestamp) → byte[]

rowkey

columnkey

time

- row name is reversed URL- contents column has three versions- anchor column has one version each

Page 50: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

50WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Google BigtableObservations● rows are maintained in lexicographic order

– data locality → access latency– rowkey must be unique across all

● column family has to be defined first● unbounded number of columns can be added to each

column family● time indexing allows for

– historical versions of the data

Page 51: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

51WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Google BigtableScalability, Availability & Fault-tolerance● master-slave type architecture● master sever

– manages data distribution to different slave nodes– monitors the life-cycle of slave nodes– slave nodes can be added or removed dynamically

● client data utilization– connects to the master to get slave node addresses– connect directly to the slave nodes to manage data

Page 52: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

52WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed Computation in Clouds(Map Reduce)

Page 53: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

53WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed ComputationMotivation● facilitate computation over a large data-set● fault-tolerance

– single computations can fail● redundancy

– same computation can be performed by different nodes● easy management & configuration setup

– shield the developers from the details

Page 54: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

54WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed ComputationBasic Idea● functional programming● functional decomposition

– large problem broken into a set of small problems● design

– functional transformation of input data → pipes & filters– isolated execution → parallel computing

● server (task) farm to solve the big problem– cloud elasticity

Page 55: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

55WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed Computationwc the cloud (class activity)● design considerations

– large data-sets– parallel execution– fault-tolerance– scalability– reusability

Page 56: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

56WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed Computationwc the cloud with MapReduce

But a word about MapReduce first

Page 57: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

57WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

MapReduce*Programming Model● input: a set of key value paris● output: a set of key value pairs● computation: transform input set into output set

– map function (user defined)– reduce function (user defined)

*MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004. http://labs.google.com/papers/mapreduce.html

Page 58: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

58WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

MapReduceControl Flow

merge

input reader

mapfunction

partition & comparefunctions

reducefunction

output writer

Page 59: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

59WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed Computationwc the cloud with MapReduce

input map partition & compare reduce output

*Advanced Topics in Computer Systems: Cloud Computing and Management – 2011 (Raouf Boutaba)

Page 60: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

60WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed Computationwc the cloud with MapReduce

map(String key, String value): // key: document name // value: document contents for each word w in value: EmitIntermediate(w, "1");

*MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004. http://labs.google.com/papers/mapreduce.html

Page 61: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

61WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed Computationwc the cloud with MapReduce

reduce(String key, Iterator values): // key: a word // values: a list of counts int result = 0; for each v in values: result += ParseInt(v); Emit(AsString(result));

*MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004. http://labs.google.com/papers/mapreduce.html

Page 62: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

62WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Distributed ComputationObservations● MapReduce framework

– manages the job– job decomposition into independent functional units

● user provides the data transformation operations– map & reduce functions only

● work is distributed over multiple nodes– what is the performance bottleneck here?– how can we improve upon this?

Page 63: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

63WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

More ExamplesDistributed Grep● map → emits a line if it matches a supplied pattern● reduce → copies the supplied intermediate data to the

output

URL Access Frequency● map → processes logs of web page requests and outputs

<URL, 1>● reduce → adds together all values for the same URL and

emits a <URL, total count>

*MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004. http://labs.google.com/papers/mapreduce.html

Page 64: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

64WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

More ExamplesInverted Index● map → parses each document, and emits a sequence of

<word, document ID>● reduce → for a given word,emits a <word, list(document

ID)> pair.

*MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December, 2004. http://labs.google.com/papers/mapreduce.html

Page 65: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

65WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Architecture – Best Practices*Scalability● “a scalable architecture is critical to take advantage of a

scalable infrastructure” *

Application Scalability Characteristics ● increase resources results in proportional increase in

performance● scalable service is capable of handling heterogeneity● scalable service is resilient

*Architecting for the Cloud: Best Practices - by J Varia - 2010

Page 66: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

66WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Architecture – Best Practices*Scaling & Elasticity

*Architecting for the Cloud: Best Practices - by J Varia - 2010

Page 67: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

67WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Architecture – Best Practices*Design for failure and nothing will fail● assume everything will fail and design backwards

– hardware failure, software failure– unexpected increase in system load

● avoid “single point of failure”– added redundancy and fault-tolerence

● plan for automated failure recognition, reporting, replacement

● ideas for data failure recovery?– partitioning, replication, write-logs

*Architecting for the Cloud: Best Practices - by J Varia - 2010

Page 68: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

68WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Architecture – Best Practices*Decouple System Components● web based application

– decouple web server from app server from database server– system components to act as black boxes– system components to interact via interfaces

*Architecting for the Cloud: Best Practices - by J Varia - 2010

Page 69: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

69WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Architecture – Best Practices*Decouple System Components

*Architecting for the Cloud: Best Practices - by J Varia - 2010

Page 70: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

70WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Architecture – Best Practices*Data Partitioning● dynamic data

– keep the data in the cloud if possible to avoid network latencies

– moving computation to the data● static data:

– utilize content delivery services to cache data at the edge locations (closer to the client/user)

– replication & caching

*Architecting for the Cloud: Best Practices - by J Varia - 2010

Page 71: These slides are NOT to be used as a replacement for student … · 2011-07-27 · These slides are NOT to be used as a replacement for student notes. These slides are sometimes vague

71WATERLOOCHERITON SCHOOL OFCOMPUTER SCIENCE

Cloud Architecture – Best Practices*Parallel Processing● “cloud makes parallelization effortless”● implement parallelization wherever posisble● automate parallelization● request parallelization

– via thread safe & share nothing principles● examples

– parallel hardware, data access, & computation

*Architecting for the Cloud: Best Practices - by J Varia - 2010