DM in the Cloud 2010-02-18cs.brown.edu/people/tkraska/talks/DMCloud2010-02-18.pdf · Amazon Dynamo...


Data Management in the Cloud

Tim Kraska

Monday, 22 February 2010 Systems Group/ETH Zurich

[Analogy from IM 2/09 / Daniel Abadi]

MILK?

Do you want milk?

Buy a cow
- High upfront investment
- High maintenance cost
- Produces a fixed amount of milk
- Stepwise scaling

Buy bottled milk
- Pay-per-use
- Lower maintenance cost
- Linear scaling
- Fault-tolerant

Traditional Computing vs. Cloud Computing

Your computer is a cow
- High upfront investment
- High maintenance cost
- Fixed amount of resources
- Stepwise scaling

Cloud computing is bottled milk
- Pay-per-use
- Lower maintenance cost
- Linear scaling
- Fault-tolerant

Requirements for DM in the Cloud

- Scalability
  - response time independent of number of clients
- 100 percent read + write availability
  - no client is ever blocked under any circumstances
- Cost ($$$)
  - pay as you go along, no investment upfront
  - get cheaper every year, leverage new technology
- No administration
  - “outsource” patches, backups, fault tolerance

Consistency: Optimization goal, not constraint

Outline

- Motivation
- Camp 1: Relational DBs in the Cloud
  - Amazon RDS
  - MS Azure
- Camp 2: New Cloud DB/Storage System
  - Amazon Dynamo
  - Google’s MegaStore, BigTable, GFS
  - Building DB applications without a DBMS
- Analytics in the Cloud: Hadoop
- What’s next?

Why not use a standard relational DB?

Camp 1: Install standard DBMS in the Cloud

[Figure: a database (e.g., Oracle, DB2, MS SQL Server, MySQL, ...) running inside a virtual machine]

- Advantages
  - Traditional databases are available
  - Proven to work well; many tools
  - People trained and confident with them
- Disadvantages
  - Built for scale-up, not scale-out
  - Traditional DBMS solve the wrong problem anyway
    - Focus on throughput and consistency
    - SAP and Oracle misuse the DBMS already today
  - Traditional DBMS make the wrong assumptions
    - e.g., DBMS optimizers fail on virtualized hardware

CAP-Theorem

[Figure: the three CAP properties: Consistency, Availability, and Tolerance against Network Partitioning]

Amazon Relational Database Service (RDS)

[Figure: RDS architecture with a Client, an AppServer virtual machine, database virtual machines DB1, DB2, and DB3, logs/snapshots on EBS/S3, and the AWS services CloudWatch, Elastic Load Balancing, Auto Scaling, and CloudFront]

[aws.amazon.com/rds]

MS Azure - Architecture

[Figure: SQL Azure architecture in three tiers. Client Tier: ADO.Net client, REST, SOAP. SDS Service Tier: SDS Runtime. Storage Tier: a Distributed SQL Data Cluster of Data Nodes, each with SQL Server, Fabric, and Mgmt. Services, with Fabric Replication between nodes.]

[www.microsoft.com/windowsazure/sqlazure]

Camp 2: Rethink the whole system architecture

- Rethink the whole system architecture
  - Do NOT use a traditional DBMS
  - Build new systems tailored for cost efficiency, availability, and scalability, and see consistency as an optimization goal
- Advantages and Disadvantages
  - Requires a new breed of (immature) systems + tools
  - Solves the right problem and gets it right
  - Optimized for cost, availability, scalability, ...
  - Leverages organization’s investments in SOA

Key/Value-Store: Example Dynamo

- Data model: Key -> Value
- Operations
  - Put(Key, Value)
  - Get(Key)
  - NO-SQL!!!!
- Routing
  - Consistent Hashing
  - Partitioned Hash Table
- High Availability
  - Replication
  - Sloppy Quorum and hinted handoff
  - Gossiping (Membership)
- Consistency
  - Vector Clocks
  - Merkle Trees (Anti-Entropy)

[Figure: a ring of nodes partitioned by key range (A-C, D-G, H-M, N-P, Q-U, V-Z), with the request Put(“Tim”, “ETH Zurich”)]

[SOSP 07]
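The routing and replication bullets above can be made concrete with a small sketch. This is a toy, single-process illustration of consistent hashing with a replication preference list, not Dynamo's implementation: the class name HashRing, the node names, the replica count of 3, and the use of MD5 to place nodes on the ring are all assumptions made for the example. The slide's figure partitions the ring by letter ranges for readability; the sketch hashes keys instead.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a position on the ring (MD5 is used only to spread values)."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hashing ring; every name here is hypothetical."""

    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        self.store = {node: {} for node in nodes}      # per-node key/value dict
        self.ring = sorted((ring_position(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def preference_list(self, key):
        """Return the first `replicas` distinct nodes clockwise from the key."""
        start = bisect.bisect_right(self.positions, ring_position(key))
        count = min(self.replicas, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def put(self, key, value):
        # Write to every replica in the preference list.
        for node in self.preference_list(key):
            self.store[node][key] = value

    def get(self, key):
        # Read from the first node in the preference list (the coordinator).
        coordinator = self.preference_list(key)[0]
        return self.store[coordinator].get(key)


ring = HashRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
ring.put("Tim", "ETH Zurich")
print(ring.get("Tim"))              # ETH Zurich
print(ring.preference_list("Tim"))  # the three nodes holding the key
```

Because a key is owned by the next nodes clockwise from its hash, adding or removing one node only moves the keys adjacent to it on the ring; sloppy quorums and hinted handoff, which the sketch omits, build on the same preference list.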

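The vector clocks listed under Consistency can be sketched in a few lines. The code below is a generic illustration of how vector clocks detect whether one version supersedes another or whether two versions are concurrent and need reconciliation; the function names and the node labels X, Y, Z are assumptions for the example, not identifiers from the Dynamo paper.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every event that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify the causal relationship between two clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"


def merge(a, b):
    """Element-wise maximum, used after the application reconciles two versions."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


v1 = increment({}, "X")   # first write, coordinated by node X
v2 = increment(v1, "Y")   # update of v1 handled by node Y
v3 = increment(v1, "Z")   # independent update of v1 handled by node Z

print(compare(v2, v1))    # a supersedes b
print(compare(v2, v3))    # concurrent
print(merge(v2, v3))
```

Concurrent versions are the case where the store cannot pick a winner by itself; both versions are kept until a reader or the application merges them.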

Google’s Approach

[Figure: Google’s storage stack]
- Chubby (Locking Service)
- Google File System (GFS): Distributed Storage (made for append-only)
- BigTable: Distributed Multi-Dimensional Map
- MegaStore: Transactions, Indexes, GQL
- Scheduling

BigTable

- Distributed multi-level map (again no SQL)
- Fault-tolerant, persistent
- Scalable
  - Thousands of servers
  - Terabytes of in-memory data
  - Petabytes of disk-based data
  - Millions of reads/writes per second, efficient scans
- Self-managing
  - Servers can be added/removed dynamically
  - Servers adjust to load imbalance

[SOSP 03]

BigTable Architecture

[Figure: a Bigtable cell. Bigtable clients go through the Bigtable client library. The Bigtable master performs metadata ops and load balancing; the Bigtable tablet servers serve data. Underlying services: Cluster Scheduling Master (handles failover, monitoring), GFS (holds tablet data, logs), and Chubby, the locking service (holds metadata, handles master-election).]

[SOSP 03]

BigTable DataModel

- Distributed multi-dimensional sparse map: (key, column, timestamp) -> cell contents

[Figure: the row with key “www.cnn.com” has a column “contents” that holds several timestamped versions (t1, t2, t3) of the page, “<html>…”]

[SOSP 03]
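The data model above maps naturally onto nested dictionaries. The following sketch illustrates the (key, column, timestamp) -> contents idea only; the class SparseTable and its methods are hypothetical names, it keeps everything in memory, and it is not Bigtable's API.

```python
from collections import defaultdict


class SparseTable:
    """Toy (row key, column, timestamp) -> contents map; absent cells take no space."""

    def __init__(self):
        # row key -> column -> {timestamp: contents}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, contents):
        self.rows[row][column][timestamp] = contents

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (latest if None)."""
        versions = self.rows.get(row, {}).get(column, {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None


table = SparseTable()
table.put("www.cnn.com", "contents", 1, "<html>v1")
table.put("www.cnn.com", "contents", 2, "<html>v2")
table.put("www.cnn.com", "contents", 3, "<html>v3")

print(table.get("www.cnn.com", "contents"))               # <html>v3
print(table.get("www.cnn.com", "contents", timestamp=2))  # <html>v2
```

Keeping several timestamped versions per cell is what lets a reader ask for the page as it looked at an earlier time without any extra schema.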


Building DB applications without a DBMS

Concept commercialized in Sausalito (28msec, Inc.) [SIGMOD 08]

Step 1: Clients commit update records to pending update queues

[Figure: clients write update records to the Pending Update Queues (SQS), which sit in front of S3]

[SIGMOD 08]

Step 2: Checkpointing propagates updates from SQS to S3

[Figure: Pending Update Queues (SQS), Locks (SQS), S3, and the clients; checkpointing is coordinated through the lock queues (“ok”)]

[SIGMOD 08]
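The two steps can be sketched with in-memory stand-ins. Everything below is an assumption made for illustration: a dict plays the role of S3, one deque per page plays the role of a pending update queue, and a set of page ids plays the role of the lock queues. No AWS API is called, and the real protocol has to cope with failures and timeouts that this sketch ignores.

```python
from collections import deque

s3 = {}        # stand-in for S3: page id -> dict of records
pending = {}   # stand-in for SQS: page id -> queue of update records
locks = set()  # stand-in for the lock queues: page ids currently locked


def commit(page_id, key, value):
    """Step 1: a client appends an update record to the page's pending queue."""
    pending.setdefault(page_id, deque()).append((key, value))


def checkpoint(page_id):
    """Step 2: apply the queued updates to the page in S3 while holding its lock."""
    if page_id in locks:
        return False                    # another client is already checkpointing
    locks.add(page_id)
    try:
        page = dict(s3.get(page_id, {}))
        queue = pending.get(page_id, deque())
        while queue:
            key, value = queue.popleft()
            page[key] = value           # later updates overwrite earlier ones
        s3[page_id] = page
    finally:
        locks.discard(page_id)
    return True


commit("page-1", "a", 1)
commit("page-1", "b", 2)
commit("page-1", "a", 3)
print(s3.get("page-1"))   # None: updates are still only in the queue
checkpoint("page-1")
print(s3["page-1"])       # {'a': 3, 'b': 2}
```

Separating commit from checkpoint is the point of the design: a client's commit only needs an append to a queue, and the more expensive write to the store happens later, under a lock, for many updates at once.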

Cloud Storage/DB Systems

- Commercial
  - Azure SQL, Store
  - Google MegaStore, BigTable
  - Amazon S3, SimpleDB, RDS, CloudFront
  - 28msec Sausalito
  - …
- Open source
  - HBase (≈ BigTable)
  - CouchDB/Scalaris (≈ Dynamo)
  - Cassandra (≈ Dynamo + BigTable data model)
  - Redis
  - Cloudy (ETHZ)
  - SCADS (Berkeley)
  - …


Why MapReduce?

Big Data

Potential Applications

- Web data analysis applications
  - Example: Google

Google: The Data Challenge

- Jeffrey Dean, Google Fellow, PACT’06 keynote speech:
  - 20+ billion web pages x 20 KB = 400 TB
  - One computer can read 30-35 MB/sec from disk
    - ~ 4 months to read the web
  - ~ 1,000 hard drives just to store the web
  - Even more to “do” something with the data
  - But: same problem with 1,000 machines: < 3 hours
- MapReduce CACM’08 article:
  - 100,000 MapReduce jobs executed in Google every day
  - Total data processed > 20 PB per day
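The back-of-the-envelope figures above can be checked with a few lines of arithmetic. The sketch assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and perfectly parallel reads across machines; the function names are made up for the example.

```python
def corpus_bytes(pages=20e9, page_size=20e3):
    """Size of the crawl: 20 billion pages at 20 KB each."""
    return pages * page_size


def read_seconds(total_bytes, mb_per_sec, machines=1):
    """Time to scan the corpus once, assuming reads parallelize perfectly."""
    return total_bytes / (mb_per_sec * 1e6 * machines)


total = corpus_bytes()
print(f"corpus size: {total / 1e12:.0f} TB")

for rate in (30, 35):
    days_single = read_seconds(total, rate) / 86_400
    hours_cluster = read_seconds(total, rate, machines=1_000) / 3_600
    print(f"{rate} MB/s: {days_single:.0f} days on one machine, "
          f"{hours_cluster:.1f} hours on 1,000 machines")
```

Under these assumptions a single machine needs on the order of four to five months, and 1,000 machines need between three and four hours, which is the same order of magnitude as the "< 3 hours" quoted on the slide.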

Map/Reduce in a Nutshell

- Given:
  - a very large dataset
  - a well-defined computation task to be performed on elements of this dataset (preferably, in a parallel fashion on a large cluster)
- MapReduce programming model/abstraction/framework:
  - Just express what you want to compute (map() & reduce())
  - Don’t worry about parallelization, fault tolerance, data distribution, load balancing (MapReduce takes care of these)
  - What changes from one application to another is the actual computation; the programming structure stays similar
- Here is the framework in simple terms:
  - Read lots of data
  - Map: extract something that you care about from each record (similar to map in functional programming)
  - Shuffle and sort
  - Reduce: aggregate, summarize, filter, or transform (similar to fold in functional programming)
  - Write the results
- One can use as many Maps and Reduces as required to model a given problem

[OSDI 04]

Map/Reduce Example: Count Word Occurrences in a Document

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));

map(k1, v1) -> list(k2, v2)
reduce(k2, list(v2)) -> list(v2)
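For readers who want to run the word count, here is a single-process Python sketch of the same idea. It mirrors the map, shuffle-and-sort, and reduce phases but is not Google's or Hadoop's API: run_mapreduce, map_fn, and reduce_fn are names chosen for the example, counts are emitted as integers instead of strings, and the reducer emits (word, count) pairs so the output is self-describing.

```python
from collections import defaultdict


def map_fn(doc_name, contents):
    """Emit (word, 1) for every word in the document."""
    for word in contents.split():
        yield word, 1


def reduce_fn(word, counts):
    """Sum all counts emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, then group by key (shuffle and sort), then reduce, in one process."""
    groups = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in map_fn(key, value):
            groups[k2].append(v2)
    results = []
    for k2 in sorted(groups):
        results.extend(reduce_fn(k2, groups[k2]))
    return results


docs = [("doc1", "the cloud is the milk"), ("doc2", "the cow")]
print(run_mapreduce(docs, map_fn, reduce_fn))
# [('cloud', 1), ('cow', 1), ('is', 1), ('milk', 1), ('the', 3)]
```

Only map_fn and reduce_fn are specific to word counting; the driver stays the same for other jobs, which is the sense in which the programming structure stays similar across applications.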

Google’s MapReduce Model

[OSDI 04]

Parallel Dataflow Systems and Languages

High-level Dataflow Languages:  Sawzall   | Pig Latin, HiveQL | DryadLINQ | ...
Parallel Dataflow Systems:      MapReduce | Apache Hadoop     | Dryad     | ...

- A high-level language provides:
  - more transparent program structure
  - easier program development and maintenance
  - automatic optimization opportunities


What’s next???

- Consistency (CAP Theorem)
  - How to balance consistency vs. cost vs. availability?
  - Does consistency limit the scalability?
  - How to scale transaction processing?
- Data model & languages
  - Is Key/Value really what we need?
  - Why do I need to implement a join by myself???
  - E.g. BOOM
- Other workload patterns
  - Streaming
  - Scientific
  - Data Mining
  - …
- Cost optimizations
  - New optimization goal: Cost (not performance)
  - SLAs imply implicit cost
- Benchmarks
- Security (don’t get me started)
- …


Consistency Rationing

- ACID prevents scaling & availability (CAP theorem)!
- Strong consistency is expensive
  - But not everything is worth gold!
- Idea: Handle data according to the cost of inconsistency
  - Violating consistency is OK as long as it helps to reduce the overall cost

[VLDB 09]

Consistency Rationing - Guarantees

[Figure: data categories C, B, and A plotted by amount of data against cost of violating consistency]

- C: Apply weak consistency protocols (e.g., session consistency)
- B: Switches between A and C guarantees
  - Depends on some strategy
  - Decision is local per server
- A: Apply strong consistency protocols (e.g., serializability)

- Consistency requirements per category instead of transaction level
- Different categories can mix in a single transaction

[VLDB 09]
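The category scheme can be sketched as a small lookup. The slide only says that category B "depends on some strategy"; the fixed threshold on an estimated conflict probability used below is a hypothetical stand-in for such a strategy, and the names Category, choose_protocol, and the example records are assumptions made for illustration.

```python
from enum import Enum


class Category(Enum):
    A = "A"   # inconsistency is expensive: always strong consistency
    B = "B"   # in between: decided at runtime
    C = "C"   # inconsistency is cheap: always weak consistency


def choose_protocol(category, conflict_probability=0.0, threshold=0.1):
    """Pick a consistency protocol for one record.

    The threshold rule for category B is an assumed placeholder strategy.
    """
    if category is Category.A:
        return "serializability"
    if category is Category.C:
        return "session consistency"
    if conflict_probability >= threshold:
        return "serializability"
    return "session consistency"


# One transaction may touch records from different categories.
records = {
    "account_balance": Category.A,
    "stock_level": Category.B,
    "user_preferences": Category.C,
}
plan = {name: choose_protocol(cat, conflict_probability=0.3)
        for name, cat in records.items()}
print(plan)
```

Because the decision is made per record, a single transaction can pay for strong consistency only on the records where a violation would be costly.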

Streaming in the Cloud - Smoky

- Provide streaming as a service: no installation, maintenance, etc.
- Use cases
  - Server/workflow monitoring
  - Message transformations
  - Combining RSS streams
  - ...
- Leverage established cloud techniques
- Idea:
  - Storage ring: Key -> Data
  - Stream ring: Event-type -> Query

[Figure: a ring partitioned by event type (S1-S3, S4-S5, S6-S10, S11-S13, S14-S20), with events E:S1, E:S2, E:S7, E:S7(S11), E:S8, E:S11, E:S12 and queries Q1-Q4 placed on it]

Other (selected) ETH projects

- CloudBench
  - A benchmark for the cloud
- Barrelfish
  - new written-from-scratch OS kernel
  - targeted at multi- and many-core processor systems
- Rhizoma
  - constraint-based runtime system for distributed applications
  - self-hosting
- Concierge
  - blur the border between mobile and cloud applications
- Xadoop
  - XQuery on Hadoop

Summary

- Cloud has changed the view on data management
  - Focus on scalability, availability and cost
  - Instead of consistency and performance at any price
  - E.g. No-SQL Movement
- As a result, many Cloud Storage/DB Systems have appeared on the marketplace
- Although (or because) the cloud is economically driven, it provides many interesting research challenges