Advanced data management Jiaheng Lu Department of Computer Science Renmin University of China .

Post on 31-Mar-2015


Advanced data management

Jiaheng Lu

Department of Computer Science

Renmin University of China

www.jiahenglu.net

Course purpose

Taught in English

The objective is to expose graduate students to exciting data management topics

Course contents

Cloud computing and cloud data management

XML data management

Column-store database

Data processing in bioinformatics

Lecturer: academic experience

2006.9 ~ 2008.6: University of California, Irvine, postdoctoral researcher

2002.8 ~ 2006.8: National University of Singapore, PhD candidate

1998.9 ~ 2001.1: Shanghai Jiao Tong University, Master's candidate

University of California, Irvine

Postdoctoral research

Data integration in medical systems [US patent]

Approximate string search [ICDE08]

National University of Singapore

Course grading

Report 30%

Google App Engine 30%

In-class presence and quiz 40%

Any questions or comments?

Cloud computing

Why do we use cloud computing?

Case 1:

Write a file

Save

If the computer goes down, the file is lost

Files stored in the cloud are not lost when the local computer fails

Why do we use cloud computing?

Case 2:

Use IE --- download, install, use

Use QQ --- download, install, use

Use C++ --- download, install, use

……

Get the service from the cloud

What is the cloud and cloud computing?

Cloud

On-demand resources or services over the Internet, with the scale and reliability of a data center.

What is the cloud and cloud computing?

Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.

Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

The architecture of a cloud computing system

Characteristics of cloud computing

Virtual: software, databases, Web servers, operating systems, storage and networking are provided as virtual servers.

On demand: processors, memory, network bandwidth and storage can be added or removed as needed.
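The "on demand" characteristic can be illustrated with a minimal sketch of a scaling rule. Everything here is an assumption made for illustration: the function name `instances_needed`, the idea that load is measured in requests per second, and the fixed per-instance capacity are hypothetical and do not describe any particular provider.

```python
import math


def instances_needed(requests_per_second, capacity_per_instance, min_instances=1):
    """Return how many identical instances cover the current load.

    Hypothetical rule: each instance handles a fixed number of requests
    per second, and at least `min_instances` are always kept running.
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    required = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, required)


if __name__ == "__main__":
    # Load rises, so resources are added; load falls, so they are removed.
    for load in (50, 950, 120):
        print(load, "req/s ->", instances_needed(load, 100), "instance(s)")
```

A real cloud platform would base such decisions on measured metrics and add safeguards such as cool-down periods; the sketch only shows the add-and-subtract idea.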

Types of cloud service

IaaS: Infrastructure as a Service

PaaS: Platform as a Service

SaaS: Software as a Service

Software delivery model

No hardware or software to manage

Service delivered through a browser

Customers use the service on demand

Instant scalability

SaaS

Examples

Your current CRM package is not managing the load or you simply don’t want to host it in-house. Use a SaaS provider such as Salesforce.com

Your email is hosted on an Exchange server in your office and it is very slow. Outsource it using Hosted Exchange.

SaaS

Platform delivery model

Platforms are built upon Infrastructure, which is expensive

Estimating demand is not a science! Platform management is not fun!

PaaS

Examples

You need to host a large file (5 MB) on your website and make it available to 35,000 users for only two months. Use Amazon CloudFront.

You want to start storage services on your network for a large number of files and you do not have the storage capacity. Use Amazon S3.
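As a rough worked example for the first case, the total data transfer can be estimated. This sketch assumes the file is 5 megabytes and that each of the 35,000 users downloads it exactly once; the function name and the decimal conversion (1 GB = 1000 MB) are illustrative choices, not figures from the slides.

```python
def total_transfer_gb(file_size_mb, downloads):
    """Estimate total outbound transfer in gigabytes (1 GB = 1000 MB)."""
    return file_size_mb * downloads / 1000.0


if __name__ == "__main__":
    # 5 MB x 35,000 downloads = 175,000 MB = 175 GB
    print(total_transfer_gb(5, 35000))  # prints 175.0
```

Serving that volume for only two months is the kind of short-lived demand that is easier to rent than to provision in-house.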

PaaS

Computer infrastructure delivery model

A platform virtualization environment

Computing resources, such as storage and processing capacity.

Virtualization taken a step further

IaaS

Examples

You want to run a batch job but you don’t have the infrastructure necessary to run it in a timely manner. Use Amazon EC2.

You want to host a website, but only for a few days. Use Flexiscale.

IaaS

Cloud computing and other computing techniques

An Industry Transformed

http://www.boxofficemojo.com/

Delgo www.delgo.com

Shrek, Delgo, and Others

• Why did DreamWorks use this?

• Upsides?

• Downsides?

Grid Computing & Cloud Computing

They share a lot in common: intention, architecture and technology.

They differ in programming model, business model, compute model, applications and virtualization.

Grid Computing & Cloud Computing

The problems are mostly the same:

manage large facilities;

define methods by which consumers discover, request and use resources provided by the central facilities;

implement the often highly parallel computations that execute on those resources.

Grid Computing & Cloud Computing

Virtualization

Grid: grids do not rely on virtualization as much as clouds do; each individual organization maintains full control of its resources.

Cloud: virtualization is an indispensable ingredient for almost every cloud.

Any questions or comments?

Google App Engine

Does one thing well: running web apps

Simple app configuration

Scalable

Secure

App Engine Does One Thing Well

App Engine handles HTTP(S) requests, nothing else

Think RPC: request in, processing, response out

Works well for the web and AJAX; also for other services

App configuration is dead simple

No performance tuning needed
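The "request in, processing, response out" model can be sketched with a plain WSGI application. This is a minimal illustration using only the Python standard library (`wsgiref`), not App Engine's own API; the names `app` and `call_app` and the greeting logic are hypothetical.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Request in, processing, response out: one stateless handler."""
    # Request in: read whatever the client sent in the query string.
    name = environ.get("QUERY_STRING", "") or "world"
    # Processing: build the response body.
    body = ("Hello, %s!" % name).encode("utf-8")
    # Response out: status line, headers, then the body.
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(query_string=""):
    """Invoke the handler once, without a network, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a minimal fake request
    environ["QUERY_STRING"] = query_string
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)


if __name__ == "__main__":
    print(call_app("cloud"))
```

Because the handler keeps no state between calls, any number of copies can serve requests in parallel, which is what makes this style straightforward for a platform to scale.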

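App Engine Architecture

[Figure: a request/response (req/resp) is handled by a Python VM process containing the app and the stdlib, with a read-only file system (R/O FS). Stateful APIs: memcache, datastore. Stateless APIs: mail, images, urlfetch.]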

How to use Google App Engine

Download Java 6

Download Eclipse and the Google plugin

Register a Google user account

Create an application (Python, Java) and upload the code

In-class quiz

Please answer all questions

You may be requested to answer a question later. Your performance will affect your final score.

Study Google App Engine

http://code.google.com/intl/en/appengine/docs/java/gettingstarted/