
Database Technology

Prof. Hyoung-Joo Kim Internet Database Lab

School of Computer Sci & Eng, Seoul National University

Contents

Research in IDB Lab.

• A general survey of DBMS
• History of DBMS
• Database market share
• The current DBMS trend

What is a Database? (1/10)

DBMS: a software system that provides an environment for storing and retrieving massive amounts of data effectively

What is a Database? (2/10)

A large collection of data

Data + Programs

[Figure: many individual pieces of data being stored (STORE) in a database]

What is a Database? (3/10)

Registration and course information for the 40,000 students of Seoul Nat'l Univ.

Fields: course, term, register, grade, prof

45 courses, about 10 KB of records per student

10 KB × 40,000 = 400 MB

Others: library, health center, S-card, …

What is a Database? (4/10)

SAT management information

Fields: profile, answer, rate, ranking, …

About 8 KB of records per student

8 KB × 550,000 = 4.4 GB (G = 10^9)

Year 2006: 550,000 students

Year 2005: 570,000 students

What is a Database? (5/10)

Mobile phone call information

Fields: phone number, station, time, …

60 bytes per call record

39 M × 60 bytes × 5 calls/day × 365 days ≈ 4 TB (Korea, July 2006: 39 M)

China: 370 M in 2005
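The size estimates on these slides are simple products of record size and record count. A minimal sketch of the arithmetic, assuming decimal units (1 KB = 1,000 bytes), which is what the slide totals imply; the function and variable names are illustrative only:

```python
# Back-of-envelope storage estimates using the figures quoted on the slides.
# Assumption: decimal units (1 KB = 1,000 bytes).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def total_bytes(record_bytes, count):
    """Total size of `count` records of `record_bytes` bytes each."""
    return record_bytes * count


university = total_bytes(10 * KB, 40_000)        # course registration records
sat = total_bytes(8 * KB, 550_000)               # SAT records, 2006
mobile = total_bytes(60, 39_000_000 * 5 * 365)   # 60-byte call records, one year

print(f"university: {university / MB:.0f} MB")
print(f"SAT:        {sat / GB:.1f} GB")
print(f"mobile:     {mobile / TB:.1f} TB")
```

The mobile phone total grows with time as well as with the number of subscribers, which is why it dwarfs the other two.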

What is a Database? (6/10)

Resident registration information

Fields: SSN, name, addr, domicile, …

10 KB per resident record

10 KB × 47 M = 470 GB (47 million residents)

What is a Database? (7/10)

Google database

Management of 8 billion web pages and 2 billion index terms

Usenet archive = 700 million messages × 20 KB/message = 14 TB

What is a Database? (8/10)

Hubble space telescope data from Mars

Data accumulated by 2005: over 12 TB

Generates and transmits 3 to 5 GB of data daily

What is a Database? (9/10)

NCBI (National Center for Biotechnology Information)

GenBank
• manages information on 165,000 species
• adds 3 million new DNA sequences monthly

What is a Database? (10/10)

Genome map of Koreans

Venture "MacroGen", SNU Medical School

Early version: 900 GB

Final product: 15 TB

What do we do with Database? (1/2)

Record search: retrieve the math grade of the student whose SSN is "840101-12121"

Without a DBMS: 12 ms to fetch a record and check its content

740,000 × 5 records = 3.7 M records

3.7 M × 12 ms = 44.4 K seconds = over 12 hours

With a DBMS, it takes less than 0.1 sec!

Statistical processing for a population census

Search for correlations between genes and diseases

Search for purchase patterns across customer groups
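The gap between "over 12 hours" and "less than 0.1 sec" comes from checking every record versus going straight to the right one through an index. A minimal sketch using Python's built-in sqlite3 module; the `grade` table, its columns, and the scores are invented for illustration, and only the 12 ms and 740,000 × 5 figures come from the slide:

```python
import sqlite3

# Figures from the slide: 12 ms per record, 740,000 * 5 records.
FETCH_MS = 12
RECORDS = 740_000 * 5

scan_hours = RECORDS * FETCH_MS / 1000 / 3600   # cost of checking every record

# Hypothetical table and data; the composite primary key gives SQLite an index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grade (ssn TEXT, subject TEXT, score INTEGER, "
    "PRIMARY KEY (ssn, subject))"
)
conn.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [("840101-12121", "math", 91),
     ("840101-12121", "physics", 78),
     ("850202-34343", "math", 64)],
)

query = "SELECT score FROM grade WHERE ssn = ? AND subject = ?"
args = ("840101-12121", "math")
score = conn.execute(query, args).fetchone()[0]

# The last column of each EXPLAIN QUERY PLAN row describes the access path.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, args))

print(f"full scan estimate: {scan_hours:.1f} hours")
print("math score:", score)
print("access path:", plan)
```

The access path printed at the end should mention an index search rather than a table scan, which is the point of the slide's comparison.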

What do we do with Database? (2/2)

Most (all?) computing applications use some type of a database

[Figure: EDPS, MIS, ERP, OLTP, Data Warehouse, and CRM applications, each built on a database]

Database Management System (DBMS) (1/3)

[Figure: a warehouse]

Database Management System (DBMS) (2/3)

[Figure: a warehouse and a warehouse keeper]

Database Management System (DBMS) (3/3)

[Figure: users work with applications (on-line order management, wage management, manager information management); the applications reach the database (profile, sale, stock, product, customer) through the DBMS]

DBMS Architecture

[Figure: DBMS architecture, from users at the top to disk storage at the bottom]

Users: naive users, application programmers, casual users, database administrator

Interfaces: application programs, system calls, query, database scheme

DBMS components: data manipulation language pre-compiler, query processor, data definition language compiler, application programs object code, database manager, file manager

Disk storage


A Sample Relational Database

SQL

SQL: widely used commercial query language

E.g. find the name of the customer with customer-id 192-83-7465

    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'

E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465

    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465' and
          depositor.account-number = account.account-number
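The two queries can be tried against a small in-memory database. A minimal sketch with Python's sqlite3 module, under two assumptions: the hyphenated names on the slide are written with underscores (hyphens are not valid in unquoted SQL identifiers), and all rows are invented sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

-- Invented sample rows, for illustration only.
INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith');
INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
INSERT INTO depositor VALUES ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201'),
                             ('019-28-3746', 'A-215');
""")

# First query: the name of one customer.
name = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    ("192-83-7465",),
).fetchone()[0]

# Second query: join depositor and account to list that customer's balances.
balances = [row[0] for row in conn.execute(
    """SELECT account.balance
       FROM depositor, account
       WHERE depositor.customer_id = ?
         AND depositor.account_number = account.account_number
       ORDER BY account.account_number""",
    ("192-83-7465",),
)]

print(name)
print(balances)
```

The customer-id is passed as a bound parameter rather than pasted into the query text, which is the usual practice when the value comes from user input.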

Major Commercial DBMS in 2006 (1/3)

Oracle 10g

Market leader

Stability

Proficiency with mass storage

Famous CEO

Major Commercial DBMS in 2006 (2/3)

Microsoft!!!

PC based (Windows NT)

Integration with Windows NT/XP

Major Commercial DBMS in 2006 (3/3)

IBM

Stability

Mainframe

Acquisition of Informix


Database Companies in the World

Contents

Research in IDB Lab.

• A general survey of DBMS
• History of DBMS
• Database market share
• The current DBMS trend

Hierarchical, Network DBMS

The early 70s

IMS (IBM), System/2000 (MRI)

DMS 1100 (Sperry), Total (Cincom)

Advantage: fast data access by following links

Drawback: impossible to write applications that are independent of the database structure

Network Database example

Query: What's the total balance of Mr. Shiver in Bronx?

[Figure: a root record linked to customer records (Lowery, Maple, Queens), (Hodges, SideHill, Brooklyn), and (Shiver, North, Bronx), which are in turn linked to amount records 900, 556, 647, 647, 801]

Network DB query example

    sum := 0;
    get first customer
        where customer.name = "Shiver" and customer.city = "Bronx";
    while DB_status = 0 do
    begin
        sum := sum + customer.amount;
        get next customer
            where customer.name = "Shiver" and customer.city = "Bronx";
    end
    print(sum);
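The loop above is navigational: the program itself walks from record to record and checks a status flag after each step. A rough Python imitation of that style, assuming a single chain of customer records; the `Cursor` class and its `status` field are stand-ins invented here for the "get first / get next" calls and `DB_status`, not a real network DBMS API:

```python
# Sample records follow the deck's customer example; the chain layout is assumed.
rows = [
    ("Lowery", "Maple", "Queens", 900),
    ("Shiver", "North", "Bronx", 556),
    ("Shiver", "North", "Bronx", 647),
    ("Hodges", "SideHill", "Brooklyn", 801),
    ("Hodges", "SideHill", "Brooklyn", 647),
]

# Build a linked chain: each record points to the next one.
head = None
for name, street, city, amount in reversed(rows):
    head = {"name": name, "street": street, "city": city,
            "amount": amount, "next": head}


class Cursor:
    """Navigational cursor over a linked chain of records."""

    def __init__(self, head):
        self.head = head
        self.current = None
        self.status = 0   # 0 = record found, 1 = end of chain (mimics DB_status)

    def _seek(self, start, pred):
        node = start
        while node is not None and not pred(node):
            node = node["next"]
        self.current = node
        self.status = 0 if node is not None else 1

    def get_first(self, pred):
        self._seek(self.head, pred)

    def get_next(self, pred):
        self._seek(self.current["next"], pred)


def wanted(record):
    return record["name"] == "Shiver" and record["city"] == "Bronx"


cur = Cursor(head)
total = 0
cur.get_first(wanted)
while cur.status == 0:
    total += cur.current["amount"]
    cur.get_next(wanted)

print(total)
```

Every application that needs a different question answered has to write a loop like this one, which is the dependence on database structure that the relational model set out to remove.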

Relational DBMS

The late 70s and early 80s

E. F. Codd, 1970 CACM paper, "A Relational Model of Data for Large Shared Data Banks"

Relational Algebra & Calculus

The Spartan Simplicity!

SQL: Structured Query Language

System R - 1976, first industrial RDBMS prototype (IBM)

Ingres - 1976, first academic RDBMS

Relational DBMS example

    select sum(amount)
    from customer
    where customer.name = 'Shiver' and customer.city = 'Bronx';

name     street     city       amount
Lowery   Maple      Queens     900
Shiver   North      Bronx      556
Shiver   North      Bronx      647
Hodges   SideHill   Brooklyn   801
Hodges   SideHill   Brooklyn   647
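The same question takes one declarative statement in the relational model. A minimal sketch that loads the five rows of the example table into an in-memory SQLite database and runs the query; sqlite3 is used only as a convenient stand-in for a relational DBMS:

```python
import sqlite3

# The five rows of the example customer table.
rows = [
    ("Lowery", "Maple", "Queens", 900),
    ("Shiver", "North", "Bronx", 556),
    ("Shiver", "North", "Bronx", 647),
    ("Hodges", "SideHill", "Brooklyn", 801),
    ("Hodges", "SideHill", "Brooklyn", 647),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT, street TEXT, city TEXT, amount INTEGER)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)

# The query states what is wanted; the DBMS decides how to find it.
total = conn.execute(
    "SELECT sum(amount) FROM customer "
    "WHERE customer.name = 'Shiver' AND customer.city = 'Bronx'"
).fetchone()[0]

print(total)
```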

The advent of new DB applications in the 80s (1/4)

Need for a rich data model & DBMS functions

CAD/CASE/CAM: massive design data

Artificial Intelligence: Expert systems

Telecommunication

Multimedia: IMAGE, TEXT, AUDIO, VIDEO, etc.

The advent of new DB applications in the 80s (2/4)

Massive design data in CAD/CASE/CAM

[Figure: previous data (the flat customer table of name, street, city, amount) contrasted with CAD data]

The advent of new DB applications in the 80s (3/4)

Artificial Intelligence: Expert systems

[Figure: previous data (the flat customer table of name, street, city, amount) contrasted with expertise data: a diagnosis tree for a vehicle disorder (Control, Drive; Brake, Handle, Gearbox, Engine) that leads from symptoms to the conclusion "engine ECU disorder"]

The advent of new DB applications in the 80s (4/4)

Multimedia: image, audio, video

[Figure: previous data (the flat customer table of name, street, city, amount) contrasted with multimedia data]

Advent of Object Oriented DBMS

The mid 80s ~ mid 90s

Research prototypes: ORION, POSTGRES, ENCORE/ObServer

Commercial products: O2, ObjectStore, Objectivity, Versant, etc.

ODMG-93 OODB standard

Features of Object Oriented DBMS

Object-oriented paradigm support: object, object identity, class hierarchy, inheritance

Semantic data model extension: version & composite object

Persistent programming language

Large object

Long-duration transaction

A return to the traversal style of network DBs?

Object Oriented Database example

[Figure: the customer data (name, street, city, amount) modeled as objects connected by ISA relationships and is-part-of relationships]

OQL query of Object Oriented DBMS

    select sum(customer.deposit.balance)
    from Customer customer
    where customer.name = "Shiver"
      and customer.deposit.branch.city = "Bronx";
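The OQL query reaches related objects through path expressions such as customer.deposit.branch.city instead of joining tables. A loose Python analogy, assuming each customer holds a list of deposits and each deposit refers to a branch; the class layout and the sample objects (including the 300 deposit in Queens) are invented for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    city: str


@dataclass
class Deposit:
    balance: int
    branch: Branch


@dataclass
class Customer:
    name: str
    deposits: list = field(default_factory=list)


bronx, queens = Branch("Bronx"), Branch("Queens")
customers = [
    Customer("Shiver", [Deposit(556, bronx), Deposit(647, bronx), Deposit(300, queens)]),
    Customer("Lowery", [Deposit(900, queens)]),
]

# Follow object references (customer -> deposit -> branch) instead of joining tables.
total = sum(
    deposit.balance
    for customer in customers
    if customer.name == "Shiver"
    for deposit in customer.deposits
    if deposit.branch.city == "Bronx"
)

print(total)
```

Unlike the network DB loop, the traversal here is still written as one declarative expression, which is roughly the balance OQL aims for.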

Object Relational DBMS

Relational DBMS with object-oriented functions: extension within SQL & tables!

1980 – 1985: ORDBMS research prototypes: PostGres by UC Berkeley, System/R Engineering Extension

The early 90s: downfall of OODBMS (Illustra, UniSQL, Matisse)

1997: advent of the Big 3 ORDBMS products

Object Relational Database example

name     street     city       amount
Lowery   Maple      Queens     900
Shiver   North      Bronx      556
Shiver   North      Bronx      647
Hodges   SideHill   Brooklyn   801
Hodges   SideHill   Brooklyn   647

Principal functions of Object Relational DBMS

LOB (large object) support

Abstract data type support

Type inheritance support

User-defined type & stored procedure support

Application domain specific extension support

SQL procedure extension

Rule/trigger system support
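Two of these functions, user-defined routines and rules/triggers, can be tried out on a small scale. A minimal sketch with Python's sqlite3 module, with the caveat that SQLite is not an object relational DBMS and is used here only because it ships with Python; the `city_code` function, the tables, and the rows are invented for illustration:

```python
import sqlite3


def city_code(city):
    """User-defined function: a made-up three-letter code for a city name."""
    return city[:3].upper()


conn = sqlite3.connect(":memory:")
conn.create_function("city_code", 1, city_code)   # callable from SQL from now on

conn.executescript("""
CREATE TABLE customer (name TEXT, city TEXT, amount INTEGER);
CREATE TABLE audit_log (name TEXT, old_amount INTEGER, new_amount INTEGER);

-- Rule/trigger: record every change to an amount.
CREATE TRIGGER log_amount AFTER UPDATE OF amount ON customer
BEGIN
    INSERT INTO audit_log VALUES (OLD.name, OLD.amount, NEW.amount);
END;

INSERT INTO customer VALUES ('Shiver', 'Bronx', 556);
UPDATE customer SET amount = 600 WHERE name = 'Shiver';
""")

codes = conn.execute("SELECT name, city_code(city) FROM customer").fetchall()
log = conn.execute("SELECT * FROM audit_log").fetchall()

print(codes)
print(log)
```

In both cases logic that used to live in each application moves into the database, where every application shares it.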


Product of Object Relational DBMS

ORACLE-8 Universal Server

Informix Universal Server

IBM DB2 Universal Database

Sybase Adaptive Server

Microsoft Access

Contents

Research in IDB Lab.

• A general survey of DBMS
• History of DBMS
• Database market share
• The current DBMS trend

DBMS market share (1/2)

Worldwide market share for the biggest sellers of corporate databases, 2005

Oracle      48.6%
IBM         22%
Microsoft   15%

Source: Gartner Dataquest

DBMS market share (2/2)

Worldwide sales for the biggest sellers of corporate databases, 2005 (billions of dollars)

Oracle      $6.7
IBM         $3.0
Microsoft   $2.1

Source: Gartner Dataquest

Domestic DBMS market share

Source: Report for database industry and perspective in Korea, 2004

Domestic DBMS market sales

Domestic sales for the biggest sellers of corporate databases, 2004 (billions of won)

Oracle      ₩ 57.2
IBM         ₩ 25.1
Microsoft   ₩ 45.3

Source: Gartner Dataquest, South Korea (2005)

Preference in domestic market

Others 3%

Source: Report for database industry and perspective in Korea, 2004

Contents

Research in IDB Lab.

• A general survey of DBMS
• History of DBMS
• Database market share
• The current DBMS trend

XML Technology (1/2)

The late 90s and now

What is XML (eXtensible Markup Language)?

Developed by the W3C

Semi-structured text for dissemination and publication

Self-describing

HTML: tagging for display

    <tr> <td> <font color="red"> Name </font> </td> <td> Hong Gil-dong </td></tr>
    <tr> <td> <b> Address </b> </td>

XML: tagging for structure and semantics

    <person> <name> Hong Gil-dong </name> <city> Seoul </city> <age>20</age> … </person>

XML Technology (2/2)

Why XML?

Standard data format for storage and exchange

    <person> <name> Hong Gil-dong </name> <city> Seoul </city> … </person>
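Because the tags describe structure and meaning rather than display, a program can pick out fields by name. A minimal sketch with Python's standard xml.etree.ElementTree module, assuming a complete version of the person record (the ellipsis on the slide is left out, and the name is written in English):

```python
import xml.etree.ElementTree as ET

# A complete person record modeled on the slide's example.
doc = "<person><name>Hong Gil-dong</name><city>Seoul</city><age>20</age></person>"

person = ET.fromstring(doc)

# Tag names carry the meaning, so they can serve directly as field names.
record = {child.tag: child.text for child in person}
age = int(person.findtext("age"))

print(record)
print(age + 1)
```

Doing the same with the HTML version would mean guessing which table cell holds which field, since its tags only say how the text should look.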

Semantic Web (1/2)

The existing web:

1) The patient searches for dental clinics with a search engine
2) The patient finds the home pages of clinics near their location
3) The patient checks each clinic's schedule and makes an appointment if the time fits

Many repetitive steps are needed before an appointment is made

[Figure: the patient goes back and forth between the search engine, the clinics' web pages, and the appointment schedule]

Semantic Web (2/2)

Semantic web:

The following information has already been built with the Semantic Web: the patient's personal schedule, each dental clinic's location, treatment subjects, treatment …

1) The patient asks a software agent to make an appointment
2) Even if the content or structure of each clinic's home page differs, the software agent analyzes the Semantic Web data of the patient and the clinics, and books a clinic that is available at the patient's time and location

[Figure: the patient delegates to software agents, which work with the clinics' web pages (with Semantic Web) and the appointment schedule]

Knowledge discovery

[Figure: data from the database and the data warehouse goes through knowledge discovery processing (data mining), which yields useful, interesting, hidden information; that information is applied to decisions]

Data warehouse (1/2)

Stores data over time

Analyzes patterns over time

Summarized data

Observes data from various viewpoints

Non-volatile

Need for a new data model: the dimensional model

Data warehouse (2/2)

[Figure: a cube of sales volumes with three dimensions]

Time: Jan, Feb, Mar

Product: A, B, C

Sales person: Wong, Stonebreaker, Dewitt
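In a dimensional model each fact (here a sales volume) is addressed by one value per dimension, and analysis amounts to summing the facts along the dimensions of interest. A minimal sketch, assuming the three dimensions of the cube above; every sales volume is invented, and `roll_up` is an illustrative helper, not a data warehouse API:

```python
from collections import defaultdict

# (month, product, salesperson) -> sales volume; all volumes are invented.
facts = {
    ("Jan", "A", "Wong"): 10,
    ("Jan", "B", "Dewitt"): 4,
    ("Feb", "A", "Wong"): 7,
    ("Feb", "C", "Stonebreaker"): 12,
    ("Mar", "B", "Dewitt"): 5,
    ("Mar", "A", "Stonebreaker"): 3,
}
DIMS = ("time", "product", "salesperson")


def roll_up(facts, keep):
    """Sum the measure over every dimension not named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, volume in facts.items():
        totals[tuple(key[i] for i in idx)] += volume
    return dict(totals)


by_month = roll_up(facts, ["time"])
by_product_person = roll_up(facts, ["product", "salesperson"])

print(by_month)
print(by_product_person)
```

Each call looks at the same cube from a different viewpoint, which is what "observation from various viewpoints" on the previous slide refers to.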

Data mining (1/2)

Broad sense: covers everything from the stage of extracting the target data to the stage of refining and interpreting the discovered patterns and expressing them in a form people can understand [text, pictures, graphics]

Narrow sense: the use of various algorithms [data mining algorithm] or software that extract interesting, human-understandable patterns and regularities from large volumes of data

Data mining (2/2)

Pattern discovery

80% of the people who buy bread and snacks also buy milk

74% of the people who buy baby formula and diapers also buy beer

Decision making

Beer consumption affects the consumption of baby formula and diapers

A rise in bread and snack prices affects milk consumption

Business application

Display (bread, snacks, milk) and (baby formula, diapers, beer) together on the shelves

Adjust bread and snack prices to control milk consumption
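A figure such as "80% of the people who buy bread and snacks also buy milk" is the confidence of an association rule: among the baskets that contain the first item set, the fraction that also contain the second. A minimal sketch over a handful of invented shopping baskets, chosen so that the bread-and-snacks rule comes out at 80%:

```python
# Invented shopping baskets, for illustration only.
baskets = [
    {"bread", "snacks", "milk"},
    {"bread", "snacks", "milk", "beer"},
    {"bread", "snacks"},
    {"bread", "snacks", "milk"},
    {"bread", "snacks", "milk"},
    {"formula", "diapers", "beer"},
    {"milk"},
]


def support(itemset):
    """Fraction of all baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent):
    """Among baskets containing `antecedent`, the fraction also containing `consequent`."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    return sum(consequent <= b for b in with_antecedent) / len(with_antecedent)


conf = confidence({"bread", "snacks"}, {"milk"})
supp = support({"bread", "snacks", "milk"})

print(f"confidence(bread, snacks -> milk) = {conf:.0%}")
print(f"support(bread, snacks, milk)      = {supp:.0%}")
```

Real data mining algorithms differ mainly in how they avoid testing every possible item set; the measures themselves stay this simple.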

The emerging challenges

Environment

Rapid spread of Web and Internet: millions of users connected on the Web

Rapid development of H/W: disk and RAM size, access time, bandwidth

New areas emerging: sensor streams, scientific data, uncertain data, information privacy

The Emerging Challenges

Sophisticated data type support

[Figure: a new DBMS handling both structured data and unstructured data: sound, video, image, temporal, spatial]

The Emerging Challenges

Sensor streams

Battery constraint, communication cost

Rapidly changing configuration (sensors die or disconnect)

Complex forms of information integration: "Locate a person from the heat, sound and vibration sensors"

The Emerging Challenges

Reasoning about uncertain data

Scientific measurement errors

Location data for moving objects

Sequence, image and text similarity

The Emerging Challenges

Personalization

Different person, different answer

WEB CRM example

Web Site Entry

Page Views

Event: Select product

Insert item to Shopping Cart

Personalized View of Recommendation

Recommendation Engine

The Emerging Challenges

Privacy

How to support the protection of personal or sensitive information

Access by user and usage

Include purpose description in query

Name  | income | …
Alice | 25K    | …
John  | 40K    | …

"We just want the statistics of the income, not the personal information!"
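One simple way to hand out statistics without the personal rows is to expose only an aggregate view. A minimal sketch with Python's sqlite3 module, assuming "25K" and "40K" mean 25,000 and 40,000; the `person` table and `income_stats` view are illustrative names, and SQLite itself has no access control, so in a real DBMS the base table would additionally be closed off with privileges:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (name TEXT, income INTEGER);
INSERT INTO person VALUES ('Alice', 25000), ('John', 40000);

-- Analysts query this view instead of the base table.
CREATE VIEW income_stats AS
    SELECT count(*) AS n, avg(income) AS mean_income FROM person;
""")

n, mean_income = conn.execute("SELECT n, mean_income FROM income_stats").fetchone()

# The view exposes only aggregate columns, never names.
columns = [d[0] for d in conn.execute("SELECT * FROM income_stats").description]

print(n, mean_income)
print(columns)
```

With only two people in the table the average still reveals a lot, which hints at why privacy is listed as an open challenge rather than a solved problem.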