Database Technology
Prof. Hyoung-Joo Kim Internet Database Lab
School of Computer Sci & EngSeoul National University
Contents
Research in IDB Lab.
• A general survey of DBMS
• History of DBMS
• Database market share
• The current DBMS trend
What is a Database? (1/10)
DBMS
A software system that provides an environment for storing and retrieving massive amounts of data efficiently
What is a Database? (2/10)
A large collection of data
Data + Programs
[Figure: many Data items are stored (STORE) into a Database]
What is a Database? (3/10)
Registration and course information for the 40,000 students of Seoul National University
Record fields: course, term, register, grade, prof
45 courses, about 10 KB of records per student
10 KB * 40,000 = 400 MB
Others: library, health center, S-card, …
What is a Database? (4/10)
SAT management information
Record fields: profile, answer, rate, ranking, …
About 8 KB of records per student
8 KB * 550,000 = 4.4 GB (10^9 bytes)
Number of students: Year 2006: 550,000, Year 2005: 570,000
What is a Database? (5/10)
Mobile phone call information
Record fields: phone number, station, time, …
About 60 bytes per call record
39 M subscribers * 60 bytes * 5 calls/day * 365 days ≈ 4 TB per year (Korea, July 2006)
China: 370 M subscribers in 2005
What is a Database? (6/10)
Resident registration information
Record fields: SSN, name, addr, domicile, …
About 10 KB per person
10 KB * 47 M = 470 GB, about 0.5 TB (47 million residents)
What is a Database? (7/10)
Google database
Manages 8 billion web pages and 2 billion index terms
Usenet archive = 700 million messages * 20 KB/message = 14 TB
What is a Database? (8/10)
Hubble space telescope data from Mars
Data accumulated by 2005: over 12 TB
Generates and transmits 3 to 5 GB of data daily
What is a Database? (9/10)
NCBI (National Center for Biotechnology Information)
GenBank
• Manages information on 165,000 species
• Adds 3 million new DNA sequences monthly
What is a Database? (10/10)
Genome map of Koreans
Venture company “MacroGen” and SNU Medical School
Early version: 900 GB
Final product: 15 TB
What do we do with Database? (1/2)
Record search
Retrieve the math grade of the student whose SSN is “840101-12121”
Without a DBMS: 12 ms to fetch one record and check its content
740,000 * 5 records = 3.7 M records
3.7 M * 12 ms = 44.4 K seconds = over 12 hours
With a DBMS, it takes less than 0.1 sec!
Statistical processing for a population census
Search for correlations between genes and diseases
Search for purchase patterns across customer groups
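The record-search estimate can be reproduced with a few lines of Python. This is only a sketch: the first function redoes the slide's arithmetic, and the rest builds a small in-memory SQLite table to show one way a DBMS avoids the full scan, namely an index. The table name `grade`, its columns, the index name and the generated rows are all hypothetical and not part of the slides.

```python
import sqlite3

def full_scan_seconds(students=740_000, records_per_student=5, fetch_ms=12):
    """Time to fetch and check every record one by one, in seconds."""
    return students * records_per_student * fetch_ms / 1000

def build_demo_db(n_students=1000):
    """Create a small in-memory table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE grade (ssn TEXT, course TEXT, score INTEGER)")
    rows = [(f"840101-{i:05d}", "math", 50 + i % 50) for i in range(n_students)]
    conn.executemany("INSERT INTO grade VALUES (?, ?, ?)", rows)
    # The index lets the DBMS jump to the matching rows instead of scanning.
    conn.execute("CREATE INDEX idx_grade_ssn ON grade (ssn)")
    return conn

def math_grade(conn, ssn):
    """Return the math score for one SSN, or None if there is no such row."""
    row = conn.execute(
        "SELECT score FROM grade WHERE ssn = ? AND course = 'math'", (ssn,)
    ).fetchone()
    return row[0] if row else None

def uses_index(conn):
    """Ask SQLite for its query plan and check that the index is mentioned."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT score FROM grade WHERE ssn = ?", ("x",)
    ).fetchall()
    return any("idx_grade_ssn" in str(step[-1]) for step in plan)

if __name__ == "__main__":
    print(f"full scan: {full_scan_seconds() / 3600:.1f} hours")
    conn = build_demo_db()
    print(math_grade(conn, "840101-00012"), uses_index(conn))
```

The sub-0.1-second figure on the slide depends on hardware and on the DBMS; the sketch only shows that an indexed lookup does not touch every record.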
What do we do with Database? (2/2)
Most (all?) computing applications use some type of database
[Figure: EDPS, MIS, ERP, OLTP, Data Warehouse and CRM applications, each built on a database]
Database Management System (DBMS) (1/3)
[Figure: a warehouse]
Database Management System (DBMS) (2/3)
[Figure: a warehouse and a warehouse keeper]
Database Management System (DBMS) (3/3)
[Figure: a user works through applications (on-line order management, wage management, manager information management); the DBMS mediates their access to a database holding profile, sale, stock, product and customer data]
DBMS Architecture
[Figure: naive users, application programmers, casual users and the database administrator reach the DBMS through application programs, system calls, queries and the database scheme. Inside the DBMS: data manipulation language pre-compiler, query processor, data definition language compiler, application programs object code, database manager and file manager, on top of disk storage]
A Sample Relational Database
SQL
SQL: widely used commercial query language
E.g. find the name of the customer with customer-id 192-83-7465
    select customer.customer_name
    from customer
    where customer.customer_id = '192-83-7465'
E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465
    select account.balance
    from depositor, account
    where depositor.customer_id = '192-83-7465' and
          depositor.account_number = account.account_number
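To try the two queries, the following sketch loads a tiny bank schema into an in-memory SQLite database. The customer-id comes from the slide; the customer names, account numbers and balances are made up for illustration, and identifiers use underscores because hyphenated names are not valid unquoted SQL identifiers.

```python
import sqlite3

def make_bank_db():
    """Build the customer / account / depositor tables with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
        CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
        CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
        INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'),
                                     ('019-28-3746', 'Smith');
        INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
        INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                     ('192-83-7465', 'A-201'),
                                     ('019-28-3746', 'A-215');
    """)
    return conn

def customer_name(conn, customer_id):
    """First query: the name of one customer."""
    row = conn.execute(
        "SELECT customer.customer_name FROM customer "
        "WHERE customer.customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None

def balances(conn, customer_id):
    """Second query: balances of all accounts held by one customer (a join)."""
    rows = conn.execute(
        "SELECT account.balance FROM depositor, account "
        "WHERE depositor.customer_id = ? "
        "AND depositor.account_number = account.account_number",
        (customer_id,),
    ).fetchall()
    return sorted(r[0] for r in rows)

if __name__ == "__main__":
    conn = make_bank_db()
    print(customer_name(conn, "192-83-7465"), balances(conn, "192-83-7465"))
```

Passing the customer-id as a `?` parameter instead of pasting it into the SQL string is a common habit that avoids quoting problems.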
Major Commercial DBMS in 2006 (1/3)
Oracle 10g
Market leader
Stability
Mass storage capability
Famous CEO
Major Commercial DBMS in 2006 (2/3)
Microsoft!!!
PC based (Windows NT)
Integration with Windows NT/XP
Major Commercial DBMS in 2006 (3/3)
IBM
Stability
Mainframe
Acquired Informix
Database Companies in the World
Hierarchical, Network DBMS
The early 1970s
IMS (IBM), System/2000 (MRA)
DMS 1100 (Sperry), Total (Cincom)
Advantage: quick data access by following links
Drawback: applications cannot be written independently of how the data is stored
Network Database example
Query: What’s the total balance of Mr. Shiver in Bronx?
[Figure: a root record links to customer records (Lowery, Maple, Queens), (Hodges, SideHill, Brooklyn) and (Shiver, North, Bronx); the customer records link to amount records 900, 556, 647, 647 and 801]
Network DB query example
    sum := 0;
    get first customer
        where customer.name = “Shiver” and customer.city = “Bronx”;
    while DB_status = 0 do
    begin
        sum := sum + customer.amount;
        get next customer
            where customer.name = “Shiver” and customer.city = “Bronx”;
    end
    print(sum);
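The record-at-a-time style of this query can be imitated in Python. The sketch below is an assumption-laden simplification: the `Cursor` class and its `get_next` method are hypothetical stand-ins for "get first" / "get next" and `DB_status`, and the amount is stored directly on each customer record rather than in linked amount records.

```python
from dataclasses import dataclass

@dataclass
class CustomerRecord:
    name: str
    city: str
    amount: int

class Cursor:
    """Hypothetical navigational cursor: one record at a time plus a status flag."""

    def __init__(self, records):
        self._records = records
        self._pos = -1        # position of the last record returned
        self.status = 0       # 0 = a record was found, 1 = no more records
        self.current = None

    def get_next(self, name, city):
        """Advance to the next matching record (the first call acts as 'get first')."""
        for i in range(self._pos + 1, len(self._records)):
            rec = self._records[i]
            if rec.name == name and rec.city == city:
                self._pos, self.current, self.status = i, rec, 0
                return
        self.current, self.status = None, 1

def total_amount(records, name, city):
    """The application, not the DBMS, drives the loop and keeps the running sum."""
    cursor = Cursor(records)
    total = 0
    cursor.get_next(name, city)
    while cursor.status == 0:
        total += cursor.current.amount
        cursor.get_next(name, city)
    return total

# Rows taken from the customer data used in the slides.
RECORDS = [
    CustomerRecord("Lowery", "Queens", 900),
    CustomerRecord("Shiver", "Bronx", 556),
    CustomerRecord("Shiver", "Bronx", 647),
    CustomerRecord("Hodges", "Brooklyn", 801),
    CustomerRecord("Hodges", "Brooklyn", 647),
]

if __name__ == "__main__":
    print(total_amount(RECORDS, "Shiver", "Bronx"))
```

The point of the sketch is that the traversal logic lives in the application, which is why such programs are tied to the way records are linked.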
Relational DBMS
The late 1970s and early 1980s
E. F. Codd, 1970 CACM paper, “A Relational Model of Data for Large Shared Data Banks”
Relational Algebra & Calculus
The Spartan Simplicity!
SQL: Structured Query Language
System/R - 1976, first industrial RDBMS prototype (IBM)
Ingres - 1976, first academic RDBMS
Relational DBMS example
    select sum(amount)
    from customer
    where customer.name = 'Shiver' and customer.city = 'Bronx';

    name     street     city       amount
    Lowery   Maple      Queens     900
    Shiver   North      Bronx      556
    Shiver   North      Bronx      647
    Hodges   SideHill   Brooklyn   801
    Hodges   SideHill   Brooklyn   647
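For comparison with the navigational version, the relational query can be run as-is against the five rows of the slide. This is a minimal sketch using Python's built-in `sqlite3` module; the helper name `total_for` is hypothetical.

```python
import sqlite3

# The five rows of the customer table shown on the slide.
ROWS = [
    ("Lowery", "Maple", "Queens", 900),
    ("Shiver", "North", "Bronx", 556),
    ("Shiver", "North", "Bronx", 647),
    ("Hodges", "SideHill", "Brooklyn", 801),
    ("Hodges", "SideHill", "Brooklyn", 647),
]

def total_for(name, city):
    """Load the table and let the DBMS compute the sum declaratively."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (name TEXT, street TEXT, city TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", ROWS)
    (total,) = conn.execute(
        "SELECT sum(amount) FROM customer "
        "WHERE customer.name = ? AND customer.city = ?",
        (name, city),
    ).fetchone()
    conn.close()
    return total

if __name__ == "__main__":
    print(total_for("Shiver", "Bronx"))  # 556 + 647
```

The query states what is wanted and leaves the access path to the DBMS; no loop or status flag appears in the application.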
The advent of new DB applications in the 1980s (1/4)
Need for richer data models & DBMS functions
Multimedia: IMAGE, TEXT, AUDIO, VIDEO, etc.
Telecommunication
Artificial Intelligence: Expert systems
CAD/CASE/CAM: massive design data
The advent of new DB applications in the 1980s (2/4)
Massive design data in CAD/CASE/CAM
[Figure: previous data (the customer table with name, street, city, amount) contrasted with CAD data]
The advent of new DB applications in the 1980s (3/4)
Artificial Intelligence: Expert systems
[Figure: previous data (the customer table with name, street, city, amount) contrasted with expertise data: a vehicle-disorder tree with Control and Drive branches over Brake, Handle, Gearbox and Engine, leading from symptoms to the conclusion: engine ECU disorder]
The advent of new DB applications in the 1980s (4/4)
Multimedia: image, audio, video
[Figure: previous data (the customer table with name, street, city, amount) contrasted with multimedia data]
Advent of Object Oriented DBMS
The mid 1980s to mid 1990s
Research prototypes: ORION, POSTGRES, ENCORE/ObServer
Commercial products: O2, ObjectStore, Objectivity, Versant, etc.
ODMG-93 OODB standard
Features of Object Oriented DBMS
Object-Oriented Paradigm support: object, object identity, class hierarchy, inheritance
Semantic Data Model extension: version & composite object
Persistent programming language
Large object
Long-duration transaction
Go back to traversal as in Network DB?
Object Oriented Database example
[Figure: the customer data (name, street, city, amount) modeled as objects connected by ISA and Is-part-of relationships]
OQL query of Object Oriented DBMS
    select sum(customer.deposit.balance)
    from Customer customer
    where customer.name = "Shiver"
      and customer.deposit.branch.city = "Bronx";
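The path expressions in the OQL query (`customer.deposit.branch.city`) follow object references rather than joining tables. A rough Python analogue is sketched below; the class layout (`Customer` holding a list of `Deposit` objects, each pointing to a `Branch`) is an assumption made for illustration, since the slides do not show the schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Branch:
    city: str

@dataclass
class Deposit:
    balance: int
    branch: Branch

@dataclass
class Customer:
    name: str
    deposits: List[Deposit] = field(default_factory=list)

def total_balance(customers, name, city):
    """Follow object references customer -> deposit -> branch, then sum balances."""
    return sum(
        deposit.balance
        for customer in customers
        if customer.name == name
        for deposit in customer.deposits
        if deposit.branch.city == city
    )

# Hypothetical object graph built from the amounts used in the slides.
bronx, queens, brooklyn = Branch("Bronx"), Branch("Queens"), Branch("Brooklyn")
CUSTOMERS = [
    Customer("Lowery", [Deposit(900, queens)]),
    Customer("Shiver", [Deposit(556, bronx), Deposit(647, bronx)]),
    Customer("Hodges", [Deposit(801, brooklyn), Deposit(647, brooklyn)]),
]

if __name__ == "__main__":
    print(total_balance(CUSTOMERS, "Shiver", "Bronx"))
```

Unlike the network-model loop, the traversal here is still written declaratively; the object references only replace the join condition.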
Object Relational DBMS
Relational DBMS with object-oriented functions
Extension within SQL & tables!
1980 – 1985: ORDBMS research prototypes
PostGres by UC Berkeley
System/R Engineering Extension
The early 1990s: downfall of OODBMS (Illustra, UniSQL, Matisse)
1997: advent of the Big 3 ORDBMS
Object Relational Database example
[Figure: the customer table with name, street, city, amount]
Principal functions of Object Relational DBMS
LOB (large object) support
Abstract Data Type support
Type Inheritance support
User defined type & Stored procedure support
Application domain specific extension support
SQL procedure extension
Rule/trigger System support
Products of Object Relational DBMS
ORACLE-8 Universal Server
Informix Universal Server
IBM DB2 Universal Database
Sybase Adaptive Server
Microsoft Access
DBMS market share (1/2)
Worldwide market share for biggest sellers of corporate databases, 2005
    Oracle      48.6%
    IBM         22%
    Microsoft   15%
Source: Gartner Dataquest
DBMS market share (2/2)
Worldwide sales for biggest sellers of corporate databases, 2005 (billions of dollars)
    Oracle      $6.7
    IBM         $3.0
    Microsoft   $2.1
Source: Gartner Dataquest
Domestic DBMS market share
source : Report for database industry and perspective in Korea, 2004
Domestic DBMS market sales
Domestic sales for biggest sellers of corporate databases, 2004 (billions of won)
    Oracle      ₩ 57.2
    IBM         ₩ 25.1
    Microsoft   ₩ 45.3
Source: Gartner Dataquest, South Korea (2005)
Preference in domestic market
Others 3%
source : Report for database industry and perspective in Korea, 2004
XML Technology (1/2)
The late 1990s and now
What is XML (eXtensible Markup Language)?
Developed by the W3C
Semi-structured text for dissemination and publication
Self-describing
HTML: tagging for display
<tr> <td> <font color="red"> Name </font> </td> <td> Hong Gil-dong </td></tr><tr> <td> <b> Address </b> </td>
XML: tagging for structure and semantics
<person> <name> Hong Gil-dong </name> <city> Seoul </city> <age>20</age> …</person>
XML Technology (2/2)
Why XML?
Standard data format for storage and exchange
<person> <name> Hong Gil-dong </name> <city> Seoul </city> …</person>
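Because the tags describe structure rather than display, a program can read such a record without knowing anything about its layout on screen. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample record uses English placeholder values, and the helper names `person_to_dict` and `dict_to_person` are hypothetical.

```python
import xml.etree.ElementTree as ET

# A complete sample record; the values are placeholders.
PERSON_XML = (
    "<person><name>Hong Gil-dong</name><city>Seoul</city><age>20</age></person>"
)

def person_to_dict(xml_text):
    """Parse a <person> element into a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

def dict_to_person(record):
    """Serialize a dictionary back into a <person> element for exchange."""
    root = ET.Element("person")
    for tag, value in record.items():
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    record = person_to_dict(PERSON_XML)
    print(record["name"], record["city"], record["age"])
    print(dict_to_person(record))
```

The same round trip on the HTML table row would require guessing which cell holds the name, which is the difference between tagging for display and tagging for semantics.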
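Semantic Web (1/2)
The existing web:
1) A patient searches for dental clinics with a search engine
2) The patient finds the home pages of clinics close to his or her location
3) The patient checks each clinic's treatment schedule and makes an appointment if a time fits
Many repetitive steps are needed before the appointment is made
[Figure: patient, search engine, clinic’s web pages, appointment schedule]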
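Semantic Web (2/2)
Semantic web:
The following information is assumed to be already published as Semantic web data: the patient's personal schedule, and each clinic's location, specialties, treatment …
1) The patient asks a software agent to make an appointment
2) Even if the content and structure of each clinic's home page differ, the software agent analyzes the Semantic web data of the patient and the clinics, and books a clinic that can treat the patient at a suitable time and location
[Figure: patient, software agents, clinic’s web pages (with Semantic web), appointment schedule]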
Knowledge discovery
[Figure: data from the Database and the Data Warehouse goes through knowledge discovery processing (data mining), yielding useful, interesting, hidden information that is applied to decisions]
Data warehouse (1/2)
Stores data over time
Analyzes patterns over time
Summarized data
Data observed from various viewpoints
Non-volatile
Need for a new data model: Dimensional model
Data warehouse (2/2)
[Figure: a data cube of sales volumes with three dimensions: time (Jan, Feb, Mar), sales person (Wong, Stonebreaker, Dewitt) and product (A, B, C)]
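A dimensional model can be sketched as a list of facts, each tagged with one value per dimension, plus a roll-up that sums the measure over the dimensions that are dropped. The dimension values below come from the slide; the sales volumes are invented for illustration, and the function name `roll_up` is hypothetical.

```python
from collections import defaultdict

DIMENSIONS = ("time", "sales_person", "product")

# (time, sales_person, product, sales_volume); the volumes are made-up sample data.
FACTS = [
    ("Jan", "Wong", "A", 10),
    ("Jan", "Dewitt", "B", 5),
    ("Feb", "Wong", "A", 7),
    ("Feb", "Stonebreaker", "C", 4),
    ("Mar", "Dewitt", "A", 3),
    ("Mar", "Wong", "B", 6),
]

def roll_up(facts, *dims):
    """Sum sales volume grouped by the chosen dimensions (none = grand total)."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[p] for p in positions)
        totals[key] += fact[-1]
    return dict(totals)

if __name__ == "__main__":
    print(roll_up(FACTS, "time"))                  # view by month
    print(roll_up(FACTS, "sales_person"))          # view by sales person
    print(roll_up(FACTS, "time", "product"))       # one face of the cube
    print(roll_up(FACTS))                          # grand total
```

Each call corresponds to looking at the same facts from a different viewpoint, which is what the cube picture conveys.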
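Data mining (1/2)
Broad sense
Covers every step from extracting the target data, through refining and interpreting the discovered patterns, to presenting them in a form people can understand [text, pictures, graphics]
Narrow sense
The use of various algorithms [data mining algorithm] or software to extract interesting, human-understandable patterns and regularities from large volumes of data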
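Data mining (2/2)
Pattern discovery
80% of people who buy bread and snacks also buy milk
74% of people who buy baby formula and diapers also buy beer
Decision making
Beer consumption affects the consumption of baby formula and diapers
A price increase for bread and snacks affects milk consumption
Business application
Display (bread, snacks, milk) and (baby formula, diapers, beer) together on the shelves
Adjust bread and snack prices to control milk consumption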
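Percentages such as "80% of people who buy bread and snacks also buy milk" are the confidence of an association rule. The sketch below computes support and confidence over a handful of made-up shopping baskets chosen so that the bread-and-snack rule comes out at 80%; the baskets and function names are illustrative assumptions, not data from the slides.

```python
def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket)) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Among baskets containing `antecedent`, the fraction also containing `consequent`."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    with_antecedent = sum(1 for basket in baskets if antecedent <= set(basket))
    if with_antecedent == 0:
        return 0.0
    with_both = sum(1 for basket in baskets if both <= set(basket))
    return with_both / with_antecedent

# Hypothetical shopping baskets.
BASKETS = [
    {"bread", "snack", "milk"},
    {"bread", "snack", "milk"},
    {"bread", "snack", "milk"},
    {"bread", "snack", "milk"},
    {"bread", "snack"},
    {"formula", "diaper", "beer"},
    {"milk"},
    {"bread"},
]

if __name__ == "__main__":
    print(confidence(BASKETS, {"bread", "snack"}, {"milk"}))
    print(support(BASKETS, {"bread", "snack", "milk"}))
```

Real data mining algorithms search for all rules above chosen support and confidence thresholds; this sketch only evaluates a rule that is already given.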
The emerging challenges
Environment
Rapid spread of the Web and Internet: millions of users connected on the Web
Rapid development of H/W: disk and RAM size, access time, bandwidth
New areas emerging: sensor streams, scientific data, uncertain data, information privacy
The Emerging Challenges
Sophisticated data type support
[Figure: a new DBMS handling structured data together with unstructured data such as sound, video, image, temporal and spatial data]
The Emerging Challenges
Sensor streams
Battery constraint, communication cost
Rapidly changing configuration (sensors die or disconnect)
Complex forms of information integration: “Locate a person from the heat, sound and vibration sensors”
The Emerging Challenges
Reasoning about uncertain data
Scientific measurement errors
Location data for moving objects
Sequence, image and text similarity
The Emerging Challenges
Personalization
Different person, different answer
[Figure: Web CRM example: web site entry, page views, event: select product, insert item into shopping cart, recommendation engine, personalized view of recommendations]
The Emerging Challenges
Privacy
How to support the protection of personal or sensitive information
Access control by user and by usage
Include a purpose description in the query
    Name  | income | …
    Alice | 25K    | …
    John  | 40K    | …
We just want the statistics of the income, not the personal information!
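One way to read "include a purpose description in the query" is that the system checks the stated purpose before deciding which operations to allow. The sketch below is a toy illustration under that assumption: the purpose names, the policy table and the function `query_income` are all hypothetical, and only the two income values come from the slide.

```python
from statistics import mean

# Income values from the slide (25K and 40K).
INCOMES = {"Alice": 25_000, "John": 40_000}

# Hypothetical policy: which operations each declared purpose may use.
POLICY = {
    "statistics": {"mean", "count"},
}

def query_income(purpose, operation):
    """Answer only aggregate questions, and only for an allowed purpose."""
    if operation not in POLICY.get(purpose, set()):
        raise PermissionError(f"{operation!r} is not allowed for purpose {purpose!r}")
    values = list(INCOMES.values())
    if operation == "mean":
        return mean(values)
    return len(values)  # the only other allowed operation is "count"

if __name__ == "__main__":
    print(query_income("statistics", "mean"))
    try:
        query_income("marketing", "mean")
    except PermissionError as err:
        print("refused:", err)
```

With only two rows, even an average leaks a lot about each person, so a real system would also need safeguards such as minimum group sizes; the sketch shows only the purpose check.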