Topics in Database Systems

Topics in Database Systems. History, Present, and Future of the Database Field. Πάνος Βασιλειάδης (Panos Vassiliadis) [email protected]. September 2003. www.cs.uoi.gr/~pvassil/courses/readings/


Page 1: Topics in Database Systems

Topics in Database Systems

History, Present, and Future of the Database Field

Πάνος Βασιλειάδης (Panos Vassiliadis) [email protected]

September 2003

www.cs.uoi.gr/~pvassil/courses/readings/

Page 2: Topics in Database Systems

Topics

Yesterday
Today
Tomorrow

Some of these slides come from Prof. Timos Sellis’ course; many thanks!

Page 3: Topics in Database Systems

Topics

Yesterday
Today
Tomorrow

Page 4: Topics in Database Systems

History of the field of databases

Late 60's: network (CODASYL) & hierarchical (IMS) DBMS.

Low-level “record-at-a-time” DML, i.e. physical data structures reflected in DML (no data independence)

1970: Codd's paper -- the relational model. The most influential paper in DB research.

Set-at-a-time DML.
Data independence: allows for schema and physical storage structures to change under the covers.
Truly important theory; led to a "paradigm shift" in thinking and in practice.
  Papadimitriou: "as clear a paradigm shift as we can hope to find in computer science".
  Turing Award for Codd (1981).
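The contrast between record-at-a-time and set-at-a-time DML can be sketched roughly as follows. This is an illustration only, not code from any of the systems named above: the toy `EMPLOYEES` data, the table name `emp`, and both function names are assumptions made for the example, and SQLite simply stands in for a relational engine.

```python
import sqlite3

# Hypothetical toy data: (name, dept, salary)
EMPLOYEES = [("Ann", "R&D", 120), ("Bob", "Sales", 90), ("Cy", "R&D", 100)]


def total_record_at_a_time(records, dept):
    """Walk the records one by one; the program owns the loop and the access path."""
    total = 0
    for _name, record_dept, salary in records:
        if record_dept == dept:
            total += salary
    return total


def total_set_at_a_time(records, dept):
    """State what is wanted in SQL; the engine chooses how to find the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", records)
    # Adding an index changes the physical storage, yet the query text below is unchanged.
    conn.execute("CREATE INDEX emp_dept ON emp (dept)")
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(salary), 0) FROM emp WHERE dept = ?", (dept,)
    ).fetchone()
    conn.close()
    return total
```

Both functions return the same total. The point of the second is that the index could be dropped or replaced without touching the query, which is the data independence the slide refers to.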

Page 5: Topics in Database Systems

History of the field of databases

Early-to-mid 70's: raging debate between the two camps.
  "Great debate" in 1975.

Mid 70's: 2 full-function (sort of) prototypes
  Ingres
  System R
  Ancestors of essentially all today's commercial systems

Page 6: Topics in Database Systems

History of the field of databases

Ingres: UCB, 1974-77
  A "pickup team", including Stonebraker & Wong; early and pioneering.
  Led to Ingres Corp (CA), Sybase, MS SQL Server, Britton-Lee, Wang's PACE.

System R: IBM San Jose (now Almaden)
  15 PhDs.
  Led to IBM's SQL/DS & DB2, Oracle, HP's Allbase, Tandem's Non-Stop SQL.
  System R arguably got more stuff "right".

Both were viable starting points, proved practicality of relational approach. Beautiful example of theory -> practice!!

Page 7: Topics in Database Systems

History of the field of databases

Early 80's: commercialization of relational systems

Mid 80's:
  SQL becomes “intergalactic standard”.
  DB2 becomes IBM's flagship product.
  IMS “sunsetted”.

Page 8: Topics in Database Systems

History of the field of databases

90’s: the age of maturity
  network & hierarchical essentially dead (though commonly in use!)
  relational becomes mainstream
  improvements in terms of transactional facilities, performance and stability
  Scale, scale, scale…

Page 9: Topics in Database Systems

Scale, scale, scale…

EOSDIS*: 1 TB/day, keep it all for 15 years (they need tertiary storage for that)

*NASA’s Earth Observing System Data and Information System

WalMart: 365-node system, 6 TB online, 4-billion-row table, 200 million updates daily, 4,000 queries/day, 1,500 users/week, 4 min DS response time with avg. 60,000 rows

Databases make the world go round, mainly due to their ability to handle HUGE amounts of data, RELIABLY!!!

Large scale is our business…
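A quick back-of-the-envelope check puts the numbers above in perspective. It assumes that the EOSDIS figure means 1 terabyte per day, that units are decimal (1 PB = 1000 TB), and that WalMart's updates are spread evenly over a 24-hour day. None of these assumptions is stated on the slide.

```python
# EOSDIS: 1 TB/day kept for 15 years (assumed: terabytes, 365-day years)
TB_PER_DAY = 1
DAYS_PER_YEAR = 365
YEARS = 15

eosdis_total_tb = TB_PER_DAY * DAYS_PER_YEAR * YEARS
eosdis_total_pb = eosdis_total_tb / 1000  # decimal units

# WalMart: 200 million updates daily (assumed: evenly spread over the day)
UPDATES_PER_DAY = 200_000_000
SECONDS_PER_DAY = 24 * 60 * 60

updates_per_second = UPDATES_PER_DAY / SECONDS_PER_DAY

print(f"EOSDIS archive: {eosdis_total_tb} TB ({eosdis_total_pb} PB)")
print(f"WalMart sustained update rate: {updates_per_second:.0f} per second")
```

Under these assumptions the archive grows to several petabytes, which is why tertiary storage is mentioned, and the update stream averages a few thousand updates per second.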

Page 10: Topics in Database Systems

History of the field of databases

Late 90’s: object relational & the web
  SQL-1999 & early implementations
  support for ADTs
  RDBMSs as back-end for internet front-ends
  Application Servers and middleware

Page 11: Topics in Database Systems

Topics

Yesterday
Today
Tomorrow

Page 12: Topics in Database Systems

VLDB 2003

The International Conference on Very Large Data Bases (VLDB) is the top database conference. The 29th VLDB conference was held in Berlin, Germany, in Sept. 2003.

To accommodate the wide spectrum of papers, VLDB 2003 was organized into three tracks: 

  Core Database System Technology
  Infrastructure for Information Systems
  Industrial Applications & Experience

http://www.vldb.informatik.hu-berlin.de/

Page 13: Topics in Database Systems

VLDB 2003 – from the CfP

“The Core Database Technology PC will evaluate papers that report on technology that is meant to be incorporated in the database system itself. This includes database engine functions, such as query languages, data models, query processing, views, integrity constraints, triggers, access methods, and transactions in centralized, distributed, replicated, parallel, mobile, and wireless environments.

It also includes extended data types, such as multimedia, spatial and temporal data, and system engineering issues, such as performance, high availability, security, manageability, and ease-of-use. Papers on all aspects of active and object databases, storage technology, and data management system architecture should be submitted to the Core Database Technology PC.”

Page 14: Topics in Database Systems

VLDB 2003 – from the CfP

“The PC covering Infrastructure for Information Systems will evaluate papers that report on methods, issues, and problems faced during the design, development and deployment of innovative solutions for information management.

Examples include workflows, advanced transaction processing features, application servers, object monitors, services in support of E-commerce, mediators and other web-oriented data facilities, metadata repositories, data and process modeling, web services, user interfaces and data visualization, data translation and migration, data cleaning, multi-agent systems, and system management.”

Page 15: Topics in Database Systems

VLDB 2003 – from the CfP

“The PC on Industrial Applications & Experience solicits submissions covering innovative commercial database implementations, novel applications of database technology, and experience in applying recent research advances to practical situations. The track is VLDB's way to foster the exchange of ideas and solutions between research and industry. Application areas include those of Bioinformatics/Life Science, Engineering, Mobile Systems, Enterprise Resource Planning (ERP), and other areas all of which pose technical challenges to the field of data management.”

Page 16: Topics in Database Systems

VLDB 2003

Submissions by track:
  Core             249
  Infrastructure   162
  Industrial        46
  Grand total      457

Accepted: 84 (70 research, 1:6)

The field is flourishing … getting your paper accepted is hard (nice excuse)!!
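The "1:6" figure can be checked against the submission counts. The sketch below assumes that "research" means the Core and Infrastructure tracks taken together. The slide does not say so explicitly, so treat that grouping as a guess.

```python
# Figures taken from the slide
submissions = {"Core": 249, "Infrastructure": 162, "Industrial": 46}
accepted_total = 84
accepted_research = 70

total_submissions = sum(submissions.values())

# Assumption: the research papers are those of the Core and Infrastructure tracks
research_submissions = submissions["Core"] + submissions["Infrastructure"]

overall_rate = accepted_total / total_submissions
research_rate = accepted_research / research_submissions

print(f"total submissions:   {total_submissions}")
print(f"overall acceptance:  {overall_rate:.1%}")
print(f"research acceptance: {research_rate:.1%}")
```

With this grouping, 70 accepted out of 411 research submissions is roughly 17%, which is consistent with the slide's ratio of about one paper in six.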

Page 17: Topics in Database Systems

VLDB 2003

(98) Optimization and Performance
(84) Advanced Search, Query, and Approximation
(70) Semi-structured Data, XML
(64) Internet and WWW Databases / Query Systems
(63) Access Methods
(44) Data Mining and Knowledge Discovery
(32) Infrastructure Challenges and Opportunities
(30) Databases and database services: Internet and the WWW
(30) Novel / Advanced Database Applications
(29) Data Integration / Federation / Mediation
(29) Information Retrieval with Database Systems
(29) Middleware Data Architectures
(29) Special Purpose DB Techn.: Multidimensional Databases
… miscellaneous other topics …

Page 18: Topics in Database Systems

Topics

Yesterday
Today
Tomorrow

Page 19: Topics in Database Systems

The Lowell report -- 2003

Senior database researchers gather every few years to assess the state of database research and to recommend problems and problem areas that deserve additional focus. The previous meetings were held in Laguna Beach, Ca. in 1989, in Palo Alto, Ca. (Lagunitas) in 1990, in Palo Alto, Ca. (Lagunitas II) in 1995, and at Asilomar, Ca. in 1998. The sixth ad-hoc meeting was held May 4-6, 2003 in Lowell, Mass., USA.

http://research.microsoft.com/~Gray/Lowell/

Page 20: Topics in Database Systems

Issues for future research

(data)Bases for everything
Information Fusion
Multimedia Querying
Uncertain data & Personalization
Data Mining
Privacy & Trustworthy Systems
New User Interfaces
100 year storage

Page 21: Topics in Database Systems

… no more data bases …

…, it is time to stop grafting new constructs onto the traditional architecture of the past. Instead, we should rethink basic DBMS architecture with an eye toward supporting:

  Structured data
  Text, space, time, image, and multimedia data
  Procedural data, that is data types and the methods that encapsulate them
  Triggers
  Data Streams and queues

as co-equal first class components within the DBMS architecture, both its interface and its implementation, rather than as afterthoughts grafted on a relational core.

The participants were adamant that one should start with a clean sheet of paper.

Page 22: Topics in Database Systems

Issues for future research

Information Fusion: … one must perform information integration on-the-fly over perhaps millions of information sources. … the thorny problem of semantic heterogeneity remains …

Multimedia Querying: … to create easy ways to analyze, summarize, search, and view the “electronic shoebox” of a person’s multimedia information.

Uncertain data: … query processing must move from a deterministic model, where there is an exact answer for every query, to a stochastic one, where the query processor performs evidence accumulation to get a better and better answer to a user query.
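One way to picture "evidence accumulation" for the uncertain-data item is a progressive estimate of an aggregate that tightens as more rows are seen. The sketch below is an illustration in the spirit of online aggregation, not something prescribed by the report. The function name `progressive_mean`, the batch size, and the synthetic data are all assumptions, and the interval uses a plain normal approximation with no finite-population correction.

```python
import math
import random


def progressive_mean(values, batch_size, seed=0):
    """Yield (rows_seen, estimate, half_width) after each batch of a random-order scan."""
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random order makes every prefix a random sample

    n, mean, m2 = 0, 0.0, 0.0
    for i, x in enumerate(order, start=1):
        # Welford's online update of the mean and the sum of squared deviations
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)

        if i % batch_size == 0 or i == len(order):
            std_err = math.sqrt(m2 / (n - 1) / n) if n > 1 else float("inf")
            # Approximate 95% interval (normal approximation, simplified)
            yield n, mean, 1.96 * std_err


if __name__ == "__main__":
    rng = random.Random(1)
    table = [rng.gauss(100, 15) for _ in range(10_000)]  # synthetic column, not real data
    for rows, estimate, half_width in progressive_mean(table, batch_size=2_000):
        print(f"{rows:>6} rows: {estimate:.2f} +/- {half_width:.2f}")
```

The user gets a rough answer after the first batch and a steadily narrower interval afterwards, instead of waiting for the exact answer at the end of the scan.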

Page 23: Topics in Database Systems

Issues for future research

Data mining: users … wish for tools that generate some “pearls of wisdom”. A challenge for data mining research is to develop algorithms and structures for sifting through the databases looking for such pearls, while running in background and consuming excess system resources. Another important challenge is to integrate data mining with database querying, optimization, and other facilities such as triggers.

Page 24: Topics in Database Systems

Issues for future research

Privacy: our community can work on security systems that include a component dealing with the prospective use to which the data will be put. Access decisions should be based not only on who is requesting the data but also on what use it will be put to.

New User Interfaces: There is a crying need for better ideas in this area.
  PV: Major Issue!!!
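The privacy item describes access decisions that depend on the declared use of the data as well as on the requester. A minimal sketch of such a check might look like the following. The `Request` type, the `POLICY` table, and every role, column, and purpose name in it are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user_role: str  # who is requesting the data
    column: str     # what data is requested
    purpose: str    # the declared prospective use


# Hypothetical policy: column -> set of (role, purpose) pairs allowed to read it
POLICY = {
    "diagnosis": {("physician", "treatment"), ("researcher", "anonymized-study")},
    "address": {("clerk", "billing")},
}


def is_allowed(request, policy=POLICY):
    """Grant access only if both the role and the stated purpose match the policy."""
    return (request.user_role, request.purpose) in policy.get(request.column, set())
```

In this sketch a physician may read `diagnosis` for treatment but not for marketing: the same requester gets different answers depending on the stated purpose, which an identity-only check cannot express.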

Page 25: Topics in Database Systems

Issues for future research

100 year storage: even archived information is disappearing, because it was captured on a medium that is deteriorating (e.g. photographic film or magnetic tape), or because it was captured on a medium that requires obsolete devices (e.g. special storage drives), or because the application that is needed to interpret the information no longer works (e.g. troff).

[we need] mechanisms for migration, to copy information from deteriorating or obsolete media, and for emulation, to capture methods that can interpret information that is stored for long periods (e.g. troff renderer).