Advanced Topics in Databases

Fahime Raja

Niloofar Razavi

Melody Siadaty

Spring 2005

Technical Report I

Outline:

Main Memory Databases

Transaction Processing Monitors

Transactional Workflows

Real Time Databases

Main Memory Databases:

Memory resident database systems (MMDBs) store their data in main physical memory and provide very high-speed access.

Traditional database systems rely on the disk subsystem to retrieve and update data, and use an offline storage device such as magnetic tape for backup.

An MMDB uses physical memory as primary storage and a disk subsystem for backup.

Main Memory Databases:

Advantages:

Significant performance improvements over conventional database systems, due to the elimination of I/O overhead:

Improved processing time

Improved throughput rates

Main Memory Databases:

Vs. conventional DBMS:

Volatility of main memory: use a small stable main memory to support recovery processing.

High overhead for the initial load: create archives frequently and distribute them across several secondary storage devices.

Implementation issues: inherent differences between disk and memory storage.

Main Memory Databases:

Concurrency Control:

Access to main memory is so much faster than disk access that transactions may complete more quickly in a main memory system.

So locks will not be held as long, and lock contention may not be as important as it is when the data is disk resident.

The advantage of small locking granules (fields or records) is removed when data are memory resident.

Very large lock granules (e.g., relations) are most appropriate for memory resident data.

Main Memory Databases:

Implementation of the locking mechanism:

Conventional system: a hash table that contains entries for the objects currently locked.

MMDB: a small number of bits in the data object itself represent its lock status.

If the first bit is set, the object is locked; otherwise it is free.

If the object is locked and the second bit is set, there are one or more waiting transactions.
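
The two-bit scheme can be illustrated with a minimal sketch. The `Record` class, the bit constants, and the function names are hypothetical and not taken from the report; the sketch is single-threaded, whereas a real MMDB would need atomic bit operations and a queue of waiters.

```python
LOCKED = 0b01   # first bit: object is locked
WAITERS = 0b10  # second bit: one or more transactions are waiting


class Record:
    """A data object that carries its own lock status bits."""

    def __init__(self, value):
        self.value = value
        self.lock_bits = 0  # both bits clear: the object is free


def try_lock(rec):
    """Lock the record if it is free; otherwise mark that someone waits."""
    if rec.lock_bits & LOCKED:
        rec.lock_bits |= WAITERS
        return False
    rec.lock_bits |= LOCKED
    return True


def unlock(rec):
    """Release the lock and report whether any transaction was waiting."""
    had_waiters = bool(rec.lock_bits & WAITERS)
    rec.lock_bits = 0
    return had_waiters
```

No hash table lookup is needed here: checking or setting the lock touches only the object that is being accessed anyway.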

Main Memory Databases:

Commit Processing:

It is necessary to have a backup copy and to keep a log of transaction activity.

Before a transaction can commit, its activity records must be written to the log.

Logging can impact response time, since each transaction must wait for at least one stable write before committing.

Logging can also affect throughput if the log becomes a bottleneck.

In MMDBs, logging represents the only disk operation each transaction will require.

Using a small amount of stable main memory to hold a portion of the log eliminates the response time problem, since transactions need never wait for disk operations.

Pre-committing: a transaction’s locks are released as soon as its log record is placed in the log, without waiting for the information to be propagated to the disk.

Group commit: can be used to relieve a log bottleneck. The log records of several transactions accumulate in memory and are flushed to disk in a single write.
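
A rough sketch of group commit, under the assumption that a group is flushed once a fixed number of log records has accumulated. The class name `GroupCommitLog`, the `group_size` parameter, and the in-memory list standing in for the disk log are all illustrative choices, not part of the report.

```python
class GroupCommitLog:
    """Toy log that commits transactions in groups with one write each."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.pending = []     # log records waiting in memory
        self.stable = []      # stands in for the on-disk log
        self.disk_writes = 0  # number of (simulated) disk writes

    def append(self, txn_id, record):
        """Add a log record; return the ids committed by this call."""
        self.pending.append((txn_id, record))
        if len(self.pending) >= self.group_size:
            return self.flush()
        return []

    def flush(self):
        """Write all pending records in a single disk operation."""
        if not self.pending:
            return []
        self.stable.extend(self.pending)
        self.disk_writes += 1
        committed = [txn_id for txn_id, _ in self.pending]
        self.pending = []
        return committed
```

With `group_size=3`, three transactions share one disk write instead of paying for three, which is how the log bottleneck is relieved at the cost of a short wait for the first transactions in each group.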

Main Memory Databases:

Access Methods:

In main memory access methods, the data values on which the index is built need not be stored in the index itself, as is done in B-Trees.

Random access is fast in main memory, so pointers can be followed quickly.

Index structures can store pointers to the indexed data, rather than the data itself.

This eliminates the problem of storing variable length fields in an index, and saves space as long as the pointers are smaller than the data they point to.
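
As an illustration, the sketch below builds a sorted index that holds only references (here, list positions) to the records, never the key values themselves. The sample records and the `lookup` function are hypothetical; a real MMDB index such as a T-Tree is more elaborate.

```python
# Records live in memory; the variable length "name" field is stored only here.
records = [
    {"name": "Razavi", "dept": "DB"},
    {"name": "Raja", "dept": "DB"},
    {"name": "Siadaty", "dept": "DB"},
]

# The index stores positions (stand-ins for pointers), sorted by the key
# they point to. No key value is copied into the index.
index = sorted(range(len(records)), key=lambda i: records[i]["name"])


def lookup(name):
    """Binary search over the pointer index, following pointers to compare."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if records[index[mid]]["name"] < name:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(index) and records[index[lo]]["name"] == name:
        return records[index[lo]]
    return None
```

Every comparison follows a pointer into the record, which is affordable only because random access in main memory is cheap.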

Main Memory Databases:

Data Representation:

Relational tuples can be represented as a set of pointers to data values.

The use of pointers is space efficient when large values appear multiple times in the database (the actual value needs to be stored only once).

Pointers also simplify the handling of variable length fields, since variable length data can be represented using pointers into a heap.

Query Processing:

In an MMDB, sequential access is not significantly faster than random access. E.g., sort-merge join processing loses its advantage.

Although the sorted relations could be represented easily in a main memory database using pointer lists, there is really no need for this, since much of the motivation for sorting is already lost.

Main Memory Databases:

Recovery:

Backups of memory resident databases must be maintained on disk or other stable storage to insure against loss of the volatile data.

1. The procedure used during normal database operation to keep the backup up-to-date (checkpointing).

2. The procedure used to recover from a failure (failure recovery).

Reloading after a failure can be sped up by:

Loading blocks of the database “on demand” until all of the data has been loaded.

Using disk striping or disk arrays; there must be independent paths from the disks to memory.
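
The “on demand” reload can be sketched as follows. This is only an illustration under simplifying assumptions: a dictionary stands in for the disk backup, and the class `OnDemandStore` and its method names are hypothetical.

```python
class OnDemandStore:
    """Toy MMDB that reloads blocks from backup as they are requested."""

    def __init__(self, backup):
        self.backup = backup         # stands in for the disk archive
        self.memory = {}             # blocks already reloaded
        self.pending = list(backup)  # block ids not yet reloaded

    def _load(self, block_id):
        self.memory[block_id] = self.backup[block_id]
        self.pending.remove(block_id)

    def read(self, block_id):
        """Serve a read, pulling the block from backup if needed."""
        if block_id not in self.memory:
            self._load(block_id)
        return self.memory[block_id]

    def background_step(self):
        """Reload one more block; return True once everything is loaded."""
        if self.pending:
            self._load(self.pending[0])
        return not self.pending
```

Transactions can resume as soon as the blocks they touch are resident, while `background_step` keeps loading the rest until the whole database is back in memory.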

Main Memory Databases:

Papers:

Main Memory Database Systems: An Overview

Hector Garcia-Molina, Kenneth Salem, IEEE 1992

This paper surveys the major memory residence optimizations and briefly discusses some of the memory resident systems that have been designed or implemented.

Transaction Management for a Main-Memory Database

Piyush Burte, Boanerges Aleman-Meza, D. Brent Weatherly, Rong Wu

This paper examines the details of thread concurrency and resource locking protocols, the deadlock prevention scheme, and the Java-based implementation of these design decisions. It also shows the effectiveness of the design with performance tests that simulate typical transactions on a highly concurrent database system.

Main Memory Databases:

Main Memory Database Recovery

Margaret H. Eich, IEEE 1986

This paper examines MMDB recovery, identifies differences from traditional DBMS recovery, composes a “wish list” of MMDB recovery requirements, describes why previously proposed techniques do not satisfy these requirements, and proposes a new MMDB recovery technique which does.

* Understanding, Modeling, and Improving Main-Memory Database Performance

Stefan Manegold, 2002, University of Amsterdam

Main Memory Databases:

Some possible future work:

Efficient loading for MMDBs, including the idea of partitioning.

Examining methods for distributing log and archive database information across multiple secondary storage devices.

Simulation and testing of the results.

Transaction Processing Monitor

A Transaction Processing (TP) application is a program that performs an administrative function by accessing a shared database on behalf of an online user.

A TP system is an integrated set of products that supports TP applications, including:

Hardware, such as processors, memories, disks, and communication controllers

Software, such as operating systems, DBMSs, computer networks, and TP monitors

The main function of a TP monitor is to coordinate the flow of transaction requests between terminals or other devices and the application programs that can process these requests.

It imposes a certain structure on the software components of a TP system and offers functions to support the activities of each component.

Transaction Processing Monitor

Most TP applications are structured to perform the following steps for each terminal:

1. Interact with the terminal user to collect the transaction’s input, usually through forms and menus.

2. Translate the transaction input into a standard-format request message.

3. Start the transaction.

4. Examine the request’s header to determine its type.

5. Execute the request type’s application program, which may in turn invoke DBMSs and other application programs.

6. Commit the transaction after the application has finished.

7. Send the transaction’s output to the terminal.

Transaction Processing Monitor

A TP monitor divides an application into 3 components that perform the above steps:

Message Manager performs steps 1, 2, and 7

Request Control performs steps 3, 4, and 6

Application Server performs step 5, in collaboration with DBMSs
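
The division of the seven steps among the three components can be sketched in a few lines. Everything here is a toy under stated assumptions: the function names, the `deposit` request type, the dictionary standing in for a DBMS, and the snapshot-based abort are all hypothetical and only illustrate the flow.

```python
def message_manager_in(form):
    """Steps 1-2: collect form input and build a standard request message."""
    return {"header": {"type": form["action"]}, "body": form["fields"]}


def message_manager_out(result):
    """Step 7: format the transaction's output for the terminal."""
    return f"OK: {result}"


def deposit_server(body, db):
    """Step 5: application server for the hypothetical 'deposit' type."""
    db[body["account"]] = db.get(body["account"], 0) + body["amount"]
    return db[body["account"]]


SERVERS = {"deposit": deposit_server}  # request type -> application server


def request_control(request, db):
    """Steps 3, 4, and 6: start, route by header type, then commit."""
    snapshot = dict(db)                          # step 3: start (toy snapshot)
    server = SERVERS[request["header"]["type"]]  # step 4: examine the header
    try:
        result = server(request["body"], db)     # step 5 runs in the server
    except Exception:
        db.clear()
        db.update(snapshot)                      # abort: restore the snapshot
        raise
    return result                                # step 6: commit (keep changes)
```

A request then flows as `message_manager_out(request_control(message_manager_in(form), db))`, with each component touching only its own steps.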

Transaction Processing Monitor

Papers:

Transaction Processing Monitors

Philip A. Bernstein, Communications of the ACM, Vol. 33, No. 11, November 1990

This article shows that TP monitors have evolved to solve distributed computing problems that are not solved by the underlying OS, DBMS, and network. In particular, they support multithreaded processes, message routing, queuing, and system management and recovery. Sometimes, they support the transaction abstraction.

Chapter 2 (Transaction Processing Systems) of Distributed Database Systems

Michael Gertz, Spring 2003

Transactional Workflows:

Workflow: an activity in which multiple tasks are executed in a coordinated way by different processing entities.

Task: some work to be done, specified as a textual description in a file, a form, a message, or a computer program.

Processing entity: the performer of a task; a person or a software system.

A workflow process: a directed graph P = <N, A>

with a set of nodes N = {n1, n2, ...} and a set of arcs A
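
A small sketch of a workflow process as a directed graph, under the assumption that an arc (a, b) means task a must finish before task b starts and that the graph is acyclic. The task names and the `schedule` function are illustrative and not taken from the report.

```python
from collections import deque


def schedule(nodes, arcs):
    """Return one execution order of the tasks that respects every arc."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in arcs:
        successors[src].append(dst)
        indegree[dst] += 1

    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order


# P = <N, A> for a hypothetical order-handling workflow.
N = ["receive", "check", "bill", "ship"]
A = [("receive", "check"), ("check", "bill"), ("check", "ship")]
```

Here `schedule(N, A)` yields `["receive", "check", "bill", "ship"]`; "bill" and "ship" have no arc between them, so different processing entities could run them concurrently.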

Transactional Workflows:

Increasingly, workflow management systems (WFMSs) are being used as the primary technology for organizations to perform their daily business processes (workflows).

[Figure: architecture of a WFMS]

Transactional Workflows:

Execution:

Scheduler: a program that submits tasks, monitors events, …

Task agents: control the execution of a task

Query mechanisms: for querying the state of the workflow

3 architectures:

Centralized: a single scheduler

Partially distributed: one instance of the scheduler for each workflow

Fully distributed: no scheduler, only communicating agents

Transactional Workflows:

Papers:

Logic Based Modeling and Analysis of Workflows

Hasan Davulcu, Michael Kifer, C.R. Ramakrishnan, I.V. Ramakrishnan, ACM 1998

This paper proposes Concurrent Transaction Logic (CTR) as the language for specifying, analyzing, and scheduling workflows. It also shows that both local and global properties of workflows can be naturally represented as CTR formulas, and that reasoning can be done using the proof theory and the semantics of this logic.

Virtual Transaction Model to Support Workflow

Vasudev Krishnamoorthy, Ming-Chien Shan, ACM 2000

This paper presents a model called the virtual transaction to provide transactional support to workflow applications, and discusses how ACID properties are achieved using this model.

Transactional Workflows:

Correctness Issues in Workflow Management

Mohan Kamath and Krithi Ramamritham

Department of Computer Science, University of Massachusetts, Amherst MA, 1996

This paper discusses issues that arise due to failures in workflows and then describes techniques for ensuring the correctness of workflows.

Real-Time Database systems

Real-time database systems (RTDBS) inherit many properties from both database systems and real-time systems.

ACID properties

Transactions processed in an RTDBS are associated with timing constraints (deadlines):

Hard: a serious problem if the deadline is missed

Firm: no value if completed after the deadline

Soft: diminishing value if completed after the deadline
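
The three deadline types can be contrasted with a small value function. The concrete numbers are assumptions made for illustration only: a linear decay for soft deadlines and a fixed negative penalty for a missed hard deadline; the function name and parameters are hypothetical.

```python
def completion_value(kind, finish, deadline, value=1.0, decay=0.1, penalty=-10.0):
    """Value a transaction contributes when it completes at time `finish`."""
    if finish <= deadline:
        return value                  # on time: full value for every type
    late = finish - deadline
    if kind == "hard":
        return penalty                # a miss is a serious problem
    if kind == "firm":
        return 0.0                    # a late result is worthless
    if kind == "soft":
        return max(0.0, value - decay * late)  # value diminishes over time
    raise ValueError(f"unknown deadline type: {kind}")
```

For a firm deadline it therefore pays to abort a transaction that can no longer finish in time, while a late soft transaction is still worth completing.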

Real-Time Database systems

In traditional databases, only data consistency must be preserved.

In an RTDBS, both timing and data consistency constraints must be satisfied.

This creates a need for time-critical scheduling methods for:

Concurrency control

Resource scheduling

Commit processing

Buffer management

Real-Time Database systems

Goal:

In traditional DBs: minimizing the transaction response time and maximizing the throughput.

In RTDBS: maximizing the number of transactions that satisfy their deadlines.

This is done by assigning a priority to each transaction based on its deadline:

Earliest Deadline First

Least Slack First

Slack time: the maximum length of time the transaction can be delayed and still satisfy its deadline.
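
The two priority policies can be compared with a short sketch. It assumes that an estimate of each transaction's remaining execution time is available, so that slack = deadline - current time - remaining execution time; the `Txn` class and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    name: str
    deadline: float   # absolute deadline
    remaining: float  # estimated remaining execution time (an assumption)


def slack(txn, now):
    """Longest delay the transaction can absorb and still meet its deadline."""
    return txn.deadline - now - txn.remaining


def earliest_deadline_first(txns):
    """Highest priority goes to the transaction with the nearest deadline."""
    return sorted(txns, key=lambda t: t.deadline)


def least_slack_first(txns, now):
    """Highest priority goes to the transaction with the least slack."""
    return sorted(txns, key=lambda t: slack(t, now))


txns = [
    Txn("A", deadline=20, remaining=15),
    Txn("B", deadline=12, remaining=2),
    Txn("C", deadline=15, remaining=9),
]
```

At time 0 the slacks are A = 5, B = 10, C = 6, so Earliest Deadline First orders the transactions B, C, A while Least Slack First orders them A, C, B: the policies can disagree when a transaction with a distant deadline still has a lot of work left.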

Real-Time Database systems

Papers:

Advances in Real-Time Database Systems Research

Azer Bestavros, SIGMOD 1996

Research Issues in Real-Time Database Systems

Ozgur Ulusoy, Bilkent University

This paper provides a basic understanding of the issues in real-time databases and introduces research efforts in this area.

Maintaining Security in Firm Real-time Systems

Quazi N. Ahmed and Susan V. Vrbsky, Department of Computer Science, The University of Alabama, Tuscaloosa, AL 35487-0290, U.S.A.

This paper presents a new concurrency control algorithm for secure firm real-time databases. The results show that the algorithm performs fairly well in terms of security and timeliness compared to a non-secure algorithm.

* Value-Based Scheduling in Real-Time Database Systems

Jayant R. Haritsa, Michael J. Carey, and Miron Livny, VLDB Journal 1992