Advanced Topics in Databases Fahime Raja Niloofar Razavi Melody Siadaty Spring 2005 Technical Report...
-
Upload
oscar-taylor -
Category
Documents
-
view
219 -
download
0
Transcript of Advanced Topics in Databases Fahime Raja Niloofar Razavi Melody Siadaty Spring 2005 Technical Report...
Advanced Topics in Databases
Fahime Raja
Niloofar Razavi
Melody Siadaty
Spring 2005
Technical Report I
Outline:
Main Memory databases
Transaction Processing Monitors
Transactional Workflows
Real Time Databases
Main Memory Databases:
Memory resident database systems (MMDB’s) store their data in main physical memory and provide very high-speed access.
Traditional database systems rely on the disk subsystem to retrieve and update data use an offline storage device such as magnetic tape for backup.
MMDB will use physical memory as primary storage a disk subsystem for backup.
Main Memory Databases:
Advantages: Achieving significant performance improvements over
conventional database systems
Improve of processing time
Improve of throughput rates
due to elimination of I/O overhead.
Main Memory Databases:
Vs. conventional DBMS: Volatility of main memory
using a small stable main memory to support recovery processing
High overhead for the initial load frequently create the archives, distribute them across several secondary storage devices
Implementation issues Inherent differences between disk and memory storage
Main Memory Databases:
Concurrency Control: access to main memory is so much faster than disk access,
transactions may complete more quickly in a main memory system. So, locks will not be held as long, Lock contention may not be as important as it is when the data is
disk resident. The advantage of small locking granules (fields or records) (when
data are memory resident), are removed. Very large lock granules (e.g., relations) are most appropriate
for memory resident data.
Main Memory Databases:
Implementation of the locking mechanism : conventional system: a hash table that contains entries for the
objects currently locked .
MMDB: a small number of bits in data to represent data’s lock status. the first bit is set the object is locked, If it is locked and the second bit is set,
there are one or more waiting transaction else it is free.
Main Memory Databases:
Commit Processing: necessary to have a backup copy and to keep a log of transaction activity Before a transaction can commit, its activity records must be written to the log. Logging can impact response time, since each transaction must wait for at least one
stable write before committing. Logging can also affect throughput if the log becomes a bottleneck In MMDBs, the logging represents the only disk operation each transaction will
require. Using a small amount of stable main memory hold a portion of the log.
eliminating the response time problem, since transactions need never wait for disk operations.
Pre-committing: is accomplished by releasing a transaction’s locks as soon as its log record is placed in the log, without waiting for the information to be propagated to the disk.
Group commits: can be used to relieve a log bottleneck.
Main Memory Databases:
Access Methods: in main memory access methods, data values on which the index
is built, need not be stored in the index itself, as is done in B-Trees.
random access is fast in main memory pointers can be followed quickly index structures can store pointers to the indexed data, rather than the data itself. eliminating the problem of storing variable length fields in an
index and , saving space as long as the pointers are smaller than the data they
point to!
Main Memory Databases:
Data Representation: Relational tuples can be represented as a set of pointers to data values. use of pointers is space efficient when large values appear multiple
times in the database, (the actual value needs to only be stored once.)
Pointers also simplify the handling of variable length fields since variable length data can be represented using pointers into a heap.
Query Processing In MMDB: sequential access is not faster than random access . E.g.: sort merge join processing loses its advantage!
Although the sorted relations could be represented easily in a main memory database using pointer lists, there is really no need for this since much of the motivation for sorting is already lost .
Main Memory Databases:
Recovery: Backups of memory resident databases must be maintained on
disk or other stable storage to insure against loss of the volatile data .
1.the procedure used during normal database operation to keep the backup up-to-date (Checkpointing),
2.the procedure used to recover from a failure. (Failure Recovery) loading blocks of the database “on demand” until all of the data
has been loaded. using disk striping or disk array
there must be independent paths from the disks to memory
Main Memory Databases:
Papers: Main Memory Database Systems: An overview
Hector Garcia-Molina, Kenneth Salem, IEEE 1992 This paper surveys the major memory residence optimizations and
briefly discusses some of the memory resident systems that have been designed or implemented.
Transaction Management for a Main-Memory DatabasePiyush Burte, Boanerges Aleman-Meza, D. Brent Weatherly, Rong Wu
In this paper, the details of thread concurrency and resource locking protocols are examined, the deadlock prevention scheme, and the Java-based implementation of these design decisions. Also the effectiveness of the design with performance tests that simulate typical transactions on a highly concurrent database system are shown.
Main Memory Databases:
Main Memory Database RecoveryMargaret H. Eich, IEEE 1986
This paper examines MMDB recovery, identifies differences from traditional DBMS recovery, composes a “wish list” of MMDB recovery requirements, describes why previously proposed techniques do not satisfy these requirements, and proposes a new MMDB recovery technique which does.
* -Understanding, Modeling and Improvement Main-Memory Database Performance
Stefan Manegold, 2002, Amsterdam university
Main Memory Databases:
Some possible future works: Efficient loading for MMDBs including the idea of
partitioning. examining methods for distributing log and archive
database information across multiple secondary storage devices.
Simulation and testing the results
Transaction Processing Monitor
A Transaction Processing (TP) application is a program that performs an administrative function by accessing a shared data based on behalf of an online user.
A TP system is an integrated set of products that supports TP applications ,including :
hardware such as processors, memories, disks, and communication controllers
software, such as operating systems , DBMSs, computer networks TP monitors
The main function of a TP monitor is to coordinate the flow of transaction requests between terminals or other
devices and application programs that can process these requests it imposes a certain structure on the software components of a TP system
and offers functions to support the activities of each component
Transaction Processing Monitor
Most of TP applications are structured to perform the following steps for each terminal :
1. Interact with the terminal user to collect the transaction’s input, usually through forms and menus.
2. Translate the transaction input into a standard-format request message.
3. Start the transaction.4. Examine the request’s header to determine its type.5. Execute the request type’s application program, which may in
turn invoke DBMSs and other application programs. 6. Commit the transaction after the application has finished.7. Send the transaction’s output to the terminal.
Transaction Processing Monitor
A TP monitor divides an application into 3 components that perform the above steps:
Message Manager performs steps 1, 2, and 7
Request Control performs steps 3, 4, and 6
Application Server performs step 5, in
collaboration with DBMSs
Transaction Processing Monitor
Papers: Transaction Processing Monitors
Philip.A.BernsteinNovember 1990,Vol 33,No.11,Communication of ACM
In this article, it was shown that TP monitors have evolved to solve distributed computing problems that are not solved by the underlying OS, DBMS, and network. In particular, they support multithreaded processes, message routing, queuing, and system management and recovery. Sometimes, they support the transaction abstraction
Chapter 2-(Transaction Processing systems) of Distributed Database Systems
Michael Gertz, spring 2003
Transactional Workflows:
Workflow: is an activity in which multiple tasks are executed in a coordinated way by different processing entities.
Task: some work to be done; Textual description in a file,a form,a msg,a computer program
Processing entity: that performs the tasks Person / sw system
A workflow process : directed graph P=< N, A >
a set of nodes N= {n1,n2 .... } and a set of arcs
Transactional Workflows:
Increasingly, workflow management systems (WFMSs) are being used as the primary technology for organizations to perform their daily business processes (workflows)
architecture of a WFMS
Transactional Workflows:
Execution: Scheduler: program submitting tasks, monitoring
events…. Task agents: controlling exec. of a task Query mechanisms : for the state of the workflow
3 architectures: Centralized: single scheduler Partially distributed: one instance of the scheduler for each
workflow Fully distributed: no scheduler, communicating agents
Transactional Workflows:
Papers: Logic Bases Modeling and Analysis of Workflows
Hasan Davulcu , Michael Kifer, CR. Ramakrishnan, I.V. Ramakrishnan,ACM 1998
In this paper Concurrent Transaction Logic (CTL) as the language for specifying, analyzing, and scheduling of workflows is proposed. Also it has been shown that both local and global properties of workflows can be naturally represented as CTL formulas and reasoning can be done with the use of the proof theory and the semantics of this logic.
Virtual Transaction Model to Support WorkflowVasudev Krishnamoorthy,Ming-Chien Shan,ACM 2000
This paper presents a model called the virtual transaction ,to provide transactional support to the workflow applications. and also discusses on how ACID properties will be achieved using this model.
Transactional Workflows:
Correctness Issues in Workflow Management
Mohan Kamathy and Krithi Ramamritham
Department of Computer Science, University of Massachusetts, Amherst MA, 1996
This paper discusses about issues happened due to failures in workflows and then describes techniques for ensuring correctness of workflows.
Real-Time Database systems
Real-time database systems inherit many properties from both database systems and real-time systems. ACID properties
Transactions process in a RTDBS are associated with timing constraints deadlines: Hard: serious problem if the deadline is contradicted Firm: no worth if completed after the deadline Soft: diminishing value if completed after the deadline
Real-Time Database systems
In traditional databases only the data consistency should be preserved,
But in RTDBS : both timing and data consistency constraints
need of time-critical scheduling methods for Concurrency control Resource scheduling Commit processing Buffer management
Real-Time Database systems
Goal: In traditional Dbs:
minimizing the transaction response time And maximizing the throughput
In RTDBS : Maximizing the # of transactions that satisfy their deadlines.
assigning a priority to each transaction based on its deadline Earliest Deadline First Least Slack First
Slack Time: max length of the time ,the transaction can be delayed and still satisfy its deadline.
Real-Time Database systems
Papers: Advances in Real-Time Database Systems Research
Azer Bestavros, Sigmod 1996 Research Issues in Real-Time Database Systems
Ozgur Ulusoy,Bilkent University In this paper a basic understanding of the issues in real-time databases is provided and
research efforts in this area are introduced. Maintaining Security in Firm Real-time Systems
Quazi N. Ahmed and Susan V. Vrbsky, Department of Computer Science,The University of Alabama, Tuscaloosa, AL 35487-0290, U.S.A.
In this papera new concurrency control algorithm for secure firm real-time databases is produced. Also, the results show that the algorithm performs fairly well in terms of security and timeliness compared to a non-secure algorithm.
*- Value-Based Scheduling in Real-Time Database SystemsJayant R. Haritsa, Michael J. Carey, and Miron Livny ,VLDB Journal 1992