CS 8630 Database Administration, Dr. Guimaraes
10-05-2009, Physical Design and Performance
CS 8630 Database Administration, Dr. Mario Guimaraes
Overview
• Introduction: inputs to Physical Design, Decisions
• Create Index
• Rewrite SQL / Query Optimizer (Leccotech)
• Denormalization, Materialized Views
• Partition Database
• Redundant Arrays of Inexpensive Disks (RAID)
• Redefine main memory structures (SGA in Oracle)
• Change default block size at installation
• Export/Import (drop indexes): defragment
• Check locks
• Separate data by category in proper tablespaces
• Redefining Client-Server Architecture
Where should a DBA start when trying to optimize? Why?
a) DB, b) OS, c) DB Application, d) Other
DB Design Phases
• Conceptual Design
• Logical Design
• Physical Design
Introduction - Inputs to Physical Design
• Normalized relations.
• Volume estimates.
• Attribute definitions.
• Data usage: entered, retrieved, deleted, updated.
• Response time requirements.
• Requirements for security, backup, recovery, retention, integrity.
• DBMS characteristics.
• system
Physical Design Decisions
• Specifying attribute data types.
• Modifying the logical design.
• Specifying the file organization (sometimes).
• Choosing indexes.
Designing Fields
• Choosing PK.
• Choosing data type.
• Coding, compression, encryption.
• Controlling data integrity.
– Default value.
– Range control.
– Null value control.
– Referential integrity.
Selection of a Primary Key
• Consider a shorter field or selecting another candidate key to substitute for a long, multi-field primary key (and all associated foreign keys).
– System-generated non-information-carrying key
– Versus
– Primary key like phone number
Example of Data Dictionary
Attribute  Table    Null?  Unique?  Pkey?  Fkey?  Ref table  Domain
CID        College  N      Y        Y      N      NA         4 digit integer greater than 1000
Office     College  Y      N        N      N      NA         character string length 10
DID        Dept     N      Y        Y      N      NA         4 digit integer greater than 1000
Location   Dept     Y      N        N      N      NA         character string length 65
CID        Dept     Y      N        N      Y      College    4 digit integer greater than 1000
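The data dictionary above maps fairly directly onto table definitions. The following is a minimal sketch only: it uses Python's built-in sqlite3 module in place of the course's Oracle database, and it reads "4 digit integer greater than 1000" as the range 1001 to 9999, which is an assumption. The helper names build_schema and try_insert are hypothetical.

```python
import sqlite3

# Sketch only: SQLite stands in for Oracle. The CHECK ranges are one possible
# reading of the "Domain" column in the data dictionary.
DDL = """
CREATE TABLE College (
    CID    INTEGER PRIMARY KEY CHECK (CID > 1000 AND CID <= 9999),
    Office TEXT CHECK (Office IS NULL OR length(Office) <= 10)
);
CREATE TABLE Dept (
    DID      INTEGER PRIMARY KEY CHECK (DID > 1000 AND DID <= 9999),
    Location TEXT CHECK (Location IS NULL OR length(Location) <= 65),
    CID      INTEGER REFERENCES College(CID)
                 CHECK (CID IS NULL OR (CID > 1000 AND CID <= 9999))
);
"""

def build_schema():
    """Create an in-memory database holding the College and Dept tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.executescript(DDL)
    return conn

def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

For instance, try_insert(conn, "INSERT INTO College VALUES (?, ?)", (999, "A-101")) should be rejected by the range check, while a Dept row that points at a missing College should be rejected by the foreign key.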
Designing Fields (cont.)
• Handling missing data.
– Substitute an estimate of the missing value.
– Assign default value.
– Trigger a report listing missing values.
– In programs, ignore missing data unless the value is significant.
• END OF INTRODUCTION TO PHYSICAL DESIGN
• START OF PERFORMANCE (INDEXES, QUERY OPTIMIZATION).
INDEXES
• What is an INDEX?
• Why do we CREATE an INDEX?
A) To speed up queries? B) To speed up data entry (insert/update/delete)? C) Both?
Rules for Using Indexes
1. Use on larger tables.
2. Index the primary key of each table.
3. Index search fields.
4. Index fields in the WHERE clause of SQL commands.
5. Index when cardinality is high. For example, do not index SEX, where cardinality is 2. Typically: index when there are >100 different values, but not when there are <10 values.
Rules for Using Indexes (cont.)
6. DBMS may have limit on number of indexes per table and number of bytes per indexed field(s).
7. Null values may not be referenced from an index (in Oracle, a row whose indexed columns are all NULL has no entry in a B-Tree index).
8. Use indexes heavily for non-volatile databases (data warehouses); limit the use of indexes for volatile databases.
Different Type of Indexes
Typical Indexes
• B-Tree (traditional) Indexes
• Hash Cluster
• Bitmap Indexes
• Index-Organized Tables
• Reverse-Key Indexes
--------------------------------------
• When we issue the command:
Create index cidx on orders (cid);
What type of index do we create?
• General Format: Create index <iName> on <tname> (<col_name>);
Indexes (Defaults)
• Anytime a PK is created, an index is automatically created.
• Anytime the type of index is not specified, the index created is a B-Tree.
B-Tree (Balanced Tree)
• Most popular type of index structure for any programming language or database.
• When you don’t know what to do, the best option is usually a B-Tree. They are flexible and perform well (not very well) in several scenarios.
• In practice, the structure used is really a B+ tree or B* tree.
B-Trees (continued)
• One node corresponds to one block/page (minimum disk I/O).
• Non-leaf nodes contain n keys and n+1 pointers.
• Leaf nodes contain n entries, where each entry has an index key and a pointer to a data block. Also, each leaf node has a pointer to the next leaf node.
• All leaves are at the same height.
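The node layout described above can be sketched in a few lines of Python. This is an illustrative in-memory model, not how any particular DBMS stores its blocks; the class names Inner and Leaf, the hand-built example tree, and the convention that a separator key k sends keys >= k to the right are all assumptions.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, entries):
        self.entries = entries   # sorted list of (key, pointer to data block)
        self.next = None         # pointer to the next leaf

class Inner:
    def __init__(self, keys, children):
        assert len(children) == len(keys) + 1   # n keys, n+1 pointers
        self.keys = keys
        self.children = children

def find_leaf(node, key):
    """Descend from the root to the leaf that could hold key."""
    while isinstance(node, Inner):
        node = node.children[bisect_right(node.keys, key)]
    return node

def lookup(root, key):
    """Equality search: return the pointer stored for key, or None."""
    for k, ptr in find_leaf(root, key).entries:
        if k == key:
            return ptr
    return None

def range_scan(root, low, high):
    """Range search: find the first leaf, then follow the next-leaf pointers."""
    leaf, out = find_leaf(root, low), []
    while leaf is not None:
        for k, ptr in leaf.entries:
            if k > high:
                return out
            if k >= low:
                out.append(ptr)
        leaf = leaf.next
    return out

def build_example():
    """Hand-built two-level tree: one non-leaf node over three chained leaves."""
    l1 = Leaf([(10, "r10"), (20, "r20")])
    l2 = Leaf([(30, "r30"), (40, "r40")])
    l3 = Leaf([(50, "r50"), (60, "r60")])
    l1.next, l2.next = l2, l3
    return Inner([30, 50], [l1, l2, l3])
```

With this tree, lookup(root, 40) touches one non-leaf node and one leaf, and range_scan(root, 25, 50) walks the leaf chain instead of going back to the root, which is why the same structure serves both = and < / > comparisons.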
Good Indexing (B-Tree) Candidates
• Table must be reasonably large.
• Field is queried frequently.
• Field has a high cardinality (don’t index by sex, where the cardinality is 2!).
• Badly balanced trees may inhibit performance. Dropping and re-creating the index may improve performance.
Bitmap Index
• Bitmap indexes contain the key value and a bitmap listing the value of 0 or 1 (yes/no) for each row indicating whether the row contains that value or not.
• May be a good option for indexing fields that have low cardinality (opposite of B-trees).
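As a rough illustration of the idea, a bitmap index can be modeled as one integer per distinct value, where bit i is 1 when row i holds that value. The sketch below is a toy in-memory model with made-up sample data, not Oracle's on-disk format; build_bitmap_index and rows_of are hypothetical helper names.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is 1 if row i has that value."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def rows_of(bitmap):
    """Return the row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Made-up sample columns with low cardinality (5 rows).
sex = ["F", "M", "F", "F", "M"]
status = ["single", "married", "married", "single", "married"]

sex_idx = build_bitmap_index(sex)
status_idx = build_bitmap_index(status)

# WHERE sex = 'F' AND status = 'married' becomes a bitwise AND of two bitmaps.
matches = rows_of(sex_idx["F"] & status_idx["married"])
```

Each column needs only as many bitmaps as it has distinct values, which is why the technique suits low-cardinality fields, and combining equality predicates costs a single AND or OR over the bitmaps.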
Bitmap Index (cont.)
• Syntax: Create Bitmap index ….
• Bitmap indexes work better with equality tests (= or IN), not with < or >.
• Bitmap index maintenance can be expensive; an individual bit may not be locked, so a single update locks a large portion of the index.
• Bitmap indexes are best in read-only data warehouse situations.
Hash Indexing
• B-Tree and bitmap indexes use keys to find rows, which requires I/O to process the index.
• A hash index locates rows with a key-based algorithm.
• Rows are stored based on a hashed value.
• Index size should be known at index creation.
• Example:
– create index cidx on orders (cid) hashed;
Hash Indexes Work Best With
• Very-high cardinality columns
• Only equal (=) tests are used
• Index values do not change
• Number of rows is known ahead of time
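The conditions above follow from how hashing places rows. The toy model below is a sketch under assumptions: the HashIndex class is hypothetical, the bucket count is fixed when the index is created, and Python's built-in hash() stands in for whatever hash function a DBMS would use.

```python
class HashIndex:
    """Toy hash index: a fixed number of buckets chosen at creation time."""

    def __init__(self, n_buckets):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The key alone decides the bucket; no index tree is traversed.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row_pointer):
        self._bucket(key).append((key, row_pointer))

    def lookup(self, key):
        """Equality search only: hashing preserves no ordering between keys."""
        return [ptr for k, ptr in self._bucket(key) if k == key]

idx = HashIndex(n_buckets=8)
idx.insert(1001, "rowA")
idx.insert(1009, "rowB")   # lands in the same bucket as 1001 (both are 1 mod 8)
idx.insert(1002, "rowC")
```

Because neighboring key values land in unrelated buckets, a range test such as < or > would have to inspect every bucket, and too few buckets for the actual number of rows means long collision chains.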
Index-Organized Tables
• Table data is incorporated into the B-Tree using the PK as the index.
• Table data is always in order of PK. Many sorts can be avoided.
• Especially useful for “lookup” type tables.
• Works best when there are few (and small) columns in your table other than the PK.
Reverse Key Indexes
• Key ‘1234’ becomes ‘4321’, etc.
• Only efficient for a few scenarios involving parallel processing and a huge amount of data.
• By reversing key values, index blocks might be more evenly distributed, reducing the likelihood of densely or sparsely populated indexes.
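A small sketch shows why reversing helps with sequentially generated keys. It reverses the decimal digits, following the ‘1234’ to ‘4321’ example above; the function name reverse_key and the sample keys are illustrative only.

```python
def reverse_key(key):
    """Reverse the characters of a key, as in '1234' -> '4321'."""
    return str(key)[::-1]

# Four consecutive keys, e.g. produced by a sequence.
sequential = [1231, 1232, 1233, 1234]
reversed_keys = [reverse_key(k) for k in sequential]

# Leading characters decide where a key falls in the sorted index.
leading_plain = {str(k)[0] for k in sequential}
leading_reversed = {k[0] for k in reversed_keys}
```

The plain keys all start with the same digit and would be inserted next to each other at the right-hand edge of the index, while the reversed keys start with four different digits and spread across it. The price is that range scans on the original key order no longer map to adjacent index entries.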
Conclusions on Indexes
• For high-cardinality key values, B-Tree indexes are usually best.
• B-Trees work with all types of comparisons and gracefully shrink and grow as table changes.
• For low cardinality read-only environments, Bitmaps may be a good option.
Denormalization
• Normally, we want to design our tables up to 3NF or BCNF (at least).
• When do we want to violate 3NF / BCNF?
• When do we want to store derived data?
– A) Read-only databases?
– B) Updateable databases?
Rules for Adding Derived Columns
• Use when aggregate values are regularly retrieved.
• Use when aggregate values are costly to calculate.
• Permit updating only of source data.
• Create triggers to cascade changes from source data.
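The last two rules can be sketched with a stored order total that only triggers are allowed to change. This is a minimal illustration using Python's sqlite3 module rather than Oracle; the orders and order_line tables, their columns, and the trigger names are all hypothetical.

```python
import sqlite3

# Hypothetical schema: orders.total is the derived column, order_line is the source data.
SCHEMA = """
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    total    REAL NOT NULL DEFAULT 0
);
CREATE TABLE order_line (
    order_id INTEGER REFERENCES orders(order_id),
    amount   REAL NOT NULL
);
CREATE TRIGGER line_insert AFTER INSERT ON order_line
BEGIN
    UPDATE orders SET total = total + NEW.amount WHERE order_id = NEW.order_id;
END;
CREATE TRIGGER line_update AFTER UPDATE OF amount ON order_line
BEGIN
    UPDATE orders SET total = total - OLD.amount + NEW.amount
    WHERE order_id = NEW.order_id;
END;
CREATE TRIGGER line_delete AFTER DELETE ON order_line
BEGIN
    UPDATE orders SET total = total - OLD.amount WHERE order_id = OLD.order_id;
END;
"""

def open_db():
    """Create an in-memory database with the schema and triggers above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def order_total(conn, order_id):
    """Read the stored (derived) total instead of recomputing SUM(amount)."""
    row = conn.execute(
        "SELECT total FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    return row[0]
```

Applications insert, update, and delete only order_line rows; reads of the aggregate become a single-row lookup, at the cost of extra work on every change to the source data.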
Rules for Storing Repeating Groups
• Consider storing repeating groups across columns rather than down rows when:
– The repeating group has a fixed number of occurrences, each of which has a different meaning, or
– The entire repeating group is normally accessed and updated as one unit.
Rules for Storing Repeating Groups Across Columns
EMPLOYEE Phone
Design Option:
EMPLOYEE(EmpID, EmpName, …)
EMP_PHONE(EmpID, Phone)
Another Design Option:
EMPLOYEE(EmpID, EmpName, Phone1, Phone2, …)
Denormalization
• One-to-one relationship: Student 1,1 Submits 0,1 Application
• STUDENT and APPLICATION become a single relation STUDENT instead of two.
• Many-to-many relationship: Vendor 1,N PriceQuote 1,N Item
• Physical design may suggest collapsing ITEM and PRICE_QUOTE into a single relation ITEM_QUOTE.
A possible denormalization situation:
One-to-many relationship
Partitioning
• Horizontal Partitioning: Distributing the rows of a table into several separate files/locations.
• Vertical Partitioning: Distributing the columns of a table into several separate files/locations.
– The primary key must be repeated in each file.
Partitioning (cont.)
• Advantages of Partitioning:
– Records used together are grouped together.
– Each partition can be optimized for performance.
– Security, recovery.
– Partitions stored on different disks: less contention.
– Take advantage of parallel processing capability.
• Disadvantages of Partitioning:
– Slow retrievals across partitions.
– Complexity.
RAID (Redundant Arrays of Inexpensive Disks)
[Figure: RAID with four disks and striping]
Intro. To Query Processing
• In network and hierarchical DBMSs, low-level procedural query language is generally embedded in high-level programming language.
• Programmer’s responsibility to select most appropriate execution strategy.
• With declarative languages such as SQL, user specifies what data is required rather than how it is to be retrieved.
• Relieves user of knowing what constitutes good execution strategy.
• Gives DBMS more control over system performance.
• Disk access tends to be dominant cost in query processing for centralized DBMS.
• Two main techniques for query optimization:
– heuristic rules that order operations in a query;
– comparing different strategies based on relative costs, and selecting one that minimizes resource usage.
Goals
• Aims of query processing (QP):
– transform query written in high-level language (e.g. SQL) into correct and efficient execution strategy expressed in low-level language (implementing relational algebra, RA);
– execute strategy to retrieve required data.
• As there are many equivalent transformations of same high-level query, aim of query optimization (QO) is to choose one that minimizes resource usage.
• Generally, reduce total execution time of query.
• Problem is computationally intractable with large number of relations, so strategy adopted is reduced to finding near-optimum solution.
3 alternatives
Find all Managers who work at a London branch.
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'London');
• Three equivalent RA queries are:
(1) σ(position='Manager') ∧ (city='London') ∧ (Staff.branchNo=Branch.branchNo) (Staff × Branch)
(2) σ(position='Manager') ∧ (city='London') (Staff ⋈ Staff.branchNo=Branch.branchNo Branch)
(3) (σ position='Manager' (Staff)) ⋈ Staff.branchNo=Branch.branchNo (σ city='London' (Branch))
Comparing costs
• Assume:
– 1000 tuples in Staff; 50 tuples in Branch;
– 50 Managers; 5 London branches;
– no indexes or sort keys;
– results of any intermediate operations stored on disk;
– cost of the final write is ignored;
– tuples are accessed one at a time.
• Costs (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160
• Cartesian product and join operations are much more expensive than selection, and the third option significantly reduces the size of the relations being joined together.
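The three cost figures can be reproduced with a few lines of arithmetic. The breakdown in the comments is one reading of the slide's formulas under its stated assumptions (intermediate results are written to disk and read back, one access per tuple); in particular, interpreting the "2*50" term of strategy (3) as "write 50 Managers + read 50 Branch tuples" is an assumption, as is the function name strategy_costs.

```python
def strategy_costs(n_staff=1000, n_branch=50, n_managers=50, n_london=5):
    """Disk accesses for the three equivalent strategies, one access per tuple."""
    # (1) Read both relations, write the Cartesian product, read it back to select.
    cartesian = n_staff * n_branch
    cost1 = (n_staff + n_branch) + 2 * cartesian

    # (2) Read both relations to join them; the join has one tuple per staff
    #     member (each works at one branch), written once and read once.
    cost2 = 2 * n_staff + (n_staff + n_branch)

    # (3) Read Staff and write the Managers; read Branch and write the London
    #     branches; then read both small intermediate results to join them.
    cost3 = (n_staff + n_managers) + (n_branch + n_london) + (n_managers + n_london)

    return cost1, cost2, cost3

costs = strategy_costs()
ratio_worst_to_best = costs[0] / costs[2]
```

With the default figures the function returns the same three totals as above, and pushing the selections below the join cuts the cost by a factor of roughly 87.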
Phases of Query Processing
• QP has four main phases:
Dynamic versus Static Optimization
• First three phases of QP can be carried out:
– dynamically every time query is run;
– statically when query is first submitted.
– Similar to compiled vs. interpreted languages.
• Advantages of dynamic QO arise from fact that information is up to date.
• Disadvantages are that performance of query is affected, time may limit finding optimum strategy.
• Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.
• Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.
• Could use a hybrid approach to overcome this.
Query Optimizer - Plan
• DBMSs allow you to view the query plan
• In ORACLE, you must use either set autotrace on or explain plan. Set autotrace on is much simpler. Explain plan is a little bit more efficient, but more complicated.
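Viewing a plan before and after creating an index is the quickest way to see what the optimizer does with it. The sketch below is an analogue in SQLite (via Python's sqlite3 module), not Oracle's autotrace or explain plan output; the orders table, its sample rows, and the plan helper are illustrative assumptions. The exact wording of the plan text varies between SQLite versions, so no output is reproduced here.

```python
import sqlite3

def plan(conn, sql):
    """Return the detail text of SQLite's EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)   # last column holds the description

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (oid INTEGER PRIMARY KEY, cid INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (cid, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

query = "SELECT * FROM orders WHERE cid = 42"

before = plan(conn, query)                        # no index on cid yet
conn.execute("CREATE INDEX cidx ON orders (cid)")
after = plan(conn, query)                         # same query, index available
```

Printing before and after shows the plan changing from a scan of the whole table to a search that names the cidx index, which corresponds roughly to the difference between a full table access and an index range scan in Oracle's terminology.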
Oracle operations (results of autotrace)
• TABLE ACCESS FULL
• TABLE ACCESS BY ROWID
• INDEX RANGE SCAN
• INDEX UNIQUE SCAN
• NESTED LOOPS
• TABLE ACCESS FULL (full table scan):
Oracle will look at every row in the table to find the requested information. This is usually the slowest way to access a table.
• TABLE ACCESS BY ROWID:
Oracle will use the ROWID method to find a row in the table. ROWID is a special column detailing the exact Oracle block where the row can be found. This is the fastest way to access a table (faster than any index, but less flexible than any index).
• INDEX RANGE SCAN:
Oracle will search an index for a range of values. Usually, this occurs when a range or BETWEEN operation is specified by the query, or when only the leading columns in a composite index are specified by the WHERE clause. Can perform well or poorly, based on the size of the range and the fragmentation of the index.
• INDEX UNIQUE SCAN:
Oracle will perform this operation when the table’s primary key or a unique key is part of the WHERE clause. This is the most efficient way to search an index.
• NESTED LOOPS:
Indicates that a join operation is occurring. Can perform well or poorly, depending on the performance of the index and table operations on the individual tables being joined.
Tuning SQL and PL/SQL Queries
Sometimes, the same query can be written in more than 1000 ways, generating more than 100 execution plans.
Some firms have products that automatically rewrite correctly written SQL queries.
ROWID
SELECT ROWID, …
INTO :EMP_ROWID, …
FROM EMP
WHERE EMP.EMP_NO = 56722
FOR UPDATE;

UPDATE EMP SET EMP.NAME = …
WHERE ROWID = :EMP_ROWID;
ROWID (cont.)
• Fastest
• Less flexible
• Very useful for removing duplicate rows
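Duplicate removal is a common use: rows that are identical in every column can still be told apart by their row identifier. The sketch below uses SQLite's rowid through Python's sqlite3 module as a stand-in; Oracle's ROWID is a physical address rather than a simple integer, but the delete pattern has the same shape. The emp table and its sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (1, "Ann"), (2, "Bob"), (3, "Cy")],
)

# Keep the row with the smallest rowid in each group of identical rows
# and delete every other copy.
conn.execute("""
    DELETE FROM emp
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM emp GROUP BY emp_no, name
    )
""")

remaining = conn.execute("SELECT emp_no, name FROM emp ORDER BY emp_no").fetchall()
```

After the delete, remaining holds one row per distinct (emp_no, name) pair. Without a row identifier there would be no way to write a WHERE clause that matches one copy of a duplicated row but not the other.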
SELECT STATEMENT
• NOT EXISTS in place of NOT IN
• Joins in place of EXISTS
• Avoid sub-selects
• EXISTS in place of DISTINCT
• UNION in place of OR on an index column
• WHERE instead of ORDER BY
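As an illustration of the first rewrite, the two queries below ask for customers without orders, once with NOT IN and once with NOT EXISTS. This is a sketch in SQLite through Python's sqlite3 module with made-up customer and orders tables; it shows only that the two forms return the same rows when the subquery column contains no NULLs. Which form runs faster depends on the DBMS and its optimizer, so checking the plan on the target system is the safer guide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders   (oid INTEGER PRIMARY KEY, cid INTEGER NOT NULL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders   VALUES (10, 1), (11, 1), (12, 3);
""")

# Original form: NOT IN with an uncorrelated subquery.
not_in = conn.execute("""
    SELECT name FROM customer
    WHERE cid NOT IN (SELECT cid FROM orders)
    ORDER BY name
""").fetchall()

# Rewritten form: NOT EXISTS with a correlated subquery.
not_exists = conn.execute("""
    SELECT name FROM customer c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.cid = c.cid)
    ORDER BY name
""").fetchall()
```

Both result lists contain only Bob. Note that the equivalence breaks if orders.cid can be NULL: NOT IN then returns no rows at all, while NOT EXISTS still returns the customers without orders.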