dbms unit5

7/27/2019 dbms unit5

1/26

1

UNIT V - CURRENT ISSUES

TWO MARKS

1. Write any four rules defined in relational database design?

All database management must take place using the relational database's innate functionality

All information in the database must be stored as values in a table

All database information must be accessible through the combination of a table name, primary key and

column name.

The database must useNULL valuesto indicate missing or unknown information

The database schema must be described using the relational database syntax

The database may support multiple languages, but it must support at least one language that provides full

database functionality (e.g. SQL)

The system must be able to update all updatable views

The database must provide single-operation insert, update and delete functionality

Changes to the physical structure of the database must be transparent to applications and users.

Changes to the logical structure of the database must be transparent to applications and users.

The database must natively support integrity constraints.

Changes to the distribution of the database (centralized vs. distributed) must be transparent to

applications and users.

Any languages supported by the database must not be able to subvert integrity controls

2. What are knowledge bases?

A knowledge base (abbreviated KB, kb or ) is a special kind of database for knowledge

management. A knowledge base provides a means for information to be collected, organized, shared,

searched and utilized.

3. What are characteristics of knowledge bases?

characteristics of KBSs:

intelligent information processing systems

representation of domain of interest symbolic representation

-->problem solving by symbol-manipulation Symbolic programs
http://databases.about.com/cs/sql/a/aa042803a.htmhttp://databases.about.com/cs/sql/a/aa042803a.htmhttp://databases.about.com/cs/sql/a/aa042803a.htmhttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Databasehttp://databases.about.com/cs/sql/a/aa042803a.htm


2/26

2

4. What are the main components in Knowledge bases?

knowledge-base (KB)

inference engine

case-specific databaseexplanation subsystem

knowledge acquisition subsystem

user interface ( user)

specially intefaces

developer interface ( knowledge engineer, human expert)

5. What are called active databases?

An Active Database is adatabasethat includes an event driven architecture (often in the form ofECA

rules) which can respond to conditions both inside and outside the database. Possible uses include

security monitoring, alerting, statistics gathering and authorization. Most modern relational databases

include active database features in the form ofSQL Triggers.

6. Differentiate database and information system.

7. What are deductive databases?

A deductive database is adatabase systemthat can makedeductions(i.e.: conclude additional facts)

based onrulesandfactsstored in the (deductive) database.Datalogis the language typically used to

specify facts, rules and queries in deductive databases. Deductive databases have grown out of the

desire to combine logic programmingwith relational databases to construct systems that support a

powerful formalism and are still fast and able to deal with very large datasets. Deductive databases

are more expressive than relational databases but less expressive than logic programming systems.

Deductive databases have not found widespread adoptions outside academia, but some of their

concepts are used in today's relational databases to support the advanced features of more recent

SQLstandards.

8. What are parallel databases?

Parallel machines are becoming quite common and affordable

Prices of microprocessors, memory and disks have dropped sharply

Recent desktop computers feature multiple processors and this trend is projected to accelerate

Databases are growing increasingly large

Large volumes of transaction data are collected and stored for later analysis.

multimedia objects like images are increasingly stored in databasesLarge-scale parallel database systems increasingly used for:
http://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Database_triggerhttp://en.wikipedia.org/wiki/Database_triggerhttp://en.wikipedia.org/wiki/Database_triggerhttp://en.wikipedia.org/wiki/Database_systemhttp://en.wikipedia.org/wiki/Database_systemhttp://en.wikipedia.org/wiki/Database_systemhttp://en.wikipedia.org/wiki/Deductive_reasoninghttp://en.wikipedia.org/wiki/Deductive_reasoninghttp://en.wikipedia.org/wiki/Deductive_reasoninghttp://en.wiktionary.org/wiki/rulehttp://en.wiktionary.org/wiki/rulehttp://en.wiktionary.org/wiki/rulehttp://en.wikipedia.org/wiki/Factshttp://en.wikipedia.org/wiki/Factshttp://en.wikipedia.org/wiki/Factshttp://en.wikipedia.org/wiki/Dataloghttp://en.wikipedia.org/wiki/Dataloghttp://en.wikipedia.org/wiki/Dataloghttp://en.wikipedia.org/wiki/Logic_programminghttp://en.wikipedia.org/wiki/Logic_programminghttp://en.wikipedia.org/wiki/Relational_databasehttp://en.wikipedia.org/wiki/Relational_databasehttp://en.wikipedia.org/wiki/Relational_databasehttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/Relational_databasehttp://en.wikipedia.org/wiki/Logic_programminghttp://en.wikipedia.org/wiki/Dataloghttp://en.wikipedia.org/wiki/Factshttp://en.wiktionary.org/wiki/rulehttp://en.wikipedia.org/wiki/Deductive_reasoninghttp://en.wikipedia.org/wiki/Database_systemhttp://en.wikipedia.org/wiki/Database_triggerhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Database


3/26

3

storing large volumes of data

processing time-consuming decision-support queries

providing high throughput for transaction processing

9. What are the types of skew?10. Attribute-value skew

Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the

partitioning attribute end up in the same partition.

Can occur with range-partitioning and hash-partitioning.

Partition skew

With range-partitioning, badly chosen partition vector may assign too many tuples to some partitions and

too few to others.

Less likely with hash-partitioning if a good hash-function is chosen.

10. How do you handling skew using histograms?

Balanced partitioning vector can be constructed from histogram in a relatively straightforward fashion.

Assume uniform distribution within each range of the histogram

Histogram can be constructed by scanning relation, or sampling (blocks containing) tuples of the relation

11. How do you handling skew using virtual processor partitioning?

Skew in range partitioning can be handled elegantly using virtual processor partitioning: Create a large

number of partitions (say 10 to 20 times the number of processors). Assign virtual processors to

partitions either in round-robin fashion or based on estimated cost of processing each virtual partition

Basic idea:

If any normal partition would have been skewed, it is very likely the skew is spread over a number of

virtual partitions

Skewed virtual partitions get spread across a number of processors, so work gets distributed evenly!

12. What is interquery Parallelism?

Queries/transactions execute in parallel with one another.

Increases transaction throughput; used primarily to scale up a transaction processing system to support a

larger number of transactions per second.

Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even

sequential database systems support concurrent processing.

More complicated to implement on shared-disk or shared-nothing architectures

Locking and logging must be coordinated by passing messages between processors.

Data in a local buffer may have been updated at another processor.


4/26

4

Cache-coherency has to be maintained reads and writes of data in buffer must find latest version

of data.

Define cache coherency protocol with example.

Example of a cache coherency protocol for shared disk systems:

Before reading/writing to a page, the page must be locked in shared/exclusive mode.On locking a page, the page must be read from disk

Before unlocking a page, the page must be written to disk if it was modified.

More complex protocols with fewer disk reads/writes exist.

Cache coherency protocols for shared-nothing systems are similar. Each database page is assigned a

home processor. Requests to fetch the page or write it to disk are sent to the home processor.

14. What is meant by intraquery parallelism?

Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.

Two complementary forms of intraquery parallelism :

Interoperation Parallelism parallelize the execution of each individual operation in the query.

Interoperation Parallelism execute the different operations in a query expression in parallel. The

first form scales better with increasing parallelism because the number of tuples processed by each

operation is typically more than the number of operations in a query.

15. Write some design issues of parallel systems?

Parallel loading of data from external sources is needed in order to handle large volumes of incoming

data.

Resilience to failure of some processors or disks.

On-line reorganization of data and schema changes must be supported.

Also need support for on-line repartitioning and schema changes (executed concurrently with other

processing).

16. What are the multimedia data types?

Text,Image,Video,Audio mixed multimedia data

17. What is image database system?

Image Database is searchable electronic catalog or database which allows you to organize and list

images by topics, modules, or categories. The Image Database will provide the student with important

information such as image title, description, and thumbnail picture. Additional information can be

provided such as creator of the image, filename, and keywords that will help students to search

through the database for specific images. Before you and your students can use Image Database,

you must add it to your course.


5/26

5

18. Give a note on image retrieval system.

An image retrieval system is a computer system for browsing, searching and retrieving images from a

large database of digital images. Most traditional and common methods of image retrieval utilize

some method of addingmetadatasuch ascaptioning,keywords, or descriptions to the images so that

retrieval can be performed over the annotation words. Manual image annotation is time-consuming,

laborious and expensive; to address this, there has been a large amount of research done on

automatic image annotation. Additionally, the increase in social web applicationsand the semantic

webhave inspired the development of several web-based image annotation tools.

19. Give a note on image search and methods.

Image search is a specialized data search used to find images. To search for images, a user may

provide query terms such as keyword, image file/link, or click on some image, and the system will

return images "similar" to the query. The similarity used for search criteria could be meta tags, color

distribution in images, region/shape attributes, etc.

Image meta search- search of images based on associated metadata

such as keywords, text, etc.

Content-based image retrieval (CBIR) the application of computer

vision to the image retrieval. CBIR aims at avoiding the use of textual descriptions and instead

retrieves images based on similarities in their contents (textures, colors, shapes etc.) to a user-

supplied query image or user-specified image features.

20. How the image data are classified?

Archives - usually contain large volumes of structured or semi-

structured homogeneous data pertaining to specific topics.

Domain-Specific Collection - this is a homogeneous collection

providing access to controlled users with very specific objectives. Examples of such a

collection are biomedical and satellite image databases.

Enterprise Collection - a heterogeneous collection of images

that is accessible to users within an organizations intranet. Pictures may be stored in many

different locations.

Personal Collection - usually consists of a largely

homogeneous collection and is generally small in size, accessible primarily to its owner, and

usually stored on a local storage media.
http://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Digital_imagehttp://en.wikipedia.org/wiki/Digital_imagehttp://en.wikipedia.org/wiki/Metadatahttp://en.wikipedia.org/wiki/Metadatahttp://en.wikipedia.org/wiki/Metadatahttp://en.wikipedia.org/wiki/Captionhttp://en.wikipedia.org/wiki/Captionhttp://en.wikipedia.org/wiki/Captionhttp://en.wikipedia.org/wiki/Keyword_(internet_search)http://en.wikipedia.org/wiki/Keyword_(internet_search)http://en.wikipedia.org/wiki/Keyword_(internet_search)http://en.wikipedia.org/wiki/Automatic_image_annotationhttp://en.wikipedia.org/wiki/Automatic_image_annotationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Image_meta_searchhttp://en.wikipedia.org/wiki/Image_meta_searchhttp://en.wikipedia.org/wiki/Content-based_image_retrievalhttp://en.wikipedia.org/wiki/Content-based_image_retrievalhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Content-based_image_retrievalhttp://en.wikipedia.org/wiki/Image_meta_searchhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Automatic_image_annotationhttp://en.wikipedia.org/wiki/Keyword_(internet_search)http://en.wikipedia.org/wiki/Captionhttp://en.wikipedia.org/wiki/Metadatahttp://en.wikipedia.org/wiki/Digital_imagehttp://en.wikipedia.org/wiki/Database


6/26

6

Web - World Wide Web images are accessible to everyone with

an Internet connection. These image collections are semi-structured, non-homogeneous and

massive in volume, and are usually stored in large disk arrays.

21. What are text databases?

Large collections of documents from various sources: news articles, research papers, books,

digital libraries, e-mail messages, and web pages, library database, etc.

UNIT IV - DATABASE DESIGN ISSUES

TWO MARKS

1. What are attributes? Give example.

An entity is represented by a set of attributes. Attributes are descriptive properties possessed by eachmember of an entity set.Example: possible attributes of customer entity are customer name, customer id, Customer Street,customer city.

2. Define single valued and multi valued attributes.

Single valued attributes: attributes with a single value for a particular entity are called single valuedattributes.Multi valued attributes: Attributes with a set of value for a particular entity are called multivalve

attributes.

3. What are stored and derived attributes?

Stored attributes: The attributes stored in a data base are called stored attributes.Derived attributes: The attributes that are derived from the stored attributes are called derivedattributes.

4. Define weak and strong entity sets?

Weak entity set: entity set that do not have key attribute of their own are called weak entity sets.Strong entity set: Entity set that has a primary key is termed a strong entity set.

5. What is the use of integrity constraints?

Integrity constraints ensure that changes made to the database by authorized users do not result in aloss of data consistency. Thus integrity constraints guard against accidental damage to the database.

6. Mention the various levels in security measures.

Database system, Operating system, Network, Physical, Human


7/26

7

7. What are the types of attributes?

Simple: Each entity has a single atomic value for the attribute.Composite: The attribute may be composed of several components.Multi-valued: An entity may have multiple values for that attribute.

8. Write the entity set corresponding to the entity type car.

Registration (RegistrationNumber, State), VehicleID, Make, Model, Year, (Color)Example:((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 1999,(red,balck))

9. What are Weak entity types?

An entity that does not have a key attribute. A Weak attribute entity must participate in an identifyingrelationship type with an owner or identifying entity type.

10. Define Normalization.

Normalization is carried out in practice so that the resulting designs are of high quality and meet thedesirable properties

11. What is Demoralization?

The process of storing the join of higher normal form relations as a base relation which is in a lowernormal form

12. What is the need of normalization?

Mixing attributes of multiple entities may cause problemsInformation is stored redundantly wasting storageProblems with update anomalies

13. What is integrity? And what are the types of integrity?

Integrity here refers to the CORRECTNESS & CONSISTENCY of the datastored in the databaseTypes: Database Integrity, Entity integrity,Referential Integrity

14. What is meant by consistency?

It ensures the truthfulness of the database. Theconsistencyproperty ensures that any transaction thedatabase performs will take it from one consistent state to another. The consistency property does notsay how the DBMS should handle an inconsistency other than ensure the database is clean at theend of the transaction.

15. What is an entity relationship model?

The entity relationship model is a collection of basic objects called entities and relationship amongthose objects. An entity is a thing or object in the real world that is distinguishable from other objects.

16. Define query optimization.

Query optimization refers to the process of finding the lowest cost method of evaluating a givenquery.
http://en.wikipedia.org/wiki/Consistency_(database_systems)http://en.wikipedia.org/wiki/Consistency_(database_systems)http://en.wikipedia.org/wiki/Consistency_(database_systems)http://en.wikipedia.org/wiki/Consistency_(database_systems)


8/26

8

17. What are the tunable parameters?

Tuning of hardware

Tuning of schemaTuning of indicesTuning of materialized viewsTuning of transactions

18. What is hardware tuning?

Even well-tuned transactions typically require a few I/O operationsTypical disk supports about 100 random I/O operations per secondSuppose each transaction requires just 2 random I/O operations.

Then to support ntransactions per second, we need to stripe data

across n/50 disks (ignoring skew)Number of I/O operations per transaction can be reduced by keepingmore data in memory

If all data is in memory, I/O needed only for writesKeeping frequently used data in memory reduces disk accesses,

reducing number of disks required, but has a memory costWhat is mean by schema tuning?

Vertically partition relations to isolate the data that is accessed mostoften -- only fetch needed information.

E.g., split accountinto two, (account-number,

branch-name) and (account-number, balance).Improve performance by storing a denormalized relationE.g., store join of account and deposi tor;branch-name and balance information isrepeated for each holder of an account, but

join need not be computed repeatedly.Cluster together on the same disk page records that would

match in a frequently required join,

compute join very efficiently when required.

What is Index tuning?Create appropriate indices to speed up slow queries/updatesSpeed up slow updates by removing excess indices (tradeoff between

queries and updates)Choose type of index (B-tree/hash) appropriate for most frequent types

of queries.Choose which index to make clustered>Index tuning wizards look at past history of queries and updates (the

workload) and recommend which indices would be best for theworkload


9/26

9

22.What is the use of performance Simulation?Performance simulation using queuing model useful to predictbottlenecks as well as the effects of tuning changes, even without

access to real system.What are temporal databases?

Non Temporalstore only a single state of the real world, usually the most recent stateclassified as snapshot databasesapplication developers and database designers need to code for time varying data requirements eg

history tables, forecast reports etcTemporal

stores upto two dimensions of time i.e VALID (stated) time and TRANSACTION (logged) timeClassified as historical, rollback or bi -temporalNo need for application developers or database designers to code for time varying data requirements

i.e time is inherently supportedGive Examples of application domains dealing with time varying data.Financial Apps (e.g. history of stock market data)Insurance Apps (e.g. when were the policies in effect)

Reservation Systems (e.g. when is which room in a hotel booked)Medical Information Management Systems (e.g. patient records)Decision Support Systems (e.g. planning future contigencies)CRM applications (eg customer history / future)HR applications (e.g Date tracked positions in hierarchies)What is a SDBMS?

A SDBMS is a software module thatcan work with an underlying DBMS supports spatial data models, spatial abstract data types (ADTs) and a

query language from which these ADTs are callablesupports spatial indexing, efficient algorithms forprocessing spatial operations, and domain specific rules for query optimization

Examples: Oracle Spatial data cartridge, ESRI SDE

26. Give examples for spatial databases.

Consider a spatial dataset with:County boundary (dashed white line)Census block - name, area, population, boundary (dark line)->Water bodies (dark polygons)Satellite Imagery (gray scale pixels)Storage in a SDBMS table:

create table census_blocks (name string,area float,population number,boundary polygon );

27. How is a SDBMS different from a GIS?

GIS is a software to visualize and analyze spatial data using spatial analysis functions such asSearch Thematic search, search by region, (re-)classificationLocation analysis Buffer, corridor, overlayTerrain analysis Slope/aspect, catchment, drainage networkFlow analysis Connectivity, shortest pathDistribution Change detection, proximity, nearest neighborSpatial analysis/Statistics Pattern, centrality, autocorrelation, indices of similarity, topology: hole

descriptionMeasurements Distance, perimeter, shape, adjacency, directionGIS uses SDBMS

to store, search, query, share large spatial data sets


10/26

10

28. What are the components of a SDBMS?

Recall: a SDBMS is a software module thatcan work with an underlying DBMSsupports spatial data models, spatial ADTs and a query language from which these ADTs are callablesupports spatial indexing, algorithms for processing spatial operations, and domain specific rules for query

optimization]-->Components includeSpatial data models, query language, query processing, file organization and indices, query optimization,

etc.

UNIT III EMERGING SYSTEMS

1.What is client/server model?Networked computing modelProcesses distributed between clients and serversClient Workstation (usually a PC) that requests and uses a service

Server Computer (PC/mini/mainframe) that provides a serviceFor DBMS, server is a database server

2.Give notes on database Server Architectures.2-tiered approachClient is responsible forI/O processing logicSome business rules logicServer performs all data storage and access processing DBMS is only on server

3What are the advantages of database server?Clients do not have to be as powerful

Greatly reduces data traffic on the networkImproved data integrity since it is all processed centrallyStored procedures some business rules done on server>4What are the advantages of Three-Tier Architectures?ScalabilityTechnological flexibilityLong-term cost reductionBetter match of systems to business needsImproved customer serviceCompetitive advantageReduced risk5.What are the challenges of Three-tier Architectures?High short-term costsTools and trainingExperienceIncompatible standardsLack of compatible end-user tools6.What are the Client/Server Security levels are used?Network environment complex security issuesSecurity levels:System-level password security-->for allowing access to the systemDatabase-level password security

for determining access privileges to tables; read/update/insert/delete privileges Secure client/server communication via encryption


11/26

11

>7.>Define data warehousing.]-->Data sources often store only current data, not historical dataCorporate decision making requires a unified view of all organizational data, including historical

dataA data warehouse is a repository (archive) of information gathered from multiple sources, stored

under a unified schema, at a single siteGreatly simplifies querying, permits study of historical trendsShifts decision support query load away from transaction processing systems

8What are the Design Issues in data warehousing?When and how to gather dataSource driven architecture: data sources transmit new information to warehouse, either

continuously or periodically (e.g. at night)Destination driven architecture: warehouse periodically requests new information from data

sourcesKeeping warehouse exactly synchronized with data sources (e.g. using two-phase commit) is

too expensive>What schema to use

Schema integrationData cleansingE.g. correct mistakes in addresses (misspellings, zip code errors)Merge address lists from different sources and purge duplicatesHow to propagate updatesWarehouse schema may be a (materialized) view of schema from data sources What data to summarizeRaw data may be too large to store on-lineAggregate values (totals/subtotals) often sufficeQueries on raw data can often be transformed by query optimizer to use aggregate values 9What is the data warehouse Schemas?Dimension values are usually encoded using small integers and mapped to full values via

dimension tablesResultant schema is called a star schemaMore complicated schema structuresSnowflake schema: multiple levels of dimension tablesConstellation: multiple fact tables10.Give a example for data warehouse Schema

11.Define Data Mining.Data mining is the process of semi-automatically analyzing large databases to find usefulpatterns

12.Define prediction.Prediction based on past history

Predict if a credit card applicant poses a good credit risk, based on some attributes (income, jobtype, age, ..) and past history

Predict if a pattern of phone calling card usage is likely to be fraudulent13.Give some examples of prediction mechanisms.

]-->ClassificationGiven a new item whose class is unknown, predict to which class it belongs Regression formulaeGiven a set of mappings for an unknown function, predict the function result for a new parameter

value14.What are the descriptive Patterns in data mining?

Associations: Find books that are often bought by similar customers. If a new such

customer buys one such book, suggest the others too. Associations may be used as a firststep in detecting causation. E.g. association between exposure to chemical X and cancer,


12/26

12

Clusters: E.g. typhoid cases were clustered in an area surrounding a contaminated wellDetection of clusters remains important in detecting epidemics

15.What is classification Rules? And give example.Classifications rules help assign new objects to classes.E.g., given a new automobile insurance applicant, should he or she be classified as low risk,

medium risk or high risk?Classification rules for above example could use a variety of data, such as educational level,

salary, age, etc.

person P, P.degree = masters and P.income > 75,000

P.credit = excellent

person P, P.degree = bachelors and

(P.income 25,000 and P.income 75,000)

P.credit = goodRules are not necessarily exact: there may be some misclassificationsClassification rules can be shown compactly as a decision tree.

16Define Decision Tree.

A decision tree is a decision support tool that uses a tree-likegraphormodelof decisionsand their possible consequences, including chance event outcomes, resource costs, andutility. It is one way to display analgorithm. Decision trees are commonly used inoperationsresearch, specifically in decision analysis, to help identify a strategy most likely to reach agoal. Another use of decision trees is as a descriptive means for calculating conditionalprobabilities.

17.How do you construct of Decision Trees?Training set: a data sample in which the classification is already known.Greedy top down generation of decision trees.Each internal node of the tree partitions the data into groups based on a partitioning attribute,

and a partitioning condition for the nodeLeaf node:

all (or most) of the items at the node belong to the same class, orAll attributes have been considered, and no further partitioning is possible.

18.How do you find the best splits?Categorical attributes (with no meaningful order):Multi-way split, one child for each valueBinary split: try all possible breakup of values into two sets, and pick the best Continuous-valued attributes (can be sorted in a meaningful order)Binary split:Sort values, try each as a split point

E.g. if values are 1, 10, 15, 25, split at 1, 10, 15Pick the value that gives best splitMulti-way split:A series of binary splits on the same attribute has roughly equivalent effect

19.What is meant by Regression?

Regression deals with the prediction of a value, rather than a class.Given values for a set of variables, X1, X2, , Xn, we wish to predict the value of a variable Y.

One way is to infer coefficients a0, a1, a1, , an such thatY= a0 + a1 *X1 + a2 *X2+ + an *XnFinding such a linear polynomial is called linear regression.

In general, the process of finding a curve that fits the data is also called curve fitting.The fit may only be approximate

because of noise in the data, orbecause the relationship is not exactly a polynomialRegression aims to find coefficients that give the best possible fit.
http://en.wikipedia.org/wiki/Diagramhttp://en.wikipedia.org/wiki/Diagramhttp://en.wikipedia.org/wiki/Diagramhttp://en.wikipedia.org/wiki/Causal_modelhttp://en.wikipedia.org/wiki/Causal_modelhttp://en.wikipedia.org/wiki/Causal_modelhttp://en.wikipedia.org/wiki/Probabilityhttp://en.wikipedia.org/wiki/Probabilityhttp://en.wikipedia.org/wiki/Utilityhttp://en.wikipedia.org/wiki/Utilityhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Decision_analysishttp://en.wikipedia.org/wiki/Decision_analysishttp://en.wikipedia.org/wiki/Objective_%28goal%29http://en.wikipedia.org/wiki/Objective_%28goal%29http://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Objective_%28goal%29http://en.wikipedia.org/wiki/Decision_analysishttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Utilityhttp://en.wikipedia.org/wiki/Probabilityhttp://en.wikipedia.org/wiki/Causal_modelhttp://en.wikipedia.org/wiki/Diagram


13/26

13

20.Give an example for association rules.Retail shops are often interested in associations between different items that people buy.Someone who buys bread is quite likely also to buy milkA person who bought the book Database System Concepts is quite likely also to buy the book

Operating System Concepts.Associations information can be used in several ways.E.g. when a customer buys a particular book, an online shop may suggest associated books. Association rules:

breadmilk DB-Concepts, OS-Concepts NetworksLeft hand side: antecedent, right hand side: consequentAn association rule must have an associated population; the population consists of a set of

instancesE.g. each transaction (sale) at a shop is an instance, and the set of all transactions is the

populationRules have an associated support, as well as an associated confidence.

21.

What is meant by support and confidence?Support is a measure of what fraction of the population satisfies both the antecedent andthe consequent of the rule.

E.g. suppose only 0.001 percent of all purchases include milk and screwdrivers. The support for

the rule is milkscrewdrivers is low.Confidence is a measure of how often the consequent is true when the antecedent is true.

E.g. the rule breadmilkhas a confidence of 80 percent if 80 percent of the purchases thatinclude bread also include milk.

22.Define Clustering.Clustering: Intuitively, finding clusters of points in the given data such that similar points lie in the

same cluster

Can be formalized using distance metrics in several waysGroup points into ksets (for a given k) such that the average distance of points from the centroid

of their assigned group is minimizedCentroid: point defined by taking average of coordinates in each dimension.Another metric: minimize average distance between every pair of points in a clusterHas been studied extensively in statistics, but on small data setsData mining systems aim at clustering techniques that can handle very large data setsE.g. the Birch clustering algorithm (more shortly)

23.What is meant by Hierarchical ClusteringExample from biological classificationthe word classification here does not mean a prediction mechanism)

chordatamammalia reptilialeopards humans snakes crocodiles

Other examples: Internet directory systems (e.g. Yahoo, more on this later)Agglomerative clustering algorithmsBuild small clusters, then cluster small clusters into bigger clusters, and so onDivisive clustering algorithmsStart with all items in a single cluster, repeatedly refine (break) clusters into smaller ones

24.Explain about Birch Algorithm.Clustering algorithms have been designed to handle very large datasetsE.g. the Birch algorithm

Main idea: use an in-memory R-tree to store points that are being clustered


14/26

14

Insert points one at a time into the R-tree, merging a new point with an existing cluster if is less

than some distance away If there are more leaf nodes than fit in memory, merge existingclusters that are close to each other At the end of first pass we get a large number ofclusters at the leaves of the R-tree Merge clusters to reduce the number of clusters

25.What is text mining?Text mining: application of data mining to textual documentscluster Web pages to find related pagescluster pages a user has visited to organize their visit historyclassify Web pages automatically into a Web directoryData visualization systems help users examine large volumes of data and detect patterns

visuallyCan visually encode large amounts of information on a single screen Humans are very good a detecting visual patterns

26What are called as web databases?Semantic web architecture and applications are a dramatic departure from earlier

database and applications generations. Semantic processing includes these earlierstatistical and natural langue techniques, and enhances these with semantic processingtools. First, Semantic Web architecture is the automated conversion and storage ofunstructured text sources in a semantic web database. Second, Semantic Web applicationsautomatically extract and process the concepts and context in the database in a range ofhighly flexible tools.

What are structured and unstructured Data?Semantic Web architecture and applications handle both structured and unstructured

data. Structured data is stored in relational databases with static classification systems, andalso in discrete documents. These databases and documents can be processed andconverted to Semantic Web databases, and then processed with unstructured data. Much of

the data we read, produce and share is now unstructured; emails, reports, presentations,media content, web pages. And, these documents are stored in many different formats; text,email files, Microsoft word processor, spreadsheet, presentation files, Lotus Notes,Adobe.pdf, and HTML. It is difficult, expensive, slow and inaccurate to attempt to classifyand store these in a structured database. All of these sources can be automaticallyconverted to a common Semantic Web database, and integrated into one commoninformation source.

Compare and contract with synthetic vs Artificial Intelligence.Semantic Web technology is NOT Artificial Intelligence. AI was a mythical marketing goalto create thinking machines. The Semantic Web supports a much more limited and realisticgoal. This is Synthetic Intelligence. The concepts and relationships stored in the SemanticWeb database are synthesized, or brought together and integrated, to automatically create

a new summary, analysis, report, email, alert; or launch another machine application. Thegoal of Synthetic Intelligence information systems is bringing together all information sourcesand user knowledge, and synthesizing these in global networks.


15/26

15

29.What are mobile databases?

The main advantage of using a mobile database in your application is offline access todatain other words, the ability to read and update data without a network connection. Thishelps avoid problems such as dropped connections, low bandwidth, and high latency thatare typical on wireless networks today.Fully connected information space

Each node of the information space has some communication capability.Some node can process information.Some node can communicate through voice channel.Some node can do both

Can be created and maintained by integrating legacy database systems, and wired andwireless systems (PCS, Cellular system, and GSM)

30.What is a Mobile Database System (MDS)?A system with the following structural and functional properties

Distributed system with mobile connectivityFull database system capabilityComplete spatial mobilityBuilt on PCS/GSM platformWireless and wired communication capability

UNIT II - OBJECT ORIENTED DATABASES

1. What us object databases?

Became commercially popular in mid 1990sYou can store the data in the same format as you use it. No paradigm shift.Did not reach full potential till the classes they store were decoupled from the database schema.Open source implementation available low cost solution now exists.

2. What is Object Oriented Database? (OODB)

A database system that incorporates all the important object-oriented conceptsSome additional features

oUnique Object identifiers

oPersistent object handling

Is the coupling of Object Oriented (OOP) Programming principles with Database Management System(DBMS) principles

oProvides access to persisted objects using the same OO-programming language

3. What are the advantages of OODBS?

Designer can specify the structure of objects and their behavior (methods)

Better interaction with object-oriented languages such as Java and C++Definition of complex and user-defined typesEncapsulation of operations and user-defined methods

4. List out the Object Database Vendors.

Matisse Software Inc.,

Objectivity Inc.,

Poet's FastObjects,

Computer Associates,

eXcelon Corporation

Db4o


16/26

16

5.Differentiate OODB and Relational DB?

OODB Relational DB

1 Uses an OO model.Data is a collection of objects

whose behavior, state, andrelationships are stored as aphysical entity.

Uses record-oriented model.Data is a collection of record types

(relations), each having collection ofrecords or tuples stored in a file.

2 Language dependence (OOLanguage specific.

Language independence (via SQL)

3 No impedance mismatch inapplication using OODB.

Impedance mismatch in OOapplication. Mapping must beperformed.

6.Write about modeling and design in OODBMS.

Basically, an OODBMS is an object database that provides DBMS capabilities to objects that havebeen created using an object-oriented programming language (OOPL). The basic principle is to addpersistence to objects and to make objects persistent. Consequently application programmers whouse OODBMSs typically write programs in a native OOPL such as Java, C++ or Smalltalk, and thelanguage has some kind of Persistent class, Database class, Database Interface, or Database APIthat provides DBMS functionality as, effectively, an extension of the OOPL.

7. What is Object data modeling?

An object consists of three parts: structure (attribute, and relationship to other objects like

aggregation, and association), behavior (a set of operations) and characteristic of types(generalization/serialization). An object is similar to an entity in ER model; therefore we begin with anexample to demonstrate the structure and relationship.

8. Define Attributes.

Attributes are like the fields in a relational model. However in the Book example we have,forattributes publishedBy and writtenBy, complex types Publisher and Author,which are also objects.

Attributes with complex objects, in RDNS, are usually other tableslinked by keys to the employeetable.Define Relationships.

Relationships: publish and written by are associations with I: N and 1:1 relationship; composed of is

an aggregation (a Book is composed of chapters). The 1: N relationship is usually realized asattributes through complex types and at the behavioral level. For example,

10. What is Generalization and serialization?

Generalization/Serialization is a relationship, which is supported in OODB through class hierarchy.An ArtBook is a Book, therefore the ArtBook class is a subclass of Book class. A subclass inherits allthe attribute and method of its superclass.

11. Define Message.

Message: means by which objects communicate, and it is a request from one object to another toexecute one of its methods. For example:Publisher_object.insert (Rose, 123,) i.e. request to execute the insert method on a Publisherobject )


17/26

17

12. Define Method.

Method: defines the behavior of an object. Methods can be used to change state by modifying itsattribute values. To query the value of selected attributes the method that responds to the messageexample is the method insert defied in the Publisher class.

13.What are the drawbacks of persistent programming languages?

oDue to power of most programming languages, it is easy to make programming errors that

damage the database.

oComplexity of languages makes automatic high-level optimization more difficult.

oDo not support declarative querying as well as relational databases

14.What are the types of Query languages?

Declarative query language Not computationally complete

Syntax based on SQL (select, from, where)

Additional flexibility (queries with user defined operators and types)

15.Give an example for Complex Data?

A Water Resource Management example1.A database of state wide water projects2.Includes a library of picture slides3.Indexing according to predefined concepts prohibitively expensive

16.What are the types of queries can be used in water resource management?

1.Geographic locations2.Reservoir levels during droughts3.Recent flood conditions, etc

17.What is Mutiversion concurrency control?

Multiversion concurrency control (abbreviated MCC orMVCC), in thedatabasefield ofcomputerscience, is a concurrency control method commonly used by database management systems toprovide concurrent access to the database and in programming languages to implementtransactionalmemory.

18.What are types of failures?

System crashes: due to hardware or software errors, resulting in loss of main memory.Media failures: due to problem with disk head or unreadable mediaApplication errors: due to logical errors in the program which may cause transactions to fail.Natural disasters: physical loss of media. (Fire, food, earth quakes, terrorism, etc)

Sabotage: intentional corruption or destruction of data.Carelessness: unintentional destruction of data by user or operator.

19. What are the general failures?

Transaction failures Transaction abortsSystem failures failure of processor, main memory, power supplyMedia failures failure of secondary storage

20. What is meant by concurrency control?Concurrency control ensures that correct results for concurrent operations are generated, whilegetting those results as quickly as possible. Thus concurrency control is an essential element for

correctness in any system where two database transactions or more, executed with time overlap, canaccess the same data, e.g., virtually in any general-purpose database system.
http://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Concurrency_controlhttp://en.wikipedia.org/wiki/Concurrency_controlhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Concurrency_controlhttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Database


18/26

18

21. What is meant by media recovery?

Media recovery deals with failures of the storage media holding the permanent database, in particulardisk failures. The traditional database approach for media recovery uses archive copies (dumps) of

the database as well as archive logs. Archive copies represent snapshots of the database and areperiodically taken. The archive log contains the log records for all committed changes which are notyet reflected in the archive copy.

22. Define logging and recovery.

Logging and recovery ensure that failures are masked to the users of transaction-based datamanagement systems by providing automatic treatment for different kinds of failures, such astransaction failures, system failures (crashes), media failures and disasters. The main goal is toguarantee the atomicity (A) and durability (D) properties of ACID transactions by providing undorecovery for failed transactions and redo recovery for committed transactions. Logging is the task ofcollecting redundant data needed for recovery

23. What is meant by transaction?A transaction comprises a unit of work performed within adatabase management system(or similarsystem) against a database, and treated in a coherent and reliable way independent of othertransactions.

24.What are the two main purposes of transaction?

1.To provide reliable units of work that allow correct recovery from failures and keep a databaseconsistent even in cases of system failure, when execution stops (completely or partially) and manyoperations upon a database remain uncompleted, with unclear status.

2.To provide isolation between programs accessing a database concurrently. If this isolation is notprovided the programs outcome are possibly erroneous.

25. What is Crash recovery?Crash recovery is needed when the whole database (transaction) system fails, e.g. due to a hardwareor software error. All transactions which were active and not yet committed at crash time have failedso that their changes must be undone. The changes for transactions that have committed before thecrash must survive. A redo recovery is needed for all changes of committed transactions that havebeen lost by the crash because the changed pages resided only in main memory but were not yetwritten out to the permanent database.

26. What is meant by disaster recovery?Disaster recovery can be achieved by maintaining a backup copy of the database at a geographicallyremote location. By continuously transferring log data from the primary database to the backup andapplying the changes there, the backup can be kept (almost) up-to-date.

What is data persistence?Data persistence means that in a DBMS all data is maintained as long as it is not deleted explicitly.The life span of data needs to be determined directly or indirectly be the user and must not bedependent on system features. Additionally data once stored in a database must not be lost. Changesof a database which are done by a transaction are persistent. When a transaction is finished even a

system crash cannot put the data in danger.

Define transaction recovery.Transaction recovery (rollback) is performed when a transaction fails during normalProcessing, e.g. due to a program error or invalid input data. The log records in the log buffer and inthe log file are used to undo the changes of the failed transaction in reverse order.
http://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Database_management_system


19/26

19

UNIT I [DISTRIBUTED DATABASES]

1. What are advantages of distributed databases over conventional database?

mimics organisational structure with datalocal access and autonomy without exclusioncheaper to create and easier to expandimproved availability/reliability/performance by removing reliance on a central siteReduced communication overheadMost data access is local, less expensive and performs betterImproved processing powerMany machines handling the database rather than a single server

more complex to implementmore costly to maintainsecurity and integrity controlstandards and experience are lackingDesign issues are more complex

2. Write about distributed databases architecture


20/26

20

Defines the structure of the system

ocomponents identified

ofunctions of each component defined

ointerrelationships and interactions between components defined

3. What is a distributed database?

A logically interrelated collection of shared data (and a description of this data), physically distributedover a computer network(DDBMS) is the software that manages the DDB and provides an access mechanism that makes thisdistribution transparent to the users.

DDBS = DB + Communication

non-centralised DDBMS

Motivated by need to integrate operational data and to provide controlled accessmanages the Distributed databasemakes the distribution transparent to the user

4. What are implicit assumptions in DBMS?

Data stored at a number of sites each site logicallyconsists of a single processor.Processors at different sites are interconnected by a computer network no multiprocessorsoparallel database systems

Distributed database is a database, not a collection of files data logically related as exhibited in theusers access patterns

orelational data model

D-DBMS is a full-fledged DBMS

onot remote file system, not a TP system

5. Write the different dimensions of the problem in DDBMS

Distribution

oWhether the components of the system are located on the same machine or not

Heterogeneity

oVarious levels (hardware, communications, operating system)

oDBMS important one

data model, query language,transaction management algorithmsAutonomy

oNot well understood and most troublesome

oVarious versions

Design autonomy: Ability of a component DBMS to decide on issues related to its own design.

Communication autonomy: Ability of a component DBMS to decide whether and how to communicatewith other DBMSs.Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants

to.

6. What are the issues in DDBMS?

Data Allocation - Where to locate data and whether to replicate?Data Fragmentation - Partition the databaseDistributed catalogue managementDistributed transactionsDistributed QueriesMaking all of the above transparentto the user is the key of DDBMSs

ReplicationIf a site (or network path) fails, the data held there is unavailable


21/26

21

Consider replication (duplication) of data to improve availabilityNo replication: Disjoint fragmentsPartial replication: Site dependentFull replication: Slows down update for consistency, expensive

7. What are the advantages of distributed databases?

Capacity and incremental growthIncrease reliability and availabilityModularityReduced communication overheadProtection of valuable dataEfficiency and Flexibility

8. What are the disadvantages of distributed databases?

DDB design more complex, fragmentation & replication; extra work must be done by the DBAs toensure that the distributed nature of the system is transparent.

Economics,Concurrency control,Inexperience,Security,Difficult to maintain integrity

9. Write the Applications of DDBMS.

Manufacturing - especially multi-plant manufacturingMilitary command and controlElectronic fund transfers and electronic tradingCorporate MISAirline restrictions

Hotel chainsAny organization which has a decentralized organization structure

10. What is relation and sub-relations?

Relationviews are subsets of relations localityextra communication

Fragments of relations (sub-relations)

concurrent execution of a number of transactions that access different portions of a relationviews that cannot be defined on a single fragment will require extra processingsemantic data control (especially integrity enforcement) more difficult

11. What are the types of Fragmentation?

Horizontal Fragmentation (HF)

splitting the database by rowse.g. A-J in site 1, K-S in site 2 and T-Z in site 3oPrimary Horizontal Fragmentation (PHF)

oDerived Horizontal Fragmentation (DHF)

Vertical Fragmentation (VF)

Splitting database by columns/fieldse.g. columns/fields 1-3 in site A, 4-6 in site BTake the primary key to all sites


22/26

22

Hybrid Fragmentation (HF)Horizontal and vertical could even be combined

12. Define Vertical Fragmentation. And what is the advantage of VF?

Has been studied within the centralized contextodesign methodology

ophysical clustering

More difficult than horizontal, because more alternatives exist.

oTwo approaches :

ogrouping

attributes to fragments

osplitting

relation to fragmentsOverlapping fragments

ogrouping

Non-overlapping fragments

osplittingWe do not consider the replicated key attributes to be overlapping.

Advantage: Easier to enforce functional dependencies (for integrity checking etc.)

13. What are the Information Requirements in VF?

Application Information

oAttribute affinities

a measure that indicates how closely related the attributes areThis is obtained from more primitive usage data

oAttribute usage values

Given a set of queries Q = {q1, q2,, qq} that will run on the relation R[A1,A2,,An],

14. What is the significance of Query processing?

using client-server architectureuser creates queryClient parses and sends to server(s) (SQL?)servers return appropriate Tablesclient combines into one TableIssue of data transfer cost over a network

ooptimise the query to transfer the least amount

15. What are the Query processing components?

Query language that is used

oSQL: intergalactic data speak

Query execution methodology

oThe steps that one goes through in executing high-level (declarative) user queries.

Query optimizationoHow do we determine the best execution plan?

1 if attribute Aj is referenced by query qi

0 otherwiseuse(qi,Aj) =


23/26

23

16. What are the objectives of Query Optimization?

Minimize a cost functionI/O cost + CPU cost + communication costThese might have different weights in different distributed environmentsWide area networks

ocommunication cost will dominatelow bandwidthlow speedhigh protocol overhead

omost algorithms ignore all other cost components

Local area networks

ocommunication cost not that dominant

ototal cost function should be considered

Can also maximize throughput

17. What are the types of Optimizers?

Exhaustive search

ocost-based

ooptimal

ocombinatorial complexity in the number of relations

oHeuristics

not optimalregroup common sub-expressionsperform selection, projection firstreplace a join by a series of semijoinsreorder operations to reduce intermediate relation sizeoptimize individual operations

18. What is Optimization Granularity

Single query at a time

ocannot use common intermediate results

Multiple queries at a time

oefficient if many similar queries

odecision space is much larger

19. What are the types of optimization Timing?

Static

ocompilation optimize prior to the execution

odifficult to estimate the size of the intermediate results error propagation

ocan amortize over many executions

oR*

Dynamic

orun time optimization

oexact information on the intermediate relation sizes

ohave to reoptimize for multiple executions

oDistributed INGRES

Hybrid

ocompile using a static algorithm

oif the error in estimate sizes > threshold, reoptimize at run time

oMERMAID

20. What are statistics in query optimization?


24/26

24

Relation

ocardinality

osize of a tuple

ofraction of tuples participating in a join with another relation

Attribute

ocardinality of domain

oactual number of distinct valuesCommon assumptions

oindependence between different attribute values

ouniform distribution of attribute values within their domain

21. What are the decision sites are used in Query Optimization?

Centralized

osingle site determines the best schedule

osimple

oneed knowledge about the entire distributed database

Distributedocooperation among sites to determine the schedule

oneed only local information

ocost of cooperation

Hybrid

oone site determines the global schedule

each site optimizes the local sub queries

22. What are the types of network topologies used in DDBMS?

Wide area networks (WAN) point-to-pointocharacteristics

low bandwidthlow speedhigh protocol overhead

ocommunication cost will dominate; ignore all other cost factors

oglobal schedule to minimize communication cost

olocal schedules according to centralized query optimization

Local area networks (LAN)

ocommunication cost not that dominant

ototal cost function should be considered

obroadcasting can be exploited (joins)

o

special algorithms exist for star networks

23. What is Query Decomposition?

Input: Calculus query on global relations

Normalization

omanipulate query quantifiers and qualification

Analysis

odetect and reject incorrect queries

opossible for only a subset of relational calculus

Simplification

oeliminate redundant predicates

Restructuring


25/26

25

ocalculus query algebraic query

omore than one translation is possible

ouse transformation rules

24. What is Data Localization?

Input: Algebraic query on distributed relationsDetermine which fragments are involvedLocalization program

osubstitute for each global query its materialization program

ooptimize

25. What is Global Query Optimization?

Input: Fragment queryFind the best(not necessarily optimal) global schedule

oMinimize a cost function

oDistributed join processing

Bushy vs. linear treesWhich relation to ship where?Ship-whole vs ship-as-needed

oDecide on the use of semijoins

Semijoin saves on communication at the expense of more local processing.

oJoin methods

nested loop vs ordered joins (merge join or hash join)

26. What are the rules of DDBMS?

1.Local autonomy2.No reliance on a central site

3.Continuous operation4.Location independence5.Fragmentation independence6.Replication independence7.Distributed Query processing8.Distributed transaction processing9.Hardware independence10.Operating System independence11.Network independence12.Database independence

27. Define Transaction.

A transaction is a collection of actions that make consistent transformations of system states whilepreserving system consistency.

Concurrency transparency

Failure transparency

28. What are the properties of Transaction?

Atomicity: All or nothingConsistency: No violation of integrity constraintsIsolation: Concurrent changes invisible and serializableDurability: Committed updates persist

29. What is concurrency control?


26/26

The problem of synchronizing concurrent transactions such that the consistency of the database ismaintained while, at the same time, maximum degree of concurrency is achieved.

Anomalies:

oLost updates

The effects of some transactions are not reflected on the database.

oInconsistent retrievals

A transaction, if it reads the same data item more than once, should always read the same value.

30. What are the modes in distributed locks?

Four modes of management possible:

oCentralised 2PL

Read any copy, update all for updatesSingle site, bottleneck, failure?

oPrimary Copy 2PL

Distributes locks, one copy designated primary, others slavesOnly primary copy locked for updates, slaves updated later

oDistributed 2PL

Each site manages its own data locksAll copies locked for an update, high cost of comms

oMajority Locking

31. What are the Distributed Reliability Protocols?

Commit protocolsTermination protocolsRecovery protocols

Independent recovery non-blocking termination

dbms unit5

Documents

Transcript of dbms unit5