dbms unit5

download dbms unit5

of 26

Transcript of dbms unit5

  • 7/27/2019 dbms unit5

    1/26

    1

    UNIT V - CURRENT ISSUES

    TWO MARKS

    1. Write any four rules defined in relational database design?

    All database management must take place using the relational database's innate functionality

    All information in the database must be stored as values in a table

    All database information must be accessible through the combination of a table name, primary key and

    column name.

    The database must useNULL valuesto indicate missing or unknown information

    The database schema must be described using the relational database syntax

    The database may support multiple languages, but it must support at least one language that provides full

    database functionality (e.g. SQL)

    The system must be able to update all updatable views

    The database must provide single-operation insert, update and delete functionality

    Changes to the physical structure of the database must be transparent to applications and users.

    Changes to the logical structure of the database must be transparent to applications and users.

    The database must natively support integrity constraints.

    Changes to the distribution of the database (centralized vs. distributed) must be transparent to

    applications and users.

    Any languages supported by the database must not be able to subvert integrity controls

    2. What are knowledge bases?

    A knowledge base (abbreviated KB, kb or ) is a special kind of database for knowledge

    management. A knowledge base provides a means for information to be collected, organized, shared,

    searched and utilized.

    3. What are characteristics of knowledge bases?

    characteristics of KBSs:

    intelligent information processing systems

    representation of domain of interest symbolic representation

    -->problem solving by symbol-manipulation Symbolic programs

    http://databases.about.com/cs/sql/a/aa042803a.htmhttp://databases.about.com/cs/sql/a/aa042803a.htmhttp://databases.about.com/cs/sql/a/aa042803a.htmhttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Knowledge_managementhttp://en.wikipedia.org/wiki/Databasehttp://databases.about.com/cs/sql/a/aa042803a.htm
  • 7/27/2019 dbms unit5

    2/26

    2

    4. What are the main components in Knowledge bases?

    knowledge-base (KB)

    inference engine

    case-specific databaseexplanation subsystem

    knowledge acquisition subsystem

    user interface ( user)

    specially intefaces

    developer interface ( knowledge engineer, human expert)

    5. What are called active databases?

    An Active Database is adatabasethat includes an event driven architecture (often in the form ofECA

    rules) which can respond to conditions both inside and outside the database. Possible uses include

    security monitoring, alerting, statistics gathering and authorization. Most modern relational databases

    include active database features in the form ofSQL Triggers.

    6. Differentiate database and information system.

    7. What are deductive databases?

    A deductive database is adatabase systemthat can makedeductions(i.e.: conclude additional facts)

    based onrulesandfactsstored in the (deductive) database.Datalogis the language typically used to

    specify facts, rules and queries in deductive databases. Deductive databases have grown out of the

    desire to combine logic programmingwith relational databases to construct systems that support a

    powerful formalism and are still fast and able to deal with very large datasets. Deductive databases

    are more expressive than relational databases but less expressive than logic programming systems.

    Deductive databases have not found widespread adoptions outside academia, but some of their

    concepts are used in today's relational databases to support the advanced features of more recent

    SQLstandards.

    8. What are parallel databases?

    Parallel machines are becoming quite common and affordable

    Prices of microprocessors, memory and disks have dropped sharply

    Recent desktop computers feature multiple processors and this trend is projected to accelerate

    Databases are growing increasingly large

    Large volumes of transaction data are collected and stored for later analysis.

    multimedia objects like images are increasingly stored in databasesLarge-scale parallel database systems increasingly used for:

    http://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Database_triggerhttp://en.wikipedia.org/wiki/Database_triggerhttp://en.wikipedia.org/wiki/Database_triggerhttp://en.wikipedia.org/wiki/Database_systemhttp://en.wikipedia.org/wiki/Database_systemhttp://en.wikipedia.org/wiki/Database_systemhttp://en.wikipedia.org/wiki/Deductive_reasoninghttp://en.wikipedia.org/wiki/Deductive_reasoninghttp://en.wikipedia.org/wiki/Deductive_reasoninghttp://en.wiktionary.org/wiki/rulehttp://en.wiktionary.org/wiki/rulehttp://en.wiktionary.org/wiki/rulehttp://en.wikipedia.org/wiki/Factshttp://en.wikipedia.org/wiki/Factshttp://en.wikipedia.org/wiki/Factshttp://en.wikipedia.org/wiki/Dataloghttp://en.wikipedia.org/wiki/Dataloghttp://en.wikipedia.org/wiki/Dataloghttp://en.wikipedia.org/wiki/Logic_programminghttp://en.wikipedia.org/wiki/Logic_programminghttp://en.wikipedia.org/wiki/Relational_databasehttp://en.wikipedia.org/wiki/Relational_databasehttp://en.wikipedia.org/wiki/Relational_databasehttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/SQLhttp://en.wikipedia.org/wiki/Relational_databasehttp://en.wikipedia.org/wiki/Logic_programminghttp://en.wikipedia.org/wiki/Dataloghttp://en.wikipedia.org/wiki/Factshttp://en.wiktionary.org/wiki/rulehttp://en.wikipedia.org/wiki/Deductive_reasoninghttp://en.wikipedia.org/wiki/Database_systemhttp://en.wikipedia.org/wiki/Database_triggerhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Event_Condition_Actionhttp://en.wikipedia.org/wiki/Database
  • 7/27/2019 dbms unit5

    3/26

    3

    storing large volumes of data

    processing time-consuming decision-support queries

    providing high throughput for transaction processing

    9. What are the types of skew?10. Attribute-value skew

    Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the

    partitioning attribute end up in the same partition.

    Can occur with range-partitioning and hash-partitioning.

    Partition skew

    With range-partitioning, badly chosen partition vector may assign too many tuples to some partitions and

    too few to others.

    Less likely with hash-partitioning if a good hash-function is chosen.

    10. How do you handling skew using histograms?

    Balanced partitioning vector can be constructed from histogram in a relatively straightforward fashion.

    Assume uniform distribution within each range of the histogram

    Histogram can be constructed by scanning relation, or sampling (blocks containing) tuples of the relation

    11. How do you handling skew using virtual processor partitioning?

    Skew in range partitioning can be handled elegantly using virtual processor partitioning: Create a large

    number of partitions (say 10 to 20 times the number of processors). Assign virtual processors to

    partitions either in round-robin fashion or based on estimated cost of processing each virtual partition

    Basic idea:

    If any normal partition would have been skewed, it is very likely the skew is spread over a number of

    virtual partitions

    Skewed virtual partitions get spread across a number of processors, so work gets distributed evenly!

    12. What is interquery Parallelism?

    Queries/transactions execute in parallel with one another.

    Increases transaction throughput; used primarily to scale up a transaction processing system to support a

    larger number of transactions per second.

    Easiest form of parallelism to support, particularly in a shared-memory parallel database, because even

    sequential database systems support concurrent processing.

    More complicated to implement on shared-disk or shared-nothing architectures

    Locking and logging must be coordinated by passing messages between processors.

    Data in a local buffer may have been updated at another processor.

  • 7/27/2019 dbms unit5

    4/26

    4

    Cache-coherency has to be maintained reads and writes of data in buffer must find latest version

    of data.

    Define cache coherency protocol with example.

    Example of a cache coherency protocol for shared disk systems:

    Before reading/writing to a page, the page must be locked in shared/exclusive mode.On locking a page, the page must be read from disk

    Before unlocking a page, the page must be written to disk if it was modified.

    More complex protocols with fewer disk reads/writes exist.

    Cache coherency protocols for shared-nothing systems are similar. Each database page is assigned a

    home processor. Requests to fetch the page or write it to disk are sent to the home processor.

    14. What is meant by intraquery parallelism?

    Execution of a single query in parallel on multiple processors/disks; important for speeding up long-running queries.

    Two complementary forms of intraquery parallelism :

    Interoperation Parallelism parallelize the execution of each individual operation in the query.

    Interoperation Parallelism execute the different operations in a query expression in parallel. The

    first form scales better with increasing parallelism because the number of tuples processed by each

    operation is typically more than the number of operations in a query.

    15. Write some design issues of parallel systems?

    Parallel loading of data from external sources is needed in order to handle large volumes of incoming

    data.

    Resilience to failure of some processors or disks.

    On-line reorganization of data and schema changes must be supported.

    Also need support for on-line repartitioning and schema changes (executed concurrently with other

    processing).

    16. What are the multimedia data types?

    Text,Image,Video,Audio mixed multimedia data

    17. What is image database system?

    Image Database is searchable electronic catalog or database which allows you to organize and list

    images by topics, modules, or categories. The Image Database will provide the student with important

    information such as image title, description, and thumbnail picture. Additional information can be

    provided such as creator of the image, filename, and keywords that will help students to search

    through the database for specific images. Before you and your students can use Image Database,

    you must add it to your course.

  • 7/27/2019 dbms unit5

    5/26

    5

    18. Give a note on image retrieval system.

    An image retrieval system is a computer system for browsing, searching and retrieving images from a

    large database of digital images. Most traditional and common methods of image retrieval utilize

    some method of addingmetadatasuch ascaptioning,keywords, or descriptions to the images so that

    retrieval can be performed over the annotation words. Manual image annotation is time-consuming,

    laborious and expensive; to address this, there has been a large amount of research done on

    automatic image annotation. Additionally, the increase in social web applicationsand the semantic

    webhave inspired the development of several web-based image annotation tools.

    19. Give a note on image search and methods.

    Image search is a specialized data search used to find images. To search for images, a user may

    provide query terms such as keyword, image file/link, or click on some image, and the system will

    return images "similar" to the query. The similarity used for search criteria could be meta tags, color

    distribution in images, region/shape attributes, etc.

    Image meta search- search of images based on associated metadata

    such as keywords, text, etc.

    Content-based image retrieval (CBIR) the application of computer

    vision to the image retrieval. CBIR aims at avoiding the use of textual descriptions and instead

    retrieves images based on similarities in their contents (textures, colors, shapes etc.) to a user-

    supplied query image or user-specified image features.

    20. How the image data are classified?

    Archives - usually contain large volumes of structured or semi-

    structured homogeneous data pertaining to specific topics.

    Domain-Specific Collection - this is a homogeneous collection

    providing access to controlled users with very specific objectives. Examples of such a

    collection are biomedical and satellite image databases.

    Enterprise Collection - a heterogeneous collection of images

    that is accessible to users within an organizations intranet. Pictures may be stored in many

    different locations.

    Personal Collection - usually consists of a largely

    homogeneous collection and is generally small in size, accessible primarily to its owner, and

    usually stored on a local storage media.

    http://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Digital_imagehttp://en.wikipedia.org/wiki/Digital_imagehttp://en.wikipedia.org/wiki/Metadatahttp://en.wikipedia.org/wiki/Metadatahttp://en.wikipedia.org/wiki/Metadatahttp://en.wikipedia.org/wiki/Captionhttp://en.wikipedia.org/wiki/Captionhttp://en.wikipedia.org/wiki/Captionhttp://en.wikipedia.org/wiki/Keyword_(internet_search)http://en.wikipedia.org/wiki/Keyword_(internet_search)http://en.wikipedia.org/wiki/Keyword_(internet_search)http://en.wikipedia.org/wiki/Automatic_image_annotationhttp://en.wikipedia.org/wiki/Automatic_image_annotationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Image_meta_searchhttp://en.wikipedia.org/wiki/Image_meta_searchhttp://en.wikipedia.org/wiki/Content-based_image_retrievalhttp://en.wikipedia.org/wiki/Content-based_image_retrievalhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Computer_visionhttp://en.wikipedia.org/wiki/Content-based_image_retrievalhttp://en.wikipedia.org/wiki/Image_meta_searchhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Semantic_webhttp://en.wikipedia.org/wiki/Web_applicationhttp://en.wikipedia.org/wiki/Automatic_image_annotationhttp://en.wikipedia.org/wiki/Keyword_(internet_search)http://en.wikipedia.org/wiki/Captionhttp://en.wikipedia.org/wiki/Metadatahttp://en.wikipedia.org/wiki/Digital_imagehttp://en.wikipedia.org/wiki/Database
  • 7/27/2019 dbms unit5

    6/26

    6

    Web - World Wide Web images are accessible to everyone with

    an Internet connection. These image collections are semi-structured, non-homogeneous and

    massive in volume, and are usually stored in large disk arrays.

    21. What are text databases?

    Large collections of documents from various sources: news articles, research papers, books,

    digital libraries, e-mail messages, and web pages, library database, etc.

    UNIT IV - DATABASE DESIGN ISSUES

    TWO MARKS

    1. What are attributes? Give example.

    An entity is represented by a set of attributes. Attributes are descriptive properties possessed by eachmember of an entity set.Example: possible attributes of customer entity are customer name, customer id, Customer Street,customer city.

    2. Define single valued and multi valued attributes.

    Single valued attributes: attributes with a single value for a particular entity are called single valuedattributes.Multi valued attributes: Attributes with a set of value for a particular entity are called multivalve

    attributes.

    3. What are stored and derived attributes?

    Stored attributes: The attributes stored in a data base are called stored attributes.Derived attributes: The attributes that are derived from the stored attributes are called derivedattributes.

    4. Define weak and strong entity sets?

    Weak entity set: entity set that do not have key attribute of their own are called weak entity sets.Strong entity set: Entity set that has a primary key is termed a strong entity set.

    5. What is the use of integrity constraints?

    Integrity constraints ensure that changes made to the database by authorized users do not result in aloss of data consistency. Thus integrity constraints guard against accidental damage to the database.

    6. Mention the various levels in security measures.

    Database system, Operating system, Network, Physical, Human

  • 7/27/2019 dbms unit5

    7/26

    7

    7. What are the types of attributes?

    Simple: Each entity has a single atomic value for the attribute.Composite: The attribute may be composed of several components.Multi-valued: An entity may have multiple values for that attribute.

    8. Write the entity set corresponding to the entity type car.

    Registration (RegistrationNumber, State), VehicleID, Make, Model, Year, (Color)Example:((ABC 123, TEXAS), TK629, Ford Mustang, convertible, 1999,(red,balck))

    9. What are Weak entity types?

    An entity that does not have a key attribute. A Weak attribute entity must participate in an identifyingrelationship type with an owner or identifying entity type.

    10. Define Normalization.

    Normalization is carried out in practice so that the resulting designs are of high quality and meet thedesirable properties

    11. What is Demoralization?

    The process of storing the join of higher normal form relations as a base relation which is in a lowernormal form

    12. What is the need of normalization?

    Mixing attributes of multiple entities may cause problemsInformation is stored redundantly wasting storageProblems with update anomalies

    13. What is integrity? And what are the types of integrity?

    Integrity here refers to the CORRECTNESS & CONSISTENCY of the datastored in the databaseTypes: Database Integrity, Entity integrity,Referential Integrity

    14. What is meant by consistency?

    It ensures the truthfulness of the database. Theconsistencyproperty ensures that any transaction thedatabase performs will take it from one consistent state to another. The consistency property does notsay how the DBMS should handle an inconsistency other than ensure the database is clean at theend of the transaction.

    15. What is an entity relationship model?

    The entity relationship model is a collection of basic objects called entities and relationship amongthose objects. An entity is a thing or object in the real world that is distinguishable from other objects.

    16. Define query optimization.

    Query optimization refers to the process of finding the lowest cost method of evaluating a givenquery.

    http://en.wikipedia.org/wiki/Consistency_(database_systems)http://en.wikipedia.org/wiki/Consistency_(database_systems)http://en.wikipedia.org/wiki/Consistency_(database_systems)http://en.wikipedia.org/wiki/Consistency_(database_systems)
  • 7/27/2019 dbms unit5

    8/26

    8

    17. What are the tunable parameters?

    Tuning of hardware

    Tuning of schemaTuning of indicesTuning of materialized viewsTuning of transactions

    18. What is hardware tuning?

    Even well-tuned transactions typically require a few I/O operationsTypical disk supports about 100 random I/O operations per secondSuppose each transaction requires just 2 random I/O operations.

    Then to support ntransactions per second, we need to stripe data

    across n/50 disks (ignoring skew)Number of I/O operations per transaction can be reduced by keepingmore data in memory

    If all data is in memory, I/O needed only for writesKeeping frequently used data in memory reduces disk accesses,

    reducing number of disks required, but has a memory costWhat is mean by schema tuning?

    Vertically partition relations to isolate the data that is accessed mostoften -- only fetch needed information.

    E.g., split accountinto two, (account-number,

    branch-name) and (account-number, balance).Improve performance by storing a denormalized relationE.g., store join of account and deposi tor;branch-name and balance information isrepeated for each holder of an account, but

    join need not be computed repeatedly.Cluster together on the same disk page records that would

    match in a frequently required join,

    compute join very efficiently when required.

    What is Index tuning?Create appropriate indices to speed up slow queries/updatesSpeed up slow updates by removing excess indices (tradeoff between

    queries and updates)Choose type of index (B-tree/hash) appropriate for most frequent types

    of queries.Choose which index to make clustered>Index tuning wizards look at past history of queries and updates (the

    workload) and recommend which indices would be best for theworkload

  • 7/27/2019 dbms unit5

    9/26

    9

    22.What is the use of performance Simulation?Performance simulation using queuing model useful to predictbottlenecks as well as the effects of tuning changes, even without

    access to real system.What are temporal databases?

    Non Temporalstore only a single state of the real world, usually the most recent stateclassified as snapshot databasesapplication developers and database designers need to code for time varying data requirements eg

    history tables, forecast reports etcTemporal

    stores upto two dimensions of time i.e VALID (stated) time and TRANSACTION (logged) timeClassified as historical, rollback or bi -temporalNo need for application developers or database designers to code for time varying data requirements

    i.e time is inherently supportedGive Examples of application domains dealing with time varying data.Financial Apps (e.g. history of stock market data)Insurance Apps (e.g. when were the policies in effect)

    Reservation Systems (e.g. when is which room in a hotel booked)Medical Information Management Systems (e.g. patient records)Decision Support Systems (e.g. planning future contigencies)CRM applications (eg customer history / future)HR applications (e.g Date tracked positions in hierarchies)What is a SDBMS?

    A SDBMS is a software module thatcan work with an underlying DBMS supports spatial data models, spatial abstract data types (ADTs) and a

    query language from which these ADTs are callablesupports spatial indexing, efficient algorithms forprocessing spatial operations, and domain specific rules for query optimization

    Examples: Oracle Spatial data cartridge, ESRI SDE

    26. Give examples for spatial databases.

    Consider a spatial dataset with:County boundary (dashed white line)Census block - name, area, population, boundary (dark line)->Water bodies (dark polygons)Satellite Imagery (gray scale pixels)Storage in a SDBMS table:

    create table census_blocks (name string,area float,population number,boundary polygon );

    27. How is a SDBMS different from a GIS?

    GIS is a software to visualize and analyze spatial data using spatial analysis functions such asSearch Thematic search, search by region, (re-)classificationLocation analysis Buffer, corridor, overlayTerrain analysis Slope/aspect, catchment, drainage networkFlow analysis Connectivity, shortest pathDistribution Change detection, proximity, nearest neighborSpatial analysis/Statistics Pattern, centrality, autocorrelation, indices of similarity, topology: hole

    descriptionMeasurements Distance, perimeter, shape, adjacency, directionGIS uses SDBMS

    to store, search, query, share large spatial data sets

  • 7/27/2019 dbms unit5

    10/26

    10

    28. What are the components of a SDBMS?

    Recall: a SDBMS is a software module thatcan work with an underlying DBMSsupports spatial data models, spatial ADTs and a query language from which these ADTs are callablesupports spatial indexing, algorithms for processing spatial operations, and domain specific rules for query

    optimization]-->Components includeSpatial data models, query language, query processing, file organization and indices, query optimization,

    etc.

    UNIT III EMERGING SYSTEMS

    1.What is client/server model?Networked computing modelProcesses distributed between clients and serversClient Workstation (usually a PC) that requests and uses a service

    Server Computer (PC/mini/mainframe) that provides a serviceFor DBMS, server is a database server

    2.Give notes on database Server Architectures.2-tiered approachClient is responsible forI/O processing logicSome business rules logicServer performs all data storage and access processing DBMS is only on server

    3What are the advantages of database server?Clients do not have to be as powerful

    Greatly reduces data traffic on the networkImproved data integrity since it is all processed centrallyStored procedures some business rules done on server>4What are the advantages of Three-Tier Architectures?ScalabilityTechnological flexibilityLong-term cost reductionBetter match of systems to business needsImproved customer serviceCompetitive advantageReduced risk5.What are the challenges of Three-tier Architectures?High short-term costsTools and trainingExperienceIncompatible standardsLack of compatible end-user tools6.What are the Client/Server Security levels are used?Network environment complex security issuesSecurity levels:System-level password security-->for allowing access to the systemDatabase-level password security

    for determining access privileges to tables; read/update/insert/delete privileges Secure client/server communication via encryption

  • 7/27/2019 dbms unit5

    11/26

    11

    >7.>Define data warehousing.]-->Data sources often store only current data, not historical dataCorporate decision making requires a unified view of all organizational data, including historical

    dataA data warehouse is a repository (archive) of information gathered from multiple sources, stored

    under a unified schema, at a single siteGreatly simplifies querying, permits study of historical trendsShifts decision support query load away from transaction processing systems

    8What are the Design Issues in data warehousing?When and how to gather dataSource driven architecture: data sources transmit new information to warehouse, either

    continuously or periodically (e.g. at night)Destination driven architecture: warehouse periodically requests new information from data

    sourcesKeeping warehouse exactly synchronized with data sources (e.g. using two-phase commit) is

    too expensive>What schema to use

    Schema integrationData cleansingE.g. correct mistakes in addresses (misspellings, zip code errors)Merge address lists from different sources and purge duplicatesHow to propagate updatesWarehouse schema may be a (materialized) view of schema from data sources What data to summarizeRaw data may be too large to store on-lineAggregate values (totals/subtotals) often sufficeQueries on raw data can often be transformed by query optimizer to use aggregate values 9What is the data warehouse Schemas?Dimension values are usually encoded using small integers and mapped to full values via

    dimension tablesResultant schema is called a star schemaMore complicated schema structuresSnowflake schema: multiple levels of dimension tablesConstellation: multiple fact tables10.Give a example for data warehouse Schema

    11.Define Data Mining.Data mining is the process of semi-automatically analyzing large databases to find usefulpatterns

    12.Define prediction.Prediction based on past history

    Predict if a credit card applicant poses a good credit risk, based on some attributes (income, jobtype, age, ..) and past history

    Predict if a pattern of phone calling card usage is likely to be fraudulent13.Give some examples of prediction mechanisms.

    ]-->ClassificationGiven a new item whose class is unknown, predict to which class it belongs Regression formulaeGiven a set of mappings for an unknown function, predict the function result for a new parameter

    value14.What are the descriptive Patterns in data mining?

    Associations: Find books that are often bought by similar customers. If a new such

    customer buys one such book, suggest the others too. Associations may be used as a firststep in detecting causation. E.g. association between exposure to chemical X and cancer,

  • 7/27/2019 dbms unit5

    12/26

    12

    Clusters: E.g. typhoid cases were clustered in an area surrounding a contaminated wellDetection of clusters remains important in detecting epidemics

    15.What is classification Rules? And give example.Classifications rules help assign new objects to classes.E.g., given a new automobile insurance applicant, should he or she be classified as low risk,

    medium risk or high risk?Classification rules for above example could use a variety of data, such as educational level,

    salary, age, etc.

    person P, P.degree = masters and P.income > 75,000

    P.credit = excellent

    person P, P.degree = bachelors and

    (P.income 25,000 and P.income 75,000)

    P.credit = goodRules are not necessarily exact: there may be some misclassificationsClassification rules can be shown compactly as a decision tree.

    16Define Decision Tree.

    A decision tree is a decision support tool that uses a tree-likegraphormodelof decisionsand their possible consequences, including chance event outcomes, resource costs, andutility. It is one way to display analgorithm. Decision trees are commonly used inoperationsresearch, specifically in decision analysis, to help identify a strategy most likely to reach agoal. Another use of decision trees is as a descriptive means for calculating conditionalprobabilities.

    17.How do you construct of Decision Trees?Training set: a data sample in which the classification is already known.Greedy top down generation of decision trees.Each internal node of the tree partitions the data into groups based on a partitioning attribute,

    and a partitioning condition for the nodeLeaf node:

    all (or most) of the items at the node belong to the same class, orAll attributes have been considered, and no further partitioning is possible.

    18.How do you find the best splits?Categorical attributes (with no meaningful order):Multi-way split, one child for each valueBinary split: try all possible breakup of values into two sets, and pick the best Continuous-valued attributes (can be sorted in a meaningful order)Binary split:Sort values, try each as a split point

    E.g. if values are 1, 10, 15, 25, split at 1, 10, 15Pick the value that gives best splitMulti-way split:A series of binary splits on the same attribute has roughly equivalent effect

    19.What is meant by Regression?

    Regression deals with the prediction of a value, rather than a class.Given values for a set of variables, X1, X2, , Xn, we wish to predict the value of a variable Y.

    One way is to infer coefficients a0, a1, a1, , an such thatY= a0 + a1 *X1 + a2 *X2+ + an *XnFinding such a linear polynomial is called linear regression.

    In general, the process of finding a curve that fits the data is also called curve fitting.The fit may only be approximate

    because of noise in the data, orbecause the relationship is not exactly a polynomialRegression aims to find coefficients that give the best possible fit.

    http://en.wikipedia.org/wiki/Diagramhttp://en.wikipedia.org/wiki/Diagramhttp://en.wikipedia.org/wiki/Diagramhttp://en.wikipedia.org/wiki/Causal_modelhttp://en.wikipedia.org/wiki/Causal_modelhttp://en.wikipedia.org/wiki/Causal_modelhttp://en.wikipedia.org/wiki/Probabilityhttp://en.wikipedia.org/wiki/Probabilityhttp://en.wikipedia.org/wiki/Utilityhttp://en.wikipedia.org/wiki/Utilityhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Decision_analysishttp://en.wikipedia.org/wiki/Decision_analysishttp://en.wikipedia.org/wiki/Objective_%28goal%29http://en.wikipedia.org/wiki/Objective_%28goal%29http://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Conditional_probabilityhttp://en.wikipedia.org/wiki/Objective_%28goal%29http://en.wikipedia.org/wiki/Decision_analysishttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Operations_researchhttp://en.wikipedia.org/wiki/Algorithmhttp://en.wikipedia.org/wiki/Utilityhttp://en.wikipedia.org/wiki/Probabilityhttp://en.wikipedia.org/wiki/Causal_modelhttp://en.wikipedia.org/wiki/Diagram
  • 7/27/2019 dbms unit5

    13/26

    13

    20.Give an example for association rules.Retail shops are often interested in associations between different items that people buy.Someone who buys bread is quite likely also to buy milkA person who bought the book Database System Concepts is quite likely also to buy the book

    Operating System Concepts.Associations information can be used in several ways.E.g. when a customer buys a particular book, an online shop may suggest associated books. Association rules:

    breadmilk DB-Concepts, OS-Concepts NetworksLeft hand side: antecedent, right hand side: consequentAn association rule must have an associated population; the population consists of a set of

    instancesE.g. each transaction (sale) at a shop is an instance, and the set of all transactions is the

    populationRules have an associated support, as well as an associated confidence.

    21.

    What is meant by support and confidence?Support is a measure of what fraction of the population satisfies both the antecedent andthe consequent of the rule.

    E.g. suppose only 0.001 percent of all purchases include milk and screwdrivers. The support for

    the rule is milkscrewdrivers is low.Confidence is a measure of how often the consequent is true when the antecedent is true.

    E.g. the rule breadmilkhas a confidence of 80 percent if 80 percent of the purchases thatinclude bread also include milk.

    22.Define Clustering.Clustering: Intuitively, finding clusters of points in the given data such that similar points lie in the

    same cluster

    Can be formalized using distance metrics in several waysGroup points into ksets (for a given k) such that the average distance of points from the centroid

    of their assigned group is minimizedCentroid: point defined by taking average of coordinates in each dimension.Another metric: minimize average distance between every pair of points in a clusterHas been studied extensively in statistics, but on small data setsData mining systems aim at clustering techniques that can handle very large data setsE.g. the Birch clustering algorithm (more shortly)

    23.What is meant by Hierarchical ClusteringExample from biological classificationthe word classification here does not mean a prediction mechanism)

    chordatamammalia reptilialeopards humans snakes crocodiles

    Other examples: Internet directory systems (e.g. Yahoo, more on this later)Agglomerative clustering algorithmsBuild small clusters, then cluster small clusters into bigger clusters, and so onDivisive clustering algorithmsStart with all items in a single cluster, repeatedly refine (break) clusters into smaller ones

    24.Explain about Birch Algorithm.Clustering algorithms have been designed to handle very large datasetsE.g. the Birch algorithm

    Main idea: use an in-memory R-tree to store points that are being clustered

  • 7/27/2019 dbms unit5

    14/26

    14

    Insert points one at a time into the R-tree, merging a new point with an existing cluster if is less

    than some distance away If there are more leaf nodes than fit in memory, merge existingclusters that are close to each other At the end of first pass we get a large number ofclusters at the leaves of the R-tree Merge clusters to reduce the number of clusters

    25.What is text mining?Text mining: application of data mining to textual documentscluster Web pages to find related pagescluster pages a user has visited to organize their visit historyclassify Web pages automatically into a Web directoryData visualization systems help users examine large volumes of data and detect patterns

    visuallyCan visually encode large amounts of information on a single screen Humans are very good a detecting visual patterns

    26What are called as web databases?Semantic web architecture and applications are a dramatic departure from earlier

    database and applications generations. Semantic processing includes these earlierstatistical and natural langue techniques, and enhances these with semantic processingtools. First, Semantic Web architecture is the automated conversion and storage ofunstructured text sources in a semantic web database. Second, Semantic Web applicationsautomatically extract and process the concepts and context in the database in a range ofhighly flexible tools.

    What are structured and unstructured Data?Semantic Web architecture and applications handle both structured and unstructured

    data. Structured data is stored in relational databases with static classification systems, andalso in discrete documents. These databases and documents can be processed andconverted to Semantic Web databases, and then processed with unstructured data. Much of

    the data we read, produce and share is now unstructured; emails, reports, presentations,media content, web pages. And, these documents are stored in many different formats; text,email files, Microsoft word processor, spreadsheet, presentation files, Lotus Notes,Adobe.pdf, and HTML. It is difficult, expensive, slow and inaccurate to attempt to classifyand store these in a structured database. All of these sources can be automaticallyconverted to a common Semantic Web database, and integrated into one commoninformation source.

    Compare and contract with synthetic vs Artificial Intelligence.Semantic Web technology is NOT Artificial Intelligence. AI was a mythical marketing goalto create thinking machines. The Semantic Web supports a much more limited and realisticgoal. This is Synthetic Intelligence. The concepts and relationships stored in the SemanticWeb database are synthesized, or brought together and integrated, to automatically create

    a new summary, analysis, report, email, alert; or launch another machine application. Thegoal of Synthetic Intelligence information systems is bringing together all information sourcesand user knowledge, and synthesizing these in global networks.

  • 7/27/2019 dbms unit5

    15/26

    15

    29.What are mobile databases?

    The main advantage of using a mobile database in your application is offline access todatain other words, the ability to read and update data without a network connection. Thishelps avoid problems such as dropped connections, low bandwidth, and high latency thatare typical on wireless networks today.Fully connected information space

    Each node of the information space has some communication capability.Some node can process information.Some node can communicate through voice channel.Some node can do both

    Can be created and maintained by integrating legacy database systems, and wired andwireless systems (PCS, Cellular system, and GSM)

    30.What is a Mobile Database System (MDS)?A system with the following structural and functional properties

    Distributed system with mobile connectivityFull database system capabilityComplete spatial mobilityBuilt on PCS/GSM platformWireless and wired communication capability

    UNIT II - OBJECT ORIENTED DATABASES

    1. What us object databases?

    Became commercially popular in mid 1990sYou can store the data in the same format as you use it. No paradigm shift.Did not reach full potential till the classes they store were decoupled from the database schema.Open source implementation available low cost solution now exists.

    2. What is Object Oriented Database? (OODB)

    A database system that incorporates all the important object-oriented conceptsSome additional features

    oUnique Object identifiers

    oPersistent object handling

    Is the coupling of Object Oriented (OOP) Programming principles with Database Management System(DBMS) principles

    oProvides access to persisted objects using the same OO-programming language

    3. What are the advantages of OODBS?

    Designer can specify the structure of objects and their behavior (methods)

    Better interaction with object-oriented languages such as Java and C++Definition of complex and user-defined typesEncapsulation of operations and user-defined methods

    4. List out the Object Database Vendors.

    Matisse Software Inc.,

    Objectivity Inc.,

    Poet's FastObjects,

    Computer Associates,

    eXcelon Corporation

    Db4o

  • 7/27/2019 dbms unit5

    16/26

    16

    5.Differentiate OODB and Relational DB?

    OODB Relational DB

    1 Uses an OO model.Data is a collection of objects

    whose behavior, state, andrelationships are stored as aphysical entity.

    Uses record-oriented model.Data is a collection of record types

    (relations), each having collection ofrecords or tuples stored in a file.

    2 Language dependence (OOLanguage specific.

    Language independence (via SQL)

    3 No impedance mismatch inapplication using OODB.

    Impedance mismatch in OOapplication. Mapping must beperformed.

    6.Write about modeling and design in OODBMS.

    Basically, an OODBMS is an object database that provides DBMS capabilities to objects that havebeen created using an object-oriented programming language (OOPL). The basic principle is to addpersistence to objects and to make objects persistent. Consequently application programmers whouse OODBMSs typically write programs in a native OOPL such as Java, C++ or Smalltalk, and thelanguage has some kind of Persistent class, Database class, Database Interface, or Database APIthat provides DBMS functionality as, effectively, an extension of the OOPL.

    7. What is Object data modeling?

    An object consists of three parts: structure (attribute, and relationship to other objects like

    aggregation, and association), behavior (a set of operations) and characteristic of types(generalization/serialization). An object is similar to an entity in ER model; therefore we begin with anexample to demonstrate the structure and relationship.

    8. Define Attributes.

    Attributes are like the fields in a relational model. However in the Book example we have,forattributes publishedBy and writtenBy, complex types Publisher and Author,which are also objects.

    Attributes with complex objects, in RDNS, are usually other tableslinked by keys to the employeetable.Define Relationships.

    Relationships: publish and written by are associations with I: N and 1:1 relationship; composed of is

    an aggregation (a Book is composed of chapters). The 1: N relationship is usually realized asattributes through complex types and at the behavioral level. For example,

    10. What is Generalization and serialization?

    Generalization/Serialization is a relationship, which is supported in OODB through class hierarchy.An ArtBook is a Book, therefore the ArtBook class is a subclass of Book class. A subclass inherits allthe attribute and method of its superclass.

    11. Define Message.

    Message: means by which objects communicate, and it is a request from one object to another toexecute one of its methods. For example:Publisher_object.insert (Rose, 123,) i.e. request to execute the insert method on a Publisherobject )

  • 7/27/2019 dbms unit5

    17/26

    17

    12. Define Method.

    Method: defines the behavior of an object. Methods can be used to change state by modifying itsattribute values. To query the value of selected attributes the method that responds to the messageexample is the method insert defied in the Publisher class.

    13.What are the drawbacks of persistent programming languages?

    oDue to power of most programming languages, it is easy to make programming errors that

    damage the database.

    oComplexity of languages makes automatic high-level optimization more difficult.

    oDo not support declarative querying as well as relational databases

    14.What are the types of Query languages?

    Declarative query language Not computationally complete

    Syntax based on SQL (select, from, where)

    Additional flexibility (queries with user defined operators and types)

    15.Give an example for Complex Data?

    A Water Resource Management example1.A database of state wide water projects2.Includes a library of picture slides3.Indexing according to predefined concepts prohibitively expensive

    16.What are the types of queries can be used in water resource management?

    1.Geographic locations2.Reservoir levels during droughts3.Recent flood conditions, etc

    17.What is Mutiversion concurrency control?

    Multiversion concurrency control (abbreviated MCC orMVCC), in thedatabasefield ofcomputerscience, is a concurrency control method commonly used by database management systems toprovide concurrent access to the database and in programming languages to implementtransactionalmemory.

    18.What are types of failures?

    System crashes: due to hardware or software errors, resulting in loss of main memory.Media failures: due to problem with disk head or unreadable mediaApplication errors: due to logical errors in the program which may cause transactions to fail.Natural disasters: physical loss of media. (Fire, food, earth quakes, terrorism, etc)

    Sabotage: intentional corruption or destruction of data.Carelessness: unintentional destruction of data by user or operator.

    19. What are the general failures?

    Transaction failures Transaction abortsSystem failures failure of processor, main memory, power supplyMedia failures failure of secondary storage

    20. What is meant by concurrency control?Concurrency control ensures that correct results for concurrent operations are generated, whilegetting those results as quickly as possible. Thus concurrency control is an essential element for

    correctness in any system where two database transactions or more, executed with time overlap, canaccess the same data, e.g., virtually in any general-purpose database system.

    http://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Databasehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Concurrency_controlhttp://en.wikipedia.org/wiki/Concurrency_controlhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Transactional_memoryhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Concurrency_controlhttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Computer_sciencehttp://en.wikipedia.org/wiki/Database
  • 7/27/2019 dbms unit5

    18/26

    18

    21. What is meant by media recovery?

    Media recovery deals with failures of the storage media holding the permanent database, in particulardisk failures. The traditional database approach for media recovery uses archive copies (dumps) of

    the database as well as archive logs. Archive copies represent snapshots of the database and areperiodically taken. The archive log contains the log records for all committed changes which are notyet reflected in the archive copy.

    22. Define logging and recovery.

    Logging and recovery ensure that failures are masked to the users of transaction-based datamanagement systems by providing automatic treatment for different kinds of failures, such astransaction failures, system failures (crashes), media failures and disasters. The main goal is toguarantee the atomicity (A) and durability (D) properties of ACID transactions by providing undorecovery for failed transactions and redo recovery for committed transactions. Logging is the task ofcollecting redundant data needed for recovery

    23. What is meant by transaction?A transaction comprises a unit of work performed within adatabase management system(or similarsystem) against a database, and treated in a coherent and reliable way independent of othertransactions.

    24.What are the two main purposes of transaction?

    1.To provide reliable units of work that allow correct recovery from failures and keep a databaseconsistent even in cases of system failure, when execution stops (completely or partially) and manyoperations upon a database remain uncompleted, with unclear status.

    2.To provide isolation between programs accessing a database concurrently. If this isolation is notprovided the programs outcome are possibly erroneous.

    25. What is Crash recovery?Crash recovery is needed when the whole database (transaction) system fails, e.g. due to a hardwareor software error. All transactions which were active and not yet committed at crash time have failedso that their changes must be undone. The changes for transactions that have committed before thecrash must survive. A redo recovery is needed for all changes of committed transactions that havebeen lost by the crash because the changed pages resided only in main memory but were not yetwritten out to the permanent database.

    26. What is meant by disaster recovery?Disaster recovery can be achieved by maintaining a backup copy of the database at a geographicallyremote location. By continuously transferring log data from the primary database to the backup andapplying the changes there, the backup can be kept (almost) up-to-date.

    What is data persistence?Data persistence means that in a DBMS all data is maintained as long as it is not deleted explicitly.The life span of data needs to be determined directly or indirectly be the user and must not bedependent on system features. Additionally data once stored in a database must not be lost. Changesof a database which are done by a transaction are persistent. When a transaction is finished even a

    system crash cannot put the data in danger.

    Define transaction recovery.Transaction recovery (rollback) is performed when a transaction fails during normalProcessing, e.g. due to a program error or invalid input data. The log records in the log buffer and inthe log file are used to undo the changes of the failed transaction in reverse order.

    http://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Database_management_systemhttp://en.wikipedia.org/wiki/Database_management_system
  • 7/27/2019 dbms unit5

    19/26

    19

    UNIT I [DISTRIBUTED DATABASES]

    1. What are advantages of distributed databases over conventional database?

    mimics organisational structure with datalocal access and autonomy without exclusioncheaper to create and easier to expandimproved availability/reliability/performance by removing reliance on a central siteReduced communication overheadMost data access is local, less expensive and performs betterImproved processing powerMany machines handling the database rather than a single server

    more complex to implementmore costly to maintainsecurity and integrity controlstandards and experience are lackingDesign issues are more complex

    2. Write about distributed databases architecture

  • 7/27/2019 dbms unit5

    20/26

    20

    Defines the structure of the system

    ocomponents identified

    ofunctions of each component defined

    ointerrelationships and interactions between components defined

    3. What is a distributed database?

    A logically interrelated collection of shared data (and a description of this data), physically distributedover a computer network(DDBMS) is the software that manages the DDB and provides an access mechanism that makes thisdistribution transparent to the users.

    DDBS = DB + Communication

    non-centralised DDBMS

    Motivated by need to integrate operational data and to provide controlled accessmanages the Distributed databasemakes the distribution transparent to the user

    4. What are implicit assumptions in DBMS?

    Data stored at a number of sites each site logicallyconsists of a single processor.Processors at different sites are interconnected by a computer network no multiprocessorsoparallel database systems

    Distributed database is a database, not a collection of files data logically related as exhibited in theusers access patterns

    orelational data model

    D-DBMS is a full-fledged DBMS

    onot remote file system, not a TP system

    5. Write the different dimensions of the problem in DDBMS

    Distribution

    oWhether the components of the system are located on the same machine or not

    Heterogeneity

    oVarious levels (hardware, communications, operating system)

    oDBMS important one

    data model, query language,transaction management algorithmsAutonomy

    oNot well understood and most troublesome

    oVarious versions

    Design autonomy: Ability of a component DBMS to decide on issues related to its own design.

    Communication autonomy: Ability of a component DBMS to decide whether and how to communicatewith other DBMSs.Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants

    to.

    6. What are the issues in DDBMS?

    Data Allocation - Where to locate data and whether to replicate?Data Fragmentation - Partition the databaseDistributed catalogue managementDistributed transactionsDistributed QueriesMaking all of the above transparentto the user is the key of DDBMSs

    ReplicationIf a site (or network path) fails, the data held there is unavailable

  • 7/27/2019 dbms unit5

    21/26

    21

    Consider replication (duplication) of data to improve availabilityNo replication: Disjoint fragmentsPartial replication: Site dependentFull replication: Slows down update for consistency, expensive

    7. What are the advantages of distributed databases?

    Capacity and incremental growthIncrease reliability and availabilityModularityReduced communication overheadProtection of valuable dataEfficiency and Flexibility

    8. What are the disadvantages of distributed databases?

    DDB design more complex, fragmentation & replication; extra work must be done by the DBAs toensure that the distributed nature of the system is transparent.

    Economics,Concurrency control,Inexperience,Security,Difficult to maintain integrity

    9. Write the Applications of DDBMS.

    Manufacturing - especially multi-plant manufacturingMilitary command and controlElectronic fund transfers and electronic tradingCorporate MISAirline restrictions

    Hotel chainsAny organization which has a decentralized organization structure

    10. What is relation and sub-relations?

    Relationviews are subsets of relations localityextra communication

    Fragments of relations (sub-relations)

    concurrent execution of a number of transactions that access different portions of a relationviews that cannot be defined on a single fragment will require extra processingsemantic data control (especially integrity enforcement) more difficult

    11. What are the types of Fragmentation?

    Horizontal Fragmentation (HF)

    splitting the database by rowse.g. A-J in site 1, K-S in site 2 and T-Z in site 3oPrimary Horizontal Fragmentation (PHF)

    oDerived Horizontal Fragmentation (DHF)

    Vertical Fragmentation (VF)

    Splitting database by columns/fieldse.g. columns/fields 1-3 in site A, 4-6 in site BTake the primary key to all sites

  • 7/27/2019 dbms unit5

    22/26

    22

    Hybrid Fragmentation (HF)Horizontal and vertical could even be combined

    12. Define Vertical Fragmentation. And what is the advantage of VF?

    Has been studied within the centralized contextodesign methodology

    ophysical clustering

    More difficult than horizontal, because more alternatives exist.

    oTwo approaches :

    ogrouping

    attributes to fragments

    osplitting

    relation to fragmentsOverlapping fragments

    ogrouping

    Non-overlapping fragments

    osplittingWe do not consider the replicated key attributes to be overlapping.

    Advantage: Easier to enforce functional dependencies (for integrity checking etc.)

    13. What are the Information Requirements in VF?

    Application Information

    oAttribute affinities

    a measure that indicates how closely related the attributes areThis is obtained from more primitive usage data

    oAttribute usage values

    Given a set of queries Q = {q1, q2,, qq} that will run on the relation R[A1,A2,,An],

    14. What is the significance of Query processing?

    using client-server architectureuser creates queryClient parses and sends to server(s) (SQL?)servers return appropriate Tablesclient combines into one TableIssue of data transfer cost over a network

    ooptimise the query to transfer the least amount

    15. What are the Query processing components?

    Query language that is used

    oSQL: intergalactic data speak

    Query execution methodology

    oThe steps that one goes through in executing high-level (declarative) user queries.

    Query optimizationoHow do we determine the best execution plan?

    1 if attribute Aj is referenced by query qi

    0 otherwiseuse(qi,Aj) =

  • 7/27/2019 dbms unit5

    23/26

    23

    16. What are the objectives of Query Optimization?

    Minimize a cost functionI/O cost + CPU cost + communication costThese might have different weights in different distributed environmentsWide area networks

    ocommunication cost will dominatelow bandwidthlow speedhigh protocol overhead

    omost algorithms ignore all other cost components

    Local area networks

    ocommunication cost not that dominant

    ototal cost function should be considered

    Can also maximize throughput

    17. What are the types of Optimizers?

    Exhaustive search

    ocost-based

    ooptimal

    ocombinatorial complexity in the number of relations

    oHeuristics

    not optimalregroup common sub-expressionsperform selection, projection firstreplace a join by a series of semijoinsreorder operations to reduce intermediate relation sizeoptimize individual operations

    18. What is Optimization Granularity

    Single query at a time

    ocannot use common intermediate results

    Multiple queries at a time

    oefficient if many similar queries

    odecision space is much larger

    19. What are the types of optimization Timing?

    Static

    ocompilation optimize prior to the execution

    odifficult to estimate the size of the intermediate results error propagation

    ocan amortize over many executions

    oR*

    Dynamic

    orun time optimization

    oexact information on the intermediate relation sizes

    ohave to reoptimize for multiple executions

    oDistributed INGRES

    Hybrid

    ocompile using a static algorithm

    oif the error in estimate sizes > threshold, reoptimize at run time

    oMERMAID

    20. What are statistics in query optimization?

  • 7/27/2019 dbms unit5

    24/26

    24

    Relation

    ocardinality

    osize of a tuple

    ofraction of tuples participating in a join with another relation

    Attribute

    ocardinality of domain

    oactual number of distinct valuesCommon assumptions

    oindependence between different attribute values

    ouniform distribution of attribute values within their domain

    21. What are the decision sites are used in Query Optimization?

    Centralized

    osingle site determines the best schedule

    osimple

    oneed knowledge about the entire distributed database

    Distributedocooperation among sites to determine the schedule

    oneed only local information

    ocost of cooperation

    Hybrid

    oone site determines the global schedule

    each site optimizes the local sub queries

    22. What are the types of network topologies used in DDBMS?

    Wide area networks (WAN) point-to-pointocharacteristics

    low bandwidthlow speedhigh protocol overhead

    ocommunication cost will dominate; ignore all other cost factors

    oglobal schedule to minimize communication cost

    olocal schedules according to centralized query optimization

    Local area networks (LAN)

    ocommunication cost not that dominant

    ototal cost function should be considered

    obroadcasting can be exploited (joins)

    o

    special algorithms exist for star networks

    23. What is Query Decomposition?

    Input: Calculus query on global relations

    Normalization

    omanipulate query quantifiers and qualification

    Analysis

    odetect and reject incorrect queries

    opossible for only a subset of relational calculus

    Simplification

    oeliminate redundant predicates

    Restructuring

  • 7/27/2019 dbms unit5

    25/26

    25

    ocalculus query algebraic query

    omore than one translation is possible

    ouse transformation rules

    24. What is Data Localization?

    Input: Algebraic query on distributed relationsDetermine which fragments are involvedLocalization program

    osubstitute for each global query its materialization program

    ooptimize

    25. What is Global Query Optimization?

    Input: Fragment queryFind the best(not necessarily optimal) global schedule

    oMinimize a cost function

    oDistributed join processing

    Bushy vs. linear treesWhich relation to ship where?Ship-whole vs ship-as-needed

    oDecide on the use of semijoins

    Semijoin saves on communication at the expense of more local processing.

    oJoin methods

    nested loop vs ordered joins (merge join or hash join)

    26. What are the rules of DDBMS?

    1.Local autonomy2.No reliance on a central site

    3.Continuous operation4.Location independence5.Fragmentation independence6.Replication independence7.Distributed Query processing8.Distributed transaction processing9.Hardware independence10.Operating System independence11.Network independence12.Database independence

    27. Define Transaction.

    A transaction is a collection of actions that make consistent transformations of system states whilepreserving system consistency.

    Concurrency transparency

    Failure transparency

    28. What are the properties of Transaction?

    Atomicity: All or nothingConsistency: No violation of integrity constraintsIsolation: Concurrent changes invisible and serializableDurability: Committed updates persist

    29. What is concurrency control?

  • 7/27/2019 dbms unit5

    26/26

    The problem of synchronizing concurrent transactions such that the consistency of the database ismaintained while, at the same time, maximum degree of concurrency is achieved.

    Anomalies:

    oLost updates

    The effects of some transactions are not reflected on the database.

    oInconsistent retrievals

    A transaction, if it reads the same data item more than once, should always read the same value.

    30. What are the modes in distributed locks?

    Four modes of management possible:

    oCentralised 2PL

    Read any copy, update all for updatesSingle site, bottleneck, failure?

    oPrimary Copy 2PL

    Distributes locks, one copy designated primary, others slavesOnly primary copy locked for updates, slaves updated later

    oDistributed 2PL

    Each site manages its own data locksAll copies locked for an update, high cost of comms

    oMajority Locking

    31. What are the Distributed Reliability Protocols?

    Commit protocolsTermination protocolsRecovery protocols

    Independent recovery non-blocking termination