Assignment Adbms

ANCHIT CHANDRA 10308828

1) Mobile Database

A mobile database is a database that can be connected to by a mobile computing device

over a mobile network. The client and server have wireless connections. A cache is

maintained to hold frequent data and transactions so that they are not lost due to connection

failure. A database is a structured way to organize information. This could be a list of

contacts, price information or distance travelled.

The use of laptops, mobiles and PDAs is increasing and likely to increase in the future with

more and more applications residing in the mobile systems. While analysts can’t

tell us exactly which applications will be the most popular, it is clear that a large percentage

will require the use of a database of some sort. Many applications such as databases would

require the ability to download information from an information repository and operate on this

information even when out of range or disconnected.

An example of this is a mobile workforce. In this scenario, a user would need to access and

update information from files in the home directories on a server or customer records from a

database. This type of access and work load generated by such users is different from the

traditional workloads seen in client–server systems of today. With the advent of mobile

databases, now users can load up their smart phones or PDAs with mobile databases to

exchange mission-critical data remotely without worrying about time or distance. Mobile

databases let employees enter data on the fly. Information can be synchronized with a server

database at a later time.
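
The enter-now, synchronize-later pattern described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular product works: the `MobileStore` class, its `pending` queue, and the dictionary standing in for the server database are hypothetical names chosen for the example.

```python
class MobileStore:
    """Hypothetical on-device store: writes go to a local cache and are
    queued until a connection to the server database is available."""

    def __init__(self):
        self.cache = {}      # local copy of the records the user works on
        self.pending = []    # updates made while disconnected

    def write(self, key, value):
        # Always succeeds locally, even with no network connection.
        self.cache[key] = value
        self.pending.append((key, value))

    def sync(self, server, connected):
        # Replay queued updates in order once the link is back.
        if not connected:
            return 0
        sent = 0
        for key, value in self.pending:
            server[key] = value
            sent += 1
        self.pending.clear()
        return sent


server_db = {"order-1": "open"}          # stand-in for the server database
device = MobileStore()
device.write("order-1", "delivered")     # entered on the fly, offline
device.write("order-2", "open")
offline_result = device.sync(server_db, connected=False)
online_result = device.sync(server_db, connected=True)
```

Replaying updates in order is the simplest possible policy. A real system would also have to deal with conflicting changes made on the server while the device was disconnected.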

Need for mobile databases

Mobile users must be able to work without a wireless connection due to poor or

even non-existent connections.

Applications must provide significant interactivity.

Applications must be able to access local device/vehicle hardware, such

as printers, bar code scanners, or GPS units (for mapping or Automatic Vehicle

Location systems).

Bandwidth must be conserved (a common requirement on wireless networks that

charge per megabyte of data transferred).

Users don't require access to truly live data, only recently modified data.

Limited life of power supply (battery).

The changing topology of the network.

If an application meets any of those requirements, the chances are good that you will be

required to build a mobile database application with synchronization.




Mobile database system architecture

For any mobile architecture, things to be considered are:

Users are not attached to a fixed geographical location

Mobile computing devices: low-power, low-cost, portable

Wireless networks

Mobile computing constraints

Three parties

Mobile databases typically involve three parties: fixed hosts, mobile units, and base

stations. Fixed hosts perform the transaction and data management functions with the help

of database servers. Mobile units are portable computers that move around a geographical

region that includes the cellular network (or "cells") that these units use to communicate to

base stations. (Note that these networks need not be cellular telephone networks.) Base

stations are two-way radios, installations in fixed locations, that pass communications with

the mobile units to and from the fixed hosts. They are typically low-power devices such as

mobile phones, portable phones, or wireless routers.

When a mobile unit leaves a cell serviced by a particular base station, that station

transparently transfers the responsibility for the mobile unit's transaction and data support to

whichever base station covers the mobile unit's new location.

Uses

Mobile databases are used most heavily in the retail and logistics industries. They are increasingly

being used in the aviation and transportation industries.

Example

Sybase Inc.’s SQL Anywhere dominates the mobile-database field, with about 68 per cent of

the mobile database market.

2) Geographic information system


A geographic information system (GIS), geographical information system,

or geospatial information system is a system that captures, stores, analyses, manages,

and presents data with reference to geographic location data. In the simplest terms, GIS is the

merging of cartography, statistical analysis, and database technology. GIS may be used

in archaeology, geography, cartography, remote sensing, land surveying, public

utility management, natural resource management, precision

agriculture, photogrammetry, urban planning, emergency management, landscape

architecture, navigation, aerial video, and localized search engines.

As GIS can be thought of as a system, it digitally creates and "manipulates" spatial areas that

may be jurisdictional, purpose or application oriented for which a specific GIS is developed.

Hence, a GIS developed for an application, jurisdiction, enterprise, or purpose may not be

necessarily interoperable or compatible with a GIS that has been developed for some other

application, jurisdiction, enterprise, or purpose. What goes beyond a GIS is a spatial data

infrastructure (SDI), a concept that has no such restrictive boundaries.

Therefore, in a general sense, the term describes any information system that integrates,

stores, edits, analyses, shares, and displays geographic information for informing decision

making. GIS applications are tools that allow users to create interactive queries (user-created

searches), analyse spatial information, edit data, maps, and present the results of all these

operations. Geographic information science is the science underlying the geographic

concepts, applications and systems.

GIS Techniques

Modern GIS maps are created using Computer Aided Design (CAD) software to digitally render

geographical maps, onto which can be superimposed any spatially located variables (e.g.

rainfall, demographic data, etc.). GIS maps are used to create a visual representation of raw

data (attributes) to allow for more efficient analysis and better decision making.

There are many GIS software applications available, ranging from open-source software such

as GRASS to proprietary applications created by such organisations as AutoDesk and ESRI.

There are also many specialist GIS applications developed for particular industries (such as

General Electric’s Smallworld, developed for use in GIS mapping of public utilities).

Applications of GIS

1) Earthquake Mapping, 2) Market Research, 3) Demographics, 4) Health Research & 5)

Census Data

3) Genome Data Management



In modern molecular biology and genetics, the genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA.

Figure: The volume of data declines from processing of the raw image data to the text sequence and quality files. Further removal of redundant data by comparison with a reference genome reduces the data volume to the megabyte-kilobyte scale.

The International Nucleotide Sequence Databases provide the principal repositories for DNA sequence data. In addition to hosting the text sequence data, they host basic annotation and, in many cases, the raw data from which the text sequences were derived. The storage of raw data for the new technologies is problematic due to the vast size of the images. The cost of storing the gigabytes of raw data produced has been estimated to be greater than the cost of generating the data in the first place. It is now common practice to delete the raw image files once they have been processed to produce the relatively small text sequence and quality data files. While the long-term storage of the text sequence files is feasible using current tape and disc technology, maintaining the data in a readily usable form where it may readily be interrogated by users is more of a challenge. The GenBank sequence repository continues to increase in size exponentially, and searching this data using standard sequence comparison algorithms takes an extensive amount of time. Additionally, a large amount of computing power is needed to run standard tools such as BLAST. New sequence comparison tools such as Zoom (10) have been developed specifically for second-generation short read sequences; however, it may be some time before a standard comparison tool equivalent to BLAST becomes prevalent for short reads.

A greater challenge for sequence storage and management is the increasing quantity of redundant data being generated by the new sequencing technologies. The submission of complete resequence data to the international repositories would result in the storage of highly redundant data sets, bloating the size of the database and reducing the efficiency of queries. As an increasing number of reference genome sequences become available and the cost of resequencing continues to decline, the problem of data redundancy will increase to a point where storage within the primary data repositories becomes impractical.

The theory that sequence repositories will constantly increase in size is likely to be challenged with the increasing availability of reference genome sequences. Once a reference genome sequence has been produced, users are predominantly interested in variation from this reference.
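
The idea of keeping only the variation from a reference can be illustrated with a short Python sketch. It assumes the two sequences are already aligned and of equal length, which real resequencing data generally is not; the function names and the toy sequences are hypothetical.

```python
def diff_against_reference(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.
    Assumes both sequences are already aligned and of equal length."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def rebuild_from_reference(reference, variants):
    """Reconstruct the sample sequence from the reference plus its variants."""
    bases = list(reference)
    for position, _ref_base, sample_base in variants:
        bases[position] = sample_base
    return "".join(bases)


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
variants = diff_against_reference(reference, sample)
restored = rebuild_from_reference(reference, variants)
```

Here only two small tuples need to be stored for the sample instead of the whole sequence, and the full sequence can still be rebuilt on demand.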

4) Multimedia Database

A multimedia database is a database that hosts one or more primary media file types such as .txt (documents), .jpg (images), .swf (videos), .mp3 (audio), etc. These loosely fall into three main categories:



Static media (time-independent, e.g. images and handwriting)

Dynamic media (time-dependent, e.g. video and sound bytes)

Dimensional media (e.g. 3D games or computer-aided drafting programs, CAD)

All primary media files are stored in binary strings of zeros and ones, and are encoded according to file type.

The term "data" is typically referenced from the computer point of view, whereas the term "multimedia" is referenced from the user point of view.

Types of Multimedia Databases

There are numerous different types of multimedia databases, including:

The Authentication Multimedia Database (also known as a Verification Multimedia Database, e.g. retina scanning) is a 1:1 data comparison.

The Identification Multimedia Database is a one-to-many data comparison (e.g. passwords and personal identification numbers).

A newly emerging type of multimedia database is the Biometrics Multimedia Database, which specializes in automatic human verification based on algorithms applied to a person's behavioural or physiological profile. This method of identification is superior to traditional multimedia database methods requiring the typical input of personal identification numbers and passwords, because the person being identified does not need to be physically present where the identification check is taking place. This removes the need for the person being scanned to remember a PIN or password. Fingerprint identification technology is also based on this type of multimedia database.
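
The difference between a 1:1 comparison and a one-to-many comparison can be sketched as follows. This is a toy illustration only: the `similarity` score, the feature tuples, the threshold of 0.8, and the enrolled names are all assumptions made for the example and do not reflect how any real biometric system scores a match.

```python
def similarity(a, b):
    """Toy similarity score: fraction of positions where two equal-length
    feature tuples agree. Real systems use far richer measures."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def verify(claimed_id, probe, enrolled, threshold=0.8):
    """Authentication (1:1): compare the probe with one claimed identity."""
    template = enrolled.get(claimed_id)
    return template is not None and similarity(probe, template) >= threshold


def identify(probe, enrolled, threshold=0.8):
    """Identification (1:N): search every enrolled template for the best match."""
    best_id, best_score = None, 0.0
    for person_id, template in enrolled.items():
        score = similarity(probe, template)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score >= threshold else None


enrolled = {
    "alice": (1, 0, 1, 1, 0),
    "bob": (0, 1, 1, 0, 0),
}
probe = (1, 0, 1, 1, 1)   # 4 of 5 positions match alice's template
```

Calling `verify("alice", probe, enrolled)` touches a single stored template, while `identify(probe, enrolled)` has to scan all of them, which is why identification becomes more expensive as the database grows.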

Difficulties Involved with Multimedia Databases

The difficulties of making these different types of multimedia databases readily accessible to humans include:

The tremendous amount of bandwidth they consume.

Creating globally accepted data-handling platforms, such as Joomla, and the special considerations that these new multimedia database structures require.

Creating a globally accepted operating system, including the applicable storage and resource management programs needed to accommodate the vast global hunger for multimedia information.

Multimedia databases need to accommodate various human interfaces to handle 3D-interactive objects in a logically perceived manner (e.g. SecondLife.com).

Accommodating the vast resources required to utilize artificial intelligence to its fullest potential, including computer sight and sound analysis methods.

The historic relational databases (e.g. the Binary Large Objects, or BLOBs, developed for SQL databases to store multimedia data) do not conveniently support content-based searches for multimedia content.

5) Parallel database

A parallel database system seeks to improve performance through parallelization of various operations, such as loading data, building indexes and evaluating queries. Although data may be stored in a distributed fashion, the distribution is governed solely by performance considerations. Parallel databases improve processing and input/output speeds by using multiple CPUs and disks in parallel. Centralized and client–server database systems are not powerful enough to handle such applications. In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in which the computational steps are performed sequentially.

Parallel databases can be roughly divided into three categories:

Shared memory architecture, where multiple processors share the main memory space, as well as mass storage (e.g. hard disk drives).

Shared disk architecture, where each node has its own main memory, but all nodes share mass storage, usually a storage area network. In practice, each node usually also has multiple processors.

Shared nothing architecture, where each node has its own mass storage as well as main memory.
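
A shared nothing layout can be sketched roughly as follows, with each "node" owning its own slice of the rows and a coordinator merging the partial results. This is a structural illustration under stated assumptions: the function names and the `orders` data are hypothetical, and Python threads are used only to show the shape of the computation, not to demonstrate a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Hash-partition rows so each node owns a disjoint slice of the data."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row["customer_id"]) % node_count].append(row)
    return nodes


def local_total(node_rows):
    """Work done independently on one node, using only its own data."""
    return sum(row["amount"] for row in node_rows)


def parallel_total(rows, node_count=3):
    nodes = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_total, nodes))
    return sum(partials)   # a coordinator merges the partial results


orders = [{"customer_id": i, "amount": 10 * i} for i in range(1, 7)]
total = parallel_total(orders)
```

Because no node needs another node's rows to compute its partial sum, the same query works unchanged for any number of nodes.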

Example

Objectivity/DB

IBM DATABASE2

Teradata

6) Spatial database

A spatial database is a database that is optimized to store and query data that is related to objects in space, including points, lines and polygons. While typical databases can understand various numeric and character types of data, additional functionality needs to be added for databases to process spatial data types. These are typically called geometry or feature.


The Open Geospatial Consortium created the Simple Features specification and sets standards for adding spatial functionality to database systems.

Features of spatial databases

Database systems use indexes to quickly look up values and the way that most databases index data is not optimal for spatial queries. Instead, spatial databases use a spatial index to speed up database operations.
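
To give a feel for what a spatial index does, here is a minimal sketch of a fixed-grid index in Python. The grid is a deliberately simple stand-in chosen for the example (production systems commonly use tree structures such as R-trees), and the `GridIndex` class, its methods, and the sample points are hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Toy spatial index: points are bucketed by the grid cell they fall in."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def query_box(self, min_x, min_y, max_x, max_y):
        """Return names of points inside the box, visiting only nearby cells."""
        first = self._cell(min_x, min_y)
        last = self._cell(max_x, max_y)
        found = []
        for cx in range(first[0], last[0] + 1):
            for cy in range(first[1], last[1] + 1):
                for name, x, y in self.cells.get((cx, cy), []):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        found.append(name)
        return sorted(found)


index = GridIndex(cell_size=10)
index.insert("school", 3, 4)
index.insert("hospital", 12, 18)
index.insert("depot", 95, 95)
nearby = index.query_box(0, 0, 20, 20)
```

The query never looks at the cell holding "depot", which is the point of a spatial index: most of the data is skipped without being compared.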

In addition to typical SQL queries such as SELECT statements, spatial databases can perform a wide variety of spatial operations. The following query types and many more are supported by the Open Geospatial Consortium:

Spatial Measurements: Find the distance between points, polygon area, etc.

Spatial Functions: Modify existing features to create new ones, for example by providing a buffer around them, intersecting features, etc.

Spatial Predicates: Allow true/false queries such as 'is there a residence located within a mile of the area we are planning to build the landfill?'

Constructor Functions: Create new features with an SQL query specifying the vertices (points of nodes) which can make up lines. If the first and last vertex of a line are identical the feature can also be of the type polygon (a closed line).

Observer Functions: Queries which return specific information about a feature such as the location of the center of a circle.

Not all spatial databases support these query types.
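
A few of the operation types listed above can be mimicked in plain Python to show what they compute. This is a sketch on flat x/y coordinates, not the API of any spatial database; the function names and sample coordinates are assumptions made for the example.

```python
import math


def distance(p, q):
    """Spatial measurement: straight-line distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def polygon_area(vertices):
    """Spatial measurement: area of a simple polygon (shoelace formula)."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def within_distance(p, q, limit):
    """Spatial predicate: true/false answer to 'is p within limit of q?'"""
    return distance(p, q) <= limit


def is_closed(line):
    """Constructor check: a line whose first and last vertex match is a ring."""
    return len(line) >= 4 and line[0] == line[-1]


landfill = (0.0, 0.0)
residence = (3.0, 4.0)                          # 5.0 units from the landfill
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]     # 4 by 3 rectangle
ring = rectangle + [rectangle[0]]
```

For instance, `within_distance(residence, landfill, 5.0)` answers the landfill question from the list above as a simple true/false value.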

Example

All OpenGIS Specifications compliant products

Boeing's Spatial Query Server spatially enables Sybase ASE.

SpatiaLite extends SQLite with spatial datatypes, functions, and utilities.

IBM DB2 Spatial Extender can be used to enable any edition of DB2, including the free DB2 Express-C, with support for spatial types

Oracle Spatial

Microsoft SQL Server has supported spatial types since version 2008

7) Temporal Database

A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time, for example, the history of the stock market or the movement of employees within an organisation. Thus, a temporal database stores a collection of time related data.

A temporal database is formed by compiling and storing temporal data. The difference between temporal data and non-temporal data is that a time period is appended to data expressing when it was valid or stored in the database. Conventional databases consider data to be valid at the present time, as in the time instance “now”. When data in such a database is modified, removed or inserted, the state of the database is overwritten to form a new state. The state prior to any changes to the database is no longer available. Thus, by associating time with data, it is possible to store the different database states.

In essence, temporal data is formed by time-stamping ordinary data (the type of data we associate with and store in conventional databases). In a relational data model, tuples are time-stamped, and in an object-oriented data model, objects/attributes are time-stamped. Each ordinary data item has two time values attached to it, a start time and an end time, to establish the time interval of the data. In a relational data model, relations are extended to have two additional attributes, one for start time and another for end time.
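
A relation extended with start and end time attributes might look like the following sketch, which uses Python's built-in `sqlite3` module. The table, column names, sample rows, the far-future end date used for "still valid", and the half-open interval convention are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee_dept (
           employee   TEXT,
           department TEXT,
           start_time TEXT,   -- first day the row is valid
           end_time   TEXT    -- first day the row is no longer valid
       )"""
)
conn.executemany(
    "INSERT INTO employee_dept VALUES (?, ?, ?, ?)",
    [
        ("Asha", "Sales", "2009-01-01", "2010-07-01"),
        ("Asha", "Marketing", "2010-07-01", "9999-12-31"),
    ],
)


def department_on(employee, day):
    """Return the department that was valid for the employee on a given day."""
    row = conn.execute(
        "SELECT department FROM employee_dept "
        "WHERE employee = ? AND start_time <= ? AND ? < end_time",
        (employee, day, day),
    ).fetchone()
    return row[0] if row else None
```

Because the old row is kept rather than overwritten, `department_on("Asha", "2010-01-15")` can still recover the earlier state alongside the current one.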

Different Forms of Temporal Databases

Time can be interpreted as valid time (when data occurred or is true in reality) or transaction time (when data was entered into the database).

A historical database stores data with respect to valid time.

A rollback database stores data with respect to transaction time.

A bitemporal database stores data with respect to both valid and transaction time: it stores the history of data with respect to valid time and transaction time.
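
The distinction between valid time and transaction time is easiest to see with a correction that is entered late. The sketch below is a simplified, append-only model written for illustration; the `Version` fields, the address history, and the rule of preferring the most recently recorded version are assumptions, not a description of any particular temporal DBMS.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: str
    valid_from: str      # valid time: when the fact became true in reality
    valid_to: str
    recorded_at: str     # transaction time: when the row entered the database


# Hypothetical history of one customer's address.
history = [
    Version("12 Hill Rd", "2010-01-01", "9999-12-31", "2010-01-05"),
    # Correction entered later: the customer had actually moved on 2010-06-01.
    Version("12 Hill Rd", "2010-01-01", "2010-06-01", "2010-09-10"),
    Version("7 Lake St", "2010-06-01", "9999-12-31", "2010-09-10"),
]


def lookup(valid_day, known_as_of):
    """What did the database believe on known_as_of about valid_day?"""
    candidates = [
        v for v in history
        if v.recorded_at <= known_as_of and v.valid_from <= valid_day < v.valid_to
    ]
    if not candidates:
        return None
    # The most recently recorded matching version reflects the belief at that time.
    return max(candidates, key=lambda v: v.recorded_at).value
```

Asking about the same real-world day gives different answers depending on when the question is asked: `lookup("2010-07-01", "2010-08-01")` reflects what was recorded before the correction, while `lookup("2010-07-01", "2010-12-31")` reflects the corrected history.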

Application domains of Temporal Data

Examples of application domains dealing with temporal data are:

Financial Applications – e.g. history of stock markets; share prices

Reservation Systems – e.g. when was a flight booked

Medical Systems – e.g. patient records

Computer Applications – e.g. history of file back ups

Archive Management Systems – e.g. sporting events, publications and journals.

8) Database Administration

Database administration is the function of managing and maintaining database

management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM

DB2 and Microsoft SQL Server need ongoing management. As such, corporations that use

DBMS software often hire specialized IT (Information Technology) personnel called Database

Administrators or DBAs.

DBA Responsibilities

Installation, configuration and upgrading of Database server software and related

products.


Evaluate Database features and Database related products.

Establish and maintain sound backup and recovery policies and procedures.

Take care of the Database design and implementation.

Implement and maintain database security (create and maintain users and roles, assign

privileges).

Database tuning and performance monitoring.

Application tuning and performance monitoring.

Set up and maintain documentation and standards.

Plan growth and changes (capacity planning).

Work as part of a team and provide 24x7 support when required.

Do general technical troubleshooting and give consultation to development teams.

Interface with DBMS vendor for technical support.
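
As a small illustration of the backup and recovery responsibility in the list above, the following sketch copies a database and checks the copy. It uses SQLite through Python's `sqlite3` module purely because it is self-contained; server products such as Oracle, DB2 and SQL Server have their own backup tooling, and the `customers` table and `backup_and_verify` helper are hypothetical.

```python
import sqlite3

# Stand-in "production" database with a little data in it.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
prod.executemany("INSERT INTO customers (name) VALUES (?)", [("Ravi",), ("Mei",)])
prod.commit()


def backup_and_verify(source):
    """Copy the database, then confirm the copy is intact and complete."""
    copy = sqlite3.connect(":memory:")
    source.backup(copy)                       # online backup of every page
    status = copy.execute("PRAGMA integrity_check").fetchone()[0]
    rows = copy.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
    return copy, status, rows


backup, status, row_count = backup_and_verify(prod)
```

Checking the copy immediately after taking it reflects a common piece of advice: a backup that has never been verified or restored should not be trusted.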

Types of Database Administration

There are three types of DBAs:

Systems DBAs (also referred to as Physical DBAs, Operations DBAs or Production

Support DBAs):

Focus on the physical aspects of database administration such as DBMS installation,

configuration, patching, upgrades, backups, restores, refreshes, performance

optimization, maintenance and disaster recovery.

Development DBAs:

Focus on the logical and development aspects of database administration such

as data model design and maintenance, DDL (data definition language) generation,

SQL writing and tuning, coding stored procedures, collaborating with developers to

help choose the most appropriate DBMS feature/functionality and other pre-production

activities.

Application DBAs:

Straddle the fence between the DBMS and the application software and are

responsible for ensuring that the application is fully optimized for the database and

vice versa. They usually manage all the application components that interact with the

database and carry out activities such as application installation and patching,

application upgrades, database cloning, building and running data cleanup routines,

data load process management, etc.


Nature of database administration

The degree to which the administration of a database is automated dictates the skills and

personnel required to manage databases. On one end of the spectrum, a system with minimal

automation will require significant experienced resources to manage; perhaps 5-10 databases

per DBA. Alternatively an organization might choose to automate a significant amount of the

work that could be done manually therefore reducing the skills required to perform tasks. As

automation increases, the personnel needs of the organization split into highly skilled

workers to create and manage the automation and a group of lower skilled "line" DBAs who

simply execute the automation.

Database administration work is complex, repetitive, time-consuming and requires significant

training. Since databases hold valuable and mission-critical data, companies usually look for

candidates with multiple years of experience. Database administration often requires DBAs to

put in work during off-hours (for example, for planned after hours downtime, in the event of a

database-related outage or if performance has been severely degraded). DBAs are commonly

well compensated for the long hours.

Database administration tools

Often, the DBMS software comes with certain tools to help DBAs manage the DBMS. Such

tools are called native tools. For example, Microsoft SQL Server comes with SQL Server

Enterprise Manager and Oracle has tools such as SQL*Plus and Oracle Enterprise Manager/Grid

Control.

9) Data warehouse

A data warehouse (DW) is a database used for reporting. The data is offloaded from the

operational systems for reporting. The data may pass through an operational data store for

additional operations before it is used in the DW for reporting.

A data warehouse maintains its functions in three layers: staging, integration, and

access. Staging is used to store raw data for use by developers (analysis and support).


The integration layer is used to integrate data and to have a level of abstraction from users.

The access layer is for getting data out for users.

This definition of the data warehouse focuses on data storage. The main source of the data is

cleaned, transformed, catalogued and made available for use by managers and other

business professionals for data mining, online analytical processing, market research and

decision support. However, the means to retrieve and analyze data, to extract, transform and

load data, and to manage the data dictionary are also considered essential components of a

data warehousing system. Many references to data warehousing use this broader context.

Thus, an expanded definition for data warehousing includes business intelligence tools, tools

to extract, transform and load data into the repository, and tools to manage and

retrieve metadata.
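
The extract, transform and load steps mentioned above can be sketched in miniature as follows. This is an illustrative toy, not a real ETL tool: the raw rows, the cleaning rules, and the `sales` warehouse table are assumptions made for the example, and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from two operational systems.
raw_sales = [
    {"sku": " a-100 ", "amount": "19.99", "region": "north"},
    {"sku": "A-100", "amount": "5.01", "region": "North"},
    {"sku": "b-200", "amount": "n/a", "region": "South"},   # bad amount
]


def transform(row):
    """Clean one row; return None when it cannot be repaired."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return (row["sku"].strip().upper(), round(amount, 2), row["region"].title())


def load(rows):
    """Load the cleaned rows into a warehouse table and return the connection."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (sku TEXT, amount REAL, region TEXT)")
    warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return warehouse


clean = [t for t in map(transform, raw_sales) if t is not None]
warehouse = load(clean)
totals = warehouse.execute(
    "SELECT sku, region, ROUND(SUM(amount), 2) FROM sales GROUP BY sku, region"
).fetchall()
```

Once the inconsistent spellings of the SKU and region are resolved in the transform step, the two source rows aggregate into a single line in the report.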

Architecture

Operational database layer

The source data for the data warehouse — An organization's Enterprise Resource

Planning systems fall into this layer.

Data access layer

The interface between the operational and informational access layer — Tools

to extract, transform, load data into the warehouse fall into this layer.

Metadata layer

The data dictionary — This is usually more detailed than an operational system data

dictionary. There are dictionaries for the entire warehouse and sometimes dictionaries

for the data that can be accessed by a particular reporting and analysis tool.

Informational access layer

The data accessed for reporting and analyzing and the tools for reporting and

analyzing data — This is also called the data mart. Business intelligence tools fall into

this layer. The Inmon-Kimball differences about design methodology have to do with

this layer.

Evolution in organization use

These terms refer to the level of sophistication of a data warehouse:

Offline operational data warehouse

Data warehouses in this initial stage are developed by simply copying the data off of

an operational system to another server where the processing load of reporting

against the copied data does not impact the operational system's performance.


Offline data warehouse

Data warehouses at this stage are updated from data in the operational systems on a

regular basis and the data warehouse data are stored in a data structure designed to

facilitate reporting.

Real-time data warehouse

Data warehouses at this stage are updated every time an operational system

performs a transaction (e.g. an order or a delivery or a booking).

Integrated data warehouse

These data warehouses assemble data from different areas of business, so users can

look up the information they need across other systems.

Benefits

Some of the benefits that a data warehouse provides are as follows:

A data warehouse provides a common data model for all data of interest regardless of

the data's source. This makes it easier to report and analyze information than it would

be if multiple data models were used to retrieve information such as sales invoices,

order receipts, general ledger charges, etc.

Prior to loading data into the data warehouse, inconsistencies are identified and

resolved. This greatly simplifies reporting and analysis.

Information in the data warehouse is under the control of data warehouse users so

that, even if the source system data are purged over time, the information in the

warehouse can be stored safely for extended periods of time.

Because they are separate from operational systems, data warehouses provide

retrieval of data without slowing down operational systems.

Data warehouses can work in conjunction with and, hence, enhance the value of

operational business applications, notably customer relationship management (CRM)

systems.

Data warehouses facilitate decision support system applications such as trend reports

(e.g., the items with the most sales in a particular area within the last two years),

exception reports, and reports that show actual performance versus goals.
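
A trend report of the kind mentioned above (the items with the most sales in a particular area over a recent period) might be expressed as a single aggregate query. The sketch below uses SQLite through Python's `sqlite3` module; the `sales` table, its columns, the sample rows, and the cut-off date are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, area TEXT, sold_on TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("kettle", "North", "2010-03-01", 40),
        ("kettle", "North", "2011-02-10", 35),
        ("toaster", "North", "2011-05-20", 50),
        ("toaster", "North", "2007-01-15", 500),   # too old to count
        ("kettle", "South", "2011-01-05", 90),     # different area
    ],
)


def top_items(area, since, limit=3):
    """Items ranked by units sold in one area on or after the given date."""
    return conn.execute(
        "SELECT item, SUM(quantity) AS units FROM sales "
        "WHERE area = ? AND sold_on >= ? "
        "GROUP BY item ORDER BY units DESC LIMIT ?",
        (area, since, limit),
    ).fetchall()


report = top_items("North", "2009-06-01")
```

Running this kind of scan against the warehouse copy rather than the operational system is what keeps reporting from slowing down day-to-day transactions.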

10) Data Mining

Data mining is usually defined as searching, analyzing and sifting through large amounts of data to find relationships, patterns, or any significant statistical correlations. With the advent of computers, large databases and the internet, it is easier than ever to collect millions, billions and even trillions of pieces of data that can then be systematically analyzed to help look for relationships and to seek solutions to difficult problems. Besides governmental uses, many marketers use data mining to find strong consumer patterns and relationships. Large organizations and educational institutions also data mine to find significant correlations that can enhance our society.
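
One of the simplest statistical relationships that data mining looks for is a correlation between two measured quantities. The sketch below computes the Pearson correlation coefficient by hand; the weekly figures are invented for illustration, and real data mining works on far larger data sets with many more techniques.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly figures for one shop.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 25, 31, 44, 52]
footfall = [50, 40, 30, 20, 10]

r_sales = pearson(ad_spend, sales)
r_footfall = pearson(ad_spend, footfall)
```

A coefficient close to 1 or -1 signals a strong linear relationship, but it says nothing about which quantity, if either, causes the other.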

While data mining is amoral in that it only looks for strong statistical correlations or relationships, it can be used for either good or not so good purposes. For instance, many government organizations depend on data mining to help them create solutions for many societal problems. Marketers use data mining to help them pinpoint and focus their attention on certain segments of the market to sell to, and in some cases black hat hackers can use data mining to steal and scam thousands of people.

How does data mining work?

Large amounts of data are collected. Most entities that perform data mining are large corporations and government agencies. They have been collecting data for decades and they have lots of data to sift through. A fairly new business or an individual can purchase certain types of data in order to mine it for their own purposes. In addition, data can also be stolen from large repositories by hackers breaking into a large database or simply stealing laptops that are poorly protected.
