Data Base Systems
sreejith-vanery -
8/12/2019 Data Base Systems
Data base systems
Introduction
The primary memory of a computer is limited, and hence programs and data are removed from primary memory once their use is over. These programs and data are organised into files for permanent storage on a secondary storage device for reuse. These files are structured in a particular way depending upon the type of access required and the media on which they are stored. If the data requires quick access, it is stored on disk; if it requires only serial processing, it is usually stored on tape.
The file is made up of a number of records. Each record is a group of fields, and each field is made up of some bits of data. Each file is given a name for its identity. The name generally consists of two parts: the first is a single-word name and the second a three-letter extension that indicates the type of file, for instance .COB, .PRG etc. for program files and .DBF, .DAT etc. for data files. For example, in stock.dat, stock is the first part of the file name and .dat is the extension.
A file holds records of logically similar data. Each record consists of a set of fields for data. Each field holds data of a defined nature: a date field holds only dates, a name field holds only names, and so on. Computer files are organised on physical storage devices like magnetic tape, disk and CD-ROM.
Data and Information
Data is the result of measurements of various attributes of entities such as product, student, inventory item and employee. The measurements may be recorded in alphabetical, numerical, image, voice or other forms. Thus, the raw and unanalysed numbers and facts about entities constitute data. Information, on the other hand, results from data when they are organised or structured in some meaningful way. The processed data have to be placed in a context for them to acquire meaning and relevance. Relevance in turn adds to the value of information in decisions and actions. Data processing requires some infusion of intelligence (meaning, purpose and usefulness) into data to generate information. The application of intelligence may be in the form of principles, knowledge, experience and intuition to convert data into information.
Definition of Information
The term 'information' is a very common word and it conveys some meaning to the recipient. It is very difficult to define it comprehensively. Yet Davis and Olson [1] give a fairly good definition. They define information as "data that has been processed into a form that is meaningful to the recipient and is of real or perceived value in current or prospective actions or decisions".
This implies that information:
- is processed data,
- has a form,
- is meaningful to the recipient,
- has a value, and
- is useful in current or prospective decisions or actions.
Differences between data and information
Though the words 'data' and 'information' are often used interchangeably, there is a clear distinction between the two. Some of the major differences are as follows:
- Data are facts, but information, though based on data, is not fact.
- Though information arises from data, not all data become information. There is a lot of selective filtering of data before they are processed into information.
- Data are the result of routine recording of events and activities taking place. Generation of information is user-driven and is not always automatic.
- Data are independent of users whereas information is user dependent. Most information reports are designed to meet the anticipated information needs of a user or a group of users. That is, information for one user is very likely to be data for other users.
Field, Record and File
A file is a collection of related records. A record is made up of a number of fields to hold data items. Each field is made up of a number of storage spaces. Each storage space can hold a byte of information. A collection of logically related files forms a database. It usually contains quite a few files holding data, which can be accessed by many users.
In a student file, for example, roll no, name, sex and address are the field names. Each field reserves some space for storage of the respective data. For example, Roll No has a 7-byte storage space, Name has 30 bytes of storage and so on. The Roll No field holds the data items 9501101, 9501105 and 9501112 as roll numbers of students. ARUN GOKUL, RAJESH KUMAR etc. are data items in the name field. Each line of fields relates to an entity: student. Attributes of the student entity such as roll no, sex and address become the field names.
Data fields hold the basic elements of data. All attributes of an entity taken together form a record. When such related records are put together, that collection is called a file. Record design can be logical or physical. Logical design represents the logical relationship among the data items in the fields. Physical record design means the way data items are physically stored on media like disk and tape.
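
The physical record design described above can be sketched as a fixed-length byte layout. The following is only an illustration: the 7-byte Roll No and 30-byte Name widths come from the text, while the widths chosen for Sex (1 byte) and Address (40 bytes), the pairing of roll numbers with names, and the placeholder field values are assumptions.

```python
import struct

# Field widths in bytes. Roll No (7) and Name (30) come from the text;
# the widths for Sex (1) and Address (40) are assumptions for illustration.
RECORD_FORMAT = "7s30s1s40s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # total bytes reserved per record


def pack_student(roll_no, name, sex, address):
    """Encode one student record as a fixed-length byte string."""
    fields = (roll_no, name, sex, address)
    # struct pads short values with zero bytes up to the field width.
    return struct.pack(RECORD_FORMAT, *(f.encode("ascii") for f in fields))


def unpack_student(raw):
    """Decode a fixed-length record back into a tuple of field values."""
    return tuple(
        field.rstrip(b"\x00").decode("ascii")
        for field in struct.unpack(RECORD_FORMAT, raw)
    )


# A file is simply the records written one after another.
# The sex and address values below are placeholders, not data from the text.
students_file = b"".join([
    pack_student("9501101", "ARUN GOKUL", "M", "address not given"),
    pack_student("9501105", "RAJESH KUMAR", "M", "address not given"),
])

# Because every record has the same size, the second record starts
# exactly RECORD_SIZE bytes into the file.
second = unpack_student(students_file[RECORD_SIZE:2 * RECORD_SIZE])
print(second[0], second[1])
```

Each field reserves its full width whether or not the data item fills it, which is what makes the position of any record easy to compute.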
File Organisation
File organisation means the way the records are written in a file. It depends on:
(i) File activity,
(ii) Volatility of information, and
(iii) Storage device
File activity means the proportion of records processed in one run. If only a few records are accessed in a single run, activity is low. If the file activity is low, the file can be stored on a disk device for efficient processing. On the other hand, if a good number of records are accessed in any given run, the file activity is high and such files can be stored on tape so that processing is more efficient and less costly.
File volatility means the proportion of record changes. If records are changed very frequently, the volatility is very high. For high-volatility files such as seat reservation files in a transport firm, the disk medium is more efficient and offers direct access. If only magnetic tapes are available, files are organised sequentially. Magnetic disks, on the other hand, offer more flexibility as they support both sequential access and direct access.
Other considerations in file organisation are:
(i) Response time; direct access for quick response
(ii) Cost of storage medium
(iii) Volume of storage, and
(iv) Security of data
Methods of File Organisation
1) Serial file organisation
2) Sequential organisation
3) Indexed sequential organisation
4) Direct file organisation
1. Serial file organisation
The records in a serial file are stored in no particular order and are generally appended at the end of the file as the data originate. The logical order of records with respect to a key field bears no relation to the order of physical storage of those records in the file. It is also referred to as a non-keyed sequential file.
2. Sequential file organisation
This file can be created on a magnetic tape or disk. Records are written to the tape or disk one by one, logically ordered on one or more key fields. For example, the ordering can be in ascending order of roll no in the case of a student file. The records are stored in sorted order. If new records are added or existing records are deleted, a disk file has to be re-sorted. If the file is stored on magnetic tape, a new file has to be created to bring the existing file up to date with the changes made since its creation or last update. This is done to maintain the proper sequence of the records in the file. The advantages of a sequential file are simple organisation and ease of accessing records sequentially.
To minimise the cost of update, the new records are bunched in a transaction file and the master file (that is, the original file, which is relatively permanent) is updated in a single run, leading to the creation of a new master file. This file update is called grandfather-father-son update, as there will be three files at any time.
3. Indexed-sequential file organisation
An index is a combination of the key and the storage address of records. This file organisation creates an index file in addition to the data file. The index file holds pairs of key and storage address for the records in the data file. The index file helps in randomly locating records in the data file, as the physical storage location of the record is obtained from the index file. This file organisation supports both sequential access and random access of records in the file.
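
The idea of an index holding pairs of key and storage address can be sketched as follows. This is a minimal illustration rather than a real indexed-sequential implementation: the in-memory "file", the 24-byte record size and the record layout are all assumptions made for the example.

```python
import io

# Hypothetical data file: fixed-length records ordered on roll number.
RECORD_SIZE = 24  # assumed record length in bytes
records = [("9501101", "ARUN GOKUL"), ("9501105", "RAJESH KUMAR")]

data_file = io.BytesIO()  # stands in for the data file on disk
index = {}                # stands in for the index file: key -> byte offset

for roll_no, name in records:
    index[roll_no] = data_file.tell()  # remember where this record starts
    data_file.write(f"{roll_no} {name}".ljust(RECORD_SIZE).encode("ascii"))


def read_random(key):
    """Random access: look up the address in the index, then go straight to it."""
    data_file.seek(index[key])
    return data_file.read(RECORD_SIZE).decode("ascii").rstrip()


def read_sequential():
    """Sequential access: read the records one after another in stored order."""
    data_file.seek(0)
    while chunk := data_file.read(RECORD_SIZE):
        yield chunk.decode("ascii").rstrip()


print(read_random("9501105"))
print(list(read_sequential()))
```

The same data file serves both access modes; only the random path consults the index.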
4. Direct file organisation
These files are created on disks or CD-ROMs. In direct file organisation a hashing technique is used to generate the storage address of records in the file. There are quite a number of ways of converting a key (such as roll no for a student file, or product code for an inventory file) to a numeric value. The keys may be numeric, alphabetic or alphanumeric. In the case of alphabetic and alphanumeric keys, a numeric key value has to be generated. Direct mapping is done by performing some arithmetic manipulation of the key value, called hashing. The hashing function, h(k), generates a value for each key, which is used as the address of the storage location.
A direct file supports direct access of records and minimises their access time. The records need not be sorted before storage as in an indexed-sequential file.
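
A hashing function h(k) of the kind described above might look like the sketch below. The text does not say which arithmetic manipulation is used, so the division-remainder method, the number of storage locations (11) and the use of small lists to hold keys that hash to the same address are all assumptions chosen for illustration.

```python
NUM_LOCATIONS = 11  # assumed number of storage locations


def h(key):
    """Convert a numeric, alphabetic or alphanumeric key to a storage address."""
    # Step 1: generate a numeric value from the characters of the key.
    numeric = int.from_bytes(str(key).encode("ascii"), "big")
    # Step 2: division-remainder hashing; the remainder is the address.
    return numeric % NUM_LOCATIONS


# Each storage location holds a short list, so that two keys that happen
# to hash to the same address can both be kept.
storage = [[] for _ in range(NUM_LOCATIONS)]


def store(key, record):
    """Place the record at the address computed from its key."""
    storage[h(key)].append((key, record))


def fetch(key):
    """Go directly to the computed address and return the matching record."""
    for stored_key, record in storage[h(key)]:
        if stored_key == key:
            return record
    return None


store("9501101", "ARUN GOKUL")   # numeric key (roll no)
store("P-100", "sample product")  # alphanumeric key (hypothetical product code)
print(fetch("9501101"), fetch("P-100"))
```

No sorting and no index lookup are needed: the address is recomputed from the key every time the record is wanted.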
Modes of File Access
The computer file can be accessed in three modes: sequential, random and dynamic.
1. Sequential Access
To access a record sequentially, the file has to be read from the beginning, that is record 1, record 2, and so on until the required record is reached. The access time of a single record depends on where in the file the record is stored. That is, if it is the first record in the file, it takes much less time to access than a record that is at the end of the file.
2. Random Access
This method takes the same time to access a record wherever it is physically located in the file. The storage location of the record is obtained by converting the key value of the record into its numeric location address by a hash function. Then the record is located directly.
3. Dynamic Access
This mode combines both sequential and random modes of access. At times, it may be required to start sequential access from a given record only. For example, a file holds 2000 records and the records numbered 1220 to 1250 are to be accessed for processing. In this case, it is better to locate record number 1220 randomly and access the remaining records in sequential mode.
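
The dynamic access example above (records 1220 to 1250 out of 2000) can be sketched as one direct positioning step followed by sequential reads. The fixed 16-byte record size and the record contents are assumptions, and the starting address is computed from the record number rather than by a hash function, purely to keep the sketch short.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes

# Hypothetical file of 2000 records numbered 1 to 2000.
file_bytes = b"".join(
    f"record {n}".ljust(RECORD_SIZE).encode("ascii") for n in range(1, 2001)
)


def read_range(f, first, last):
    """Locate record `first` directly, then read sequentially up to `last`."""
    f.seek((first - 1) * RECORD_SIZE)      # random step: jump to the first record
    for _ in range(first, last + 1):       # sequential step: read one by one
        yield f.read(RECORD_SIZE).decode("ascii").rstrip()


wanted = list(read_range(io.BytesIO(file_bytes), 1220, 1250))
print(len(wanted), wanted[0], wanted[-1])
```

The 1219 records before the starting point are never read, which is the saving that dynamic access offers over purely sequential access.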
File Updating
Updating of files means making the file current by incorporating changes to the records held in it or adding new records to it. If data are very large or are likely to change only occasionally, such data are held in a master file. Master files are relatively permanent and are used for referring to the data therein when required. Data arising out of day-to-day transactions change very often and are, therefore, held in a temporary file called a transaction file.
The master files have to be made current by incorporating the changes in data. This process is called file updating. There are three ways in which these changes are effected: addition of a record to the master file, deletion of a record from it, and modification of a record held in it.
Methods of Updating Sequential file
Sequential files can be updated in two ways: direct updating and grandfather-father-son updating.
Direct updating
In direct updating, the data are processed online and files are updated directly, that is, no backup files are maintained. Direct updating keeps all files up to date and enables real-time response. It saves disk space, as transaction files are not opened for temporary storage of data. But it is very difficult to recreate a file if it is corrupted or deleted accidentally. Deletion of records is also not possible. For direct updating, the data must be stored in random access files. Examples of random access storage devices are magnetic disks, magnetic drums and CD-ROMs.
Grandfather-father-son update
In this method two files are used as input files and they result in the creation of a new updated master file. The two input files are the master file requiring updating and the transaction file containing the transaction data of the period. Both files must be sorted in the same order on the same key before updating starts.
Updating Process
Both the master file and the transaction file are read.
(1) The keys are then compared.
(2) If the master file key is less than the transaction file key, no change is required. The record is copied to the new master file.
(3) If the master file key is equal to the transaction file key, then the record is to be either deleted or modified.
(4) If the master file key is greater than the transaction file key, then the transaction file record is new and is therefore to be copied to the new master file.
(5) Three generations of files are always maintained. Hence the name grandfather-father-son update.
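
The key-comparison steps above can be sketched as a single pass over two sorted lists. The record layout, the action codes "M" (modify) and "D" (delete), and the restriction to at most one transaction per key are assumptions made for this illustration; the text does not specify a transaction format.

```python
def update_master(master, transactions):
    """Merge a sorted master file with a sorted transaction file.

    master:        list of (key, data), sorted on key
    transactions:  list of (key, action, data), sorted on key; the action
                   codes "M" (modify) and "D" (delete) are assumed here
    Returns a new master file and leaves both input files untouched.
    """
    new_master = []
    m = t = 0
    while m < len(master) and t < len(transactions):
        m_key, m_data = master[m]
        t_key, action, t_data = transactions[t]
        if m_key < t_key:
            # No change required: copy the master record across.
            new_master.append((m_key, m_data))
            m += 1
        elif m_key == t_key:
            # Matching keys: the record is either deleted or modified.
            if action != "D":
                new_master.append((m_key, t_data))
            m += 1
            t += 1
        else:
            # Transaction key not in the master: it is a new record.
            new_master.append((t_key, t_data))
            t += 1
    new_master.extend(master[m:])  # unchanged records left at the end
    new_master.extend((key, data) for key, _, data in transactions[t:])  # new
    return new_master


old_master = [(1, "a"), (3, "c"), (5, "e")]
transaction_file = [(2, "A", "b"), (3, "M", "C"), (5, "D", None), (7, "A", "g")]
new_master = update_master(old_master, transaction_file)
print(new_master)
```

Because the old master is never overwritten, it survives as the "father" of the new file; keeping the master from the run before that as well gives the three generations.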
Indexed File Updating
An indexed file has random access capability, so indexed files allow direct updating. Whenever any change in data takes place, the particular record is randomly accessed and updated. The disadvantage of direct updating is that no backup files are maintained and it may be difficult to undo changes once made.
Indexed or indexed-sequential file organisation keeps, in addition to the data files, an index or table that lists the address of records on disk (namely, track and sector number) according to the contents of the key field. The key chosen must be able to identify a record uniquely. Any record in the file can be read at any time. Updating is easier in the case of indexed files, as only those records requiring modification need be read and modified. An indexed file is highly suitable where quick response is required; for example, airline reservation or railway reservation requires direct updating.
Database System
A database is a set of logically connected data files that have common access methods between them. It stores transaction data. It does not contain any input or output data. The input data may cause a change to operational data but are not part of the database. Similarly, the output data are the reports or query responses from the system. The input data and output data are transient and are not stored in the database.
The database system gives centralised control over the database resources. The advantages of centralised control over the data are [1]:
- Redundancy can be reduced,
- Inconsistency can be avoided,
- The data can be shared,
- Standards can be enforced,
- Security restrictions can be applied, and
- Integrity can be maintained.
The concept of information resource management (IRM) calls for treating information as an organisational resource. In the traditional file management system, applications owned their own data and did not share it with other applications. Each application defined its data, created its file structure and stored the data conveniently to be accessed by its application program. Thus applications like payroll, inventory management etc. owned their own data. Several applications stored the same data item in many files. This caused a lot of duplication in data storage and consequent data inconsistency, as the related files were not updated simultaneously. Often application programs had to be modified to use data files of other applications.
A database is a centrally controlled, integrated collection of logically organised data. The central control ensures data sharing among applications and enforces database security procedures. The data items in the database are logically related, and this helps in integration of the database.
Advantages of Database Systems
The database system approach has the following advantages:
Data independence
The data are logically designed into databases and are independent of applications. Since the data are program-independent, any application can use them without any modification to the code.
Data shareability
The database permits simultaneous multiple access. Thus, multiple users can share the same data.
Data integrity
Access to the database is controlled by the database management system. The system authorises personnel for entering, editing and deleting data. It also authorises people to access data for various data processing activities. Since the database stores each data item in only one place and updates it with fresh transaction data automatically, there is little chance of inconsistency in the database.
Data availability
The database is centrally controlled and access to data is permitted through an authorisation scheme. The data resources are therefore available to the users in the organisation, subject to the authorisation procedure.
Data evolvability
The database is flexible and can store a huge quantity of data. It can evolve as the number of applications and queries increases to meet their data requirements.
Components of Database System
The common database components are:
Database files
The database files store the transaction data.
DBMS
It is a set of programs that manages the database. It performs a number of tasks like controlling access to the database, making security checks etc.
Host level language interface system
This system interacts with application programs and interprets their data requests, which are issued in a high-level language.
Natural language interface
The DBMS needs to process queries and data requests issued to it in natural, English-like languages. The natural language interface interprets the queries and requests made in natural language. It also facilitates managerial interaction with the database for decision support applications.
Application programs
The application programs request data from the database. Data independence permits the applications to use the data for a variety of purposes.
Data Dictionary
The data dictionary contains the schema of the database. It defines each data item in the database and lists its structure, source, the person authorised to modify it etc.
Report generator
The system generates output for users in the form of query responses or reports. It might also produce documents like invoices and process ad-hoc queries and special report requests.
Users of Database Systems
There are three broad classes of users for organisational database systems. They are:
1. Application programmers, who write application programs that manipulate the data in the database.
2. End-users, who access the database by invoking application programs or through a structured query language, and
3. Database Administrator, who is responsible for planning, designing, creating and maintaining the database.
Database Management System (DBMS)
A DBMS is a set of system programs that manages the entire database. It controls access to files. It updates files and retrieves data from them on request by applications for processing. The DBMS maintains the database by adding, deleting and modifying records. It permits multiple users to access the same files simultaneously. It acts as an interface between the application programs and the data in the database. If the user wants some data from the database, the DBMS processes the request, locates the data in the database and displays them for the user. In the traditional file management system, the user needs to specify both the data and its storage location. A DBMS requires the database to be stored on direct access storage devices.
A DBMS is general-purpose system software. It works in conjunction with the operating system to create, process, store, retrieve, control and manage data. Its tasks include defining, constructing and manipulating the database for applications. Defining a database involves specifying data types, data structures, storage constraints etc. Constructing a database means storing the data on a storage medium under the control of the DBMS. Database manipulation includes merging databases, generating reports, processing queries etc.
The three main components of a DBMS are the data definition language, the data manipulation language and the data dictionary.
Data Definition Language
The contents of the database are created using the data definition language. It defines relationships between different data elements and serves as an interface for application programs that use the data.
Data Manipulation Language
Data is processed and updated using a language called the data manipulation language. It allows a user to query the database and receive summary or customised reports. The data manipulation language is usually integrated with other programming languages, many of which are 3GLs or 4GLs.
Each database package has its own query language with unique rules and instruction formats. Hence there is no universal query language. A query language is used to access the data for report generation, query processing and other data processing activities.
Structured Query Language (SQL) is a non-procedural language that deals with data, data integrity, data manipulation, data access, data retrieval, data query and data security. Most DBMS packages use some version of SQL, whose primary purpose is to allow users to query a database and generate ad-hoc reports that provide customised information.
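
To make the distinction between data definition and data manipulation concrete, the sketch below issues a few SQL statements through Python's built-in sqlite3 module. SQLite is used only because it needs no installation; the table name, column names and course values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database

# Data definition: describe the structure of the data.
conn.execute(
    "CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT, course TEXT)"
)

# Data manipulation: add and update records.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        (9501101, "ARUN GOKUL", "BCA"),      # course values are hypothetical
        (9501105, "RAJESH KUMAR", "BCA"),
        (9501112, "THIRD STUDENT", "MCA"),   # placeholder name
    ],
)
conn.execute("UPDATE students SET course = ? WHERE rollno = ?", ("MCA", 9501105))

# An ad-hoc query producing a small summary report.
report = conn.execute(
    "SELECT course, COUNT(*) FROM students GROUP BY course ORDER BY course"
).fetchall()
print(report)
```

The query states what is wanted (the number of students per course) and leaves the DBMS to decide how to find it, which is what is meant by a non-procedural language.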
Data Dictionary
The data dictionary is an electronic document that contains the data definition and data use for every data type in the database. It describes the data and its characteristics such as its location, size and type. It identifies the origin, use and ownership of the data and the methods of accessing and securing it. The DBMS uses the data dictionary to store all details of data such as data definition, data storage, data use and access privileges.
Database Administrator (DBA)
Organisations that implement database systems constitute a function called database administration to supervise the organisational database resources. The database administrator supervises the database administration function. The job of the database administrator is to plan, design, create, modify and maintain the database of the organisation with special emphasis on security and data integrity. He is not much concerned with the details of the application programs that access the database for data. He maintains the schema and the data dictionary. Any change in the form of a data item, its creation etc. can only be done by the database administrator.
His specific responsibilities include:
- Guiding the initial design of the database, and later developing and extending it to meet growing organisational requirements.
- Establishing the database and monitoring the use of it.
- Deciding on the content of the database. He has to see that the relevant data are collected and stored in the database.
- Establishing and monitoring database control and security policies and procedures.
- Servicing database users by educating and training them in the use of the database.
Disadvantages of Database
The following are some of the disadvantages of database:
Higher data processing costs
The database system causes higher data processing costs. This is due to the strict and elaborate procedures for data access, updating and processing.
Increased hardware and software costs
It requires more direct access memory capacity, greater communication capability (including communication software) and additional processing power. This increases the hardware and software costs.
Data insecurity and integrity
Most of the security and integrity problems are related to the fact that many users have access rights to the database. Elaborate security systems are implemented to protect the database and to prevent unauthorised access.
Insufficient database expertise
Database technology is complex. Most organisations do not have enough personnel with the necessary expertise to implement and manage database systems.
Database Architecture
The purpose of a database is to facilitate huge storage and quick retrieval of data. There are three basic ways of organising data in a database. They are the hierarchical, network and relational structures.
Hierarchical Structure
The relationships between records form a hierarchy. The records or aggregates of data are logically conceived to be stored at different levels of the hierarchy. The structure looks like a tree turned upside down, with its branches pointing downwards. The relation between entities is structured in such a way that each data item is linked with only one data item at the higher level. In a hierarchical database, the relationship between records is one of parent-child. One record can be linked to only one record at the higher level. Data stored in a lower-level node (child record) can be accessed only through the higher-level node (parent record).
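
The parent-child rule can be illustrated with a very small sketch in which every record type names exactly one parent. The record type names below are hypothetical and are not taken from the text.

```python
# Each record type is linked to only one record type at the higher level.
# The root of the hierarchy has no parent. All names are hypothetical.
parent_of = {
    "Department": None,
    "Course": "Department",
    "Student": "Course",
}


def access_path(record_type):
    """Return the chain of parents that must be traversed to reach a record."""
    path = []
    while record_type is not None:
        path.append(record_type)
        record_type = parent_of[record_type]
    return list(reversed(path))  # from the root down to the requested record


print(access_path("Student"))
```

Reaching a child record always means starting at the top and working down through its single line of parents.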
Network Structure
This structure can represent more complex logical relationships. It permits multiple relations between data items. One entity can be linked to any number of other types of entities. That is, it allows many-to-many relationships among records. Any data element can be related to any number of other data elements.
Relational Structure
The relational structure is the most recent of these three structures. All data elements stored in the database are conceived to be stored in tables. Different data tables are linked using a common type of data item in the different tables. The table is called a relation; the columns of the table are called domains and the rows are called tuples. A tuple contains the values of the data items, called data elements, of an entity.
Data Mining and Data Warehousing
Large organisations have a huge quantity of data in their databases, and it is still growing. Until recently, business computing technologies concentrated on data capture, storage and retrieval. But the need to interpret and find patterns in the huge data is growing, and computing technologies are now making it possible. Data mining is the focus of the new class of technologies being developed to help business find meaning in data lying idle. Data mining helps in drawing inferences from the data and in understanding the customer, products and markets better.
Data mining employs a host of techniques; some are very old, like the statistical techniques including linear programming, and others are recently developed and are known as data analysis, machine learning, online analytical processing etc. These techniques help in discovering new patterns in data.
Huge databases have created the need for data warehousing. Data warehousing means organising large amounts of data and making them available company-wide to users. Data warehousing is an integral part of data mining. The quality and quantity of data available for data mining is a function of data warehousing. Data mining helps in identifying the preferences of customer groups and deciding on promotional material to influence their buying habits. The information can be used in product development, product customisation and target marketing. Data mining represents a new trend in the use of information technology. The focus has shifted from data storage and retrieval to data analysis for making inferences.
Relational Database Management System (RDBMS)
A DBMS that is based on the relational model is called an RDBMS. The relational model is the most successful of the three models. Designed by E.F. Codd, the relational model is based on the theory of sets and relations of mathematics.
The relational model represents data in the form of a table. A table is a two-dimensional array containing rows and columns. Each row contains data related to an entity such as a student. Each column contains the data related to a single attribute of the entity such as student name.
One of the reasons behind the success of the relational model is its simplicity. It is easy to understand the data and easy to manipulate.
Another important advantage of the relational model, compared with the other two models, is that it doesn't bind data to the relationships between data items. Instead it allows you to have dynamic relationships between entities using the values of the columns.
Almost all database systems sold in the market nowadays have either a complete or a partial implementation of the relational model.
Figure 1 shows how data is represented in the relational model and the terms used to refer to the various components of a table.
The following are the terms used in the relational model.
Tuple / Row
A single row in the table is called a tuple. Each row represents the data of a single entity.
Attribute / Column
A column stores an attribute of the entity. For example, if details of students are stored then student name is an attribute; course is another attribute and so on.
Column Name
Each column in the table is given a name. This name is used to refer to a value in the column.
Table Name
Each table is given a name. This is used to refer to the table. The name depicts the content of the table.
The following are two other terms, primary key and foreign key, that are very important in the relational model.
Primary Key
A table contains the data related to entities. If you take the STUDENTS table, it contains data related to students. For each student there will be one row in the table. Each student's data in the table must be uniquely identified. In order to identify each entity uniquely in the table, we use a column in the table. That column, which is used to uniquely identify entities (students) in the table, is called the primary key.
In the case of the STUDENTS table (see figure 1) we can use ROLLNO as the primary key as it is not duplicated.
So a primary key can be defined as a set of columns used to uniquely identify rows of a table.
Some other examples of primary keys are the account number in a bank, the product code of products and the employee number of an employee.
Composite Primary Key
In some tables a single column cannot be used to uniquely identify entities (rows). In that case we have to use two or more columns to uniquely identify rows of the table. When a primary key contains two or more columns it is called a composite primary key.
In figure 2, we have the PAYMENTS table, which contains the details of payments made by the students. Each row in the table contains the roll number of the student, the payment date and the amount paid. None of the columns can uniquely identify rows on its own. So we have to combine ROLLNO and DP to uniquely identify rows in the table. As the primary key consists of two columns it is called a composite primary key.
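
A composite primary key on ROLLNO and DP can be tried out with Python's built-in sqlite3 module, as in the sketch below. The column types, the reading of DP as the date of payment, and all the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE payments (
        rollno INTEGER NOT NULL,
        dp     TEXT    NOT NULL,   -- assumed to be the date of payment
        amount INTEGER NOT NULL,
        PRIMARY KEY (rollno, dp)   -- composite primary key
    )
    """
)

# Neither column is unique on its own (hypothetical sample rows):
conn.execute("INSERT INTO payments VALUES (1, '2001-01-10', 1500)")
conn.execute("INSERT INTO payments VALUES (1, '2001-02-10', 1500)")  # same student
conn.execute("INSERT INTO payments VALUES (2, '2001-01-10', 2000)")  # same date

# The combination must be unique, so this row is refused.
try:
    conn.execute("INSERT INTO payments VALUES (1, '2001-01-10', 500)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]
print(duplicate_rejected, row_count)
```

One consequence of this design is that a student cannot record two payments on the same date; a table that must allow that would need a different key.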
Figure 2: Composite Primary Key
Foreign Key
In the relational model, we often store data in different tables and put them together to get complete information. For example, in the PAYMENTS table we have only the ROLLNO of the student. To get the remaining information about the student we have to use the STUDENTS table. The roll number in the PAYMENTS table can be used to obtain the remaining information about the student.
The relationship between the entities student and payment is one-to-many. One student may make payments many times. As we already have the ROLLNO column in the PAYMENTS table, it is possible to join it with the STUDENTS table and get information about the parent entity (student).
The roll number column of the PAYMENTS table is called a foreign key as it is used to join the PAYMENTS table with the STUDENTS table. So the foreign key is the key on the many side of the relationship.
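
The join between PAYMENTS and STUDENTS described above can be sketched with Python's built-in sqlite3 module. The column types and sample rows are assumptions; note also that SQLite checks foreign keys only after the `PRAGMA foreign_keys = ON` statement, which is a property of SQLite and not of the relational model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE payments (
        rollno INTEGER NOT NULL REFERENCES students (rollno),  -- foreign key
        dp     TEXT    NOT NULL,
        amount INTEGER NOT NULL,
        PRIMARY KEY (rollno, dp)
    )
    """
)

# One student (the parent) with many payments (the children); values are hypothetical.
conn.execute("INSERT INTO students VALUES (1, 'ARUN GOKUL')")
conn.execute("INSERT INTO payments VALUES (1, '2001-01-10', 1500)")
conn.execute("INSERT INTO payments VALUES (1, '2001-02-10', 1500)")

# Join on the foreign key to get the student's name for each payment.
joined = conn.execute(
    """
    SELECT s.name, p.dp, p.amount
    FROM payments AS p
    JOIN students AS s ON s.rollno = p.rollno
    ORDER BY p.dp
    """
).fetchall()

# A payment for a roll number that is not in STUDENTS is refused.
try:
    conn.execute("INSERT INTO payments VALUES (99, '2001-03-10', 100)")
    unknown_student_rejected = False
except sqlite3.IntegrityError:
    unknown_student_rejected = True

print(joined, unknown_student_rejected)
```

Declaring the foreign key lets the DBMS itself guarantee that every payment row has a matching student row.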
Figure 3: Foreign Key
The ROLLNO column of the PAYMENTS table must derive its values from the ROLLNO column of the STUDENTS table.
When a child table contains a row that doesn't refer to a corresponding parent key, it is called an orphan record. We must not have orphan records, as they are the result of a lack of data integrity.
Integrity Rules
Data integrity must be maintained at all costs. If data loses its integrity, it becomes garbage, so every effort must be made to ensure that data integrity is maintained. The following are the main integrity rules to be followed.
Domain integrity
Data is said to have domain integrity when the value of a column is derived from its domain. A domain is the collection of potential values. For example, the column date of joining must hold a valid date. All valid dates form one domain. If the value of date of joining is an invalid date, then it is said to violate domain integrity.
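One way a database can guard a domain is with a CHECK constraint. The sketch below is only an illustration using Python's built-in sqlite3 module: the DOJ column name (date of joining), the table layout and the sample values are assumptions, and the check relies on SQLite's date() function returning NULL for text that is not a valid date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STUDENTS (
        ROLLNO INTEGER PRIMARY KEY,
        NAME   TEXT NOT NULL,
        -- DOJ (date of joining) must round-trip through SQLite's date():
        -- date() yields NULL for text that is not a valid date.
        DOJ    TEXT NOT NULL CHECK (DOJ IS date(DOJ))
    )
    """
)

def try_insert(rollno, name, doj):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute("INSERT INTO STUDENTS VALUES (?, ?, ?)", (rollno, name, doj))
        return True
    except sqlite3.IntegrityError:
        return False

valid_ok = try_insert(1, "Asha", "2019-06-15")      # inside the domain
invalid_ok = try_insert(2, "Ravi", "2019-13-01")    # month 13 is outside it
```

Other database systems offer a dedicated DATE column type, which enforces the same domain without an explicit check.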
Entity integrity
This specifies that all values in the primary key must be not null and unique. Each entity that is stored in the table must be uniquely identified. Every table must contain a primary key, and the primary key must be not null and unique.
Referential Integrity
This specifies that a foreign key must either be null or have a value that is derived from the corresponding parent key. For example, if we have a table called BATCHES, then the ROLLNO column of that table references the ROLLNO column of the STUDENTS table. All the values of the ROLLNO column of the BATCHES table must be derived from the ROLLNO column of the STUDENTS table, because no student who is not part of the STUDENTS table can join a batch.
Relational Algebra
A set of operators used to perform operations on tables is called relational algebra. Operators in relational algebra take one or more tables as parameters and produce one table as the result.
The following are the operators in relational algebra:
Union
Intersect
Difference or minus
Project
Select
Join

Union
This takes two tables and returns all rows that belong to either the first or the second table (or both). See figure 4.
Figure 4: Union, Intersect and Minus
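The three set operators named in figure 4 correspond to SQL's UNION, INTERSECT and EXCEPT. Here is a minimal sketch using Python's built-in sqlite3 module, assuming two hypothetical single-column tables BATCH_A and BATCH_B with made-up roll numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BATCH_A (ROLLNO INTEGER)")
conn.execute("CREATE TABLE BATCH_B (ROLLNO INTEGER)")
conn.executemany("INSERT INTO BATCH_A VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO BATCH_B VALUES (?)", [(3,), (4,)])

def rollnos(sql):
    """Run a query and return the ROLLNO values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# Rows in either table (or both).
union = rollnos("SELECT ROLLNO FROM BATCH_A UNION SELECT ROLLNO FROM BATCH_B")
# Rows present in both tables.
intersect = rollnos("SELECT ROLLNO FROM BATCH_A INTERSECT SELECT ROLLNO FROM BATCH_B")
# Rows in the first table but not the second.
# SQLite spells the difference operator EXCEPT; Oracle calls it MINUS.
minus = rollnos("SELECT ROLLNO FROM BATCH_A EXCEPT SELECT ROLLNO FROM BATCH_B")
```

Each query takes two tables and produces one table, as every relational algebra operator does.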
Intersect
This takes two tables and returns all rows that belong to both the first and the second table. See figure 4.
Difference or Minus
This takes two tables and returns all rows that exist in the first table and not in the second table. See figure 4.
Project
This takes a single table and returns a vertical subset of the table, that is, only the chosen columns. See figure 5.
Select
This takes a single table and returns a horizontal subset of the table. That means it returns only those rows that satisfy the condition. See figure 5.
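In SQL, project corresponds to the column list of a SELECT statement and select corresponds to its WHERE clause. Here is a small sketch using Python's built-in sqlite3 module; the NAME and CITY columns and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENTS (ROLLNO INTEGER PRIMARY KEY, NAME TEXT, CITY TEXT)")
conn.executemany(
    "INSERT INTO STUDENTS VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi"), (3, "Meena", "Pune")],
)

# Project: keep every row but only some columns (vertical subset).
projected = conn.execute("SELECT ROLLNO, NAME FROM STUDENTS ORDER BY ROLLNO").fetchall()

# Select: keep every column but only rows meeting the condition (horizontal subset).
selected = conn.execute(
    "SELECT * FROM STUDENTS WHERE CITY = ? ORDER BY ROLLNO", ("Pune",)
).fetchall()
```

Note the naming clash: the relational algebra operator called select is the WHERE clause, not the SQL keyword SELECT.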
Figure 5: Project, Select and Join
Join
Rows of two tables are combined based on the values of the given column(s). The tables being joined must have a common column. See figure 5.
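To make the operator itself concrete, here is a deliberately naive sketch of a join written in plain Python, with each table represented as a list of dictionaries. The `join` helper and the sample rows are hypothetical; a real DBMS uses far more efficient strategies.

```python
def join(left, right, column):
    """Combine rows of two tables whose values in `column` are equal."""
    result = []
    for l_row in left:
        for r_row in right:
            if l_row[column] == r_row[column]:
                # Merge the two rows; the common column appears once.
                result.append({**l_row, **r_row})
    return result

students = [{"ROLLNO": 1, "NAME": "Asha"}, {"ROLLNO": 2, "NAME": "Ravi"}]
payments = [
    {"ROLLNO": 1, "AMOUNT": 500},
    {"ROLLNO": 1, "AMOUNT": 300},
]

joined = join(students, payments, "ROLLNO")
```

The student with ROLLNO 2 has no payment row, so no combined row is produced for that student.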
Structured Query Language (SQL)
Almost all relational database management systems use SQL (Structured Query Language) for data manipulation and retrieval. SQL is the standard language for relational database systems. SQL is a non-procedural language, in which you concentrate on what you want, not on how to get it. Put another way, you need not be concerned with procedural details.
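The difference between the two styles can be seen by computing the same total twice. This is only a sketch using Python's built-in sqlite3 module, with a hypothetical two-column PAYMENTS table and made-up amounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PAYMENTS (ROLLNO INTEGER, AMOUNT INTEGER)")
rows = [(1, 500), (2, 300), (1, 200)]
conn.executemany("INSERT INTO PAYMENTS VALUES (?, ?)", rows)

# Procedural: spell out how to walk the rows and accumulate the total.
total_procedural = 0
for rollno, amount in rows:
    if rollno == 1:
        total_procedural += amount

# Non-procedural: state what is wanted; the DBMS decides how to compute it.
total_sql = conn.execute(
    "SELECT SUM(AMOUNT) FROM PAYMENTS WHERE ROLLNO = 1"
).fetchone()[0]
```

Both approaches arrive at the same number, but only the loop says anything about the order in which rows are visited.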
SQL commands are divided into four categories, depending upon what they do.
DDL (Data Definition Language)
DML (Data Manipulation Language)
DCL (Data Control Language)
Query (Retrieving data)
DDL commands are used to define the data. For example, CREATE TABLE.
DML commands, such as INSERT and DELETE, are used to manipulate data.
DCL commands are used to control access to data. For example, GRANT.
Query is used to retrieve data using SELECT.
DML and Query are also collectively called DML, and DDL and DCL are collectively called DDL.
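The categories can be seen side by side in a short sketch. It uses Python's built-in sqlite3 module with a hypothetical STUDENTS table. SQLite has no user accounts and therefore no GRANT, so the DCL statement appears only as a comment, and `some_user` is a placeholder name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the data.
conn.execute("CREATE TABLE STUDENTS (ROLLNO INTEGER PRIMARY KEY, NAME TEXT NOT NULL)")

# DML: manipulate the data.
conn.execute("INSERT INTO STUDENTS VALUES (1, 'Asha')")
conn.execute("INSERT INTO STUDENTS VALUES (2, 'Ravi')")
conn.execute("UPDATE STUDENTS SET NAME = 'Ravi Kumar' WHERE ROLLNO = 2")
conn.execute("DELETE FROM STUDENTS WHERE ROLLNO = 1")

# Query: retrieve the data.
remaining = conn.execute("SELECT ROLLNO, NAME FROM STUDENTS").fetchall()

# DCL is not executed here because SQLite does not implement GRANT.
# On a multi-user server the statement would look something like:
#   GRANT SELECT ON STUDENTS TO some_user;
```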
Data processing Methods
Data that is stored is processed in three different ways. Processing data means retrieving data and deriving information from it. Depending upon where and how it is done, there are three methods.
Centralized data processing
Decentralized data processing
Distributed data processing

Centralized data processing
In this method the entire data is stored in one place and processed there itself. A mainframe is the best example of this kind of processing. All programs, invoked from clients (dumb terminals), are executed on the mainframe, and the data is also stored on the mainframe.
Figure 6: Centralized data processing.
As you can see in figure 6, all terminals are attached to the mainframe. Terminals do not have any processing ability. They only take input from users and display output to users.
Decentralized data processing
In this method data is processed at various places. A typical example is each department having its own system for its own data processing needs. See figure 7 for an example of decentralized data processing. Each department stores the data related to itself and runs all the programs that process its data. The biggest drawback of this type of data processing is that data has to be duplicated. Because there is no means to store common data in one place and access it from all machines, common data has to be stored on each machine; this is called redundancy. Redundancy causes data inconsistency, which means the data stored by two departments may not agree with each other.
Figure 7: Decentralized Data Processing
Distributed Data Processing (Client/Server)
In this data processing method, the processing is distributed between client and server. The server takes care of managing data. The client interacts with the user. For example, consider a process where we need to draw a graph showing the number of students in a given month for each subject. The following steps take place:
Figure 8: Distributed data processing.
1. First, the client interacts with the user, takes the input (month name) from the user and passes it to the server.
2. The server then queries the database for the data related to the month that was sent to it, and sends the data back to the client.
3. The client then uses the data retrieved from the database to draw a graph.
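The three steps can be imitated in a single Python program, with one function standing in for the server and another for the client. This is a sketch under loud assumptions: the ADMISSIONS table, its columns, the sample figures and both function names are invented, and a plain function call replaces the network connection a real client/server system would use.

```python
import sqlite3

# --- Server side: owns the database and answers queries. ---
_db = sqlite3.connect(":memory:")
_db.execute("CREATE TABLE ADMISSIONS (SUBJECT TEXT, MONTH TEXT, STUDENTS INTEGER)")
_db.executemany(
    "INSERT INTO ADMISSIONS VALUES (?, ?, ?)",
    [("Oracle", "January", 12), ("Java", "January", 7), ("Oracle", "February", 9)],
)

def server_handle(month):
    """Step 2: query the database for the month and return the rows."""
    sql = "SELECT SUBJECT, STUDENTS FROM ADMISSIONS WHERE MONTH = ? ORDER BY SUBJECT"
    return _db.execute(sql, (month,)).fetchall()

# --- Client side: talks to the user and presents the result. ---
def client_draw_graph(month):
    """Steps 1 and 3: pass the month to the server, then draw a text bar chart."""
    rows = server_handle(month)  # a network call in a real deployment
    return [f"{subject:<8}{'#' * count}" for subject, count in rows]

graph = client_draw_graph("January")
for line in graph:
    print(line)
```

The server function never formats anything for display and the client function never touches the database, which is the division of work the steps describe.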
If you look at the above process, the client and the server participate equally in it. That is the reason this type of data processing is called distributed: the work is evenly distributed between client and server. The client is a program written in one of the front-end tools such as Visual Basic or Delphi. The server is a database management system such as Oracle or SQL Server. The language used to send commands from client to server is SQL (see figure 8).
This is also called two-tier client/server architecture. In it we have only two tiers (layers): one is the server and the other is the client.
The following is an example of 3-tier client/server architecture, where the client interacts with the user on one side and with the application server on the other. The application server, which processes and validates data, takes the request from the client and sends it on in the language understood by the database server. Application servers are generally object oriented. They expose a set of objects whose methods are invoked by the client to perform the required operation.
The application server takes some of the burden from the database server and some from the client.
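The division of labour among the three tiers can be sketched with two small Python classes. Everything here is hypothetical: the class names, the PAYMENTS layout and the validation rule are invented for illustration, and all three tiers run in one process instead of on separate machines.

```python
import sqlite3

class DatabaseServer:
    """Data tier: stores the rows and understands only SQL."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE PAYMENTS (ROLLNO INTEGER, AMOUNT INTEGER)")

    def run(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()

class PaymentService:
    """Application tier: validates requests and translates them into SQL."""

    def __init__(self, database):
        self.database = database

    def record_payment(self, rollno, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.database.run("INSERT INTO PAYMENTS VALUES (?, ?)", (rollno, amount))

    def total_paid(self, rollno):
        rows = self.database.run(
            "SELECT COALESCE(SUM(AMOUNT), 0) FROM PAYMENTS WHERE ROLLNO = ?", (rollno,)
        )
        return rows[0][0]

# Client tier: calls methods on the application object, never writes SQL itself.
service = PaymentService(DatabaseServer())
service.record_payment(1, 500)
service.record_payment(1, 200)
total = service.total_paid(1)

# An invalid request is stopped in the application tier, before any SQL is sent.
try:
    service.record_payment(1, -50)
    rejected = False
except ValueError:
    rejected = True
```

Validation lives in the middle tier, so the client stays thin and the database server never sees the bad request.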
Figure 9: 3-tier client-server architecture.
In 3-tier client/server architecture, the database server and the application server may reside on different machines or on the same machine. Since the advent of web applications we are also seeing more than three tiers, which is called n-tier architecture. For example, the following is the sequence in a typical web application.
1. The client, a web browser, sends a request to the web server.
2. The web server executes the requested page, which may be an ASP or JSP page.
3. The ASP or JSP page accesses the application server.
4. The application server then accesses the database server.
Summary
A DBMS is used to store and manipulate data. A DBMS based on the relational model is an RDBMS. A primary key is used for unique identification of rows, and a foreign key is used to join tables. Relational algebra is a collection of operators used to operate on tables. We will see how to use these operators in practice in a later chapter.
SQL is a language commonly used in RDBMSs to store and retrieve data. In my opinion, SQL is one of the most important languages if you are dealing with an RDBMS, because all data access is done using SQL.
SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert records in a database
SQL can update records in a database
SQL can delete records from a database
SQL can create new databases
SQL can create new tables in a database
SQL can create stored procedures in a database
SQL can create views in a database
SQL can set permissions on tables, procedures, and views