Lecture 16 TIM 50 Autumn 2012 - Course Web Pages · Lecture 16 TIM 50 Autumn 2012 Tuesday November...
Transcript of Lecture 16 TIM 50 Autumn 2012 - Course Web Pages · Lecture 16 TIM 50 Autumn 2012 Tuesday November...
Lecture 16TIM 50 Autumn 2012Tuesday November 20, 2012
FOUNDATIONS OF BUSINESS INTELLIGENCE:DATABASES AND
INFORMATION MANAGEMENT
Announcement
1. The grades for every assignment will be given in eCommons.2. It's important to check webpage to get the latest informationand assignments changes.3. No Office hours on Wednesday, Friday( 11/21, 11/23)
No Class on Thanks Giving Day, 11/22 Thursday
Final Exam 1st Choice December 7, Friday2nd Choice December 10, Monday depending on Schedule Permission
Format is same as MidtermCovering Up to Midterm 30‐ %
After Midterm 70+ %
The problems of managing data resources in a traditional fileenvironment
Important database design principlesThe database management systemThe capabilities and value of a database management systemTools and technologies for accessing information from databasesBusiness Intelligence, Data MiningThe role of information policy, data administration, and data quality assurance in the management of a firm’s data resources
Topics of Business intelligences
Foundation of business Intelligence
Data Base SystemInformation Management
File Management SystemFile Processing Procedure
Data Base SystemsDBMS,SQL
Intelligence from Collection of DataInformation ManagementBusiness Applications
Data Integrity ControlBusiness Data Maintenances
Division OrientedPaper File SystemsManual Processing
System InefficienciesLonger Business CycleNo Firm wise Information or Data AccessNo Data SecurityNo Decision son Integrated Data and informHigh Business Process Expenditure
Data Base SystemsRelational DBObject Oriented DBDBMS
Less RedundancyData IntegrityEfficiencyData Confidentiality
Data redundancy: Data inconsistency: Program‐data dependence:Lack of flexibilityPoor securityLack of data sharing and availability
File organization conceptsDatabase: Group of related filesFile: Group of records of same type Record: Group of related fieldsField: Group of characters as word(s) or number
Describes an entity (person, place, thing on which we store information)
OrganizingDatainaTraditionalFileEnvironment
Attribute: Each characteristic, or quality, describing entityE.g., Attributes Date or Grade belong to entity COURSE
A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.
THE DATA HIERARCHY
Old Business Process;Files maintained separately by different departments
Data redundancy: Presence of duplicate data in multiple files
Data inconsistency: Same attribute has different values
Program‐data dependence:When changes in program requires changes to data accessed by program
Lack of flexibilityPoor securityLack of data sharing and availability
Problems with the traditional file environment
The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications. Each application requires a unique data file that is likely to be a subset of the master file. These subsets of the master file lead to data redundancy and inconsistency, processing inflexibility, and wasted storage resources.
TRADITIONAL FILE PROCESSING
Business Processes with Old Data Processing
System InefficienciesLonger Business CycleNo Firm wise Information or Data AccessNo Data SecurityNo Decision on Integrated Data and informationHigh Business Process Expenditure
Database – collection of persistent data frombusiness divisions
Database Management System (DBMS) –software system that supports creation,population, and querying of a database
Introduction of Data Processing System
DatabaseServes many business applications by centralizing data and controlling redundant data across division boundaries
Database management system (DBMS)Interfaces between applications and physical data filesSeparates logical and physical views of dataSolves problems of traditional file environment
Controls redundancyEliminates inconsistencyUncouples programs and dataEnables organization to centrally manage data and data security
TheDatabaseApproachtoDataManagement
14.13
DefinitionAlthough it is difficult to give a universally agreed definitionof a database, we use the following common definition:
Definition:A database is a collection of related, logically coherent
data used by the application programs in an organization.
14.14Database architecture
DATABASE ARCHITECTURE
The American NationalStandards Institute/StandardsPlanning and RequirementsCommittee (ANSI/SPARC)has established a three-levelarchitecture for a DBMS:internal, conceptual andexternal .
14.15
HardwareThe hardware is the physical computer system that allowsaccess to data.
SoftwareThe software is the actual program that allows users toaccess, maintain and update data. In addition, the softwarecontrols which user can access which parts of the data in thedatabase.
ConfidentialityThe data in a database is stored physically on the storagedevices. In a database, data is a separate entity from thesoftware that accesses it.
14.16
Users
In a DBMS, the term users has a broad meaning. We candivide users into two categories: end users and applicationprograms.
ProceduresThe last component of a DBMS is a set of procedures orrules that should be clearly defined and followed by the usersof the database.
14.17
Advantages of databasesComparing the flat-file system, we can mention severaladvantages for a database system.
Less redundancyIn a flat-file system there is a lot of redundancy. Forexample, in the flat file system for a university, the names ofprofessors and students are stored in more than one file.
Avoidance of InconsistencyInconsistencyIf the same piece of information is stored in more than oneplace, then any changes in the data need to occur in all placesthat data is stored.
14.18
EfficiencyA database is usually more efficient that a flat file system,because a piece of information is stored in fewer locations.
Data integrityIn a database system it is easier to maintain data integrity ,because a piece of data is stored in fewer locations.
ConfidentialityIt is easier to maintain the confidentiality of the informationif the storage of data is centralized in one location.
Data integrity contains guidelines for, data retention, specifying or guaranteeing the length of time of data can be retained
Evolution of database systems
• 2000 and beyond – multi –tier, client‐server,• Distributed environments, • Web‐based, • Content‐addressable storage, data mining
DATA BASE MODEL OVERVIEW
• ER‐Model• Hierarchical Model• Network Model• Relational Model• Object‐Oriented Model(s)
ER‐Model
• Data Structures• Integrity Constraints• Operations
The ER‐Model is extremely successful as a database design model Translation algorithms to many data models Commercial database design tools, e.g., ERwin No generally accepted query
language No database system is based on the model
ER: Entry Relation
ER‐Model ‐ Integrity Constraints
A
E
E1 E3E2
RE1 E21 n
RE1 E2
RE1 E2R
cardinality: 1:n for E1:E2 in R
total participation of E2 in R
weak entity type E2; identifying relationship type R key attribute
dxp
disjointexclusion
partition
14.24
Hierarchical Database ModelIn the hierarchical model, data is organized as an invertedtree. Each entity has only one parent but can have severalchildren. At the top of the hierarchy, there is one entity,which is called the root.
An example of the hierarchical model representing a university
14.25
Network Database ModelIn the network model, the entities are organized in a graph,in which some entities can be accessed through several paths(Figure 14.4).
An example of the network model representing a university
14.26
Object-Oriented Databases(OODB)An object-oriented database tries to keep the advantages of therelational model and at the same time allows applications to accessstructured data. In an object-oriented database, objects and theirrelations are defined. In addition, each object can have attributesthat can be expressed as fields.
XMLThe query language normally used for objected-oriented databasesis XML (Extensible Markup Language). As we discussed inChapter 6, XML was originally designed to add markupinformation to text documents, but it has also found its applicationas a query language in databases. XML can represent data withnested structures.
Object‐Oriented Model
based on the object‐oriented paradigm,e.g., Simula, Smalltalk, C++, Java
object‐oriented model has object‐oriented repository model; adds persistence and database capabilities; (see ODMG‐93, ODL, OQL)
object‐oriented commercial systems include GemStone, Ontos, Orion‐2, Statice, Versant, O2
14.28
Relational Database ModelIn the relational model, data is organized in two-dimensionaltables called relations. The tables or relations are, however,related to each other, as we will see shortly.
An example of the relational model representing a university
Relational DBMS;Represent data as two‐dimensional tables called relations or files.
Each table contains data on entity and attributesTable: grid of columns and rowsRows (tuples): Records for different entitiesFields (columns): Represents attribute for entityKey field: Field used to uniquely identify each recordPrimary key: Field in table used for key fieldsForeign key: Primary key used in second table as
look‐up field to identify records from original table
In the relational database management system (RDBMS), the data isrepresented as a set of relations.
14.30
RelationsA relation appears as a two-dimensional table. The RDBMSorganizes the data so that its external view is a set ofrelations or tables. This does not mean that data is stored astables: the physical storage of the data is independent of theway in which the data is logically organized.
An example of a relation
14.31
A relation in an RDBMS has the following features:
Name. Each relation in a relational database should havea name that is unique among other relations.
Attributes. Each column in a relation is called anattribute. The attributes are the column headings in thetable in Figure 14.6.
Tuples. Each row in a relation is called a tuple. A tupledefines a collection of attribute values. The total numberof rows in a relation is called the cardinality of therelation. Note that the cardinality of a relation changeswhen tuples are added or deleted. This makes thedatabase dynamic.
Schemas
• The name of a relation and the set of attributes for a relation is called a schema.
• We show the schema for the relation with the relation name followed by a parenthesized list of its attributes.
• Movies (title, year, length).• Relational database schema = collection of relation schemas.
A relational database organizes data in the form of two‐dimensional tables. Illustrated here are tables for the entities SUPPLIER and PART showing how they represent each entity and its attributes. Supplier Number is a primary key for the SUPPLIER table and a foreign key for the PART table.
RELATIONAL DATABASE TABLES
A relational database organizes data in the form of two‐dimensional tables. Illustrated here are tables for the entities SUPPLIER and PART showing how they represent each entity and its attributes. Supplier Number is a primary key for the SUPPLIER table and a foreign key for the PART table.
RELATIONAL DATABASE TABLES
Operations of a Relational DBMSThree basic operations used to develop useful sets of data
SELECT: Creates subset of data of all records that meet stated criteriaJOIN: Combines relational tables to provide user with more information than available in individual tablesPROJECT: Creates subset of columns in table, creating tables with only the information specified
The select, join, and project operations enable data from two different tables to be combined and only selected attributes to be displayed.
THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS
Relational Database Example
• Relational Database Management System (RDBMS)– Consists of a number of tables and single schema (definition of tables
and attributes)– Students (sid, name, login, age, gpa),Students identifies the table
sid, name, login, age, gpa identify attributes, sid is primary key
An Example Table
• Students (sid: string, name: string, login: string, age: integer, gpa: real) S1
sid name login age gpa50000 Dave dave@cs 19 3.353666 Jones jones@cs 18 3.453688 Smith smith@ee 18 3.253650 Smith smith@math 19 3.853831 Madayan madayan@music 11 1.853832 Guldu guldu@music 12 2.0
Another table: Courses
• Courses (cid, instructor, quarter, dept) E
cid instructor quarter dept
Carnatic101 Jane Fall 06 Music
Reggae203 Bob Summer 06 Music
Topology101 Mary Spring 06 Math
History105 Alice Fall 06 History
Keys
• Primary key – minimal subset of fields that is unique identifier for a tuple– sid is primary key for Students– cid is primary key for Courses
• Foreign key –connections between tables– Courses (cid, instructor, quarter, dept)– Students (sid, name, login, age, gpa)– How do we express which students take each
course?
Many to many relationships• In general, need a new table
Enrolled(cid, grade, studid)Studid is foreign key that references sid in Student table
cid grade studid
Carnatic101 C 53831
Reggae203 B 53832
Topology112 A 53650
History 105 B 53666
sid name login50000 Dave dave@cs
53666 Jones jones@cs
53688 Smith smith@ee
53650 Smith smith@math
53831 Madayan
madayan@music
53832 Guldu guldu@music
EnrolledStudent
Foreignkey
Relational Algebraprocess for working
• Collection of operators for specifying queries• Query describes step‐by‐step procedure for computing answer (i.e., operational)
• Each operator accepts one or two relations as input and returns a relation as output
• Relational algebra expression composed of multiple operators
Basic operators
• Selection – return rows that meet some condition
• Projection – return column values• Union• Cross product• Difference• Other operators can be defined in terms of basic operators
Simplified Schema Example
• Courses (cid, instructor, quarter, dept)• Students (sid, name, gpa)• Enrolled (cid, grade, studid)
Set Operations
• Union (R U S)– All tuples in R or S (or both)– R and S must have same number of fields– Corresponding fields must have same domains
• Intersection (R ∩ S)– All tuples in both R and S
• Set difference (R – S)– Tuples in R and not S
Set Operations (continued)
• Cross product or Cartesian product (R x S)– All fields in R followed by all fields in S– One tuple (r,s) for each pair of tuples r R, s S
SelectionSelect students with gpa higher than 3.3 from S1:
σgpa>3.3(S1)
S1sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0
sid name gpa53666 Jones 3.453650 Smith 3.8
Projection
Project name and gpa of all students in S1:name, gpa(S1)S1Sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0
name gpaDave 3.3Jones 3.4Smith 3.2Smith 3.8Madayan 1.8Guldu 2.0
Combine Selection and Projection• Project name and gpa of students in S1 with gpa higher than 3.3:
name,gpa(σgpa>3.3(S1))
Sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0
name gpaJones 3.4Smith 3.8
Example: Intersection
sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0
sid name gpa53666 Jones 3.453688 Smith 3.253700 Tom 3.553777 Jerry 2.853832 Guldu 2.0
S1 S2
S1 S2 =
sid name gpa53666 Jones 3.453688 Smith 3.253832 Guldu 2.0
Joins• Combine information from two or more tables• Example: students enrolled in courses:S1 S1.sid=E.studidE
Sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0
cid grade studid
Carnatic101 C 53831
Reggae203 B 53832
Topology112 A 53650
History 105 B 53666
S1E
Joins
sid name gpa cid grade studid53666 Jones 3.4 History105 B 5366653650 Smith 3.8 Topology112 A 5365053831 Madayan 1.8 Carnatic101 C 5383153832 Guldu 2.0 Reggae203 B 53832
Sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0
cid grade studid
Carnatic101 C 53831
Reggae203 B 53832
Topology112 A 53650
History 105 B 53666
S1E
A1 A2 A3 ... An
a1 a2 a3 an
b1 b2 a3 cn
a1 c2 b3 bn...
x1 v2 d3 wn
Relational Data Model: summary
• Set theoretic• Domain — set of values
• like a data type• Cartesian product (or product)
• D1 D2 ... Dn• n‐tuples (V1,V2,...,Vn)• s.t., V1 D1, V2 D2,...,Vn Dn
–Relation=subset of cartesian product of one or more domains
• FINITE only; empty set allowed–Tuples = members of a relation inst.–Arity = number of domains–Components = values in a tuple–Domains — corresp. with attributes–Cardinality = number of tuples
Relation as tableRows = tuplesColumns = componentsNames of columns = attributesRelation name + set of attribute names= schemaREL (A1,A2,...,An)
Arity
Cardinality
Attributes
Component
Tuple
What is Object Oriented Database? (OODB)
• A database system that incorporates all the important object‐oriented concepts
• Some additional features– Unique Object identifiers– Persistent object handling
Object‐Oriented Concepts
Abstract Data Types Class definition, provides extension to complex attribute types
Encapsulation Implementation of operations and object structure hidden
Inheritance Sharing of data within hierarchy scope, supports code reusability
Polymorphism• Operator overloading
Object‐Oriented DBMS (OODBMS)
Stores data and procedures as objectsObjects can be graphics, multimedia, Java appletsRelatively slow compared with relational DBMS for processing large numbers of transactionsHybrid object‐relational DBMS: Provide capabilities of both OODBMS and relational DBMS
Object Relationships
Object-Oriented Databases
Nelson Caballero - 4/16/2001
• Support data abstraction, encapsulation, and inheritance.
• Allow object identification and communication.
• Reuse and modify objects.
• Deal with complex data types.
EmployeeNameParentsDate of BirthSexGetAge()ComputeSalary()
Methods
Attributes
Class representation Object Inheritance
Object Relationships
Advantages of OODBS
• Designer can specify the structure of objects and their behavior (methods)
• Multimedia Contents• Better interaction with object‐oriented languages such as Java and C++
• Definition of complex and user‐defined types• Encapsulation of operations and user‐defined methods
Relational and Object-Oriented Databases
Database Management System
Nelson Caballero - 4/16/2001
A software system that enables users to create and maintain the database.
Engineering design applications.
Multimedia applications. Knowledge bases. Applications with demanding distribution
and concurrency.
Applications that require advanced
features.
Electronic devices with embeddedsoftware.
Decision support applications.
Ordinary business applications.
Applications that integrate with
legacy systems.
Conservative implementations.
Source: Object oriented Modeling and design for database applications. Blaha, M. and Premerlani, W.
Object Oriented
• A specific type of software for creating, storing,organizing, and accessing data from a database
• Separates the logical and physical views of the data
• Logical view: how end users view data• Physical view: how data are actually structured and
organized
• Examples of DBMS: Microsoft Access, DB2, OracleDatabase, Microsoft SQL Server, MYSQL
Database management system (DBMS)
A single human resources database provides many different views of data, depending on the information requirements of the user. Illustrated here are two possible views, one of interest to a benefits specialist and one of interest to a member of the company’s payroll department.
HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
Capabilities of Database Management Systems
Data definition capability: Specifies structure of database content, used to create tables and define characteristics of fieldsData dictionary: Automated or manual file storing definitions of data elements and their characteristicsData manipulation language(DML): Used to add, change, delete, retrieve data from database
Data that describes the properties or characteristics of other data
Does not include sample dataAllows database designers and users to understand the meaning of the data
Meta data
Structured Query Language (SQL)
Microsoft Access user tools for generation SQLMany DBMS have report generation capabilities for creating polished reports (Crystal Reports)
Each database will have a set of schemas associated with a catalog.
Schema = the structure that contains descriptions of objects created by a user (base tables, views, constraints)
14.67
Structured Query Language
Structured Query Language (SQL) is the languagestandardized by the American National Standards Institute(ANSI) and the International Organization forStandardization (ISO) for use on relational databases.
It is a declarative rather than procedural language, whichmeans that users declare what they want without having towrite a step-by-step procedure. The SQL language was firstimplemented by the Oracle Corporation in 1979, withvarious versions of SQL being released since then.
SQL Is:• The standard and most common language for relational database management systems
• An SQL‐based relational database application involves a user interface, a set of tables in the database, and a RDBMS with an SQL capability
• Within the RDBMS SQL will be used to create the tables, translate user requests, maintain the data dictionary and system catalog, update an maintain the tables, establish security, and carry out backup and recovery procedures
3 types of SQL commands
• Data Definition Language (DDL) commands ‐ that define a database, including creating, altering, and dropping tables and establishing constraints
• Data Manipulation Language (DML) commands ‐ that maintain and query a database
• Data Control Language (DCL) commands ‐ that control a database, including administering privileges and committing data
14.71
InsertThe insert operation is a unary operation—that is, it isapplied to a single relation. The operation inserts a new tupleinto the relation. The insert operation uses the followingformat:
Figure 14.7 An example of an insert operation
14.72
DeleteThe delete operation is also a unary operation. The operationdeletes a tuple defined by a criterion from the relation. Thedelete operation uses the following format:
An example of a delete operation
14.73
UpdateThe update operation is also a unary operation that is appliedto a single relation. The operation changes the value of someattributes of a tuple. The update operation uses the followingformat:
An example of an update operation
14.74
SelectThe select operation is a unary operation. The tuples (rows)in the resulting relation are a subset of the tuples in theoriginal relation.
An example of an select operation
14.75
ProjectThe project operation is also a unary operation and createsanother relation. The attributes (columns) in the resultingrelation are a subset of the attributes in the original relation.
Figure 14.11 An example of a project operation
14.76
JoinThe join operation is a binary operation that combines tworelations on common attributes.
An example of a join operation
14.77
UnionThe union operation takes two relations with the same set ofattributes.
An example of a union operation
14.78
IntersectionThe intersection operation takes two relations and creates anew relation, which is the intersection of the two.
An example of an intersection operation
14.79
Difference
The difference operation is applied to two relations with thesame attributes. The tuples in the resulting relation are thosethat are in the first relation but not the second.
Figure 14.15 An example of a difference operation
The design of any database is a lengthy and involvedtask that can only be done through a step-by-stepprocess.
The first step normally involves interviewing potentialusers of the database.
The second step is to build an entity-relationshipmodel (ERM) that defines the entities, the attributes ofthose entities and the relationship between thoseentities.
DATABASE DESIGN
Designing Databases
Conceptual (logical) design: Abstract model from businessperspective
Physical design: How database is arranged on direct‐access storage devices
Design process identifiesRelationships among data elements, redundant database elementsMost efficient way to group data elements to meet business requirements, needs of application programs
NormalizationStreamlining complex groupings of data to minimize redundant data elements and awkward many‐to‐many relationships
14.82
Entity-relationship models (ERM)Database Design
In this step, the database designer creates an entity-relationship (E-R) diagram to show the entities for whichinformation needs to be stored and the relationship betweenthose entities. E-R diagrams uses several geometric shapes,but we use only a few of them here:
Rectangles represent entity setsEllipses represent attributesDiamonds represent relationship setsLines link attributes to entity sets and link entity sets
to relationships sets
14.83
A very simple E-R diagram with three entity sets, their attributesand the relationship between the entity sets.
14.84
From E-R diagrams to relationsAfter the E-R diagram has been finalized, relations (tables) in therelational database can be created.
Relations for entity setsFor each entity set in the E-R diagram, we create a relation (table) inwhich there are n columns related to the n attributes defined for thatset.
Entities, attributes and relationships in an E-R diagram
14.85
We can have three relations (tables), one for each entity setdefined in Figure .
Relations for entity set
14.86
Relations for relationship sets
For each relationship set in the E-R diagram, we create arelation (table). This relation has one column for the key ofeach entity set involved in this relationship and also onecolumn for each attribute of the relationship itself if therelationship has attributes (not in our case).
14.87
The relations for these relationship sets are added to the previousrelations for the entity set and shown
Relations for E-R diagram
14.88
NormalizationNormalization is the process by which a given set ofrelations are transformed to a new set of relations with amore solid structure.
Normalization is needed to allow any relation in the databaseto be represented, to allow a language like SQL to usepowerful retrieval operations composed of atomicoperations, to remove anomalies in insertion, deletion, andupdating, and reduce the need for restructuring the databaseas new data types are added.
14.89
First normal form (1NF)
When we transform entities or relationships into tabularrelations, there may be some relations in which there aremore values in the intersection of a row or column.
Figure 14.19 An example of 1NF
14.90
Second normal form (2NF)
In each relation we need to have a key (called a primary key)on which all other attributes (column values) need to depend.For example, if the ID of a student is given, it should bepossible to find the student’s name.
An example of 2NF
14.91
Other normal forms
Other normal forms use more complicated dependenciesamong attributes. We leave these dependencies to booksdedicated to the discussion of database topics.
An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each order. There is only a one‐to‐one correspondence between Order_Number and Order_Date.
AN UNNORMALIZED RELATION FOR ORDERExample
NORMALIZED TABLES CREATED FROM ORDER
An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each order. There is only a one‐to‐one correspondence between Order_Number and Order_Date.
Entity‐relationship diagramUsed by database designers to document the data modelIllustrates relationships between entities
Map binary relationships• The procedure for representing relationships depends on both the
degree of the relationships (unary, binary, ternary) and the cardinalities of the relationships
Map binary one‐to‐one relationships(1:1)
In a 1:1 relationship, the association in one direction is nearly always optional one, whilst the association in the other direction is mandatory one
You should include in the relation on the optional side of the relationship the foreign key of the entity type that has the mandatory participation in the 1:1 relationship
Map binary one‐to‐one relationships
• Any attributes associated wit the relationship itself are also included in the same relation as the foreign key
• The following Fig. Shows a binary 1:1 relationship between NURSE and CARE_CENTER, where each care centre must have a nurse who is in charge of that centre – so the association from care centre to nurse is a mandatory one, while the association from nurse to care centre is an optional one (since any nurse may or may not be in charge of a care centre)
Map binary one‐to‐many (1:M) relationships
• First create a relation for each of the two entity types participating in the relationship
• Next include the primary key attribute(s) of the entity on the one‐side as a foreign key in the relation that is on the many‐side
• ‘Submits’ relationship in the following Fig. shows the primary key Customer_ID of CUSTOMER (the one‐side) included as a foreign key in ORDER (the many‐side) (signified by the arrow)
Example of mapping a 1:M relationship
Relationship between customers and orders
Note the mandatory one
Map binary many‐to‐many (M:N) relationships
• If such a relationship exists between entity types A and B, we create a new relation C, then include as foreign keys in C the primary keys for A and B, then these attributes become the primary key of C
• In the following Fig., first a relation is created for VENDOR and RAW_MATERIALS, then a relation QUOTE is created for the ‘Supplies’ relationship – with primary key formed from a combination of Vendor_ID and Material_ID (primary keys of VENDOR and RAW_MATERIALS). These are foreign keys that point to the respective primary keys
Example of mapping an M:N relationshipER diagram (M:N)
The Supplies relationship will need to become a separate relation
This diagram shows the relationships between the entities SUPPLIER, PART, LINE_ITEM, and ORDER that might be used to model the database
AN ENTITY‐RELATIONSHIP DIAGRAMThis graphic shows an example of an entity relationship diagram.
It shows that one ORDER can contain many LINE_ITEMs. (A PART can beordered many times and appear many times as a line item in a single order.)
Each LINE ITEM can contain only one PART.
Each PART can have only one SUPPLIER, but many PARTs can be providedby the same SUPPLIER.
Distributing databases:Operations
Storing database in more than one place
Partitioned: Separate locations store different parts of database
Replicated: Central database duplicated in entiretyat different locations
There are alternative ways of distributing a database. The central database can be partitioned (a) so that each remote processor has the necessary data to serve its own local needs. The central database also can be replicated (b) at all remote locations.
Distributed Databases
Very large databases and systems require special capabilities, tools To analyze large quantities of data To access data from multiple systems
Three key techniques1.Data warehousing 2.Data mining3.Tools for accessing internal databases through the Web
UsingDatabasestoImproveBusinessPerformanceandDecisionMaking
3‐106
DATABASE MANAGEMENT SYSTEM TOOLS
Five software components:1. DBMS engine2. Data definition subsystem3. Data manipulation subsystem4. Application generation subsystem5. Data administration subsystem
3‐108
DBMS Engine
• DBMS engine – accepts logical requests from the various other DBMS subsystems, converts them into their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device
• DBMS engine separates the logical from the physical
3‐109
DBMS Engine
• Physical view – how information is physically arranged, stored, and accessed on some type of storage device
• Logical view – how you as a knowledge worker need to arrange and access information
• With a database, you only concern yourself with your logical view
3‐110
Data Definition Subsystem
• Data definition subsystem – helps you create and maintain the data dictionary and define the structure of the files in a database
• You must create a data dictionary before entering information into a database
• Module J covers this for Microsoft Access
3‐111
Data Manipulation Subsystem
• Data manipulation subsystem – helps you add, change, and delete information
• This is your primary DBMS interface as you work with a database– Views– Report generators– QBE tools– SQL
3‐112
Views
• View – allows you to see the contents of a database file– Make whatever changes you want– Perform simple sorting– Query to find the location of information– Looks similar to a workbook with no row numbers
3‐114
Report Generators
• Report generator – helps you quickly define formats of reports and what information you want to see in a report
• You can save report formats and generate reports at any time with up‐to‐date information
3‐116
QBE Tools
• Query‐by‐example (QBE) tool – helps you graphically design the answer to a question
• “What driver most often delivers concrete to Triple A Homes?”
3‐118
SQL
• Structured query language (SQL) –standardized fourth‐generation language found in most DBMSs
• Performs the same task as a QBE tool– But uses a sentence structure instead of point‐and‐click interface
• SQL is used mostly by IT people
3‐119
Application Generation Subsystem
• Application generation subsystem – contains facilities to help you develop transaction‐intensive applications– Data entry screen (called forms)– Programming languages
• Used mostly by IT specialists
3‐120
Data Administration Subsystem
• Data administration subsystem – helps you manage the overall database environment– Backup and recovery– Security management– Query optimization– Concurrency control– Change management
3‐121
Data Administration Subsystem
• Backup and recovery– Periodically back up information– Recover a database if a failure occurs
• Security management– Who has access to what information– Who can perform certain tasks (e.g., add, change, or delete) on information
3‐122
Data Administration Subsystem
• Query optimization– Restructure physical view of information to optimize response times to queries
• Concurrency control– What happens if two people makes changes to the same information at the same time?
3‐123
Data Administration Subsystem
• Change management– What is the effect of structural changes to a database?
– What if you add a new column?– What happens if you delete a column?– What happens if you change a column’s attributes?
3‐124
DATA WAREHOUSES AND DATA MINING
• Data warehouses support OLAP and decision making
• Data warehouses do not support OLTP• Data‐mining tools are the tools you use to work with a data warehouse– DBMS software = database– Data‐mining tools = data warehouse
3‐125
What Is a Data Warehouse?
• Data warehouse – logical collection of information – gathered from operational databases – used to create business intelligence that supports business analysis activities and decision‐making tasks
Components of a Data Warehouse
9‐126 Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
3‐127
Data Warehouse Summary
• Multidimensional• Rows and columns• Also layers• Many times called hypercubes
The view that is showing is product versus region. If you rotate the cube 90 degrees, the face that will show is product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. Other views are possible.
MULTIDIMENSIONAL DATA MODEL
TheDatabaseApproachtoDataManagement
3‐129
Functions
• Online transaction processing (OLTP) – the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information– Databases support OLTP– Operational database – databases that support OLTP
3‐130
Functions
• Online analytical processing (OLAP) – the manipulation of information to support decision making– Databases can support some OLAP– Data warehouses only support OLAP, not OLTP– Why?– Data warehouses are special forms of databases that support decision making
Online analytical processing (OLAP)
Supports multidimensional data analysisViewing data using multiple dimensionsEach aspect of information (product, pricing, cost, region, time period) is different dimension
E.g., how many washers sold in the East in June compared with other regions?
OLAP enables rapid, online answers to ad hoc queries
Data marts: Subset of data warehouseSummarized or highly focused portion of firm’s data for use by specific population of usersTypically focuses on single subject or line of business
Data warehouse: Stores current and historical data from many core operational transaction systemsConsolidates and standardizes information for use across enterprise, but data cannot be alteredData warehouse system will provide query, analysis, and reporting tools
3‐133
Data Marts• Data warehouses can support all of an organization’s information
• Data marts have subsets of an organizationwide data warehouse
• Data mart – subset of a data warehouse in which only a focused portion of the data warehouse information is kept
Components of a Data Mart
9‐134 Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall
Business Intelligence(BI):
Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions
E.g., Harrah’s Entertainment analyzes customers to develop gambling profiles and identify most profitable customersPrinciple tools include:
Software for database query and reportingOnline analytical processing (OLAP)Data mining
Object in Business Information Systems
Data mining:More discovery driven than OLAPFinds hidden patterns, relationships in large databases and infers rules to predict future behaviorE.g., Finding patterns in customer data for one‐to‐one marketing campaigns or to identify profitable customers.
Types of information obtainable from data mining
Associations, Sequences, ClassificationClustering, Forecasting
UsingDatabasestoImproveBusinessPerformanceandDecisionMaking
Predictive analysis in Data Mining;
Uses data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events
E.g., Probability a customer will respond to an offer
3‐143
What Are Data‐Mining Tools?
• Data‐mining tools – software tools that you use to query information in a data warehouse– Query‐and‐reporting tools– Intelligence agents– Multidimensional analysis tools– Statistical tools
3‐146
Query‐And‐Reporting Tools
• Query‐and‐reporting tools – similar to QBE tools, SQL, and report generators in the typical database environment
3‐147
Intelligent Agents
• Use various artificial intelligence tools such as neural networks and fuzzy logic to form the basis for “information discovery” and building business intelligence
• Help you find hidden patterns in information
3‐148
Multidimensional Analysis Tools
• Multidimensional analysis (MDA) tools –slice‐and‐dice techniques that allow you to view multidimensional information from different perspectives– Bring new layers to the front– Reorganize rows and columns
3‐149
Statistical Tools
• Help you apply various mathematical models to the information stored in a data warehouse to discover new information– Regression– Analysis of variance– And so on
© Gabriele Piccoli
Enterprise Application Integration
• “Re‐architecting” existing programs so that an intermediate layer, termed middleware, is developed between the applications and the databases
• Designed to make calls to the middleware layer rather than the other applications
• Streamlines maintenance process because changes to an application will not affect all the interfaces connected to it
© Gabriele Piccoli
The EAI Approach
ERP
Legacy Application
Database 1
Middlew
are
Database 2
Legacy Application
SCM
SCM: Supply Chain ManagementERP: Enterprise Resource Planning
Expert Systems vs. DSS
Expert System• Inject expert knowledge in
to a computer system.• Automate decision making.• The decision environments
have structure• The alternatives and goals
are often established in advance.
• The expert system can eventually replace the human decision maker.
Decision Support System• Extract or gain knowledge
from a computer system• Facilitates decision making• Unstructured environment• Alternatives may not be
fully realized yet• Use goals and the system
data to establish alternatives and outcomes, so a good decision can be made
What challenges does the increase in unstructured data present for businesses?How does text‐mining improve decision‐making?What kinds of companies are most likely to benefit from text mining software?
In what ways could text mining potentially lead to the erosion of personal information privacy?
WHAT CAN BUSINESSES LEARN FROM TEXT MINING? Text mining
Extracts key elements from large unstructured data sets (e.g., stored e‐mails)
Web miningDiscovery and analysis of useful patterns and information from WWW
E.g., to understand customer behavior, evaluate effectiveness of Web site, etc.
Web content miningKnowledge extracted from content of Web pages
Web structure miningE.g., links to and from Web page
Web usage miningUser interaction data recorded by Web server
Databases and the WebMany companies use Web to make some internal databases available to customers or partnersTypical configuration includes:
Web serverApplication server/middleware/CGI scriptsDatabase server (hosting DBM)
Advantages of using Web for database access:Ease of use of browser softwareWeb interface requires few or no changes to databaseInexpensive to add Web interface to system
Firms use the Web to make information from theirinternal databases available to customers and partners
• Middleware and other software make this possible
• Database servers• CGI(Computer Gateway Interface)• Web interfaces provide familiarity to users andsavings over redesigning and rebuilding legacysystems
Users access an organization’s internal database through the Web using their desktop PCs and Web browser software.
LINKING INTERNAL DATABASES TO THE WEB
Establishing an information policyFirm’s rules, procedures, roles for sharing, managing, standardizing dataData administration:
Firm function responsible for specific policies and procedures to manage data
Data governance: Policies and processes for managing availability, usability, integrity, and security of enterprise data, especially as it relates to government regulations
Database administration:Defining, organizing, implementing, maintaining database; performed by database design and management group
ManagingDataResources
Basic : True Data
Good: Many(File, Record)
Better : Organized(Database, Data Where house)
Best : Analysis, Intelligence( Data mining, Intelligence)
Nature and Quality of Data
3‐169
MANAGING THE INFORMATION RESOURCE
• Information is an organizational resource• Just like people, capital, and equipment• It must be managed effectively based on True data and Systems
3‐170
MANAGING THE INFORMATION RESOURCE
• Who should oversee your organization’s information resource?– Chief information officer (CIO) – oversees an organization’s information resource
– Data administration – plans for, oversees the development of, and monitors the information resource
– Database administration – technical and operational aspects of managing information
3‐171
MANAGING THE INFORMATION RESOURCE
• Is information ownership a consideration?– If you create information, you “own” it– You will also share it with others– Because you “own” it, you are responsible for its quality
3‐172
MANAGING THE INFORMATION RESOURCE
• How “clean” must your information be?– Duplicate information (records) must be eliminated
– Inaccurate information must be corrected– Information forms the basis of business intelligence
– If your business intelligence is bad, you will make poor decisions
Ensuring data qualityMore than 25% of critical data in Fortune 1000 company databases are inaccurate or incompleteMost data quality problems stem from faulty input
Before new database in place, need to:Identify and correct faulty data Establish better routines for editing data once database in operation
Data quality audit:Structured survey of the accuracy and level of completeness of the data in an information system
Survey samples from data files, orSurvey end users for perceptions of quality
Data cleansingSoftware to detect and correct data that are incorrect, incomplete, improperly formatted, or redundantEnforces consistency among different sets of data from separate information systems
Assess the business impact of credit bureaus’ data quality problems for the credit bureaus, for lenders, for individuals.
Are any ethical issues raised by credit bureaus’ data quality problems?
Analyze the people, organization, and technology factors responsible for credit bureaus’ data quality problems.
What can be done to solve these problems?
CREDIT BUREAU ERRORS—BIG PEOPLE PROBLEMS
3‐176
Data Mining as a Career Opportunity
• Knowledge of data mining can be a substantial career opportunity for you– Query and Analysis and Enterprise Analytic Tools (Business Objects)
– Business Intelligence and Information Access tools (SAS)
– Many in Cognos (the data warehouse leader)– PowerAnalyzer (Informatica)
SAS: System Analysis Scientist
Describe how a relational database organizesdata and compare its benefits
Identify and describe the principles of a databasemanagement system.
Evaluate tools and technologies for providinginformation from databases to improve businessperformance and decision making.
Review ?
3‐178
CAN YOU…
Describe business intelligence and its roleCompare databases and data warehouses by
OLTP and OLAPDefine 5 software components of a DBMS
3‐179
CAN YOU…
List/describe key characteristics of a data warehouse
Define 4 major types of data‐mining toolsList key considerations in managing information
as a resource
DSS: Decision Support Systems and
AI: Artificial IntelligenceIn Business
Appendix for business Intelligence
AI in BusinessSome Commercial Applications• Decision Support• Expert Systems• Information Retrieval• Virtual Reality• Robotics
I’m ready to do some business
Overview of AI• Goal of AI
– develop computer systems that exhibit intelligence or simulate the ability to think
• AI pioneered by Computer Science• But, AI involves a combination of
– Computer Science, Biology, Psychology, Linguistics, Mathematics,Engineering
What really is Intelligence?
• Specifically, what are the signs of Intelligent Behavior?
• Think about it for a while
Which of the following is the best example of intelligent behavior?
101 2 3 4
25% 25%25%25%
1. Ability to add numbers2. Ability to see and recognize
objects3. Ability to adapt to
surroundings4. Ability to learn for mistakes
What really is Intelligence?• You are about to start an online chat (IM) with two entities:– One entity is a human– The other is a computer
• After hours of conversation, you can not tell which entity is a computer.
• Does this mean the computer is Intelligent?
Intelligent Behavior• What are some of the signs, attributes, or characteristics of Intelligent Behavior
Characteristics of Intelligent Behavior
1. Learn from experience & apply the knowledge
Computer can automatically improve performance based on Experience Machine Learning Computational Learning
Characteristics of Intelligent Behavior
2. Handle complex situations Computer Systems can often handle
complexity better than humans Consider a process control system that
must simultaneous track 100 different system variables.
Characteristics of Intelligent Behavior
3. Solve problems when important information is missing
Computer Systems can find patterns and deal with all sorts of missing information
Characteristics of Intelligent Behavior
4. React quickly & correctly to new situations; Acquire & Apply Knowledge
Here is where computers start to fail. Adapting to completely new situations is a
problem for computer systems. Its very difficult to design a computer system
that can combine, connect, and acquire knowledge to solve completely new problems
Characteristics of Intelligent Behavior
5. Determine what is important.6. Exhibit creativity and imagination7. Process visual information efficiently8. Use reason to solve problems
These are some other Characteristics that humans possess.
Computer systems have a lot of catching up to do.
Which of the following do computer need to catch up on?
101 2 3 4
25% 25%25%25%
1. Determine what is important.2. Exhibit creativity and
imagination3. Process visual information
efficiently4. Use reason to solve problems
AI in Business• AI continues to improve and evolve.• Scientists and Engineers are pushing the envelope of what is possible.
• In Business, there is a better understanding of the capabilities of Intelligent Computer Systems
• It is important to know which types of problems are suited for humans, and which are suited for Computers.
Human Intelligence vs. AI
Attribute HumanIntelligence
ArtificialIntelligence
Use a variety of information sources
High High
Ability to acquire large amounts of external info.
Medium High
Ability to do rapid, accurate, and complex calculations
Low High
Ability to transfer information rapidly
Low High
Human Intelligence vs. AI
Attribute HumanIntelligence
ArtificialIntelligence
Ability to use sensors or senses High Medium
Creativity or imagination High Low
Ability to learn from experience High Medium
Ability of be adaptive High Medium
AI: Commercial Domains• Decision Support
– Integrating the advantages of AI with Human Intelligence.
– More intelligent Interfaces– More intelligent processing for massive data
AI: Commercial Domains• Information Retrieval
– Automatic simplification for massive data– Natural language technology: computer can speak our language.
AI: Commercial Domains• Virtual Reality
– Better training environment from pilots to doctors
• Robotics– Bringing the precision and speed of computers into the physical world
– Goes beyond manufacturing and assembly lines; Baggage Inspection, Bomb Removal, Replacement Limbs.
Expert Systems• The idea is to inject expert knowledge in to a computer system.
• The primary purpose is to automate decision making.
• The decision environments have structure• The alternatives and goals are often established in advance.
Expert Systems vs. DSSExpert System• Inject expert knowledge in
to a computer system.• Automate decision making.• The decision environments
have structure• The alternatives and goals
are often established in advance.
• The expert system can eventually replace the human decision maker.
Decision Support System• Extract or gain knowledge
from a computer system• Facilitates decision making• Unstructured environment• Alternatives may not be
fully realized yet• Use goals and the system
data to establish alternatives and outcomes, so a good decision can be made
What is the biggest difference between a Decision Support System and an MIS
101 2 3 4
25% 25%25%25%
1. DSS’s are interactive and ad hoc
2. DSS’s focus on transforming information into knowledge
3. MIS’s focus on transforming data into information
4. All of the above
What is the biggest difference between an MIS and TPS
101 2 3 4
25% 25%25%25%
1. in a TPS there is no analysis
2. an MIS focuses on reports
3. an TPS focuses on updating a database
4. All of the above
How is the analysis different for a MIS vs. DSS
101 2 3 4
25% 25%25%25%
1. MIS: Analysis involves computing aggregates
2. MIS: Analysis involves creating useful charts and graphs
3. DSS: Connects information with decisions
4. DSS: Builds scenarios
Some Interesting Applications of Expert Systems
• Triage – Medical Diagnosis (Medical Expert System)– User enters symptoms– System makes diagnosis– Doctors collective expertise is captured in the system
• Patriot Missile Guidance System– Radar identifies Scud missile– System steers Patriot missile to it intercepts Scud missile– Laws of physics, expert knowledge about missile trajectory is captured in the system
• Financial Decision Making – Currency Trading
Expert System Categories
• Decision Making– buy/sell– risk/no risk– rain/ no rain
• Trouble Shooting / Diagnosis– Hello welcome to Dell; how can I help you?
– Suddenly an idiot seems like an expert.
• Selection/Classification– Tell me what you see, expert system figures out what it really is...
• Process Monitoring and Control– Robot control, assembly‐line control, missile control
• Design/Configuration– Specify what you want, expert system figures out specifically how to do it.
Expert System Components
Knowledge base
user
Expert System Software
UserInterface
Engine
Raw Data or Facts
Expert or Knowledge Engineer
Knowledge Acquisition Program
Expert System Development Process
Expert System Components
Knowledge base
Non‐expert
Robot
Missile
Expert System Software
Interface Engine
Raw Data or Facts
Expert or Knowledge Engineer
Knowledge Acquisition Program
Expert System Development Process