Lecture 16 TIM 50 Autumn 2012 - Course Web Pages · Lecture 16 TIM 50 Autumn 2012 Tuesday November...

Lecture 16TIM 50 Autumn 2012Tuesday November 20, 2012

FOUNDATIONS OF BUSINESS INTELLIGENCE:DATABASES AND

INFORMATION MANAGEMENT

Announcement

1. The grades for every assignment will be given in eCommons.2. It's important to check webpage to get the latest informationand assignments changes.3. No Office hours on Wednesday, Friday( 11/21, 11/23)

No Class on Thanks Giving Day, 11/22 Thursday

Final Exam 1st Choice December 7, Friday2nd Choice December 10, Monday depending on Schedule Permission

Format is same as MidtermCovering Up to Midterm 30‐ %

After Midterm 70+ %

The problems of managing data resources in a traditional fileenvironment

Important database design principlesThe database management systemThe capabilities and value of a database management systemTools and technologies for accessing information from databasesBusiness Intelligence, Data MiningThe role of information policy, data administration, and data quality assurance in the management of a firm’s data resources

Topics of Business intelligences

Foundation of business Intelligence

Data Base SystemInformation Management

File Management SystemFile Processing Procedure

Data Base SystemsDBMS,SQL

Intelligence from Collection of DataInformation ManagementBusiness Applications

Data Integrity ControlBusiness Data Maintenances

Division OrientedPaper File SystemsManual Processing

System InefficienciesLonger Business CycleNo Firm wise Information or Data AccessNo Data SecurityNo Decision son Integrated Data and informHigh Business Process Expenditure

Data Base SystemsRelational DBObject Oriented DBDBMS

Less RedundancyData IntegrityEfficiencyData Confidentiality

Data redundancy: Data inconsistency: Program‐data dependence:Lack of flexibilityPoor securityLack of data sharing and availability

File organization conceptsDatabase: Group of related filesFile: Group of records of same type Record: Group of related fieldsField: Group of characters as word(s) or number

Describes an entity (person, place, thing on which we store information)

OrganizingDatainaTraditionalFileEnvironment

Attribute: Each characteristic, or quality, describing entityE.g., Attributes Date or Grade belong to entity COURSE

A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.

THE DATA HIERARCHY

Information as Processed Data

Old Business Process;Files maintained separately by different departments

Data redundancy: Presence of duplicate data in multiple files

Data inconsistency: Same attribute has different values

Program‐data dependence:When changes in program requires changes to data accessed by program

Lack of flexibilityPoor securityLack of data sharing and availability

Problems with the traditional file environment

The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications. Each application requires a unique data file that is likely to be a subset of the master file. These subsets of the master file lead to data redundancy and inconsistency, processing inflexibility, and wasted storage resources.

TRADITIONAL FILE PROCESSING

Business Processes with Old Data Processing

System InefficienciesLonger Business CycleNo Firm wise Information or Data AccessNo Data SecurityNo Decision on Integrated Data and informationHigh Business Process Expenditure

Database – collection of persistent data frombusiness divisions

Database Management System (DBMS) –software system that supports creation,population, and querying of a database

Introduction of Data Processing System

DatabaseServes many business applications by centralizing data and controlling redundant data across division boundaries

Database management system (DBMS)Interfaces between applications and physical data filesSeparates logical and physical views of dataSolves problems of traditional file environment

Controls redundancyEliminates inconsistencyUncouples programs and dataEnables organization to centrally manage data and data security

TheDatabaseApproachtoDataManagement

14.13

DefinitionAlthough it is difficult to give a universally agreed definitionof a database, we use the following common definition:

Definition:A database is a collection of related, logically coherent

data used by the application programs in an organization.

14.14Database architecture

DATABASE ARCHITECTURE

The American NationalStandards Institute/StandardsPlanning and RequirementsCommittee (ANSI/SPARC)has established a three-levelarchitecture for a DBMS:internal, conceptual andexternal .

14.15

HardwareThe hardware is the physical computer system that allowsaccess to data.

SoftwareThe software is the actual program that allows users toaccess, maintain and update data. In addition, the softwarecontrols which user can access which parts of the data in thedatabase.

ConfidentialityThe data in a database is stored physically on the storagedevices. In a database, data is a separate entity from thesoftware that accesses it.

14.16

Users

In a DBMS, the term users has a broad meaning. We candivide users into two categories: end users and applicationprograms.

ProceduresThe last component of a DBMS is a set of procedures orrules that should be clearly defined and followed by the usersof the database.

14.17

Advantages of databasesComparing the flat-file system, we can mention severaladvantages for a database system.

Less redundancyIn a flat-file system there is a lot of redundancy. Forexample, in the flat file system for a university, the names ofprofessors and students are stored in more than one file.

Avoidance of InconsistencyInconsistencyIf the same piece of information is stored in more than oneplace, then any changes in the data need to occur in all placesthat data is stored.

14.18

EfficiencyA database is usually more efficient that a flat file system,because a piece of information is stored in fewer locations.

Data integrityIn a database system it is easier to maintain data integrity ,because a piece of data is stored in fewer locations.

ConfidentialityIt is easier to maintain the confidentiality of the informationif the storage of data is centralized in one location.

Data integrity contains guidelines for, data retention, specifying or guaranteeing the length of time of data can be retained

Evolution of Database Technologies

Evolution of database systems

• 2000 and beyond – multi –tier, client‐server,• Distributed environments, • Web‐based, • Content‐addressable storage, data mining

DATA BASE MODEL OVERVIEW

• ER‐Model• Hierarchical Model• Network Model• Relational Model• Object‐Oriented Model(s)

ER‐Model

• Data Structures• Integrity Constraints• Operations

The ER‐Model is extremely successful as a database design model Translation algorithms to many data models Commercial database design tools, e.g., ERwin No generally accepted query

language No database system is based on the model

ER: Entry Relation

ER‐Model ‐ Integrity Constraints

A

E

E1 E3E2

RE1 E21 n

RE1 E2

RE1 E2R

cardinality: 1:n for E1:E2 in R

total participation of E2 in R

weak entity type E2; identifying relationship type R key attribute

dxp

disjointexclusion

partition

14.24

Hierarchical Database ModelIn the hierarchical model, data is organized as an invertedtree. Each entity has only one parent but can have severalchildren. At the top of the hierarchy, there is one entity,which is called the root.

An example of the hierarchical model representing a university

14.25

Network Database ModelIn the network model, the entities are organized in a graph,in which some entities can be accessed through several paths(Figure 14.4).

An example of the network model representing a university

14.26

Object-Oriented Databases(OODB)An object-oriented database tries to keep the advantages of therelational model and at the same time allows applications to accessstructured data. In an object-oriented database, objects and theirrelations are defined. In addition, each object can have attributesthat can be expressed as fields.

XMLThe query language normally used for objected-oriented databasesis XML (Extensible Markup Language). As we discussed inChapter 6, XML was originally designed to add markupinformation to text documents, but it has also found its applicationas a query language in databases. XML can represent data withnested structures.

Object‐Oriented Model

based on the object‐oriented paradigm,e.g., Simula, Smalltalk, C++, Java

object‐oriented model has object‐oriented repository model; adds persistence and database capabilities; (see ODMG‐93, ODL, OQL)

object‐oriented commercial systems include GemStone, Ontos, Orion‐2, Statice, Versant, O2

14.28

Relational Database ModelIn the relational model, data is organized in two-dimensionaltables called relations. The tables or relations are, however,related to each other, as we will see shortly.

An example of the relational model representing a university

Relational DBMS;Represent data as two‐dimensional tables called relations or files.

Each table contains data on entity and attributesTable: grid of columns and rowsRows (tuples): Records for different entitiesFields (columns): Represents attribute for entityKey field: Field used to uniquely identify each recordPrimary key: Field in table used for key fieldsForeign key: Primary key used in second table as

look‐up field to identify records from original table

In the relational database management system (RDBMS), the data isrepresented as a set of relations.

14.30

RelationsA relation appears as a two-dimensional table. The RDBMSorganizes the data so that its external view is a set ofrelations or tables. This does not mean that data is stored astables: the physical storage of the data is independent of theway in which the data is logically organized.

An example of a relation

14.31

A relation in an RDBMS has the following features:

Name. Each relation in a relational database should havea name that is unique among other relations.

Attributes. Each column in a relation is called anattribute. The attributes are the column headings in thetable in Figure 14.6.

Tuples. Each row in a relation is called a tuple. A tupledefines a collection of attribute values. The total numberof rows in a relation is called the cardinality of therelation. Note that the cardinality of a relation changeswhen tuples are added or deleted. This makes thedatabase dynamic.

Schemas

• The name of a relation and the set of attributes for a relation is called a schema.

• We show the schema for the relation with the relation name followed by a parenthesized list of its attributes.

• Movies (title, year, length).• Relational database schema = collection of relation schemas.

A relational database organizes data in the form of two‐dimensional tables. Illustrated here are tables for the entities SUPPLIER and PART showing how they represent each entity and its attributes. Supplier Number is a primary key for the SUPPLIER table and a foreign key for the PART table.

RELATIONAL DATABASE TABLES

Operations of a Relational DBMSThree basic operations used to develop useful sets of data

SELECT: Creates subset of data of all records that meet stated criteriaJOIN: Combines relational tables to provide user with more information than available in individual tablesPROJECT: Creates subset of columns in table, creating tables with only the information specified

The select, join, and project operations enable data from two different tables to be combined and only selected attributes to be displayed.

THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS

Relational Database Example

• Relational Database Management System (RDBMS)– Consists of a number of tables and single schema (definition of tables

and attributes)– Students (sid, name, login, age, gpa),Students identifies the table

sid, name, login, age, gpa identify attributes, sid is primary key

An Example Table

• Students (sid: string, name: string, login: string, age: integer, gpa: real) S1

sid name login age gpa50000 Dave dave@cs 19 3.353666 Jones jones@cs 18 3.453688 Smith smith@ee 18 3.253650 Smith smith@math 19 3.853831 Madayan madayan@music 11 1.853832 Guldu guldu@music 12 2.0

Another table: Courses

• Courses (cid, instructor, quarter, dept) E

cid instructor quarter dept

Carnatic101 Jane Fall 06 Music

Reggae203 Bob Summer 06 Music

Topology101 Mary Spring 06 Math

History105 Alice Fall 06 History

Keys

• Primary key – minimal subset of fields that is unique identifier for a tuple– sid is primary key for Students– cid is primary key for Courses

• Foreign key –connections between tables– Courses (cid, instructor, quarter, dept)– Students (sid, name, login, age, gpa)– How do we express which students take each

course?

Many to many relationships• In general, need a new table

Enrolled(cid, grade, studid)Studid is foreign key that references sid in Student table

cid grade studid

Carnatic101 C 53831

Reggae203 B 53832

Topology112 A 53650

History 105 B 53666

sid name login50000 Dave dave@cs

53666 Jones jones@cs

53688 Smith smith@ee

53650 Smith smith@math

53831 Madayan

madayan@music

53832 Guldu guldu@music

EnrolledStudent

Foreignkey

Relational Algebraprocess for working

• Collection of operators for specifying queries• Query describes step‐by‐step procedure for computing answer (i.e., operational)

• Each operator accepts one or two relations as input and returns a relation as output

• Relational algebra expression composed of multiple operators

Basic operators

• Selection – return rows that meet some condition

• Projection – return column values• Union• Cross product• Difference• Other operators can be defined in terms of basic operators

Simplified Schema Example

• Courses (cid, instructor, quarter, dept)• Students (sid, name, gpa)• Enrolled (cid, grade, studid)

Set Operations

• Union (R U S)– All tuples in R or S (or both)– R and S must have same number of fields– Corresponding fields must have same domains

• Intersection (R ∩ S)– All tuples in both R and S

• Set difference (R – S)– Tuples in R and not S

Set Operations (continued)

• Cross product or Cartesian product (R x S)– All fields in R followed by all fields in S– One tuple (r,s) for each pair of tuples r R, s S

SelectionSelect students with gpa higher than 3.3 from S1:

σgpa>3.3(S1)

S1sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0

sid name gpa53666 Jones 3.453650 Smith 3.8

Projection

Project name and gpa of all students in S1:name, gpa(S1)S1Sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0

name gpaDave 3.3Jones 3.4Smith 3.2Smith 3.8Madayan 1.8Guldu 2.0

Combine Selection and Projection• Project name and gpa of students in S1 with gpa higher than 3.3:

name,gpa(σgpa>3.3(S1))

Sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0

name gpaJones 3.4Smith 3.8

Example: Intersection

sid name gpa50000 Dave 3.353666 Jones 3.453688 Smith 3.253650 Smith 3.853831 Madayan 1.853832 Guldu 2.0

sid name gpa53666 Jones 3.453688 Smith 3.253700 Tom 3.553777 Jerry 2.853832 Guldu 2.0

S1 S2

S1 S2 =

sid name gpa53666 Jones 3.453688 Smith 3.253832 Guldu 2.0

Joins• Combine information from two or more tables• Example: students enrolled in courses:S1 S1.sid=E.studidE


cid grade studid

Carnatic101 C 53831

Reggae203 B 53832

Topology112 A 53650

History 105 B 53666

S1E

Joins

sid name gpa cid grade studid53666 Jones 3.4 History105 B 5366653650 Smith 3.8 Topology112 A 5365053831 Madayan 1.8 Carnatic101 C 5383153832 Guldu 2.0 Reggae203 B 53832


cid grade studid

Carnatic101 C 53831

Reggae203 B 53832

Topology112 A 53650

History 105 B 53666

S1E

A1 A2 A3 ... An

a1 a2 a3 an

b1 b2 a3 cn

a1 c2 b3 bn...

x1 v2 d3 wn

Relational Data Model: summary

• Set theoretic• Domain — set of values

• like a data type• Cartesian product (or product)

• D1 　　D2 　　... 　 Dn• n‐tuples (V1,V2,...,Vn)• s.t., V1 　　D1, V2 　　D2,...,Vn　　Dn

–Relation=subset of cartesian product of one or more domains

• FINITE only; empty set allowed–Tuples = members of a relation inst.–Arity = number of domains–Components = values in a tuple–Domains — corresp. with attributes–Cardinality = number of tuples

Relation as tableRows = tuplesColumns = componentsNames of columns = attributesRelation name + set of attribute names= schemaREL (A1,A2,...,An)

Arity

Cardinality

Attributes

Component

Tuple

What is Object Oriented Database? (OODB)

• A database system that incorporates all the important object‐oriented concepts

• Some additional features– Unique Object identifiers– Persistent object handling

Object‐Oriented Concepts

Abstract Data Types Class definition, provides extension to complex attribute types

Encapsulation Implementation of operations and object structure hidden

Inheritance Sharing of data within hierarchy scope, supports code reusability

Polymorphism• Operator overloading

Object‐Oriented DBMS (OODBMS)

Stores data and procedures as objectsObjects can be graphics, multimedia, Java appletsRelatively slow compared with relational DBMS for processing large numbers of transactionsHybrid object‐relational DBMS: Provide capabilities of both OODBMS and relational DBMS

Object Relationships

Object-Oriented Databases

Nelson Caballero - 4/16/2001

• Support data abstraction, encapsulation, and inheritance.

• Allow object identification and communication.

• Reuse and modify objects.

• Deal with complex data types.

EmployeeNameParentsDate of BirthSexGetAge()ComputeSalary()

Methods

Attributes

Class representation Object Inheritance

Object Relationships

Advantages of OODBS

• Designer can specify the structure of objects and their behavior (methods)

• Multimedia Contents• Better interaction with object‐oriented languages such as Java and C++

• Definition of complex and user‐defined types• Encapsulation of operations and user‐defined methods

Relational and Object-Oriented Databases

Database Management System

Nelson Caballero - 4/16/2001

A software system that enables users to create and maintain the database.

Engineering design applications.

Multimedia applications. Knowledge bases. Applications with demanding distribution

and concurrency.

Applications that require advanced

features.

Electronic devices with embeddedsoftware.

Decision support applications.

Ordinary business applications.

Applications that integrate with

legacy systems.

Conservative implementations.

Source: Object oriented Modeling and design for database applications. Blaha, M. and Premerlani, W.

Object Oriented

• A specific type of software for creating, storing,organizing, and accessing data from a database

• Separates the logical and physical views of the data

• Logical view: how end users view data• Physical view: how data are actually structured and

organized

• Examples of DBMS: Microsoft Access, DB2, OracleDatabase, Microsoft SQL Server, MYSQL

Database management system (DBMS)

A single human resources database provides many different views of data, depending on the information requirements of the user. Illustrated here are two possible views, one of interest to a benefits specialist and one of interest to a member of the company’s payroll department.

HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS

Capabilities of Database Management Systems

Data definition capability: Specifies structure of database content, used to create tables and define characteristics of fieldsData dictionary: Automated or manual file storing definitions of data elements and their characteristicsData manipulation language(DML): Used to add, change, delete, retrieve data from database

Data that describes the properties or characteristics of other data

Does not include sample dataAllows database designers and users to understand the meaning of the data

Meta data

Structured Query Language (SQL)

Microsoft Access user tools for generation SQLMany DBMS have report generation capabilities for creating polished reports (Crystal Reports)

Each database will have a set of schemas associated with a catalog.

Schema = the structure that contains descriptions of objects created by a user (base tables, views, constraints)

14.67

Structured Query Language

Structured Query Language (SQL) is the languagestandardized by the American National Standards Institute(ANSI) and the International Organization forStandardization (ISO) for use on relational databases.

It is a declarative rather than procedural language, whichmeans that users declare what they want without having towrite a step-by-step procedure. The SQL language was firstimplemented by the Oracle Corporation in 1979, withvarious versions of SQL being released since then.

SQL Is:• The standard and most common language for relational database management systems

• An SQL‐based relational database application involves a user interface, a set of tables in the database, and a RDBMS with an SQL capability

• Within the RDBMS SQL will be used to create the tables, translate user requests, maintain the data dictionary and system catalog, update an maintain the tables, establish security, and carry out backup and recovery procedures

A simplified schematic of a typical SQL environment

3 types of SQL commands

• Data Definition Language (DDL) commands ‐ that define a database, including creating, altering, and dropping tables and establishing constraints

• Data Manipulation Language (DML) commands ‐ that maintain and query a database

• Data Control Language (DCL) commands ‐ that control a database, including administering privileges and committing data

14.71

InsertThe insert operation is a unary operation—that is, it isapplied to a single relation. The operation inserts a new tupleinto the relation. The insert operation uses the followingformat:

Figure 14.7 An example of an insert operation

14.72

DeleteThe delete operation is also a unary operation. The operationdeletes a tuple defined by a criterion from the relation. Thedelete operation uses the following format:

An example of a delete operation

14.73

UpdateThe update operation is also a unary operation that is appliedto a single relation. The operation changes the value of someattributes of a tuple. The update operation uses the followingformat:

An example of an update operation

14.74

SelectThe select operation is a unary operation. The tuples (rows)in the resulting relation are a subset of the tuples in theoriginal relation.

An example of an select operation

14.75

ProjectThe project operation is also a unary operation and createsanother relation. The attributes (columns) in the resultingrelation are a subset of the attributes in the original relation.

Figure 14.11 An example of a project operation

14.76

JoinThe join operation is a binary operation that combines tworelations on common attributes.

An example of a join operation

14.77

UnionThe union operation takes two relations with the same set ofattributes.

An example of a union operation

14.78

IntersectionThe intersection operation takes two relations and creates anew relation, which is the intersection of the two.

An example of an intersection operation

14.79

Difference

The difference operation is applied to two relations with thesame attributes. The tuples in the resulting relation are thosethat are in the first relation but not the second.

Figure 14.15 An example of a difference operation

The design of any database is a lengthy and involvedtask that can only be done through a step-by-stepprocess.

The first step normally involves interviewing potentialusers of the database.

The second step is to build an entity-relationshipmodel (ERM) that defines the entities, the attributes ofthose entities and the relationship between thoseentities.

DATABASE DESIGN

Designing Databases

Conceptual (logical) design: Abstract model from businessperspective

Physical design: How database is arranged on direct‐access storage devices

Design process identifiesRelationships among data elements, redundant database elementsMost efficient way to group data elements to meet business requirements, needs of application programs

NormalizationStreamlining complex groupings of data to minimize redundant data elements and awkward many‐to‐many relationships

14.82

Entity-relationship models (ERM)Database Design

In this step, the database designer creates an entity-relationship (E-R) diagram to show the entities for whichinformation needs to be stored and the relationship betweenthose entities. E-R diagrams uses several geometric shapes,but we use only a few of them here:

Rectangles represent entity setsEllipses represent attributesDiamonds represent relationship setsLines link attributes to entity sets and link entity sets

to relationships sets

14.83

A very simple E-R diagram with three entity sets, their attributesand the relationship between the entity sets.

14.84

From E-R diagrams to relationsAfter the E-R diagram has been finalized, relations (tables) in therelational database can be created.

Relations for entity setsFor each entity set in the E-R diagram, we create a relation (table) inwhich there are n columns related to the n attributes defined for thatset.

Entities, attributes and relationships in an E-R diagram

14.85

We can have three relations (tables), one for each entity setdefined in Figure .

Relations for entity set

14.86

Relations for relationship sets

For each relationship set in the E-R diagram, we create arelation (table). This relation has one column for the key ofeach entity set involved in this relationship and also onecolumn for each attribute of the relationship itself if therelationship has attributes (not in our case).

14.87

The relations for these relationship sets are added to the previousrelations for the entity set and shown

Relations for E-R diagram

14.88

NormalizationNormalization is the process by which a given set ofrelations are transformed to a new set of relations with amore solid structure.

Normalization is needed to allow any relation in the databaseto be represented, to allow a language like SQL to usepowerful retrieval operations composed of atomicoperations, to remove anomalies in insertion, deletion, andupdating, and reduce the need for restructuring the databaseas new data types are added.

14.89

First normal form (1NF)

When we transform entities or relationships into tabularrelations, there may be some relations in which there aremore values in the intersection of a row or column.

Figure 14.19 An example of 1NF

14.90

Second normal form (2NF)

In each relation we need to have a key (called a primary key)on which all other attributes (column values) need to depend.For example, if the ID of a student is given, it should bepossible to find the student’s name.

An example of 2NF

14.91

Other normal forms

Other normal forms use more complicated dependenciesamong attributes. We leave these dependencies to booksdedicated to the discussion of database topics.

An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each order. There is only a one‐to‐one correspondence between Order_Number and Order_Date.

AN UNNORMALIZED RELATION FOR ORDERExample

NORMALIZED TABLES CREATED FROM ORDER

An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each order. There is only a one‐to‐one correspondence between Order_Number and Order_Date.

Entity‐relationship diagramUsed by database designers to document the data modelIllustrates relationships between entities

Map binary relationships• The procedure for representing relationships depends on both the

degree of the relationships (unary, binary, ternary) and the cardinalities of the relationships

Map binary one‐to‐one relationships(1:1)

In a 1:1 relationship, the association in one direction is nearly always optional one, whilst the association in the other direction is mandatory one

You should include in the relation on the optional side of the relationship the foreign key of the entity type that has the mandatory participation in the 1:1 relationship

Map binary one‐to‐one relationships

• Any attributes associated wit the relationship itself are also included in the same relation as the foreign key

• The following Fig. Shows a binary 1:1 relationship between NURSE and CARE_CENTER, where each care centre must have a nurse who is in charge of that centre – so the association from care centre to nurse is a mandatory one, while the association from nurse to care centre is an optional one (since any nurse may or may not be in charge of a care centre)

Mapping a binary 1:1 relationship

Binary 1:1 relationship

Map binary one‐to‐many (1:M) relationships

• First create a relation for each of the two entity types participating in the relationship

• Next include the primary key attribute(s) of the entity on the one‐side as a foreign key in the relation that is on the many‐side

• ‘Submits’ relationship in the following Fig. shows the primary key Customer_ID of CUSTOMER (the one‐side) included as a foreign key in ORDER (the many‐side) (signified by the arrow)

Example of mapping a 1:M relationship

Relationship between customers and orders

Note the mandatory one

Map binary many‐to‐many (M:N) relationships

• If such a relationship exists between entity types A and B, we create a new relation C, then include as foreign keys in C the primary keys for A and B, then these attributes become the primary key of C

• In the following Fig., first a relation is created for VENDOR and RAW_MATERIALS, then a relation QUOTE is created for the ‘Supplies’ relationship – with primary key formed from a combination of Vendor_ID and Material_ID (primary keys of VENDOR and RAW_MATERIALS). These are foreign keys that point to the respective primary keys

Example of mapping an M:N relationshipER diagram (M:N)

The Supplies relationship will need to become a separate relation

This diagram shows the relationships between the entities SUPPLIER, PART, LINE_ITEM, and ORDER that might be used to model the database

AN ENTITY‐RELATIONSHIP DIAGRAMThis graphic shows an example of an entity relationship diagram.

It shows that one ORDER can contain many LINE_ITEMs. (A PART can beordered many times and appear many times as a line item in a single order.)

Each LINE ITEM can contain only one PART.

Each PART can have only one SUPPLIER, but many PARTs can be providedby the same SUPPLIER.

Distributing databases:Operations

Storing database in more than one place

Partitioned: Separate locations store different parts of database

Replicated: Central database duplicated in entiretyat different locations

There are alternative ways of distributing a database. The central database can be partitioned (a) so that each remote processor has the necessary data to serve its own local needs. The central database also can be replicated (b) at all remote locations.

Distributed Databases

Very large databases and systems require special capabilities, tools To analyze large quantities of data To access data from multiple systems

Three key techniques1.Data warehousing 2.Data mining3.Tools for accessing internal databases through the Web

UsingDatabasestoImproveBusinessPerformanceandDecisionMaking

3‐106

DATABASE MANAGEMENT SYSTEM TOOLS

Five software components:1. DBMS engine2. Data definition subsystem3. Data manipulation subsystem4. Application generation subsystem5. Data administration subsystem

3‐107

DATABASE MANAGEMENT SYSTEM TOOLS

3‐108

DBMS Engine

• DBMS engine – accepts logical requests from the various other DBMS subsystems, converts them into their physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device

• DBMS engine separates the logical from the physical

3‐109

DBMS Engine

• Physical view – how information is physically arranged, stored, and accessed on some type of storage device

• Logical view – how you as a knowledge worker need to arrange and access information

• With a database, you only concern yourself with your logical view

3‐110

Data Definition Subsystem

• Data definition subsystem – helps you create and maintain the data dictionary and define the structure of the files in a database

• You must create a data dictionary before entering information into a database

• Module J covers this for Microsoft Access

3‐111

Data Manipulation Subsystem

• Data manipulation subsystem – helps you add, change, and delete information

• This is your primary DBMS interface as you work with a database– Views– Report generators– QBE tools– SQL

3‐112

Views

• View – allows you to see the contents of a database file– Make whatever changes you want– Perform simple sorting– Query to find the location of information– Looks similar to a workbook with no row numbers

3‐113

Views

3‐114

Report Generators

• Report generator – helps you quickly define formats of reports and what information you want to see in a report

• You can save report formats and generate reports at any time with up‐to‐date information

3‐115

Report Generators

3‐116

QBE Tools

• Query‐by‐example (QBE) tool – helps you graphically design the answer to a question

• “What driver most often delivers concrete to Triple A Homes?”

3‐117

QBE Tools

3‐118

SQL

• Structured query language (SQL) –standardized fourth‐generation language found in most DBMSs

• Performs the same task as a QBE tool– But uses a sentence structure instead of point‐and‐click interface

• SQL is used mostly by IT people

3‐119

Application Generation Subsystem

• Application generation subsystem – contains facilities to help you develop transaction‐intensive applications– Data entry screen (called forms)– Programming languages

• Used mostly by IT specialists

3‐120

Data Administration Subsystem

• Data administration subsystem – helps you manage the overall database environment– Backup and recovery– Security management– Query optimization– Concurrency control– Change management

3‐121


• Backup and recovery– Periodically back up information– Recover a database if a failure occurs

• Security management– Who has access to what information– Who can perform certain tasks (e.g., add, change, or delete) on information

3‐122


• Query optimization– Restructure physical view of information to optimize response times to queries

• Concurrency control– What happens if two people makes changes to the same information at the same time?

3‐123


• Change management– What is the effect of structural changes to a database?

– What if you add a new column?– What happens if you delete a column?– What happens if you change a column’s attributes?

3‐124

DATA WAREHOUSES AND DATA MINING

• Data warehouses support OLAP and decision making

• Data warehouses do not support OLTP• Data‐mining tools are the tools you use to work with a data warehouse– DBMS software = database– Data‐mining tools = data warehouse

3‐125

What Is a Data Warehouse?

• Data warehouse – logical collection of information – gathered from operational databases – used to create business intelligence that supports business analysis activities and decision‐making tasks

Components of a Data Warehouse

9‐126 Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall

3‐127

Data Warehouse Summary

• Multidimensional• Rows and columns• Also layers• Many times called hypercubes

The view that is showing is product versus region. If you rotate the cube 90 degrees, the face that will show is product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. Other views are possible.

MULTIDIMENSIONAL DATA MODEL

TheDatabaseApproachtoDataManagement

3‐129

Functions

• Online transaction processing (OLTP) – the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information– Databases support OLTP– Operational database – databases that support OLTP

3‐130

Functions

• Online analytical processing (OLAP) – the manipulation of information to support decision making– Databases can support some OLAP– Data warehouses only support OLAP, not OLTP– Why?– Data warehouses are special forms of databases that support decision making

Online analytical processing (OLAP)

Supports multidimensional data analysisViewing data using multiple dimensionsEach aspect of information (product, pricing, cost, region, time period) is different dimension

E.g., how many washers sold in the East in June compared with other regions?

OLAP enables rapid, online answers to ad hoc queries

Data marts: Subset of data warehouseSummarized or highly focused portion of firm’s data for use by specific population of usersTypically focuses on single subject or line of business

Data warehouse: Stores current and historical data from many core operational transaction systemsConsolidates and standardizes information for use across enterprise, but data cannot be alteredData warehouse system will provide query, analysis, and reporting tools

3‐133

Data Marts• Data warehouses can support all of an organization’s information

• Data marts have subsets of an organizationwide data warehouse

• Data mart – subset of a data warehouse in which only a focused portion of the data warehouse information is kept

Components of a Data Mart


3‐135

Data Marts

Business Intelligence(BI):

Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions

E.g., Harrah’s Entertainment analyzes customers to develop gambling profiles and identify most profitable customersPrinciple tools include:

Software for database query and reportingOnline analytical processing (OLAP)Data mining

Object in Business Information Systems

More definition

Data mining:More discovery driven than OLAPFinds hidden patterns, relationships in large databases and infers rules to predict future behaviorE.g., Finding patterns in customer data for one‐to‐one marketing campaigns or to identify profitable customers.

Types of information obtainable from data mining

Associations, Sequences, ClassificationClustering, Forecasting

UsingDatabasestoImproveBusinessPerformanceandDecisionMaking

Predictive analysis in Data Mining;

Uses data mining techniques, historical data, and assumptions about future conditions to predict outcomes of events

E.g., Probability a customer will respond to an offer

3‐142

Information Vs. Intelligence

3‐143

What Are Data‐Mining Tools?

• Data‐mining tools – software tools that you use to query information in a data warehouse– Query‐and‐reporting tools– Intelligence agents– Multidimensional analysis tools– Statistical tools

3‐144

What Are Data‐Mining Tools?

Converging Disciplines


3‐146

Query‐And‐Reporting Tools

• Query‐and‐reporting tools – similar to QBE tools, SQL, and report generators in the typical database environment

3‐147

Intelligent Agents

• Use various artificial intelligence tools such as neural networks and fuzzy logic to form the basis for “information discovery” and building business intelligence

• Help you find hidden patterns in information

3‐148

Multidimensional Analysis Tools

• Multidimensional analysis (MDA) tools –slice‐and‐dice techniques that allow you to view multidimensional information from different perspectives– Bring new layers to the front– Reorganize rows and columns

3‐149

Statistical Tools

• Help you apply various mathematical models to the information stored in a data warehouse to discover new information– Regression– Analysis of variance– And so on

© Gabriele Piccoli

Enterprise Application Integration

• “Re‐architecting” existing programs so that an intermediate layer, termed middleware, is developed between the applications and the databases

• Designed to make calls to the middleware layer rather than the other applications

• Streamlines maintenance process because changes to an application will not affect all the interfaces connected to it

Meta Data Operations

© Gabriele Piccoli

The EAI Approach

ERP

Legacy Application

Database 1

Middlew

are

Database 2

Legacy Application

SCM

SCM: Supply Chain ManagementERP: Enterprise Resource Planning

DSS Characteristics and Capabilities

DSS ComponentsData Management Subsystem

• DSS database • DBMS • Data directory • Query facility

A Web‐Based DSS Architecture

Expert Systems vs. DSS

Expert System• Inject expert knowledge in

to a computer system.• Automate decision making.• The decision environments

have structure• The alternatives and goals

are often established in advance.

• The expert system can eventually replace the human decision maker.

Decision Support System• Extract or gain knowledge

from a computer system• Facilitates decision making• Unstructured environment• Alternatives may not be

fully realized yet• Use goals and the system

data to establish alternatives and outcomes, so a good decision can be made

Artificial Intelligence and Decision Support Systemin Bussiness

are attached in Appendix

Webs, Documents areData Where House Too

What challenges does the increase in unstructured data present for businesses?How does text‐mining improve decision‐making?What kinds of companies are most likely to benefit from text mining software?

In what ways could text mining potentially lead to the erosion of personal information privacy?

WHAT CAN BUSINESSES LEARN FROM TEXT MINING? Text mining

Extracts key elements from large unstructured data sets (e.g., stored e‐mails)

Web miningDiscovery and analysis of useful patterns and information from WWW

E.g., to understand customer behavior, evaluate effectiveness of Web site, etc.

Web content miningKnowledge extracted from content of Web pages

Web structure miningE.g., links to and from Web page

Web usage miningUser interaction data recorded by Web server

Databases and the WebMany companies use Web to make some internal databases available to customers or partnersTypical configuration includes:

Web serverApplication server/middleware/CGI scriptsDatabase server (hosting DBM)

Advantages of using Web for database access:Ease of use of browser softwareWeb interface requires few or no changes to databaseInexpensive to add Web interface to system

Firms use the Web to make information from theirinternal databases available to customers and partners

• Middleware and other software make this possible

• Database servers• CGI(Computer Gateway Interface)• Web interfaces provide familiarity to users andsavings over redesigning and rebuilding legacysystems

Users access an organization’s internal database through the Web using their desktop PCs and Web browser software.

LINKING INTERNAL DATABASES TO THE WEB

Establishing an information policyFirm’s rules, procedures, roles for sharing, managing, standardizing dataData administration:

Firm function responsible for specific policies and procedures to manage data

Data governance: Policies and processes for managing availability, usability, integrity, and security of enterprise data, especially as it relates to government regulations

Database administration:Defining, organizing, implementing, maintaining database; performed by database design and management group

ManagingDataResources

Basic : True Data

Good: Many(File, Record)

Better : Organized(Database, Data Where house)

Best : Analysis, Intelligence( Data mining, Intelligence)

Nature and Quality of Data

3‐169

MANAGING THE INFORMATION RESOURCE

• Information is an organizational resource• Just like people, capital, and equipment• It must be managed effectively based on True data and Systems

3‐170


• Who should oversee your organization’s information resource?– Chief information officer (CIO) – oversees an organization’s information resource

– Data administration – plans for, oversees the development of, and monitors the information resource

– Database administration – technical and operational aspects of managing information

3‐171


• Is information ownership a consideration?– If you create information, you “own” it– You will also share it with others– Because you “own” it, you are responsible for its quality

3‐172


• How “clean” must your information be?– Duplicate information (records) must be eliminated

– Inaccurate information must be corrected– Information forms the basis of business intelligence

– If your business intelligence is bad, you will make poor decisions

Ensuring data qualityMore than 25% of critical data in Fortune 1000 company databases are inaccurate or incompleteMost data quality problems stem from faulty input

Before new database in place, need to:Identify and correct faulty data Establish better routines for editing data once database in operation

Data quality audit:Structured survey of the accuracy and level of completeness of the data in an information system

Survey samples from data files, orSurvey end users for perceptions of quality

Data cleansingSoftware to detect and correct data that are incorrect, incomplete, improperly formatted, or redundantEnforces consistency among different sets of data from separate information systems

Assess the business impact of credit bureaus’ data quality problems for the credit bureaus, for lenders, for individuals.

Are any ethical issues raised by credit bureaus’ data quality problems?

Analyze the people, organization, and technology factors responsible for credit bureaus’ data quality problems.

What can be done to solve these problems?

CREDIT BUREAU ERRORS—BIG PEOPLE PROBLEMS

3‐176

Data Mining as a Career Opportunity

• Knowledge of data mining can be a substantial career opportunity for you– Query and Analysis and Enterprise Analytic Tools (Business Objects)

– Business Intelligence and Information Access tools (SAS)

– Many in Cognos (the data warehouse leader)– PowerAnalyzer (Informatica)

SAS: System Analysis Scientist

Describe how a relational database organizesdata and compare its benefits

Identify and describe the principles of a databasemanagement system.

Evaluate tools and technologies for providinginformation from databases to improve businessperformance and decision making.

Review ?

3‐178

CAN YOU…

Describe business intelligence and its roleCompare databases and data warehouses by

OLTP and OLAPDefine 5 software components of a DBMS

3‐179

CAN YOU…

List/describe key characteristics of a data warehouse

Define 4 major types of data‐mining toolsList key considerations in managing information

as a resource

DSS: Decision Support Systems and

AI: Artificial IntelligenceIn Business

Appendix for business Intelligence

AI in BusinessSome Commercial Applications• Decision Support• Expert Systems• Information Retrieval• Virtual Reality• Robotics

I’m ready to do some business

Overview of AI• Goal of AI

– develop computer systems that exhibit intelligence or simulate the ability to think

• AI pioneered by Computer Science• But, AI involves a combination of

– Computer Science, Biology, Psychology, Linguistics, Mathematics,Engineering

What really is Intelligence?

• Specifically, what are the signs of Intelligent Behavior?

• Think about it for a while

Which of the following is the best example of intelligent behavior?

101 2 3 4

25% 25%25%25%

1. Ability to add numbers2. Ability to see and recognize

objects3. Ability to adapt to

surroundings4. Ability to learn for mistakes

What really is Intelligence?• You are about to start an online chat (IM) with two entities:– One entity is a human– The other is a computer

• After hours of conversation, you can not tell which entity is a computer.

• Does this mean the computer is Intelligent?

Intelligent Behavior• What are some of the signs, attributes, or characteristics of Intelligent Behavior

Characteristics of Intelligent Behavior

1. Learn from experience & apply the knowledge

Computer can automatically improve performance based on Experience Machine Learning Computational Learning


2. Handle complex situations Computer Systems can often handle

complexity better than humans Consider a process control system that

must simultaneous track 100 different system variables.


3. Solve problems when important information is missing

Computer Systems can find patterns and deal with all sorts of missing information


4. React quickly & correctly to new situations; Acquire & Apply Knowledge

Here is where computers start to fail. Adapting to completely new situations is a

problem for computer systems. Its very difficult to design a computer system

that can combine, connect, and acquire knowledge to solve completely new problems


5. Determine what is important.6. Exhibit creativity and imagination7. Process visual information efficiently8. Use reason to solve problems

These are some other Characteristics that humans possess.

Computer systems have a lot of catching up to do.

Which of the following do computer need to catch up on?

101 2 3 4

25% 25%25%25%

1. Determine what is important.2. Exhibit creativity and

imagination3. Process visual information

efficiently4. Use reason to solve problems

AI in Business• AI continues to improve and evolve.• Scientists and Engineers are pushing the envelope of what is possible.

• In Business, there is a better understanding of the capabilities of Intelligent Computer Systems

• It is important to know which types of problems are suited for humans, and which are suited for Computers.

Human Intelligence vs. AI

Attribute HumanIntelligence

ArtificialIntelligence

Use a variety of information sources

High High

Ability to acquire large amounts of external info.

Medium High

Ability to do rapid, accurate, and complex calculations

Low High

Ability to transfer information rapidly

Low High

Human Intelligence vs. AI

Attribute HumanIntelligence

ArtificialIntelligence

Ability to use sensors or senses High Medium

Creativity or imagination High Low

Ability to learn from experience High Medium

Ability of be adaptive High Medium

AI: Application Domains

AI: Commercial Domains• Decision Support

– Integrating the advantages of AI with Human Intelligence.

– More intelligent Interfaces– More intelligent processing for massive data

AI: Commercial Domains• Information Retrieval

– Automatic simplification for massive data– Natural language technology: computer can speak our language.

AI: Commercial Domains• Virtual Reality

– Better training environment from pilots to doctors

• Robotics– Bringing the precision and speed of computers into the physical world

– Goes beyond manufacturing and assembly lines; Baggage Inspection, Bomb Removal, Replacement Limbs.

Expert Systems• The idea is to inject expert knowledge in to a computer system.

• The primary purpose is to automate decision making.

• The decision environments have structure• The alternatives and goals are often established in advance.

Expert Systems vs. DSSExpert System• Inject expert knowledge in

to a computer system.• Automate decision making.• The decision environments

have structure• The alternatives and goals

are often established in advance.

• The expert system can eventually replace the human decision maker.

Decision Support System• Extract or gain knowledge

from a computer system• Facilitates decision making• Unstructured environment• Alternatives may not be

fully realized yet• Use goals and the system

data to establish alternatives and outcomes, so a good decision can be made

What is the biggest difference between a Decision Support System and an MIS

101 2 3 4

25% 25%25%25%

1. DSS’s are interactive and ad hoc

2. DSS’s focus on transforming information into knowledge

3. MIS’s focus on transforming data into information

4. All of the above

What is the biggest difference between an MIS and TPS

101 2 3 4

25% 25%25%25%

1. in a TPS there is no analysis

2. an MIS focuses on reports

3. an TPS focuses on updating a database

4. All of the above

How is the analysis different for a MIS vs. DSS

101 2 3 4

25% 25%25%25%

1. MIS: Analysis involves computing aggregates

2. MIS: Analysis involves creating useful charts and graphs

3. DSS: Connects information with decisions

4. DSS: Builds scenarios

Some Interesting Applications of Expert Systems

• Triage – Medical Diagnosis (Medical Expert System)– User enters symptoms– System makes diagnosis– Doctors collective expertise is captured in the system

• Patriot Missile Guidance System– Radar identifies Scud missile– System steers Patriot missile to it intercepts Scud missile– Laws of physics, expert knowledge about missile trajectory is captured in the system

• Financial Decision Making – Currency Trading

Expert System Categories

• Decision Making– buy/sell– risk/no risk– rain/ no rain

• Trouble Shooting / Diagnosis– Hello welcome to Dell; how can I help you?

– Suddenly an idiot seems like an expert.

• Selection/Classification– Tell me what you see, expert system figures out what it really is...

• Process Monitoring and Control– Robot control, assembly‐line control, missile control

• Design/Configuration– Specify what you want, expert system figures out specifically how to do it.

Expert System Components

Knowledge base

user

Expert System Software

UserInterface

Engine


Knowledge base

user


UserInterface

Engine

Raw Data or Facts

Expert or Knowledge Engineer

Knowledge Acquisition Program

Expert System Development Process


Knowledge base

Non‐expert

Robot

Missile


Interface Engine

Raw Data or Facts

Expert or Knowledge Engineer

Knowledge Acquisition Program

Expert System Development Process

Expert System vs. DSS

Model Base

Analytical & Statistical Models

Someone with Knowledge

Decision Maker

DSS Software

UserInterface

Engine

Raw Data or Facts

Data Management Extraction, Generation, Validation, etc.

DSS Processes

Lecture 16 TIM 50 Autumn 2012 - Course Web Pages · Lecture 16 TIM 50 Autumn 2012 Tuesday November...

Documents

Transcript of Lecture 16 TIM 50 Autumn 2012 - Course Web Pages · Lecture 16 TIM 50 Autumn 2012 Tuesday November...