Transcript of chapter 5
Introduction to Information Technology (Turban, Rainer and Potter)
Chapter 5: Managing Organizational Data and Information
Learning Objectives
• Discuss traditional data file organization and its problems
• Explain how a database approach overcomes the problems associated with the traditional file environment, and discuss the advantages of the database approach
• Describe how the three most common data models organize data, and the advantages and disadvantages of each model
• Describe how a multidimensional data model organizes data
• Distinguish between a data warehouse and a data mart
• Discuss the similarities and differences between data mining and text mining
Chapter Overview
Basics of Data Arrangement and Access
• The Data Hierarchy
• Storing and Accessing Records
The Traditional File Environment
• Problems with the File Approach
Databases: The Modern Approach
• Locating Data in Databases
• Creating the Database
Database Management Systems
• Logical versus Physical View
• DBMS Components
Logical Data Models
• Hierarchical Model
• Network Model
• Relational Model
• Advantages and Disadvantages of the Three Models
• Emerging Models
• Other Models
Data Warehouse
• Multidimensional Model
• Data Marts
• Data Mining
• Text Mining
Case: FedEx Pinpoints Profitable Customers
The Problem
• Customers are classified as good, bad, or ugly by the cost of doing business with them and the profits they return
• Keep the good customers, improve the bad customers, and drop the ugly ones
• It is easy to identify customers who spend money with the company but difficult to identify customers who are profitable for it
Case (continued …)
The Solution
• Use a data warehouse, stocked with customer data, that allows the company to compare the complex mix of marketing and servicing costs that go into retaining each individual customer with the revenues that customer might bring in
The Results
• “Good” customers can expect a phone call if their shipping volumes falter, which can prevent defections before they occur
• “Bad” customers can be turned into profitable customers by charging higher shipping rates
• “Ugly” customers can be ignored
Case (continued …)
What have we learned from this case?
• Customized strategies can be developed to cut costs, transform the marginal customer into a profitable customer, and permit more profitable pricing structures
• Other types of data can give an organization important feedback about its products, services, markets, and coming trends
• Organizations can now scrutinize their customers (or other data) very carefully with advanced data management and analysis tools
Basics of Data Arrangement and Access
The Data Hierarchy
• Field - a logical grouping of characters into a word, a small group of words, or a complete number
• Record - a logical grouping of related fields
• File - a logical grouping of related records
• Database - a logical grouping of related files
• Entity - a person, place, thing, or event about which information is maintained
• Attribute - each characteristic or quality describing a particular entity
• Primary Key - a field that uniquely identifies the record
• Secondary Key - a field that has some identifying information, but typically does not identify the record with complete accuracy
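The hierarchy above can be sketched in a few lines of Python. This is only an illustration: the student entity, its field names, and the sample values are assumptions, not data from the chapter.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    """A record: a logical grouping of related fields about one entity."""
    student_id: int  # primary key: uniquely identifies the record
    name: str        # a field (an attribute of the student entity)
    major: str       # secondary key: identifies a group, not one record


# A file: a logical grouping of related records (hypothetical sample data).
student_file = [
    StudentRecord(1001, "Ana Ruiz", "Accounting"),
    StudentRecord(1002, "Ben Cho", "Marketing"),
    StudentRecord(1003, "Cy Dale", "Accounting"),
]

# A database: a logical grouping of related files.
database = {"students": student_file}


def find_by_primary_key(records, student_id):
    """Return the single record with this primary key, or None."""
    matches = [r for r in records if r.student_id == student_id]
    return matches[0] if matches else None


def find_by_secondary_key(records, major):
    """Return every record sharing this secondary key value."""
    return [r for r in records if r.major == major]


print(find_by_primary_key(database["students"], 1002))
print(find_by_secondary_key(database["students"], "Accounting"))
```

The primary-key lookup returns at most one record, while the secondary-key lookup may return several, which is the distinction the two key definitions draw.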
Basics of Data Arrangement and Access (continued …)
Storing and Accessing Records
• Indexed Sequential Access Method (ISAM)
» uses an index of key fields to locate individual records
» index - lists the key field of each record and where that record is physically located in storage
» track index - shows the highest value of the key field that can be found on a specific track
• Direct File Access Method
» uses the key field to locate the physical address of a record
» transform algorithm - translates the key field directly into the record’s storage location on disk
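A minimal sketch of the two access methods, under stated assumptions: storage tracks are simulated with Python lists, the sample keys and names are invented, and the transform algorithm is taken to be a simple division-remainder function (the slides do not say which transform is used).

```python
import bisect

# Simulated storage: each "track" holds records sorted by key field.
tracks = [
    [(101, "Adams"), (105, "Baker"), (110, "Chen")],
    [(120, "Diaz"), (131, "Evans"), (140, "Ford")],
    [(152, "Gray"), (160, "Hill"), (175, "Ito")],
]

# Track index: the highest key value that can be found on each track.
track_index = [track[-1][0] for track in tracks]


def isam_lookup(key):
    """Use the track index to choose a track, then scan only that track."""
    t = bisect.bisect_left(track_index, key)
    if t == len(tracks):
        return None  # key is larger than anything stored
    for k, value in tracks[t]:
        if k == key:
            return value
    return None


# Direct access: a transform algorithm maps the key straight to a slot.
NUM_SLOTS = 11  # assumed number of storage slots
slots = [[] for _ in range(NUM_SLOTS)]


def transform(key):
    """Assumed transform: remainder of the key divided by the slot count."""
    return key % NUM_SLOTS


def direct_store(key, value):
    slots[transform(key)].append((key, value))


def direct_lookup(key):
    """Go directly to the computed slot; colliding keys share a short list."""
    for k, value in slots[transform(key)]:
        if k == key:
            return value
    return None


for track in tracks:
    for k, v in track:
        direct_store(k, v)

print(isam_lookup(131), direct_lookup(160))
```

The ISAM lookup reads the index first and then one track; the direct lookup skips the index entirely and computes the location from the key.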
Traditional File Environment
• The organization has multiple applications with related data files
• Each application comes with an associated application-specific data file containing all the data records needed by the application
Traditional File Environment (continued …)
Problems with the file approach
• Data redundancy - the same piece of information could be duplicated in several places
• Data inconsistency - the various copies of the data no longer agree
• Data isolation - difficulty in accessing data from different applications
• Security - new applications may be added to the system on an ad hoc basis, which makes security difficult to enforce
• Data integrity - data values must often meet integrity constraints
• Application/data independence - the applications and data in computer systems should be independent
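To make redundancy and inconsistency concrete, here is a small sketch with two hypothetical application files that each keep their own copy of a student's address. The file names and data are invented for illustration.

```python
# Two applications, each with its own data file (hypothetical data).
# The address is stored twice: data redundancy.
registrar_file = {"S1001": {"name": "Ana Ruiz", "address": "12 Elm St"}}
accounting_file = {"S1001": {"name": "Ana Ruiz", "address": "12 Elm St"}}

# The student moves, but only the registrar's application is updated.
registrar_file["S1001"]["address"] = "48 Oak Ave"


def inconsistent_fields(file_a, file_b, key):
    """Return the shared fields whose copies no longer agree for one record."""
    a, b = file_a[key], file_b[key]
    return sorted(f for f in a.keys() & b.keys() if a[f] != b[f])


# The two copies now disagree: data inconsistency.
print(inconsistent_fields(registrar_file, accounting_file, "S1001"))
```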
Database: The Modern Approach
• The database management system provides access to all the data
• Example: University administration
[Figure: the Registrar's Office (class programs), the Accounting Dept. (accounts programs), and the Athletics Dept. (sports programs) all reach the data through the database management system. Data shown: academic info, team data, employee data, tuition data, financial aid, student data, course data, registration data.]
Database: The Modern Approach (continued …)
Locating Data in Databases
• Centralized database
» all the related files are in one physical location
» used on large, mainframe computers
» saves the expenses associated with multiple computers
» provides database administrators with the ability to work on a database as a whole at one location
» files are not accessible except via the centralized host computer
» recovery from disasters can be more easily accomplished at a central location
» vulnerable to a single point of failure
» speed problems for users far from the central site
Database: The Modern Approach (continued …)
Locating Data in Databases (cont’)
• Distributed database
» complete copies of a database, or portions of a database, are in more than one location, which is usually close to the user
» replicated database - complete copies of the entire database are delivered to many locations, primarily to alleviate the single-point-of-failure problems of a centralized database as well as to increase user access responsiveness
» partitioned database - the database is subdivided, with a portion of the entire database in each location
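The difference between replicated and partitioned databases can be sketched as follows. The customer rows, site names, and the choice to partition by city are assumptions made for the example.

```python
import copy

# Hypothetical customer data held by the organization.
customers = [
    {"id": 1, "city": "New York", "name": "Acme"},
    {"id": 2, "city": "Chicago", "name": "Birch"},
    {"id": 3, "city": "Los Angeles", "name": "Cedar"},
    {"id": 4, "city": "Chicago", "name": "Dune"},
]
sites = ["New York", "Chicago", "Los Angeles"]

# Replicated database: every site receives a complete copy.
replicated = {site: copy.deepcopy(customers) for site in sites}

# Partitioned database: each site holds only its own portion
# (partitioning by city is an assumption for this sketch).
partitioned = {
    site: [c for c in customers if c["city"] == site] for site in sites
}

for site in sites:
    print(site, len(replicated[site]), len(partitioned[site]))
```

Replication keeps every site usable if another site fails but multiplies the copies that must be kept in agreement; partitioning stores each row once, close to the users assumed to need it.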
Centralized vs. Distributed Databases
[Figure: Centralized database: a central location in New York, with users in New York and Los Angeles. Distributed database: locations in New York, Chicago, Kansas City, and Los Angeles, with users in each of those cities.]
Database: The Modern Approach (continued …)
Creating a Database
• Conceptual design - an abstract model of the database from the user or business perspective
• Physical design - shows the way a database is actually arranged on storage devices
• Entity-relationship (ER) modeling
» the process of planning the database design
» ER diagram - a document of the conceptual data model
» entity classes, instances, identifiers, relationships
• Normalization
» a method for analyzing and reducing a relational database to its most streamlined form for minimum redundancy, maximum data integrity, and best processing performance
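As a rough illustration of what normalization removes, the sketch below splits a flat order list, in which customer details repeat on every row, into a customer table and an order table. The data and column names are hypothetical, and this shows only the basic idea rather than a formal normal-form procedure.

```python
# Unnormalized: customer details are repeated on every order row.
orders_flat = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Acme",
     "city": "Boston", "item": "Plates"},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Acme",
     "city": "Boston", "item": "Bowls"},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Birch",
     "city": "Denver", "item": "Plates"},
]

customers = {}  # one row per customer, keyed by customer_id
orders = []     # one row per order, referring to the customer by key only

for row in orders_flat:
    customers[row["customer_id"]] = {
        "customer_name": row["customer_name"],
        "city": row["city"],
    }
    orders.append({
        "order_id": row["order_id"],
        "customer_id": row["customer_id"],
        "item": row["item"],
    })

print(customers)
print(orders)
```

After the split, a change of city is made in one place, so the copies cannot drift apart.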
Database Management Systems
• A software program (or group of programs) that provides access to a database
• Permits an organization to store data in one location, from which it can be updated and retrieved
• Provides access to the stored data by various application programs
• Provides mechanisms for maintaining the integrity of stored information, managing security and user access, recovering information when the system fails, and accessing various database functions from within an application written in a third-generation, fourth-generation, or object-oriented language
DBMS (continued …)
Logical versus Physical View
• Physical view - deals with the actual, physical arrangement and location of data in the direct access storage devices (DASD)
• Logical view - represents data in a format that is meaningful to a user and to the software programs that process that data
DBMS (continued …)
DBMS Components
• Data model
» defines the way data are conceptually structured
• Data definition language (DDL)
» defines what types of information are in the database and how they will be structured
» functions of the DDL
> provide a means for associating related data
> indicate the unique identifiers (or keys) of the records
> set up security access and change restrictions
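The DDL functions listed above can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the department and employee tables are invented, and SQLite is used only because it ships with Python. SQLite has no user-level access control, so the security function of a DDL is not shown; the CHECK and NOT NULL clauses illustrate change restrictions only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# DDL: define what information is stored and how it is structured.
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,       -- unique identifier (key)
        name    TEXT NOT NULL UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,       -- unique identifier (key)
        name    TEXT NOT NULL,
        age     INTEGER CHECK (age >= 16), -- a change restriction
        dept_id INTEGER NOT NULL
                REFERENCES department(dept_id)  -- associates related data
    )
""")

# List the tables the DDL statements created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

With the foreign key in place, the DBMS itself rejects an employee row that points at a department that does not exist.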
DBMS (continued …)
DBMS Components (cont’)
• Data manipulation language (DML)
» used with third-generation, fourth-generation, or object-oriented languages to query the contents of the database, store or update information in the database, and develop database applications
» Structured query language (SQL) - the most popular relational database language, combining both DML and DDL features
• Data dictionary
» stores definitions of data elements and data characteristics
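A short sketch of DML statements issued from a host language, again using Python's built-in sqlite3 module. The product table and its rows are hypothetical. The last step reads column definitions back with SQLite's `PRAGMA table_info`, which serves here as a rough stand-in for a data dictionary lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT, price REAL)")

# DML: store information in the database.
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("P1", "Plates", 12.0), ("B1", "Bowls", 9.5)],
)

# DML: update information in the database.
conn.execute("UPDATE product SET price = ? WHERE sku = ?", (10.0, "B1"))

# DML: query the contents of the database.
cheap = conn.execute(
    "SELECT name FROM product WHERE price <= 10 ORDER BY name").fetchall()
print(cheap)

# Data-dictionary-style lookup: names and types of the stored data elements.
columns = [(row[1], row[2])
           for row in conn.execute("PRAGMA table_info(product)")]
print(columns)
```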
Logical Data Models
• A manager’s ability to use a database is highly dependent on how the database is structured logically and physically.
• In logically structuring a database, businesses need to consider the characteristics of the data and how the data will be accessed.
• Three common data models: hierarchical, network, and relational
• Using these models, database designers can build a logical or conceptual view of data that can then be physically implemented into virtually any database with any DBMS.
Logical Data Models (continued …)
Hierarchical Database Model
• rigidly structures data into an inverted “tree” in which each record contains two elements
» 1st: a single root or master field, often called a key, which identifies the type, location, or ordering of the records
» 2nd: a variable number of subordinate fields, which define the rest of the data within a record
• all fields have only one “parent”; each parent may have many “children”
• advantage: speed and efficiency
• problem: access to data is predefined before the programs that use it are written, and each relationship must be explicitly defined when the database is created
Hierarchical Data Model
[Figure: a tree rooted at Sales. Region level: East Coast, Midwest, West Coast. Product Category level, repeated under each region: China, Stemware, Flatware. Product level: Plates, Bowls.]
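The sales tree can be approximated with nested dictionaries. This sketch assumes that Plates and Bowls sit under the China category in every region, which the flattened figure suggests but does not state outright.

```python
# Assumption: Plates and Bowls belong to the China category in each region.
CATEGORIES = {"China": ["Plates", "Bowls"], "Stemware": [], "Flatware": []}
REGIONS = ["East Coast", "Midwest", "West Coast"]

# Root (Sales) -> region -> product category -> products.
sales = {
    region: {cat: list(products) for cat, products in CATEGORIES.items()}
    for region in REGIONS
}


def products_in(region, category):
    """Fast case: follow the predefined path from the root downward."""
    return sales[region][category]


def regions_selling(product):
    """Slow case: no path leads upward from a product, so walk the tree."""
    return [
        region
        for region, categories in sales.items()
        if any(product in products for products in categories.values())
    ]


print(products_in("Midwest", "China"))
print(regions_selling("Bowls"))
```

Queries that follow the parent-to-child path are direct lookups; any question that cuts across the hierarchy requires a full traversal, which is the inflexibility the model is criticized for.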
Logical Data Models (continued …)
Network Database Model
• creates relationships among data through a linked-list structure in which subordinate records (members) can be linked to more than one data element (owner)
• pointer - an explicit link: a storage address that contains the location of a related record
• many-to-many relationships are possible
• complexity: for every set of linked data elements, a pair of pointers must be maintained
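A loose sketch of owner and member records linked by pointers. Python object references stand in for storage addresses, and the region and product names are borrowed from the hierarchical example purely for illustration.

```python
class Record:
    """A record that can be both an owner and a member."""

    def __init__(self, name):
        self.name = name
        self.owners = []   # pointers to the records that own this one
        self.members = []  # pointers to the records this one owns


def link(owner, member):
    """Maintain the pair of pointers that each link requires."""
    owner.members.append(member)
    member.owners.append(owner)


east = Record("East Coast")
west = Record("West Coast")
plates = Record("Plates")
bowls = Record("Bowls")

# A member can have more than one owner, and an owner many members,
# so many-to-many relationships are possible.
link(east, plates)
link(west, plates)
link(east, bowls)

print([o.name for o in plates.owners])
print([m.name for m in east.members])
```

Every `link` call updates two lists, which mirrors the maintenance cost noted in the last bullet.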
Logical Data Models (continued …)
Relational Database Model
• based on a simple concept of tables in order to capitalize on characteristics of rows and columns of data
• relation - table
• tuple - row
• attribute - column
• select operation - creates a subset consisting of all records in the file that meet stated criteria
• join operation - combines relational tables to provide the user with more information than is available in individual tables
• project operation - creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required
Relational Database Model
Smith, A.     Dir. Accounting                  43   China
Jones, W.     Dir. Total Quality Management    32   Stemware
Lee, J.       Dir. Information Technology      46   China
Durham, K.    Manager, Production              35   Stemware
Stone, L.     Administrative Asst.             28   Flatware
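The select, project, and join operations can be run against the rows of this table with Python's built-in sqlite3 module. The column names (name, title, age, division) are assumed labels, since the slide's column headers did not survive, and the second table of plant locations is entirely hypothetical, added only so that there is something to join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are assumptions; the slide shows the values only.
conn.execute(
    "CREATE TABLE employee (name TEXT, title TEXT, age INTEGER, division TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    ("Smith, A.", "Dir. Accounting", 43, "China"),
    ("Jones, W.", "Dir. Total Quality Management", 32, "Stemware"),
    ("Lee, J.", "Dir. Information Technology", 46, "China"),
    ("Durham, K.", "Manager, Production", 35, "Stemware"),
    ("Stone, L.", "Administrative Asst.", 28, "Flatware"),
])

# Hypothetical second table, used only to demonstrate a join.
conn.execute("CREATE TABLE division (division TEXT, location TEXT)")
conn.executemany("INSERT INTO division VALUES (?, ?)", [
    ("China", "Plant A"), ("Stemware", "Plant B"), ("Flatware", "Plant C"),
])

# Select: the rows that meet stated criteria.
selected = conn.execute(
    "SELECT * FROM employee WHERE age > 40 ORDER BY name").fetchall()

# Project: only the columns required.
projected = conn.execute(
    "SELECT name, age FROM employee ORDER BY age").fetchall()

# Join: combine tables to give more information than either holds alone.
joined = conn.execute("""
    SELECT e.name, d.location
    FROM employee AS e
    JOIN division AS d ON e.division = d.division
    WHERE e.division = 'Flatware'
""").fetchall()

print(selected)
print(projected)
print(joined)
```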
Company Data Models

MODEL: Hierarchical database
ADVANTAGES: Speed and efficiency in search.
DISADVANTAGES: Access to data is predefined by exclusively hierarchical relationships, predetermined by the administrator. Limited search/query flexibility. Not all data is naturally hierarchical.

MODEL: Network database
ADVANTAGES: Many more relationships between data elements can be defined. Greater speed and efficiency than relational database models.
DISADVANTAGES: The most complicated model to design, implement, and maintain. Greater query flexibility than the hierarchical model, but less than the relational model.

MODEL: Relational database
ADVANTAGES: Conceptual simplicity; no predefined relationships among data. High flexibility in ad hoc querying. New data and records can be added easily.
DISADVANTAGES: Lower processing efficiency and speed. Data redundancy is common, requiring additional maintenance.
Logical Data Models (continued …)
Emerging Data Models
• Object-oriented database model - built on objects, each a small amount of data put together with everything needed to perform an operation with that data
» Object - similar to an entity in that it represents a person, place, or thing, but it also contains all of the data that the object needs in order to perform an operation
» Attributes - characteristics that describe the state of that object
» Method - an operation, action, or behavior the object may undergo
» Messages - sent from other objects to activate operations contained within the object
» Class - all the messages to which the object will respond, as well as the way in which objects of this class are implemented
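The object vocabulary above maps directly onto a class in an object-oriented language. The account example below is hypothetical and shows the terms only; it is not an object-oriented database.

```python
class Account:
    """Class: defines the messages its objects respond to and how they
    are implemented."""

    def __init__(self, owner, balance=0.0):
        # Attributes: describe the state of the object.
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        """Method: an operation the object may undergo."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount
        return self.balance


# Object: the data together with the operations that act on it.
acct = Account("Lee, J.", 100.0)

# Message: another part of the program asks the object to run a method.
new_balance = acct.deposit(25.0)
print(new_balance)
```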
Logical Data Models (continued …)
Emerging Data Models (cont’)
• Object-relational database model - adds new object storage capabilities to relational database management systems
• Hypermedia database model - stores chunks of information in the form of nodes connected by links established by the user
Other Database Models
• Geographical information database - contains locational data for overlaying on maps or images
• Knowledge database - stores decision rules used to evaluate situations and help users make decisions like an expert
• Multimedia database - stores data on many media: sounds, video, images, graphics, animation, and text
Data Warehouses
A data warehouse is a relational and/or multidimensional database management system designed to support management decision making.
The data in the “warehouse” is stored in a single, agreed-upon format even when underlying operational databases store the data differently.
Data Warehouses: Framework and View
[Figure: Sources: operational systems/data (legacy, OLTP, external). Preparation: select, extract, transform, integrate, maintain. Storage: enterprise data warehouse with metadata repository; target database(s) (RDB, MDDB); data marts (marketing, risk management, engineering). Connection: APIs, middleware. Access: data mining, EIS/DSS applications, custom-built applications (4GL tools), production reporting tools, relational query tools, OLAP/ROLAP, Web browsers.]
Data Warehouses (continued ...)
Data Warehouse Offers Many Business Advantages
• It provides business users with a “customer-centric” view of the company’s heterogeneous data by helping to integrate data from sales, service, manufacturing and distribution, and other customer-related business systems.
• It provides added value to the company’s customers by allowing them to access better information when the data warehouse is coupled with Internet technology.
• It consolidates data about individual customers and provides a repository of all customer contacts for segmentation modeling, customer retention planning, and cross-sales analysis.
Data Warehouses (continued ...)
Data Warehouse Advantages (cont’)
It removes barriers among functional areas by offering a way to reconcile views from multiple sources, thus providing a look at activities that cross functional lines.
It reports on trends across multidivisional and/or multinational operating units, including trends or relationships in areas such as merchandising, production planning, and so forth.
Data Warehouses (continued ...)
Multidimensional Database Model
• can be the core of data warehouses
• data are stored in arrays
• consists of at least three dimensions
• dimensions are the edges of the cube, and represent the primary “views” of the business data
• the data are intimately related and can be viewed and analyzed from different perspectives, which are called dimensions
• allows for the effective, efficient, and convenient storage and retrieval of large volumes of data
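A three-dimensional cube can be sketched with a dictionary keyed by one coordinate per dimension. The dimensions (product, region, quarter) and the sales figures are assumptions, and the dictionary is a sparse stand-in for the arrays a real multidimensional database would use.

```python
from collections import defaultdict

# Assumed dimensions: the three edges of the cube.
DIMENSIONS = ("product", "region", "quarter")

# Each cell holds a sales figure at one (product, region, quarter) position.
cube = {
    ("China", "East Coast", "Q1"): 120,
    ("China", "Midwest", "Q1"): 80,
    ("Stemware", "East Coast", "Q1"): 60,
    ("China", "East Coast", "Q2"): 150,
    ("Flatware", "West Coast", "Q2"): 40,
}


def roll_up(cube, dimension):
    """View the data from one perspective: total sales per member of a
    dimension, summing over the other two."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for coords, value in cube.items():
        totals[coords[axis]] += value
    return dict(totals)


def slice_cube(cube, dimension, member):
    """Keep only the cells at one position along a dimension."""
    axis = DIMENSIONS.index(dimension)
    return {c: v for c, v in cube.items() if c[axis] == member}


print(roll_up(cube, "product"))
print(roll_up(cube, "region"))
print(slice_cube(cube, "quarter", "Q2"))
```

The same cells answer questions by product, by region, or by quarter without restructuring the data, which is what viewing the data "from different perspectives" amounts to.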
Data Warehouses (continued ...)
Data Marts
• a scaled-down version of a data warehouse that focuses on a particular subject area
• usually designed to support the unique business requirements of a specific department or business process (example: a marketing data mart)
• takes less time to build, costs less, and is less complex
• the indiscriminate introduction of multiple data marts with no linkage to each other, or to an enterprise data warehouse, will cause problems
Data Warehouses (continued ...)
Data Mining
• provides a means of extracting previously unknown, predictive information from the base of accessible data in data warehouses
• discovers hidden patterns, correlations, and relationships among organizational data
• predicts future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions
• functions of data mining
» classification
» clustering
» association
» sequencing
» forecasting
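Of the functions listed, association is the easiest to show in a few lines. The sketch below counts how often pairs of items appear together in a set of made-up purchase transactions and keeps the pairs whose support (share of transactions containing both items) meets an assumed threshold. It is a toy pair count, not a full association-rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions.
transactions = [
    {"plates", "bowls", "forks"},
    {"plates", "bowls"},
    {"plates", "glasses"},
    {"bowls", "forks"},
    {"plates", "bowls", "glasses"},
]

# Count every pair of items bought together in the same transaction.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1


def support(pair):
    """Fraction of all transactions that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)


MIN_SUPPORT = 0.5  # assumed threshold
frequent_pairs = sorted(p for p in pair_counts if support(p) >= MIN_SUPPORT)

print(frequent_pairs)
print(support(("plates", "bowls")))
```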
What’s in IT for Me?
For Accounting
• Data gathered about each transaction (business event) in the organization is stored in its databases
For Finance
• Computerized databases external to the organization, such as CompuStat or Dow Jones, provide financial data on organizations in its industry
What’s in IT for Me? (continued …)
For Marketing
• Databases including customer name, address, purchase, amount, etc., help to plan targeted marketing campaigns and to evaluate the success of previous campaigns
• Data mining is critical for many marketing efforts to remain competitive
For Production/Operations Management
• Organizational databases are accessed to determine optimum inventory levels for parts in a production process
• Information in databases is used to know when to perform required service on machines
What’s in IT for Me? (continued …)
For Human Resources Management
• Organizational databases contain extensive data on employees, such as name, address, gender, race, age, salary, hiring date, current job descriptions, past job descriptions, and past performance evaluations
For MIS
• Job openings in MIS range from data entry and data storage management to database management and data analysis