Business Intelligence Introduction & Overview
Business Intelligence Session
![Page 1: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/1.jpg)
Business Intelligence
![Page 2: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/2.jpg)
What is Business Intelligence (BI)
• Business Intelligence (BI) refers to skills, processes, technologies, applications and practices used to support decision making.
• Systems that provide directed background data and reporting tools to support and improve the decision-making process.
• A popularized, umbrella term used to describe a set of concepts and methods to improve business decision making by using fact-based support systems. The term is sometimes used interchangeably with briefing books and executive information systems.
• Business Intelligence is a broad category of applications and technologies for gathering, storing, analyzing, and providing access to data to help clients make better business decisions.
• A system that collects, integrates, analyses and presents business information to support better business decision making.
• Business Intelligence is an environment in which business users receive information that is reliable, secure, consistent, understandable, easily manipulated and timely, facilitating more informed decision making.
![Page 3: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/3.jpg)
What is Business Intelligence (BI)
Improving organizations by providing business insights to all employees leading to better, faster, more relevant decisions
![Page 4: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/4.jpg)
What is Business Intelligence (BI)
Business Intelligence enables the business to make intelligent, fact-based decisions:
• Aggregate Data: Database, Data Mart, Data Warehouse, ETL Tools, Integration Tools
• Present Data: Reporting Tools, Dashboards, Static Reports, Mobile Reporting, OLAP Cubes
• Enrich Data: Add Context to Create Information, Descriptive Statistics, Benchmarks, Variance to Plan or LY (last year)
• Inform a Decision: Decisions are Fact-based and Data-driven
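As a rough illustration of the "Variance to Plan or LY" idea, the sketch below turns a raw sales figure into information by comparing it with a plan and with last year. The function name and all figures are invented for this example and do not come from the slides.

```python
def variance(actual: float, baseline: float) -> dict:
    """Return the absolute and percentage variance of actual against a baseline."""
    diff = actual - baseline
    # Percentage is undefined when the baseline is zero, so report None.
    pct = diff / baseline * 100 if baseline else None
    return {"abs": diff, "pct": pct}


# Hypothetical monthly sales figures, invented for illustration only.
sales = {"actual": 120.0, "plan": 100.0, "last_year": 96.0}

vs_plan = variance(sales["actual"], sales["plan"])
vs_last_year = variance(sales["actual"], sales["last_year"])
```

On its own, "120" is just data; "20% above plan and 25% above last year" is the kind of context a BI layer would add before presenting the number.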
![Page 5: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/5.jpg)
CPU – Content, Performance, Usability
Content
• The business determines the “what”; BI enables the “how”
Performance
• Minimize report creation and collection times (near zero)
Usability
• Delivery Method: Push vs. Pull
• Medium: Excel, PDF, Dashboard, Cube, Mobile Device
• Enhance Digestion: Fewer clicks
• Tell a Story: Trend, Context, Related Metrics, Multiple Views
![Page 6: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/6.jpg)
Why is Business Intelligence So Important?
Time
With Business Intelligence, we can get data to you in a timely manner.
Making Business Decisions is a Balance: Data vs. Opinion (aka Best Professional Judgment)
In the absence of data, business decisions are often made by the HiPPO (the highest-paid person's opinion).
![Page 7: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/7.jpg)
Core Capabilities of BI
![Page 8: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/8.jpg)
Why do companies need BI?
Capabilities are ranked by Sophistication of Intelligence and by the Competitive Advantage they deliver, from Tactical / Strategic BI at the top to Operational BI at the bottom:
• Optimization: What’s the best that can happen?
• Predictive Modeling: What will happen next?
• Forecasting/extrapolation: What if these trends continue?
• Statistical analysis: Why is this happening?
• Alerts: What actions are needed?
• Query/drill down: Where exactly is the problem?
• Ad hoc reports: How many, how often, where?
• Standard reports: What happened?
![Page 9: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/9.jpg)
Benefits of Business Intelligence
• Improve Management Processes: planning, controlling, measuring and/or changing, resulting in increased revenues and reduced costs
• Improve Operational Processes: fraud detection, order processing, purchasing, resulting in increased revenues and reduced costs
• Predict the Future
![Page 10: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/10.jpg)
BI Golden Rules
• Data Quality & Accuracy
• Data Consistency
• Data Timeliness
“Get the right information to the right people at the right time”
![Page 11: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/11.jpg)
Gartner BI Maturity Model
![Page 12: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/12.jpg)
Business Intelligence
• Reporting
• Data Mining
• Knowledge Management
• Expert Systems
![Page 13: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/13.jpg)
Reporting Tools
• Software that reads data
• Processes data by: filtering, sorting, grouping, simple calculations
• Produces formatted summaries of data
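The four processing steps named for reporting tools can be sketched in a few lines. This is a toy under stated assumptions: the `orders` rows, the `sales_report` function and the column layout are all made up for illustration and do not represent any particular reporting product.

```python
from collections import defaultdict

# Hypothetical order rows, invented for illustration only.
orders = [
    {"region": "East", "product": "Cola", "amount": 40},
    {"region": "West", "product": "Juice", "amount": 25},
    {"region": "East", "product": "Juice", "amount": 10},
    {"region": "West", "product": "Cola", "amount": 5},
]


def sales_report(rows, min_amount=0):
    """Filter, group, total and sort rows, then format one line per region."""
    kept = [r for r in rows if r["amount"] >= min_amount]      # filtering
    totals = defaultdict(int)
    for r in kept:                                             # grouping
        totals[r["region"]] += r["amount"]                     # simple calculation
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # sorting
    return [f"{region:<6}{total:>6}" for region, total in ranked]        # formatted summary


report = sales_report(orders, min_amount=10)
```

Each entry in `report` is a fixed-width line, which stands in for the "formatted summaries" a real tool would render as a table or PDF.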
![Page 14: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/14.jpg)
OLAP: On-Line Analytical Processing
• Consolidation
• Drill-Down
• Slicing and Dicing
![Page 15: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/15.jpg)
Drill Down
![Page 16: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/16.jpg)
Slice and Dice
![Page 17: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/17.jpg)
OLAP Example
Report A
Report B
![Page 18: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/18.jpg)
Data Warehouse
• Large Database
• Subject-Oriented
• Integrated
• Time-Variant
• Nonvolatile
• User-Friendly Interface
![Page 19: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/19.jpg)
Data Warehouse System
• Sources: Operational DB, Other DB, External DB
• Data Warehouse
• Consumers: Reporting, Data Mining, KM (Knowledge Management), Expert Systems
![Page 20: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/20.jpg)
Data Marts
• Mini data warehouses
• Hold subsets of data from the data warehouse
• Data focuses on a specific aspect of the company
![Page 21: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/21.jpg)
Data Mining Tools
• Software that searches through data
• Uses complex statistical calculations
• Outputs: trends, patterns, correlations, exceptions
![Page 22: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/22.jpg)
Knowledge Management
• Process
• Creating Value from Intellectual Property
• Sharing Knowledge with Others
![Page 23: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/23.jpg)
Expert Systems
• Knowledge captured as rules.
![Page 24: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/24.jpg)
BI Challenges
• Poor data quality
• Context
• User resistance
![Page 25: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/25.jpg)
Database Management System (DBMS)
![Page 26: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/26.jpg)
Database Management System (DBMS)
A DBMS contains information about a particular enterprise:
• Collection of interrelated data
• Set of programs to access the data
• An environment that is both convenient and efficient to use
Database Applications:
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Online retailers: order tracking, customized recommendations
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions
Databases touch all aspects of our lives
![Page 27: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/27.jpg)
Purpose of Database Systems
In the early days, database applications were built directly on top of file systems.
Drawbacks of using file systems to store data:
• Data redundancy and inconsistency
  • Multiple file formats, duplication of information in different files
• Difficulty in accessing data
  • Need to write a new program to carry out each new task
• Data isolation: multiple files and formats
• Integrity problems
  • Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly
  • Hard to add new constraints or change existing ones
![Page 28: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/28.jpg)
Purpose of Database Systems (contd.)
Drawbacks of using file systems (cont.)
• Atomicity of updates
  • Failures may leave the database in an inconsistent state with partial updates carried out
  • Example: Transfer of funds from one account to another should either complete or not happen at all
• Concurrent access by multiple users
  • Concurrent access is needed for performance
  • Uncontrolled concurrent accesses can lead to inconsistencies
  • Example: Two people reading a balance and updating it at the same time
• Security problems
  • Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
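To make the atomicity and integrity points concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It states the slides' example constraint (account balance > 0) explicitly as a `CHECK` clause and wraps a funds transfer in a transaction. The table layout, the helper names and the starting balances are assumptions chosen for illustration.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with an explicit balance constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " account_number TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance > 0))"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move funds atomically; return False if the constraint rejects the transfer."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))
```

The credit is deliberately applied before the debit: when the debit violates the constraint, the rollback also undoes the credit, so the transfer either completes or does not happen at all.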
![Page 29: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/29.jpg)
View of Data
An architecture for a database system, from top to bottom:
• View level: View 1, View 2, …, View n
• Logical level
• Physical level
![Page 30: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/30.jpg)
Levels of Abstraction
Physical level: describes how a record (e.g. customer) is stored.
Logical level: describes data stored in database, and the relationships among the data.

    type customer = record
        name: string;
        street: string;
        city: integer;
    end;

View level: application programs hide details of data types. Views can also hide information (e.g. salary) for security purposes.
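The idea that a view can hide information such as salary can be sketched with `sqlite3`. The `employee` table, the `employee_public` view and the sample row are hypothetical names and data made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Logical level: the full record, including the sensitive column.
    CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employee VALUES ('Jones', 'Sales', 50000);

    -- View level: expose only the columns most users are allowed to see.
    CREATE VIEW employee_public AS
        SELECT name, department FROM employee;
    """
)

cur = conn.execute("SELECT * FROM employee_public")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
```

An application that only queries `employee_public` never sees the `salary` column. In a multi-user DBMS this would normally be paired with access rights on the underlying table, which SQLite does not model.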
![Page 31: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/31.jpg)
Instances and Schemas
Similar to types and variables in programming languages
• Schema: the logical structure of the database
  • Example: The database consists of information about a set of customers and accounts and the relationship between them
  • Analogous to type information of a variable in a program
  • Physical schema: database design at the physical level
  • Logical schema: database design at the logical level
• Instance: the actual content of the database at a particular point in time
  • Analogous to the value of a variable
• Physical Data Independence: the ability to modify the physical schema without changing the logical schema
  • Applications depend on the logical schema
  • In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
![Page 32: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/32.jpg)
Data Independence
• Ability to modify a schema definition in one level without affecting a schema definition in the other levels.
• The interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.
• Two levels of data independence:
  • Physical data independence
  • Logical data independence
![Page 33: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/33.jpg)
Data Models
A collection of tools for describing:
• Data
• Data relationships
• Data semantics
• Data constraints
Object-based logical models:
• Entity-relationship model
• Object-oriented model
• Semantic model
• Functional model
Record-based logical models:
• Relational model (e.g., SQL/DS, DB2)
• Network model
• Hierarchical model (e.g., IMS)
![Page 34: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/34.jpg)
The Entity-Relationship Model
Models an enterprise as a collection of entities and relationships
Entity: a “thing” or “object” in the enterprise that is distinguishable from other objects
Described by a set of attributes
Relationship: an association among several entities
Represented diagrammatically by an entity-relationship diagram:
![Page 35: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/35.jpg)
Relational Model
Example of tabular data in the relational model:
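| name | ssn | street | city | account-number |
|---|---|---|---|---|
| Johnson | 192-83-7465 | Alma | Palo Alto | A-101 |
| Smith | 019-28-3746 | North | Rye | A-215 |
| Johnson | 192-83-7465 | Alma | Palo Alto | A-201 |
| Jones | 321-12-3123 | Main | Harrison | A-217 |
| Smith | 019-28-3746 | North | Rye | A-201 |

| account-number | balance |
|---|---|
| A-101 | 500 |
| A-201 | 900 |
| A-215 | 700 |
| A-217 | 750 |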
![Page 36: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/36.jpg)
Data Definition Language (DDL)
• Specification notation for defining the database schema
• DDL compiler generates a set of tables stored in a data dictionary
• Data dictionary contains metadata (data about data)
• Data storage and definition language: special type of DDL in which the storage structure and access methods used by the database system are specified
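A small sketch of DDL and the data dictionary, using SQLite as a stand-in. In SQLite the catalog is exposed as the `sqlite_master` table and the `table_info` pragma; other systems expose their dictionaries differently, so treat this as one possible illustration rather than a general interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A DDL statement: it defines schema and stores no business data.
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
)

# The DBMS records the definition as metadata that can itself be queried.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(account)")]
```

Here `catalog` and `columns` are "data about data": they describe the `account` table without containing any account.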
![Page 37: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/37.jpg)
Data Manipulation Language (DML)
Language for accessing and manipulating the data organized by the appropriate data model
Two classes of languages:
• Procedural: user specifies what data is required and how to get those data
• Nonprocedural: user specifies what data is required without specifying how to get those data
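The difference between the two classes can be sketched by answering the same question twice. The first function spells out how to find the rows; the second only states what is wanted and leaves the access strategy to the database engine. The account data reuses the balances from the relational-model example, and the function names are hypothetical.

```python
import sqlite3

accounts = [("A-101", 500), ("A-201", 900), ("A-215", 700), ("A-217", 750)]


def procedural(rows, minimum):
    """Procedural style: the caller spells out the scan, the test and the sort."""
    result = []
    for number, balance in rows:
        if balance >= minimum:
            result.append(number)
    result.sort()
    return result


def nonprocedural(rows, minimum):
    """Nonprocedural style: state the desired result; the engine decides how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", rows)
    query = (
        "SELECT account_number FROM account "
        "WHERE balance >= ? ORDER BY account_number"
    )
    return [r[0] for r in conn.execute(query, (minimum,))]
```

Both return the same accounts, but only the second would benefit automatically if the database later gained an index on `balance`.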
![Page 38: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/38.jpg)
Transaction Management
• A transaction is a collection of operations that performs a single logical function in a database application.
• Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g. power failures and operating system crashes) and transaction failures.
• Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
![Page 39: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/39.jpg)
Storage Management
• Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
  • Interaction with the file manager
  • Efficient storing, retrieving and updating of data
• Issues:
  • Storage access
  • File organization
  • Indexing and hashing
![Page 40: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/40.jpg)
Database Administrator
Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.
Database administrator’s duties include:
• Schema definition
• Storage structure and access method definition
• Schema and physical organization modification
• Granting user authority to access the database
• Specifying integrity constraints
• Acting as liaison with users
• Monitoring performance and responding to changes in requirements
![Page 41: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/41.jpg)
Database Users
Users are differentiated by the way they expect to interact with the system.
• Application programmers: interact with system through DML calls.
• Specialized users: write specialized database applications that do not fit into the traditional data processing framework.
• Sophisticated users: form requests in a database query language.
• Naive users: invoke one of the permanent application programs that have been written previously.
![Page 42: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/42.jpg)
Overall System Structure
[Figure: overall system structure, from users down to disk]
• Users: naïve users (tellers, agents, etc.), application programmers, sophisticated users, database administrator
• What each user works with: application interface, application program, query, database scheme
• Query processor: application program object code, embedded DML precompiler, DML compiler, DDL interpreter, query evaluation engine
• Storage manager: transaction manager, buffer manager, file manager
• Disk storage: data files, indices, statistical data, data dictionary
The query processor and the storage manager together make up the database-management system.
![Page 43: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/43.jpg)
SQL (Structured Query Language)
SQL: widely used non-procedural language
Example: Find the name of the customer with customer-id 192-83-7465

    select customer.customer_name
    from customer
    where customer.customer_id = '192-83-7465'

Example: Find the balances of all accounts held by the customer with customer-id 192-83-7465

    select account.balance
    from depositor, account
    where depositor.customer_id = '192-83-7465' and
          depositor.account_number = account.account_number

Application programs generally access databases through one of:
• Language extensions to allow embedded SQL
• Application program interfaces (e.g., ODBC/JDBC) which allow SQL queries to be sent to a database
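As one illustration of the application-program-interface route, the sketch below sends the two example queries to a database through Python's `sqlite3` module, which plays a role similar to ODBC/JDBC here. The three-table layout (`customer`, `depositor`, `account`) is inferred from the queries themselves, and the rows are adapted from the relational-model example; both are assumptions. An `ORDER BY` is added to the second query only to make the result order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT);
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
    INSERT INTO customer VALUES ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith');
    INSERT INTO account VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
    INSERT INTO depositor VALUES
        ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201'),
        ('019-28-3746', 'A-215'), ('019-28-3746', 'A-201');
    """
)


def customer_name(customer_id):
    """First example query, with the id passed as a bound parameter."""
    row = conn.execute(
        "SELECT customer.customer_name FROM customer "
        "WHERE customer.customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


def customer_balances(customer_id):
    """Second example query: join depositor to account for one customer."""
    rows = conn.execute(
        "SELECT account.balance FROM depositor, account "
        "WHERE depositor.customer_id = ? "
        "AND depositor.account_number = account.account_number "
        "ORDER BY account.balance",
        (customer_id,),
    )
    return [r[0] for r in rows]
```

Passing the customer id as a `?` parameter, rather than pasting it into the SQL string, is the usual way to avoid quoting problems and SQL injection.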
![Page 44: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/44.jpg)
Database Design
The process of designing the general structure of the database:
Logical Design – Deciding on the database schema. Database design requires that we find a “good” collection of relation schemas.
• Business decision: What attributes should we record in the database?
• Computer Science decision: What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
Physical Design – Deciding on the physical layout of the database
![Page 45: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/45.jpg)
Data Warehousing
![Page 46: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/46.jpg)
In the Beginning, life was simple…
![Page 47: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/47.jpg)
But…
![Page 48: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/48.jpg)
Our information needs…
![Page 49: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/49.jpg)
Kept growing. (The Spider web)
SOURCE: William H. Inmon
![Page 50: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/50.jpg)
Purpose
To explore and discuss the purpose and principles of data warehousing.
![Page 51: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/51.jpg)
So What Is a Data Warehouse?
A data warehouse is the data repository of an enterprise. It is generally used for research and decision support.
By comparison, an OLTP (on-line transaction processing) or operational system is used to deal with the everyday running of one aspect of an enterprise.
OLTP systems are usually designed independently of each other and it is difficult for them to share information.
![Page 52: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/52.jpg)
Why Do We Need Data Warehouses?
• Consolidation of information resources
• Improved query performance
• Separate research and decision support functions from the operational systems
• Foundation for data mining, data visualization, advanced reporting and OLAP tools
![Page 53: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/53.jpg)
What Is a Data Warehouse Used for?
• Knowledge discovery
  • Making consolidated reports
  • Finding relationships and correlations
  • Data mining
• Examples
  • Banks identifying credit risks
  • Insurance companies searching for fraud
  • Medical research
![Page 54: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/54.jpg)
How Do Data Warehouses Differ From Operational Systems?
| Data warehouse | Operational system |
|---|---|
| Subject oriented | Transaction oriented |
| Large (hundreds of GB up to several TB) | Small (MB up to several GB) |
| Historic data | Current data |
| De-normalized table structure (few tables, many columns per table) | Normalized table structure (many tables, few columns per table) |
| Batch updates | Continuous updates |
| Usually very complex queries | Simple to complex queries |
![Page 55: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/55.jpg)
Design Differences
Data Warehouse: Star Schema
Operational System: ER Diagram
![Page 56: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/56.jpg)
Supporting a Complete Solution
Operational System: Data Entry
Data Warehouse: Data Retrieval
![Page 57: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/57.jpg)
What Is a Data Warehouse?
In many organizations, we want a central “store” of all of our entities, concepts, metadata, and historical information
• For doing data validation, complex mining, analysis, prediction, …
• This is the data warehouse
To this point we’ve focused on scenarios where the data “lives” in the sources – here we may have a “master” version (and archival version) in a central database
• For performance reasons, availability reasons, archival reasons, …
![Page 58: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/58.jpg)
Master Data Management
One of the “modern” uses of the data warehouse is not only to support analytics but to serve as a reference to all of the entities in the organization
• A cleaned, validated repository of what we know
  • … which can be linked to by data sources
  • … which may help with data cleaning
  • … and which may be the basis of data governance (processes by which data is created and modified in a systematic way, e.g., to comply with government regulations)
There is an emerging field called master data management about the process of creating these
![Page 59: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/59.jpg)
Data Warehouses, Data Marts, and Operational Data Stores
• Data Warehouse: The queryable source of data in the enterprise. It is comprised of the union of all of its constituent data marts.
• Data Mart: A logical subset of the complete data warehouse. Often viewed as a restriction of the data warehouse to a single business process or to a group of related business processes targeted toward a particular business group.
• Operational Data Store (ODS): A point of integration for operational systems that developed independently of each other. Since an ODS supports day-to-day operations, it needs to be continually updated.
![Page 60: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/60.jpg)
Building a Data Warehouse
Data Warehouse Lifecycle
1. Analysis
2. Design
3. Import data
4. Install front-end tools
5. Test and deploy
![Page 61: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/61.jpg)
Stage 1: Analysis
• Identify:
  • Target questions
  • Data needs
  • Timeliness of data
  • Granularity
• Create an enterprise-level data dictionary
• Dimensional analysis
  • Identify facts and dimensions
![Page 62: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/62.jpg)
Stage 2: Design
• Dimensional modeling: star schema
• Data transformation
• Aggregates (pre-calculated values)
• HW/SW architecture
![Page 63: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/63.jpg)
Dimensional Modeling
• Fact Table: The primary table in a dimensional model that is meant to contain measurements of the business.
• Dimension Table: One of a set of companion tables to a fact table. Most dimension tables contain many textual attributes that are the basis for constraining and grouping within data warehouse queries.
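A tiny star schema can make the fact/dimension split concrete. In the sketch below the fact table holds the measurement (`sale_amount`) plus keys, and the dimension tables hold the textual attributes used for constraining (`month`) and grouping (`category`). Every table name, column name and value is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive, mostly textual attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);

    -- Fact table: one row per measurement, keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        sale_amount INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Cola', 'Beverage'), (2, 'Soap', 'Household');
    INSERT INTO dim_date VALUES (1, 'Apr', 2024), (2, 'May', 2024);
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 47), (2, 1, 30), (2, 2, 12);
    """
)

# Constrain on one dimension attribute (month) and group by another (category).
sales_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.sale_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    JOIN dim_date AS d ON f.date_id = d.date_id
    WHERE d.month = 'Apr'
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
```

The query shape, a fact table joined to each dimension it needs, is what gives the star schema its name.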
![Page 64: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/64.jpg)
Stage 3: Import Data
• Identify data sources
• Extract the needed data from existing systems to a data staging area
• Transform and clean the data
  • Resolve data type conflicts
  • Resolve naming and key conflicts
  • Remove, correct, or flag bad data
  • Conform dimensions
• Load the data into the warehouse
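The transform-and-clean step might look roughly like the sketch below, which resolves a naming conflict (`cust_no` vs. `customer_id`), resolves a type conflict (amounts arriving as text or numbers), flags bad values and removes rows that cannot be repaired. The field names, the `transform` function and the staging rows are hypothetical and not taken from any particular ETL tool.

```python
def transform(record):
    """Return a cleaned record, or None when the row cannot be repaired."""
    # Resolve naming conflicts: sources call the key cust_no or customer_id.
    raw_id = record.get("cust_no") or record.get("customer_id") or ""
    cleaned = {"customer_id": str(raw_id).strip(), "amount": None, "flagged": False}

    # Resolve type conflicts: amounts arrive as strings or numbers.
    try:
        cleaned["amount"] = float(record.get("amount"))
    except (TypeError, ValueError):
        cleaned["flagged"] = True  # keep the row, but flag the bad value

    # Remove rows with no usable key.
    if not cleaned["customer_id"]:
        return None
    return cleaned


# Hypothetical rows sitting in the data staging area.
staging = [
    {"cust_no": 101, "amount": "12.50"},
    {"customer_id": " 102 ", "amount": 7},
    {"customer_id": "103", "amount": "n/a"},
    {"amount": "3.00"},
]

# "Load": only rows that survived cleaning reach the warehouse.
warehouse = [row for row in map(transform, staging) if row is not None]
```

Flagging rather than dropping the `"n/a"` row keeps it available for later correction, which is one of the three options (remove, correct, or flag) listed above.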
![Page 65: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/65.jpg)
Importing Data Into the Warehouse
[Figure: Operational Systems (source systems) OLTP 1, OLTP 2 and OLTP 3 feed the Data Staging Area, which feeds the Data Warehouse]
![Page 66: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/66.jpg)
Stage 4: Install Front-end Tools
• Reporting tools
• Data mining tools
• GIS
• Etc.
![Page 67: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/67.jpg)
Stage 5: Test and Deploy
• Usability tests
• Software installation
• User training
• Performance tweaking based on usage
![Page 68: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/68.jpg)
On-Line Analytical Processing (OLAP)
Making Decision Support Possible
![Page 69: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/69.jpg)
Strengths of OLAP
• It is a powerful visualization paradigm
• It provides fast, interactive response times
• It is good for analyzing time series
• It can be useful to find some clusters and outliers
• Many vendors offer OLAP tools
![Page 70: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/70.jpg)
Multi-dimensional Data
• “Hey…I sold $100M worth of goods”
[Figure: sales data cube with axes Product (Toothpaste, Juice, Cola, Milk, Cream, Soap), Region (W, S, N) and Month (1 to 7)]
Dimensions: Product, Region, Time
Hierarchical summarization paths:
• Product: Industry → Category → Product
• Region: Country → Region → City → Office
• Time: Year → Quarter → Month or Week → Day
![Page 71: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/71.jpg)
Data Cube Lattice
• Cube lattice: ABC; AB, AC, BC; A, B, C; none
• Can materialize some groupbys, compute others on demand
• Question: which groupbys to materialize?
• Question: what indices to create?
• Question: how to organize data (chunks, etc.)?
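The lattice for three dimensions A, B and C can be enumerated mechanically: every subset of the dimensions is one possible group-by. The sketch below uses `itertools.combinations`; the function name `cube_lattice` is made up for this example.

```python
from itertools import combinations


def cube_lattice(dimensions):
    """List every group-by, level by level, from the finest to the coarsest."""
    levels = []
    for size in range(len(dimensions), -1, -1):
        # The empty combination is the grand total, labelled "none" on the slide.
        levels.append(
            ["".join(c) or "none" for c in combinations(dimensions, size)]
        )
    return levels


lattice = cube_lattice("ABC")
group_by_count = sum(len(level) for level in lattice)
```

With n dimensions there are 2^n group-bys, which is why the slide asks which ones to materialize rather than assuming all of them can be stored.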
![Page 72: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/72.jpg)
Visualizing Neighbors is simpler
[Figure: the same sales data shown two ways: as a grid with months (Apr to Mar) as rows and stores 1 to 8 as columns, and as a flat table with columns Month, Store, Sales]
![Page 73: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/73.jpg)
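A Visual Operation: Pivot (Rotate)
[Figure: a cube with axes Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF) and Date (3/1 to 3/4, within Month) is rotated to bring a different face forward; sample cell values 10, 47, 30, 12]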
![Page 74: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/74.jpg)
“Slicing and Dicing”
[Figure: a cube with axes Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special) and Regions (India, Far East, Europe), with the Telecomm slice highlighted]
The Telecomm Slice
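Slicing and dicing can be sketched on a cube stored as a dictionary keyed by (product, sales channel, region). A slice fixes one dimension to a single value, as in the Telecomm slice; a dice keeps a sub-cube by restricting several dimensions at once. The cell values and the function names are invented for illustration.

```python
# Hypothetical cells keyed by (product, sales_channel, region).
cube = {
    ("Telecomm", "Retail", "Europe"): 5,
    ("Telecomm", "Direct", "India"): 8,
    ("Video", "Retail", "Europe"): 3,
    ("Audio", "Special", "Far East"): 2,
}


def slice_cube(cells, axis, value):
    """Fix one dimension to a single value and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: amount
        for key, amount in cells.items()
        if key[axis] == value
    }


def dice_cube(cells, allowed):
    """Keep cells whose coordinate on every listed axis is in the allowed set."""
    return {
        key: amount
        for key, amount in cells.items()
        if all(key[axis] in values for axis, values in allowed.items())
    }


telecomm_slice = slice_cube(cube, 0, "Telecomm")
retail_europe = dice_cube(cube, {1: {"Retail"}, 2: {"Europe"}})
```

The slice has one dimension fewer than the cube, while the dice keeps all three dimensions but fewer cells.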
![Page 75: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/75.jpg)
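Roll-up and Drill Down
Hierarchy, from the highest level of aggregation to low-level details:
• Sales Channel
• Region
• Country
• State
• Location Address
• Sales Representative
Roll up: move toward a higher level of aggregation. Drill down: move toward low-level details.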
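Roll-up and drill-down amount to aggregating the same detail rows at different depths of a hierarchy. The sketch below uses a shortened, hypothetical hierarchy (country, state, sales representative) and invented amounts; `aggregate` is an illustrative helper, not a standard API.

```python
from collections import defaultdict

# Hypothetical detail rows: (country, state, sales_representative, amount).
detail = [
    ("US", "CA", "Ann", 10),
    ("US", "CA", "Bob", 47),
    ("US", "NY", "Cy", 30),
    ("IN", "KA", "Dev", 12),
]


def aggregate(rows, depth):
    """Sum amounts grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:depth]] += row[-1]
    return dict(totals)


by_country = aggregate(detail, 1)  # rolled up to the coarsest level
by_state = aggregate(detail, 2)    # drilled down one level
by_rep = aggregate(detail, 3)      # drilled down to the detail rows
```

Moving from `by_rep` to `by_country` is a roll-up; moving the other way is a drill-down. The grand total is the same at every level.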
![Page 76: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/76.jpg)
Nature of OLAP Analysis
• Aggregation: total sales, percent-to-total
• Comparison: Budget vs. Expenses
• Ranking: Top 10, quartile analysis
• Access to detailed and aggregate data
• Complex criteria specification
• Visualization
![Page 77: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/77.jpg)
Organizationally Structured Data
• Different Departments look at the same detailed data in different ways. Without the detailed, organizationally structured data as a foundation, there is no reconcilability of data
marketing
manufacturing
sales
finance
![Page 78: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/78.jpg)
Multidimensional Spreadsheets
• Analysts need spreadsheets that support
  • pivot tables (cross-tabs)
  • drill-down and roll-up
  • slice and dice
  • sort
  • selections
  • derived attributes
• Popular in retail domain
![Page 79: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/79.jpg)
OLAP - Data Cube
• Idea: analysts need to group data in many different ways
  • e.g. Sales(region, product, prodtype, prodstyle, date, saleamount)
  • saleamount is a measure attribute, the rest are dimension attributes
  • group by every subset of the dimension attributes
  • materialize (precompute and store) the group-bys to give online response
• Also: hierarchies on attributes: date -> weekday, date -> month -> quarter -> year
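The "group by every subset" idea can be sketched as follows. This is a minimal illustration that uses only three of the dimensions from the Sales example; the rows and the function name `materialize_cube` are assumptions made for the sketch.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product", "date")

# Hypothetical fact rows; saleamount is the measure attribute.
sales = [
    {"region": "East", "product": "Cola", "date": "2024-03-01", "saleamount": 10},
    {"region": "East", "product": "Milk", "date": "2024-03-01", "saleamount": 5},
    {"region": "West", "product": "Cola", "date": "2024-03-02", "saleamount": 7},
]


def materialize_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of the dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube


cube = materialize_cube(sales, DIMENSIONS, "saleamount")
print(len(cube))            # one group-by per subset of the dimensions
print(cube[("product",)])   # totals by product
print(cube[()])             # the empty subset holds the grand total
```

With n dimensions there are 2^n group-bys, which is why real systems usually materialize only a chosen subset of them.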
![Page 80: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/80.jpg)
Relational OLAP: 3 Tier DSS
• Data Warehouse (Database Layer): Store atomic data in industry standard RDBMS.
• ROLAP Engine (Application Logic Layer): Generate SQL execution plans in the ROLAP engine to obtain OLAP functionality.
• Decision Support Client (Presentation Layer): Obtain multi-dimensional reports from the DSS Client.
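A rough sketch of the application logic layer's job: turning a request for dimensions into SQL that runs against a relational store. It uses Python's built-in `sqlite3` as a stand-in RDBMS; the table `sales`, its columns, and `build_group_by_sql` are hypothetical names, and a real ROLAP engine does far more (query planning, caching, aggregate navigation).

```python
import sqlite3

ALLOWED_COLUMNS = {"region", "product"}


def build_group_by_sql(dimensions):
    """Translate a requested list of dimensions into a GROUP BY query."""
    if not dimensions or set(dimensions) - ALLOWED_COLUMNS:
        raise ValueError(f"unsupported dimensions: {dimensions!r}")
    cols = ", ".join(dimensions)
    return (f"SELECT {cols}, SUM(saleamount) FROM sales "
            f"GROUP BY {cols} ORDER BY {cols}")


# Database layer: atomic data in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, saleamount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "Cola", 10), ("East", "Milk", 5), ("West", "Cola", 7),
])

# Presentation layer asks for a report by region.
report = conn.execute(build_group_by_sql(["region"])).fetchall()
print(report)
```

Column names are checked against a whitelist because they are interpolated into the SQL text rather than bound as parameters.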
![Page 81: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/81.jpg)
MD-OLAP: 2 Tier DSS
• MDDB Engine (Database Layer and Application Logic Layer): Store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, obtain OLAP functionality via proprietary algorithms running against this data.
• Decision Support Client (Presentation Layer): Obtain multi-dimensional reports from the DSS Client.
![Page 82: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/82.jpg)
OLAP
Need for More Intensive Decision Support
4 Main Characteristics
• Multidimensional data analysis
• Advanced Database Support
• Easy-to-use end-user interfaces
• Support Client/Server architecture
![Page 83: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/83.jpg)
Multidimensional Data Analysis Techniques
• Advanced Data Presentation Functions
  • 3-D graphics, Pivot Tables, Crosstabs, etc.
  • Compatible with Spreadsheets & Statistical packages
• Advanced data aggregations, consolidation and classification across time dimensions
• Advanced computational functions
• Advanced data modeling functions
![Page 84: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/84.jpg)
Advanced Database Support
• Advanced Data Access Features
  • Access to many kinds of DBMSs, flat files, and internal and external data sources
  • Access to aggregated data warehouse data
  • Advanced data navigation (drill-downs and roll-ups)
  • Ability to map end-user requests to the appropriate data source
• Support for Very Large Databases
![Page 85: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/85.jpg)
Easy-to-Use End-User Interface
• Graphical User Interfaces
• Much more useful if access is kept simple
![Page 86: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/86.jpg)
Client/Server Architecture
• Framework for the new systems to be designed, developed and implemented
• Divide the OLAP system into several components that define its architecture
  • On the same computer
  • Distributed among several computers
![Page 87: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/87.jpg)
OLAP Architecture
3 Main Modules
• GUI (Graphical User Interface)
• Analytical Processing Logic
• Data-processing Logic
![Page 88: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/88.jpg)
OLAP Client/Server Architecture
![Page 89: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/89.jpg)
Relational OLAP
Relational Online Analytical Processing
OLAP functionality using relational database and familiar query tools to store and analyze multidimensional data
• Multidimensional data schema support
• Data access language & query performance for multidimensional data
• Support for Very Large Databases
![Page 90: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/90.jpg)
Multidimensional Data Schema Support
• Decision Support Data tends to be
  • Nonnormalized
  • Duplicated
  • Preaggregated
• Star Schema
  • Special design technique for multidimensional data representations
  • Optimize data query operations instead of data update operations
![Page 91: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/91.jpg)
Star Schemas
• Data modeling technique to map multidimensional decision support data into a relational database
• Current relational modeling techniques do not serve the needs of advanced data requirements
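A minimal star schema sketch using Python's built-in `sqlite3`: one fact table surrounded by two dimension tables. The table and column names (`fact_sales`, `dim_product`, `dim_store`) and the data are illustrative assumptions, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, prodtype TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    sale_date  TEXT,
    saleamount INTEGER
);
INSERT INTO dim_product VALUES (1, 'Cola', 'Drink'), (2, 'Milk', 'Dairy');
INSERT INTO dim_store VALUES (1, 'NY', 'East'), (2, 'SF', 'West');
INSERT INTO fact_sales VALUES
    (1, 1, '2024-03-01', 10), (2, 1, '2024-03-01', 5), (1, 2, '2024-03-02', 7);
""")

# A typical star join: the fact table joined to its dimensions, then aggregated.
by_region_type = conn.execute("""
    SELECT s.region, p.prodtype, SUM(f.saleamount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store s ON s.store_id = f.store_id
    GROUP BY s.region, p.prodtype
    ORDER BY s.region, p.prodtype
""").fetchall()
print(by_region_type)
```

The dimension tables are deliberately denormalized (region stored alongside city), which favors query speed over update convenience.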
![Page 92: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/92.jpg)
Data Warehouse Architecture
At the top – a centralized database
• Generally configured for queries and appends, not transactions
• Many indices, materialized views, etc.
Data is loaded and periodically updated via Extract/Transform/Load (ETL) tools
[Figure: sources RDBMS1, RDBMS2, HTML1 and XML1 each feed an ETL step; the ETL pipeline outputs are loaded into the Data Warehouse]
![Page 93: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/93.jpg)
ETL Tools
• ETL tools are the equivalent of schema mappings in virtual integration, but are more powerful
• Arbitrary pieces of code to take data from a source, convert it into data for the warehouse:
  • import filters: read and convert from data sources
  • data transformations: join, aggregate, filter, convert data
  • de-duplication: finds multiple records referring to the same entity, merges them
  • profiling: builds tables, histograms, etc. to summarize data
  • quality management: test against master values, known business rules, constraints, etc.
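Three of the ETL steps listed above (transformation, de-duplication, profiling) can be sketched in a few lines. The records, the choice of `email` as the matching key, and the function names are assumptions for illustration; production tools use much richer matching rules.

```python
from collections import Counter

# Hypothetical customer records arriving from two sources.
records = [
    {"name": "Ann Lee ", "email": "ANN@EXAMPLE.COM", "city": None},
    {"name": "Ann Lee", "email": "ann@example.com", "city": "NY"},
    {"name": "Bob Roy", "email": "bob@example.com", "city": "SF"},
]


def transform(record):
    """Data transformation: normalize whitespace and letter case."""
    return {
        "name": record["name"].strip(),
        "email": record["email"].strip().lower(),
        "city": record["city"],
    }


def deduplicate(rows, key="email"):
    """De-duplication: merge rows sharing a key, filling missing fields."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], dict(row))
        for field, value in row.items():
            if target[field] is None:
                target[field] = value
    return list(merged.values())


def profile(rows, field):
    """Profiling: a value histogram for one field."""
    return Counter(row[field] for row in rows)


clean = deduplicate([transform(r) for r in records])
print(len(clean))
print(profile(clean, "city"))
```

Normalizing before matching matters here: the two Ann Lee records only share a key once the email is lowercased.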
![Page 94: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/94.jpg)
Example ETL Tool Chain
• This is an example for e-commerce loading
• Note multiple stages of filtering (using selection or join-like operations), logging bad records, before we group and load
[Figure: ETL tool chain. Stages: Invoice line items, Split date-time, Filter invalid (rejects: invalid dates/times), Join with item records, Filter invalid (rejects: invalid items), Filter non-match against customer records (rejects: invalid customers), Group by customer, Customer balance]
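The tool chain on this slide can be approximated as a single pass with several filtering stages, each logging its bad records before the final grouping. This is a sketch under assumptions: the field names, the timestamp format, and the price lookup are invented for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical inputs; field names are assumptions for illustration.
line_items = [
    {"ts": "2024-03-01 10:15", "item": "A1", "customer": "C1", "qty": 2},
    {"ts": "not-a-date", "item": "A1", "customer": "C1", "qty": 1},
    {"ts": "2024-03-02 09:00", "item": "ZZ", "customer": "C2", "qty": 1},
    {"ts": "2024-03-02 11:30", "item": "B2", "customer": "C9", "qty": 4},
    {"ts": "2024-03-03 08:45", "item": "B2", "customer": "C2", "qty": 3},
]
item_prices = {"A1": 5.0, "B2": 2.5}    # item records
customers = {"C1": "Ann", "C2": "Bob"}  # customer records


def run_pipeline(rows):
    rejects = {"invalid_datetime": [], "invalid_item": [], "invalid_customer": []}
    balances = defaultdict(float)
    for row in rows:
        # Stage 1: split date-time, logging rows that fail to parse.
        try:
            stamp = datetime.strptime(row["ts"], "%Y-%m-%d %H:%M")
        except ValueError:
            rejects["invalid_datetime"].append(row)
            continue
        row = {**row, "date": stamp.date(), "time": stamp.time()}
        # Stage 2: join with item records, logging unknown items.
        if row["item"] not in item_prices:
            rejects["invalid_item"].append(row)
            continue
        # Stage 3: filter rows that match no customer record.
        if row["customer"] not in customers:
            rejects["invalid_customer"].append(row)
            continue
        # Stage 4: group by customer to build the balance.
        balances[row["customer"]] += row["qty"] * item_prices[row["item"]]
    return dict(balances), rejects


balances, rejects = run_pipeline(line_items)
print(balances)
print({reason: len(rows) for reason, rows in rejects.items()})
```

Keeping a separate reject list per stage mirrors the slide's point that bad records are logged rather than silently dropped.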
![Page 95: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/95.jpg)
An Introduction to Text Mining
![Page 96: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/96.jpg)
Predictive analysis helps connect data to effective action by drawing reliable conclusions about current conditions and future events.
— Gareth Herschel, Research Director, Gartner Group
Predictive Analytics: Defined
Predictive analysis:
• Leverages an organization’s business knowledge by applying sophisticated analytic techniques to enterprise data
• Turns that data into insights that lead to the development of programs to increase revenues, reduce costs, improve processes, and prevent criminal or fraudulent activities
• Encourages actions that demonstrably change how people behave as your customers, employees, patients, students, and citizens
Bottom line: it turns data into effective actions that positively impact your bottom line
![Page 97: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/97.jpg)
Unstructured Data Management
Text Mining is a subset of Unstructured Data Management.
UDM can be broken down into:
• Content and Document Management
• Search and Retrieval
• XML database and tools
• Categorization, Classification, and Visualization
![Page 98: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/98.jpg)
Data Warehouse vs. Document Warehouse
Data warehouse
• Who, what, when, where, how much
• Internally focused
• Operational information
• Rarely includes external information
Document warehouse
• Why
• May not be internally focused
• May contain a range of information
• Often integrates external information
![Page 99: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/99.jpg)
Document Warehouse Features
• There is no single document structure or document type
• Documents are drawn from multiple sources
• Essential features of documents are automatically extracted and explicitly stored in the document warehouse
• Document warehouses are designed to integrate semantically related documents
![Page 100: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/100.jpg)
Building the Document Warehouse
[Figure: steps shown: Identify Sources, Retrieve Document, Text Analysis, Pre-process Document, Compile Metadata]
![Page 101: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/101.jpg)
Predict, Impact, Deploy
[Figure: Customer data (Attitudes, Actions, Attributes) gathered through Data Collection (Text, Surveys, Web Channel, Operational Systems) feeds text mining functions (NLP, Information Extraction, Concepts, Concept Maps, Clustering, Categorization, Trending, Prediction), reached through a Business UI and an Expert UI by the Business User. Outcomes: Attract, Grow, Retain, Fraud.]
![Page 102: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/102.jpg)
The Building Blocks of Language
• Morphology
• Syntax
• Semantics
• Phonology
• Pragmatics
![Page 103: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/103.jpg)
Morphology
Understanding words
• Stems
• Affixes
  • Prefix
  • Suffix
• Inflectional elements
• Reduces complexity of analysis
• Reduces complexity of representation
• Supports text mining
[Figure: word structure: Prefix (in-) + Noun Stem (dispute) + Suffix (-able)]
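A toy sketch of splitting a word into prefix, stem and suffix, in the spirit of the example on this slide. The affix lists and the function name `split_affixes` are assumptions; this naive string matching is not a real morphological analyzer.

```python
# Tiny illustrative affix lists; a real analyzer would use a full lexicon.
PREFIXES = ("in", "un", "re")
SUFFIXES = ("able", "ing", "ed", "s")


def split_affixes(word):
    """Return (prefix, stem, suffix); empty strings mark missing parts."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    stem = rest[:len(rest) - len(suffix)]
    return prefix, stem, suffix


print(split_affixes("indisputable"))
print(split_affixes("walked"))
```

Plain stripping yields the stem "disput" rather than "dispute"; recovering the dictionary form needs spelling rules or a lexicon, which is part of what real morphological analysis adds.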
![Page 104: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/104.jpg)
Syntax
The Bank of Canada will curb inflation with higher interest rates
[Figure: parse tree. Sentence: Noun phrase (The Bank of Canada) + Verb phrase. Verb phrase: Aux (will), Verb (curb), Noun (inflation), Prepositional phrase (with + Noun phrase: Adjective (higher), Noun (interest rates)).]
![Page 105: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/105.jpg)
Semantics
The meaning of it all
• Approaches to meaning
  • Semantic networks
  • Deductive logic
  • Rule-based systems
• Useful for classification
![Page 106: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/106.jpg)
Problems with NLP
Limitations of Natural Language Processing
• Correctly identifying the role of noun phrases
• Representing abstract concepts
• Classifying synonyms
• Representing the number of concepts
![Page 107: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/107.jpg)
Problems with NLP
Limitations of technology
• Language specific designs are required
• Classification speed
• Classifying hybrid words and sentences
![Page 108: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/108.jpg)
Underlying Technology is Based on Linguistics
The Linguistic Approach:
Does not treat a document as a bag of words
Removes ambiguity by extracting structured concepts
Concepts are the DNA of text.
Text is unstructured, ambiguous, and language dependent.
![Page 109: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/109.jpg)
From Text to Concepts
[Figure: Morphology, Syntax and Semantics (linguistic) combine with Statistics in a Terminology Extractor with four properties:]
• Accurate: compound words, proper nouns, figures, named entities, domain specifics
• Scalable: speed, multiple formats, multiple languages
• Customizable: SPSS dictionaries, user dictionaries, extraction rules, extraction patterns
• Discovery-Oriented: known terms, unknown terms, new terms, trends
![Page 110: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/110.jpg)
“The process of discovering meaningful new relationships, patterns and trends by sifting through data using pattern recognition technologies as well as statistical and mathematical techniques.”- The Gartner group.
Data Mining
![Page 111: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/111.jpg)
Why data mining?
• Data Mining software generally employs modeling algorithms designed to handle non-linearities and unusual patterns in data
  • As opposed to classical linear models (e.g., linear regression) that aren’t as capable
• A related issue is ‘noise’ in the data: where, for example, 2 seemingly similar sets of inputs yield a different output
![Page 112: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/112.jpg)
Use the cross industry standard process for data mining (CRISP-DM)
Based on real-world lessons:
• Focus on business issues
• User-centric & interactive
• Full process
• Results are used
A Data Mining Methodology
![Page 113: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/113.jpg)
Data Mining is not…
Keep in mind that data mining is not…
• “Blind” application of analysis/modeling algorithms
• Brute-force crunching of bulk data
• Black box technology
• Magic
![Page 114: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/114.jpg)
Back to the Process
Text Mining
![Page 115: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/115.jpg)
Understanding
Business Understanding
• Determine objective
• Assess situation
• Determine data mining goals
• Produce project plan
Data Understanding
• Collect initial data
• Describe data
• Explore data
• Verify data quality
![Page 116: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/116.jpg)
Data Preparation
Data
• Data set
• Data set description
• Select data
• Clean data
• Construct data set / Integrate data
• Format data
Text
• Concept extraction
• Concept combination
• Concept assessment
![Page 117: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/117.jpg)
Modeling
• Select modeling technique
  • Universe of techniques
  • Appropriate techniques (Data, Text)
  • Requirements
  • Constraints
  • Selected tools
• Generate test design
• Run model(s)
• Assess model(s)
![Page 118: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/118.jpg)
Evaluation
Results = Models + Findings
• Evaluate results
• Review process
• Determine next steps
![Page 119: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/119.jpg)
Deployment
• Plan deployment
• Plan monitoring and maintenance
• Final report
• Project review
![Page 120: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/120.jpg)
Unsupervised methods:
• Group patients by drugs and demographic information and try to find unusual patients
Supervised methods:
• Attempt to predict amount due and find sets of cases where the amount due is very different from the predicted amount
Data Mining Approaches
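A sketch of the supervised approach described on this slide: predict the amount due, then flag cases whose actual amount is far from the prediction. A one-variable least-squares line is used purely for brevity and is an assumption of this example; the claim data, the `units_billed` predictor and the 1.5 standard deviation threshold are likewise hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical claims: (units_billed, amount_due).
claims = [(1, 10.0), (2, 20.5), (3, 29.5), (4, 40.0), (5, 95.0), (6, 60.5)]


def fit_line(points):
    """Ordinary least squares for y = a + b * x."""
    xs, ys = zip(*points)
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in points)
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def flag_unusual(points, threshold=1.5):
    """Flag cases whose residual exceeds threshold standard deviations."""
    a, b = fit_line(points)
    residuals = [y - (a + b * x) for x, y in points]
    spread = pstdev(residuals)
    return [p for p, r in zip(points, residuals) if abs(r) > threshold * spread]


print(flag_unusual(claims))
```

Any predictive model could take the place of the fitted line; the flagging step only needs predicted and actual values.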
![Page 121: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/121.jpg)
Copyright 2003-4, SPSS Inc.
What Does Data Mining Do?
• Data mining uses existing data to:
  • Predict
    • Category membership
    • Numeric value
    • e.g. credit risk
  • Group
    • Cluster (group) things together based on their characteristics
    • e.g. different types of TV viewers
  • Associate
    • Find events that occur together, or in a sequence
    • e.g. beer and diapers
  • Find outliers
    • Identify cases that don’t follow expected behavior
    • e.g. fraudulent behaviour
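The "Associate" task can be illustrated by counting how often pairs of items occur in the same basket. The baskets, the 50% support threshold, and the function name `frequent_pairs` are assumptions for this sketch; full association-rule mining also computes measures such as confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]


def frequent_pairs(baskets, min_support=0.5):
    """Return item pairs appearing in at least min_support of baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order.
        counts.update(combinations(sorted(basket), 2))
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(frequent_pairs(baskets))
```

Support is the fraction of baskets that contain both items of the pair.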
![Page 122: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/122.jpg)
Data Mining vs. Text Mining
• Data Mining
  • Process directly
  • Identify causal relationship
  • Structured numeric transaction data residing in relational data warehouse
• Text Mining
  • Linguistic processing or natural language processing (NLP)
  • Discover heretofore unknown information[2]
  • Applications deal with much more diverse and eclectic collections of systems and formats[4]
![Page 123: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/123.jpg)
Confusion
Is text mining the same as information extraction? No!
Information Extraction (IE)
• Extract facts about pre-specified entities, events or relationships from unrestricted text sources.
• No novelty: only information that is already present is extracted.
![Page 124: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/124.jpg)
Two Foundations
Information Retrieval (IR)
Artificial Intelligence (AI)
![Page 125: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/125.jpg)
Information Retrieval
The science of searching for
• Information in documents
• Documents themselves
• Metadata which describe documents
• Text, sound, images or data, within databases: relational stand-alone databases or hypertext networked databases such as the Internet or intranets.
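Searching for documents by their content is commonly built on an inverted index. The sketch below is one simple way to do it, not a method taken from the slides; the documents, the tokenizer, and the AND-only query semantics are assumptions.

```python
import re
from collections import defaultdict

# Hypothetical document collection.
docs = {
    "d1": "The Bank of Canada will curb inflation",
    "d2": "Higher interest rates curb spending",
    "d3": "Inflation and interest rates in Canada",
}


def build_index(docs):
    """Map each lowercased term to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Boolean AND search: documents containing every query term."""
    terms = re.findall(r"[a-z]+", query.lower())
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


index = build_index(docs)
print(sorted(search(index, "inflation canada")))
print(sorted(search(index, "interest rates curb")))
```

The index is built once per collection, so each query only touches the posting sets for its own terms.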
![Page 126: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/126.jpg)
Information Retrieval
Gerard Salton
Functional overview of IR
![Page 127: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/127.jpg)
Artificial Intelligence
Artificial intelligence (AI) is a branch of computer science and engineering that deals with intelligent behavior, learning, and adaptation in machines.
![Page 128: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/128.jpg)
Natural Language Processing
Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Statistical natural language processing uses stochastic, probabilistic and statistical methods to resolve some of the difficulties, e.g. text segmentation and word sense disambiguation.
![Page 129: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/129.jpg)
Data Warehousing, OLAP and data mining: what and why?
![Page 130: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/130.jpg)
Data Warehouse Architecture
[Figure: sources (Relational Databases, Legacy Data, Purchased Data, ERP Systems) pass through Extraction and Cleansing and an Optimized Loader into the Data Warehouse Engine, which has a Metadata Repository and serves Analyze and Query]
![Page 131: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/131.jpg)
Data Warehouse for Decision Support & OLAP
Putting information technology to work to help the knowledge worker make faster and better decisions
• Which of my customers are most likely to go to the competition?
• What product promotions have the biggest impact on revenue?
• How did the share price of software companies correlate with profits over the last 10 years?
![Page 132: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/132.jpg)
Decision Support
Used to manage and control business
Data is historical or point-in-time
Optimized for inquiry rather than update
Use of the system is loosely defined and can be ad-hoc
Used by managers and end-users to understand the business and make judgements
![Page 133: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/133.jpg)
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
![Page 134: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/134.jpg)
We want to know ...
Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
If I raise the price of my product by Rs. 2, what is the effect on my ROI?
If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
Which of my customers are likely to be the most loyal?
Data Mining helps extract such information
![Page 135: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/135.jpg)
What makes data mining possible?
Advances in the following areas are making data mining deployable:
• data warehousing
• better and more data (i.e., operational, behavioral, and demographic)
• the emergence of easily deployed data mining tools and
• the advent of new data mining techniques.
![Page 136: Business Intelligence Session](https://reader035.fdocuments.net/reader035/viewer/2022062407/563db9da550346aa9aa08aec/html5/thumbnails/136.jpg)
Why Separate Data Warehouse?
Missing data: Decision support requires historical data, which operational databases do not typically maintain.
Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases and external sources.
Data quality: Different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.