© 1999 FORWISS Data Warehouse: Methodology and Tools FORWISS - Bavarian Research Centre for...
-
date post
19-Dec-2015 -
Category
Documents
-
view
213 -
download
0
Transcript of © 1999 FORWISS Data Warehouse: Methodology and Tools FORWISS - Bavarian Research Centre for...
© 1999 FORWISS
Data Warehouse:Data Warehouse: Methodology and Tools Methodology and Tools
FORWISS - Bavarian Research Centre for Knowledge Based Systems
Concepts, Architectures and Products
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
2
OverviewOverview
The Process of Planning and Building a Warehouse
Data Warehouse Architecture (revisited)
Classification of Tools
Focus: OLAP Tools– Multidimensional Data Modeling
– OLAP Architectures
– OLAP query languages
Tool Demonstration
Summary
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
3
OLAP Design CycleOLAP Design Cycle
RequirementAnalysis
Conceptual Design(Implementation
Independent)
Using the DataWarehouse
Logical + Physical Design(e.g. Product specific)
Implementation
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
4
OperationalData Sources
Data-Migration Middleware (Populations-Tools)
DataStorage
RepositoryRepository
DataAnalysis
Reporting, OLAP,Data Mining
Data Warehouse ArchitectureData Warehouse Architecture
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
5
Classification of ToolsClassification of Tools
Frontend Tools
Data Storage Tools (Databases)
ETL Tools– Extraction
– Transformation
– Loading
Repository Systems– Metadata Storage
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
6
Repository SystemsRepository Systems
Manage Different Kinds of Metadata
– Business Metadata
» E.g. How is revenue computed
– Technical Metadata
» When was data last loaded from which system
» Data model for OLTP and OLAP databases
Functionality
– Communication ‚hub‘ for different tools
– Guides user exploration
– Guides development process
– Impact analysis
E.g. Viasoft Rochade, Softlab Enabler,...
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
7
ETL ToolsETL Tools
Extraction: Range of Supported Data Sources
– Mainframe legacy databases
– COBOL Files
– Relational Databases
– Filebased data storage (Excel, Word,XML,...)
Transformation
– (Graphical) Specification of Transformation Rules (Expressive Power)
Loading
– Ability to use database features (e.g. bulk loading)
Process Management
– Scheduling, Monitoring, Error Handling
Informatica PowerMart, Hummingbird Genio, Acta...
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
8
Databases for DWDatabases for DW
Special Indexing Techniques– Multidimensional Indexes
– Bitmap Indexes
– Foreign Column Indexes
Support for Materialized Views (Preaggregation)
Special Analytical Capabilities (e.g. SQL Extensions)– Top N
– Ranking
Bulk Loading Capabilities– Offline, No concurrency control
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
9
ReportingReporting
Interactive OLAPInteractive OLAP
Ad hoc-Queries
Data MiningData Mining
Additional BenefitAdditional Benefit Number of UsersNumber of Users
What happened?
What happened?
Why did it happen?Why did
it happen?
What will happen?What will happen?
What happened why and how?
What happened why and how?
Frontend ToolsFrontend Tools
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
10
The User‘s view (OLAP Tool)The User‘s view (OLAP Tool)
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
11
Multidimensional OLAP (MOLAP)Multidimensional OLAP (MOLAP)
Multidim.Database
Frontend Tool
specialized database technology
multidimensional storage structures
E.g. Hyperion Essbase, Oracle Express, Cognos
PowerPlay (Server)
+ Query Performance
+ Powerful MD Model
+ write access
Database Features multiuser access/ backup and recovery
Sparsity Handling -> DB Explosion
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
12
Relational OLAP (ROLAP)Relational OLAP (ROLAP)
ROLAP- Engine
Relational DB
Frontend Tool
SQL
MD-Interface
Meta Data
idea: use relational data storage
star (snowflake) schema
E.g. Microstrategy, SAP BW
+ advantages of RDBMS+ scalability, reliability, security etc.
+ Sparsity handling
Query Performance
Data Model Complexity
no write access
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
13
Client (Desktop) OLAPClient (Desktop) OLAP
Client-OLAP
proprietary data structure on the client
data stored as file
mostly RAM based architectures
E.g. Business Objects, Cognos PowerPlay
+ mobile user
+ ease of installation and use
data volume
no multiuser capabilites
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
14
ROLAP- Engine
Multidim.Database
DW-DB (mostly relational)
MOLAP ROLAP Client-OLAP
DW IntegrationDW Integration
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
15
Combining Architectures ICombining Architectures I
Multidim.Database
Drill through
highly aggregated data
dense data
95% of the analysis requirements
highly aggregated data
dense data
95% of the analysis requirements
detailed data (sparse)
5% of the requirements
detailed data (sparse)
5% of the requirements
RelationalDatabase
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
16
Combining Architectures IICombining Architectures II
Multidim. Storage
Hybrid OLAP (HOLAP)
HOLAP System
Relational Storage
Meta Data
equal treatment of MD and Rel Data
Storage type at the discretion of the administrator
Cube Partitioning
equal treatment of MD and Rel Data
Storage type at the discretion of the administrator
Cube Partitioning
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
17
OLAP StandardsOLAP Standards
Idea: define interface between client and server
Benefit: Component oriented architectures
Proposal 1: OLAP Council– union of OLAP Tool producers
– not implemented so far (even by the council members)
Proposal 2: Microsoft - OLEDB for OLAP (shot ODBO)– standardizes a data model and an MD query language (MDX)
– specification contains lots of optional functionality
– all major vendors committed themselves to the standard
– will be the de facto standard
© 1999 FORWISS
Practical Case StudyPractical Case Study
Building a Warehouse
artwork copyright Intersystems GmbHartwork copyright Intersystems GmbH
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
19
Conceptual DesignConceptual Design
RequirementAnalysis
Conceptual Design(Implementation
Independent)
Using the DataWarehouse
Logical + Physical Design(tool specific)
Implementation
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
20
The Modeling ProcessThe Modeling Process
Which business process is being modeled?
What is the subject of analysis (fact) and what is being
measured?
On what granularity level is active analysis being done?
Which properties (dimensions) determine the measures?
Which different levels of aggregation are meaningful?
What additional information is needed for the different levels?
What is the variability and the cardinality of the dimensions?
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
21
FactsFacts
Fact = Subject of Analysis Sales
Measures = Attributes describing facts Quantity, Price
Derived Measures Profit
Additivity of Measures globally additiv Quantity additiv for some dimensions Items in stock
additiv resp. to plants/
not additiv w.r.t. time
not additiv at all profit margin
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
22
DimensionsDimensions
Dimensions = static structure of business information
Used for navigating the data space
Choosing the necessary granularity
Dimension Members = Instances of a dimension– e.g. 8.12.1997 and Juli 1997 are members of dimension “time”
Structuring Dimension– using different dimension levels (hierarchies)
– using descriptive attributes
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
23
Simple HierarchiesSimple Hierarchies
Januar 99Januar 99
Februar 99Februar 99
März 99März 99
Sept. 99Sept. 99
August 99August 99
Juli 99Juli 99
April 99April 99
Juni 99Juni 99
Mai 99Mai 99
1. Quartal 991. Quartal 99
2. Quartal 992. Quartal 99
3. Quartal 993. Quartal 99
19991999
2. Halbjahr 992. Halbjahr 99
1. Halbjahr 991. Halbjahr 99
........................
MonthMonth QuarterQuarter1/2 YearPeriod
1/2 YearPeriod YearYear
Dimension Level
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
24
Unbalanced HierarchiesUnbalanced Hierarchies
......
Plant 1Plant 1
Bu 1Bu 1
Bu 2Bu 2
GreatOutdoors
GreatOutdoors
Div BDiv B
Div ADiv A
......
Plant 0815Plant 0815
Plant/SitePlant/Site BusinessUnit
BusinessUnit
BusinessDivision
BusinessDivision EnterpriseEnterprise
......
Plant1Plant1
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
25
Alternative HierarchiesAlternative Hierarchies
Customer 04Customer 04
Customer 01Customer 01
PartnerPartner
RetailerRetailer
GermanyGermany
Customer 06Customer 06
ConsumerConsumer
Customer 02Customer 02
Customer 03Customer 03
Customer 05Customer 05
BavariaBavaria
HessenHessen
HamburgHamburg
CustomerCustomer Geogr. RegionGeogr. Region CountryCountry
Customer GroupCustomer Group
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
26
Alternative PathesAlternative Pathes
Würzburg 01Würzburg 01
Munich 01Munich 01
GermanyGermany
Frankfurt 01Frankfurt 01Germany (South)Germany (South)
Germany (West)Germany (West)
Germany (North)Germany (North)
Munich 02Munich 02
Munich 03Munich 03
Würzburg 02Würzburg 02
BayernBayern
HessenHessen
HamburgHamburg
OrtOrt Geogr. RegionGeogr. Region
Sales RegionSales Region
CountryCountry
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
27
Criteria for a ‘good’ MD DesignCriteria for a ‘good’ MD Design
dimensions should be independent
dimensionality of a cube should be max. 7-8 dimensions– interpretation of results is difficult for a large number of dimensions
hierarchies should have a fan-out of max. 30– long drill-down times
– large drill-down results
– insert additional levels for structuring purposes (e.g. insert state
between city and country)
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
28
Graphical Notation (ME/R)Graphical Notation (ME/R)
A Fact and its measuresA Fact and its measures
Level-NameAttribute 1
...Attribute n
Fact-Name
Measure 1
Measure n...
.. is characterized by dimensions.. is characterized by dimensions
..can be classified according to.....can be classified according to...
A Dimension Level with attributesA Dimension Level with attributes
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
29
Sale
Revenue
Order QtyCost
Prod. Type
Line
Year
Month
Example Data ModelExample Data Model
CustomerType
Branch
Region
Country
Product
Sales RepNameCode
Day
Customer MarginRange
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
30
Cognos PowerPlay- ArchitectureCognos PowerPlay- ArchitectureS
erve
rC
li ent
PowerPlay Client
PowerCube(Proprietary,
Compressed)
Transformer
Impromptu
PowerPlay Server
OLEDB for OLAP
OLEDB Providere.g. MS OLAP Services,SAP BW,…
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
31
Logical+Physical DesignLogical+Physical Design
RequirementAnalysis
Conceptual Design(Implementation
Independent)
Using the DataWarehouse
Logical + Physical Design(tool specific)
Implementation
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
32
Practical DemonstrationPractical Demonstration
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
33
Summary and ConclusionsSummary and Conclusions
Multidimensional modeling is performed on different levels– conceptual model (tool independent level) following requirement analysis
– logical and physical design before implementation
Distinction between two types of data– quantifying data: measures, cells of the cube, fact table
– qualifying data: properties, dimensions, dimension tables
Hierarchical structures of dimensions can be complex
ME/R notation can be used to document conceptual models
Several ways to map an MD model to a relational DB
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
34
Result Granularity
A
B
Canonical Query (I)Canonical Query (I)Restriction Element
A
B
Query Result
Result Measures
m2m1
© 2001 FORWISS,Carsten Sapia - [email protected]: Tools and Projects
35
Canonical Query (II)Canonical Query (II)
Result Measures
Result Granularity
Restriction Elements
g1 g2 gn…
r1 r2 rn…
m1 … mk
Canonical Query Definition
SELECT g1,...,gn, aggr(m1),..., aggr(mk)FROM FactName, Dim1,..., Dimn
WHERE Dim1.level(r1) = r1 AND ... AND Dimn.level(rn) = rn
AND Dim1.d1=FactName.d1 AND ... AND Dimn.dn=FactName.dn
GROUP BY g1,...,gn