ITEC 423 Data Warehousing and Data Mining Lecture 3.
![Page 1: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/1.jpg)
ITEC 423 Data Warehousing and Data Mining: Lecture 3
![Page 2: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/2.jpg)
Architecture
Architecture is the art and science of designing buildings and other structures.
In a system, an architecture is a design decision that is usually not easily changed.
There are many different architectural choices available, with different solutions for:
• Data transfer
• Data staging area
• Data storage
• Information delivery
![Page 3: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/3.jpg)
A General Data Warehouse Architecture
![Page 4: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/4.jpg)
A General Data Warehouse Architecture with Staging Area
![Page 5: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/5.jpg)
A General Data Warehouse Architecture with Staging Area and Data Marts
![Page 6: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/6.jpg)
Architectural Types
![Page 7: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/7.jpg)
Architectural Types: Centralized Data Warehouse
Takes into account the enterprise-level information requirements
Atomic level data at the lowest level of granularity is stored
Some summarized data may be included
Queries and applications access the central data warehouse.
No separate data marts
![Page 8: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/8.jpg)
Architectural Types: Independent Data Marts
Evolves in companies where the organizational units develop their own data marts for their own specific purposes
Each data mart serves a particular organizational unit
More than one version of the truth may be found
Data marts are independent of one another
Different data marts may have inconsistent data definitions and standards
Such variances hinder analysis of data across data marts.
![Page 9: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/9.jpg)
Architectural Types: Federated
The organization has an existing legacy of assorted decision-support systems (DSS) in the form of operational systems, extracted datasets, primitive data marts, …
It may not be possible to discard that investment and start from scratch
The practical solution is a federated architectural type
Data may be physically or logically integrated through shared key fields, overall global metadata, distributed queries, and other such methods
No one overall data warehouse
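As a rough illustration of logical integration through shared key fields, the sketch below combines two existing systems into one view without building an overall warehouse. The system names, the customer keys, and the fields are all hypothetical; a real federation would typically rely on distributed queries and global metadata rather than in-memory dictionaries.

```python
# Two hypothetical legacy systems that happen to share a customer key.
billing_mart = {
    "C001": {"total_billed": 1200.0},
    "C002": {"total_billed": 300.0},
}
support_extract = {
    "C001": {"open_tickets": 2},
    "C003": {"open_tickets": 1},
}


def federated_view(*sources):
    """Combine records from several sources by their shared key field."""
    combined = {}
    for source in sources:
        for key, fields in source.items():
            # Records with the same key are merged into one logical record.
            combined.setdefault(key, {}).update(fields)
    return combined


view = federated_view(billing_mart, support_extract)
```

Each source stays where it is; only the shared key makes the combined view possible, which is why consistent key fields matter so much in this architectural type.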
![Page 10: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/10.jpg)
Architectural Types: Data-Mart Bus
Conformed supermarts approach
Analyzing requirements for a specific business subject such as orders, shipments, billings, insurance claims, car rentals ...
Build the first data mart (supermart) using business dimensions and metrics
These business dimensions will be shared in the future data marts.
Conform dimensions among the various data marts
Result would be logically integrated supermarts that will provide an enterprise view of the data
Data marts contain atomic data organized as a dimensional data model
Results from adopting an enhanced bottom-up approach to data warehouse development
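The idea of conformed dimensions can be sketched in a few lines. In this minimal example, two data marts (orders and shipments) reference the same product dimension, so their measures can be compared by category. The table contents, the column names, and the `drill_across` helper are assumptions made for illustration only.

```python
# Hypothetical conformed dimension shared by both data marts.
product_dim = {
    1: {"name": "Widget", "category": "Hardware"},
    2: {"name": "Gadget", "category": "Hardware"},
    3: {"name": "Manual", "category": "Print"},
}

# Fact rows from two separate data marts; both use the same product keys.
orders_facts = [
    {"product_key": 1, "order_amount": 100.0},
    {"product_key": 2, "order_amount": 50.0},
    {"product_key": 3, "order_amount": 20.0},
]
shipments_facts = [
    {"product_key": 1, "units_shipped": 10},
    {"product_key": 3, "units_shipped": 4},
]


def total_by_category(facts, measure):
    """Sum one measure per product category using the shared dimension."""
    totals = {}
    for row in facts:
        category = product_dim[row["product_key"]]["category"]
        totals[category] = totals.get(category, 0) + row[measure]
    return totals


def drill_across():
    """Line up measures from both marts on the conformed category attribute."""
    orders = total_by_category(orders_facts, "order_amount")
    shipped = total_by_category(shipments_facts, "units_shipped")
    categories = sorted(set(orders) | set(shipped))
    return {c: (orders.get(c, 0), shipped.get(c, 0)) for c in categories}
```

Because both marts agree on what a product and a category are, the combined result gives an enterprise view without a single central warehouse.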
![Page 11: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/11.jpg)
Architectural Types: Hub and Spoke
Similar to the centralized data warehouse architecture: enterprise-wide data warehouse
Atomic data is stored in the centralized data warehouse
The centralized data warehouse feeds data to the dependent data marts on the spokes
Dependent data marts may be developed for departmental analytical needs, specialized queries, data mining ...
Dependent data mart may have normalized, denormalized, summarized, or dimensional data structures based on individual requirements
Most queries are directed to the dependent data marts
Centralized data warehouse may also be used for querying
Results from adopting a top-down approach to data warehouse development.
![Page 12: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/12.jpg)
Building Blocks of Data Warehouses
![Page 13: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/13.jpg)
Source Data: Production, Internal, Archived, External
![Page 14: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/14.jpg)
Production Data
![Page 15: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/15.jpg)
Internal Data
![Page 16: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/16.jpg)
Archived Data
Backup data
Old data from the operational databases is stored in archived files.
Decisions related to archiving
how often
which portions
Different methods of archiving
Recent data archived to a separate archival database that may still be online.
Older data archived to flat files on disk storage.
Oldest data is archived to tape cartridges or microfilm and may be kept off-site.
Data warehouse keeps historical snapshots of data.
Historical data is needed for analysis over time.
Look into your archived data sets. Depending on your data warehouse requirements, you have to include sufficient historical data.
![Page 17: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/17.jpg)
External Data
External data is used especially by decision makers
statistics relating to their industry produced by external agencies and national statistical offices.
market share data of competitors.
standard values of financial indicators for their business to check on their performance.
Production data and archived data:
• give you a picture based on what you are doing or have done in the past
• are not enough for understanding industry trends and comparing performance
![Page 18: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/18.jpg)
Data Staging Component: Extraction, Transformation, Loading
![Page 19: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/19.jpg)
Data Extraction
Deals with numerous data sources
• Source data may be from different source machines in diverse data formats.
Tools are available on the market for data extraction.
• Outside tools suitable for certain data sources
• Develop in-house programs to do the data extraction for the other sources.
After extraction, where to keep the data for further preparation?
• Perform the extraction function in the legacy system
• Extract the source into a separate physical environment from which moving the data into the data warehouse would be easier. There, extract the source data into:
  • a group of flat files
  • a data-staging relational database
  • a combination of both
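As an illustration of an in-house extraction program, the sketch below pulls rows from a source database and writes them to a flat file for the staging area. An in-memory SQLite database stands in for the operational system, and an in-memory text buffer stands in for the flat file; the `orders` table and the function name `extract_to_flat_file` are hypothetical.

```python
import csv
import io
import sqlite3


def extract_to_flat_file(conn, query, out):
    """Run a query against a source and write the rows as CSV with a header."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    # Column names come from the cursor description of the query result.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Stand-in for an operational source system (assumption for this sketch).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 120.5), (2, "Globex", 75.0)],
)

# Stand-in for a flat file in the staging area.
staging_file = io.StringIO()
rows_extracted = extract_to_flat_file(
    source,
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id",
    staging_file,
)
```

Writing to a neutral format such as CSV keeps the staging area independent of the source machine and its data format.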
![Page 20: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/20.jpg)
Data Transformation
Data for a data warehouse comes from many disparate sources
• Clean the data from each source: misspellings, resolution, missing data, duplicates
• Standardize data elements: data types, lengths, synonyms/homonyms
• Combine related information
• Purge useless data
• Choose appropriate keys
• Summarize if necessary
Data feed is not just an initial load.
• Same (maybe slightly adapted) transformation process will be applied periodically.
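Several of the transformation steps above can be sketched as one small function: cleaning stray whitespace and casing, standardizing synonyms, handling missing data, purging useless rows, and removing duplicates. The record layout and the synonym table are hypothetical and far simpler than a real staging job would need.

```python
def transform(records):
    """Clean, standardize, and de-duplicate customer records from several sources."""
    # Hypothetical synonym table mapping source spellings to one standard code.
    synonyms = {"calif.": "CA", "california": "CA", "ca": "CA"}
    seen = set()
    result = []
    for rec in records:
        # Clean: collapse whitespace and normalize capitalization.
        name = " ".join((rec.get("name") or "").split()).title()
        if not name:
            continue  # Purge rows that carry no usable name.
        # Standardize: resolve synonyms; mark missing values explicitly.
        state_raw = (rec.get("state") or "").strip().lower()
        state = synonyms.get(state_raw, state_raw.upper() or "UNKNOWN")
        key = (name, state)
        if key in seen:
            continue  # Drop duplicates that arrive from different sources.
        seen.add(key)
        result.append({"name": name, "state": state})
    return result


cleaned = transform([
    {"name": "  acme  corp ", "state": "Calif."},
    {"name": "Acme Corp", "state": "CA"},
    {"name": "", "state": "NY"},
    {"name": "globex", "state": None},
])
```

Because the same function can be rerun on each periodic feed, it fits the point that transformation is not just part of the initial load.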
![Page 21: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/21.jpg)
Data Loading
![Page 22: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/22.jpg)
Data Storage Component
A separate repository
Large volumes of historical data for analysis
Not for quick retrieval of individual pieces of information
Multidimensional databases store data aggregated at different levels
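To make "aggregated at different levels" concrete, the sketch below summarizes the same atomic sales rows by month and by year for each region. The sample rows and the `aggregate` helper are hypothetical; a multidimensional database would precompute and store such summaries rather than derive them on request.

```python
from collections import defaultdict

# Hypothetical atomic sales rows at the lowest level of granularity.
sales = [
    {"date": "2024-01-15", "region": "East", "amount": 100},
    {"date": "2024-01-20", "region": "West", "amount": 50},
    {"date": "2024-02-03", "region": "East", "amount": 70},
    {"date": "2025-01-09", "region": "East", "amount": 30},
]


def aggregate(rows, level):
    """Total the amount per (period, region); level is 'year' or 'month'."""
    # ISO dates let us pick the period by prefix: 'YYYY' or 'YYYY-MM'.
    width = {"year": 4, "month": 7}[level]
    totals = defaultdict(int)
    for row in rows:
        totals[(row["date"][:width], row["region"])] += row["amount"]
    return dict(totals)


by_month = aggregate(sales, "month")
by_year = aggregate(sales, "year")
```

Rolling up from month to year loses detail but answers trend questions faster, which is the trade-off behind storing several aggregation levels.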
![Page 23: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/23.jpg)
Information Delivery Component
Novice user
needs prefabricated reports and preset queries
Casual user
needs prepackaged information once in a while
Business analyst
needs the ability to do complex analysis using the information in the data warehouse
Power user
needs to be able to navigate throughout the data warehouse, pick up interesting data, format his or her own queries, drill through the data layers, and create custom reports and ad hoc queries.
Ad hoc reports
complex queries, multidimensional analysis, and statistical analysis
![Page 24: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/24.jpg)
Information Delivery Component
![Page 25: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/25.jpg)
Information Delivery Component
Information fed into executive information systems (EIS) is meant for senior executives and high-level managers.
Some data warehouses also provide data to data mining applications.
• Knowledge discovery systems where the mining algorithms help to discover trends and patterns from the data
In your data warehouse, you may include several information delivery mechanisms.
• Online queries and reports
• Scheduled reports through e-mail or intranet
• Information delivery over the Internet
![Page 26: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/26.jpg)
Metadata Component
Similar to the data dictionary or the data catalog in a DBMS
Data about the data in the data warehouse.
Key architectural component of the data warehouse.
Types of Metadata:
• Operational metadata
• Extraction and transformation metadata
• End-user metadata
Importance of Metadata:
• Connects all parts of the data warehouse.
• Provides information about the contents and structures to the developers.
• Makes the contents recognizable to the end users.
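One way to picture the three types of metadata is as fields of a single catalog entry for a warehouse column. The sketch below is only an illustration: the class name, field names, and sample values are hypothetical and do not reflect any particular metadata tool.

```python
from dataclasses import dataclass


@dataclass
class ColumnMetadata:
    """Hypothetical catalog entry describing one warehouse column."""
    warehouse_column: str
    source_system: str         # operational metadata: where the data comes from
    source_field: str          # operational metadata: original field name
    transformation_rule: str   # extraction and transformation metadata
    business_description: str  # end-user metadata in business terms


catalog = {
    "order_amount": ColumnMetadata(
        warehouse_column="order_amount",
        source_system="billing_db",
        source_field="ORD_AMT",
        transformation_rule="convert cents to dollars",
        business_description="Total value of the order in dollars",
    ),
}


def describe(column):
    """Return a readable description that ties the three metadata types together."""
    m = catalog[column]
    return (
        f"{m.business_description} "
        f"(source: {m.source_system}.{m.source_field}, rule: {m.transformation_rule})"
    )
```

Developers would read the source and rule fields, while end users would mostly see the business description, matching the two audiences named above.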
![Page 27: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/27.jpg)
Management and Control Component
Sits on top of all the other components.
Coordinates the services and activities within the data warehouse.
Controls the data transformation and the data transfer into the data warehouse storage.
Moderates the information delivery to the users.
Works with the database management systems and enables data to be properly stored in the repositories.
Monitors the movement of data into the staging area and from there into the data warehouse storage itself.
Interacts with the metadata component to perform the management and control functions.
![Page 28: ITEC 423 Data Warehousing and Data Mining Lecture 3.](https://reader036.fdocuments.net/reader036/viewer/2022062313/56649cc95503460f9499073e/html5/thumbnails/28.jpg)