Data Integration


Page 1: Data Integration

Data Integration

Combining data from different sources, providing a unified view of the data

A data warehouse is a repository that results from some types of data integration processes

Page 2: Data Integration

Techniques for Data Integration

Consolidation (ETL): Extract/Transform/Load. Consolidates all data into a centralized database (like a data warehouse)

Data federation (EII): Enterprise Information Integration. Provides a virtual view of data without actually creating one centralized database

Data propagation (EAI): Enterprise Application Integration. Duplicates data across databases, with near real-time delay
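The contrast between consolidation and federation can be sketched in a few lines. This is only an illustration: the `crm` and `billing` dictionaries are made-up stand-ins for two source systems, and `consolidate` and `federated_view` are hypothetical helper names, not part of any EII or ETL product.

```python
# Two hypothetical source systems, keyed by customer id.
crm = {"C1": {"name": "Ann"}}
billing = {"C1": {"balance": 20.0}}

def consolidate(*sources):
    """Consolidation: copy all data into one central store."""
    central = {}
    for src in sources:
        for key, fields in src.items():
            central.setdefault(key, {}).update(fields)
    return central

def federated_view(key, *sources):
    """Federation: build the unified record at query time, with no central copy."""
    record = {}
    for src in sources:
        record.update(src.get(key, {}))
    return record

warehouse = consolidate(crm, billing)
billing["C1"]["balance"] = 35.0  # the source changes after the load

stale = warehouse["C1"]["balance"]                        # value as of the load
live = federated_view("C1", crm, billing)["balance"]      # value as of the query
```

The consolidated copy keeps the value it had at load time, while the federated view reads the sources when asked. Propagation would sit in between, pushing each change to the other databases shortly after it happens.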

Page 3: Data Integration

The ETL Process

Capture/Extract

Scrub or data cleansing

Transform

Load and Index

ETL = Extract, transform, and load

Page 4: Data Integration

Capture/Extract = obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse

Static extract = capturing a snapshot of the source data at a point in time

Incremental extract = capturing changes that have occurred since the last static extract
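A minimal sketch of the two extract types, assuming the source rows carry a change-tracking timestamp. The `updated_at` column and the sample rows are assumptions made for illustration; real sources may instead rely on logs or triggers to find changes.

```python
from datetime import datetime

# Hypothetical source rows; "updated_at" is an assumed change-tracking column.
SOURCE = [
    {"id": 1, "name": "Ann", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "Bob", "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "name": "Cy", "updated_at": datetime(2024, 1, 9)},
]

def static_extract(rows):
    """Snapshot of the chosen source data at a point in time."""
    return [dict(r) for r in rows]

def incremental_extract(rows, last_extract_time):
    """Only the rows changed since the previous extract."""
    return [dict(r) for r in rows if r["updated_at"] > last_extract_time]

snapshot = static_extract(SOURCE)
changes = incremental_extract(SOURCE, datetime(2024, 1, 4))
changed_ids = [r["id"] for r in changes]
```

Here the snapshot holds all three rows, while the incremental extract holds only the rows touched after the cutoff time.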

Page 5: Data Integration

Scrub/Cleanse = using pattern recognition and AI techniques to upgrade data quality

Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies

Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
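A few of the cleansing tasks listed above can be sketched with simple rules. This is a rule-based illustration only, not the pattern-recognition or AI techniques the slide mentions; the field names `name` and `joined` and the sample records are assumptions.

```python
import re
from datetime import datetime

def cleanse(records):
    """Reformat fields, log records with bad dates, and drop duplicates."""
    seen, clean, errors = set(), [], []
    for rec in records:
        # Reformatting: collapse whitespace and normalize capitalization.
        name = re.sub(r"\s+", " ", rec["name"]).strip().title()
        try:
            date = datetime.strptime(rec["joined"], "%Y-%m-%d").date()
        except ValueError:
            errors.append(rec)  # error detection/logging for erroneous dates
            continue
        key = (name, date)
        if key in seen:  # duplicate data
            continue
        seen.add(key)
        clean.append({"name": name, "joined": date})
    return clean, errors

raw = [
    {"name": "  ann   lee ", "joined": "2024-01-03"},
    {"name": "Ann Lee", "joined": "2024-01-03"},   # duplicate after reformatting
    {"name": "Bob Roy", "joined": "2024-02-30"},   # impossible date
]
clean, errors = cleanse(raw)
```

Rejected records are kept in `errors` rather than silently dropped, so they can be reviewed or corrected at the source.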

Page 6: Data Integration

Transform = convert data from the format of the operational system to the format of the data warehouse

Record-level: selection (data partitioning), joining (data combining), aggregation (data summarization)

Field-level: single-field (from one field to one field), multi-field (from many fields to one, or one field to many)
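The three record-level operations can be sketched on in-memory rows. The `orders` and `customers` data are invented for illustration; in practice these steps would usually run as SQL or inside an ETL tool.

```python
from collections import defaultdict

orders = [
    {"order_id": 1, "cust_id": "C1", "amount": 40.0, "status": "paid"},
    {"order_id": 2, "cust_id": "C2", "amount": 15.0, "status": "void"},
    {"order_id": 3, "cust_id": "C1", "amount": 25.0, "status": "paid"},
]
customers = {"C1": "North", "C2": "South"}  # cust_id -> region

# Selection: keep only the rows that meet a criterion.
paid = [o for o in orders if o["status"] == "paid"]

# Joining: combine each order with data from the customer source.
joined = [{**o, "region": customers[o["cust_id"]]} for o in paid]

# Aggregation: summarize detail rows into one total per region.
totals = defaultdict(float)
for row in joined:
    totals[row["region"]] += row["amount"]
```

After these steps `totals` holds one summary value per region instead of one row per order.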

Page 7: Data Integration

Load/Index = place transformed data into the warehouse and create indexes

Refresh mode: bulk rewriting of target data at periodic intervals

Update mode: only changes in source data are written to data warehouse
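The two load modes differ in how much of the target they rewrite. In this sketch a Python dictionary keyed by `id` stands in for an indexed warehouse table; the function names and sample rows are assumptions for illustration.

```python
def refresh_load(warehouse, source_rows):
    """Refresh mode: bulk-rewrite the whole target table."""
    warehouse.clear()
    warehouse.update({r["id"]: r for r in source_rows})

def update_load(warehouse, changed_rows):
    """Update mode: write only the rows that changed in the source."""
    for r in changed_rows:
        warehouse[r["id"]] = r

warehouse = {}
refresh_load(warehouse, [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}])
update_load(warehouse, [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}])
loaded_ids = sorted(warehouse)
```

The refresh replaces everything in the target, while the update touches only row 2 (changed) and row 3 (new) and leaves row 1 as it was.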

Page 8: Data Integration

Data Transformation Functions

Record-level: transformation that involves obtaining the set of records you want from the data source (selection, joining, aggregation)

Field-level: transformation that converts data from fields of a source record to field(s) of a target record (single-field vs. multi-field transformations)

Page 9: Data Integration


Single-field transformation

In general: some transformation function translates data from the old form to the new form

Algorithmic transformation: uses a formula or logical expression

Table lookup: another approach, which uses a separate table keyed by the source record code
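Both single-field approaches can be sketched briefly. The temperature conversion and the state-code table are illustrative choices, not examples taken from the slides.

```python
def fahrenheit_to_celsius(temp_f):
    """Algorithmic transformation: a formula maps the old form to the new form."""
    return round((temp_f - 32) * 5 / 9, 1)

# Table lookup: a separate table keyed by the source record code.
STATE_NAMES = {"CA": "California", "NY": "New York"}

def state_name(code):
    """Translate a source code to its target value, flagging unknown codes."""
    return STATE_NAMES.get(code, "UNKNOWN")

boiling_c = fahrenheit_to_celsius(212)
ny = state_name("NY")
```

A lookup table is the natural choice when no formula relates the old and new values, and it can be maintained without changing code.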

Page 10: Data Integration

Multi-field transformation

M:1 = from many source fields to one target field

1:M = from one source field to many target fields
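A minimal sketch of both directions, assuming a name pair for the M:1 case and a hyphenated product code for the 1:M case. Both formats are hypothetical.

```python
def combine_name(first, last):
    """M:1: many source fields become one target field."""
    return f"{last}, {first}"

def split_product_code(code):
    """1:M: one source field becomes several target fields."""
    brand, model = code.split("-", 1)
    return {"brand": brand, "model": model}

full_name = combine_name("Ann", "Lee")
parts = split_product_code("ACME-X100")
```

The 1:M direction depends on the source field having a reliable internal structure; a code without the expected separator would raise an error here rather than load a wrong value.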