ETL Implementation Strategy
Transcript of ETL Implementation Strategy
8/9/2019 ETL Implementation Strategy
http://slidepdf.com/reader/full/etl-implementation-strategy 1/16
Contents
Buy or Build ETL
Major Factors Involved in Evaluating an ETL
An Ideal ETL Tool
ETL Implementation
ETL Process Example
Required Features in ETL Tool
Buying ETL Tool
Advantages
Reduced development time
A wide range of features available
Reusable across future phases involving data transformations within the project
Disadvantages
Time needed to learn the product
Training costs
May not do everything we need (to be supplemented with in-house development)
Building ETL Tool
Advantages
No up-front purchasing costs
No training costs
Specifically designed for the purpose of the project
Disadvantages
Time needed for design, development, testing and documentation
May not have all the features of an off-the-shelf product
High maintenance
"Recommended: buy a tool rather than build one for the project"
Factors Involved in Evaluating an ETL Tool
Ease of use
Database Connectivity
Update Capabilities
Surrogate Key Support
Change Data Capture
Intelligent Queries
Multi-source Joins
Aggregate Capabilities
Tool Integration
Metadata Support
Customization Methods
Logging
Scheduling Features
Quality Assessment
Tool Architecture
Price
ETL Environment
Creating an ETL environment requires six basic infrastructure components:
A network environment to connect source data systems to the warehouse platform
An RDBMS for the warehouse
A sort/merge utility to integrate data from the various source systems
A method to perform calculations
A utility to schedule and run ETL batch cycles based on events or timelines
A change management utility to manage updates and version control of programs and scripts
Implementation Methodology
Define business requirements for the warehouse project
Analyse the source systems
Develop the physical data model
Design ETL processes
Design Stages
The design process follows a systematic, staged approach. Together, the stages create a single processing method for initial and successive loads to the data warehouse. It involves 5 stages, which provide a modular and adjustable transformation process for the target table that can adapt easily to changes in the source systems or the warehouse model design.
Stage 1: Source verification
Performs the access and extraction of data from the source system and builds a temporal view of the data at the time of extraction.
Stage 2: Source alteration
Performs a variety of transformations unique to the source, depending on business requirements.
Stage 3: Common interchange
Applies business rules and/or transformation logic that is common to multiple target tables.
Stage 4: Target load determination
Performs final formatting of data to produce load-ready files for the target table; identifies and segregates rows to be inserted vs. updated (if applicable); applies remaining technical metadata tagging; and processes data into the RDBMS.
Stage 5: Aggregation
The final stage; uses the load-ready files from Stage 4 to build aggregation tables needed to improve query performance against the warehouse.
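The modular idea behind the staged approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout (org_name, region, headcount), the two transformation rules, and all function names are hypothetical and do not come from the slides. Each stage takes and returns a list of rows, so a stage can be replaced without touching the others, and the last step builds an aggregate from the load-ready rows.

```python
from collections import defaultdict
from typing import Callable, Iterable

Row = dict
Stage = Callable[[list[Row]], list[Row]]


def run_stages(rows: list[Row], stages: Iterable[Stage]) -> list[Row]:
    """Apply each stage in order; every stage takes and returns a list of rows."""
    for stage in stages:
        rows = stage(rows)
    return rows


def source_alteration(rows: list[Row]) -> list[Row]:
    # Source-specific clean-up (hypothetical rule): trim whitespace in names.
    return [{**r, "org_name": r["org_name"].strip()} for r in rows]


def common_interchange(rows: list[Row]) -> list[Row]:
    # Rule shared by several targets (hypothetical): upper-case region codes.
    return [{**r, "region": r["region"].upper()} for r in rows]


def aggregate_headcount(load_ready: list[Row]) -> dict:
    """Aggregation sketch: total headcount per region from load-ready rows."""
    totals = defaultdict(int)
    for r in load_ready:
        totals[r["region"]] += r["headcount"]
    return dict(totals)


# Hypothetical extracted rows standing in for a source extract.
extracted = [
    {"org_name": " Sales ", "region": "emea", "headcount": 12},
    {"org_name": "Support", "region": "emea", "headcount": 8},
    {"org_name": "R&D", "region": "apac", "headcount": 20},
]
load_ready = run_stages(extracted, [source_alteration, common_interchange])
headcount_by_region = aggregate_headcount(load_ready)
```

Because every stage has the same shape, a change in the source system would only require swapping the source-specific function in the list passed to run_stages.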
ETL Process Example
Stage 1: Source Verification
The source system is a human resources (HR) ERP system.
The target is an organization dimension table that happens to use type 2 slowly changing dimensions.
FIGURE 1. Source verification, alteration, and common interchange stages (diagram labels: working files, new organization records).
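A minimal sketch of the source verification step, assuming an in-memory SQLite table as a stand-in for the HR ERP source. The table name, column names, and the fixed timestamp are hypothetical. The point is that every extracted row carries the extraction time, which gives the working file a view of the data as of that moment.

```python
import sqlite3
from datetime import datetime, timezone


def extract_organizations(conn: sqlite3.Connection, extracted_at: str) -> list[dict]:
    """Read all source rows and stamp each one with the extraction time."""
    cur = conn.execute(
        "SELECT org_id, org_name, region_id, manager_id "
        "FROM organization ORDER BY org_id"
    )
    columns = [c[0] for c in cur.description]
    return [dict(zip(columns, row), extracted_at=extracted_at) for row in cur]


# Stand-in for the HR ERP source (assumption: real access would differ).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organization "
    "(org_id INTEGER, org_name TEXT, region_id INTEGER, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO organization VALUES (?, ?, ?, ?)",
    [(10, "Sales", 1, 501), (20, "Support", 2, 502)],
)

# Fixed here for reproducibility; a real run would use datetime.now(timezone.utc).
snapshot_time = datetime(2019, 8, 9, tzinfo=timezone.utc).isoformat()
working_file = extract_organizations(conn, snapshot_time)
```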
ETL Process Example
Stage 2: Source Alteration
We append data from secondary sources, in this case the HR ERP Region table, to the primary organizational extract file.
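Appending the Region data to the organization extract amounts to a lookup join. The sketch below assumes both inputs are lists of dictionaries keyed by a hypothetical region_id column; the column names are illustrative and not taken from the slides.

```python
def append_region(org_rows: list[dict], region_rows: list[dict]) -> list[dict]:
    """Add region_name to each organization row by looking up region_id."""
    region_by_id = {r["region_id"]: r["region_name"] for r in region_rows}
    # Rows whose region_id has no match get region_name = None (left-join style).
    return [
        {**org, "region_name": region_by_id.get(org["region_id"])}
        for org in org_rows
    ]


# Hypothetical extract rows and Region table rows.
org_extract = [
    {"org_id": 10, "org_name": "Sales", "region_id": 1},
    {"org_id": 20, "org_name": "Support", "region_id": 2},
    {"org_id": 30, "org_name": "R&D", "region_id": 9},
]
region_table = [
    {"region_id": 1, "region_name": "N. America"},
    {"region_id": 2, "region_name": "EMEA"},
]
enriched = append_region(org_extract, region_table)
```

Keeping unmatched rows instead of dropping them is a design choice in this sketch: it lets a later stage decide how to handle organizations with an unknown region.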
ETL Process Example
Stage 3: Common Interchange
We find that the region name values stored in the HR ERP system do not conform to the established enterprise definitions, so we need to use the merge infrastructure utility to update the organization record region names to reflect the enterprise versions ...
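Conforming region names can be pictured as a mapping from source values to enterprise values. The slides assign this to the merge infrastructure utility; the Python below is only a stand-in for that utility, and the mapping entries and column names are hypothetical.

```python
# Hypothetical mapping from HR ERP region names to enterprise-standard names.
ENTERPRISE_REGION_NAMES = {
    "N. America": "North America",
    "Europe/ME/Africa": "EMEA",
}


def conform_region_names(rows: list[dict], mapping: dict) -> tuple[list[dict], set]:
    """Return (conformed_rows, unmapped_names) without mutating the input."""
    standard = set(mapping.values())
    conformed, unmapped = [], set()
    for row in rows:
        name = row["region_name"]
        if name in mapping:
            row = {**row, "region_name": mapping[name]}
        elif name not in standard:
            unmapped.add(name)  # left unchanged and reported for review
        conformed.append(row)
    return conformed, unmapped


records = [
    {"org_id": 10, "region_name": "N. America"},
    {"org_id": 20, "region_name": "EMEA"},
    {"org_id": 30, "region_name": "Asia-Pac"},
]
conformed, unmapped = conform_region_names(records, ENTERPRISE_REGION_NAMES)
```

Because the same mapping can be applied to any extract that carries a region name, the rule is shared across target tables, which is what places it in the common interchange stage.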
ETL Process Example
Stage 4: Target Load Determination
We compare the current load of organization records against those previously loaded in earlier batch cycles.
ETL Process Example
Stage 5: Aggregation
We flag new rows for insertion: current load-cycle records whose relevant columns do not match their corresponding organization dimension table rows, such as new region names or manager IDs.
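The comparison of the current load against previously loaded rows can be sketched as follows. The business key (org_id) and the tracked columns (region_name, manager_id) are assumptions made for illustration. Records with an unseen key are flagged as inserts, and records whose tracked columns differ from the dimension row are flagged as changes. In a type 2 slowly changing dimension, a changed record would typically be loaded as a new version of the row rather than overwriting the old one.

```python
TRACKED_COLUMNS = ("region_name", "manager_id")  # hypothetical tracked columns


def classify_rows(current_load, dimension_rows, key="org_id", tracked=TRACKED_COLUMNS):
    """Split the current load into brand-new keys and changed existing keys."""
    existing = {row[key]: row for row in dimension_rows}
    inserts, changes = [], []
    for row in current_load:
        previous = existing.get(row[key])
        if previous is None:
            inserts.append(row)  # key never loaded before
        elif any(row[col] != previous[col] for col in tracked):
            changes.append(row)  # a tracked column differs from the dimension
    return inserts, changes


# Hypothetical rows already in the dimension and rows in the current load.
dimension = [
    {"org_id": 10, "region_name": "North America", "manager_id": 501},
    {"org_id": 20, "region_name": "EMEA", "manager_id": 502},
]
current = [
    {"org_id": 10, "region_name": "North America", "manager_id": 501},
    {"org_id": 20, "region_name": "EMEA", "manager_id": 777},
    {"org_id": 30, "region_name": "APAC", "manager_id": 503},
]
inserts, changes = classify_rows(current, dimension)
```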
Required Features in ETL Tool
Architecture: like hub-and-spoke or client-server; scalable and extensible technology that scales up as data grows
Client platform support: Windows 2k/95/98 etc.
Server platform support: Sun Solaris, HP-UX, AIX etc.
Support for ERP sources
Support for parallelism
Code generator
Data transformation method
Support for managing and building aggregates
Support for various industry-standard data types
Data quality functionality
Exception handling capability
ETL process management
Backup and recovery feature
Metadata capture support
Viewing metadata
Security of metadata
Web integration support
Support for versioning
Installation procedure
Support for sharable repository
Support for designing data marts
Support for importing data models from modeling tools
Support for different kinds of transformations
Adaptability
Support for growth
Ability to handle various source types
Support for external loader