London - New York - Dubai - Mumbai - Hong Kong 2012
The Data Migration Challenge: Elements including MDM
by Wael Elrifai
Confidential - not for redistribution
Understanding Migration
“Migration is not just about moving the data…
It’s about making the data work.”
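Assumptions vs. Truth
• Assumption: Few source systems. Truth: Many more source systems.
• Assumption: Specific data formats. Truth: Data in unknown formats.
• Assumption: All data available. Truth: Needed data is missing.
• Assumption: Documented system interfaces. Truth: Unknown system interfaces.
• Assumption: Valid data. Truth: Poor data quality.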
Outsourcing
Legacy Retirement
M&A Integration
Application Instance Consolidation
Application Upgrade
Application Implementation
These Application Projects have a Common Critical Requirement: Migrating Data
From legacy into new application
From previous to new version
From multiple instances to fewer
From acquired systems
From legacy into new systems
From company to outsourcer
Project Overview: Data Migration to ERP
• 200+ source systems
• Operating in 14 languages
• Different sets of users working in different regions with different applications and languages
• Highly fragmented lines of business and regions
• No concept of Data Governance or Master Data Management
• No concept of Data Quality Analysis
Methodology: Practical Data Migration
• Migration Strategy & Governance (MSG)
• Migration Design & Execution (MDE)
• Technical
• Business Engagement
• System Retirement Plan (SRP)
• Key Data Stakeholder Management (KDSM)
• Legacy Decommissioning (LD)
• Gap Analysis & Mapping (GAM)
• Landscape Analysis (LA)
• Data Quality Rules (DQR)
• Profiling Tool
• Data Quality Tool
• Migration Controller
• DMZ
Team Structure & Communications
• Primary Business Team located in Hong Kong
  • 6 Business Analysts
  • 2 Technical Coordinators
• Primary Development Team in Hong Kong
  • 8 Developers
• Offshore Development Team in Mumbai, India
  • 4 Developers
• Unique Aspects
  • Agile/Scrum meetings conducted via Video Conference
  • Email usage limited
  • Assigned secretary with output immediately posted on Wiki for comments
  • Team Lead makes final “closing comments” on each issue
Application Migration: The Anatomy of Failure
Long development times
• Often many months or even years without any ‘visible’ signs of progress
• CAUSE: failure to properly decompose development into practical, achievable and meaningful ‘phases’ and ‘sprints’
Long development times – for individual ETL flows
• Due to extensive and repeated re-working of ETL code
• Resulting from failures in unit testing and user acceptance testing
• CAUSE: poor and inadequate design
Considerable variations in quality & efficiency of code
• Increasing time for new/other developers to modify code
• CAUSE: failure to define and firmly enforce standards
Application Migration: The Anatomy of Failure
Minimal attention to data cleansing or standardisation
• Leading to longer report development times
• And greater inconsistencies in reporting
• Effectively pushing data quality management to report developers
• AND information consumers
• CAUSE: failure to recognise importance and impact of employing a systematic approach to managing data quality
Poor reliability
• Arising from ‘unexpected’ variations in structure or content of incoming source files
• CAUSE: failure to cater for Murphy’s Law – i.e. the most frequent and most obvious causes of
Application Migration: The Anatomy of Failure
Poor performance
• CAUSE: failure to give due consideration to scale and complexity of ETL processes – during the design stage
• CAUSE: failure to fully understand the underlying causes – when performance problems become evident
• CAUSE: failure to routinely monitor performance or undertake adequate capacity planning – to cater for gradual or step-change increases in data volumes
Application Migration: The Anatomy of Success
• Forensic Data Analysis
• Detailed Functional Design
• Detailed Technical Design
• Peer Review / Technical Authority
• Build / Unit Test
• Peer Review / Technical Authority
• UAT
• Including Master Schedule
• System Test
• Entity Level ‘MAPPING’
• Hosted
• Enforce Standards & Reusable Components
• Data Model Design & ETL Phasing
• Code Translations & Master Schedule
• Sprint Go Live
• Soft Go Live
• REUSABLE COMPONENTS
• TEMPLATES
Abstraction of Rules & Reusability
• Automated ETL mapping development based on source system metadata
• Automated data type verification for flat file data based on header information
• Consistent use of a single value mapping table abstracted to accommodate data migration rules
• Single generic “run script” which operates based on a simple dependency matrix
  • This is more important in operational than in data migration situations, but becomes important when dependencies are complex
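As a rough illustration of a generic "run script" driven by a dependency matrix, the sketch below uses Python's standard `graphlib` module. The job names and the shape of the matrix are hypothetical and not taken from the project; the point is only that one script can order any set of jobs once their dependencies are declared as data.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency matrix: each job maps to the set of jobs
# that must finish before it may start. Job names are illustrative only.
DEPENDENCIES = {
    "load_customers": set(),
    "load_products": set(),
    "load_orders": {"load_customers", "load_products"},
    "load_invoices": {"load_orders"},
}


def run_all(dependencies, run_job):
    """Run every job exactly once, in an order that respects the matrix.

    `run_job` is any callable taking a job name. TopologicalSorter raises
    graphlib.CycleError if the matrix contains a circular dependency.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for job in order:
        run_job(job)
    return order


if __name__ == "__main__":
    run_all(DEPENDENCIES, lambda job: print(f"running {job}"))
```

Adding a new ETL flow then means adding one row to the matrix rather than editing the script, which is where the approach pays off once dependencies become complex.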
Data Migration Guiding Principles: Creating Data Standards to Reduce Complexity
Current State Environments
• Source Tables
• Source Attributes
• Upstream Sources
• Downstream Targets
• Create as-is Domain Model
• Create as-is Entity Model
Future State Environments
• Enterprise Apps Data Models
• ODS Data Models
Initial Common Data Standards and creation of:
• Initial DQ Program
• Initial Data Ownership Model
• Initial Data Management Governance Processes
Rationalize Domains and Entities across Current State and Future State Environments
Map in all Application Environments to the Enterprise Standard
Rationalize Attributes across Current State and Future State Environments
Common Data Standards Enterprise Representation
• Create Domain Model
• Create Entity Model
• Create Entity Relationship Model
• Create Entity Attribute Model
(Diagram labels: ETC, ODS, DW, Customer)
Sample Architecture Diagram – Subset of Project
Data Governance - 14-step (sounds like a lot!) program
1. Review available documentation on process flow
2. Agree scope of work
3. Plan and schedule meetings
4. Produce initial definitions of DG framework
5. Assemble DG working group
6. Engage with Data Stewards
7. AS-IS business process analysis
8. AS-IS data analysis
9. Define TO-BE processes
10. Define TO-BE system requirements
11. Assemble business glossary
12. Introduce standardization of business-critical data items
13. Implement DG KPI tracking and DQ exception reporting
14. Conduct periodic audit of business processes
Master Data Management - Highlights
• DON’T FORGET! Your data migration tools may end up being the real-time MDM Hub communication logic/tools as well, so design them accordingly
• Simplified load tools that can be used by analysts
• Custom match/merge algorithms
  • Gray’s coding
  • 14 languages including European, Middle Eastern (right-to-left), East Asian
  • Some transliteration rules built using statistical regression on 30m customer records
• Match/merge algorithms with discrete variables and user interface
  • Ability to allow users to target hotspots
  • Variable “sliders” - Meshed variables for hotspot analysis allow for more flexibility in merge sensitivity
• Data analysis for predicting why false positives and false negatives occur
  • Role of each source
  • Types of data that most often “fail”
• Google Maps/Address integration for matching (cloud), data enhancement, and more
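The idea of match/merge "sliders" can be sketched as a weighted similarity score with adjustable thresholds. This is not the project's algorithm: the field names, weights, thresholds and the use of `difflib.SequenceMatcher` as the string comparator are all assumptions chosen to keep the example self-contained.

```python
from difflib import SequenceMatcher

# Hypothetical "slider" settings: per-field weights and two thresholds.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}
MERGE_THRESHOLD = 0.90   # at or above: merge automatically
REVIEW_THRESHOLD = 0.70  # at or above (but below merge): manual review


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-field similarities."""
    total = sum(weights.values())
    weighted = sum(
        w * similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, w in weights.items()
    )
    return weighted / total


def classify(rec_a, rec_b, weights=WEIGHTS,
             merge_at=MERGE_THRESHOLD, review_at=REVIEW_THRESHOLD) -> str:
    """Return 'merge', 'review' or 'distinct' for a pair of records."""
    score = match_score(rec_a, rec_b, weights)
    if score >= merge_at:
        return "merge"
    if score >= review_at:
        return "review"
    return "distinct"
```

Moving a slider corresponds to changing a weight or a threshold: lowering `merge_at` merges more aggressively (more false positives), while raising it pushes more pairs to review.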
Testing
• Custom “Black Box” testing tool designed
  • Specialized for database tests
  • Requires addition of some metadata columns to data model
    • S_ID
    • Batch_ID
    • LOAD_TIME
  • Automatic storage of test cases
    • Test data
    • Documentation on test being run
    • User metadata
    • Test metadata
  • Sets database into a known state
  • Can generate test data
  • Single unified interface
  • Fault-Fix workflow management
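One way metadata columns such as S_ID, Batch_ID and LOAD_TIME can support database tests is by making each load identifiable, so that a test can roll the target back to a known state. The sketch below uses Python's built-in `sqlite3`; the table, the meaning assigned to each column, and the helper names are assumptions for illustration, not a description of the custom tool.

```python
import sqlite3
from datetime import datetime, timezone


def create_target(conn: sqlite3.Connection) -> None:
    """Create a hypothetical target table carrying the three metadata columns."""
    conn.execute(
        """CREATE TABLE customer (
               customer_name TEXT,
               S_ID TEXT,         -- assumed meaning: source system identifier
               Batch_ID INTEGER,  -- assumed meaning: load batch number
               LOAD_TIME TEXT     -- assumed meaning: UTC timestamp of the load
           )"""
    )


def load_batch(conn, source_id: str, batch_id: int, names: list) -> None:
    """Insert rows, stamping each one with its source, batch and load time."""
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        [(name, source_id, batch_id, now) for name in names],
    )


def reset_to_known_state(conn, baseline_batch_id: int) -> None:
    """Delete every row loaded after the baseline batch."""
    conn.execute("DELETE FROM customer WHERE Batch_ID > ?", (baseline_batch_id,))


def row_count(conn) -> int:
    """Return the number of rows currently in the target table."""
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

A test run would load a baseline batch, run the flow under test as a later batch, check the result, and then call `reset_to_known_state` so the next test starts from the same data.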
Documentation
• Automated
• Driven by
  • Business requirements documented in
    • Custom testing tool
    • Wiki documentation
  • ETL tool metadata
  • Custom testing tool metadata
This is highly contingent on being able to enforce developer rules about documentation within tools.
Risk Mitigation
Extract data early
• Data should be seen immediately. We’ve seen problems come up because data didn’t conform to expectations.
Convert data early
• Our existing build will allow for the first conversion to take place within weeks for all objects.
Convert data often
• An iterative approach to both data quality and conversion allows for repeated analysis. This should be driven by development schedules rather than by validation schedules that aren’t related to development time.
Use real data from the start
• Conversion team should have direct access to source systems, without a dependency on another team to create extracts.
Seek to incorporate external and up-to-date information about your Master Data
• Tools like Google’s business services, D&B, Bloomberg and others can help
Data Migration through Information Development
Lessons Learned
Prioritise Planning
• Define business priorities and start with quick wins
• Don't do everything at once – Deliver complex projects through an incremental programme
• “Chunks” need to be appropriate, based on elements like homogeneity of front-end, single sets of business users across geographies, language usage, etc.
Focus on the Areas of High Complexity
• Don't wait until the 11th hour to deal with Data Quality issues – Fix them early
• Follow the 80/20 rule for fixing data – Do this iteratively through multiple cycles
• Understand the sophistication required for Application Co-Existence and that in the short term your systems will get more complex
Keep the Business Engaged
• Communicate continuously on the planned approach defined in the strategy. The overall Blueprint is the communications document for the life of the programme
• Try not to be completely infrastructure-focused for long-running releases – Always deliver some form of new business functionality
• Align the migration programme with analytical initiatives to give business users more access to data
• Ensure that the Data Governance program has “teeth”
Peak Consulting UK Headquarters
90 Long Acre, Covent Garden
London WC2E 9RZ
T: +44 (0)20 7849 3422 F: +44 (0)20 7990 9478
www.peakconsulting.eu
Questions?