
Data Quality Essentials
A Project Manager’s Guide to Data Quality

Summary: Any data intensive project offers an opportunity to improve data quality.

Initiatives ranging from CRM, MDM, ERP, Business Intelligence, data warehouse, data

governance, and any data migration, consolidation, or harmonization endeavor warrant

a closer look at the quality of the data that will populate the target application or system.

To provide a roadmap for optimal effectiveness and coordination, we offer a detailed

guide that identifies, for business team members and IT resources, the data quality-

related tasks that should be incorporated into a project plan. These data quality

essentials emanate from best practices gathered from field experience resulting from

thousands of data management projects and successes.

Harte-Hanks Trillium Software Corporate Headquarters

25 Linnell Circle Billerica, MA 01821

+ 1 (978) 436-8900 www.trilliumsoftware.com

[email protected]


About Trillium Software

Trillium Software®, the global leader in Enterprise Data Quality, provides integrated technologies and services that deliver global data profiling, cleansing, enhancement, linking, and governance for customer relationship management, master data management, enterprise resource planning, supply chain management, data warehouse, e-business, and other enterprise applications. Peak-condition data drives peak performance. Offering global capabilities and global technology, the Trillium Software System® is comprised of:

TS Discovery: For automated data discovery and profiling. TS Discovery provides a complete and accurate view of enterprise data assets, uncovering the true content, structure, rules, relationships, and quality of your data and revealing issues that might otherwise remain hidden. Because data governance initiatives demand greater collaboration between data management and business users in the understanding and implementation of data quality, TS Discovery empowers users to design, validate, and deploy custom business and data quality content rules quickly and efficiently.

TS Quality: For parsing, standardizing, and cleansing global data. TS Quality cleanses, matches, and unifies data across multiple data sources and data domains, including customer, product, sales, and financial. TS Quality delivers data parsing, standardization, cleansing, and enrichment capabilities for multi-national, enterprise-wide data, and the ability to implement data quality processes in high-performance, real-time environments.

Director: For deploying and managing real-time data quality processes. A complete data quality application server, the Director delivers high-performance cleansing and matching services across multiple platforms, servers, and applications. Using the Director, organizations can integrate and deploy batch and real-time data quality projects in multiple environments. Unique ActiveEnterprise™ Resources functionality enables users to define, deploy, and manage data quality processes in real-time environments centrally, through a single user interface.

TS Insight: For scoring and tracking enterprise data quality. A data quality dashboard, TS Insight provides visibility into the status of data quality, enabling business and IT professionals, data stewards, and analysts to monitor, manage, and view trends of data quality metrics through intuitive scorecards, charts, and graphs. Invaluable for data governance and business-led data quality implementations, TS Insight helps you measure the success of data quality initiatives.

Usage Notice

Permission to use this document is granted, provided that: (1) The copyright notice “©2008 by Harte-Hanks Trillium Software” appears in all copies, along with this permission notice. (2) Use of this document is only for informational and noncommercial or personal use and does not include copying or posting the document on any network computer or broadcasting the document through any medium. (3) The document is not modified from the original version. It is illegal to reproduce, distribute, or broadcast this document in any context without express written permission from Trillium Software®. Use for any other purpose is expressly prohibited by law and may result in severe civil and criminal penalties. Violators will be prosecuted to the maximum extent possible.


Introduction

What do large-scale data integration and migration projects have in common?

A substantial investment of money, time, and resources

Data-intensive projects often involve the migration of data from legacy systems to new applications and platforms that promise to deliver a carefully calculated ROI. To meet expectations, such applications must be populated by high-quality data: data that is relevant, current, consistent, and accurate. If the data doesn’t meet these criteria, the technology will be undermined and initiatives will fail.

A high risk of failure

Careful planning mitigates the risk of failure. Adopting processes and technology that improve collaboration and communication among cross-functional business, subject matter expert, and IT teams helps ensure that only high-quality data is migrated into new technology platforms.

A need to manage, maintain and monitor data quality

To thwart poor data quality and ensure the long-term value of these solutions, the quality of the data that populates these systems must be managed, maintained, and monitored regularly. To meet these challenges, companies need a data quality solution that interoperates with all types of data integration technologies, business processes, and operational systems: one that profiles, discovers, improves, monitors, and governs data, and that scales seamlessly across applications, systems, and platforms to meet the intense demands of the global enterprise.

The need for a data quality solution

Incorporating an effective data quality solution into data integration or migration projects is essential to ensuring that correct, current, and consistent data is accessible to every user. This paper describes step-by-step processes to implement data quality as part of any project, including strategies to:

• Involve business users in the project to ensure their needs are met

• Attack a specific, limited-scope project while considering both the big picture and the ongoing concern of data quality within your organization

• Incorporate technology into a project to expedite data quality initiatives

Gartner Research: “A strong focus on data quality will significantly increase the value of business intelligence (BI), master data management and other critical business initiatives.”

Here we share techniques used by successful companies to plan and implement data quality processes as part of a data-intensive initiative. While technology greatly facilitates and automates data quality management, it should be applied in accordance with a measurable, objective methodology to ensure success and high project ROI. To learn more about the Trillium Software methodology, please inquire about our whitepaper: Methodology for Enterprise Data Quality and Data Governance. As you’ll see in the pages that follow, process, people, and business expertise are the key components to achieving data quality, leaving technology as a way to automate and improve processes.

The Six Phases of a Typical Project and Related Processes

Phase 1: Project Preparation
Business processes: Build a team, define business objectives, and assess project risks that impact budget, timelines, and milestones.
Technical processes: Analyze current technology; assess data risks.

Phase 2: Blueprint Creation
Business processes: Craft the project plan and detailed designs to meet project requirements, while mitigating the risks discovered in the Project Preparation phase.
Technical processes: Analyze source data; capture a baseline; design the data architecture (schema, data model, platforms); develop test case scenarios; define exception processes.

Phase 3: Implementation
Business processes: Execute the action plan for establishing new processes and using new technologies.
Technical processes: Create data quality processes (investigate; cleanse and standardize; match and link; enrich); integrate data quality processes with applications and services.

Phase 4: Rollout Preparation
Business processes: Get the organization ready for the new improvements.
Technical processes: Define the production system cutover plan; perform the initial cleanse/load.

Phase 5: Go Live/Transition
Business processes: New processes and technologies go into production; teams are trained to resolve any issues.
Technical processes: Perform ongoing data quality processing; define monitoring processes.

Phase 6: Maintenance
Business processes: Tune processes; monitor data quality over time; gather new requirements and business rule changes for the next phase.
Technical processes: Tune technology; manage change requests and exceptions.

Note that each project phase requires both business and technical resources to work together to complete the project effectively and efficiently.


Phase 1: Project Preparation

Goals: Evaluate resource and time allocation requirements to execute the project. Identify issues, roadblocks and risks. Designate your team; define project scope, expectations, and deliverables; and analyze the current state of your data.

Define project team and roles

Ultimately, data and its level of quality must be supported by many people in the company, not just IT. Involving subject matter experts from affected business areas is essential for success, because correct interpretation and treatment of data depends on both proper ‘Syntax’ and ‘Context’.

Syntax. IT is generally very capable of conforming data to proper syntax with relative ease. Example: all telephone numbers in a database should appear in the same format.
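To make the idea concrete, the minimal Python sketch below shows one way such a syntax rule might be expressed. It is an illustration only (not part of the Trillium Software System), and the chosen target format is an assumption; substitute whatever format your team defines.

```python
import re
from typing import Optional

def standardize_us_phone(raw: str) -> Optional[str]:
    """Illustrative syntax rule: reduce a US phone number to one shape, (NNN) NNN-NNNN."""
    digits = re.sub(r"\D", "", raw or "")            # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                          # drop a leading country code
    if len(digits) != 10:
        return None                                  # hand off to the exception process
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

# All of these standardize to "(978) 436-8900":
for value in ["978.436.8900", "(978) 436-8900", "1 978 436 8900"]:
    print(standardize_us_phone(value))
```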

Context. Business users are generally the best source of information regarding context and meaning behind the data. Example: the significance of repeated and undocumented comments or codes embedded in a name and address field.

Each team member has a clearly defined role for making the initiative a success and must be accountable for his or her part. The roles and responsibilities related to the data quality aspects of a project are typically defined as follows.

Role: Executive Leadership (CIO, CFO, VP)
Project responsibility: Endorse the data quality initiative and foster support. Secure funding. Resolve issues and remove roadblocks.

Role: Line-of-Business Manager
Project responsibility: Champion the cause. Interface between IT and business. Partner with executive leadership to understand business objectives, remove barriers, decrease political opposition, and influence political change and cooperation between lines of business. Articulate the data quality problem in terms of business value and effect change among business users.

Role: Data Steward
Project responsibility: Understand the technology available to meet business objectives. Define what is possible and what is not. Develop a deep understanding of available data assets, usage, and issues. Drive specific requirements, provide feedback, and participate in UAT activities.

Role: Information Professional
Project responsibility: Implement business rules for cleansing, standardizing, and de-duplicating the data; support data stewards; run day-to-day operations.

Identify the right technology

The right technology will help team members engage and communicate. A data analysis environment with a central library and reusable business rules provides a flexible architecture for multi-role, multi-member projects, where resources need to collaborate and communicate about source data and target environment designs.


Further, if the technology is built around a solid data quality methodology, its architecture will expedite the discovery, design, development, and deployment of the data quality solution across projects, departments, and the enterprise. A sound technology architecture provides an infrastructure that supports a common understanding of data quality issues, recommendations for the use of data, and the transformations that may need to take place as data is migrated.

Identify short and long-term business objectives

The Trillium Software System® enables teams to discover, design, develop and manage data quality from discovery through implementation.

During project planning, business objectives are defined. As part of this task, both short- and long-term data quality goals should be identified. Short term objectives usually relate directly to the project and activities regarding data movement and manipulation. Longer term goals consider how the immediate project work can be leveraged by the organization and extended for further value.

To secure long-term value and ensure that the data quality knowledge capital that accrues over time can be repeatedly applied to new projects, many companies implement operational data quality. Selecting a data quality solution that enables operational data quality (the reuse of expertise, business logic, and business rules from one project to another) allows savvy organizations to cultivate their data quality initiatives from single, siloed projects into true enterprise solutions.

Trillium Software delivers operational data quality that leverages data quality knowledge capital and business rules across multiple implementations.

In the short term, begin improving data by starting small and keeping the scope well-defined. In the long term, keep in mind that if all goes well, you will have success, and you will be asked to replicate this success across the company.

Scope

Scoping draws clear parameters around the use of data and the process of capturing, moving, cleansing, standardizing, linking, and enriching it. Assess each requirement carefully to determine whether or not data targeted for use in this project can or will meet business requirements. Basic questions to answer include:

1. Does the required data exist within the organization?

2. What source or sources contain this data?

3. What is the level of data quality within each source?

4. What cleansing, standardization, or de-duplication is necessary to meet requirements?

5. What problems or anomalies must be addressed as part of this project?

In a data migration, for example, you might be looking for certain key elements to appear in the target data model. You may first need to confirm that the anticipated target data physically exists within source systems and may then need to determine the best or most trusted data source. If taking data from multiple sources, you may have to establish a set of standards to which all source systems conform in order to produce a consistent representation of that data in the new target system.

Profiling all source data at this step uncovers data structures, anomalies, and previously unknown issues, helping you assess the quality of your data before target milestones are confirmed. The process of discovery and profiling will help you limit risk by enabling you to understand the state of the data and whether it will indeed support project requirements.

Understanding the scope of the project early is key to successful and timely delivery. Categorize the “need-to-have data” and the “nice-to-have data,” and be prepared to drop off the nice-to-haves if time becomes short or if the effort of moving, cleansing, standardizing, etc. outweighs the anticipated business benefit.

There are ways to limit scope. For example, if you’re integrating multiple data sources, will it be one large movement of data or several smaller movements? Does the entire database need to move, or will six months of history suffice? Working through these issues with the business team and IT will keep the project on time and on target, and will help manage expectations during the project lifecycle so there are no surprises as the project nears a close.

Analyze current technology

Early in the process, it is a good idea to take inventory of the current data quality technology in place. Interview your key technologists and determine what is and is not meeting user expectations.

If a process or technology exists that meets user expectations, consider whether it can be applied to the new solution. If so, can it also be leveraged across other solutions in accordance with long-term business objectives? If not, is it a good source of standards or logic that can be designed into a new data quality solution offering more options for future growth?

Technologies that resolve data quality problems are often point solutions that cannot scale to meet the needs of the entire enterprise. Success depends on selecting a solution that can meet immediate needs and extend to serve future enterprise needs.

Assess data risk

Focus on the most fundamental question: “Does your source data actually support the business objectives?” During a data risk assessment, it is crucial to ensure that the available data satisfactorily meets business requirements. Much of the legwork for this analysis is performed by IT through mapping-data-to-requirements exercises and performing extensive data investigation of source systems. Should questions arise, key business stakeholders should immediately be involved to ensure the project is ultimately successful in delivering what the business expects.

If data does not meet expectations, what are the root causes of these gaps and how must they be addressed before proceeding with your project? Does project scope need to be revised or do isolated requirements need to be classified as high risk?

Apply a data discovery process on the source data to determine if the data is viable. If the data cannot support key business requirements, the project is at a high risk of failure despite investments of time and money. Thus, before committing to development, first assess data to ensure that the project can ultimately meet user expectations.


Data discovery is the process of uncovering the unknowns about your data: questions you may never have thought to ask using standard profiling technology. Discovery results need to be reviewed by IT and business team members who are familiar with the meaning of the data content and how the data is to be used. This team will address issues that arise early in the project lifecycle, such as data that is absent, corrupted, or misfielded, and create workable solutions. For example, you may not want to incorporate specific data elements into your CRM system if there is a high degree of null values, since the data will not meet your business needs.

Automate data discovery to profile data parameters that you hadn’t even considered. Use TS Discovery to understand:

• Structure
• Patterns
• Data integrity
• Business rules
• Relationships
• Referential integrity

Applying technology that can automate both discovery and profiling before you begin any data integration process from legacy systems can determine success or failure. Upfront profiling mitigates data challenges that increase risk and undermine downstream Key Performance Indicators (KPIs) and decision-making.
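For illustration, the following minimal Python sketch (using pandas) shows the kind of column-level profiling described here: fill rates, cardinality, and character patterns. It is not TS Discovery, and the file name and DataFrame are hypothetical placeholders for your own source extract.

```python
import re
import pandas as pd

def profile_column(series: pd.Series, top: int = 5) -> dict:
    """Summarize one column: fill rate, cardinality, and the most common
    character patterns (letters collapsed to 'A', digits to '9')."""
    def pattern(value) -> str:
        text = "" if pd.isna(value) else str(value)
        return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", text))

    non_null = series.dropna()
    return {
        "rows": int(len(series)),
        "nulls": int(series.isna().sum()),
        "fill_rate": round(float(1 - series.isna().mean()), 3),
        "distinct_values": int(non_null.nunique()),
        "top_patterns": non_null.map(pattern).value_counts().head(top).to_dict(),
    }

# Hypothetical source extract; profile every column before designing the target model.
customers = pd.read_csv("customer_extract.csv", dtype=str)
report = {column: profile_column(customers[column]) for column in customers.columns}
```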


Phase 2: Blueprint Creation

Goals: Assess data quality issues in detail and begin to build a plan for improving data as part of your overall project. Target key IT and business players to define corporate data standards. Document a baseline measurement of the current state of data quality. The baseline serves two purposes: it helps enlist executive support by showing the business impact of poor data quality, and it lets you tangibly show the improvement in the data at named milestones after the new system or solution is in production.

Define success metrics

As most organizations are cost-conscious, they need to produce a business case or cost justification for any new initiative. Even when not required, it is recommended that you quantify the impact of data quality processing as a methodical way to measure the impact of your efforts and the value you are providing to the organization. Though frequently overlooked during project execution, these numbers are necessary to drive future investment and promotion.

Data quality metrics and business impact

Define which data quality metrics you want to track, such as high-level data-centric rules and rules that apply to a particular system or application:

Metric: Number of records with changes to address data fields for compliance with USPS standards
Business impact: Impedes marketing program success

Metric: Number of duplicate records
Business impact: Negatively impacts budget

Metric: Number of processed records with an incomplete mailing address but valid phone numbers or emails
Business impact: Impacts billing effectiveness

Metric: Number of records with duplicate primary keys
Business impact: Unique keys must be generated by IT, causing potential project delays

Metric: Blank values for critical data fields, such as quantity per box or shipping dimensions
Business impact: Does the customer get the right quantity of items ordered? Can the customer logistically handle the package they receive?

Metric: Adherence to standards, such as the metric or English systems of measurement
Business impact: Do similar parts exist in the supply chain under different measurement systems?

Metric: Total value of bills with no invoices; total value of invoices with no bills
Business impact: Does the billing system comply with regulations? Is revenue reported properly? Are orders fulfilled without a purchase order and invoice?
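As a rough illustration of how a few of the metrics above might be captured as a repeatable baseline, the Python sketch below counts several conditions over a single extract. The column names (customer_id, qty_per_box, and so on) are hypothetical; map them to your own sources.

```python
import pandas as pd

def baseline_metrics(df: pd.DataFrame) -> dict:
    """Count a few of the conditions from the metrics list for one source extract."""
    address_cols = ["address_line1", "city", "state", "postal_code"]
    incomplete_address = df[address_cols].isna().any(axis=1)
    has_other_contact = df["phone"].notna() | df["email"].notna()
    return {
        "duplicate_records": int(df.duplicated(subset=["name"] + address_cols).sum()),
        "duplicate_primary_keys": int(df["customer_id"].duplicated().sum()),
        "incomplete_address_with_contact": int((incomplete_address & has_other_contact).sum()),
        "blank_critical_fields": int(df["qty_per_box"].isna().sum()),
    }

# Capture the baseline once, store it with a timestamp, and re-run the same
# function at later milestones to show improvement.
baseline = baseline_metrics(pd.read_csv("crm_extract.csv", dtype=str))
```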


TS Insight provides metrics and KPIs (key performance indicators) for data quality compliance to corporate standards across multiple systems and applications.

Technology can play a significant role in uncovering data conditions, such as those listed above, and establishing a recorded baseline of these conditions. Your technology of choice will help you organize and document results and manage conditions going forward. Automated data profiling analysis and exception reporting, along with drill-down functionality, give you the results and tools to involve non-technical users in further analysis. You can set conditions, such as those listed above, and understand immediately to what degree the metrics are met.

Formulate communication strategy

Key business users have defined metrics and identified how those metrics relate to the business to quantitatively demonstrate value, but how will the organization hear about the upcoming results? A communication strategy should be put in place to gain consensus about whether other members of the business community agree with the relationships that have been identified between data, its quality, and the impact of that data on the business. For an effective and useful ROI, it is essential to establish buy-in as part of an early communication plan task, and to follow up with updated metrics at pre-determined milestones.

Define standards

Project team members representing the business play a key role in standards definition. The team members involved in this step represent the ultimate user audience. For example, if the end user audience will include sales and marketing and potentially shipping, someone from each of these departments should be involved in defining system standards. Also, a representative from each of the company's departments should act as a data steward to make sure data adheres to the defined standards in the new system, if not also in the source systems.

With every business, there are certain standards that can be applied to every piece of data. For example, a name and address should almost always conform to postal standards, and e-mail addresses conform to a certain shape (user name, “@” sign, internet domain). However, there may be data for which your team needs to define a new standard, e.g., a part number, an item description, supply chain data, and other non-address data. For these, you need to set the definition with the business team. As part of the process, explore the current data, decide what special data exists in required fields, and establish system standards that can then be automated and monitored for compliance.
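One way to make such standards automatable and monitorable is to express them as validation rules. The sketch below is illustrative only; the email rule reflects the shape described above, while the part-number format is a made-up example that your business team would replace with its agreed standard.

```python
import re

# Standards agreed with the business team, expressed as automatable rules.
STANDARDS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),   # user name, "@", internet domain
    "part_number": re.compile(r"^[A-Z]{2}-\d{5}$"),            # hypothetical, e.g. "AB-12345"
}

def conforms(field: str, value: str) -> bool:
    """Return True when a value meets the defined standard for its field."""
    return bool(value) and bool(STANDARDS[field].match(value.strip()))

print(conforms("email", "[email protected]"))   # True
print(conforms("part_number", "ab12345"))                       # False: flag for review
```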

Create executive buy-in

While the technical team members focus on technical designs for the new system or solution, business team members ensure that efforts have positive impact on the business. Most project managers find it helpful at this point to secure the endorsement of an executive. Using the data quality metrics and business impact generated in the previous step, you can keep executives in the loop about your initiative and increase visibility of enterprise data quality, garner support, and secure funding for future projects or additional resources. If there are any internal political challenges, executives can help resolve issues and remove roadblocks. If they are already well informed about your efforts, status, and potential positive impact, it will be much easier to invoke their support.


Access data

At this point in design, team members must take a deep dive into data extracts, representative of the actual data that will be used in the production system. The purpose here is to understand what mappings, transformations, processing, cleansing, etc. must be established to create and maintain data that meets the needs and standards of the new system or solution.

IT resources are generally responsible for defining data extracts and gaining appropriate access to source systems. This data can then be shared with other team members to support detailed design tasks.

Analyze source data

The same principles and benefits of a collaborative approach between IT and business team members (described earlier for risk mitigation analysis) apply to the in-depth source system analysis necessary for effective and efficient detailed design. Often, collaboration is hindered because it is time consuming and inefficient to involve business users in clarifying data questions: they lack the technical skills to access data and investigate anomalies on their own, and thus cannot offer the insights that should drive design. Advanced data discovery tools eliminate this challenge by offering an intuitive interface through which business users can perform all of these tasks independent of IT, and then collaborate with IT on the findings.

The same technology used earlier for risk mitigation and data metric definition can be leveraged for this up-front data analysis. Additional functionality available within a data discovery tool presents users with statistics, results, and a window into the data, so that information can be easily digested and navigated by IT and business users alike. A special-purpose data browser makes it possible for users to identify and review issues in the data, collaborate, and reach consensus about what should be done.

Discovery vs. profiling: Although both terms can refer to the same thing, consider that discovery reveals aspects of the data that you never thought to ask about, while profiling analyzes the data issues you already know about. Implement a data quality solution that can accomplish both.

Tip: Use technology to facilitate source system analysis. Employ advanced profiling and data discovery functions for comprehensive column and attribute analysis. Identify potential problems within structured data fields such as dates, postal codes, product codes, customer codes, addresses, or any attributes that should conform to a particular format and structure. Configure custom data quality rules, and flag any attributes that do not conform.

When analysis is complete, you’ll have a good idea of the challenges you face in integrating data and the information necessary to develop designs that address the challenges proactively. At this time, you can also revisit the project plan and confirm that appropriate time and resources have been allocated to deal with any data issues that have been uncovered.

Capture a baseline

In a previous step, business team members defined data quality metrics and business impact. Now you can take a baseline measurement. As part of the source system analysis, capture and store a baseline of each source system, with information about how the systems conform to the expected metrics or business rules. It may make sense to look at each source system both in isolation and across systems.

Data architecture & schema/data model

As the data model is being developed, a crucial step that is often overlooked is confirming that the source data supports the anticipated data model design. The best way to gain confidence that this is the case is to reverse engineer the data and understand the relationships that naturally exist within it. This should occur independently of metadata and system documentation and should be a complete reflection of the data itself. Here again is an opportunity to leverage the technology that the team has been using and is comfortable with: a data discovery tool will already contain all the source information required for this analysis and should contain the functionality to display a data model or schema that represents the native state of the data itself. These schematics can then easily be compared and cross-referenced against the intended data model by the data modeling team, saving potentially weeks of manual effort and preventing missed exceptions.

Data architecture & platforms

As you investigate the many data quality solutions on the market, there are several key considerations to keep in mind. Does the technology:

• support process execution on all platforms of the source and target systems?

• process multi-domain data?

• understand data context without manual intervention?

Tip: Look for solutions able to investigate, improve, and govern customer, product, financial, and supplier data. For global implementations, check for the capability to support double-byte data.

To take things a step further and offer more long-term value, as the designs are being set and technology investments made, revisit the long-term business objectives outlined during the Project Preparation phase. Evaluate vendor tools against your longer-term vision to ensure that you have options for future connectivity requirements. Provide your organization with the flexibility to extend the data quality processes you design for your immediate project to other systems, in different environments, and on other platforms that exist within your technical enterprise infrastructure.

Develop test case scenarios

As you examine your data, you will uncover patterns and common occurrences in the data that require resolution. For example, names may appear in your CRM sources in any one of the following formats:

• Smith, John
• John Smith
• Smith/John
• John and Jan Smith

It’s up to team members to decide how to standardize each of these name formats for optimal efficiency in the target systems. Should ‘John and Jan Smith’ be linked yet remain separate records in your master file or remain as a single entry?

Set up a test file or database of records that present these common data situations for QA purposes during this stage of the project. Quality assurance (QA) tasks will be completed prior to going live with new data. Test case scenario definitions effectively begin to build a list of data quality anomalies that you can leverage to build and test business rules and quality processes. Many business rules and test cases come standard with the cleansing process of packaged data quality solutions; these rules are highly tunable to meet your organization’s specific needs. You can build others based on your needs.
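A minimal, pytest-style sketch of such test case scenarios appears below. The standardize_name function and its module are hypothetical stand-ins for whatever cleansing process your team builds, and the expected outputs simply reflect one possible standard the team might agree on.

```python
import pytest

from cleansing import standardize_name   # hypothetical module under test

@pytest.mark.parametrize(
    "raw, expected",
    [
        ("Smith, John", "John Smith"),
        ("John Smith", "John Smith"),
        ("Smith/John", "John Smith"),
        # Assumed decision: multi-party names stay linked but separate.
        ("John and Jan Smith", ["John Smith", "Jan Smith"]),
    ],
)
def test_name_standardization(raw, expected):
    assert standardize_name(raw) == expected
```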

Define exceptions process

In a data quality process, an exception occurs when a data element cannot be interpreted by the business rules and process engine your team has defined, e.g., an address does not contain enough information to be verified against the USPS standardization business rules.

When a data quality exception occurs, the data steward must resolve the exception and decide whether the anomaly is an unusual occurrence or whether new rules should become part of the data quality process. Your project should define a clear way to handle exceptions including automated distribution (of error records) where possible, areas of responsibility for correcting, and a method to report anomalies back to the source.
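The following minimal Python sketch illustrates one way to structure such an exception process: each exception is routed to a responsible data steward and counted by source system so anomalies can be reported back. The queue layout and field names are assumptions, not a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ExceptionRecord:
    record_id: str
    source_system: str
    rule: str          # e.g. "USPS address verification"
    detail: str

@dataclass
class ExceptionRouter:
    owners: Dict[str, str]                                     # rule -> responsible data steward
    queues: Dict[str, List[ExceptionRecord]] = field(default_factory=dict)

    def route(self, exc: ExceptionRecord) -> None:
        """Assign every exception to an owner's work queue."""
        owner = self.owners.get(exc.rule, "default_steward")
        self.queues.setdefault(owner, []).append(exc)

    def report_by_source(self) -> Dict[str, int]:
        """Counts of exceptions to feed back to each source system."""
        counts: Dict[str, int] = {}
        for queue in self.queues.values():
            for exc in queue:
                counts[exc.source_system] = counts.get(exc.source_system, 0) + 1
        return counts
```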


Phase 3: Implement

Goals: Put the technology in place to manage the data quality lifecycle. Embrace automation whenever possible. See Appendix A for a checklist of data quality best practice requirements. Although this is the most technical phase of your project plan, business users play an important role.

Create User Acceptance Test Plan

As the team creates User Acceptance Test (UAT) plans, incorporate scenarios that exercise and display the data quality process results built into the new system or solution. The resulting UAT should include not only testing of new functionality and/or reports, but also data quality test case scenarios. By testing both good and problematic input data, you encourage a wider business audience (the UAT resources) to confirm that the data quality processes are producing desirable results.

Create data quality processes

During the implementation phase, the technical team develops the data quality processes defined and designed during the Blueprint phase. Typically, they include cleansing, standardization, enrichment, and matching/linking processes. For more details and best practices on data quality process creation, please refer to our white paper “Trillium Software Data Quality Methodology.”
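As a simplified illustration of the matching and linking step (commercial matching engines such as TS Quality use far richer logic), the Python sketch below links records that share a normalized name-and-postal-code key.

```python
import re
from collections import defaultdict
from typing import Dict, List

def match_key(record: Dict[str, str]) -> str:
    """Build a simple match key: name tokens sorted so word order doesn't matter,
    plus a normalized postal code."""
    tokens = sorted(re.findall(r"[a-z]+", record.get("name", "").lower()))
    postal = re.sub(r"\s", "", record.get("postal_code", ""))
    return " ".join(tokens) + "|" + postal

def link_records(records: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group records that share a match key under one cluster."""
    clusters: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        clusters[match_key(record)].append(record)
    return dict(clusters)

# "Smith, John" and "John Smith" at the same postal code land in the same cluster.
clusters = link_records([
    {"name": "Smith, John", "postal_code": "01821"},
    {"name": "John Smith", "postal_code": "01821"},
])
```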

QA initial results

The most important outcome of the data quality process is that business users are happy with the results. As you begin to implement new data quality process designs, have business users run sample data through the data quality processes to ensure the results meet their expectations; they can compare results before and after processing with the same data discovery tool they have used all along. Coarse-tune processes using sample data, then switch over to a complete data set for formal QA.

Once results have been verified, it’s time to load sample data into the target applications and begin testing it more thoroughly. By taking the extra step with the business during the QA cycle, you’re much more likely to be successful the first time you load data and will avoid loading and reloading data repeatedly.

Validate rules

In phases one and two, you determined what you have and what you need. Rules are developed in an iterative, analytic process requiring subject matter expertise about the intended meaning of the data. Business users and data analysts should work together on this process, applying the same technology and process described for analyzing source data if additional questions come up. Give business users an opportunity to set up test data scenarios, and allow them to review the results after the cleansing process.

This step presents an opportunity to review and incorporate your company-specific terminology, e.g., industry-specific terms, company-specific definitions, and regional colloquialisms that were not initially part of the standardization terminology. You can also determine any geography-specific standardization.

Tune business rules and standards

You may find that the initial data quality process design does not meet expectations or behave as intended. Business team members should be able to interact directly with a rules-based engine and tune the rules to produce results more in line with expectations. This requires an intuitive interface and tuning tools to be built into any products purchased to facilitate data quality processing. Involving business team members directly in the tuning process ensures that the rules exactly meet their needs and removes the risk of failed expectations late in the game.

Integrate data quality processes with applications and services

Once data quality processes have been designed and tested, and business team members have validated and tuned the rules, the processes are ready for integration into one or more applications and/or services. If architected with an eye to the future, data quality processes are reusable across multiple systems, platforms, and applications. Vendor products should likewise support reuse across the enterprise, to sustain growth over time and to provide flexibility in project deployment options.

Although your data quality solution may not need to grow over time, having the option of extending to other applications including those that result from mergers and acquisitions, and reusing business rules easily from one application to the next, is considered best practice.


Phase 4: Rollout Preparation

Goals: Determine how and when the development environment is migrated to production. Before this begins, UAT must be complete and users must be trained on the changes they will encounter when using the new system or solution. Ensure that the Help Desk is prepared and able to answer technical questions.

Execute User Acceptance Testing Plan (UAT)

The User Acceptance Test plan should include a record of the business users’ sign-off of the documented scenarios and the data quality processes that influence automated changes. Different types of UAT strategies include:

• New System Test. The application is entirely new (not an enhancement or system upgrade) and everything needs to be tested.

• Regression Test. The amount of change in an existing application requires a full system retest.

• Limited Test. The amount of change in an existing application requires only change-specific testing.

Data discovery tools can streamline the UAT process by giving both business users and technical users a view into the data. Teams can collaborate and view the results of any data quality process, before and after the process is run.

Testing is invaluable to success. Make sure you have QA’d:

• All forms, particularly when using a real-time interface into the data quality tool

• All reports, to ensure the results from the reports are as expected

• Test scenarios, to test the impact of the data quality processes on systems and applications that interface with your project

Throughout the UAT, make sure that business users have easy access to the data through the tools and technologies used throughout the project.

User training/Help Desk training

Users must be made aware of new applications going online and the Help Desk should know who to call to escalate any technical issues. Effective user training is a critical factor for a successful implementation. Here, the goal is simple: give your users the skills and confidence they need to use the new solution, to facilitate end-user adoption. Make them aware of:

• New required fields or formats as they enter data into the system

• New screens or pop-ups requesting validation of automated cleansing and matching of data

• The positive impact and business benefits of new, cleaner data

• The involvement of both business users and IT users in the process of creating high-quality data

Production System Cutover Plan

As the system is rolled out to end users, operations and support teams should have all the tools, processes, and knowledge to support them. A plan for transitioning from the project team to the operations and support team is crucial.

Most project managers will create both a schedule and a plan to load the new system with newly cleansed data. The migration to production generally occurs during an off-peak time. The decisions you need to make include: training; if and how to phase the rollout; the expertise needed when the cutover occurs; whether to run the old and new systems in tandem, and if so, for how long; whether to hire additional resources (e.g., consultants or contractors) to assist; and any additional security considerations.

Successfully complete initial cleanse/load

For many projects, the first step toward going live involves an initial load or cleanse process. Data is rarely migrated without encountering errors during the extraction, transformation, and loading of the data. Errors generally fall into one of these categories:

• Incomplete errors consist of missing records or missing fields. Determine what data is not being loaded and what should happen to those records or fields without data.

• Syntax errors relate to data formatting and how data is represented. Confirm that data is the right shape. Does the data fall within the expected value range?

• Semantic errors concern what the data means. Is there hidden value in unstructured data? Do names appear in address fields, despite compliance with the correct data shape? Do duplicate records vary only slightly?

Tip: With a single click, the Trillium Software System provides a complete profile of your data so you can assess the impact of business rules over time.
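To show how these three categories might be checked automatically during the initial load, here is a minimal Python sketch. The required fields and the specific rules are hypothetical examples; real checks would come from the standards and business rules defined earlier.

```python
from typing import Dict, List

REQUIRED_FIELDS = ["customer_id", "name", "postal_code"]   # hypothetical critical fields

def classify_errors(row: Dict[str, str]) -> List[str]:
    """Bucket a rejected row into the three error categories described above."""
    errors: List[str] = []
    # Incomplete: missing records or missing fields
    for column in REQUIRED_FIELDS:
        if not row.get(column):
            errors.append(f"incomplete: {column} is missing")
    # Syntax: the data is the wrong shape
    postal = row.get("postal_code", "")
    if postal and not postal.replace("-", "").isdigit():
        errors.append("syntax: postal_code is not numeric")
    # Semantics: the data means the wrong thing (e.g. a person's name in an address field)
    if row.get("address_line1", "").lower().startswith(("mr ", "mrs ", "ms ")):
        errors.append("semantics: name text found in address field")
    return errors

print(classify_errors({"customer_id": "42", "name": "John Smith", "postal_code": "O1821"}))
```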

If you have executed the tasks outlined so far, you have significantly reduced the likelihood that any of the above-mentioned issues will occur. By taking the time upfront to investigate source system data thoroughly, incorporate the necessary processing into your designs, and perform UAT that includes anticipated problematic data conditions, you have proactively addressed the issues that cause most project teams severe headaches late in the game.

Should something unexpected occur and require attention, you already have the resources and infrastructure in place to quickly react: your team of both IT and business users is already familiar with the project, the data, and any technology you have been using (i.e., your data discovery tool) and can swiftly look at the data and assess the problem for a quick resolution.


Phase 5: Go Live

Goals: Pat your team on the back! Congratulations! Reap the benefits of new data quality processes that will immediately benefit your organization.

SWOT Team

At this stage, it’s a good idea to have in place a cross-functional SWOT (Strengths, Weaknesses, Opportunities, Threats) team, including business analysts or departmental resources familiar with business processes, performance engineers, data architects, field technicians, and contacts from any vendors, available on an emergency basis to provide rapid problem resolution.

Teams may adopt different processes to help them understand the problem presented and to design a response. Practitioners using problem-solving processes believe that it is important to analyze a problem thoroughly to understand it and design interventions that have a high probability of working. The intent is to intervene early after a problem is identified and to provide ways by which that problem may be alleviated and the corporation can achieve success.

Teams should meet to complete a post mortem, discussing how the project went and how to further improve on data quality during the next round.

Problem resolution – escalation hierarchy example

All support organizations have some form of processes and procedures in place to help resolve user and system-generated queries, issues, and problems in a consistent manner. In some organizations, these processes are very structured, in others, more informal.

To supplement efficient processes, it is important for the support team to have well-defined roles and responsibilities that reduce response time to customer needs. An example of an escalation hierarchy, illustrating roles and responsibilities, follows.

Tier 1: Help Desk - Help Desk technicians provide first-line support to the user community and perform any additional training and remote operations required to resolve issues. If the help desk is unable to resolve an issue, it is escalated to Tier 2.

Tier 2: Information Professionals - Typically more aware of the data aspects of the operation than the Help Desk, Information Professionals, with the aid of a data discovery tool and access to the end-user application, can troubleshoot the issue. Should they be unable to resolve it, they escalate it to Tier 3 Data Stewards.

Tier 3: Data Stewards - Depending on the nature of the problem, Information Professionals contact Data Stewards, who tend to have an enterprise view of a data subject area, as opposed to knowledge of data and processes within a given application. Although most issues are resolved by now, in rare cases, some are escalated to Tier 4.


Tier 4: Project Managers - An issue usually reaches this level if an architectural change is required to resolve it. Project managers must analyze the situation and initiate appropriate action.

The hierarchy just described is merely one example of a support and escalation hierarchy. No matter what type of support hierarchy you have, it is crucial that each group within it understand its role and responsibilities. Moreover, the team must be able to quickly resolve or escalate any issues that arise.

Post Mortem

Re-run baseline processes and collect updated results for a quantified measurement of your impact. Gather up metrics, your support log and exceptions processing log, and other relevant documentation. Call a meeting to:

• Ensure that the project met the business objectives

• Ensure that the project met the outlined success criteria

• List lessons learned and use as input to improve future project delivery

• Conduct performance reviews for team members

Perform ongoing data quality processing

Now that your system is live, you not only have cleansed historical data loaded into the system or solution, but your ongoing data quality processes should be keeping new incoming data free of the problems you identified and prepared for.

Define monitoring processes

Given that all systems and processes are operating well, it is now time to ensure that appropriate monitoring processes are in place. Regularly scheduled data audits are a great way to ensure that data continues to meet expectations and to highlight any areas where quality has slipped or new problem areas have become evident.

To facilitate this process, many organizations leverage the technology already used for risk assessment, baseline measurements, source system analysis and design, and user acceptance testing. For example, a data discovery tool that business users have been trained to use to investigate data, collaborate with IT, and measure data metrics (along with all the knowledge capital built into that environment) is easily adapted to perform scheduled audits and ongoing monitoring. Email workflows can be defined to alert key players when problems arise or when metrics exceed or fall below defined thresholds.

Monitoring ensures that you continue to meet or even exceed user expectations over time so that your data assets become a trusted source, actively used by the business.
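As one possible sketch of such a monitoring process, the Python example below re-checks a metrics dictionary (like the baseline sketch shown earlier) against thresholds and emails data stewards when a threshold is breached. The thresholds, recipients, and SMTP host are placeholders.

```python
import smtplib
from email.message import EmailMessage

THRESHOLDS = {                      # maximum acceptable counts per metric (assumed values)
    "duplicate_records": 100,
    "duplicate_primary_keys": 0,
    "blank_critical_fields": 50,
}

def check_and_alert(metrics: dict, recipients=("[email protected]",)) -> list:
    """Compare current metrics to thresholds and notify stewards of any breaches."""
    breaches = [f"{name}: {metrics[name]} (threshold {limit})"
                for name, limit in THRESHOLDS.items()
                if metrics.get(name, 0) > limit]
    if breaches:
        msg = EmailMessage()
        msg["Subject"] = "Data quality threshold breached"
        msg["From"] = "[email protected]"
        msg["To"] = ", ".join(recipients)
        msg.set_content("\n".join(breaches))
        with smtplib.SMTP("mail.example.com") as smtp:   # placeholder SMTP host
            smtp.send_message(msg)
    return breaches
```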


Phase 6: Maintain

Goals: Take a day to reflect on the good work you’ve done, admit your shortcomings, and set a plan in place to improve. Do not be shy about telling the world what you have accomplished.

Announce successes

A key strategy to maintaining funding for your project is to internally publicize your successes. A data quality initiative must be constantly re-sold at every opportunity to reinforce the value you are introducing to your organization.

To communicate your successes, you may choose to:

• Create a monthly data quality email update

• Establish a company Web site or intranet presence

• Ask the sponsor(s) to send out a memo about the project from time to time. Feed the sponsor business benefit information, such as savings on marketing mailings, marketing sell-through rates, inventory management, and supply chain savings.

• Identify and work closely with a select group of users to help with the communication. Identify what they are contributing to the project, and publish that information.

• Recognize and communicate how customers/users benefit from improved data.

This is also a good time to remind the company that data quality is everyone’s concern and recommend ways employees can help solve data quality issues.

Monitor

One way to keep track of data quality is by conducting a full analysis with your data discovery tool. Compare current and previous baselines to understand how your data quality initiative is progressing. Many tools let you automatically keep track of data quality using an e-mail notification feature to inform key personnel when business rules are violated, such as when data values do not meet predefined requirements, threshold values are exceeded, or nulls are present where unacceptable. Such powerful features prevent errors from impacting your business, should your enterprise use data sources that are prone to change.

Business users, data stewards, data analysts and governance teams use TS Insight to report on the state of data quality and to trend improvements over time.

Data stewards, system owners, and key business contacts can receive critical change and error alerts and view violation(s) and drill down on error(s).

Organizations with action-oriented governance programs use such features to alert key stakeholders about data anomalies. Each day, data stewards can address the issues at hand and create prioritized tasks to resolve the issues identified.


Gather new requirements for next phase

Armed with your success, it’s time to begin gathering requirements for the next phase. Inspired by the new intelligence available to them, business users will ask for additional data. They may ask that you add new systems to your newly developed data quality process or add additional data sources. Word will get out about your successes, and your solution and/or data quality services will be in demand. Be ready to chair meetings to gather new requirements for version 1.1 of the project.

Manage change requests/exceptions

A change request conveys a major change to the project or to the requirements of your new system. Once UAT has passed, users may see additional opportunities to improve business processes, and you’ll need a way to manage these requests. Most project managers feel that every project should have a formal change request process. A simple change request process might look something like this:

1. User submits a change request.

2. The assigned resource, perhaps the data steward, assesses the change request to see if it's worth investigating. Compare how difficult the change is to implement versus the resulting benefit (impact). Assign the obvious change requests with limited benefit as “nice-to-haves” for future reference. Assess risks associated with making each change.

3. The data steward and project manager document and communicate the assessment to key stakeholders.

4. If the change sounds reasonable and has obvious benefit, ask the corporate sponsor to accept the changes in schedule, cost, quality and risk. Remain objective, and let the sponsor decide if the change request has merit.

5. Communicate the schedule and status of the change request to key stakeholders.


Data Quality Solution Requirements Checklist

As you evaluate best-fit data quality solutions, use this checklist to assess high-level strengths and differentiators across vendors and to gather the pertinent detail needed to prepare for next steps.

For each requirement, score Trillium Software, Vendor #2, and Vendor #3:

• Complete data quality life cycle management – an integrated platform lets users profile, assess, validate, improve, and monitor global data quality across a distributed enterprise

• Single and simple user experience – enables team collaboration and lets business and IT users manage the application of business standards and processes to data

• Single- and multi-byte support

• Performance – high availability and throughput in batch or real time

• Automated discovery – match households and businesses to accepted enterprise criteria and remove/consolidate duplicate records

• Out-of-the-box country support and built-in intelligence – accurately routes mixed, international data from all countries and processes it through country-specific quality resources in a single pass

• Worldwide verification – checks customer records against worldwide postal directories and local business rules

• Ability to interpret data in context – key word identification and pattern matching enable structured and free-form data to be interpreted in context, capturing valuable data entered into text/comment fields or the wrong field

• Data enrichment – appends third-party supplemental information, for example geocodes, to your data

• Purpose-built architecture – enables seamless transition of profiling output into the cleansing engine

• Ability to deploy and manage data quality processes rapidly across the enterprise – shortens time-to-value by delivering powerful menu options, controls, and tools that enable organizations to develop, deploy, manage, and optimize real-time data quality processes enterprise-wide

• Enterprise connectivity – standardize and cleanse data in third-party applications using certified APIs and interfaces

• Real-time, easily supportable integration – enables deployment in an SOA architecture across platforms and applications including SAP ERP and CRM, Oracle/Siebel, TIBCO, Initiate, Siperian, Teradata, and IBM

• Flexible platform to support operational data quality – supports the reuse of business rules and knowledge capital to scale data quality across the enterprise as required and govern data according to corporate standards

• Reusable data quality services – cleanse customer, product, financial, and supplier data; enables users to apply enterprise data quality services that enforce consistent data standards in a high-volume, real-time environment

• Multi-platform, distributed processing – runs on a number of platforms including Microsoft Windows, UNIX (AIX, HP-UX, Sun Solaris), Linux (SUSE and Red Hat), AS/400, and mainframes (z/OS and OS/390)

• Professional services – consultants possess deep and broad industry expertise and extensive data quality experience

• Global reporting – easy extraction of summary metadata, data metrics, and results into any third-party reporting package