DATA QUALITY MATURITY IS AN ELUSIVE . ... Data governance objectives such as understanding the value




  • Elements of Data Consumption:

    In any data system, you will find data generators or producers responsible for creating a piece of information that measures either a business process or its outcome, and you will find data consumers who are interested in understanding the business process across the horizon. This demonstrates the need to integrate the data generated across the different applications participating in a business process. It’s important to understand that data producers and consumers operate with different objectives and goals, and that the same difference carries over to the applications built to support those objectives.

    In a perfectly orchestrated data process, data that is generated at the data producer application gets integrated to produce reports that are simply information about the business process or the state of the process. When there is a business motive that strives to explore data further by correlating data from different domains and identifying patterns using statistical models, the information becomes a business insight, later leading to knowledge over a period of time.

    The next level of data maturity comes when insight, intelligence, and perception intersect naturally, thus leading to wisdom. Wisdom is the ultimate driver for higher-level cognitive decisions that are more intuition-driven due to vast amounts of knowledge being turned into wisdom. Future applications that use vast amounts of data (e.g., IoT, big data), with the ability to process such large data in short durations can generate vast amounts of knowledge, leading to cognitive decision systems of sagacity, otherwise known as the ability to make good judgements.

  • However, success in data maturity models depends critically on the quality of the data. Unfortunately, data quality is often investigated only just before data integration, with a short-term focus on fixing data quality parameters, for example by cleansing the data through data augmentation or by standardizing it with a data quality tool. Ignoring data quality issues that arise within the source of truth during the data generation process can lead to a bad data experience, raising transparency questions and fostering a belief that the data is not trustworthy.

    For information to move from intelligence to a state of sagacity, many human factors need to shape its usage, such as motive and perception based on experience and the context in which the data presents itself at the time of use. To provide a complete view of data issues, it’s important to correlate data quality metrics collected objectively through profiling with metrics collected subjectively through user surveys, which help you understand usage experience and content clarity.
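    One way to relate an objective metric to a subjective one is a plain correlation coefficient. The sketch below is only an illustration under stated assumptions: the `pearson` helper, the variable names, and all figures are hypothetical and do not come from the original paper, which does not prescribe a specific statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical figures for four data sets, invented for illustration only.
completeness_pct = [99.0, 95.0, 90.0, 80.0]  # objective: measured by profiling
survey_trust = [4.5, 4.0, 3.5, 2.5]          # subjective: mean 1-5 survey rating

r = pearson(completeness_pct, survey_trust)
print(f"correlation between completeness and perceived trust: {r:.3f}")
```

    A value of `r` close to 1 would suggest that users' perceived trust moves with the measured completeness; a weak or negative value would suggest that the survey is picking up issues the profiled metric does not capture.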

    Data Quality Control Framework

    Data governance objectives such as understanding the value of a data asset, providing optimal use of data to the enterprise, creating awareness among data users, and promoting self-service data usage can only be accomplished by providing a complete view of the data, which is what makes a meaningful discussion with stakeholders possible.

    The image below provides a complete view of the data quality parameters that can be validated across both the producer and consumer level based upon three categories that will be explored in further detail – (1) data representation, (2) data value usage, and (3) data context.


    [Figure: data quality parameters across producer and consumer levels. Recoverable labels: Data Integration, BI Reporting, Analytics, Cognitive; Insights, Intelligence, Analysis; Motive, Perceptive; Data Producer, Data Context, Data Consumer; Enterprise, Local, Personal.]

  • What is Data Representation?

    Data representation refers to the relationship within the values that are associated with an application’s attributes corresponding to its variables. As an example, consider an application that is used for processing loan applications:

    1. At the application level, the application variable defines the loan application domain. Variations at the process level let you generate or measure a process, and these are captured as process measures.

    2. The process output is represented within the application as attributes, which describe different properties of the intended business action. In the above example, different properties or characteristics of a loan application itself are captured as data attributes.

    3. Data attributes are bound or constrained by their data value properties, such as numeric values, float or money values, percentage values, etc. These boundaries restrict the data to ensure data quality is not compromised.

    4. All data values and attributes have relationships across other attributes; they’re not binary. These relationships are important because they help define the business rules within the application and create dependencies between data values.

    Changes in data attributes, including their values and relationships, are the main reason that the quality of data represented in the application degrades. It’s important to understand the root cause of such degraded attributes in order to fix the quality at the source and prevent further damage downstream.
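    The distinction between value constraints (step 3) and attribute relationships (step 4) can be sketched in code. This is a minimal illustration, not the paper's model: the `LoanApplication` fields, the bounds, and the collateral rule are all hypothetical, since the source does not list actual loan attributes or business rules.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional


@dataclass(frozen=True)
class LoanApplication:
    # Hypothetical attributes of a loan application.
    amount: Decimal                      # money value
    interest_rate_pct: float             # percentage value
    term_months: int                     # numeric value
    collateral_value: Optional[Decimal]  # only meaningful for secured loans
    secured: bool


def check_value_constraints(app: LoanApplication) -> List[str]:
    """Step 3: each attribute is bounded by its own value properties."""
    issues = []
    if app.amount <= 0:
        issues.append("amount must be positive")
    if not 0 <= app.interest_rate_pct <= 100:
        issues.append("interest_rate_pct must be between 0 and 100")
    if app.term_months <= 0:
        issues.append("term_months must be positive")
    return issues


def check_relationships(app: LoanApplication) -> List[str]:
    """Step 4: rules that tie one attribute's value to another's."""
    issues = []
    if app.secured and app.collateral_value is None:
        issues.append("secured loan requires collateral_value")
    if not app.secured and app.collateral_value is not None:
        issues.append("unsecured loan must not carry collateral_value")
    return issues


good = LoanApplication(Decimal("250000"), 4.5, 360, Decimal("300000"), True)
bad = LoanApplication(Decimal("-5"), 140.0, 12, None, True)

print(check_value_constraints(bad))  # two value-level issues
print(check_relationships(bad))      # one relationship-level issue
```

    Keeping the two kinds of check separate makes it easier to tell whether a degraded attribute stems from a bad value or from a broken dependency between attributes.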

  • Data Representation with Data Quality Controls:

    Data controls at the application level can be deployed to profile and monitor data at the time it is generated to ensure quality. Controls can measure the data quality at the data representation level, collect control metrics like volumetric information, and generate reports outlining the metrics that are independent of the complexity and sensitivity of the data itself. This leads to transparent and sharable data quality reports on the data.

    Application-level data quality controls can monitor and identify data quality issues at the design level, or when they’re created due to faulty business rules in the application where the data is originating. Such controls not only help improve the data quality at the source of data origination, but can also help improve application extensibility and usability across the enterprise.

    Below are some control parameters that can be measured at the application level:

    Control Name Description/Definition

    Null Value Representation
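    The description of this control did not survive in the transcript. As a hedged illustration only, the sketch below assumes that a null value representation control counts the different ways a missing value is encoded in one attribute. The `NULL_LIKE` placeholder set and the function name are assumptions, not definitions from the paper.

```python
from collections import Counter

# Assumed set of null-like placeholders (compared case-insensitively).
# A real list would come from the owners of the application.
NULL_LIKE = {"", "null", "none", "n/a", "na", "-"}


def null_representation_profile(values):
    """Count how missing values are encoded among one attribute's values."""
    counts = Counter()
    for value in values:
        if value is None:
            counts["<None>"] += 1
        elif isinstance(value, str) and value.strip().lower() in NULL_LIKE:
            counts[repr(value)] += 1
    total = len(values)
    missing = sum(counts.values())
    return {
        "total": total,  # volumetric metric
        "missing": missing,
        "missing_rate": missing / total if total else 0.0,
        "representations": dict(counts),
    }


sample = ["A12", None, "", "N/A", "n/a", "B07", "NULL", None]
profile = null_representation_profile(sample)
print(profile)
```

    The resulting report holds only counts and placeholder spellings, never the populated values themselves, so it can be shared regardless of how sensitive the underlying data is.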

  • Data Value Level Data Quality Controls:

    Data movement between applications involves complex data transformations such as filter, aggregation, qualification, etc. As data moves through multiple stages of integration, the maturity of the data usage improves drastically, which results in maximizing the value of data as an enterprise asset. A data integration point is an ideal location to measure data quality metrics that can help an enterprise to understand value creation parameters of the data. Data value level quality parameters are measurable data properties and can be expressed as quantitative data metrics.

    This space is dominated by many data quality product vendors and tools. They all provide similar profiling capabilities within the tool or profiler interfaces, but such capabilities are intended to be leveraged on a static data set, to help identify data patterns and data anomalies to aid users in building data standardization rules or data cleansing routines.

    Data profilers are not usually used to monitor data in real time, and they lack the capability to compare quality metrics over time to produce data quality trends. As an example, an organization can profile a data extract for a given day and receive the profiling results, but it cannot profile a particular data field that occurs across multiple extracts, correlate the results, or trend them for subjective interpretation.
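    To make the idea of trending one field across several extracts concrete, here is a minimal sketch. It is not any vendor's API: the `null_rate` and `trend` helpers, the `email` field, and the sample extracts are all hypothetical.

```python
def null_rate(rows, field):
    """Share of rows in one extract where `field` is missing (None or empty)."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


def trend(extracts, field):
    """Return [(extract_date, null_rate, change_vs_previous)] in date order."""
    result = []
    previous = None
    for day in sorted(extracts):  # ISO dates sort chronologically as strings
        rate = null_rate(extracts[day], field)
        change = None if previous is None else rate - previous
        result.append((day, rate, change))
        previous = rate
    return result


# Hypothetical daily extracts keyed by ISO date, invented for illustration.
extracts = {
    "2024-01-01": [{"email": "a@x"}, {"email": "b@x"}, {"email": "c@x"}, {"email": "d@x"}],
    "2024-01-02": [{"email": "a@x"}, {"email": None}, {"email": "c@x"}, {"email": "d@x"}],
    "2024-01-03": [{"email": ""}, {"email": None}, {"email": "c@x"}, {"email": "d@x"}],
}

history = trend(extracts, "email")
for day, rate, change in history:
    print(day, rate, change)
```

    A single-day profile would report only one of these rates; the day-over-day change is what reveals that the field is steadily degrading.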

    Infogix data controls are deployable in real time or in batch, and can capture data quality metrics across different gradients. They can then report metrics and identify trends that can be correlated with other applications involved in the data flow process. Controls can be run in real time before or after a data integration routine.

    The following are a set of Infogix data value controls that can measure different data quality parameters at the data value level. Such controls are repeatable, reusable, quick to deploy, and capture metrics in real-time.

    Control Name Description/Definition

  • Data value-level controls can be expanded by creating additional combinations of basic controls to build upon complex data quality parameters. Alternatively, custom controls that are hand-coded with control rules can be created to measure and monitor specific contextual data value measures.
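    Combining basic controls into a more complex one can be sketched as function composition. This is an illustrative assumption about how such composition might look, not a description of Infogix's product: `not_null`, `in_range`, `all_of`, and `pass_rate` are hypothetical helpers.

```python
def not_null(field):
    """Basic control: the field is present and not empty."""
    return lambda row: row.get(field) not in (None, "")


def in_range(field, low, high):
    """Basic control: the field is numeric and within [low, high]."""
    def check(row):
        value = row.get(field)
        return isinstance(value, (int, float)) and low <= value <= high
    return check


def all_of(*controls):
    """Composite control: passes only if every basic control passes."""
    return lambda row: all(control(row) for control in controls)


def pass_rate(control, rows):
    """Quantitative metric: fraction of rows passing (None if no rows)."""
    if not rows:
        return None
    return sum(1 for row in rows if control(row)) / len(rows)


# A composite control built from two basic ones.
rate_is_valid = all_of(not_null("rate"), in_range("rate", 0, 100))

# Hypothetical rows, invented for illustration.
rows = [{"rate": 5.5}, {"rate": None}, {"rate": 120}, {"rate": 0}]
score = pass_rate(rate_is_valid, rows)
print(score)
```

    Because every control is just a function from a row to a pass or fail result, a hand-coded custom rule can be dropped into `all_of` alongside the basic ones without changing how the metric is computed.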

    Data Value in the Age of Big Data

    Data service channels are continually expanding their traditional data reporting. This process of organizing data into a summary of information that allows users to understand the current state of the business, while deriving business value through information insight, is becoming a more popular approach in the big data analytics world. Business teams are frequently pushed toward analysis and self-service options that allow them to explore the data and extract actionable insight.

    The evolution of data provisioning channels means increased demand for metadata and business glossaries that can aid businesses in their journey of transitioning from data to insight. Yet as this evolution takes place, there lie challenges around data quality when try