Data Quality Class 4. Goals Questions Review of SQL select Data Quality Rules.
Date posted: 21-Dec-2015
Data Quality
Class 4
Goals
Questions
Review of SQL select
Data Quality Rules
SQL
Structured Query Language
Used to extract data from databases
Used to insert data into a database
The Select Statement
select [all | distinct] <select_list>
from [<table_name> | <view_name>] [,[<table_name> | <view_name>] . . .]
[where <search_condition>]
[group by <column_name> [, <column_name>] . . .]
[having <search_conditions>]
[order by {<column_name> | <select_list_number>} [asc | desc]
[,{<column_name> | <select_list_number>} [asc | desc]] . . .]
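To make the clause order concrete, here is a minimal sketch that runs one select through Python's built-in sqlite3 module. The orders table, its columns, and its rows are invented for illustration and are not part of the class material.

```python
import sqlite3

# Hypothetical table and rows, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (customer text, item_class text, total real)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        ("ann", "CLOTHING", 40.0),
        ("ann", "BOOKS", 15.0),
        ("bob", "BOOKS", 5.0),
        ("cat", "CLOTHING", 60.0),
        ("cat", "CLOTHING", 25.0),
    ],
)

# where filters rows, group by forms one group per customer,
# having filters whole groups, order by sorts the final result.
query = """
    select customer, count(*) as n_orders, sum(total) as spent
    from orders
    where total > 10.0
    group by customer
    having sum(total) >= 50.0
    order by spent desc
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that where is applied before grouping and having after it, which is why bob's single small order never reaches the having test.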
Data Quality Rules
Definitions
Proscriptive Assertions
Prescriptive Assertions
Conditional Assertions
Operational Assertions
Definitions
Nulls
Domains
Mappings
Proscriptive Assertions
Describe what is not allowed
Used to figure out what is wrong with data
Used for validation
Prescriptive Assertions
Describe what is supposed to happen with data
Can be used for data population, extraction, transformation
Can also be used for validation
Conditional Assertions
Define an assertion that must be true if a condition is true
Operational Assertions
Define an action that must be taken if a condition is true
9 Classes of Rules
1. Null value rules
2. Value rules
3. Domain membership rules
4. Domain Mappings
5. Relation rules
6. Table, Cross-table, and Cross-message assertions
7. In-Process directives
8. Operational Directives
9. Other rules
Null Value Rules
Null value specification
– Define GETDATE for unavailable as “fill in date”
Null values allowed
– Attribute A allowed nulls {GETDATE, U, X}
Null values not allowed
– Attribute B nulls not allowed
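The null value rules can be sketched as a small checker. This assumes records are Python dicts keyed by attribute name; the function name and the treatment of a bare None as a disallowed null are assumptions made for this example, not part of the rules as stated.

```python
# Null representations taken from the rules above. The helper name and
# the dict-based record layout are hypothetical.
NULL_REPRESENTATIONS = {"GETDATE", "U", "X"}
ALLOWED_NULLS = {"A": {"GETDATE", "U", "X"}, "B": set()}


def null_violations(record):
    """Return the attributes that hold a null they are not allowed to hold."""
    bad = []
    for attr, value in record.items():
        # Assumption: a bare None counts as a null with no declared representation.
        is_null = value is None or value in NULL_REPRESENTATIONS
        if is_null and value not in ALLOWED_NULLS.get(attr, set()):
            bad.append(attr)
    return bad


print(null_violations({"A": "U", "B": "X"}))
```

Attribute A accepts any of the three declared representations, while any null in attribute B is reported.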
Value Rules
Value restriction rule
– Restrict GRADE: value >= ‘A’ AND value <= ‘F’ AND value != ‘E’
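The GRADE restriction translates almost directly into a predicate. The single-character check is an added assumption, since plain string comparison would otherwise accept values such as "AB".

```python
def valid_grade(value):
    """Value rule for GRADE: between 'A' and 'F' inclusive, excluding 'E'."""
    # Assumption: a grade is exactly one character long.
    return len(value) == 1 and "A" <= value <= "F" and value != "E"


print([g for g in "ABCDEFG" if valid_grade(g)])
```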
Domain Rules
Domain Definition
Domain Membership
Domain Nonmembership
Domain Assignment
Mapping Rules
Mapping definition
Mapping membership
Mapping nonmembership
Mapping Assignment
Relation Rules
Completeness
Exemption
Consistency
Derivation
Completeness
Defines when a record is complete (i.e., which fields must be present)
IF (Orders.Total > 0.0), Complete With
{Orders.Billing_Street,
Orders.Billing_City,
Orders.Billing_State,
Orders.Billing_ZIP}
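A minimal sketch of the completeness rule, assuming an order is a dict keyed by field name and that None or an empty string means "missing" (both are assumptions made for this example):

```python
REQUIRED_BILLING = ("Billing_Street", "Billing_City", "Billing_State", "Billing_ZIP")


def is_complete(order):
    """Completeness rule: billing fields are required when Total > 0.0."""
    if order.get("Total", 0.0) <= 0.0:
        return True  # condition not met, so the rule does not apply
    return all(order.get(field) not in (None, "") for field in REQUIRED_BILLING)


order = {"Total": 25.0, "Billing_Street": "1 Main St", "Billing_City": "Springfield",
         "Billing_State": "IL", "Billing_ZIP": ""}
print(is_complete(order))
```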
Exemption
Defines which fields may be missing
IF (Orders.Item_Class != “CLOTHING”) Exempt
{Orders.Color,
Orders.Size
}
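One way to read the exemption rule is that Color and Size are otherwise required and the rule excuses them for non-clothing items. The sketch below follows that reading, which is an interpretation rather than something the slide states; the dict layout and function name are also hypothetical.

```python
EXEMPTIBLE = ("Color", "Size")


def unexcused_missing(order):
    """Return the exemptible fields that are missing without an exemption."""
    if order.get("Item_Class") != "CLOTHING":
        return []  # exemption applies: Color and Size may be missing
    return [f for f in EXEMPTIBLE if order.get(f) in (None, "")]


print(unexcused_missing({"Item_Class": "CLOTHING", "Color": "red", "Size": None}))
```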
Consistency
Define a relationship between attributes based on field content
– IF (Employees.title == “Staff Member”) Then
(Employees.Salary >= 20000 AND Employees.Salary < 30000)
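The consistency rule is a conditional check: it only constrains records that meet the condition. A minimal sketch, assuming employee records are dicts with "title" and "Salary" keys (an assumption about representation only):

```python
def salary_consistent(employee):
    """Consistency rule: staff members must earn at least 20000 and under 30000."""
    if employee["title"] != "Staff Member":
        return True  # the rule says nothing about other titles
    return 20000 <= employee["Salary"] < 30000


print(salary_consistent({"title": "Staff Member", "Salary": 30000}))
```

The upper bound is exclusive, so a salary of exactly 30000 fails the rule.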
Derivation
Prescriptive form of consistency rule
Details how one attribute’s value is determined based on other attributes
IF (Orders.NumberOrdered > 0) Then {
Orders.Total = (Orders.NumberOrdered * Orders.Price) * 1.05
}
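Because a derivation rule is prescriptive, it can be written as code that populates the field instead of testing it. A minimal sketch, again assuming dict-shaped orders with a hypothetical function name:

```python
def derive_total(order):
    """Derivation rule: Total = (NumberOrdered * Price) * 1.05 when NumberOrdered > 0."""
    if order["NumberOrdered"] > 0:
        order["Total"] = (order["NumberOrdered"] * order["Price"]) * 1.05
    return order


order = derive_total({"NumberOrdered": 2, "Price": 10.0})
print(order["Total"])
```

The same formula can be used for validation by comparing a stored Total against the derived value, ideally with a small tolerance for floating-point rounding.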
Table and Cross-Table Rules
Functional Dependence
Primary Key Assertion
Foreign Key Assertion (=referential integrity)
Functional Dependence
Functional Dependence between columns X and Y:
– For any two records R1 and R2 in a table, if field X of record R1 contains value x and field X of record R2 contains the same value x, then if field Y of record R1 contains the value y, then field Y of record R2 must contain the value y.
In other words, attribute Y is said to be determined by attribute X.
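A functional dependence can be tested by remembering the first Y value seen for each X value and flagging any record that disagrees. The sketch below assumes rows are dicts; the zip and city columns are hypothetical sample data.

```python
def fd_violations(rows, x, y):
    """Return the rows whose y value contradicts an earlier row with the same x."""
    seen = {}
    violations = []
    for row in rows:
        key = row[x]
        if key in seen and seen[key] != row[y]:
            violations.append(row)
        else:
            seen.setdefault(key, row[y])
    return violations


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]
print(fd_violations(rows, "zip", "city"))
```

This treats the first value seen as the reference, which is a design choice for the example; it only shows that the dependence is broken, not which record is the wrong one.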
Primary Key Assertion
A set of attributes defined as a primary key must uniquely identify a record
Enforcement = testing for duplicates across defined key set
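Testing for duplicates across a key set is a natural fit for group by and having. A minimal sketch using sqlite3, with a hypothetical employees table whose intended key is (dept, badge):

```python
import sqlite3

# Hypothetical table: (dept, badge) is meant to be the primary key,
# but no constraint is declared, so duplicates can slip in.
conn = sqlite3.connect(":memory:")
conn.execute("create table employees (dept text, badge integer, name text)")
conn.executemany(
    "insert into employees values (?, ?, ?)",
    [("ENG", 1, "Ann"), ("ENG", 2, "Bob"), ("OPS", 1, "Cat"), ("ENG", 1, "Dan")],
)

# Any key combination that occurs more than once violates the assertion.
duplicates = conn.execute(
    """
    select dept, badge, count(*) as n
    from employees
    group by dept, badge
    having count(*) > 1
    """
).fetchall()
print(duplicates)
```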
Foreign Key Assertion
When the values in field f of table T are chosen from the key values in field g of table S, field T.f is said to be a foreign key referencing field S.g
If f is a foreign key, each of its values must exist in table S, column g (=referential integrity)
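Referential integrity can be checked by looking for values of T.f that have no match in S.g. A minimal sketch using sqlite3; the table and column names follow the letters used above, while the id column and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table S (g integer primary key)")
conn.execute("create table T (id integer, f integer)")
conn.executemany("insert into S values (?)", [(1,), (2,), (3,)])
conn.executemany("insert into T values (?, ?)", [(10, 1), (11, 3), (12, 4)])

# A left join keeps every row of T; rows with no partner in S come back
# with S.g set to null, and those are the orphaned references.
orphans = conn.execute(
    """
    select T.id, T.f
    from T left join S on T.f = S.g
    where S.g is null and T.f is not null
    """
).fetchall()
print(orphans)
```

Rows where T.f is itself null are skipped here; whether a null reference is acceptable is a separate null value rule.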
In-process Directives
Definition directives (labeling information chain members)
Measurement directives
Trigger directives
Operational Directives
Transformation
Update
Other Rules
Approximate Searching rules
Approximate Matching rules