Data Quality Class 4. Goals Questions Review of SQL select Data Quality Rules.

Date posted: 21-Dec-2015

Goals

• Questions
• Review of SQL select
• Data Quality Rules

SQL

• Structured Query Language
• Used to extract data from databases
• Used to insert data into a database

The Select Statement

select [all | distinct] <select_list>
  from {<table_name> | <view_name>} [, {<table_name> | <view_name>}] . . .
  [where <search_condition>]
  [group by <column_name> [, <column_name>] . . .]
  [having <search_condition>]
  [order by {<column_name> | <select_list_number>} [asc | desc]
    [, {<column_name> | <select_list_number>} [asc | desc]] . . .]
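The grammar can be exercised with Python's built-in sqlite3 module. This is a minimal sketch: the `orders` table and its rows are hypothetical. It uses each optional clause once so that the order of evaluation is visible. WHERE filters rows, GROUP BY forms groups, HAVING filters groups, and ORDER BY sorts the result.

```python
import sqlite3

# In-memory database with a small hypothetical table (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ann", "pen", 3), ("ann", "ink", 1), ("bob", "pen", 5), ("cy", "pad", 1)],
)

# One instance of each optional clause of the select grammar, in order.
rows = conn.execute(
    """
    SELECT customer, SUM(qty) AS total_qty
    FROM orders
    WHERE item <> 'pad'          -- filter rows before grouping
    GROUP BY customer            -- one output row per customer
    HAVING SUM(qty) >= 4         -- filter groups after aggregation
    ORDER BY total_qty DESC      -- sort the final result
    """
).fetchall()
print(rows)  # [('bob', 5), ('ann', 4)]
```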

Data Quality Rules

• Definitions
• Proscriptive assertions
• Prescriptive assertions
• Conditional assertions
• Operational assertions

Definitions

• Nulls
• Domains
• Mappings

Proscriptive Assertions

• Describe what is not allowed
• Used to figure out what is wrong with data
• Used for validation

Prescriptive Assertions

• Describe what is supposed to happen with data
• Can be used for data population, extraction, and transformation
• Can also be used for validation

Conditional Assertions

Define an assertion that must be true if a condition is true
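A conditional assertion is a logical implication. When the condition is false, the rule is satisfied vacuously. The sketch below is one way to express that idea. The helper `conditional_holds` and the sample ZIP rule are hypothetical illustrations, not part of the course material.

```python
from typing import Any, Callable, Mapping

Record = Mapping[str, Any]
Predicate = Callable[[Record], bool]


def conditional_holds(record: Record, condition: Predicate, assertion: Predicate) -> bool:
    """If `condition` is true, `assertion` must be true; otherwise the rule passes."""
    return (not condition(record)) or assertion(record)


# Hypothetical rule: an order with a positive total must carry a billing ZIP.
def has_total(r: Record) -> bool:
    return r.get("Total", 0) > 0


def has_zip(r: Record) -> bool:
    return bool(r.get("Billing_ZIP"))


print(conditional_holds({"Total": 10.0, "Billing_ZIP": "10001"}, has_total, has_zip))  # True
print(conditional_holds({"Total": 10.0}, has_total, has_zip))                          # False
print(conditional_holds({"Total": 0.0}, has_total, has_zip))                           # True (vacuous)
```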

Operational Assertions

Define an action that must be taken if a condition is true
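A conditional assertion only tests a record. An operational assertion does something to the record when its condition holds. This is a minimal sketch under that reading. The function names and the "flag for review" action are hypothetical.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def apply_operational(record: Record,
                      condition: Callable[[Record], bool],
                      action: Callable[[Record], None]) -> bool:
    """Run `action` on the record when `condition` is true; return whether it fired."""
    if condition(record):
        action(record)
        return True
    return False


# Hypothetical rule: if the billing ZIP is missing, flag the record for manual review.
def zip_missing(r: Record) -> bool:
    return not r.get("Billing_ZIP")


def flag_for_review(r: Record) -> None:
    r["needs_review"] = True


order = {"Total": 25.0}
fired = apply_operational(order, zip_missing, flag_for_review)
print(fired, order)  # True {'Total': 25.0, 'needs_review': True}
```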

9 Classes of Rules

1. Null value rules
2. Value rules
3. Domain membership rules
4. Domain mappings
5. Relation rules
6. Table, cross-table, and cross-message assertions
7. In-process directives
8. Operational directives
9. Other rules

Null Value Rules

• Null value specification
  – Define GETDATE for unavailable as “fill in date”
• Null values allowed
  – Attribute A allowed nulls {GETDATE, U, X}
• Null values not allowed
  – Attribute B nulls not allowed
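The three null rules can be combined into a single validator. This is a sketch under stated assumptions. The slide gives a meaning only for GETDATE, so U and X are treated here as opaque null tokens. The sketch also assumes that a plain missing value (`None`) is a violation even for attribute A, because A accepts only its declared null representations.

```python
from typing import Any, Dict, List, Set

# Declared null representations. Only GETDATE's meaning ("fill in date" for an
# unavailable value) comes from the slide; U and X are treated as opaque tokens.
NULL_TOKENS: Set[str] = {"GETDATE", "U", "X"}

# Per-attribute rules: which null representations each attribute accepts.
ALLOWED_NULLS: Dict[str, Set[str]] = {
    "A": {"GETDATE", "U", "X"},  # Attribute A allowed nulls {GETDATE, U, X}
    "B": set(),                  # Attribute B nulls not allowed
}


def is_null(value: Any) -> bool:
    """A value counts as null if it is missing or is one of the declared null tokens."""
    return value is None or value in NULL_TOKENS


def null_violations(record: Dict[str, Any]) -> List[str]:
    """Return the attributes whose null value is not permitted by the rules."""
    bad = []
    for attr, allowed in ALLOWED_NULLS.items():
        value = record.get(attr)
        if is_null(value) and value not in allowed:
            bad.append(attr)
    return bad


print(null_violations({"A": "U", "B": "42"}))  # []
print(null_violations({"A": None, "B": "X"}))  # ['A', 'B']
```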

Value Rules

• Value restriction rule
  – Restrict GRADE: value >= ‘A’ AND value <= ‘F’ AND value != ‘E’
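The GRADE restriction translates almost word for word into a predicate. One caveat applies to the literal translation: string comparison is lexicographic, so a multi-character value such as 'AB' would satisfy 'A' <= value <= 'F'. The single-character check below is an added assumption that closes that gap.

```python
def grade_ok(value: str) -> bool:
    """Restrict GRADE: value >= 'A' AND value <= 'F' AND value != 'E'.
    The length check is an added assumption; without it 'AB' would pass."""
    return len(value) == 1 and "A" <= value <= "F" and value != "E"


print([g for g in "ABCDEFG" if grade_ok(g)])  # ['A', 'B', 'C', 'D', 'F']
```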

Domain Rules

• Domain definition
• Domain membership
• Domain nonmembership
• Domain assignment
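The sketch below shows one way the four domain rules could fit together. A domain is a named set of values. Membership and nonmembership are set tests. Assignment binds an attribute to the domain that validates it. The domain names and values are hypothetical and deliberately truncated.

```python
from typing import Dict, Set

# Domain definition: a named set of valid values (hypothetical, truncated lists).
DOMAINS: Dict[str, Set[str]] = {
    "StateCode": {"CT", "NJ", "NY"},
    "ReservedCode": {"XX", "ZZ"},
}

# Domain assignment: bind an attribute to the domain it must draw values from.
ASSIGNMENTS: Dict[str, str] = {"Orders.Billing_State": "StateCode"}


def is_member(value: str, domain: str) -> bool:
    """Domain membership: the value must belong to the domain."""
    return value in DOMAINS[domain]


def is_nonmember(value: str, domain: str) -> bool:
    """Domain nonmembership: the value must NOT belong to the domain."""
    return value not in DOMAINS[domain]


def attribute_valid(attr: str, value: str) -> bool:
    """Check a value against the domain assigned to its attribute."""
    return is_member(value, ASSIGNMENTS[attr])


print(attribute_valid("Orders.Billing_State", "NY"))    # True
print(attribute_valid("Orders.Billing_State", "N.Y."))  # False
print(is_nonmember("AB", "ReservedCode"))               # True
```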

Mapping Rules

• Mapping definition
• Mapping membership
• Mapping nonmembership
• Mapping assignment
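Mapping rules extend domain rules from single values to pairs of values. This sketch assumes a mapping is a set of (source, target) pairs, which also covers many-to-many relations. The state code to state name mapping and the attribute `Orders.Billing_State_Name` are hypothetical.

```python
from typing import Dict, Set, Tuple

# Mapping definition: a named relation between values of two domains
# (hypothetical example pairs: state code -> state name).
MAPPINGS: Dict[str, Set[Tuple[str, str]]] = {
    "StateCodeToName": {("CT", "Connecticut"), ("NJ", "New Jersey"), ("NY", "New York")},
}

# Mapping assignment: bind a pair of attributes to a mapping (attribute names hypothetical).
MAPPING_ASSIGNMENTS: Dict[Tuple[str, str], str] = {
    ("Orders.Billing_State", "Orders.Billing_State_Name"): "StateCodeToName",
}


def in_mapping(a: str, b: str, mapping: str) -> bool:
    """Mapping membership: the pair (a, b) must appear in the mapping."""
    return (a, b) in MAPPINGS[mapping]


def not_in_mapping(a: str, b: str, mapping: str) -> bool:
    """Mapping nonmembership: the pair (a, b) must NOT appear in the mapping."""
    return (a, b) not in MAPPINGS[mapping]


def pairs_valid(record: Dict[str, str]) -> bool:
    """Check every assigned attribute pair in the record against its mapping."""
    return all(
        in_mapping(record[x], record[y], m) for (x, y), m in MAPPING_ASSIGNMENTS.items()
    )


print(pairs_valid({"Orders.Billing_State": "NY", "Orders.Billing_State_Name": "New York"}))    # True
print(pairs_valid({"Orders.Billing_State": "NY", "Orders.Billing_State_Name": "New Jersey"}))  # False
```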

Relation Rules

• Completeness
• Exemption
• Consistency
• Derivation

Completeness

• Defines when a record is complete (i.e., which fields must be present)
  – IF (Orders.Total > 0.0), Complete With
      {Orders.Billing_Street,
       Orders.Billing_City,
       Orders.Billing_State,
       Orders.Billing_ZIP}
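The completeness rule can be checked with a short function that reports exactly which required fields are missing, which tells you more than a pass/fail flag. The sketch assumes orders are dicts keyed by the column names without the `Orders.` prefix. It also assumes that an empty string counts as "not present".

```python
from typing import Any, Dict, List

REQUIRED_IF_BILLED = ["Billing_Street", "Billing_City", "Billing_State", "Billing_ZIP"]


def missing_fields(order: Dict[str, Any]) -> List[str]:
    """IF (Orders.Total > 0.0), Complete With {billing fields}.
    Returns the required fields that are absent, None, or empty; [] means complete."""
    if order.get("Total", 0.0) > 0.0:
        return [f for f in REQUIRED_IF_BILLED if order.get(f) in (None, "")]
    return []  # condition false: the rule imposes no requirement


order = {"Total": 42.5, "Billing_Street": "1 Main St", "Billing_City": "Springfield",
         "Billing_State": "", "Billing_ZIP": None}
print(missing_fields(order))  # ['Billing_State', 'Billing_ZIP']
```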

Exemption

• Defines which fields may be missing
  – IF (Orders.Item_Class != “CLOTHING”) Exempt
      {Orders.Color,
       Orders.Size}
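An exemption is the mirror image of a completeness rule. This sketch assumes that, absent the exemption, Color and Size would be required on every order. Under that assumption, only clothing orders can violate the rule. The dict-based record layout is also an assumption.

```python
from typing import Any, Dict, List

# Assumption: without the exemption, Color and Size are required on every order.
EXEMPTABLE = ["Color", "Size"]


def exemption_violations(order: Dict[str, Any]) -> List[str]:
    """IF (Orders.Item_Class != "CLOTHING") Exempt {Orders.Color, Orders.Size}.
    Non-clothing orders may omit these fields; clothing orders may not."""
    if order.get("Item_Class") != "CLOTHING":
        return []  # exempt: missing Color/Size is acceptable
    return [f for f in EXEMPTABLE if order.get(f) in (None, "")]


print(exemption_violations({"Item_Class": "BOOKS"}))                    # []
print(exemption_violations({"Item_Class": "CLOTHING", "Color": "red"}))  # ['Size']
```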

Consistency

• Defines a relationship between attributes based on field content
  – IF (Employees.title == “Staff Member”) Then
      (Employees.Salary >= 20000 AND Employees.Salary < 30000)
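A consistency rule can be turned into a proscriptive select. That select returns the rows where the condition holds but the assertion fails. This sketch uses sqlite3 with a hypothetical `Employees` table whose columns follow the slide's names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (name TEXT, title TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?)",
    [("Ann", "Staff Member", 25000),
     ("Bob", "Staff Member", 31000),   # violates the upper bound
     ("Cy", "Manager", 55000)],        # condition false: rule does not apply
)

# Rows where the condition is true but the consistency assertion is false.
violations = conn.execute(
    """
    SELECT name FROM Employees
    WHERE title = 'Staff Member'
      AND NOT (Salary >= 20000 AND Salary < 30000)
    """
).fetchall()
print(violations)  # [('Bob',)]
```

This query does not report a row with a NULL Salary, because the comparison evaluates to NULL rather than true. A separate null value rule would catch those rows.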

Derivation

• Prescriptive form of a consistency rule
• Details how one attribute’s value is determined from other attributes
  – IF (Orders.NumberOrdered > 0) Then {
        Orders.Total = (Orders.NumberOrdered * Orders.Price) * 1.05
    }
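A derivation rule is prescriptive, so the code below assigns a value instead of checking one. The slide does not say what the 1.05 factor represents, so the sketch keeps it as an opaque multiplier. It uses `decimal.Decimal` so that currency amounts avoid binary floating-point rounding. It also assumes that when NumberOrdered is not positive, Total is left untouched, because the rule says nothing about that case.

```python
from decimal import Decimal
from typing import Any, Dict

# The slide does not say what 1.05 stands for; it is kept as an opaque multiplier.
MULTIPLIER = Decimal("1.05")


def derive_total(order: Dict[str, Any]) -> None:
    """IF (Orders.NumberOrdered > 0) Then Orders.Total = (NumberOrdered * Price) * 1.05.
    When NumberOrdered <= 0 the rule is silent, so Total is left unchanged."""
    if order["NumberOrdered"] > 0:
        order["Total"] = (order["NumberOrdered"] * order["Price"]) * MULTIPLIER


order = {"NumberOrdered": 3, "Price": Decimal("19.99")}
derive_total(order)
print(order["Total"])  # 62.9685
```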

Table and Cross-Table Rules

• Functional dependence
• Primary key assertion
• Foreign key assertion (referential integrity)

Functional Dependence

• Functional dependence between columns X and Y:
  – For any two records R1 and R2 in a table, if field X of R1 and field X of R2 contain the same value x, and field Y of R1 contains the value y, then field Y of R2 must also contain y.
  – In other words, attribute Y is determined by attribute X.
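The definition leads directly to a single-pass check. The check remembers the first Y value seen for each X value and flags any later record that disagrees. In this sketch, the ZIP/City columns and the sample rows are hypothetical. Real ZIP codes do not always determine a single city, so this is an illustration of the check, not a claim about that data.

```python
from typing import Any, Dict, Iterable, List, Tuple


def fd_violations(rows: Iterable[Dict[str, Any]], x: str, y: str) -> List[Tuple[Any, Any, Any]]:
    """Check the functional dependency x -> y.
    Returns (x_value, first_y_seen, conflicting_y) for each disagreeing record."""
    seen: Dict[Any, Any] = {}
    conflicts = []
    for row in rows:
        xv, yv = row[x], row[y]
        if xv not in seen:
            seen[xv] = yv                          # first record with this x value
        elif seen[xv] != yv:
            conflicts.append((xv, seen[xv], yv))   # same x, different y
    return conflicts


rows = [
    {"ZIP": "10001", "City": "New York"},
    {"ZIP": "07030", "City": "Hoboken"},
    {"ZIP": "10001", "City": "Newark"},   # same ZIP, different City
]
print(fd_violations(rows, "ZIP", "City"))  # [('10001', 'New York', 'Newark')]
```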

Primary Key Assertion

• A set of attributes defined as a primary key must uniquely identify a record
• Enforcement: test for duplicate values across the defined key set
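The duplicate test maps directly onto the GROUP BY ... HAVING form of the select statement. This sketch uses sqlite3 with a hypothetical `Orders` table and an assumed key set of (OrderID, LineNo). No PRIMARY KEY constraint is declared, so that duplicates can get into the table and the query can find them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No PRIMARY KEY constraint is declared, so duplicates can slip in.
conn.execute("CREATE TABLE Orders (OrderID INTEGER, LineNo INTEGER, Item TEXT)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, "pen"), (1, 2, "ink"), (2, 1, "pad"), (1, 2, "ink refill")],
)

# Intended key set: (OrderID, LineNo). Any group with more than one row is a duplicate.
dupes = conn.execute(
    """
    SELECT OrderID, LineNo, COUNT(*) AS n
    FROM Orders
    GROUP BY OrderID, LineNo
    HAVING COUNT(*) > 1
    """
).fetchall()
print(dupes)  # [(1, 2, 2)]
```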

Foreign Key Assertion

• When the values in field f of table T are chosen from the key values in field g of table S, T.f is said to be a foreign key referencing S.g
• If T.f is a foreign key, each of its values must exist in column g of table S (referential integrity)
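Referential integrity can be tested after the fact with a LEFT JOIN anti-join. The query keeps only the rows of T that found no partner in S. This sketch uses the slide's table and column names (T.f, S.g) with made-up rows. It skips NULL values of f, which matches how SQL foreign key constraints treat NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE S (g INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE T (id INTEGER, f INTEGER);
    INSERT INTO S VALUES (10, 'a'), (20, 'b');
    INSERT INTO T VALUES (1, 10), (2, 20), (3, 30), (4, NULL);
    """
)

# Orphans: rows of T whose non-NULL f value has no matching key in S.g.
orphans = conn.execute(
    """
    SELECT T.id, T.f
    FROM T LEFT JOIN S ON T.f = S.g
    WHERE S.g IS NULL AND T.f IS NOT NULL
    """
).fetchall()
print(orphans)  # [(3, 30)]
```

SQLite can also enforce the rule at insert time if you declare a FOREIGN KEY clause on T.f and enable `PRAGMA foreign_keys = ON`. The query above is still useful for auditing data that was loaded before the constraint existed.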

In-process Directives

• Definition directives (labeling information chain members)
• Measurement directives
• Trigger directives

Operational Directives

• Transformation
• Update

Other Rules

• Approximate searching rules
• Approximate matching rules