SQL 2012 DQS
lynn-langit -
![Page 1: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/1.jpg)
Data Quality Services
@LynnLangit
![Page 2: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/2.jpg)
Breakthrough Insights = Better BI
• Power View
• New semantic server model for SSAS
• Data Quality Services
• Master Data Services
• Column-store Index (10x-100x faster)
• Semantic Search & Filetable
![Page 3: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/3.jpg)
What is Data Quality Services?
A set of tools and services that allow domain experts to improve Data Quality
• Produces result set with suggested improvements
• Does NOT change source data
![Page 4: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/4.jpg)
Why Use DQS?
SME input
• Manually define, match, cleanse
Machine Cleansing
• Programmatically “”, then manually approve
• Can ‘learn’
Integration
• Can incorporate 3rd party data
• Can integrate with other data processes (SSIS)
![Page 5: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/5.jpg)
When to use DQS (scenarios)

| Issue | Detail |
| --- | --- |
| Completeness | Is all information present? |
| Conformity | Is all data in the correct format? |
| Consistency | Do values represent the same meaning? |
| Accuracy | Do data objects represent their real-world values? |
| Validity | Do data values fall within acceptable ranges? |
| Duplication | Are there multiple copies of the same data? |
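
As a rough illustration of some of the issue types above, the sketch below flags completeness, conformity, validity and duplication problems in a few made-up records. The field names, the five-digit postal code pattern and the 0 to 120 age range are assumptions for this example only. It does not show how DQS itself evaluates data.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"name": "Ann Lee", "zip": "98052", "age": 34},
    {"name": "Bob Ray", "zip": "9805", "age": 41},     # zip has wrong format
    {"name": "", "zip": "10001", "age": 29},           # name is missing
    {"name": "Cy Dunn", "zip": "60601", "age": 212},   # age out of range
    {"name": "Ann Lee", "zip": "98052", "age": 34},    # copy of the first row
]

ZIP_PATTERN = re.compile(r"\d{5}")  # assumed format: exactly five digits


def find_issues(rows):
    """Return a list of (row_index, issue_type) pairs."""
    issues = []
    seen = set()
    for i, r in enumerate(rows):
        if not r["name"]:
            issues.append((i, "Completeness"))
        if not ZIP_PATTERN.fullmatch(r["zip"]):
            issues.append((i, "Conformity"))
        if not 0 <= r["age"] <= 120:
            issues.append((i, "Validity"))
        key = (r["name"], r["zip"], r["age"])
        if key in seen:
            issues.append((i, "Duplication"))
        seen.add(key)
    return issues


print(find_issues(records))
```

Consistency and accuracy are left out because they need reference knowledge (synonyms, real-world values) rather than a per-row check.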
![Page 6: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/6.jpg)
DQS Architecture
![Page 7: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/7.jpg)
Installing DQS
SQL Server 2012
BI Edition
Enterprise edition
Not installed by default
Client / Server / SSIS task
Grant 1 of 3 DQS roles on the DQS_Main db
Make your data accessible for SQL operations
Enable TCP/IP for remote DQS
Post Install
Must run ‘DQS Server Installer’ post SQL Install
Do MDS integration
DQS CU1
![Page 8: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/8.jpg)
DQS Components on SQL Server 2012
![Page 9: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/9.jpg)
Data Quality Services client interface
![Page 10: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/10.jpg)
How to Use DQS?
List of Basic Steps
• Create/Refine/Use a Knowledge Base
• Perform a Data Quality Evaluation
• Generate output (results)
List of Components
• DQS Server
• DQS Client(s)
![Page 11: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/11.jpg)
How to Use DQS? Start with the KB
Knowledge Bases
• Can use included KB
• Can refine included KB
• Can create KB from source data
• Can manually create KB
![Page 12: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/12.jpg)
Parts of DQS KB – Domain Management
![Page 13: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/13.jpg)
Adding Domain Values
• Correct
• Error
• Invalid
![Page 14: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/14.jpg)
More on Domain Values
• Link as synonyms
• Set as leading value
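
One way to picture domain values, synonyms and leading values is a lookup: every known value has a type (Correct, Error or Invalid), and synonyms or known errors point at the leading value that replaces them. The sketch below assumes this simplified model. The dictionaries, the sample values and the "Corrected" and "New" status names are illustrative, not the DQS data model.

```python
# Hypothetical domain: each known value and its type.
VALUE_TYPES = {
    "United States": "Correct",
    "USA": "Correct",
    "U.S.": "Correct",
    "Untied States": "Error",    # known misspelling
    "N/A": "Invalid",            # does not belong in this domain
}

# Synonyms and known errors point at the leading value that replaces them.
LEADING_VALUE = {
    "USA": "United States",
    "U.S.": "United States",
    "Untied States": "United States",
}


def cleanse(value):
    """Return (output_value, status) for one source value."""
    value_type = VALUE_TYPES.get(value)
    if value_type is None:
        return value, "New"              # unknown to the domain
    if value_type == "Invalid":
        return value, "Invalid"
    output = LEADING_VALUE.get(value, value)
    status = "Correct" if output == value else "Corrected"
    return output, status


for source in ["United States", "USA", "Untied States", "N/A", "Canada"]:
    print(source, "->", cleanse(source))
```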
![Page 15: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/15.jpg)
Regular or Composite Domains
![Page 16: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/16.jpg)
More about KB Domain Management
• Domain Properties – Description, Language…
• Reference Data – relate to 3rd party data
• Domain Rules – RegEx/length, etc…rule-based
• Domain Values – shows substitute values
• Term-Based Relations – common word corrections
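
Domain rules and term-based relations can be sketched as two small functions: one reports which rules a value fails, the other replaces known terms inside a longer string. The product-code format, the rule names and the term list below are assumptions made up for this example, and the code only mimics the idea. It is not the DQS rule engine.

```python
import re

# Hypothetical domain rules for a product-code domain. Each rule is a
# (description, predicate) pair; a value is valid only if every rule passes.
DOMAIN_RULES = [
    ("length is 8", lambda v: len(v) == 8),
    ("matches AA-00000", lambda v: re.fullmatch(r"[A-Z]{2}-\d{5}", v) is not None),
]

# Hypothetical term-based relations: whole-word replacements.
TERM_RELATIONS = {"Inc.": "Incorporated", "Corp.": "Corporation"}


def failed_rules(value):
    """Return the descriptions of all rules that the value violates."""
    return [name for name, check in DOMAIN_RULES if not check(value)]


def apply_term_relations(text):
    """Replace each whitespace-separated word that has a known relation."""
    return " ".join(TERM_RELATIONS.get(word, word) for word in text.split())


print(failed_rules("AB-12345"))
print(failed_rules("ab-123"))
print(apply_term_relations("Contoso Inc."))
```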
![Page 17: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/17.jpg)
Parts of DQS KB – Knowledge Discovery
![Page 18: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/18.jpg)
Parts of DQS KB – Knowledge Discovery – 1/2
Step two – Running Discovery
![Page 19: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/19.jpg)
Parts of DQS KB – Knowledge Discovery – 2/2
Step three – Correcting Values
![Page 20: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/20.jpg)
DQS KB – Creating a Matching Policy – 1/3
• Select data to be matched for each domain
![Page 21: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/21.jpg)
DQS KB – Creating a Matching Policy – 2/3
• Create matching rules per domain
  • Similar
    • Similarity score is set to 0 when the field's matching score is < 60
    • For numbers, set threshold (% or int)
    • For dates, set threshold (DD, MM or YY)
  • Exact – identical values (score of 100)
• Configure Weight, must sum to 100
• Can configure Prerequisites
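
The rule settings above can be sketched as a weighted score. In this sketch, exact fields score 100 or 0, similar fields get a similarity score that is dropped to 0 when it falls under 60, weights must sum to 100, and a prerequisite mismatch rules the pair out. The rule layout and record fields are hypothetical. Python's `difflib.SequenceMatcher` stands in for the similarity measure, which is an assumption and not the algorithm DQS uses.

```python
from difflib import SequenceMatcher

# Hypothetical matching rule: comparison type and weight per domain.
# Prerequisite domains carry no weight in this sketch.
RULE = {
    "name": {"type": "similar", "weight": 70},
    "city": {"type": "exact", "weight": 30},
    "country": {"type": "prerequisite"},
}


def field_score(kind, a, b):
    """Score one field from 0 to 100."""
    if kind == "exact":
        return 100.0 if a == b else 0.0
    # 'similar': difflib ratio is a stand-in similarity measure.
    score = SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100
    return score if score >= 60 else 0.0   # scores under 60 count as 0


def match_score(rule, rec_a, rec_b):
    """Combine field scores into one 0 to 100 matching score."""
    weights = [cfg["weight"] for cfg in rule.values() if "weight" in cfg]
    if sum(weights) != 100:
        raise ValueError("weights must sum to 100")
    total = 0.0
    for domain, cfg in rule.items():
        if cfg["type"] == "prerequisite":
            if rec_a[domain] != rec_b[domain]:
                return 0.0                 # prerequisite mismatch: no match
            continue
        score = field_score(cfg["type"], rec_a[domain], rec_b[domain])
        total += score * cfg["weight"] / 100
    return total


a = {"name": "Anna", "city": "Seattle", "country": "US"}
b = {"name": "Anna", "city": "Seattle", "country": "US"}
print(match_score(RULE, a, b))
```

A pair would then be treated as a match when the combined score reaches a minimum chosen for the rule.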
![Page 22: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/22.jpg)
DQS KB – Creating a Matching Policy – 3/3
• Test matching rules per domain
• Click ‘Start’
• Review ‘Matching Results’ tabs to compare one or more results
![Page 23: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/23.jpg)
Matching – See Results
Matching is usually performed AFTER cleansing and is focused on identifying (and removing) duplicates
![Page 24: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/24.jpg)
More Matching Output
![Page 25: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/25.jpg)
Using the DQS KB to do Data Cleaning
• Create or Open a Data Quality Project
• Map the DQS KB to the new data
• Perform Cleansing
• Manage / View Results
• Export corrected results
![Page 26: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/26.jpg)
DQS Project -- Cleansing
![Page 27: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/27.jpg)
DQS Cleaning in Process…
![Page 28: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/28.jpg)
DQS Cleaning complete
![Page 29: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/29.jpg)
DQS Cleaning – Manage Results
![Page 30: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/30.jpg)
DQS Output file Information
Export file column names (with option to include "Data and Cleansing Info")
• XXX_Source - original source column value
• XXX_Output - clean column value
• XXX_Reason - reason column value was either valid or invalid
• XXX_Confidence - column confidence percentage returned by the DQS server algorithms
• XXX_Status - column processing status (e.g. Correct, New, Invalid)
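
With those five columns per source column, an exported file can be post-processed with ordinary tooling. The sketch below reads a small, made-up CSV export for a column named City and lists the rows that still need a look. The sample values, the reason text and the confidence format are invented for illustration, and a real export may differ.

```python
import csv
import io

# Hypothetical export for one cleansed column named "City".
EXPORT = """City_Source,City_Output,City_Reason,City_Confidence,City_Status
Seattle,Seattle,Domain value,100,Correct
Seatle,Seattle,Corrected to leading value,95,Corrected
Xyzzy,Xyzzy,Not found in knowledge base,0,New
"""


def rows_to_review(text, column, accepted=("Correct",)):
    """Return (source, output, status) for rows whose status is not accepted."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (r[f"{column}_Source"], r[f"{column}_Output"], r[f"{column}_Status"])
        for r in reader
        if r[f"{column}_Status"] not in accepted
    ]


print(rows_to_review(EXPORT, "City"))
```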
![Page 31: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/31.jpg)
DQS Administration - General
![Page 32: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/32.jpg)
DQS Administration – Reference Data
![Page 33: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/33.jpg)
DQS Administration - Logging
![Page 34: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/34.jpg)
DQS Integration
List of Integration Points
• API? – not at this time
• SSIS task
• MDS (Master Data Services)
![Page 35: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/35.jpg)
DQS Cleansing Task in SSIS
![Page 36: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/36.jpg)
DQS Cleansing Task in SSIS - mapping
For each input column define columns for
• Source – contains input values
• Output – contains correct, corrected or invalid output values
• Status – contains auto suggest, correct, invalid or new
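
A common follow-up to this mapping is to send rows down different paths depending on the status column. The sketch below shows that routing idea in plain Python, outside SSIS. The row layout, the City column name and the exact status strings are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical rows as they might leave the cleansing step for one
# input column "City".
rows = [
    {"City_Source": "Seattle", "City_Output": "Seattle", "City_Status": "Correct"},
    {"City_Source": "Seatle", "City_Output": "Seattle", "City_Status": "Auto suggest"},
    {"City_Source": "??", "City_Output": "??", "City_Status": "Invalid"},
    {"City_Source": "Tukwila", "City_Output": "Tukwila", "City_Status": "New"},
]


def route_by_status(records, status_column):
    """Group rows by status so each group can go to its own destination."""
    routes = defaultdict(list)
    for record in records:
        routes[record[status_column]].append(record)
    return dict(routes)


routes = route_by_status(rows, "City_Status")
print(sorted(routes))
```

For example, rows marked Correct could load straight into the target table, while Auto suggest, Invalid and New rows go to a review table.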
![Page 37: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/37.jpg)
Running Package Status
![Page 38: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/38.jpg)
DQS SSIS Task
![Page 39: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/39.jpg)
DQS SSIS Task Complex Example
![Page 40: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/40.jpg)
What is Master Data Management?
Defining MDS
• Central repository for data
• Rule-based
• Can work with DQS
![Page 41: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/41.jpg)
Types of Data Quality Projects
T-SQL scripts (boolean match)
• Exact matches (WHERE =, WHERE <>, WHERE IN)
• LIKE (% string matching)
Full-text matching (semantic word match)
• CONTAINS
Semantic Search (semantic phrase match)
• SEMANTICSIMILARITYTABLE
SSIS tasks (transactional, multi-valued matching)
• List below
DQS (KB matching)
• Knowledge Base - rules/matches
• Data Quality project - clean / correct data
MDS (One view of truth)
• Versioned Entities, Attributes and Rules
![Page 42: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/42.jpg)
New since RC0
• Import knowledge from projects back into your knowledge base (KB) with Cleanse2KB
• Use the Office speller as part of the DQS client
• Use Composite Domain rules to correct values and to detect rule violations
• Import values from Excel together with their synonyms
• Use unstructured composite domain values? KB parsing is a new feature that takes advantage of your knowledge for more accurate parsing
• Modify server log settings through the client UI
![Page 43: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/43.jpg)
Performance Information
![Page 44: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/44.jpg)
Resources
DQS Team Blog - here
DQS video – here
DQS on TechNet - here
More samples – here
DQS videos (playlist) - here
www.Develop.com
![Page 45: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/45.jpg)
Next Steps
• Install DQS
• Create a KB
• Try out Data Cleansing
![Page 46: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/46.jpg)
Related Session(s)
• SQL BI
  SQL 366 - Understanding Analysis Services in SQL Server 2012
  SQL 422 - Integrating Spreadsheets with Enterprise Data
  SQL 245 - Why Data Warehousing Projects Fail
![Page 47: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/47.jpg)
www.TeachingKidsProgramming.org
Do a Recipe Teach a Kid (Ages 10 ++)
Microsoft SmallBasic Free Courseware (recipes)
![Page 48: SQL 2012 DQS](https://reader033.fdocuments.net/reader033/viewer/2022051014/549cfc99b47959cf318b48bd/html5/thumbnails/48.jpg)
Keep up with Data
Follow me @LynnLangit
RSS my blog www.LynnLangit.com
Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI with SQL Server 2012