Company Name Cleaning & Normalization
![Page 1: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/1.jpg)
openprise!
Company Name Cleaning & Normalization
openprise! Cook Book Series
![Page 2: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/2.jpg)
Recipe Overview
This is a recipe for cleaning and normalizing company name data:
• Clean and reformat company names for readability
• Create company-alias master list
• Normalize company name data using master list

You will need the following raw data:
• Company name
![Page 3: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/3.jpg)
• Add a rule by clicking on an existing rule and +.
• Put new data into a new data attribute so you can easily compare before vs. after and confirm the rule is doing what it is supposed to do.
• Some configurations are found by clicking on:
• Can’t see the open reference data? Check the setting in your Data Catalog:
• The company-alias master list is generated using a machine algorithm. It is very accurate but never perfect. It is highly recommended that you review and tweak the master list before using it to normalize company names.
• Experiment with the fuzzy matching algorithm parameters to get the best results.
![Page 4: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/4.jpg)
Pipeline 1, Rule 1: Clean Company Name
TIP: You can choose to remove or expand words like Inc, Corp, and Ltd. We highly recommend removal: the result is easier to read and generates a better master list.
Reference Data
![Page 5: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/5.jpg)
Cleaned Company Names With Inc, Corp, Ltd Removed
Cleaned Company Names With Inc, Corp, Ltd Expanded
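The remove-versus-expand choice in Rule 1 can be illustrated outside the product with a minimal Python sketch. This is not Openprise's implementation: the `SUFFIXES` table, the `clean_company_name` function, and its `mode` parameter are hypothetical names chosen for illustration, and the real reference data is likely much larger.

```python
import re

# Hypothetical suffix table (assumption); the product's reference data is not shown in the slides.
SUFFIXES = {"inc": "Incorporated", "corp": "Corporation", "ltd": "Limited"}


def clean_company_name(raw: str, mode: str = "remove") -> str:
    """Collapse commas and whitespace, then remove or expand a trailing legal suffix."""
    # Treat runs of commas and whitespace as a single space, then split into words.
    tokens = re.sub(r"[,\s]+", " ", raw).strip().split(" ")
    # Compare the last word without a trailing period, case-insensitively.
    last = tokens[-1].rstrip(".").lower()
    # Only touch the suffix when there is at least one other word in the name.
    if len(tokens) > 1 and last in SUFFIXES:
        if mode == "remove":
            tokens = tokens[:-1]
        else:
            tokens[-1] = SUFFIXES[last]
    return " ".join(tokens)


print(clean_company_name("Acme, Inc."))                # Acme
print(clean_company_name("Acme Corp", mode="expand"))  # Acme Corporation
```

Writing the result to a new attribute rather than overwriting the original name keeps the before and after values side by side for review.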
![Page 6: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/6.jpg)
Pipeline 1, Rule 2: Build Master List
Make sure to use the cleaned company name, not the original company name.
Start with these default values. See tuning tips on the next page.
![Page 7: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/7.jpg)
Company-Alias Master List Generated
![Page 8: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/8.jpg)
• The higher the fuzziness index, the more closely the names have to match to be grouped together. For example:
  • “UBS Financial” and “ABC Financial” will match on a high index ~ 0.8
  • “UBS Financial” and “UBC Finland” will match on a lower index ~ 0.3
• The leading index dictates what % of leading text must match for the names to be grouped together. For example:
  • “Department of Motor Vehicles Arizona” and “Department of Motor Vehicles Alabama” will match on an index of 70%
  • “DMV Arizona” and “DMV Alabama” will not match on an index of 70%
• Short names can create many false groupings. Increase the minimum character index to reduce matching on short names. For example:
  • CSC vs. USC, or NBC vs. NBA
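To get a feel for how the three tuning parameters interact, here is a rough Python sketch. It is an approximation under stated assumptions, not the product's algorithm: similarity is measured with the standard library's `difflib.SequenceMatcher`, the leading fraction is the shared prefix length divided by the shorter name's length, and all three checks must pass. The function and parameter names (`would_group`, `fuzziness`, `leading`, `min_chars`) are hypothetical.

```python
from difflib import SequenceMatcher


def leading_match_fraction(a: str, b: str) -> float:
    """Fraction of the shorter name covered by the shared leading text."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    prefix = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        prefix += 1
    return prefix / shorter


def would_group(a: str, b: str, fuzziness: float = 0.8,
                leading: float = 0.0, min_chars: int = 4) -> bool:
    """Return True if two names pass all three checks (an assumed combination rule)."""
    a, b = a.lower(), b.lower()
    # Minimum character check: very short names are never grouped.
    if min(len(a), len(b)) < min_chars:
        return False
    # Leading check: enough of the start of the names must be identical.
    if leading_match_fraction(a, b) < leading:
        return False
    # Fuzziness check: overall similarity must reach the threshold.
    return SequenceMatcher(None, a, b).ratio() >= fuzziness


dmv_long = ("Department of Motor Vehicles Arizona",
            "Department of Motor Vehicles Alabama")
print(would_group(*dmv_long, fuzziness=0.5, leading=0.7))                   # True
print(would_group("DMV Arizona", "DMV Alabama", fuzziness=0.5, leading=0.7))  # False
print(would_group("CSC", "USC", fuzziness=0.5, min_chars=4))                # False
```

With these assumptions the long DMV names share 30 of 36 leading characters (about 83%), while the abbreviated names share only 5 of 11 (about 45%), which is why only the first pair clears a 70% leading index.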
![Page 9: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/9.jpg)
Pipeline 2, Rule 1: Normalize Co. Name
Normalize the cleaned company names produced by Pipeline 1 Rule 1
Reference is the Master List produced by Pipeline 1 Rule 2
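Conceptually, normalizing against the master list is a lookup from each alias to its canonical name. The sketch below assumes the master list can be represented as a simple alias-to-name mapping; the `MASTER_LIST` contents and the `normalize` function are hypothetical examples, not output from the product.

```python
# Hypothetical master list (assumption): lowercased alias -> canonical company name.
MASTER_LIST = {
    "intl business machines": "IBM",
    "international business machines": "IBM",
    "ibm": "IBM",
}


def normalize(cleaned_name: str, master: dict = MASTER_LIST) -> str:
    """Map a cleaned name to its canonical form; leave unknown names unchanged."""
    return master.get(cleaned_name.strip().lower(), cleaned_name)


print(normalize("Intl Business Machines"))  # IBM
print(normalize("Unknown Co"))              # Unknown Co
```

Leaving unmatched names unchanged makes it easy to spot aliases that still need to be added to the master list during review.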
![Page 10: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/10.jpg)
Company Names Cleaned Then Normalized
![Page 11: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/11.jpg)
Recipe Review
Recommendations
• For marketing systems, consider reducing the master list down to only customers and target accounts. It greatly reduces maintenance efforts.

Want to do more? Try the following on your own:
• In addition to normalizing company name, add parent company data to the master list and append pipeline and sales data with parent company information. This enables aggregated reporting and account-based marketing.
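As one way to picture the parent-company extension, the following sketch rolls pipeline amounts up to the parent level. Everything here is illustrative: the `PARENT_OF` mapping, the company names, and the `pipeline_by_parent` function are assumptions, and a real master list would carry the parent as an extra column.

```python
from collections import defaultdict

# Hypothetical parent mapping (assumption): normalized company -> parent company.
PARENT_OF = {
    "Acme Robotics": "Acme Holdings",
    "Acme Labs": "Acme Holdings",
}


def pipeline_by_parent(opportunities, parent_of=PARENT_OF):
    """Sum opportunity amounts per parent; companies without a parent stand alone."""
    totals = defaultdict(float)
    for company, amount in opportunities:
        totals[parent_of.get(company, company)] += amount
    return dict(totals)


deals = [("Acme Robotics", 100.0), ("Acme Labs", 50.0), ("Globex", 25.0)]
print(pipeline_by_parent(deals))  # {'Acme Holdings': 150.0, 'Globex': 25.0}
```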
![Page 12: Company Name Cleaning & Normalization](https://reader034.fdocuments.net/reader034/viewer/2022042216/6259c61156053b23e84412fd/html5/thumbnails/12.jpg)
openprise! Data Automation For Business Users
[email protected] Twitter: @openprisetech www.openprisetech.com
Analytics
Rules Sharing