SoftAge DDUP
SoftAge Information Technology Ltd. : Confidential, 16 January 2015
SoftAge DDUP: records management software that removes duplicate entries and promotes consistency and data integrity.
De-Duplication
De-duplication is the process of identifying multiple records of the same customer across the whole subscriber base.
• Various techniques can be used to identify such multiple records.
• This process identifies duplicate records on the basis of name, father's name and address.
Report duplicate records, identified by the following (flexible) criteria:
• 80% Name Match
• 80% Father Name Match
• 70% Address Match
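The slides do not say how a match percentage is computed for names; edit distance is mentioned later only for the father's name. As a minimal sketch, assuming the same measure is used for both, a percentage can be derived from the Levenshtein distance relative to the longer string. The function names and the sample names below are illustrative, not taken from the product.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def match_percent(a: str, b: str) -> float:
    """Similarity as a percentage of the longer string (assumed definition)."""
    # Normalise case and whitespace so trivial differences do not count as edits.
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - edit_distance(a, b) / longest)


NAME_THRESHOLD = 80.0  # the "flexible" 80% criterion from the slide

print(match_percent("Ramesh Kumar", "Ramesh Kumaar") >= NAME_THRESHOLD)  # True
```

Keeping the threshold in a named constant is one way to keep the criteria flexible, as the slide describes them.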
Challenges
• It is very difficult to identify duplicate records on the basis of name and address when entries are similar but not identical.
• Catering for spelling mistakes and similar spellings
• Percentage-wise partial match criteria
• Dealing with large databases of 30-100 million records each
The Algorithm
1. Generate names “similar” to the given name.
2. Select the address and father's name from the database for records whose name matches one of the generated names.
3. Keep only records whose father's name matches at least 80% (by edit distance).
4. Keep only records where at least 70% of the tokens in the address match.
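The steps of the algorithm can be sketched end to end as follows. This is an illustration under stated assumptions, not the product's implementation: the slides do not say how similar names are generated, so `similar_names` (hypothetical) produces every string one edit away from the given name; an in-memory dictionary stands in for the database lookup; and the record fields and sample data are invented.

```python
import string


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similar_names(name: str) -> set:
    """Assumed generator: the name itself plus all strings one edit away."""
    name = name.lower()
    letters = string.ascii_lowercase
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return {name} | deletes | swaps | replaces | inserts


def father_name_percent(a: str, b: str) -> float:
    """Edit-distance similarity as a percentage of the longer name."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return 100.0 * (1 - edit_distance(a, b) / longest)


def address_percent(a: str, b: str) -> float:
    """Percentage of tokens in address a that also occur in address b."""
    tokens_a = a.lower().replace(",", " ").split()
    tokens_b = set(b.lower().replace(",", " ").split())
    if not tokens_a:
        return 0.0
    return 100.0 * sum(t in tokens_b for t in tokens_a) / len(tokens_a)


def find_duplicates(record: dict, database: list) -> list:
    # Steps 1-2: look up every generated name (stand-in for a database query).
    index = {}
    for row in database:
        index.setdefault(row["name"].lower(), []).append(row)
    candidates = [row for n in similar_names(record["name"]) for row in index.get(n, [])]
    # Steps 3-4: keep candidates that pass the father's-name and address thresholds.
    return [row for row in candidates
            if father_name_percent(record["father_name"], row["father_name"]) >= 80
            and address_percent(record["address"], row["address"]) >= 70]


database = [
    {"id": 1, "name": "Ramesh Kumar", "father_name": "Suresh Kumar",
     "address": "12 MG Road, New Delhi 110001"},
    {"id": 2, "name": "Ramesh Kumaar", "father_name": "Suresh Kumarr",
     "address": "12 M.G. Road New Delhi 110001"},
    {"id": 3, "name": "Ramesh Kumar", "father_name": "Mohan Lal",
     "address": "45 Park Street, Kolkata"},
]
query = {"name": "Ramesh Kumar", "father_name": "Suresh Kumar",
         "address": "12 MG Road New Delhi 110001"}

print(sorted(row["id"] for row in find_duplicates(query, database)))  # [1, 2]
```

Looking up generated names by exact match keeps the expensive comparisons (edit distance and token overlap) confined to a small candidate set, which matters at the database sizes mentioned under Challenges.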
De-duplication Web Service
• A simple and effective way to incorporate the de-duplication process is to implement it as a web service.
• Whenever a record needs to be de-duplicated, one can simply call the web service via an HTTP request.
• The code was implemented as a web service and hosted on IIS (Internet Information Services).
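A client call to such a service might look like the sketch below. The slides give neither the service URL nor its parameter names, so the endpoint `http://dedup.example.com/api/duplicates` and the query parameters `name`, `fatherName` and `address` are placeholders chosen for illustration. The code only builds the request; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: the real service URL is not given in the slides.
SERVICE_URL = "http://dedup.example.com/api/duplicates"


def build_dedup_request(name: str, father_name: str, address: str) -> Request:
    """Build a GET request carrying the three fields used for matching."""
    # Parameter names are assumptions; urlencode escapes spaces and punctuation.
    query = urlencode({"name": name, "fatherName": father_name, "address": address})
    return Request(f"{SERVICE_URL}?{query}", method="GET")


req = build_dedup_request("Ramesh Kumar", "Suresh Kumar", "12 MG Road New Delhi")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response format would depend on how the service is implemented.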
Thank You
For more details, drop a CorpMail at [email protected]
Or call +919811428984