SoftAge DDUP


Transcript of SoftAge DDUP

Page 1: SoftAge DDUP

SoftAge Information Technology Ltd. : Confidential, 16 January 2015

SoftAge DDUP

Records Management Software which removes duplication of entries and promotes consistency and data integrity.

Page 2: SoftAge DDUP

De-Duplication

Page 3: SoftAge DDUP

De-Duplication

De-duplication is the process of identifying multiple records of the same customer across the whole subscriber base.

• There are various techniques for identifying multiple records.

• This process identifies duplicate records on the basis of name, father's name and address.

Report duplicate records, identified by the following (flexible) criteria:

• 80% Name Match

• 80% Father Name Match

• 70% Address Match
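
The flexible criteria above could be held as adjustable thresholds, as in this minimal Python sketch. The names `MatchCriteria` and `is_duplicate` are hypothetical, and the sketch assumes that all three thresholds must be met and that the match percentages have already been computed elsewhere.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class MatchCriteria:
    """Minimum match percentages; defaults mirror the slide and can be changed."""
    name: float = 80.0
    father_name: float = 80.0
    address: float = 70.0


def is_duplicate(scores: Mapping[str, float],
                 criteria: MatchCriteria = MatchCriteria()) -> bool:
    """Report a pair as duplicate only if every score reaches its threshold.

    `scores` holds precomputed match percentages (0-100) under the keys
    "name", "father_name" and "address". Requiring all three is an assumption.
    """
    return (scores["name"] >= criteria.name
            and scores["father_name"] >= criteria.father_name
            and scores["address"] >= criteria.address)


# Default criteria from the slide.
print(is_duplicate({"name": 85, "father_name": 80, "address": 70}))
# Looser, customer-specific criteria.
print(is_duplicate({"name": 75, "father_name": 75, "address": 60},
                   MatchCriteria(name=70, father_name=70, address=60)))
```

Keeping the thresholds in one object makes the "flexible" part explicit: a different subscriber base can be run with different percentages without touching the matching code.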

Page 4: SoftAge DDUP

Challenges

• It is very difficult to identify duplicate records by name and address when the entries are similar but not identical.

• Catering for spelling mistakes and similar spellings.

• Percentage-wise partial match criteria.

• Dealing with large-volume databases, each ranging from 30 to 100 million records.
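
One way to turn a spelling mistake into a percentage-wise partial match is to scale the Levenshtein edit distance by the length of the longer string. The sketch below is illustrative only: the slides do not give the formula, so the scaling and the function names `edit_distance` and `match_percent` are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions or substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def match_percent(a: str, b: str) -> float:
    """Match percentage in [0, 100], ignoring case and outer whitespace.

    Assumed formula: 100 * (1 - distance / length of the longer string).
    """
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - edit_distance(a, b) / longest)


# One wrong letter in a six-letter name still clears an 80% threshold.
print(round(match_percent("Ramesh", "Rakesh"), 1))
```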

Page 5: SoftAge DDUP

The Algorithm

• Generate names “similar” to the given name.

• Select the address and father's name from the database for records whose name matches one of these generated names.

• Keep only the records where the father's name matches at least 80% (by edit distance).

• Keep only the records where at least 70% of the tokens in the address match.
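
The steps above can be sketched end to end in Python. Several details are assumptions, since the slides do not specify them: "similar" names are taken to be all strings one character edit away from the given name, an in-memory dictionary stands in for the database lookup, the edit-distance percentage is scaled by the longer string, and the address score is the share of the query's address tokens found in the candidate's address. All function and field names are hypothetical.

```python
import string
from collections import defaultdict

ALPHABET = string.ascii_lowercase + " "


def similar_names(name):
    """Step 1 (assumed rule): the name plus every string one edit away."""
    name = name.lower()
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    replaces = {l + c + r[1:] for l, r in splits if r for c in ALPHABET}
    inserts = {l + c + r for l, r in splits for c in ALPHABET}
    return {name} | deletes | replaces | inserts


def edit_distance(a, b):
    """Levenshtein distance via a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def percent_by_edit_distance(a, b):
    """Assumed scaling: 100 * (1 - distance / longer length)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 100.0 if longest == 0 else 100.0 * (1 - edit_distance(a, b) / longest)


def percent_tokens_matching(a, b):
    """Share of the tokens of address `a` that also occur in address `b`."""
    tokens_a = a.lower().replace(",", " ").split()
    tokens_b = set(b.lower().replace(",", " ").split())
    if not tokens_a:
        return 0.0
    return 100.0 * sum(t in tokens_b for t in tokens_a) / len(tokens_a)


def find_duplicates(query, records, father_min=80.0, address_min=70.0):
    # Step 2: look up candidates by generated name (dict stands in for the DB).
    by_name = defaultdict(list)
    for rec in records:
        by_name[rec["name"].lower()].append(rec)
    candidates = [rec for n in similar_names(query["name"])
                  for rec in by_name.get(n, [])]
    # Step 3: keep candidates whose father's name is close enough.
    candidates = [rec for rec in candidates
                  if percent_by_edit_distance(query["father_name"],
                                              rec["father_name"]) >= father_min]
    # Step 4: keep candidates whose address shares enough tokens.
    return [rec for rec in candidates
            if percent_tokens_matching(query["address"],
                                       rec["address"]) >= address_min]


records = [
    {"id": 1, "name": "Rajesh Kumar", "father_name": "Suresh Kumar",
     "address": "12 MG Road, New Delhi 110001"},
    {"id": 2, "name": "Rajesh Kumr", "father_name": "Suresh Kumaar",
     "address": "12 M.G. Road, New Delhi 110001"},
    {"id": 3, "name": "Rajesh Kumar", "father_name": "Mahesh Gupta",
     "address": "12 MG Road, New Delhi 110001"},
    {"id": 4, "name": "Anita Sharma", "father_name": "Vijay Sharma",
     "address": "7 Park Street, Kolkata 700016"},
]
query = {"name": "Rajesh Kumar", "father_name": "Suresh Kumar",
         "address": "12 MG Road New Delhi 110001"}
print(sorted(rec["id"] for rec in find_duplicates(query, records)))
```

Generating name variants first and looking them up exactly means the expensive edit-distance and token comparisons run only on a short candidate list, which matters at the database sizes mentioned earlier. In a real deployment the dictionary would be replaced by an indexed query on the name column.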

Page 6: SoftAge DDUP

De-duplication Web Service

• A simple and effective way to incorporate the de-duplication process is to implement it as a web service.

• Whenever de-duplication of a record is needed, one can simply call the web service via an HTTP request.

• The code was implemented as a web service and hosted on IIS (Internet Information Services).
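
To illustrate the request/response shape only, here is a minimal Python sketch of such a service using the standard library. It is not the original implementation, which was hosted on IIS; the `/dedup` path, the query parameter names, the JSON response layout and the placeholder `find_duplicates` are all assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

REQUIRED_FIELDS = ("name", "father_name", "address")


def find_duplicates(name, father_name, address):
    """Placeholder for the matching logic; this sketch returns no matches."""
    return []


def handle_request(url):
    """Turn a request URL into a (status code, JSON body) pair."""
    parsed = urlparse(url)
    if parsed.path != "/dedup":  # hypothetical endpoint path
        return 404, json.dumps({"error": "not found"})
    params = parse_qs(parsed.query)
    missing = [f for f in REQUIRED_FIELDS if f not in params]
    if missing:
        return 400, json.dumps({"error": "missing parameters",
                                "fields": missing})
    matches = find_duplicates(*(params[f][0] for f in REQUIRED_FIELDS))
    return 200, json.dumps({"duplicates": matches})


class DedupHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = handle_request(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer((host, port), DedupHandler).serve_forever()
```

After calling `serve()`, a client would send a GET request such as `/dedup?name=Rajesh+Kumar&father_name=Suresh+Kumar&address=12+MG+Road` and read the JSON list of duplicates from the response. Keeping `handle_request` separate from the handler class lets the routing and validation be checked without opening a socket.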

Page 7: SoftAge DDUP

Thank You

For more details drop a CorpMail at [email protected]

Or call +919811428984