Data Linkage Strategies
Faculty Disclosure Information
In the past 12 months, I have not had a significant financial interest or other relationship with the manufacturer(s) of the product(s) or provider(s) of the service(s) that will be discussed in my presentation.
This presentation will not include discussion of pharmaceuticals or devices that have not been approved by the FDA.
Acknowledgements
• University of Maine
  – Quansheng Song
  – Cecilia Cobo-Lewis
• Maine Bureau of Health
  – Kim Church
  – Pat Day
  – Ellie Mulcahy
  – Toni Wall
Data Linkage
Data Linkage - Probabilistic
Data Linkage - Inconsistency
Message: Inconsistency detected. Correcting…
Inconsistencies
• A record in EHDI links to two records in another database
• The other source indicates that those two records belong to different people
• How to address this depends on how the other database was processed
EHDI_ID=394: Brad A. Graham
ID=4484: Brad A. Graham
ID=7354: Brad Graham
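The check described above can be sketched in a few lines. This is a minimal illustration, not the presenters' method: it assumes the accepted links are available as hypothetical `(ehdi_id, other_id)` pairs and that the other source supplies its own person assignment for each of its records (the `other_person` mapping and the labels `P1`/`P2` are invented for the example).

```python
from collections import defaultdict

# Hypothetical inputs: links accepted by the EHDI linkage, and the other
# source's own view of which of its records belong to the same person.
links = [(394, 4484), (394, 7354)]
other_person = {4484: "P1", 7354: "P2"}  # other source: two different people


def find_inconsistencies(links, other_person):
    """Return EHDI IDs linked to records the other source assigns to more than one person."""
    persons_by_ehdi = defaultdict(set)
    for ehdi_id, other_id in links:
        persons_by_ehdi[ehdi_id].add(other_person[other_id])
    # An EHDI record should map to a single person in the other source.
    return {e: sorted(p) for e, p in persons_by_ehdi.items() if len(p) > 1}


print(find_inconsistencies(links, other_person))  # {394: ['P1', 'P2']}
```

If the other source was never de-duplicated, `other_person` would simply map each record to itself, so every multi-record link gets flagged; that is one reason the right response depends on how the other database was processed.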
Inconsistencies
• Was the other source not de-duplicated?
• Or was it de-duplicated, but with insufficient evidence to conclude that ID=4484 and ID=7354 are the same person?
  – BD may provide additional information that changes these probabilities
ID=4484: Brad A. Graham
ID=7354: Brad Graham
EHDI_ID=394: Brad A. Graham
Inconsistencies
EHDI_ID=394: John A. Graham
ID=4048: John A. Graham
ID=4048: Jon A. Graham
EHDI_ID=948: Jon A. Graham
ID=9324: Jon Graham
EHDI_ID=948: Jon Graham
• How this "cross-over" is resolved depends on whether one file or neither file is given precedence
• Resolution is also influenced by any probabilistic de-duplication performed after the linkage
EHDI_ID=394: John A. Graham
ID=4048: John A. Graham
ID=4048: Jon A. Graham
EHDI_ID=948: Jon A. Graham
ID=9324: Jon Graham
EHDI_ID=948: Jon Graham
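One way to see the cross-over is to treat every accepted link as an edge and group records transitively. The sketch below assumes (our reading of the slide layout, not something the slides state) that the links are EHDI_ID=394 to ID=4048, ID=4048 to EHDI_ID=948, and ID=9324 to EHDI_ID=948. It uses a plain union-find (disjoint-set) structure, which is our choice of technique, and it corresponds to the case where neither file is given precedence.

```python
def find(parent, x):
    # Find the root of x, halving the path as we go.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    # Merge the groups containing a and b.
    parent.setdefault(a, a)
    parent.setdefault(b, b)
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


# Nodes are (file, id) so EHDI IDs and other-source IDs cannot collide.
links = [
    (("EHDI", 394), ("OTHER", 4048)),  # John A. Graham
    (("EHDI", 948), ("OTHER", 4048)),  # Jon A. Graham
    (("EHDI", 948), ("OTHER", 9324)),  # Jon Graham
]


def crossover_clusters(links):
    """Group linked records transitively; return groups holding more than one EHDI_ID."""
    parent = {}
    for a, b in links:
        union(parent, a, b)
    clusters = {}
    for node in parent:
        clusters.setdefault(find(parent, node), set()).add(node)
    return [c for c in clusters.values() if sum(1 for f, _ in c if f == "EHDI") > 1]


for cluster in crossover_clusters(links):
    print(sorted(cluster))
```

All four records land in one group, so two distinct EHDI_IDs would be merged. If the EHDI file were given precedence instead, 394 and 948 would be kept apart and ID=4048 would have to be assigned to only one of them.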
Linkage Creep
• EHDI database contributes an individual, Catherine A. Sampson
| ID | Source | FirstName | MiddleInitial | LastName | PMatch |
|---|---|---|---|---|---|
| 113 | EHDI | Catherine | A | Sampson | |
Linkage Creep
• Link the Electronic Birth Certificate (EBC)
  – Name is Catherine A. Simpson
  – Are these the same person?
  – Perform probabilistic match
• Require a .85 probability of a match to conclude that two similar records are the same (critical p = .85)
• The probability is .90, so we conclude they are the same person
| ID | Source | FirstName | MiddleInitial | LastName | PMatch |
|---|---|---|---|---|---|
| 113 | EHDI | Catherine | A | Sampson | |
| 113 | EBC | Catherine | A | Simpson | 0.90 |
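The decision rule above can be written as a small ID-assignment step. This is a sketch under assumptions: the incoming record arrives with hypothetical `(existing_id, pmatch)` candidates already scored by some probabilistic matcher (the slides do not say how PMatch is computed), and `next_id=114` is an invented placeholder for a newly issued ID.

```python
CRITICAL_P = 0.85  # critical probability from the slide


def assign_id(candidates, next_id, critical_p=CRITICAL_P):
    """Reuse the best-matching existing ID if its PMatch meets critical_p; else issue next_id.

    candidates: list of (existing_id, pmatch) pairs for the incoming record.
    """
    best = max(candidates, key=lambda c: c[1], default=None)
    if best is not None and best[1] >= critical_p:
        return best[0]
    return next_id


# EBC "Catherine A. Simpson" vs EHDI ID 113 "Catherine A. Sampson": PMatch .90
print(assign_id([(113, 0.90)], next_id=114))  # 113
```

The comparison uses `>=` because the slide phrases the rule as requiring a .85 probability.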
Linkage Creep
• Link Birth Defects Registry (BDR) data
  – Name is Kathy A. Simpson
  – Are these the same person?
  – Perform probabilistic match (require .85)
• PMatch against the EBC record (Catherine A. Simpson) is .90, so we conclude they are the same person
| ID | Source | FirstName | MiddleInitial | LastName | PMatch |
|---|---|---|---|---|---|
| 113 | EHDI | Catherine | A | Sampson | |
| 113 | EBC | Catherine | A | Simpson | |
| 113 | BDR | Kathy | A | Simpson | 0.90 |
Linkage Creep
• If we instead compare Kathy A. Simpson to the original EHDI record, Catherine A. Sampson
  – PMatch = .81
  – Conclude they are NOT the same individual
  – Would not assign the same ID
• Which is correct?
| ID | Source | FirstName | MiddleInitial | LastName | PMatch |
|---|---|---|---|---|---|
| 113 | EHDI | Catherine | A | Sampson | |
| 113 | EBC | Catherine | A | Simpson | |
| 113 | BDR | Kathy | A | Simpson | 0.81 |
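The two answers come from two different joining rules. The sketch below hard-codes the three pairwise PMatch values from the slides (assuming the .90 for EHDI vs EBC also holds in the reverse direction) and compares a rule that joins a record to an ID when it matches any existing member with one that requires a match to every member. The names "single-link" and "complete-link" are the standard clustering terms, not terms the slides use.

```python
CRITICAL_P = 0.85

# Pairwise match probabilities from the slides, keyed by source (symmetric).
PMATCH = {
    frozenset({"EHDI", "EBC"}): 0.90,  # Catherine A. Sampson vs Catherine A. Simpson
    frozenset({"EBC", "BDR"}): 0.90,   # Catherine A. Simpson vs Kathy A. Simpson
    frozenset({"EHDI", "BDR"}): 0.81,  # Catherine A. Sampson vs Kathy A. Simpson
}


def pmatch(a, b):
    return PMATCH[frozenset({a, b})]


def joins_any(new, members, critical_p=CRITICAL_P):
    """Single-link rule: join if the new record matches ANY member (permits creep)."""
    return any(pmatch(new, m) >= critical_p for m in members)


def joins_all(new, members, critical_p=CRITICAL_P):
    """Complete-link rule: join only if the new record matches EVERY member."""
    return all(pmatch(new, m) >= critical_p for m in members)


members_of_113 = ["EHDI", "EBC"]
print(joins_any("BDR", members_of_113))  # True
print(joins_all("BDR", members_of_113))  # False
```

Under the any-member rule the BDR record creeps into ID 113 through the EBC record; under the every-member rule it does not.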
Linkage Creep
• When is this a problem?
  – Over time, two distinct individuals may project “tendrils” composed of combinations of identifiers that statistically overlap in probabilistic space
Linkage Creep
• When is this a problem?
  – Linkage creep will result in the two distinct individuals being erroneously combined under a single ID
Linkage Creep
• When is this not a problem?
  – Over time, certain key identifiers for an individual are expected to change
  – This phenomenon will increase as a historical database grows, and as additional sources are input into a centralized system
Linkage Creep
• Complexity of “creep” in longitudinal datasets
  – Black records are related to all records
  – Yellow and Blue records are NOT related to the White record
  – Yellow record is also not related to Red record at …
Linkage Creep
• Forbidding “creep” will result in a single individual being divided into two IDs over time
• Further challenge: where to divide records into additional IDs?
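Where the split lands can depend on the order in which records arrive. The sketch below reuses the three PMatch values from the Catherine/Kathy example and applies a strict (no-creep) rule: a record joins the first existing ID whose members it all matches at or above .85, otherwise it starts a new ID. The greedy, arrival-order processing is our illustrative assumption, not a procedure from the slides.

```python
CRITICAL_P = 0.85

PMATCH = {
    frozenset({"EHDI", "EBC"}): 0.90,
    frozenset({"EBC", "BDR"}): 0.90,
    frozenset({"EHDI", "BDR"}): 0.81,
}


def pmatch(a, b):
    return PMATCH[frozenset({a, b})]


def cluster_strict(records, critical_p=CRITICAL_P):
    """Greedy no-creep clustering in arrival order; returns a list of ID groups."""
    ids = []
    for rec in records:
        for members in ids:
            # Join only if the record matches every record already under this ID.
            if all(pmatch(rec, m) >= critical_p for m in members):
                members.append(rec)
                break
        else:
            ids.append([rec])  # no existing ID accepts it: issue a new ID
    return ids


print(cluster_strict(["EHDI", "EBC", "BDR"]))  # [['EHDI', 'EBC'], ['BDR']]
print(cluster_strict(["BDR", "EBC", "EHDI"]))  # [['BDR', 'EBC'], ['EHDI']]
```

Both orders split the same three records into two IDs, but the EBC record ends up on a different side of the split each time.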
Tools for Evaluating Linkage
• Inconsistencies can occur in deterministic linkage, but are more common in probabilistic linkage
• The same probabilities that create the potential for problems also provide a valuable tool for evaluating linkages
  – Instead of a Yes/No answer to “are two records the same person?”
  – We get estimates or indices of how likely it is that two records are the same person
• We should be able to estimate the number of erroneous linkages
• It is possible to conduct a detailed examination of quality by ignoring very strong and very weak pairings and focusing only on pairings that are ambiguous
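Both evaluation ideas can be sketched directly from the match probabilities. This assumes the PMatch values are calibrated probabilities that a pair is a true match, which a given matcher may not guarantee. The review band limits (`low=0.60`, `high=0.97`) and the pair labels are hypothetical choices for illustration only.

```python
def expected_false_links(pmatches, critical_p=0.85):
    """Expected number of accepted pairs that are NOT true matches (assumes calibrated p)."""
    return sum(1 - p for p in pmatches if p >= critical_p)


def expected_missed_links(pmatches, critical_p=0.85):
    """Expected number of rejected pairs that ARE true matches (same assumption)."""
    return sum(p for p in pmatches if p < critical_p)


def review_band(pairs, low=0.60, high=0.97):
    """Skip very strong and very weak pairings; keep the ambiguous ones for review."""
    return [(pair_id, p) for pair_id, p in pairs if low <= p <= high]


# Hypothetical scored pairs: (pair label, PMatch)
pairs = [("a", 0.99), ("b", 0.90), ("c", 0.81), ("d", 0.10)]
scores = [p for _, p in pairs]

print(round(expected_false_links(scores), 2))   # 0.11
print(round(expected_missed_links(scores), 2))  # 0.91
print(review_band(pairs))                       # [('b', 0.9), ('c', 0.81)]
```

Summing `1 - p` over accepted pairs gives an expected count rather than a list of which links are wrong; the review band is where clerical effort would be spent to find them.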