Data Linkage Project: Florida’s Newborn Screening Program. Gary Sammet, Bureau of Vital Statistics.


Page 1:

Data Linkage Project

Florida’s Newborn Screening Program

Gary Sammet
Bureau of Vital Statistics

Page 2:

Outline

Data Linkage Approach
Start with Probabilistic Linking
Data Linkage Automated Process Flow
Data Processing Design: Linking Variables, Weights, Bonuses, Use of Jaro-Winkler
Data Processing Sample Results

Page 3:

Data Linkage Approach

VS & LAB work closely together
System can accommodate needs
Reduce duplication of efforts
Reconciliation
  All births have a screening record
  All screening records have a birth
Most cost effective with best results

Page 4:

Start With Probabilistic Linking

Identify linking variables - assign initial weight based on understanding & experience w/data
Run initial linking - sort by weight & display linkage flags to see data patterns/anomalies
Adjust weights as needed w/o changing code
Define deterministic rules to ensure consistent linking in automated process
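The idea of adjusting weights without changing code can be sketched as follows. This is only an illustration, not the program's actual implementation: it assumes the weights live in a data table (here a JSON string), and the field names, the `load_weights`, `agreement_flags` and `weighted_score` helpers, and the sample records are all hypothetical. The four weights shown are taken from the weight slides.

```python
import json

# Hypothetical weight table kept as data (a JSON string here; it could equally
# be a file or a database table) so weights can be tuned without code changes.
WEIGHTS_JSON = """
{"time_of_birth": 0.85, "sex": 0.25, "mother_ssn": 0.25, "plurality": 0.10}
"""


def load_weights(text):
    """Parse the weight table into a dict of field name -> weight."""
    return json.loads(text)


def agreement_flags(lab, vs, fields):
    """Return 1 for each field where both records hold the same non-empty value."""
    return {f: int(bool(lab.get(f)) and lab.get(f) == vs.get(f)) for f in fields}


def weighted_score(flags, weights):
    """Sum the weights of the fields that agree."""
    return round(sum(weights[f] for f, hit in flags.items() if hit), 2)


weights = load_weights(WEIGHTS_JSON)
lab = {"time_of_birth": "0412", "sex": "F", "mother_ssn": "", "plurality": "1"}
vs = {"time_of_birth": "0412", "sex": "F", "mother_ssn": "123456789", "plurality": "1"}

flags = agreement_flags(lab, vs, weights)   # linkage flags, useful for review
score = weighted_score(flags, weights)      # 0.85 + 0.25 + 0.10
```

Keeping the flags next to the score is what makes the "sort by weight & display linkage flags" review step possible: a reviewer can see which fields drove each score before deciding whether a weight needs adjusting.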

Page 5:

Data Linkage Automated Process

Drop Indexes
SourceStaging
Start
Truncate Staging Tables
Load Staging Table LABS
LABS.LifeCycle
Unduplicate LABS data based on Original Specimen
Data Conversions for Linking
Create Indexes on Staging
Extract Birth data based on DOB and Index
VitalStat.Birthmaster
HPE.VitalStat.Birthlink
Link Data With Blocking Factor=DOB
Update flag fields
Update weighted score and flags using Jaro-Winkler
Update weighted score adding bonus for combinations
Unduplicate record pairs keeping highest weighted score
VitalStat.Lablink2Undup
Stop
Link Unlinked LAB Data With Blocking Factor=DOB Year and Month
SourceStaging
VitalStat.SourceStaging

KEY: Orange – Data Store
Blue - Process

Deterministic Linking
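Two steps in this flow, blocking and unduplicating record pairs by highest weighted score, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the production process: the row layout, the ISO date strings, and the `candidate_pairs`, `keep_best`, `toy_score` and `link` helpers are hypothetical, and the toy score uses only two of the slide weights (time of birth 0.85, sex 0.25).

```python
from collections import defaultdict

WEIGHTS = {"time_of_birth": 0.85, "sex": 0.25}  # two weights from the slides


def by_dob(row):
    """First-pass blocking key: exact date of birth."""
    return row["dob"]


def by_year_month(row):
    """Second-pass blocking key: DOB year and month (assumes 'YYYY-MM-DD')."""
    return row["dob"][:7]


def candidate_pairs(lab_rows, vs_rows, block_key):
    """Yield only LAB/VS pairs that share the same blocking key value."""
    blocks = defaultdict(list)
    for vs in vs_rows:
        blocks[block_key(vs)].append(vs)
    for lab in lab_rows:
        for vs in blocks.get(block_key(lab), []):
            yield lab, vs


def toy_score(lab, vs):
    """Sum the weights of exactly matching fields (illustrative only)."""
    return round(sum(w for f, w in WEIGHTS.items() if lab[f] == vs[f]), 2)


def keep_best(scored_pairs):
    """Unduplicate pairs: keep the highest-scoring VS candidate per LAB id."""
    best = {}
    for lab_id, vs_id, score in scored_pairs:
        if lab_id not in best or score > best[lab_id][1]:
            best[lab_id] = (vs_id, score)
    return best


def link(lab_rows, vs_rows, block_key):
    scored = ((lab["id"], vs["id"], toy_score(lab, vs))
              for lab, vs in candidate_pairs(lab_rows, vs_rows, block_key))
    return keep_best(scored)


lab_rows = [
    {"id": "L1", "dob": "2010-12-05", "time_of_birth": "0412", "sex": "F"},
    {"id": "L2", "dob": "2010-12-09", "time_of_birth": "1530", "sex": "M"},
]
vs_rows = [
    {"id": "V1", "dob": "2010-12-05", "time_of_birth": "0412", "sex": "F"},
    {"id": "V2", "dob": "2010-12-05", "time_of_birth": "2301", "sex": "F"},
    {"id": "V3", "dob": "2010-12-19", "time_of_birth": "1530", "sex": "M"},
]

first = link(lab_rows, vs_rows, by_dob)           # blocking factor = DOB
unlinked = [r for r in lab_rows if r["id"] not in first]
second = link(unlinked, vs_rows, by_year_month)   # blocking factor = year and month
```

In this toy data L1 finds two candidates on the same date of birth and keeps the higher-scoring one, while L2 has no exact-DOB candidate and is only picked up by the looser year-and-month pass.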

Page 6:

Linking Variables & Weights

Weight  Linking variable
0.85    Time of birth
0.75    Facility Name and Zipcode
0.65    Facility Name w Jaro-Winkler score .899+ and match ZipCode
0.55    Facility Name w Jaro-Winkler score .800-.898 and match ZipCode
0.65    Facility Name w Jaro-Winkler score .899+ and match Facility City
0.55    Facility Name w Jaro-Winkler score .800-.898 and match Facility City
0.65    Facility Name w Jaro-Winkler score .899+ and Facility Address w JW score .85+
0.55    Facility Name w Jaro-Winkler score .800-.898 and Facility Address w JW score .85+
0.65    Facility Address w Jaro-Winkler score .85+ and match Facility City
0.65    Infant Last Name w JW score .899+ match to Last Name/Mother Last Name
0.30    Infant Last Name w JW score .800-.898 match to Last Name/Mother Last Name
0.25    Mother Current Last Name
0.25    Mother SSN
0.25    Mother Address
0.25    Mother Full Name – Bonus

Page 7:

Linking Variables & Weights

Weight  Linking variable
0.25    Sex of Infant
0.25    Infant Full Name – Bonus
0.20    Infant First Name w JW score .76+ and Infant Last Name w JW score .85+
0.20    Mother First Name
0.20    Mother First Name w JW score .76+ and Mother Last Name w JW score .85+
0.20    Mother Address w JW score .85+
0.20    Current Address to Mother Address w JW score .899+ and match City
0.20    Current Address to Mother Address w JW score .85+ and match Mother First Name
0.15    Infant Full Name w JW score of .85+ – Bonus
0.15    Mother Full Name w JW score of .85+ – Bonus
0.10    Father Last Name
0.10    Plurality
0.10    Birth Order
0.05    Infant First Name
0.05    Infant Last Name
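Several of the weights above depend on a Jaro-Winkler (JW) score band. As a rough illustration of how a score could be turned into a tiered weight, here is a plain-Python sketch of the standard Jaro-Winkler similarity followed by the two infant last name tiers (.899+ gives 0.65, .800-.898 gives 0.30). The function names are hypothetical, the prefix scale of 0.1 and the 0.7 boost threshold are the usual Winkler defaults rather than anything stated in the slides, and treating a score just under .899 as the lower tier is an assumption.

```python
def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    # A character matches if the same character sits within the window in s2.
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1, boost_threshold=0.7):
    """Jaro score boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    if score <= boost_threshold:
        return score
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


def infant_last_name_weight(lab_name, vs_name):
    """Map a JW score onto the two infant last name tiers from the table."""
    score = jaro_winkler(lab_name.upper(), vs_name.upper())
    if score >= 0.899:
        return 0.65
    if score >= 0.800:
        return 0.30
    return 0.0


high = infant_last_name_weight("Martha", "Marhta")    # JW about 0.961
mid = infant_last_name_weight("Dixon", "Dicksonx")    # JW about 0.813
none = infant_last_name_weight("Smith", "Lopez")      # no characters in common
```

The same pattern (compute a score, then look up a band) would apply to the facility name and address rows, with the extra exact-match condition on ZipCode or City checked alongside the score.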

Page 8:

Weight Bonuses

DOB, Time of Birth, Sex, Facility + Zipcode (MFirst or MSSN): BONUS = .50
DOB, Time of Birth, Sex, Facility-JW + Zipcode (MFirst or MSSN): BONUS = .40
DOB, Time of Birth, Sex, Facility + Zipcode: BONUS = .20
DOB, Time of Birth, Sex, Facility-JW + Zipcode: BONUS = .15
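The four bonus combinations can be read as a small decision table. The sketch below is one possible reading, not the program's code: the flag names and the `combination_bonus` function are hypothetical, and it assumes the bonuses are mutually exclusive so that only the highest applicable one is added.

```python
def combination_bonus(flags):
    """Return the bonus for a record pair given its agreement flags.

    Assumes only the highest applicable bonus is awarded.
    """
    base = (flags["dob"] and flags["time_of_birth"]
            and flags["sex"] and flags["zipcode"])
    if not base:
        return 0.0
    mother = flags["mfirst"] or flags["mssn"]  # mother first name or mother SSN
    if flags["facility_exact"]:
        return 0.50 if mother else 0.20
    if flags["facility_jw"]:                   # facility matched via Jaro-Winkler
        return 0.40 if mother else 0.15
    return 0.0


pair = {
    "dob": True, "time_of_birth": True, "sex": True, "zipcode": True,
    "facility_exact": False, "facility_jw": True,
    "mfirst": True, "mssn": False,
}
bonus = combination_bonus(pair)  # Facility-JW + Zipcode with MFirst
```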

Page 9:

Variables By % Linked

Variable                  % Linked
DOB                       99.79
SEX                       97.65
BIRTH ORDER               97.39
PLURALITY                 94.40
MOTHER FULL NAME – + JW   94.37
TIME OF BIRTH             93.22
MOTHER FIRST NAME         92.13
MOTHER LAST NAME          88.57
MOTHER FULL NAME          82.54
MOTHER SSN                82.06

Page 10:

Variables By % Linked

Variable                          % Linked
TOTAL FACILITY – JW               78.32
TOTAL MOTHER ADDRESS, CITY – JW   73.08
MOTHER ADDRESS, CITY – JW         56.58
LNAME                             43.59
FACILITY                          41.48
FACILITY – JW                     36.84
FATHER LAST NAME                  35.75
MOTHER ADDRESS, CITY              16.50
MOTHER FULL NAME - JW             11.83
INFANT FULL NAME                  11.70

Page 11:

Linking With Jaro-Winkler

With Exact Facility + Zipcode Match: 41% - Facility & Zipcode must match
With Jaro-Winkler Facility + Zipcode Match: Additional 36.84%
Total Match = 77.84% vs. just 41%

Examples:

LAB FACILITY NAME             VS FACILITY NAME
FLORIDA HOSP ORLANDO – LAB    FLORIDA HOSP ORLANDO
SHANDS AT THE UNIV OF FLA     SHANDS AT UF
BROWARD MED CTR               BROWARD MEDICAL CENTER
SHANDS AT JACKSONVILLE        SHANDS JACKSONVILLE
HOLLYWOOD BIRTH CENTER, INC   HOLLYWOOD BIRTH CENTER

Page 12:

Linking Mother Address & City

Only 16% match on exact mother address & city
Additional 56% match on mother address & city, using Jaro-Winkler
Total Match: 72% vs. just 16%

Examples:

LAB Mother Address    VS Mother Address     LAB City    VS City
2323 SAMSON ROAD      2323 SAMSON RD        ORLANDO     ORLANDO
5105 NE 75TH AVE      5105 NE 75 AVENUE     MIAMI       MIAMI
1001 MAIN ST APT A    1001 MAIN ST APT A    KEY WEST    KEY WEST
532 HORNET CT         532 HORNET COURT      PENSACOLA   PENSACOLA
101 MAGIC CIR         101 MAGIC CIRCLE      TAMPA       TAMPA

Page 13:

Data Processing Results

LAB Data with DOB 12/1-31/2010, Unduplicated On OrigSpecID: 9,211 rows
VS Data with DOB 11/1 – 12/31/2010, Unduplicated on State File Number: 37,741 rows
99% Unduplicated & Linked Records with weighted score > 2.5
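A linked percentage like the one above is the share of unduplicated LAB records whose best candidate scores above the cutoff. A minimal sketch, assuming a mapping from LAB record id to its best weighted score; the `linked_share` function and the sample scores are hypothetical, and only the 2.5 cutoff comes from the slide.

```python
def linked_share(best_scores, total_lab_rows, threshold=2.5):
    """Percent of LAB rows whose best weighted score exceeds the threshold."""
    linked = sum(1 for score in best_scores.values() if score > threshold)
    return 100.0 * linked / total_lab_rows


# Made-up scores for four LAB records; 2.5 itself does not pass "> 2.5".
best_scores = {"L1": 3.4, "L2": 2.5, "L3": 2.9, "L4": 0.8}
share = linked_share(best_scores, total_lab_rows=4)
```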

Page 14:

Overall Linkage Results

98-99% using back-end approach
Still not good enough
Follow Rhode Island front-end approach

Page 15:

Advantages of Front-end Linkage

Provide real-time linkage at hospital with VS Birth Date & NBS demographic data
Reduces data entry by hospital staff
Provide daily report of unlinked/missing records
Provide LAB w/checklist of incoming blood specimens
Reduce follow-up by state staff to hospitals
Allow end-users (hospitals, MDs) ability to view electronic patient reports/results in real-time

Page 16:

Acknowledgements

Ken Jones
Bureau Chief/Deputy State Registrar
Bureau of Vital Statistics

Sharon Dover
Operations Manager
Bureau of Vital Statistics

Paula Stewart
Database Analyst
Health Statistics & Assessment