Posted on 11-Feb-2017
IDENTIFYING DUPLICATE PEOPLE IN LAW ENFORCEMENT RECORDS
mark43
scott crouch, co-founder & ceo
ej bensing, software engineer
A LITTLE BIT ABOUT US
founded: 2012
mission: help our first responders fight violent crime
funded by
THE PROBLEM IN LAW ENFORCEMENT SOFTWARE
HUGE FRAGMENTATION ISSUES
18,000+ U.S. police departments
most software currently on premise
data incompatibility issues
WHAT WE’RE BUILDING
cloud-based records management and analysis platform
A TYPICAL ARREST
domestic violence - aggravated assault
1 suspect, 1 victim
1 gun recovered
avg. # of fields: 344
THE DATA ISSUE
85-150 fields collected for each person that is arrested
NUMEROUS DUPLICATES OF PEOPLE
no universal master record of people
IS THERE A FEDERAL SOLUTION?
NCIC: national warrant/wanted database
IAFIS: master fingerprint ID system
all administered by the FBI, but no master person records
WHAT CAN WE DO ABOUT THIS?
WORKING WITH THE DATA
Washington, D.C. Metropolitan Police Dept.
we had to import:
20,000,000 reports
5,000,000 people
DIFFICULTIES
names are not unique identifiers
data is very siloed
cannot legally merge people automatically
WHAT TO DO?
87.7% of Americans can be uniquely identified by DOB, Gender, and Location [1]
K-Anonymity leads to a Quasi-Identifier [2]
[1] L. Sweeney, “Simple Demographics Often Identify People Uniquely.” Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh, 2000.
[2] L. Sweeney, “k-anonymity: a model for protecting privacy.” International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 2002; 557-570.
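To make the quasi-identifier idea concrete, here is a minimal sketch that measures how many records a (DOB, gender, location) combination singles out. The records, the field names, and the use of a ZIP code as "location" are illustrative assumptions, not data or schema from the talk.

```python
from collections import Counter

# Hypothetical toy records; field names are assumptions for illustration.
records = [
    {"name": "John Smith", "dob": "1980-01-02", "gender": "M", "zip": "20001"},
    {"name": "Jon Smith",  "dob": "1980-01-02", "gender": "M", "zip": "20001"},
    {"name": "Mary Jones", "dob": "1975-06-30", "gender": "F", "zip": "20002"},
    {"name": "Ann Lee",    "dob": "1990-11-15", "gender": "F", "zip": "20003"},
]

# The attribute combination treated as a quasi-identifier.
QUASI_IDENTIFIER = ("dob", "gender", "zip")


def quasi_key(record):
    """Project a record onto its quasi-identifier attributes."""
    return tuple(record[field] for field in QUASI_IDENTIFIER)


def group_sizes(records):
    """Count how many records share each quasi-identifier value."""
    return Counter(quasi_key(r) for r in records)


def k_anonymity(records):
    """Smallest group size: the data set is k-anonymous for this k."""
    return min(group_sizes(records).values())


def unique_fraction(records):
    """Share of records that are alone in their quasi-identifier group."""
    sizes = group_sizes(records)
    unique = sum(1 for r in records if sizes[quasi_key(r)] == 1)
    return unique / len(records)


print(k_anonymity(records))      # smallest group size
print(unique_fraction(records))  # share of singled-out records
```

In a records system the reading is reversed from the privacy setting: two records that share a quasi-identifier value are candidates for being the same person.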
OUR APPROACH
quasi-identifiers to create groups of “likely duplicate” records
generic enough to work on other datasets (property, vehicles, etc.)
string matching
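The approach above could be sketched roughly as follows: group records by a quasi-identifier, then compare names only within each group. This is not Mark43's implementation. The choice of (DOB, gender) as the grouping key, the 0.8 threshold, the field names, and the use of Python's `difflib.SequenceMatcher` for string matching are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    """Quasi-identifier used for grouping (assumed: DOB and gender)."""
    return (record["dob"], record["gender"])


def name_similarity(a, b):
    """Case-insensitive similarity ratio between two names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(records, threshold=0.8):
    """Return (id, id, score) for record pairs that share a block
    and whose names are at least `threshold` similar."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    pairs = []
    for group in blocks.values():
        # Only compare records inside the same group, never across groups.
        for a, b in combinations(group, 2):
            score = name_similarity(a["name"], b["name"])
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs


# Hypothetical records for illustration only.
people = [
    {"id": 1, "name": "John Smith", "dob": "1980-01-02", "gender": "M"},
    {"id": 2, "name": "Jon Smith", "dob": "1980-01-02", "gender": "M"},
    {"id": 3, "name": "Maria Garcia", "dob": "1980-01-02", "gender": "F"},
    {"id": 4, "name": "John Smith", "dob": "1991-07-19", "gender": "M"},
]

for pair in likely_duplicates(people):
    print(pair)
```

Records 1 and 2 are flagged as a likely duplicate pair, while record 4 is not, because it has the same name but a different DOB. The output is a list of candidates for review rather than a merge, and the key function can be swapped to group other record types such as property or vehicles.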
RESULTS
sample data set:
5,000 people
2,000 known duplicates
accuracy of our first try?
80% of duplicates correctly identified
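As a quick worked check of the numbers above: if the 80% figure is read as recall over the 2,000 known duplicates (an assumption, since the slide calls it "accuracy"), the counts work out as below. The variable and function names are illustrative.

```python
def recall(found, known):
    """Fraction of known duplicates that were identified."""
    return found / known


known_duplicates = 2000
reported_recall = 0.80

# Counts implied by the reported figure.
found = round(known_duplicates * reported_recall)
missed = known_duplicates - found

print(found, missed, recall(found, known_duplicates))
```

This figure says nothing about false positives, meaning records flagged as duplicates that are in fact different people.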
MOVING FORWARD
incorporating officer feedback: correct and incorrect matched data
dealing with rollbacks and versioning
real-time recommendations
handling property, evidence, vehicles, locations