



A Recommendation Engine to Aid in Identifying Crime Patterns

Alex Chohlas-Wood and E.S. Levine

New York City Police Department, New York, NY 10038

Police investigators are routinely asked to search for and identify groups of related crimes, known as patterns. Investigators have historically built patterns with a process that is manual, time-consuming, memory based, and liable to inefficiency. To improve this process, we developed a set of three supervised machine learning models, which we called Patternizr, to help identify related burglaries, robberies, and grand larcenies. Patternizr was trained on ten years of manually identified patterns. Problematic administrative boundaries and sensitive suspect attributes were hidden from the models. In tests on historical examples from New York City, the models perfectly rebuild approximately one-third of test patterns and at least partially rebuild approximately four-fifths of these test patterns. The models have been deployed to every uniformed member of the New York City Police Department through a custom software application, allowing investigators to prioritize crimes for review when building a pattern. They are used by a team of civilian crime analysts to discover new crime patterns and aid in making arrests.

law enforcement | pattern recognition | machine learning | decision support

Correspondence: alexcw@stanford.edu, evanlevine@nypd.org

Introduction

Predictive policing can be defined as "the application of analytical techniques—particularly quantitative techniques—to identify likely targets for police intervention and prevent crime or solve past crime by making statistical predictions" (1). These techniques have come into increasingly common usage as the amount of data available to law enforcement has grown (2, 3). Law enforcement agencies have a tradition of using quantitative techniques to inform operations. Pioneering techniques in queuing (4, 5), in resource allocation via hotspots (6), and in organizational management (7) have been developed and implemented in police departments worldwide. Continuing this tradition, a wide variety of predictive policing applications are in use by law enforcement agencies today, including algorithms that forecast geographies at high risk of imminent crime (8, 9), predict recidivism to aid in sentencing (10), or identify individuals at risk for involvement in future gun violence (11). Advanced analytics have also been used in police management, such as optimizing the recruitment process for police applicants (12), tracking the locations of officers on patrol (13), and identifying officers in need of counseling or training (14).

This is an electronic reprint of the original article published by the INFORMS Journal on Applied Analytics, 2019. This reprint differs from the original in pagination and typographic detail.

©2019 City of New York.

One topic that has received less attention is the process of identifying crime patterns. Researchers have outlined potential approaches for using advanced analytics to aid in the identification of crime patterns by connecting similar crimes (15, 16), as analogous recommendation engines have been the subject of much attention in other domains (17); however, we know of no previous production deployment of these algorithms in a law enforcement environment. To aid New York City Police Department (NYPD) investigators in identifying groups of related crimes, we have designed and implemented a recommendation algorithm, which we call Patternizr.

In 2016, people in New York City reported approximately 13,000 burglaries, more than 15,000 robberies, and over 44,000 grand larcenies (18). Evidence from the legal system and criminological research indicates that many of these crimes were committed by serial offenders (19). A series of such crimes, committed by the same criminal, is known as a pattern. The crimes in a pattern may be nearby in space, similar in times and days of occurrence, and (or) alike in method (also known as the modus operandi, or M.O.). Once a pattern has been identified, it is possible for police to use evidence from related crimes to more easily identify and apprehend the perpetrator. Additionally, if police investigators have connected an arrest to multiple crimes as part of a pattern, they can close more investigations and prosecution can proceed with more accurate charges.

However, identifying patterns is not an easy task. Investigators at many police departments follow a manual and memory-based pattern identification process that has not changed much from the days of paper records (20). While reading through a crime report, known as a complaint, an investigator attempts to recall past crimes with similar characteristics. If two complaints seem to the investigator to be similar and could plausibly be committed by the same individual, the investigator links the two complaints into a pattern. Because investigators focus most of their limited time on recent complaints in their own precincts, patterns that occur across significant distances (particularly across multiple precincts) or over long periods are especially difficult to identify. Although modern technology has augmented this memory-based process with simple search engines (21), these searches are typically done using exact categorical matches, rather than searching for broader similarity across many disparate crime characteristics. Collectively, the NYPD still spends hundreds of thousands of hours each year searching for patterns using these methods.


Solution Design

Patternizr is a decision support tool and recommendation engine that consists of a set of machine learning models to make this search process more efficient and effective. The purpose of Patternizr is to take a "seed" complaint, chosen by the analyst, and compare this seed against the hundreds of thousands of recorded crimes in NYPD's crime database. With each comparison between a crime and the seed, Patternizr generates a "similarity score" that quantifies the likelihood that the pair of crimes are in a pattern together. Once all similarity scores have been generated, Patternizr ranks the pairs in descending order by similarity score and returns a list for the analyst to review. The complaints most likely to be in a pattern with the seed complaint appear at the top of the list. After reviewing these ranked results, the analyst can decide which complaints should be grouped together into a pattern.
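A minimal sketch of this score-and-rank loop follows, assuming a hypothetical pairwise feature extractor (`extract_pair_features`) and a trained classifier exposing scikit-learn's `predict_proba` interface; neither name comes from the paper.

```python
import numpy as np

def rank_against_seed(seed, corpus, model, extract_pair_features):
    """Score every complaint in the corpus against a seed complaint and
    return the corpus sorted by similarity score, highest first.

    `model` and `extract_pair_features` stand in for Patternizr's trained
    classifier and pairwise similarity calculations."""
    # One feature vector per (seed, candidate) pair.
    features = np.array([extract_pair_features(seed, c) for c in corpus])
    # Probability that each pair belongs to a pattern together (class 1).
    scores = model.predict_proba(features)[:, 1]
    order = np.argsort(scores)[::-1]  # descending by similarity score
    return [(corpus[i], scores[i]) for i in order]
```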

The models that comprise Patternizr are supervised machine learning classifiers; that is, they are statistical models that learn from historical examples where classifications are known and are then used to predict the classification for samples for which the classifications are unknown. In the case of Patternizr, each example is a pair of crimes, and the classification is whether the two crimes are in a pattern together. The collection of examples used for training is known as a training set. Each model learns to perform classification by incorporating information from a set of attributes known as features, which are selected to represent relevant characteristics about each pair of crimes. Much of the work of building a machine learning model is in identifying and representing features to enable the model to make accurate classifications. We call this step feature extraction. For Patternizr, as we discuss in the Sample Selection section, we chose samples for the training set in such a way that we improved information gain. Finally, we trained a specific type of machine learning model, a random forest, using standard techniques.

We used a separate model for each of three different crime types (burglaries, robberies, and grand larcenies) because these crime types have a sufficient corpus of prior manually identified patterns for use as training examples. This corpus consists of approximately 10,000 patterns between 2006 and 2015 from each crime type. In addition to manually identified official patterns, a portion of this corpus is built from complaint records where the same individual was arrested for multiple crimes of the same type within a span of two days. When these arrest groupings are included with manually identified patterns, each crime type has approximately 30,000 complaints which are included in a pattern, a small portion of the 200,000–400,000 complaints for each crime type recorded over the same ten-year period. In this section, we describe how we constructed the features used in each of the three machine learning models, then discuss how we trained the models that were put into production. Figure 1 depicts the process by which these models were produced. Although we constructed separate models for each of the three crime types, the models rely on a nearly identical set of features and are trained in the same way.

Fig. 1. Process overview for training Patternizr. Historical pattern data are used to choose a sample of complaints. That sample and any corresponding arrests are run through the similarity calculations to build features for learning. The random forest model takes those features and uses the historical patterns for training to produce a production model.

Feature Extraction

We construct the features on which each model relies from the information contained in complaints. A complaint contains a mix of unstructured text describing details of the crime and structured fields about the crime, including the date and time (which can be a range if the precise time of occurrence is unknown), location, crime subcategory, M.O., and suspect information. We use this information to calculate the five types of crime-to-crime similarities used as features by Patternizr: location, date-time, categorical, suspect, and unstructured text.

We selected these similarities by discussing pattern discovery with NYPD investigatory and enforcement personnel with many years of experience manually building patterns. They described the data sources they have used to identify patterns, as well as specific pieces of information that led them to believe that two complaints were connected. Additionally, they discussed many specific patterns they had identified and the unique set of circumstances that connected each group of crimes. In the course of these discussions, we also learned much about the inefficiencies that are inherent in manual pattern creation, as we detail in the introduction, and that could be reduced with the help of an automated tool.

Because Patternizr predicts the likelihood of a complaint pair belonging together in a pattern, the set of features read by the model (Table 1), which we call K, is derived from pairwise comparisons between crimes i and j. Each of these pairwise comparisons, which we call X^k, is a similarity calculation between a selected attribute k of the pair's complaint reports. The mathematical details of our more complicated features can be found in Appendix A.

For date-time, location, and categorical attributes, we used similarity computation methods previously described in related work on Cambridge Police Department data (15). However, we modified that formulation by excluding categorical information corresponding to administrative geographies (i.e., a feature noting that the pair exists in the same precinct) to encourage matches across such boundaries. Date-time and location attributes are useful features because detectives' experiences have shown that complaints in patterns are likely to happen nearby in space and time. Categorical attributes are important at identifying a consistent M.O.


No. | Attribute | Type | Similarity measure
1 | Location (x, y) | Location | Distance apart (Euclidean)
2 | Location (x, y) | Location | Distance apart (exponential)
3 | Location (x, y) | Location | Binned midpoint (longitudinal)
4 | Location (x, y) | Location | Binned midpoint (latitudinal)
5 | Location (x, y) | Location | Location frequency (sum)
6 | Location (x, y) | Location | Location frequency (maximum)
7 | Location (x, y) | Location | Location frequency (minimum)
8 | Location (x, y) | Location | Location frequency (product)
9 | Date-time of occurrence | Date-time | Time of day similarity
10 | Date-time of occurrence | Date-time | Time of week similarity
11 | Date-time of occurrence | Date-time | Size of occurrence windows (larger)
12 | Date-time of occurrence | Date-time | Size of occurrence windows (smaller)
13 | Date-time of occurrence | Date-time | Size of occurrence windows (difference)
14 | Date-time of occurrence | Date-time | Days apart
15 | Date-time of arrest | Date-time | Days between arrest and crime
16 | Premise type | Categorical | Categorical similarity
17 | Crime classification | Categorical | Categorical similarity
18 | M.O. categorical fields | Categorical | Categorical similarity
19 | M.O. weapon | Categorical | Categorical similarity
20 | Location details | Categorical | Categorical similarity
21 | Firearm discharged | Categorical | Categorical similarity
22 | Crime attempted or completed | Categorical | Categorical similarity
23 | Domestic crime indicator | Categorical | Categorical similarity
24 | Medical assistance required | Categorical | Categorical similarity
25 | Suspect height(s) | Suspect | Group similarity (continuous)
26 | Suspect weight(s) | Suspect | Group similarity (continuous)
27 | Suspect force used | Suspect | Group similarity (categorical)
28 | Suspect count | Suspect | Difference
29 | M.O. victim count | Suspect | Difference
30 | Complaint narrative (burg. only) | Unstructured text | Average similarity
31 | Complaint narrative (burg. only) | Unstructured text | Sum of similarity
32 | Complaint narrative (rob. and GL) | Unstructured text | Cosine similarity
33 | M.O. suspect statement (rob. and GL) | Unstructured text | Cosine similarity
34 | M.O. victim actions (rob. and GL) | Unstructured text | Cosine similarity
35 | M.O. method of flight (rob. and GL) | Unstructured text | Cosine similarity
36 | Premise name (rob. and GL) | Unstructured text | Cosine similarity
37 | Property taken (rob. and GL) | Unstructured text | Cosine similarity
38 | All unstructured text | Unstructured text | Cosine similarity
39 | Complaint narrative | Unstructured text | Rare-word matches

Table 1. List of the features included in Patternizr. These features are indexed by the variable k in the text. We use the following abbreviations: "burg." represents burglary, "rob." represents robbery, and "GL" represents grand larceny.

We also included metrics to compare the descriptions of groups of suspects (e.g., the count of suspects who have committed a crime); however, we excluded sensitive suspect attributes (e.g., race). Suspect descriptions are particularly useful in identifying patterns of violent crimes, such as robberies, because in such crimes the victim physically encounters the suspect. Finally, we added features for the similarity of unstructured text narratives, including the cosine similarity of vector representations of the descriptions, as well as the number of rare-word matches between two descriptions. These features were included as a reflection of the central role that unstructured text plays in the manual identification of patterns.

Many features represent similar but distinct representations of the underlying data. This allows the algorithm to leverage as much available information as possible. We constructed three slightly different feature sets, one for each crime type, based on the characteristics relevant, available, and found to be important for that crime type. For example, the M.O. categorical field for "method of entry" is relevant to burglaries but not to robberies or grand larcenies.


Location Features. We incorporate eight similarities (labeled with indices k = 1 through k = 8) derived from location information, including a strict Euclidean distance d (in feet) between the crime pair and an exponential factor e^(−d). We also include two location features corresponding to longitude and latitude, but binned to very coarse squares (over nine miles on a side, for a total of roughly 11 boxes possible across the geography of New York City) to minimize the possibility that higher-resolution geographies could be used by the model as a proxy variable for sensitive attributes (such as race). For each pair of crimes, we note which bin contains the pair's midpoint and include this as a feature. We also use a two-dimensional location frequency distribution calculated with kernel density estimation (KDE) (22), which represents a heatmap of where crime is most (and least) common in the city. For each pair, we add a feature for the sum, maximum, minimum, and product of these KDE values, providing the model with a range of information on how common the locations were for both crimes.
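A sketch of these eight location features is below. The bin size and KDE bandwidth are assumptions for illustration (the paper specifies only "over nine miles on a side"), and the raw e^(−d) factor would likely be rescaled in practice.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

BIN_FEET = 9.5 * 5280  # assumed coarse bin size (~9.5 miles per side)

def location_features(xy_i, xy_j, kde):
    """Sketch of the location features (k = 1..8) for one crime pair.
    `xy_i`, `xy_j` are (x, y) coordinates in feet; `kde` is a fitted
    two-dimensional KernelDensity model of citywide crime locations."""
    xy_i, xy_j = np.asarray(xy_i, float), np.asarray(xy_j, float)
    d = float(np.linalg.norm(xy_i - xy_j))        # Euclidean distance (k = 1)
    midpoint = (xy_i + xy_j) / 2.0
    lon_bin, lat_bin = (midpoint // BIN_FEET).astype(int)  # coarse bins (k = 3, 4)
    # KDE "heatmap" values at each crime location; k = 5..8 combine them.
    dens = np.exp(kde.score_samples(np.vstack([xy_i, xy_j])))
    return {
        "distance": d,
        "exp_distance": np.exp(-d),  # exponential factor (k = 2); a length scale is likely applied in practice
        "mid_lon_bin": lon_bin, "mid_lat_bin": lat_bin,
        "kde_sum": dens.sum(), "kde_max": dens.max(),
        "kde_min": dens.min(), "kde_prod": dens.prod(),
    }

# Fitting the citywide KDE on historical crime coordinates (assumed bandwidth):
# kde = KernelDensity(bandwidth=1000.0).fit(all_crime_xy)
```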

Date-Time Features. Seven separate similarities (k = 9 through k = 15) depend on the date-time attributes of the two complaints. The time-of-day similarity measure, which compares the time of day of the two crimes' occurrences, is drawn directly from Wang et al. (15). Following their approach, we calculate the frequency distribution of occurrence times for each crime type using KDE. Then, for each pair, we weight the difference between occurrence times by the frequency of those occurrence times. Crimes that occur close in time to each other and at rare times will have a low measure, indicating greater similarity between crime times. Crimes that occur far apart in time and during common times will have a high measure, indicating lower similarity between crime times. Alternatively, crimes that are a certain distance apart in time but occur during rare times will be more similar than crimes that are the same distance apart in time but occur during common times. This calculation can also be applied to occurrence times specified over an occurrence window, which is typical for crimes such as burglaries, where the victim often is not present to witness the crime and therefore knows only that the crime happened within some time span. Comparisons across occurrence windows are made by dividing the window evenly into ten points and averaging similarities across all ten points. We applied the same process to time-of-week similarities. In addition, we included features for the duration of the larger and smaller occurrence windows from the pair of crimes, as well as the difference in size of the two occurrence windows. We included a simple feature, days apart, that we calculated between the midpoints of the time ranges if the occurrence time was over a range. Finally, we included a feature for the time between an arrest associated with the earlier crime and the occurrence of the later crime, so that the model could have information about whether a criminal was potentially incarcerated.
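One plausible reading of the frequency-weighted time-of-day measure, sketched under the assumption that "weighting" means multiplying the circular hour gap by the KDE frequency of the two times, and that window comparisons average over ten evenly spaced points per window:

```python
import numpy as np

def time_of_day_similarity(window_i, window_j, tod_kde, n_points=10):
    """Sketch of the KDE-weighted time-of-day measure (after Wang et al.).
    Each window is (start_hour, end_hour); `tod_kde` is a KernelDensity
    model fit on historical occurrence hours for this crime type.
    Lower values indicate more similar (rarer, closer) occurrence times."""
    pts_i = np.linspace(*window_i, n_points)
    pts_j = np.linspace(*window_j, n_points)
    vals = []
    for a in pts_i:
        for b in pts_j:
            gap = min(abs(a - b), 24 - abs(a - b))  # circular hours apart
            # How common these occurrence times are, per the fitted KDE.
            freq = np.exp(tod_kde.score_samples([[a], [b]])).mean()
            vals.append(gap * freq)  # rare times close together => low measure
    return float(np.mean(vals))
```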

Categorical Features. Much of the information in a complaint is in categorical form, including premise type, crime classification, the M.O. itself, weapon type, and details about the crime's location. Other information is stored in categorical fields that have yes/no/unknown structures; examples include whether a firearm was discharged during the crime, whether the crime was attempted or completed, whether someone required medical assistance, and whether the crime was domestic in nature. For all our categorical features (k = 16 through k = 24), we incorporate a simplified version of Goodall's similarity measure (23), identical to that used by Wang et al. (15). This measure accounts for the frequency of categorical values, causing two records that match on a rare value to be rated as more similar than two records that match on a common value. For categorical values that may take on multiple values for a single crime (e.g., multiple weapon types may be associated with a single robbery), possible values are encoded as dummy variables (22) before the categorical similarity calculation. In addition, many categories are too fine grained to be useful, because Goodall's simplified similarity measure is only nonzero when categories match exactly on both complaints. For example, NYPD crime classifications for burglaries include three separate categories for residential burglary, even though a perpetrator with the same M.O. may plausibly commit crimes across all three types. To address this issue, we asked uniformed police experts to group the existing classification into broader categories using domain knowledge. We included both the ungrouped and grouped crime classification similarities as features for the algorithm.

Suspect Features. We chose to include only nonsensitive suspect attributes, including height, weight, force used, and count of suspects, as suspect features for Patternizr. Suspect-attribute comparison differs from categorical comparison because it is often a many-to-many comparison for a crime pair. It is generally not possible to know from two lists of suspects which descriptions correspond to the same individual; therefore, comparisons must be made between all possible pairs of individuals (with each list providing one of the pair). For example, for the height and weight continuous variables (k = 25 and k = 26), we calculated the minimum difference between each possible pair of suspect descriptions and then divided the sum of differences by the quantity of suspects. For the categorical feature of force used (i.e., whether the suspect threatened to use force or actually used force in the commission of a robbery), we counted and tabulated the number of matches between possible suspect pairs (k = 27) in a similar fashion. We also included the difference in suspect count and victim count (which is a characteristic of the M.O.) as features (k = 28 and k = 29).
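A sketch of the continuous group similarity (k = 25, 26), written directly from the formulation reconstructed in Appendix A; the example reuses the suspect weights from that appendix.

```python
import numpy as np

def group_similarity_continuous(values_i, values_j):
    """Sketch of the suspect height/weight comparison (k = 25, 26).
    For each suspect in one complaint, find the closest-matching suspect
    in the other, sum those minimal differences, normalize by the total
    suspect count, and take the better of the two directions."""
    def summed_min_diffs(a, b):
        return sum(min(abs(x - y) for y in b) for x in a)
    total = len(values_i) + len(values_j)  # P_i + P_j
    return max(np.exp(-summed_min_diffs(values_i, values_j) / total),
               np.exp(-summed_min_diffs(values_j, values_i) / total))

# Appendix A example: minimal weight differences are 10, 30, and 5 lbs.
group_similarity_continuous([140, 180, 220], [150, 215])
```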

Unstructured Text Features. We also created a number of unstructured text features, including comparisons of the complaint text narrative, which plays a central role in the manual identification of patterns. Unstructured text also comprises the suspect's statements, victim actions, method of flight, premise name, and property taken. For burglaries, we calculate a word-by-word score identical to the categorical similarities, but on a word-by-word basis rather than attribute basis, for words that match between the narratives on both complaints. We then provide an average and sum of these similarities (k = 30 and k = 31).

For robberies and grand larcenies, we use a more advanced method on other unstructured fields in addition to the narratives. We calculate the term-frequency/inverse-document-frequency (TF-IDF) (24) vector representing the presence and importance of the set of words that comprise each unstructured text. We compute a cosine similarity between the pair of TF-IDF vectors for each unstructured text attribute (labeled with indices k = 32 through k = 37).

We also combine all unstructured text from each complaint and treat it as a separate unstructured text attribute; this permits information that appears fragmented across different narratives to be compared at once across all possible sources. For this combined narrative, we calculate the cosine similarity (k = 38) and also count the number of rare words that did not appear in the training corpus but do appear in both complaints, and we include that count as a feature (k = 39). These unusual words may indicate the presence of a consistent and unusual M.O.
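The sketch below uses scikit-learn's TfidfVectorizer as a stand-in for the TF-IDF formulation in Appendix A (its idf smoothing differs slightly from the paper's log(N/df) + 1), and treats "rare words" as whitespace tokens outside the training vocabulary, which is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_text_features(training_narratives):
    """Returns a function computing the cosine similarity (k = 32..38)
    and rare-word match count (k = 39) for a pair of text fields."""
    vec = TfidfVectorizer().fit(training_narratives)
    vocab = set(vec.vocabulary_)

    def text_features(text_i, text_j):
        tfidf = vec.transform([text_i, text_j])
        cosine = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
        # Rare-word matches: words absent from the training corpus that
        # nonetheless appear in both complaints.
        shared = set(text_i.lower().split()) & set(text_j.lower().split())
        return cosine, len(shared - vocab)

    return text_features
```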

Sample Selection

A very small proportion of all possible crime pairs (about 8×10^−8, 3×10^−8, and 9×10^−9 of possible burglary, robbery, and grand larceny pairs, respectively) are together in manually identified patterns. We refer to these crime pairs as positive pairs. To preserve maximum information gain from the corpus of patterns while also permitting model training within memory constraints, we included all positive pairs in the training set and performed down-sampling on the set of possible negative pairs.

Purely random sampling of negative pairs would largely consist of pairs that are far apart in space and time, encouraging the model to identify positives based solely on proximity in those dimensions. To counteract this effect, we sampled nearby negative pairs at a higher rate than distant negative pairs. We paired every pattern crime with a random sample of 2% of all negative examples within 3,500 feet and an additional random sample of 0.1% of all crimes within 80 days of the crime's midpoint in time. These were sampled uniformly: all negative examples within 3,500 feet or within 80 days of the crime's midpoint in time were equally likely to be selected. We repeatedly tested different thresholds until models trained on these data rated nonspatial or nontemporal features roughly as important as distance and time apart (as measured by feature importances). Finally, a substantial portion of distant negative pairs that included one crime known to be part of a pattern was also selected at random: approximately 20 times the volume of already selected positive and nearby pairs for that crime.

We subsequently uniformly sampled this set to fit in memory for model training. We trained the model for each crime type on 20–32 million pairs of crimes. For burglaries, robberies, and grand larcenies, 0.2%, 0.08%, and 0.06% of this reduced set of pairs were positive, respectively.
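A sketch of the nearby-negative down-sampling, using the rates and thresholds quoted above; the precomputed `dist_ft` and `days_apart` arrays are hypothetical inputs aligned with `candidates`.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_negatives(pattern_crime, candidates, dist_ft, days_apart):
    """Uniformly sample 2% of negative candidates within 3,500 feet and
    0.1% of candidates within 80 days of the pattern crime's midpoint."""
    nearby = np.flatnonzero(dist_ft <= 3500)
    recent = np.flatnonzero(days_apart <= 80)
    keep = np.concatenate([
        nearby[rng.random(nearby.size) < 0.02],   # 2% of spatially near pairs
        recent[rng.random(recent.size) < 0.001],  # 0.1% of temporally near pairs
    ])
    return [(pattern_crime, candidates[i]) for i in np.unique(keep)]
```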

Model Training

For each of the three crime types, we relied on a random forest model to calculate an overall similarity score between two crimes. We chose a random forest model for several reasons. First, tree-based classification algorithms are able to learn interactions between features, which is important when considering how separate crime subtypes could exhibit different similarity structures. For example, grand larceny pickpocketing patterns may generally occur much closer in space (e.g., on a specific street corner) when compared with grand larceny shoplifting patterns, which may generally happen across the city. Second, random forests have demonstrated an ability to achieve impressive performance while avoiding overfitting (25). Third, random forests are implemented in many standard machine learning libraries, can be trained and executed in parallel, and do not require graphics processing units to execute. Finally, similar to many other classification algorithms, random forests generate a "score"; in Patternizr's case, this represents the confidence that the pair belongs together in a pattern. These similarity scores are scaled between 0 and 1; therefore, they can be naturally represented to an investigator in the form of a percentage.

We used the standard implementation of the random forest algorithm in the Python library scikit-learn (26). Each random forest model is an ensemble of hundreds of decision trees that are trained on the calculated features and responses. We trained each individual tree on an independently bootstrapped sample of the training set, with a randomized portion of the full set of features K. We describe the mathematical details of how we trained the trees in Appendix B.

Each random forest model also has a collection of settings, known as hyperparameters, that specify model structure and operation. We optimized several hyperparameters for the random forest model, including the size of the randomized subset of features and the maximum allowed tree depth. We used a cross-validated randomized hyperparameter search (27), which repeatedly selects a random combination of hyperparameters from a prespecified distribution of values and then tests the performance of this combination on several subsets of the data, preventing overfitting to any particular subset.

Algorithm | Completely rebuilt patterns (no.) | Completely rebuilt patterns (%) | Patterns with match in top 10 (no.) | Patterns with match in top 10 (%)
Burglary | 55 (43–67) | 37 (29–45) | 123 (114–132) | 82 (76–88)
Robbery | 39 (29–49) | 26 (19–33) | 117 (107–126) | 78 (71–84)
Grand larceny | 41 (30–51) | 27 (20–34) | 114 (103–124) | 76 (69–83)

Table 2. Accuracy of Patternizr on 150 test patterns. The average performance is listed first, followed by the 95% bootstrapped confidence interval of the performance in parentheses.


Fig. 2. Performance of Patternizr on 150 test patterns. Each position on the x axis corresponds to a test pattern, and each dot corresponds to the ranking of a crime in that pattern. Patterns are ordered left to right by the average precision score for that pattern as a visual aid; Patternizr did best on the patterns that appear to the left side of each chart. Dashed lines indicate the proportion of test patterns with complete reconstruction (left of the vertical red line) and the patterns with a match in the top ten rankings (under the horizontal orange line). Roughly one-third of the test patterns are completely reconstructed (as we also detail in Table 2); that is, no complaints outside of the pattern are ranked higher than a complaint included in the pattern, and roughly four-fifths of the test patterns have at least one of the complaints in the pattern ranked in the top ten.

For each crime type, we trained one random forest model with up to 220 trees, using the generated features and tuned hyperparameters.

We set our hyperparameter search to optimize hyperparameters for the best possible average precision (28), an evaluation metric that gauges whether desired matches are ranked highly in a query. In Patternizr's case, the query is the list of complaints returned by Patternizr when the user chooses a seed. Average precision relies on the precision metric, which measures the proportion of a predicted positive class that is actually positive. For Patternizr, precision is the percentage of suggested matches that are actually in a pattern with the seed. Each true match in Patternizr's ranked list can be represented by the precision at that rank: the number of true matches encountered so far divided by the rank (i.e., the total number of suggested matches provided by Patternizr at that point). Average precision, therefore, is simply an average of these precisions for all true matches in the test pattern.
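A sketch of this training setup with scikit-learn, the library the paper names: a random forest tuned by cross-validated randomized search, scored by average precision. The search ranges and iteration count are illustrative assumptions, not the NYPD's actual values.

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=220, n_jobs=-1),
    param_distributions={
        "max_features": randint(3, 20),  # size of the randomized feature subset
        "max_depth": randint(5, 40),     # maximum allowed tree depth
    },
    n_iter=20,                     # number of random combinations tried
    scoring="average_precision",   # gauges whether true matches rank highly
    cv=3,                          # several subsets guard against overfitting
)
# search.fit(X_pairs, y_in_pattern)  # pairwise features and in-pattern labels
```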

Performance

We tested the performance of Patternizr on a variety of objectives. First, we measured the proportion of test patterns that were entirely or partially rebuilt by the models, even though the models had not previously seen these test patterns. Second, we constructed a benchmark model as a simplified proxy for existing practice to test whether investigators using Patternizr would be more efficient than those using existing methods. Third, we examined the features each model had isolated as most useful in identifying patterns and compared those with investigators' intuition. Finally, we measured whether Patternizr made consistent recommendations across racial groups.

We validated all three Patternizr models on official patterns that were not included in the training set. For each crime type, we randomly chose 150 test patterns and randomly selected a seed from each pattern. We then noted the rankings of the remaining pattern crimes as provided by Patternizr's similarity scores. We bootstrapped this process by randomly selecting these 150 test patterns 1,000 times, which provided reliable estimates and uncertainty bounds. Table 2 shows the results from this analysis.
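A sketch of the bootstrap behind Table 2's intervals, assuming a boolean array with one entry per test pattern indicating complete reconstruction:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_rebuild_rate(per_pattern_complete, n_boot=1000):
    """Resample the test patterns with replacement and report the mean
    rate of completely rebuilt patterns with a 95% interval."""
    arr = np.asarray(per_pattern_complete)
    stats = [rng.choice(arr, size=arr.size, replace=True).mean()
             for _ in range(n_boot)]
    return np.mean(stats), np.percentile(stats, [2.5, 97.5])
```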

The algorithm completely rebuilt (i.e., ranked all in-pattern pairs ahead of all nonrelated pairs) approximately one-third of test patterns. Additionally, approximately four-fifths of burglary, robbery, and grand larceny patterns had at least one match in the top ten rankings out of the entire corpus of complaints for that crime type. Figure 2 shows the rankings for all test patterns.

Fig. 3. Performance of Patternizr against a simple baseline. Each dot represents a pair of complaints, where one half of the pair is a seed from the test set and the other half of the pair is another complaint in that pattern. Model performance is better than baseline performance for each pair of sample complaints that lies below the diagonal line; that is, the algorithm gives a ranking closer to the top of the list than the baseline algorithm.


Fig. 4. Feature importances for Patternizr. Feature importance is one way to gauge the relative contribution of different features to the overall classification of a random forest. Feature importances are calculated by measuring the Gini impurity reduction at each node at which that feature is used, weighing these reductions by the number of samples that were routed to that node, and then averaging these calculations for each feature across all trees in the forest in which that feature is used.

In reality, this performance is likely an underestimation, given that investigators have likely missed actual matches in the historical corpus.

To test whether Patternizr improves investigator efficiency, we built a simple baseline algorithm for comparison. The baseline algorithm assumes that investigators examine complaints nearest in time to the seed complaint first, while always limiting themselves to complaints in the same precinct as the chosen seed. This method provides a baseline ranked list that we can compare with Patternizr's ranked results. We restricted the comparison to local patterns in the test set because the baseline algorithm would perform poorly (and unrealistically) on patterns that span multiple precincts. We compare the ranking of each such complaint under the baseline with how it appears in the ranked list generated by Patternizr for all the local patterns in our test data set (Figure 3).

We found that, for all three crime types, Patternizr is typically an improvement on the baseline algorithm; this estimate is conservative because Patternizr is especially helpful on patterns spanning multiple precincts.

In Figure 4, we show the top ten feature importances for each model. The features for distance, days apart, and narrative information play a top role in all three models. NYPD's pattern subject matter experts confirmed that these three features contain key information for pattern creation. Other top features include similarity between other unstructured text fields and time-of-day and day-of-week similarity. None of the top ten features for each crime type was particularly surprising to our subject matter experts. Note that these calculations only measure importance; they do not indicate how positive or negative changes in the feature will affect the model score.

It is important to measure whether the recommendations that Patternizr makes are not only valuable but also fair, particularly given the growing concern that predictive policing tools may perpetuate disparate impact (29–31). We intentionally designed the algorithm to minimize disparate impact on any specific group. First, the algorithm is completely blind to sensitive information about potential suspects, including race and gender, which were not included as similarity features for the predictive model. Second, we kept potential proxy variables for sensitive information, particularly location, extremely coarse to ensure correlation with sensitive attributes had a very low degree of certainty, while retaining some very general information about location. Finally, and most important, several levels of expert human review are still required to establish a pattern, minimizing the potential for a seemingly likely (but incorrect) recommendation to result in any enforcement action.

To confirm that these precautions were effective, we tested whether any of the three models recommended pairs with a certain racial group at a different frequency than either random pairing or existing identified patterns. We randomly selected 5–10 million test pairs from each crime type and drew 1,000 bootstrapped samples from each of these samples to estimate uncertainty (Figure 5).

Fig. 5. Proportion of pairs by race for random, historical pattern, and Patternizr-recommended pairs. Confidence intervals are calculated from the range of values that appear in 1,000 bootstrapped samples. Racial categories correspond to the official descriptions listed on NYPD complaint forms. Notably, many types of grand larceny will not contain suspect information because there were no witnesses to the crime, thus producing relatively higher levels of uncertainty for grand larceny than for the other two crime types.


Fig. 6. Screenshot of Patternizr in the NYPD Domain Awareness System. The map of the seed and most similar complaints is in the left panel, the seed complaint is in the middle panel, and the most similar complaints are in the right panel. Grey boxes obscure law-enforcement-sensitive and personally identifiable information.

For all three crime types, the proportion of Patternizr pairs with a similarity score greater than 0.5 which included a suspect description of any race was either consistent within the 95% confidence limit or lower than a random sampling of all pairs. That is, this analysis uncovered no evidence that Patternizr recommends any suspect race at a higher rate than exists with random pairing.

Implementation

We incorporated finalized models into the back end of the department's Domain Awareness System (DAS), a citywide network of sensors, databases, devices, software, and infrastructure (9, 21). We processed all historical pairs of complaints in parallel in the cloud, against ten years of records for burglaries and robberies and against three years of grand larcenies. This historical load took 1.94 days on approximately 1,600 cores. For new and revised complaints, similarity scores are calculated and updated three times each day. Each new or revised crime is scored against the entire corpus of crimes before it is incorporated into DAS for querying by users throughout the department.

Officers and analysts can access Patternizr through the custom software application, which is a part of DAS, on NYPD desktop computers. To use Patternizr, an investigator presses a "Patternize" button on any seed complaint to retrieve all available calculated similarity scores. Investigators are presented with a rank-ordered list of results, from most to least similar to the seed, with ten results displayed at a time. For each result, the distance, time apart, and algorithm-calculated similarity score are all displayed. A map displays the seed and similar results (Figure 6), and users can read a few important details about each complaint before choosing to examine the seed and result complaints side by side in full detail. The use of a ranked list, as opposed to a similarity score threshold, allows us to always display the most similar crimes; thus, the analysts always have a starting point for their analysis.

The user interface gives investigators the ability to collaborate with the algorithm using their specialized expertise. If an investigator has reason to believe that a certain M.O. is particularly relevant for a given pattern (e.g., only burglarizing tools from construction sites), the investigator can actively filter and query the result list. A general text search and filters for distance, time apart, presence of arrest, and premise type are all available and may be used simultaneously. The remaining results are still sorted by Patternizr's similarity score; the filters do not reorder the results, they only exclude crimes that do not meet the search criteria. In this way, investigators can both search for a specific M.O. and utilize the rank order of the results list to immediately display the most similar results given the manually specified M.O. criteria.

After approximately two years of development, including work on the algorithm, backend systems, and the user interface, Patternizr was deployed to production in December 2016. In parallel, throughout 2017, the NYPD hired a team of approximately 100 civilian analysts, deployed in the field, to perform crime analysis. These new crime analysts were trained to use Patternizr as part of their daily routine. For example, a crime analyst assigned to a precinct typically reviews crimes that occurred in his or her precinct to see whether they are part of a pattern. The uniformed officer that had been assigned this task before the crime analysts were hired used manual methods to perform this review; this review is now conducted using Patternizr.

Between January 2018 (with crime analysts at fully staffed levels) and the time of this paper's writing (July 2018), approximately 400 complaints per week were run through Patternizr, and this number is trending upward. This represents approximately 30% of all the burglaries, robberies, and grand larcenies recorded by the NYPD during that period. Approximately 80% of the usage comes from the new crime analysts, with the remaining 20% coming from uniformed officers performing crime analysis and investigatory duties. Additionally, crime analysts created more than 90 preliminary patterns per week in the same period using Patternizr in DAS. These patterns are considered preliminary because they have not yet been reviewed and approved by the relevant specialist units; however, the volume of preliminary patterns created with Patternizr demonstrates the value of the tool. Debra Piehl, the NYPD's Senior Crime Analyst, says of the tool: "Patternizr dramatically improves efficiency compared to traditional methods, and it still allows the analysts that work for me to apply their own thinking and analysis. The science doesn't overwhelm the art" (32).

Crime analysts have also provided specific instances in which official patterns were created because the Patternizr algorithm recognized similarities between crimes that the analysts would otherwise have found difficult to notice. For example, one analyst was examining a recent robbery in her precinct: the perpetrator was shoplifting power drills from a commercial establishment and, when confronted, attacked and injured an employee with a hypodermic needle. The analyst ran the complaint through Patternizr, and the algorithm returned an additional robbery in a distant precinct where the perpetrator was shoplifting a drill and threatened an employee with a hypodermic needle. This robbery was likely identified by the algorithm because of high similarity scores for the time of occurrence, the robbery subtype (began as shoplifting), the weapon used, and several matching words in the narratives (e.g., drill, needle). The investigators combined these two complaints into an official pattern, along with two other larcenies committed by the same perpetrator, and the pattern was then passed to the detective squad. The NYPD conducted an investigation and arrested the perpetrator, who later pled guilty to larceny and felony assault and is currently awaiting sentencing.

A second example involves the discovery of a grand larceny of unattended-property pattern. In this example, an analyst was examining the theft of a watch from a locker at a hotel gym in Midtown Manhattan in the late afternoon. The analyst used Patternizr on this complaint and discovered three other complaints with the same M.O.: jewelry and watches were removed from a locker in a fitness facility in the late afternoon or early evening in the same area. The algorithm likely recognized similarities in the complaints' times of occurrence, their geographic proximity, the property removed (jewelry and watches), the premise types (gyms), and key words in the narrative (e.g., locker). Two suspects were identified through the review of video footage, and the NYPD's investigation is ongoing.

The helpful feedback we have received from our users has highlighted potential improvements to Patternizr. For example, users have requested that the grand larceny algorithm also calculate similarities to petit larceny complaints, which differ from grand larcenies only in the value of the items stolen. Users have also requested the ability to compare across crime types, such as robberies and grand larcenies, where the only difference is whether force was used. We are considering adding this functionality when we deploy a second version of the Patternizr algorithms.

Conclusion

Patternizr is a new, effective, and fair recommendation engine deployed by the NYPD to help investigators group related crimes. To our knowledge, this is the first time such an algorithm has been deployed to production in a law enforcement environment. It joins a growing list of machine learning applications customized for public safety and for the public sector in general. Patternizr is also one of a long line of data-driven tools prototyped and developed at the NYPD, including CompStat (7). These tools, when used properly, encourage precision policing approaches instead of widespread heavy-handed enforcement techniques, and enable investigators to focus on the art of policing instead of rote and uninteresting work. We expect that other departments, and local agencies in general, will continue this line of research by extracting similar value from existing government data.

Bibliography

1. Walt L. Perry, Brian McInnis, Carter C. Price, Susan C. Smith, and John S. Hollywood. Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation, 2013.
2. Andrew Guthrie Ferguson. The rise of big data policing: Surveillance, race, and the future of law enforcement. NYU Press, 2017.
3. Greg Ridgeway. Policing in the era of big data. Annual Review of Criminology, 1:401–419, 2018.
4. Richard C. Larson. Urban police patrol analysis, volume 28. MIT Press, Cambridge, MA, 1972.
5. Linda V. Green and Peter J. Kolesar. Improving emergency responsiveness with management science. Management Science, 50(8):1001–1014, 2004.
6. Jack Maple. The crime fighter: How you can make your community crime free. Broadway, 2000.
7. Vincent E. Henry. The COMPSTAT paradigm: Management accountability in policing, business and the public sector. Looseleaf Law Publications, 2002.
8. George O. Mohler, Martin B. Short, Sean Malinowski, Mark Johnson, George E. Tita, Andrea L. Bertozzi, and P. Jeffrey Brantingham. Randomized controlled field trials of predictive policing. Journal of the American Statistical Association, 110(512):1399–1411, 2015.
9. E.S. Levine, Jessica Tisch, Anthony Tasso, and Michael Joy. The New York City Police Department's Domain Awareness System. Interfaces, 47(1):70–84, 2017.
10. Richard Berk and Justin Bleich. Forecasts of violence to inform sentencing decisions. Journal of Quantitative Criminology, 30(1):79–96, 2014.
11. Jessica Saunders, Priscillia Hunt, and John S. Hollywood. Predictions put into practice: a quasi-experimental evaluation of Chicago's predictive policing pilot. Journal of Experimental Criminology, 12(3):347–371, 2016.
12. Nelson Lim, Carl Matthies, Brian Gifford, and Greg Ridgeway. To protect and to serve: enhancing the efficiency of LAPD recruiting, volume 881. Rand Corporation, 2009.
13. David Weisburd, Elizabeth R. Groff, Greg Jones, Breanne Cave, Karen L. Amendola, Sue-Ming Yang, and Rupert F. Emison. The Dallas patrol management experiment: can AVL technologies be used to harness unallocated patrol time for crime prevention? Journal of Experimental Criminology, 11(3):367–391, 2015.
14. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, et al. Early intervention systems: Predicting adverse interactions between police and the public. Criminal Justice Policy Review, 29(2):190–209, 2018.
15. Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Learning to detect patterns of crime. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 515–530. Springer, 2013.
16. Michael D. Porter. A statistical approach to crime linkage. The American Statistician, 70(2):152–165, 2016.
17. Robert M. Bell and Yehuda Koren. Lessons from the Netflix prize challenge. SiGKDD Explorations, 9(2):75–79, 2007.
18. New York City Police Department. Citywide seven major felony offenses, 2000–2016, 2017. Accessed 2017-12-20. http://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/seven-major-felony-offenses-2000-2016.pdf
19. Marvin E. Wolfgang, Robert M. Figlio, and Thorsten Sellin. Delinquency in a birth cohort. University of Chicago Press, 1987.
20. Samantha L. Gwinn, Christopher W. Bruce, Steven R. Hick, and Julie P. Cooper. Exploring crime analysis: Readings on essential skills. International Association of Crime Analysts, 2008.
21. E.S. Levine and J.S. Tisch. Analytics in action at the New York City Police Department's Counterterrorism Bureau. Military Operations Research, 19(4):5–14, 2014.
22. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: prediction, inference and data mining. Springer-Verlag, New York, 2009.
23. Shyam Boriah, Varun Chandola, and Vipin Kumar. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 243–254. SIAM, 2008.
24. James H. Martin and Daniel Jurafsky. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 2009.
25. Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
26. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
27. James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
28. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.
29. Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14–19, 2016.
30. Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.
31. Andrew Guthrie Ferguson. Policing predictive policing. Wash. U. L. Rev., 94:1109, 2016.
32. Debra Piehl. Conversation with Evan Levine, June 22, 2018.
33. Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to speech recognition, computational linguistics and natural language processing. Prentice Hall, Upper Saddle River, NJ, 2008.

Acknowledgements

The authors acknowledge the support of the men and women of the New York City Police Department. In particular, the authors thank Police Commissioner James P. O'Neill, the former Police Commissioner William J. Bratton, Chief Lori Pollock, Chief Dermot Shea, Deputy Chief Timothy McCormack, Inspector Anthony Tasso, Deputy Inspector Brandon del Pozo, Captain John Hall, Lt. Juan Duque, Lt. Phillip McEneaney, Sgt. Christopher Bray, Sgt. Joseph Klubnick, Sgt. Sean McCarthy, Deputy Commissioner Jessica Tisch, Assistant Commissioner Richard Schroeder, Assistant Commissioner Ronald Wilhelmy, Director Martha Norrick, Daniel Montiel, and Debra Piehl. Special thanks to Tong Wang, Cynthia Rudin, Lt. Daniel Wagner, Theo Damoulas, Ravi Shroff, and Joe McLaughlin.

Alex Chohlas-Wood's current affiliation is Stanford University, Stanford, CA.

Appendix A: Feature Extraction

In this section, we provide the mathematical details involved in our feature extraction process.

Categorical Attributes (k = 16 through k = 24) Define Nto be the number of documents in the corpus First we countthe number of documents in the corpus where attribute k hasthe value x we label these counts nk

x Then

Xkij =

1minussum

y st nkylenk

match

nky(nk

yminus1)N(Nminus1)

if documents i

and j match onattribute k

0 otherwise

where the document count nkmatch is defined as the number of

documents in the corpus that have the same value of attributek as documents i and j (23)

Suspect Height and Weight Attributes (k = 25 and k =26) Call the minimal absolute difference between the valueof a continuous variable associated with suspect h of com-plaint i to that of any suspect in complaint j aijh In math-ematical form using the weight attribute wih of suspect h incomplaint i as an example

aijh = minf

[|wihminuswjf |

]

where the minimum function runs over all suspects f in com-plaint j For example if complaint 1 has suspects of weights140 pounds (lbs) 180 lbs and 220 lbs and complaint 2 hassuspects of weights 150 lbs and 215 lbs then a121 = 10 lbsa122 = 30 lbs and a123 = 5 lbs After calculating thesecomparisons we transform them into a similarity via

Xkij =max

[exp(minusPi

Pisumh=0

aijh

Pi +Pj

)exp

(minusPj

Pjsumh=0

ajih

Pi +Pj

)]

where Pi is the number of suspects for complaint i

Suspect Force Used (k = 27 ) In this case aijh is just thenumber of suspects in complaint j with the same value of thecategorical feature as suspect h in complaint i

Xk=27ij = max

[Pi

Pisumh=0

aijh

Pi +PjPj

Pjsumh=0

ajih

Pi +Pj

]

Unstructured Text for Robberies and Grand Larcenies(k = 32 Through k = 37 ) Treating each attribute k sepa-rately let the term frequency tfxi be the count of word x indocument i let the document frequency dfx be the number ofdocuments in the corpus containing word x and let M be thenumber of unique words in the corpus Then for each wordx we calculate the inverse document frequency (33)

idfx = log( N

dfx

)+1

The ldquo+1rdquo in this equation ensures that words that occur inevery document in the training corpus will not receive a zerofor inverse document frequency which would preclude thosewords from contributing to the eventual cosine similarity Foreach word x and document i we calculate

10 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

sxi = tfxitimes idfx

We then build the TFIDF vector Si for each document i

Si =[s0i sMi

]

Finally for each pair of documents i and j we calculatethe cosine similarity between the two TFIDF vectors

Xkij = SitimesSj

SiSj

Appendix B Random Forest Model TrainingTrees are recursively built by identifying the best feature kand split s at each node m which minimize the Gini indexa measure of node purity for the two descendent nodes (22)Let R be the subset of feature values Xk left or right of thesplit s and let pqm be the proportion of class q in node mThen

Rleft(ks) =

X|Xk le s

Rright(ks) =

X|Xk gt s

pqm = 1Nm

sumxiisinRm

I(yi = q)

where pqm is the estimated sample proportion of class q inleft or right node m Nm is the number of samples in nodem and I(yi = q) is an indicator function that returns 1 whensample i belongs to class q and returns 0 otherwise We setq = 1 (ie positive examples) and find the combination of kand s that minimizes the Gini index as follows

minks

[2(

p1left(1minus p1left

))+2(

p1right(1minus p1right

))]

In addition the group of features Km available for split-ting at each node is chosen from a randomly selected subsetof all available features K Some or all of these features maybe available for splitting at each node from these availablefeatures the model selects the feature and split that best sep-arate the child nodes into purer groups of either in-pattern orunrelated crime pairs

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 11

  • Feature Extraction
  • Random Forest Model Training
Page 2: A Recommendation Engine to Aid in Identifying Crime Patternscynthia/docs/Patternizr.pdf · A Recommendation Engine to Aid in Identifying Crime Patterns Alex Chohlas-Wood* andE.S.

Solution Design

Patternizr is a decision support tool and recommendation engine that consists of a set of machine learning models to make this search process more efficient and effective. The purpose of Patternizr is to take a "seed" complaint chosen by the analyst and compare this seed against the hundreds of thousands of recorded crimes in NYPD's crime database. With each comparison between a crime and the seed, Patternizr generates a "similarity score" that quantifies the likelihood that the pair of crimes are in a pattern together. Once all similarity scores have been generated, Patternizr ranks the pairs in descending order by similarity score and returns a list for the analyst to review. The complaints most likely to be in a pattern with the seed complaint appear at the top of the list. After reviewing these ranked results, the analyst can decide which complaints should be grouped together into a pattern.

The models that comprise Patternizr are supervised machine learning classifiers; that is, they are statistical models that learn from historical examples where classifications are known and are then used to predict the classification for samples for which the classifications are unknown. In the case of Patternizr, each example is a pair of crimes, and the classification is whether the two crimes are in a pattern together. The collection of examples used for training is known as a training set. Each model learns to perform classification by incorporating information from a set of attributes, known as features, which are selected to represent relevant characteristics of each pair of crimes. Much of the work of building a machine learning model is in identifying and representing features to enable the model to make accurate classifications; we call this step feature extraction. For Patternizr, as we discuss in the Sample Selection section, we chose samples for the training set in such a way that we improved information gain. Finally, we trained a specific type of machine learning model, a random forest, using standard techniques.

We used a separate model for each of three different crime types (burglaries, robberies, and grand larcenies) because these crime types have a sufficient corpus of prior manually identified patterns for use as training examples. This corpus consists of approximately 10,000 patterns between 2006 and 2015 from each crime type. In addition to manually identified official patterns, a portion of this corpus is built from complaint records where the same individual was arrested for multiple crimes of the same type within a span of two days. When these arrest groupings are included with manually identified patterns, each crime type has approximately 30,000 complaints which are included in a pattern, a small portion of the 200,000–400,000 complaints for each crime type recorded over the same ten-year period. In this section, we describe how we constructed the features used in each of the three machine learning models, then discuss how we trained the models that were put into production. Figure 1 depicts the process by which these models were produced. Although we constructed separate models for each of the three crime types, the models rely on a nearly identical set of features and are trained in the same way.

Fig. 1. Process overview for training Patternizr. Historical pattern data are used to choose a sample of complaints. That sample and any corresponding arrests are run through the similarity calculations to build features for learning. The random forest model takes those features and uses the historical patterns for training to produce a production model.

Feature Extraction

We construct the features on which each model relies from the information contained in complaints. A complaint contains a mix of unstructured text describing details of the crime and structured fields about the crime, including the date and time (which can be a range if the precise time of occurrence is unknown), location, crime subcategory, MO, and suspect information. We use this information to calculate the five types of crime-to-crime similarities used as features by Patternizr: location, date-time, categorical, suspect, and unstructured text.

We selected these similarities by discussing pattern discovery with NYPD investigatory and enforcement personnel with many years of experience manually building patterns. They described the data sources they have used to identify patterns, as well as specific pieces of information that led them to believe that two complaints were connected. Additionally, they discussed many specific patterns they had identified and the unique set of circumstances that connected each group of crimes. In the course of these discussions, we also learned much about the inefficiencies that are inherent in manual pattern creation, as we detail in the introduction, and that could be reduced with the help of an automated tool.

Because Patternizr predicts the likelihood of a complaint pair belonging together in a pattern, the set of features read by the model (Table 1), which we call K, is derived from pairwise comparisons between crimes i and j. Each of these pairwise comparisons, which we call X^k, is a similarity calculation between a selected attribute k of the pair's complaint reports. The mathematical details of our more complicated features can be found in Appendix A.

For date-time, location, and categorical attributes, we used similarity computation methods previously described in related work on Cambridge Police Department data (15). However, we modified that formulation by excluding categorical information corresponding to administrative geographies (i.e., a feature noting that the pair exists in the same precinct) to encourage matches across such boundaries. Date-time and location attributes are useful features because detectives' experiences have shown that complaints in patterns are likely to happen nearby in space and time. Categorical attributes are important at identifying a consistent MO. We also included metrics to compare the descriptions of groups of suspects (e.g., the count of suspects who have committed a crime); however, we excluded sensitive suspect attributes (e.g., race). Suspect descriptions are particularly useful in identifying patterns of violent crimes such as robberies, because in such crimes the victim physically encounters the suspect. Finally, we added features for the similarity of unstructured text narratives, including the cosine similarity of vector representations of the descriptions, as well as the number of rare-word matches between two descriptions. These features were included as a reflection of the central role that unstructured text plays in the manual identification of patterns.

Many features represent similar but distinct representations of the underlying data. This allows the algorithm to leverage as much available information as possible. We constructed three slightly different feature sets, one for each crime type, based on the characteristics relevant, available, and found to be important for that crime type. For example, the MO categorical field for "method of entry" is relevant to burglaries but not for robberies or grand larcenies.

No. | Attribute | Type | Similarity measure
1 | Location (X,Y) | Location | Distance apart (Euclidean)
2 | Location (X,Y) | Location | Distance apart (exponential)
3 | Location (X,Y) | Location | Binned midpoint (longitudinal)
4 | Location (X,Y) | Location | Binned midpoint (latitudinal)
5 | Location (X,Y) | Location | Location frequency (sum)
6 | Location (X,Y) | Location | Location frequency (maximum)
7 | Location (X,Y) | Location | Location frequency (minimum)
8 | Location (X,Y) | Location | Location frequency (product)
9 | Date-time of occurrence | Date-time | Time-of-day similarity
10 | Date-time of occurrence | Date-time | Time-of-week similarity
11 | Date-time of occurrence | Date-time | Size of occurrence windows (larger)
12 | Date-time of occurrence | Date-time | Size of occurrence windows (smaller)
13 | Date-time of occurrence | Date-time | Size of occurrence windows (difference)
14 | Date-time of occurrence | Date-time | Days apart
15 | Date-time of arrest | Date-time | Days between arrest and crime
16 | Premise type | Categorical | Categorical similarity
17 | Crime classification | Categorical | Categorical similarity
18 | MO categorical fields | Categorical | Categorical similarity
19 | MO weapon | Categorical | Categorical similarity
20 | Location details | Categorical | Categorical similarity
21 | Firearm discharged | Categorical | Categorical similarity
22 | Crime attempted or completed | Categorical | Categorical similarity
23 | Domestic crime indicator | Categorical | Categorical similarity
24 | Medical assistance required | Categorical | Categorical similarity
25 | Suspect height(s) | Suspect | Group similarity (continuous)
26 | Suspect weight(s) | Suspect | Group similarity (continuous)
27 | Suspect force used | Suspect | Group similarity (categorical)
28 | Suspect count | Suspect | Difference
29 | MO victim count | Suspect | Difference
30 | Complaint narrative (burg only) | Unstructured text | Average similarity
31 | Complaint narrative (burg only) | Unstructured text | Sum of similarity
32 | Complaint narrative (rob and GL) | Unstructured text | Cosine similarity
33 | MO suspect statement (rob and GL) | Unstructured text | Cosine similarity
34 | MO victim actions (rob and GL) | Unstructured text | Cosine similarity
35 | MO method of flight (rob and GL) | Unstructured text | Cosine similarity
36 | Premise name (rob and GL) | Unstructured text | Cosine similarity
37 | Property taken (rob and GL) | Unstructured text | Cosine similarity
38 | All unstructured text | Unstructured text | Cosine similarity
39 | Complaint narrative | Unstructured text | Rare-word matches

Table 1. List of the features included in Patternizr. These features are indexed by the variable k in the text. We use the following abbreviations: "burg" represents burglary, "rob" represents robbery, and "GL" represents grand larceny.

Location Features. We incorporate eight similarities (labeled with indices k = 1 through k = 8) derived from location information, including a strict Euclidean distance d (in feet) between the crime pair and an exponential factor e^−d. We also include two location features corresponding to longitude and latitude, but binned to very coarse squares (over nine miles on a side, for a total of roughly 11 boxes possible across the geography of New York City) to minimize the possibility that higher-resolution geographies could be used by the model as a proxy variable for sensitive attributes (such as race). For each pair of crimes, we note which bin contains the pair's midpoint and include this as a feature. We also use a two-dimensional location frequency distribution calculated with kernel density estimation (KDE) (22), which represents a heatmap of where crime is most (and least) common in the city. For each pair, we add a feature for the sum, maximum, minimum, and product of these KDE values, providing the model with a range of information on how common the locations were for both crimes.
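The sketch below illustrates how these eight location features could be computed. It is illustrative only: the paper does not specify its KDE implementation, bandwidth, or exact bin edges, so scipy's gaussian_kde, the bin size, and the synthetic coordinates are all assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

BIN_FEET = 9.5 * 5280  # assumed coarse bin edge length ("over nine miles on a side")

# Hypothetical corpus of crime coordinates (feet), used to fit the
# citywide location-frequency heatmap via KDE.
rng = np.random.default_rng(0)
corpus_xy = rng.uniform(0, 150_000, size=(1000, 2))
kde = gaussian_kde(corpus_xy.T)

def location_features(xy_i, xy_j):
    xy_i, xy_j = np.asarray(xy_i, float), np.asarray(xy_j, float)
    d = float(np.linalg.norm(xy_i - xy_j))                  # k=1: Euclidean distance
    bx, by = np.floor((xy_i + xy_j) / 2.0 / BIN_FEET)      # k=3, k=4: coarse midpoint bins
    fi, fj = float(kde(xy_i)[0]), float(kde(xy_j)[0])      # location frequencies
    return {
        "dist": d,
        "exp_dist": float(np.exp(-d)),                      # k=2: exponential factor e^-d
        "bin_lon": bx, "bin_lat": by,
        "kde_sum": fi + fj,                                 # k=5
        "kde_max": max(fi, fj),                             # k=6
        "kde_min": min(fi, fj),                             # k=7
        "kde_prod": fi * fj,                                # k=8
    }
```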

Date-Time Features. Seven separate similarities (k = 9 through k = 15) depend on the date-time attributes of the two complaints. The time-of-day similarity measure, which compares the time of day of the two crimes' occurrences, is drawn directly from Wang et al. (15). Following their approach, we calculate the frequency distribution of occurrence times for each crime type using KDE. Then, for each pair, we weight the difference between occurrence times by the frequency of those occurrence times. Crimes that occur close in time to each other and at rare times will have a low measure, indicating greater similarity between crime times. Crimes that occur far apart in time and during common times will have a high measure, indicating lower similarity between crime times. Alternatively, crimes that are a certain distance apart in time but occur during rare times will be more similar than crimes that are the same distance apart in time but occur during common times. This calculation can also be applied to occurrence times specified over an occurrence window, typical for crimes such as burglaries, where the victim often is not present to witness the crime and therefore knows only that the crime happened within some time span. Comparisons across occurrence windows are made by dividing the window evenly into ten points and averaging similarities across all ten points. We applied the same process to time-of-week similarities. In addition, we included features for the duration of the larger and smaller occurrence windows from the pair of crimes, as well as the difference in size of the two occurrence windows. We included a simple feature, days apart, that we calculated between the midpoints of the time ranges if the occurrence time was over a range. Finally, we included a feature for the time between an arrest associated with the earlier crime and the occurrence of the later crime, so that the model could have information about whether a criminal was potentially incarcerated.
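The following sketch shows one plausible reading of this frequency-weighted measure, together with the ten-point occurrence-window averaging. The exact weighting used by Wang et al. (15) is not reproduced here; treating the measure as the probability mass of the occurrence-time density lying between the two times is our assumption, as are the synthetic occurrence hours.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Hypothetical corpus of occurrence hours for one crime type.
rng = np.random.default_rng(0)
occurrence_hours = rng.uniform(0, 24, 500)
kde = gaussian_kde(occurrence_hours)
GRID = np.linspace(0, 24, 289)

def time_measure(t_i, t_j):
    """Probability mass between two times of day; low = rare/close = similar."""
    density = kde(GRID)
    lo, hi = sorted((t_i % 24, t_j % 24))
    inside = (GRID >= lo) & (GRID <= hi)
    mass = density[inside].sum() * (GRID[1] - GRID[0])  # rectangle-rule integral
    return min(mass, 1.0 - mass)                        # wrap around midnight

def window_measure(window_i, window_j, points=10):
    """Occurrence windows: ten evenly spaced points per window, averaged."""
    ti = np.linspace(window_i[0], window_i[1], points)
    tj = np.linspace(window_j[0], window_j[1], points)
    return float(np.mean([time_measure(a, b) for a in ti for b in tj]))
```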

Categorical Features. Much of the information in a complaint is in categorical form, including premise type, crime classification, the MO itself, weapon type, and details about the crime's location. Other information is stored in categorical fields that have yes/no/unknown structures; examples include whether a firearm was discharged during the crime, whether the crime was attempted or completed, whether someone required medical assistance, and whether the crime was domestic in nature. For all our categorical features (k = 16 through k = 24), we incorporate a simplified version of Goodall's similarity measure (23), identical to that used by Wang et al. (15). This measure accounts for the frequency of categorical values, causing two records that match on a rare value to be rated as more similar than two records that match on a common value. For categorical values that may take on multiple values for a single crime (e.g., multiple weapon types may be associated with a single robbery), possible values are encoded as dummy variables (22) before the categorical similarity calculation. In addition, many categories are too fine grained to be useful, because Goodall's simplified similarity measure is only nonzero when categories match exactly on both complaints. For example, NYPD crime classifications for burglaries include three separate categories for residential burglary, even though a perpetrator with the same MO may plausibly commit crimes across all three types. To address this issue, we asked uniformed police experts to group the existing classification into broader categories using domain knowledge. We included both the ungrouped and grouped crime classification similarities as features for the algorithm.

Suspect Features. We chose to include only nonsensitive suspect attributes, including height, weight, force used, and count of suspects, as suspect features for Patternizr. Suspect-attribute comparison differs from categorical comparison because it is often a many-to-many comparison for a crime pair. It is generally not possible to know from two lists of suspects which descriptions correspond to the same individual; therefore, comparisons must be made between all possible pairs of individuals (with each list providing one of the pair). For example, for the height and weight continuous variables (k = 25 and k = 26), we calculated the minimum difference between each possible pair of suspect descriptions and then divided the sum of differences by the quantity of suspects. For the categorical feature of force used (i.e., whether the suspect threatened to use force or actually used force in the commission of a robbery), we counted and tabulated the number of matches between possible suspect pairs (k = 27) in a similar fashion. We also included the difference in suspect count and victim count (which is a characteristic of the MO) as features (k = 28 and k = 29).

Unstructured Text Features. We also created a number of unstructured text features, including comparisons of the complaint text narrative, which plays a central role in the manual identification of patterns. Unstructured text also comprises the suspect's statements, victim actions, method of flight, premise name, and property taken. For burglaries, we calculate a word-by-word score identical to the categorical similarities, but on a word-by-word rather than attribute basis, for words that match between the narratives on both complaints. We then provide an average and sum of these similarities (k = 30 and k = 31).

For robberies and grand larcenies, we use a more advanced method on other unstructured fields in addition to the narratives. We calculate the term-frequency/inverse-document-frequency (TF-IDF) (24) vector representing the presence and importance of the set of words that comprise each unstructured text. We compute a cosine similarity between the pair of TF-IDF vectors for each unstructured text attribute (labeled with indices k = 32 through k = 37).

We also combine all unstructured text from each complaint and treat it as a separate unstructured text attribute; this permits information that appears fragmented across different narratives to be compared at once across all possible sources. For this combined narrative, we calculate the cosine similarity (k = 38) and also count the number of rare words that did not appear in the training corpus but do appear in both complaints, and we include that count as a feature (k = 39). These unusual words may indicate the presence of a consistent and unusual MO.
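A minimal sketch of the rare-word match count (k = 39): words absent from the training vocabulary that nevertheless appear in both complaints' combined text. The whitespace tokenization and the toy vocabulary are assumptions.

```python
# Count words that are outside the training vocabulary but shared by both texts.
def rare_word_matches(text_i: str, text_j: str, training_vocab: set) -> int:
    words_i = set(text_i.lower().split())
    words_j = set(text_j.lower().split())
    return len((words_i & words_j) - training_vocab)

vocab = {"the", "suspect", "fled", "on", "foot"}
print(rare_word_matches("suspect took drill needle",
                        "drill and needle shown", vocab))  # -> 2
```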

Sample Selection

A very small proportion of all possible crime pairs (about 8×10⁻⁸, 3×10⁻⁸, and 9×10⁻⁹ of possible burglary, robbery, and grand larceny pairs, respectively) are together in manually identified patterns. We refer to these crime pairs as positive pairs. To preserve maximum information gain from the corpus of patterns while also permitting model training within memory constraints, we included all positive pairs in the training set and performed down-sampling on the set of possible negative pairs.

Purely random sampling of negative pairs would largely consist of pairs that are far apart in space and time, encouraging the model to identify positives based solely on proximity in those dimensions. To counteract this effect, we sampled nearby negative pairs at a higher rate than distant negative pairs. We paired every pattern crime with a random sample of 2% of all negative examples within 3,500 feet and an additional random sample of 0.1% of all crimes within 80 days of the crime's midpoint in time. These were sampled uniformly: all negative examples within 3,500 feet or within 80 days of the crime's midpoint in time were equally likely to be selected. We repeatedly tested different thresholds until models trained on these data rated nonspatial or nontemporal features roughly as important as distance and time apart (as measured by feature importances). Finally, a substantial portion of distant negative pairs that included one crime known to be part of a pattern was also selected at random, approximately 20 times the volume of already selected positive and nearby pairs for that crime.
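The following is a simplified sketch of this down-sampling scheme. The 3,500-foot and 80-day thresholds and the 2%/0.1% rates come from the text; the data structures, the dist_fn/days_fn helpers, and the uniform sampling via random.sample are illustrative assumptions.

```python
import random

NEARBY_FEET, NEARBY_DAYS = 3500, 80
RATE_SPACE, RATE_TIME = 0.02, 0.001   # the 2% and 0.1% sampling rates

def sample_negatives(pattern_crime, negatives, dist_fn, days_fn):
    """Uniformly sample negative pairs near one pattern crime."""
    near_space = [c for c in negatives if dist_fn(pattern_crime, c) <= NEARBY_FEET]
    near_time = [c for c in negatives if days_fn(pattern_crime, c) <= NEARBY_DAYS]
    picked = random.sample(near_space, int(len(near_space) * RATE_SPACE))
    picked += random.sample(near_time, int(len(near_time) * RATE_TIME))
    # Distant negatives (not shown) are then added at roughly 20 times the
    # volume of the positive and nearby pairs selected for this crime.
    return picked
```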

We subsequently uniformly sampled this set to fit in memory for model training. We trained the model for each crime type on 2.0–3.2 million pairs of crimes. For burglaries, robberies, and grand larcenies, 0.2%, 0.08%, and 0.06% of this reduced set of pairs were positive, respectively.

Model Training

For each of the three crime types, we relied on a random forest model to calculate an overall similarity score between two crimes. We chose a random forest model for several reasons. First, tree-based classification algorithms are able to learn interactions between features, which is important when considering how separate crime subtypes could exhibit different similarity structures. For example, grand larceny pickpocketing patterns may generally occur much closer in space (e.g., on a specific street corner) when compared with grand larceny shoplifting patterns, which may generally happen across the city. Second, random forests have demonstrated an ability to achieve impressive performance while avoiding overfitting (25). Third, random forests are implemented in many standard machine learning libraries, can be trained and executed in parallel, and do not require graphics processing units to execute. Finally, similar to many other classification algorithms, random forests generate a "score"; in Patternizr's case, this represents the confidence that the pair belongs together in a pattern. These similarity scores are scaled between 0 and 1; therefore, they can be naturally represented to an investigator in the form of a percentage.

We used the standard implementation of the random forest algorithm in the Python library scikit-learn (26). Each random forest model is an ensemble of hundreds of decision trees that are trained on the calculated features and responses. We trained each individual tree on an independently bootstrapped sample of the training set, with a randomized portion of the full set of features K. We describe the mathematical details of how we trained the trees in Appendix B.
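A minimal sketch of training one per-crime-type model with scikit-learn's RandomForestClassifier. The 39 features and the roughly 0.2% positive rate echo the text; the synthetic data and all unlisted settings are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((10_000, 39))        # 39 pairwise similarity features X^k
y = np.zeros(10_000, dtype=int)     # 1 = pair shares a pattern, 0 = unrelated
y[:20] = 1                          # ~0.2% positive pairs, as for burglary

model = RandomForestClassifier(n_estimators=220, n_jobs=-1, random_state=0)
model.fit(X, y)
scores = model.predict_proba(X)[:, 1]   # similarity scores scaled to [0, 1]
```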

Each random forest model also has a collection of settings, known as hyperparameters, that specify model structure and operation. We optimized several hyperparameters for the random forest model, including the size of the randomized subset of features and the maximum allowed tree depth. We used a cross-validated randomized hyperparameter search (27), which repeatedly selects a random combination of hyperparameters from a prespecified distribution of values and then tests the performance of this combination on several subsets of the data, preventing overfitting to any particular subset. For each crime type, we trained one random forest model with up to 220 trees using the generated features and tuned hyperparameters.
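A hedged sketch of such a search using scikit-learn's RandomizedSearchCV, scored by average precision (described next). The parameter ranges and iteration count are assumptions; only the two tuned hyperparameters and the tree count come from the text.

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "max_features": randint(3, 20),   # size of the randomized feature subset
    "max_depth": randint(4, 30),      # maximum allowed tree depth
}
search = RandomizedSearchCV(
    RandomForestClassifier(n_estimators=220),
    param_distributions,
    n_iter=25,
    scoring="average_precision",
    cv=3,
)
# search.fit(X, y)  # X, y as in the training sketch above
```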

We set our hyperparameter search to optimize hyperparameters for the best possible average precision (28), an evaluation metric that gauges whether desired matches are ranked highly in a query. In Patternizr's case, the query is the list of complaints returned by Patternizr when the user chooses a seed. Average precision relies on the precision metric, which measures the proportion of a predicted positive class that is actually positive. For Patternizr, precision is the percentage of suggested matches that are actually in a pattern with the seed. Each true match in Patternizr's ranked list can be represented by the precision at that rank, that is, the number of true matches encountered so far divided by the rank (the total number of suggested matches provided by Patternizr at that point). Average precision, therefore, is simply an average of these precisions for all true matches in the test pattern.
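The definition above translates directly into a few lines of code; this worked example is ours, not the authors' evaluation harness.

```python
def average_precision(ranked_is_match):
    """Average of the precisions at the rank of each true match."""
    hits, total = 0, 0.0
    for rank, is_match in enumerate(ranked_is_match, start=1):
        if is_match:
            hits += 1
            total += hits / rank        # precision at this rank
    return total / hits if hits else 0.0

# Matches at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 0.833...
print(average_precision([True, False, True, False]))
```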

Performance

We tested the performance of Patternizr on a variety of objectives. First, we measured the proportion of test patterns that were entirely or partially rebuilt by the models, even though the models had not previously seen these test patterns. Second, we constructed a benchmark model as a simplified proxy for existing practice, to test whether investigators using Patternizr would be more efficient than those using existing methods. Third, we examined the features each model had isolated as most useful in identifying patterns and compared those with investigators' intuition. Finally, we measured whether Patternizr made consistent recommendations across racial groups.

We validated all three Patternizr models on official patterns that were not included in the training set. For each crime type, we randomly chose 150 test patterns and randomly selected a seed from each pattern. We then noted the rankings of the remaining pattern crimes as provided by Patternizr's similarity scores. We bootstrapped this process by randomly selecting these 150 test patterns 1,000 times, which provided reliable estimates and uncertainty bounds. Table 2 shows the results from this analysis.

Algorithm | Completely rebuilt patterns (no.) | Completely rebuilt patterns (%) | Patterns with match in top 10 (no.) | Patterns with match in top 10 (%)
Burglary | 55 (43–67) | 37 (29–45) | 123 (114–132) | 82 (76–88)
Robbery | 39 (29–49) | 26 (19–33) | 117 (107–126) | 78 (71–84)
Grand larceny | 41 (30–51) | 27 (20–34) | 114 (103–124) | 76 (69–83)

Table 2. Accuracy of Patternizr on 150 test patterns. The average performance is listed first, followed by the 95% bootstrapped confidence interval of the performance in parentheses.

The algorithm completely rebuilt (i.e., ranked all in-pattern pairs ahead of all nonrelated pairs) approximately one-third of test patterns. Additionally, approximately four-fifths of burglary, robbery, and grand larceny patterns had at least one match in the top ten rankings out of the entire corpus of complaints for that crime type. Figure 2 shows the rankings for all test patterns. In reality, this performance is likely an underestimation, given that investigators have likely missed actual matches in the historical corpus.

Fig. 2. Performance of Patternizr on 150 test patterns. Each position on the x axis corresponds to a test pattern, and each dot corresponds to the ranking of a crime in that pattern. Patterns are ordered left to right by the average precision score for that pattern as a visual aid; Patternizr did best on the patterns that appear to the left side of each chart. Dashed lines indicate the proportion of test patterns with complete reconstruction (left of the vertical red line) and the patterns with a match in the top ten rankings (under the horizontal orange line). Roughly one-third of the test patterns are completely reconstructed (as we also detail in Table 2); that is, no complaints outside of the pattern are ranked higher than a complaint included in the pattern. Roughly four-fifths of the test patterns have at least one of the complaints in the pattern ranked in the top ten.

To test whether Patternizr improves investigator efficiency, we built a simple baseline algorithm for comparison. The baseline algorithm assumes that investigators examine complaints nearest in time to the seed complaint first, while always limiting themselves to complaints in the same precinct as the chosen seed. This method provides a baseline ranked list that we can compare with Patternizr's ranked results. We restricted the comparison to local patterns in the test set, because the baseline algorithm would perform poorly (and unrealistically) on patterns that span multiple precincts. We compare the ranking of each complaint in the baseline list with how it appears in the ranked list generated by Patternizr for all the local patterns in our test data set (Figure 3).

We found that for all three crime types, Patternizr is typically an improvement on the baseline algorithm; this estimate is conservative, because Patternizr is especially helpful on patterns spanning multiple precincts.

Fig. 3. Performance of Patternizr against a simple baseline. Each dot represents a pair of complaints, where one half of the pair is a seed from the test set and the other half of the pair is another complaint in that pattern. Model performance is better than baseline performance for each pair of sample complaints that lies below the diagonal line; that is, the algorithm gives a ranking closer to the top of the list than the baseline algorithm.

In Figure 4, we show the top ten feature importances for each model. The features for distance, days apart, and narrative information play a top role in all three models. NYPD's pattern subject matter experts confirmed that these three features contain key information for pattern creation. Other top features include similarity between other unstructured text fields and time-of-day and day-of-week similarity. None of the top ten features for each crime type was particularly surprising to our subject matter experts. Note that these calculations only measure importance; they do not indicate how positive or negative changes in the feature will affect the model score.

Fig. 4. Feature importances for Patternizr. Feature importance is one way to gauge the relative contribution of different features to the overall classification of a random forest. Feature importances are calculated by measuring the Gini impurity reduction at each node at which that feature is used, weighing these reductions by the number of samples that were routed to that node, and then averaging these calculations for each feature across all trees in the forest in which that feature is used.

It is important to measure whether the recommendations that Patternizr makes are not only valuable but also fair, particularly given the growing concern that predictive policing tools may perpetuate disparate impact (29–31). We intentionally designed the algorithm to minimize disparate impact on any specific group. First, the algorithm is completely blind to sensitive information about potential suspects, including race and gender, which were not included as similarity features for the predictive model. Second, we kept potential proxy variables for sensitive information, particularly location, extremely coarse, to ensure that correlation with sensitive attributes had a very low degree of certainty while retaining some very general information about location. Finally, and most important, several levels of expert human review are still required to establish a pattern, minimizing the potential for a seemingly likely (but incorrect) recommendation to result in any enforcement action.

To confirm that these precautions were effective, we tested whether any of the three models recommended pairs with a certain racial group at a different frequency than either random pairing or existing identified patterns. We randomly selected 5–10 million test pairs from each crime type and drew 1,000 bootstrapped samples from each of these samples to estimate uncertainty (Figure 5).

Fig. 5. Proportion of pairs by race for random, historical pattern, and Patternizr-recommended pairs. Confidence intervals are calculated from the range of values that appear in 1,000 bootstrapped samples. Racial categories correspond to the official descriptions listed on NYPD complaint forms. Notably, many types of grand larceny will not contain suspect information because there were no witnesses to the crime, thus producing relatively higher levels of uncertainty for grand larceny than for the other two crime types.


For all three crime types, the proportion of Patternizr pairs with a similarity score greater than 0.5 which included a suspect description of any race was either consistent within the 95% confidence limit or lower than a random sampling of all pairs. That is, this analysis uncovered no evidence that Patternizr recommends any suspect race at a higher rate than exists with random pairing.

Implementation

We incorporated the finalized models into the back end of the department's Domain Awareness System (DAS), a citywide network of sensors, databases, devices, software, and infrastructure (9, 21). We processed all historical pairs of complaints in parallel in the cloud, against ten years of records for burglaries and robberies and against three years of grand larcenies. This historical load took 1.94 days on approximately 1,600 cores. For new and revised complaints, similarity scores are calculated and updated three times each day. Each new or revised crime is scored against the entire corpus of crimes before it is incorporated into DAS for querying by users throughout the department.

Officers and analysts can access Patternizr through the custom software application, which is a part of DAS, on NYPD desktop computers. To use Patternizr, an investigator presses a "Patternize" button on any seed complaint to retrieve all available calculated similarity scores. Investigators are presented with a rank-ordered list of results, from most to least similar to the seed, with ten results displayed at a time. For each result, the distance, time apart, and algorithm-calculated similarity score are all displayed. A map displays the seed and similar results (Figure 6), and users can read a few important details about each complaint before choosing to examine the seed and result complaints side by side in full detail. The use of a ranked list, as opposed to a similarity score threshold, allows us to always display the most similar crimes; thus, the analysts always have a starting point for their analysis.

Fig. 6. Screenshot of Patternizr in the NYPD Domain Awareness System. The map of the seed and most similar complaints is in the left panel, the seed complaint is in the middle panel, and the most similar complaints are in the right panel. Grey boxes obscure law-enforcement-sensitive and personally identifiable information.

The user interface gives investigators the ability to collaborate with the algorithm using their specialized expertise. If an investigator has reason to believe that a certain MO is particularly relevant for a given pattern (e.g., only burglarizing tools from construction sites), the investigator can actively filter and query the result list. A general text search and filters for distance, time apart, presence of arrest, and premise type are all available and may be used simultaneously. The remaining results are still sorted by Patternizr's similarity score; the filters do not reorder the results, they only exclude crimes that do not meet the search criteria. In this way, investigators can both search for a specific MO and utilize the rank order of the results list to immediately display the most similar results given the manually specified MO criteria.

After approximately two years of development, including work on the algorithm, backend systems, and the user interface, Patternizr was deployed to production in December 2016. In parallel, throughout 2017, the NYPD hired a team of approximately 100 civilian analysts, deployed in the field, to perform crime analysis. These new crime analysts were trained to use Patternizr as part of their daily routine. For example, a crime analyst assigned to a precinct typically reviews crimes that occurred in his or her precinct to see whether they are part of a pattern. The uniformed officer that had been assigned this task before the crime analysts were hired used manual methods to perform this review; this review is now conducted using Patternizr.

Between January 2018 (with crime analysts at fully staffed levels) and the time of this paper's writing (July 2018), approximately 400 complaints per week were run through Patternizr, and this number is trending upward. This represents approximately 30% of all the burglaries, robberies, and grand larcenies recorded by the NYPD during that period. Approximately 80% of the usage comes from the new crime analysts, with the remaining 20% coming from uniformed officers performing crime analysis and investigatory duties. Additionally, crime analysts created more than 90 preliminary patterns per week in the same period using Patternizr in DAS. These patterns are considered preliminary because they have not yet been reviewed and approved by the relevant specialist units; however, the volume of preliminary patterns created with Patternizr demonstrates the value of the tool. Debra Piehl, the NYPD's Senior Crime Analyst, says of the tool: "Patternizr dramatically improves efficiency compared to traditional methods, and it still allows the analysts that work for me to apply their own thinking and analysis. The science doesn't overwhelm the art" (32).

Crime analysts have also provided specific instances in which official patterns were created because the Patternizr algorithm recognized similarities between crimes that the analysts would otherwise have found difficult to notice. For example, one analyst was examining a recent robbery in her precinct: the perpetrator was shoplifting power drills from a commercial establishment and, when confronted, attacked and injured an employee with a hypodermic needle. The analyst ran the complaint through Patternizr, and the algorithm returned an additional robbery in a distant precinct, where the perpetrator was shoplifting a drill and threatened an employee with a hypodermic needle. This robbery was likely identified by the algorithm because of high similarity scores for the time of occurrence, the robbery subtype (began as shoplifting), the weapon used, and several matching words in the narratives (e.g., drill, needle). The investigators combined these two complaints into an official pattern, along with two other larcenies committed by the same perpetrator, and the pattern was then passed to the detective squad. The NYPD conducted an investigation and arrested the perpetrator, who later pled guilty to larceny and felony assault and is currently awaiting sentencing.

A second example involves the discovery of a grand larceny of unattended-property pattern. In this example, an analyst was examining the theft of a watch from a locker at a hotel gym in Midtown Manhattan in the late afternoon. The analyst used Patternizr on this complaint and discovered three other complaints with the same MO: jewelry and watches were removed from a locker in a fitness facility in the late afternoon or early evening in the same area. The algorithm likely recognized similarities in the complaints' times of occurrence, their geographic proximity, the property removed (jewelry and watches), the premise types (gyms), and key words in the narrative (e.g., locker). Two suspects were identified through the review of video footage, and the NYPD's investigation is ongoing.

The helpful feedback we have received from our users has highlighted potential improvements to Patternizr. For example, users have requested that the grand larceny algorithm also calculate similarities to petit larceny complaints, which differ from grand larcenies only in the value of the items stolen. Users have also requested the ability to compare across crime types, such as robberies and grand larcenies, where the only difference is whether force was used. We are considering adding this functionality when we deploy a second version of the Patternizr algorithms.

Conclusion

Patternizr is a new, effective, and fair recommendation engine deployed by the NYPD to help investigators group related crimes. To our knowledge, this is the first time such an algorithm has been deployed to production in a law enforcement environment. It joins a growing list of machine learning applications customized for public safety and for the public sector in general. Patternizr is also one of a long line of data-driven tools prototyped and developed at the NYPD, including CompStat (7). These tools, when used properly, encourage precision policing approaches instead of widespread, heavy-handed enforcement techniques, and enable investigators to focus on the art of policing instead of rote and uninteresting work. We expect that other departments, and local agencies in general, will continue this line of research by extracting similar value from existing government data.

Bibliography

1. Walt L. Perry, Brian McInnis, Carter C. Price, Susan C. Smith, and John S. Hollywood. Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation, 2013.
2. Andrew Guthrie Ferguson. The rise of big data policing: Surveillance, race, and the future of law enforcement. NYU Press, 2017.
3. Greg Ridgeway. Policing in the era of big data. Annual Review of Criminology, 1:401–419, 2018.
4. Richard C. Larson. Urban police patrol analysis, volume 28. MIT Press, Cambridge, MA, 1972.
5. Linda V. Green and Peter J. Kolesar. Improving emergency responsiveness with management science. Management Science, 50(8):1001–1014, 2004.
6. Jack Maple. The crime fighter: How you can make your community crime free. Broadway, 2000.
7. Vincent E. Henry. The COMPSTAT paradigm: Management accountability in policing, business and the public sector. Looseleaf Law Publications, 2002.
8. George O. Mohler, Martin B. Short, Sean Malinowski, Mark Johnson, George E. Tita, Andrea L. Bertozzi, and P. Jeffrey Brantingham. Randomized controlled field trials of predictive policing. Journal of the American Statistical Association, 110(512):1399–1411, 2015.
9. E.S. Levine, Jessica Tisch, Anthony Tasso, and Michael Joy. The New York City Police Department's Domain Awareness System. Interfaces, 47(1):70–84, 2017.
10. Richard Berk and Justin Bleich. Forecasts of violence to inform sentencing decisions. Journal of Quantitative Criminology, 30(1):79–96, 2014.
11. Jessica Saunders, Priscillia Hunt, and John S. Hollywood. Predictions put into practice: a quasi-experimental evaluation of Chicago's predictive policing pilot. Journal of Experimental Criminology, 12(3):347–371, 2016.
12. Nelson Lim, Carl Matthies, Brian Gifford, and Greg Ridgeway. To protect and to serve: enhancing the efficiency of LAPD recruiting, volume 881. Rand Corporation, 2009.
13. David Weisburd, Elizabeth R. Groff, Greg Jones, Breanne Cave, Karen L. Amendola, Sue-Ming Yang, and Rupert F. Emison. The Dallas patrol management experiment: can AVL technologies be used to harness unallocated patrol time for crime prevention? Journal of Experimental Criminology, 11(3):367–391, 2015.
14. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, et al. Early intervention systems: Predicting adverse interactions between police and the public. Criminal Justice Policy Review, 29(2):190–209, 2018.
15. Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Learning to detect patterns of crime. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 515–530. Springer, 2013.
16. Michael D. Porter. A statistical approach to crime linkage. The American Statistician, 70(2):152–165, 2016.
17. Robert M. Bell and Yehuda Koren. Lessons from the Netflix prize challenge. SIGKDD Explorations, 9(2):75–79, 2007.
18. New York City Police Department. Citywide seven major felony offenses, 2000–2016, 2017. Accessed 2017-12-20. http://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/seven-major-felony-offenses-2000-2016.pdf
19. Marvin E. Wolfgang, Robert M. Figlio, and Thorsten Sellin. Delinquency in a birth cohort. University of Chicago Press, 1987.
20. Samantha L. Gwinn, Christopher W. Bruce, Steven R. Hick, and Julie P. Cooper. Exploring crime analysis: Readings on essential skills. International Association of Crime Analysts, 2008.
21. E.S. Levine and J.S. Tisch. Analytics in action at the New York City Police Department's Counterterrorism Bureau. Military Operations Research, 19(4):5–14, 2014.
22. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: prediction, inference and data mining. Springer-Verlag, New York, 2009.
23. Shyam Boriah, Varun Chandola, and Vipin Kumar. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 243–254. SIAM, 2008.
24. James H. Martin and Daniel Jurafsky. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 2009.
25. Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
26. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
27. James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
28. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.
29. Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14–19, 2016.
30. Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.
31. Andrew Guthrie Ferguson. Policing predictive policing. Wash. U. L. Rev., 94:1109, 2016.
32. Debra Piehl. Conversation with Evan Levine, June 22, 2018.
33. Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to speech recognition, computational linguistics and natural language processing. Prentice Hall, Upper Saddle River, NJ, 2008.

Acknowledgements

The authors acknowledge the support of the men and women of the New York City Police Department. In particular, the authors thank Police Commissioner James P. O'Neill, the former Police Commissioner William J. Bratton, Chief Lori Pollock, Chief Dermot Shea, Deputy Chief Timothy McCormack, Inspector Anthony Tasso, Deputy Inspector Brandon del Pozo, Captain John Hall, Lt. Juan Duque, Lt. Phillip McEneaney, Sgt. Christopher Bray, Sgt. Joseph Klubnick, Sgt. Sean McCarthy, Deputy Commissioner Jessica Tisch, Assistant Commissioner Richard Schroeder, Assistant Commissioner Ronald Wilhelmy, Director Martha Norrick, Daniel Montiel, and Debra Piehl. Special thanks to Tong Wang, Cynthia Rudin, Lt. Daniel Wagner, Theo Damoulas, Ravi Shroff, and Joe McLaughlin.

Alex Chohlas-Wood's current affiliation is Stanford University, Stanford, CA.

Appendix A. Feature Extraction

In this section, we provide the mathematical details involved in our feature extraction process.

Categorical Attributes (k = 16 through k = 24). Define N to be the number of documents in the corpus. First, we count the number of documents in the corpus where attribute k has the value x; we label these counts n_x^k. Then

$$
X_{ij}^{k} =
\begin{cases}
1 - \sum\limits_{y \,:\, n_y^k \le n_{\mathrm{match}}^k} \dfrac{n_y^k\,(n_y^k - 1)}{N(N-1)} & \text{if documents } i \text{ and } j \text{ match on attribute } k, \\
0 & \text{otherwise,}
\end{cases}
$$

where the document count n_match^k is defined as the number of documents in the corpus that have the same value of attribute k as documents i and j (23).
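A direct translation of this formula into code, assuming the corpus is given as a flat list of attribute values (one per document); the weapon-type example data are invented for illustration.

```python
from collections import Counter

def goodall_similarity(value_i, value_j, corpus_values):
    """Simplified Goodall similarity: matches on rarer values score higher."""
    if value_i != value_j:
        return 0.0
    counts = Counter(corpus_values)
    N = len(corpus_values)
    n_match = counts[value_i]
    # sum over all values y at least as rare as the matched value
    penalty = sum(n * (n - 1) for n in counts.values() if n <= n_match)
    return 1.0 - penalty / (N * (N - 1))

corpus = ["knife"] * 50 + ["firearm"] * 45 + ["hypodermic needle"] * 5
print(goodall_similarity("hypodermic needle", "hypodermic needle", corpus))  # rare match: ~0.998
print(goodall_similarity("knife", "knife", corpus))                          # common match: ~0.551
```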

Suspect Height and Weight Attributes (k = 25 and k = 26). Call the minimal absolute difference between the value of a continuous variable associated with suspect h of complaint i and that of any suspect in complaint j a_ijh. In mathematical form, using the weight attribute w_ih of suspect h in complaint i as an example,

$$
a_{ijh} = \min_{f}\left[\,\lvert w_{ih} - w_{jf} \rvert\,\right],
$$

where the minimum function runs over all suspects f in complaint j. For example, if complaint 1 has suspects of weights 140 pounds (lbs), 180 lbs, and 220 lbs, and complaint 2 has suspects of weights 150 lbs and 215 lbs, then a_121 = 10 lbs, a_122 = 30 lbs, and a_123 = 5 lbs. After calculating these comparisons, we transform them into a similarity via

$$
X_{ij}^{k} = \max\left[\exp\!\left(-\frac{P_i \sum_{h=0}^{P_i} a_{ijh}}{P_i + P_j}\right),\ \exp\!\left(-\frac{P_j \sum_{h=0}^{P_j} a_{jih}}{P_i + P_j}\right)\right],
$$

where P_i is the number of suspects for complaint i.
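A sketch that mirrors the formulas above, checked against the worked weight example from the text. The reconstruction of the exponent from the typeset original is our reading; note that on raw pounds the exponential is vanishingly small, so any upstream scaling of the attributes is not shown here.

```python
import math

def min_diffs(ws_i, ws_j):
    # a_ijh: each suspect h in complaint i matched to the closest suspect in j
    return [min(abs(w - v) for v in ws_j) for w in ws_i]

def group_similarity(ws_i, ws_j):
    p_i, p_j = len(ws_i), len(ws_j)
    s_ij = math.exp(-p_i * sum(min_diffs(ws_i, ws_j)) / (p_i + p_j))
    s_ji = math.exp(-p_j * sum(min_diffs(ws_j, ws_i)) / (p_i + p_j))
    return max(s_ij, s_ji)

print(min_diffs([140, 180, 220], [150, 215]))  # -> [10, 30, 5], as in the text
```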

Suspect Force Used (k = 27). In this case, a_ijh is just the number of suspects in complaint j with the same value of the categorical feature as suspect h in complaint i:

$$
X_{ij}^{k=27} = \max\left[\frac{P_i \sum_{h=0}^{P_i} a_{ijh}}{P_i + P_j},\ \frac{P_j \sum_{h=0}^{P_j} a_{jih}}{P_i + P_j}\right].
$$

Unstructured Text for Robberies and Grand Larcenies (k = 32 through k = 37). Treating each attribute k separately, let the term frequency tf_xi be the count of word x in document i, let the document frequency df_x be the number of documents in the corpus containing word x, and let M be the number of unique words in the corpus. Then, for each word x, we calculate the inverse document frequency (33):

$$
\mathrm{idf}_x = \log\!\left(\frac{N}{\mathrm{df}_x}\right) + 1.
$$

The "+1" in this equation ensures that words that occur in every document in the training corpus will not receive a zero for inverse document frequency, which would preclude those words from contributing to the eventual cosine similarity. For each word x and document i, we calculate

$$
s_{xi} = \mathrm{tf}_{xi} \times \mathrm{idf}_x.
$$

We then build the TF-IDF vector S_i for each document i:

$$
S_i = \left[\, s_{0i}, \ldots, s_{Mi} \,\right].
$$

Finally, for each pair of documents i and j, we calculate the cosine similarity between the two TF-IDF vectors:

$$
X_{ij}^{k} = \frac{S_i \cdot S_j}{\lVert S_i \rVert\, \lVert S_j \rVert}.
$$
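A self-contained translation of this TF-IDF cosine similarity, restricted to the training vocabulary as the idf definition implies; the whitespace tokenizer and the toy corpus are assumptions.

```python
import math
from collections import Counter

def tfidf_cosine(doc_i, doc_j, corpus):
    """Cosine similarity of TF-IDF vectors with idf_x = log(N / df_x) + 1."""
    N = len(corpus)
    docs = [Counter(d.lower().split()) for d in corpus]
    vocab = set().union(*docs)
    df = {x: sum(1 for d in docs if x in d) for x in vocab}
    idf = {x: math.log(N / df[x]) + 1 for x in vocab}

    def vec(text):
        tf = Counter(text.lower().split())
        return {x: tf[x] * idf[x] for x in tf if x in idf}

    si, sj = vec(doc_i), vec(doc_j)
    dot = sum(si[x] * sj.get(x, 0.0) for x in si)
    norm = math.sqrt(sum(v * v for v in si.values())) * \
           math.sqrt(sum(v * v for v in sj.values()))
    return dot / norm if norm else 0.0

corpus = ["took a drill", "threatened with needle", "drill and needle taken"]
print(tfidf_cosine(corpus[0], corpus[2], corpus))  # shared "drill" gives a modest score
```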

Appendix B. Random Forest Model Training

Trees are recursively built by identifying the best feature k and split s at each node m which minimize the Gini index, a measure of node purity, for the two descendent nodes (22). Let R be the subset of feature values X^k left or right of the split s, and let p_qm be the proportion of class q in node m. Then

$$
R_{\mathrm{left}}(k, s) = \{ X \mid X^k \le s \},
$$
$$
R_{\mathrm{right}}(k, s) = \{ X \mid X^k > s \},
$$
$$
p_{qm} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = q),
$$

where p_qm is the estimated sample proportion of class q in left or right node m, N_m is the number of samples in node m, and I(y_i = q) is an indicator function that returns 1 when sample i belongs to class q and returns 0 otherwise. We set q = 1 (i.e., positive examples) and find the combination of k and s that minimizes the Gini index as follows:

$$
\min_{k,s} \left[\, 2\big(p_{1,\mathrm{left}}(1 - p_{1,\mathrm{left}})\big) + 2\big(p_{1,\mathrm{right}}(1 - p_{1,\mathrm{right}})\big) \,\right].
$$

In addition, the group of features K_m available for splitting at each node is chosen from a randomly selected subset of all available features K. Some or all of these features may be available for splitting at each node; from these available features, the model selects the feature and split that best separate the child nodes into purer groups of either in-pattern or unrelated crime pairs.
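A sketch of the resulting best-split search at a single node, using the unweighted two-child Gini sum exactly as printed above; the exhaustive threshold scan over unique feature values is an illustrative simplification.

```python
import numpy as np

def best_split(X, y, feature_subset):
    """Return (feature k, split s, Gini index) minimizing the two-child Gini sum."""
    best = (None, None, np.inf)
    for k in feature_subset:
        for s in np.unique(X[:, k]):
            left, right = y[X[:, k] <= s], y[X[:, k] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            gini = 0.0
            for child in (left, right):
                p1 = np.mean(child == 1)      # proportion of in-pattern pairs
                gini += 2.0 * p1 * (1.0 - p1)
            if gini < best[2]:
                best = (k, s, gini)
    return best
```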


  • Feature Extraction
  • Random Forest Model Training
Page 3: A Recommendation Engine to Aid in Identifying Crime Patternscynthia/docs/Patternizr.pdf · A Recommendation Engine to Aid in Identifying Crime Patterns Alex Chohlas-Wood* andE.S.

No Attribute Type Similarity measure1 Location (XY) Location Distance apart (Euclidean)2 Location (XY) Location Distance apart (exponential)3 Location (XY) Location Binned midpoint (longitudinal)4 Location (XY) Location Binned midpoint (latitudinal)5 Location (XY) Location Location frequency (sum)6 Location (XY) Location Location frequency (maximum)7 Location (XY) Location Location frequency (minimum)8 Location (XY) Location Location frequency (product)9 Date-time of occurrence Date-time Time of day similarity10 Date-time of occurrence Date-time Time of week similarity11 Date-time of occurrence Date-time Size of occurrence windows (larger)12 Date-time of occurrence Date-time Size of occurrence windows (smaller)13 Date-time of occurrence Date-time Size of occurrence windows (difference)14 Date-time of occurrence Date-time Days apart15 Date-time of arrest Date-time Days between arrest and crime16 Premise type Categorical Categorical similarity17 Crime classification Categorical Categorical similarity18 MO categorical fields Categorical Categorical similarity19 MO weapon Categorical Categorical similarity20 Location details Categorical Categorical similarity21 Firearm discharged Categorical Categorical similarity22 Crime attempted or completed Categorical Categorical similarity23 Domestic crime indicator Categorical Categorical similarity24 Medical assistance required Categorical Categorical similarity25 Suspect height(s) Suspect Group similarity (continuous)26 Suspect weight(s) Suspect Group similarity (continuous)27 Suspect force used Suspect Group similarity (categorical)28 Suspect count Suspect Difference29 MO victim count Suspect Difference30 Complaint narrative (burg only) Unstructured text Average similarity31 Complaint narrative (burg only) Unstructured text Sum of similarity32 Complaint narrative (rob and GL) Unstructured text Cosine similarity33 MO suspect statement (rob and GL) Unstructured text Cosine similarity34 MO victim actions (rob and GL) Unstructured text Cosine similarity35 MO method of flight (rob and GL) Unstructured text Cosine similarity36 Premise name (rob and GL) Unstructured text Cosine similarity37 Property taken (rob and GL) Unstructured text Cosine similarity38 All unstructured text Unstructured text Cosine similarity39 Complaint narrative Unstructured text Rare-word matches

Table 1 List of the features included in Patternizr These features are indexed by the variable k in the text We use the following abbreviations ldquoburgrdquo represents burglaryldquorobrdquo represents robbery and ldquoGLrdquo represents grand larceny

(eg the count of suspects who have committed a crime)however we excluded sensitive suspect attributes (eg race)Suspect descriptions are particularly useful in identifying pat-terns of violent crimes such as robberies because in suchcrimes the victim physically encounters the suspect Finallywe added features for the similarity of unstructured text nar-ratives including the cosine similarity of vector representa-tions of the descriptions as well as the number of rare-wordmatches between two descriptions These features were in-

cluded as a reflection of the central role that unstructured textplays in the manual identification of patterns

Many features represent similar but distinct representa-tions of the underlying data This allows the algorithm toleverage as much available information as possible We con-structed three slightly different feature sets for each crimetype based on the characteristics relevant available andfound to be important for that crime type For example theMO categorical field for ldquomethod of entryrdquo is relevant toburglaries but not for robberies or grand larcenies


Location Features. We incorporate eight similarities (labeled with indices k = 1 through k = 8) derived from location information, including a strict Euclidean distance d (in feet) between the crime pair and an exponential factor e^(−d). We also include two location features corresponding to longitude and latitude, but binned to very coarse squares (over nine miles on a side, for a total of roughly 11 boxes possible across the geography of New York City) to minimize the possibility that higher-resolution geographies could be used by the model as a proxy variable for sensitive attributes (such as race). For each pair of crimes, we note which bin contains the pair's midpoint and include this as a feature. We also use a two-dimensional location frequency distribution, calculated with kernel density estimation (KDE) (22), which represents a heatmap of where crime is most (and least) common in the city. For each pair, we add a feature for the sum, maximum, minimum, and product of these KDE values, providing the model with a range of information on how common the locations were for both crimes.
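A minimal sketch of these eight pairwise location features follows, assuming planar coordinates in feet and a fitted two-dimensional KDE (for example, scipy.stats.gaussian_kde). The names, the bin origin, and the scaling of the exponential factor are illustrative assumptions, not the production implementation.

```python
import numpy as np

def location_features(xy_a, xy_b, kde, bin_size_ft=9 * 5280):
    """Sketch of features k = 1 through k = 8 for one crime pair.

    xy_a, xy_b: planar coordinates of the two crimes, in feet.
    kde: fitted 2-D density (e.g., scipy.stats.gaussian_kde) over crime locations.
    """
    xy_a, xy_b = np.asarray(xy_a, float), np.asarray(xy_b, float)
    d = np.linalg.norm(xy_a - xy_b)                # k=1: Euclidean distance (feet)
    midpoint = (xy_a + xy_b) / 2.0
    f_a, f_b = kde(np.column_stack([xy_a, xy_b]))  # KDE "heatmap" values at each crime
    return {
        "dist_euclidean": d,                                  # k=1
        "dist_exponential": np.exp(-d),                       # k=2 (scaling of d is assumed)
        "mid_bin_lon": np.floor(midpoint[0] / bin_size_ft),   # k=3: coarse binned midpoint
        "mid_bin_lat": np.floor(midpoint[1] / bin_size_ft),   # k=4
        "freq_sum": f_a + f_b,                                # k=5
        "freq_max": max(f_a, f_b),                            # k=6
        "freq_min": min(f_a, f_b),                            # k=7
        "freq_prod": f_a * f_b,                               # k=8
    }
```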

Date-Time Features. Seven separate similarities (k = 9 through k = 15) depend on the date-time attributes of the two complaints. The time-of-day similarity measure, which compares the time of day of the two crimes' occurrences, is drawn directly from Wang et al. (15). Following their approach, we calculate the frequency distribution of occurrence times for each crime type using KDE. Then, for each pair, we weight the difference between occurrence times by the frequency of those occurrence times. Crimes that occur close in time to each other and at rare times will have a low measure, indicating greater similarity between crime times. Crimes that occur far apart in time and during common times will have a high measure, indicating lower similarity between crime times. Alternatively, crimes that are a certain distance apart in time but occur during rare times will be more similar than crimes that are the same distance apart in time but occur during common times. This calculation can also be applied to occurrence times specified over an occurrence window, which is typical for crimes such as burglaries, where the victim often is not present to witness the crime and therefore knows only that the crime happened within some time span. Comparisons across occurrence windows are made by dividing the window evenly into ten points and averaging similarities across all ten points. We applied the same process to time-of-week similarities. In addition, we included features for the duration of the larger and smaller occurrence windows from the pair of crimes, as well as the difference in size of the two occurrence windows. We included a simple feature, days apart, calculated between the midpoints of the time ranges if the occurrence time was over a range. Finally, we included a feature for the time between an arrest associated with the earlier crime and the occurrence of the later crime, so that the model could have information about whether a criminal was potentially incarcerated.
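The sketch below illustrates one plausible reading of this weighting: a circular time-of-day gap multiplied by how common the two times are, averaged over ten points per occurrence window. The `tod_density` callable is an assumed stand-in for the per-crime-type KDE; the exact functional form used in production may differ.

```python
import numpy as np

def time_of_day_measure(window_a, window_b, tod_density, n_points=10):
    """Sketch of the time-of-day measure (k = 9).

    window_a, window_b: (start_hour, end_hour) occurrence windows on a 0-24 clock.
    tod_density: assumed callable giving the estimated frequency of a time of day.
    Lower values indicate greater similarity (a small gap at rare times).
    """
    pts_a = np.linspace(window_a[0], window_a[1], n_points)
    pts_b = np.linspace(window_b[0], window_b[1], n_points)
    measures = []
    for ta in pts_a:
        for tb in pts_b:
            gap = min(abs(ta - tb), 24.0 - abs(ta - tb))      # circular gap in hours
            rarity_weight = (tod_density(ta) + tod_density(tb)) / 2.0
            measures.append(gap * rarity_weight)              # common times inflate the gap
    return float(np.mean(measures))
```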

Categorical Features. Much of the information in a complaint is in categorical form, including premise type, crime classification, the MO itself, weapon type, and details about the crime's location. Other information is stored in categorical fields that have yes/no/unknown structures; examples include whether a firearm was discharged during the crime, whether the crime was attempted or completed, whether someone required medical assistance, and whether the crime was domestic in nature. For all our categorical features (k = 16 through k = 24), we incorporate a simplified version of Goodall's similarity measure (23), identical to that used by Wang et al. (15). This measure accounts for the frequency of categorical values, causing two records that match on a rare value to be rated as more similar than two records that match on a common value. For categorical attributes that may take on multiple values for a single crime (e.g., multiple weapon types may be associated with a single robbery), possible values are encoded as dummy variables (22) before the categorical similarity calculation. In addition, many categories are too fine grained to be useful, because Goodall's simplified similarity measure is only nonzero when categories match exactly on both complaints. For example, NYPD crime classifications for burglaries include three separate categories for residential burglary, even though a perpetrator with the same MO may plausibly commit crimes across all three types. To address this issue, we asked uniformed police experts to group the existing classification into broader categories using domain knowledge. We included both the ungrouped and grouped crime classification similarities as features for the algorithm.
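The simplified Goodall measure is spelled out in Appendix A; the sketch below implements it directly for a single-valued categorical attribute, as an illustration rather than the production code.

```python
from collections import Counter

def goodall_similarity(value_i, value_j, corpus_values):
    """Simplified Goodall similarity for one categorical attribute (see
    Appendix A): matches on rare values score closer to 1; non-matches are 0."""
    if value_i != value_j:
        return 0.0
    counts = Counter(corpus_values)          # n_y for every observed value y
    n = len(corpus_values)                   # N, the corpus size
    n_match = counts[value_i]
    penalty = sum(c * (c - 1) for c in counts.values() if c <= n_match)
    return 1.0 - penalty / (n * (n - 1))

# A match on a rare premise type outscores a match on a common one:
corpus = ["residence"] * 90 + ["construction site"] * 10
print(goodall_similarity("construction site", "construction site", corpus))  # ~0.99
print(goodall_similarity("residence", "residence", corpus))                  # ~0.18
```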

Suspect Features. We chose to include only nonsensitive suspect attributes, including height, weight, force used, and count of suspects, as suspect features for Patternizr. Suspect-attribute comparison differs from categorical comparison because it is often a many-to-many comparison for a crime pair. It is generally not possible to know from two lists of suspects which descriptions correspond to the same individual; therefore, comparisons must be made between all possible pairs of individuals (with each list providing one of the pair). For example, for the height and weight continuous variables (k = 25 and k = 26), we calculated the minimum difference between each possible pair of suspect descriptions and then divided the sum of differences by the quantity of suspects. For the categorical feature of force used (i.e., whether the suspect threatened to use force or actually used force in the commission of a robbery), we counted and tabulated the number of matches between possible suspect pairs (k = 27) in a similar fashion. We also included the difference in suspect count and victim count (which is a characteristic of the MO) as features (k = 28 and k = 29).
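The sketch below follows our reconstruction of the corresponding Appendix A formula, reproducing its worked weight example. The exact normalization inside the exponential, and any unit scaling, is uncertain from the source, so treat this as illustrative only.

```python
import numpy as np

def suspect_attr_similarity(values_i, values_j):
    """Sketch of the continuous suspect comparison (k = 25, k = 26), per our
    reading of the Appendix A formula; the exact normalization is uncertain."""
    vi, vj = np.asarray(values_i, float), np.asarray(values_j, float)
    p_i, p_j = len(vi), len(vj)
    a_ij = np.array([np.min(np.abs(h - vj)) for h in vi])  # a_ijh: closest match per suspect
    a_ji = np.array([np.min(np.abs(h - vi)) for h in vj])  # a_jih
    total = p_i + p_j
    # Note: with raw pounds the exponent is large; any rescaling is unspecified here.
    return max(np.exp(-p_i * a_ij.sum() / total),
               np.exp(-p_j * a_ji.sum() / total))

# Worked example from Appendix A: minimal weight differences of 10, 30, and 5 lbs.
print(suspect_attr_similarity([140, 180, 220], [150, 215]))
```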

Unstructured Text Features. We also created a number of unstructured text features, including comparisons of the complaint text narrative, which plays a central role in the manual identification of patterns. Unstructured text also comprises the suspect's statements, victim actions, method of flight, premise name, and property taken. For burglaries, we calculate a score identical to the categorical similarities, but on a word-by-word basis rather than an attribute basis, for words that match between the narratives on both complaints. We then provide an average and sum of these similarities (k = 30 and k = 31).

For robberies and grand larcenies, we use a more advanced method on other unstructured fields in addition to the narratives. We calculate the term-frequency/inverse-document-frequency (TF-IDF) (24) vector, representing the presence and importance of the set of words that comprise each unstructured text. We compute a cosine similarity between the pair of TF-IDF vectors for each unstructured text attribute (labeled with indices k = 32 through k = 37).

We also combine all unstructured text from each complaint and treat it as a separate unstructured text attribute; this permits information that appears fragmented across different narratives to be compared at once across all possible sources. For this combined narrative, we calculate the cosine similarity (k = 38) and also count the number of rare words that did not appear in the training corpus but do appear in both complaints, and we include that count as a feature (k = 39). These unusual words may indicate the presence of a consistent and unusual MO.
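As an illustration, the sketch below computes a TF-IDF cosine similarity (in the spirit of k = 38) and the rare-word match count (k = 39) with scikit-learn. The tiny placeholder corpus and the whitespace tokenization are assumptions; Appendix A gives the exact TF-IDF definition used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder corpus standing in for years of historical complaint narratives.
training_narratives = [
    "perp removed property from victim bag on subway platform",
    "unknown perp forced rear door and removed electronics",
]
vectorizer = TfidfVectorizer()
vectorizer.fit(training_narratives)

def text_features(text_i, text_j):
    """Sketch of k = 38 (cosine similarity on combined text) and k = 39
    (count of words unseen in the training corpus that appear in both)."""
    tfidf = vectorizer.transform([text_i, text_j])
    cos = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    vocab = set(vectorizer.vocabulary_)
    rare_i = set(text_i.lower().split()) - vocab
    rare_j = set(text_j.lower().split()) - vocab
    return cos, len(rare_i & rare_j)   # (k = 38, k = 39)
```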

Sample Selection
A very small proportion of all possible crime pairs (about 8×10^−8, 3×10^−8, and 9×10^−9 of possible burglary, robbery, and grand larceny pairs, respectively) are together in manually identified patterns. We refer to these crime pairs as positive pairs. To preserve maximum information gain from the corpus of patterns, while also permitting model training within memory constraints, we included all positive pairs in the training set and performed down-sampling on the set of possible negative pairs.

Purely random sampling of negative pairs would largely consist of pairs that are far apart in space and time, encouraging the model to identify positives based solely on proximity in those dimensions. To counteract this effect, we sampled nearby negative pairs at a higher rate than distant negative pairs. We paired every pattern crime with a random sample of 2% of all negative examples within 3,500 feet and an additional random sample of 0.1% of all crimes within 80 days of the crime's midpoint in time. These were sampled uniformly: all negative examples within 3,500 feet or within 80 days of the crime's midpoint in time were equally likely to be selected. We repeatedly tested different thresholds until models trained on these data rated nonspatial or nontemporal features roughly as important as distance and time apart (as measured by feature importances). Finally, a substantial portion of distant negative pairs that included one crime known to be part of a pattern was also selected at random, approximately 20 times the volume of already selected positive and nearby pairs for that crime.
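A sketch of this scheme for a single pattern crime follows. The candidate lists and the bookkeeping of already-selected pairs are simplifying assumptions layered on the rates quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_negatives(near_ids, recent_ids, distant_ids, n_already_selected):
    """Sketch of the negative-pair down-sampling for one pattern crime.

    near_ids:    negative candidates within 3,500 feet (sampled at 2%)
    recent_ids:  negative candidates within 80 days (sampled at 0.1%)
    distant_ids: remaining distant negative candidates
    n_already_selected: positive pairs already kept for this crime
    """
    near = [c for c in near_ids if rng.random() < 0.02]
    recent = [c for c in recent_ids if rng.random() < 0.001]
    # Distant negatives: roughly 20x the volume of pairs already selected.
    k = min(len(distant_ids), 20 * (n_already_selected + len(near) + len(recent)))
    distant = list(rng.choice(distant_ids, size=k, replace=False)) if k else []
    return near + recent + distant
```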

We subsequently uniformly sampled this set to fit in memory for model training. We trained the model for each crime type on 2.0–3.2 million pairs of crimes. For burglaries, robberies, and grand larcenies, 0.2%, 0.08%, and 0.06% of this reduced set of pairs were positive, respectively.

Model Training
For each of the three crime types, we relied on a random forest model to calculate an overall similarity score between two crimes. We chose a random forest model for several reasons. First, tree-based classification algorithms are able to learn interactions between features, which is important when considering how separate crime subtypes could exhibit different similarity structures. For example, grand larceny pickpocketing patterns may generally occur much closer in space (e.g., on a specific street corner) when compared with grand larceny shoplifting patterns, which may generally happen across the city. Second, random forests have demonstrated an ability to achieve impressive performance while avoiding overfitting (25). Third, random forests are implemented in many standard machine learning libraries, can be trained and executed in parallel, and do not require graphics processing units to execute. Finally, similar to many other classification algorithms, random forests generate a "score"; in Patternizr's case, this represents the confidence that the pair belongs together in a pattern. These similarity scores are scaled between 0 and 1; therefore, they can be naturally represented to an investigator in the form of a percentage.

We used the standard implementation of the random forest algorithm in the Python library scikit-learn (26). Each random forest model is an ensemble of hundreds of decision trees that are trained on the calculated features and responses. We trained each individual tree on an independently bootstrapped sample of the training set, with a randomized portion of the full set of features K. We describe the mathematical details of how we trained the trees in Appendix B.
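A minimal sketch of this setup with scikit-learn follows, on synthetic stand-in data; the hyperparameter values shown are placeholders rather than the tuned production settings described next.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the real inputs: one row per crime pair, one
# column per feature k; label 1 marks a positive (in-pattern) pair.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 39))
y_train = np.zeros(1000, dtype=int)
y_train[:20] = 1   # positives are rare, as described above

model = RandomForestClassifier(
    n_estimators=220,     # the text later notes up to 220 trees per crime type
    max_features="sqrt",  # a randomized subset of features at each split
    n_jobs=-1,            # trees can be built in parallel
)
model.fit(X_train, y_train)

# The pairwise similarity score is the predicted probability of the positive
# class, naturally scaled between 0 and 1.
scores = model.predict_proba(X_train)[:, 1]
```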

Each random forest model also has a collection of settings, known as hyperparameters, that specify model structure and operation. We optimized several hyperparameters for the random forest model, including the size of the randomized subset of features and the maximum allowed tree depth. We used a cross-validated randomized hyperparameter search (27), which repeatedly selects a random combination of hyperparameters from a prespecified distribution of values and then tests the performance of this combination on several subsets of the data, preventing overfitting to any particular subset.

Algorithm       Completely rebuilt   Completely rebuilt   Patterns with match   Patterns with match
                patterns (no.)       patterns (%)         in top 10 (no.)       in top 10 (%)
Burglary        55 (43–67)           37 (29–45)           123 (114–132)         82 (76–88)
Robbery         39 (29–49)           26 (19–33)           117 (107–126)         78 (71–84)
Grand larceny   41 (30–51)           27 (20–34)           114 (103–124)         76 (69–83)

Table 2. Accuracy of Patternizr on 150 test patterns. The average performance is listed first, followed by the 95% bootstrapped confidence interval of the performance in parentheses.


Fig. 2. Performance of Patternizr on 150 test patterns. Each position on the x axis corresponds to a test pattern, and each dot corresponds to the ranking of a crime in that pattern. Patterns are ordered left to right by the average precision score for that pattern as a visual aid; Patternizr did best on the patterns that appear to the left side of each chart. Dashed lines indicate the proportion of test patterns with complete reconstruction (left of the vertical red line) and the patterns with a match in the top ten rankings (under the horizontal orange line). Roughly one-third of the test patterns are completely reconstructed (as we also detail in Table 2); that is, no complaints outside of the pattern are ranked higher than a complaint included in the pattern. Roughly four-fifths of the test patterns have at least one of the complaints in the pattern ranked in the top ten.

For each crime type, we trained one random forest model with up to 220 trees, using the generated features and tuned hyperparameters.

We set our hyperparameter search to optimize hyperparameters for the best possible average precision (28), an evaluation metric that gauges whether desired matches are ranked highly in a query. In Patternizr's case, the query is the list of complaints returned by Patternizr when the user chooses a seed. Average precision relies on the precision metric, which measures the proportion of a predicted positive class that is actually positive. For Patternizr, precision is the percentage of suggested matches that are actually in a pattern with the seed. Each true match in Patternizr's ranked list can be represented by the precision at that rank: the number of true matches encountered so far, divided by the rank (i.e., the total number of suggested matches provided by Patternizr at that point). Average precision, therefore, is simply an average of these precisions for all true matches in the test pattern.
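The sketch below implements average precision exactly as defined here; in scikit-learn, the same objective can drive a randomized search via RandomizedSearchCV(..., scoring="average_precision").

```python
def average_precision(ranked_is_match):
    """Average of precision-at-rank over the ranks of the true matches,
    mirroring the definition in the text."""
    precisions, hits = [], 0
    for rank, is_match in enumerate(ranked_is_match, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    return sum(precisions) / len(precisions) if precisions else 0.0

# Matches at ranks 1 and 3: (1/1 + 2/3) / 2 ~= 0.83.
print(average_precision([True, False, True, False]))
```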

Performance

We tested the performance of Patternizr on a variety of objectives. First, we measured the proportion of test patterns that were entirely or partially rebuilt by the models, even though the models had not previously seen these test patterns. Second, we constructed a benchmark model as a simplified proxy for existing practice, to test whether investigators using Patternizr would be more efficient than those using existing methods. Third, we examined the features each model had isolated as most useful in identifying patterns and compared those with investigators' intuition. Finally, we measured whether Patternizr made consistent recommendations across racial groups.

We validated all three Patternizr models on official patterns that were not included in the training set. For each crime type, we randomly chose 150 test patterns and randomly selected a seed from each pattern. We then noted the rankings of the remaining pattern crimes, as provided by Patternizr's similarity scores. We bootstrapped this process by randomly selecting these 150 test patterns 1,000 times, which provided reliable estimates and uncertainty bounds. Table 2 shows the results from this analysis.

The algorithm completely rebuilt (i.e., ranked all in-pattern pairs ahead of all nonrelated pairs) approximately one-third of test patterns. Additionally, approximately four-fifths of burglary, robbery, and grand larceny patterns had at least one match in the top ten rankings, out of the entire corpus of complaints for that crime type. Figure 2 shows the rankings for all test patterns.

Fig. 3. Performance of Patternizr against a simple baseline. Each dot represents a pair of complaints, where one half of the pair is a seed from the test set and the other half of the pair is another complaint in that pattern. Model performance is better than baseline performance for each pair of sample complaints that lies below the diagonal line; that is, the algorithm gives a ranking closer to the top of the list than the baseline algorithm.


Fig. 4. Feature importances for Patternizr. Feature importance is one way to gauge the relative contribution of different features to the overall classification of a random forest. Feature importances are calculated by measuring the Gini impurity reduction at each node at which that feature is used, weighting these reductions by the number of samples that were routed to that node, and then averaging these calculations for each feature across all trees in the forest in which that feature is used.

In reality, this performance is likely an underestimation, given that investigators have likely missed actual matches in the historical corpus.

To test whether Patternizr improves investigator efficiency, we built a simple baseline algorithm for comparison. The baseline algorithm assumes that investigators examine complaints nearest in time to the seed complaint first, while always limiting themselves to complaints in the same precinct as the chosen seed. This method provides a baseline ranked list that we can compare with Patternizr's ranked results. We restricted the comparison to local patterns in the test set, because the baseline algorithm would perform poorly (and unrealistically) on patterns that span multiple precincts. For each local pattern in our test data set, we compare the ranking each pattern complaint receives from the baseline with its ranking in the list generated by Patternizr (Figure 3).

We found that, for all three crime types, Patternizr is typically an improvement on the baseline algorithm. This estimate is conservative because Patternizr is especially helpful on patterns spanning multiple precincts.

In Figure 4, we show the top ten feature importances for each model. The features for distance, days apart, and narrative information play a top role in all three models. NYPD's pattern subject matter experts confirmed that these three features contain key information for pattern creation. Other top features include similarity between other unstructured text fields and time-of-day and day-of-week similarity. None of the top ten features for each crime type was particularly surprising to our subject matter experts. Note that these calculations only measure importance; they do not indicate how positive or negative changes in the feature will affect the model score.

It is important to measure whether the recommendations that Patternizr makes are not only valuable but also fair, particularly given the growing concern that predictive policing tools may perpetuate disparate impact (29–31). We intentionally designed the algorithm to minimize disparate impact on any specific group. First, the algorithm is completely blind to sensitive information about potential suspects, including race and gender, which were not included as similarity features for the predictive model. Second, we kept potential proxy variables for sensitive information, particularly location, extremely coarse, to ensure that correlation with sensitive attributes had a very low degree of certainty while retaining some very general information about location. Finally, and most important, several levels of expert human review are still required to establish a pattern, minimizing the potential for a seemingly likely (but incorrect) recommendation to result in any enforcement action.

To confirm that these precautions were effective, we tested whether any of the three models recommended pairs with a certain racial group at a different frequency than either random pairing or existing identified patterns. We randomly selected 5–10 million test pairs from each crime type and drew 1,000 bootstrapped samples from each of these samples to estimate uncertainty (Figure 5).
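A sketch of the bootstrap for one racial group follows; the input array is an assumed representation of the sampled pairs, not the production data structure.

```python
import numpy as np

def bootstrap_ci(pair_has_group, n_boot=1000, seed=0):
    """Sketch of the bootstrap: `pair_has_group` is an assumed boolean array,
    one entry per sampled pair, True when the pair includes a suspect
    description of the racial group under test."""
    rng = np.random.default_rng(seed)
    flags = np.asarray(pair_has_group)
    props = [rng.choice(flags, size=flags.size, replace=True).mean()
             for _ in range(n_boot)]
    return np.percentile(props, [2.5, 97.5])   # 95% confidence interval
```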

Fig. 5. Proportion of pairs by race for random, historical pattern, and Patternizr-recommended pairs. Confidence intervals are calculated from the range of values that appear in 1,000 bootstrapped samples. Racial categories correspond to the official descriptions listed on NYPD complaint forms. Notably, many types of grand larceny will not contain suspect information because there were no witnesses to the crime, thus producing relatively higher levels of uncertainty for grand larceny than for the other two crime types.


Fig. 6. Screenshot of Patternizr in the NYPD Domain Awareness System. The map of the seed and most similar complaints is in the left panel, the seed complaint is in the middle panel, and the most similar complaints are in the right panel. Grey boxes obscure law enforcement sensitive and personally identifiable information.

For all three crime types, the proportion of Patternizr pairs with a similarity score greater than 0.5 that included a suspect description of any given race was either consistent within the 95% confidence limit with, or lower than, that of a random sampling of all pairs. That is, this analysis uncovered no evidence that Patternizr recommends any suspect race at a higher rate than exists with random pairing.

Implementation

We incorporated finalized models into the back end of the department's Domain Awareness System (DAS), a citywide network of sensors, databases, devices, software, and infrastructure (9, 21). We processed all historical pairs of complaints in parallel in the cloud, against ten years of records for burglaries and robberies and against three years of grand larcenies. This historical load took 1.94 days on approximately 1,600 cores. For new and revised complaints, similarity scores are calculated and updated three times each day. Each new or revised crime is scored against the entire corpus of crimes before it is incorporated into DAS for querying by users throughout the department.

Officers and analysts can access Patternizr through a custom software application, which is part of DAS, on NYPD desktop computers. To use Patternizr, an investigator presses a "Patternize" button on any seed complaint to retrieve all available calculated similarity scores. Investigators are presented with a rank-ordered list of results, from most to least similar to the seed, with ten results displayed at a time. For each result, the distance, time apart, and algorithm-calculated similarity score are all displayed. A map displays the seed and similar results (Figure 6), and users can read a few important details about each complaint before choosing to examine the seed and result complaints side by side in full detail. The use of a ranked list, as opposed to a similarity score threshold, allows us to always display the most similar crimes; thus, the analysts always have a starting point for their analysis.

The user interface gives investigators the ability to collaborate with the algorithm using their specialized expertise. If an investigator has reason to believe that a certain MO is particularly relevant for a given pattern (e.g., only burglarizing tools from construction sites), the investigator can actively filter and query the result list. A general text search and filters for distance, time apart, presence of arrest, and premise type are all available and may be used simultaneously. The remaining results are still sorted by Patternizr's similarity score; the filters do not reorder the results, they only exclude crimes that do not meet the search criteria. In this way, investigators can both search for a specific MO and utilize the rank order of the results list to immediately display the most similar results given the manually specified MO criteria.

After approximately two years of development, including work on the algorithm, backend systems, and the user interface, Patternizr was deployed to production in December 2016. In parallel, throughout 2017, the NYPD hired a team of approximately 100 civilian analysts, deployed in the field, to perform crime analysis. These new crime analysts were trained to use Patternizr as part of their daily routine. For example, a crime analyst assigned to a precinct typically reviews crimes that occurred in his or her precinct to see whether they are part of a pattern.


The uniformed officer who had been assigned this task before the crime analysts were hired used manual methods to perform this review; this review is now conducted using Patternizr.

Between January 2018 (when crime analysts reached fully staffed levels) and the time of this paper's writing (July 2018), approximately 400 complaints per week were run through Patternizr, and this number is trending upward. This represents approximately 30% of all the burglaries, robberies, and grand larcenies recorded by the NYPD during that period. Approximately 80% of the usage comes from the new crime analysts, with the remaining 20% coming from uniformed officers performing crime analysis and investigatory duties. Additionally, crime analysts created more than 90 preliminary patterns per week in the same period using Patternizr in DAS. These patterns are considered preliminary because they have not yet been reviewed and approved by the relevant specialist units; however, the volume of preliminary patterns created with Patternizr demonstrates the value of the tool. Debra Piehl, the NYPD's Senior Crime Analyst, says of the tool: "Patternizr dramatically improves efficiency compared to traditional methods, and it still allows the analysts that work for me to apply their own thinking and analysis. The science doesn't overwhelm the art." (32)

Crime analysts have also provided specific instances in which official patterns were created because the Patternizr algorithm recognized similarities between crimes that the analysts would otherwise have found difficult to notice. For example, one analyst was examining a recent robbery in her precinct: the perpetrator was shoplifting power drills from a commercial establishment and, when confronted, attacked and injured an employee with a hypodermic needle. The analyst ran the complaint through Patternizr, and the algorithm returned an additional robbery in a distant precinct, where the perpetrator was shoplifting a drill and threatened an employee with a hypodermic needle. This robbery was likely identified by the algorithm because of high similarity scores for the time of occurrence, the robbery subtype (began as shoplifting), the weapon used, and several matching words in the narratives (e.g., drill, needle). The investigators combined these two complaints into an official pattern, along with two other larcenies committed by the same perpetrator, and the pattern was then passed to the detective squad. The NYPD conducted an investigation and arrested the perpetrator, who later pled guilty to larceny and felony assault and is currently awaiting sentencing.

A second example involves the discovery of a grand larceny of unattended-property pattern. In this example, an analyst was examining the theft of a watch from a locker at a hotel gym in Midtown Manhattan in the late afternoon. The analyst used Patternizr on this complaint and discovered three other complaints with the same MO: jewelry and watches were removed from a locker in a fitness facility in the late afternoon or early evening in the same area. The algorithm likely recognized similarities in the complaints' times of occurrence, their geographic proximity, the property removed (jewelry and watches), the premise types (gyms), and key words in the narrative (e.g., locker). Two suspects were identified through the review of video footage, and the NYPD's investigation is ongoing.

The helpful feedback we have received from our users has highlighted potential improvements to Patternizr. For example, users have requested that the grand larceny algorithm also calculate similarities to petit larceny complaints, which differ from grand larcenies only in the value of the items stolen. Users have also requested the ability to compare across crime types, such as robberies and grand larcenies, where the only difference is whether force was used. We are considering adding this functionality when we deploy a second version of the Patternizr algorithms.

Conclusion
Patternizr is a new, effective, and fair recommendation engine deployed by the NYPD to help investigators group related crimes. To our knowledge, this is the first time such an algorithm has been deployed to production in a law enforcement environment. It joins a growing list of machine learning applications customized for public safety, and for the public sector in general. Patternizr is also one of a long line of data-driven tools prototyped and developed at the NYPD, including CompStat (7). These tools, when used properly, encourage precision policing approaches instead of widespread, heavy-handed enforcement techniques, and enable investigators to focus on the art of policing instead of rote and uninteresting work. We expect that other departments, and local agencies in general, will continue this line of research by extracting similar value from existing government data.

Bibliography
1. Walt L. Perry, Brian McInnis, Carter C. Price, Susan C. Smith, and John S. Hollywood. Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation, 2013.
2. Andrew Guthrie Ferguson. The rise of big data policing: Surveillance, race, and the future of law enforcement. NYU Press, 2017.
3. Greg Ridgeway. Policing in the era of big data. Annual Review of Criminology, 1:401–419, 2018.
4. Richard C. Larson. Urban police patrol analysis, volume 28. MIT Press, Cambridge, MA, 1972.
5. Linda V. Green and Peter J. Kolesar. Improving emergency responsiveness with management science. Management Science, 50(8):1001–1014, 2004.
6. Jack Maple. The crime fighter: How you can make your community crime free. Broadway, 2000.
7. Vincent E. Henry. The COMPSTAT paradigm: Management accountability in policing, business and the public sector. Looseleaf Law Publications, 2002.
8. George O. Mohler, Martin B. Short, Sean Malinowski, Mark Johnson, George E. Tita, Andrea L. Bertozzi, and P. Jeffrey Brantingham. Randomized controlled field trials of predictive policing. Journal of the American Statistical Association, 110(512):1399–1411, 2015.
9. E.S. Levine, Jessica Tisch, Anthony Tasso, and Michael Joy. The New York City Police Department's Domain Awareness System. Interfaces, 47(1):70–84, 2017.
10. Richard Berk and Justin Bleich. Forecasts of violence to inform sentencing decisions. Journal of Quantitative Criminology, 30(1):79–96, 2014.
11. Jessica Saunders, Priscillia Hunt, and John S. Hollywood. Predictions put into practice: a quasi-experimental evaluation of Chicago's predictive policing pilot. Journal of Experimental Criminology, 12(3):347–371, 2016.
12. Nelson Lim, Carl Matthies, Brian Gifford, and Greg Ridgeway. To protect and to serve: enhancing the efficiency of LAPD recruiting, volume 881. Rand Corporation, 2009.
13. David Weisburd, Elizabeth R. Groff, Greg Jones, Breanne Cave, Karen L. Amendola, Sue-Ming Yang, and Rupert F. Emison. The Dallas patrol management experiment: can AVL technologies be used to harness unallocated patrol time for crime prevention? Journal of Experimental Criminology, 11(3):367–391, 2015.
14. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, et al. Early intervention systems: Predicting adverse interactions between police and the public. Criminal Justice Policy Review, 29(2):190–209, 2018.
15. Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Learning to detect patterns of crime. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 515–530. Springer, 2013.
16. Michael D. Porter. A statistical approach to crime linkage. The American Statistician, 70(2):152–165, 2016.
17. Robert M. Bell and Yehuda Koren. Lessons from the Netflix prize challenge. SiGKDD Explorations, 9(2):75–79, 2007.
18. New York City Police Department. Citywide seven major felony offenses, 2000–2016, 2017. Accessed 2017-12-20. http://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/seven-major-felony-offenses-2000-2016.pdf.
19. Marvin E. Wolfgang, Robert M. Figlio, and Thorsten Sellin. Delinquency in a birth cohort. University of Chicago Press, 1987.
20. Samantha L. Gwinn, Christopher W. Bruce, Steven R. Hick, and Julie P. Cooper. Exploring crime analysis: Readings on essential skills. International Association of Crime Analysts, 2008.
21. E.S. Levine and J.S. Tisch. Analytics in action at the New York City Police Department's Counterterrorism Bureau. Military Operations Research, 19(4):5–14, 2014.
22. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: prediction, inference and data mining. Springer-Verlag, New York, 2009.
23. Shyam Boriah, Varun Chandola, and Vipin Kumar. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 243–254. SIAM, 2008.
24. James H. Martin and Daniel Jurafsky. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 2009.
25. Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
26. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
27. James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
28. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.
29. Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14–19, 2016.
30. Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.
31. Andrew Guthrie Ferguson. Policing predictive policing. Wash. U. L. Rev., 94:1109, 2016.
32. Debra Piehl. Conversation with Evan Levine, June 22, 2018.
33. Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to speech recognition, computational linguistics and natural language processing. Prentice Hall, Upper Saddle River, NJ, 2008.

Acknowledgements
The authors acknowledge the support of the men and women of the New York City Police Department. In particular, the authors thank Police Commissioner James P. O'Neill, the former Police Commissioner William J. Bratton, Chief Lori Pollock, Chief Dermot Shea, Deputy Chief Timothy McCormack, Inspector Anthony Tasso, Deputy Inspector Brandon del Pozo, Captain John Hall, Lt. Juan Duque, Lt. Phillip McEneaney, Sgt. Christopher Bray, Sgt. Joseph Klubnick, Sgt. Sean McCarthy, Deputy Commissioner Jessica Tisch, Assistant Commissioner Richard Schroeder, Assistant Commissioner Ronald Wilhelmy, Director Martha Norrick, Daniel Montiel, and Debra Piehl. Special thanks to Tong Wang, Cynthia Rudin, Lt. Daniel Wagner, Theo Damoulas, Ravi Shroff, and Joe McLaughlin.

Alex Chohlas-Wood's current affiliation is Stanford University, Stanford, CA.

Appendix A. Feature Extraction
In this section, we provide the mathematical details involved in our feature extraction process.

Categorical Attributes (k = 16 through k = 24). Define N to be the number of documents in the corpus. First, we count the number of documents in the corpus where attribute k has the value x; we label these counts n^k_x. Then

\[
X^k_{ij} =
\begin{cases}
1 - \sum\limits_{y \,:\, n^k_y \le n^k_{\mathrm{match}}} \dfrac{n^k_y \left(n^k_y - 1\right)}{N(N-1)} & \text{if documents } i \text{ and } j \text{ match on attribute } k,\\[2ex]
0 & \text{otherwise,}
\end{cases}
\]

where the document count n^k_match is defined as the number of documents in the corpus that have the same value of attribute k as documents i and j (23).

Suspect Height and Weight Attributes (k = 25 and k = 26). Call the minimal absolute difference between the value of a continuous variable associated with suspect h of complaint i and that of any suspect in complaint j a_{ijh}. In mathematical form, using the weight attribute w_{ih} of suspect h in complaint i as an example,

\[ a_{ijh} = \min_{f} \left[ \, \lvert w_{ih} - w_{jf} \rvert \, \right], \]

where the minimum function runs over all suspects f in complaint j. For example, if complaint 1 has suspects of weights 140 pounds (lbs), 180 lbs, and 220 lbs, and complaint 2 has suspects of weights 150 lbs and 215 lbs, then a_{121} = 10 lbs, a_{122} = 30 lbs, and a_{123} = 5 lbs. After calculating these comparisons, we transform them into a similarity via

\[
X^k_{ij} = \max\left[ \exp\left( -\frac{P_i \sum_{h=0}^{P_i} a_{ijh}}{P_i + P_j} \right),\ \exp\left( -\frac{P_j \sum_{h=0}^{P_j} a_{jih}}{P_i + P_j} \right) \right],
\]

where P_i is the number of suspects for complaint i.

Suspect Force Used (k = 27). In this case, a_{ijh} is just the number of suspects in complaint j with the same value of the categorical feature as suspect h in complaint i:

\[
X^{k=27}_{ij} = \max\left[ \frac{P_i \sum_{h=0}^{P_i} a_{ijh}}{P_i + P_j},\ \frac{P_j \sum_{h=0}^{P_j} a_{jih}}{P_i + P_j} \right].
\]

Unstructured Text for Robberies and Grand Larcenies (k = 32 through k = 37). Treating each attribute k separately, let the term frequency tf_{xi} be the count of word x in document i, let the document frequency df_x be the number of documents in the corpus containing word x, and let M be the number of unique words in the corpus. Then, for each word x, we calculate the inverse document frequency (33)

\[ \mathrm{idf}_x = \log\left( \frac{N}{\mathrm{df}_x} \right) + 1. \]

The "+1" in this equation ensures that words that occur in every document in the training corpus will not receive a zero for inverse document frequency, which would preclude those words from contributing to the eventual cosine similarity. For each word x and document i, we calculate


\[ s_{xi} = \mathrm{tf}_{xi} \times \mathrm{idf}_x. \]

We then build the TF-IDF vector S_i for each document i:

\[ S_i = \left[ s_{0i}, \ldots, s_{Mi} \right]. \]

Finally, for each pair of documents i and j, we calculate the cosine similarity between the two TF-IDF vectors:

\[ X^k_{ij} = \frac{S_i \cdot S_j}{\lVert S_i \rVert \, \lVert S_j \rVert}. \]

Appendix B. Random Forest Model Training
Trees are recursively built by identifying the best feature k and split s at each node m, which minimize the Gini index, a measure of node purity, for the two descendent nodes (22). Let R be the subset of feature values X^k left or right of the split s, and let \hat{p}_{qm} be the proportion of class q in node m. Then

\[ R_{\mathrm{left}}(k, s) = \left\{ X \mid X^k \le s \right\}, \]
\[ R_{\mathrm{right}}(k, s) = \left\{ X \mid X^k > s \right\}, \]
\[ \hat{p}_{qm} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = q), \]

where \hat{p}_{qm} is the estimated sample proportion of class q in left or right node m, N_m is the number of samples in node m, and I(y_i = q) is an indicator function that returns 1 when sample i belongs to class q and returns 0 otherwise. We set q = 1 (i.e., positive examples) and find the combination of k and s that minimizes the Gini index as follows:

\[ \min_{k, s} \left[ 2\left( \hat{p}_{1,\mathrm{left}} \left( 1 - \hat{p}_{1,\mathrm{left}} \right) \right) + 2\left( \hat{p}_{1,\mathrm{right}} \left( 1 - \hat{p}_{1,\mathrm{right}} \right) \right) \right].
\]

In addition, the group of features K_m available for splitting at each node is chosen from a randomly selected subset of all available features K. Some or all of these features may be available for splitting at each node; from these available features, the model selects the feature and split that best separate the child nodes into purer groups of either in-pattern or unrelated crime pairs.
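For concreteness, a small sketch evaluating this criterion for one candidate split follows; in practice the training happens inside scikit-learn, so this is purely illustrative.

```python
import numpy as np

def gini_split_cost(feature_values, labels, s):
    """Sketch of the Appendix B criterion for one candidate split s on one
    feature: the summed binary Gini impurity of the two child nodes."""
    feature_values = np.asarray(feature_values)
    labels = np.asarray(labels)

    def gini(y):
        if y.size == 0:
            return 0.0
        p1 = np.mean(y == 1)            # estimated proportion of positive pairs
        return 2.0 * p1 * (1.0 - p1)    # 2p(1-p) for a binary response

    left = labels[feature_values <= s]   # R_left(k, s)
    right = labels[feature_values > s]   # R_right(k, s)
    return gini(left) + gini(right)      # quantity minimized over k and s
```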


  • Feature Extraction
  • Random Forest Model Training
Page 4: A Recommendation Engine to Aid in Identifying Crime Patternscynthia/docs/Patternizr.pdf · A Recommendation Engine to Aid in Identifying Crime Patterns Alex Chohlas-Wood* andE.S.

Location Features We incorporate eight similarities (la-beled with indices k = 1 through k = 8) derived from loca-tion information including a strict Euclidean distance d (infeet) between the crime pair and an exponential factor edWe also include two location features corresponding to lon-gitude and latitude but binned to very coarse squares (overnine miles on a side for a total of roughly 11 boxes possi-ble across the geography of New York City) to minimize thepossibility that higher-resolution geographies could be usedby the model as a proxy variable for sensitive attributes (suchas race) For each pair of crimes we note which bin containsthe pairrsquos midpoint and include this as a feature We also usea two-dimensional location frequency distribution calculatedwith kernel density estimation (KDE) (22) which representsa heatmap of where crime is most (and least) common in thecity For each pair we add a feature for the sum maximumminimum and product of these KDE values providing themodel with a range of information on how common the loca-tions were for both crimes

Date-Time Features Seven separate similarities (k = 9through k = 15) depend on the date-time attributes of the twocomplaints The time-of-day similarity measure which com-pares the time of day of the two crimesrsquo occurrences is drawndirectly from Wang et al (15) Following their approach wecalculate the frequency distribution of occurrence times foreach crime type using KDE Then for each pair we weightthe difference between occurrence times by the frequency ofthose occurrence times Crimes that occur close in time toeach other and at rare times will have a low measure in-dicating greater similarity between crime times Crimes thatoccur far apart in time and during common times will havea high measure indicating lower similarity between crimetimes Alternatively crimes that are a certain distance apartin time but occur during rare times will be more similar thancrimes that are the same distance apart in time but occurduring common times This calculation can also be appliedto occurrence times specified over an occurrence windowmdashtypical for crimes such as burglaries where the victim oftenis not present to witness the crime and therefore knows onlythat the crime happened within some time span Comparisonsacross occurrence windows are made by dividing the windowevenly into ten points and averaging similarities across all tenpoints We applied the same process to time-of-week simi-larities In addition we included features for the duration ofthe larger and smaller occurrence windows from the pair ofcrimes as well as the difference in size of the two occurrencewindows We included a simple featuremdashdays apartmdashthatwe calculated between the midpoints of the time ranges ifthe occurrence time was over a range Finally we includeda feature for the time between an arrest associated with theearlier crime and the occurrence of the later crime so that themodel could have information about whether a criminal waspotentially incarcerated

Categorical Features Much of the information in a com-plaint is in categorical form including premise type crimeclassification the MO itself weapon type and details about

the crimersquos location Other information is stored in categori-cal fields that have yesnounknown structures examples in-clude whether a firearm was discharged during the crimewhether the crime was attempted or completed whethersomeone required medical assistance and whether the crimewas domestic in nature For all our categorical features(k = 16 through k = 24) we incorporate a simplified versionof Goodallrsquos similarity measure (23) identical to that used byWang et al (15) This measure accounts for the frequency ofcategorical values causing two records that match on a rarevalue to be rated as more similar than two records that matchon a common value For categorical values that may takeon multiple values for a single crime (eg multiple weapontypes may be associated with a single robbery) possible val-ues are encoded as dummy variables (22) before the categor-ical similarity calculation In addition many categories aretoo fine grained to be useful because Goodallrsquos simplifiedsimilarity measure is only nonzero when categories matchexactly on both complaints For example NYPD crime clas-sifications for burglaries include three separate categories forresidential burglary even though a perpetrator with the sameMO may plausibly commit crimes across all three typesTo address this issue we asked uniformed police experts togroup the existing classification into broader categories us-ing domain knowledge We included both the ungrouped andgrouped crime classification similarities as features for thealgorithm

Suspect Features We chose to include only nonsensitivesuspect attributes including height weight force used andcount of suspects as suspect features for Patternizr Suspect-attribute comparison differs from categorical comparison be-cause it is often a many-to-many comparison for a crime pairIt is generally not possible to know from two lists of suspectswhich descriptions correspond to the same individual there-fore comparisons must be made between all possible pairs ofindividuals (with each list providing one of the pair) For ex-ample for the height and weight continuous variables (k = 25and k = 26) we calculated the minimum difference betweeneach possible pair of suspect descriptions and then dividedthe sum of differences by the quantity of suspects For thecategorical feature of force used (ie whether the suspectthreatened to use force or actually used force in the com-mission of a robbery) we counted and tabulated the numberof matches between possible suspect pairs (k = 27) in a sim-ilar fashion We also included the difference in suspect countand victim count (which is a characteristic of the MO) asfeatures (k = 28 and k = 29)

Unstructured Text Features We also created a number ofunstructured text features including comparisons of the com-plaint text narrative which plays a central role in the manualidentification of patterns Unstructured text also comprisesthe suspectrsquos statements victim actions method of flightpremise name and property taken For burglaries we cal-culate a word-by-word score identical to the categorical sim-ilarities but on a word-by-word basis rather than attributebasis for words that match between the narratives on both

4 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

complaints We then provide an average and sum of thesesimilarities (k = 30 and k = 31)

For robberies and grand larcenies we use a more advancedmethod on other unstructured fields in addition to the nar-ratives We calculate the term-frequencyinverse-documentfrequency (TFIDF) (24) vector representing the presenceand importance of the set of words that comprise each un-structured text We compute a cosine similarity between thepair of TFIDF vectors for each unstructured text attribute(labeled with indices k = 32 through k = 37)

We also combine all unstructured text from each complaintand treat it as a separate unstructured text attribute this per-mits information that appears fragmented across differentnarratives to be compared at once across all possible sourcesFor this combined narrative we calculate the cosine similar-ity (k = 38) and also count the number of rare words that didnot appear in the training corpus but do appear in both com-plaints and we include that count as a feature (k = 39) Theseunusual words may indicate the presence of a consistent andunusual MO

Sample SelectionA very small proportion of all possible crime pairs (about8times10minus8 3times10minus8 and 9times10minus9 of possible burglary rob-bery and grand larceny pairs respectively) are together inmanually identified patterns We refer to these crime pairs aspositive pairs To preserve maximum information gain fromthe corpus of patterns while also permitting model trainingwithin memory constraints we included all positive pairs inthe training set and performed down-sampling on the set ofpossible negative pairs

Purely random sampling of negative pairs would largelyconsist of pairs that are far apart in space and time encourag-ing the model to identify positives based solely on proximityin those dimensions To counteract this effect we samplednearby negative pairs at a higher rate than distant negativepairs We paired every pattern crime with a random sampleof 2 of all negative examples within 3500 feet and an ad-ditional random sample of 01 of all crimes within 80 daysof the crimersquos midpoint in time These were sampled uni-formly all negative examples within 3500 feet or within 80days of the crimersquos midpoint in time were equally likely to beselected We repeatedly tested different thresholds until mod-els trained on these data rated nonspatial or nontemporal fea-tures roughly as important as distance and time apart (as mea-sured by feature importances) Finally a substantial portionof distant negative pairs that included one crime known to bepart of a pattern was also selected at randommdashapproximately

20 times the volume of already selected positive and nearbypairs for that crime

We subsequently uniformly sampled this set to fit in mem-ory for model training We trained the model for each crimetype on 20ndash32 million pairs of crimes For burglaries rob-beries and grand larcenies 02 008 and 006 of thisreduced set of pairs were positive respectively

Model TrainingFor each of the three crime types we relied on a random for-est model to calculate an overall similarity score between twocrimes We chose a random forest model for several rea-sons First tree-based classification algorithms are able tolearn interactions between features which is important whenconsidering how separate crime subtypes could exhibit dif-ferent similarity structures For example grand larceny pick-pocketing patterns may generally occur much closer in space(eg on a specific street corner) when compared with grandlarceny shoplifting patterns which may generally happenacross the city Second random forests have demonstratedan ability to achieve impressive performance while avoidingoverfitting (25) Third random forests are implemented inmany standard machine learning libraries can be trained andexecuted in parallel and do not require graphics processingunits to execute Finally similar to many other classificationalgorithms random forests generate a ldquoscorerdquo in Patternizrrsquoscase this represents the confidence that the pair belongs to-gether in a pattern These similarity scores are scaled be-tween 0 and 1 therefore they can be naturally represented toan investigator in the form of a percentage

We used the standard implementation of the random for-est algorithm in the Python library scikit-learn (26) Eachrandom forest model is an ensemble of hundreds of decisiontrees that are trained on the calculated features and responsesWe trained each individual tree on an independently boot-strapped sample of the training set with a randomized por-tion of the full set of features K We describe the mathemat-ical details of how we trained the trees in Appendix B

Each random forest model also has a collection of settingswhich are known as hyperparameters that specify modelstructure and operation We optimized several hyperparam-eters for the random forest model including the size of therandomized subset of features and the maximum allowed treedepth We used a cross-validated randomized hyperparame-ter search (27) which repeatedly selects a random combina-tion of hyperparameters from a prespecified distribution ofvalues and then tests the performance of this combination onseveral subsets of the data preventing overfitting to any par-

Algorithm Completely rebuiltpatterns (no)

Completely rebuiltpatterns ()

Patterns with matchin top 10 (no)

Patterns with matchin top 10 ()

Burglary 55 (43ndash67) 37 (29-45) 123 (114ndash132) 82 (76-88)Robbery 39 (29ndash49) 26 (19-33) 117 (107ndash126) 78 (71-84)Grand larceny 41 (30ndash51) 27 (20-34) 114 (103ndash124) 76 (69-83)

Table 2 Accuracy of Patternizr on 150 test patterns The average performance is listed first followed by the 95 bootstrapped confidence interval of the performance inparentheses

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 5

Fig 2 Performance of Patternizr on 150 test patterns Each position on the x axis corresponds to a test pattern and each dot corresponds to the ranking of a crime in thatpattern Patterns are ordered left to right by the average precision score for that pattern as a visual aid Patternizr did best on the patterns that appear to the left side of eachchart Dashed lines indicate the proportion of test patterns with complete reconstruction (left of the vertical red line) and the patterns with a match in the top ten rankings(under the horizontal orange line) Roughly one-third of the test patterns are completely reconstructed (as we also detail in Table 2) that is no complaints outside of thepattern are ranked higher than a complaint included in the pattern and roughly four-fifths of the test patterns have at least one of the complaints in the pattern ranked in thetop ten

ticular subset For each crime type we trained one randomforest model with up to 220 trees using the generated featuresand tuned hyperparameters

We set our hyperparameter search to optimize hyperparam-eters for the best possible average precision (28)mdashan evalu-ation metric that gauges whether desired matches are rankedhighly in a query In Patternizrrsquos case the query is the listof complaints returned by Patternizr when the user chooses aseed Average precision relies on the precision metric whichmeasures the proportion of a predicted positive class that isactually positive For Patternizr precision is the percentageof suggested matches that are actually in a pattern with theseed Each true match in Patternizrrsquos ranked list can be rep-resented by the precision at that rankmdashthe number of truematches encountered so far divided by the rank (ie the totalnumber of suggested matches provided by Patternizr at thatpoint) Average precision therefore is simply an average ofthese precisions for all true matches in the test pattern

Performance

We tested the performance of Patternizr on a variety of ob-jectives First we measured the proportion of test patternsthat were entirely or partially rebuilt by the models even

though the models had not previously seen these test patternsSecond we constructed a benchmark model as a simplifiedproxy for existing practice to test whether investigators us-ing Patternizr would be more efficient than those using ex-isting methods Third we examined the features each modelhad isolated as most useful in identifying patterns and com-pared those with investigatorsrsquo intuition Finally we mea-sured whether Patternizr made consistent recommendationsacross racial groups

We validated all three Patternizr models on official patternsthat were not included in the training set For each crimetype we randomly chose 150 test patterns and randomly se-lected a seed from each pattern We then noted the rankingsof the remaining pattern crimes as provided by Patternizrrsquossimilarity scores We bootstrapped this process by randomlyselecting these 150 test patterns 1000 times which providedreliable estimates and uncertainty bounds Table 2 shows theresults from this analysis

The algorithm completely rebuilt (ie ranked all in-patternpairs ahead of all nonrelated pairs) approximately one-thirdof test patterns Additionally approximately four-fifths ofburglary robbery and grand larceny patterns had at least onematch in the top ten rankings out of the entire corpus of com-plaints for that crime type Figure 2 shows the rankings for all

Fig 3 Peformance of Patternizr against a simple baseline Each dot represents a pair of complaints where one half of the pair is a seed from the test set and the other halfof the pair is another complaint in that pattern Model performance is better than baseline performance for each pair of sample complaints that lies below the diagonal linethat is the algorithm gives a ranking closer to the top of the list than the baseline algorithm

6 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

Fig 4 Feature importances for Patternizr Feature importance is one way to gauge the relative contribution of different features to the overall classification of a randomforest Feature importances are calculated by measuring the Gini impurity reduction at each node at which that feature is used weighing these reductions by the number ofsamples that were routed to that node and then averaging these calculations for each feature across all trees in the forest in which that feature is used

test patterns In reality this performance is likely an under-estimation given that investigators have likely missed actualmatches in the historical corpus

To test whether Patternizr improves investigator efficiencywe built a simple baseline algorithm for comparison Thebaseline algorithm assumes that investigators examine com-plaints nearest in time to the seed complaint first while al-ways limiting themselves to complaints in the same precinctas the chosen seed This method provides a baseline rankedlist that we can compare with Patternizrrsquos ranked results Werestricted the comparison with local patterns in the test setbecause the baseline algorithm would perform poorly (andunrealistically) on patterns that span multiple precincts Wecompare the ranking of that complaint with how it appears inthe ranked list generated by Patternizr for all the local pat-terns in our test data set (Figure 3)

We found that, for all three crime types, Patternizr is typically an improvement on the baseline algorithm. This estimate is conservative because Patternizr is especially helpful on patterns spanning multiple precincts, which were excluded from this comparison.

In Figure 4, we show the top ten feature importances for each model. The features for distance, days apart, and narrative information play a top role in all three models. NYPD's pattern subject matter experts confirmed that these three features contain key information for pattern creation. Other top features include similarity between other unstructured text fields and time-of-day and day-of-week similarity. None of the top ten features for each crime type was particularly surprising to our subject matter experts. Note that these calculations only measure importance; they do not indicate how positive or negative changes in the feature will affect the model score.
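
Because the models are scikit-learn random forests (26), importances of the kind plotted in Figure 4 can be read directly from a fitted model. The data and feature names below are illustrative stand-ins, not the production feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: rows are complaint pairs, columns are similarity
# features; y is 1 when the pair shares a pattern and 0 otherwise.
rng = np.random.default_rng(0)
X = rng.random((1000, 3))
y = (X[:, 0] + 0.5 * X[:, 2] > 1.0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X, y)

# Gini-based importances, averaged across the trees in the forest.
for name, imp in sorted(zip(["distance", "days_apart", "narrative_cosine"],
                            model.feature_importances_),
                        key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {imp:.3f}")
```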

It is important to measure whether the recommendations that Patternizr makes are not only valuable but also fair, particularly given the growing concern that predictive policing tools may perpetuate disparate impact (29-31). We intentionally designed the algorithm to minimize disparate impact on any specific group. First, the algorithm is completely blind to sensitive information about potential suspects, including race and gender, which were not included as similarity features for the predictive model. Second, we kept potential proxy variables for sensitive information, particularly location, extremely coarse, so that any correlation with sensitive attributes had a very low degree of certainty while some very general information about location was retained. Finally, and most important, several levels of expert human review are still required to establish a pattern, minimizing the potential for a seemingly likely (but incorrect) recommendation to result in any enforcement action.

To confirm that these precautions were effective, we tested whether any of the three models recommended pairs with a certain racial group at a different frequency than either random pairing or existing identified patterns. We randomly selected 5-10 million test pairs from each crime type and drew 1,000 bootstrapped samples from each of these samples to estimate uncertainty (Figure 5).

Fig. 5. Proportion of pairs by race for random, historical-pattern, and Patternizr-recommended pairs. Confidence intervals are calculated from the range of values that appear in 1,000 bootstrapped samples. Racial categories correspond to the official descriptions listed on NYPD complaint forms. Notably, many types of grand larceny will not contain suspect information because there were no witnesses to the crime, thus producing relatively higher levels of uncertainty for grand larceny than for the other two crime types.


Fig. 6. Screenshot of Patternizr in the NYPD Domain Awareness System. The map of the seed and most similar complaints is in the left panel, the seed complaint is in the middle panel, and the most similar complaints are in the right panel. Grey boxes obscure law-enforcement-sensitive and personally identifiable information.

For all three crime types, the proportion of Patternizr pairs with a similarity score greater than 0.5 that included a suspect description of any race was either consistent with (within the 95% confidence limit) or lower than a random sampling of all pairs. That is, this analysis uncovered no evidence that Patternizr recommends any suspect race at a higher rate than exists with random pairing.
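
The underlying check is a proportion comparison, sketched below; the pair records and their fields (score, race) are illustrative assumptions rather than Patternizr's schema.

```python
from collections import Counter

def race_proportions(pairs, min_score=None):
    """Share of pairs by recorded suspect race, optionally restricted to
    pairs whose similarity score exceeds a threshold."""
    kept = [p for p in pairs if min_score is None or p["score"] > min_score]
    counts = Counter(p["race"] for p in kept if p["race"] is not None)
    total = sum(counts.values())
    return {race: n / total for race, n in counts.items()} if total else {}

pairs = [
    {"score": 0.9, "race": "group A"}, {"score": 0.7, "race": "group B"},
    {"score": 0.2, "race": "group A"}, {"score": 0.4, "race": None},
]
# Compare high-scoring recommendations against random pairing overall.
print(race_proportions(pairs, min_score=0.5), race_proportions(pairs))
```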

Implementation

We incorporated finalized models into the back end of the department's Domain Awareness System (DAS), a citywide network of sensors, databases, devices, software, and infrastructure (9, 21). We processed all historical pairs of complaints in parallel in the cloud, against ten years of records for burglaries and robberies and against three years of grand larcenies. This historical load took 1.94 days on approximately 1,600 cores. For new and revised complaints, similarity scores are calculated and updated three times each day. Each new or revised crime is scored against the entire corpus of crimes before it is incorporated into DAS for querying by users throughout the department.

Officers and analysts can access Patternizr through the custom software application, which is a part of DAS, on NYPD desktop computers. To use Patternizr, an investigator presses a "Patternize" button on any seed complaint to retrieve all available calculated similarity scores. Investigators are presented with a rank-ordered list of results, from most to least similar to the seed, with ten results displayed at a time. For each result, the distance, time apart, and algorithm-calculated similarity score are all displayed. A map displays the seed and similar results (Figure 6), and users can read a few important details about each complaint before choosing to examine the seed and result complaints side by side in full detail. The use of a ranked list, as opposed to a similarity score threshold, allows us to always display the most similar crimes; thus, the analysts always have a starting point for their analysis.

The user interface gives investigators the ability to collaborate with the algorithm using their specialized expertise. If an investigator has reason to believe that a certain M.O. is particularly relevant for a given pattern (e.g., only burglarizing tools from construction sites), the investigator can actively filter and query the result list. A general text search and filters for distance, time apart, presence of arrest, and premise type are all available and may be used simultaneously. The remaining results are still sorted by Patternizr's similarity score; the filters do not reorder the results, they only exclude crimes that do not meet the search criteria. In this way, investigators can both search for a specific M.O. and utilize the rank order of the results list to immediately display the most similar results given the manually specified M.O. criteria.
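
This filter-without-reordering behavior is simple to express: predicates prune the ranked list while its order is left untouched. A minimal sketch, with illustrative field names:

```python
def filter_results(ranked_results, predicates):
    """Drop results failing any filter; never reorder the survivors."""
    return [r for r in ranked_results if all(pred(r) for pred in predicates)]

ranked = [  # already sorted from most to least similar
    {"premise": "construction site", "miles_away": 0.4},
    {"premise": "residence", "miles_away": 1.1},
    {"premise": "construction site", "miles_away": 6.0},
]
# Keep construction-site results within two miles; order is preserved.
print(filter_results(ranked, [
    lambda r: r["premise"] == "construction site",
    lambda r: r["miles_away"] <= 2.0,
]))
```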

After approximately two years of development, including work on the algorithm, backend systems, and the user interface, Patternizr was deployed to production in December 2016. In parallel, throughout 2017, the NYPD hired a team of approximately 100 civilian analysts, deployed in the field, to perform crime analysis. These new crime analysts were trained to use Patternizr as part of their daily routine. For example, a crime analyst assigned to a precinct typically reviews crimes that occurred in his or her precinct to see whether they are part of a pattern. The uniformed officer who had been assigned this task before the crime analysts were hired used manual methods to perform this review; this review is now conducted using Patternizr.

Between January 2018 (with crime analysts at fully staffed levels) and the time of this paper's writing (July 2018), approximately 400 complaints per week were run through Patternizr, and this number is trending upward. This represents approximately 30% of all the burglaries, robberies, and grand larcenies recorded by the NYPD during that period. Approximately 80% of the usage comes from the new crime analysts, with the remaining 20% coming from uniformed officers performing crime analysis and investigatory duties. Additionally, crime analysts created more than 90 preliminary patterns per week in the same period using Patternizr in DAS. These patterns are considered preliminary because they have not yet been reviewed and approved by the relevant specialist units; however, the volume of preliminary patterns created with Patternizr demonstrates the value of the tool. Debra Piehl, the NYPD's Senior Crime Analyst, says of the tool: "Patternizr dramatically improves efficiency compared to traditional methods, and it still allows the analysts that work for me to apply their own thinking and analysis. The science doesn't overwhelm the art" (32).

Crime analysts have also provided specific instances in which official patterns were created because the Patternizr algorithm recognized similarities between crimes that the analysts would otherwise have found difficult to notice. For example, one analyst was examining a recent robbery in her precinct: the perpetrator was shoplifting power drills from a commercial establishment and, when confronted, attacked and injured an employee with a hypodermic needle. The analyst ran the complaint through Patternizr, and the algorithm returned an additional robbery in a distant precinct where the perpetrator was shoplifting a drill and threatened an employee with a hypodermic needle. This robbery was likely identified by the algorithm because of high similarity scores for the time of occurrence, the robbery subtype (began as shoplifting), the weapon used, and several matching words in the narratives (e.g., drill, needle). The investigators combined these two complaints into an official pattern, along with two other larcenies committed by the same perpetrator, and the pattern was then passed to the detective squad. The NYPD conducted an investigation and arrested the perpetrator, who later pled guilty to larceny and felony assault and is currently awaiting sentencing.

A second example involves the discovery of a grand larceny of unattended-property pattern. In this example, an analyst was examining the theft of a watch from a locker at a hotel gym in Midtown Manhattan in the late afternoon. The analyst used Patternizr on this complaint and discovered three other complaints with the same M.O.: jewelry and watches were removed from a locker in a fitness facility in the late afternoon or early evening in the same area. The algorithm likely recognized similarities in the complaints' times of occurrence, their geographic proximity, the property removed (jewelry and watches), the premise types (gyms), and key words in the narrative (e.g., locker). Two suspects were identified through the review of video footage, and the NYPD's investigation is ongoing.

The helpful feedback we have received from our users has highlighted potential improvements to Patternizr. For example, users have requested that the grand larceny algorithm also calculate similarities to petit larceny complaints, which differ from grand larcenies only in the value of the items stolen. Users have also requested the ability to compare across crime types, such as robberies and grand larcenies where the only difference is whether force was used. We are considering adding this functionality when we deploy a second version of the Patternizr algorithms.

Conclusion

Patternizr is a new, effective, and fair recommendation engine deployed by the NYPD to help investigators group related crimes. To our knowledge, this is the first time such an algorithm has been deployed to production in a law enforcement environment. It joins a growing list of machine learning applications customized for public safety and for the public sector in general. Patternizr is also one of a long line of data-driven tools prototyped and developed at the NYPD, including CompStat (7). These tools, when used properly, encourage precision policing approaches instead of widespread, heavy-handed enforcement techniques, and enable investigators to focus on the art of policing instead of rote and uninteresting work. We expect that other departments, and local agencies in general, will continue this line of research by extracting similar value from existing government data.

Bibliography

1. Walt L. Perry, Brian McInnis, Carter C. Price, Susan C. Smith, and John S. Hollywood. Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation, 2013.
2. Andrew Guthrie Ferguson. The rise of big data policing: Surveillance, race, and the future of law enforcement. NYU Press, 2017.
3. Greg Ridgeway. Policing in the era of big data. Annual Review of Criminology, 1:401-419, 2018.
4. Richard C. Larson. Urban police patrol analysis, volume 28. MIT Press, Cambridge, MA, 1972.
5. Linda V. Green and Peter J. Kolesar. Improving emergency responsiveness with management science. Management Science, 50(8):1001-1014, 2004.
6. Jack Maple. The crime fighter: How you can make your community crime free. Broadway, 2000.
7. Vincent E. Henry. The COMPSTAT paradigm: Management accountability in policing, business and the public sector. Looseleaf Law Publications, 2002.
8. George O. Mohler, Martin B. Short, Sean Malinowski, Mark Johnson, George E. Tita, Andrea L. Bertozzi, and P. Jeffrey Brantingham. Randomized controlled field trials of predictive policing. Journal of the American Statistical Association, 110(512):1399-1411, 2015.
9. E.S. Levine, Jessica Tisch, Anthony Tasso, and Michael Joy. The New York City Police Department's Domain Awareness System. Interfaces, 47(1):70-84, 2017.
10. Richard Berk and Justin Bleich. Forecasts of violence to inform sentencing decisions. Journal of Quantitative Criminology, 30(1):79-96, 2014.
11. Jessica Saunders, Priscillia Hunt, and John S. Hollywood. Predictions put into practice: a quasi-experimental evaluation of Chicago's predictive policing pilot. Journal of Experimental Criminology, 12(3):347-371, 2016.
12. Nelson Lim, Carl Matthies, Brian Gifford, and Greg Ridgeway. To protect and to serve: enhancing the efficiency of LAPD recruiting, volume 881. Rand Corporation, 2009.
13. David Weisburd, Elizabeth R. Groff, Greg Jones, Breanne Cave, Karen L. Amendola, Sue-Ming Yang, and Rupert F. Emison. The Dallas patrol management experiment: can AVL technologies be used to harness unallocated patrol time for crime prevention? Journal of Experimental Criminology, 11(3):367-391, 2015.
14. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, et al. Early intervention systems: Predicting adverse interactions between police and the public. Criminal Justice Policy Review, 29(2):190-209, 2018.
15. Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Learning to detect patterns of crime. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 515-530. Springer, 2013.
16. Michael D. Porter. A statistical approach to crime linkage. The American Statistician, 70(2):152-165, 2016.
17. Robert M. Bell and Yehuda Koren. Lessons from the Netflix prize challenge. SIGKDD Explorations, 9(2):75-79, 2007.
18. New York City Police Department. Citywide seven major felony offenses, 2000-2016, 2017. Accessed 2017-12-20. http://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/seven-major-felony-offenses-2000-2016.pdf
19. Marvin E. Wolfgang, Robert M. Figlio, and Thorsten Sellin. Delinquency in a birth cohort. University of Chicago Press, 1987.
20. Samantha L. Gwinn, Christopher W. Bruce, Steven R. Hick, and Julie P. Cooper. Exploring crime analysis: Readings on essential skills. International Association of Crime Analysts, 2008.
21. E.S. Levine and J.S. Tisch. Analytics in action at the New York City Police Department's Counterterrorism Bureau. Military Operations Research, 19(4):5-14, 2014.
22. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: prediction, inference and data mining. Springer-Verlag, New York, 2009.
23. Shyam Boriah, Varun Chandola, and Vipin Kumar. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 243-254. SIAM, 2008.
24. James H. Martin and Daniel Jurafsky. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 2009.
25. Leo Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
26. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825-2830, 2011.
27. James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281-305, 2012.
28. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.
29. Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14-19, 2016.
30. Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797-806. ACM, 2017.
31. Andrew Guthrie Ferguson. Policing predictive policing. Wash. U. L. Rev., 94:1109, 2016.
32. Debra Piehl. Conversation with Evan Levine, June 22, 2018.
33. Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to speech recognition, computational linguistics and natural language processing. Prentice Hall, Upper Saddle River, NJ, 2008.

Acknowledgements

The authors acknowledge the support of the men and women of the New York City Police Department. In particular, the authors thank Police Commissioner James P. O'Neill, the former Police Commissioner William J. Bratton, Chief Lori Pollock, Chief Dermot Shea, Deputy Chief Timothy McCormack, Inspector Anthony Tasso, Deputy Inspector Brandon del Pozo, Captain John Hall, Lt. Juan Duque, Lt. Phillip McEneaney, Sgt. Christopher Bray, Sgt. Joseph Klubnick, Sgt. Sean McCarthy, Deputy Commissioner Jessica Tisch, Assistant Commissioner Richard Schroeder, Assistant Commissioner Ronald Wilhelmy, Director Martha Norrick, Daniel Montiel, and Debra Piehl. Special thanks to Tong Wang, Cynthia Rudin, Lt. Daniel Wagner, Theo Damoulas, Ravi Shroff, and Joe McLaughlin.

Alex Chohlas-Wood's current affiliation is Stanford University, Stanford, CA.

Appendix A: Feature Extraction

In this section, we provide the mathematical details involved in our feature extraction process.

Categorical Attributes (k = 16 through k = 24). Define $N$ to be the number of documents in the corpus. First, we count the number of documents in the corpus where attribute $k$ has the value $x$; we label these counts $n^k_x$. Then

\[
X^k_{ij} =
\begin{cases}
1 - \displaystyle\sum_{y \,\text{s.t.}\, n^k_y \le n^k_{\text{match}}} \frac{n^k_y \left(n^k_y - 1\right)}{N(N-1)} & \text{if documents } i \text{ and } j \text{ match on attribute } k,\\
0 & \text{otherwise,}
\end{cases}
\]

where the document count $n^k_{\text{match}}$ is defined as the number of documents in the corpus that have the same value of attribute $k$ as documents $i$ and $j$ (23).
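
In Python, this measure can be sketched as follows; the toy corpus is illustrative. Matches on rare attribute values score close to 1, while matches on common values are discounted.

```python
from collections import Counter

def categorical_similarity(value_i, value_j, corpus_values):
    """Similarity for one categorical attribute, per the formula above (23)."""
    if value_i != value_j:
        return 0.0
    counts = Counter(corpus_values)      # n^k_y for every observed value y
    big_n = len(corpus_values)           # N, the corpus size
    n_match = counts[value_i]            # n^k_match
    penalty = sum(n * (n - 1) for n in counts.values() if n <= n_match)
    return 1.0 - penalty / (big_n * (big_n - 1))

corpus = ["knife", "gun", "gun", "gun", "knife", "crowbar"]
print(categorical_similarity("crowbar", "crowbar", corpus))  # rare: 1.0
print(categorical_similarity("gun", "gun", corpus))          # common: ~0.73
```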

Suspect Height and Weight Attributes (k = 25 and k = 26). Call the minimal absolute difference between the value of a continuous variable associated with suspect $h$ of complaint $i$ and that of any suspect in complaint $j$ $a_{ijh}$. In mathematical form, using the weight attribute $w_{ih}$ of suspect $h$ in complaint $i$ as an example,

\[
a_{ijh} = \min_f \left[\, \left| w_{ih} - w_{jf} \right| \,\right],
\]

where the minimum function runs over all suspects $f$ in complaint $j$. For example, if complaint 1 has suspects of weights 140 pounds (lbs.), 180 lbs., and 220 lbs., and complaint 2 has suspects of weights 150 lbs. and 215 lbs., then $a_{121} = 10$ lbs., $a_{122} = 30$ lbs., and $a_{123} = 5$ lbs. After calculating these comparisons, we transform them into a similarity via

\[
X^k_{ij} = \max\left[ \exp\left( -\frac{\sum_{h=0}^{P_i} a_{ijh}}{P_i + P_j} \right),\; \exp\left( -\frac{\sum_{h=0}^{P_j} a_{jih}}{P_i + P_j} \right) \right],
\]

where $P_i$ is the number of suspects for complaint $i$.
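
A sketch of this transformation, written from the formula as reconstructed above and reproducing the worked example from the text:

```python
import math

def weight_similarity(weights_i, weights_j):
    """Suspect-weight similarity between two complaints: sum each suspect's
    closest cross-complaint difference, then damp it exponentially."""
    def total_min_diff(a, b):
        # sum over suspects h in a of min_f |a_h - b_f|
        return sum(min(abs(w - v) for v in b) for w in a)

    p_i, p_j = len(weights_i), len(weights_j)
    denom = p_i + p_j
    return max(math.exp(-total_min_diff(weights_i, weights_j) / denom),
               math.exp(-total_min_diff(weights_j, weights_i) / denom))

# The example from the text: a_121 = 10, a_122 = 30, a_123 = 5 lbs.
print(weight_similarity([140, 180, 220], [150, 215]))
```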

Suspect Force Used (k = 27). In this case, $a_{ijh}$ is just the number of suspects in complaint $j$ with the same value of the categorical feature as suspect $h$ in complaint $i$:

\[
X^{k=27}_{ij} = \max\left[ \frac{\sum_{h=0}^{P_i} a_{ijh}}{P_i + P_j},\; \frac{\sum_{h=0}^{P_j} a_{jih}}{P_i + P_j} \right].
\]

Unstructured Text for Robberies and Grand Larcenies (k = 32 through k = 37). Treating each attribute $k$ separately, let the term frequency $tf_{xi}$ be the count of word $x$ in document $i$, let the document frequency $df_x$ be the number of documents in the corpus containing word $x$, and let $M$ be the number of unique words in the corpus. Then, for each word $x$, we calculate the inverse document frequency (33):

\[
idf_x = \log\left( \frac{N}{df_x} \right) + 1.
\]

The "+1" in this equation ensures that words that occur in every document in the training corpus will not receive a zero for inverse document frequency, which would preclude those words from contributing to the eventual cosine similarity. For each word $x$ and document $i$, we calculate

\[
s_{xi} = tf_{xi} \times idf_x.
\]

We then build the TFIDF vector $S_i$ for each document $i$:

\[
S_i = \left[ s_{0i}, \ldots, s_{Mi} \right].
\]

Finally, for each pair of documents $i$ and $j$, we calculate the cosine similarity between the two TFIDF vectors:

\[
X^k_{ij} = \frac{S_i \cdot S_j}{\lVert S_i \rVert \, \lVert S_j \rVert}.
\]
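
A self-contained sketch of this computation; the two-document corpus is illustrative, and the idf definition matches the smoothing-free variant given above.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TFIDF vectors using idf_x = log(N / df_x) + 1, as above."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / count) + 1 for word, count in df.items()}
    return [{word: tf * idf[word] for word, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norms = (math.sqrt(sum(x * x for x in u.values()))
             * math.sqrt(sum(x * x for x in v.values())))
    return dot / norms if norms else 0.0

docs = ["suspect removed watch from gym locker",
        "jewelry taken from locker at fitness facility"]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # shared words drive the similarity up
```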

Appendix B: Random Forest Model Training

Trees are recursively built by identifying the best feature $k$ and split $s$ at each node $m$, which minimize the Gini index, a measure of node purity, for the two descendent nodes (22). Let $R$ be the subset of feature values $X^k$ left or right of the split $s$, and let $p_{qm}$ be the proportion of class $q$ in node $m$. Then

\[
R_{\text{left}}(k, s) = \left\{ X \,\middle|\, X^k \le s \right\},
\]
\[
R_{\text{right}}(k, s) = \left\{ X \,\middle|\, X^k > s \right\},
\]
\[
p_{qm} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = q),
\]

where $p_{qm}$ is the estimated sample proportion of class $q$ in left or right node $m$, $N_m$ is the number of samples in node $m$, and $I(y_i = q)$ is an indicator function that returns 1 when sample $i$ belongs to class $q$ and returns 0 otherwise. We set $q = 1$ (i.e., positive examples) and find the combination of $k$ and $s$ that minimizes the Gini index as follows:

\[
\min_{k,s} \left[ 2\left( p_{1,\text{left}} \left( 1 - p_{1,\text{left}} \right) \right) + 2\left( p_{1,\text{right}} \left( 1 - p_{1,\text{right}} \right) \right) \right].
\]

In addition, the group of features $K_m$ available for splitting at each node is chosen from a randomly selected subset of all available features $K$. Some or all of these features may be available for splitting at each node; from these available features, the model selects the feature and split that best separate the child nodes into purer groups of either in-pattern or unrelated crime pairs.
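
For a single feature, this split search reduces to scanning candidate thresholds and scoring each with the two-node Gini sum above; a sketch with illustrative inputs:

```python
def gini_term(labels):
    """2 * p1 * (1 - p1) for one child node (0 if the node is empty)."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return 2.0 * p1 * (1.0 - p1)

def best_split(feature_values, labels):
    """Threshold s minimizing the summed Gini index of the left (x <= s)
    and right (x > s) child nodes."""
    best = (float("inf"), None)
    for s in sorted(set(feature_values)):
        left = [y for x, y in zip(feature_values, labels) if x <= s]
        right = [y for x, y in zip(feature_values, labels) if x > s]
        best = min(best, (gini_term(left) + gini_term(right), s))
    return best  # (gini, threshold)

# A distance-like feature cleanly separating in-pattern pairs (y = 1).
print(best_split([0.1, 0.2, 0.3, 5.0, 7.0], [1, 1, 1, 0, 0]))  # (0.0, 0.3)
```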


  • Feature Extraction
  • Random Forest Model Training
Page 5: A Recommendation Engine to Aid in Identifying Crime Patternscynthia/docs/Patternizr.pdf · A Recommendation Engine to Aid in Identifying Crime Patterns Alex Chohlas-Wood* andE.S.

complaints We then provide an average and sum of thesesimilarities (k = 30 and k = 31)

For robberies and grand larcenies we use a more advancedmethod on other unstructured fields in addition to the nar-ratives We calculate the term-frequencyinverse-documentfrequency (TFIDF) (24) vector representing the presenceand importance of the set of words that comprise each un-structured text We compute a cosine similarity between thepair of TFIDF vectors for each unstructured text attribute(labeled with indices k = 32 through k = 37)

We also combine all unstructured text from each complaintand treat it as a separate unstructured text attribute this per-mits information that appears fragmented across differentnarratives to be compared at once across all possible sourcesFor this combined narrative we calculate the cosine similar-ity (k = 38) and also count the number of rare words that didnot appear in the training corpus but do appear in both com-plaints and we include that count as a feature (k = 39) Theseunusual words may indicate the presence of a consistent andunusual MO

Sample SelectionA very small proportion of all possible crime pairs (about8times10minus8 3times10minus8 and 9times10minus9 of possible burglary rob-bery and grand larceny pairs respectively) are together inmanually identified patterns We refer to these crime pairs aspositive pairs To preserve maximum information gain fromthe corpus of patterns while also permitting model trainingwithin memory constraints we included all positive pairs inthe training set and performed down-sampling on the set ofpossible negative pairs

Purely random sampling of negative pairs would largelyconsist of pairs that are far apart in space and time encourag-ing the model to identify positives based solely on proximityin those dimensions To counteract this effect we samplednearby negative pairs at a higher rate than distant negativepairs We paired every pattern crime with a random sampleof 2 of all negative examples within 3500 feet and an ad-ditional random sample of 01 of all crimes within 80 daysof the crimersquos midpoint in time These were sampled uni-formly all negative examples within 3500 feet or within 80days of the crimersquos midpoint in time were equally likely to beselected We repeatedly tested different thresholds until mod-els trained on these data rated nonspatial or nontemporal fea-tures roughly as important as distance and time apart (as mea-sured by feature importances) Finally a substantial portionof distant negative pairs that included one crime known to bepart of a pattern was also selected at randommdashapproximately

20 times the volume of already selected positive and nearbypairs for that crime

We subsequently uniformly sampled this set to fit in mem-ory for model training We trained the model for each crimetype on 20ndash32 million pairs of crimes For burglaries rob-beries and grand larcenies 02 008 and 006 of thisreduced set of pairs were positive respectively

Model TrainingFor each of the three crime types we relied on a random for-est model to calculate an overall similarity score between twocrimes We chose a random forest model for several rea-sons First tree-based classification algorithms are able tolearn interactions between features which is important whenconsidering how separate crime subtypes could exhibit dif-ferent similarity structures For example grand larceny pick-pocketing patterns may generally occur much closer in space(eg on a specific street corner) when compared with grandlarceny shoplifting patterns which may generally happenacross the city Second random forests have demonstratedan ability to achieve impressive performance while avoidingoverfitting (25) Third random forests are implemented inmany standard machine learning libraries can be trained andexecuted in parallel and do not require graphics processingunits to execute Finally similar to many other classificationalgorithms random forests generate a ldquoscorerdquo in Patternizrrsquoscase this represents the confidence that the pair belongs to-gether in a pattern These similarity scores are scaled be-tween 0 and 1 therefore they can be naturally represented toan investigator in the form of a percentage

We used the standard implementation of the random for-est algorithm in the Python library scikit-learn (26) Eachrandom forest model is an ensemble of hundreds of decisiontrees that are trained on the calculated features and responsesWe trained each individual tree on an independently boot-strapped sample of the training set with a randomized por-tion of the full set of features K We describe the mathemat-ical details of how we trained the trees in Appendix B

Each random forest model also has a collection of settingswhich are known as hyperparameters that specify modelstructure and operation We optimized several hyperparam-eters for the random forest model including the size of therandomized subset of features and the maximum allowed treedepth We used a cross-validated randomized hyperparame-ter search (27) which repeatedly selects a random combina-tion of hyperparameters from a prespecified distribution ofvalues and then tests the performance of this combination onseveral subsets of the data preventing overfitting to any par-

Algorithm Completely rebuiltpatterns (no)

Completely rebuiltpatterns ()

Patterns with matchin top 10 (no)

Patterns with matchin top 10 ()

Burglary 55 (43ndash67) 37 (29-45) 123 (114ndash132) 82 (76-88)Robbery 39 (29ndash49) 26 (19-33) 117 (107ndash126) 78 (71-84)Grand larceny 41 (30ndash51) 27 (20-34) 114 (103ndash124) 76 (69-83)

Table 2 Accuracy of Patternizr on 150 test patterns The average performance is listed first followed by the 95 bootstrapped confidence interval of the performance inparentheses

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 5

Fig 2 Performance of Patternizr on 150 test patterns Each position on the x axis corresponds to a test pattern and each dot corresponds to the ranking of a crime in thatpattern Patterns are ordered left to right by the average precision score for that pattern as a visual aid Patternizr did best on the patterns that appear to the left side of eachchart Dashed lines indicate the proportion of test patterns with complete reconstruction (left of the vertical red line) and the patterns with a match in the top ten rankings(under the horizontal orange line) Roughly one-third of the test patterns are completely reconstructed (as we also detail in Table 2) that is no complaints outside of thepattern are ranked higher than a complaint included in the pattern and roughly four-fifths of the test patterns have at least one of the complaints in the pattern ranked in thetop ten

ticular subset For each crime type we trained one randomforest model with up to 220 trees using the generated featuresand tuned hyperparameters

We set our hyperparameter search to optimize hyperparam-eters for the best possible average precision (28)mdashan evalu-ation metric that gauges whether desired matches are rankedhighly in a query In Patternizrrsquos case the query is the listof complaints returned by Patternizr when the user chooses aseed Average precision relies on the precision metric whichmeasures the proportion of a predicted positive class that isactually positive For Patternizr precision is the percentageof suggested matches that are actually in a pattern with theseed Each true match in Patternizrrsquos ranked list can be rep-resented by the precision at that rankmdashthe number of truematches encountered so far divided by the rank (ie the totalnumber of suggested matches provided by Patternizr at thatpoint) Average precision therefore is simply an average ofthese precisions for all true matches in the test pattern

Performance

We tested the performance of Patternizr on a variety of ob-jectives First we measured the proportion of test patternsthat were entirely or partially rebuilt by the models even

though the models had not previously seen these test patternsSecond we constructed a benchmark model as a simplifiedproxy for existing practice to test whether investigators us-ing Patternizr would be more efficient than those using ex-isting methods Third we examined the features each modelhad isolated as most useful in identifying patterns and com-pared those with investigatorsrsquo intuition Finally we mea-sured whether Patternizr made consistent recommendationsacross racial groups

We validated all three Patternizr models on official patternsthat were not included in the training set For each crimetype we randomly chose 150 test patterns and randomly se-lected a seed from each pattern We then noted the rankingsof the remaining pattern crimes as provided by Patternizrrsquossimilarity scores We bootstrapped this process by randomlyselecting these 150 test patterns 1000 times which providedreliable estimates and uncertainty bounds Table 2 shows theresults from this analysis

The algorithm completely rebuilt (ie ranked all in-patternpairs ahead of all nonrelated pairs) approximately one-thirdof test patterns Additionally approximately four-fifths ofburglary robbery and grand larceny patterns had at least onematch in the top ten rankings out of the entire corpus of com-plaints for that crime type Figure 2 shows the rankings for all

Fig 3 Peformance of Patternizr against a simple baseline Each dot represents a pair of complaints where one half of the pair is a seed from the test set and the other halfof the pair is another complaint in that pattern Model performance is better than baseline performance for each pair of sample complaints that lies below the diagonal linethat is the algorithm gives a ranking closer to the top of the list than the baseline algorithm

6 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

Fig 4 Feature importances for Patternizr Feature importance is one way to gauge the relative contribution of different features to the overall classification of a randomforest Feature importances are calculated by measuring the Gini impurity reduction at each node at which that feature is used weighing these reductions by the number ofsamples that were routed to that node and then averaging these calculations for each feature across all trees in the forest in which that feature is used

test patterns In reality this performance is likely an under-estimation given that investigators have likely missed actualmatches in the historical corpus

To test whether Patternizr improves investigator efficiencywe built a simple baseline algorithm for comparison Thebaseline algorithm assumes that investigators examine com-plaints nearest in time to the seed complaint first while al-ways limiting themselves to complaints in the same precinctas the chosen seed This method provides a baseline rankedlist that we can compare with Patternizrrsquos ranked results Werestricted the comparison with local patterns in the test setbecause the baseline algorithm would perform poorly (andunrealistically) on patterns that span multiple precincts Wecompare the ranking of that complaint with how it appears inthe ranked list generated by Patternizr for all the local pat-terns in our test data set (Figure 3)

We found that for all three crime types Patternizr is typ-ically an improvement on the baseline algorithm this esti-mate is conservative because Patternizr is especially helpfulon patterns spanning multiple precincts

In Figure 4 we show the top ten feature importances foreach model The features for distance days apart and narra-tive information play a top role in all three models NYPDrsquospattern subject matter experts confirmed that these three fea-tures contain key information for pattern creation Other topfeatures include similarity between other unstructured textfields and time-of-day and day-of-week similarity None ofthe top ten features for each crime type was particularly sur-

prising to our subject matter experts Note that these calcula-tions only measure importance they do not indicate how pos-itive or negative changes in the feature will affect the modelscore

It is important to measure whether the recommendationsthat Patternizr makes are not only valuable but also fair par-ticularly given the growing concern that predictive policingtools may perpetuate disparate impact (29ndash31) We intention-ally designed the algorithm to minimize disparate impact onany specific group First the algorithm is completely blindto sensitive information about potential suspects includingrace and gender which were not included as a similarityfeature for the predictive model Second we kept poten-tial proxy variables for sensitive informationmdashparticularlylocationmdashextremely coarse to ensure correlation with sensi-tive attributes had a very low degree of certainty while retain-ing some very general information about location Finallyand most important several levels of expert human revieware still required to establish a pattern minimizing the poten-tial for a seemingly likely (but incorrect) recommendation toresult in any enforcement action

To confirm that these precautions were effective we testedwhether any of the three models recommended pairs with acertain racial group at a different frequency than either ran-dom pairing or existing identified patterns We randomly se-lected 5ndash10 million test pairs from each crime type and drew1000 bootstrapped samples from each of these samples toestimate uncertainty (Figure 5)

Fig 5 Proportion of pairs by race for random historical pattern and Patternizr-recommended pairs Confidence intervals are calculated from the range of values thatappear in 1000 bootstrapped samples Racial categories correspond to the official descriptions listed on NYPD complaint forms Notably many types of grand larceny willnot contain suspect information because there were no witnesses to the crime thus producing relatively higher levels of uncertainty for grand larceny than for the other twocrime types

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 7

Fig 6 Screenshot of Patternizr in the NYPD Domain Awareness System The map of the seed and most similar complaints is in the left panel the seed complaint is in themiddle panel and the most similar complaints are in the right panel Grey boxes obscure law enforcement sensitive and personally identifiable information

For all three crime types the proportion of Patternizr pairswith a similarity score greater than 05 which included a sus-pect description of any race was either consistent within the95 confidence limit or lower than a random sampling ofall pairs That is this analysis uncovered no evidence thatPatternizr recommends any suspect race at a higher rate thanexists with random pairing

Implementation

We incorporated finalized models into the back end of thedepartmentrsquos Domain Awareness System (DAS) a citywidenetwork of sensors databases devices software and infras-tructure (9 21) We processed all historical pairs of com-plaints in parallel in the cloud against ten years of recordsfor burglaries and robberies and against three years of grandlarcenies This historical load took 194 days on approxi-mately 1600 cores For new and revised complaints simi-larity scores are calculated and updated three times each dayEach new or revised crime is scored against the entire corpusof crimes before it is incorporated into DAS for querying byusers throughout the department

Officers and analysts can access Patternizr through the cus-tom software application which is a part of DAS on NYPDdesktop computers To use Patternizr an investigator pressesa ldquoPatternizerdquo button on any seed complaint to retrieve allavailable calculated similarity scores Investigators are pre-sented with a rank-ordered list of results from most to leastsimilar to the seed with ten results displayed at a time Foreach result the distance time apart and algorithm-calculatedsimilarity score are all displayed A map displays the seed

and similar results (Figure 6) and users can read a few impor-tant details about each complaint before choosing to examinethe seed and result complaints side by side in full detail Theuse of a ranked list as opposed to a similarity score thresholdallows us to always display the most similar crimes thus theanalysts always have a starting point for their analysis

The user interface gives investigators the ability to collab-orate with the algorithm using their specialized expertise Ifan investigator has reason to believe that a certain MO isparticularly relevant for a given pattern (eg only burglar-izing tools from construction sites) the investigator can ac-tively filter and query the result list A general text searchand filters for distance time apart presence of arrest andpremise type are all available and may be used simultane-ously The remaining results are still sorted by Patternizrrsquossimilarity score The filters do not reorder the results theyonly exclude crimes that do not meet the search criteria Inthis way investigators can both search for a specific MOand utilize the rank order of the results list to immediatelydisplay the most similar results given the manually specifiedMO criteria

After approximately two years of development includingwork on the algorithm backend systems and the user in-terface Patternizr was deployed to production in Decem-ber 2016 In parallel throughout 2017 the NYPD hired ateam of approximately 100 civilian analysts deployed in thefield to perform crime analysis These new crime analystswere trained to use Patternizr as part of their daily routineFor example a crime analyst assigned to a precinct typi-cally reviews crimes that occurred in his or her precinct tosee whether they are part of a pattern The uniformed offi-

8 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

cer that had been assigned this task before the crime analystswere hired used manual methods to perform this review thisreview is now conducted using Patternizr

Between January 2018 (with crime analysts at fully staffedlevels) to the time of this paperrsquos writing (July 2018) ap-proximately 400 complaints per week were run through Pat-ternizr and this number is trending upward This representsapproximately 30 of all the burglaries robberies and grandlarcenies recorded by the NYPD during that period Approxi-mately 80 of the usage comes from the new crime analystswith the remaining 20 coming from uniform officers per-forming crime analysis and investigatory duties Addition-ally crime analysts created more than 90 preliminary pat-terns per week in the same period using Patternizr in DASThese patterns are considered preliminary because they havenot yet been reviewed and approved by the relevant special-ist units however the volume of preliminary patterns cre-ated with Patternizr demonstrates the value of the tool De-bra Piehl the NYPDrsquos Senior Crime Analyst says of the toolldquoPatternizr dramatically improves efficiency compared to tra-ditional methods and it still allows the analysts that work forme to apply their own thinking and analysis The sciencedoesnrsquot overwhelm the artrdquo (32)

Crime analysts have also provided specific instances inwhich official patterns were created because the Patternizralgorithm recognized similarities between crimes that the an-alysts would otherwise have found difficult to notice Forexample one analyst was examining a recent robbery in herprecinct the perpetrator was shoplifting power drills froma commercial establishment and when confronted attackedand injured an employee with a hypodermic needle The an-alyst ran the complaint through Patternizr and the algorithmreturned an additional robbery in a distant precinct wherethe perpetrator was shoplifting a drill and threatened an em-ployee with a hypodermic needle This robbery was likelyidentified by the algorithm because of high similarity scoresfor the time of occurrence the robbery subtype (began asshoplifting) the weapon used and several matching words inthe narratives (eg drill needle) The investigators combinedthese two complaints into an official pattern along with twoother larcenies committed by the same perpetrator and thepattern was then passed to the detective squad The NYPDconducted an investigation and arrested the perpetrator wholater pled guilty to larceny and felony assault and is currentlyawaiting sentencing

A second example involves the discovery of a grand larcenyof unattended-property pattern In this example an analystwas examining the theft of a watch from a locker at a hotelgym in Midtown Manhattan in the late afternoon The an-alyst used Patternizr on this complaint and discovered threeother complaints with the same MOmdashjewelry and watcheswere removed from a locker in a fitness facility in the lateafternoon or early evening in the same area The algorithmlikely recognized similarities in the complaintsrsquo times of oc-currence their geographic proximity the property removed(jewelry and watches) the premise types (gyms) and keywords in the narrative (eg locker) Two suspects were iden-

tified through the review of video footage and the NYPDrsquosinvestigation is ongoing

The helpful feedback we have received from our users hashighlighted potential improvements to Patternizr For exam-ple users have requested that the grand larceny algorithmalso calculate similarities to petit larceny complaints whichdiffer from grand larcenies only in the value of the itemsstolen Users have also requested the ability to compareacross crime types such as robberies and grand larcenieswhere the only difference is whether force was used Weare considering adding this functionality when we deploy asecond version of the Patternizr algorithms

ConclusionPatternizr is a new effective and fair recommendation en-gine deployed by the NYPD to help investigators group re-lated crimes To our knowledge this is the first time suchan algorithm has been deployed to production in a law en-forcement environment It joins a growing list of machinelearning applications customized for public safety and for thepublic sector in general Patternizr is also one of a long lineof data-driven tools prototyped and developed at the NYPDincluding CompStat (7) These tools when used properly en-courage precision policing approaches instead of widespreadheavy-handed enforcement techniques and enable investiga-tors to focus on the art of policing instead of rote and unin-teresting work We expect that other departments and localagencies in general will continue this line of research by ex-tracting similar value from existing government data

Bibliography1 Walt L Perry Brian McInnis Carter C Price Susan C Smith and John S Hollywood Predic-

tive policing The role of crime forecasting in law enforcement operations Rand Corporation2013

2 Andrew Guthrie Ferguson The rise of big data policing Surveillance race and the futureof law enforcement NYU Press 2017

3 Greg Ridgeway Policing in the era of big data Annual Review of Criminology 1401ndash4192018

4 Richard C Larson Urban police patrol analysis volume 28 MIT Press Cambridge MA1972

5 Linda V Green and Peter J Kolesar Improving emergency responsiveness with manage-ment science Management Science 50(8)1001ndash1014 2004

6 Jack Maple The crime fighter How you can make your community crime free Broadway2000

7 Vincent E Henry The COMPSTAT paradigm Management accountability in policing busi-ness and the public sector Looseleaf Law Publications 2002

8 George O Mohler Martin B Short Sean Malinowski Mark Johnson George E Tita An-drea L Bertozzi and P Jeffrey Brantingham Randomized controlled field trials of predictivepolicing Journal of the American statistical association 110(512)1399ndash1411 2015

9 ES Levine Jessica Tisch Anthony Tasso and Michael Joy The New York City PoliceDepartmentrsquos Domain Awareness System Interfaces 47(1)70ndash84 2017

10 Richard Berk and Justin Bleich Forecasts of violence to inform sentencing decisions Jour-nal of Quantitative Criminology 30(1)79ndash96 2014

11 Jessica Saunders Priscillia Hunt and John S Hollywood Predictions put into practice aquasi-experimental evaluation of Chicagorsquos predictive policing pilot Journal of ExperimentalCriminology 12(3)347ndash371 2016

12 Nelson Lim Carl Matthies Brian Gifford and Greg Ridgeway To protect and to serveenhancing the efficiency of LAPD recruiting volume 881 Rand Corporation 2009

13 David Weisburd Elizabeth R Groff Greg Jones Breanne Cave Karen L Amendola Sue-Ming Yang and Rupert F Emison The Dallas patrol management experiment can AVLtechnologies be used to harness unallocated patrol time for crime prevention Journal ofExperimental Criminology 11(3)367ndash391 2015

14 Jennifer Helsby Samuel Carton Kenneth Joseph Ayesha Mahmud Youngsoo Park An-drea Navarrete Klaus Ackermann Joe Walsh Lauren Haynes Crystal Cody et al Earlyintervention systems Predicting adverse interactions between police and the public Crimi-nal justice policy review 29(2)190ndash209 2018

15 Tong Wang Cynthia Rudin Daniel Wagner and Rich Sevieri Learning to detect patternsof crime In Joint European conference on machine learning and knowledge discovery indatabases pages 515ndash530 Springer 2013

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 9

16 Michael D Porter A statistical approach to crime linkage The American Statistician 70(2)152ndash165 2016

17 Robert M Bell and Yehuda Koren Lessons from the Netflix prize challenge SiGKDD Ex-plorations 9(2)75ndash79 2007

18 New York City Police Department Citywide seven major felony offenses 2000ndash2016 2017Accessed 2017-12-20 httpwww1nycgovassetsnypddownloadspdfanalysis_and_planningseven-major-felony-offenses-2000-2016

pdf19 Marvin E Wolfgang Robert M Figlio and Thorsten Sellin Delinquency in a birth cohort

University of Chicago Press 198720 Samantha L Gwinn Christopher W Bruce Steven R Hick and Julie P Cooper Exploring

crime analysis Readings on essential skills International Association of Crime Analysts2008

21 ES Levine and JS Tisch Analytics in action at the New York City Police DepartmentrsquosCounterterrorism Bureau Military Operations Research 19(4)5ndash14 2014

22 Trevor Hastie Robert Tibshirani and Jerome Friedman The elements of statistical learningprediction inference and data mining Springer-Verlag New York 2009

23 Shyam Boriah Varun Chandola and Vipin Kumar Similarity measures for categorical dataA comparative evaluation In Proceedings of the 2008 SIAM international conference ondata mining pages 243ndash254 SIAM 2008

24 James H Martin and Daniel Jurafsky Speech and language processing An introductionto natural language processing computational linguistics and speech recognition Pear-sonPrentice Hall Upper Saddle River 2009

25 Leo Breiman Random forests Machine learning 45(1)5ndash32 200126 Fabian Pedregosa Gaeumll Varoquaux Alexandre Gramfort Vincent Michel Bertrand Thirion

Olivier Grisel Mathieu Blondel Peter Prettenhofer Ron Weiss Vincent Dubourg et alScikit-learn Machine learning in Python Journal of machine learning research 12(Oct)2825ndash2830 2011

27 James Bergstra and Yoshua Bengio Random search for hyper-parameter optimizationJournal of Machine Learning Research 13(Feb)281ndash305 2012

28 Christopher D Manning Raghavan Prabhakar and Schuumltze Hinrich Introduction to infor-mation retrieval Cambridge University Press 2008

29 Kristian Lum and William Isaac To predict and serve Significance 13(5)14ndash19 201630 Sam Corbett-Davies Emma Pierson Avi Feller Sharad Goel and Aziz Huq Algorithmic

decision making and the cost of fairness In Proceedings of the 23rd ACM SIGKDD In-ternational Conference on Knowledge Discovery and Data Mining pages 797ndash806 ACM2017

31 Andrew Guthrie Ferguson Policing predictive policing Wash UL Rev 941109 201632 Debra Piehl Conversation with Evan Levine June 22 201833 Daniel Jurafsky and James H Martin Speech and language processing An introduction

to speech recognition computational linguistics and natural language processing UpperSaddle River NJ Prentice Hall 2008

AcknowledgementsThe authors acknowledge the support of the men and womenof the New York City Police Department In particular theauthors thank Police Commissioner James P OrsquoNeill theformer Police Commissioner William J Bratton Chief LoriPollock Chief Dermot Shea Deputy Chief Timothy McCor-mack Inspector Anthony Tasso Deputy Inspector Brandondel Pozo Captain John Hall Lt Juan Duque Lt PhillipMcEneaney Sgt Christopher Bray Sgt Joseph Klub-nick Sgt Sean McCarthy Deputy Commissioner JessicaTisch Assistant Commissioner Richard Schroeder AssistantCommissioner Ronald Wilhelmy Director Martha NorrickDaniel Montiel and Debra Piehl Special thanks to TongWang Cynthia Rudin Lt Daniel Wagner Theo DamoulasRavi Shroff and Joe McLaughlin

Alex Chohlas-Woodrsquos current affiliation is Stanford Univer-sity Stanford CA

Appendix A Feature ExtractionIn this section we provide the mathematical details involvedin our feature extraction process

Categorical Attributes (k = 16 through k = 24) Define Nto be the number of documents in the corpus First we countthe number of documents in the corpus where attribute k hasthe value x we label these counts nk

x Then

Xkij =

1minussum

y st nkylenk

match

nky(nk

yminus1)N(Nminus1)

if documents i

and j match onattribute k

0 otherwise

where the document count nkmatch is defined as the number of

documents in the corpus that have the same value of attributek as documents i and j (23)

Suspect Height and Weight Attributes (k = 25 and k =26) Call the minimal absolute difference between the valueof a continuous variable associated with suspect h of com-plaint i to that of any suspect in complaint j aijh In math-ematical form using the weight attribute wih of suspect h incomplaint i as an example

aijh = minf

[|wihminuswjf |

]

where the minimum function runs over all suspects f in com-plaint j For example if complaint 1 has suspects of weights140 pounds (lbs) 180 lbs and 220 lbs and complaint 2 hassuspects of weights 150 lbs and 215 lbs then a121 = 10 lbsa122 = 30 lbs and a123 = 5 lbs After calculating thesecomparisons we transform them into a similarity via

Xkij =max

[exp(minusPi

Pisumh=0

aijh

Pi +Pj

)exp

(minusPj

Pjsumh=0

ajih

Pi +Pj

)]

where Pi is the number of suspects for complaint i

Suspect Force Used (k = 27 ) In this case aijh is just thenumber of suspects in complaint j with the same value of thecategorical feature as suspect h in complaint i

Xk=27ij = max

[Pi

Pisumh=0

aijh

Pi +PjPj

Pjsumh=0

ajih

Pi +Pj

]

Unstructured Text for Robberies and Grand Larcenies(k = 32 Through k = 37 ) Treating each attribute k sepa-rately let the term frequency tfxi be the count of word x indocument i let the document frequency dfx be the number ofdocuments in the corpus containing word x and let M be thenumber of unique words in the corpus Then for each wordx we calculate the inverse document frequency (33)

idfx = log( N

dfx

)+1

The ldquo+1rdquo in this equation ensures that words that occur inevery document in the training corpus will not receive a zerofor inverse document frequency which would preclude thosewords from contributing to the eventual cosine similarity Foreach word x and document i we calculate

10 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

sxi = tfxitimes idfx

We then build the TFIDF vector Si for each document i

Si =[s0i sMi

]

Finally for each pair of documents i and j we calculatethe cosine similarity between the two TFIDF vectors

Xkij = SitimesSj

SiSj

Appendix B Random Forest Model TrainingTrees are recursively built by identifying the best feature kand split s at each node m which minimize the Gini indexa measure of node purity for the two descendent nodes (22)Let R be the subset of feature values Xk left or right of thesplit s and let pqm be the proportion of class q in node mThen

Rleft(ks) =

X|Xk le s

Rright(ks) =

X|Xk gt s

pqm = 1Nm

sumxiisinRm

I(yi = q)

where pqm is the estimated sample proportion of class q inleft or right node m Nm is the number of samples in nodem and I(yi = q) is an indicator function that returns 1 whensample i belongs to class q and returns 0 otherwise We setq = 1 (ie positive examples) and find the combination of kand s that minimizes the Gini index as follows

minks

[2(

p1left(1minus p1left

))+2(

p1right(1minus p1right

))]

In addition the group of features Km available for split-ting at each node is chosen from a randomly selected subsetof all available features K Some or all of these features maybe available for splitting at each node from these availablefeatures the model selects the feature and split that best sep-arate the child nodes into purer groups of either in-pattern orunrelated crime pairs

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 11

  • Feature Extraction
  • Random Forest Model Training
Page 6: A Recommendation Engine to Aid in Identifying Crime Patternscynthia/docs/Patternizr.pdf · A Recommendation Engine to Aid in Identifying Crime Patterns Alex Chohlas-Wood* andE.S.

Fig 2 Performance of Patternizr on 150 test patterns Each position on the x axis corresponds to a test pattern and each dot corresponds to the ranking of a crime in thatpattern Patterns are ordered left to right by the average precision score for that pattern as a visual aid Patternizr did best on the patterns that appear to the left side of eachchart Dashed lines indicate the proportion of test patterns with complete reconstruction (left of the vertical red line) and the patterns with a match in the top ten rankings(under the horizontal orange line) Roughly one-third of the test patterns are completely reconstructed (as we also detail in Table 2) that is no complaints outside of thepattern are ranked higher than a complaint included in the pattern and roughly four-fifths of the test patterns have at least one of the complaints in the pattern ranked in thetop ten

ticular subset For each crime type we trained one randomforest model with up to 220 trees using the generated featuresand tuned hyperparameters

We set our hyperparameter search to optimize hyperparam-eters for the best possible average precision (28)mdashan evalu-ation metric that gauges whether desired matches are rankedhighly in a query In Patternizrrsquos case the query is the listof complaints returned by Patternizr when the user chooses aseed Average precision relies on the precision metric whichmeasures the proportion of a predicted positive class that isactually positive For Patternizr precision is the percentageof suggested matches that are actually in a pattern with theseed Each true match in Patternizrrsquos ranked list can be rep-resented by the precision at that rankmdashthe number of truematches encountered so far divided by the rank (ie the totalnumber of suggested matches provided by Patternizr at thatpoint) Average precision therefore is simply an average ofthese precisions for all true matches in the test pattern

Performance

We tested the performance of Patternizr on a variety of objectives. First, we measured the proportion of test patterns that were entirely or partially rebuilt by the models, even though the models had not previously seen these test patterns. Second, we constructed a benchmark model as a simplified proxy for existing practice, to test whether investigators using Patternizr would be more efficient than those using existing methods. Third, we examined the features each model had isolated as most useful in identifying patterns and compared those with investigators' intuition. Finally, we measured whether Patternizr made consistent recommendations across racial groups.

We validated all three Patternizr models on official patterns that were not included in the training set. For each crime type, we randomly chose 150 test patterns and randomly selected a seed from each pattern. We then noted the rankings of the remaining pattern crimes as provided by Patternizr's similarity scores. We bootstrapped this process by randomly selecting these 150 test patterns 1,000 times, which provided reliable estimates and uncertainty bounds. Table 2 shows the results from this analysis.
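One way to implement resampling in this spirit is sketched below, assuming a list `ap_scores` holding the average precision achieved on each held-out pattern; the authors' exact protocol may differ in detail.

```python
import random

def bootstrap_interval(ap_scores, n_boot=1000, k=150, seed=0):
    """Resample k pattern-level scores with replacement n_boot times and
    return a 95% percentile interval around the median of the means."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choice(ap_scores) for _ in range(k)) / k
        for _ in range(n_boot)
    )
    return means[int(0.025 * n_boot)], means[n_boot // 2], means[int(0.975 * n_boot)]
```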

The algorithm completely rebuilt (i.e., ranked all in-pattern pairs ahead of all nonrelated pairs) approximately one-third of test patterns. Additionally, approximately four-fifths of burglary, robbery, and grand larceny patterns had at least one match in the top ten rankings out of the entire corpus of complaints for that crime type. Figure 2 shows the rankings for all test patterns.

Fig. 3. Performance of Patternizr against a simple baseline. Each dot represents a pair of complaints, where one half of the pair is a seed from the test set and the other half of the pair is another complaint in that pattern. Model performance is better than baseline performance for each pair of sample complaints that lies below the diagonal line; that is, the algorithm gives a ranking closer to the top of the list than the baseline algorithm.


Fig. 4. Feature importances for Patternizr. Feature importance is one way to gauge the relative contribution of different features to the overall classification of a random forest. Feature importances are calculated by measuring the Gini impurity reduction at each node at which that feature is used, weighing these reductions by the number of samples that were routed to that node, and then averaging these calculations for each feature across all trees in the forest in which that feature is used.

In reality, this performance is likely an underestimate, given that investigators have likely missed actual matches in the historical corpus.

To test whether Patternizr improves investigator efficiency, we built a simple baseline algorithm for comparison. The baseline algorithm assumes that investigators examine complaints nearest in time to the seed complaint first, while always limiting themselves to complaints in the same precinct as the chosen seed. This method provides a baseline ranked list that we can compare with Patternizr's ranked results. We restricted the comparison to local patterns in the test set, because the baseline algorithm would perform poorly (and unrealistically) on patterns that span multiple precincts. For each complaint in a local test pattern, we compare its baseline ranking with how it appears in the ranked list generated by Patternizr (Figure 3).
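The baseline is simple enough to state as code. This is a sketch under assumed field names (`precinct`, `occurred_at`, `id`), not the authors' implementation:

```python
def baseline_ranking(seed, complaints):
    """Proxy for existing practice: consider only complaints in the
    seed's precinct, ordered by absolute time difference from the seed."""
    candidates = [c for c in complaints
                  if c["precinct"] == seed["precinct"] and c["id"] != seed["id"]]
    return sorted(candidates,
                  key=lambda c: abs((c["occurred_at"] - seed["occurred_at"]).total_seconds()))
```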

We found that for all three crime types, Patternizr is typically an improvement on the baseline algorithm. This estimate is conservative, because Patternizr is especially helpful on patterns spanning multiple precincts.

In Figure 4, we show the top ten feature importances for each model. The features for distance, days apart, and narrative information play a top role in all three models. NYPD's pattern subject matter experts confirmed that these three features contain key information for pattern creation. Other top features include similarity between other unstructured text fields and time-of-day and day-of-week similarity. None of the top ten features for each crime type was particularly surprising to our subject matter experts. Note that these calculations only measure importance; they do not indicate how positive or negative changes in the feature will affect the model score.

It is important to measure whether the recommendations that Patternizr makes are not only valuable but also fair, particularly given the growing concern that predictive policing tools may perpetuate disparate impact (29–31). We intentionally designed the algorithm to minimize disparate impact on any specific group. First, the algorithm is completely blind to sensitive information about potential suspects: attributes such as race and gender were not included as similarity features for the predictive model. Second, we kept potential proxy variables for sensitive information, particularly location, extremely coarse, so that any correlation with sensitive attributes is weak while some very general information about location is retained. Finally, and most important, several levels of expert human review are still required to establish a pattern, minimizing the potential for a seemingly likely (but incorrect) recommendation to result in any enforcement action.

To confirm that these precautions were effective, we tested whether any of the three models recommended pairs with a certain racial group at a different frequency than either random pairing or existing identified patterns. We randomly selected 5–10 million test pairs from each crime type and drew 1,000 bootstrapped samples from each of these samples to estimate uncertainty (Figure 5).

Fig. 5. Proportion of pairs by race for random, historical-pattern, and Patternizr-recommended pairs. Confidence intervals are calculated from the range of values that appear in 1,000 bootstrapped samples. Racial categories correspond to the official descriptions listed on NYPD complaint forms. Notably, many types of grand larceny will not contain suspect information because there were no witnesses to the crime, thus producing relatively higher levels of uncertainty for grand larceny than for the other two crime types.


Fig. 6. Screenshot of Patternizr in the NYPD Domain Awareness System. The map of the seed and most similar complaints is in the left panel, the seed complaint is in the middle panel, and the most similar complaints are in the right panel. Grey boxes obscure law-enforcement-sensitive and personally identifiable information.

For all three crime types, the proportion of Patternizr pairs with a similarity score greater than 0.5 that included a suspect description of any race was either consistent (within the 95% confidence limit) with, or lower than, the proportion in a random sampling of all pairs. That is, this analysis uncovered no evidence that Patternizr recommends any suspect race at a higher rate than exists with random pairing.

Implementation

We incorporated the finalized models into the back end of the department's Domain Awareness System (DAS), a citywide network of sensors, databases, devices, software, and infrastructure (9, 21). We processed all historical pairs of complaints in parallel in the cloud, against ten years of records for burglaries and robberies and against three years of grand larcenies. This historical load took 1.94 days on approximately 1,600 cores. For new and revised complaints, similarity scores are calculated and updated three times each day. Each new or revised crime is scored against the entire corpus of crimes before it is incorporated into DAS for querying by users throughout the department.
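In outline, the thrice-daily update amounts to scoring each new or revised complaint against the existing corpus for its crime type. The sketch below assumes a scikit-learn-style classifier and a hypothetical `featurize` function that builds the pairwise similarity features; the paper does not describe the production pipeline at this level of detail.

```python
def score_new_complaints(model, featurize, new_complaints, corpus):
    """Score every new or revised complaint against the whole corpus;
    the in-pattern probability becomes the stored similarity score."""
    scores = {}
    for seed in new_complaints:
        pair_features = [featurize(seed, other) for other in corpus]
        # column 1 of predict_proba is P(pair is in-pattern)
        scores[seed["id"]] = model.predict_proba(pair_features)[:, 1]
    return scores
```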

Officers and analysts can access Patternizr through the custom software application, which is a part of DAS, on NYPD desktop computers. To use Patternizr, an investigator presses a "Patternize" button on any seed complaint to retrieve all available calculated similarity scores. Investigators are presented with a rank-ordered list of results, from most to least similar to the seed, with ten results displayed at a time. For each result, the distance, time apart, and algorithm-calculated similarity score are all displayed. A map displays the seed and similar results (Figure 6), and users can read a few important details about each complaint before choosing to examine the seed and result complaints side by side in full detail. The use of a ranked list, as opposed to a similarity score threshold, allows us to always display the most similar crimes; thus, the analysts always have a starting point for their analysis.

The user interface gives investigators the ability to collaborate with the algorithm using their specialized expertise. If an investigator has reason to believe that a certain M.O. is particularly relevant for a given pattern (e.g., only burglarizing tools from construction sites), the investigator can actively filter and query the result list. A general text search and filters for distance, time apart, presence of arrest, and premise type are all available and may be used simultaneously. The remaining results are still sorted by Patternizr's similarity score. The filters do not reorder the results; they only exclude crimes that do not meet the search criteria. In this way, investigators can both search for a specific M.O. and utilize the rank order of the results list to immediately display the most similar results given the manually specified M.O. criteria.
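The interaction between filtering and ranking reduces to a one-liner: filters drop rows but never resort them. The field names below are illustrative, not the application's schema:

```python
def apply_filters(ranked_results, predicates):
    """Filters exclude rows but never reorder them, so the survivors
    keep Patternizr's original similarity ordering."""
    return [r for r in ranked_results if all(p(r) for p in predicates)]

results = [
    {"distance_km": 1.2, "narrative": "removed power drill from site"},
    {"distance_km": 9.0, "narrative": "stole unattended wallet"},
]
# Keep results within 5 km whose narrative mentions "drill"
filtered = apply_filters(results, [
    lambda r: r["distance_km"] <= 5.0,
    lambda r: "drill" in r["narrative"],
])
```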

After approximately two years of development, including work on the algorithm, backend systems, and the user interface, Patternizr was deployed to production in December 2016. In parallel, throughout 2017, the NYPD hired a team of approximately 100 civilian analysts, deployed in the field to perform crime analysis. These new crime analysts were trained to use Patternizr as part of their daily routine. For example, a crime analyst assigned to a precinct typically reviews crimes that occurred in his or her precinct to see whether they are part of a pattern.


The uniformed officer who had been assigned this task before the crime analysts were hired used manual methods to perform this review; this review is now conducted using Patternizr.

Between January 2018 (with crime analysts at fully staffed levels) and the time of this paper's writing (July 2018), approximately 400 complaints per week were run through Patternizr, and this number is trending upward. This represents approximately 30% of all the burglaries, robberies, and grand larcenies recorded by the NYPD during that period. Approximately 80% of the usage comes from the new crime analysts, with the remaining 20% coming from uniformed officers performing crime analysis and investigatory duties. Additionally, crime analysts created more than 90 preliminary patterns per week in the same period using Patternizr in DAS. These patterns are considered preliminary because they have not yet been reviewed and approved by the relevant specialist units; however, the volume of preliminary patterns created with Patternizr demonstrates the value of the tool. Debra Piehl, the NYPD's Senior Crime Analyst, says of the tool: "Patternizr dramatically improves efficiency compared to traditional methods, and it still allows the analysts that work for me to apply their own thinking and analysis. The science doesn't overwhelm the art" (32).

Crime analysts have also provided specific instances in which official patterns were created because the Patternizr algorithm recognized similarities between crimes that the analysts would otherwise have found difficult to notice. For example, one analyst was examining a recent robbery in her precinct: the perpetrator was shoplifting power drills from a commercial establishment and, when confronted, attacked and injured an employee with a hypodermic needle. The analyst ran the complaint through Patternizr, and the algorithm returned an additional robbery in a distant precinct where the perpetrator was shoplifting a drill and threatened an employee with a hypodermic needle. This robbery was likely identified by the algorithm because of high similarity scores for the time of occurrence, the robbery subtype (began as shoplifting), the weapon used, and several matching words in the narratives (e.g., drill, needle). The investigators combined these two complaints into an official pattern, along with two other larcenies committed by the same perpetrator, and the pattern was then passed to the detective squad. The NYPD conducted an investigation and arrested the perpetrator, who later pled guilty to larceny and felony assault and is currently awaiting sentencing.

A second example involves the discovery of a grand larceny of unattended-property pattern. In this example, an analyst was examining the theft of a watch from a locker at a hotel gym in Midtown Manhattan in the late afternoon. The analyst used Patternizr on this complaint and discovered three other complaints with the same M.O.: jewelry and watches were removed from a locker in a fitness facility in the late afternoon or early evening in the same area. The algorithm likely recognized similarities in the complaints' times of occurrence, their geographic proximity, the property removed (jewelry and watches), the premise types (gyms), and key words in the narrative (e.g., locker). Two suspects were identified through the review of video footage, and the NYPD's investigation is ongoing.

The helpful feedback we have received from our users has highlighted potential improvements to Patternizr. For example, users have requested that the grand larceny algorithm also calculate similarities to petit larceny complaints, which differ from grand larcenies only in the value of the items stolen. Users have also requested the ability to compare across crime types, such as robberies and grand larcenies, where the only difference is whether force was used. We are considering adding this functionality when we deploy a second version of the Patternizr algorithms.

Conclusion

Patternizr is a new, effective, and fair recommendation engine deployed by the NYPD to help investigators group related crimes. To our knowledge, this is the first time such an algorithm has been deployed to production in a law enforcement environment. It joins a growing list of machine learning applications customized for public safety and for the public sector in general. Patternizr is also one of a long line of data-driven tools prototyped and developed at the NYPD, including CompStat (7). These tools, when used properly, encourage precision policing approaches instead of widespread heavy-handed enforcement techniques, and enable investigators to focus on the art of policing instead of rote and uninteresting work. We expect that other departments, and local agencies in general, will continue this line of research by extracting similar value from existing government data.

Bibliography

1. Walt L. Perry, Brian McInnis, Carter C. Price, Susan C. Smith, and John S. Hollywood. Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation, 2013.
2. Andrew Guthrie Ferguson. The rise of big data policing: Surveillance, race, and the future of law enforcement. NYU Press, 2017.
3. Greg Ridgeway. Policing in the era of big data. Annual Review of Criminology, 1:401–419, 2018.
4. Richard C. Larson. Urban police patrol analysis, volume 28. MIT Press, Cambridge, MA, 1972.
5. Linda V. Green and Peter J. Kolesar. Improving emergency responsiveness with management science. Management Science, 50(8):1001–1014, 2004.
6. Jack Maple. The crime fighter: How you can make your community crime free. Broadway, 2000.
7. Vincent E. Henry. The COMPSTAT paradigm: Management accountability in policing, business and the public sector. Looseleaf Law Publications, 2002.
8. George O. Mohler, Martin B. Short, Sean Malinowski, Mark Johnson, George E. Tita, Andrea L. Bertozzi, and P. Jeffrey Brantingham. Randomized controlled field trials of predictive policing. Journal of the American Statistical Association, 110(512):1399–1411, 2015.
9. E.S. Levine, Jessica Tisch, Anthony Tasso, and Michael Joy. The New York City Police Department's Domain Awareness System. Interfaces, 47(1):70–84, 2017.
10. Richard Berk and Justin Bleich. Forecasts of violence to inform sentencing decisions. Journal of Quantitative Criminology, 30(1):79–96, 2014.
11. Jessica Saunders, Priscillia Hunt, and John S. Hollywood. Predictions put into practice: a quasi-experimental evaluation of Chicago's predictive policing pilot. Journal of Experimental Criminology, 12(3):347–371, 2016.
12. Nelson Lim, Carl Matthies, Brian Gifford, and Greg Ridgeway. To protect and to serve: enhancing the efficiency of LAPD recruiting, volume 881. Rand Corporation, 2009.
13. David Weisburd, Elizabeth R. Groff, Greg Jones, Breanne Cave, Karen L. Amendola, Sue-Ming Yang, and Rupert F. Emison. The Dallas patrol management experiment: can AVL technologies be used to harness unallocated patrol time for crime prevention? Journal of Experimental Criminology, 11(3):367–391, 2015.
14. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, et al. Early intervention systems: Predicting adverse interactions between police and the public. Criminal Justice Policy Review, 29(2):190–209, 2018.
15. Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Learning to detect patterns of crime. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 515–530. Springer, 2013.
16. Michael D. Porter. A statistical approach to crime linkage. The American Statistician, 70(2):152–165, 2016.
17. Robert M. Bell and Yehuda Koren. Lessons from the Netflix prize challenge. SIGKDD Explorations, 9(2):75–79, 2007.
18. New York City Police Department. Citywide seven major felony offenses, 2000–2016, 2017. Accessed 2017-12-20. http://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/seven-major-felony-offenses-2000-2016.pdf.
19. Marvin E. Wolfgang, Robert M. Figlio, and Thorsten Sellin. Delinquency in a birth cohort. University of Chicago Press, 1987.
20. Samantha L. Gwinn, Christopher W. Bruce, Steven R. Hick, and Julie P. Cooper. Exploring crime analysis: Readings on essential skills. International Association of Crime Analysts, 2008.
21. E.S. Levine and J.S. Tisch. Analytics in action at the New York City Police Department's Counterterrorism Bureau. Military Operations Research, 19(4):5–14, 2014.
22. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: Data mining, inference, and prediction. Springer-Verlag, New York, 2009.
23. Shyam Boriah, Varun Chandola, and Vipin Kumar. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 243–254. SIAM, 2008.
24. James H. Martin and Daniel Jurafsky. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 2009.
25. Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
26. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
27. James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
28. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.
29. Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14–19, 2016.
30. Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.
31. Andrew Guthrie Ferguson. Policing predictive policing. Washington University Law Review, 94:1109, 2016.
32. Debra Piehl. Conversation with Evan Levine, June 22, 2018.
33. Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to speech recognition, computational linguistics and natural language processing. Prentice Hall, Upper Saddle River, NJ, 2008.

Acknowledgements

The authors acknowledge the support of the men and women of the New York City Police Department. In particular, the authors thank Police Commissioner James P. O'Neill, the former Police Commissioner William J. Bratton, Chief Lori Pollock, Chief Dermot Shea, Deputy Chief Timothy McCormack, Inspector Anthony Tasso, Deputy Inspector Brandon del Pozo, Captain John Hall, Lt. Juan Duque, Lt. Phillip McEneaney, Sgt. Christopher Bray, Sgt. Joseph Klubnick, Sgt. Sean McCarthy, Deputy Commissioner Jessica Tisch, Assistant Commissioner Richard Schroeder, Assistant Commissioner Ronald Wilhelmy, Director Martha Norrick, Daniel Montiel, and Debra Piehl. Special thanks to Tong Wang, Cynthia Rudin, Lt. Daniel Wagner, Theo Damoulas, Ravi Shroff, and Joe McLaughlin.

Alex Chohlas-Wood's current affiliation is Stanford University, Stanford, CA.

Appendix A: Feature Extraction

In this section, we provide the mathematical details involved in our feature extraction process.

Categorical Attributes (k = 16 through k = 24). Define $N$ to be the number of documents in the corpus. First, we count the number of documents in the corpus where attribute $k$ has the value $x$; we label these counts $n^k_x$. Then

$$
X^k_{ij} =
\begin{cases}
1 - \displaystyle\sum_{y \,:\, n^k_y \le n^k_{\mathrm{match}}} \frac{n^k_y \left(n^k_y - 1\right)}{N(N-1)} & \text{if documents } i \text{ and } j \text{ match on attribute } k, \\[2ex]
0 & \text{otherwise,}
\end{cases}
$$

where the document count $n^k_{\mathrm{match}}$ is defined as the number of documents in the corpus that have the same value of attribute $k$ as documents $i$ and $j$ (23).
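A direct transcription of this similarity into Python (a sketch, not the authors' code):

```python
from collections import Counter

def categorical_similarity(corpus_values, x_i, x_j):
    """Goodall-style similarity (23) for one categorical attribute:
    matches on rare values score near 1, matches on common values score
    lower, and non-matches score 0."""
    if x_i != x_j:
        return 0.0
    counts = Counter(corpus_values)                 # n^k_y for each value y
    n = len(corpus_values)
    n_match = counts[x_i]
    penalty = sum(c * (c - 1) for c in counts.values() if c <= n_match)
    return 1.0 - penalty / (n * (n - 1))

# A match on a rare method of entry is more informative than a common one
corpus = ["door"] * 90 + ["window"] * 9 + ["skylight"]
print(categorical_similarity(corpus, "skylight", "skylight"))  # 1.0
print(categorical_similarity(corpus, "door", "door"))          # much lower
```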

Suspect Height and Weight Attributes (k = 25 and k = 26). Call the minimal absolute difference between the value of a continuous variable associated with suspect $h$ of complaint $i$ and that of any suspect in complaint $j$ $a_{ijh}$. In mathematical form, using the weight attribute $w_{ih}$ of suspect $h$ in complaint $i$ as an example,

$$
a_{ijh} = \min_f \left[\, |w_{ih} - w_{jf}| \,\right],
$$

where the minimum function runs over all suspects $f$ in complaint $j$. For example, if complaint 1 has suspects of weights 140 pounds (lbs), 180 lbs, and 220 lbs, and complaint 2 has suspects of weights 150 lbs and 215 lbs, then $a_{121} = 10$ lbs, $a_{122} = 30$ lbs, and $a_{123} = 5$ lbs. After calculating these comparisons, we transform them into a similarity via

$$
X^k_{ij} = \max\left[ \exp\!\left( -\frac{P_i \sum_{h=0}^{P_i} a_{ijh}}{P_i + P_j} \right),\; \exp\!\left( -\frac{P_j \sum_{h=0}^{P_j} a_{jih}}{P_i + P_j} \right) \right],
$$

where $P_i$ is the number of suspects for complaint $i$.
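A sketch of this computation follows, assuming the attribute values are already on a comparable scale (the exponential is very sharp on raw pounds, so presumably the features are normalized upstream; the paper does not spell this out):

```python
import math

def suspect_similarity(values_i, values_j):
    """Max over both directions of the exponential form above: each
    suspect is matched to the closest suspect in the other complaint."""
    def directed(a, b):
        p_a, p_b = len(a), len(b)
        total = sum(min(abs(u - v) for v in b) for u in a)
        return math.exp(-p_a * total / (p_a + p_b))
    return max(directed(values_i, values_j), directed(values_j, values_i))
```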

Suspect Force Used (k = 27). In this case, $a_{ijh}$ is just the number of suspects in complaint $j$ with the same value of the categorical feature as suspect $h$ in complaint $i$:

$$
X^{k=27}_{ij} = \max\left[ \frac{P_i \sum_{h=0}^{P_i} a_{ijh}}{P_i + P_j},\; \frac{P_j \sum_{h=0}^{P_j} a_{jih}}{P_i + P_j} \right].
$$

Unstructured Text for Robberies and Grand Larcenies (k = 32 through k = 37). Treating each attribute $k$ separately, let the term frequency $tf_{xi}$ be the count of word $x$ in document $i$, let the document frequency $df_x$ be the number of documents in the corpus containing word $x$, and let $M$ be the number of unique words in the corpus. Then, for each word $x$, we calculate the inverse document frequency (33):

$$
idf_x = \log\!\left( \frac{N}{df_x} \right) + 1.
$$

The "+1" in this equation ensures that words that occur in every document in the training corpus will not receive a zero for inverse document frequency, which would preclude those words from contributing to the eventual cosine similarity. For each word $x$ and document $i$, we calculate


$$
s_{xi} = tf_{xi} \times idf_x.
$$

We then build the TF-IDF vector $S_i$ for each document $i$:

$$
S_i = \left[ s_{0i}, \ldots, s_{Mi} \right].
$$

Finally, for each pair of documents $i$ and $j$, we calculate the cosine similarity between the two TF-IDF vectors:

$$
X^k_{ij} = \frac{S_i \cdot S_j}{\lVert S_i \rVert \, \lVert S_j \rVert}.
$$
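Because the authors built on scikit-learn (26), the same quantities can be computed with `TfidfVectorizer`; with `smooth_idf=False`, its idf is ln(N/df) + 1, matching the formula above up to the logarithm base, and `norm="l2"` makes the dot product a cosine. This is an illustrative reconstruction, not the production code:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

vectorizer = TfidfVectorizer(smooth_idf=False, norm="l2")
S = vectorizer.fit_transform([
    "removed jewelry and watch from gym locker",
    "watch taken from locker at fitness facility",
    "stole unattended wallet on subway platform",
])
# X^k_ij for every pair of narratives; the first two score highest
print(cosine_similarity(S).round(2))
```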

Appendix B: Random Forest Model Training

Trees are recursively built by identifying the best feature $k$ and split $s$ at each node $m$ which minimize the Gini index, a measure of node purity, for the two descendent nodes (22). Let $R$ be the subset of feature values $X^k$ left or right of the split $s$, and let $p_{qm}$ be the proportion of class $q$ in node $m$. Then

$$
R_{\mathrm{left}}(k, s) = \left\{ X \mid X^k \le s \right\},
$$

$$
R_{\mathrm{right}}(k, s) = \left\{ X \mid X^k > s \right\},
$$

$$
p_{qm} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = q),
$$

where $p_{qm}$ is the estimated sample proportion of class $q$ in left or right node $m$, $N_m$ is the number of samples in node $m$, and $I(y_i = q)$ is an indicator function that returns 1 when sample $i$ belongs to class $q$ and returns 0 otherwise. We set $q = 1$ (i.e., positive examples) and find the combination of $k$ and $s$ that minimizes the Gini index as follows:

$$
\min_{k,s} \left[ 2\left( p_{1,\mathrm{left}} \left( 1 - p_{1,\mathrm{left}} \right) \right) + 2\left( p_{1,\mathrm{right}} \left( 1 - p_{1,\mathrm{right}} \right) \right) \right].
$$

In addition, the group of features $K_m$ available for splitting at each node is chosen from a randomly selected subset of all available features $K$. Some or all of these features may be available for splitting at each node; from these available features, the model selects the feature and split that best separate the child nodes into purer groups of either in-pattern or unrelated crime pairs.
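A toy version of this split search is given below: exhaustive over thresholds, with the binary Gini term 2p(1 - p) exactly as written above. Note that production implementations such as scikit-learn additionally weight each child's impurity by its sample count.

```python
import numpy as np

def best_split(X, y):
    """Return the (feature k, threshold s) minimizing the sum of the two
    children's binary Gini impurities 2 * p1 * (1 - p1)."""
    best_k, best_s, best_gini = None, None, np.inf
    for k in range(X.shape[1]):
        for s in np.unique(X[:, k])[:-1]:            # candidate thresholds
            left, right = y[X[:, k] <= s], y[X[:, k] > s]
            gini = sum(2 * c.mean() * (1 - c.mean()) for c in (left, right))
            if gini < best_gini:
                best_k, best_s, best_gini = k, s, gini
    return best_k, best_s, best_gini
```

Restricting the candidate features at each node to a random subset, as described above, is what the `max_features` parameter controls in scikit-learn's `RandomForestClassifier`.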


  • Feature Extraction
  • Random Forest Model Training
Page 7: A Recommendation Engine to Aid in Identifying Crime Patternscynthia/docs/Patternizr.pdf · A Recommendation Engine to Aid in Identifying Crime Patterns Alex Chohlas-Wood* andE.S.

Fig 4 Feature importances for Patternizr Feature importance is one way to gauge the relative contribution of different features to the overall classification of a randomforest Feature importances are calculated by measuring the Gini impurity reduction at each node at which that feature is used weighing these reductions by the number ofsamples that were routed to that node and then averaging these calculations for each feature across all trees in the forest in which that feature is used

test patterns In reality this performance is likely an under-estimation given that investigators have likely missed actualmatches in the historical corpus

To test whether Patternizr improves investigator efficiencywe built a simple baseline algorithm for comparison Thebaseline algorithm assumes that investigators examine com-plaints nearest in time to the seed complaint first while al-ways limiting themselves to complaints in the same precinctas the chosen seed This method provides a baseline rankedlist that we can compare with Patternizrrsquos ranked results Werestricted the comparison with local patterns in the test setbecause the baseline algorithm would perform poorly (andunrealistically) on patterns that span multiple precincts Wecompare the ranking of that complaint with how it appears inthe ranked list generated by Patternizr for all the local pat-terns in our test data set (Figure 3)

We found that for all three crime types Patternizr is typ-ically an improvement on the baseline algorithm this esti-mate is conservative because Patternizr is especially helpfulon patterns spanning multiple precincts

In Figure 4 we show the top ten feature importances foreach model The features for distance days apart and narra-tive information play a top role in all three models NYPDrsquospattern subject matter experts confirmed that these three fea-tures contain key information for pattern creation Other topfeatures include similarity between other unstructured textfields and time-of-day and day-of-week similarity None ofthe top ten features for each crime type was particularly sur-

prising to our subject matter experts Note that these calcula-tions only measure importance they do not indicate how pos-itive or negative changes in the feature will affect the modelscore

It is important to measure whether the recommendationsthat Patternizr makes are not only valuable but also fair par-ticularly given the growing concern that predictive policingtools may perpetuate disparate impact (29ndash31) We intention-ally designed the algorithm to minimize disparate impact onany specific group First the algorithm is completely blindto sensitive information about potential suspects includingrace and gender which were not included as a similarityfeature for the predictive model Second we kept poten-tial proxy variables for sensitive informationmdashparticularlylocationmdashextremely coarse to ensure correlation with sensi-tive attributes had a very low degree of certainty while retain-ing some very general information about location Finallyand most important several levels of expert human revieware still required to establish a pattern minimizing the poten-tial for a seemingly likely (but incorrect) recommendation toresult in any enforcement action

To confirm that these precautions were effective we testedwhether any of the three models recommended pairs with acertain racial group at a different frequency than either ran-dom pairing or existing identified patterns We randomly se-lected 5ndash10 million test pairs from each crime type and drew1000 bootstrapped samples from each of these samples toestimate uncertainty (Figure 5)

Fig 5 Proportion of pairs by race for random historical pattern and Patternizr-recommended pairs Confidence intervals are calculated from the range of values thatappear in 1000 bootstrapped samples Racial categories correspond to the official descriptions listed on NYPD complaint forms Notably many types of grand larceny willnot contain suspect information because there were no witnesses to the crime thus producing relatively higher levels of uncertainty for grand larceny than for the other twocrime types

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 7

Fig 6 Screenshot of Patternizr in the NYPD Domain Awareness System The map of the seed and most similar complaints is in the left panel the seed complaint is in themiddle panel and the most similar complaints are in the right panel Grey boxes obscure law enforcement sensitive and personally identifiable information

For all three crime types the proportion of Patternizr pairswith a similarity score greater than 05 which included a sus-pect description of any race was either consistent within the95 confidence limit or lower than a random sampling ofall pairs That is this analysis uncovered no evidence thatPatternizr recommends any suspect race at a higher rate thanexists with random pairing

Implementation

We incorporated finalized models into the back end of thedepartmentrsquos Domain Awareness System (DAS) a citywidenetwork of sensors databases devices software and infras-tructure (9 21) We processed all historical pairs of com-plaints in parallel in the cloud against ten years of recordsfor burglaries and robberies and against three years of grandlarcenies This historical load took 194 days on approxi-mately 1600 cores For new and revised complaints simi-larity scores are calculated and updated three times each dayEach new or revised crime is scored against the entire corpusof crimes before it is incorporated into DAS for querying byusers throughout the department

Officers and analysts can access Patternizr through the cus-tom software application which is a part of DAS on NYPDdesktop computers To use Patternizr an investigator pressesa ldquoPatternizerdquo button on any seed complaint to retrieve allavailable calculated similarity scores Investigators are pre-sented with a rank-ordered list of results from most to leastsimilar to the seed with ten results displayed at a time Foreach result the distance time apart and algorithm-calculatedsimilarity score are all displayed A map displays the seed

and similar results (Figure 6) and users can read a few impor-tant details about each complaint before choosing to examinethe seed and result complaints side by side in full detail Theuse of a ranked list as opposed to a similarity score thresholdallows us to always display the most similar crimes thus theanalysts always have a starting point for their analysis

The user interface gives investigators the ability to collab-orate with the algorithm using their specialized expertise Ifan investigator has reason to believe that a certain MO isparticularly relevant for a given pattern (eg only burglar-izing tools from construction sites) the investigator can ac-tively filter and query the result list A general text searchand filters for distance time apart presence of arrest andpremise type are all available and may be used simultane-ously The remaining results are still sorted by Patternizrrsquossimilarity score The filters do not reorder the results theyonly exclude crimes that do not meet the search criteria Inthis way investigators can both search for a specific MOand utilize the rank order of the results list to immediatelydisplay the most similar results given the manually specifiedMO criteria

After approximately two years of development includingwork on the algorithm backend systems and the user in-terface Patternizr was deployed to production in Decem-ber 2016 In parallel throughout 2017 the NYPD hired ateam of approximately 100 civilian analysts deployed in thefield to perform crime analysis These new crime analystswere trained to use Patternizr as part of their daily routineFor example a crime analyst assigned to a precinct typi-cally reviews crimes that occurred in his or her precinct tosee whether they are part of a pattern The uniformed offi-

8 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

cer that had been assigned this task before the crime analystswere hired used manual methods to perform this review thisreview is now conducted using Patternizr

Between January 2018 (with crime analysts at fully staffedlevels) to the time of this paperrsquos writing (July 2018) ap-proximately 400 complaints per week were run through Pat-ternizr and this number is trending upward This representsapproximately 30 of all the burglaries robberies and grandlarcenies recorded by the NYPD during that period Approxi-mately 80 of the usage comes from the new crime analystswith the remaining 20 coming from uniform officers per-forming crime analysis and investigatory duties Addition-ally crime analysts created more than 90 preliminary pat-terns per week in the same period using Patternizr in DASThese patterns are considered preliminary because they havenot yet been reviewed and approved by the relevant special-ist units however the volume of preliminary patterns cre-ated with Patternizr demonstrates the value of the tool De-bra Piehl the NYPDrsquos Senior Crime Analyst says of the toolldquoPatternizr dramatically improves efficiency compared to tra-ditional methods and it still allows the analysts that work forme to apply their own thinking and analysis The sciencedoesnrsquot overwhelm the artrdquo (32)

Crime analysts have also provided specific instances inwhich official patterns were created because the Patternizralgorithm recognized similarities between crimes that the an-alysts would otherwise have found difficult to notice Forexample one analyst was examining a recent robbery in herprecinct the perpetrator was shoplifting power drills froma commercial establishment and when confronted attackedand injured an employee with a hypodermic needle The an-alyst ran the complaint through Patternizr and the algorithmreturned an additional robbery in a distant precinct wherethe perpetrator was shoplifting a drill and threatened an em-ployee with a hypodermic needle This robbery was likelyidentified by the algorithm because of high similarity scoresfor the time of occurrence the robbery subtype (began asshoplifting) the weapon used and several matching words inthe narratives (eg drill needle) The investigators combinedthese two complaints into an official pattern along with twoother larcenies committed by the same perpetrator and thepattern was then passed to the detective squad The NYPDconducted an investigation and arrested the perpetrator wholater pled guilty to larceny and felony assault and is currentlyawaiting sentencing

A second example involves the discovery of a grand larcenyof unattended-property pattern In this example an analystwas examining the theft of a watch from a locker at a hotelgym in Midtown Manhattan in the late afternoon The an-alyst used Patternizr on this complaint and discovered threeother complaints with the same MOmdashjewelry and watcheswere removed from a locker in a fitness facility in the lateafternoon or early evening in the same area The algorithmlikely recognized similarities in the complaintsrsquo times of oc-currence their geographic proximity the property removed(jewelry and watches) the premise types (gyms) and keywords in the narrative (eg locker) Two suspects were iden-

tified through the review of video footage and the NYPDrsquosinvestigation is ongoing

The helpful feedback we have received from our users hashighlighted potential improvements to Patternizr For exam-ple users have requested that the grand larceny algorithmalso calculate similarities to petit larceny complaints whichdiffer from grand larcenies only in the value of the itemsstolen Users have also requested the ability to compareacross crime types such as robberies and grand larcenieswhere the only difference is whether force was used Weare considering adding this functionality when we deploy asecond version of the Patternizr algorithms

ConclusionPatternizr is a new effective and fair recommendation en-gine deployed by the NYPD to help investigators group re-lated crimes To our knowledge this is the first time suchan algorithm has been deployed to production in a law en-forcement environment It joins a growing list of machinelearning applications customized for public safety and for thepublic sector in general Patternizr is also one of a long lineof data-driven tools prototyped and developed at the NYPDincluding CompStat (7) These tools when used properly en-courage precision policing approaches instead of widespreadheavy-handed enforcement techniques and enable investiga-tors to focus on the art of policing instead of rote and unin-teresting work We expect that other departments and localagencies in general will continue this line of research by ex-tracting similar value from existing government data

Bibliography1 Walt L Perry Brian McInnis Carter C Price Susan C Smith and John S Hollywood Predic-

tive policing The role of crime forecasting in law enforcement operations Rand Corporation2013

2 Andrew Guthrie Ferguson The rise of big data policing Surveillance race and the futureof law enforcement NYU Press 2017

3 Greg Ridgeway Policing in the era of big data Annual Review of Criminology 1401ndash4192018

4 Richard C Larson Urban police patrol analysis volume 28 MIT Press Cambridge MA1972

5 Linda V Green and Peter J Kolesar Improving emergency responsiveness with manage-ment science Management Science 50(8)1001ndash1014 2004

6 Jack Maple The crime fighter How you can make your community crime free Broadway2000

7 Vincent E Henry The COMPSTAT paradigm Management accountability in policing busi-ness and the public sector Looseleaf Law Publications 2002

8 George O Mohler Martin B Short Sean Malinowski Mark Johnson George E Tita An-drea L Bertozzi and P Jeffrey Brantingham Randomized controlled field trials of predictivepolicing Journal of the American statistical association 110(512)1399ndash1411 2015

9 ES Levine Jessica Tisch Anthony Tasso and Michael Joy The New York City PoliceDepartmentrsquos Domain Awareness System Interfaces 47(1)70ndash84 2017

10 Richard Berk and Justin Bleich Forecasts of violence to inform sentencing decisions Jour-nal of Quantitative Criminology 30(1)79ndash96 2014

11 Jessica Saunders Priscillia Hunt and John S Hollywood Predictions put into practice aquasi-experimental evaluation of Chicagorsquos predictive policing pilot Journal of ExperimentalCriminology 12(3)347ndash371 2016

12 Nelson Lim Carl Matthies Brian Gifford and Greg Ridgeway To protect and to serveenhancing the efficiency of LAPD recruiting volume 881 Rand Corporation 2009

13 David Weisburd Elizabeth R Groff Greg Jones Breanne Cave Karen L Amendola Sue-Ming Yang and Rupert F Emison The Dallas patrol management experiment can AVLtechnologies be used to harness unallocated patrol time for crime prevention Journal ofExperimental Criminology 11(3)367ndash391 2015

14 Jennifer Helsby Samuel Carton Kenneth Joseph Ayesha Mahmud Youngsoo Park An-drea Navarrete Klaus Ackermann Joe Walsh Lauren Haynes Crystal Cody et al Earlyintervention systems Predicting adverse interactions between police and the public Crimi-nal justice policy review 29(2)190ndash209 2018

15 Tong Wang Cynthia Rudin Daniel Wagner and Rich Sevieri Learning to detect patternsof crime In Joint European conference on machine learning and knowledge discovery indatabases pages 515ndash530 Springer 2013

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 9

16 Michael D Porter A statistical approach to crime linkage The American Statistician 70(2)152ndash165 2016

17 Robert M Bell and Yehuda Koren Lessons from the Netflix prize challenge SiGKDD Ex-plorations 9(2)75ndash79 2007

18 New York City Police Department Citywide seven major felony offenses 2000ndash2016 2017Accessed 2017-12-20 httpwww1nycgovassetsnypddownloadspdfanalysis_and_planningseven-major-felony-offenses-2000-2016

pdf19 Marvin E Wolfgang Robert M Figlio and Thorsten Sellin Delinquency in a birth cohort

University of Chicago Press 198720 Samantha L Gwinn Christopher W Bruce Steven R Hick and Julie P Cooper Exploring

crime analysis Readings on essential skills International Association of Crime Analysts2008

21 ES Levine and JS Tisch Analytics in action at the New York City Police DepartmentrsquosCounterterrorism Bureau Military Operations Research 19(4)5ndash14 2014

22 Trevor Hastie Robert Tibshirani and Jerome Friedman The elements of statistical learningprediction inference and data mining Springer-Verlag New York 2009

23 Shyam Boriah Varun Chandola and Vipin Kumar Similarity measures for categorical dataA comparative evaluation In Proceedings of the 2008 SIAM international conference ondata mining pages 243ndash254 SIAM 2008

24 James H Martin and Daniel Jurafsky Speech and language processing An introductionto natural language processing computational linguistics and speech recognition Pear-sonPrentice Hall Upper Saddle River 2009

25 Leo Breiman Random forests Machine learning 45(1)5ndash32 200126 Fabian Pedregosa Gaeumll Varoquaux Alexandre Gramfort Vincent Michel Bertrand Thirion

Olivier Grisel Mathieu Blondel Peter Prettenhofer Ron Weiss Vincent Dubourg et alScikit-learn Machine learning in Python Journal of machine learning research 12(Oct)2825ndash2830 2011

27 James Bergstra and Yoshua Bengio Random search for hyper-parameter optimizationJournal of Machine Learning Research 13(Feb)281ndash305 2012

28 Christopher D Manning Raghavan Prabhakar and Schuumltze Hinrich Introduction to infor-mation retrieval Cambridge University Press 2008

29 Kristian Lum and William Isaac To predict and serve Significance 13(5)14ndash19 201630 Sam Corbett-Davies Emma Pierson Avi Feller Sharad Goel and Aziz Huq Algorithmic

decision making and the cost of fairness In Proceedings of the 23rd ACM SIGKDD In-ternational Conference on Knowledge Discovery and Data Mining pages 797ndash806 ACM2017

31 Andrew Guthrie Ferguson Policing predictive policing Wash UL Rev 941109 201632 Debra Piehl Conversation with Evan Levine June 22 201833 Daniel Jurafsky and James H Martin Speech and language processing An introduction

to speech recognition computational linguistics and natural language processing UpperSaddle River NJ Prentice Hall 2008

AcknowledgementsThe authors acknowledge the support of the men and womenof the New York City Police Department In particular theauthors thank Police Commissioner James P OrsquoNeill theformer Police Commissioner William J Bratton Chief LoriPollock Chief Dermot Shea Deputy Chief Timothy McCor-mack Inspector Anthony Tasso Deputy Inspector Brandondel Pozo Captain John Hall Lt Juan Duque Lt PhillipMcEneaney Sgt Christopher Bray Sgt Joseph Klub-nick Sgt Sean McCarthy Deputy Commissioner JessicaTisch Assistant Commissioner Richard Schroeder AssistantCommissioner Ronald Wilhelmy Director Martha NorrickDaniel Montiel and Debra Piehl Special thanks to TongWang Cynthia Rudin Lt Daniel Wagner Theo DamoulasRavi Shroff and Joe McLaughlin

Alex Chohlas-Woodrsquos current affiliation is Stanford Univer-sity Stanford CA

Appendix A Feature ExtractionIn this section we provide the mathematical details involvedin our feature extraction process

Categorical Attributes (k = 16 through k = 24) Define Nto be the number of documents in the corpus First we countthe number of documents in the corpus where attribute k hasthe value x we label these counts nk

x Then

Xkij =

1minussum

y st nkylenk

match

nky(nk

yminus1)N(Nminus1)

if documents i

and j match onattribute k

0 otherwise

where the document count nkmatch is defined as the number of

documents in the corpus that have the same value of attributek as documents i and j (23)

Suspect Height and Weight Attributes (k = 25 and k =26) Call the minimal absolute difference between the valueof a continuous variable associated with suspect h of com-plaint i to that of any suspect in complaint j aijh In math-ematical form using the weight attribute wih of suspect h incomplaint i as an example

aijh = minf

[|wihminuswjf |

]

where the minimum function runs over all suspects f in com-plaint j For example if complaint 1 has suspects of weights140 pounds (lbs) 180 lbs and 220 lbs and complaint 2 hassuspects of weights 150 lbs and 215 lbs then a121 = 10 lbsa122 = 30 lbs and a123 = 5 lbs After calculating thesecomparisons we transform them into a similarity via

Xkij =max

[exp(minusPi

Pisumh=0

aijh

Pi +Pj

)exp

(minusPj

Pjsumh=0

ajih

Pi +Pj

)]

where Pi is the number of suspects for complaint i

Suspect Force Used (k = 27 ) In this case aijh is just thenumber of suspects in complaint j with the same value of thecategorical feature as suspect h in complaint i

Xk=27ij = max

[Pi

Pisumh=0

aijh

Pi +PjPj

Pjsumh=0

ajih

Pi +Pj

]

Unstructured Text for Robberies and Grand Larcenies(k = 32 Through k = 37 ) Treating each attribute k sepa-rately let the term frequency tfxi be the count of word x indocument i let the document frequency dfx be the number ofdocuments in the corpus containing word x and let M be thenumber of unique words in the corpus Then for each wordx we calculate the inverse document frequency (33)

idfx = log( N

dfx

)+1

The ldquo+1rdquo in this equation ensures that words that occur inevery document in the training corpus will not receive a zerofor inverse document frequency which would preclude thosewords from contributing to the eventual cosine similarity Foreach word x and document i we calculate

10 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

sxi = tfxitimes idfx

We then build the TFIDF vector Si for each document i

Si =[s0i sMi

]

Finally for each pair of documents i and j we calculatethe cosine similarity between the two TFIDF vectors

Xkij = SitimesSj

SiSj

Appendix B Random Forest Model TrainingTrees are recursively built by identifying the best feature kand split s at each node m which minimize the Gini indexa measure of node purity for the two descendent nodes (22)Let R be the subset of feature values Xk left or right of thesplit s and let pqm be the proportion of class q in node mThen

Rleft(ks) =

X|Xk le s

Rright(ks) =

X|Xk gt s

pqm = 1Nm

sumxiisinRm

I(yi = q)

where pqm is the estimated sample proportion of class q inleft or right node m Nm is the number of samples in nodem and I(yi = q) is an indicator function that returns 1 whensample i belongs to class q and returns 0 otherwise We setq = 1 (ie positive examples) and find the combination of kand s that minimizes the Gini index as follows

minks

[2(

p1left(1minus p1left

))+2(

p1right(1minus p1right

))]

In addition the group of features Km available for split-ting at each node is chosen from a randomly selected subsetof all available features K Some or all of these features maybe available for splitting at each node from these availablefeatures the model selects the feature and split that best sep-arate the child nodes into purer groups of either in-pattern orunrelated crime pairs

Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns 11

  • Feature Extraction
  • Random Forest Model Training
Page 8: A Recommendation Engine to Aid in Identifying Crime Patternscynthia/docs/Patternizr.pdf · A Recommendation Engine to Aid in Identifying Crime Patterns Alex Chohlas-Wood* andE.S.

Fig 6 Screenshot of Patternizr in the NYPD Domain Awareness System The map of the seed and most similar complaints is in the left panel the seed complaint is in themiddle panel and the most similar complaints are in the right panel Grey boxes obscure law enforcement sensitive and personally identifiable information

For all three crime types the proportion of Patternizr pairswith a similarity score greater than 05 which included a sus-pect description of any race was either consistent within the95 confidence limit or lower than a random sampling ofall pairs That is this analysis uncovered no evidence thatPatternizr recommends any suspect race at a higher rate thanexists with random pairing

Implementation

We incorporated finalized models into the back end of thedepartmentrsquos Domain Awareness System (DAS) a citywidenetwork of sensors databases devices software and infras-tructure (9 21) We processed all historical pairs of com-plaints in parallel in the cloud against ten years of recordsfor burglaries and robberies and against three years of grandlarcenies This historical load took 194 days on approxi-mately 1600 cores For new and revised complaints simi-larity scores are calculated and updated three times each dayEach new or revised crime is scored against the entire corpusof crimes before it is incorporated into DAS for querying byusers throughout the department

Officers and analysts can access Patternizr through the cus-tom software application which is a part of DAS on NYPDdesktop computers To use Patternizr an investigator pressesa ldquoPatternizerdquo button on any seed complaint to retrieve allavailable calculated similarity scores Investigators are pre-sented with a rank-ordered list of results from most to leastsimilar to the seed with ten results displayed at a time Foreach result the distance time apart and algorithm-calculatedsimilarity score are all displayed A map displays the seed

and similar results (Figure 6) and users can read a few impor-tant details about each complaint before choosing to examinethe seed and result complaints side by side in full detail Theuse of a ranked list as opposed to a similarity score thresholdallows us to always display the most similar crimes thus theanalysts always have a starting point for their analysis

The user interface gives investigators the ability to collab-orate with the algorithm using their specialized expertise Ifan investigator has reason to believe that a certain MO isparticularly relevant for a given pattern (eg only burglar-izing tools from construction sites) the investigator can ac-tively filter and query the result list A general text searchand filters for distance time apart presence of arrest andpremise type are all available and may be used simultane-ously The remaining results are still sorted by Patternizrrsquossimilarity score The filters do not reorder the results theyonly exclude crimes that do not meet the search criteria Inthis way investigators can both search for a specific MOand utilize the rank order of the results list to immediatelydisplay the most similar results given the manually specifiedMO criteria

After approximately two years of development includingwork on the algorithm backend systems and the user in-terface Patternizr was deployed to production in Decem-ber 2016 In parallel throughout 2017 the NYPD hired ateam of approximately 100 civilian analysts deployed in thefield to perform crime analysis These new crime analystswere trained to use Patternizr as part of their daily routineFor example a crime analyst assigned to a precinct typi-cally reviews crimes that occurred in his or her precinct tosee whether they are part of a pattern The uniformed offi-

8 Chohlas-Wood and Levine | A Recommendation Engine for Crime Patterns

cer that had been assigned this task before the crime analystswere hired used manual methods to perform this review thisreview is now conducted using Patternizr

Between January 2018 (with crime analysts at fully staffedlevels) to the time of this paperrsquos writing (July 2018) ap-proximately 400 complaints per week were run through Pat-ternizr and this number is trending upward This representsapproximately 30 of all the burglaries robberies and grandlarcenies recorded by the NYPD during that period Approxi-mately 80 of the usage comes from the new crime analystswith the remaining 20 coming from uniform officers per-forming crime analysis and investigatory duties Addition-ally crime analysts created more than 90 preliminary pat-terns per week in the same period using Patternizr in DASThese patterns are considered preliminary because they havenot yet been reviewed and approved by the relevant special-ist units however the volume of preliminary patterns cre-ated with Patternizr demonstrates the value of the tool De-bra Piehl the NYPDrsquos Senior Crime Analyst says of the toolldquoPatternizr dramatically improves efficiency compared to tra-ditional methods and it still allows the analysts that work forme to apply their own thinking and analysis The sciencedoesnrsquot overwhelm the artrdquo (32)

Crime analysts have also provided specific instances inwhich official patterns were created because the Patternizralgorithm recognized similarities between crimes that the an-alysts would otherwise have found difficult to notice Forexample one analyst was examining a recent robbery in herprecinct the perpetrator was shoplifting power drills froma commercial establishment and when confronted attackedand injured an employee with a hypodermic needle The an-alyst ran the complaint through Patternizr and the algorithmreturned an additional robbery in a distant precinct wherethe perpetrator was shoplifting a drill and threatened an em-ployee with a hypodermic needle This robbery was likelyidentified by the algorithm because of high similarity scoresfor the time of occurrence the robbery subtype (began asshoplifting) the weapon used and several matching words inthe narratives (eg drill needle) The investigators combinedthese two complaints into an official pattern along with twoother larcenies committed by the same perpetrator and thepattern was then passed to the detective squad The NYPDconducted an investigation and arrested the perpetrator wholater pled guilty to larceny and felony assault and is currentlyawaiting sentencing

A second example involves the discovery of a grand larcenyof unattended-property pattern In this example an analystwas examining the theft of a watch from a locker at a hotelgym in Midtown Manhattan in the late afternoon The an-alyst used Patternizr on this complaint and discovered threeother complaints with the same MOmdashjewelry and watcheswere removed from a locker in a fitness facility in the lateafternoon or early evening in the same area The algorithmlikely recognized similarities in the complaintsrsquo times of oc-currence their geographic proximity the property removed(jewelry and watches) the premise types (gyms) and keywords in the narrative (eg locker) Two suspects were iden-

tified through the review of video footage and the NYPDrsquosinvestigation is ongoing

The helpful feedback we have received from our users hashighlighted potential improvements to Patternizr For exam-ple users have requested that the grand larceny algorithmalso calculate similarities to petit larceny complaints whichdiffer from grand larcenies only in the value of the itemsstolen Users have also requested the ability to compareacross crime types such as robberies and grand larcenieswhere the only difference is whether force was used Weare considering adding this functionality when we deploy asecond version of the Patternizr algorithms

ConclusionPatternizr is a new effective and fair recommendation en-gine deployed by the NYPD to help investigators group re-lated crimes To our knowledge this is the first time suchan algorithm has been deployed to production in a law en-forcement environment It joins a growing list of machinelearning applications customized for public safety and for thepublic sector in general Patternizr is also one of a long lineof data-driven tools prototyped and developed at the NYPDincluding CompStat (7) These tools when used properly en-courage precision policing approaches instead of widespreadheavy-handed enforcement techniques and enable investiga-tors to focus on the art of policing instead of rote and unin-teresting work We expect that other departments and localagencies in general will continue this line of research by ex-tracting similar value from existing government data

Bibliography

1. Walt L. Perry, Brian McInnis, Carter C. Price, Susan C. Smith, and John S. Hollywood. Predictive policing: The role of crime forecasting in law enforcement operations. Rand Corporation, 2013.
2. Andrew Guthrie Ferguson. The rise of big data policing: Surveillance, race, and the future of law enforcement. NYU Press, 2017.
3. Greg Ridgeway. Policing in the era of big data. Annual Review of Criminology, 1:401–419, 2018.
4. Richard C. Larson. Urban police patrol analysis, volume 28. MIT Press, Cambridge, MA, 1972.
5. Linda V. Green and Peter J. Kolesar. Improving emergency responsiveness with management science. Management Science, 50(8):1001–1014, 2004.
6. Jack Maple. The crime fighter: How you can make your community crime free. Broadway, 2000.
7. Vincent E. Henry. The COMPSTAT paradigm: Management accountability in policing, business and the public sector. Looseleaf Law Publications, 2002.
8. George O. Mohler, Martin B. Short, Sean Malinowski, Mark Johnson, George E. Tita, Andrea L. Bertozzi, and P. Jeffrey Brantingham. Randomized controlled field trials of predictive policing. Journal of the American Statistical Association, 110(512):1399–1411, 2015.
9. E.S. Levine, Jessica Tisch, Anthony Tasso, and Michael Joy. The New York City Police Department's Domain Awareness System. Interfaces, 47(1):70–84, 2017.
10. Richard Berk and Justin Bleich. Forecasts of violence to inform sentencing decisions. Journal of Quantitative Criminology, 30(1):79–96, 2014.
11. Jessica Saunders, Priscillia Hunt, and John S. Hollywood. Predictions put into practice: a quasi-experimental evaluation of Chicago's predictive policing pilot. Journal of Experimental Criminology, 12(3):347–371, 2016.
12. Nelson Lim, Carl Matthies, Brian Gifford, and Greg Ridgeway. To protect and to serve: enhancing the efficiency of LAPD recruiting, volume 881. Rand Corporation, 2009.
13. David Weisburd, Elizabeth R. Groff, Greg Jones, Breanne Cave, Karen L. Amendola, Sue-Ming Yang, and Rupert F. Emison. The Dallas patrol management experiment: can AVL technologies be used to harness unallocated patrol time for crime prevention? Journal of Experimental Criminology, 11(3):367–391, 2015.
14. Jennifer Helsby, Samuel Carton, Kenneth Joseph, Ayesha Mahmud, Youngsoo Park, Andrea Navarrete, Klaus Ackermann, Joe Walsh, Lauren Haynes, Crystal Cody, et al. Early intervention systems: Predicting adverse interactions between police and the public. Criminal Justice Policy Review, 29(2):190–209, 2018.
15. Tong Wang, Cynthia Rudin, Daniel Wagner, and Rich Sevieri. Learning to detect patterns of crime. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 515–530. Springer, 2013.
16. Michael D. Porter. A statistical approach to crime linkage. The American Statistician, 70(2):152–165, 2016.
17. Robert M. Bell and Yehuda Koren. Lessons from the Netflix prize challenge. SIGKDD Explorations, 9(2):75–79, 2007.
18. New York City Police Department. Citywide seven major felony offenses, 2000–2016, 2017. Accessed 2017-12-20. http://www1.nyc.gov/assets/nypd/downloads/pdf/analysis_and_planning/seven-major-felony-offenses-2000-2016.pdf
19. Marvin E. Wolfgang, Robert M. Figlio, and Thorsten Sellin. Delinquency in a birth cohort. University of Chicago Press, 1987.
20. Samantha L. Gwinn, Christopher W. Bruce, Steven R. Hick, and Julie P. Cooper. Exploring crime analysis: Readings on essential skills. International Association of Crime Analysts, 2008.
21. E.S. Levine and J.S. Tisch. Analytics in action at the New York City Police Department's Counterterrorism Bureau. Military Operations Research, 19(4):5–14, 2014.
22. Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: prediction, inference, and data mining. Springer-Verlag, New York, 2009.
23. Shyam Boriah, Varun Chandola, and Vipin Kumar. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining, pages 243–254. SIAM, 2008.
24. James H. Martin and Daniel Jurafsky. Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River, 2009.
25. Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
26. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830, 2011.
27. James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
28. Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval. Cambridge University Press, 2008.
29. Kristian Lum and William Isaac. To predict and serve? Significance, 13(5):14–19, 2016.
30. Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. Algorithmic decision making and the cost of fairness. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 797–806. ACM, 2017.
31. Andrew Guthrie Ferguson. Policing predictive policing. Wash. U. L. Rev., 94:1109, 2016.
32. Debra Piehl. Conversation with Evan Levine, June 22, 2018.
33. Daniel Jurafsky and James H. Martin. Speech and language processing: An introduction to speech recognition, computational linguistics and natural language processing. Prentice Hall, Upper Saddle River, NJ, 2008.

Acknowledgements

The authors acknowledge the support of the men and women of the New York City Police Department. In particular, the authors thank Police Commissioner James P. O'Neill, the former Police Commissioner William J. Bratton, Chief Lori Pollock, Chief Dermot Shea, Deputy Chief Timothy McCormack, Inspector Anthony Tasso, Deputy Inspector Brandon del Pozo, Captain John Hall, Lt. Juan Duque, Lt. Phillip McEneaney, Sgt. Christopher Bray, Sgt. Joseph Klubnick, Sgt. Sean McCarthy, Deputy Commissioner Jessica Tisch, Assistant Commissioner Richard Schroeder, Assistant Commissioner Ronald Wilhelmy, Director Martha Norrick, Daniel Montiel, and Debra Piehl. Special thanks to Tong Wang, Cynthia Rudin, Lt. Daniel Wagner, Theo Damoulas, Ravi Shroff, and Joe McLaughlin.

Alex Chohlas-Wood's current affiliation is Stanford University, Stanford, CA.

Appendix A: Feature Extraction

In this section, we provide the mathematical details involved in our feature extraction process.

Categorical Attributes (k = 16 through k = 24). Define $N$ to be the number of documents in the corpus. First, we count the number of documents in the corpus where attribute $k$ has the value $x$; we label these counts $n^k_x$. Then

$$
X^k_{ij} =
\begin{cases}
1 - \sum\limits_{y \,\mathrm{s.t.}\, n^k_y \le n^k_{\mathrm{match}}} \dfrac{n^k_y \left(n^k_y - 1\right)}{N(N-1)} & \text{if documents $i$ and $j$ match on attribute $k$,} \\
0 & \text{otherwise,}
\end{cases}
$$

where the document count $n^k_{\mathrm{match}}$ is defined as the number of documents in the corpus that have the same value of attribute $k$ as documents $i$ and $j$ (23).
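To make this measure concrete, the following is a minimal Python sketch under the definitions above; the names `categorical_similarity` and `corpus_values` are our own illustrative choices, not identifiers from the deployed system. The key property is that a match on a rare attribute value scores closer to 1 than a match on a common one.

```python
from collections import Counter

def categorical_similarity(corpus_values, x_i, x_j):
    """Goodall-style similarity (23) for one categorical attribute.

    corpus_values: the attribute's value in every document of the corpus.
    x_i, x_j: the attribute values of the two documents being compared.
    """
    if x_i != x_j:
        return 0.0
    counts = Counter(corpus_values)      # n^k_y for every observed value y
    n = len(corpus_values)               # N, the corpus size
    n_match = counts[x_i]                # n^k_match for the shared value
    # Sum pairing probabilities of every value at least as rare as the match.
    penalty = sum(n_y * (n_y - 1) for n_y in counts.values() if n_y <= n_match)
    return 1.0 - penalty / (n * (n - 1))

# A match on a rare premise type scores higher than a match on a common one.
corpus = ["gym", "street", "street", "street", "gym", "bank"]
print(categorical_similarity(corpus, "gym", "gym"))        # ~0.93
print(categorical_similarity(corpus, "street", "street"))  # ~0.73
```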

Suspect Height and Weight Attributes (k = 25 and k = 26). Call the minimal absolute difference between the value of a continuous variable associated with suspect $h$ of complaint $i$ and that of any suspect in complaint $j$ $a_{ijh}$. In mathematical form, using the weight attribute $w_{ih}$ of suspect $h$ in complaint $i$ as an example,

$$a_{ijh} = \min_f \left[\, \lvert w_{ih} - w_{jf} \rvert \,\right],$$

where the minimum function runs over all suspects $f$ in complaint $j$. For example, if complaint 1 has suspects of weights 140 pounds (lbs), 180 lbs, and 220 lbs, and complaint 2 has suspects of weights 150 lbs and 215 lbs, then $a_{121} = 10$ lbs, $a_{122} = 30$ lbs, and $a_{123} = 5$ lbs. After calculating these comparisons, we transform them into a similarity via

$$
X^k_{ij} = \max\left[\, \exp\left(-\frac{P_i \sum_{h=1}^{P_i} a_{ijh}}{P_i + P_j}\right),\ \exp\left(-\frac{P_j \sum_{h=1}^{P_j} a_{jih}}{P_i + P_j}\right) \right],
$$

where $P_i$ is the number of suspects for complaint $i$.
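As an executable illustration, here is a minimal Python sketch of this computation using the worked example above; the function names `min_abs_diffs` and `suspect_weight_similarity` are our own, and the formula follows the reconstruction of the garbled equation given here rather than a verified transcription of the original typesetting.

```python
import math

def min_abs_diffs(weights_i, weights_j):
    """a_{ijh}: for each suspect h in complaint i, the smallest absolute
    weight difference to any suspect f in complaint j."""
    return [min(abs(w_h - w_f) for w_f in weights_j) for w_h in weights_i]

def suspect_weight_similarity(weights_i, weights_j):
    """X^k_{ij} for the suspect-weight attribute, per the formula above."""
    p_i, p_j = len(weights_i), len(weights_j)
    a_ij = sum(min_abs_diffs(weights_i, weights_j))  # sum over h of a_{ijh}
    a_ji = sum(min_abs_diffs(weights_j, weights_i))  # sum over h of a_{jih}
    return max(
        math.exp(-p_i * a_ij / (p_i + p_j)),
        math.exp(-p_j * a_ji / (p_i + p_j)),
    )

# Worked example from the text: a_121 = 10 lbs, a_122 = 30 lbs, a_123 = 5 lbs.
complaint_1 = [140, 180, 220]
complaint_2 = [150, 215]
print(min_abs_diffs(complaint_1, complaint_2))           # [10, 30, 5]
print(suspect_weight_similarity(complaint_1, complaint_2))
```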

Suspect Force Used (k = 27). In this case, $a_{ijh}$ is just the number of suspects in complaint $j$ with the same value of the categorical feature as suspect $h$ in complaint $i$:

$$
X^{k=27}_{ij} = \max\left[\, \frac{P_i \sum_{h=1}^{P_i} a_{ijh}}{P_i + P_j},\ \frac{P_j \sum_{h=1}^{P_j} a_{jih}}{P_i + P_j} \right].
$$
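A corresponding sketch, again with our own illustrative naming and assuming each suspect's force-used value is a simple categorical label:

```python
def force_used_similarity(force_i, force_j):
    """X^{k=27}_{ij}: overlap of the two complaints' suspect force-used
    labels, per the formula above. force_i, force_j are lists of labels."""
    p_i, p_j = len(force_i), len(force_j)
    # a_{ijh}: number of suspects in complaint j sharing suspect h's label.
    a_ij = sum(force_j.count(label) for label in force_i)
    a_ji = sum(force_i.count(label) for label in force_j)
    return max(p_i * a_ij / (p_i + p_j), p_j * a_ji / (p_i + p_j))

print(force_used_similarity(["firearm", "none"], ["firearm"]))  # ~0.67
```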

Unstructured Text for Robberies and Grand Larcenies (k = 32 through k = 37). Treating each attribute $k$ separately, let the term frequency $\mathrm{tf}_{xi}$ be the count of word $x$ in document $i$, let the document frequency $\mathrm{df}_x$ be the number of documents in the corpus containing word $x$, and let $M$ be the number of unique words in the corpus. Then, for each word $x$, we calculate the inverse document frequency (33):

$$\mathrm{idf}_x = \log\!\left(\frac{N}{\mathrm{df}_x}\right) + 1.$$

The "+1" in this equation ensures that words that occur in every document in the training corpus will not receive a zero for inverse document frequency, which would preclude those words from contributing to the eventual cosine similarity. For each word $x$ and document $i$, we calculate


$$s_{xi} = \mathrm{tf}_{xi} \times \mathrm{idf}_x.$$

We then build the TFIDF vector $S_i$ for each document $i$:

$$S_i = \left[\, s_{0i}, \ldots, s_{Mi} \,\right].$$

Finally, for each pair of documents $i$ and $j$, we calculate the cosine similarity between the two TFIDF vectors:

$$X^k_{ij} = \frac{S_i \cdot S_j}{\lVert S_i \rVert \, \lVert S_j \rVert}.$$
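Since the models are built with scikit-learn (26), one plausible way to reproduce this featurization is with `TfidfVectorizer`; the sketch below is illustrative rather than the production pipeline. Note that scikit-learn's `smooth_idf=False` setting yields idf = ln(N/df) + 1 with a natural logarithm, whereas the formula above does not specify a log base.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two toy complaint narratives standing in for the real unstructured text.
narratives = [
    "perpetrator removed watch from locker at gym",
    "jewelry and watch removed from a locker in a fitness facility",
]

# smooth_idf=False gives idf = ln(N / df) + 1, matching the formula above.
vectorizer = TfidfVectorizer(smooth_idf=False)
tfidf = vectorizer.fit_transform(narratives)  # rows are the vectors S_i

# Cosine similarity between the two TFIDF vectors, i.e., X^k_{ij}.
print(cosine_similarity(tfidf[0], tfidf[1])[0, 0])
```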

Appendix B: Random Forest Model Training

Trees are recursively built by identifying the best feature $k$ and split $s$ at each node $m$ which minimize the Gini index, a measure of node purity, for the two descendant nodes (22). Let $R$ be the subset of feature values $X_k$ left or right of the split $s$, and let $\hat{p}_{qm}$ be the proportion of class $q$ in node $m$. Then

$$R_{\mathrm{left}}(k, s) = \{X \mid X_k \le s\},$$
$$R_{\mathrm{right}}(k, s) = \{X \mid X_k > s\},$$
$$\hat{p}_{qm} = \frac{1}{N_m} \sum_{x_i \in R_m} I(y_i = q),$$

where $\hat{p}_{qm}$ is the estimated sample proportion of class $q$ in left or right node $m$, $N_m$ is the number of samples in node $m$, and $I(y_i = q)$ is an indicator function that returns 1 when sample $i$ belongs to class $q$ and returns 0 otherwise. We set $q = 1$ (i.e., positive examples) and find the combination of $k$ and $s$ that minimizes the Gini index as follows:

$$\min_{k,s} \left[ 2\left(\hat{p}_{1,\mathrm{left}}\left(1 - \hat{p}_{1,\mathrm{left}}\right)\right) + 2\left(\hat{p}_{1,\mathrm{right}}\left(1 - \hat{p}_{1,\mathrm{right}}\right)\right) \right].$$

In addition, the group of features $K_m$ available for splitting at each node is chosen from a randomly selected subset of all available features $K$. Some or all of these features may be available for splitting at each node; from these available features, the model selects the feature and split that best separate the child nodes into purer groups of either in-pattern or unrelated crime pairs.
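To ground the split criterion in executable form, here is a minimal, self-contained sketch of a single node's exhaustive split search; `gini_term`, `best_split`, and the toy data are our own illustrative choices, and a full forest would add bootstrapped trees and the random feature subsetting described above.

```python
import numpy as np

def gini_term(y):
    """2 * p1 * (1 - p1) for a binary label array, per the criterion above."""
    if len(y) == 0:
        return 0.0
    p1 = np.mean(y == 1)  # estimated proportion of positive (in-pattern) pairs
    return 2.0 * p1 * (1.0 - p1)

def best_split(X, y):
    """Exhaustively find the feature k and threshold s minimizing the sum of
    the left and right children's Gini terms."""
    best = (None, None, np.inf)
    for k in range(X.shape[1]):
        for s in np.unique(X[:, k]):
            left, right = y[X[:, k] <= s], y[X[:, k] > s]
            score = gini_term(left) + gini_term(right)
            if score < best[2]:
                best = (k, s, score)
    return best

# Toy pairwise-similarity features: column 0 separates the classes cleanly.
X = np.array([[0.9, 0.2], [0.8, 0.9], [0.1, 0.3], [0.2, 0.8]])
y = np.array([1, 1, 0, 0])
print(best_split(X, y))  # expect feature 0 with a threshold around 0.2
```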

