Criminal Incident Data Association Using OLAP Technology
Donald E. Brown & Song Lin
Department of Systems & Information Engineering
University of Virginia
Summary
In this paper, we combine OLAP (Online Analytical Processing) and data mining to associate criminal incidents. The method is tested on a robbery dataset from Richmond, Virginia.
Objectives of Spatial Knowledge Mining
- Leverage DBMS (records management), OLAP, & GIS
- Find spatial-temporal patterns and relationships in data
- Support crime analysis & information sharing
Related Applications - UVa
- ReCAP (Regional Crime Analysis Program): provides support for regional analysis using an RDBMS; requires implementation on each client computer
- CARV (Crime Analysis and Reporting in Virginia): runs on Citrix MetaFrame, so the number of concurrent users is limited
- GRASP (Geospatial Repository for Analysis and Safety Planning): web interface to a central repository of criminal incident data and geospatial files
Outline
- Introduction
- Existing studies on OLAP & data mining
- Combined approach
- Application
- Conclusions
Introduction (crime association)
- 80-20 rule: 20% of the criminals commit 80% of the crimes
- How can we link criminal incidents committed by the same criminal?
- Start by looking at the same crime types
Theories of criminal behavior (criminology)
- Rational choice (Clarke and Cornish): criminals evaluate “benefit” and “risk” and make rational decisions to maximize “profit”.
- Routine activity (Felson): a crime requires a ready criminal, a suitable target, and the lack of an effective guardian.
Theories of criminal behavior (template)
“Template” (Brantingham & Brantingham)
- The environment sends out cues about its characteristics
- Criminals use these cues for evaluation
- A template is built to associate certain cues with suitable targets
- The template is self-reinforcing and enduring
- A criminal does not have many templates
An operational approach to the theories (template)
Criminal incidents committed by the same person show similar patterns in:
- time
- space
- MO
It is possible to associate incidents from the same person by discovering these patterns.
Existing Association Methods & Systems
- AREST (Badiru et al.): suspect matching
- ViCAP (FBI): incident matching
- COPLINK (U. Arizona): links search terms with cases (concept space)
- TSM (Brown): total similarity measures; could be used for both incident and suspect matching
- SQL: used by analysts in practice
Comments on existing methods
Computer technologies are central to criminal incident association, for example:
- MIS
- Databases
- Information retrieval
- GIS
Two additional techniques enable incident association:
- Data warehousing / OLAP
- Data mining
We develop a method that seamlessly integrates OLAP and data mining.
Related Work on OLAP and data mining
OLAP
- Ancestor: OLTP (transactional data)
- OLAP: summary data for analysis
Dimensions
- OLAP data is multidimensional
- Dimensions are numeric or categorical attributes
- Hierarchical structures exist in dimensions
Aggregates: sum, count, average, max, min, …
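A count aggregate over a multidimensional table can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the two dimensions (weapon, time of day) and the records are hypothetical, and “*” stands for a dimension rolled up to “all values”, the same wildcard used later to define cells and parents.

```python
from collections import Counter
from itertools import product

# Hypothetical toy records: (weapon, time_of_day); not taken from the Richmond data.
RECORDS = [
    ("gun", "night"),
    ("gun", "night"),
    ("gun", "day"),
    ("knife", "night"),
    ("sword", "day"),
]


def cube_counts(records):
    """Return the COUNT aggregate for every cell of the data cube.

    '*' marks a dimension that has been rolled up to "all values".
    """
    counts = Counter()
    for rec in records:
        # A record falls into 2**d cells: each dimension either keeps its
        # value or is rolled up to '*'.
        for rolled in product((False, True), repeat=len(rec)):
            cell = tuple("*" if r else v for v, r in zip(rec, rolled))
            counts[cell] += 1
    return counts


if __name__ == "__main__":
    cube = cube_counts(RECORDS)
    print(cube[("gun", "*")])    # 3
    print(cube[("*", "night")])  # 3
    print(cube[("*", "*")])      # 5
```

Materializing every cell this way costs 2^d updates per record, which is fine for a handful of dimensions but is exactly the kind of cost OLAP engines are built to manage.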
OLAP and Data Mining
Both are powerful tools to support the decision-making process, but:
- OLAP focuses on efficiency, and few quantitative analysis methods are used
- Data mining is typically applied to 2-D datasets (spreadsheets), not to multidimensional OLAP data structures
Idea: combine them.
Existing studies on combining OLAP and Data mining
Cubegrade problem (Imielinski)
- A generalized version of the association rule
- An association rule describes the change in the “count” aggregate when another constraint is imposed, i.e., when a “drill-down” operation is performed
- Other aggregates could also be considered
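A cubegrade with the COUNT aggregate can be sketched in a few lines. This is a hypothetical Python illustration of the idea on the slide, not Imielinski's algorithm: it only handles a drill-down on a single dimension, and the records are invented. The ratio of the drilled-down count to the source count is the confidence of the corresponding association rule.

```python
# Hypothetical toy records: (weapon, time_of_day).
RECORDS = [
    ("gun", "night"), ("gun", "night"), ("gun", "day"),
    ("knife", "night"), ("sword", "day"),
]


def count_cell(records, cell):
    """COUNT aggregate of a cell; '*' matches any value."""
    return sum(
        all(c == "*" or c == v for c, v in zip(cell, rec)) for rec in records
    )


def drill_down_grades(records, source, dim):
    """COUNT ratio between each drill-down target and the source cell.

    Each target fixes dimension `dim` (a '*' in `source`) to one observed
    value. With COUNT, the ratio is the confidence of the association rule
    source -> (dimension dim = value).
    """
    if source[dim] != "*":
        raise ValueError("can only drill down on a '*' dimension")
    base = count_cell(records, source)
    if base == 0:
        raise ValueError("source cell is empty")
    grades = {}
    for value in sorted({rec[dim] for rec in records}):
        target = source[:dim] + (value,) + source[dim + 1:]
        grades[value] = count_cell(records, target) / base
    return grades


if __name__ == "__main__":
    # Of the 3 gun incidents, 2 happened at night: confidence 2/3.
    print(drill_down_grades(RECORDS, ("gun", "*"), dim=1))
```

Swapping `count_cell` for another aggregate (for example an average over a numeric attribute) gives the more general cubegrades mentioned in the last bullet.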
Constrained gradient analysis
- Retrieve pairs of OLAP cells that are
  - quite different in their aggregates
  - similar in their dimensions (parents, children, siblings)
- More than one aggregate can be considered simultaneously (e.g., sum and mean)
Data-driven exploration (Sarawagi)
- Find “exceptions”
- The mean $\mu$ and standard deviation $\sigma$ are calculated for a cell
- If the aggregate of the cell lies outside $(\mu - 2.5\sigma, \mu + 2.5\sigma)$, the cell is an exception
- An OLAP version of the “$3\sigma$” rule
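The exception test can be sketched as below. As a simplifying assumption, the mean and standard deviation for a cell are estimated from its sibling cells with the cell itself left out, whereas Sarawagi's method derives each cell's expected value from a model fitted over the whole cube. The monthly counts are hypothetical.

```python
from statistics import mean, stdev


def find_exceptions(aggregates, k=2.5):
    """Return cells whose aggregate lies outside (mu - k*sigma, mu + k*sigma).

    `aggregates` maps sibling cells to an aggregate value. For each cell,
    mu and sigma come from the *other* siblings (leave-one-out), a
    simplification of model-based expected values.
    """
    if len(aggregates) < 3:
        raise ValueError("need at least 3 sibling cells")
    flagged = []
    for cell, value in aggregates.items():
        others = [v for c, v in aggregates.items() if c != cell]
        mu, sigma = mean(others), stdev(others)
        if abs(value - mu) > k * sigma:
            flagged.append(cell)
    return flagged


if __name__ == "__main__":
    # Hypothetical monthly robbery counts for one precinct.
    monthly = {"Jan": 10, "Feb": 12, "Mar": 11, "Apr": 9, "May": 30}
    print(find_exceptions(monthly))  # ['May']
```

Leaving the cell out matters: with only a few siblings, an extreme value inflates the standard deviation it is compared against and could otherwise never reach 2.5σ.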
Associating records by finding distinctive values or outliers
Basic idea: if a group of records share common characteristics, and these “common” characteristics are unusual (“outliers”), we are more confident in asserting that these records come from the same causal mechanism.
Look for distinctive characteristics; the best would be DNA.
OLAP-outlier-based method to associate records
Rationale for distinctive values or outliers, e.g., the weapon used in robberies:
- “gun”: very common, hard to associate
- “Japanese sword”: distinctive, likely to come from the same person
We build an outlier score function to measure this “distinctiveness”: the higher the score, the more distinctive the value and the more confident we are in associating the records. The function is designed for categorical attributes, since MO is important in linking criminal incidents.
Definitions
Cell, Parent, Neighbor
- Cell: a vector of values for some attributes.
- Parent: replace one attribute of the cell with the wildcard element “*”.
- Neighbor: a group of cells having the same parent.
These definitions are derived from the OLAP field.
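The three definitions map directly onto tuple operations. A minimal sketch, assuming cells are Python tuples, “*” is the wildcard, and `domains` (a name introduced here) lists each attribute's possible values; the values a1 to a5 and b1 to b4 mirror the illustrations.

```python
WILDCARD = "*"


def parents(cell):
    """All parents of a cell: replace one non-wildcard attribute with '*'."""
    return [
        cell[:k] + (WILDCARD,) + cell[k + 1:]
        for k, value in enumerate(cell)
        if value != WILDCARD
    ]


def neighbor(cell, k, domains):
    """The neighbor of `cell` along attribute k: every cell that shares the
    parent obtained by replacing attribute k with '*'.

    `domains[k]` lists the possible values of attribute k.
    """
    if cell[k] == WILDCARD:
        raise ValueError("attribute k is already a wildcard")
    return [cell[:k] + (v,) + cell[k + 1:] for v in domains[k]]


if __name__ == "__main__":
    domains = [["a1", "a2", "a3", "a4", "a5"], ["b1", "b2", "b3", "b4"]]
    print(parents(("a5", "b3")))
    # [('*', 'b3'), ('a5', '*')]
    print(neighbor(("a5", "b3"), 1, domains))
    # [('a5', 'b1'), ('a5', 'b2'), ('a5', 'b3'), ('a5', 'b4')]
```

A cell with d non-wildcard attributes has exactly d parents, and therefore d neighbors, one per attribute that can be rolled up.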
Illustration -- Cell
[Figure: grid with Dimension 1 values a1 to a4 and Dimension 2 values b1 to b4]
Two-dimension cell: (a4, b2)
One-dimension cell: (*, b4)
Illustration --parent
[Figure: grid of values a1 to a5 by b1 to b4]
Cell (a5, b3) has two parents: (a5, *) and (*, b3).
Illustration -- Neighbor
A neighbor is a collection of cells sharing the same parent; for example, (a5, b1), (a5, b2), (a5, b3), and (a5, b4) form the neighbor with parent (a5, *).
Outlier Score Function
We start building this function for one dimension and then generalize it to higher dimensions. For one dimension, we make the following two observations:
- Values with small probability (frequency) are more “unusual”.
- The outlier score is high when the uncertainty level is low.
Observation I
[Figure: bar chart of counts by hair color (Blond, Brown, Black, Red, Gray); Blond, with P = 0.1, is marked as the outlier]
For the attribute “hair color”, the value “blond” covers only 10% of the records, so it should get a higher outlier score.
Observation II
[Figure: two bar charts of counts by hair color, one over five values (Blond, Brown, Black, Red, Gray) and one over two values (Blond, Brown)]
Although “blond” has frequency 0.2 in both charts, it is more “unusual” in the chart whose distribution has the lower uncertainty level.
Observation III
“More evidence”: more evidence is better than less, so it should yield a higher outlier score.
OSF for One Dimension
-log(p) comes from information theory, where p is the probability of a value. Entropy measures the expected information in a message (in this case, the value in a data record), i.e., its uncertainty level.
$$\mathrm{OSF} = \frac{-\log(p)}{\mathrm{Entropy}}$$
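The one-dimension OSF can be computed directly from value counts. A minimal Python sketch, assuming natural logarithms (the base cancels in the ratio) and hypothetical hair-color counts; returning 0 for a zero-entropy attribute is our own assumption. The usage lines reproduce Observation II: the same frequency of 0.2 yields a higher score when the distribution's entropy is lower.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (natural log) of a value distribution given as counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def osf_1d(value, counts):
    """One-dimension outlier score: -log(p) / Entropy."""
    total = sum(counts.values())
    p = counts[value] / total
    if p == 0:
        raise ValueError("value does not occur in the data")
    h = entropy(counts)
    if h == 0:
        # Assumption: a single-valued attribute carries no distinguishing
        # information (here p = 1, so -log(p) = 0 as well).
        return 0.0
    return -math.log(p) / h


if __name__ == "__main__":
    # Observation II with hypothetical counts: blond has p = 0.2 in both.
    five = Counter(blond=20, brown=20, black=20, red=20, gray=20)
    two = Counter(blond=20, brown=80)
    print(round(osf_1d("blond", five), 2))  # 1.0   (entropy ln 5 is high)
    print(round(osf_1d("blond", two), 2))   # 3.22  (entropy is lower)
```

Observation I falls out of the numerator: within one attribute the entropy is fixed, so the rarer value always scores higher.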
OSF for Higher Dimensions
For any cell:
1. For each parent cell, calculate the sum of the OSF of that parent and the one-dimension OSF of the cell conditional on its neighbor (the cells sharing that parent).
2. Take the maximum over all parent cells as the outlier score of the cell.
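$$f(c) = \max_{k} \left[ f\big(\mathrm{parent}_k(c)\big) + \frac{-\log\big(\mathrm{frequency}_k(c)\big)}{\mathrm{Entropy}\big(k^{\mathrm{th}}\ \mathrm{neighbor\ of}\ c\big)} \right], \qquad f(*, *, \ldots, *) = 0$$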
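The recursive score f(c) described above can be sketched in Python. This is a hypothetical illustration under stated assumptions, not the authors' code: we read frequency_k(c) as the share of the k-th parent's records that fall in c (the OSF “conditional on the neighbor”), use natural logarithms, let a zero-entropy neighbor contribute 0, and count records by brute force, which only suits small examples. Because every added term is nonnegative, a cell never scores below any of its parents, in line with Observation III.

```python
import math
from functools import lru_cache

WILDCARD = "*"


def make_osf(records):
    """Return f(cell), the multidimensional outlier score, for categorical records.

    Assumption: frequency_k(c) = count(c) / count(parent_k(c)).
    """
    records = [tuple(r) for r in records]
    domains = [sorted({r[k] for r in records}) for k in range(len(records[0]))]

    def count(cell):
        return sum(
            all(c == WILDCARD or c == v for c, v in zip(cell, r)) for r in records
        )

    @lru_cache(maxsize=None)
    def f(cell):
        if all(c == WILDCARD for c in cell):
            return 0.0  # f(*, *, ..., *) = 0
        n_cell = count(cell)
        if n_cell == 0:
            raise ValueError(f"cell {cell} contains no records")
        best = -math.inf
        for k, value in enumerate(cell):
            if value == WILDCARD:
                continue
            parent = cell[:k] + (WILDCARD,) + cell[k + 1:]
            n_parent = count(parent)
            # Distribution of attribute k over the k-th neighbor (same parent).
            shares = [
                count(cell[:k] + (v,) + cell[k + 1:]) / n_parent for v in domains[k]
            ]
            h = -sum(s * math.log(s) for s in shares if s > 0)
            # Assumption: a zero-entropy neighbor contributes nothing.
            term = -math.log(n_cell / n_parent) / h if h > 0 else 0.0
            best = max(best, f(parent) + term)
        return best

    return f


if __name__ == "__main__":
    # Hypothetical records: (weapon, time_of_day).
    f = make_osf([("gun", "night"), ("gun", "night"), ("gun", "day"),
                  ("gun", "day"), ("knife", "night"), ("sword", "day")])
    for cell in [("gun", "*"), ("sword", "*"), ("sword", "day")]:
        print(cell, round(f(cell), 3))
```

The memoized recursion evaluates each cell once; a real implementation would read counts from a precomputed cube instead of rescanning the records.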
Association (using this OLAP-outlier method)
For a pair of incidents (A, B): if there is a cell that contains both A and B, and the outlier score of this cell is large enough (threshold test), associate them.
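The pairwise test can be sketched as follows. The score is passed in as a function so that the block stands alone; `toy_score` in the usage lines is a hypothetical stand-in for the OSF, and reading “large enough” as `>= threshold` is our assumption. Including the all-wildcard cell, whose score is 0, means a threshold of 0 links every pair, which is consistent with the 169.00 average list length reported for threshold 0.

```python
from itertools import combinations

WILDCARD = "*"


def shared_cells(a, b):
    """All cells that contain both records a and b.

    A cell contains a record when every non-wildcard attribute matches, so a
    shared cell may fix only attributes on which a and b agree. The all-'*'
    cell (outlier score 0) is included.
    """
    agree = [k for k, (x, y) in enumerate(zip(a, b)) if x == y]
    cells = []
    for r in range(len(agree) + 1):
        for dims in combinations(agree, r):
            cells.append(tuple(a[k] if k in dims else WILDCARD for k in range(len(a))))
    return cells


def associate(a, b, score, threshold):
    """Associate a and b if some shared cell has score >= threshold."""
    return any(score(cell) >= threshold for cell in shared_cells(a, b))


def toy_score(cell):
    """Hypothetical stand-in for the outlier score: number of fixed attributes."""
    return sum(v != WILDCARD for v in cell)


if __name__ == "__main__":
    a = ("sword", "day", "bank")
    b = ("sword", "night", "bank")
    print(associate(a, b, toy_score, threshold=2))  # True, via ('sword', '*', 'bank')
    print(associate(a, b, toy_score, threshold=3))  # False
```

Raising the threshold trades detected associations for shorter candidate lists, which is the trade-off the evaluation tables sweep.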
Application (dataset)
Applied to a robbery dataset (Richmond, VA, 1998). Why robbery? For evaluation purposes:
- the number of multiple offenses is larger than for murder
- the number of known suspects is larger than for B & E (breaking and entering)
Attributes
Three types of attributes:
- Modus operandi (MO): categorical
- Census features: numeric
- Distance features: numeric
Feature Selection
Redundant features call for feature selection:
- Cluster the features (similar features in the same group)
- Pick a representative feature for each group
- Method: k-medoid clustering
  - applicable to a distance matrix
  - returns “medoids”
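Feature clustering with k-medoids can be sketched in Python with NumPy. This is a simplified alternating k-medoids with a deterministic farthest-point start, not the full PAM swap search; measuring feature dissimilarity as 1 - |correlation| is our assumption, and the four features are synthetic. Each returned medoid is an actual feature, which is what makes it usable as its group's representative.

```python
import numpy as np


def k_medoids(dist, k, max_iter=100):
    """Simple alternating k-medoids on a precomputed distance matrix.

    Returns (medoid indices, cluster label of each item).
    """
    dist = np.asarray(dist, dtype=float)
    # Deterministic start: the most central item, then repeatedly the item
    # farthest from the medoids chosen so far.
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # The member with the smallest total distance to its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(dist[:, medoids], axis=1)


if __name__ == "__main__":
    # Hypothetical features: two near-duplicate pairs.
    rng = np.random.default_rng(0)
    x, y = rng.normal(size=(2, 200))
    X = np.column_stack([x, x + 0.1 * rng.normal(size=200),
                         y, y + 0.1 * rng.normal(size=200)])
    dist = 1 - np.abs(np.corrcoef(X, rowvar=False))  # 1 - |corr| between features
    medoids, labels = k_medoids(dist, k=2)
    print(medoids, labels)
```

Working from a distance matrix rather than raw coordinates is what lets the same routine cluster features, since only pairwise dissimilarities are needed.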
Feature Selection Result
[Figure: cluster plot of the features, Component 1 (x-axis) vs. Component 2 (y-axis)]
These two components explain 44.25% of the point variability.
Medoids: 1: HUNT, 2: ENRL3, 3: TRANS.PC
Final Selected Features
Medoids: HUNT (housing unit density), ENRL3 (public school enrollment)
POP3 (population aged 12-17): more meaningful (attackers and victims)
TRAN_PC (transportation expense per capita), MHINC (median income)
Discretize
Discretize these numeric features into bins, similar to a histogram, choosing the number of bins with Sturges’ rule.
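Sturges' rule sets the number of bins to ⌈log2 n⌉ + 1 for n observations; for example, n = 170 gives ⌈7.41⌉ + 1 = 9 bins. A minimal sketch with NumPy, assuming equal-width bins as in a histogram (the slide does not say how bin edges were placed).

```python
import math

import numpy as np


def sturges_bins(n):
    """Sturges' rule: ceil(log2(n)) + 1 bins for n observations."""
    return math.ceil(math.log2(n)) + 1


def discretize(values):
    """Equal-width binning with Sturges' number of bins.

    Returns an integer bin index (0 .. k-1) per value and the bin edges.
    """
    values = np.asarray(values, dtype=float)
    k = sturges_bins(len(values))
    edges = np.histogram_bin_edges(values, bins=k)
    # Interior edges only, so the minimum maps to 0 and the maximum to k-1.
    return np.digitize(values, edges[1:-1]), edges


if __name__ == "__main__":
    print(sturges_bins(170))  # 9
    bins, edges = discretize(np.linspace(0.0, 1.0, 100))
    print(bins.min(), bins.max(), len(edges) - 1)  # 0 7 8
```

The bin indices can then be treated as categorical values, so census features enter the OLAP-outlier score the same way MO attributes do.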
Evaluation
For the incidents with known suspects (170):
- Generate all incident pairs
- If a pair of incidents has the same criminal suspect, it is a “true association”
- Compare the results given by the algorithm with the “true result”
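The ground truth can be built by enumerating pairs. A minimal sketch with hypothetical incident and suspect IDs, assuming one known suspect per incident; with 170 incidents the enumeration covers 170 × 169 / 2 = 14,365 candidate pairs.

```python
from itertools import combinations


def true_associations(suspect_of):
    """All unordered incident pairs that share the same known suspect.

    `suspect_of` maps incident id -> suspect id (one suspect per incident,
    an assumption made for this sketch).
    """
    return {
        (a, b)
        for a, b in combinations(sorted(suspect_of), 2)
        if suspect_of[a] == suspect_of[b]
    }


if __name__ == "__main__":
    # Hypothetical incidents and suspects.
    suspects = {"I1": "S1", "I2": "S1", "I3": "S2", "I4": "S1"}
    print(sorted(true_associations(suspects)))
    # [('I1', 'I2'), ('I1', 'I4'), ('I2', 'I4')]
```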
Evaluation Criteria
Two measures:
- Detected true associations: larger is better
- Average number of relevant records: similar to search engines like Google, given one record the system returns a list; take the average length of all lists; shorter is better
Evaluation Criteria (cont.)
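From information retrieval:
- Recall: the ability to provide relevant items
- Precision: the ability to provide only relevant items
The 1st measure corresponds to “recall”; the 2nd plays the role of “precision” (for the same number of detected associations, a shorter list means higher precision). The 2nd also measures the user effort needed in further investigation.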
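Both measures can be computed from the set of linked pairs. A minimal sketch with hypothetical inputs: since each linked pair puts one record on the other's list and vice versa, the average list length is 2 × (number of linked pairs) / (number of records), so linking every pair among 170 records gives 169, the value in the threshold-0 rows of the results. Recall and precision are computed alongside for comparison.

```python
from itertools import combinations


def _unordered(pairs):
    """Normalize (a, b) pairs so that (a, b) and (b, a) compare equal."""
    return {frozenset(p) for p in pairs}


def evaluate(n_records, linked, truth):
    """Evaluation measures for a set of linked incident pairs.

    Each linked pair (a, b) puts b on a's list and a on b's list, so the
    lists have total length 2 * len(linked).
    """
    linked, truth = _unordered(linked), _unordered(truth)
    detected = len(linked & truth)
    return {
        "detected": detected,                            # larger is better
        "avg_list_length": 2 * len(linked) / n_records,  # shorter is better
        "recall": detected / len(truth) if truth else 0.0,
        "precision": detected / len(linked) if linked else 0.0,
    }


if __name__ == "__main__":
    # Linking every pair among 170 records gives lists of length 169.
    all_pairs = set(combinations(range(170), 2))
    print(evaluate(170, all_pairs, {(0, 1)})["avg_list_length"])  # 169.0
```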
Result (OLAP-outlier based)
| Threshold | Detected true associations | Avg. number of relevant records |
|---|---|---|
| 0 | 33 | 169.00 |
| 1 | 32 | 121.04 |
| 2 | 30 | 62.54 |
| 3 | 23 | 28.38 |
| 4 | 18 | 13.96 |
| 5 | 16 | 7.51 |
| 6 | 8 | 4.25 |
| 7 | 2 | 2.29 |
|  | 0 | 0.00 |
Result of binary association method (calculating similarity score)
| Threshold | Detected true associations | Avg. number of relevant records |
|---|---|---|
| 0 | 33 | 169.00 |
| 0.5 | 33 | 112.98 |
| 0.6 | 25 | 80.05 |
| 0.7 | 15 | 45.52 |
| 0.8 | 7 | 19.38 |
| 0.9 | 0 | 3.97 |
|  | 0 | 0.00 |
Comparison Outlier vs. Binary
[Figure: Similarity and Outlier curves; x-axis: avg. relevant records]
Comparison (cont.)
- Generally, the curve of our method lies above the other one:
  - given the same accuracy level, this method returns fewer records
  - keeping the same “length” of the list, this method is more accurate
- The other method is better at the tail. However, there the average number of relevant records is > 100; given that the dataset has 170 incidents, no analyst would investigate 100 incidents.
Overall, the new method is effective.
Comparison (Outlier vs. Simple Combination)
[Figure: Similarity, Outlier, and Combine curves]
WebCAT Implementation
- A secure web environment that can read several data formats and translate them into a uniform standard (XML)
- Uses free, open-source technology: ASP, XML, MapServer, SVG, etc.
- Provides tools to meet spatial and statistical analysis needs, including association
- Provides utilities for querying and reporting
Conclusions
- Developed a new data association method for linking criminal incidents that combines
  - concepts in OLAP (multidimensional)
  - ideas in data mining (outlier detection)
- Testing with a robbery dataset shows promise
- Deployment through WebCAT provides open-source (XML-based) capability for data access and analysis over the web
Questions?