Wyland Crime Data base management system.
7/27/2019 Wyland Crime Data base management system.
http://slidepdf.com/reader/full/wyland-crime-data-base-management-system 1/40
Crime Data Visualization and Spatial Database Management System
Michael Wyland
CS 6604 – Virginia Tech – NVC
24 April 2008
Agenda
Overview
Project Context & Objectives
Design
– Database design
– Data Loading
– Optimization
– Application Interface
Extensions
– Hotspot Detection & DBSCAN
Future Work
Demonstration
Overview
Implemented Crime Spatial Database Management System components
– Crime Data Engine
– Mapping and Visualization Capabilities
http://crime.dnsalias.com:9090/crime/
Project Context
Design and implement a Crime Spatial Database Management System and Application to support
– General Public (awareness, safety)
– Police officer (optimize resources)
– Crime analyst (trending, patterns)
Project Objectives
Configure project infrastructure including Spatial DBMS
Implement physical model for crime data
Load Fairfax County Police Department data into the database
Develop required queries (Window and K-Nearest Neighbor)
Optimize query performance
Project Objectives (cont'd)
Implement Hotspot Identification (Cluster)
Integrate mapping and visualization interfaces
Create a publishable quality site
Data Structure
Crime records need to have some consistent attributes
– Date/time of crime
– Type of crime
– Narrative or description of the crime
– Address of the crime
Data Structure (cont'd)
Spatial database and application require that we store some additional info
– Crime ID
– Latitude/longitude location of crime
– Spatial object
– Geocode accuracy
A single relation was defined:
TCRIMES (Crime_ID, Crime_type, Crime_address, Crime_dt, Crime_geo_location, Narrative, Geocode_lat, Geocode_lon, Crime_city, Geocode_accuracy)
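As a rough illustration of the record shape, the relation can be mirrored by a small Python class. This is only a sketch: the slides give the column names but not their types, so every type below is an assumption, and the (lon, lat) pair is a stand-in for the database's spatial object rather than the actual column definition.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass
class CrimeRecord:
    """One row of TCRIMES; field types are assumed, not taken from the slides."""
    crime_id: int
    crime_type: str
    crime_address: str            # the "block" address as entered
    crime_city: str
    crime_dt: datetime
    narrative: str
    geocode_lat: Optional[float] = None
    geocode_lon: Optional[float] = None
    geocode_accuracy: Optional[int] = None

    @property
    def crime_geo_location(self) -> Optional[Tuple[float, float]]:
        """Stand-in for the spatial object: (lon, lat), or None if not geocoded."""
        if self.geocode_lat is None or self.geocode_lon is None:
            return None
        return (self.geocode_lon, self.geocode_lat)
```

Keeping the latitude and longitude alongside the spatial object matches the relation above, where both the raw geocode results and the spatial column are stored.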
Loading the Data
Wanted to pursue the Fairfax County Police Department data
– Seemed like everyone else was doing City of Falls Church Police Department data…
– FFX County is also a larger dataset both in number of records and space to play with
SDBMS?????
Loading the Data (cont'd)
Several Challenges
– All crime reports were in .pdf or .doc format, which are difficult to access without an API
– No standard format/structure exists, so each crime record has a slightly different format
– Finding the date/time attribute requires literally reading the report text
– Reports actually contain some errors, like incorrect crime types in some cases
– Non-specific locations and times
Data Loading Strategy
Did not design a parser
– Given the challenges above, short timeline, and purpose of this class (Not ‘Parsers 101’)
Designed web-based data input interface
Required human assistance to manually read the reports and enter the crime info
– My wife entered about 1251 records into my input interface
– 8 of 95 .doc files were reviewed and input
– Many assumptions made to correct the data
Data Loading Strategy (cont'd)
Enter the “block” address
Address is GeoCode'd using Google Maps API via Geocoding object and JSON response object
Enter date/time, crime type, narrative
Record saved in crimes table along with GeoCode results
– No need to GeoCode on-the-fly later
Performance Planning
Indexes
– Normal primary key indexes implemented
– Implemented R-tree index for spatial objects
– Hints for spatial index are used in my queries, based on Oracle Spatial best practice documentation
Examined Oracle Query Plans
– Confirmed that the required queries are utilizing the normal and spatial indices and working efficiently
System responds in a timely fashion to multiple concurrent, large queries
Query Processing
Web app builds Dynamic SQL based on user input
Web app interfaces with database (CDE) and passes SQL
CDE retrieves data and passes data set back
Web app binds data to visualizations
[Diagram: the Web App (UI) passes the query type (window, k-nearest, range), entity type selection (corners, distance, crime type), entity value selection (set or range), and time range to the CDE; the CDE returns a data set to the Web App (Visualization), which renders charts, data grids, and maps]
Query Support
Window Query
– Search inside a dragged box
K-Nearest Neighbor Query
– Click a spot and choose # of results desired
Range Query or “Circle Search”
– Click a spot and choose a distance
All queries can be limited by a date/time range and crime type
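The three query types above can be sketched in memory as follows. This is not the deck's Oracle implementation: it assumes crimes are plain (lat, lon) tuples in degrees, uses a hypothetical haversine_miles helper for distance, and scans every point instead of using a spatial index. The date/time and crime-type filters are left out for brevity.

```python
import math


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def window_query(points, south, west, north, east):
    """Points inside the dragged box given by its corner coordinates."""
    return [p for p in points if south <= p[0] <= north and west <= p[1] <= east]


def knn_query(points, center, k):
    """The k points closest to the clicked spot."""
    return sorted(points, key=lambda p: haversine_miles(center, p))[:k]


def range_query(points, center, radius_miles):
    """Points within radius_miles of the clicked spot (the "circle search")."""
    return [p for p in points if haversine_miles(center, p) <= radius_miles]
```

A linear scan like this is fine for a dataset of about 1251 records; the R-tree index mentioned earlier is what makes the same queries efficient inside the database.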
System Platform
Software
– Windows XP Home
– IIS 5
– Oracle 10g Express Edition with Oracle Locator (reduced, free version of Oracle Spatial)
– Google Maps API
– Dundas Chart for ASP.NET Professional Evaluation
[Callouts on slide: Availability, Familiarity, Licensing, Just enough features]
Hardware
– Server: Dell Dimension 8400
– Client: Dell Inspiron 2650 Notebook
Development
– Visual Studio 2005
– ASP.NET in Visual Basic
– Oracle Data Provider for .NET (ODP.NET)
Demo Architecture
Crime Hotspot Identification
Chose to implement Crime Hotspot Identification using the DBSCAN algorithm
– DBSCAN is a density-based spatial clustering algorithm driven by two parameters
  “Eps” – epsilon radius from each point
  “MinPts” – minimum neighbors to categorize points
– Straightforward approach makes for timely integration into the project
– Density-based approach is good for crime hotspots
– Resistance to noise makes DBSCAN a good choice for this application
DBSCAN Review
Density = # of points within a specified radius
– Density is a “score” for each point based on Eps
Each point in the database is categorized
– A core point has more than a specified number of points (MinPts) within Eps
– A border point has fewer than MinPts within Eps but is in the neighborhood of a core point
– A noise point is any point that is not a core or border
DBSCAN Implementation
To support incremental application of the DBSCAN algorithm, an additional relation was defined:
TCLUSTERS (Crime_ID, Density, Eps, MinPts, Category, Cluster_Label)
For each point this stores the density score for a given Eps, a categorization based on MinPts, and a cluster label
DBSCAN Implementation
First step is to calculate a density score for all points, based on Eps
This is implemented using the sdo_within_distance function for each point, and counting results (grouped)
– Results are inserted into the clusters table
Points which have no neighbors within Eps are added directly to the clusters table with a zero Density score
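The density step can be sketched outside the database as a simple neighbor count. This is an illustration under assumptions, not the stored procedure itself: points are (x, y) tuples, distance is planar Euclidean via math.dist rather than sdo_within_distance, and a neighbor exactly at distance Eps is counted. The point itself is excluded, which is consistent with isolated points getting a score of zero.

```python
import math


def density_scores(points, eps):
    """For each point, count the OTHER points lying within eps of it."""
    scores = []
    for i, p in enumerate(points):
        count = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps
        )
        scores.append(count)  # isolated points naturally get 0
    return scores
```

The nested loop is quadratic in the number of points, which is acceptable for a small dataset; in the database the spatial index avoids comparing every pair.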
DBSCAN Implementation
Now that we have Density scores for all points, we need to categorize them
– Update the cluster table to mark each point as core, border, or noise
If Density score >= MinPts
– Mark as a core point
If Density score < MinPts but the point is within Eps distance of a core point
– Mark as a border point
If not marked as core or border
– Mark as a noise point
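The categorization rules above translate fairly directly into code. The sketch below assumes (x, y) tuples and planar Euclidean distance, recomputes the density score inline so that it stands alone, and returns the category names as plain strings; none of these choices come from the slides.

```python
import math


def categorize(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' using the rules above."""
    # Density score: number of other points within eps.
    density = [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]
    # Core points first; everything else starts out as noise.
    category = ["core" if d >= min_pts else "noise" for d in density]
    core_idx = [i for i, c in enumerate(category) if c == "core"]
    # A non-core point within eps of any core point becomes a border point.
    for i, p in enumerate(points):
        if category[i] == "noise" and any(
            math.dist(p, points[c]) <= eps for c in core_idx
        ):
            category[i] = "border"
    return category
```

Marking core points before border points matters, since the border rule depends on knowing which points are core.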
DBSCAN Implementation
Now that we have categorized each point as core, border, or noise we can get our first visual look at the results…
DBSCAN Point Categorization
Core, Border, Noise
– This example uses Eps=0.5, MinPts=20
An Early Insight
This example visually confirms that there are many noise points, so DBSCAN looks like a smart choice of clustering algorithm
– DBSCAN is fairly resistant to noise
– Note: level of noise is similar with other Eps and MinPts values
DBSCAN Implementation
Next we need to actually identify and label the clusters
But first we eliminate noise points
– Simply delete these from the clusters table
Now let's review the labeling algorithm…
DBSCAN Cluster Labeling
So for each core point…
If the core point is not in a cluster already, label it as part of a new cluster
Find all of its neighbors within distance Eps, and if they are not in a cluster already, label them as part of the same cluster too
Then move on to the next unlabeled core point (start of next cluster)
Using sdo_within_distance and loops, these labels are updated in the clusters table
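The labeling pass just described can be sketched as a single loop over the core points. Several choices here are assumptions: points are (x, y) tuples with Euclidean distance, categories arrive as a list of 'core', 'border', 'noise' strings, noise points are skipped (they were deleted beforehand), and neighbors receive the label of the current core point. Because it is a single pass in list order, the result can depend on the order in which core points are visited.

```python
import math


def label_clusters(points, categories, eps):
    """Single pass over core points; returns one label per point (None if unlabeled)."""
    labels = [None] * len(points)
    current = 0  # label counter starts at zero, so the first cluster is 1
    for i, p in enumerate(points):
        if categories[i] != "core":
            continue
        if labels[i] is None:
            # This core point starts a new cluster.
            current += 1
            labels[i] = current
        # Pull any unlabeled, non-noise neighbors into this core point's cluster.
        for j, q in enumerate(points):
            if (j != i and categories[j] != "noise"
                    and labels[j] is None and math.dist(p, q) <= eps):
                labels[j] = labels[i]
    return labels
```

A border point within Eps of two clusters ends up in whichever cluster reaches it first.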
DBSCAN Implementation
Now that we have labeled the clusters, we can get a visual look at the final results…
– This example uses Eps=0.25, MinPts=20
– Notice Tysons, Rt 1 corridor, Springfield Mall…
DBSCAN Implementation
Implemented DBSCAN as a stored procedure to be easily called with varying Eps and MinPts (and time range, crime types, etc.)
– Recalculates all clusters relatively quickly
Let's review how we optimize Eps and MinPts…
DBSCAN Optimization
Uses the concept that, within a cluster, points' kth nearest neighbors are at roughly the same distance
So we pick a k-value and sort all the points by distance to their kth nearest neighbor
Example from class:
– k=4 only
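The sorted kth-nearest-neighbor distances behind such a plot can be computed with a short helper. This is a sketch assuming (x, y) tuples, Euclidean distance, and k no larger than the number of other points; the function name is made up for illustration.

```python
import math


def sorted_k_distances(points, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending."""
    k_dists = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        k_dists.append(dists[k - 1])
    return sorted(k_dists)
```

Plotting the returned list against its index gives the curve; a sharp rise suggests a candidate Eps, with MinPts set to the chosen k.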
DBSCAN Optimization
[Plot: “DBSCAN: Determining Eps and MinPts”. x-axis: Points Sorted According to Distance of kth Nearest Neighbor (1 to 1251); y-axis: kth Nearest Neighbor Distance (0 to 3); one curve each for k=1, k=4, k=9, k=10, k=15, k=20]
DBSCAN Optimization
Not so easy to choose a good point on that curve…
Original DBSCAN algorithm authors discussed in their paper that we should eyeball the plot for the first/last “valley”
DBSCAN Observations
DBSCAN does not work well with varying densities… which is also a characteristic of this crime data…
This is observable when we change the order of the core point cluster labeling
– Ex: Start w/ high vs. low density score cores?
DBSCAN Observations (cont'd)
The algorithm labels neighbors of the current core point regardless of whether we started a new cluster or not…
Initialize the cluster label to zero, not one
Many different “presentations” of the DBSCAN algorithm…
DBSCAN Algorithm Presentations
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004, 28
DBSCAN Algorithm
Eliminate noise points
Perform clustering on the remaining points
– Connect all core points with an edge that are less than Eps from each other.
– Make each group of connected core points into a separate cluster.
– Assign each border point to one of its associated clusters.
© Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004, 16
DBSCAN Algorithm (simplified view for teaching)
1. Create a graph whose nodes are the points to be clustered
2. For each core-point c create an edge from c to every point p in the ε-neighborhood of c
3. Set N to the nodes of the graph;
4. If N does not contain any core points terminate
5. Pick a core point c in N
6. Let X be the set of nodes that can be reached from c by going forward;
   1. create a cluster containing X ∪ {c}
   2. N = N \ (X ∪ {c})
7. Continue with step 4
Remarks: points that are not assigned to any cluster are outliers; http://www2.cs.uh.edu/~ceick/7363/Papers/dbscan.pdf gives a more efficient implementation by performing steps 2 and 6 in parallel
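The graph view in these presentations can be sketched as a traversal that starts at an unlabeled core point and follows edges forward from core points only. This is an illustrative sketch, not either cited implementation: it assumes (x, y) tuples, Euclidean distance, and the deck's own core rule (at least MinPts other points within Eps). A border point joins the first cluster that reaches it, and unreached points stay unlabeled as outliers.

```python
import math


def graph_dbscan(points, eps, min_pts):
    """Cluster by reachability from core points; returns labels (None = outlier)."""
    n = len(points)
    # near[i]: indices of the other points within eps of point i.
    near = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = [len(near[i]) >= min_pts for i in range(n)]
    labels = [None] * n
    cluster = 0
    for start in range(n):
        if not core[start] or labels[start] is not None:
            continue
        cluster += 1
        labels[start] = cluster
        stack = [start]
        while stack:
            i = stack.pop()
            for j in near[i]:
                if labels[j] is None:
                    labels[j] = cluster
                    if core[j]:  # only core points have outgoing edges
                        stack.append(j)
    return labels
```

Because each cluster is expanded fully before the next one starts, the grouping of core points does not depend on the order in which they are visited; only the assignment of shared border points does.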
Publishable Quality (?)
Future Work
To be useful for a real law enforcement organization much more work needs to be done
– System must handle more than one crime at a single location (map currently shows 1 clickable marker), partially due to non-specific crime addresses
– Crime addresses written as intersections (corner of abc street and xyz street) must be considered
– Need to support individual layers (enable/disable) for crime types
– Most upgrades rely on officers or someone inputting more detailed attributes and data into the system
Only then can intelligence and case-building be really cool
Demonstration
Queries to demo CDE
– Window Query
– K-Nearest Neighbor Query
– Range Query
Visualizations integrated with query demos
– Mapping & charting, info windows, etc.
DBSCAN categorization and cluster detection demo