Tajinder presentation4
crimeX: Real-time crime analysis and alert system
Tajinder Singh
Motivation
• How criminals operate
• Dynamics between criminals and anti-crime squads
Pipeline
[Pipeline diagram. Inputs: Crime data (real), User data (real), Crime data (batch). Stages: Ingestion, Batch Layer, Real Time, Serving Layer.]
Data sources
• Seed: http://us-city.census.okfn.org/dataset/crime-stats
• Engineered Data (600 GB)
Data flow
Crime data (batch)
Batch Processing
{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43",
"crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453", ……etc…. }
+ Python Script (Refining)
Crime data (batch) after refining
{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43",
"crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453",
"zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }
Index Type: crimes
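The refining step can be sketched roughly as follows. This is a minimal illustration, not the script used in the project: the slides do not say where the zip, city, state and country values come from, so `reverse_geocode` is a hypothetical callable and `stub_geocoder` is a stand-in that always returns the address shown on the slide.

```python
from typing import Callable, Dict

# Hypothetical lookup type: (lat, lon) -> address fields. The slides do not
# name the geocoding source used by the refining script.
ReverseGeocoder = Callable[[float, float], Dict[str, str]]


def refine(record: Dict[str, str], reverse_geocode: ReverseGeocoder) -> Dict[str, str]:
    """Return a copy of the record with zip, city, state and country added."""
    address = reverse_geocode(float(record["lat"]), float(record["lon"]))
    refined = dict(record)  # leave the raw record untouched
    for field in ("zip", "city", "state", "country"):
        # Lower-cased to match the refined record shown on the slide.
        refined[field] = address[field].lower()
    return refined


def stub_geocoder(lat: float, lon: float) -> Dict[str, str]:
    # Stand-in for a real geocoding service (assumption for illustration only).
    return {"zip": "90007", "city": "Los Angeles",
            "state": "California", "country": "USA"}


raw = {
    "crime_id": "C786",
    "crimetype": "robbery",
    "crime_rptd_ts": "2015-05-06 11:34:43",
    "crime_occ_ts": "2015-03-05 15:24:49",
    "lat": "34.5462",
    "lon": "-118.453",
}
refined = refine(raw, stub_geocoder)
```

Passing the geocoder in as a parameter keeps the sketch independent of any particular service and makes the step easy to test offline.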
Data flow
Real Time Processing
Crime data
{ "crimetype": "robbery", "lat": "34.5462", "lon": "-118.453" }
User data
{ "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243" }
Data flow
Real Time Processing
Crime data
{ "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43", "lat": "34.5462",
"lon": "-118.453", "zip": "90007", "city": "los angeles", "state": "california",
"country": "usa" }
User data
{ "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243",
"zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }
Index Type: crimes_realtime and user-subscribe-crime
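One way enriched records could be written to these index types is through the Elasticsearch bulk API, whose request body is newline-delimited JSON: an action line followed by the document. The sketch below only builds that body and does not contact a cluster. The index name `crimex` is an assumption (the slides give only the type names), and the `_type` field applies to older Elasticsearch versions; later versions deprecated and then removed it.

```python
import json
from typing import Dict, Iterable


def bulk_body(index: str, doc_type: str, docs: Iterable[Dict[str, str]]) -> str:
    """Build a newline-delimited bulk request body for the given documents."""
    lines = []
    for doc in docs:
        # Action line: index the document that follows on the next line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


crime = {
    "crimetype": "robbery",
    "crime_rptd_ts": "2015-05-06 11:34:43",
    "lat": "34.5462",
    "lon": "-118.453",
    "zip": "90007",
    "city": "los angeles",
    "state": "california",
    "country": "usa",
}
# "crimex" is a hypothetical index name used only for this example.
body = bulk_body("crimex", "crimes_realtime", [crime])
```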
Data flow use case 1 (batch)
Input [ "location": "2611 portland street, los angeles" ]
Output Fields
• Distance Covered (radius)
• Total crimes analyzed
• Average latency*
• Crime Types
* Average latency: average difference between a crime's occurrence timestamp and its reporting timestamp
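The average latency defined above can be computed from the `crime_occ_ts` and `crime_rptd_ts` fields of the crime records. The following is a minimal sketch under the assumption that both timestamps use the `YYYY-MM-DD HH:MM:SS` format shown in the sample record; the function name is hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable

# Timestamp format taken from the sample record on the slides.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def average_latency_seconds(crimes: Iterable[Dict[str, str]]) -> float:
    """Mean of (reported time - occurrence time) over all crimes, in seconds."""
    latencies = [
        (datetime.strptime(c["crime_rptd_ts"], TS_FORMAT)
         - datetime.strptime(c["crime_occ_ts"], TS_FORMAT)).total_seconds()
        for c in crimes
    ]
    if not latencies:
        raise ValueError("no crimes to average over")
    return sum(latencies) / len(latencies)


sample = [
    {"crime_occ_ts": "2015-03-05 10:00:00", "crime_rptd_ts": "2015-03-05 11:00:00"},
    {"crime_occ_ts": "2015-03-05 10:00:00", "crime_rptd_ts": "2015-03-05 13:00:00"},
]
avg = average_latency_seconds(sample)  # one hour and three hours average to two hours
```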
[output]
Data flow use case 2 (real)
Real Time [ "crimetype": "robbery", "lat": "34.2353", "lon": "-113.42534" ]
Output Fields
• Distance Covered (radius)
• Total crimes analyzed
• Average latency*
• Crime Types
Alert nearby users
• User Phone number
• User Name
• User latitude
• User longitude
[output]
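Finding the users to alert comes down to a distance check between the crime location and each user's location. The slides do not say how the project computed this (it may have been done inside Elasticsearch), so the sketch below uses the haversine great-circle formula as a stand-in. The function names and the radius values are assumptions.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby_users(crime: Dict[str, str], users: List[Dict[str, str]],
                 radius_km: float) -> List[Dict[str, str]]:
    """Return the users located within radius_km of the crime."""
    clat, clon = float(crime["lat"]), float(crime["lon"])
    return [
        u for u in users
        if haversine_km(clat, clon, float(u["lat"]), float(u["lon"])) <= radius_km
    ]


# Sample records taken from the real-time data flow slides.
crime = {"crimetype": "robbery", "lat": "34.5462", "lon": "-118.453"}
users = [{"user_id": "user453", "username": "Tajinder",
          "lat": "34.653356", "lon": "-118.53243"}]
to_alert = nearby_users(crime, users, radius_km=20.0)  # hypothetical radius
```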
Challenge: The front end took 5 seconds per request to display results
Reason:
• A lot of I/O operations (all crime documents were fetched to the UI)
• Business logic and query execution ran on the front end (Flask)
Solution:
• Execute queries on the Elasticsearch cluster
• No I/O of crime documents to the UI
• Enable dynamic scripting on the ES cluster
• Use Groovy scripts, as opposed to JavaScript, Python, MVEL (built-in), expression (built-in), etc.
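Moving query execution to the cluster means sending one request that filters and aggregates server-side, so only the summary comes back instead of every crime document. The sketch below builds such a request body as a plain dictionary. It is an illustration only: it assumes a `geo_point` field named `location` (the slides show separate `lat` and `lon` fields), uses the `bool`/`filter` form of the query DSL, which may differ from the version used in the project, and leaves out the Groovy scripting part.

```python
from typing import Any, Dict


def crimes_near_query(lat: float, lon: float, radius_km: int) -> Dict[str, Any]:
    """Request body: count crime types within radius_km of a point."""
    return {
        # size 0: return aggregation results only, no crime documents.
        "size": 0,
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        # "location" is an assumed geo_point field name.
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        "aggs": {
            # Bucket the matching crimes by type on the cluster.
            "crime_types": {"terms": {"field": "crimetype"}}
        },
    }


body = crimes_near_query(34.5462, -118.453, radius_km=2)  # hypothetical radius
```

With `size` set to 0 the response carries only the per-type counts, which is the effect the solution bullets describe: the heavy lifting stays on the cluster and the front end receives a small payload.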
Challenge: Network latency
Solution: Co-locate Storm and Elasticsearch cluster nodes to reduce network latency
Performance Optimization
Challenges
Caveat: The cluster is vulnerable to outside attacks (security vulnerability)
Reason:
• Dynamic scripting is enabled
Solution:
• Don’t run Elasticsearch as root
• Provide read-only access to requisite directories
About me
Tajinder Singh [University of Southern California]
5 years of experience in web development