Final Presentation - Rodent Baiting

Analytics Project Presentation - Fall 2014. NYU Real Time and Big Data. Project: Rodent Baiting in NYC. Team: Sanchit Khandelwal, Rohit Shankar, Simran Kaur.

Transcript of Final Presentation - Rodent Baiting

Page 1: Final Presentation - Rodent Baiting

Fall 2014

Analytics Project Presentation - Fall 2014

NYU Real Time and Big Data

Project: Rodent Baiting in NYC.

Team: Sanchit Khandelwal, Rohit Shankar, Simran Kaur.

Page 2: Final Presentation - Rodent Baiting

Abstract

Analytic 1
• Find the factor that best predicts the occurrence of rodents in a particular area.
• Combine garbage and water leak complaints with rodent complaints to determine whether rodent complaints increase after them.

Analytic 2
• Analyze the frequency of rodent complaints made in the city with respect to temperature ranges since 2012.

Analytic 3
• Estimate the rat population of the city. 8 million rats for 8 million New Yorkers? Can the myth be debunked?

Page 3: Final Presentation - Rodent Baiting

Background

• NYC is infamous for its rodent problem.

• 311 is a non-emergency helpline that provides access to different government services.

• It takes requests in the form of complaints, and tracks and manages those complaints.

• The 311 complaints database is updated daily and is publicly available.

• New York City Department of Health and Mental Hygiene (DHMH)

Page 4: Final Presentation - Rodent Baiting

Motivation

• The aforementioned rodent problem.

• DHMH does not take well-planned preemptive actions to control the rodent population.

• Problems are handled on a first-come, first-served basis.

• There is no official estimate of the number of rodents.

• DHMH can use our analytics to take preemptive actions that can help reduce or control the number of rodents.

Page 5: Final Presentation - Rodent Baiting

Data Sources

<311 Rodent Complaint Database>
• Contains rodent complaints with details such as complaint timestamp, zip code, and location type for 2010 to Nov ’14.
• Size: 38 MB; Format: ‘.CSV’

<311 Sanitation Complaint Database>
• Contains sanitation complaints with fields similar to the rodent database for 2010 to Nov ’14.
• Size: 41 MB; Format: ‘.CSV’

<311 Water Leak Database>
• Contains several kinds of water complaints, such as water leaks, standing water, and hydrant overflow, along with timestamp, zip code, etc. for 2010 to Nov ’14.
• Size: 30 MB; Format: ‘.CSV’

Page 6: Final Presentation - Rodent Baiting

Data Sources Contd.

<NCDC Weather Database>
• The National Climatic Data Center (NCDC) weather database for NYC contains fields such as max and min temperature, rainfall, and wind speed for each day from 2012 to Nov ’14.
• Size: 1 MB; Format: ‘.CSV’

Analytic 1: Sanitation, Water Factor

Design Diagram:

Page 7: Final Presentation - Rodent Baiting

Figure 1: Sanitation/Water leak

• Inputs: ‘311 Rodent complaints’ database and ‘311 Sanitation complaints’ database.

• Data cleanup (on each database): extract the {date, zipcode} fields.

• PIG: join operation to get, for each sanitation date, all rodent dates along with zip codes (area).

• MR1: for each sanitation date, count the rodent complaints in the week before (negative) and the week after (positive) the sanitation date, along with zip codes (area).

• MR2: get the average number of negative and positive rodent complaints for each zip code (area).

• Analysis of results
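The MR1 and MR2 steps in Figure 1 can be sketched in plain Python. This is only an illustration under stated assumptions, not the team's Pig/MapReduce code: the function names, the `(zipcode, date)` tuple layout, and the choice to exclude complaints made on the sanitation date itself are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def week_counts(sanitation, rodent, window_days=7):
    """MR1-style step: for each sanitation complaint, count the rodent
    complaints in the same zip code in the week before and the week after.

    Both inputs are iterables of (zipcode, date) pairs (assumed layout).
    Same-day rodent complaints are excluded from both counts (assumption).
    """
    rodent_by_zip = defaultdict(list)
    for zipcode, day in rodent:
        rodent_by_zip[zipcode].append(day)

    window = timedelta(days=window_days)
    rows = []
    for zipcode, s_day in sanitation:
        days = rodent_by_zip.get(zipcode, [])
        before = sum(1 for r in days if s_day - window <= r < s_day)
        after = sum(1 for r in days if s_day < r <= s_day + window)
        rows.append((zipcode, before, after))
    return rows


def average_by_zip(rows):
    """MR2-style step: average the before/after counts per zip code."""
    totals = defaultdict(lambda: [0, 0, 0])  # before sum, after sum, row count
    for zipcode, before, after in rows:
        entry = totals[zipcode]
        entry[0] += before
        entry[1] += after
        entry[2] += 1
    return {z: (b / n, a / n) for z, (b, a, n) in totals.items()}


# Toy records, invented purely for illustration.
sanitation = [("11221", date(2014, 6, 10))]
rodent = [
    ("11221", date(2014, 6, 5)),
    ("11221", date(2014, 6, 12)),
    ("11221", date(2014, 6, 15)),
    ("11221", date(2014, 7, 1)),
    ("10001", date(2014, 6, 11)),
]
averages = average_by_zip(week_counts(sanitation, rodent))
```

Grouping rodent complaints by zip code first plays the role of the Pig join: each sanitation complaint is only compared against rodent complaints from the same area.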

Page 8: Final Presentation - Rodent Baiting

Data Flow Diagram

Figure 2: Inputs and Outputs in each Stage using Cloudera VMware

Page 9: Final Presentation - Rodent Baiting

[Figure: bar chart "Top 10 areas with highest sanitation factor". Axis: Sanitation Factor, 0 to 5. Areas as labeled: Central Brooklyn; Bushwick and Williamsburg; E. New York and New Lots; Inwood & Washington Heights; Southeast Bronx; West Central Queens; Flatbush; Central Brooklyn; High Bridge & Morrisania; Upper West Side.]

Result
Areas where, when a sanitation complaint is received, preemptive rodent control action should be taken.

Page 10: Final Presentation - Rodent Baiting

[Figure: bar chart "Top 10 areas least affected by sanitation complaint". Axis: -0.4 to 0.]

Result
Areas where sanitation is not the cause of a rodent complaint.

Page 11: Final Presentation - Rodent Baiting

[Figure: pie chart "Sanitation factors - comparison". Negative Sanitation Factor: 11.60%; Positive Sanitation Factor: 88.40%.]

Result
In almost all cases, the number of rodent complaints in the week after a sanitation complaint is higher than the number in the week before.
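The slides do not say how a "factor" is derived from the before/after averages. As a rough sketch only, assume the factor for an area is the average number of rodent complaints in the week after minus the average in the week before; under that assumption, the share of areas with negative and positive factors, and a top-N ranking, could be computed as below. All names and the sample values are hypothetical.

```python
def factors(avg_by_area):
    """Assumed definition (not confirmed by the slides):
    factor = average complaints after - average complaints before."""
    return {area: after - before for area, (before, after) in avg_by_area.items()}


def sign_shares(factor_by_area):
    """Return (percent of areas with negative factor, percent with positive)."""
    total = len(factor_by_area)
    negative = sum(1 for f in factor_by_area.values() if f < 0)
    positive = sum(1 for f in factor_by_area.values() if f > 0)
    return 100.0 * negative / total, 100.0 * positive / total


def top_areas(factor_by_area, n=10):
    """Areas with the highest factor, largest first."""
    return sorted(factor_by_area.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Invented (before, after) averages for four placeholder areas.
sample = {"A": (1.0, 3.0), "B": (2.0, 1.5), "C": (0.5, 2.5), "D": (1.0, 2.0)}
sample_factors = factors(sample)
shares = sign_shares(sample_factors)
```

Under this assumed definition, a positive factor means more rodent complaints followed the sanitation complaint than preceded it, which is the reading the result above relies on.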

Page 12: Final Presentation - Rodent Baiting

[Figure: bar chart "Top 10 areas with highest water leak factor". Axis: 0 to 4.5. Areas as labeled: Bushwick and Williamsburg; West Queens; High Bridge and Morrisania; Flatbush; Central Bronx; Central Brooklyn; Central Harlem; East New York and New Lots; East New York and New Lots; Northwest Queens.]

Result
Areas where, when a water leak complaint is received, preemptive rodent control action should be taken.

Page 13: Final Presentation - Rodent Baiting

[Figure: bar chart "Top 10 areas least affected by water leak complaint". Axis: -1.8 to 0. Areas as labeled: Lower West Side; Chelsea & Clinton; Bronx Park and Fordham; Central Bronx; Upper East Side; Borough Park; Central Harlem; Upper East Side; Northwest Brooklyn; West Queens.]

Result
Areas where a water leak is not the prime cause of a rodent complaint; other factors are more dominant.

Page 14: Final Presentation - Rodent Baiting

[Figure: pie chart "Water Leak factors - comparison". Negative Water Leak Factor: 28.12%; Positive Water Leak Factor: 71.88%.]

Result
In most cases, the number of rodent complaints in the week after a water leak complaint is higher than the number in the week before.

Page 15: Final Presentation - Rodent Baiting

Analytic 2: Weather affecting rodent complaints

Aim: find the relation between rodent complaints and temperature.

Design Diagram:

Page 16: Final Presentation - Rodent Baiting

Figure 3: Weather Analytic

• Inputs: NCDC Weather database for NYC, 2012-14, and 311 Rodent Complaints database.

• Weather data: data cleanup and date formatting.

• Rodent complaints data: data cleanup and extracting 2012-14 data only.

• MR1: date formatting; individual temperature values replaced by 5 °C interval ranges.

• PIG: inner join to get the temperature range for each rodent complaint date.

• MR2: aggregation of complaints based on temperature ranges.

• Analysis of results
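The binning, join, and aggregation steps of the weather analytic can be illustrated with a small Python sketch. It is not the team's MapReduce/Pig code: the function names are hypothetical, and the slides do not say which daily temperature (max, min, or mean) was binned, so a single value per day is assumed here.

```python
import math
from collections import Counter
from datetime import date


def temp_range(temp_c, width=5):
    """Map a temperature in Celsius to its 5-degree range, e.g. 17.2 -> (15, 20)."""
    low = math.floor(temp_c / width) * width
    return (low, low + width)


def complaints_per_range(daily_temp_c, complaint_dates):
    """Count complaints per temperature range.

    daily_temp_c: dict mapping date -> temperature in Celsius (assumed layout).
    complaint_dates: iterable of dates, one entry per rodent complaint.
    Complaints on days without a weather record are dropped, which mirrors
    the behavior of an inner join.
    """
    range_by_day = {day: temp_range(t) for day, t in daily_temp_c.items()}
    counts = Counter()
    for day in complaint_dates:
        if day in range_by_day:
            counts[range_by_day[day]] += 1
    return counts


# Toy data, invented purely for illustration.
weather = {date(2014, 1, 5): -3.0, date(2014, 5, 1): 17.2, date(2014, 5, 2): 19.9}
complaints = [
    date(2014, 1, 5),
    date(2014, 5, 1),
    date(2014, 5, 1),
    date(2014, 5, 2),
    date(2014, 12, 31),  # no weather record, so it is dropped
]
counts = complaints_per_range(weather, complaints)
```

Using `math.floor` rather than integer truncation keeps negative temperatures in the correct range, so -3.0 falls in (-5, 0) rather than (0, 5).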

Page 17: Final Presentation - Rodent Baiting

[Figure: bar chart "Number of complaints for each temperature range (in Celsius)". Series: Rodent Complaints. Temperature ranges: [-15, -10], [-10, -5], [-5, 0], [0, 5], [5, 10], [10, 15], [15, 20], [20, 25], [25, 30], [30, 35]. Vertical axis: 0 to 20000.]

Result
1) When NYC experiences moderate temperatures [15 to 25 °C], the number of rodent complaints increases.

Page 18: Final Presentation - Rodent Baiting

2) The results are consistent with scientific findings.

3) When we move from summer to winter (from the 25-30 °C range to the 5-10 °C range), rodent complaints increase because rodents move indoors. Preemptive measures should be taken when fall ends and winter starts.

Analytic 3: Estimation of Rodent Population

Design Diagram:

Page 19: Final Presentation - Rodent Baiting

• Input: 311 Rodent Complaint Database for 5 years (2010-14).

• PIG: calculate the rodent complaints for each zip code for each year.

• Calculate the average number of complaints each year: total number of complaints / 5, assuming one rat lives 1 year.

• Multiply the average by 50. Each rat colony has around 50 rats, assuming each complaint is for a different colony.

• Output: overestimate of the number of rats in NYC.

• Analysis of result
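The estimate described in this design reduces to one line of arithmetic. The sketch below shows it under the slide's own assumptions (one complaint per colony, about 50 rats per colony, a one-year lifespan); the function name and the yearly counts are invented placeholders, not the project's data.

```python
def estimate_rat_population(complaints_by_year, rats_per_colony=50):
    """Average the yearly complaint counts, then multiply by colony size.

    Assumes, as the slides do, that every complaint refers to a different
    colony, which makes the result a deliberate overestimate.
    """
    average_per_year = sum(complaints_by_year.values()) / len(complaints_by_year)
    return average_per_year * rats_per_colony


# Placeholder counts for five years; not taken from the 311 data.
toy_counts = {2010: 100, 2011: 120, 2012: 140, 2013: 160, 2014: 180}
toy_estimate = estimate_rat_population(toy_counts)
```

Passing a per-zip-code dictionary of yearly counts to the same function gives a per-area estimate, which is the kind of figure an area-level ranking would need.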

Page 20: Final Presentation - Rodent Baiting

[Figure: bar chart "Top 10 areas with highest number of rodents (numbers are estimates)". Axis: 0 to 35000. Areas as labeled: Bushwick and Williamsburg; Central Brooklyn; Central Brooklyn; East New York and New Lots; Bronx Park and Fordham; Upper West Side; High Bridge and Morrisania; Bronx Park and Fordham; West Central Queens; Central Brooklyn.]

Page 21: Final Presentation - Rodent Baiting

[Figure: bar chart "Top 10 areas with lowest number of rodents (numbers are estimates)". Axis: 0 to 180. Areas as labeled: World Trade Center; Rockaways; Rockaways; Upper East Side; Chelsea and Clinton; Jamaica; North End Av; New Hyde Park; West Central Queens; Canarsie and Flatlands.]

Page 22: Final Presentation - Rodent Baiting

[Figure: bar chart "Top 10 areas with greatest percentage change in rodent population between 2010-2014". Axis: % change in rodent population, 0 to 7. Areas as labeled: Southeast Bronx; West End Ave; Northwest Queens; North End Av; Chelsea and Clinton; Gramercy Park and Murray Hill; North Queens; Inwood and Washington Heights; Lower East Side; Gramercy Park and Murray Hill.]
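A percentage change between 2010 and 2014 can be sketched as follows. Because the population estimate is a constant multiple of the complaint count, the percentage change in the estimate equals the percentage change in complaints under that model. The function names, the input layout, and the sample counts are hypothetical, and the slides do not state the exact formula or scale used for the chart.

```python
def percent_change(count_start, count_end):
    """Percentage change from count_start to count_end; None if undefined."""
    if count_start == 0:
        return None
    return 100.0 * (count_end - count_start) / count_start


def top_changes(yearly_counts, start=2010, end=2014, n=10):
    """Rank areas by percentage change between two years, largest first.

    yearly_counts: dict mapping area -> {year: complaint count} (assumed layout).
    Areas with no complaints in the start year are skipped.
    """
    changes = {}
    for area, by_year in yearly_counts.items():
        change = percent_change(by_year.get(start, 0), by_year.get(end, 0))
        if change is not None:
            changes[area] = change
    return sorted(changes.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Invented counts for three placeholder areas.
toy = {
    "A": {2010: 100, 2014: 150},
    "B": {2010: 50, 2014: 200},
    "C": {2014: 10},  # no 2010 data, so it is skipped
}
ranking = top_changes(toy)
```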

Page 23: Final Presentation - Rodent Baiting

Analysis of Results for Estimation of Rodent Population:

1) Scientific studies have shown that the life expectancy of a rodent in a city is 1 year.

2) Hence we found the average number of rodent complaints for 1 year.

3) Taking a big overestimate, each rodent call represents an entire colony (on average, rodents live in colonies of 40-50).

4) We get approx. 1.2 million.

5) Sewer population (not that much) + 1.2 million = approx. 2 million. A very generous overestimate.

6) This is still less than 8 million. Urban myth debunked.

Page 24: Final Presentation - Rodent Baiting

Obstacles

• Change of analytic project: no access to College data.

• NYC HPC Cluster: encountered several problems and had to start over using the Cloudera VM.

• Each database had a date format that was entirely different from the others (formats sometimes differed even within a database).
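Mixed date formats like those described above are commonly handled by trying a list of candidate formats in turn. The sketch below illustrates that approach; the specific format strings are assumptions for illustration, since the slides do not list the formats actually found in the 311 and NCDC files.

```python
from datetime import datetime

# Candidate formats are illustrative guesses, not the formats in the real files.
CANDIDATE_FORMATS = (
    "%m/%d/%Y %I:%M:%S %p",  # e.g. 06/10/2014 03:15:00 PM
    "%m/%d/%Y",              # e.g. 06/10/2014
    "%Y-%m-%d",              # e.g. 2014-06-10
    "%Y%m%d",                # e.g. 20140610
)


def normalize_date(text):
    """Return the date as an ISO 'YYYY-MM-DD' string, or None if unparseable."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


normalized = [normalize_date(s) for s in ("06/10/2014 03:15:00 PM", "20140610", " 2014-06-10 ")]
```

Normalizing every source to one format before joining means records that fail to parse surface as `None` and can be counted or inspected, rather than silently failing to match.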

Page 25: Final Presentation - Rodent Baiting

Conclusion

1) Sanitation and water leakage are a cause of an increase in rodents in 85% of the NYC areas.

2) Rodents increase between 65 °F and 90 °F, which conforms to scientific findings.

3) Urban theory “8 million rats for 8 million people” debunked.

Acknowledgements

• NCDC for providing us with the weather database for NYC

• 311 service of NYC for putting up their extensive databases online

• Prof. Suzanne Macintosh for her guidance and support during the course of this project

Page 26: Final Presentation - Rodent Baiting

References

[1] http://www.statetechmagazine.com/article/2014/11/chicago-leverages-311-and-big-data-tackle-its-rat-problems

[2] Sarah Williams, Nick Klien. New York Department of Sanitation: Spatial Analysis of Complaints.

[3] http://www.health.ny.gov/statistics/cancer/registry/appendix/neighborhoods.htm

[4] Bruce Colvin, A. Daniel Ashton, Wellard McCartney, William Jackson. Planning Rodent Control for Boston’s Central Artery/Tunnel Project.

Page 27: Final Presentation - Rodent Baiting

Thank you!