In NYC, more robberies happen right when school gets out than at any other time.
I was inspired by this post on the fantastic blog I Quant NY. Author Ben Wellington analyzes the NYPD Major Felony Incidents data set provided by NYC Open Data. The analysis below recreates Ben’s work in Python and takes it a bit further.
Download the csv directly from the NYC Open Data site and load it into a data frame.
Preview the data frame and see what dtypes the columns are.
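Here is a minimal sketch of the load-and-preview step. I'm reading from an in-memory string so the snippet runs on its own; the two rows and the date format are made up for illustration, and in practice you would pass the path of the downloaded file to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Illustrative stand-in for the downloaded file (rows are invented).
csv_text = """Occurrence Date,Occurrence Hour,Offense
01/15/2010 03:00:00 PM,15,ROBBERY
07/04/2012 11:30:00 PM,23,BURGLARY
"""

# With the real data, replace io.StringIO(csv_text) with the csv's file path.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first few rows
print(df.dtypes)   # column types as pandas inferred them
```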
Clean the data
Select years 2006 through 2015 into new data frame.
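A rough sketch of the year filter, on a toy frame. The `Occurrence Year` column name is my assumption about how the year is stored; if your frame only has a datetime column, you could filter on its `.dt.year` instead.

```python
import pandas as pd

# Toy stand-in for the felony data; "Occurrence Year" is an assumed column name.
df = pd.DataFrame({
    "Occurrence Year": [1998, 2006, 2010, 2015, 2016],
    "Offense": ["ROBBERY", "BURGLARY", "ROBBERY", "RAPE", "MURDER"],
})

# between() is inclusive on both ends, so this keeps 2006 through 2015.
in_range = df["Occurrence Year"].between(2006, 2015)

# copy() so later column assignments don't touch the original frame.
crimes = df[in_range].copy()
```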
Convert columns in dataframe to the appropriate types.
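The conversions might look something like this. The timestamp format string and the choice of `category` for the offense column are my assumptions, so check them against what `dtypes` shows for your own copy of the data.

```python
import pandas as pd

# Toy rows with everything stored as text, as a raw csv load might leave them.
raw = pd.DataFrame({
    "Occurrence Date": ["01/15/2010 03:00:00 PM", "07/04/2012 11:30:00 PM"],
    "Occurrence Hour": ["15", "23"],
    "Offense": ["ROBBERY", "BURGLARY"],
})

typed = raw.copy()

# Parse timestamps; the format string is an assumption about the raw text.
typed["Occurrence Date"] = pd.to_datetime(
    typed["Occurrence Date"], format="%m/%d/%Y %I:%M:%S %p"
)

# Hours become integers so they sort and plot numerically.
typed["Occurrence Hour"] = typed["Occurrence Hour"].astype(int)

# A small fixed set of offense labels fits the category dtype well.
typed["Offense"] = typed["Offense"].astype("category")
```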
Sort the data frame by Occurrence Date and then reset the index using reset_index.
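A small sketch of the sort, using made-up rows. Passing `drop=True` to `reset_index` is my choice here: it throws away the old row labels rather than keeping them as an extra column.

```python
import pandas as pd

# Toy rows in scrambled date order.
df = pd.DataFrame({
    "Occurrence Date": pd.to_datetime(["2012-07-04", "2006-01-01", "2010-01-15"]),
    "Offense": ["BURGLARY", "ROBBERY", "RAPE"],
})

# Sort chronologically, then renumber the rows 0..n-1.
ordered = df.sort_values("Occurrence Date").reset_index(drop=True)
```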
What times of day do major felonies happen?
Group by Occurrence Hour and Offense, then count() the number of rows.
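Roughly, the grouping could be written like this on toy data. I use `size()` here, which counts rows per group directly; `count()` on a single column gives the same numbers when that column has no missing values. The `unstack` step is my addition, to get one column per offense for plotting.

```python
import pandas as pd

# Toy incidents: three at 3pm, two at 8am, one at 3am.
df = pd.DataFrame({
    "Occurrence Hour": [15, 15, 15, 8, 8, 3],
    "Offense": ["ROBBERY", "ROBBERY", "BURGLARY",
                "BURGLARY", "BURGLARY", "FELONY ASSAULT"],
})

# Rows per (hour, offense) pair, pivoted so each offense is a column.
# Pairs that never occur are filled with 0.
counts = (
    df.groupby(["Occurrence Hour", "Offense"])
      .size()
      .unstack(fill_value=0)
)
```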
Plot the number of crimes grouped by Occurrence Hour and Offense type.
Nice! We’re starting to get somewhere. The chart above shows the absolute number of crimes by hour over the 2006-2015 period. However, we want the hourly crime rate rather than the absolute count, so we need to divide the grouped data frame by the number of days in the period.
Convert the absolute number of crimes to hourly rates by dividing by the number of days in the period.
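As a sketch of that division: I'm assuming the period runs over full calendar years, January 1, 2006 through December 31, 2015, and the counts below are invented so the arithmetic is easy to follow.

```python
import pandas as pd

# Days in the period, counting both endpoints (assumes full calendar years).
n_days = len(pd.date_range("2006-01-01", "2015-12-31", freq="D"))

# Invented totals per hour of day, standing in for the grouped counts.
counts = pd.DataFrame(
    {"ROBBERY": [7304, 3652]},
    index=pd.Index([15, 3], name="Occurrence Hour"),
)

# Average number of incidents in that hour on a single day.
rates = counts / n_days
```

With these toy numbers, 7,304 robberies in the 3pm hour over the whole period works out to 2 per day in that hour.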
This matches up nicely with Ben’s chart:
It appears that there are peaks around midnight and noon. There are also peaks around 8am and 3pm, which happen to be approximately when school starts and gets out in New York.
Is it possible that school getting out is responsible for higher crime rates? Let’s dig deeper to find out.
Let’s segment school days from non-school days to tease out the effect of school hours on crime rates. Specifically, categorize each day as a school day, summer vacation, weekday holiday or weekend day. To pinpoint the weekday holidays I referred to this NYC school calendar.
PROBLEM: Only the 2015-2016 academic calendar is available here. So, I make the obviously flawed assumption that the academic recesses and holidays fell on the same dates in 2006 through 2014 as they did in 2015. Ben also created a school ‘vacation weekday’ sub-data set. It’s unclear exactly how he made it, so I’ll go with this assumption for now, move forward with the analysis and ask him :) .
Do crime rates increase when school gets out?
In order to tease out crime trends on school days, I want to categorize each day from 2006 to 2015 as either a 1) School Day, 2) Weekday Holiday, 3) Weekend or 4) Summer Vacation.
In order to do so I’ll need to insert a Date column into the dataframe that is simply a date (rather than a datetime stamp).
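One way this could look, on toy timestamps. I use `dt.normalize()`, which sets the time to midnight but keeps the column as a pandas datetime, so it compares cleanly against other datetime columns; `dt.date` would give plain Python `date` objects instead.

```python
import pandas as pd

# Toy timestamps with a time-of-day component.
df = pd.DataFrame({
    "Occurrence Date": pd.to_datetime(
        ["2010-01-15 15:00:00", "2012-07-04 23:30:00"]
    ),
})

# Strip the time of day so each row carries only its calendar date.
df["Date"] = df["Occurrence Date"].dt.normalize()
```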
I created a csv of school year holidays in 2015-2016 per http://schools.nyc.gov/Calendar/default.htm
Indicate whether days are Summer, Weekend, Weekday Holiday or School Days using 1’s and 0’s.
Set Day Type equal to the correct type of day.
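Here is a rough sketch of the flag-then-label idea. Several things in it are placeholders of mine rather than the rules behind the charts: the holiday list has a single illustrative date, "summer" is approximated as July and August, and when a day matches more than one flag I let weekend win, then summer, then holiday.

```python
import numpy as np
import pandas as pd

days = pd.DataFrame({"Date": pd.to_datetime([
    "2015-11-25",  # Wednesday
    "2015-11-26",  # Thursday, listed as a holiday below
    "2015-11-28",  # Saturday
    "2015-07-15",  # Wednesday in July
])})

# Illustrative holiday list; the real one would come from the holiday csv.
holidays = pd.to_datetime(["2015-11-26"])

# 1/0 indicator columns. dayofweek is 0 for Monday through 6 for Sunday.
days["Weekend"] = (days["Date"].dt.dayofweek >= 5).astype(int)
days["Summer"] = days["Date"].dt.month.isin([7, 8]).astype(int)  # placeholder rule
days["Weekday Holiday"] = (
    days["Date"].isin(holidays) & (days["Weekend"] == 0)
).astype(int)

# np.select takes the first matching condition, so list order sets precedence.
conditions = [
    days["Weekend"] == 1,
    days["Summer"] == 1,
    days["Weekday Holiday"] == 1,
]
labels = ["Weekend", "Summer Vacation", "Weekday Holiday"]
days["Day Type"] = np.select(conditions, labels, default="School Day")
```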
Plot Hourly Felony Rate in NYC by day type.
You can see that the overall level of felonies is much higher on school days than on holidays or weekend days. Let’s dive deeper and segment this analysis across the various types of offenses. We’ll want to calculate two things:
Number of crimes
Rate of crimes
And segment them across three attributes:
Day Type: school day, summer vacation, weekday holiday, weekend day
Offense Type: burglary, felony assault, grand larceny, grand larceny of motor vehicle, murder, rape and robbery
Occurrence Hour: hourly from 12am through 11pm
If we were working in Excel, this would be pretty time intensive. Thankfully the groupby and aggregate functions in pandas can take care of this pretty easily. Below we define an aggregation to count the number of occurrences. Then, we group the data frame by Offense, Occurrence Hour and Day Type and apply the aggregation. Finally, we divide the ‘hourly_rate’ aggregation by the appropriate number of days for each day type to convert crime counts into crime rates.
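The count-then-divide steps can be sketched like this on a toy frame. The `calendar` frame and the `days_per_type` and `count` names are hypothetical helpers of mine; the point is that the denominator is the number of days of each type in the period, not the number of crimes.

```python
import pandas as pd

# Toy incidents, already labeled with a day type.
crimes = pd.DataFrame({
    "Day Type": ["School Day", "School Day", "School Day", "Weekend"],
    "Offense": ["ROBBERY"] * 4,
    "Occurrence Hour": [15, 15, 15, 3],
})

# Hypothetical calendar with one row per day in the period.
calendar = pd.DataFrame({
    "Date": pd.to_datetime(["2015-11-25", "2015-11-28", "2015-11-30"]),
    "Day Type": ["School Day", "Weekend", "School Day"],
})
days_per_type = calendar.groupby("Day Type")["Date"].nunique()

# Count incidents for each (offense, hour, day type) combination.
counts = (
    crimes.groupby(["Offense", "Occurrence Hour", "Day Type"])
          .size()
          .rename("count")
          .reset_index()
)

# Divide each count by the number of days of the matching day type.
counts["hourly_rate"] = counts["count"] / counts["Day Type"].map(days_per_type)
```

In the toy data, three 3pm robberies across two school days give a rate of 1.5 per school day for that hour.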
Make a plot for each type of offense.
Robbery peaks on school days at 3pm, right when school gets out.
Burglary peaks on school days at 8am, before school starts.
Felony assault peaks on weekend days at 3am.
Grand larceny tends to be higher on school days, with peaks at 8am, 12pm and 3pm.
Motor vehicle theft peaks at 10pm on summer nights.
Murder peaks at 4am on weekends.
Rape peaks at midnight.
Some fascinating, albeit morbid, takeaways.
Which crimes happen most frequently?
So far we’ve looked at the various types of crimes individually, but how do they compare to one another? To answer this question, we will perform another aggregate / groupby function to count the number of occurrences by Day Type and Offense Type.
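A minimal sketch of that comparison, with invented rows. Summing across day types to rank the offenses is my own extra step.

```python
import pandas as pd

# Toy incidents labeled by day type and offense.
crimes = pd.DataFrame({
    "Day Type": ["School Day", "School Day", "School Day", "Weekend", "Weekend"],
    "Offense": ["GRAND LARCENY", "GRAND LARCENY", "ROBBERY",
                "ROBBERY", "GRAND LARCENY"],
})

# Rows are day types, columns are offenses, cells are incident counts.
by_type = (
    crimes.groupby(["Day Type", "Offense"])
          .size()
          .unstack(fill_value=0)
)

# Total per offense across all day types, most frequent first.
totals = by_type.sum(axis=0).sort_values(ascending=False)
```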
How have crime rates changed over time?
The good news: robbery is significantly lower in 2015 than 2006.