In NYC, more robberies happen right when school gets out than any other time.

I was inspired by this post on the fantastic blog I Quant NY. Author Ben Wellington analyzes the NYPD Major Felony Incidents data set provided by NYC Open Data. The below analysis will recreate Ben’s work using the Python programming language and even take it a bit further.

You may view my complete notebook here.

## Set up the environment

Import the necessary libraries.

Download the csv directly from the NYC Open Data site and load it into a data frame.

Preview the data frame and see what dtypes the columns are.
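As a rough sketch of these three steps, the snippet below loads a CSV into a data frame and previews it. The download URL is not reproduced here, so a tiny in-memory sample stands in for the real file; the rows are made up, and the `Occurrence Year` column name is an assumption about the data set's layout.

```python
import io

import pandas as pd

# Made-up stand-in for the downloaded csv (column names are assumptions).
sample_csv = io.StringIO(
    "Occurrence Date,Occurrence Hour,Occurrence Year,Offense\n"
    "01/05/2015 03:00:00 PM,15,2015,ROBBERY\n"
    "01/05/2015 11:30:00 PM,23,2015,BURGLARY\n"
    "07/04/2005 12:00:00 AM,0,2005,GRAND LARCENY\n"
)

# With the real data, pass the csv's URL or a local file path instead.
df = pd.read_csv(sample_csv)

# Preview the first rows and check which dtype pandas inferred per column.
print(df.head())
print(df.dtypes)
```

Note that `read_csv` does not parse the date column as a datetime unless asked to, which is why the cleaning step below converts types explicitly.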

## Clean the data

Select the years 2006 through 2015 into a new data frame.

Convert the columns in the data frame to the appropriate types.

Sort the data frame by Occurrence Date and then reset the index using reset_index.
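A minimal sketch of the cleaning steps, run on made-up rows. The `Occurrence Year` column, the float dtypes in the raw data, and the date string format are all assumptions for illustration.

```python
import pandas as pd

# Made-up raw rows; real data would come from the csv loaded earlier.
raw = pd.DataFrame({
    "Occurrence Date": [
        "03/02/2015 03:00:00 PM",
        "07/04/2005 12:00:00 AM",
        "01/05/2010 08:15:00 AM",
    ],
    "Occurrence Year": [2015.0, 2005.0, 2010.0],
    "Occurrence Hour": [15.0, 0.0, 8.0],
    "Offense": ["ROBBERY", "GRAND LARCENY", "BURGLARY"],
})

# Keep 2006 through 2015 (between() is inclusive); copy so that the
# assignments below do not modify `raw`.
df = raw[raw["Occurrence Year"].between(2006, 2015)].copy()

# Convert columns to appropriate types.
df["Occurrence Date"] = pd.to_datetime(
    df["Occurrence Date"], format="%m/%d/%Y %I:%M:%S %p"
)
df["Occurrence Year"] = df["Occurrence Year"].astype(int)
df["Occurrence Hour"] = df["Occurrence Hour"].astype(int)

# Sort chronologically and replace the old index with a fresh 0..n-1 index.
df = df.sort_values("Occurrence Date").reset_index(drop=True)
```

Passing `drop=True` to `reset_index` discards the old index instead of keeping it as an extra column.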

## What times of day do major felonies happen?

Group by Occurrence Hour and Offense, then count() the number of rows.
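The grouping might look something like the sketch below, on made-up rows. It uses `size()` rather than `count()` so that each group yields a single number regardless of which columns contain missing values; that swap is my choice for the illustration, not necessarily what the notebook does.

```python
import pandas as pd

# Made-up incidents for illustration.
df = pd.DataFrame({
    "Occurrence Hour": [15, 15, 15, 0, 0],
    "Offense": ["ROBBERY", "ROBBERY", "BURGLARY", "ROBBERY", "BURGLARY"],
})

# Count rows per (hour, offense) pair, then pivot Offense into columns
# so each offense becomes one series indexed by hour.
counts = (
    df.groupby(["Occurrence Hour", "Offense"])
      .size()
      .unstack("Offense", fill_value=0)
)
print(counts)
```

The wide layout, with one column per offense, is convenient because calling `counts.plot()` would then draw one line per offense type.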

Plot the number of crimes grouped by Occurrence Hour and Offense type.

Nice! We’re starting to get somewhere. The chart above shows the absolute number of crimes by hour over the 2006-2015 period. However, we want to see the hourly crime rate rather than the absolute number of crimes. So, we need to divide the grouped data frame by the number of days in the time period to get the hourly crime rates that we want.

Convert the absolute number of crimes to hourly rates by dividing by the number of days in the period.
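Here is a small sketch of that conversion. It assumes the data covers every day from January 1, 2006 through December 31, 2015, and the counts are made up so the arithmetic is easy to follow.

```python
import pandas as pd

# Made-up counts per hour and offense (wide layout: one column per offense).
counts = pd.DataFrame(
    {"BURGLARY": [3652, 7304], "ROBBERY": [1826, 10956]},
    index=pd.Index([0, 15], name="Occurrence Hour"),
)

# Number of calendar days in the period, inclusive of both endpoints
# (assumes full coverage of 2006 through 2015).
n_days = (pd.Timestamp("2015-12-31") - pd.Timestamp("2006-01-01")).days + 1

# Average number of crimes per day in each hour slot.
hourly_rate = counts / n_days
print(n_days)
print(hourly_rate)
```

Dividing a data frame by a scalar applies to every cell, so the shape and labels of `counts` carry over unchanged.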

This matches up nicely with Ben’s chart:

It appears that there are peaks around midnight and noon. There are also peaks around 8am and 3pm. This happens to be approximately when school starts and gets out in New York.

Is it possible that school getting out is responsible for higher crime rates? Let’s dig deeper to find out.

Let’s segment school days from non-school days to tease out the effect of school hours on crime rates. Specifically, categorize each day as a school day, summer vacation, weekday holiday or weekend day. To pinpoint the weekday holidays I referred to this NYC school calendar.

PROBLEM: Only the 2015-2016 academic calendar is available here. So, I make the obviously flawed assumption that the academic recesses and holidays occurred on the same dates in 2006 through 2014 as they did in 2015. Ben also created his own school ‘vacation weekday’ sub-data set. It’s unclear exactly how he made it, so I’ll make this assumption for now, move forward with the analysis, and ask him :) .

## Do crime rates increase when school gets out?

In order to tease out crime trends on school days, I want to categorize each day from 2006 to 2015 as either a 1) School Day, 2) Weekday Holiday, 3) Weekend or 4) Summer Vacation.

In order to do so I’ll need to insert a Date column into the dataframe that is simply a date (rather than a datetime stamp).

I created a csv of school year holidays in 2015-2016 per http://schools.nyc.gov/Calendar/default.htm

Indicate whether days are Summer, Weekend, Weekday Holiday or School Days using 1’s and 0’s.

Set Day Type equal to the correct type of day.
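One way the day-type labelling could be sketched is shown below. The holiday dates, the definition of summer as July and August, and the precedence order (weekend, then summer, then holiday) are all assumptions for illustration; the real holiday list would come from the csv mentioned above.

```python
import numpy as np
import pandas as pd

# Made-up incidents, one per kind of day.
df = pd.DataFrame({"Occurrence Date": pd.to_datetime([
    "2015-03-02 15:00",  # a Monday
    "2015-03-07 01:00",  # a Saturday
    "2015-07-15 12:00",  # a Wednesday in July
    "2015-11-26 20:00",  # a Thursday listed as a holiday below
])})

# Date column with the time of day stripped off (midnight timestamps).
df["Date"] = df["Occurrence Date"].dt.normalize()

# Hypothetical holiday list standing in for the school-calendar csv.
holidays = pd.to_datetime(["2015-11-26", "2015-11-27"])

# Boolean masks, made mutually exclusive by the chosen precedence.
is_weekend = df["Date"].dt.dayofweek >= 5  # Saturday=5, Sunday=6
is_summer = df["Date"].dt.month.isin([7, 8]) & ~is_weekend
is_holiday = df["Date"].isin(holidays) & ~is_weekend & ~is_summer
is_school = ~(is_weekend | is_summer | is_holiday)

# 1/0 indicator columns.
df["Weekend"] = is_weekend.astype(int)
df["Summer"] = is_summer.astype(int)
df["Weekday Holiday"] = is_holiday.astype(int)
df["School Day"] = is_school.astype(int)

# Single categorical label per row; the first matching condition wins.
df["Day Type"] = np.select(
    [is_weekend, is_summer, is_holiday],
    ["Weekend", "Summer Vacation", "Weekday Holiday"],
    default="School Day",
)
```

Because the masks are mutually exclusive, each row has exactly one indicator set to 1, which makes it easy to sanity-check the labelling by summing the four indicator columns.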

Plot Hourly Felony Rate in NYC by day type.

You can see that the overall level of felonies is much higher on school days than holiday or weekend days. Let’s dive deeper and segment this analysis across the various types of offenses. We’ll want to calculate two things:

1. Number of crimes
2. Rate of crimes

And segment them across three attributes:

1. Day Type: school day, summer vacation, weekday holiday, weekend day
2. Offense Type: burglary, felony assault, grand larceny, grand larceny of motor vehicle, murder, rape and robbery
3. Occurrence Hour: hourly from 12am through 11pm

If we were working in Excel, this would be pretty time intensive. Thankfully, the groupby and aggregate functions in Python’s pandas library can take care of this pretty easily. Below we define an aggregation to count the number of occurrences. Then, we group the data frame by Offense, Occurrence Hour and Day Type and apply the aggregation. Finally, we divide the ‘hourly_rate’ aggregation by the appropriate number of days for each day type to convert crime counts into crime rates.
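The grouping and rate conversion could be sketched roughly as follows, on made-up rows. Here the number of days per day type is taken from the distinct dates present in the data, which is a simplifying assumption: a day with no recorded crime would be missed, so counting days from a full calendar would be more accurate.

```python
import pandas as pd

# Made-up incidents already labelled with a Day Type.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-03-02", "2015-03-02", "2015-03-03", "2015-03-07", "2015-03-07",
    ]),
    "Occurrence Hour": [15, 15, 15, 1, 15],
    "Offense": ["ROBBERY", "ROBBERY", "ROBBERY", "ROBBERY", "BURGLARY"],
    "Day Type": ["School Day", "School Day", "School Day", "Weekend", "Weekend"],
})

# Number of distinct days observed for each day type (simplification).
days_per_type = df.groupby("Day Type")["Date"].nunique()

# Count incidents per (offense, hour, day type) combination.
counts = (
    df.groupby(["Offense", "Occurrence Hour", "Day Type"])
      .size()
      .rename("count")
      .reset_index()
)

# Divide each count by the number of days of the matching day type.
counts["hourly_rate"] = counts["count"] / counts["Day Type"].map(days_per_type)
print(counts)
```

Normalizing by day type matters because there are far more school days than weekday holidays in a year, so raw counts alone would overstate the school-day effect.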

Make a plot for each type of offense.

### Rape peaks at midnight.

Some fascinating, albeit morbid, takeaways.

## Which crimes happen most frequently?

So far we’ve looked at the various types of crimes individually, but how do they compare to one another? To answer this question, we will perform another aggregate / groupby function to count the number of occurrences by Day Type and Offense Type.
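As an illustration of this kind of two-way count, the sketch below uses `pd.crosstab`, which is a shorthand for grouping by two columns and counting. The rows are made up, and the choice of `crosstab` over an explicit groupby is mine.

```python
import pandas as pd

# Made-up incidents labelled with Day Type and Offense.
df = pd.DataFrame({
    "Day Type": ["School Day", "School Day", "School Day", "Weekend", "Weekend"],
    "Offense": ["GRAND LARCENY", "GRAND LARCENY", "ROBBERY", "ROBBERY", "GRAND LARCENY"],
})

# Rows are offenses, columns are day types, cells are incident counts.
by_type = pd.crosstab(df["Offense"], df["Day Type"])

# Total per offense across all day types, most frequent first.
ranked = by_type.sum(axis=1).sort_values(ascending=False)
print(by_type)
print(ranked)
```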

## How have crime rates changed over time?

The good news: robbery is significantly lower in 2015 than 2006.
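A comparison like this can be sketched by counting incidents per year and offense and then computing the relative change between the first and last year. The rows below are made up purely to show the mechanics and say nothing about the real figures; the `Occurrence Year` column name is an assumption.

```python
import pandas as pd

# Made-up incidents; the real data has many rows per year.
df = pd.DataFrame({
    "Occurrence Year": [2006, 2006, 2006, 2015, 2015, 2015],
    "Offense": ["ROBBERY", "ROBBERY", "BURGLARY", "ROBBERY", "BURGLARY", "BURGLARY"],
})

# Incident counts with one row per year and one column per offense.
yearly = (
    df.groupby(["Occurrence Year", "Offense"])
      .size()
      .unstack("Offense", fill_value=0)
)

# Relative change from 2006 to 2015 for each offense (negative = decrease).
change = (yearly.loc[2015] - yearly.loc[2006]) / yearly.loc[2006]
print(change)
```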