The leading causes of death by sex and ethnicity in New York City since 2007.
The New York City Department of Health and Mental Hygiene (DOHMH) publishes data on the leading causes of death in New York City for the years 2007 through 2011. You can view the data set on NYC OpenData.
This is what it looks like:
Let’s see what this data tells us about health and death in New York City.
A few data requests for the DOHMH:
DATE: Please include the date of each death, rather than only the year.
AGE: Please include the age of each individual at the time of death.
Below, let’s use Python and several of its popular libraries to load, clean and explore the data.
Set up the environment
Import the necessary libraries.
Set some visual parameters.
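The notebook's setup cell is not reproduced in the post, so here is a minimal sketch of what it might contain, assuming pandas for the data and matplotlib for the charts. The style sheet, figure size and display option are illustrative choices, not necessarily the author's.

```python
import matplotlib

# Non-interactive backend so the sketch also runs outside a notebook;
# omit this line in Jupyter if you want inline plots.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import pandas as pd

# Visual parameters (illustrative choices): a built-in style sheet and a
# wider default figure size for charts with many bars.
plt.style.use("ggplot")
plt.rcParams["figure.figsize"] = (12, 6)

# Show every column when previewing a DataFrame instead of truncating.
pd.set_option("display.max_columns", None)
```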
Load the raw data
Download the CSV directly from the NYC OpenData site and load it into a DataFrame.
Preview the DataFrame and check the dtype of each column.
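A minimal sketch of the load-and-preview step. The post does not give the download link, so `CSV_URL` is a hypothetical placeholder, and the column names `Year` and `Count` are assumptions (only Ethnicity, Sex and Cause of Death are named in the text). To keep the example runnable offline it reads a few made-up toy rows through `io.StringIO`; `pd.read_csv` accepts a URL, a path or a file-like object alike.

```python
import io

import pandas as pd

# Hypothetical placeholder: paste the CSV export URL from the dataset's
# NYC OpenData page here.
CSV_URL = "<CSV export URL from NYC OpenData>"


def load_raw_data(source):
    """Read the leading-causes-of-death CSV into a DataFrame.

    pd.read_csv accepts a URL, a local path or a file-like object.
    """
    return pd.read_csv(source)


# With network access: raw_data = load_raw_data(CSV_URL)
# Offline stand-in: made-up toy rows (not DOHMH figures) in the assumed layout.
toy_csv = io.StringIO(
    "Year,Ethnicity,Sex,Cause of Death,Count\n"
    "2007,White,F,Heart Disease,12\n"
    "2007,Black,M,Cancer,7\n"
    "2008,White,M,Heart Disease,10\n"
)
raw_data = load_raw_data(toy_csv)

print(raw_data.head())  # first five rows (all three here)
print(raw_data.dtypes)  # Year and Count parse as integers; text columns load as strings
```

Note that the text columns come in as plain strings rather than categories, which is what the cleaning step addresses.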
Clean the data
Assign raw_data to a new DataFrame for cleaning.
Convert columns to their appropriate dtypes. We’ll want Ethnicity, Sex and Cause of Death to be categorical.
Sort the data chronologically.
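The three cleaning steps might look like this sketch, run on made-up toy rows with the assumed `Year` and `Count` columns. One detail worth knowing: in pandas, `df = raw_data` only binds a second name to the same object, so the sketch uses `.copy()` to get a genuinely new DataFrame and leave `raw_data` untouched.

```python
import pandas as pd

# Made-up toy rows (not DOHMH figures), deliberately out of chronological order.
raw_data = pd.DataFrame({
    "Year": [2009, 2007, 2008],
    "Ethnicity": ["White", "Black", "White"],
    "Sex": ["M", "F", "F"],
    "Cause of Death": ["Cancer", "Heart Disease", "Heart Disease"],
    "Count": [5, 7, 6],
})

# .copy() makes an independent DataFrame; a plain `df = raw_data` would only
# create a second name for the same object, so cleaning would alter raw_data.
df = raw_data.copy()

# Store the descriptive columns as categoricals.
for col in ["Ethnicity", "Sex", "Cause of Death"]:
    df[col] = df[col].astype("category")

# Sort chronologically (a stable sort keeps same-year rows in their original
# order) and renumber the index to match the new order.
df = df.sort_values("Year", kind="stable").reset_index(drop=True)

print(df.dtypes)
```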
Explore the data
See what categories (and how many) there are for each categorical column.
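Once the columns are categorical, their levels are available through the `.cat` accessor. A minimal sketch on made-up toy rows:

```python
import pandas as pd

# Made-up toy rows (not DOHMH figures) with the three categorical columns.
df = pd.DataFrame({
    "Ethnicity": ["White", "Black", "White", "Hispanic"],
    "Sex": ["F", "M", "F", "M"],
    "Cause of Death": ["Heart Disease", "Cancer", "Cancer", "Heart Disease"],
}).astype("category")

# For every categorical column, record its categories and how many there are.
category_summary = {
    col: list(df[col].cat.categories)
    for col in df.select_dtypes(include="category").columns
}
for col, cats in category_summary.items():
    print(f"{col}: {len(cats)} categories -> {cats}")
```

`df[col].value_counts()` gives the number of rows per category instead, if that is the more useful view.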
Use groupby and sum to calculate the absolute number of deaths in each category for each year.
Make some initial plots.
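A sketch of the groupby-and-sum step, assuming the death counts live in a column named `Count` (an assumption; the post does not name it). The hypothetical helper `deaths_by_year` groups by year and one category, sums, and pivots the category levels into columns. `observed=True` only matters for categorical keys: it drops year/category combinations that never occur instead of filling them with zeros.

```python
import pandas as pd

# Made-up toy rows (not DOHMH figures); "Count" is an assumed column name.
df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2008, 2008, 2008],
    "Sex": ["F", "M", "F", "F", "M", "M"],
    "Cause of Death": ["Heart Disease", "Heart Disease", "Cancer",
                       "Cancer", "Heart Disease", "Cancer"],
    "Count": [4, 3, 2, 5, 1, 6],
})


def deaths_by_year(data, category):
    """Total deaths per year (rows) for each level of `category` (columns)."""
    return (
        data.groupby(["Year", category], observed=True)["Count"]
        .sum()
        .unstack(category, fill_value=0)
    )


by_sex = deaths_by_year(df, "Sex")
by_cause = deaths_by_year(df, "Cause of Death")
print(by_sex)
print(by_cause)
```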
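The original charts are not reproduced here, but a grouped bar chart per category is one plausible way to draw them. This sketch assumes matplotlib via pandas' `DataFrame.plot`, and uses made-up toy rows with the assumed `Year` and `Count` columns; each panel shows one cluster of bars per year and one bar per category level.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit inside Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy rows (not DOHMH figures); column names are assumptions.
df = pd.DataFrame({
    "Year": [2007, 2007, 2008, 2008],
    "Sex": ["F", "M", "F", "M"],
    "Ethnicity": ["White", "Black", "Black", "White"],
    "Cause of Death": ["Heart Disease", "Cancer", "Heart Disease", "Cancer"],
    "Count": [9, 6, 8, 7],
})

categories = ["Sex", "Ethnicity", "Cause of Death"]
fig, axes = plt.subplots(1, len(categories), figsize=(15, 4))

for ax, col in zip(axes, categories):
    # Deaths per year (rows) for each level of the category (columns).
    table = df.groupby(["Year", col])["Count"].sum().unstack(col, fill_value=0)
    # Grouped bar chart: one cluster of bars per year, one bar per level.
    table.plot(kind="bar", ax=ax, rot=0, title=f"Deaths by {col}")
    ax.set_ylabel("Deaths")

fig.tight_layout()
```

With the real data the Cause of Death panel has many more bars per year, which is why it is hard to read.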
The Cause of Death chart above isn’t very useful yet, but it does suggest that two causes are much more common than the others. There are too many bars to read clearly, but I’m going to guess that heart disease and cancer (i.e. malignant neoplasms) are the most common causes.
It’s also worth noting that the most common cause of death (which appears to be heart disease) declined steadily from 2007 through 2011.
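Rather than guessing from a crowded chart, the leading causes can be ranked directly. A minimal sketch on made-up toy rows (not DOHMH figures), assuming the `Count` column: sum deaths per cause over all years, take the two largest with `nlargest`, and compute their combined share of all deaths.

```python
import pandas as pd

# Made-up toy rows (not DOHMH figures).
df = pd.DataFrame({
    "Year": [2007] * 4 + [2008] * 4,
    "Cause of Death": ["Heart Disease", "Cancer", "Influenza", "Diabetes"] * 2,
    "Count": [30, 20, 3, 2, 20, 20, 2, 3],
})

# Total deaths per cause over all years, largest first.
totals = df.groupby("Cause of Death")["Count"].sum().sort_values(ascending=False)

# The two most common causes and their combined share of all deaths.
top_two = totals.nlargest(2)
top_two_share = top_two.sum() / totals.sum()
print(top_two)
print(f"Share of all deaths: {top_two_share:.0%}")
```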
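The "steady decline" can also be checked numerically instead of read off the chart. This sketch, on made-up toy rows for 2007 through 2011 with the assumed `Count` column, finds the cause with the most deaths overall and tests whether every year-over-year change is negative. `Series.is_monotonic_decreasing` would be a looser test, since it also accepts years with no change.

```python
import pandas as pd

# Made-up toy rows (not DOHMH figures) covering 2007 through 2011.
years = [2007, 2008, 2009, 2010, 2011]
df = pd.DataFrame({
    "Year": years * 2,
    "Cause of Death": ["Heart Disease"] * 5 + ["Cancer"] * 5,
    "Count": [100, 95, 93, 90, 85, 80, 81, 80, 82, 81],
})

# Deaths per year (rows) for each cause (columns).
yearly = df.groupby(["Year", "Cause of Death"])["Count"].sum().unstack()

# The most common cause over the whole period.
top_cause = yearly.sum().idxmax()
trend = yearly[top_cause]

# "Steadily declining" read strictly: every year-over-year change is negative.
strictly_declining = bool((trend.diff().dropna() < 0).all())
print(top_cause, strictly_declining)
print(trend.pct_change().dropna().round(3))  # year-over-year change as a fraction
```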
Let’s start by examining the biggest cause of death: heart disease.
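A sketch of isolating heart disease and breaking it down by ethnicity. The exact label DOHMH uses for heart disease is not shown in the post, so the case-insensitive match on "heart" is an assumption; check the real categories before relying on it. The rows are made-up toy values with the assumed `Year` and `Count` columns.

```python
import pandas as pd

# Made-up toy rows (not DOHMH figures).
df = pd.DataFrame({
    "Year": [2007, 2007, 2007, 2008, 2008, 2008],
    "Ethnicity": ["White", "Black", "White", "White", "Black", "Black"],
    "Cause of Death": ["Heart Disease", "Heart Disease", "Cancer",
                       "Heart Disease", "Heart Disease", "Cancer"],
    "Count": [50, 20, 30, 45, 19, 25],
}).astype({"Cause of Death": "category"})

# Assumption: the heart-disease label contains the word "heart".
is_heart = df["Cause of Death"].astype(str).str.contains("heart", case=False)
heart = df[is_heart]

# Heart-disease deaths per year (rows) by ethnicity (columns).
heart_by_ethnicity = (
    heart.groupby(["Year", "Ethnicity"])["Count"].sum().unstack("Ethnicity")
)
print(heart_by_ethnicity)
```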
It turns out that deaths from heart disease have been dropping steadily in America since 1970. One doctor attributes this trend to healthier diets, more exercise, less cigarette smoking, the use of high-blood-pressure medication and the widespread acceptance of statin drugs to control cholesterol levels (read more here).
Deaths from heart disease in New York City have dropped the most for white people. Why is that?
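Which group "dropped the most" can depend on the measure: a group with more deaths to begin with can show the largest absolute drop without the largest percentage drop. This sketch computes both from a made-up toy table of heart-disease deaths (rows are years, columns are ethnicity groups); the numbers are invented so that the two measures disagree. Neither measure accounts for changes in each group's population size, which would need population figures not described in this post.

```python
import pandas as pd

# Made-up toy heart-disease totals (not DOHMH figures).
heart_by_ethnicity = pd.DataFrame(
    {"White": [100, 70], "Black": [40, 30], "Hispanic": [20, 12]},
    index=pd.Index([2007, 2011], name="Year"),
)

first_year = heart_by_ethnicity.index.min()
last_year = heart_by_ethnicity.index.max()
baseline = heart_by_ethnicity.loc[first_year]

# Absolute change in deaths and percentage change relative to the first year.
abs_change = heart_by_ethnicity.loc[last_year] - baseline
pct_change = abs_change / baseline * 100

summary = pd.DataFrame({"abs_change": abs_change, "pct_change": pct_change.round(1)})
print(summary)
print("Largest absolute drop:", abs_change.idxmin())
print("Largest relative drop:", pct_change.idxmin())
```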