Working with the Census API

A friend of mine was complaining to me the other day about trying to get data from the American Community Survey. Fortunately, it turns out that the Census has a fantastic and well-documented API. So I volunteered to help. I read through this guide, found a helpful page for looking up geocodes from the Missouri Census Data Center, and looked through a rather cumbersome list of the variables you can request.

My friend needed population, housing info and income brackets for several metropolitan areas in the US: Boston, Denver, Jacksonville, Miami, Pittsburgh, Sacramento and Springfield, MA.

Here’s what I came up with:

import pandas as pd
import json
import requests

# API request, divided into manageable chunks

url_string = str("https://api.census.gov/data/2015/acs1?get=" +
	"NAME," + 
	"B01003_001E," +
	"B01003_001M," +
	"C25032_001E," +
	"C25032_001M," +
	"C25032_002E," +
	"C25032_002M," + 
	"C25032_003E," + 
	"C25032_003M," + 
	"C25032_007E," + 
	"C25032_007M," + 
	"C25032_008E," + 
	"C25032_008M," + 
	"C25121_012E," +
	"C25121_012M," +
	"C25121_017E," +
	"C25121_017M," +
	"C25121_022E," +
	"C25121_022M," +
	"C25121_027E," +
	"C25121_027M&" +
	"for=urban%20area:69697,56602,42346,23527,09271,83926,77068&" +
	"key=get_your_own_census_key")

# Dictionary for renaming columns

col_dict = {
	"NAME"        : "URBANIZED_AREA_NAME",
	"B01003_001E" : "POPULATION",
	"B01003_001M" : "POPULATION_ME",
	"C25032_001E" : "TOTAL_HOUSING_UNITS",
	"C25032_001M" : "TOTAL_HOUSING_ME",
	"C25032_002E" : "OWNER_OCCUPIED_TOTAL",
	"C25032_002M" : "OWNER_OCCUPIED_TOTAL_ME",
	"C25032_003E" : "OWNER_OCCUPIED_1",
	"C25032_003M" : "OWNER_OCCUPIED_1_ME",
	"C25032_007E" : "RENTER_OCCUPIED_TOTAL",
	"C25032_007M" : "RENTER_OCCUPIED_TOTAL_ME",
	"C25032_008E" : "RENTER_OCCUPIED_1",
	"C25032_008M" : "RENTER_OCCUPIED_1_ME",
	"C25121_012E" : "INCOME_20_34K",
	"C25121_012M" : "INCOME_20_35K_ME",
	"C25121_017E" : "INCOME_35_49K",
	"C25121_017M" : "INCOME_35_49K_ME",
	"C25121_022E" : "INCOME_50_74K",
	"C25121_022M" : "INCOME_50_74K_ME",
	"C25121_027E" : "INCOME_75_UP",
	"C25121_027M" : "INCOME_75_UP_ME",
	"urban area"  : "URBANIZED_AREA_CODE"
}

# Get info from Census website

response = requests.get(url_string).json()

# Match Census column names with readable ones

col_list = []

for item in response[0]:
	col_list.append(col_dict[item])

# Create DataFrame

df = pd.DataFrame(response[1:], columns=col_list)

# Write DataFrame to CSV file

df.to_csv("/filepath/census data.csv", index=False)

The first chunk of this code generates a long URL address which asks the Census API for the data. You can click here to see what it looks like. If you’re familiar with NumPy, it’ll look like a NumPy array — or a list of lists in which each list constitutes a row in the data table.

The second chunk is a dictionary that associates the Census column codes with readable titles. This enabled me to take the first row of the array I got from the Census and create a list of column names. From there, I turned the array into a DataFrame and then into a CSV.

But wait, there’s more! I’ve been learning how to do data visualization with matplotlib. Using this example as a model, I made the following bar chart to look at income brackets (I’ll spare you the code):

This is helpful, but it’s hard to compare the different urban areas because of the differences in population. This second figure solves that problem by looking at income brackets as a percentage:

This is much more helpful. For example, look at Jacksonville vs. Boston. The income distribution is skewed towards the highest income bracket in the latter.

Written on November 3, 2017