San Diego Datascape: Help


All distance, area, containment, intersection, and bounding-box calculations are PostGIS queries run through the CartoDB API, without which this project would have been impossible. Thanks to Andrew at Vizzuality and the entire CartoDB team.

The mesh is based on 2010 census tract shapefiles. We used the excellent NHGIS system to browse and retrieve census data:

Minnesota Population Center. National Historical Geographic Information System: Version 2.0. Minneapolis, MN: University of Minnesota, 2011.

The neighborhood names for San Diego City were taken from the San Diego City Police Department's neighborhood data:

Areas in other incorporated cities have the name of the relevant city. Unincorporated areas were named by hand or given a generic name.

The census tract polygons are rendered using the Google Earth API:

Residential, Industrial, Commercial and Rural Uses and Airport Noise

The information about commercial, industrial, residential and rural land use comes from the SANDAG LANDUSE_CURRENT data set listed on the Apps Challenge site. Airport noise levels were assigned to areas subjectively, based on the map on the last page of the San Diego International Airport's 2011 Third Quarter Noise Report:

As an aside, check out this nifty Web site, which lets you see replays of San Diego air traffic and the associated noise-level data:

Tracts are "bluer" the less they contain in the way of office buildings, industry, open and agricultural space, and airport noise. Open space can often be considered a plus (unlike airport noise), but it is hard to model whether a given stretch of it is beautiful hiking land or parched desert. Parks do not count as open space and do not reduce the residentiality rating.
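
The coloring rule above can be read as a score. This sketch shows only one plausible reading. It assumes (hypothetically) that the score is one minus the share of non-park tract area in non-residential uses, minus a subjective airport-noise penalty between 0 and 1. The land-use labels, the formula and the function name are illustrative, not the site's actual model.

```python
# Hypothetical sketch: labels, formula, and noise penalty are assumptions,
# not the site's actual residentiality model.
NON_RESIDENTIAL = {"commercial", "industrial", "open", "agricultural"}
IGNORED = {"park"}  # parks neither count as open space nor reduce the score


def residentiality(land_use_areas, noise_penalty=0.0):
    """Return a 0-1 score; higher means "bluer" (more purely residential).

    land_use_areas maps a land-use label to the area it covers in the tract.
    noise_penalty is a subjective 0-1 value read off an airport noise map.
    """
    # Leave park land out of the denominator so it has no effect at all.
    total = sum(a for use, a in land_use_areas.items() if use not in IGNORED)
    if total <= 0:
        return 0.0  # no non-park land to score
    non_res = sum(a for use, a in land_use_areas.items() if use in NON_RESIDENTIAL)
    score = 1.0 - non_res / total - noise_penalty
    return max(0.0, min(1.0, score))  # clamp to [0, 1]
```

Take a tract covered by 60 units of residential land, 30 of commercial and 10 of park. It scores 1 - 30/90, about 0.67, and removing the park land leaves that unchanged.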

Price

The price data comes from this table:

We used the median resale figures for single-family homes or condominiums, or the average of the two when both were available. Missing data was filled in using a weighted average of the surrounding tracts.
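
Here is a minimal sketch of the price step, assuming the per-tract medians are already in hand. The text does not say how neighboring tracts were weighted, so the neighbor weights are a hypothetical stand-in (for example, shared border length). The function names are illustrative.

```python
from typing import Dict, Optional


def tract_price(single_family: Optional[float], condo: Optional[float]) -> Optional[float]:
    """Median resale price for a tract: the average when both figures exist."""
    if single_family is not None and condo is not None:
        return (single_family + condo) / 2
    return single_family if single_family is not None else condo


def fill_from_neighbors(
    values: Dict[str, Optional[float]],
    neighbors: Dict[str, Dict[str, float]],
) -> Dict[str, Optional[float]]:
    """Fill missing tract values with a weighted average of neighboring tracts.

    neighbors[t] maps each adjacent tract id to a weight (hypothetical; e.g.
    the length of the shared border). Only originally known values are used,
    so a tract with no known neighbors stays None.
    """
    filled = dict(values)
    for tract, value in values.items():
        if value is not None:
            continue
        # The `if` clause filters out unknown neighbors before values[n] is read.
        known = [
            (values[n], w)
            for n, w in neighbors.get(tract, {}).items()
            if values.get(n) is not None
        ]
        total_weight = sum(w for _, w in known)
        if total_weight > 0:
            filled[tract] = sum(v * w for v, w in known) / total_weight
    return filled
```

A single pass keeps the result independent of the order in which tracts are visited. Filling tracts from other filled-in tracts would need repeated passes.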

Commute

Commute is measured simply as the straight-line distance from the centroid of the tract to the selected location. Modeling the transportation grid and estimating travel times would have been too difficult.
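
The site ran its distance calculations as PostGIS queries. The following is only an offline illustration of the same idea in plain Python: an area-weighted polygon centroid followed by a great-circle (haversine) distance. It assumes tract boundaries are given as (longitude, latitude) rings. It treats those coordinates as planar when computing the centroid, which is a fair approximation at tract scale. The function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(lon, lat), ...].

    Uses the shoelace formula on raw lon/lat, treating them as planar.
    """
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def commute_miles(tract_ring, destination):
    """Straight-line distance from a tract's centroid to a (lon, lat) destination."""
    lon, lat = polygon_centroid(tract_ring)
    return haversine_miles(lon, lat, destination[0], destination[1])
```

One degree of latitude comes out to roughly 69.09 miles with this radius.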

Crime

Crime information is taken from ARJIS, which aggregates data from various San Diego law enforcement agencies:

We used data for January through December 2011. This data is useful but has limitations: it does not give the population of each reporting area, and the boundaries of the reporting regions are not clearly defined.

Luckily, for San Diego City, the Police Department also provides per capita figures:

The Police Department's neighborhood map, listed above, also gives clearly defined boundaries. Data for some neighborhoods was excluded because their populations were too small to allow meaningful per capita figures; the cutoff was a population of around 2,500.
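
The per capita step with a population cutoff can be sketched as below. The rate unit (per 1,000 residents), the data layout and the function name are assumptions for illustration. Only the rough 2,500 cutoff comes from the text.

```python
MIN_POPULATION = 2500  # approximate cutoff stated in the text


def per_capita_rates(counts, populations, per=1000):
    """Crime counts per `per` residents, skipping neighborhoods that are too small.

    counts and populations map a neighborhood name to a number. Skipped
    neighborhoods are simply absent from the result, leaving them to be
    filled in some other way.
    """
    rates = {}
    for area, count in counts.items():
        population = populations.get(area, 0)
        if population < MIN_POPULATION:
            continue  # too few residents for a meaningful per-capita figure
        rates[area] = count * per / population
    return rates
```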

For other incorporated cities, each city's crime totals were divided by its population (taken from the Web) to arrive at a single value for the entire city. The exceptions were Oceanside and Chula Vista, where we found maps of the section breakdown accurate enough to generate meaningful per capita numbers for parts of the city. Here is the one for Oceanside:

For unincorporated areas, some guesswork was needed to estimate the population. Tracts without clear data were simply left empty. Finally, all tracts still without data (some unincorporated areas and sparsely populated parts of San Diego City) were assigned the weighted average of their neighboring tracts. An upper bound of 75% was applied to tracts in the Desert section, which are so large and open that it is hard to imagine them ever being as safe as a patrolled urban area.

To compute a crime figure for an area, we multiplied the murder and rape counts by 5, the assault and robbery counts by 2, and the residential burglary count by 1, then added the results together. Tracts with a high score are colored red (overriding the "residentiality" coloring).
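
The weighting above as a short sketch. The weights are the ones stated in the text. The category keys and the function name are hypothetical labels, and the threshold for coloring a tract red is not specified, so it is left out.

```python
# Weights as stated in the text; the category keys are hypothetical labels.
CRIME_WEIGHTS = {
    "murder": 5,
    "rape": 5,
    "assault": 2,
    "robbery": 2,
    "residential_burglary": 1,
}


def crime_score(counts):
    """Weighted sum of incident counts; categories not listed are ignored."""
    return sum(weight * counts.get(kind, 0) for kind, weight in CRIME_WEIGHTS.items())
```

For example, one murder and ten residential burglaries give 5 + 10 = 15.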

Parks

Park land within half a mile of a tract increases the tract's desirability. We used the PARKS_ACTIVE_USE data set from the Apps Challenge to locate parks, because the PARKS_CN and PARKS_SD data sets did not cover other incorporated cities. The park desirability factor is proportional to the percentage of the local area around the tract which is occupied by park land.
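
The site computed this with PostGIS queries. As a self-contained illustration, this sketch estimates the same quantity in plain Python. It samples a grid of points over the half-mile local area around a tract and counts how many fall inside park polygons. It assumes coordinates are already projected to miles. The names and the grid step are hypothetical, and overlapping parks are counted once.

```python
import math

HALF_MILE = 0.5  # radius of the local area, in miles


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices in miles."""
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def dist_to_segment(px, py, ax, ay, bx, by):
    """Shortest distance from point P to segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_distance(x, y, poly, d):
    """True if (x, y) is inside poly or within d of its boundary."""
    if point_in_polygon(x, y, poly):
        return True
    n = len(poly)
    return any(dist_to_segment(x, y, *poly[i], *poly[(i + 1) % n]) <= d for i in range(n))


def park_fraction(tract, parks, radius=HALF_MILE, step=0.02):
    """Estimate the share of the local area around `tract` covered by `parks`."""
    xs = [p[0] for p in tract]
    ys = [p[1] for p in tract]
    local = park = 0
    x = min(xs) - radius + step / 2  # sample at grid-cell centers
    while x < max(xs) + radius:
        y = min(ys) - radius + step / 2
        while y < max(ys) + radius:
            if within_distance(x, y, tract, radius):
                local += 1
                if any(point_in_polygon(x, y, pk) for pk in parks):
                    park += 1
            y += step
        x += step
    return park / local if local else 0.0
```

A smaller `step` trades speed for accuracy. The golf course factor described below could reuse the same function with golf course polygons.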

Youth

The youth factor is proportional to the percentage of the population that is between the ages of 25 and 44. The population data comes from NHGIS, as listed above.
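
A minimal sketch of the youth share, assuming age counts arrive as inclusive (low, high) brackets. The bracket layout is hypothetical, since actual NHGIS tables use their own bins. Brackets that straddle 25 or 44 are excluded by this simple rule.

```python
def youth_fraction(age_counts):
    """Share of residents aged 25 to 44.

    age_counts maps inclusive (low, high) age brackets to head counts
    (hypothetical layout). Only brackets fully inside 25-44 are counted.
    """
    total = sum(age_counts.values())
    if total == 0:
        return 0.0
    young = sum(c for (lo, hi), c in age_counts.items() if lo >= 25 and hi <= 44)
    return young / total
```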

Bike Lanes

We used the bike route data from the Apps Challenge. The bike factor is proportional to the total length of all bike paths that fall within the local area around the tract.
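
Here is a plain-Python sketch of the bike measure. It assumes the local area is already available as a polygon, for instance the tract buffered with PostGIS's ST_Buffer; the text does not state the radius used for bikes. It also assumes coordinates are projected to miles. Each path segment is cut into short pieces, and a piece counts when its midpoint lies inside the area, which approximates clipping the lines to the polygon. The names are hypothetical.

```python
import math


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices in miles."""
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge straddles the horizontal ray
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def bike_length_in_area(area, paths, samples_per_mile=200):
    """Approximate total length (miles) of bike-path polylines inside `area`.

    Each segment is split into short pieces; a piece counts when its midpoint
    falls inside the polygon.
    """
    total = 0.0
    for path in paths:
        for (x0, y0), (x1, y1) in zip(path, path[1:]):
            seg_len = math.hypot(x1 - x0, y1 - y0)
            pieces = max(1, math.ceil(seg_len * samples_per_mile))
            piece_len = seg_len / pieces
            for k in range(pieces):
                t = (k + 0.5) / pieces  # midpoint of piece k
                if point_in_polygon(x0 + t * (x1 - x0), y0 + t * (y1 - y0), area):
                    total += piece_len
    return total
```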

Golf Courses

The golf course data comes from the same land use data set used for the residentiality calculations above. The factor is computed the same way as for parks: the proportion of the local area occupied by golf courses. We didn't check whether the golf courses were public, though, so the Web page might recommend an area close to golf courses that no mortal could access.