Table of Contents
Background and Overview
Input options
Output options
Universe filtering options (including point-and-radius)
Accessing and Understanding Outputs
Background and Overview
Where the application runs
The Geocorr engine is mirrored and can be accessed at any of the following URLs:
http://blue.census.gov/plue/geocorr
http://oseda.missouri.edu/plue/geocorr
http://plue.sedac.ciesin.org/plue/geocorr
The "sedac" site is considered the primary mirror, but all sites offer the same
functionality at all times.
What the application does
The MABLE/Geocorr geographic correspondence engine generates files and/or
reports showing the relationships between a wide variety of geographic
coverages for the United States. It can, for example, tell you with which county
or counties each ZIP code in the state of California shares population.
It can tell you, for each of those ZIP/county intersections, what the size of
that intersection is (based on 1990 population or another user-specified variable)
and what portion of the ZIP's total population is in that intersection. The
application permits the user to specify the geographic scope of the correspondence
files (typically, one or more complete states, but with the ability to specify
counties, cities, or metropolitan areas within those states), and, of course,
the specific geographic coverages to be processed. The latter include virtually
all geographic units reported in the 1990 U.S. census summary files, and several
special "extension coverages" such as 103rd Congress districts and the PUMA areas
used in the 1990 PUMS files. The application creates a report file and a
comma-delimited ASCII file (by default), which the user can then browse and/or
save to a local disk.
What is a "Correlation List"?
The output files created by this application are referred to as "correlation
lists". Other commonly used terms for such entities are "equivalency files",
"crosswalk files", and "geographic correspondence files". A correlation list
consists of a set of "source geocodes" specifying the geographic coverage to
be related (i.e. the "known" geographic coverage), and a set of "target geocodes"
specifying the geographic coverage to which we want to relate the source areas.
Frequently (always, in the case of files generated by this application) the
correlation list will include a variable to measure the absolute "size" of the
correspondence (such as the land area of the intersection or the number of
persons living in the intersection). When such an absolute measure is present
then there may also be an "allocation factor" variable that indicates what
portion of the source area is located within the target area. An entry
in a census tract to ZIP correlation list (i.e. a list with "census tract" as
the source coverage and ZIP as the target) might contain the population living
in the tract/ZIP intersection and a number indicating what decimal portion of
the tract's total population also lives within the ZIP. The sum of these allocation
factors for any specific value(s) of the source geocode(s) should always be
1.0. For example:
COUNTY  TRACT    ZIP    POP   AFACT
29510   1101.00  63109  1250  .500
29510   1101.00  63110   625  .250
29510   1101.00  63111   625  .250
Here we see 3 entries from a tract-to-ZIP correlation list. All 3 entries are
for the same source code, census tract 1101.00 within county 29510 (city of
St. Louis, Mo.). The entries show that the tract intersects with 3 different
ZIP codes (estimates based on the 1990 census) and show the absolute and relative
sizes (POP and AFACT, respectively) of the intersections. Note that if we
add the 3 POP values we get the total POP value for the tract (2500), while if
we add the 3 AFACT values we get (as always) 1.0.
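The arithmetic behind the AFACT column can be sketched in a few lines of Python. This is an illustrative sketch, not geocorr's own code (which is written in SAS), and the function name add_afact is hypothetical. It divides each row's POP by the total POP of its source area and rounds to 3 decimals, as in the listing above.

```python
# Rows from the tract-to-ZIP example above: (county, tract, zip, pop).
rows = [
    ("29510", "1101.00", "63109", 1250),
    ("29510", "1101.00", "63110", 625),
    ("29510", "1101.00", "63111", 625),
]


def add_afact(rows):
    """Append AFACT = POP / (total POP of the row's source area) to each row."""
    totals = {}
    for county, tract, _zip, pop in rows:
        source = (county, tract)  # the source geocodes identify the source area
        totals[source] = totals.get(source, 0) + pop
    result = []
    for county, tract, zip_code, pop in rows:
        total = totals[(county, tract)]
        # Assumption: a source area with zero total weight gets AFACT 0.
        afact = pop / total if total else 0.0
        result.append((county, tract, zip_code, pop, round(afact, 3)))
    return result


for row in add_afact(rows):
    print(row)
```

The three AFACT values come out as 0.5, 0.25 and 0.25, which sum to 1.0 as the text says they must.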
Typically (always in this application unless overridden with an option)
correlation lists are
sorted first by the source geocodes, and then by the target geocodes within the
source codes.
Who/what is MABLE?
"MABLE" is an acronym for Master Area Block Level Equivalency File. This is
the name of the massive database that is used by the geocorr engine to create
the correlation lists. "Block" here refers to 1990 census blocks, the smallest
geographic units used in the 1990 census. It was chosen as the base unit for
the application because the Census Bureau uses these blocks as their "atomic
unit" for all other census-based geographies. Thus, census blocks will never
cross a place (city) or MCD (county subdivision, township, New England town)
boundary. While they can and do cross ZIP code boundaries, for the sake of
this application (and based on the Census Bureau's official 1990 Block-ZIP Equivalency
file) each block is assigned to a unique ZIP code (vintage October, 1991). The
MABLE database is actually a collection of 51 state-level datasets containing
a total of just under 7 million block entries.
How does GEOCORR work?
The hard part was building the database and the user interface. The actual
processing is fairly simple. Once the program has determined the geographic
universe the user specified, as well as the source and target geocodes and the
weighting variable, it is a matter of extracting these items from the appropriate
entries in the MABLE database. This yields a set of census blocks for the geographic
area specified, each one identified by the source and target geocodes and with
a measure of its "size" (population, land area or number of housing units).
Building the correlation list outputs (listing and/or .csv file) is then a
relatively simple process of sorting and aggregating. What this amounts
to is using the census blocks as a kind of "geographic pixel", or
indivisible geographic unit. All correlations are "rounded off" to the
census block level. For most of the geographic coverages the roundoff
error is 0, since their boundaries never split a block. The resulting
file is similar to the sort of result you can get from a GIS by doing a
polygon intersection operation. But it is produced much faster (and the output
is perhaps presented in a more convenient format) because we have already
determined all the spatial correspondences and stored the results in
MABLE: all we need to do is pull out the subset of the 7 million
pre-defined answers and aggregate them.
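The sort-and-aggregate step described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the SAS implementation: each block is represented as a hypothetical (source codes, target codes, weight) tuple, and the drop_zero flag mirrors the "ignore blocks with no population" option described under Input Options.

```python
from collections import defaultdict

# Hypothetical block-level records: (source geocodes, target geocodes, weight).
blocks = [
    (("29510", "1101.00"), ("63109",), 700),
    (("29510", "1101.00"), ("63109",), 550),
    (("29510", "1101.00"), ("63110",), 625),
    (("29510", "1101.00"), ("63111",), 625),
    (("29510", "1102.00"), ("63111",), 0),
]


def correlate(blocks, drop_zero=False):
    """Aggregate block weights into (source, target, weight, afact) rows.

    Blocks act as indivisible "geographic pixels": each contributes its whole
    weight to exactly one source/target combination.
    """
    sums = defaultdict(int)
    source_totals = defaultdict(int)
    for source, target, weight in blocks:
        if drop_zero and weight == 0:
            continue  # like the "ignore blocks with no population" option
        sums[(source, target)] += weight
        source_totals[source] += weight
    rows = []
    # Sort by source geocodes first, then by target geocodes within source.
    for source, target in sorted(sums):
        weight = sums[(source, target)]
        total = source_totals[source]
        afact = weight / total if total else 0.0
        rows.append((source, target, weight, afact))
    return rows


for row in correlate(blocks):
    print(row)
```

Without drop_zero, the empty block in tract 1102.00 produces a line with 0 for both the weight and AFACT, the situation the zero-population option is meant to suppress.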
Programming Details
We plan a separate module discussing the real nitty-gritty details of
the programming and interface tools used to build the application. Basically,
the application was written in SAS(r) and uses Perl interface scripts to handle
the form submissions. The MABLE database is a series of SAS datasets and views
with a few of the items (so far the 103rd Congressional Districts and the 1990
PUMA codes) implemented as "virtual variables" using SAS format libraries to
do lookups "on the fly". Most of the SAS code (and all of the database design)
was done by John Blodgett of the Urban Information Center, University of Missouri-St.
Louis, under a contract with CIESIN. The Perl interface routines were written
primarily by Hendrik Meij of CIESIN. The HTML
design and coding have been a joint effort.
Input Options
These are the options (specified at the top of the MABLE/Geocorr input form)
that control the basic nature of the correlation list you want to have built
for you. Here you specify the states, the geocodes and the weighting variable.
Note that here, as throughout the form, all items have been assigned default
values, so you need to at least consider each one. If you do not, the
default value remains in effect and you need to be sure that it is acceptable.
In other words, don't rush through the form assuming that you will be prompted
for any important value you fail to fill in. If you do not specify the weighting
variable (for example), the program assumes POP and does so
without any dialogue with the user.
Selecting state(s)
Click on one or more state names in this select box to indicate the state or
states that you wish to process. You must specify at least one state (note
that Alabama is "pre-selected" so if you do nothing about it that is what you
will get.) The instructions on the form state "(max=4)". We put that on there
to discourage over-use of the system which may overload the systems on which
the application runs and have a detrimental effect on other users' response
times. We would prefer that you honor this limit. But as a bonus for actually
taking the time to read the documentation, you should know that this limit is
not enforced in the code. We strongly encourage you to abuse this
privilege only during off hours, and we reserve the right to cancel any
application if we see that it may be having an adverse effect on other processes.
Be sure you know how your browser works when making multiple selections from a
select list like this. Netscape, for example, requires that you hold down the
"ctrl" key while clicking on items to get multiple selections; but IBM's Web
Explorer does not - each click makes a new selection and you have to click on
a selected item to de-select it. Be careful with this.
Weighting variable
Select a single variable to be used to measure the amount of intersection between
the source and target geocodes on the output file. By default, the 1990
(complete count) population will be used. On the output this variable
will contain the sum for all the blocks used in creating the output
record.
Option to ignore blocks with no population
There are many census blocks that occupy space but have no population. When
building a correlation list with POP as the weighting variable you may find
that leaving these blocks in results in output lines showing a correspondence
that has a value of 0 for the POP and AFACT (allocation factor) variables.
This indicates some spatial overlap between the areas, but no population in
that overlap. If you check this box then those lines with 0 population will
not be present on your output. It will also make processing slightly
faster since the program will have fewer observations to process.
Link to the MAGGOT file
This application makes use of a lot of different levels of geography
("coverages"), most of them corresponding to standard Census Bureau
defined areas. To help users who are not familiar with these types of
geography, we have created this auxiliary help file with more
detailed descriptions of each of the area types. The link was placed
here because the next two select boxes are where you'll be selecting the
geocodes you want to process. "MAGGOT" is an acronym for Master Area
Geographic Glossary of Terms.
Source geocodes
Click on one or more geographic codes you want to use for the "source"
portion of the correlation list. The output will normally be sorted by the
values of these codes. For example, if you select COUNTY and TRACT (the defaults)
then the output file will be sorted
by county and then tract. The sort order is the order in which the variables
appear in this select list. Certain geocodes occur in a hierarchy so that
selecting them automatically triggers selection of a higher-level qualifying
variable. These are MCD (implies COUNTY), TRACT (implies COUNTY), BLOCK GROUP
(implies COUNTY and TRACT), and BLOCK (implies COUNTY and TRACT; block group
is not selected but it is implicitly present as the 1st digit of the block.)
Note that COUNTY is a 5-digit code that includes the state code. STATE is
always added to the output file, even if it is not explicitly selected as one
of the source or target geocodes. All codes are FIPS (Federal Information
Processing Standard) if defined, or census codes otherwise. Be sure you
understand that selecting multiple source geocodes means that the source areas
are the intersection areas formed by looking at values for all the
source codes. Thus if you click on COUNTY, MCD and ZIP for the source geocodes
then the "source areas" represented on the resulting correlation lists are
formed by the intersection of these 3 area types: i.e. a portion of a ZIP code
within an MCD within a county. If what you actually want is a correlation list
for MCDs and a (separate) correlation list for ZIP codes, then you need to
invoke the application twice; geocorr creates only one list per run.
Target geocodes
Most of what was said for the source geocodes, above, applies equally to the
target geocodes. Do not select the same geocode in both lists. The codes you
select here define an area formed by the intersection of those areas. The
correlation list defines the relationship of the source codes to these
target areas. The default value (what you'll get if you do not click
in this select box at all) is ZIP - the 1991 5-digit ZIP code (as defined in
the Census Bureau's ZIP-Block equivalency file.)
Output Options
These options specify details about the output generated by geocorr. In most
cases you will be able to accept all defaults for these options (unlike
the input options where accepting all defaults would be very rare.)
Second allocation factor (afact2): target to source
The standard output correlation list from geocorr has a single AFACT allocation
factor variable which indicates the decimal portion of the source geocodes
contained within the target geocodes. It may also be useful to know how this
works going in the other direction, i.e. to know what portion of the target
area (the complete target area, not just the part within the source area) is
contained in the source geocodes. Selecting this option causes geocorr to do
the extra processing and calculations required to create such a "dual factored
list". The best way to see how it works is to select the option once and study
the AFACT2 values.
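One way to picture the dual factors is the Python sketch below. It assumes the whole of each target area falls inside the processed universe, so that summing weights by target gives the complete target area. AFACT divides each row's weight by its source total; AFACT2 divides it by its target total. The function add_afact2 and the data are hypothetical, and geocorr's own calculation may differ in detail.

```python
from collections import defaultdict


def add_afact2(rows):
    """rows: (source, target, weight) -> (source, target, weight, afact, afact2)."""
    source_totals = defaultdict(int)
    target_totals = defaultdict(int)
    for source, target, weight in rows:
        source_totals[source] += weight
        target_totals[target] += weight
    out = []
    for source, target, weight in rows:
        s_tot, t_tot = source_totals[source], target_totals[target]
        afact = weight / s_tot if s_tot else 0.0   # portion of source in target
        afact2 = weight / t_tot if t_tot else 0.0  # portion of target in source
        out.append((source, target, weight, afact, afact2))
    return out


# Hypothetical tract-to-ZIP rows: tract A is split between two ZIPs,
# and ZIP2 is split between tracts A and B.
rows = [("A", "ZIP1", 600), ("A", "ZIP2", 200), ("B", "ZIP2", 600)]
for row in add_afact2(rows):
    print(row)
```

Here ZIP2's AFACT2 values (0.25 and 0.75) sum to 1.0, just as each tract's AFACT values do.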
Sort by target geocodes
Normally the output file is sorted by the source geocodes, then by the target
geocodes within the source. This option lets you override the default and
have it sorted by the target codes first. An example of where you might
want to use this option would be in creating a ZIP to CD103 list. You want to
look at which ZIPs and what portions of those ZIPs make up each Congressional
District. But you want the results organized by CD first, so that you can
focus on the portion of the report relevant to the district you want to mail
to. Specifying this option causes the output to be sorted by CD first, then
show all the ZIPs within each CD together with allocation factors
indicating what portions of the ZIP are in the CD. (If you specified
CD103 as the source and ZIP as the target, you would get the sort order
you wanted, but then the AFACT allocation factor values would show the
portion of the CD that was within the ZIP, which is typically a very small
and relatively useless number.)
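The difference between the two sort orders can be sketched with Python's sorted function. The ZIP-to-CD103 rows here are hypothetical; the point is that the "sort by target geocodes" option changes only the order of the rows, while AFACT still gives the portion of each ZIP (the source) that lies in the CD.

```python
# Hypothetical ZIP-to-CD103 rows: (zip, cd103, pop, afact), where AFACT is
# the portion of the ZIP's population living in the district.
rows = [
    ("63109", "03", 1000, 1.0),
    ("63110", "01", 300, 0.3),
    ("63110", "03", 700, 0.7),
    ("63111", "01", 500, 1.0),
]

# Default order: source geocodes (ZIP) first, then target (CD) within source.
by_source = sorted(rows, key=lambda r: (r[0], r[1]))

# "Sort by target geocodes": CD first, then the ZIPs within each CD.
by_target = sorted(rows, key=lambda r: (r[1], r[0]))

for zip_code, cd, pop, afact in by_target:
    print(cd, zip_code, pop, afact)
```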
Generate a comma-separated value (CSV) file
This option is selected by default, meaning that geocorr will create an ascii file
in comma-delimited format that you'll be able to browse (preview) and then save
to your local disk. Generally this is the option to use if you want to do
processing of the correlation list back on your platform using your favorite
software package. The ".csv" file extension is a standard that is recognized
by most Windows programs, making it easier to import the data into those
applications. Note that this file will have the variable names as values in
the first line (the "header" record), which when imported into a
spreadsheet such as Excel or Lotus will become the first row. If you have
no interest in obtaining such a file (you only want the report format)
then click on this box to turn off the option. It will save processing
time.
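Reading the .csv file back into a program is straightforward because the first line is a header record. A minimal Python sketch using the standard csv module follows; the sample contents and lowercase column names are hypothetical, so check the actual header record (and the varlst.lst file) for the real variable names.

```python
import csv
import io

# Hypothetical excerpt of a geocorr.csv file; the first line is the header record.
SAMPLE = """county,tract,zip,pop,afact
29510,1101.00,63109,1250,.500
29510,1101.00,63110,625,.250
"""


def read_correlation_list(f):
    """Parse a correlation list, converting only the numeric measures."""
    rows = []
    for rec in csv.DictReader(f):
        # Geocodes stay strings so leading zeros and tract suffixes survive.
        rec["pop"] = int(rec["pop"])
        rec["afact"] = float(rec["afact"])
        rows.append(rec)
    return rows


# With a saved file, pass open("geocorr.csv", newline="") instead.
rows = read_correlation_list(io.StringIO(SAMPLE))
print(rows[0])
```

Keeping the geocodes as text matters: a spreadsheet or numeric parser would turn a code such as "01" into 1.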
Add names to output CSV file
Note that this option appears twice: the first instance pertains to the
output .csv file and the second pertains only to the listing.
In many cases it will be convenient to carry along names to go with the codes
on your output file. If you select option 2 or 3 then, for any geocode for which
geocorr has a "name table" and that you select as either
a source or target geocode, the program will add a new variable (with name
ending with "NM", e.g. PLACENM, COUNTYNM, etc.) to the output ascii (.csv) file.
Usually, if you want names, you should select the "codes and names" option,
rather than asking for just the names.
Generate a listing (same information as output file but report format)
You'll normally want to leave this option selected so you can at least see a
nicely formatted, eye-readable version of your output (the .csv file is intended
more to be program-readable than eye-readable, although you can browse it and
count commas). This is the preferred format for use as a reference report.
The lines can be up to 120 characters across and it will print 240 lines before
generating a page break with fresh column headers. Source geocodes will always
appear first (leftmost) on the report and consecutive duplicate values of the
source geocodes will be blanked out to emphasize "breaks" in the value of
the source codes. This will normally be the largest output file. If you
do not need or want it then you can save processing time by deselecting
this option.
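The blanking of repeated source values in the listing can be sketched as below. This Python sketch assumes that a source column is blanked only when it and every source column to its left repeat the previous line, which is one plausible reading of "breaks"; the function name, column width, and data are hypothetical, and the real report layout (line width, page headers) is not reproduced.

```python
def format_listing(rows, n_source, width=10):
    """Render rows as right-aligned columns, blanking repeated source values.

    rows: sequences whose first n_source items are the source geocodes.
    """
    lines = []
    previous = None
    for row in rows:
        cells = [str(value) for value in row]
        if previous is not None:
            for i in range(n_source):
                if row[i] != previous[i]:
                    break  # a "break" at this level: show it and the rest
                cells[i] = ""  # same value as the line above: blank it out
        previous = row
        lines.append("".join(cell.rjust(width) for cell in cells))
    return lines


rows = [
    ("29510", "1101.00", "63109", 1250),
    ("29510", "1101.00", "63110", 625),
    ("29510", "1102.00", "63110", 300),
]
print("\n".join(format_listing(rows, n_source=2)))
```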
Add names to output listing
See the discussion, above, of names for the output CSV file. Generally, you
are more likely to want names on the listing output than on the .csv file.
The default is no, so you have to select this option to get the names included.
Weighted centroids on output file(s)?
Each of the census block entries in the MABLE database has a pair of latitude,
longitude coordinates for an "internal point" of the census block. This is
the geometric centroid of the block except in those few cases where the true
centroid is not within the block, in which case it is moved to a location just
inside the block. When you select this option, geocorr keeps these coordinate
values and as it processes the blocks within the source/target geocode groups
it takes a weighted average of their values (using the weight variable specified
in the INPUT OPTIONS section - usually 1990 population.) The result of this
is that on the output files you will have two extra columns of data, INTPTLNG
and INTPTLAT (these are terrible names, which make sense on the MABLE database
but not on the output, and we may get around to changing them). They will be in
degrees, with up to 6 digits kept after the decimal point. Longitudes are
assumed to be west longitudes and are written without minus signs.
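The weighted centroid is simply the weighted average of the block coordinates described above. A minimal Python sketch, assuming each block contributes a hypothetical (longitude, latitude, weight) tuple with longitudes expressed as positive west degrees; the function name is illustrative, and the handling of all-zero weights is an assumption.

```python
def weighted_centroid(blocks):
    """Weighted mean of block internal points.

    blocks: iterable of (intptlng, intptlat, weight); longitudes are positive
    degrees west, as on geocorr output. Returns (lng, lat) rounded to 6 places.
    """
    total = sum_lng = sum_lat = 0.0
    for lng, lat, weight in blocks:
        total += weight
        sum_lng += lng * weight
        sum_lat += lat * weight
    if total == 0:
        return None  # assumption: no centroid when every weight is zero
    return round(sum_lng / total, 6), round(sum_lat / total, 6)


print(weighted_centroid([(90.0, 38.0, 1), (91.0, 39.0, 3)]))
```

With weights 1 and 3, the result (90.75, 38.75) lies three quarters of the way toward the heavier block.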
Specifying a name for the output files
This option is perhaps more trouble than it's worth. You can safely ignore it
if you want. It allows you to specify up to a 15-character name for your two
output files. They are normally named "geocorr.csv" and "geocorr.lst" for the
ascii comma-separated-value file and listing file, respectively. If you type
"tr2zip.detroit" in this box, then the files will instead be named
"tr2zip.detroit.csv" and "tr2zip.detroit.lst". The only reason this
might matter is if you intend to save the files to a local disk
and your browser is able to pick up and use the original name as the default
for the copy on your local disk. If this makes no sense to you, just ignore
the option - you won't need it.
Universe filtering options (Limiting the Geographic Universe)
For many applications by the time you get to this point on the input form
you'll be ready to click on the "Run Request" button to tell geocorr you are
finished with your specifications. With one very minor exception (having to
do with adding a distance-to-a-point variable to the output file) all of the
options that remain have to do with limiting the set of blocks that will be
processed by geocorr. This can be done by specifying either county, place or
metro-area level filters, or by specifying point-and-distance selection criteria.
We begin with the latter.
Point-and-Distance Criteria
There are 4 closely-related items that can be specified in this section. If
you enter values for a specified point location as decimal degrees of longitude
and latitude you are telling geocorr that you want it to calculate the distance
between that point and the "internal point" of each census block on the MABLE
database that is otherwise selected for processing (i.e. that first passes the
other geographic filters we'll be discussing, below.) Note that the longitude
value entered is assumed to be West longitude and the leading minus sign is
optional; if entered, it is ignored. Entering a value of "92.3456" is interpreted
as 92.3456 degrees west longitude. Geocorr expresses all coordinates with
this convention: longitudes on output files are also expressed as positive values
for west longitudes. Many GIS programs expect west longitudes to be negative,
so these values will need to be negated before such programs can process them.
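The longitude convention can be captured in two small helper functions. These are a hypothetical Python sketch, not part of geocorr: one reads a form entry the way the form is described as doing (degrees west, any minus sign ignored), and one converts geocorr's positive-west values to the signed convention most GIS software uses.

```python
def parse_west_longitude(text):
    """Read a longitude entry as degrees west; a leading minus sign is ignored."""
    return abs(float(text.strip()))


def to_signed_longitude(west_degrees):
    """Convert a positive-west longitude to the signed convention (west < 0)."""
    return -west_degrees


print(parse_west_longitude("-92.3456"))  # same as entering "92.3456"
print(to_signed_longitude(92.3456))
```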
If the point location you want to use corresponds to a valid street
address then you may be able to take advantage of an address-location
service provided by the Mapquest corporation. A link to their web page
is provided. You'll need to do 2 additional clicks to get to the form
where you enter the street address. If the address is found, it will
return a map of the area with the address at the center. The lat-long
coordinates appear (in very small font) just above the map. You need to
write these down, back out of the MAPQUEST application (will take about
4 clicks on the "previous" button), and then manually enter the values
into the geocorr form. Be sure to keep the latitude and longitude
straight. If you enter the latitude in the longitude box and vice-versa
(like we did when we first tested this feature) you will not get any
geocodes selected. WE HAVE NOT VERIFIED AND CANNOT GUARANTEE THE
ACCURACY OF THE MAPQUEST COORDINATES. We did, however, test several local
addresses, and the results appeared to be correct.
You cannot enter just one of the coordinate values: if you specify a longitude
value then you must specify a latitude value as well.
The entry for "radius" has a default value of "0", which has a special meaning.
When "0" is not overridden it signifies that the distance calculation is
not to be used as a filtering mechanism (i.e. no blocks are to be excluded
from processing based on their distance from the specified point.) In this
case it means that you want to carry along an extra variable in the output
file which represents the distance (in miles or kilometers - see the next
option) between the weighted centroid of the output record and the specified
point. Not a frequently used option, but possibly of some value. More typically,
however, you will enter a non-zero value for the radius option and when
this happens filtering takes place that will limit processing to blocks whose
internal points are within the specified distance from the specified point.
Using this option has a dramatic effect on the way you interpret the entries
in the output correlation list, since every entry applies only to the blocks
that pass the point-and-radius filter. Typically, this option is used with a
very large target area (such as a complete state or metro area), and the real
correlation is between the n-mile circular area and that large target area or
areas. For example, you could specify a place-to-state
correlation list (source geocodes=place, target geocodes=state), with a metro
area filter (only the portions of the places within the specified metro area are
processed) and the coordinates of the metro airport entered with a radius of
3 miles specified. What results is that only blocks within 3 miles of the
airport are selected. On output the POP figure shows the total persons living
in the specified places and also in blocks that are within 3 miles of the
airport, and the AFACT variable will typically be 1.0 since ALL of the blocks
in the selected place will be associated with the same state. It is critical
to remember that the POP figure shown is not the total population of the place,
but only the population of the portion of the place within 3 miles of the
specified point. If you need to know what portion of the total population of
the place is within this circle, you will have to do some special postprocessing,
since this figure is not readily generated directly by geocorr.
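The radius filter can be sketched as a distance test against each block's internal point. The documentation does not say which distance formula geocorr uses; this Python sketch swaps in the standard haversine great-circle formula with a mean Earth radius of 3958.8 miles (6371.0 km), and the block tuples and function names are hypothetical. A radius of 0 disables filtering, as described above.

```python
import math

# Mean Earth radius in the two supported units.
EARTH_RADIUS = {"miles": 3958.8, "km": 6371.0}


def distance(lng1, lat1, lng2, lat2, units="miles"):
    """Haversine great-circle distance between two points given in degrees.

    Longitudes may be positive west (geocorr's convention) as long as both
    points use the same convention.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS[units] * math.asin(math.sqrt(a))


def filter_blocks(blocks, point, radius, units="miles"):
    """Keep (lng, lat, block_id) tuples whose internal point is within radius."""
    if radius == 0:
        return list(blocks)  # radius 0: no filtering, distance is only reported
    lng0, lat0 = point
    return [b for b in blocks if distance(b[0], b[1], lng0, lat0, units) <= radius]


blocks = [(90.0, 38.0, "a"), (90.0, 38.1, "b"), (90.0, 39.0, "c")]
print([b[2] for b in filter_blocks(blocks, (90.0, 38.0), 10)])
```

The same distance function, applied to an output record's weighted centroid rather than a block's internal point, would give a value like the DISTANCE variable.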
The "check for kilometers" check box can be used to specify
that the radius value entered is to be interpreted as kilometers rather
than the default miles. The "Label of point" box allows you to enter text to
describe the location represented by the point. This label gets added to the
variable label on one of the documentary output files.
Note that whenever you specify the point option a DISTANCE variable is
added to the output file. This distance is in miles (or kilometers if you
checked that box) and represents the approximate distance between the calculated
weighted centroid of the output area (source/target intersection) and the
specified point. When you are using the point-and-radius options strictly as
a filter you may well have no interest in this item, but it is included in the
output nonetheless.
General information re filtering by geographic code lists
Geocorr allows you to specify lists of 3 types of geographic areas that will
be used to further limit the geographic universe ("further" meaning, following
state-level filtering which is mandatory and is dealt with under INPUT OPTIONS.)
The first box to check is preceded by an explanation of its use: it specifies
that, when multiple types of geocodes are used to filter, each type is treated
as a sufficient rather than a necessary condition for inclusion. For example,
if I do not check this box and I then enter a value
in the "place codes" box for Kansas City, Mo and a value in the "county codes"
box for Jackson county, Mo, then the universe would be limited to the portion
of Kansas City within Jackson County. But when I check this box then the
conditions become "or"-ed instead of "and"-ed, meaning I want all blocks that
are either in the city of Kansas City or in Jackson County. So now I
get all of the city (which I did not before) plus I get the parts of Jackson
County that are not inside the city.
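The difference between the "and" and "or" treatment of multiple code-list filters can be sketched as a single predicate. In this hypothetical Python sketch each block is a dict of its geocodes, either=True corresponds to checking the box, and the county and place codes are made-up placeholders rather than the real Jackson County and Kansas City codes.

```python
def in_universe(block, counties=(), places=(), metros=(), either=False):
    """Return True if a block passes the county/place/metro code-list filters.

    either=False: the block must match every filter that was specified ("and").
    either=True: matching any one specified filter is sufficient ("or").
    """
    checks = [(counties, "county"), (places, "place"), (metros, "metro")]
    tests = [block[field] in codes for codes, field in checks if codes]
    if not tests:
        return True  # no code-list filters: only the state filter applies
    return any(tests) if either else all(tests)


# Placeholder codes: county "29001" stands in for the county, place "11111" for the city.
blocks = [
    {"id": 1, "county": "29001", "place": "11111"},  # city, inside the county
    {"id": 2, "county": "29002", "place": "11111"},  # city, outside the county
    {"id": 3, "county": "29001", "place": "99999"},  # county, outside the city
    {"id": 4, "county": "29002", "place": "99999"},  # neither
]
and_ids = [b["id"] for b in blocks if in_universe(b, {"29001"}, {"11111"})]
or_ids = [b["id"] for b in blocks if in_universe(b, {"29001"}, {"11111"}, either=True)]
print(and_ids, or_ids)
```

The "and" universe keeps only block 1 (the city within the county); the "or" universe adds the rest of the city (block 2) and the rest of the county (block 3).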
To limit the universe based on one or more counties to be selected you can
enter their FIPS codes in the box provided. Be careful to enter full 5-digit
codes when processing multiple states; 3-digit codes are OK if you only selected
a single state for processing. Specifying a code for a state that was not
selected will cause an error and geocorr will not complete processing. If you
need to look up a county code, simply click on the "County codes" hyperlink.
You'll have to note what the codes are and enter them after returning from the
linked-to code pages.
Similar processes apply for filtering by place and by metro area, except,
of course, that there is no option for entering a state portion of the codes.
Simply enter the 5-digit FIPS place codes or the 4-digit MSA, CMSA or PMSA metro codes
in the appropriate boxes.
Be sure to specify leading zeroes in all codes.
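The county-code rules above can be sketched as a small validation function. This Python sketch is hypothetical (geocorr's own checks are not documented here): it expands 3-digit codes when exactly one state is selected, requires full 5-digit codes otherwise, and rejects codes for unselected states or codes missing their leading zeros.

```python
def normalize_county_codes(codes, states):
    """Return 5-digit county FIPS codes, or raise ValueError on a bad entry.

    codes: county codes as typed, e.g. "095" or "29095".
    states: 2-digit FIPS codes of the selected states, e.g. ["29"].
    """
    result = []
    for code in (c.strip() for c in codes):
        if len(code) == 3:
            if len(states) != 1:
                raise ValueError(f"{code}: use 5-digit codes with multiple states")
            code = states[0] + code  # prepend the single selected state
        if len(code) != 5 or not code.isdigit():
            raise ValueError(f"{code}: expected a 3- or 5-digit code with leading zeros")
        if code[:2] not in states:
            raise ValueError(f"{code}: state {code[:2]} was not selected")
        result.append(code)
    return result


print(normalize_county_codes(["095", "047"], ["29"]))
```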
Don't forget to click on the SUBMIT button to tell the application
that you have finished with specifications and are ready for processing.
Accessing and Understanding the Output
When and if your request is successfully processed you should see
a screen with a series of filenames and descriptions, with each of the
filenames being a hyperlink to the file itself. There are four possible
output files, depending on what options you select. Each is described
below.
The summary.log file
This file gives a very brief summary of what you requested and a little
about what the program did to satisfy the request. It tells you, for
example, how many census blocks were selected for processing and how many
lines (records, observations) actually made it to the output files. The
first line on this file tells you what your "Process id" was for this
request. If you have any problems with your request you need to be sure
to save this key number and report it to the authors with a description
of what went wrong. In most cases, you can safely ignore this file
unless you have a problem.
The geocorr.lst file
This is your listing (i.e. report format) file. It is usually the largest
of the output files, and often the most important. Note its size before
attempting to print or save it to your desktop since it may be quite
large. If you filled in that box on the form that let you specify a name
other than "geocorr" you should see that name here instead of geocorr.
The same applies for the .csv file, next.
The geocorr.csv file
This is your comma-delimited ascii file. You might want to
browse/preview it, but you'll most likely want to save this back on your
local disk. You should be able to easily load the file into a
spreadsheet for further local processing.
The varlst.lst file
This is a very short file that provides a little extra information about the
variables named in the header record of your .csv file. If you did not request a comma-delimited
file then you will not get this file either: they are a matched set. The
report lists each of the variables (fields) on your file and adds a
descriptive label to help you identify what each means. You'll note that
the variables have a consistent order in this report and on the .csv file
with the source geocode fields appearing first, followed by the target
codes and then the weight variable, allocation factor(s) and any x-y
coordinate and distance-to-specified-point items. If you did not
explicitly specify "state" as one of your geocodes you will nonetheless
see it added to this file as well, usually after the last target geocode
and just before the weighting variable.
These files are all stored in a temporary directory and will remain there
for a period of 2 hours or so. But you should retrieve them to your
local system before exiting the application.
Last Update: 02-04-97