Street Address Geocoding

Manifold System includes street address geocoding capability that becomes operational when either the Manifold US streets geocoding database or Microsoft's MapPoint product is installed. Without the US streets geocoding database or MapPoint, Manifold's geocoding commands cannot function. The Manifold geocoding database supports US street address geocoding only. The Microsoft MapPoint North American edition product will allow street address geocoding in the US and Canada, while the European edition of MapPoint will allow street address geocoding in eleven European countries.

For a general introduction to geocoding, see the About Geocoding topic. For notes on MapPoint usage, see the Geocoding with MapPoint topic.

The Manifold geocoder uses the Manifold US streets geocoding database (based on the US government's official address database published in the TIGER/Line data set) to find estimated positions for street addresses in the United States. It works best in urban and suburban areas where street addresses follow reasonably regular patterns.

Very Important: The Manifold geocoding database does not capture all possible street addresses nor can it guarantee the exact location of any address. Therefore, it must not be used for applications, such as 911 or other emergency response applications, which require every address to be exactly located. Note that the Manifold System End User License Agreement (EULA) covers possession and use of the geocoding database and that the EULA specifically excludes uses, such as emergency response, that require fail-safe performance.

Nonetheless, the Manifold geocoder working together with the Manifold geocoding database is a good choice for many GIS tasks, such as demographic or marketing studies, that require geocoding of a reasonably high percentage of addresses. For such uses the Manifold geocoder is highly effective and a spectacularly good value.

If Microsoft MapPoint is installed, the MapPoint geocoder provides slightly greater accuracy in the United States, finding a few percentage points more addresses in large geocoding projects than is possible using the Manifold geocoding database. The MapPoint geocoder uses a variety of data sources to provide slightly better address recognition than is possible with government data only.

Records that are to be geocoded must have the following characteristics:

When geocoding addresses, most effort will be expended on cleaning addresses and converting those that are in non-standard form into standard form. This is accomplished with the Standardize command. Once addresses have been standardized, we can use the Geocode command to geocode them. The Manifold geocoding database must be installed for either of these commands to function.

The Standardize and Geocode dialogs are related in how they use the geocoding database and both dialogs use the same Unmatched Records dialog to allow manual adjustments to records that cannot be automatically standardized or geocoded. The Standardize dialog uses the geocoding database to look up valid zip codes, city names, street names and address ranges. It can use that information to identify errors in records and to offer alternatives. The Geocode dialog uses presumably valid address information to determine, if possible, the latitude and longitude location of a specific address.

To geocode a table of street addresses:

  1. Open the table of addresses. If the addresses are not known to be absolutely clean and well formed we should first run Table - Address - Standardize to eliminate common problems.

  2. Choose Table - Address - Geocode. Specify the fields that are to be used for the standard geocoding fields of Address, City, State and Zip. Specify which fields are to contain the created latitude and longitude values or let Manifold create new columns for these values. Press OK.

  3. Any records that cannot be automatically located in the geocoding database will be presented in the Unmatched Records dialog. This dialog will allow us to edit the record to make corrections. It will present a menu of nearest matches, if any, from which we can quickly choose a desired match.

After geocoding, the table will contain latitude and longitude columns that give the location of each record.

Standardize

The Standardize command is used to convert addresses from non-standard forms into the standard form expected by the Manifold geocoder. It has several functions:

The Standardize command will process a table and then raise the Unmatched Records dialog to allow editing and selection of proposed matches the system thinks are close to what is intended.

Standardize Addresses Dialog Controls

generate … using
Specify the columns to be used for standardized address fields and from which column each standardized address should be extracted. Either new columns may be automatically created or existing columns may be used.
Address
Column to be used for the standard field that contains the building number and street name. Choosing [New Column] will add a new column to the table called Address.
City
Column to be used for the standard field that contains the city name. Choosing [New Column] will add a new column to the table called City.
State
Column to be used for the standard field that contains the state name. Choosing [New Column] will add a new column to the table called State.
Zip
Column to be used for the standard field that contains the five-digit postal ZIP code. Choosing [New Column] will add a new column to the table called Zip.
Country
Optional column for country name. The Manifold Geocoding engine recognizes typical variations of country names ("US," "USA," "United States," "The United States of America," and so on).
Status
If desired, add a column that reports the status of a particular record after processing. [None] does not report status, [New Column] will add a new column to the table called Status.
Fail on
Criterion to be used to declare a particular record to be unmatched.
Normalize addresses
Convert address nomenclature into standardized abbreviations, such as "Rd" for "Road," and state names into official postal abbreviations, etc.
Skip completed records
Do not process records that already have values in the designated Address, City, State and Zip fields.

Fail on Options

any error
Does not process record unless it is a perfect match in all fields. Equivalent to setting an "unknown building" error.
unknown street name, possible misspelling
Matches to Zip and City and does not process record unless the street name is an exact match.
unknown street name, no similar names
Matches to Zip and City, and attempts to match to similar street names if an exact match to the street name cannot be found. Uses "Soundex" and similar algorithms to attempt to find the right street in case of a street name misspelling. This is the default setting.
unknown zip
Do not process record if the zip code cannot be found.
critical error
Stop processing only on problems with an incomplete geocoding database or on hardware failure.
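
The "Soundex" matching mentioned above can be illustrated with a minimal sketch of the classic American Soundex code, in which similar-sounding names receive the same four-character code. This is an illustration only: the function name soundex is our own, and the exact algorithms used inside the Manifold geocoder are not documented in this topic.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter followed by three digits."""
    # Map each consonant to its Soundex digit; vowels, H, W and Y have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code after it counts again
    return (result + "000")[:4]  # pad with zeros or truncate to four characters


# A misspelled street name receives the same code as the correct spelling.
print(soundex("Lytton"), soundex("Litton"))
```

A matcher built on such codes would compare the code of the street name in the record against the codes of street names known for the zip code, and offer those that agree as possible matches.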

Status Values

When reporting the status of a record's processing in a Status field the following values are used, listed in order of increasing severity:

ok
No errors in processing record.
unknown building
The geocoder was able to locate the zip code and the street name, but the street does not appear to contain an address of the given number.
unknown street name, possible misspelling
The geocoder was able to locate the zip code but not the street name; however, the zip code does contain some streets with names similar to the street name.
unknown street name, no similar names
The geocoder was able to locate the zip code but not the street name. The zip code does not contain any streets with names similar to the street name.
unknown zip code
The geocoder could not locate the zip code.
critical error
The geocoder could not function due to an incomplete geocoding database, a problem accessing the database or a hardware failure.

Search Priority

The biggest difficulties encountered when standardizing addresses arise from errors in the address. Addresses may be incomplete (missing zip codes, for example) or they may incorporate typographical or other errors that result in erroneous zip codes, city names, street names and building numbers. At times, cities may be called by local names that are different from those officially recorded by the US government. For example, a small town located north of Boston is called "Manchester," "Manchester-by-the-Sea" and "Manchester by the Sea".

Manifold helps deal with such errors by prioritizing searches for standardization and geocoding using the following order of precedence:

The setting used for the Fail on parameter allows Manifold to accept failed address matches up to a given level of severity. For example, the default setting of unknown street name, no similar names for the Fail on parameter will allow Manifold to automatically accept records for which an unknown building or an unknown street name, possible misspelling error would be reported.
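
The relationship between Status values and the Fail on setting can be sketched as a simple threshold on the severity order listed under Status Values. The names SEVERITY, FAIL_ON_ALIASES and is_unmatched are hypothetical and serve only to illustrate the rule described above; they are not part of Manifold.

```python
# Status values in order of increasing severity, as listed under Status Values.
SEVERITY = [
    "ok",
    "unknown building",
    "unknown street name, possible misspelling",
    "unknown street name, no similar names",
    "unknown zip code",
    "critical error",
]

# Fail on options whose names differ from the status they correspond to.
FAIL_ON_ALIASES = {
    "any error": "unknown building",  # described as equivalent in the options list
    "unknown zip": "unknown zip code",
}


def is_unmatched(status, fail_on="unknown street name, no similar names"):
    """Return True if a record with this status would be reported as unmatched."""
    fail_on = FAIL_ON_ALIASES.get(fail_on, fail_on)
    return SEVERITY.index(status) >= SEVERITY.index(fail_on)


# With the default setting, less severe problems are accepted automatically.
print(is_unmatched("unknown building"))
print(is_unmatched("unknown street name, no similar names"))
```

Under this reading, a stricter Fail on setting such as any error sends more records to the Unmatched Records dialog, while a looser setting such as critical error accepts nearly everything.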

Geocode

The Geocode command takes an address in standard form, finds it in the Manifold geocoding database and then produces latitude and longitude values for the location of the address. Addresses to be geocoded must include the four standard Address, City, State and Zip fields.

Records that cannot be located within the error level specified by the Fail on parameter will be reported in an Unmatched Records dialog to allow editing and selection of proposed matches the system determines may be the correct locations. If a particular address cannot be located on a street, the system will report "Building not found" and will offer a match to the address. Choosing that match will choose a location for the address that is at the midpoint of the street segment for the street of that name.

Records that cannot be geocoded at all, even after manual intervention, will have zero values for latitude and longitude. When the table is copied and pasted as a drawing these records will be ignored if the Skip zero latitude / longitude records option is checked (the default setting) in the Paste As Drawing dialog.

If an address cannot be found in the available street address ranges for a specific street the command will choose a point near the middle of the street segment.
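
As a rough illustration of how a building number can be placed along a street segment, the sketch below uses linear interpolation within the segment's address range and falls back to the middle of the segment when the number lies outside the range. Linear interpolation is a common approach with TIGER/Line-style address ranges, but it is an assumption here: this topic does not state how the Manifold geocoder positions addresses within a range. The function name and the planar (x, y) coordinates are hypothetical.

```python
def locate_on_segment(number, range_from, range_to, start, end):
    """Estimate an (x, y) position for a building number on a street segment.

    range_from and range_to are the building numbers at the start and end
    of the segment; start and end are the (x, y) coordinates of its ends.
    """
    low, high = sorted((range_from, range_to))
    if range_from == range_to or not (low <= number <= high):
        # Number not covered by the range: use the middle of the segment.
        fraction = 0.5
    else:
        # Position along the segment is proportional to position in the range.
        fraction = (number - range_from) / (range_to - range_from)
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return (x, y)


# Number 125 in the range 100 to 200 falls a quarter of the way along.
print(locate_on_segment(125, 100, 200, (0.0, 0.0), (10.0, 0.0)))
```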

Geocode Addresses Dialog Controls

generate … using
Specify the columns to be used for Longitude and Latitude, Status and Match type fields, and which columns in the table should be used for the standardized Address, City, State, and Zip fields.
Longitude
Table column to be used for the longitude of geocoded record. Contains 0 if the record was not geocoded. Use [New Column] to automatically create a new column called Longitude.
Latitude
Table column to be used for the latitude of geocoded record. Contains 0 if the record was not geocoded. Use [New Column] to automatically create a new column called Latitude.
Status
If desired, add a column that reports the status of a particular record after processing. [None] does not report status, [New Column] will add a new column to the table called Status.
Match type
If desired, add a column that reports the geocoding match type of a particular record after processing. [None] does not report match type, [New Column] will add a new column to the table called Match type.
Fail on
Criterion to be used to declare a particular record to be unmatched.
Offset Locations by
Offset the location for each geocoded record by the given number of units from the street line, to the left or right side of the street based on whether the address was found in the left or right range.
Skip completed records
Do not process records that already have values in the designated Longitude and Latitude fields.

Fail on Options

any error
Does not process record unless it is a perfect match in all fields. Equivalent to setting an "unknown building" error.
unknown street name, possible misspelling
Matches to Zip and City and does not process record unless the street name is an exact match.
unknown street name, no similar names
Matches to Zip and City, and attempts to match to similar street names if an exact match to the street name cannot be found. Uses "Soundex" and similar algorithms to attempt to find the right street in case of a street name misspelling. This is the default setting.
unknown zip
Do not process record if the zip code cannot be found.
critical error
Stop processing only on problems with an incomplete geocoding database or on hardware failure.

Match type Values

When reporting the match type of a record in a Match type field, the following values are used, listed in order from most precise to least precise:

building
The zip code, street name and building number were used to identify a location at the building's number.
street
The zip code and acceptable street name were found but not the building's number so the location has been placed at the center of the bounding box for the street.
zip
Only a zip code was used so the location has been placed at the center of the bounding box for the zip code.
city
Only the city name was used so the location has been placed at the center of the bounding box for the city.
(empty string)
The record could not be geocoded and values of 0 have been written into both Latitude and Longitude.

Selections

Both the Standardize and the Geocode commands are auto-scoped: if a selection is present in the table they will operate only on the selected records.

Example

images/eg_geocode_01.png

Suppose we have a table like that above, which contains two text fields: a Name field and a Street Address field. The table lists a few sushi restaurants near the USGS facility in Menlo Park, California. To geocode this table we must first standardize the addresses it contains using the Standardize command by choosing Table - Address - Standardize.

images/eg_geocode_02.png

The Standardize Addresses dialog allows us to designate which fields will be used as sources to generate the four standard Address, City, State and Zip fields and which existing fields, if desired, will host the four standard fields. We choose Street Address as the source for the new fields and [New Column] for each new field to be generated, so that a new column is created for each.

images/eg_geocode_03.png

The result is that four new columns, Address, City, State and Zip, have been created in the table and the relevant parts for each record have been extracted from the Street Address column and placed in the new columns.

images/eg_geocode_04.png

Let's hide the Street Address column to reduce the size of the illustrations. The table above is now in standard form so we can geocode it using Table - Address - Geocode.

images/eg_geocode_05.png

The Geocode Addresses dialog allows us to choose which columns will be used for the four standard fields used by the Manifold geocoder. We can also choose which columns will receive the generated latitude and longitude locations.

The Offset locations by checkbox will automatically position geocoded points to the left or right side of a street based on which side of the street that address falls on (according to the geocoding database). The distance and units boxes allow us to choose how far from the street centerline the geocoded points will be offset. The default offset is 50 feet.
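
The effect of the offset can be pictured with a small geometric sketch: the point found on the street centerline is moved perpendicular to the street by the chosen distance, to the left or to the right relative to the direction of the street segment. The sketch assumes planar coordinates in the same units as the offset (for example, feet); the function name offset_point is hypothetical and the actual computation used by Manifold is not described in this topic.

```python
import math


def offset_point(point, start, end, distance, side):
    """Move a centerline point perpendicular to the segment start -> end."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment has zero length")
    # Unit normal pointing to the left when travelling from start to end.
    nx, ny = -dy / length, dx / length
    sign = 1.0 if side == "left" else -1.0
    return (point[0] + sign * distance * nx, point[1] + sign * distance * ny)


# A street running east: the left side lies to the north of the centerline.
print(offset_point((5.0, 0.0), (0.0, 0.0), (10.0, 0.0), 50.0, "left"))
```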

The Skip completed records checkbox tells the dialog to ignore records that already contain data in the target fields.

images/eg_geocode_06.png

If all records can be matched by the geocoder the result will be the addition of two new columns to the table called Latitude and Longitude that contain the latitude and longitude locations for each address. The table is now a geocoded table and can be copied and pasted as a drawing.

images/eg_geocode_07.png

If we copy the table and paste it as a drawing, we can drag and drop the drawing into a map created from a drawing of roads in the Menlo Park and Palo Alto area. The points have been formatted as bright green dots.

The Unmatched Records Dialog

The Unmatched Records dialog is used with both the Standardize and the Geocode commands. In both cases, records that cannot be correctly standardized or geocoded are presented one by one within the Unmatched Records dialog to allow users to deal with each record on a case-by-case basis.

The dialog presents the address record being processed in editable boxes for Address, City, State and Zip fields. A list of possible alternatives found in the geocoding database is presented in a Found pane. We can edit the fields manually or, if we see a match that we like in the Found pane we can click on it to highlight the match and load it into the edit boxes in the dialog. Pressing Accept will save the values from the edit boxes into the record in the table, including the latitude and longitude in the case of the Geocode command.

If desired (say in the case of an obvious typographical error in the name of a street) we can edit the values in the Address, City, State and Zip edit boxes and then press Look Up to direct the system to check the new values against the geocoding database to see if they can be located. If the location is found we can press Accept to accept the geocoding for the edited address and to write the edited address values back into the table for that record. If we would like to go back to the previous set of values, before editing, we can press the Previous button.

If we would like to skip this record and continue with other unmatched records, we press the Skip button.

Unmatched Record Dialog Controls

Record
Current record as it has been read from the table. A read-only edit box that can be used with Copy.
Address
Address value now in use.
City
City value now in use.
State
State value now in use.
Zip
Zip value now in use.
(result)
Result of last Look Up operation.
Found pane
A list of possible matches found in the geocoding database. Clicking on one of the matches will load it into the edit boxes. Double-clicking on one of the matches will load it and immediately Accept it as well and move on to the next record.
Accept
Accept the current values in the edit boxes. Click on one of the choices in the Found pane and then choose Accept. This updates the table with specified values, replacing the current values of the record.
Look Up
Look up the current address in the edit boxes in the geocoding database. When used with Standardize, this checks the possible validity of the address. When used with Geocode, this looks up the location of the address.
Previous
Reload the previous values for this address into the edit box. Enabled if the address has been edited, either by choosing a possible match in the Found pane or by manually editing the address.
Skip
Skip this record and continue with other unmatched records.
Close
Skip this and all remaining unmatched records. Note that any Accept commands that have been issued for any unmatched records before a Close command will have already updated the table. Close is not the same as a Cancel of all changes made in this session with the Unmatched Records dialog.

The Unmatched Records dialog has keyboard accelerators for the Accept, Look Up, Previous, Skip and Close commands, bound to ALT-E, ALT-L, ALT-P, ALT-K and ALT-O respectively. For example, pressing ALT-E is the same as clicking Accept. Using keyboard accelerators can help when working through very long lists of unmatched records.

Skipping Unmatched Records when Creating Drawings

The results of the Geocode command are latitude and longitude values for each record that was successfully geocoded. Unmatched records that were skipped will have 0 values for their latitudes and longitudes. When copying a table that contains unmatched records and pasting it as a drawing, make sure the Skip zero latitude / longitude records checkbox is checked in the Paste As Drawing dialog. This will make sure that the unmatched records having 0 values for their latitudes and longitudes are not pasted as a cluster of points off the coast of Africa.
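
When geocoded tables are processed outside of Manifold, the same precaution applies. The following minimal sketch drops records whose latitude and longitude are both 0 before they are plotted. The record layout (dictionaries with Latitude and Longitude keys), the function name and the sample values are all hypothetical.

```python
def plottable(records):
    """Keep only records that were geocoded (not 0 latitude and 0 longitude)."""
    return [
        r for r in records
        if not (r["Latitude"] == 0 and r["Longitude"] == 0)
    ]


# Hypothetical sample: the second record was skipped during geocoding.
table = [
    {"Name": "First",  "Latitude": 37.45, "Longitude": -122.18},
    {"Name": "Second", "Latitude": 0,     "Longitude": 0},
]
print([r["Name"] for r in plottable(table)])
```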

Geocoding SQL Extensions

Manifold SQL includes geocoding extensions that operate with Manifold's geocoding engine to perform spatial operations based upon an address string or zip code. Geocoding extensions will not work unless the US streets geocoding database is correctly installed. Geocoding extensions will not work with Manifold IMS unless the US streets geocoding database is installed within the Manifold application installation folder (normally, C:\Program Files\Manifold System). Therefore, the US streets geocoding database should be installed in the Manifold application installation folder on machines on which Manifold IMS operates.

Boolean CloseToAddress(Number ID, String Address, Number Distance, [String Unit])

Given an object ID, an address string, a distance and an optional distance unit, determines whether the object lies within the specified distance of the address.

Boolean CloseToZip(Number ID, String Zip, Number Distance, [String Unit])

Given an object ID, a ZIP code string, a distance and an optional distance unit, determines whether the object lies within the specified distance of the zip code centroid.

Number DistanceToAddress(Number ID, String Address, [String Unit])

Given an object ID, an address string, and an optional distance unit computes the distance between the object and the address.

Number DistanceToZip(Number ID, String Zip, [String Unit])

Given an object ID, a ZIP code string, and an optional distance unit computes the distance between the object and the zip code centroid.

Notes on usage:

Geocoding Function Examples

SELECT * FROM Dealers
WHERE CloseToAddress(ID, "330 Lytton Ave, Palo Alto, CA, 94301", 10, "mi")

SELECT * FROM Dealers
WHERE DistanceToAddress(ID, "330 Lytton Ave, Palo Alto, CA, 94301", "mi") <= 10

SELECT * FROM Dealers
WHERE CloseToZip(ID, "94301", 10, "mi")

SELECT * FROM Dealers
WHERE DistanceToZip(ID, "94301", "mi") <= 10

All four examples have a similar function. The first query selects all objects in Dealers that are within 10 miles of the given address using the CloseToAddress function, while the second example performs the same task using the DistanceToAddress function. The third and fourth examples perform the same functions using the 94301 ZIP code.
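
The equivalence between the CloseTo and DistanceTo forms can be illustrated outside of SQL with a small sketch that computes a great-circle distance between two latitude / longitude locations and compares it to a limit. The haversine formula on a spherical Earth is an assumption made for this illustration; this topic does not describe how Manifold computes distances. The function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def distance_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def close_to(lat1, lon1, lat2, lon2, limit_mi):
    """True if the two points are within limit_mi miles of each other."""
    return distance_mi(lat1, lon1, lat2, lon2) <= limit_mi


# Hypothetical coordinates for a dealer and a zip code centroid.
dealer = (37.50, -122.20)
centroid = (37.44, -122.16)
print(close_to(*dealer, *centroid, 10))
```

Written this way, the "close to" test is simply the "distance to" computation followed by a comparison, which is why the paired queries above select the same records.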

To keep the user interface simple and to avoid the complication of dealing with possible user errors when entering address information into forms, many web applications with IMS will use the CloseToZip or DistanceToZip functions since these require the user to merely enter the ZIP code correctly. For many applications, such as locating a dealer, finding the closest objects to the ZIP centroid provides acceptable accuracy.

See the Units topic for a list of unit abbreviations that may be used to specify optional distance units.

GoTo Extensions

When the US street address geocoding database is installed, the Edit - GoTo command will allow a GoTo to an Address or to a Zip code within the US. The address may be a full street address, or it may be a partial address, such as "Atlanta" or "Atlanta, GA" or "GA." Manifold includes a gazetteer of large city names so that "Atlanta" will find the large city in Georgia and not one of the various small towns of that name throughout the US. City names will take priority over states when spelled out, so that "Washington" will find the capital city of the US and not the state. To find the state, use "WA."

Installing the Manifold Geocoding Database

Both the Standardize and the Geocode commands require installation of the Manifold US streets geocoding database (or Microsoft MapPoint) as do the geocoding SQL extensions. If the database is not installed on the computer system these commands and functions will not be available. The full US streets geocoding data set requires approximately 950 MB of free space on disk. See your Microsoft MapPoint documentation for MapPoint installation requirements and procedures.

Installing the Manifold US geocoding database:

  1. Insert the Manifold US Geocoding Data CD into the CD drive of your system and launch Windows Explorer. In Windows Explorer, browse over to the CD and double click on the GCDB.msi installation program to launch it. (If Windows Explorer has been set to hide extensions for well-known file types, this file may be listed in Windows Explorer as GCDB without the .msi extension.)

  2. The installer will offer to install the geocoding database into a default folder, C:\Program Files\Manifold System\GCDB, within the default installation folder for Manifold System. If you have installed Manifold System in a different location on your hard disk it is strongly recommended to install the geocoding data within the folder used for Manifold System so the geocoding SQL extensions will be available within Manifold IMS.

  3. Launch Manifold. In the Tools - Options - File Locations pane, specify the folder used to install the geocoding database for the Geocoding Database folder and press OK.

The states.dat file must always be available on your hard disk. The geocoding database is organized by US states with a file for each state. Each state file ends in a .dat extension and is named using the state postal abbreviation. For example, the geocoding database files for California and New York are ca.dat and ny.dat respectively.

The GCDB.msi installer will install geocoding database files for all US states. However, only those state files for which addresses will be geocoded need be located on the hard disk. Files for states that will not be used may be removed to free up disk space. For example, if we will be geocoding street addresses only in the state of California and no other state, we may delete all files from the GCDB folder on hard disk except the states.dat file and the ca.dat file.
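
For installations where only a few states are needed, the files that could be removed can be listed with a short script before anything is deleted. The sketch below only reports candidate files and does not delete them. The function name removable_state_files is hypothetical, and the folder shown in the usage line is the default location mentioned above.

```python
from pathlib import Path


def removable_state_files(gcdb_dir, keep_states):
    """List .dat files in gcdb_dir that are not needed for the given states.

    states.dat is always kept; keep_states holds postal abbreviations
    such as "CA" for the states that will be geocoded.
    """
    keep = {"states.dat"} | {f"{s.lower()}.dat" for s in keep_states}
    return sorted(
        p for p in Path(gcdb_dir).glob("*.dat")
        if p.name.lower() not in keep
    )


# Report which files could be removed if only California is geocoded.
for path in removable_state_files(r"C:\Program Files\Manifold System\GCDB", ["CA"]):
    print(path.name)
```

Reviewing the printed list before removing any files guards against deleting the states.dat file or a state file that is still needed.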

To uninstall the US geocoding database and entirely remove all files, we may use the Windows Control Panel Add / Remove Programs applet. Removing the geocoding database will render all geocoding functions inoperative.

Accuracy

The accuracy of the Manifold geocoder depends almost entirely upon the accuracy of the US streets geocoding database it uses. The Manifold geocoding database uses address data extracted from the US Bureau of the Census TIGER/Line data set. Although TIGER/Line is the federal standard for address accuracy used to support the Constitutional requirement of a census that counts every resident, even this mammoth data set does not accurately capture all possible addresses in the US. Although TIGER/Line is updated every few years using a network of many Census Bureau field offices, it does not capture address exceptions nor does it provide a satisfying level of geospatial accuracy in rural areas.

A further limitation of TIGER/Line is that due to constant churning of zip codes by the US Postal Service it is possible (although very rare) that a zip code for a valid geographic location might not be matched. A more frequent problem is the appearance of zip codes within address records that are abstract zip codes (such as those assigned to some ships in the US Navy) that do not correspond to a geographical location within the United States.

If a street address cannot be found within the Manifold geocoding database, resulting in an unmatched record with no options presented, the user has several choices to deal with the unmatched address:

When using the geocoder for demographic or marketing studies, it is usually safe to ignore unmatched addresses because addresses that cannot be found in TIGER/Line are usually randomly dispersed. For example, if 90% of a sample of 10,000 addresses can be geocoded without any effort to identify unmatched records, the 9,000 data points thus obtained will normally be highly representative of the characteristics of the overall data set. For some applications, of course, achieving a match for every record may be a sufficiently important objective to merit a significant amount of time working with the Unmatched Records dialog, manual geocoding or use of a different geocoder.
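
To judge whether the match rate of a geocoding run is adequate for a study, the values in a Status column can be tallied. The sketch below counts records whose status is ok and reports the fraction matched; it assumes the status values have been exported as a list of strings, and the function name match_summary is hypothetical. Records accepted with a less severe error are not counted as matched in this simple version.

```python
from collections import Counter


def match_summary(statuses):
    """Return (matched, total, rate) for a list of Status values."""
    counts = Counter(statuses)
    total = sum(counts.values())
    matched = counts.get("ok", 0)
    rate = matched / total if total else 0.0
    return matched, total, rate


# The sample discussed above: 9,000 of 10,000 addresses geocoded.
statuses = ["ok"] * 9000 + ["unknown zip code"] * 1000
print(match_summary(statuses))
```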

Troubleshooting

If the geocoder does not work, check the following:

If an address cannot be located, check the following:

See Also

About Geocoding

Geocoding with MapPoint

Geocoding Extensions
