This content applies to a previous version of CARTO
In October 2021 we released a new version of our platform. You can find the latest documentation at docs.carto.com
Cleaning Data for Geocoding
Why do you need this?
You have data you need to geocode.
Whether you geocode your data with the “Geocoding Analysis” in Builder or with one of the Geocoding Functions available in the Data Services API, the quality of the results depends on the data you provide. This article describes best practices for preparing that data.
Data Best Practices
One of the most important things to consider before geocoding anywhere, with any tool, is data cleanliness. If your data is inconsistent or wrong, geocoders will not be able to match your input to any of the stored values that relate to a coordinate pair. Let’s look at some examples of addresses:
CARTO DC, 4075 Wilson Blvd 8th Floor, Arlington, VA 22203
CARTO HQ, 114 W 26th FL 3 New York NY 10001
CARTO Madrid, Plaza Callao 4, Planta 2 (Gran Via 46), 28013 Madrid Espana
The best way to separate these addresses into columns would be as follows:
| name | street | street_2 | city | admin_1 | country | postal_code |
|---|---|---|---|---|---|---|
| CARTO DC | 4075 Wilson Blvd | 8th Floor | Arlington | VA | USA | 22203 |
| CARTO HQ | 114 W 26th | FL 3 | New York | NY | USA | 10001 |
| CARTO Madrid | Gran Via 46 | Planta 2 | Madrid | Madrid | Espana | 28013 |
Most geocoders have a separate parameter for each of these columns. Our street-level geocoder can take a free-form address as its first parameter, but you should still include only street address details, separated by commas (e.g. 4075 Wilson Blvd, Arlington, VA), and leave out business names and floor numbers.
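As a minimal sketch of that advice, the Python snippet below joins only the street-address columns of a row into a comma-separated string. The function name `build_street_address` and the default choice of fields are assumptions made for illustration, not part of any CARTO API; the column names follow the table above.

```python
def build_street_address(row, fields=("street", "city", "admin_1")):
    """Join selected address columns into a comma-separated string.

    Business names ("name") and floor details ("street_2") are left out
    on purpose, because they are not part of the street address.
    """
    # Treat missing or None values as empty and trim stray whitespace.
    parts = [str(row.get(field) or "").strip() for field in fields]
    # Skip empty parts so the result has no dangling commas.
    return ", ".join(part for part in parts if part)


# Example row taken from the table above.
carto_dc = {
    "name": "CARTO DC",
    "street": "4075 Wilson Blvd",
    "street_2": "8th Floor",
    "city": "Arlington",
    "admin_1": "VA",
    "country": "USA",
    "postal_code": "22203",
}

print(build_street_address(carto_dc))
```

Passing a different `fields` tuple, for example one that also includes `postal_code` and `country`, produces a longer string for geocoders that benefit from those details.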
Know your Geocoder
Know what type of information your geocoder is looking for! Different geocoding tools have specific requirements. For example, the Country Geocoder recognizes country names through various synonyms, English names, or ISO codes. Always review your geocoder’s documentation to understand its expected input format.
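If you would rather make country values consistent before geocoding, one option is to map them to ISO codes yourself. The sketch below is only an illustration: `COUNTRY_TO_ISO2` is a small, hand-built, hypothetical lookup and `normalize_country` is a made-up helper name. Neither reflects how the Country Geocoder resolves names internally.

```python
import unicodedata

# Hypothetical lookup covering only the countries in the examples above.
# Keys are accent-free and lowercase; values are ISO 3166-1 alpha-2 codes.
COUNTRY_TO_ISO2 = {
    "us": "US",
    "usa": "US",
    "united states": "US",
    "es": "ES",
    "spain": "ES",
    "espana": "ES",
}


def normalize_country(value):
    """Return an ISO alpha-2 code for a country value, or None if unknown."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value.strip())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return COUNTRY_TO_ISO2.get(plain.casefold())


for raw in ["USA", "Espana", "España", "Atlantis"]:
    print(raw, "->", normalize_country(raw))
```

Values that return `None` are the ones worth reviewing by hand before running the geocoder.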
