Minimum geocoding match rates: an international study of the impact of data and areal unit sizes


The analysis of geographically referenced data, specifically point data, is predicated on the accurate geocoding of those data. Geocoding refers to the process in which geographically referenced data (addresses, for example) are placed on a map. This process may lead to issues with positional accuracy or to the inability to geocode an address. In this paper, we conduct an international investigation into the impact of the (in)ability to geocode an address on the resulting spatial pattern. We use a variety of point data sets of crime events (varying numbers of events and types of crime), a variety of areal units of analysis (varying the number and size of areal units), from a variety of countries (varying underlying administrative systems), and a locally-based spatial point pattern test to find the geocoding match rates needed to maintain the spatial patterns of the original data when addresses are missing at random. We find that the required level of geocoding success depends on the number of points and the number of areal units under analysis, but that the necessary levels are generally lower than those found in previous research. This finding is consistent across different national contexts.
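The kind of experiment described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 85% resampling fraction, the 200 simulations, and the central 95% range are all assumptions chosen for illustration, and each point is represented only by the identifier of the areal unit it falls in. The sketch drops addresses at random to mimic a given match rate, then reports the fraction of areal units whose share of points in the full data still falls within the range of shares produced by resampling the matched data.

```python
import random
from collections import Counter


def unit_shares(unit_ids, units):
    """Share of all points that fall in each areal unit."""
    counts = Counter(unit_ids)
    total = len(unit_ids)
    return {u: counts[u] / total for u in units}


def similarity_after_thinning(unit_ids, match_rate, n_sims=200,
                              resample=0.85, seed=0):
    """Fraction of areal units whose full-data share lies inside the
    central 95% range of shares from the randomly thinned data.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    units = sorted(set(unit_ids))
    base = unit_shares(unit_ids, units)

    # Step 1: drop addresses at random to mimic failed geocoding.
    n_matched = max(1, round(match_rate * len(unit_ids)))
    matched = rng.sample(unit_ids, n_matched)

    # Step 2: repeatedly subsample the matched points to obtain a
    # range of plausible shares for every areal unit.
    n_sub = max(1, round(resample * n_matched))
    sims = {u: [] for u in units}
    for _ in range(n_sims):
        shares = unit_shares(rng.sample(matched, n_sub), units)
        for u in units:
            sims[u].append(shares[u])

    # Step 3: count the units whose full-data share falls inside the
    # central 95% of the simulated shares.
    lo = int(0.025 * n_sims)
    hi = n_sims - 1 - lo
    inside = 0
    for u in units:
        ordered = sorted(sims[u])
        if ordered[lo] <= base[u] <= ordered[hi]:
            inside += 1
    return inside / len(units)
```

Calling `similarity_after_thinning` over a grid of match rates, point counts, and numbers of areal units would give a rough picture of how low the match rate can fall before the areal pattern diverges from the original, under the assumptions stated here.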

International Journal of Geographical Information Science