Supporting Integration
Cleansing and matching data are two key activities that can help support integration.
They are distinct steps, usually carried out one after the other, and are particularly useful where you hold address or street data that does not originate from a definitive source but you want to determine the corresponding UPRNs (Unique Property Reference Numbers) or USRNs (Unique Street Reference Numbers).
For example, you may want to cleanse and match data sourced from:
An export of manually entered addresses / streets from a departmental system
An entire export of a system that does not use definitive address / street data, in order to identify missing or incorrect addresses or to map the records spatially
An entire export of a system that does not use definitive address / street data, in order to provide a baseline match for integration purposes
What is data cleansing?
Data cleansing is typically carried out before any matching and refers to the process of standardising data, removing errors and ensuring the data is consistent. The more consistent the data, the more likely it is to match successfully to an official address or street record.
Common cleansing operations include:
Removing duplicates
Correcting spelling mistakes
Removing punctuation and unsupported characters
Expanding abbreviations (e.g. "Rd" to "Road"), except "St" or "St." where it stands for "Saint"
Ensuring the character casing is as you require it (lower, upper, proper case, etc.)
Inserting details that are missing but known
Splitting a single line address into its various address components
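To make a few of the operations above more concrete, here is a minimal Python sketch covering punctuation removal, casing, abbreviation expansion and de-duplication. It assumes upper case is the required casing and that commas separate address parts. The abbreviation table and the rule used to tell "St" (Street) from "St" (Saint) are illustrative assumptions, not a standard. Spelling correction, inserting missing details and splitting into components are not covered.

```python
import re

# Illustrative abbreviation table (an assumption); a real table would be
# longer and agreed locally.
ABBREVIATIONS = {
    "RD": "ROAD",
    "AVE": "AVENUE",
    "LN": "LANE",
    "CRES": "CRESCENT",
    "CL": "CLOSE",
    "DR": "DRIVE",
}


def _expand_part(part: str) -> str:
    """Expand abbreviations within one comma-separated address part."""
    tokens = part.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok == "ST":
            # Assumed heuristic: "ST" as the last word of a part, after at
            # least one other word, means STREET ("HIGH ST"); anywhere else
            # it is kept as Saint ("ST JOHNS ROAD").
            out.append("STREET" if i > 0 and i == len(tokens) - 1 else tok)
        else:
            out.append(ABBREVIATIONS.get(tok, tok))
    return " ".join(out)


def cleanse(address: str) -> str:
    """Standardise a single-line address ready for matching."""
    text = address.upper()             # assumed casing: upper case
    text = text.replace("'", "")       # "JOHN'S" -> "JOHNS"
    # Replace any other unsupported character with a space, keeping commas
    # because they separate the address parts.
    text = re.sub(r"[^A-Z0-9, ]", " ", text)
    # Collapse repeated spaces within each part and drop empty parts.
    parts = [" ".join(p.split()) for p in text.split(",")]
    return ", ".join(_expand_part(p) for p in parts if p)


def deduplicate(addresses):
    """Keep the first occurrence of each address, compared after cleansing."""
    seen = set()
    unique = []
    for address in addresses:
        key = cleanse(address)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique
```

For example, `deduplicate(["10 High St.", "10 HIGH STREET", "St John's Rd, Anytown"])` returns `["10 HIGH STREET", "ST JOHNS ROAD, ANYTOWN"]`: the first two records collapse into one once they are standardised, which is why removing duplicates works best after the other cleansing steps.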
What is data matching?
Data matching refers to the process of taking an address or street entry that does not originate from an official source and trying to find its equivalent in a definitive data source. This shows what the address or street should be, highlighting any inconsistencies, and provides its UPRN or USRN.
Matching can be done manually, taking each record one by one and searching for it in a definitive source. This can take a significant amount of time when there are many records to match, but for a small number of records it is a relatively quick way to obtain results.
Alternatively, matching can be automated, or semi-automated (where proposed matches are reviewed manually), although this requires specialist software or a matching service.
These approaches can match a high volume of records rapidly.
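The semi-automated approach can be sketched in a few lines of Python: each cleansed record is compared with every entry in a reference extract, and the best score decides whether it is accepted, sent for manual review or left unmatched. This is only an illustration, not how any particular matching service works. The reference records and UPRNs below are invented, the 0.97 and 0.80 thresholds are assumptions, and the standard library's `difflib.SequenceMatcher` stands in for the more sophisticated comparison that dedicated software performs.

```python
from difflib import SequenceMatcher

# Hypothetical extract of a definitive source (UPRN -> cleansed address).
# These UPRNs and addresses are invented for illustration.
REFERENCE = {
    100000000001: "10 HIGH STREET, ANYTOWN",
    100000000002: "12 HIGH STREET, ANYTOWN",
    100000000003: "ST JOHNS ROAD, ANYTOWN",
}

AUTO_ACCEPT = 0.97  # assumed: accept without manual review
REVIEW = 0.80       # assumed: propose the match for a person to check


def best_match(address):
    """Return (uprn, score) for the most similar reference address."""
    best_uprn, best_score = None, 0.0
    for uprn, ref in REFERENCE.items():
        score = SequenceMatcher(None, address, ref).ratio()
        if score > best_score:
            best_uprn, best_score = uprn, score
    return best_uprn, best_score


def classify(address):
    """Return (outcome, uprn, score) for one cleansed address."""
    uprn, score = best_match(address)
    if score >= AUTO_ACCEPT:
        return "match", uprn, score
    if score >= REVIEW:
        return "review", uprn, score
    return "no match", None, score
```

`classify("10 HIGH STREET, ANYTOWN")` is accepted with UPRN 100000000001. However, "11 HIGH STREET, ANYTOWN", which is not in the reference set, still scores highly against number 10 and lands in the review band. A plain string comparison treats a different house number as a near-identical address, which is one way false positives arise, and why matching tools typically compare address components such as the house number separately.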
Further guidance and services
For further information about cleansing and matching, please take a look at our article on "Address matching methodologies".
We also have a blog on "Understanding false results in matching address data".
In addition, GeoPlace offer a range of address cleansing and matching services.
Your current Gazetteer Management System (GMS) supplier may also offer address cleansing and matching services or software, so it may be worth speaking to them too.