Don't forget your keys

Data cleansing is the process of taking the data contained in a data base and correcting the errors in it. It is probably one of the most hated jobs in the industry as it tends to be both tedious and exacting. The final two nails in its coffin is that it is almost never finished and almost never completely correct. All in all a miserable and mostly unsatisfying experience.

One of the secrets to success in data cleansing is in selecting keys i.e. the fields that allow the identification of the record to be updated.
To work best a key should be unique. A good key identifies exactly one item (often a record) to be updated. Furthermore, it is best if they do not get recycled.

Recycling keys runs the risk of updating the wrong record. For example deleting a record with a key and then creating a new record with the same key ensures that the keys are unique at any moment in the database but potentially identifies 2 different records, one of which no longer exists. Thus the update generated from the cleansed data might introduce an error into the database.

Ensuring that you have suitable keys in the extracted data, that the cleaning process maintains the association between the keys and the records and choosing a key that allows the easy updating of data in the database greatly eases the difficulty of the data cleansing process. This is one of the processes where thinking backward from the end of the process greatly enhances the chances of success.