Data quality consultants will tell you that collecting data correctly, getting it right first time, is essential. Yet almost every organisation puts most of its budget and labour into attempting to cleanse data after collection.
The proactive-versus-reactive debate rages on, but in reality data quality must be both a proactive and a reactive process. The data will dictate which approach to use, or whether both are required.
The first-time-right, proactive approach applies to all data. Some data, such as transactional data, must be collected correctly because corrections after the event are time-consuming and very costly. Record the wrong information when a product purchase is telephoned in, for example, and your customer, the call-centre staff, the warehouse staff and others will all need to be involved in correcting the error after the wrong products are delivered, with the accompanying damage to your reputation and your brand image.
When transactional data is collected correctly first time, no further cleansing will ever be required – the record represents a historic event that will never change.
Other data may be cleansed later, but reactive cleansing never achieves the high level of data accuracy that right-first-time practices achieve. Postal addresses are a good example. Many addresses can be found and validated against postal data files after collection. But without the ability to hold a dialogue with the customer, reactive processing will correct only a percentage of the issues.
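To make the point concrete, here is a minimal sketch of reactive address cleansing. The reference file, record layout and matching rule are all illustrative assumptions, not a real postal product; the point is that records which do not match the file can only be resolved by going back to the customer.

```python
# Toy postal reference file keyed on (postcode, town).
# A real file (e.g. a national address file) is far richer.
POSTAL_FILE = {
    ("SW1A 1AA", "LONDON"): "Buckingham Palace, LONDON, SW1A 1AA",
    ("EC1A 1BB", "LONDON"): "St Paul's, LONDON, EC1A 1BB",
}

def validate(record):
    """Return the corrected address, or None if the record cannot be
    fixed without a dialogue with the customer."""
    key = (record.get("postcode", "").upper().strip(),
           record.get("town", "").upper().strip())
    return POSTAL_FILE.get(key)

collected = [
    {"postcode": "sw1a 1aa", "town": "London"},   # recoverable after the fact
    {"postcode": "XX9 9XX", "town": "Nowhere"},   # not recoverable reactively
]

fixed = [r for r in collected if validate(r)]
unresolved = [r for r in collected if not validate(r)]
```

Only the first record is corrected; the second stays in the unresolved pile, which is exactly the residue that reactive processing leaves behind.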
The situation with personal names is far more serious. People give their children names with unusual spellings, casings and other twists to mark them out from the crowd. If these are not collected correctly first time, no automated post-processing can correct the errors; in fact, many attempts to correct them will introduce further inaccuracies. Organisations also often reactively gender-code their customers based on given names. People with the same name can have different genders both within and between cultures, and because people in our mobile world move around, you can no longer assume that the American Joan is a female rather than a male originating from Spain, for example. This sort of post-processing significantly reduces data accuracy.
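The ambiguity is easy to demonstrate. In this sketch the lookup table is a toy assumption, but the names are real cases: Joan is typically female in English yet male in Catalan-speaking Spain (cf. Joan Miró), and Andrea is female in English but male in Italian. Without knowing the cultural context, which is rarely captured, the code can only refuse to guess.

```python
# Toy (name, culture) -> gender table; a real reference set is larger
# but suffers the same cross-cultural collisions.
NAME_GENDER = {
    ("Joan", "en"): "F",    # usually female in English-speaking countries
    ("Joan", "ca"): "M",    # male in Catalan-speaking Spain
    ("Andrea", "en"): "F",
    ("Andrea", "it"): "M",  # male in Italy
}

def guess_gender(name, culture=None):
    """Return 'F'/'M' only when the answer is unambiguous, else None."""
    if culture is not None:
        return NAME_GENDER.get((name, culture))
    # Culture unknown: collect every gender this name maps to.
    genders = {g for (n, _), g in NAME_GENDER.items() if n == name}
    return genders.pop() if len(genders) == 1 else None
```

Forcing a single answer where the function returns None is precisely the post-processing step that quietly corrupts the data.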
So if first time right is always the best policy for data quality, where does reactive data processing come in? When data refers to an entity, you need to remember that the entity's situation may change over time. A person may marry, change their name, move house or change job. A city's name may change, or its postal code, or the name of a street within it. Somebody living in Sudan on 8th July 2011 may have found themselves living in South Sudan on 9th July. Somebody buying with Estonian krooni on 31st December 2010 was buying in euros on 1st January 2011. It is remarkable how we attempt to track the changes in the lifecycles of our customers yet tend to overlook the dynamic changes in their environments, which affect both them and you, and which happen very much more often than we think.
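This kind of reactive processing can be sketched as a refresh pass over records the customer never touched. The record layout and function are illustrative assumptions; the kroon-to-euro conversion rate (15.6466 EEK per euro) and the two changeover dates are historical facts.

```python
from datetime import date

EEK_PER_EUR = 15.6466           # fixed conversion rate, 1 January 2011
EURO_DAY = date(2011, 1, 1)     # Estonia adopts the euro
SPLIT_DAY = date(2011, 7, 9)    # South Sudan becomes independent

def refresh(record, today):
    """Apply environmental changes the customer did not initiate."""
    if record.get("currency") == "EEK" and today >= EURO_DAY:
        record["amount"] = round(record["amount"] / EEK_PER_EUR, 2)
        record["currency"] = "EUR"
    if (record.get("country") == "Sudan"
            and record.get("region") == "south"
            and today >= SPLIT_DAY):
        record["country"] = "South Sudan"
    return record

price = refresh({"currency": "EEK", "amount": 156.47}, date(2011, 1, 1))
home = refresh({"country": "Sudan", "region": "south"}, date(2011, 7, 9))
```

Nothing in either record was wrong when it was collected; the world changed underneath it, and only a reactive pass can catch up.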
So the lesson must be that first time right is always right and should never be replaced by reactive processing. Reactive processing is required when real-world change affects our data.