Okay, so Data Quality (DQ) has been a theme for more than a couple of years now. If you are reading this, chances are you are already well informed on what's available.
I came from a large logistics company, where DQ was preached heavily and seen as a way of reducing costs. The further we went into what DQ could actually mean, though, the more vague and indirect the costs and effects seemed to be. The one thing we knew we really suffered from was a whole lot of duplicates in the system. The problem was always visible and its effects very tangible: the duplicates effectively screwed up a perfectly good CRM tool. The solution seemed simple. Buy a deduplication tool and identify the duplicates!
Now there are a good few deduplication tools out on the market. All of them will tell you how good they are at finding duplicates using mathematical, probabilistic and fuzzy matching; the list goes on and on. Where all providers seem to stop short, though, is what to do with the duplicates once they are found. All vendors have ways of identifying duplicate pairs or even duplicate groups, and most will offer clever and fancy ways of bringing the duplicates together, but in almost all cases this happens OUTSIDE the client's current IT systems. Of course, bulk loading old/new data back into the IT system is always so simple! So much fun! IT/ICS departments are always so understanding and helpful!!
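To make "fuzzy matching" a little less abstract, here is a minimal sketch of what such a tool does under the hood, using nothing but Python's standard library. The similarity threshold, the name-only records and the union-find grouping are my own simplifications for illustration, not any vendor's actual algorithm:

```python
# Sketch only: real DQ tools use far richer probabilistic matching
# across many fields. This illustrates the core idea with name strings.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the (lowercased) strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def duplicate_groups(records: list[str], threshold: float = 0.85) -> list[set[int]]:
    """Group record indices whose pairwise similarity meets the threshold."""
    # Union-find: every record starts in its own group.
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair; merge groups when the pair looks like a duplicate.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect members per group root; singletons are not duplicates.
    groups: dict[int, set[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), set()).add(i)
    return [g for g in groups.values() if len(g) > 1]
```

On a toy list like `["Acme Logistics GmbH", "ACME Logistics GmbH", "Beta Transport Ltd"]` this yields one duplicate group containing the first two records. As argued above, this step is the easy part.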
This is what we had to find out for ourselves, in painful fashion. We quickly came to realise that finding the duplicates is barely even the tip of the iceberg. Actually trying to group the information together in such a way as to create a unique (golden) record was going to cost the company a lot of money. It would need the involvement of a Systems Integrator, because the problems are never related to just one system, right? Multiple man-years spent in project time (God! Consultants just pray for those types of projects.) Naturally, project costs rapidly rising into millions of euros.
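For what it's worth, the "golden record" idea itself is simple enough to sketch. The fields and the survivorship rule below (longest non-empty value wins) are invented for illustration; in a real project those rules get negotiated per system and per field, which is exactly where the Systems Integrator and the man-years come in:

```python
# Sketch of "survivorship": merging a duplicate group into one golden
# record. The rule here naively assumes the longest value is the most
# complete one; real projects define rules per field and per source.
def golden_record(duplicates: list[dict]) -> dict:
    """Merge duplicate records field by field into one golden record."""
    merged = {}
    fields = {f for rec in duplicates for f in rec}
    for field in fields:
        candidates = [rec.get(field, "") for rec in duplicates]
        merged[field] = max(candidates, key=len)  # empty values always lose
    return merged
```

Even this toy version exposes the real question: which value survives? Answering that per field, per source system, is the expensive part, not the merge itself.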
Where's the saving then? I mean, the company has lived with the same problem for years, right? So it can't be that bad, can it?! (Ever wondered exactly how the banks got into the current credit crisis?)
Unfortunately for the logistics company, the project never really got off the ground. Although the benefits of removing the duplicates were visible (reduced overhead, a more streamlined sales force, increased effectiveness), the risks around project timelines and overall cost killed it.
All is not lost, though. Good DQ providers will also offer software that effectively stops duplicates being allowed onto the client's system. Of course, the safest way of ensuring a high level of DQ is to catch the mistakes at the point of entry. Sounds wonderful, but as the Irish saying goes, "If I were you, I wouldn't start from here." The DQ providers are only asked for a solution because the need has already been identified by the prospective client; the client has no choice but to start from where they are. Just stopping the problem coming in through the front door does not make the waste that's already on the system magically disappear.
So data quality tools will of course offer solutions, but far too often they won't go far enough! DQ can never really be about just installing a CD to solve the problem. Good expertise and guidance are necessary. It will always require a deeper understanding of the origin of the problem, a sharp focus on the company's pain points and a tight integration into the IT landscape. Suddenly a simple CD solution doesn't fit anywhere; it quickly becomes an exponential cost, sucking up time and resources simply to make it work. At a time when companies are looking to become more efficient, cut costs and, unfortunately, overheads too, is that really the best way?