Data Quality Analysis – It requires a bit of all worlds

Doing a Data Quality Analysis (DQA) is a challenging task. You need to get your head into the domain of the business to understand what the data is all about. You need to talk to the users of the organization to understand how they work with data. And within hours you’ll most probably have a dozen of different data sources that you need to dig into.

Tools, tools, tools
The DQA is not trivial, actually the opposite, so often you’ll see that tool support is lacking. The analyst himself will have to use a toolset that is just as diverse and uncontrollable as the data he is trying to manage. The problem with such an approach is that it will eventually get in your way because you’re trying to get 2-3 independent tools to work nicely together, instead of just having these functions available where you need them.

I don’t mind combining tools at all, but we have to do so with care and acknowledge that combining tools also adds a lot of constraints to our working process. Let us for example say that you’re doing an analysis of string patterns in a set of Company names. You’re noticing a piece of the pattern that shows out to be legal forms like GmbH, Ltd, A/S and so on. You want to separate the legal form from the company name, but switching between tools that do the pattern finding (a profiling tool) and the separation (a transformation, perhaps even ETL, tool) means that you have to go back to step 1 in your workflow and re-do all the steps in your flow in different tools. If your chain of analysis steps is more than just a few steps long, then you’re out for a lot of waste. Continue reading ‘Data Quality Analysis – It requires a bit of all worlds’

Data Quality – who needs it!

escher_gezichtsbedrog2Okay, so the theme Data Quality (DQ) has been around for more than a couple of years now. If you are reading this, chances are that you are obviously already informed on what’s available.

I came from a large logistics company, where DQ was preached heavily and seen as a way of reducing costs. The further though we went into what DQ could actually mean – the more vague and indirect the costs and effects seemed to be. The one thing we knew we really suffered from it was that we had a whole lot of duplicates in the system. This was always visible and the effects from it very tangible. They effectively helped screw up a perfectly good CRM tool. The solution was simple. Buy a deduplication tool and identify the duplicates!

Continue reading ‘Data Quality – who needs it!’