Your name is too “common”….


A major bank in Dongguan (China) refused a potential customer because his name is Li Jun. Apparently, there were already over 300 bank accounts registered under the name Li Jun. This particular Li Jun had not opened all of these accounts himself; there were simply too many men with exactly the same name. The bank states that the refusal is nothing personal, since nobody named Li Jun will be accepted as a customer in the near future… In the meantime, Li Jun is taking legal action against the bank. Continue reading ‘Your name is too “common”….’

How-to create the Golden Record


The term Golden Record is closely related to Customer Data Integration, or MDM for customer data. It refers to the “single truth” that has been created or calculated from all the duplicate customer records held in different systems. This post is not about finding or tagging those duplicate records. There are all kinds of ways to find them, using advanced statistical methods, fuzzy matching, etc.

But what do you do once you have found the duplicates? How do you create the best possible customer data out of all the gathered elements? Continue reading ‘How-to create the Golden Record’

Deduplication, first time wrong?


One of my current projects is to take an intelligent approach to removing the duplicates that already exist in a live system (SAP).

The client has already successfully used our software in their IT environment to effectively stop all new duplicates being entered into SAP. They now want to use the same technology to remove all existing duplicates. Their idea is so simple I am amazed that I have not heard of it being done elsewhere before.

Every evening the client's whole SAP database will be searched for duplicates in their Companies and Contacts (more than 3 million records deduplicated in less than an hour!). The results are stored in a master result table that SAP has been given access to. Depending on the likelihood of the match, each duplicate group falls into one of three categories: automatic merging, manual merging or no merge. If the score for the whole duplicate group is above the threshold for automatic merging, the automatic merging process is started. Continue reading ‘Deduplication, first time wrong?’
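
The three-way split described in this excerpt can be sketched roughly as follows. This is only an illustration, not the client's actual implementation: the names `MergeAction` and `classify_group` and both threshold values are assumptions, since the post gives no concrete numbers, and the score is assumed to be a single value between 0 and 1 for the whole duplicate group.

```python
from enum import Enum


class MergeAction(Enum):
    AUTOMATIC = "automatic merging"
    MANUAL = "manual merging"
    NONE = "no merge"


# Hypothetical thresholds; the post does not state the real values.
AUTO_THRESHOLD = 0.95
MANUAL_THRESHOLD = 0.75


def classify_group(score: float,
                   auto_threshold: float = AUTO_THRESHOLD,
                   manual_threshold: float = MANUAL_THRESHOLD) -> MergeAction:
    """Map the match score of a whole duplicate group to one of three categories."""
    if score > auto_threshold:
        # Confident enough to merge without human review.
        return MergeAction.AUTOMATIC
    if score > manual_threshold:
        # Plausible duplicate: leave the decision to a person.
        return MergeAction.MANUAL
    # Too unlikely to be a duplicate; leave the records alone.
    return MergeAction.NONE


if __name__ == "__main__":
    for group_score in (0.99, 0.80, 0.40):
        print(group_score, classify_group(group_score).value)
```

Keeping the thresholds as parameters means the boundary between automatic and manual merging can be tuned without touching the classification logic.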

Data Quality – who needs it!

Okay, so the theme of Data Quality (DQ) has been around for more than a couple of years now. If you are reading this, chances are that you are already well informed about what’s available.

I came from a large logistics company, where DQ was preached heavily and seen as a way of reducing costs. The further we went into what DQ could actually mean, though, the more vague and indirect the costs and effects seemed to be. The one thing we knew we really suffered from was a whole lot of duplicates in the system. They were always visible, and their effects were very tangible: they effectively helped screw up a perfectly good CRM tool. The solution was simple. Buy a deduplication tool and identify the duplicates!

Continue reading ‘Data Quality – who needs it!’