
One of my current projects has been to take an intelligent approach to removing the duplicates already present in an existing SAP system.
The client has already used our software successfully in their IT environment to stop all new duplicates from being entered into SAP. They now want to use the same technology to remove all existing duplicates. Their idea is so simple that I am amazed I have not heard of it being done elsewhere.
Every evening, the client's entire SAP database will be searched for duplicates among its Companies and Contacts (more than 3 million records deduplicated in less than an hour!). The results will be stored in a master result table that SAP has been given access to. Depending on the likelihood of the match, each duplicate group falls into one of three categories: automatic merging, manual merging or no merge. If the score for the whole duplicate group is above the threshold for automatic merging, the automatic merging process is started.
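The three-way split described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our product's implementation: the `DuplicateGroup` record, the 0 to 100 score scale and both threshold values are hypothetical, since the post does not say how scores are scaled or where the thresholds sit. Each row of the master result table is assumed to carry one score for the whole duplicate group.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    AUTOMATIC_MERGE = "automatic merging"
    MANUAL_MERGE = "manual merging"
    NO_MERGE = "no merge"


@dataclass
class DuplicateGroup:
    # Hypothetical shape of one row in the master result table.
    group_id: str
    record_ids: list[str]
    score: float  # assumed: one match score for the whole group, 0-100


# Hypothetical thresholds; real values would be tuned per client.
AUTO_MERGE_THRESHOLD = 95.0
MANUAL_MERGE_THRESHOLD = 80.0


def classify(group: DuplicateGroup,
             auto_threshold: float = AUTO_MERGE_THRESHOLD,
             manual_threshold: float = MANUAL_MERGE_THRESHOLD) -> Action:
    """Assign a duplicate group to one of the three categories by its score."""
    if manual_threshold > auto_threshold:
        raise ValueError("manual threshold must not exceed automatic threshold")
    if group.score > auto_threshold:    # strictly above: merge without review
        return Action.AUTOMATIC_MERGE
    if group.score > manual_threshold:  # likely match: queue for a person
        return Action.MANUAL_MERGE
    return Action.NO_MERGE              # too uncertain: leave records alone


def route(groups: list[DuplicateGroup]) -> dict[Action, list[DuplicateGroup]]:
    """Partition the nightly results into one work queue per category."""
    queues: dict[Action, list[DuplicateGroup]] = {action: [] for action in Action}
    for group in groups:
        queues[classify(group)].append(group)
    return queues


if __name__ == "__main__":
    nightly_results = [
        DuplicateGroup("G1", ["C1", "C2"], 98.5),
        DuplicateGroup("G2", ["C3", "C4"], 85.0),
        DuplicateGroup("G3", ["C5", "C6"], 60.0),
    ]
    for action, queue in route(nightly_results).items():
        print(action.value, [g.group_id for g in queue])
```

Scoring the group as a whole, rather than each pair, means a single weak link keeps the entire group out of the automatic queue, which is the conservative choice when merges are hard to undo.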
Okay, so the topic of Data Quality (DQ) has been around for more than a couple of years now. If you are reading this, chances are you are already well informed about what's available.
The Dutch radio program Andersmans Veren (Radio 2, AVRO) broadcast a special on names on February 15th. Although it is not related to any burning data quality issues, it is almost two hours of fun to listen to, featuring everyone from Theo Maassen to Toon Hermans and, of course, Hans Teeuwen with his famous sketch. Point your browser to the