In his excellent post “New matching engines go beyond apples and oranges”, Winfried van Holland states that traditional matching engines are built on atomic string comparison functions such as match-codes, phonetic comparison, Levenshtein string distance and n-gram comparison. He further argues that these functions have two drawbacks: it is not always clear which function should be used for which purpose, and, being low-level data quality (DQ) functions, they cannot distinguish between apples and oranges, so you end up comparing family names with street names.
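To make the apples-and-oranges point concrete, here is a minimal sketch in Python of two of the atomic functions mentioned above: Levenshtein distance and a bigram (2-gram) comparison scored with the Dice coefficient. The function names, the sample records and the choice of the Dice coefficient are my own assumptions for illustration; they are not taken from van Holland's post or from any particular matching engine.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts,
    deletes and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def bigrams(s: str) -> set:
    """Set of lower-cased two-character substrings of s."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Bigram overlap scored with the Dice coefficient (0.0 to 1.0)."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    if not grams_a and not grams_b:
        return 1.0  # two strings too short to have bigrams count as equal
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Hypothetical records: one value is a family name, the other a street name.
person = {"family_name": "Park"}
address = {"street": "Park"}

# Both functions only see two strings, so they report a perfect match
# even though the fields mean entirely different things.
distance = levenshtein(person["family_name"], address["street"])
similarity = dice_similarity(person["family_name"], address["street"])
```

Neither function knows which field a value came from. Unless something above them decides which attributes may be compared with which, a family name and a street name that happen to share a spelling look like a perfect match.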
Good point! In essence, this is the heart of the debate about matching approaches in customer data management. Intelligent, automated matching of records spread over various heterogeneous data sources is an essential prerequisite for correct and adequate customer data integration, and there are many opinions on how to achieve it.
In the theory of data matching, two methods generally prevail where customer data management is concerned: deterministic and probabilistic matching.