Short question, complex answer: Who is who and what is what in your database?


Any organization that deals with customer, prospect, supplier, distributor, product and service information uses all kinds of data in its day-to-day business processes. Identifying a customer or a product within an automated system, by a specific ID number, a name or any other identifying feature, is a key issue in these processes. It is also a task that needs considerable attention, since collecting and managing data is inherently error-prone. People make mistakes, names are misheard, digits are typed in the wrong order; there are simply too many sources of defective data and poor information quality.

The collective term ‘business data’ is often used without a precise notion of what business data actually contain. They are not just customer identification numbers and product codes. Naturally, the type and importance of the data used in a business process differ from organization to organization. However, a closer look at this seemingly endless variation shows that the names and addresses of persons and organizations are as detailed and complicated as they are identifying. The following classification shows the components of names, addresses and complementary data.

* In personal names we will encounter: given (first) names, middle names, initials, surnames, surname prefixes, surname suffixes, forms of address, titles, functions, qualifications, professions, patronymics and nicknames.

* The name of an organization can consist of virtually anything: legal forms, fantasy words, natural language words, personal names, numbers, Roman numerals, ordinals, letters, acronyms, geographical indications, suffixes, articles, prepositions, conjunctions, indications of the year of establishment and non-alphabetic characters.

* Postal address data combine recipient information with delivery points: countries, regions, towns, districts, proximate towns, delivery service indicators, delivery service qualifiers, postcodes, addressee and mailee indicators, thoroughfare names, thoroughfare types, house or plot numbers, house number additions, building names, building types and delivery point access data, such as wing, floor or door.

* Complementary data used in business processes include: phone numbers, fax numbers, e-mail addresses, dates of birth, contract dates, social media account IDs, product and brand names, product codes, product numbers, gender indication, financial data, lifestyle data and transaction data.
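To make “defining the data groups precisely” a little more concrete, the personal-name group above could be modelled as a typed record. This is a minimal sketch only: the class name `PersonalName` and its fields are hypothetical and cover just part of the list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonalName:
    """Hypothetical record for some of the personal-name components listed above."""
    given_names: List[str] = field(default_factory=list)
    initials: List[str] = field(default_factory=list)
    surname_prefix: Optional[str] = None   # e.g. "van", "de"
    surname: Optional[str] = None
    surname_suffix: Optional[str] = None   # e.g. "Jr."
    form_of_address: Optional[str] = None  # e.g. "Mr", "Ms"
    titles: List[str] = field(default_factory=list)  # e.g. "Dr"
    nickname: Optional[str] = None

    def full_surname(self) -> str:
        """Surname including its prefix, skipping parts that are absent."""
        parts = [self.surname_prefix, self.surname]
        return " ".join(p for p in parts if p)


name = PersonalName(given_names=["Winfried"], surname_prefix="van", surname="Holland")
print(name.full_surname())  # van Holland
```

Keeping the surname prefix in its own field means “van Holland” can be sorted under H or V as a given country’s convention requires, instead of being fixed at entry.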

Defining the data groups as precisely and in as much detail as possible is the first step towards useful interpretation. People, applying their natural language processing capabilities, structure information as they interpret it. They draw on their frame of reference, which includes their knowledge dictionary, their linguistic repository, and statistical and mathematical information.

Knowledge-based interpretation, incorporated in an automated system to solve data quality issues, must work in exactly the same way. Consider the following examples: Continue reading ‘Short question, complex answer: Who is who and what is what in your database?’

Has your name ever hurt you? – when nomen becomes omen

Addressing clients with the right data often makes the difference between making a profit and not making one. Working with data quality experts has made me ever more conscious of the value that personal data represent for people. In this respect, names especially intrigue me, as people appear to identify strongly with their names. So I decided to do a little research to find out whether people really are what their name says they are. Can nomen indeed become omen?

Your parents probably gave a lot of thought to the name they gave you, and as it turns out they were right to do so! Research tells us a name can do wonders for its owner, or a lot of damage for that matter. Let’s look at some remarkable results.

Peter for President!
Recent studies show that in the US a student named Fred is more likely to fail an exam than a student who happens to be named Andrew: people tend to identify with their name and generally feel positive about letters that match their initials. Consequently, Fred is far more likely to settle for a meager F, while Andrew has an extra motive to strive for an A. Continue reading ‘Has your name ever hurt you? – when nomen becomes omen’

Marketing? – Let your ingredients interact!


Over the years, Human Inference has carried out and supported research into the importance, impact and perception of customer data quality in business environments. This research shows that the perception of customer data quality is shifting. Broadly speaking, in the early years data quality was perceived as “something the IT department takes care of”, whereas nowadays more and more companies and organizations recognize the importance of customer data and information quality. Issues and initiatives such as the value of a single customer view, data integration, fraud prevention, customer relationship management, operational risk management, compliance and anti-terrorism have become boardroom themes. Continue reading ‘Marketing? – Let your ingredients interact!’

New white paper: First Time Right – Turning your customer data into customer lifetime value

As promised in my previous post “First Time Right – The customer perspective”, I’m writing this post to tell you about our new white paper. The paper describes the background, definition and business impact of applying the First Time Right principle in any organization. The First Time Right principle is the basis of your upstream and downstream data management. Making sure that data are correct, valid, complete and standardized when they are entered is the starting point of customer lifetime value. Applying the principle takes care of the quality of your data at the source, and will consequently have an increasingly positive effect on the overall data quality in your organization.
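As an illustration of what checking data “at the source” might look like, here is a minimal sketch that standardizes a record and reports problems before it is stored. The required fields, the function name `check_at_entry` and the use of the Dutch postcode format (“1234 AB”) are illustrative assumptions, not rules taken from the white paper.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule set: fields that must be present at the moment of entry.
REQUIRED_FIELDS = ("surname", "postcode", "house_number")

# Assumed example format: Dutch postcodes such as "1234 AB" (first digit not zero).
DUTCH_POSTCODE = re.compile(r"^([1-9][0-9]{3})\s?([A-Za-z]{2})$")


def check_at_entry(record: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Standardize a record and list what is missing or invalid before it is stored."""
    # Standardized: collapse leading, trailing and repeated whitespace in every field.
    clean = {k: " ".join(v.split()) for k, v in record.items()}
    # Complete: every required field must have a value.
    problems = [f"missing {f}" for f in REQUIRED_FIELDS if not clean.get(f)]
    # Valid: the postcode must follow the expected pattern.
    postcode = clean.get("postcode", "")
    match = DUTCH_POSTCODE.match(postcode)
    if postcode and not match:
        problems.append(f"invalid postcode {postcode!r}")
    elif match:
        # Store one standard notation: digits, a space, upper-case letters.
        clean["postcode"] = f"{match.group(1)} {match.group(2).upper()}"
    return clean, problems


clean, problems = check_at_entry({"surname": " Peters ", "postcode": "1234ab", "house_number": "12"})
print(clean["postcode"], problems)  # 1234 AB []
```

Returning the problems instead of raising lets an entry screen show all issues to the operator at once, which is where a First Time Right check pays off.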

The paper discusses business examples, the customer contact process, the interplay between people, process and technology, and the underlying concept of intelligent interpretation of customer data. In short, there are many ways to turn your data into customer lifetime value. The quickest, most efficient and most valuable way is to implement the First Time Right principle.

Please click here to download our white paper.

International data quality – Is a football always a football?


High-quality customer data have become a prerequisite for successful business decisions. To reach the intended data quality level, a lot of money is being invested in solutions for input control, file merging, data enrichment and duplicate identification. But do these investments guarantee high-quality data and information? For example, are data quality tools and processes equipped for the inevitable internationalization of our business community? Is a football always a football?

Natural language processing

How do we know that William Jones International Logistics Ltd and W. Jones Int. Transport Co. are probably different notations for the same company? How do we determine that Leonard is a given name in Leonard Peters and a surname in Leonard & Peters? Without being very aware of it, we use methods such as pattern recognition, context analysis and other linguistic considerations. To answer the question “what is what in customer data?”, people use their knowledge of language and culture to interpret the data they encounter in daily life. Continue reading ‘International data quality – Is a football always a football?’
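The Leonard example can be sketched as a toy context rule: a connector such as “&” suggests a partnership of surnames, while in a plain name a dictionary lookup identifies the given name. The word lists and the function `classify_tokens` are hypothetical; real interpretation needs far richer, culture-specific knowledge.

```python
from typing import Dict

# Hypothetical knowledge dictionary; a real system needs much larger, culture-specific lists.
GIVEN_NAMES = {"leonard", "william", "andrew"}
CONNECTORS = {"&", "and"}


def classify_tokens(name: str) -> Dict[str, str]:
    """Label each word of a name as 'given name' or 'surname' using simple context rules."""
    tokens = name.replace("&", " & ").split()
    words = [t for t in tokens if t.lower() not in CONNECTORS]
    if len(words) < len(tokens):
        # A connector between words suggests a partnership, e.g. a firm "Leonard & Peters".
        return {w: "surname" for w in words}
    labels = {}
    for i, w in enumerate(words):
        is_last = i == len(words) - 1
        # Non-final words found in the dictionary are given names; everything else is a surname.
        labels[w] = "given name" if not is_last and w.lower() in GIVEN_NAMES else "surname"
    return labels


print(classify_tokens("Leonard Peters"))    # {'Leonard': 'given name', 'Peters': 'surname'}
print(classify_tokens("Leonard & Peters"))  # {'Leonard': 'surname', 'Peters': 'surname'}
```

The same word gets a different label purely because of its context, which is the point the paragraph makes about human interpretation.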

High precision matching – apples, oranges or fruit salad?

In his excellent post “New matching engines go beyond apples and oranges”, Winfried van Holland states that traditional matching engines are based on atomic string comparison functions, such as match-codes, phonetic comparison, Levenshtein string distance and n-gram comparison. He further argues that the drawback of these functions is that it is not always clear for what purpose a particular function should be used, and that these low-level DQ functions cannot distinguish between apples and oranges: you end up comparing family names with street names.
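To see the problem, here is a sketch of two of the atomic functions mentioned: Levenshtein distance and a bigram (2-gram) similarity, here computed as a Dice coefficient. Neither knows what kind of value it is comparing, so the Dutch surname “Molenaar” and the street name “Molenweg” come out as fairly similar. The function names and the example values are illustrative choices.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set:
    """Set of overlapping two-character substrings, case-insensitive."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient on character bigrams: 0.0 (nothing shared) to 1.0 (identical sets)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# A surname compared with a street name: the functions cannot tell that this is meaningless.
print(levenshtein("Molenaar", "Molenweg"))                  # 3
print(round(bigram_similarity("Molenaar", "Molenweg"), 2))  # 0.57
```

Both scores would be perfectly sensible if the two values came from the same field; deciding which fields to compare has to happen at a higher level than the string function itself.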

Good point! In essence, this is the crux of the discussion on matching approaches in customer data management. Since intelligent automated matching of records spread over various heterogeneous data sources is an essential prerequisite for correct and adequate customer data integration, there are many opinions on how to achieve it.

When it comes to customer data management, theories on data matching generally distinguish two prevailing methods: deterministic and probabilistic matching. Continue reading ‘High precision matching – apples, oranges or fruit salad?’
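A rough sketch of the difference, assuming records are dictionaries of strings. The deterministic rule demands exact agreement on chosen key fields; the probabilistic score follows the Fellegi-Sunter idea of summing a log-likelihood weight per field. The field names, the m and u probabilities and the threshold are invented for illustration; in practice they would be estimated from data.

```python
import math
from typing import Dict, Tuple

Record = Dict[str, str]


def deterministic_match(a: Record, b: Record,
                        keys: Tuple[str, ...] = ("surname", "postcode", "house_number")) -> bool:
    """Rule-based: a match only if every key field is present and agrees exactly."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)


# Hypothetical (m, u) pairs: m = P(field agrees | same person), u = P(field agrees | different persons).
WEIGHTS: Dict[str, Tuple[float, float]] = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "postcode": (0.92, 0.001),
    "house_number": (0.90, 0.02),
}
THRESHOLD = 10.0  # hypothetical cut-off for declaring a match


def probabilistic_score(a: Record, b: Record) -> float:
    """Fellegi-Sunter style sum of log2 likelihood ratios over the compared fields."""
    score = 0.0
    for f, (m, u) in WEIGHTS.items():
        if a.get(f) is None or b.get(f) is None:
            continue  # a missing value counts as neither evidence for nor against
        if a[f] == b[f]:
            score += math.log2(m / u)              # agreement raises the score
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return score


a = {"given_name": "Andrew", "surname": "Jones", "postcode": "1234 AB", "house_number": "12"}
b = {**a, "house_number": "12a"}  # one typo-like difference
print(deterministic_match(a, b))              # False
print(probabilistic_score(a, b) > THRESHOLD)  # True
```

The deterministic rule rejects the pair over a single differing house number, while the probabilistic score still clears the threshold; how much weight each field deserves is exactly what the two approaches disagree about.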