Short question, complex answer: Who is who and what is what in your database?


Any organization that deals with customer, prospect, supplier, distributor, product and service information uses all kinds of data in its day-to-day business processes. Identifying a customer or a product within an automated system, using a specific ID number, the name or any other identifying feature, is a key issue in these processes. Furthermore, it is a task that needs considerable attention, since the collection and management of data is inherently error-prone. People make mistakes, names are misheard, numbers are typed in the wrong order; there are just too many reasons for defective data and poor information quality.

The collective term ‘business data’ is often used without a precise notion of what business data actually contain. It is not just customer identification numbers and product codes. Naturally, the kind and importance of the data used in a business process will differ from organization to organization. However, a closer look at the seemingly endless variation shows that the names and addresses of persons and organizations are as detailed and complicated as they are identifying. The following classification shows the details of names, addresses and complementary data.

* In personal names we will encounter: given (first) names, middle names, initials, surnames, surname prefixes, surname suffixes, forms of address, titles, functions, qualifications, professions, patronymics and nicknames.

* The name of an organization can consist of virtually everything: legal forms, fantasy words, natural language words, personal names, numbers, Roman numerals, ordinals, letters, acronyms, geographical indications, suffixes, articles, prepositions, conjunctions, indication of year of establishment and non-alphabetical signs.

* Postal address data combine recipient information with delivery points: countries, regions, towns, districts, proximate towns, delivery service indicators, delivery service qualifiers, postcodes, addressee and mailee indicators, thoroughfare names, thoroughfare types, house or plot numbers, house number additions, building names, building types and delivery point access data, such as wing, floor or door.

* Complementary data used in business processes include: phone numbers, fax numbers, e-mail addresses, dates of birth, contract dates, social media account IDs, product and brand names, product codes, product numbers, gender indication, financial data, lifestyle data and transaction data.
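The classification above can be sketched as a set of record types. This is a minimal illustration, not a complete model: the class and field names are assumptions chosen for this example, and only a handful of the listed components are included.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonName:
    # A few of the personal-name components listed above (hypothetical field names).
    given_names: List[str] = field(default_factory=list)
    patronymic: Optional[str] = None
    surname_prefix: Optional[str] = None      # e.g. "van der"
    surname: Optional[str] = None
    title: Optional[str] = None


@dataclass
class PostalAddress:
    # A few of the postal-address components listed above.
    thoroughfare_name: Optional[str] = None
    thoroughfare_type: Optional[str] = None
    house_number: Optional[str] = None        # kept as text, not as an integer
    house_number_addition: Optional[str] = None
    postcode: Optional[str] = None
    town: Optional[str] = None
    country: Optional[str] = None


@dataclass
class ContactRecord:
    # Name, address and some complementary data in one record.
    name: PersonName
    address: PostalAddress
    email: Optional[str] = None
    phone: Optional[str] = None


record = ContactRecord(
    name=PersonName(given_names=["Olga"], patronymic="Ivanovna", surname="Golubeva"),
    address=PostalAddress(town="London", country="United Kingdom"),
)
```

Keeping every component optional reflects the point made above: real records rarely contain all components, and which ones matter differs from organization to organization.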

Defining the data groups as precisely and in as much detail as possible is the first step towards useful interpretation. People, applying their natural language processing capabilities, structure information as they interpret it. They use their frame of reference, which includes their knowledge dictionary, their linguistic repository, statistical information and mathematical information.

Knowledge-based interpretation, incorporated in an automated system to solve data quality issues, must work in exactly the same way.

Rue sans nom – address certification in France

Despite the increasing use of email over the last decade in contacts between organizations and their relations, organizations still need to manage a huge number of postal and geographical addresses.

In an era where cost reduction and compliance policies are daily issues, it becomes essential for bulk mailers to have their relationship data as clean and accurate as possible. Duplicate detection tools, used to avoid data pollution, are definitely more effective if mailing addresses are correct.
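To illustrate why correct, consistently written addresses help duplicate detection, here is a minimal sketch that reduces two spellings of the same address line to one comparison key. The function name and the normalization steps are assumptions made for this example; they are not the AFNOR rules and not a real validation tool.

```python
import re
import unicodedata


def normalize_address(line: str) -> str:
    """Return a comparison key: no accents, no punctuation, upper case, single spaces."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", line)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not an ASCII letter, digit or space with a space.
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", " ", without_accents)
    # Upper-case and collapse runs of whitespace.
    return " ".join(cleaned.upper().split())


a = "12, rue de l'Église"
b = "12 RUE DE L EGLISE"
same = normalize_address(a) == normalize_address(b)
```

A plain string comparison would treat `a` and `b` as different addresses; after normalization they produce the same key, so a duplicate check can match them.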

In France, postal addresses must comply with the standard AFNOR XP Z 10-011, issued by the Association française de normalisation (AFNOR). This standard describes the rules for writing addresses. To help organizations get their addresses right, the Service national de l’adresse (SNA) publishes postal reference data. With this data it is possible to validate your own data or to develop address correction and validation tools.

But La Poste/SNA takes this a step further and offers services and advantages to organizations whose address data comply with the AFNOR standard. To that end, they have established a certification committee for address software. The SNA is responsible for carrying out the certification.

Data Value?

When I attended the ECCMA Data Quality Solution Summit in October 2012, I got into an interesting discussion on the quality of a specific customer data item. The actual point of the discussion was whether an address can be said to have quality when you do not know the intended use and value of that particular address. Intended use? Intended value? Yes indeed!

As the importance of data quality management is becoming more and more obvious to organizations, the question is no longer “should I manage my data?”, but “how do I manage my data?”. In other words: What is the value of data in the context of the customer’s solution? How is the customer going to use the data? And what is the consequent value of the data for that particular customer? Is the address mentioned in the discussion above going to be used for a geocoding solution or will it be a delivery address for a postal item?

I think that this is a very interesting way to look at data quality and data management. In his summit presentation, Walid el Abed of Global Data Excellence said that the value of data should be derived from the current or future outcome of the activities accomplished by using the data. In this context, he refers to a paradigm shift from KPI (key performance indicators) driven organizations to KVI (key value indicators) driven organizations.

I like that. At Human Inference we strive to enable organizations to benefit from personal and relevant interactions, based on trustworthy information. It is our translation of “bringing value to the customer’s data”.

International data quality

When I read Henrik Liliendahl Sørensen’s blog on cross border data quality, I made a mental note to write a follow-up blog, because his theme closely borders on a presentation I am preparing for the 2012 ECCMA Data Quality Summit in October. As it happened, the organization committee of the summit asked me to write an article on my upcoming presentation, and so I thought I’d combine my efforts and use the article as input for this blog.

As Henrik pointed out, there are a lot of data and information quality aspects to consider when crossing the border, and they cannot be solved by using domestic tools such as a national change of address service. Organizations are investing substantial amounts of money to deal with issues and initiatives such as the value of a single customer view, data integration, fraud prevention, operational risk management and compliance. But how do these investments equip companies for the inevitable internationalization of our business community?

Apparently, companies doing business abroad often forget that they are dealing with a large variety of languages, names, address conventions and other culturally embedded business rules and habits. If we take a look at European contact data diversity, here are a couple of examples:

The names Haddad, Hernández, Le Fèvre, Smid, Ferreiro, Schmidt, Kuznetsov and Kovács illustrate the variety of names in Europe. These names all mean “Smith” in different countries. Naturally, there is a large variety of names in the US as well, but the rules and habits concerning structure, storage, exchange and representation are far more intricate in the various European countries. Think of the use of patronymics (Sergei Ivanovich Golubev and Olga Ivanovna Golubeva), prefix sorting and different meanings for similar name components. An even greater challenge lies in the interpretation and processing of European postal addresses. The variety in address components and the differences in the order and formatting of these components are extraordinary.
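Prefix sorting can be illustrated with a short sketch. In Dutch directories, for instance, a surname such as "van der Berg" is conventionally filed under B rather than V, whereas Belgian practice typically files the same name under V. The prefix list and function names below are assumptions made for this example; a real system would need a far larger, country-specific list.

```python
from typing import Tuple

# Hypothetical, deliberately short list of surname prefixes, longest first.
PREFIXES = ("van der", "van den", "van de", "van", "de", "le")


def split_prefix(surname: str) -> Tuple[str, str]:
    """Split 'van der Berg' into ('van der', 'Berg'); the prefix is '' if none matches."""
    lowered = surname.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix + " "):
            return surname[: len(prefix)], surname[len(prefix) + 1:]
    return "", surname


def sort_key(surname: str) -> Tuple[str, str]:
    # Sort on the core name first and use the prefix only as a tie-breaker.
    prefix, core = split_prefix(surname)
    return core.lower(), prefix.lower()


names = ["van der Berg", "Smid", "Le Fèvre", "de Vries", "Schmidt"]
ordered = sorted(names, key=sort_key)
```

A naive alphabetical sort would place "van der Berg" and "de Vries" by their prefixes; sorting on the core name puts them under B and V respectively. Switching to the Belgian convention would only require a different `sort_key`, which is why the split and the sort are kept separate.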

Naturally, there are many more data and information quality aspects to consider when crossing the border. Think of multiple languages, character sets, privacy issues, and different currency and date notation. Companies working with international data are highly dependent on understanding name specifics, address conventions, languages, code pages, culture, habits, business rules and legislation.
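Two of these aspects, date notation and code pages, are easy to demonstrate. The sketch below uses made-up sample values: the same date string yields two different dates depending on the assumed notation, and a name stored as UTF-8 becomes garbled when read with the wrong code page.

```python
from datetime import datetime

# The same string is a different date under day-first and month-first notation.
raw = "03/04/2012"
day_first = datetime.strptime(raw, "%d/%m/%Y")    # read as 3 April 2012
month_first = datetime.strptime(raw, "%m/%d/%Y")  # read as 4 March 2012

# The same bytes are a different name under a different code page.
name = "Kovács"
utf8_bytes = name.encode("utf-8")
misread = utf8_bytes.decode("latin-1")   # wrong code page: the accented letter is garbled
restored = utf8_bytes.decode("utf-8")    # right code page: the original name
```

Neither operation raises an error, which is what makes these issues hard to spot: the data look valid, but they no longer mean what the sender intended.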

In my presentation during the 2012 ECCMA Data Quality Summit, I will address natural language processing methods and international name and address specifics. Furthermore, I will show some examples of the application of the insights with regard to international data quality. For more information, you can also check out our website.



I know where your house lives….

In an article I recently read, I came across a reference to a website called ‘WeKnowYourHouse’. This website is billed as a social networking privacy experiment that has been designed to show what could happen when you tweet about being at home with locations enabled, particularly from a mobile device. On the homepage there is a statement that says: Only the past hour of data is displayed, after that it is fully deleted to protect the users privacy. At first I thought that this was just another social media craze, but then I clicked on one of the tweets that are displayed on the site. The tweets are shown in full, whereas some crucial parts of the address are asterisked out.

It looks like this:

***risoncheney lives near *****ock Street, ***don, *** 0, United Kingdom @SeriyaEzig I don’t like being out in it but when your at home! It’s nice ha, and I love the winter! – about 14 minutes ago

When I clicked through, I found that the page for the tweet revealed far more information than I thought possible. Next to a Google Street Map showing the vicinity of the address, nearby places and an Instagram photo selection, there was also a list of recently committed crimes in the area. A further statement said: All of this information is publicly accessible – [WeKnowYourHouse] is simply pulling it all together to show what could happen when you tweet with your location or check in to somewhere. This page will be deleted one hour after it was created to preserve the users privacy.


This is a summary of where the latest crimes have been reported near this house. In total, 1168 crimes were committed.


Tabernacle Avenue
Tabernacle Avenue
Celandine Way
Newham Way
Beauchamp Road
Sports/recreation Area
The Broadway
Holland Road
Arragon Road
Comyns Close
…and 70 more


Credon Road
Boleyn Road
Gwendoline Avenue
Ismailia Road
Ismailia Road
Ismailia Road
Ismailia Road
Kings Court
Dacre Road
Bethell Avenue
…and 351 more

Criminal damage

Ismailia Road
Kings Court
Addington Road
Daisy Road
Clarence Road
Crescent Road
Dundee Road
Durban Road
Durban Road
Durban Road
…and 50 more


Tabernacle Avenue
Warmington Street
Tinto Road
Petrol Station
Petrol Station
Petrol Station
Petrol Station
Petrol Station
Petrol Station
Petrol Station
…and 166 more


Is this a good prank or is it a fair warning about the use of geolocation? I think the latter, because this experiment clearly shows the dangers of the ever-present urge to “share information”. The title of my blog refers to a Dutch saying, which is used as a humorous warning of an imminent visit. Once you actually choose to share your data, beware of the fact that they are really “out there”… It is exactly as WeKnowYourHouse says: We’ll show what could happen when you tweet with your location or check in to somewhere…


Single customer view for REAL customer interaction

Having worked in the data and information quality industry for quite some years now, I’ve noticed that our industry feels that there is an urgent need for new acronyms every couple of years. Here’s a small selection: CRM, ERP, BI, SaaS, CDI, MDM, FTR… Are you still with me? If so, you have probably been in this business for a substantial amount of time as well. As these acronyms mysteriously or automagically gain and lose popularity, I am now convinced that they all, more or less, serve the same purpose: They intend to be the “theoretical foundation” for solution selling.

Organizations spend a lot of time on optimizing their production chain, their invoicing processes and the quality of their customer database(s). For this, all kinds of tools and systems are being used (and the corresponding acronyms become popular ;-)). Some of these tools and systems are really intelligent, but the actual purpose of deploying them often gets lost in the process. We need to really interact with our customers to help them benefit from the solutions we offer! Of course, we will have to make all the information needed for customer interaction (social media data, invoicing data, transaction history data, etc.) available to everyone involved at all times.

Eventually, we all want to personalize our customer interaction. Make it a human interaction. Build a relationship… Well, I could go on explaining my views on this subject, but as it happens we have made a one-minute movie that explains it much better. Check it out. It’s worth it!