Post codes: new tool or old school?

For a long time the Irish postal services were my favourite kind: they didn’t have post codes in Ireland because “we don’t need them: our sorting machines are so sophisticated that we can sort the mail by using the street and place names only.” At least, this is what I remember having seen on their website a long time ago. Post codes are for losers, it implied.

But last week the news was that the Irish government issued a tender for a national post code project. The new tool should make services more efficient and delivery cheaper. Surrender to the post code system, as in the rest of Europe.

At the same time, there is a movement in the Netherlands that undermines the post code system. Post codes are not an “authentic” part of the “BAG”, the Common Key Registers for Addresses and Buildings. Government institutions in the Netherlands are required by law to use address data from the BAG, but post codes are issued by the privatised postal operator PostNL, and the two sometimes disagree. Some people have no post code because PostNL won’t issue one for them. Consequently, they are deprived of many services that require a post code, such as applying for government grants. When confronted with this problem, local authorities replied that people should switch to using “BAG IDs”: post codes will soon be “old school”.

Also posted on www.humaninference.de

Short question, complex answer: Who is who and what is what in your database?

Any organization that deals with customer, prospect, supplier, distributor, product and service information uses all kinds of data in its day-to-day business processes. Identification of a customer or a product within an automated system, using a specific ID number, the name or any other identifying feature, is a key issue in these processes. Furthermore, it is a task that needs considerable attention, since the collection and management of data is inherently error-prone. People make mistakes, names are misheard, numbers are typed in the wrong order; there are just too many reasons for defective data and poor information quality.

The collective term ‘business data’ is often used without a precise notion of what business data actually contain. It is not just the customer identification numbers and product codes. Naturally, the sort and the importance of data used in a business process will differ from organization to organization. However, a closer look at the seemingly endless variation will show that names and addresses of persons and organizations are as detailed and complicated as they are identifying. The following classification will show the details of names, addresses and complementary data.

* In personal names we will encounter: given (first) names, middle names, initials, surnames, surname prefixes, surname suffixes, forms of address, titles, functions, qualifications, professions, patronymics and nicknames.

* The name of an organization can consist of virtually everything: legal forms, fantasy words, natural language words, personal names, numbers, Roman numerals, ordinals, letters, acronyms, geographical indications, suffixes, articles, prepositions, conjunctions, indication of year of establishment and non-alphabetical signs.

* Postal address data combine recipient information with delivery points: countries, regions, towns, districts, proximate towns, delivery service indicators, delivery service qualifiers, postcodes, addressee and mailee indicators, thoroughfare names, thoroughfare types, house or plot numbers, house number additions, building names, building types and delivery point access data, such as wing, floor or door.

* Complementary data used in business processes include: phone numbers, fax numbers, e-mail addresses, dates of birth, contract dates, social media account IDs, product and brand names, product codes, product numbers, gender indication, financial data, lifestyle data and transaction data.
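To make the classification above a little more concrete, here is a minimal sketch of how a few of the name and address components could be held as separate fields instead of one free-text string. The class and field names are my own illustrative choices, not a standard schema, and they cover only a subset of the components listed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonalName:
    # Illustrative subset of the personal name components listed above.
    given_names: List[str] = field(default_factory=list)
    initials: Optional[str] = None
    surname_prefix: Optional[str] = None   # e.g. "van der"
    surname: Optional[str] = None
    surname_suffix: Optional[str] = None   # e.g. "Jr."
    title: Optional[str] = None            # e.g. "Dr."
    patronymic: Optional[str] = None

    def full_name(self) -> str:
        # Join the components that are present, in one possible display order.
        parts = [self.title, *self.given_names, self.patronymic,
                 self.surname_prefix, self.surname, self.surname_suffix]
        return " ".join(p for p in parts if p)


@dataclass
class PostalAddress:
    # Illustrative subset of the postal address components listed above.
    thoroughfare_name: Optional[str] = None
    thoroughfare_type: Optional[str] = None
    house_number: Optional[str] = None
    house_number_addition: Optional[str] = None
    building_name: Optional[str] = None
    postcode: Optional[str] = None
    town: Optional[str] = None
    region: Optional[str] = None
    country: Optional[str] = None


# Hypothetical example record.
name = PersonalName(given_names=["Jan"], surname_prefix="van der", surname="Berg")
print(name.full_name())
```

Keeping the components apart is what makes later steps such as matching, sorting and formatting possible; the display order in `full_name` is only one of many conventions.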

Defining the data groups as precisely and in as much detail as possible is the first step towards useful interpretation. People, applying their natural language processing capabilities, structure the information as they interpret it. They will use their frame of reference, which includes their knowledge dictionary, their linguistic repository, statistical information and mathematical information.

Knowledge-based interpretation, incorporated in an automated system to solve data quality issues, must work in exactly the same way.

International data quality

When I read Henrik Liliendahl Sørensen’s blog on cross border data quality, I made a mental note to write a follow-up blog, because his theme closely borders on a presentation I am preparing for the 2012 ECCMA Data Quality Summit in October. As it happened, the organizing committee of the summit asked me to write an article on my upcoming presentation, and so I thought I’d combine my efforts and use the article as input for this blog.

As Henrik pointed out, there are a lot of data and information quality aspects to consider when crossing the border, and they cannot be solved by using domestic tools such as a national change of address service. Organizations are investing substantial amounts of money to deal with issues and initiatives such as the value of a single customer view, data integration, fraud prevention, operational risk management and compliance. But how do these investments equip companies for the inevitable internationalization of our business community?

Apparently, a lot of companies doing business abroad forget that they are dealing with a large variety of languages, names, address conventions and other culturally embedded business rules and habits. If we take a look at European contact data diversity, here are a couple of examples:

The names Haddad, Hernández, Le Fèvre, Smid, Ferreiro, Schmidt, Kuznetsov and Kovács are illustrative of the variety of names in Europe. These names all mean “Smith” in different countries. Naturally, there is a large variety of names in the US as well, but the rules and habits concerning structure, storage, exchange and representation are far more intricate in the various European countries. Think of the use of patronymics (Sergei Ivanovich Golubev and Olga Ivanovna Golubeva), prefix sorting and different meanings for similar name components. An even greater challenge lies in the interpretation and processing of European postal addresses. The variety in address components and the differences in order and formatting of these components are extraordinary.
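Prefix sorting is a good illustration of how one list of names can legitimately be ordered in two ways. The sketch below assumes a small, incomplete list of surname prefixes and a hypothetical `ignore_prefix` switch: with it, a name like “van der Berg” files under B (as is common in the Netherlands); without it, the same name files under V (as is common in Belgium). The prefix list, function names and sample names are illustrative only.

```python
from typing import Tuple

# Illustrative, incomplete list of lowercase surname prefixes.
PREFIXES = ("van der", "van den", "van de", "van", "de")


def split_prefix(surname: str) -> Tuple[str, str]:
    """Split a surname into (prefix, main part); prefix is "" if none matches."""
    lowered = surname.lower()
    # Try the longest prefixes first so "van der" wins over "van".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if lowered.startswith(prefix + " "):
            return surname[:len(prefix)], surname[len(prefix) + 1:]
    return "", surname


def sort_key(surname: str, ignore_prefix: bool) -> str:
    """Return a case-insensitive sort key, optionally skipping the prefix."""
    _prefix, main = split_prefix(surname)
    return main.casefold() if ignore_prefix else surname.casefold()


names = ["van der Berg", "Smid", "de Vries", "Schmidt", "Kovács"]

prefix_ignored = sorted(names, key=lambda n: sort_key(n, True))
prefix_included = sorted(names, key=lambda n: sort_key(n, False))

print(prefix_ignored)
print(prefix_included)
```

A real system would also need locale-aware collation for accented characters; plain string comparison is used here only to keep the sketch short.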

Naturally, there are many more data and information quality aspects to consider when crossing the border. Think of multiple languages, character sets, privacy issues, and different currency and date notation. Companies working with international data are highly dependent on understanding name specifics, address conventions, languages, code pages, culture, habits, business rules and legislation.
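Date notation is an easy one to demonstrate. In the small sketch below, the same string is read once with a day-first pattern and once with a month-first pattern; the sample value is made up, and which pattern applies to a given record is exactly the knowledge a system has to bring along.

```python
from datetime import datetime

raw = "03/04/2012"  # made-up sample value

# Day-first reading (common in most of Europe): 3 April 2012.
day_first = datetime.strptime(raw, "%d/%m/%Y").date()

# Month-first reading (common in the US): 4 March 2012.
month_first = datetime.strptime(raw, "%m/%d/%Y").date()

print(day_first.isoformat(), month_first.isoformat())
```

Both readings are valid dates, so nothing fails; the error only shows up later, which is what makes this kind of ambiguity costly.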

In my presentation during the 2012 ECCMA Data Quality Summit, I will address natural language processing methods and international name and address specifics. Furthermore, I will show some examples of the application of the insights with regard to international data quality. For more information, you can also check out our website.