Post codes: new tool or old school?

For a long time the Irish postal services were my favourite kind: they didn’t have post codes in Ireland because “we don’t need them: our sorting machines are so sophisticated that we can sort the mail by using the street and place names only.” At least, this is what I remember having seen on their website a long time ago. Post codes are for losers, it implied.

But last week the news was that the Irish government issued a tender for a national post code project. The new tool should make services more efficient and delivery cheaper. Surrender to the post code system, as in the rest of Europe.

At the same time, a movement in the Netherlands is undermining the post code system. Post codes are not an “authentic” part of the “BAG”, the Common Key Registers for Addresses and Buildings. Government institutions in the Netherlands are required by law to use address data from the BAG, but post codes are issued by the privatised postal operator PostNL, and the two sometimes disagree. Some people have no post code because PostNL won’t issue one for them. Consequently, they are deprived of many services that require a post code, such as applying for government grants. When confronted with this problem, local authorities replied that people should switch to “BAG IDs”: post codes will soon be “old school”.


People are strange(rs)

While I was reading Peter Hesselinks’s blog post, I felt an immediate urge to listen to The Doors, without a doubt one of the most influential rock bands of the last century. Listening to my iPod I came across another fitting “lyrics analogy”, which I found quite suitable as a title for this post….

The concept of recognizing and knowing your customer is, in essence, an ancient concept. Having a clear view of who your customer is and what he or she is actually buying (or intending to buy), has proven to be a serious business advantage over the years. You do not want your customer to feel like a stranger.

In the 1960s it was quite common for the dairyman or the milkman to deliver from door to door. He usually knew how much milk and other products every family wanted. If he had accidentally delivered curdled milk, you would have made sure to tell him the next day. The milkman would almost automatically hear when a family moved to another house, or when it was a child’s birthday, for which he would then bring a special treat… This was a convenient and sustainable way of doing business.

Nowadays, we live in a multi-channel society in which customers are used to doing business in a variety of ways, which of course is far less transparent than the situation described above.

However, be it in a shop or through a website, the wishes of today’s customers are essentially not that different from those of customers some fifty years ago: they still want to be recognized, they still prefer a personal approach, and they do not want to have to spend time informing you of a simple move or the purchase of another product.

But for the businesses serving that customer, a great deal has changed. Customer data is stored in a CRM system, complaints in the complaints database, the payment history in financial software and the order history in an ERP suite. This fragmentation of information causes problems for the single customer view. And these problems affect virtually every area of your business’s value chain, from primary activities like inbound and outbound logistics, marketing, sales and operations to supporting activities like procurement and human resources. Does the following list ring any bells?

  • Adding the same customer information manually in multiple databases
  • Building workarounds for customer data problems
  • Searching for missing data
  • Manually enriching customer data in one system or the same customer data in multiple systems
  • Assembling customer data across disintegrated databases

The solution to these problems lies in Master Data Management (MDM) of customer data. MDM enables companies to truly serve their customers by having the information they need at their fingertips, when they need it. Start your single customer view today.


International data quality

When I read Henrik Liliendahl Sørensen’s blog on cross border data quality, I made a mental note to write a follow-up blog, because his theme closely borders on a presentation I am preparing for the 2012 ECCMA Data Quality Summit in October. As it happened, the organizing committee of the summit asked me to write an article on my upcoming presentation, and so I thought I’d combine my efforts and use the article as input for this blog.

As Henrik pointed out, there are a lot of data and information quality aspects to consider when crossing the border, and they cannot be solved by using domestic tools such as a national change of address service. Organizations are investing substantial amounts of money to deal with issues and initiatives such as the value of a single customer view, data integration, fraud prevention, operational risk management and compliance. But how do these investments equip companies for the inevitable internationalization of our business community?

Apparently, a lot of companies doing business abroad often seem to forget that they are dealing with a large variety of languages, names, address conventions and other culturally embedded business rules and habits. If we take a look at European contact data diversity, here are a couple of examples:

The names Haddad, Hernández, Le Fèvre, Smid, Ferreiro, Schmidt, Kuznetsov and Kovács are illustrative of the variety of names in Europe. These names all mean “Smith” in different countries. Naturally, there is a large variety of names in the US as well, but the rules and habits concerning structure, storage, exchange and representation are far more intricate in the various European countries. Think of the use of patronymics (Sergei Ivanovich Golubev and Olga Ivanovna Golubeva), prefix sorting and different meanings for similar name components. An even greater challenge lies in the interpretation and processing of European postal addresses. The variety in address components and the differences in order and formatting of these components are extraordinary.

Naturally, there are many more data and information quality aspects to consider when crossing the border. Think of multiple languages, character sets, privacy issues, and different currency and date notation. Companies working with international data are highly dependent on understanding name specifics, address conventions, languages, code pages, culture, habits, business rules and legislation.

In my presentation during the 2012 ECCMA Data Quality Summit, I will address natural language processing methods and international name and address specifics. Furthermore, I will show some examples of the application of the insights with regard to international data quality. For more information, you can also check out our website.



I know where your house lives….

In an article I recently read, I came across a reference to a website called ‘WeKnowYourHouse’. This website is billed as a social networking privacy experiment designed to show what could happen when you tweet about being at home with location enabled, particularly from a mobile device. On the homepage there is a statement that says: Only the past hour of data is displayed, after that it is fully deleted to protect the users privacy. At first I thought that this was just another social media craze, but then I clicked on one of the tweets displayed on the site. These tweets are shown in full, whereas in the address some crucial parts are asterisked out.

It looks like this:

***risoncheney lives near *****ock Street, ***don, *** 0, United Kingdom @SeriyaEzig I don’t like being out in it but when your at home! It’s nice ha, and I love the winter! – about 14 minutes ago

When I clicked through, I found that the information attached to the tweet was far more than I thought possible. Next to a Google Street Map showing the vicinity of the address, nearby places and an Instagram photo selection, there was also a list of recently committed crimes in the area. A further statement said: All of this information is publicly accessible – is simply pulling it all together to show what could happen when you tweet with your location or check in to somewhere. This page will be deleted one hour after it was created to preserve the users privacy.


This is a summary of where the latest crimes that have been reported nearby this house. In total 1168 crimes were committed.


Tabernacle Avenue
Tabernacle Avenue
Celandine Way
Newham Way
Beauchamp Road
Sports/recreation Area
The Broadway
Holland Road
Arragon Road
Comyns Close
…and 70 more

» More detailed crime data

Credon Road
Boleyn Road
Gwendoline Avenue
Ismailia Road
Ismailia Road
Ismailia Road
Ismailia Road
Kings Court
Dacre Road
Bethell Avenue
…and 351 more

» More detailed crime data

Criminal damage

Ismailia Road
Kings Court
Addington Road
Daisy Road
Clarence Road
Crescent Road
Dundee Road
Durban Road
Durban Road
Durban Road
…and 50 more

» More detailed crime data

Tabernacle Avenue
Warmington Street
Tinto Road
Petrol Station
Petrol Station
Petrol Station
Petrol Station
Petrol Station
Petrol Station
Petrol Station
…and 166 more

» More detailed crime data

Is this a good prank or a fair warning about the use of geolocation? I think the latter, because this experiment clearly shows the dangers of the ever-present urge to “share information”. The title of my blog refers to a Dutch saying, which is used as a humorous warning of an imminent visit. Once you actually choose to share your data, beware that they are really “out there”… It is exactly as WeKnowYourHouse says: We’ll show what could happen when you tweet with your location or check in to somewhere…


Boundless Search

We live with restrictions every day.

  • A rafter blocks my cellar stairs, so I always bend when I enter.
  • At the end of the street a barking dog runs to the fence whenever I pass, so I always cross the street just before I reach the dog.

We learn to live with restrictions and they become a habit. After a while you just stop realizing why you do things the way you do. The rafter has been removed, and the dog has died. Then why do I still bend on the cellar stairs, and why do I still cross the street before I reach the end? Recently I was confronted with similar obsolete restrictions at Human Inference customer support.

A rafter is visible and a barking dog can be heard, so it doesn’t take long before my habits change to fit the new situation. It is different, however, for technical restrictions that disappear without you ever getting to know. When I build descriptions for more than a million source records in a SQL Server database, I automatically switch from the free SQL Server Express edition to an Enterprise edition. A customer decided to build descriptions for 7 million source records in a SQL Server Express edition. I was rather surprised that the build succeeded in the end. It turned out that SQL Server 2008 R2 Express has a maximum database size of 10 GB, compared to 4 GB in SQL Server 2005 Express. As it turned out, I had been crossing the street for years to hide from a dead dog.

Searching with HIquality Identify

HIquality Identify is used for search, deduplication and data matching. To return search results quickly, subsets are used to preselect records for evaluation. Before the actual search, each subset is checked against the maximum number of evaluations set in the configuration. When a subset exceeds this maximum, it is skipped. When all subsets are skipped, a message is returned indicating that not enough search data was entered. Searching for Müller with house number 9 in the whole of Germany, for example, will preselect too many candidates and is not very useful even when results are found. Besides the number of candidates per subset, the total number of evaluations across all subsets is checked against the maximum. This maximum is not allowed to exceed the magical limit of 32768.
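The preselection checks described above can be sketched roughly as follows. This is a minimal illustration and not HIquality Identify's actual implementation: the function `select_subsets`, the exception `NotEnoughSearchData`, the constant `HARD_LIMIT` and the dictionary of subset sizes are all hypothetical names. The text does not say what happens when the combined total is too large, so rejecting the search in that case is an assumption as well.

```python
HARD_LIMIT = 32768  # upper bound for the configured maximum, as stated in the text


class NotEnoughSearchData(Exception):
    """Hypothetical error: no usable subset remains for this search."""


def select_subsets(subset_sizes, max_evaluations):
    """Return the names of the subsets that may be evaluated.

    subset_sizes maps a (hypothetical) subset name to its candidate count.
    """
    if max_evaluations > HARD_LIMIT:
        raise ValueError(f"maximum may not exceed {HARD_LIMIT}")

    # Skip every subset that on its own exceeds the configured maximum.
    usable = {name: n for name, n in subset_sizes.items() if n <= max_evaluations}
    if not usable:
        raise NotEnoughSearchData("not enough search data was entered")

    # Assumption: if the combined total is too large, the search is
    # rejected as well. The text only says that the total is checked.
    if sum(usable.values()) > max_evaluations:
        raise NotEnoughSearchData("too many candidates in total")

    return sorted(usable)


# A broad subset (name only) is skipped; the narrow one is kept.
print(select_subsets({"name": 50000, "name+postcode": 12}, 1000))
```

With these made-up numbers, only the narrow subset survives, which mirrors the Müller example: a search that is too broad contributes nothing, and if every subset is too broad the caller is asked for more search data.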


We make ‘null’ mistakes

Wherever software is created, mistakes are made. Software providers often presume their products are bug-free, but software of that kind doesn’t exist. Our department works hard to prevent bugs, yet in our HIquality Life Cycle new ones can still be introduced, even in the oldest modules that have been in use for over 25 years.

HIquality bug cycle

Usually our customers are satisfied with our product suite. At customer support, however, I never receive information about the successful implementations. I get to know our software through the problems that occur, and in almost 15 years of acceptance testing and customer support, I’ve seen all kinds of bugs pass by.
Software crashes and never-ending loops are nasty. Worse are the bugs that are barely visible at first but keep growing over time.
Recently we caught such a bug in our longest-existing product, HIquality Identify.