Post codes: new tool or old school?

For a long time the Irish postal services were my favourite kind: they didn’t have post codes in Ireland because “we don’t need them: our sorting machines are so sophisticated that we can sort the mail by using the street and place names only.” At least, this is what I remember having seen on their website a long time ago. Post codes are for losers, it implied.

But last week the news was that the Irish government issued a tender for a national post code project. The new tool should make services more efficient and delivery cheaper. Surrender to the post code system, as in the rest of Europe.

Simultaneously there is a movement in the Netherlands which undermines the post code system. Post codes are not an “authentic” part of the “BAG”, the Common Key Registers for Addresses and Buildings. Government institutions in the Netherlands are forced by law to use address data from the BAG, but post codes are issued by the privatised postal operator PostNL; and they disagree sometimes. Some people have no post code because PostNL won’t issue one for them. Consequently, they are deprived from a lot of services requiring a post code, such as applying for government grants. When confronted with this problem, local authorities replied people should turn to using “BAG Id’s”: post codes will soon be “old school”.

Also posted on


People are strange(rs)

While I was reading Peter Hesselinks’s blog post, I felt an immediate urge to listen to The Doors, without a doubt one of the most influential rock bands of the last century. Listening to my iPod I came across another fitting “lyrics analogy”, which I found quite suitable as a title for this post….

The concept of recognizing and knowing your customer is, in essence, an ancient concept. Having a clear view of who your customer is and what he or she is actually buying (or intending to buy), has proven to be a serious business advantage over the years. You do not want your customer to feel like a stranger.

In the 1960’s it was quite common that the dairyman or the milkman would deliver from door to door. He usually knew how much milk and other products every family wanted. If he had accidentally delivered curdled milk, you would have made sure to tell him the next day. The milkman would almost automatically be informed if a family would move to another house or when it was some child’s birthday for which he consequently would have brought a special treat… This was a convenient and survivable business situation.

Nowadays, we live in a multi-channel society in which customers are used to do business in a variety of ways, which of course is far less transparent than the situation described above.

However, be it in a shop or through a website; essentially the current customer wishes are not that different than these of the customers of some fifty years ago: They still want to be recognized, they still prefer a personal approach, and they do not want to have to spend time informing you of a simple move or the purchase of another product.

But for the businesses serving that customer, a great deal has changed. Customer data is stored in a CRM system, complaints in the complaints database, the payment history in financial software and the order history in an ERP suite. This information fragmentation leads to problems with regard to the single customer view. And these problems impact virtually every area of the value chain of your business. From primary activities like inbound- and outbound logistics, marketing, sales and operations to supporting activities like procurement and human resources. Does the following list ring any bells?

  • Adding the same customer information manually in multiple databases
  • Building workarounds for customer data problems
  • Searching for missing data
  • Manually enriching customer data in one system or the same customer data in multiple systems
  • Assembling customer data across disintegrated databases

The solution to these problems lies in Master Data Management (MDM) of customer data. MDM enables companies to truly serve their customers by having the information they need at their fingertips, when they need it. Start your single customer view today.


Hello, I X U, Won’t you tell me your name?

This is a guest post by Peter Hesselink of SDL Content Management Technologies

You want to address visitors in a proper way. Preferably by calling them by their name, showing that you know them, make them feel welcome, indicate you want to have a dialogue with them.

Hello, I welcome you, won’t you tell me your name?

When people visit your site they remain anonymous until they ‘reveal’ themselves. Registering, logging in, etcetera. Until that time they stay visitors or even ‘strangers’.

Hello, I like you, won’t you tell me your name?

You can tell a lot about people by their behaviour, from their interaction with your website. And if you like what you are seeing, you really would like to get to know them personally. Maybe they are already known to you, your customer, but you don’t know that, are unaware of the fact or unable to ascertain.

They even may like you as well, and their ‘like’ makes it possible to reach them, but not at that moment, with a personalized message.

Hello, I know you, won’t you tell me your name?

This is what happens all the time: people revisit your website, which you (can) know thanks to the wonderful invention of cookies. The whole day a lot of UVO’s (Unidentified Visiting Objects – people, robots and crawlers) come to your website. And you know them, because you recognize them by their cookie or even IP address. You can even serve them personalized content, but still can not address them by their name. You do not really ‘know’ these unidentified individuals, these UVP’s who may be VIP’s to your company.

Now it can become embarrassing. You probably also have had the experience of meeting somebody who you have spoken to before, but cannot remember their name anymore, I have had these experiences … Because you met them in a different place, and/or long ago (your memory is not what it used to be anymore).

This can happen on your website: you know they are there, because they have (re-)registered or logged-in, or from previous behaviour or characteristics. But you do not ‘recognize’ them or show this by personalizing the website content.

This all can be caused due to the fact that information is being collected online, through the website, email, or other means, on different moments, stages and places in the customer journey. And stored in different systems and databases. The total system cannot ‘recollect’ it.

It becomes annoying when for instance a customer has to resubmit information which he or she has provided before. They become frustrated and dissatisfied.

Having to ask again … Or not knowing that it is the same person as the one already stored in your database …

A registration could do the trick, after having logged with their user name and password in you are better able to meet their expectations. Or by letting the customer provide unique, identifiable information, like a customer number or something alike. But this does not make it more (user) ‘friendly’.

Hello, I love you, won’t you tell me your name?

That only works when you are somebody like Jim Morrison of the Doors …

Ambient recognition

Data is being collected at several touch points, moments, implicit and explicit (watch out for another post about implicit and explicit profiling) et cetera and stored in different systems, databases and so on. Having a ‘single view of the customer’ is a challenge. But customers expect that companies have this. They hate it when they have the idea that the company does not ‘know’ them, cannot recollect it.

From back in the days I was working at Acxiom Corporation here in The Netherlands I know the challenge in getting the right name and address information and getting it right (the area of DQM and MDM).

Recently I attended an event of the Dutch Dialogue Market Association, titled Data & Dialogue, chaired by Holger Wandt, principal advisor at Human Inference, an expert in the field of Data Quality and Integration (Gartner thinks so as well, named HI in their Magic Quadrant for data quality tools  as a visionary). We spoke afterwards about these challenges.

So how to achieve this? Well, by best combining ‘all worlds’. In a next post on I will draw an outline of such a solution making use of Human Inference tools and the Tridion Ambient Data Framework (watch the video).

Cleaning Danish customer data with the CVR registry and DataCleaner, part 3 of 3

In my previous two blogs I’ve dealt with the background of the Danish CVR registry being made available to everyone (and more public registries to come) as well as how to do the initial pre-processing steps with DataCleaner to fetch CVR data about your customers. We ended up with a CSV file from the CVR service that looks like this:

In this blog entry I will go through the process of integrating this data in your customer database. Let’s first visit a few examples of why that would be relevant:

  • You can make sure that your address is up-to-date and that letters and invoices to your customer reach their intended destination.
  • You can add more communication channels. The CVR registry contains email, phone and other basic communication information. Having this information available will improve the capabilities and reach of your support or customer service teams.
  • You can make it really easy for your customer. Instead of asking the customer for all kinds of information again and again, you can enter the customer’s name and let the data quality infrastructure handle the rest.
  • You can gain valuable insights with Business Intelligence tools, integrating the customer industries, company size and other metadata about the companies – provided through CVR.

The first step is to register our new file as a datastore in DataCleaner. That part is easily done using the icon that says CSV on the front screen of the application. Once that is done, like previously, we will connect to our customer database with DataCleaner using the ‘Analyze’ button. What we want to do is technically to join the database table with the CSV file. For this we can use the ‘Table lookup’ component, found in the “Transformation” menu. Here’s how I’ve set up the Table lookup:

In the screenshot above you can see that I’ve set the condition to be that “cvr” from the database needs to be the same as the “cvrnr” field of the file. In the output I’m interested in getting the following fields from the CSV file: Name (“navn”), address (“adresse”), zip code (“postnr”), phone number (“telefon”) and email address.

Now let’s see a preview of this Table lookup transformation:

There’s a few interesting things to notice here:

  • Not all the CVR numbers yielded a result. The last record in the example above shows only  <null> values. This means that the CVR number was not found in the file. 
  • If we compare with the input, the names are not totally alike, but pretty close. The same holds for addresses, zip codes and so on. 
  • Potentially we could face a situation where the number itself had a typo in it. Then we would get another company’s information. That situation could be quite severe, so we would want to do some checks on e.g. the similarity of the name before we automatically update our customer database!

Let’s look at these issues step by step. To check the similarity, we want to get a score. A simple and free mechanism for similarity scoring / matching is available for DataCleaner through the “EasyDQ additionals” extension.

  • Go to the EasyDQ additionals page on the DataCleaner website.
  • Click the “Install” button on that page. DataCleaner will now flash and ask for permission to install. Give the permission.
  • Once installed, click “Transform -> EasyDQ additionals -> Similarity evaluator”.
  • Now set up the similarity check so that you compare e.g. names and addresses:

If we preview this transformation’s data, here’s what we see:

Notice the Similarity score to the right. This is a score of the textual differences between the different values (0 = no similarity, 100 = textual equality). Obviously, we can set up different business rules based on this score. As you can see even in basic company names, there might be quite a difference between what you already know and what the “official” truth is. Therefore it is normal to only automate up to a certain point – many of these updates need to be validated by a human being first (or richer matching capabilities, for instance provided by Human Inference’scommercial add-ons).

 Let’s suppose we want a business rule like this:

When similarity score is less than 40, there’s probably a typo in the registered CVR number or the related name and address. In such cases we want to raise an alert to the customer’s account managers or a data steward team.

  • When the similarity score is above 85, we want to automatically update the customer database.
  • Anything in between a similarity score of 40 and 85, we want to put it into a staging table where it can be reviewed and propagated by a data inspection team.

For the sake of this blog entry’s scope and length, I am only going to demonstrate the automatic update (similarity score above 85). The approach for handling the remaining subroutines should become visible through this example.

First, I will add a filter to check the value of the similarity score. Using this filter we will direct the further action to be taken:

Click “Transform -> Filter -> Number range”. 

  • On the configuration panel of the Number range filter, select the Similarity score column.
  • Set the “Lowest value” to 40, and the “Highest value” to 85.

  • Now click the “HIGHER” button and select “Write data… -> Update table”. 
  • On the configuration panel of the Update table component, select the customer database as the target datastore, and the schema and table of your customer data.
  • For the “Update condition” panel, select the id of both source and target to be mapped together. This will ensure that the update targets the same record as the input record.
  • For the “Values to update” panel, select the lookup values and map them back to the field names in the database.
  • After configuration, your “Update table” configuration panel will look like this:
Now if you click execute, you will automatically update the customer database with updated information from CVR! In many real-life scenarios, I would add additional information to the table as well. Obviously that requires a bit more, since you need to add new columns to the table definitions. Additional information would typically include a timestamp for the latest CVR check, a log of changes and/or overridden values and newly found information from the CVR export – such as industry type, phone number, email, number of employees and more. These choices are always important to think of, but depend a lot on the level of data management you wish to apply to your data.

I hope these three blog entries have been both educational and interesting to read. As more and more open data becomes available to everyone, I see a natural fit for open source software to provide the processing power to handle it. In the DataCleaner team we will follow this trend closely and provide you with the best tooling to utilize the universe of data that is rapidly becoming available in nations, industries, communities and cyberspace.



Cleaning Danish customer data with the CVR registry and DataCleaner, part 2 of 3

In my previous blog post we learned that the Danish government has initiated a progressive strategy to make basic public data freely available. Now let’s see how we can use this data to cleanse our data. I will be using DataCleaner to pre- and post-process data needed for the cleansing operation.

We’ll assume you have some customer database with (among other information) a CVR number for (some of) your customers. Here’s a small example MySQL database table that I will use in this blog:

Alternatively you can also use a “P number”, identifying the physical location of the company. As a preparation step we need to build a list of these numbers only, which will be sent to the CVR “Masseudtræk” (mass extraction) service.

Download DataCleaner, start it up and register your customer database as a datastore. We will connect to the MySQL database, containing customer information, but you should connect to whatever datastore your customer information resides in:

Once the database is registered as a datastore, select it with the checkbox to the left , and click the big “Analyze!” button. This will open up the job editing mode of DataCleaner. Expand the tree on the left side to expose your table structure. Here’s what this looks like on my example database:

To create the list of CVR numbers we only need the column containing this information. In my case, this is the “cvr” column. So I double click this column, which adds it to the “Source” tab in the right side of the screen. Before we export the list, we want to make sure we only get the Danish customers in our result. So I also add the country column:

To perform the exclusion of non-Danish countries, we need to use an Equals filter. Click “Transform -> Filter -> Equals”. This will add a new tab, in which we can select the column to do filtering on, as well as the number of values that we’re looking for. In my database it seems that the country column contains both “Denmark” and “DK” values, so I add them both:

With this filter applied, we’re actually ready to write the export file. We’re interested in all records that evaluate “VALID” in this filter (e.g. equal to either “DK” or “Denmark”), so click this button and select “Set as default requirement”. Now click the “Write data” item in the top menu, and select “Create CSV file”:

This will bring up a new tab for configuring the exported CSV file. Fill in a name of the CSV file, and select a whitespace as the “Quote character” (since the CVR system does not currently support quote characters in the input):

Now click the big “Execute” button. The job will run and produce a text file similar to:

Notice the initial “cvr” header. This is added by DataCleaner to keep track of column names, but in this context it is a little inconvenient, since the CVR system requires no headers. So delete that line.

Now you can go to the CVR website and click “Masseudtræk”. Upload your file using the web form, and wait for it to be processed:

Note: Lately I’ve experienced quite some downtime on this service. So if you get an error message at this point, it is not uncommon at the moment. I imagine the CVR registry will become more stable as soon as  they get used to the new load on their servers ;-)

When the file has been processed, the resulting company information will be available for you in your email inbox. Click the link in the email and download the file on your local disk. It will look something like this, if you open it in a text editor:

Now you have an up-to-date, high-quality file with company information for your customers. This includes both information that you already might have (like address information), but potentially also new data that you did not already have (like phone number, industry code, number of employees etc.).

In my next blog post I will go through the process of writing this information to your customer database – updating it with the most recent information, and enriching it with information that was never there before!

Cleaning Danish customer data with the CVR registry and DataCleaner, part 1 of 3


Recently the Danish government has initiated a strategy to open up and make freely available basic public master data on a wide range of areas such as company registrations (CVR), social security (CPR), real estate, employment, environment and more. The initiative’s name is “Good basic data for everyone – a driver for growth and effeciency” – read the full report here.

Over the coming years this means that public registries will be made available to the public. Of course privacy laws are still active and the data is exposed in a manner where access is only granted in relationship with specific search criteria. Still, you could say that this is a huge step toward liberalizing and making available public data for everyone.

At Human Inference we recognize the value of good data quality, and we believe that this initiative by the Danish government will do a lot of good for both the public sector and the private sector in Denmark. In the government report, it is estimated that more than 100 million EUR per year in costs will be saved for the private and public sector in Denmark.

The timeline of the initiative spans many years of work, and it will be interesting to see how the end result is shaped. A notion of a central Data Distributor service is included in the plans, which will make data access streamlined and uniform to further improve availability. Personally I think this is a major step forward, and I cannot wait for this plan to be realized.


The first steps have already been taken though! As per the first of January 2013, the price of accessing the CVR (company identification) registry was set to 0 DKR – free for everyone who accepts terms and agreements. CVR, in my opinion, is one of two very central registries when it comes to common data quality issues in Denmark. The other one is the CPR registry for person identification and social security, which according to plans will be opened in 2014 using the new Data Distributor platform.


So how do you start using the CVR data? In the my next blog post (which will appear net Monday) I will give you a tutorial on how to cleanse your company customer data with CVR, using DataCleaner for all the pre- and post-processing needed. So stay tuned for part 2, which will be a real hands-on walkthrough!