How-to create the Golden Record


The term Golden Record is closely related to Customer Data Integration or MDM for Customer data. It refers to the “single truth” which has been created or calculated from all those duplicate customer records from different systems. This post is not about finding or tagging all those duplicate records. There are all kinds of ways to find them, using advanced statistical methods, fuzzy matching etc.

But what do you do once you have found the duplicates? How do you create the best possible customer data out of all gathered elements?
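To make the question a bit more concrete, here is a minimal sketch of one possible approach, often called survivorship: for each field, pick the value that occurs most often among the duplicates, and break ties by taking the most recently updated record. The rule itself, the function name `build_golden_record`, the field names and the sample records are all assumptions made up for illustration, not a description of any particular MDM product.

```python
from collections import Counter
from datetime import date


def build_golden_record(duplicates, fields):
    """Merge duplicate records into one record, field by field.

    Assumed rule: the most frequent non-empty value wins;
    ties are broken by the most recent 'updated' date.
    """
    golden = {}
    for field in fields:
        # Only records that actually carry a value for this field can vote.
        candidates = [r for r in duplicates if r.get(field)]
        if not candidates:
            golden[field] = None
            continue
        counts = Counter(r[field] for r in candidates)
        best = max(candidates, key=lambda r: (counts[r[field]], r["updated"]))
        golden[field] = best[field]
    return golden


# Hypothetical duplicates of one customer, coming from three systems.
duplicates = [
    {"source": "crm", "updated": date(2009, 3, 1),
     "name": "J. Jansen", "city": "Putten", "phone": ""},
    {"source": "billing", "updated": date(2008, 6, 15),
     "name": "Jan Jansen", "city": "Putten", "phone": "0341-000000"},
    {"source": "webshop", "updated": date(2009, 5, 20),
     "name": "Jan Jansen", "city": "Amsterdam", "phone": ""},
]

golden = build_golden_record(duplicates, ["name", "city", "phone"])
print(golden)
```

Note that the resulting record does not have to match any single source record: here the name, the city and the phone number can each come from a different system. Whether frequency, recency or trust in a particular source should win is a business decision, and it may differ per field.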

Confusing streetnames ending in an unfortunate fatality

Just a few days ago I wrote about the many standards we have for streetnames in the Netherlands. But on top of that, new streetnames are added constantly for newly built neighbourhoods. Sometimes this also results in existing streetnames being changed. This was also the case last week, when rescue workers were not able to find the exact location in Putten. An emergency call was made for a 60-year-old man who suffered from heart failure. People who tried to resuscitate the man heard the ambulance passing by, but they didn’t see it. In the end the ambulance arrived after 19 minutes, too late to save the man’s life. This is a very unfortunate accident and an investigation has been started to find out what exactly went wrong. Preliminary results show that the navigation systems of both the police and the ambulance were not up-to-date.

I have looked at the location using Google Maps. Normally you expect a street to consist of one thoroughfare. But in this case the street, named “Kraakweg”, consists of three different parts, which are clearly not in one direct line. I have marked them 1, 2 and 3. Number 4 indicates another street, with almost the same name: “De Kraak”.



Any close encounters with the FBI terrorist watchlist?

Just before this summer the U.S. Department of Justice filed a report about the FBI Terrorist Watchlist. This watchlist serves as a critical tool for screening and law enforcement personnel, alerting them when they come across a known or suspected terrorist. It is used by personnel at airports, harbours and border crossings. Also, when you apply for a visa you are matched against this watchlist. The Terrorist Screening Center, which is administered by the FBI, is responsible for maintaining the watchlist.

This watchlist was created in 2004 from several other lists and at that time it consisted of about 68,000 entries. I use the word entries, because in the years after it became unclear whether one record corresponds to one individual. By the end of 2008 the list had grown to over 1.1 million entries. In 2008, after the American Civil Liberties Union (ACLU) mentioned that the list had passed the 1 million mark, the government came with an explanation: although over 1 million entries have been recorded in the database, the net result is that these records correspond to about 400,000 individuals. Terrorists often use different and thus multiple identities, use several (falsified) passports etc. But adding entries with only the first initials and last name, while an entry with the full first names and last name already exists, will result in unwanted side-effects.
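A small sketch shows the kind of side-effect I have in mind. It is purely hypothetical: the names, the `entry_matches` rule and the traveller list are invented for illustration and say nothing about how the real watchlist is matched. The assumption is simply that an entry with only an initial can, at best, be compared on initial plus last name.

```python
def tokens(name):
    """Lowercase a name and split it into parts, ignoring dots after initials."""
    return name.lower().replace(".", " ").split()


def entry_matches(traveller, entry):
    """Assumed matching rule, for illustration only."""
    t, e = tokens(traveller), tokens(entry)
    if t[-1] != e[-1]:
        return False  # different last name
    if len(e[0]) == 1:
        # Initials-only entry: all we can compare is the first initial.
        return t[0][0] == e[0]
    # Full entry: every given name has to match.
    return t[:-1] == e[:-1]


def flagged(travellers, watchlist):
    """Return the travellers that match at least one watchlist entry."""
    return [t for t in travellers
            if any(entry_matches(t, e) for e in watchlist)]


# Hypothetical names, not taken from any real list.
travellers = ["James Smith", "Jane Smith", "John Robert Smith", "Peter Jones"]

full_only = ["John Robert Smith"]
with_initials = full_only + ["J. Smith"]

print(flagged(travellers, full_only))
print(flagged(travellers, with_initials))
```

With only the full entry on the list, one traveller is flagged. Adding the initials-only entry for the same person does not add any information about him, but under this rule it suddenly flags every traveller with the same last name and first initial.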

Bi-lingual streetnames in Amsterdam, do we really need it?

Once in a while I visit Amsterdam and have a drink or two in the centre. Afterwards I use the tram to get back to the hotel. This weekend I was quite surprised to find out that all the streetnames are announced in English, at each stop. The easy and obvious one is of course Centraal Station, which was translated to Central Station. I can also see how they came up with Rembrandt Square instead of Rembrandtsplein. But translating “Spui” to “Courtyard with a chapel” doesn’t help any tourist find their destination.

WolframAlpha providing statistics about given names

A week ago, the new search engine WolframAlpha was launched. At first it was compared to the search engine we all know, namely Google. It took bloggers, news editors and the rest of the world some time to understand that this is no search engine at all. WolframAlpha wants to become the Computational Knowledge Engine we will all be using. It is more like the Encarta application we older guys used in the days when watching a movie in postage-stamp format used to be fun.

In fact, WolframAlpha has done a wonderful job. Enter any question you like and it will present nicely formatted answers, illustrated with diagrams and links to sources for further research. It is interesting to see how several sources are combined and presented very clearly.

Dutch comedians making fun of names

The Dutch radio program Andersmans Veren (Radio 2, AVRO) broadcast a special on names on February 15th. Although not related to any burning data quality issues, it is almost two hours of fun to listen to. From Theo Maassen to Toon Hermans, and of course Hans Teeuwen with his famous sketch. Point your browser to the radio stream or download it as a podcast. Note: only suited for people who understand Dutch.