<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Value Talk &#187; address standardization</title>
	<atom:link href="http://datavaluetalk.com/tag/address-standardization/feed/" rel="self" type="application/rss+xml" />
	<link>http://datavaluetalk.com</link>
	<description>Customer data is a valuable asset. Why not treat it that way?</description>
	<lastBuildDate>Thu, 10 May 2012 14:49:53 +0000</lastBuildDate>
	<language>nl</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>How being focused can blur your vision</title>
		<link>http://datavaluetalk.com/data-quality/how-being-focused-can-blur-your-vision/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=how-being-focused-can-blur-your-vision</link>
		<comments>http://datavaluetalk.com/data-quality/how-being-focused-can-blur-your-vision/#comments</comments>
		<pubDate>Mon, 01 Nov 2010 08:50:02 +0000</pubDate>
		<dc:creator>Paul Drenth</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[address standardization]]></category>
		<category><![CDATA[naming confusion]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1567</guid>
		<description><![CDATA[In our company we are all dedicated to helping our customers solve the business problems caused by poor-quality customer master data. Aren’t we all? A few days ago, one of our customer support desk engineers sought an answer to what happens with the addresses on the islands of the former Netherlands Antilles. See also [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-1575" title="sintmaarten 2" src="http://datavaluetalk.com/cms/wp-content/uploads/2010/11/sintmaarten-2-150x150.jpg" alt="sintmaarten 2" width="150" height="150" /></p>
<p>In our company we are all dedicated to helping our customers solve the business problems caused by poor-quality customer master data. Aren’t we all?</p>
<p>A few days ago, one of our customer support desk engineers sought an answer to what happens with the addresses on the islands of the former Netherlands Antilles. See also my previous post <a href="http://datavaluetalk.com/2010/09/14/the-dissolution-of-a-nation/" target="_blank">The dissolution of a nation</a>. Kids of my generation had to memorize the names of these islands at primary school: the ABC islands – Aruba, Bonaire, and Curaçao – and the three islands with an “S”: Saba, Sint Eustatius and Sint Maarten. My colleague, now fully internet savvy, wanted to look up an address on Sint Maarten. Why not use an internet map and type “sint maarten”?</p>
<p>Yes, here it is! They even have a Spar supermarket there (just like home), and the address of this supermarket shows a postal code! A postal code with the same structure as in the Netherlands (NNNN AA). Pleased with this catch, he started to compose an answer to the customer.</p>
<p>Just before he sent it, I passed his desk and we started talking about it (the topic has my attention, you know). He showed me the map that supported his argument: the coastline was near. But when we zoomed out, the picture became clearer: tunnel vision had hidden the fact that he was looking at the Sint Maarten near the Dutch coast, not the Caribbean island!</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/how-being-focused-can-blur-your-vision/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The value of Christmas cards</title>
		<link>http://datavaluetalk.com/data-quality/the-value-of-christmas-cards/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-value-of-christmas-cards</link>
		<comments>http://datavaluetalk.com/data-quality/the-value-of-christmas-cards/#comments</comments>
		<pubDate>Thu, 03 Dec 2009 07:58:07 +0000</pubDate>
		<dc:creator>Ron Mulderij</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[address standardization]]></category>
		<category><![CDATA[christmas cards]]></category>
		<category><![CDATA[CRM]]></category>
		<category><![CDATA[CRM-system]]></category>
		<category><![CDATA[customer view]]></category>
		<category><![CDATA[marketing]]></category>
		<category><![CDATA[process management]]></category>
		<category><![CDATA[validation]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1258</guid>
		<description><![CDATA[Every year when autumn comes, the assistants of the sales department get a little nervous. They know what is about to happen. It’s almost Christmas, and the selections of contacts to receive a Christmas card have to be made. Every year it’s the same. First the selections for every account manager are made and [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1260" title="christmas tree" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/12/christmas-tree.jpg" alt="christmas tree" width="122" height="122" /></p>
<p>Every year when autumn comes, the assistants of the sales department get a little nervous. They know what is about to happen. It’s almost Christmas, and the selections of contacts to receive a Christmas card have to be made.</p>
<p>Every year it’s the same. First the selections for every account manager are made, and the account managers then have to check manually whether these are correct. This year will be the same as ever, which means that:</p>
<ul>
<li>relevant companies and contacts are missing</li>
<li>new companies and contact persons will be added</li>
<li>contact persons will be deleted</li>
<li>contact persons will be transferred to their new company</li>
<li>addresses turn out to be out of date<span id="more-1258"></span></li>
</ul>
<p>This is one of the best audits to validate the quality of your data. The number of additions, modifications and deletions indicates how accurately your employees have maintained the data in the CRM system. I think that most of us recognize the problem and are maybe part of the cycle ourselves. Moreover, it clearly shows where your data management procedures fail. Most likely the account managers are expected to maintain the data themselves, which they don’t see as their primary task. If the account managers do not maintain the data, the sales support team usually has to do so. But who will give them the required input? Only accurate data entry, active monitoring and periodic checks against external sources will help to keep your CRM system up to date.</p>
<p>In general the Christmas card has little value for your relationship with your contact persons, but such an intense mass mailing is very profitable for your data quality. Until your procedures are in place, keep sending Christmas cards to optimize your <a title="data quality" href="http://www.humaninference.com" target="_blank">data quality</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/the-value-of-christmas-cards/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Confusing streetnames ending in an unfortunate fatality</title>
		<link>http://datavaluetalk.com/data-quality/confusing-streetnames-ending-in-an-unfortunate-fatality/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=confusing-streetnames-ending-in-an-unfortunate-fatality</link>
		<comments>http://datavaluetalk.com/data-quality/confusing-streetnames-ending-in-an-unfortunate-fatality/#comments</comments>
		<pubDate>Fri, 21 Aug 2009 06:35:01 +0000</pubDate>
		<dc:creator>Ramon de Noronha</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[address standardization]]></category>
		<category><![CDATA[naming confusion]]></category>
		<category><![CDATA[street names]]></category>
		<category><![CDATA[toponymics]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1141</guid>
		<description><![CDATA[Just a few days ago I wrote about the many standards we have for street names in the Netherlands. On top of that, new street names are constantly being added for newly built neighbourhoods. Sometimes this also results in existing street names being changed. This was the case last week, when rescue workers were not able to [...]]]></description>
			<content:encoded><![CDATA[<p>Just a few days ago I wrote about the many standards we have for street names in the Netherlands. On top of that, new street names are constantly being added for newly built neighbourhoods. Sometimes this also results in existing street names being changed. This was the case last week, when rescue workers were not able to find the exact location in Putten. An emergency call was made for a 60-year-old man who suffered from heart failure. People who tried to resuscitate the man heard the ambulance passing by, but they didn&#8217;t see it. In the end the ambulance arrived after 19 minutes, too late to save the man&#8217;s life. This is a very unfortunate incident, and an investigation has been started to find out what exactly went wrong. Preliminary results show that the navigation systems of both the police and the ambulance were not up to date.</p>
<p>I have looked at the location using Google Maps. Normally you expect that a street consists of one thoroughfare. But in this case the street, named &#8220;Kraakweg&#8221;, consists of three different parts, which are clearly not in one direct line. I have indicated it with 1, 2 and 3. Number 4 indicates another street, but with almost the same name &#8220;De Kraak&#8221;.</p>
<p><img class="aligncenter size-large wp-image-1142" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/08/kraakweg1234-1024x539.jpg" alt="kraakweg1234" width="606" height="362" /></p>
<p> <span id="more-1141"></span></p>
<p>To make things even more confusing, the &#8220;Kraakweg&#8221; used to be called the &#8220;Stenenkamerseweg&#8221;. The &#8220;Stenenkamerseweg&#8221; still exists, but is more than one kilometer from the &#8220;Kraakweg&#8221;. Double street name signs are used in this area.<img class="size-full wp-image-1143 alignleft" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/08/kraakweg_stenenkamerseweg.jpg" alt="kraakweg_stenenkamerseweg" width="399" height="198" /> So if you are not familiar with the neighbourhood, you are left completely in the dark as to whether you are on the &#8220;Stenenkamerseweg&#8221; or on the &#8220;Kraakweg&#8221;.</p>
<p><em>Photo taken by Ruben Schipper. </em>Residents have been complaining over and over about the confusing situation and road signs. Hopefully it will now be settled once and for all.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/confusing-streetnames-ending-in-an-unfortunate-fatality/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bi-lingual streetnames in Amsterdam, do we really need it?</title>
		<link>http://datavaluetalk.com/data-quality/bi-lingual-streetnames-in-amsterdam-do-we-really-need-it/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=bi-lingual-streetnames-in-amsterdam-do-we-really-need-it</link>
		<comments>http://datavaluetalk.com/data-quality/bi-lingual-streetnames-in-amsterdam-do-we-really-need-it/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 08:10:47 +0000</pubDate>
		<dc:creator>Ramon de Noronha</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[address standardization]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[persistent identification]]></category>
		<category><![CDATA[standardization]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1110</guid>
		<description><![CDATA[Every once in a while I visit Amsterdam and have a drink or two in the centre. Afterwards I use the tram to get back to the hotel. This weekend I was quite surprised to find out that all the street names are announced in English, at each stop. The easy and obvious one is of [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1111" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/08/Straatnaambord.jpg" alt="Straatnaambord" width="305" height="137" />Every once in a while I visit Amsterdam and have a drink or two in the centre. Afterwards I use the tram to get back to the hotel. This weekend I was quite surprised to find out that all the street names are announced in English, at each stop. The easy and obvious one is of course Centraal Station, which was translated to Central Station. I can also see how they came up with Rembrandt Square instead of Rembrandtsplein. But translating &#8220;Spui&#8221; to &#8220;Courtyard with a chapel&#8221; doesn&#8217;t help any tourist find their destination.<span id="more-1110"></span></p>
<p>In Holland we already have three officially approved ways of naming streets and addresses. Nowadays we have the TNT Post standard, based on the very first publication of the postal code book and corrected several times afterwards. This naming convention was the basis for the NEN 5825 standard (NEN is the Dutch counterpart of ISO). But the true source of street names is the municipality&#8217;s council decision, called the &#8220;Raadsbesluit&#8221;. Because of the different versions, this can easily result in five different spellings of the same street, as the example below shows:</p>
<table border="0" cellspacing="1" cellpadding="2" width="85%" align="center">
<tbody>
<tr>
<td><span style="font-size: x-small;">Original Postal Code Book (1978):</span></td>
<td><span style="font-size: x-small;">s en schepenenstr</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Corrected TNT Post standard:</span></td>
<td><span style="font-size: x-small;">schout en s str</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">NEN 5825 standard, version 1991:</span></td>
<td><span style="font-size: x-small;">Schout en Schepenenstr</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">NEN 5825 standard, version 2002:</span></td>
<td><span style="font-size: x-small;">Sch en Schepenenstraat</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Raadsbesluit:</span></td>
<td><span style="font-size: x-small;">Schout en Schepenenstraat</span></td>
</tbody>
</table>
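<p>To make the table concrete, here is a minimal Python sketch of how the five spellings could be mapped back to the Raadsbesluit name. The lookup table and the function names are my own illustrative assumptions; they are not taken from any of the standards mentioned above.</p>

```python
# Spellings copied from the table above. The dictionary lookup is only an
# illustrative assumption, not part of TNT Post, NEN 5825 or any product.
OFFICIAL = "Schout en Schepenenstraat"

KNOWN_SPELLINGS = {
    "s en schepenenstr": OFFICIAL,          # original postal code book (1978)
    "schout en s str": OFFICIAL,            # corrected TNT Post standard
    "schout en schepenenstr": OFFICIAL,     # NEN 5825, version 1991
    "sch en schepenenstraat": OFFICIAL,     # NEN 5825, version 2002
    "schout en schepenenstraat": OFFICIAL,  # Raadsbesluit
}


def normalize_key(name):
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def to_official(name):
    """Return the Raadsbesluit spelling, or None if the variant is unknown."""
    return KNOWN_SPELLINGS.get(normalize_key(name))
```

<p>A fixed table like this only covers spellings that are already known; unknown variants still need some form of fuzzy matching.</p>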
<p>What do you think: should we add a new &#8220;English&#8221; standard to the existing standards? What are the pros and cons of having English labels for the street names? Should we also replace all signs and add the English label for the street names? Please add your opinion in the comments. For more information on the history of Dutch street names I recommend the following site: <em><a title="Alles over straatnaam" href="http://www.allesoverstraatnamen.nl/" target="_blank">&#8220;alles over straatnamen&#8221;</a></em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/bi-lingual-streetnames-in-amsterdam-do-we-really-need-it/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Toponymic confusion revisited</title>
		<link>http://datavaluetalk.com/data-quality/toponymic-confusion-revisited/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=toponymic-confusion-revisited</link>
		<comments>http://datavaluetalk.com/data-quality/toponymic-confusion-revisited/#comments</comments>
		<pubDate>Wed, 27 May 2009 13:08:07 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[address standardization]]></category>
		<category><![CDATA[Chargoggagoggmanchauggagoggchaubunagungamaugg]]></category>
		<category><![CDATA[Chaubunagungamaug]]></category>
		<category><![CDATA[longest lake name]]></category>
		<category><![CDATA[longest place name]]></category>
		<category><![CDATA[Nipmuc]]></category>
		<category><![CDATA[standardisation]]></category>
		<category><![CDATA[standardization]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=969</guid>
		<description><![CDATA[The local authorities in the town of Webster, Massachusetts are planning to change the road signs that lead to the local lake. The sign leads to lake &#8220;Chargoggagoggmanchaoggagoggchaubunaguhgamaugg&#8221;, but it should actually lead to &#8220;Chargoggagoggmanchauggagoggchaubunagungamaugg&#8221;. According to the Guinness Book of Records, the name of the lake is the fifth longest word in the world [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-thumbnail wp-image-982" title="chaubunagungamaug_lake_sign6" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/05/chaubunagungamaug_lake_sign6-150x150.jpg" alt="chaubunagungamaug_lake_sign6" width="150" height="150" /></p>
<p>The local authorities in the town of Webster, Massachusetts are planning to change the road signs that lead to the local lake. The sign leads to lake &#8220;Chargoggagoggmancha<span style="color: #ff0000;">o</span>ggagoggchaubunagu<span style="color: #ff0000;">h</span>gamaugg&#8221;, but it should actually lead to &#8220;Chargoggagoggmancha<span style="color: #ff0000;">u</span>ggagoggchaubunagu<span style="color: #ff0000;">n</span>gamaugg&#8221;.</p>
<p>According to the Guinness Book of Records, the name of the lake is the fifth longest word in the world and the longest lake name anywhere. The name originates from the language of the local Nipmuc Indians. Freely translated, the name means &#8220;You fish on your side, I fish on my side and nobody fishes in the middle of the lake&#8221;. A nice example of Native American <em>divide and conquer</em>&#8230;</p>
<p>The interesting bit, however, is that there are 26 spelling variations of the name in the US Geographic Names System and that none of these variations match the actual road signs.</p>
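<p>Comparing a variant with the intended spelling character by character shows exactly where it deviates. The following Python sketch does this for the two spellings quoted above; the function name is hypothetical, and the sketch assumes both spellings have the same length.</p>

```python
def spelling_differences(variant, official):
    """List (index, variant_char, official_char) for every differing position.

    Assumes both spellings have the same length; anything else would need a
    real alignment algorithm such as edit distance.
    """
    if len(variant) != len(official):
        raise ValueError("spellings differ in length")
    pairs = enumerate(zip(variant, official))
    return [(i, a, b) for i, (a, b) in pairs if a != b]


# The spelling on the road sign and the intended spelling, as quoted above.
on_sign = "Chargoggagoggmanchaoggagoggchaubunaguhgamaugg"
intended = "Chargoggagoggmanchauggagoggchaubunagungamaugg"

for index, wrong, right in spelling_differences(on_sign, intended):
    print(f"position {index}: {wrong!r} should be {right!r}")
```

<p>For these two spellings the loop reports the two letters highlighted in red above.</p>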
<p>Naturally, the authorities could spend time and money to find out how these mistakes came about. I think, however, that an investment in standardisation would be a much wiser choice.</p>
<p>This example is of course rather extraordinary, and the discriminating value of &#8220;Chargoggagoggmancha<span style="color: #000000;">u</span>ggagoggchaubunagu<span style="color: #000000;">n</span>gamaugg&#8221; is quite high. But different spellings of geographical names will eventually lead to toponymic confusion (see my <a href="http://datavaluetalk.com/2009/02/13/toponymic-confusion/" target="_self">blogpost</a> from earlier this year). Apparently, the inhabitants of Webster call the lake &#8220;Lake Webster&#8221;. I wonder whether that has something to do with the pronunciation of Chargoggagoggmancha<span style="color: #000000;">u</span>ggagoggchaubunagu<span style="color: #000000;">n</span>gamaugg&#8230;?</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/toponymic-confusion-revisited/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>The names they are a-changin&#8217;</title>
		<link>http://datavaluetalk.com/data-quality/the-names-they-are-a-changin/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-names-they-are-a-changin</link>
		<comments>http://datavaluetalk.com/data-quality/the-names-they-are-a-changin/#comments</comments>
		<pubDate>Thu, 15 Jan 2009 13:08:33 +0000</pubDate>
		<dc:creator>Emil van den Berg</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[address standardization]]></category>
		<category><![CDATA[new street name]]></category>
		<category><![CDATA[new street names]]></category>
		<category><![CDATA[outdated street names]]></category>
		<category><![CDATA[street name change]]></category>
		<category><![CDATA[street names change]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=481</guid>
		<description><![CDATA[Recently, a Frisian municipality decided to use Frisian names for the localities and streets, instead of their Dutch versions. (Frisian is a language that is spoken in the province of Friesland, in the north of the Netherlands.) In some cases, this means only a small change to the street type; so for example, &#8216;Van Sytzamaweg&#8217; [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, a <a href="http://www.dantumadeel.nl/web/Mediaitem/Definitieve-lijst-Frysktalige-straatnamen.htm" target="_blank">Frisian municipality</a> decided to use Frisian names for the localities and streets, instead of their Dutch versions. (Frisian is a language that is spoken in the province of Friesland, in the north of the Netherlands.)</p>
<p>In some cases, this means only a small change to the street type; so for example, &#8216;Van Sytzamaweg&#8217; is changed to &#8216;Van Sytzamawei&#8217;. In other cases the resemblance is only recognizable to people who know both Dutch and Frisian: &#8216;Spreeuwenstraat&#8217; is changed to &#8216;Protterstrjitte&#8217;.</p>
<p><img class="alignnone size-full wp-image-487" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/01/bordeaux_street.jpg" alt="bordeaux_street" width="370" height="346" />Language preference issues like this form one of the reasons why streets or localities get a new name. There are also some other situations: sometimes people give a street a new name because they want to remember someone; sometimes it is the reverse: they want to forget the person who was in the old name.<span id="more-481"></span></p>
<p>Name changes can also occur when two localities are merged into one; the new locality may then contain two streets with the same name, and usually one of them gets a new name in order to prevent confusion.</p>
<p>In some towns the names of the streets are carved in stone, like in Bordeaux (the beautiful French town that you should not miss if you go to the <a href="http://www.laposte.fr/sna/rubrique.php3?id_rubrique=110" target="_blank">Service National de l&#8217;Adresse</a> in order to get your French address standardization tool certified). As it is not so easy to erase a name that is carved in stone, the naming history of the streets remains rather visible. This could help to deliver mail that is sent to outdated addresses.</p>
<p>However, even in Bordeaux they have stopped carving names in stone. So it seems better to invest in a good <a title="data quality solution" href="http://www.humaninference.com" target="_blank">data quality solution</a> that is able to replace outdated names with their new versions.</p>
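<p>At its simplest, replacing outdated names comes down to keeping a rename history and following it to the current name. The Python sketch below illustrates the idea with the two Frisian examples from this post; the table layout and the function name are assumptions for illustration and do not describe how any particular product works.</p>

```python
# Old name -> new name. The two entries are the examples from this post; a
# real table would also need the locality and the date of the change.
RENAMES = {
    "Van Sytzamaweg": "Van Sytzamawei",
    "Spreeuwenstraat": "Protterstrjitte",
}


def current_name(street, renames=RENAMES):
    """Follow the rename history until a name is reached that is still valid."""
    seen = set()
    while street in renames:
        if street in seen:
            raise ValueError("rename history contains a cycle")
        seen.add(street)
        street = renames[street]
    return street
```

<p>Following the chain step by step means that a street that has been renamed more than once still resolves to its latest name.</p>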
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/the-names-they-are-a-changin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The obfuscated address contest</title>
		<link>http://datavaluetalk.com/data-quality/the-obfuscated-address-contest/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=the-obfuscated-address-contest</link>
		<comments>http://datavaluetalk.com/data-quality/the-obfuscated-address-contest/#comments</comments>
		<pubDate>Sun, 23 Nov 2008 20:50:47 +0000</pubDate>
		<dc:creator>Emil van den Berg</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[address standardization]]></category>
		<category><![CDATA[fuzzy matching]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=208</guid>
		<description><![CDATA[Programmers sometimes organize contests in writing code that is perfectly understandable for a compiler, but very difficult to understand for people. When working on products for address standardisation, one can discover an interesting variant: people sometimes write &#8211; unintentionally, I suppose &#8211; addresses in such a way that they are rather understandable for people, but [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/cms/wp-content/uploads/2008/11/cimg6628.jpg"><img class="alignleft size-full wp-image-209" src="http://datavaluetalk.com/cms/wp-content/uploads/2008/11/cimg6628.jpg" alt="" width="209" height="140" /></a> Programmers sometimes organize <a href="http://en.wikipedia.org/wiki/Obfuscated_code" target="_blank">contests</a> in writing code that is perfectly understandable for a compiler, but very difficult to understand for people.</p>
<p>When working on products for address standardisation, one can discover an interesting variant: people sometimes write &#8211; unintentionally, I suppose &#8211; addresses in such a way that they are rather understandable for people, but very difficult to process for computers.</p>
<p>Consider for example this street name:</p>
<p style="30px;"><span style="#0000ff;"> <em>Kerkchoosteeg hoogl</em></span></p>
<p>The official version is:</p>
<p style="30px;"><span style="#0000ff;"> </span><em><span style="#0000ff;">Hooglandsekerk-choorsteeg</span> (&#8216;high land church &#8211; choir alley&#8217;)</em></p>
<p>This street name contains several errors:</p>
<ul>
<li>A hyphen is missing.</li>
<li>One &#8216;r&#8217; is missing.</li>
<li>One word (&#8216;Hooglandsekerk&#8217;) has been split up into two words.</li>
<li>The first word (&#8216;Hooglandse&#8217;) is written at the end.</li>
<li>One word is abbreviated (&#8216;hoogl&#8217;).</li>
</ul>
<p>The first two errors are not very special, but the last three can only be discovered together: it can only be discovered that the word &#8216;hooglandsekerk&#8217; has been split into two words if, at the same time, it is understood that the first part has been abbreviated and moved to the end.<br />
<span id="more-208"></span></p>
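<p>The following Python sketch shows one way to test the three hypotheses together. It assumes that the last written word is an abbreviation of the start of the official name, and then looks for the split point at which the rest of the official name best matches the remaining words. The function names are hypothetical, and <code>difflib.SequenceMatcher</code> from the standard library is used here only as a convenient similarity measure.</p>

```python
from difflib import SequenceMatcher


def squash(name):
    """Lower-case the name and drop spaces, hyphens and other punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def moved_abbreviation_score(written, official):
    """Similarity (0..1) under the hypothesis that the last written word is
    an abbreviated first part of the official name, moved to the end."""
    tokens = written.lower().split()
    target = squash(official)
    if len(tokens) in (0, 1):
        # Nothing can have been moved; fall back to a plain comparison.
        return SequenceMatcher(None, squash(written), target).ratio()
    abbreviation = squash(tokens[-1])
    rest = squash(" ".join(tokens[:-1]))
    if not target.startswith(abbreviation):
        return 0.0
    # Try every point at which the official name could have been split.
    return max(
        SequenceMatcher(None, rest, target[cut:]).ratio()
        for cut in range(len(abbreviation), len(target))
    )


written = "Kerkchoosteeg hoogl"
official = "Hooglandsekerk-choorsteeg"
plain = SequenceMatcher(None, squash(written), squash(official)).ratio()
combined = moved_abbreviation_score(written, official)
```

<p>For this example <code>combined</code> comes out clearly higher than <code>plain</code>: the match only becomes convincing once the split, the abbreviation and the displacement are considered at the same time.</p>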
<p>Similar problems occur in:</p>
<p style="30px;"><em><span style="#0000ff;">O Lieve Vrouweschutstr</span></em></p>
<p>Official:</p>
<p style="30px;"><em><span style="#0000ff;">O.L.V. Schutsstraat</span></em></p>
<p>&#8216;O.L.V.&#8217; is a common abbreviation of &#8216;Onze Lieve Vrouwe&#8217; (&#8216;Our Lady&#8217;). In order to determine that this abbreviation plays a role here, it must be understood that the word &#8216;Vrouwe&#8217; is part of &#8216;Vrouweschutstr&#8217; and that the rest of this word matches fairly well with &#8216;Schutsstraat&#8217;. That in turn is only possible if the missing &#8216;s&#8217; and the abbreviation &#8216;str&#8217; (for &#8216;straat&#8217;) are handled correctly.</p>
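<p>One possible approach, sketched below in Python, is to expand known abbreviations on both sides before comparing, and to ignore word boundaries altogether. The two small tables and the function names are assumptions made up for this example; <code>difflib.SequenceMatcher</code> again serves only as a simple similarity measure.</p>

```python
from difflib import SequenceMatcher

# Tiny illustrative tables; a real product would need far larger ones.
ABBREVIATIONS = {"o.l.v.": "onze lieve vrouwe"}
SUFFIXES = {"str": "straat"}


def expand(name):
    """Expand known abbreviations, then drop spaces and punctuation."""
    words = []
    for word in name.lower().split():
        word = ABBREVIATIONS.get(word, word)
        for short, full in SUFFIXES.items():
            if word.endswith(short):
                word = word[: -len(short)] + full
        words.append(word)
    return "".join(ch for ch in "".join(words) if ch.isalnum())


def similarity(written, official):
    """Similarity (0..1) of the two names after expansion."""
    return SequenceMatcher(None, expand(written), expand(official)).ratio()


score = similarity("O Lieve Vrouweschutstr", "O.L.V. Schutsstraat")
```

<p>After expansion the remaining differences are the shortened &#8216;O&#8217; and the missing &#8216;s&#8217;, so the two names end up far more similar than their raw spellings suggest.</p>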
<p>An extra complication occurs if there are multiple candidates:</p>
<p style="30px;"><em><span style="#0000ff;">rue dendicolle</span></em></p>
<p>This resembles two official streets:</p>
<p style="30px;"><em><span style="#0000ff;">RUE HENRI COLLET<br />
RUE JEAN RENAUD DANDICOLLE</span></em></p>
<p>The first candidate has four differences:</p>
<ul>
<li>Three typos (&#8216;d&#8217;-&#8216;h&#8217;, &#8216;d&#8217;-&#8216;r&#8217;, missing &#8216;t&#8217;).</li>
<li>One missing space.</li>
</ul>
<p>The second candidate has three differences:</p>
<ul>
<li>Two missing first names.</li>
<li>One typo (&#8216;e&#8217;-&#8216;a&#8217;; in French these letters are pronounced the same in this context).</li>
</ul>
<p>The second street is clearly the better match; this can only be determined if the errors in the first case are considered more severe than the errors in the second case.</p>
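<p>A simple way to express &#8216;more severe&#8217; is to give each error type its own cost. The Python sketch below uses an edit distance in which a sound-alike substitution is cheap, and it charges a small fixed price for each leading word (such as a first name) that was left out. All weights and names are assumptions chosen for this example. Spaces are ignored, so the missing space in the first candidate is not counted here.</p>

```python
# Illustrative cost model; the weights 1.0, 0.25 and 0.5 are assumptions.
SOUND_ALIKE = {frozenset("ea")}  # letter pairs treated as sounding alike
DROPPED_WORD_COST = 0.5          # price of each leading word left out


def weighted_distance(a, b):
    """Edit distance in which sound-alike substitutions cost only 0.25."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                sub = 0.0
            elif frozenset((ca, cb)) in SOUND_ALIKE:
                sub = 0.25
            else:
                sub = 1.0
            cur.append(min(prev[j] + 1.0, cur[j - 1] + 1.0, prev[j - 1] + sub))
        prev = cur
    return prev[-1]


def candidate_cost(written, candidate, street_type="rue"):
    """Cheapest way to explain `written` as a variant of `candidate`."""
    def words(name):
        parts = name.lower().split()
        return parts[1:] if parts and parts[0] == street_type else parts

    core = "".join(words(written))
    cand = words(candidate)
    return min(
        weighted_distance(core, "".join(cand[dropped:])) + DROPPED_WORD_COST * dropped
        for dropped in range(max(1, len(cand)))
    )


written = "rue dendicolle"
candidates = ["RUE HENRI COLLET", "RUE JEAN RENAUD DANDICOLLE"]
best = min(candidates, key=lambda c: candidate_cost(written, c))
```

<p>With these weights the second candidate is the cheapest explanation. If every error were given the same weight, the choice between the two candidates would be much less clear.</p>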
<p>If an address consists of many elements, there are even more possibilities to make things difficult:</p>
<p style="30px;"><em><span style="#0000ff;">30 FERMONT ROAD<br />
199 CANARY WHARF<br />
E33 9SF<br />
E33 9SF LONDON</span></em></p>
<p>Official:</p>
<p style="30px;"><em><span style="#0000ff;">Flat 199<br />
Canary Wharf<br />
30 Fairmont Road<br />
LONDON<br />
E33 9SA</span></em></p>
<ul>
<li>Street and house number are on the first line, but should be on the third line.</li>
<li>&#8216;Fermont&#8217; must be written as &#8216;Fairmont&#8217;; the pronunciation is not identical, but fairly similar.</li>
<li>The address contains two postcodes; neither is in the right position and both are incorrect.</li>
<li>&#8216;199&#8217; should be &#8216;Flat 199&#8217; and must be written on a separate line.</li>
</ul>
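<p>A first step in untangling such an address is to find postcode-shaped text wherever it occurs, and to separate it from the rest of the lines. The Python sketch below does this with a deliberately simplified pattern for British postcodes; the pattern and the function name are assumptions for illustration. The pattern only recognizes the shape of a postcode, so it cannot tell that &#8216;E33 9SF&#8217; is the wrong code. That requires a reference file.</p>

```python
import re

# Simplified shape of a British postcode (an assumption for this sketch);
# it does not check whether the postcode actually exists.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")


def split_postcodes(lines):
    """Return (postcodes found, remaining text per line), in input order."""
    postcodes, remainder = [], []
    for line in lines:
        text = line.upper()
        postcodes.extend(POSTCODE.findall(text))
        rest = POSTCODE.sub("", text).strip()
        if rest:
            remainder.append(rest)
    return postcodes, remainder


written = [
    "30 FERMONT ROAD",
    "199 CANARY WHARF",
    "E33 9SF",
    "E33 9SF LONDON",
]
postcodes, remainder = split_postcodes(written)
```

<p>Once the postcodes have been lifted out, the remaining lines can be assigned to street, flat and town, which is where the displaced fields and the &#8216;Fermont&#8217; typo have to be resolved.</p>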
<p>A product that can recognize the error situations shown in these examples must be able to switch constantly between different error types. Searching for displaced words or address fields must be combined with resolving abbreviations, deciding whether to accept typing errors, and distinguishing between typos that lead to a different pronunciation and typos that don&#8217;t.<br />
A product like this is never completely finished; therefore, during development, it is good to start with the most common errors and the most common combinations of errors, and to add further error situations in later releases. Investigating examples like these at an early stage of development helps to set up an architecture that is ready for further development.</p>
<p>Anyone got other nice examples?</p>
<p>(The addresses are real life examples; only the British example has been changed for privacy reasons, without changing the errors.)</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/the-obfuscated-address-contest/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

