<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Value Talk &#187; interpretation</title>
	<atom:link href="http://datavaluetalk.com/tag/interpretation/feed/" rel="self" type="application/rss+xml" />
	<link>http://datavaluetalk.com</link>
	<description>Customer data is a valuable asset. Why not treat it that way?</description>
	<lastBuildDate>Thu, 10 May 2012 14:49:53 +0000</lastBuildDate>
	<language>nl</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Ask Me is linked with Any Body and relates to Walther von Stolzing</title>
		<link>http://datavaluetalk.com/data-quality/ask-me-is-linked-with-any-body-and-relates-with-walther-von-stolzing/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ask-me-is-linked-with-any-body-and-relates-with-walther-von-stolzing</link>
		<comments>http://datavaluetalk.com/data-quality/ask-me-is-linked-with-any-body-and-relates-with-walther-von-stolzing/#comments</comments>
		<pubDate>Wed, 12 Oct 2011 08:51:26 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Names]]></category>
		<category><![CDATA[cleansing]]></category>
		<category><![CDATA[identity]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[name]]></category>
		<category><![CDATA[names]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1991</guid>
		<description><![CDATA[Weird subject, isn&#8217;t it? As is obvious to everybody, &#8216;Ask Me&#8217; and &#8216;Any Body&#8217; are artificial names. They will never belong to a real person. How they relate to &#8216;Walther von Stolzing&#8217; will follow. For over 25 years Human Inference has collected reference data, for instance on persons. Because of our reference set we immediately recognize [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/cms/wp-content/uploads/2011/10/Obama.png"><img class="alignleft size-thumbnail wp-image-2022" title="I'm Obama" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/10/Obama-150x150.png" alt="" width="150" height="150" /></a>Weird subject, isn&#8217;t it? As is obvious to everybody, &#8216;Ask Me&#8217; and &#8216;Any Body&#8217; are artificial names. They will never belong to a real person. How they relate to &#8216;Walther von Stolzing&#8217; will follow.</p>
<p>For over 25 years Human Inference has collected reference data, for instance on persons. Because of our reference set we immediately recognize that &#8216;Ask Me&#8217; and &#8216;Any Body&#8217; are fake names. People are using these either in test situations or to hide their actual names.</p>
<p>In the old days we only needed to test for &#8216;Test Test&#8217;; in more recent years we have seen great inventiveness in these fake names. The following list gives a brief sample.</p>
<div align="center">
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="137">Alpha Beta</td>
<td valign="top" width="137">Any Body</td>
</tr>
<tr>
<td valign="top" width="137">Ask Me</td>
<td valign="top" width="137">Best Friend</td>
</tr>
<tr>
<td valign="top" width="137">Blue Sky</td>
<td valign="top" width="137">Cool Dude</td>
</tr>
<tr>
<td valign="top" width="137">Dress Code</td>
<td valign="top" width="137">El Comandante</td>
</tr>
<tr>
<td valign="top" width="137">Guess Who</td>
<td valign="top" width="137">In Cognito</td>
</tr>
</tbody>
</table>
</div>
<p>If you cannot rely on reference data and interpretation, you need to provide a checklist. Providing it is one thing, but since users tend to be really creative, maintaining it is essential.<span id="more-1991"></span></p>
<p>Over these 25 years we have seen a move from &#8216;real fake names&#8217; towards &#8216;real names used in a fake way&#8217;. In the USA, for example, we identified popular Hollywood names and names of politicians being used as fake names. Currently the usage of the name &#8216;George Bush&#8217; is decreasing, whereas &#8216;Barack Obama&#8217; is increasingly used. We recognize the false usage of these names from the change in the frequency figures of the given name and the family name, as well as of the combination itself. Remarkably, &#8216;Abraham Lincoln&#8217; and &#8216;George Washington&#8217; are quite steady.</p>
<p>Back to &#8216;Walther von Stolzing&#8217;. By now you might have guessed what is happening here. We noticed that in German-speaking areas this name also passes our validity threshold. By <a href="http://en.wikipedia.org/wiki/Die_Meistersinger_von_N%C3%BCrnberg" rel="nofollow">googling</a> the name you can see that Walther is actually a character in Wagner’s opera &#8216;Die Meistersinger von Nürnberg&#8217;, which dates back to 1868!</p>
<p>Let’s see if in 100 years’ time people are still using &#8216;Darth Vader&#8217;, &#8216;Lord Rings&#8217; or &#8216;Snoop Dogg&#8217;!</p>
<p>All the names used in this blog are ‘real’ names coming from a popular social media site. Please check our <a href="http://www.humaninference.nl/producten/data-cleansing">data cleansing</a> products in case you need cleansing solutions.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/ask-me-is-linked-with-any-body-and-relates-with-walther-von-stolzing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More than a name&#8230;..</title>
		<link>http://datavaluetalk.com/data-quality/more-than-a-name/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=more-than-a-name</link>
		<comments>http://datavaluetalk.com/data-quality/more-than-a-name/#comments</comments>
		<pubDate>Tue, 29 Mar 2011 11:33:37 +0000</pubDate>
		<dc:creator>Jochem Poeder</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Firefox plugin]]></category>
		<category><![CDATA[HIquality Name worldwide]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[name interpretation]]></category>
		<category><![CDATA[name suggestion]]></category>
		<category><![CDATA[name validation]]></category>
		<category><![CDATA[Name Worldwide]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1758</guid>
		<description><![CDATA[Everyone in this world has a name. When we hear a name however, it is really hard to precisely know what it consists of and how the consisting parts should be written. A name might, for example, contain salutation, one or more titles, given names, initials, one or more family names, and additions. Here&#8217;s an [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/data-quality/more-than-a-name/attachment/my-name-2/" rel="attachment wp-att-1786"><img class="alignleft size-thumbnail wp-image-1786" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/03/my-name1-150x150.png" alt="" width="139" height="116" /></a></p>
<p>Everyone in this world has a name. When we hear a name, however, it is hard to know precisely what it consists of and how its constituent parts should be written. A name might, for example, contain a salutation, one or more titles, given names, initials, one or more family names, and additions. Here&#8217;s an example of a name with different name parts:</p>
<p><em>Mr Peter M Smith PhD</em></p>
<ul>
<li>Mr &#8211; Salutation, also called honorific, a polite way of addressing a person</li>
<li>Peter &#8211; Given name. The name given to a person at birth. There are male and female names, and some names are carried by both. In general, though, it is possible to derive the gender of the person from his/her name(s).</li>
<li>M &#8211; Initial. An abbreviated form of a given name</li>
<li>Smith &#8211; Family name</li>
<li>PhD &#8211; Addition, in this case an academic title</li>
</ul>
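<p>Purely as an illustration of the segmentation above, a naive Python sketch could look like this. The lookup sets and the function <code>split_name</code> are hypothetical and assume a fixed Western name order; a knowledge-based interpreter would need far more than this.</p>

```python
# Tiny hypothetical lookup tables; a real knowledge base would be far larger.
SALUTATIONS = {"mr", "mrs", "ms", "dr"}
ADDITIONS = {"phd", "msc", "jr", "sr"}

def split_name(full_name):
    # Naive segmentation for names written in the order:
    # salutation, given name(s), initial(s), family name, addition(s).
    tokens = full_name.split()
    parts = {"salutation": [], "given": [], "initials": [],
             "family": [], "addition": []}
    # A known salutation may only appear as the first token.
    if tokens and tokens[0].rstrip(".").lower() in SALUTATIONS:
        parts["salutation"].append(tokens.pop(0))
    # Known additions are peeled off from the end.
    while tokens and tokens[-1].rstrip(".").lower() in ADDITIONS:
        parts["addition"].insert(0, tokens.pop())
    # Assumption: the last remaining token is the family name.
    if tokens:
        parts["family"].append(tokens.pop())
    # Single letters are treated as initials, everything else as given names.
    for token in tokens:
        kind = "initials" if len(token.rstrip(".")) == 1 else "given"
        parts[kind].append(token)
    return parts

result = split_name("Mr Peter M Smith PhD")
```

<p>The fixed-order assumption is exactly what breaks down across naming conventions, which is why a lookup-only approach like this one does not scale.</p>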
<p>This may appear easy, but due to the many different naming conventions in the world, it is definitely not! At Human Inference, we have automated this process by creating a <a href="https://addons.mozilla.org/nl/firefox/addon/nameworldwide/" target="_blank">Firefox plugin</a> that can help you interpret the various name parts and assign a gender to the name. It also finds the names which most closely resemble the ones you typed as input.</p>
<p>You can type the full name of a person, and that’s all you need to do. The plugin will make the most probable interpretation, based on the vast knowledge of names it has. It places the parts in the appropriate fields and displays the predicted gender. On top of that it will give you close alternatives for the names or for the way the input can be interpreted. These are shown as a list of suggestions when you right-click on the input field. If one of the other suggested interpretations is what you were looking for, you can click on it and it is displayed instead.</p>
<p>Summarizing, the plugin</p>
<ul>
<li>can correct the mistakes that you made writing someone&#8217;s name.</li>
<li>can be used to segment the name correctly.</li>
<li>can provide you with closely resembling suggestions.</li>
<li>can predict the gender of the name.</li>
<li><strong><em>is free of charge!<span id="more-1758"></span></em></strong></li>
</ul>
<p><strong><em><a href="http://datavaluetalk.com/data-quality/more-than-a-name/attachment/nameworldwide-4/" rel="attachment wp-att-1777"><img class="aligncenter size-large wp-image-1777" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/03/NameWorldwide3-1024x690.jpg" alt="" width="1024" height="690" /></a></em></strong></p>
<p>The plugin is powered by our HIquality Name Worldwide product. Name Worldwide has a vast knowledge base of names, each with its gender, the region in which it is found and its frequency. It&#8217;s powered by intelligent algorithms which interpret the name as soon as you type it. Apart from interpreting a name, Name Worldwide provides lots of other features like validation and suggestion. And you can easily power your web form with Name Worldwide by writing fewer than 5 lines of JavaScript code. Imagine someone filling in your web form, and his or her name and lots of similar names pop up before they are even completely typed in&#8230;</p>
<p>Enjoy using the plugin and please contact Human Inference for further information on <a title="data quality" href="http://www.humaninference.com">data quality</a> solutions.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/more-than-a-name/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Know your customer to trust your data</title>
		<link>http://datavaluetalk.com/data-quality/know-your-customer-to-trust-your-data/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=know-your-customer-to-trust-your-data</link>
		<comments>http://datavaluetalk.com/data-quality/know-your-customer-to-trust-your-data/#comments</comments>
		<pubDate>Tue, 07 Dec 2010 13:53:36 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[customer lifetime value]]></category>
		<category><![CDATA[data quality strategy]]></category>
		<category><![CDATA[data trust]]></category>
		<category><![CDATA[identification]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[know your customer]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1590</guid>
		<description><![CDATA[The success of many business processes is linked directly to the quality of customer data. This is not only an obvious fact, but a recurring conclusion of many field studies: Incorrect, incomplete and inaccurate data will have a direct impact on your business success rate. The symptoms of this show up in inefficient marketing [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-1596" title="blinddoek" src="http://datavaluetalk.com/cms/wp-content/uploads/2010/12/blinddoek-150x150.jpg" alt="blinddoek" width="150" height="150" /></p>
<p>The success of many business processes is linked directly to the quality of customer data. This is not only an obvious fact, but a recurring conclusion of many field studies: Incorrect, incomplete and inaccurate data will have a direct impact on your business success rate. The symptoms of this show up in inefficient marketing and sales processes, customer dissatisfaction, difficult cross- and upsell, unreliable analyses and many other disturbances in the day-to-day business of almost every organization dealing with customer, supplier and/or partner data.</p>
<p>In essence, it all comes down to knowing your data, in order to be able to trust your data. If you trust your data, you are definitely doing something right. So, how do you establish that trust? For this, you first have to answer a short, yet rather complex question: What is what in my database(s)? In other words: You have to identify and interpret the data you are working with.</p>
<p>A robust customer data identification solution intelligently interprets the details of both natural and legal persons. That process has to take account of the significance of words in a specific context, usage of company names, abbreviations, synonyms, acronyms, spelling mistakes, notation methods, standards and phonetic similarity of words. All in all, this is not a simple task; it more or less mimics the capabilities that humans show when interpreting data &#8230;<span id="more-1590"></span></p>
<p>It is, however, the first step in a solid data quality strategy. This strategy should entail some sort of methodical, recursive approach. This makes sense, since data cleansing is basically a process of recurring steps. Initial cleansing should, for example, be combined with methods to prevent future pollution. In other words: Do not only fight the symptoms of bad quality, but eliminate the root causes and make sure your clean data will stay clean. Below you will find an illustration of such an approach:</p>
<p><img class="alignleft size-full wp-image-1591" title="DQ strategy" src="http://datavaluetalk.com/cms/wp-content/uploads/2010/12/DQ-strategy.jpg" alt="DQ strategy" width="444" height="256" /></p>
<p>Effective customer service, targeted cross- and upsell, cost decrease and creation of customer lifetime value are but a few goals that will be achieved by defining and deploying the right data quality strategy. So start to <a title="Banken - Know your customer" href="http://www.humaninference.nl/branches/banken" target="_blank">know your customer</a> and learn to trust your data&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/know-your-customer-to-trust-your-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>International data quality &#8211; Is a football always a football?</title>
		<link>http://datavaluetalk.com/data-quality/international-data-quality-is-a-football-always-a-football/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=international-data-quality-is-a-football-always-a-football</link>
		<comments>http://datavaluetalk.com/data-quality/international-data-quality-is-a-football-always-a-football/#comments</comments>
		<pubDate>Tue, 26 Oct 2010 06:36:41 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[customer data]]></category>
		<category><![CDATA[duplicate identification]]></category>
		<category><![CDATA[file merging]]></category>
		<category><![CDATA[input control]]></category>
		<category><![CDATA[international data]]></category>
		<category><![CDATA[internationalisation]]></category>
		<category><![CDATA[internationalization]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[natural language processing]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1539</guid>
		<description><![CDATA[High quality customer data have become the prerequisite for successful business decisions. In order to reach the intended data quality level, a lot of money is being invested in solutions for input control, file merging, data enrichment and duplicate identification. But do these investments guarantee high quality data and information? For example, are the data [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1550" title="football 2" src="http://datavaluetalk.com/cms/wp-content/uploads/2010/10/football-2.jpg" alt="football 2" width="130" height="130" /></p>
<p><img class="alignleft size-full wp-image-1559" title="football 1" src="http://datavaluetalk.com/cms/wp-content/uploads/2010/10/football-12.jpg" alt="football 1" width="130" height="130" />High quality customer data have become the prerequisite for successful business decisions. In order to reach the intended data quality level, a lot of money is being invested in solutions for input control, file merging, data enrichment and duplicate identification. But do these investments guarantee high quality data and information? For example, are the data quality tools and processes equipped for the inevitable internationalization of our business community? Is a football always a football?</p>
<p><strong>Natural language processing</strong></p>
<p>How do we know that <strong><em>William Jones International Logistics Ltd</em></strong> and <strong><em>W. Jones Int. Transport Co.</em></strong> are probably different notations for the same company? How do we determine that <strong><em>Leonard</em></strong> is a given name in <strong><em>Leonard Peters</em></strong> and a surname in <strong><em>Leonard &amp; Peters</em></strong>? Without being all that aware of it, we are using methods such as pattern recognition, context analysis and other linguistic considerations. To answer the question “what is what in customer data?”, people use their knowledge of language and culture to interpret the data they encounter in daily life.<span id="more-1539"></span></p>
<p>Correct, automated interpretation of customer data needs to <em>imitate</em> these natural language processing abilities. This requires knowledge, containing relevant information on the components customer data consist of. Furthermore, a &#8220;grammar&#8221; is needed to take care of issues such as context rules, ambiguity checks, structure recognition, semantic associations and probability estimates. Using the knowledge and the grammar, the software solution decides what the most probable meaning of a word in a database record is. This is the basis for all customer <a title="data quality" href="http://www.humaninference.com" target="_blank">data quality</a> processes. A quick look at the following example illustrates the power of this approach. We, as humans, immediately understand the ambiguity of &#8220;ART&#8221;. We also understand that interpreting it automatically is highly complex:</p>
<p><strong><em><span style="text-decoration: underline;">Art</span> Johnson Sporting Goods</em></strong></p>
<p><strong><em><span style="text-decoration: underline;">Art</span> Gallery Johnson &amp; Johnson</em></strong></p>
<p><strong><em><span style="text-decoration: underline;">ART</span> Ltd. Auto Rendition Technology </em></strong></p>
<p>If we take a look at international business initiatives, things become even more complex. A lot of companies doing business abroad often seem to forget that they are dealing with a large variety of languages, names, address conventions and other culturally embedded business rules and habits. For this post, I will limit myself to some focus points when dealing with names in an international context.</p>
<p>The names Haddad, Hernández, Le Fèvre, Smid, Ferreiro, Schmidt, Kuznetsov and Kovács all mean “Smith” in different countries. That&#8217;s a factual observation, which is not necessarily helpful in solving complex data quality problems. There are, however, many aspects of the various national naming conventions that represent a challenge for every company doing business across the border. Here are some examples:</p>
<p><em>Signification of name components</em></p>
<p>Due to divergent naming conventions, there is a great variety in the storage, exchange, representation and meaning of names. For example, the first name Joan is male in Spain and female in Belgium. The order of the name parts is also exactly reversed. The form of address ‘Señor’ is the male equivalent of ‘Mevrouw’. In Spain: <strong><em>Señor Joan Martinez Fonseca Andrade</em></strong>. In Belgium: <strong><em>Mevrouw Vandenwalle, Joan</em></strong>.</p>
<p><em>Prefix sorting</em></p>
<p>A name like <strong><em>Van Buren</em></strong> would be sorted under ‘V’ in the US. In the Netherlands, for example, that name will always be found under the letter ‘B’. Additionally, the spelling (initial capital or not?) of prefixes differs per situation, per country.</p>
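<p>A minimal Python sketch of the two sorting conventions, assuming a small, hypothetical set of Dutch prefixes (<code>PREFIXES</code>) and two illustrative key functions that are not taken from any product:</p>

```python
# Hypothetical, incomplete set of Dutch family-name prefixes (illustration only).
PREFIXES = {"van", "de", "der", "den", "ter", "te"}

def us_sort_key(family_name):
    # US convention: sort on the name exactly as written, prefix included.
    return family_name.lower()

def dutch_sort_key(family_name):
    # Dutch convention: skip leading prefixes and sort on the main name part.
    words = family_name.lower().split()
    # words[:-1] is non-empty only while at least two words remain,
    # so the last word is never stripped.
    while words[:-1] and words[0] in PREFIXES:
        words = words[1:]
    return " ".join(words)

names = ["Van Buren", "Smit", "de Vries", "Bakker"]
us_order = sorted(names, key=us_sort_key)
dutch_order = sorted(names, key=dutch_sort_key)
```

<p>With these assumptions, <code>us_order</code> places Van Buren last, under ‘V’, while <code>dutch_order</code> places it second, under ‘B’.</p>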
<p><em>Patronymics</em></p>
<p>The use of patronymics (names derived from the father’s first name) is highly country-specific. A Russian man whose father’s first name is Ivan will carry the patronymic Ivanovich in front of his family name, whereas his sister will use Ivanovna: <strong><em>Sergei Ivanovich Golubev</em></strong> and <strong><em>Olga Ivanovna Golubeva</em></strong>. In Iceland, it is impossible to establish a direct family relation through analysis of family names. Here the patronymic serves as the family name itself: The son and daughter of <strong><em>Björn Thorgeirson</em></strong> will be called <strong><em>Nils Björnson</em></strong> and <strong><em>Anna Björnsdottir</em></strong> respectively.</p>
<p>Naturally, there are many more data and information quality aspects to consider when crossing the border. Think of multiple character sets, privacy issues, multi-lingualism and different currency and date notation. The examples given are meant to illustrate the following: Companies working with international data must invest in understanding customer data specifics. One size does not fit all!</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/international-data-quality-is-a-football-always-a-football/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Your name is too &#8220;common&#8221;&#8230;.</title>
		<link>http://datavaluetalk.com/data-governance/your-name-is-too-common/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=your-name-is-too-common</link>
		<comments>http://datavaluetalk.com/data-governance/your-name-is-too-common/#comments</comments>
		<pubDate>Mon, 07 Sep 2009 13:14:24 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Governance]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Banks]]></category>
		<category><![CDATA[Chinese characters]]></category>
		<category><![CDATA[customer view]]></category>
		<category><![CDATA[deduplication]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[single customer view]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1207</guid>
		<description><![CDATA[A major bank in Dongguan (China) refused a potential customer because his name is Li Jun. Apparently, there were already over 300 bank accounts assigned to the name Li Jun. Not that this particular Li Jun was responsible for opening all these accounts, there were just too many men with exactly the same name. The [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-thumbnail wp-image-1209" title="chinese-characters" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/09/chinese-characters-150x150.jpg" alt="chinese-characters" width="150" height="150" /></p>
<p>A major bank in Dongguan (China) refused a potential customer because his name is Li Jun. Apparently, there were already over 300 bank accounts assigned to the name Li Jun. Not that this particular Li Jun was responsible for opening all these accounts; there were just too many men with exactly the same name. The bank states that the refusal is nothing personal, since nobody with the name Li Jun will be accepted as a customer in the near future&#8230; In the meantime, Li Jun is taking legal action against the bank.<span id="more-1207"></span></p>
<p>When I read this news article this morning, my first thought was that it was perhaps a hoax. It turns out, however, that the story is true. From a data quality point of view this strikes me as really strange. How does this particular bank manage its customer data? Are there no additional identifiers (address, date of birth, etc.) to determine that you are actually dealing with the customer you think you are dealing with? Imagine that every John Smith would have a hard time opening a bank account, applying for a job or buying a product via the web. Or Jenny Jones? Bob Johnson? When is a name too &#8220;common&#8221;? It is a common misconception that the complexity of ideographic characters, such as those used for Mandarin Chinese, makes identification harder. At Human Inference we carried out some pretty serious dedups of Chinese files and, taking into account that Mandarin Chinese is a tonal language and that other principles of fault tolerance apply, the duplicate identification was rather accurate.</p>
<p>It is all a matter of using an intelligent <a title="data matching" href="http://www.humaninference.com/products/data-matching" target="_blank">data matching</a> method and knowing what kind of data one is working on. Every name can be identified; even &#8220;common&#8221; names.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-governance/your-name-is-too-common/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Any close encounters with the FBI terrorist watchlist?</title>
		<link>http://datavaluetalk.com/data-governance/any-close-encounters-with-the-fbi-terrorist-watchlist/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=any-close-encounters-with-the-fbi-terrorist-watchlist</link>
		<comments>http://datavaluetalk.com/data-governance/any-close-encounters-with-the-fbi-terrorist-watchlist/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 09:14:34 +0000</pubDate>
		<dc:creator>Ramon de Noronha</dc:creator>
				<category><![CDATA[Data Governance]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[compliance]]></category>
		<category><![CDATA[identification]]></category>
		<category><![CDATA[identity]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[persistent identification]]></category>
		<category><![CDATA[processes]]></category>
		<category><![CDATA[suspect list matching]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1125</guid>
		<description><![CDATA[Just before this summer the U.S. Department of Justice filed a report about the FBI Terrorist Watchlist. This watchlist serves as a critical tool for screening and law enforcement personnel, alerting them when they come across a known or suspected terrorist. It is used by personnel at airports, harbours and border crossings. Also, when [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1127" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/08/tsc080105a.jpg" alt="tsc080105a" width="160" height="152" />Just before this summer the U.S. Department of Justice filed a report about the FBI Terrorist Watchlist. This watchlist serves as a critical tool for screening and law enforcement personnel, alerting them when they come across a known or suspected terrorist. It is used by personnel at airports, harbours and border crossings. Also, when you apply for a visa you are matched against this watchlist. The Terrorist Screening Center, a subsidiary of the FBI, is responsible for maintaining the watchlist.</p>
<p>This watchlist was created in 2004 from several other lists, and at that time it consisted of about 68,000 entries. I use the word entries because in the years that followed it became unclear whether one record corresponds to one individual. By the end of 2008 the list had grown to over 1.1 million entries. In 2008, after the American Civil Liberties Union (ACLU) mentioned that the list had <a title="Numbers don't add up" href="http://www.aclu.org/privacy/gen/36064res20080721.html" target="_blank">passed the 1 million mark</a>, the government came with an explanation. <em>Although we have recorded over 1 million entries in the database, the net result is that these records correspond to about 400,000 individuals. </em>Terrorists often use different and thus multiple identities, use several (falsified) passports etc. But adding entries with only the first initials and last name, while an entry with the full first names and last name already exists, will result in unwanted side effects.<span id="more-1125"></span></p>
<p>Anyone interested in data quality and identity resolution knows that J. Robinson will result in many more matches (hits) than James Robinson. Indeed, the number of matches found will skyrocket, and they all have to be evaluated manually. Might this be the reason that we see more and more security personnel at airports?</p>
<p>In the <a href="http://www.usdoj.gov/oig/reports/FBI/a0925/final.pdf" target="_blank">latest audit report</a> of the U.S. Department of Justice about this watchlist one other problem was analyzed. While extensive procedures were made for nominating and adding suspects to the watchlist, there is no procedure for removing people from the list. Based on a sample of almost 70,000 entries and an investigation of the individuals, an astounding 35% turned out to be entries that should have been removed: people who had died, people who were no longer under investigation, cases which had been closed, etc. So this watchlist is <a href="http://www.aclu.org/privacy/spying/watchlistcounter.html" target="_blank">growing and growing</a>. As a result, screening personnel ensnare many innocent travelers as suspected terrorists, which wastes their time and diverts their energy from looking for true terrorists. It seems to me that the FBI and TSC can benefit from better Data Governance. What do you think?</p>
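<p>The effect can be sketched in a few lines of Python. The function <code>matches</code> and the traveler list are hypothetical and only illustrate the principle; nothing here reflects how the actual watchlist is matched.</p>

```python
def matches(entry, traveler):
    # An entry holding only an initial matches every given name that starts
    # with that letter; a full given name must match exactly.
    entry_given, entry_family = entry
    given, family = traveler
    if entry_family.lower() != family.lower():
        return False
    entry_given = entry_given.rstrip(".").lower()
    if len(entry_given) == 1:
        return given.lower().startswith(entry_given)
    return given.lower() == entry_given

# Made-up travelers who all share the same family name.
travelers = [("James", "Robinson"), ("John", "Robinson"),
             ("Julia", "Robinson"), ("Mary", "Robinson")]

hits_full = [t for t in travelers if matches(("James", "Robinson"), t)]
hits_initial = [t for t in travelers if matches(("J.", "Robinson"), t)]
```

<p>In this made-up sample the full-name entry hits one traveler, while the initial-only entry hits three, each of whom would need manual evaluation.</p>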
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-governance/any-close-encounters-with-the-fbi-terrorist-watchlist/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Bi-lingual streetnames in Amsterdam, do we really need it?</title>
		<link>http://datavaluetalk.com/data-quality/bi-lingual-streetnames-in-amsterdam-do-we-really-need-it/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=bi-lingual-streetnames-in-amsterdam-do-we-really-need-it</link>
		<comments>http://datavaluetalk.com/data-quality/bi-lingual-streetnames-in-amsterdam-do-we-really-need-it/#comments</comments>
		<pubDate>Mon, 17 Aug 2009 08:10:47 +0000</pubDate>
		<dc:creator>Ramon de Noronha</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[address standardization]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[persistent identification]]></category>
		<category><![CDATA[standardization]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1110</guid>
		<description><![CDATA[So once in a while I visit Amsterdam and have a drink or two in the centre. Afterwards I take the tram back to the hotel. This weekend I was quite surprised to find out that all the street names are announced in English at each stop. The easy and obvious one is of [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignleft size-full wp-image-1111" src="http://datavaluetalk.com/cms/wp-content/uploads/2009/08/Straatnaambord.jpg" alt="Straatnaambord" width="305" height="137" />So once in a while I visit Amsterdam and have a drink or two in the centre. Afterwards I take the tram back to the hotel. This weekend I was quite surprised to find out that all the street names are announced in English at each stop. The easy and obvious one is of course Centraal Station, which was translated to Central Station. I can also see how they came up with Rembrandt Square instead of Rembrandtsplein. But translating &#8220;Spui&#8221; to &#8220;Courtyard with a chapel&#8221; doesn&#8217;t help any tourist find their destination.<span id="more-1110"></span></p>
<p>In Holland we already have three officially approved ways of naming streets and addresses. Nowadays we have the TNT Post standard, which is based on the very first edition of the postal code book and has been corrected several times since. This naming convention was the basis for the NEN 5825 standard (NEN is the Dutch counterpart of ISO). But the true source of street names is the municipality, in a council decision called a &#8220;Raadsbesluit&#8221;. Because of the different versions, this can easily result in five different ways of spelling the same street, as the example below shows:</p>
<table border="0" cellspacing="1" cellpadding="2" width="85%" align="center">
<tbody>
<tr>
<td><span style="font-size: x-small;">Original Postal Code Book (1978):</span></td>
<td><span style="font-size: x-small;">s en schepenenstr</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Corrected TNT Post standard:</span></td>
<td><span style="font-size: x-small;">schout en s str</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">NEN 5825 standard, version 1991:</span></td>
<td><span style="font-size: x-small;">Schout en Schepenenstr</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">NEN 5825 standard, version 2002:</span></td>
<td><span style="font-size: x-small;">Sch en Schepenenstraat</span></td>
</tr>
<tr>
<td><span style="font-size: x-small;">Raadsbesluit:</span></td>
<td><span style="font-size: x-small;">Schout en Schepenenstraat</span></td>
</tr>
</tbody>
</table>
<p>What do you think: should we add a new &#8220;English&#8221; standard to the existing standards? What are the pros and cons of having English labels for the street names? Should we also replace all signs and add the English label for each street name? Please add your opinion in the comments. For more information and the history of Dutch street names, I recommend the site <em><a title="Alles over straatnaam" href="http://www.allesoverstraatnamen.nl/" target="_blank">&#8220;alles over straatnamen&#8221;</a></em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/bi-lingual-streetnames-in-amsterdam-do-we-really-need-it/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Budget for Data Quality seems no problem</title>
		<link>http://datavaluetalk.com/data-quality/budget-for-data-quality-seems-no-problem/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=budget-for-data-quality-seems-no-problem</link>
		<comments>http://datavaluetalk.com/data-quality/budget-for-data-quality-seems-no-problem/#comments</comments>
		<pubDate>Thu, 16 Oct 2008 12:00:36 +0000</pubDate>
		<dc:creator>Emile van de Klok</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Budget]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[processes]]></category>

		<guid isPermaLink="false">http://datavaluetalk.wordpress.com/?p=95</guid>
		<description><![CDATA[A 2008 survey by Human Inference indicates that processes are the biggest perceived challenge in relation to Data Quality. However, the one subject that seems to be no problem is the budget. Human Inference differentiates itself by interpretation of knowledge. So from this perspective I wonder how the respondents interpreted the word &#8220;processes&#8221;. Do they [...]]]></description>
			<content:encoded><![CDATA[<div class="mceTemp">
<p class="MsoNormal" style="margin: 0;"><span style="font-size: small; font-family: Myriad Pro;">A 2008 survey by Human Inference indicates that processes are the biggest perceived challenge in relation to <a title="data quality" href="http://www.humaninference.com" target="_blank">Data Quality</a>. However, the one subject that seems to be no problem is the budget. Human Inference differentiates itself by interpretation of knowledge. So from this perspective I wonder how the respondents interpreted the word &#8220;processes&#8221;. Do they mean the processes within the value chain of their companies, or do they actually mean the process of obtaining a budget for Data Quality? The latter would actually explain a lot.</span></p>
</div>
<div id="attachment_75" class="wp-caption alignnone" style="width: 310px"><a href="http://datavaluetalk.files.wordpress.com/2008/10/survey-challenge-dq.jpg"><img class="size-medium wp-image-75" title="survey-challenge-dq" src="http://datavaluetalk.files.wordpress.com/2008/10/survey-challenge-dq.jpg?w=300" alt="HI Survey Results" width="300" height="188" /></a><p class="wp-caption-text">HI Survey Results</p></div>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/budget-for-data-quality-seems-no-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

