<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Value Talk</title>
	<atom:link href="http://datavaluetalk.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://datavaluetalk.com</link>
	<description>Customer data is a valuable asset. Why not treat it that way?</description>
	<lastBuildDate>Mon, 09 Jan 2012 11:38:42 +0000</lastBuildDate>
	<language>nl</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>We make &#8216;null&#8217; mistakes</title>
		<link>http://datavaluetalk.com/data-quality/we-make-null-mistakes/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=we-make-null-mistakes</link>
		<comments>http://datavaluetalk.com/data-quality/we-make-null-mistakes/#comments</comments>
		<pubDate>Mon, 09 Jan 2012 10:23:22 +0000</pubDate>
		<dc:creator>Frano Bebseler</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Data Services]]></category>
		<category><![CDATA[customer data matching]]></category>
		<category><![CDATA[MDM for customer data]]></category>
		<category><![CDATA[single customer view]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=2072</guid>
		<description><![CDATA[Wherever software is created, mistakes are made. Software providers often presume their products are bug-free, but software of that kind doesn’t exist. Our departments work hard to prevent bugs, yet in our HIquality Life Cycle new ones can still be introduced, even in the oldest modules that have been in use for over 25 years already.  HIquality [...]]]></description>
			<content:encoded><![CDATA[<p><em>Wherever software is created, mistakes are made. Software providers often presume their products are bug-free, but software of that kind doesn’t exist. Our departments work hard to prevent bugs, yet in our HIquality Life Cycle new ones can still be introduced, even in the oldest modules that have been in use for over 25 years already.</em> </p>
<p><strong>HIquality bug cycle</strong></p>
<p>Usually our customers are satisfied with our product suite. At customer support, however, I never hear about the successful implementations. I got to know our software through the problems that occur, and in almost 15 years of acceptance testing and customer support, I’ve seen all kinds of bugs pass by.<br />
<a href="http://datavaluetalk.com/cms/wp-content/uploads/2012/01/bed-bug-life-cycle.jpg"><img class="alignleft size-thumbnail wp-image-2075" title="bed-bug-life-cycle" src="http://datavaluetalk.com/cms/wp-content/uploads/2012/01/bed-bug-life-cycle-150x150.jpg" alt="HIquality bug cycle" width="150" height="150" /></a>Software crashes and never-ending loops are nasty. Worse are the bugs that are barely visible at first but keep growing over time.<br />
Recently we caught such a bug in our longest-standing product, HIquality Identify.<span id="more-2072"></span></p>
<p><a href="http://datavaluetalk.com/cms/wp-content/uploads/2012/01/dq_lifecycle_copy-resized.png"><img class="alignright size-full wp-image-2074" title="dq_lifecycle_copy resized" src="http://datavaluetalk.com/cms/wp-content/uploads/2012/01/dq_lifecycle_copy-resized.png" alt="Data quality life cycle" width="150" height="150" /></a>HIquality Identify is often used in search applications. Just as police and justice departments use “descriptions” of a criminal, HIquality Identify uses descriptions of source data to find the right records in a database. Source records are decomposed into core words, and the phonological codes of the core words of streets, names and places are stored in the description table. This table is the basis of the search application.</p>
<p><strong>Nulls, nels and nols</strong><br />
Whenever a source record is changed, the descriptions have to be updated as well. A synchronization procedure keeps the description table up to date.<br />
Due to a small mistake in this procedure, we recently released a version of the Oracle Upgrade pack that no longer recognized null values in the database. Empty fields in the database resulted in core words with the value ‘null’, and the phonological codes ‘nel’ and ‘nol’.<br />
As a result the scores of evaluations became less accurate, and end scores became too high. The phonological codes of the core words are used as indexes, and these indexes are used to pre-calculate the maximum number of evaluations. As more and more fields are changed to ‘null’, ‘nel’ and ‘nol’, after several months searches start returning time-outs instead of results, stating that not enough relevant search data was entered. <a href="http://datavaluetalk.com/cms/wp-content/uploads/2012/01/Null2resized.png"><img class="aligncenter size-full wp-image-2107" title="Null2resized" src="http://datavaluetalk.com/cms/wp-content/uploads/2012/01/Null2resized.png" alt="" width="492" height="300" /></a></p>
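<p>For readers who like to see the pattern in code, here is a minimal sketch of how such a bug pollutes an index. It is not the HIquality implementation: <code>toy_code</code> is a hypothetical stand-in for a phonological encoder (so it yields ‘nll’ rather than ‘nel’ or ‘nol’), and the function names and sample streets are assumptions made up for illustration.</p>
<pre>
```python
from collections import Counter


def toy_code(word):
    """Hypothetical stand-in for a phonological code: first letter plus later consonants."""
    word = word.lower()
    return word[0] + "".join(ch for ch in word[1:] if ch not in "aeiou")


def core_word_buggy(field):
    # Bug pattern: an empty (NULL) field is turned into the text 'null'.
    return "null" if field is None else field


def core_word_fixed(field):
    # Fix: an empty field yields no core word at all.
    return field


def build_index(fields, core_word):
    """Count how many records share each code, skipping missing core words."""
    index = Counter()
    for field in fields:
        word = core_word(field)
        if word is not None:
            index[toy_code(word)] += 1
    return index


# Three of the five sample records have an empty street field.
streets = ["Kingstreet", None, "Mainroad", None, None]
buggy = build_index(streets, core_word_buggy)
fixed = build_index(streets, core_word_fixed)
```
</pre>
<p>In the buggy index every empty field lands under the same code, so that one entry keeps growing as records are synchronized, while the fixed index simply leaves empty fields out.</p>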
<p><strong>Did you want to know this?</strong><br />
We quickly figured out which customers had received this particular software release, and our consultants visited them for an upgrade. In the end all of them upgraded without knowing the actual reason or what kind of harm could potentially have been caused.<br />
As a customer you want problems to be repaired and bugs to be fixed, without knowing every single detail. Is this nol-worm kind of virus something you would have wanted to know about? Probably not. If you do have nightmares about nulls, nels and nols in your database, you can contact me any time at Human Inference’s Customer Support, where I cope with all the kinds of things that can go wrong in software.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/we-make-null-mistakes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>First time right? Let your data decide!</title>
		<link>http://datavaluetalk.com/data-quality/first-time-right-let-your-data-decide/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=first-time-right-let-your-data-decide</link>
		<comments>http://datavaluetalk.com/data-quality/first-time-right-let-your-data-decide/#comments</comments>
		<pubDate>Thu, 03 Nov 2011 11:01:39 +0000</pubDate>
		<dc:creator>Graham Rhind</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[First Time Right]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=2056</guid>
		<description><![CDATA[Data quality consultants will tell you that collecting data correctly, getting it right first time is essential, whilst in contrast almost every organisation actually puts most of their budget and labour into attempting to cleanse data after collection. The proactive versus reactive debate rages, but in fact data quality must be both a proactive and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/cms/wp-content/uploads/2011/11/FTR-quote.png"><img class="alignleft size-thumbnail wp-image-2058" title="FTR " src="http://datavaluetalk.com/cms/wp-content/uploads/2011/11/FTR-quote-150x150.png" alt="" width="150" height="150" /></a>Data quality consultants will tell you that collecting data correctly, getting it <a title="First Time Right whitepaper" href="http://www.humaninference.com/solutions/first-time-right">right first time </a>is essential, whilst in contrast almost every organisation actually puts most of their budget and labour into attempting to cleanse data after collection.</p>
<p>The proactive versus reactive debate rages, but in fact data quality must be both a proactive and a reactive process. The data will dictate which to use, or whether both are required.<span id="more-2056"></span></p>
<p>The first time right proactive approach applies to all data. Some data, such as transactional data, must be collected correctly because corrections after the event are time-consuming and very costly. Note the wrong information when a product purchase is telephoned in, for example, and your customer, the call centre staff, the warehouse staff and so on will all need to be included when trying to correct the error after the wrong products are delivered, with the accompanying damage to your reputation and your brand image.</p>
<p>When transactional data is collected correctly first time, no further cleansing will ever be required – the record represents a historic event that will never change.</p>
<p>Other data may be cleansed later, but reactive cleansing never achieves the high level of data accuracy that right first time practices achieve. Postal addresses are a good example. Many addresses can be found and validated against postal data files after collection. But without the ability to hold a dialogue with the customer, reactive processing will only correct a percentage of the issues.</p>
<p>The situation with personal names is far more serious. People give their children’s names unusual spellings, casings and other twists to mark them out from the crowd. Without being collected correctly first time, no automated post-processing can correct these errors. In fact, many attempts to correct the errors will result in further inaccuracies being introduced. Organisations also often reactively gender code their customers based on given names. People with the same names can have different genders both within and between cultures, and as in our mobile world people move around, you can no longer assume that the American Joan is a female and not a male originating from Spain, for example. This sort of post-processing significantly reduces data accuracy.</p>
<p>So if first time right is always the best policy for data quality, where does reactive data processing come in? When the data refers to an entity, you need to remember that the entity’s situation may change over time. If it is a person they may marry, change their name, move house, change job etc. If it is a city with a postal code the city name may change, or the postal code, or the street name may change. Somebody living in Sudan on 8th July 2011 may find themselves living in South Sudan on 9th July. Somebody buying with Estonian Krooni on 31st December 2010 will be buying in Euros on 1st January 2011. It is remarkable how we attempt to track the changes in the lifecycles of our customers but tend to overlook the dynamic changes in their environments which affect them and you – they happen very much more often than we think.</p>
<p>So the lesson must be that first time right is always right and should never be replaced with reactive processing. Reactive processing is required when real world change affects our data.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/first-time-right-let-your-data-decide/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Easy mashup of ETL and DQ</title>
		<link>http://datavaluetalk.com/data-quality/an-easy-mashup-of-etl-and-dq/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=an-easy-mashup-of-etl-and-dq</link>
		<comments>http://datavaluetalk.com/data-quality/an-easy-mashup-of-etl-and-dq/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 11:09:20 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Data Services]]></category>
		<category><![CDATA[ETL]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=2045</guid>
		<description><![CDATA[Today I saw how easy it can be to make a mashup from ETL and DataQuality tools. More and more ETL vendors see the need to not only extract, transform and load data, but at the same time also enhance the data by hand with data quality tools. Most of them stick to so-called tick mark data [...]]]></description>
			<content:encoded><![CDATA[<p>Today I saw how easy it can be to make a mashup from ETL and DataQuality tools. More and more ETL vendors see the need to not only extract, transform and load data, but at the same time also enhance the data by hand with data quality tools. Most of them stick to so-called tick mark data quality – mainstream, easy-to-get enhancements. The results are mostly experienced as disappointing or at best average. Building ETL solutions is a different ball game from building data quality solutions. You need to mash these worlds together.<br />
Together with <a href="http://www.pentaho.com/">Pentaho</a> we as Human Inference are creating a mashup with their <a href="http://kettle.pentaho.com/">Kettle ETL </a>tool and our HIquality Data Quality solutions. The nice thing is that the data quality solutions can be used both in the cloud and on-premise.<br />
It’s almost finished now and as a teaser I just want to show you a hot screenshot of it. It will soon be available as an add-on from our easyDQ website, followed by inclusion in the coming Pentaho release. If you need it right away, please <a href="http://www.humaninference.com/contact">contact us </a>directly.</p>
<p><a href="http://datavaluetalk.com/cms/wp-content/uploads/2011/10/ETL-Pentaho1.jpg"><img class="alignleft size-full wp-image-2049" title="ETL &amp; DQ" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/10/ETL-Pentaho1.jpg" alt="" width="786" height="337" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/an-easy-mashup-of-etl-and-dq/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Komerc in Croatia</title>
		<link>http://datavaluetalk.com/data-quality/komerc-in-croatia/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=komerc-in-croatia</link>
		<comments>http://datavaluetalk.com/data-quality/komerc-in-croatia/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 13:32:43 +0000</pubDate>
		<dc:creator>Graham Rhind</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Names]]></category>
		<category><![CDATA[company names]]></category>
		<category><![CDATA[first time right]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=2031</guid>
		<description><![CDATA[People find many ways to be unique, including in their choice of names and how they are written.  Common names may be written in any number of ways (Zachery, Zaccari, Zachery, Zakarey and so on) and in any number of forms (Za’Korey, zaKori). This variation, and the importance that the customer attaches to it, reinforces the [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/cms/wp-content/uploads/2011/10/Konzalting-Zagreb1.jpg"><img class="alignleft size-thumbnail wp-image-2036" title="Konzalting Zagreb" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/10/Konzalting-Zagreb1-150x66.jpg" alt="" width="150" height="66" /></a>People find many ways to be unique, including in their choice of names and how they are written.  Common names may be written in any number of ways (Zachery, Zaccari, Zachery, Zakarey and so on) and in any number of forms (Za’Korey, zaKori). This variation, and the importance that the customer attaches to it, reinforces the importance of first time right when collecting information about a person’s name.</p>
<p style="text-align: left;">I was reminded recently that this rule applies also to company names when reviewing a directory containing Croatian companies.  The directory showed a great variation in words that at first glance would seem ideal candidates for correction and standardization. For example, many company names contained strings like these:</p>
<p style="text-align: center;">Commerc, Comerce, Comerc, Kommerce, Kommerc, Komerce, Komerc</p>
<p>There are many other examples which had me scratching my head: Compani, Konsulting, Konzalting, Konsalting and so on.</p>
<p>Why the variance in spelling?  Are these companies with the English word commerce in their names where that word has been typed as heard by call centre workers with a limited knowledge of English? Are they typos of a valid Croatian word? Are they accurate representations of a valid Croatian word as rendered in different dialects? Is it a mixture of all these factors? </p>
<p><span id="more-2031"></span></p>
<p>I am assured that commerce does not have a similar Croatian equivalent. The best translation of commerce would be trgovina or obrt. Typos and mis-rendering aside, it would appear that in a significant number of cases these strings as written are actually part of an accurate company name – they are attempts to anglicise and internationalise the company name, either mis-spelling the English word or rendering it to sound like &#8216;commerce&#8217; in the local Croatian dialect. </p>
<p>If one needed to reactively cleanse company data that has not been collected correctly, one could choose to translate these words when found, or to standardise them to &#8216;commerce&#8217;, but either process would reduce the accuracy of the data, because, wrong or not, these words are included as parts of company names. They show the individuality of the company, and attempting to &#8216;correct&#8217; such data is as bad as correcting a personal name like SuZann because it is &#8216;written wrongly&#8217;.</p>
<p> There are parts of company names that can be processed, most often the legal form of the company – PLC, Ltd, Inc. or, to continue the Croatian theme, d.o.o., but the rest of the company string needs to be left alone. </p>
<p>Like personal names, company names are carefully chosen to show individuality, to set that company apart from its competitors. The names vary greatly in form and spelling. Unless one has an intimate knowledge of a company, viewing a name after data collection will not show where any errors in it exist, and no post processing will allow those errors to be corrected – in fact, post processing often reduces the accuracy of company names.  As with personal names, the rule for company names is to collect them correctly – <a title="First Time Right solution" href="http://www.humaninference.com/solutions/first-time-right">right first time</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/komerc-in-croatia/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ask Me is linked with Any Body and relates with Walther Von Stolzing</title>
		<link>http://datavaluetalk.com/data-quality/ask-me-is-linked-with-any-body-and-relates-with-walther-von-stolzing/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=ask-me-is-linked-with-any-body-and-relates-with-walther-von-stolzing</link>
		<comments>http://datavaluetalk.com/data-quality/ask-me-is-linked-with-any-body-and-relates-with-walther-von-stolzing/#comments</comments>
		<pubDate>Wed, 12 Oct 2011 08:51:26 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Names]]></category>
		<category><![CDATA[cleansing]]></category>
		<category><![CDATA[identity]]></category>
		<category><![CDATA[interpretation]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[name]]></category>
		<category><![CDATA[names]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1991</guid>
		<description><![CDATA[Weird subject, isn&#8217;t it? Quite obviously, &#8216;Ask Me&#8217; and &#8216;Any Body&#8217; are artificial names. They will never belong to a real person. How they relate to &#8216;Walther von Stolzing&#8217; will follow. For over 25 years Human Inference has collected reference data, for instance on persons. Because of our reference set we immediately recognize [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/cms/wp-content/uploads/2011/10/Obama.png"><img class="alignleft size-thumbnail wp-image-2022" title="I'm Obama" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/10/Obama-150x150.png" alt="" width="150" height="150" /></a>Weird subject, isn&#8217;t it? Quite obviously, &#8216;Ask Me&#8217; and &#8216;Any Body&#8217; are artificial names. They will never belong to a real person. How they relate to &#8216;Walther von Stolzing&#8217; will follow.</p>
<p>For over 25 years Human Inference has collected reference data, for instance on persons. Because of our reference set we immediately recognize that &#8216;Ask Me&#8217; and &#8216;Any Body&#8217; are fake names. People are using these either in test situations or to hide their actual names.</p>
<p>In the old days we only needed to test for &#8216;Test Test&#8217;; in more recent years we have seen great inventiveness in these fake names. A brief sample can be seen in the following list.</p>
<div align="center">
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top" width="137">Alpha Beta</td>
<td valign="top" width="137">Any Body</td>
</tr>
<tr>
<td valign="top" width="137">Ask Me</td>
<td valign="top" width="137">Best Friend</td>
</tr>
<tr>
<td valign="top" width="137">Blue Sky</td>
<td valign="top" width="137">Cool Dude</td>
</tr>
<tr>
<td valign="top" width="137">Dress Code</td>
<td valign="top" width="137">El Comandante</td>
</tr>
<tr>
<td valign="top" width="137">Guess Who</td>
<td valign="top" width="137">In Cognito</td>
</tr>
</tbody>
</table>
</div>
<p>In case you cannot rely on reference data and interpretation you need to provide a check list. Providing it is one thing, but since users tend to be really creative, maintaining it is essential.<span id="more-1991"></span></p>
<p>In these 25 years we identified a move from &#8216;real fake names&#8217; towards &#8216;real names used in a fake way&#8217;. In the USA, for example, we identified popular Hollywood names and names of politicians being used as fake names. Currently the usage of the name &#8216;George Bush&#8217; is decreasing, whereas &#8216;Barack Obama&#8217; is increasingly used. We recognize the false usage of these names because of the change in frequency figures of the given name and family name as well as the usage of the combination itself. Remarkably, &#8216;Abraham Lincoln&#8217; and &#8216;George Washington&#8217; remain quite steady.</p>
<p>Back to &#8216;Walther von Stolzing&#8217;. By now you might have guessed what is happening here. We recognized that in German-speaking areas this name is also passing our threshold on validity. By <a href="http://en.wikipedia.org/wiki/Die_Meistersinger_von_N%C3%BCrnberg">googling</a> the name you can see that Walther is actually a character in Wagner’s opera &#8216;Die Meistersinger von Nürnberg&#8217;, dating back to 1868!</p>
<p>Let’s see if in 100 years’ time people are still using &#8216;Darth Vader&#8217;, &#8216;Lord Rings&#8217; or &#8216;Snoop Dogg&#8217;!</p>
<p>All the names used in this blog are ‘real’ names coming from a popular social media site. Please check our <a href="http://www.humaninference.com/products/data-cleansing">cleansing products</a> in case you need cleansing solutions.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/ask-me-is-linked-with-any-body-and-relates-with-walther-von-stolzing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Know Your Customers &#8211; improving your Corporate Social Responsibility</title>
		<link>http://datavaluetalk.com/data-governance/know-your-customers-improving-your-corporate-social-responsibility/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=know-your-customers-improving-your-corporate-social-responsibility</link>
		<comments>http://datavaluetalk.com/data-governance/know-your-customers-improving-your-corporate-social-responsibility/#comments</comments>
		<pubDate>Fri, 30 Sep 2011 13:44:58 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Governance]]></category>
		<category><![CDATA[AFM]]></category>
		<category><![CDATA[Banking]]></category>
		<category><![CDATA[Banking & Finance]]></category>
		<category><![CDATA[cdi]]></category>
		<category><![CDATA[corporate social responsibility]]></category>
		<category><![CDATA[golden record]]></category>
		<category><![CDATA[know your customer]]></category>
		<category><![CDATA[MDM for customer data]]></category>
		<category><![CDATA[OECD]]></category>
		<category><![CDATA[transparency]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1987</guid>
		<description><![CDATA[It&#8217;s not only what you achieve, it&#8217;s also how you behave. Some small organizations can still behave in a somewhat undetected way to achieve successful results. For medium and large organizations that is not what governments and customers expect from them. Transparency on Corporate Social Responsibility (CSR) is key in this and therefore a significant number of countries agreed [...]]]></description>
			<content:encoded><![CDATA[<div class="mceTemp"><a href="http://datavaluetalk.com/cms/wp-content/uploads/2011/09/blinddoek1.jpg"><img class="alignleft size-thumbnail wp-image-2013" title="blinddoek" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/09/blinddoek1-150x150.jpg" alt="" width="129" height="133" /></a>It&#8217;s not only what you achieve, it&#8217;s also how you behave. Some small organizations can still behave in a somewhat undetected way to achieve successful results. For medium and large organizations that is not what governments and customers expect from them. Transparency on Corporate Social Responsibility (CSR) is key in this, and a significant number of countries have therefore agreed on it in, amongst others, the <a title="OECD Guidelines for Multinational Enterprises" href="http://www.oecd.org/dataoecd/43/29/48004323.pdf" target="_blank">OECD Guidelines for Multinational Enterprises</a>.</div>
<p style="text-align: left;">This week, the latest results on <a title="Praktijkonderzoek Transparantie" href="http://www.eerlijkebankwijzer.nl/site/praktijkonderzoek_transparantie.pdf" target="_blank">Transparency in the Banking</a> area were presented in The Netherlands. Although some institutions score really well, others need to go at least a mile further to get a good or even a fair score.</p>
<p style="text-align: left;">We agree with the report&#8217;s recommendation that compliance regulations can help, or force, institutions to be more transparent; for example, the SEC in the USA demands more detailed information than its Dutch peer, the AFM. For Basel II, too, financial institutions need to know who they are dealing with in the end. The phrase &#8211; <em>in the end</em> &#8211; makes it even more difficult for CSR reporting, because not only is the ultimate legal entity now needed, but additional details per region and per sector are required as well.<span id="more-1987"></span></p>
<p style="text-align: left;">In our daily practice of implementing <a title="Customer Data Integration" href="http://www.humaninference.com/solutions/single-customer-view" target="_blank">Customer Data Integration</a> (CDI or MDM for Customer Data) projects, we see our customers facing these challenges. They are absolutely willing to provide the right figures, but it&#8217;s far from a trivial task. Many of the underlying systems were never designed to aggregate this kind of information in a way that is easy and sufficient for CSR reporting. There is a huge demand for bridging the gap between these systems in a non-intrusive way: combining individual records within and across systems into so-called Golden Records, so that these can be used both for compliance and for transparency on your social responsibility.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-governance/know-your-customers-improving-your-corporate-social-responsibility/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>200 years of family names</title>
		<link>http://datavaluetalk.com/data-quality/200-years-of-family-names/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=200-years-of-family-names</link>
		<comments>http://datavaluetalk.com/data-quality/200-years-of-family-names/#comments</comments>
		<pubDate>Thu, 18 Aug 2011 09:44:06 +0000</pubDate>
		<dc:creator>Paul Drenth</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[family names]]></category>
		<category><![CDATA[municipal register]]></category>
		<category><![CDATA[Napoleon]]></category>
		<category><![CDATA[surnames]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1948</guid>
		<description><![CDATA[Today is a memorable day for data quality in the Netherlands. Exactly two hundred years ago, on August 18, 1811, the French emperor (and occupier) Napoleon Bonaparte issued the decree that all citizens of the northern provinces of the Netherlands were to choose a surname. This name was very useful in the municipal registers of [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/data-quality/200-years-of-family-names/attachment/napoleon4/" rel="attachment wp-att-1950"><img class="alignleft size-thumbnail wp-image-1950" title="napoleon4" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/08/napoleon4-150x150.jpg" alt="" width="150" height="150" /></a>Today is a memorable day for data quality in the Netherlands. Exactly two hundred years ago, on August 18, 1811, the French emperor (and occupier) Napoleon Bonaparte issued the decree that all citizens of the northern provinces of the Netherlands were to choose a surname. This name was very useful in the municipal registers of the Dutch inhabitants: how else could the French army know which lad to draw for military service, or which peasant to pursue for taxes?<span id="more-1948"></span></p>
<p>After the retreat of the French, the Dutch authorities wisely kept the system of family names. They too wanted their tax system to function properly (they were Dutch, after all). The registration of family names gradually became common practice and with the growing population in the Netherlands, the municipal registers kept growing as well. During the last decades the scattered registers of all the municipalities were investigated, linked and combined, and put in large computer systems.</p>
<p>It then became apparent that in the old days, data quality might not have been a top priority after all: in the Netherlands, for example, there are more than thirty spelling variations of the name Mathijsen.<a href="http://datavaluetalk.com/data-quality/200-years-of-family-names/attachment/mathijsen-2/" rel="attachment wp-att-1951"><img class="alignleft size-medium wp-image-1951" title="mathijsen" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/08/mathijsen1-300x191.gif" alt="" width="300" height="191" /></a></p>
<p>Which leaves the true data quality fans wondering whether a family name is actually the right family name…</p>
<p>For more information on family names, please check the <a href="http://www.humaninference.com/products/data-cleansing" target="_self">Human Inference website</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/200-years-of-family-names/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What is equal? &#8211; challenges with sound and synonyms</title>
		<link>http://datavaluetalk.com/data-quality/what-is-equal-challenges-with-sound-and-synonyms/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=what-is-equal-challenges-with-sound-and-synonyms</link>
		<comments>http://datavaluetalk.com/data-quality/what-is-equal-challenges-with-sound-and-synonyms/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 13:43:45 +0000</pubDate>
		<dc:creator>Winfried van Holland</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[apples and oranges]]></category>
		<category><![CDATA[fuzzy matching]]></category>
		<category><![CDATA[natural language processing]]></category>
		<category><![CDATA[String comparison]]></category>
		<category><![CDATA[synonyms]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1882</guid>
		<description><![CDATA[What to do when basic string comparison (fuzzy search) techniques won&#8217;t give the right results? Fuzzy search helps to find matches in situations where people make typos (e.g. compare Human Inference with Human Inverence) or make up abbreviations (King str. with King street) or ignore diacritics (Sørensen and Soerensen). In case the &#8216;wrong word&#8217; is [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/data-quality/what-is-equal-challenges-with-sound-and-synonyms/attachment/phonology-2/" rel="attachment wp-att-1934"><img class="alignleft size-thumbnail wp-image-1934" title="Phonology" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/08/Phonology1-150x150.png" alt="" width="150" height="150" /></a>What to do when basic string comparison (fuzzy search) techniques won&#8217;t give the right results? Fuzzy search helps to find matches in situations where people make typos (e.g. compare Human Inference with Human In<strong>v</strong>erence) or make up abbreviations (King str. with King street) or ignore diacritics (Sørensen and Soerensen). If the &#8216;wrong word&#8217; is not an existing word, it is obvious that we have a match once the typo has been corrected.</p>
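<p>As a minimal sketch of such a basic fuzzy comparison, the snippet below folds diacritics and then computes a similarity ratio with Python&#8217;s standard <code>difflib</code> module. The transliteration table and the function names are illustrative assumptions, not a description of any particular product.</p>

```python
import unicodedata
from difflib import SequenceMatcher

# Illustrative transliterations for letters that Unicode does not decompose.
TRANSLITERATIONS = {"ø": "oe", "æ": "ae", "ß": "ss"}


def normalize(text):
    """Lowercase, transliterate special letters and strip combining accents."""
    text = text.lower()
    for letter, replacement in TRANSLITERATIONS.items():
        text = text.replace(letter, replacement)
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def similarity(a, b):
    """Return a ratio from 0.0 to 1.0; 1.0 means equal after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


for pair in [("Human Inference", "Human Inverence"),
             ("King str.", "King street"),
             ("Sørensen", "Soerensen")]:
    print(pair, round(similarity(*pair), 2))
```

<p>Where to put the threshold between &#8216;match&#8217; and &#8216;no match&#8217; is a separate decision that depends on the data at hand.</p>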
<p>More challenges appear if the typo produces another existing word; now we need to decide how equal the two entries are. If you have some knowledge of how frequently words are used, you can use that in the equation. How to obtain word frequencies is another ballgame, but at least you can assume that a &#8216;wrong word&#8217; is never used (a bit of a paradox).<span id="more-1882"></span></p>
<p>A large group of possible matches that fuzzy search methods do not find (i.e. missed matches) consists of names that sound the same but are written rather differently. Often a call center agent types a name exactly as he hears it. An example would be the family names ‘Farren’ and ‘Pharan’. They already differ so much that it becomes rather hard for a string comparison to treat both entries as equal. Phonetic search would definitely help here. The drawback of relying on phonetics alone is that you may now combine entries that are certainly not matches (i.e. mismatches), e.g.:</p>
<ol>
<li>René Meierhofer and</li>
<li>Renée Mayrhofer</li>
</ol>
<p>Both are valid family names, but the given names indicate one male and one female entry.</p>
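<p>The idea can be sketched with a deliberately crude phonetic key. This is not Soundex or Metaphone, just a hypothetical simplification that maps &#8216;ph&#8217; to &#8216;f&#8217;, drops vowels and collapses doubled letters. The gender table is made-up reference data, added only to show how a pure phonetic match can be vetoed.</p>

```python
import unicodedata

VOWELS = set("aeiouy")

# Hypothetical reference data; a real system would use a maintained name table.
GIVEN_NAME_GENDER = {"ren\u00e9": "male", "ren\u00e9e": "female"}


def phonetic_key(name):
    """Crude key: fold accents, map 'ph' to 'f', drop vowels and doubled letters."""
    decomposed = unicodedata.normalize("NFKD", name.lower())
    letters = "".join(ch for ch in decomposed if ch.isalpha())
    letters = letters.replace("ph", "f")
    key = []
    previous = ""
    for ch in letters:
        if ch != previous and ch not in VOWELS:
            key.append(ch)
        previous = ch
    return "".join(key)


def given_name_gender(full_name):
    """Look up the gender implied by the first token, or None if unknown."""
    first = unicodedata.normalize("NFC", full_name.split()[0].lower())
    return GIVEN_NAME_GENDER.get(first)


def same_person_candidate(name_a, name_b):
    """Accept a phonetic match only if the given names do not conflict in gender."""
    if phonetic_key(name_a) != phonetic_key(name_b):
        return False
    gender_a = given_name_gender(name_a)
    gender_b = given_name_gender(name_b)
    return gender_a is None or gender_b is None or gender_a == gender_b
```

<p>With this sketch, &#8216;Farren&#8217; and &#8216;Pharan&#8217; share the same key, and so do the two Meierhofer/Mayrhofer entries; only the extra check on the given name keeps the latter pair apart.</p>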
<p>In a real life example, we would expect a complete name with titles and we&#8217;d still need to match in a correct way. Take, for example,</p>
<ol>
<li>Dr. John J. Farren jr.</li>
<li>John J. Pharan jr. PhD</li>
</ol>
<p>Searches based purely on string comparison won’t work in this case. The complete entry could be matched by combining some smart academic synonyms with an n-gram or matrix comparison on the individual elements.</p>
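<p>A rough sketch of that combination: titles are mapped to a canonical form through a small synonym table, and the remaining name elements are compared one by one with a character bigram (Dice) score. The table contents, the function names and the position-by-position pairing of elements are all assumptions made for illustration.</p>

```python
# Illustrative title table; 'Dr.' and 'PhD' are treated as the same academic title.
TITLE_SYNONYMS = {"dr": "doctorate", "phd": "doctorate",
                  "jr": "junior", "junior": "junior"}


def split_name(full_name):
    """Separate canonical title tokens from the remaining name tokens."""
    titles, names = set(), []
    for token in full_name.lower().replace(".", "").split():
        if token in TITLE_SYNONYMS:
            titles.add(TITLE_SYNONYMS[token])
        else:
            names.append(token)
    return titles, names


def bigrams(word):
    """Return the set of character bigrams of a word padded with spaces."""
    padded = f" {word} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice(a, b):
    """Dice coefficient on character bigrams, between 0.0 and 1.0."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    return 2 * len(grams_a.intersection(grams_b)) / (len(grams_a) + len(grams_b))


def compare_elements(name_a, name_b):
    """Return whether the titles agree, plus one score per name element."""
    titles_a, names_a = split_name(name_a)
    titles_b, names_b = split_name(name_b)
    scores = [dice(a, b) for a, b in zip(names_a, names_b)]
    return titles_a == titles_b, scores


print(compare_elements("Dr. John J. Farren jr.", "John J. Pharan jr. PhD"))
```

<p>In this sketch the titles and given names agree, while the family names still score low on bigrams alone, so that element would need the phonetic comparison on top.</p>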
<p>Introducing synonyms immediately generates new types of challenges. In address matching you will go a long way when you take into account the abbreviations for street types (Avenue for Av., Street for Str. etc). For company names it definitely helps to have a synonym table on legal forms (Limited for Ltd, Incorporated for Inc., etc). With the actual company name itself it becomes more challenging. A German example might look like:</p>
<ol>
<li>Fahrrad-Handel Anna Cintula and</li>
<li>Zweirad-Shop Anna Cintula,</li>
</ol>
<p>Two synonyms for bike shop. In such situations people quite often think that adding a synonym table makes the challenge go away. They are absolutely right for part of the problem, but there is still a large set of words that get their specific meaning from their context, and that context determines which synonym applies. If we take, for example, the following three entries, it seems evident that we cannot replace the word &#8216;art&#8217; with one single synonym here:</p>
<ol>
<li><strong>Art</strong> Gallery Garfunkel</li>
<li><strong>ART</strong> Auto Rendition Technology</li>
<li>Paul Simon &amp; <strong>Art</strong> Garfunkel</li>
</ol>
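<p>To make the limits of a plain synonym table concrete, here is a small sketch of table-based normalization per field type. The table entries and the name <code>apply_synonyms</code> are illustrative assumptions; real tables would be far larger and language-specific.</p>

```python
import re

# Illustrative synonym tables, keyed by the kind of field being normalized.
SYNONYMS = {
    "street": {"av": "avenue", "str": "street"},
    "company": {
        "ltd": "limited",
        "inc": "incorporated",
        "fahrrad-handel": "bike shop",
        "zweirad-shop": "bike shop",
    },
}


def apply_synonyms(text, field):
    """Replace each token by its canonical form for the given field type."""
    table = SYNONYMS[field]
    tokens = re.findall(r"[\w-]+", text.lower())
    return " ".join(table.get(token, token) for token in tokens)


print(apply_synonyms("King str.", "street"))
print(apply_synonyms("Fahrrad-Handel Anna Cintula", "company"))
print(apply_synonyms("Zweirad-Shop Anna Cintula", "company"))
```

<p>The two bike shops end up with the same normalized text. A table like this has no sensible single entry for &#8216;art&#8217;, though: whether it is a gallery, an acronym or a given name only follows from the surrounding words.</p>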
<p>String comparison is a fine starting point for matching problems. But to really avoid a serious number of mismatches or missed matches, and the serious amount of manual work they cause, you need to know what you’re dealing with. You need to compare <a title="High precision matching – apples, oranges or fruit salad?" href="http://datavaluetalk.com/2010/10/21/high-precision-matching-apples-oranges-or-fruit-salad/" target="_blank">apples with apples, oranges with oranges</a>. What would really help here is a bit of natural language processing ;-)</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/what-is-equal-challenges-with-sound-and-synonyms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Has your name ever hurt you? &#8211; when nomen becomes omen</title>
		<link>http://datavaluetalk.com/data-quality/when-nomen-becomes-omen/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=when-nomen-becomes-omen</link>
		<comments>http://datavaluetalk.com/data-quality/when-nomen-becomes-omen/#comments</comments>
		<pubDate>Mon, 08 Aug 2011 12:46:30 +0000</pubDate>
		<dc:creator>Esther Labrie</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Names]]></category>
		<category><![CDATA[customer data]]></category>
		<category><![CDATA[customer view]]></category>
		<category><![CDATA[first name]]></category>
		<category><![CDATA[identity]]></category>
		<category><![CDATA[knowledge]]></category>
		<category><![CDATA[names]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1887</guid>
		<description><![CDATA[Addressing clients with the right data often means the difference between making a profit and not making a profit. Working with data quality experts has made me ever more conscious of the value personal data represents for people. In this respect names are especially intriguing to me, as owners appear to identify with their name [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://datavaluetalk.com/data-quality/when-nomen-becomes-omen/attachment/baby-baby-names-3/" rel="attachment wp-att-1899"><img class="alignleft size-thumbnail wp-image-1899" title="bad baby names" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/08/baby-baby-names2-150x150.jpg" alt="" width="150" height="150" /></a>Addressing clients with the right data often means the difference between making a profit and not making a profit. Working with data quality experts has made me ever more conscious of the value personal data represents for people. In this respect names are especially intriguing to me, as owners appear to identify with their name <em>a lot</em>. So I decided to do a little research and determine if people really are what their name tells you. Can <em>nomen</em> indeed become <em>omen</em>?</p>
<p>Your parents probably gave a lot of thought to the name they once gave you, and as it turns out they were right to do so! Research tells us a name can do wonders for its owner, as well as a lot of damage for that matter. Let’s have a look at some remarkable results.</p>
<p><strong>Peter for President!<br />
</strong>Recent studies show that in the US a student called Fred is more likely to fail his exam than a student who just happens to be named Andrew: people tend to identify with their name and, in general, have a positive feeling about letters that correspond with their initials. Consequently Fred is far more likely to settle for a meager F, while Andrew will have an extra motive to strive for an A. <span id="more-1887"></span>It also explains how in choosing a partner we show a slight preference for someone whose name resembles our own, or why Mary will prefer to live in Maryland, while Monica is more inclined to settle in Santa Monica. Most of these preferences only show themselves through our subliminal selves, so we are not actually aware of the motivation for some of our choices. Another US study endorses these findings: inspired by the results mentioned above, researchers decided to investigate another letter. They came up with the letter K, which in baseball stands for strikeout. The study showed once again that there is a connection between a letter and its bearer: batters whose names began with a K struck out more often than other batters.</p>
<p><strong>Ominous names<br />
</strong>A UK study tells us that as many as one in five parents regret the name they gave their child. The novelty might have worn off after a few years, but can there be any real objections to a certain name? Apparently, there are plenty! Ironically it’s not the parents who’ll have to carry this burden for the rest of their lives…</p>
<p><strong>“Hi, I’m Antwan, but you can call me Antoine…”<br />
</strong>It seems that even children’s language skills are influenced by their name. This has to do with the effect negative emotions can have on a child’s performance. If, for example, you decided to name your son ‘Gene’ but spell it ‘Jene’, he is very likely to be confronted with disbelief from his teachers: “Are you sure your name isn’t spelled with a ‘G’?” This can severely undermine Jene’s sense of confidence. That explains why children with an unusual name, or a name that is unusually spelled, are generally weaker spellers and readers.</p>
<p><strong>“But Sissi is a Royal name, dear!”<br />
</strong>When a girl is called Frankie we think it’s a fun name, a cool and robust statement to fit a strong personality. Yet when a boy is called Mckenzie (yes, some parents think it’s cute to give their boy a name that has a feminine touch to it), we see a similar effect, but with a different outcome. This is something his parents obviously had not foreseen: their son will constantly be shaking off his girly image. The effect is striking: boys with an androgynous name misbehave more often than their unambiguously named peers, especially when they reach puberty. A boy called Mckenzie or Aubrey is even more likely to display bad behaviour when there is a girl with the same name among his peers. One more reason for parents to stick to conventions when choosing a name for their newborn.</p>
<p><strong>Want to produce the new Einstein? Call her Kate!<a href="http://datavaluetalk.com/data-quality/when-nomen-becomes-omen/attachment/einstein/" rel="attachment wp-att-1911"><img class="alignright size-thumbnail wp-image-1911" title="The new Einstein? Kate!" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/08/einstein-150x150.jpg" alt="The new Einstein? Kate!" width="150" height="150" /></a><br />
</strong>A name can be a burden, but if you use this knowledge wisely, you might just turn it into an advantage. What happens to a girl when she has finished school and needs to choose what subject to study? Well, according to a US study, her choice depends on her name. As it turns out, girls with a very feminine name like Julietta or Isabella are more likely to study humanities, while those whose name is less obviously feminine are more partial towards science. The question is: who’s aspiring to whom? Could it be that parents would treat Kate in a different way than Barbara? Or did the parents subconsciously decide they wanted to raise a scientist when they decided to call their daughter Kate?</p>
<p><strong>Would you rather hire Vanity or Grace?<br />
</strong>Of course it’s not just letters or gender that determine how we feel about a name. In fact, how other people perceive us very much depends on the meaning of our name. For example: when looking for a new member for your marketing team, would you rather hire Vanity or Grace? In spite of what her name tells us, Grace might be a job jumper who doesn’t know how to work in unison with her colleagues. Vanity, on the other hand, could just be the daughter of a well-read mother who had just finished her latest Thackeray when she gave birth. Still, both women will either meet a lot of prejudice or feel the need to live up to a very high standard because of their name.</p>
<p>It all goes to show that a name definitely possesses some self-fulfilling qualities. The fact that so many parents regret their choice of name afterwards makes me think that the owners of those names might share these sentiments. So what does that mean when looking at it from a data quality point of view? Unisex names, for example, are responsible for a lot of data quality issues. As the borders between male and female names are fading, we’ll need to update our knowledge continually. The human in Human Inference will definitely take care of that. After all, we wouldn’t want you to put off Mrs Clinton when sending her a petition to take pity on the Syrian citizens starting: &#8220;<em>Dear Mr. Clinton</em>…”.</p>
<p>Source: Livescience.com &amp; Babynames.com</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/when-nomen-becomes-omen/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Marketing? &#8211; Let your ingredients interact!</title>
		<link>http://datavaluetalk.com/data-quality/marketing-let-your-ingredients-interact/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=marketing-let-your-ingredients-interact</link>
		<comments>http://datavaluetalk.com/data-quality/marketing-let-your-ingredients-interact/#comments</comments>
		<pubDate>Fri, 22 Jul 2011 07:53:36 +0000</pubDate>
		<dc:creator>Holger Wandt</dc:creator>
				<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[conversion rate]]></category>
		<category><![CDATA[customer data]]></category>
		<category><![CDATA[data improvement]]></category>
		<category><![CDATA[data improver]]></category>
		<category><![CDATA[lead generation]]></category>
		<category><![CDATA[marketing campaigns]]></category>

		<guid isPermaLink="false">http://datavaluetalk.com/?p=1851</guid>
		<description><![CDATA[Throughout the years Human Inference has carried out and supported research with regard to the importance, the impact and the perception of customer data quality in business environments. This research shows that the phenomenon customer data quality is subject to a perception shift. In general, one could argue that, in the early years, data [...]]]></description>
			<content:encoded><![CDATA[
<p><a href="http://datavaluetalk.com/cms/wp-content/uploads/2011/07/ingredients2.jpg"><img class="alignleft size-thumbnail wp-image-2017" title="ingredients" src="http://datavaluetalk.com/cms/wp-content/uploads/2011/07/ingredients2-150x150.jpg" alt="" width="150" height="150" /></a>Throughout the years Human Inference has carried out and supported research with regard to the importance, the impact and the perception of customer data quality in business environments. This research shows that the perception of customer data quality is shifting. In general, one could argue that, in the early years, data quality used to be perceived as &#8220;something that is being carried out by the IT-department&#8221;, whereas nowadays more and more companies and organizations are recognizing the importance of customer data and information quality. Issues and initiatives such as the value of a single customer view, data integration, fraud prevention, customer relationship management, operational risk management, compliance and anti-terrorism have become boardroom themes.<span id="more-1851"></span></p>
<p>As a result, high quality customer data has become the prerequisite for successful business decisions. In order to reach the intended data quality level, a lot of money is being invested in solutions for input control, file merging, data enrichment and duplicate identification. But do these investments guarantee more effective marketing campaigns, accurate lead generation, a higher conversion rate and, eventually, better business results?</p>
<p>In my opinion, sound data quality management can be compared with good cooking: it is all about being in control. In cooking, putting the right ingredients together does not necessarily make for an excellent meal. You need to know how and when (timing!) the ingredients “interact” in order to achieve the taste and presentation you are aiming for.</p>
<p>In marketing campaigns, being in control is essential. The &#8220;ingredients&#8221; of your marketing campaign have to interact. Selection, content, targeting, timing, conversion, etc. need to be effectively combined and balanced. Marketing campaigns are more or less the tangible outcome of your investments in customer data improvement. That means reaching the people you actually want to reach, not spending unnecessary amounts of money on manual rework, and pleasing your customer or prospect with an accurate offer. To see Human Inference&#8217;s take on this subject, <a href="http://www.humaninference.com/solutions/hiquality-data-improver" target="_blank">check out the movie </a>on the HIquality data Improver on our website.</p>
]]></content:encoded>
			<wfw:commentRss>http://datavaluetalk.com/data-quality/marketing-let-your-ingredients-interact/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

