When I read Henrik Liliendahl Sørensen’s blog post on cross-border data quality, I made a mental note to write a follow-up, because his theme closely borders on a presentation I am preparing for the 2012 ECCMA Data Quality Summit in October. As it happened, the organizing committee of the summit asked me to write an article on my upcoming presentation, and so I thought I’d combine my efforts and use the article as input for this blog.
As Henrik pointed out, there are a lot of data and information quality aspects to consider when crossing the border, and they cannot be solved by using domestic tools such as a national change of address service. Organizations are investing substantial amounts of money to deal with issues and initiatives such as the value of a single customer view, data integration, fraud prevention, operational risk management and compliance. But how do these investments equip companies for the inevitable internationalization of our business community?
Many companies doing business abroad seem to forget that they are dealing with a large variety of languages, names, address conventions and other culturally embedded business rules and habits. If we take a look at European contact data diversity, here are a couple of examples:
The names Haddad, Hernández, Le Fèvre, Smid, Ferreiro, Schmidt, Kuznetsov and Kovács illustrate the variety of names in Europe. These names all mean “Smith” in different countries. Naturally, there is a large variety of names in the US as well, but the rules and habits concerning structure, storage, exchange and representation are far more intricate in the various European countries. Think of the use of patronymics (Sergei Ivanovich Golubev and Olga Ivanovna Golubeva), prefix sorting and different meanings for similar name components. An even greater challenge lies in the interpretation and processing of European postal addresses. The variety in address components and the differences in order and formatting of these components are extraordinary.
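To make the point about prefix sorting concrete, here is a minimal Python sketch. It assumes a small, hypothetical list of Dutch surname prefixes and two sorting conventions: one that files a name under its core (as Dutch directories commonly do, so "van der Berg" appears under B) and one that files it under the full surname (as is common in Belgium, where it appears under V). The prefix list and function names are illustrative only; a real implementation would need country-specific reference data.

```python
# Hypothetical, deliberately short list of Dutch surname prefixes.
# Real prefix tables are longer and differ per country.
PREFIXES = ("van der", "van den", "van de", "van", "de")


def split_prefix(surname):
    """Return (prefix, core); prefix is '' when no known prefix is found."""
    lowered = surname.lower()
    # Try longer prefixes first so "van der" wins over "van".
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if lowered.startswith(prefix + " "):
            return surname[:len(prefix)], surname[len(prefix) + 1:]
    return "", surname


def sort_key(surname, ignore_prefix):
    """Sort on the core name when ignore_prefix is True, else on the full name."""
    _, core = split_prefix(surname)
    return core.lower() if ignore_prefix else surname.lower()


names = ["van der Berg", "Smid", "de Vries", "Kovács"]

# Convention A: file under the core name (prefix ignored).
by_core = sorted(names, key=lambda n: sort_key(n, ignore_prefix=True))

# Convention B: file under the full surname (prefix included).
by_full = sorted(names, key=lambda n: sort_key(n, ignore_prefix=False))

print(by_core)
print(by_full)
```

The same four names end up in a different order under each convention, which is exactly the kind of difference a single domestic rule set will not capture.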
Naturally, there are many more data and information quality aspects to consider when crossing the border. Think of multiple languages, character sets, privacy issues, and differing currency and date notations. Companies working with international data are highly dependent on understanding name specifics, address conventions, languages, code pages, culture, habits, business rules and legislation.
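The remark about character sets and code pages can be illustrated with a small Python sketch. It assumes a record that was stored as UTF-8 and is then read by a system that expects Latin-1 (ISO 8859-1); the variable names are mine and purely illustrative.

```python
name = "Kovács"

# The sending system stores the name as UTF-8 bytes.
utf8_bytes = name.encode("utf-8")

# A receiving system that assumes Latin-1 decodes without complaint,
# but the accented letter is silently corrupted.
misread = utf8_bytes.decode("latin-1")

# Decoding with the code page that was actually used restores the name.
correct = utf8_bytes.decode("utf-8")

print(misread)
print(correct)

# The reverse mistake fails loudly: Latin-1 bytes are not valid UTF-8 here.
latin1_bytes = name.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    reverse_failed = False
except UnicodeDecodeError:
    reverse_failed = True

print(reverse_failed)
```

The first mismatch is the more dangerous one, because nothing fails: the corrupted name simply no longer matches the original in a later deduplication or lookup step.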
In my presentation at the 2012 ECCMA Data Quality Summit, I will address natural language processing methods and international name and address specifics. Furthermore, I will show some examples of how these insights apply to international data quality. For more information, you can also check out our website.