Short question, complex answer: Who is who and what is what in your database?

 

Any organization that deals with customer, prospect, supplier, distributor, product and service information uses all kinds of data in its day-to-day business processes. Identifying a customer or a product within an automated system, using a specific ID number, the name or any other identifying feature, is a key issue in these processes. It is also a task that needs considerable attention, since the collection and management of data is inherently error-prone. People make mistakes, names are misheard, numbers are typed in the wrong order; there are simply too many causes of defective data and poor information quality.

The collective term ‘business data’ is often used without a precise notion of what business data actually contain. It is not just the customer identification numbers and product codes. Naturally, the sort and the importance of data used in a business process will differ from organization to organization. However, a closer look at the seemingly endless variation will show that names and addresses of persons and organizations are as detailed and complicated as they are identifying. The following classification will show the details of names, addresses and complementary data.

* In personal names we will encounter: given (first) names, middle names, initials, surnames, surname prefixes, surname suffixes, forms of address, titles, functions, qualifications, professions, patronymics and nicknames.

* The name of an organization can consist of virtually everything: legal forms, fantasy words, natural language words, personal names, numbers, Roman numerals, ordinals, letters, acronyms, geographical indications, suffixes, articles, prepositions, conjunctions, indication of year of establishment and non-alphabetical signs.

* Postal address data combine recipient information with delivery points: countries, regions, towns, districts, proximate towns, delivery service indicators, delivery service qualifiers, postcodes, addressee and mailee indicators, thoroughfare names, thoroughfare types, house or plot numbers, house number additions, building names, building types and delivery point access data, such as wing, floor or door.

* Complementary data used in business processes include: phone numbers, fax numbers, e-mail addresses, dates of birth, contract dates, social media account IDs, product and brand names, product codes, product numbers, gender indication, financial data, lifestyle data and transaction data.

Defining the data groups as precisely and in as much detail as possible is the first step towards useful interpretation. People, applying their natural language processing capabilities, structure the information as they interpret it. They use their frame of reference, which includes their knowledge dictionary, their linguistic repository, statistical information and mathematical information.

Knowledge-based interpretation, incorporated in an automated system to solve data quality issues, must work in exactly the same way. Consider the following examples: Continue reading ‘Short question, complex answer: Who is who and what is what in your database?’

International data quality

When I read Henrik Liliendahl Sørensen’s blog on cross border data quality, I made a mental note to write a follow-up blog, because his theme closely borders on a presentation I am preparing for the 2012 ECCMA Data Quality Summit in October. As it happened, the organization committee of the summit asked me to write an article on my upcoming presentation, and so I thought I’d combine my efforts and use the article as input for this blog.

As Henrik pointed out, there are a lot of data and information quality aspects to consider when crossing the border, and they cannot be solved by using domestic tools such as a national change of address service. Organizations are investing substantial amounts of money to deal with issues and initiatives such as the value of a single customer view, data integration, fraud prevention, operational risk management and compliance. But how do these investments equip companies for the inevitable internationalization of our business community?

Many companies doing business abroad seem to forget that they are dealing with a large variety of languages, names, address conventions and other culturally embedded business rules and habits. If we take a look at European contact data diversity, here are a couple of examples:

The names Haddad, Hernández, Le Fèvre, Smid, Ferreiro, Schmidt, Kuznetsov and Kovács are illustrative of the variety of names in Europe. These names all mean “Smith” in different countries. Naturally, there is a large variety of names in the US as well, but the rules and habits concerning structure, storage, exchange and representation are far more intricate in the various European countries. Think of the use of patronymics (Sergei Ivanovich Golubev and Olga Ivanovna Golubeva), prefix sorting and different significations for similar name components. An even greater challenge lies in the interpretation and processing of European postal addresses. The variety in address components and the differences in order and formatting of these components are extraordinary.

Naturally, there are many more data and information quality aspects to consider when crossing the border. Think of multiple languages, character sets, privacy issues, and different currency and date notation. Companies working with international data are highly dependent on understanding name specifics, address conventions, languages, code pages, culture, habits, business rules and legislation.

In my presentation during the 2012 ECCMA Data Quality Summit, I will address natural language processing methods and international name and address specifics. Furthermore, I will show some examples of the application of the insights with regard to international data quality. For more information, you can also check out our website.

 

 

Ask Me is linked with Any Body and relates with Walther Von Stolzing

Weird subject, isn’t it? As will be obvious to everybody, ‘Ask Me’ and ‘Any Body’ are artificial names that will never belong to a real person. How they relate to ‘Walther von Stolzing’ will follow.

For over 25 years Human Inference has collected reference data, for instance on persons. Because of our reference set we immediately recognize that ‘Ask Me’ and ‘Any Body’ are fake names. People are using these either in test situations or to hide their actual names.

In the old days we only needed to test for ‘Test Test’; in more recent years we have seen great inventiveness in these fake names. A brief sample can be seen in the following list.

Alpha Beta
Any Body
Ask Me
Best Friend
Blue Sky
Cool Dude
Dress Code
El Comandante
Guess Who
In Cognito

If you cannot rely on reference data and interpretation, you need to provide a checklist. Providing it is one thing, but since users tend to be really creative, maintaining it is essential. Continue reading ‘Ask Me is linked with Any Body and relates with Walther Von Stolzing’
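As an illustration of the checklist approach, here is a minimal Python sketch. It is not Human Inference's method: the set `FAKE_NAMES` and the functions `normalize` and `is_suspect_name` are hypothetical names chosen for this example, and the entries are simply the sample names quoted above plus ‘Test Test’.

```python
import re

# Hypothetical checklist, seeded with the sample names from the list above.
FAKE_NAMES = {
    "alpha beta", "any body", "ask me", "best friend", "blue sky",
    "cool dude", "dress code", "el comandante", "guess who", "in cognito",
    "test test",
}

# The same entries without spaces, so that "Anybody" also matches "Any Body".
_FAKE_COMPACT = {entry.replace(" ", "") for entry in FAKE_NAMES}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def is_suspect_name(name: str) -> bool:
    """Return True if the name matches a checklist entry, ignoring spacing."""
    normalized = normalize(name)
    return (
        normalized in FAKE_NAMES
        or normalized.replace(" ", "") in _FAKE_COMPACT
    )
```

A lookup like this only catches what is already on the list, which is exactly why maintaining the list matters.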

Has your name ever hurt you? – when nomen becomes omen

Addressing clients with the right data often means the difference between making a profit and not making a profit. Working with data quality experts has made me ever more conscious of the value personal data represents for people. In this respect names are especially intriguing to me, as owners appear to identify with their name a lot. So I decided to do a little research and determine if people really are what their name tells you. Can nomen indeed become omen?

Your parents probably gave a lot of thought to the name they once gave you, and as it turns out they were right to do so! Research tells us a name can do wonders for its owner, as well as a lot of damage for that matter. Let’s have a look at some remarkable results.

Peter for President!
Recent studies show that in the US a student called Fred is more likely to fail his exam than a student who happens to be named Andrew: people tend to identify with their name and, in general, have a positive feeling about letters that correspond with their initials. Consequently Fred is far more likely to settle for a meager F, while Andrew will have an extra motive to strive for an A. Continue reading ‘Has your name ever hurt you? – when nomen becomes omen’

Changing trend U.S. immigrants: sticking to their name is custom

“New Life in U.S. No Longer Means New Name”
That’s the title of an article published in The New York Times this week. In short it shows evidence of a declining need to fit in with Western standards.
“For the most part, nobody changes to American names any more at all,” said Cheryl R. David, former chairwoman of the New York chapter of the American Immigration.
(Source: The New York Times)
Mr. Steinway (the famous German-born piano maker who abandoned the name Steinweg in pursuit of economic success) is a perfect example of the 19th- and 20th-century convention of immigrants adopting Anglicized names.
What used to be needed to blend in and speed assimilation is no longer required. Economic powers are changing, as shown in this article in The Financial Times: “Indian economy shows 8.8% growth.” The world’s population is moving around more than ever, settling temporarily or permanently in other regions and countries.
So what does this mean for people in the data quality playing field? Continue reading ‘Changing trend U.S. immigrants: sticking to their name is custom’

Matching persons with different official names

When matching persons, or contact data in general, we are all aware that individuals may use abbreviations or nicknames as a kind of synonym for their name. Classic examples are the use of Bill for the official name William, or my own father using the name Mans while his official name is Hermanus. Most data matching engines use some kind of synonym table to take care of this. That works because, within a culture or region, nicknames are quite often linked to the same names, and people do not tend to use completely different officially registered names.
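A synonym table of this kind can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular matching engine works: the table `NICKNAMES` and the functions `canonical_given_name` and `same_given_name` are hypothetical, and the two entries are the examples mentioned above.

```python
# Hypothetical synonym table: nickname -> official given name (lowercase).
NICKNAMES = {
    "bill": "william",
    "mans": "hermanus",
}


def canonical_given_name(name: str) -> str:
    """Map a nickname to its official form; unknown names pass through."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def same_given_name(a: str, b: str) -> bool:
    """Compare two given names after resolving known nicknames."""
    return canonical_given_name(a) == canonical_given_name(b)
```

With this table, `same_given_name("Bill", "William")` and `same_given_name("Mans", "Hermanus")` both hold, while two names with no entry linking them are treated as different.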

It becomes more challenging if there is no longer a link between nickname and official name. That may happen, for example, if people move from one cultural region to another where a different writing system is used. Take for example my Chinese friend 高为民, whose Latin name would be Gao Weimin (family name first), but who uses the Latin variant William Gao whenever he works in Europe or the US. There is no common relation between the names William and Weimin, either in Latin or in Chinese, and they are not phonetic variants of each other. Continue reading ‘Matching persons with different official names’