Short question, complex answer: Who is who and what is what in your database?

 

Any organization that deals with customer, prospect, supplier, distributor, product and service information uses all kinds of data in its day-to-day business processes. Identification of a customer or a product within an automated system, using a specific ID number, the name or any other identifying feature, is a key issue in these processes. Furthermore, it is a task that needs considerable attention, since the collection and management of data is inherently error-prone. People make mistakes, names are misheard, numbers are typed in the wrong order; there are just too many reasons for defective data and poor information quality.

The collective term ‘business data’ is often used without a precise notion of what business data actually contain. It is not just the customer identification numbers and product codes. Naturally, the sort and the importance of data used in a business process will differ from organization to organization. However, a closer look at the seemingly endless variation will show that names and addresses of persons and organizations are as detailed and complicated as they are identifying. The following classification will show the details of names, addresses and complementary data.

* In personal names we will encounter: given (first) names, middle names, initials, surnames, surname prefixes, surname suffixes, forms of address, titles, functions, qualifications, professions, patronymics and nicknames.

* The name of an organization can consist of virtually everything: legal forms, fantasy words, natural language words, personal names, numbers, Roman numerals, ordinals, letters, acronyms, geographical indications, suffixes, articles, prepositions, conjunctions, indication of year of establishment and non-alphabetical signs.

* Postal address data combine recipient information with delivery points: countries, regions, towns, districts, proximate towns, delivery service indicators, delivery service qualifiers, postcodes, addressee and mailee indicators, thoroughfare names, thoroughfare types, house or plot numbers, house number additions, building names, building types and delivery point access data, such as wing, floor or door.

* Complementary data used in business processes include: phone numbers, fax numbers, e-mail addresses, dates of birth, contract dates, social media account IDs, product and brand names, product codes, product numbers, gender indication, financial data, lifestyle data and transaction data.

Defining the data groups as precisely and in as much detail as possible is the first step towards useful interpretation. People, applying their natural language processing capabilities, structure the information as they interpret it. They will use their frame of reference, which includes their knowledge dictionary, their linguistic repository, statistical information and mathematical information.

Knowledge-based interpretation, incorporated in an automated system to solve data quality issues, must work in exactly the same way.

Ask Me is linked with Any Body and relates with Walther Von Stolzing

Weird subject, isn’t it? As is obvious to everybody, ‘Ask Me’ and ‘Any Body’ are artificial names. They will never belong to a real person. How they relate to ‘Walther von Stolzing’ will follow.

For over 25 years Human Inference has collected reference data, for instance on persons. Because of our reference set we immediately recognize that ‘Ask Me’ and ‘Any Body’ are fake names. People are using these either in test situations or to hide their actual names.

In the old days we only needed to test for ‘Test Test’; in more recent years we have seen great inventiveness in these fake names. A brief sample is shown in the following list.

  • Alpha Beta
  • Any Body
  • Ask Me
  • Best Friend
  • Blue Sky
  • Cool Dude
  • Dress Code
  • El Comandante
  • Guess Who
  • In Cognito

If you cannot rely on reference data and interpretation, you need to provide a check list. Providing it is one thing, but since users tend to be really creative, maintaining it is essential.
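A check list of this kind could be sketched as follows. This is only a minimal illustration, not Human Inference's method: the set `FAKE_NAMES` and the functions `normalize` and `is_suspect` are hypothetical names, and the list is seeded with nothing more than the example names given above.

```python
import re

# Hypothetical check list, seeded with the example names from the post.
# In practice it would need continuous maintenance as users invent new names.
FAKE_NAMES = {
    "test test", "alpha beta", "any body", "ask me", "best friend",
    "blue sky", "cool dude", "dress code", "el comandante",
    "guess who", "in cognito",
}

# Precompute the entries without spaces, so 'Anybody' matches 'Any Body'.
COMPACT_FAKE_NAMES = {fake.replace(" ", "") for fake in FAKE_NAMES}


def normalize(name: str) -> str:
    """Lowercase, replace everything except ASCII letters and whitespace
    with a space, and collapse runs of whitespace."""
    letters_only = re.sub(r"[^a-z\s]", " ", name.lower())
    return " ".join(letters_only.split())


def is_suspect(name: str) -> bool:
    """Return True if the name, ignoring case, punctuation and spacing,
    appears on the check list."""
    compact = normalize(name).replace(" ", "")
    return compact in COMPACT_FAKE_NAMES
```

An exact lookup like this only catches names that are already on the list, which is why maintaining the list matters more than creating it.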

More than a name…..

Everyone in this world has a name. When we hear a name, however, it is really hard to know precisely what it consists of and how its constituent parts should be written. A name might, for example, contain a salutation, one or more titles, given names, initials, one or more family names, and additions. Here’s an example of a name with different name parts:

Mr Peter M Smith PhD

  • Mr – Salutation, also called honorific, a polite way of addressing a person
  • Peter – Given name. The name given to a person at birth. You have male and female names, and sometimes a name can be carried by both. But, in general, it is possible to derive the gender of the person from his/her name(s).
  • M – Initial. An abbreviated form of a given name
  • Smith – Family name
  • PhD – Addition, in this case an academic title
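The breakdown above can be sketched in code. The following is a deliberately naive illustration, assuming a Western name order (salutation, given names and initials, family name, additions); the lookup tables `SALUTATIONS` and `ADDITIONS`, the `ParsedName` class and the `parse_name` function are hypothetical and far smaller than any real reference set.

```python
from dataclasses import dataclass, field

# Tiny hypothetical lookup tables, for illustration only.
SALUTATIONS = {"mr", "mrs", "ms", "dr"}
ADDITIONS = {"phd", "md", "jr", "sr"}


@dataclass
class ParsedName:
    salutation: str = ""
    given_names: list[str] = field(default_factory=list)
    initials: list[str] = field(default_factory=list)
    family_name: str = ""
    additions: list[str] = field(default_factory=list)


def _key(token: str) -> str:
    """Lookup key for a token: lowercase, with periods removed."""
    return token.replace(".", "").lower()


def parse_name(full_name: str) -> ParsedName:
    """Split a full name into parts, assuming Western name order."""
    tokens = full_name.split()
    parsed = ParsedName()

    # A known salutation at the front.
    if tokens and _key(tokens[0]) in SALUTATIONS:
        parsed.salutation = tokens.pop(0)

    # Known additions at the end, kept in their original order.
    while tokens and _key(tokens[-1]) in ADDITIONS:
        parsed.additions.insert(0, tokens.pop())

    # Assumption: the last remaining token is the family name.
    if tokens:
        parsed.family_name = tokens.pop()

    # Single letters are treated as initials, everything else as given names.
    for token in tokens:
        if len(_key(token)) == 1:
            parsed.initials.append(token)
        else:
            parsed.given_names.append(token)
    return parsed
```

For the example, `parse_name("Mr Peter M Smith PhD")` yields the salutation `Mr`, the given name `Peter`, the initial `M`, the family name `Smith` and the addition `PhD`. The same sketch already goes wrong on a surname prefix: in ‘Walther von Stolzing’ it would file ‘von’ under the given names.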

This may appear easy, but due to all the different naming conventions in the world, it is definitely not! At Human Inference, we have automated this process by creating a Firefox plugin that can help you interpret the various name parts and assign a gender to the name. It also finds the names which most closely resemble the ones you typed as input.

You can type the full name of a person, and that’s all you need to do. The plugin will make the most probable interpretation, based on its vast knowledge of names. It places the parts in the appropriate fields and displays the predicted gender. On top of that, it will give you close alternatives for the names or for the way the input can be interpreted. These are shown as a list of suggestions when you right-click on the input field. If one of the other suggested interpretations is what you were looking for, you can click on it and it is displayed instead.

Summarizing, the plugin

  • can correct the mistakes that you made writing someone’s name.
  • can be used to segment the name correctly.
  • can provide you with closely resembling suggestions.
  • can predict the gender of the name.
  • is free of charge!

Know your customer to trust your data

The success of many business processes is linked directly to the quality of customer data. This is not only an obvious fact, but also a recurring conclusion of many field studies: incorrect, incomplete and inaccurate data will have a direct impact on your business success rate. The symptoms of this impact show up in inefficient marketing and sales processes, customer dissatisfaction, difficult cross-selling and upselling, unreliable analyses and many other disturbances in the day-to-day business of almost every organization dealing with customer, supplier and/or partner data.

In essence, it all comes down to knowing your data, in order to be able to trust your data. If you trust your data, you are definitely doing something right. So, how do you establish that trust? For this, you first have to answer a short, yet rather complex question: What is what in my database(s)? In other words: you have to identify and interpret the data you are working with.

A robust customer data identification solution intelligently interprets the details of both natural and legal persons. That process has to take account of the significance of words in a specific context, usage of company names, abbreviations, synonyms, acronyms, spelling mistakes, notation methods, standards and phonetic similarity of words. All in all, this is not a simple task; it more or less mimics the capabilities that humans show when interpreting data …

International data quality – Is a football always a football?

High-quality customer data have become the prerequisite for successful business decisions. In order to reach the intended data quality level, a lot of money is being invested in solutions for input control, file merging, data enrichment and duplicate identification. But do these investments guarantee high-quality data and information? For example, are the data quality tools and processes equipped for the inevitable internationalization of our business community? Is a football always a football?

Natural language processing

Why do we know that William Jones International Logistics Ltd and W. Jones Int. Transport Co. are probably different notations for the same company? How do we determine that Leonard is a given name in Leonard Peters and a surname in Leonard & Peters? Without being all that aware of it, we are using methods such as pattern recognition, context analysis and other linguistic considerations. To answer the question “what is what in customer data?”, people will use their knowledge of language and culture to interpret the data they encounter in daily life.
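To make the company name example concrete, here is a rough sketch of how such knowledge might be written down for a machine. Everything in it is an assumption made for illustration: the tables `ABBREVIATIONS`, `LEGAL_FORMS` and `SYNONYMS` are hypothetical and tiny, and treating ‘transport’ and ‘logistics’ as the same activity is a choice a real reference set would have to justify.

```python
import re

# Hypothetical reference data, for illustration only.
ABBREVIATIONS = {"int": "international", "intl": "international",
                 "co": "company", "ltd": "limited"}
LEGAL_FORMS = {"limited", "company", "inc", "bv", "gmbh"}
# Assumption: both words describe the same business activity.
SYNONYMS = {"transport": "logistics"}


def name_tokens(name: str) -> list[str]:
    """Lowercase the name, expand abbreviations, map synonyms to one
    canonical word and drop legal forms."""
    words = re.findall(r"[a-z]+", name.lower())
    expanded = [ABBREVIATIONS.get(word, word) for word in words]
    canonical = [SYNONYMS.get(word, word) for word in expanded]
    return [word for word in canonical if word not in LEGAL_FORMS]


def tokens_match(a: str, b: str) -> bool:
    """Two tokens match if they are equal, or if one is a single-letter
    initial of the other ('w' matches 'william')."""
    if a == b:
        return True
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) == 1 and longer.startswith(shorter)


def probably_same_company(name1: str, name2: str) -> bool:
    """Compare two company names token by token, position by position."""
    tokens1, tokens2 = name_tokens(name1), name_tokens(name2)
    return len(tokens1) == len(tokens2) and all(
        tokens_match(a, b) for a, b in zip(tokens1, tokens2)
    )
```

With these tables, `probably_same_company("William Jones International Logistics Ltd", "W. Jones Int. Transport Co.")` returns `True`. The position-by-position comparison is strict: a swapped word order or a single spelling mistake is enough to make it fail, which is exactly where context analysis and phonetic similarity would have to come in.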

Your name is too “common”….

chinese-characters

A major bank in Dongguan (China) refused a potential customer because his name is Li Jun. Apparently, there were already over 300 bank accounts assigned to the name Li Jun. Not that this particular Li Jun was responsible for opening all these accounts; there were just too many men with exactly the same name. The bank states that the refusal is nothing personal, since nobody with the name Li Jun will be accepted as a customer in the near future… In the meantime, Li Jun is taking legal action against the bank.