Weird subject, isn’t it? It is obvious to everybody that ‘Ask Me’ and ‘Any Body’ are artificial names. They will never belong to a real person. How they relate to ‘Walter von Stolzing’ will follow.

For over 25 years Human Inference has collected reference data, for instance on persons. Because of our reference set we immediately recognize that ‘Ask Me’ and ‘Any Body’ are fake names. People are using these either in test situations or to hide their actual names.

In the old days we only needed to test on ‘Test Test’; in more recent years we have seen great inventiveness in these fake names. A brief sample is shown in the following list.

Alpha Beta
Any Body
Ask Me
Best Friend
Blue Sky
Cool Dude
Dress Code
El Comandante
Guess Who
In Cognito

If you cannot rely on reference data and interpretation, you need to provide a checklist. Providing it is one thing, but since users tend to be really creative, maintaining it is essential.
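To make the checklist idea concrete, here is a minimal sketch in Python. It is an illustration only, not how Human Inference does it: the set `FAKE_NAMES` and the helpers `normalize` and `is_fake_name` are hypothetical names, and the checklist is seeded with nothing more than the example names listed above.

```python
import unicodedata

# Hypothetical checklist, seeded only with the example names from this post.
FAKE_NAMES = {
    "test test", "alpha beta", "any body", "ask me", "best friend",
    "blue sky", "cool dude", "dress code", "el comandante",
    "guess who", "in cognito",
}


def normalize(name: str) -> str:
    """Lowercase the name, strip accents and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def is_fake_name(given: str, family: str) -> bool:
    """Return True when the normalized full name is on the checklist."""
    return normalize(f"{given} {family}") in FAKE_NAMES


if __name__ == "__main__":
    for given, family in [("Ask", "Me"), ("ANY", "body"), ("Walter", "von Stolzing")]:
        print(given, family, is_fake_name(given, family))
```

A plain lookup like this only catches names that are already on the list, which is exactly why maintenance matters: every new invention by a creative user has to be added by hand.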

Has your name ever hurt you? – when nomen becomes omen

Addressing clients with the right data often means the difference between making a profit and not making a profit. Working with data quality experts has made me ever more conscious of the value personal data represents for people. In this respect names are especially intriguing to me, as owners appear to identify with their name a lot. So I decided to do a little research and determine if people really are what their name tells you. Can nomen indeed become omen?

Your parents probably gave a lot of thought to the name they once gave you, and as it turns out they were right to do so! Research tells us a name can do wonders for its owner, as well as a lot of damage for that matter. Let’s have a look at some remarkable results.

Peter for President!
Recent studies show that in the US a student called Fred is more likely to fail his exam than a student who happens to be named Andrew: people tend to identify with their name and, in general, have a positive feeling about letters that correspond with their initials. Consequently Fred is far more likely to settle for a meager F, while Andrew will have an extra motive to strive for an A.

We have 180 million names! Which one is right?

The internet is an ocean rich in content, but unfortunately, as in the real world, it’s heavily polluted.

As a company in business for 25 years, Human Inference absolutely sees the benefits of the internet. For our reasoning processes, based on natural language processing, we gather content and we classify this content on type, such as given names, family names, prefix, suffix, etc. (See also my blog post on the comparison of apples and oranges ….)

In the past this was done manually by, for example, investigating telephone books or manual research of census lists. But these were the 'pioneer years'. What we see now is an enormous amount of content that can be gathered on the internet. It's quite easy to find an internet page with 180 million records of person names. Great, so knowledge gathering is passé now?

Your name is too “common”….


A major bank in Dongguan (China) refused a potential customer because his name is Li Jun. Apparently, there were already over 300 bank accounts assigned to the name Li Jun. Not that this particular Li Jun was responsible for opening all these accounts; there were just too many men with exactly the same name. The bank states that the refusal is nothing personal, since nobody with the name Li Jun will be accepted as a customer in the near future. In the meantime, Li Jun is taking legal action against the bank.

Any close encounters with the FBI terrorist watchlist?

Just before this summer the U.S. Department of Justice filed a report about the FBI Terrorist Watchlist. This watchlist serves as a critical tool for screening and law enforcement personnel, alerting them when they come across a known or suspected terrorist. It is used by personnel at airports, harbours and the border. When you apply for a visa, you are also matched against this watchlist. The Terrorist Screening Center, a subsidiary of the FBI, is responsible for maintaining the watchlist.

This watchlist was created in 2004 from several other lists, and at that time it consisted of about 68,000 entries. I use the word entries because in the years that followed it became unclear whether one record corresponds to one individual. By the end of 2008 the list had grown to over 1.1 million entries. In 2008, after the American Civil Liberties Union (ACLU) pointed out that the list had passed the 1 million mark, the government came with an explanation: although over 1 million entries have been recorded in the database, these records correspond to about 400,000 individuals. Terrorists often use different and thus multiple identities, several (falsified) passports, etc. But adding an entry with only the first initials and last name, while an entry with the full first names and last name already exists, will result in unwanted side effects.
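The side effect of initial-only entries can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the names are invented, and the `matches` rule is a deliberately naive one, not the matching logic the Terrorist Screening Center actually uses.

```python
def initials(given: str) -> str:
    """Reduce given names to their upper-case initials, e.g. 'John Michael' -> 'JM'."""
    return "".join(part[0].upper() for part in given.replace(".", " ").split())


def matches(entry: tuple[str, str], traveller: tuple[str, str]) -> bool:
    """Naive match rule (an assumption): family names must be equal; an entry
    holding only initials matches anyone whose initials start with them."""
    entry_given, entry_family = entry
    trav_given, trav_family = traveller
    if entry_family.lower() != trav_family.lower():
        return False
    parts = entry_given.replace(".", " ").split()
    if parts and all(len(part) == 1 for part in parts):
        # Initial-only entry: far less specific than a full-name entry.
        return initials(trav_given).startswith(initials(entry_given))
    return entry_given.lower() == trav_given.lower()


# Invented example data: the same person recorded twice, once in full
# and once with initials only.
full_entry = ("John Michael", "Smith")
initial_entry = ("J. M.", "Smith")
innocent_traveller = ("Jane Mary", "Smith")

print(matches(full_entry, innocent_traveller))     # False
print(matches(initial_entry, innocent_traveller))  # True
```

Under this rule the full-name entry leaves Jane Mary Smith alone, while the redundant initial-only entry flags her: the extra record adds no information about the suspect but widens the net for everybody else.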

Budget for Data Quality seems no problem

A 2008 survey by Human Inference indicates that processes are the biggest perceived challenge in relation to Data Quality. However, the subject that seems to be no problem is the budget. Human Inference differentiates itself by interpretation of knowledge. So from this perspective I wonder how the respondents interpreted the word “processes”. Do they mean the processes within the value chain of their companies or do they actually mean the process of obtaining a budget for Data Quality? The latter would actually explain a lot.

HI Survey Results

