A major bank in Dongguan (China) refused a potential customer because his name is Li Jun. Apparently, there were already over 300 bank accounts assigned to the name Li Jun. Not that this particular Li Jun was responsible for opening all these accounts; there were just too many men with exactly the same name. The bank states that the refusal is nothing personal, since nobody with the name Li Jun will be accepted as a customer in the near future. In the meantime, Li Jun is taking legal action against the bank.
When I read this news article this morning, my first thought was that it was perhaps a hoax. It turns out, however, that the story is true. From a data quality point of view this strikes me as really strange. How does this particular bank manage its customer data? Are there no additional identifiers (address, date of birth, etc.) to determine that you are actually dealing with the customer you think you are dealing with? Imagine if every John Smith had a hard time opening a bank account, applying for a job or buying a product via the web. Or Jenny Jones? Bob Johnson? When is a name too “common”? It is a common misconception that the complexity of ideographic characters, such as those used for Mandarin Chinese, makes identification harder. At Human Inference we carried out some pretty serious dedups of Chinese files and, taking into account that Mandarin Chinese is a tonal language and that other principles of fault tolerance apply, the duplicate identification was rather accurate.
It is all a matter of using an intelligent data matching method and knowing what kind of data one is working with. Every name can be identified, even “common” names.
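To make the point about additional identifiers concrete, here is a minimal sketch in Python. It is not the method used at Human Inference, and it is far simpler than a real matching engine: the `Customer` fields, the `is_same_person` helper, the 0.85 threshold and the sample addresses are all assumptions made up for illustration. It only shows that two customers called Li Jun stop being "the same" as soon as date of birth and address are compared as well, while small spelling variations are still tolerated.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Customer:
    # Hypothetical record layout; a real customer file has many more fields.
    name: str
    date_of_birth: str  # ISO format, e.g. "1980-05-17"
    address: str


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case and extra whitespace."""
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def is_same_person(a: Customer, b: Customer, threshold: float = 0.85) -> bool:
    """Treat two records as duplicates only if all identifiers agree.

    The date of birth must match exactly; name and address may differ
    slightly (typos, abbreviations). The threshold is an arbitrary
    illustrative value, not a recommended setting.
    """
    if a.date_of_birth != b.date_of_birth:
        return False
    return (
        similarity(a.name, b.name) >= threshold
        and similarity(a.address, b.address) >= threshold
    )


# Made-up sample records.
existing = Customer("Li Jun", "1980-05-17", "12 Hongfu Road, Dongguan")
applicant = Customer("Li Jun", "1991-11-02", "88 Dongcheng Street, Dongguan")
retyped = Customer("Li  jun", "1980-05-17", "12 Hongfu Rd, Dongguan")

print(is_same_person(existing, applicant))  # same name, different person
print(is_same_person(existing, retyped))    # same person, entered sloppily
```

With these sample records the first comparison is rejected on date of birth alone, while the second is accepted despite the abbreviated address. A production system would add language-specific normalization (for Chinese names, for example, handling of characters versus romanized forms), which this sketch deliberately leaves out.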