Data Cleansing with intelligent identification


In many cases an inductive method of data cleansing is the way to go. With the right tools and expertise you can inspect, transform and cleanse entities in a database and reach high levels of data quality without using external reference data. In some cases, however, working only with the internal data and inductively identifying and fixing data patterns is not sufficient. Let’s take a practical example: a bank needs to report on a particular segment of its clients to the German banking supervisor BaFin (the Federal Financial Supervisory Authority, or Bundesanstalt für Finanzdienstleistungsaufsicht). The bank has apparently done its homework and created a central database containing all entities needed for the compliance check. Moreover, it has worked out a rather complex set of rules for how data must be processed and corrected. One of the most important anchor points in this framework is the separation between B2C and B2B entities and, for the latter, the exact identification of the correct legal form. But what if you cannot trust this identification?

After profiling the data I quickly found thousands of conflicts: for example, records with the legal form GmbH (limited liability company) were at the same time tagged as B2C customers. So which entry is correct? And if it is a B2B entity, can we be sure that the legal form GmbH is correct? This matters all the more because the bank has set different processing rules for GbR, GmbH, KG, GmbH & Co. KG, AG and KGaA, to name just a few legal forms.
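
To make this kind of conflict concrete, here is a minimal Python sketch of such a profiling check. The field names (`segment`, `legal_form`) and the sample records are assumptions made up for illustration; they are not the bank's actual data model.

```python
from collections import Counter


def find_segment_conflicts(records):
    """Return records tagged as B2C that nevertheless carry a legal form."""
    return [r for r in records if r["segment"] == "B2C" and r.get("legal_form")]


def conflicts_by_legal_form(records):
    """Count the conflicting records per legal form."""
    return Counter(r["legal_form"] for r in find_segment_conflicts(records))


# Hypothetical sample records, invented for this sketch.
sample = [
    {"id": 1, "name": "Muster Maschinenbau GmbH", "segment": "B2C", "legal_form": "GmbH"},
    {"id": 2, "name": "Erika Mustermann", "segment": "B2C", "legal_form": None},
    {"id": 3, "name": "Nordlicht Energie AG", "segment": "B2B", "legal_form": "AG"},
    {"id": 4, "name": "Beispiel Handel", "segment": "B2C", "legal_form": "KG"},
]

conflicts = find_segment_conflicts(sample)
summary = conflicts_by_legal_form(sample)
```

A check like this only tells you that two fields contradict each other. It cannot tell you which of the two is right, and that is exactly the limit of working with internal data alone.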

This is when cooperation with a specialised data provider is needed. It is an advantage if this is not the first time you work with this partner’s external reference data: knowing the providers and the specifics of the various data models in use is a prerequisite for successfully enriching data. Just as important is an excellent, fault-tolerant matching engine that helps you link corrupted or wrong name entities to the reference database. In the case of the bank, using the right data provider and matching its content to the customers in the compliance database resolved roughly 90% or more of the conflicts related to a wrong legal form. And this, of course, is the cornerstone of being compliant!
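
The idea of fault-tolerant matching can be sketched in a few lines of Python. This is only a toy illustration, not the matching engine used in the project: it relies on the standard library's `difflib`, and the reference entries, the `normalize` helper and the `cutoff` value are all assumptions chosen for the example.

```python
import difflib

# Hypothetical reference data: company name -> legal form from the provider.
REFERENCE = {
    "Muster Maschinenbau GmbH": "GmbH",
    "Beispiel Handel GmbH & Co. KG": "GmbH & Co. KG",
    "Nordlicht Energie AG": "AG",
}


def normalize(name):
    """Lowercase, drop dots and commas, collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


def match_legal_form(customer_name, reference=REFERENCE, cutoff=0.8):
    """Return (reference_name, legal_form) for the closest match, or None."""
    normalized = {normalize(ref_name): ref_name for ref_name in reference}
    hits = difflib.get_close_matches(
        normalize(customer_name), list(normalized), n=1, cutoff=cutoff
    )
    if not hits:
        return None
    ref_name = normalized[hits[0]]
    return ref_name, reference[ref_name]


# A misspelt customer name still links to the reference entry.
result = match_legal_form("Mustr Maschinenbau Gmbh")
```

With the assumed cutoff, a name with a typo such as "Mustr Maschinenbau Gmbh" is still linked to the reference entry and inherits its legal form, while a private person's name finds no match. A production matching engine would need far more than string similarity, for instance address data and provider-specific identifiers.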
