Any close encounters with the FBI terrorist watchlist?

Just before this summer, the U.S. Department of Justice filed a report about the FBI Terrorist Watchlist. This watchlist serves as a critical tool for screening and law enforcement personnel, alerting them when they come across a known or suspected terrorist. It is used by personnel at airports, harbours and border crossings. When you apply for a visa, you are also matched against this watchlist. The Terrorist Screening Center, which is administered by the FBI, is responsible for maintaining the watchlist.

This watchlist was created in 2004 from several other lists, and at that time it consisted of about 68,000 entries. I use the word entries because in the following years it became unclear whether one record corresponds to one individual. By the end of 2008 the list had grown to over 1.1 million entries. In 2008, after the American Civil Liberties Union (ACLU) pointed out that the list had passed the 1 million mark, the government offered an explanation: although more than 1 million entries have been recorded in the database, these records correspond to about 400,000 individuals. Terrorists often use multiple identities, several (falsified) passports, and so on. But adding an entry with only the first initials and last name, while an entry with the full first names and last name already exists, will result in unwanted side effects.
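To make the side effect concrete, here is a minimal Python sketch. It is not how the Terrorist Screening Center matches records; the names, the `ambiguous_entries` helper and the matching rule (same last name, same initials) are all assumptions chosen for illustration. It shows how a single initials-only entry can point at more than one full-name entry.

```python
from collections import defaultdict


def name_parts(first_names: str) -> list[str]:
    """Split 'John Michael' or 'J. M.' into its parts, ignoring dots."""
    return first_names.replace(".", " ").split()


def initials(first_names: str) -> str:
    """Reduce 'John Michael' or 'J. M.' to 'JM'."""
    return "".join(part[0].upper() for part in name_parts(first_names))


def is_initials_only(first_names: str) -> bool:
    """True when every first-name part is a single letter."""
    parts = name_parts(first_names)
    return bool(parts) and all(len(part) == 1 for part in parts)


def ambiguous_entries(entries):
    """Map each initials-only entry to the full-name entries it could refer to.

    Hypothetical matching rule: same last name and same initials.
    """
    full_by_key = defaultdict(list)
    for first, last in entries:
        if not is_initials_only(first):
            full_by_key[(initials(first), last.lower())].append(f"{first} {last}")

    result = {}
    for first, last in entries:
        if is_initials_only(first):
            candidates = full_by_key.get((initials(first), last.lower()), [])
            result[f"{first} {last}"] = sorted(candidates)
    return result


# Made-up sample entries, not real watchlist data.
entries = [
    ("John Michael", "Smith"),
    ("James Martin", "Smith"),
    ("J. M.", "Smith"),
    ("Anna", "Jones"),
]
matches = ambiguous_entries(entries)
for short_name, candidates in matches.items():
    print(short_name, "could be any of:", candidates)
```

In this made-up sample the entry "J. M. Smith" is compatible with both full-name entries, so it can neither be merged safely into one of them nor counted as a distinct individual with any confidence.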

Data Cleansing with intelligent identification


In many cases an inductive method of data cleansing is the way to go. With the right tools and expertise you can inspect, transform and cleanse entities in a database and reach high levels of data quality without the need for external reference data. In some cases, however, working only with the internal data and inductively identifying and fixing data patterns is not sufficient. Let's take a practical example: a bank needs to report on a particular segment of its clients to the German bank supervisor BaFin (Bundesanstalt für Finanzdienstleistungsaufsicht, the Federal Financial Supervisory Authority). The bank has apparently done its homework and has created a central database containing all entities needed for the compliance check. Moreover, the bank has worked out a rather complex set of rules for how data must be processed and corrected. One of the most important anchor points in this specific framework is the separation between B2C and B2B entities and, for the latter, the exact identification of the correct legal form. But what if you cannot trust this identification?
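One way to find out how far the stored identification can be trusted is a simple plausibility test on the internal data. The Python sketch below is an illustration under stated assumptions, not the bank's actual rule set: the record layout (`id`, `name`, `segment`, `legal_form`), the sample records and the deliberately short list of legal-form patterns are all hypothetical. It compares the legal form stored for each entity with the one suggested by the suffix in its name and flags disagreements.

```python
import re
from typing import Optional

# Hypothetical and deliberately incomplete list of German legal forms.
# Order matters: the most specific pattern must be tried first.
LEGAL_FORM_PATTERNS = {
    "GmbH & Co. KG": r"\bGmbH\s*&\s*Co\.?\s*KG\b",
    "GmbH": r"\bGmbH\b",
    "AG": r"\bAG\b",
    "KG": r"\bKG\b",
}


def detect_legal_form(name: str) -> Optional[str]:
    """Return the first legal form whose pattern occurs in the name, if any."""
    for legal_form, pattern in LEGAL_FORM_PATTERNS.items():
        if re.search(pattern, name):
            return legal_form
    return None


def suspicious_ids(records) -> list:
    """Flag records whose name contradicts the stored segment or legal form."""
    flagged = []
    for record in records:
        detected = detect_legal_form(record["name"])
        if detected is None:
            continue  # the name gives no evidence either way
        if record["segment"] == "B2C" or record["legal_form"] != detected:
            flagged.append(record["id"])
    return flagged


# Made-up sample records for illustration only.
records = [
    {"id": 1, "name": "Mueller Maschinenbau GmbH", "segment": "B2B", "legal_form": "GmbH"},
    {"id": 2, "name": "Schmidt Logistik GmbH & Co. KG", "segment": "B2B", "legal_form": "GmbH"},
    {"id": 3, "name": "Hans Meier", "segment": "B2C", "legal_form": None},
    {"id": 4, "name": "Nordwind AG", "segment": "B2C", "legal_form": None},
]
flagged = suspicious_ids(records)
print(flagged)
```

A name-based check like this only detects internal contradictions. A record whose name and stored legal form agree can still be wrong, which is where inductive cleansing of internal data alone reaches its limits.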