
A recent article in a Dutch newspaper describes the success the Dutch police are having with data mining products. Police officers use data mining software to predict the time and place of likely criminal activity, such as burglary and robbery, and then direct extra police attention to these hotspots at those hours.
As with any data mining project, the quality of the analyses depends heavily on the quality of the data entered into the data warehouse.
Every statement entered into the system, every location, every description of a person and every relevant object needs to be comparable.
Address standardization products can help get locations into the system precisely and first time right. Other data quality solutions are available for entering names and other data about people: suspects, victims and witnesses.
But what about the other aspects of a statement? Was the crime the theft of a car, a vehicle, a van, a pick-up, etc.? Did the villain steal a purse or a wallet? A bicycle or a bike? The list of synonyms for objects of crime is endless.
I think the criminal community should come to an agreement and decide on standards that would make these data mining projects even more successful. Now that Christmas is nearing, we all want a better world, don't we?
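Short of the criminals agreeing on standards, the synonym problem is usually tackled on the data entry side with a controlled vocabulary. The sketch below shows that idea under clear assumptions: `OBJECT_VOCABULARY`, `normalize_object` and `count_by_category` are hypothetical names, and the terms and categories are illustrative, not any actual police standard. Free-text object descriptions are mapped to one canonical category so that a "van" and a "pick-up" are counted together, and unmapped terms are set aside for review instead of silently skewing the analysis.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical controlled vocabulary mapping free-text object descriptions
# to one canonical category. Terms and categories are illustrative only,
# not an actual police standard.
OBJECT_VOCABULARY = {
    "car": "motor vehicle",
    "vehicle": "motor vehicle",
    "van": "motor vehicle",
    "pick-up": "motor vehicle",
    "pickup": "motor vehicle",
    "purse": "purse or wallet",
    "wallet": "purse or wallet",
    "bicycle": "bicycle",
    "bike": "bicycle",  # assumption: "bike" means bicycle, not motorbike
}


def normalize_object(term: str) -> Optional[str]:
    """Return the canonical category for a free-text term, or None if unknown."""
    # Lower-case and collapse whitespace so "  Pick-up " and "pick-up" agree.
    key = " ".join(term.lower().split())
    return OBJECT_VOCABULARY.get(key)


def count_by_category(terms: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count statements per canonical category; collect unmapped terms for review."""
    counts: Counter = Counter()
    unknown: List[str] = []
    for term in terms:
        category = normalize_object(term)
        if category is None:
            unknown.append(term)
        else:
            counts[category] += 1
    return counts, unknown
```

For example, `count_by_category(["car", "Van", "bike", "bicycle", "scooter"])` counts two motor vehicles and two bicycles and returns `["scooter"]` as unknown. Ambiguous terms such as "bike" are exactly where such a vocabulary needs a documented decision rather than a guess at entry time.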
In his excellent post 
When matching persons, or contact data in general, we are all aware that individuals may use abbreviations or nicknames as a kind of synonym for their name. Classic examples are the use of the name Bill for the actual name William, or my own father, who goes by the name Mans although his official name is Hermanus. Most matching engines use some kind of synonym table to handle this. That works because, within a culture or region, a given nickname is quite often linked to the same official names, and people do not tend to use nicknames that are completely unrelated to their officially registered names.
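A synonym table of the kind described above can be sketched as groups of names that are treated as equivalent when comparing first names. This is a minimal illustration, not how any particular matching engine implements it: `NICKNAME_GROUPS`, `build_nickname_index` and `first_names_match` are hypothetical names, and the two groups are examples taken from the text plus a few common English variants of William.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical nickname table: each group holds an official name together
# with nicknames commonly used for it. A real matching engine would rely on
# far larger, culture- or region-specific tables.
NICKNAME_GROUPS = [
    {"william", "bill", "will", "billy"},
    {"hermanus", "mans"},
]


def build_nickname_index(groups: Iterable[Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Map each name to every name it shares a group with (itself included)."""
    index: Dict[str, FrozenSet[str]] = {}
    for group in groups:
        members = frozenset(name.lower() for name in group)
        for name in members:
            # A name may appear in several groups, so merge rather than overwrite.
            index[name] = index.get(name, frozenset()) | members
    return index


def first_names_match(a: str, b: str, index: Dict[str, FrozenSet[str]]) -> bool:
    """True if two first names are equal or listed as synonyms in the index."""
    x, y = a.strip().lower(), b.strip().lower()
    return x == y or y in index.get(x, frozenset())


NICKNAME_INDEX = build_nickname_index(NICKNAME_GROUPS)
```

Here `first_names_match("Bill", "William", NICKNAME_INDEX)` and `first_names_match("Mans", "Hermanus", NICKNAME_INDEX)` are true, while Bill and Hermanus do not match. Because a nickname can belong to more than one group, a synonym hit on its own is weak evidence; one would probably weigh it together with other attributes such as address or date of birth.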
