In late 2009, in their report Who’s Who in Open-Source Data Quality, Andreas Bitterer and Ted Friedman of Gartner already pointed to DataCleaner as a promising tool. In their opinion, it could certainly improve by offering more high-end cleansing functions and by upgrading its rather basic user experience.
Since then, a lot has happened in the DataCleaner space and in the profiling market. Before the launch of version 2 we announced the acquisition of eobjects.org, or DataCleaner, by Human Inference. Some of you may have been curious about what would happen to the functionality. As we stated at the time, we are continuing with the community and will keep participating in it and expanding it. Under the Human Inference flag we launched the renewed DataCleaner 2.0, which clearly improved the user experience with an enhanced user interface and the ability to define filters and filter flows. Filter flows show their benefit when you analyze a data source and want to create new (temporary) data sources from the records that match certain criteria. You can do that manually, or fully automated in order to monitor your data.
With open source in general, and with DataCleaner in particular, we want the community to contribute to the functionality of the product. DataCleaner has long included the RegexSwap, a community where you can share regular expressions. Why should everybody reinvent the wheel to build a regular expression for credit card checks, email addresses, and so on?
Besides regular expressions that can be used to profile data, there is a need for data cleansing functions that contain far more business logic than a regular expression can cover. For example, validating that the syntax of an email address is correct is something else than validating that a running mail server is attached to its domain. Cleansing functions are already part of DataCleaner, but there is always a need for other or more advanced functional extensions. So that you do not have to build them the ‘DataCleaner’ way, we have created an easy mechanism for sharing extensions.
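To make the email example concrete, here is a minimal sketch in plain Java of the two kinds of check. It is not DataCleaner extension code: the class and method names (`EmailChecks`, `isSyntaxValid`, `domainOf`, `hasMailServer`) are hypothetical, the pattern is deliberately loose, and looking up an MX record through the JDK's JNDI DNS provider is only an assumed stand-in for "a running mail server", since a domain can publish an MX record for a server that is down.

```java
import java.util.Hashtable;
import java.util.regex.Pattern;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper class, for illustration only.
class EmailChecks {

    // Loose syntax pattern: something@something.something, no spaces, one '@'.
    private static final Pattern SIMPLE_EMAIL =
            Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // The kind of check a shared regular expression can cover.
    static boolean isSyntaxValid(String email) {
        return email != null && SIMPLE_EMAIL.matcher(email).matches();
    }

    // Returns the part after the last '@'.
    static String domainOf(String email) {
        return email.substring(email.lastIndexOf('@') + 1);
    }

    // The kind of check that needs logic beyond a regex: ask DNS whether
    // the domain publishes at least one MX record. Needs network access.
    static boolean hasMailServer(String domain) {
        Hashtable<String, String> env = new Hashtable<>();
        env.put("java.naming.factory.initial", "com.sun.jndi.dns.DnsContextFactory");
        try {
            DirContext ctx = new InitialDirContext(env);
            try {
                Attributes attrs = ctx.getAttributes(domain, new String[] {"MX"});
                Attribute mx = attrs.get("MX");
                return mx != null && mx.size() > 0;
            } finally {
                ctx.close();
            }
        } catch (NamingException e) {
            // Unknown domain, no DNS available, or lookup failure.
            return false;
        }
    }
}
```

An address would typically pass `isSyntaxValid` first, and only then would `hasMailServer(domainOf(address))` be called, because the DNS lookup is slow compared with the pattern match and depends on the network.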
With DataCleaner 2.2 we introduce the logical next step: the ExtensionSwap. Individuals and companies can offer their functionality as extensions to DataCleaner. We provide an extension store where you can easily select the functionality you are looking for and combine it with your profiling. Some of these extensions will run on premise; others will be easily accessible via the cloud. We have already provided a webcast and an example project on how to create and distribute your own extensions. It’s a piece of cake.
Because DataCleaner can run in the cloud, we can also imagine that people or companies have specific cleansing functions ‘hanging around’ that they want to combine easily with their profiling steps. The same is true for Human Inference: we have our well-known contact-related cleansing functions and want to combine them with profiling. For now we provide HIquality Contacts as a preview and free trial version for everyone. Use it, and please give us feedback if you are missing functionality.