Late 2009 in their report on Who’s Who in Open-Source Data Quality, Andreas Bitterer and Ted Friedman from Gartner, pointed already to DataCleaner as a promising tool. A tool that, in their opinion, could certainly improve by offering more high end Cleansing functions and improve the rather basic User Experience.
Since then, a lot has happened in the DataCleaner space and in the profiling market. Before the launch of version 2 we notified everybody on the acquisition of eobjects.org or DataCleaner by Human Inference. It might be that some of you were curious on what would happen with the functionality, and as stated at that time we would continue with the community and further participate and expand in it. Under the flag of Human Inference we launched the renewed DataCleaner 2.0, where we definitely increased the customer experience with an enhanced user interface together with possibilities to provide filters or filter flows. The filter flows show their benefit if you analyze your data source and want to create new (temporary) data sources based on matching criteria. You can do that either manually, or in a completely automated way to monitor your data.
With Open Source in general, and with DataCleaner in particular we want the community to participate in the functionality of the product. Since long DataCleaner contains the RegexSwap: the community where you can share regular expressions. Why would everybody reinvent the same wheel to build a regular expression on creditcard checks, emails, etc?
Next to regular expressions that can be used to profile data, there is the need on data cleansing functions that contain much more business logic that can hardly be covered in a regular expression. For example, to validate of the syntax of an email is correct is something else than validating if there is also a running mail server attached to the domain. Cleansing functions are already part of DataCleaner but there is always a need for other or more advanced functional extensions. To prevent that you need to create them in the ‘DataCleaner’ way we have created an easy extension sharing mechanism. Continue reading ‘DataCleaner adds expert cleansing functions- added value in Open Source’