We live with restrictions every day.
- A rafter blocks my cellar stairs, so I always bend when I enter.
- At the end of the street a barking dog runs to the fence whenever I pass, so I always cross the street just before I reach the dog.
We learn to live with restrictions, and they become habits. After a while you stop realizing why you do things the way you do. The rafter has been removed, and the dog has died. Then why do I still bend on the cellar stairs, and why do I still cross the street before I reach the end? Recently I was confronted with similar obsolete restrictions at Human Inference customer support.
A rafter is visible and a barking dog can be heard, so it doesn’t take long before my habits change to fit the new situation. It is different, however, for technical restrictions: you may never find out that they have disappeared. When I build descriptions for more than a million source records in a SQL Server database, I automatically switch from the free SQL Server Express edition to an Enterprise edition. A customer decided to build descriptions for 7 million source records in SQL Server Express. I was rather surprised that the build succeeded in the end. It turned out that SQL Server 2008 R2 Express allows a maximum database size of 10 GB, compared with 4 GB in SQL Server 2005 Express. In other words, I had been crossing the street for years to hide from a dead dog.
Searching with HIquality Identify
HIquality Identify is used for search, deduplication and data matching. To return search results quickly, subsets are used to preselect records for evaluation. Before the actual search, each subset is checked against the maximum number of evaluations set in the configuration. When a subset exceeds this maximum, it is skipped. When all subsets are skipped, a message is returned indicating that not enough search data was entered. Searching for Müller with house number 9 in the whole of Germany, for example, will preselect too many candidates and, even when results are found, is not very useful. Besides the number of candidates per subset, the total number of evaluations across all subsets is checked against the same maximum. The configured maximum itself is not allowed to be above the magical limit of 32768.
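The checks described above can be sketched in a few lines of Python. This is only an illustration of the logic, not HIquality Identify code: the names `Subset`, `plan_search` and `max_evaluations` are made up for this example, and the handling of a total that exceeds the maximum (rejecting the search with the same message) is an assumption, since the behaviour in that case is not spelled out here.

```python
from dataclasses import dataclass

# Upper bound on the configurable maximum, as stated in the text (2**15).
HARD_LIMIT = 32768


@dataclass
class Subset:
    name: str             # hypothetical label for a preselection subset
    candidate_count: int  # number of records this subset would preselect


def plan_search(subsets, max_evaluations):
    """Return (status, names of the subsets that would be evaluated)."""
    if max_evaluations > HARD_LIMIT:
        raise ValueError(f"max_evaluations may not exceed {HARD_LIMIT}")

    # Skip every subset that exceeds the maximum on its own.
    usable = [s for s in subsets if s.candidate_count <= max_evaluations]
    if not usable:
        return "not enough search data", []

    # Assumption: a total above the maximum is rejected in the same way.
    total = sum(s.candidate_count for s in usable)
    if total > max_evaluations:
        return "not enough search data", []

    return "ok", [s.name for s in usable]
```

With a maximum of 1000, a subset that preselects 50000 candidates would be skipped while a subset of 120 candidates would still be evaluated, so adding one more search field can turn a rejected search into a usable one.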