What is equal? – challenges with sound and synonyms

What to do when basic string comparison (fuzzy search) techniques won’t give the right results? Fuzzy search helps to find matches in situations where people make typo’s (e.g. compare Human Inference with Human Inverence) or make up abbreviations (King str. with King street) or ignore diacritics (Sørensen and Soerensen). In case the ‘wrong word’ is not a real used word it becomes obvious that after correcting the typo we have a match.

More challenges appear if the typo has caused another existing word; now we need to make a decision on how equal the two entries are. In case you have some knowledge on the frequency of usage of words you can use that in the equation. How to get the frequency of usage for words is another ballgame – at least you can assume that a ‘wrong word’ is never used (bit of a paradox). Continue reading ‘What is equal? – challenges with sound and synonyms’