Fødselsnummer – Crossing centuries in Norway


The Norwegian Fødselsnummer (birth number) is an 11-digit number whose last 2 digits are control digits. The 10th digit is a control digit calculated with a weighted modulo 11 variant over the first 9 digits. The 11th digit is a control digit calculated with another weighted modulo 11 variant over the first 9 digits combined with the 10th digit.
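As a rough illustration of the two control digits, here is a minimal Python sketch. The weight tables `WEIGHTS_K1` and `WEIGHTS_K2` are not given in this post; they are the commonly published weights for the Fødselsnummer and should be treated as an assumption here, as should the rule that a combination producing a control value of 10 is simply never issued. The function names are hypothetical.

```python
# Assumption: commonly published weights, not stated in this post.
WEIGHTS_K1 = (3, 7, 6, 1, 8, 9, 4, 5, 2)       # applied to digits 1-9
WEIGHTS_K2 = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)    # applied to digits 1-10


def _control(digits, weights):
    """Weighted modulo 11 control digit, or None if the result would be 10."""
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    value = (11 - remainder) % 11
    return None if value == 10 else value


def control_digits(first_nine: str):
    """Return (10th digit, 11th digit) for the first 9 digits, or None."""
    digits = [int(c) for c in first_nine]
    k1 = _control(digits, WEIGHTS_K1)
    if k1 is None:
        return None
    k2 = _control(digits + [k1], WEIGHTS_K2)
    if k2 is None:
        return None
    return k1, k2


def has_valid_control_digits(number: str) -> bool:
    """Check only the two control digits of an 11-digit number."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    expected = control_digits(number[:9])
    return expected == (int(number[9]), int(number[10]))
```

Under these assumptions, the made-up prefix `121009501` gets the control digits 6 and 0, while `121009500` cannot be completed at all because its second control value would be 10.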

As in other countries, this number is based on the date of birth. The first 6 digits represent the birth date as “ddmmyy”. The problem with a 6-digit date is that you cannot identify the century: is a Fødselsnummer starting with 121009 someone born in 1909 or 2009? The Norwegian government has solved this by dividing the next 3 digits (the individual number) into ranges that each represent a certain era. If you were born between 1854 and 1899, your individual number must be between 500 and 749; if you were born between 1900 and 1999, it lies between 000 and 499; and for those born recently, between 2000 and 2039, it lies between 500 and 999. There are some exceptions for those with an individual number between 900 and 999.
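The era rules above can be sketched as a small Python lookup. This is only an illustration of the three ranges listed in the text; the function name `birth_year` is hypothetical, and the exceptions for individual numbers 900 to 999 are not modelled, so such cases come back as `None` when the two-digit year does not fit the 2000-2039 rule.

```python
def birth_year(number: str):
    """Derive the 4-digit birth year from the first 9 digits (ddmmyy + individual number).

    Returns None when none of the three era rules from the text applies.
    """
    yy = int(number[4:6])
    individual = int(number[6:9])

    if individual <= 499:
        return 1900 + yy                 # 000-499: born 1900-1999
    if individual <= 749 and 54 <= yy <= 99:
        return 1800 + yy                 # 500-749: born 1854-1899
    if yy <= 39:
        return 2000 + yy                 # 500-999: born 2000-2039
    return None                          # e.g. the 900-999 exceptions, not modelled


# The example from the text: the same date prefix, two different centuries.
print(birth_year("121009123"))  # individual number 123
print(birth_year("121009501"))  # individual number 501
```

With the individual number 123 the prefix 121009 resolves to 1909, and with 501 it resolves to 2009.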

New Matching Engines go beyond apples and oranges


Professional matching engines are becoming more and more intelligent. Within Human Inference, we see that our own matching techniques are capable of using more and more intelligence, and we incorporate this intelligence in our engines in order to adapt to the way that humans do their matching.

Traditional data quality or matching engines were based on atomic string comparison functions like match-codes, phonetic comparison, Levenshtein string distance, n-gram comparisons or similar functions. These kinds of functions are relatively easy to implement and to use, although a significant amount of plumbing is needed to get reasonable results. Open source projects like the Lucene search engine, and its variants, provide a solid and proven set of these functions. The drawback of these functions is that it is not always clear which function to use for which purpose. An even larger issue is the fact that these low-level DQ functions cannot distinguish between apples and oranges: you end up comparing family names with street names. We still see that some vendors, for example BI vendors, claim to provide data quality functionality while they only provide these atomic comparisons.
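To make the apples-and-oranges point concrete, here is a minimal Python sketch of one such atomic function, the Levenshtein distance. It is a plain textbook implementation, not the code of any particular engine, and the sample values are made up. The function only sees two strings; it has no notion of whether a value is a family name or a street name.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # delete char_a
                current[j - 1] + 1,                     # insert char_b
                previous[j - 1] + (char_a != char_b),   # substitute or keep
            ))
        previous = current
    return previous[-1]


# Hypothetical values: one is a family name, the other a street name.
family_name = "Baker"
street_name = "Baker"   # as in "Baker Street"
print(levenshtein(family_name, street_name))
```

The distance between the two values is 0, a perfect "match", even though one describes a person and the other an address.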

Is 270368A172X a correct Finnish Henkilötunnus?


The Finnish national personal identification number, the Henkilötunnus (aka Hetu or Ht), has the following format: ddmmyyc999C. For details on how to calculate the control character, I refer to the overview blog on National Identification Numbers.

Validating the Hetu 270368A172X shows that it is indeed a correct number. The number 270368172 leaves a remainder of 29 in the modulo 31 check, which is represented by control character “X” in the checksum list. The number shows that this is the 86th girl born on the 27th of March 2068.

The latter is exactly the starting point for the discussion on validity. Although the number itself is well formed and passes all the automatic checks, dealing with this number in a data quality assessment will raise your digital eyebrow, because the birth date lies in the future. In the data quality world we would nowadays say that this Hetu is a wrong Hetu, that it cannot be correct.
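The difference between "well formed" and "plausible" can be sketched in Python as follows. The checksum list `CHECK_CHARS` and the century signs `+`, `-` and `A` are not spelled out in this post; they are the commonly documented values and are an assumption here. The function names are hypothetical, and the reference date is passed in explicitly so that the result does not depend on when the code is run.

```python
from datetime import date

# Assumptions: commonly documented checksum list and century signs.
CHECK_CHARS = "0123456789ABCDEFHJKLMNPRSTUVWXY"
CENTURIES = {"+": 1800, "-": 1900, "A": 2000}


def check_character(nine_digits: str) -> str:
    """Control character for ddmmyy + individual number (9 digits), modulo 31."""
    return CHECK_CHARS[int(nine_digits) % 31]


def birth_date(hetu: str):
    """Return the birth date if the Hetu is well formed, otherwise None."""
    if len(hetu) != 11 or hetu[6] not in CENTURIES:
        return None
    digits = hetu[:6] + hetu[7:10]
    if not (digits.isascii() and digits.isdigit()):
        return None
    if check_character(digits) != hetu[10]:
        return None
    try:
        year = CENTURIES[hetu[6]] + int(hetu[4:6])
        return date(year, int(hetu[2:4]), int(hetu[:2]))
    except ValueError:      # e.g. the 31st of February
        return None


def is_plausible(hetu: str, today: date) -> bool:
    """Well formed AND not born after the given reference date."""
    born = birth_date(hetu)
    return born is not None and born <= today


print(birth_date("270368A172X"))                      # well formed
print(is_plausible("270368A172X", date(2024, 1, 1)))  # but born in the future
```

The automatic checks accept 270368A172X and yield a birth date in 2068, while the plausibility check against a reference date of 1 January 2024 rejects it.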

So always use a bit of human inference when dealing with Finnish national personal identification numbers.