Have you encountered LUHNs algorithm? I can almost guarantee it even if you’ve never heard the name before. It is part of all of our lives every single day.
It is used to check that various numbers are correctly entered. From ID numbers for persons in Sweden, Greace, and Israel to credit card numbers and IMEI numbers and misc other things.
It is a very simple checksum function not intended to be cryptographically secure hash function. Due to the calculation used it is also referred to modulus 10.
The algorithm was invented by Hans Peter Luhn in 1954. It was meant to protect against most accidental errors, not against malicious attacks.
Each number in a sequence is alternating multiplied by 1 and 2 with the results added together. After that a modulus 10 is applied to the sum and the result is the check digit the function should return.
Let’s take an example, 374 is the input, we’re looking for the check digit that should be used. The calculation should be based on even number of digits in the input so we prepend with a zero, thus we use 0374 as our input. Now we’ll calculate 0*1, 3*2, 7*1, and 4*2 – multiply each digit in the input with 1 and 2 alternating.
0*1 = 0, 3*2 = 6, 7*1 = 7 and 4*2 = 8. Add them together and we have 0 + 6 + 7 + 8 = 21. Here is where we use modulo, 21 modulo 10 is 1. That however is not the check digit, the check digit is the reverse. That is, what do you need to add to this to make the result 10. In our case we arrived at one after the modulo operation so our check digit is 9.
There is a special case however. If a digit that is multiplied with two is between five and nine then the result will be two digits (5*2 = 10 and so on). That result is then added together resulting in all results being added together as single digits. This if we have 7 we get 7*2=14, 14 is now taken each digit and added so the result we get for the 7 is 5 (1+4). Instead of adding the digit like that, one can also subtract 9 from the result as it arrives at the same end result (14-9=5).
If you find my explanation above to rushed or hard to follow, take a look at wikipedia.
With that explanation out of the way, let’s look at three ways to implement this. From slowest to fastest. If you only need this occasionally then it does not matter, they are all fast. I however faced a situation where I had to generate valid numbers that has correct check digits för a few hundred million rows.
Let’s first look at a basic implementation in PL/SQL. I say basic because it is probably the most straightforward implementation of the pseudo code version of the algorithm. It is in my experience the most common way to implement it. When this feature is needed it tends to end up in a traditional function/procedure in PL/SQL.
This code is “borrowed” from Daniel Ekberg, just changing some variable names to make it easier to understand for the rest of us.
create or replace function get_value_w_chk (inv in varchar2) return varchar as in_val varchar2 (30); pos int; total int := 0; multiplier int := 0; result_pos int := 0; in_len int; chk_digit int; begin in_len := length (inv); in_val := inv; for i in 0 .. (in_len - 1) loop pos := in_len - i; multiplier := mod (multiplier + 1, 2); result_pos := to_number(substr(in_val, pos, 1), '0') * (multiplier + 1); if result_pos > 9 then total := total + result_pos - 9; else total := total + result_pos; end if; end loop; chk_digit := mod (1000 - total, 10); return in_val || chk_digit; end get_value_w_chk; /
The code essentially loops through a string of digits and makes the calculation for each digit, adds it to a total and finishes up with a modulo operation to end up with the check digit, it then returns the parameter concatenated with the check-digit.
PL/SQL recursive function
The cool kids are writing recursive functions all day long or so they would have you believe. This particular algorithm lent itself to it so I wanted to test doing that. Also, next time one of the cool kids claims it cannot be done in legacy languages like PL/SQL I can just point them to this blog post (hello future reader).
create or replace function get_chk(p_i_val in varchar2) return varchar2 as v_inval varchar2(50) := case when mod(length(p_i_val), 2) = 0 then p_i_val else '0' || p_i_val end; v_num number; v_result varchar2(51) := p_i_val; function recur_value( p_i_mult_with in number , p_i_val in varchar2) return number as v_num_char number(2); v_num_tot number(2); v_sum number(2); begin v_num_char := to_number(substr(p_i_val,1,1)) * p_i_mult_with; if v_num_char > 9 then v_num_tot := v_num_char - 9; else v_num_tot := v_num_char; end if; if length(p_i_val) > 1 then v_sum := v_num_tot + recur_value( case when p_i_mult_with = 1 then 2 else 1 end , substr(p_i_val, 2)); else v_sum := v_num_tot; end if; return v_sum; end recur_value; begin return v_result || (10 - mod( recur_value( p_i_mult_with => 1 , p_i_val => v_inval) , 10)); end get_chk; /
The code takes the in-parameter and calls the recursive function which works by peeling of one digit from the parameter and calling itself with the rest. This call continues until a chain of calls has been done and the innermost invocation only has a single digit as the in-parameter, then it returns it’s calculation and it starts summing each returned value with the calculation for the peel off digit eventually getting the total for all calculations returned to the outer function which performs the same kind of modulus operation as we saw in the first code block.
The previous versions are what you probably want to start with if your value is already in PL/SQL, but what if it starts in SQL or you generate data with SQL? Then calling a PL/SQL function is not optimal.
In my case I had a need to get a correct check digit for an invented value. When you are to do it with a few hundred million rows you do not want to call a function for every row.
This is what I came up with to calculate the check digit in SQL.
with dl as ( select 1 rad from dual union all select 2 rad from dual ) , a as ( select case when mod(length(:1), 2) = 0 then :1 else '0' || :1 end arg from dual ) --select * from a; , b as ( select level rn, to_number(substr(arg, level, 1)) dgt from a connect by level <= length(a.arg) ) --select * from b; , c as ( select b.rn , b.dgt , case when mod(b.rn, 2) = 0 then dgt * 2 else dgt end calc from b ) --select * from c; select 10 - mod(sum(case when dl.rad = 1 and c.calc < 10 then c.calc when dl.rad = 1 and c.calc >= 10 then trunc(c.calc / 10) when dl.rad = 2 and c.calc < 10 then null when dl.rad = 2 and c.calc >= 10 then c.calc - 10 end), 10) chk from c, dl where dl.rad = 1 or(dl.rad = 2 and c.calc > 9);
What is going on in this SQL? It looks long but it really is broken up into small chunks. We have four WITH-selects before we get down to the real one.
The first prefactoring SQL – dl – is just getting us a “table” with two rows. It is used later to handle the situation of the multiplication of a digit being > 9.
Next is “a”, all we do here is to fix the input in cases where it is of an uneven length. In such cases we prepend the input with the digit zero. An alternative would have been to change so we start multiplying the digits with 2 instead of 1, it achieves the same result.
After that comes “b” where we split the string into one row per digit in the input. The digits are also converted to numbers as we’re going to use them in multiplication in the next step.
In the next step “c” we calculate the value for each digit, The first is multiplied by 1, the second by two, the third by one again and so on.
After that the “real” SQL is executed. It takes the rows with sums by digits and creates two of them using “dl”. For rows where dl.rad is 1 we take the first digit (if calc is 15, we take 1), for situations where c.calc is over 9 we use the dl.rad rows where it is 2 to get the second digit (if calc is 15, we take 5). All those numbers are the added together and the usual modulus operation is performed returning the check digit.
To simplify following and troubleshooting I have the “select * from” rows left in the SQL. Uncomment one and run the SQL to see the result up to that point. It is by far the best way to understand what happens in the SQL, even better if you follow my explanation here and uncomment one to see the result I try to describe.
If this helps you please leave a comment or even more so if you have one variant of this that is simpler or faster.