Encoding decimal numbers
Are you aware of any cipher that transforms, e.g., a six-digit decimal number into another one? I need to keep the result in the original range 0..999999.
Or is there a way to encrypt, e.g., a 3-byte number into another one? RC4 would NOT do, since I need to encrypt dozens of numbers using the same key.
By Maaartina at 2007-11-26 17:06:56

# 1
You can adapt a stream cipher for this task. For example, take one byte of output from RC4 and add it mod 10 to each digit, so it would take 6 bytes of RC4 output to encrypt a 6-digit integer. There is a weakness with this method, however. Since each output byte is uniform in the range [0..255] and 256 is not a multiple of 10, the values [0..5] are slightly more likely when the result is taken mod 10.
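A minimal sketch of the digit-wise scheme, assuming RC4 is implemented from its textbook description (key scheduling followed by output generation); `rc4_keystream`, `encrypt_digits` and `decrypt_digits` are illustrative names, and the key is whatever secret byte string you choose. The last lines count how often each residue mod 10 occurs among the 256 possible byte values, which is exactly the bias described above.

```python
from collections import Counter


def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes (key-scheduling algorithm, then PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:  # pseudo-random generation algorithm
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def encrypt_digits(number: int, key: bytes, width: int = 6) -> int:
    """Add one keystream byte, mod 10, to each decimal digit."""
    ks = rc4_keystream(key)
    digits = str(number).zfill(width)
    return int("".join(str((int(d) + next(ks)) % 10) for d in digits))


def decrypt_digits(number: int, key: bytes, width: int = 6) -> int:
    """Subtract the same keystream bytes, mod 10, from each digit."""
    ks = rc4_keystream(key)
    digits = str(number).zfill(width)
    return int("".join(str((int(d) - next(ks)) % 10) for d in digits))


# The bias: 256 = 25 * 10 + 6, so residues 0..5 have one more preimage.
bias = Counter(b % 10 for b in range(256))
print(bias[0], bias[9])  # 26 25
```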
You can reduce this bias by using more bits of output before reducing it. For example, you can use AES in counter mode to generate a 128-bit output, then add this to the 6-digit number mod 1000000.
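A sketch of the counter-mode variant follows. Python's standard library has no AES, so as a plainly swapped-in stand-in it uses HMAC-SHA256 truncated to 128 bits as the per-counter pseudo-random block; `prf_block`, `encrypt` and `decrypt` are illustrative names, and the caller is assumed to use a distinct counter value for each number. Because 2**128 is vastly larger than 10**6, the relative difference between residue probabilities after reducing mod 1000000 is only about 3e-33.

```python
import hashlib
import hmac

MOD = 1_000_000


def prf_block(key: bytes, counter: int) -> int:
    """128-bit pseudo-random value for one counter position.

    Stand-in for AES_k(counter): HMAC-SHA256 of the 16-byte counter,
    truncated to its first 16 bytes.
    """
    mac = hmac.new(key, counter.to_bytes(16, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac[:16], "big")


def encrypt(number: int, key: bytes, counter: int) -> int:
    """Add the 128-bit block to the 6-digit number, mod 1000000."""
    return (number + prf_block(key, counter)) % MOD


def decrypt(number: int, key: bytes, counter: int) -> int:
    """Subtract the same block, mod 1000000."""
    return (number - prf_block(key, counter)) % MOD
```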
You can also use a rejection method based on a stream cipher. For example, you can discard each RC4 output byte greater than 99; otherwise, use it as the next keystream value and add it mod 100 to a pair of digits of your number (three accepted bytes cover all six digits).
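The rejection variant can be sketched as below, again assuming a textbook RC4; `values_below_100`, `encrypt6` and `decrypt6` are illustrative names. Accepted bytes are uniform in 0..99, so adding one of them mod 100 to each two-digit pair introduces no bias.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 keystream bytes (key-scheduling algorithm, then PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        yield s[(s[i] + s[j]) % 256]


def values_below_100(key: bytes):
    """Rejection step: keep only bytes 0..99, which are then uniform in that range."""
    for b in rc4_keystream(key):
        if b <= 99:
            yield b


def _shift_pairs(number: int, key: bytes, sign: int) -> int:
    """Add (sign=+1) or subtract (sign=-1) one accepted value per digit pair, mod 100."""
    ks = values_below_100(key)
    result = 0
    for exp in (4, 2, 0):  # digit pairs 1-2, 3-4 and 5-6 of a 6-digit number
        pair = (number // 10**exp) % 100
        result += ((pair + sign * next(ks)) % 100) * 10**exp
    return result


def encrypt6(number: int, key: bytes) -> int:
    return _shift_pairs(number, key, +1)


def decrypt6(number: int, key: bytes) -> int:
    return _shift_pairs(number, key, -1)
```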
Obviously these are non-standard uses of these algorithms: I cannot say anything about their security.
# 2
> You can also use a rejection method...
This would be perfect with respect to the probabilities.
> > 99
This is better than reducing the range to 0..249 and taking it modulo 10.
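Both rules are unbiased; they differ in how much keystream they consume. A quick calculation, assuming each keystream byte is uniform in 0..255, gives the expected number of usable digits per byte for each rule (`digits_per_byte` is an illustrative helper):

```python
from fractions import Fraction


def digits_per_byte(max_accepted: int, digits_per_value: int) -> Fraction:
    """Expected unbiased digits obtained per keystream byte.

    Bytes 0..max_accepted are kept, larger bytes are discarded, and each
    kept byte yields `digits_per_value` uniform decimal digits.
    """
    return Fraction(max_accepted + 1, 256) * digits_per_value


reject_above_99 = digits_per_byte(99, 2)    # value 0..99 gives two digits
reject_above_249 = digits_per_byte(249, 1)  # value mod 10 gives one digit
print(float(reject_above_99), float(reject_above_249))  # 0.78125 0.9765625
```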
BUT: I need to encrypt a lot of such numbers using the same seed, and then this would simply reduce to XORing with a constant.
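The objection can be made concrete with addition mod 1000000 (the post says XOR; modular addition behaves the same way here). If every number is encrypted from the start of the same keystream, every number gets the same offset. `secret_k` below is a hypothetical value standing in for that offset.

```python
MOD = 1_000_000


def encrypt_fixed(number: int, k: int) -> int:
    """Reused keystream: every number is shifted by the same secret offset k."""
    return (number + k) % MOD


secret_k = 271828  # hypothetical keystream-derived offset
plain = [100001, 100002, 555555]
cipher = [encrypt_fixed(p, secret_k) for p in plain]

# One known (plaintext, ciphertext) pair reveals the offset...
recovered_k = (cipher[0] - plain[0]) % MOD
# ...and with it every other number.
recovered = [(c - recovered_k) % MOD for c in cipher]

# Even without a known pair, differences survive the "encryption":
# consecutive customer numbers stay consecutive.
print((cipher[1] - cipher[0]) % MOD)  # 1
```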
And WHY: I don't have a stream of decimal numbers, but a database table containing customer numbers and personal data like names, addresses, etc. If I want to show it to somebody, I can simply replace the personal data with some dummies, but the customer numbers are secret, too.
I need to permute them somehow, so that
1. the relations in the database stay consistent,
2. the numbers still fit in their slots (both in the database and in the output form),
3. there is no visible relationship between the original and the permuted ones.
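One standard way to get a keyed permutation of exactly 0..999999 that meets these three requirements is a Feistel network over the two 3-digit halves, using addition mod 1000. This is the idea behind the format-preserving encryption modes later standardized by NIST (FF1 and FF3-1 in SP 800-38G), but the sketch below is not an implementation of those modes: the HMAC-SHA256 round function, the 10 rounds and the names `permute`/`unpermute` are assumptions chosen for illustration. The same key always gives the same mapping (requirement 1), outputs stay in 0..999999 (requirement 2), and without the key the mapping looks random (requirement 3).

```python
import hashlib
import hmac

HALF = 1000   # each half of a 6-digit number is 3 digits: 0..999
ROUNDS = 10   # assumed round count for this sketch


def _round(key: bytes, i: int, value: int) -> int:
    """Keyed round function F_i(value) with output in 0..HALF-1."""
    msg = i.to_bytes(1, "big") + value.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % HALF


def permute(n: int, key: bytes) -> int:
    """Keyed bijection on 0..999999: balanced Feistel with addition mod 1000."""
    left, right = divmod(n, HALF)
    for i in range(ROUNDS):
        left, right = right, (left + _round(key, i, right)) % HALF
    return left * HALF + right


def unpermute(n: int, key: bytes) -> int:
    """Inverse of permute(): run the rounds backwards, subtracting instead."""
    left, right = divmod(n, HALF)
    for i in reversed(range(ROUNDS)):
        left, right = (right - _round(key, i, left)) % HALF, left
    return left * HALF + right
```

Each round is invertible no matter what the round function is, which is why the whole network is a permutation of the 6-digit range.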
# 3
> but the customer numbers are secret, too.
>
> I need to permute them somehow, so
> 1. the relations in the database stays consistent
> 2. the numbers still fits in their slots (both in the database and in the output form)
> 3. there is no visible relationship between the original and the permuted ones.
I think this gives you a security issue. With 6-digit numbers, permuting the customer numbers in some fixed way with no key involved is really just 'security through obscurity' (see Kerckhoffs's principle). Even if a key were involved, with only a million customer numbers it would be very quick to try each inverse permutation.
I suspect that to be secure you will need to encrypt using one of the standard algorithms and accept that you will need to widen the 'slots'. AES would seem to be a good candidate.
You would also need to add some verification to the process so that you could detect randomly forged numbers. I would probably include a hash (SHA-1, say) of the customer number in the encrypted data, so that after decryption you could compare it with the hash of the recovered customer number.
A customer number in the range 1 to 1,000,000 fits in 3 bytes (24 bits), so appending a 20-byte SHA-1 hash would bring this up to 23 bytes, or 32 bytes (two blocks) after encryption with AES. You could truncate the hash to 13 bytes (with a slightly increased risk of accepting a forgery) so that the whole thing fits in one 16-byte AES block.
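The block layout and the size arithmetic above can be sketched with `hashlib` alone. `make_block`, `check_block` and `padded_size` are hypothetical helper names, and the AES encryption of the 16-byte block (and the decryption before `check_block`) is assumed to happen elsewhere, since Python's standard library has no AES.

```python
import hashlib
from typing import Optional

AES_BLOCK = 16   # bytes
NUM_BYTES = 3    # 1..1,000,000 fits in 24 bits
SHA1_BYTES = 20


def padded_size(n_bytes: int) -> int:
    """Size after rounding up to whole AES blocks, as in the estimate above."""
    return -(-n_bytes // AES_BLOCK) * AES_BLOCK


def make_block(customer_no: int, tag_len: int = 13) -> bytes:
    """Customer number (3 bytes, big-endian) followed by a truncated SHA-1 of it."""
    num = customer_no.to_bytes(NUM_BYTES, "big")
    return num + hashlib.sha1(num).digest()[:tag_len]


def check_block(block: bytes, tag_len: int = 13) -> Optional[int]:
    """After decryption: return the number if its hash matches, else None."""
    num, tag = block[:NUM_BYTES], block[NUM_BYTES:NUM_BYTES + tag_len]
    if hashlib.sha1(num).digest()[:tag_len] != tag:
        return None
    return int.from_bytes(num, "big")


print(padded_size(NUM_BYTES + SHA1_BYTES), padded_size(NUM_BYTES + 13))  # 32 16
```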
# 4
> Even if a key were involved then with only a million customer numbers it will be very quick to try each inverse permutation.
Sure, you can quickly try each permutation... and if you see that customer number 123456 bought a loaf of bread today, which one was it? One of the million, so there's no personal information leak at all. The point is, you can try all the permutations, but you have no idea which one is right.
The real data are stored somewhere in a secure place; that's not my concern at the moment (and not my job to care about). But there are two more places where (at least a small part of) the data are needed:
1. On my computer, since I work on the program. I can encrypt a whole partition using TrueCrypt, but while I'm working on the data, they are visible to me, and (even with a firewall and an antivirus) I do not trust Windows at all.
I actually do not need the real data, but I need something that looks the same. I need all the cases occurring in reality to occur on my computer, too. And there must be a one-to-one relationship, so I can reproduce any problem case reported to me with my data set.
2. On my customer's notebook taken to an exhibition/demonstration. He needs realistic-looking data, and he must not show anybody that customer number 123456 bought a loaf of bread today, unless 123456 is a fake number that could correspond to any of his customers.
Of course, I considered generating the whole dataset at random. But it would be far too much work, and the result would probably not be good enough.
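For the one-to-one requirement in item 1, a simple alternative to a cipher is a secret lookup table: a random permutation of 0..999999 generated from a secret seed, plus its inverse. This is a different technique from the ones discussed above. The seed value, the column name `cust_no`, the sample rows and the helper names are hypothetical, and `random` (Mersenne Twister) is not a cryptographic generator, so the seed or the table must be kept as secret as a key would be.

```python
import random

N = 1_000_000  # customer numbers 0..999999


def build_tables(seed: int, n: int = N):
    """Secret random permutation of 0..n-1 (real -> fake) and its inverse."""
    forward = random.Random(seed).sample(range(n), n)
    inverse = [0] * n
    for real, fake in enumerate(forward):
        inverse[fake] = real
    return forward, inverse


def mask_column(rows, column, forward):
    """Copy rows, replacing `column` with the fake number (same real -> same fake)."""
    return [{**row, column: forward[row[column]]} for row in rows]


# Hypothetical tables sharing a customer-number column.
customers = [{"cust_no": 123456, "name": "Dummy A"}, {"cust_no": 42, "name": "Dummy B"}]
orders = [{"cust_no": 123456, "item": "bread"}, {"cust_no": 42, "item": "milk"}]

forward, inverse = build_tables(seed=12345)  # hypothetical secret seed
fake_customers = mask_column(customers, "cust_no", forward)
fake_orders = mask_column(orders, "cust_no", forward)

# A case seen under a fake number maps back to exactly one real customer.
reported_fake = fake_orders[0]["cust_no"]
real = inverse[reported_fake]
```

Python only guarantees that `random()` keeps producing the same sequence for a given seed across versions, not `sample()`, so storing the generated table is safer than regenerating it from the seed on a different interpreter version.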
