
How Unique Should a Unique Code Be?

A system I'm working on needs to generate and store unique codes that will be distributed to its users. Any user can then enter one of these codes to obtain a discount at the basket.

Because the codes are there for anybody's taking, they need to be random enough that people who shouldn't have one can't just go guessing them.

The codes will be handed out ad hoc to users, who will take the code and enter it at the basket. Nothing ties a code to any given user, though.

The code I've come up with to generate the codes (read "pasted and modified off Google") is like this:

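using System;

public static class CodeGenerator
{
    // Static, so that the static Generate method can use it.
    // Note: System.Random is not designed for security-sensitive values.
    private static readonly Random _rng = new Random();

    // 28 characters, leaving out look-alikes such as I/1, O/0 and Z/2.
    private const string _chars = "ABCDEFGHJKLMNPQRSTWXY3456789";

    public static string Generate(int size)
    {
        char[] buffer = new char[size];

        for (int i = 0; i < size; i++)
        {
            // Pick a random character from the allowed set.
            buffer[i] = _chars[_rng.Next(_chars.Length)];
        }
        return new string(buffer);
    }
}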
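The codes have to resist guessing, and System.Random is a general-purpose generator. Microsoft's documentation points to RandomNumberGenerator for security-sensitive values. Here is a minimal sketch of the same method built on RandomNumberGenerator.GetInt32. It assumes .NET Core 3.0 or later, and the class name SecureCodeGenerator is hypothetical:

```csharp
using System;
using System.Security.Cryptography;

// Sketch only: assumes .NET Core 3.0+, where RandomNumberGenerator.GetInt32 exists.
public static class SecureCodeGenerator
{
    // Same 28-character alphabet as the original Generate method.
    private const string Chars = "ABCDEFGHJKLMNPQRSTWXY3456789";

    public static string Generate(int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));

        char[] buffer = new char[size];
        for (int i = 0; i < size; i++)
        {
            // GetInt32 uses a cryptographically secure source and returns a
            // uniformly distributed value in [0, Chars.Length).
            buffer[i] = Chars[RandomNumberGenerator.GetInt32(Chars.Length)];
        }
        return new string(buffer);
    }
}
```

SecureCodeGenerator.Generate(8) still returns an 8-character code from the same 28 characters, so the arithmetic below is unchanged. Only the source of randomness differs.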

It generates codes like this:

TWEXB8GE, WHJ55459, AEJA6XP5, D4J3RXJK, NRMALE8H, WMJGQAKY, YAQQKKX7

Notice that I've left out confusingly similar character pairs such as I and 1, and Z and 2. Going further, I could probably leave out S and 5, as well as 8 and B?

The issue I have is one of compromise. How do I balance uniqueness and "security" with ease of entry for the end user?

A string that's 8 characters long might not look too hard to guess, but is in fact fairly unique.

If the set of characters used has 28 members and the string is 8 characters long, then the chance of guessing a code is one in 28^8 (28 to the power of 8, or 28*28*28*28*28*28*28*28), which is 377,801,998,336.

Assuming my maths is right?

Here's how some other combinations stack up:

Possible characters | Code length | Permutations
28 | 12 | 232,218,265,089,212,416
28 | 8 | 377,801,998,336
28 | 6 | 481,890,304
28 | 4 | 614,656
10 | 10 | 10,000,000,000
10 | 3 | 1,000
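One way to double-check the table is to let the computer work out the powers exactly. The sketch below uses System.Numerics.BigInteger so that even 28^12 doesn't overflow. The class and method names are made up for illustration:

```csharp
using System.Numerics;

// Illustrative helper: size of the code space for a given alphabet and length.
public static class CodeSpace
{
    // Each of the `length` positions can be any of `alphabetSize` characters
    // (repeats allowed), so there are alphabetSize^length possible codes.
    public static BigInteger PossibleCodes(int alphabetSize, int length)
        => BigInteger.Pow(alphabetSize, length);

    // Chance that one random guess matches one specific code.
    public static double SingleGuessProbability(int alphabetSize, int length)
        => 1.0 / (double)PossibleCodes(alphabetSize, length);
}
```

For example, CodeSpace.PossibleCodes(28, 8) gives 377,801,998,336, which agrees with the figure above.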

Maybe a code 8 characters long is too much. Probably 6 would suffice. What would you opt for?

Comments

  1. A couple of thoughts - expiration (life span) and ease of use.

    Given ease of use as a factor (how hard it is for users to type in if they can't puzzle out copy and paste) and setting an expiration (n days on a population of m codes), you could well manage with 4 characters... which is readily available in the form of a Note document ID for those writing Domino-based applications. I've used this for a URL-clipping implementation where the short URLs are not long-lived, and it gives me plenty of usage. I might go to 6 just to extend the life span of the URLs a bit, or even 8 if I wanted them to live on almost forever.

      • Jake Howlett
      • Mon 15 Nov 2010 10:26 AM

      As I understand it, there's a chance the codes will sometimes be printed on paper and literally handed out to users (or even non-users, as an incentive to become one). So there's no copy/paste there, and hence I wanted to keep them as short and readable as possible.

      They need to last indefinitely. There's an option for them to have an expiry date but also an option not to set one.

  2. If the codes are going to be given out then generate a complex one and produce a QR code - http://code.google.com/apis/chart/docs/gallery/qr_codes.html.

    Then just implement a barcode reader (already done for Android / iPhone) for webcams ;-)

    • Dragon Cotterill
    • Mon 15 Nov 2010 11:15 AM

    Always suspicious of auto-generated codes. Have to make sure that they don't match real words. Otherwise odd combinations involving cfku (and other variants) could crop up.

      • Jake Howlett
      • Mon 15 Nov 2010 11:22 AM

      Hadn't thought of that. Although, according to the table above, the chance of any given four-letter word cropping up is 1:614,656. That's fairly unlikely, no?
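One caveat on those odds: in an 8-character code, a given four-letter word can start at any of five positions. That makes the chance per code roughly 5 in 614,656 for each word you want to avoid, and it adds up over a long list of words and thousands of codes. A simple way to rule the problem out entirely is to reject any generated code that contains a blocklisted word and generate another. Here is a minimal sketch, in which the blocklist entries are placeholders rather than a real list:

```csharp
using System;

// Hypothetical filter: the two blocklist entries are placeholders only.
public static class CodeFilter
{
    private static readonly string[] Blocklist = { "BAD", "RUDE" };

    // True if any blocklisted word appears anywhere in the code (ignoring case).
    public static bool ContainsBlockedWord(string code)
    {
        foreach (string word in Blocklist)
        {
            if (code.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0)
                return true;
        }
        return false;
    }

    // Keep asking the supplied generator for codes until one passes the filter.
    public static string GenerateClean(Func<string> generate)
    {
        string code;
        do
        {
            code = generate();
        } while (ContainsBlockedWord(code));
        return code;
    }
}
```

GenerateClean accepts any code-producing function, such as a lambda wrapping the Generate method above.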


  3. It depends on the value of what you are protecting.

    If it's bra size for Victoria's Secret and there isn't a name or any info to identify the person, keep it as short as possible.

    If it hides a home address, kill it and opt for username/password plus a token.

    GUIDs are nice but not very user-friendly to type back in.

    If you buy something from RIM you get a /reg.do ID=820874xxx&PD=29962xxx

    So that's 18 digits only, which is easy to retype and represents a high value.

    So it depends on the value of the data.

    1. I need to learn how to read; you already mentioned the data. Is there a policy for code reuse, so that if a code is used a second time the next person won't get a discount?


    • Hynek Kobelka
    • Mon 15 Nov 2010 11:30 AM

    Obviously the length of the code also depends on the number of valid codes that you will give out, because a "guesser" only needs to hit any one of them.

    Purely mathematically, I think that if you want to keep it short then the best way is to expand your character map as much as possible. Right now you use only 28 allowed chars, but with the whole alphabet in upper and lower case plus numbers you can get 62.

    Of course people will then have problems with similar symbols, but maybe this could be resolved with the choice of a proper font.

    And one more thing: if you decide to have a longer code, think about dividing it into smaller groups for better readability: NRMA-LE8H, NR-1234-8H, NR MA LE 8H,...

    But these are just ideas :-)
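The grouping idea needs two small pieces. The first inserts separators when a code is printed. The second strips them out (along with spaces and lowercase) when a user types the code back in, so that NRMA-LE8H, nrma le8h and NRMALE8H all match the stored code. Here is a sketch, with illustrative names:

```csharp
using System;
using System.Text;

// Illustrative helpers for displaying grouped codes and cleaning up user input.
public static class CodeFormat
{
    // Insert a separator every `groupSize` characters for display.
    public static string Group(string code, int groupSize = 4, char separator = '-')
    {
        if (groupSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(groupSize));

        var sb = new StringBuilder();
        for (int i = 0; i < code.Length; i++)
        {
            if (i > 0 && i % groupSize == 0)
                sb.Append(separator);
            sb.Append(code[i]);
        }
        return sb.ToString();
    }

    // Keep only letters and digits and upper-case them, so separators,
    // spaces and lowercase typing don't stop a valid code from matching.
    public static string Normalize(string input)
    {
        var sb = new StringBuilder();
        foreach (char c in input)
        {
            if (char.IsLetterOrDigit(c))
                sb.Append(char.ToUpperInvariant(c));
        }
        return sb.ToString();
    }
}
```

CodeFormat.Group("NRMALE8H") produces NRMA-LE8H, and CodeFormat.Normalize(" nrma-le8h ") produces NRMALE8H.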

      • Jake Howlett
      • Mon 15 Nov 2010 11:58 AM

      Good ideas though Hynek. Thanks!

  4. My calculator gets 377,801,998,336 for 28**8.

    How many of these are you going to give out? If you give out a million of them, then (even with my higher number) on average it will take only 377,802 guesses to crack one. That's not very many guesses if it is done with some computer assistance.
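The arithmetic in that paragraph can be written out directly. If there are m valid codes among N possibilities, each random guess succeeds with probability m/N, so a guesser needs about N/m tries on average. The helper below is a sketch with hypothetical names:

```csharp
using System;

// Illustrative helper: how hard it is to hit *any* currently valid code.
public static class GuessOdds
{
    // Each guess hits some valid code with probability validCodes / space,
    // so on average a guesser needs about space / validCodes guesses.
    public static double AverageGuesses(double space, double validCodes)
        => space / validCodes;

    // Chance that at least one of `guesses` independent random guesses hits
    // a valid code: 1 - (1 - validCodes / space)^guesses.
    public static double ChanceOfAHit(double space, double validCodes, double guesses)
        => 1.0 - Math.Pow(1.0 - validCodes / space, guesses);
}
```

With N = 377,801,998,336 and a million codes, AverageGuesses gives about 377,802. With 10,000 live codes it rises to about 37.8 million. Even so, ChanceOfAHit shows that a million random guesses would still succeed roughly 2.6% of the time, which is worth keeping in mind if the basket places no limit on attempts.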

    And there's something possibly more important than that. You probably don't want to re-use these codes, but 28**8 is only on the order of 2**26 values, so if you just generate ~8000 random codes (2**13, actually) there will be a 50% chance that you have re-used at least one. (Look up 'birthday paradox' on Wikipedia for the details.)

    I would go with a longer code, and I would not make it random. I would create codes by applying a hash to a set of unique strings. This has the advantage, too, of allowing you to have customer-specific codes that can't be shared (because the hash input strings contain customer names or account numbers), or having codes that are specific to particular partner web site (by having the partner name or number in the hash input strings), or codes that are sharable and generic, all with the same format and generation mechanism.

    1. Oops! It's not 2**26. I took the natural log on my calculator instead of the log2. It's more like 2**39. That means you can generate close to 1,000,000 codes before there's a 50% probability that you generate a dupe.
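The hash-based idea could look something like the sketch below. One assumption is worth stating: it uses a keyed hash (HMAC-SHA256) rather than a plain hash. If the input strings (customer names, account numbers) can be guessed, then anyone can compute an unkeyed hash of them. The method name, key handling and input format are all hypothetical.

```csharp
using System;
using System.Numerics;
using System.Security.Cryptography;
using System.Text;

// Sketch only: a keyed hash (HMAC-SHA256) of a unique input string, mapped
// onto the 28-character alphabet. Names and input format are hypothetical.
public static class HashedCodes
{
    private const string Chars = "ABCDEFGHJKLMNPQRSTWXY3456789";

    public static string FromUniqueString(byte[] secretKey, string uniqueInput, int size)
    {
        // 256 bits of hash give at most ~53 base-28 characters; stay below that.
        if (size <= 0 || size > 50)
            throw new ArgumentOutOfRangeException(nameof(size));

        byte[] hash;
        using (var hmac = new HMACSHA256(secretKey))
        {
            hash = hmac.ComputeHash(Encoding.UTF8.GetBytes(uniqueInput));
        }

        // Treat the 32-byte hash as one large non-negative number. BigInteger
        // reads bytes little-endian; the extra trailing zero keeps it positive.
        var bytes = new byte[hash.Length + 1];
        Array.Copy(hash, bytes, hash.Length);
        var value = new BigInteger(bytes);

        // Read off base-28 digits, one character per digit.
        char[] buffer = new char[size];
        for (int i = 0; i < size; i++)
        {
            buffer[i] = Chars[(int)(value % Chars.Length)];
            value /= Chars.Length;
        }
        return new string(buffer);
    }
}
```

Cutting the hash down to eight characters still leaves only 28^8 possible outputs, so two different inputs can still produce the same code. Checking for an existing code before saving a new one remains sensible.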

      • Jake Howlett
      • Mon 15 Nov 2010 12:01 PM

      "My calculator gets 377,801,998,336" Mine too :-)

      I can't imagine there ever being more than about 10,000 of these codes in existence (and that's a high-end guess).

      You're losing me with all this log2 stuff. It's been a long time since I did any advanced maths. Working out it was 28^8 took me long enough...


      1. The log2 stuff is really just asking: How many bits are there in the number when written in binary? That's the key factor when you're trying to figure the probability of getting the same value in a set twice.

        The birthday problem is this: Every time you walk into a bar with at least X people in it, you bet the bartender that there are at least 2 people in the bar with the same birthday. How big does X have to be so that you will win more often than you lose? The answer is just 23, which surprises most people because it's a lot lower than you might suspect. But if you never make this bet when there are fewer than 23 people in the bar, and you always make this bet when there are more than 23, then you will make money. Unless, of course, the bartender knows everybody's birthday and doesn't take the bet when he would lose! ;-)

        The bits come into it because there's an easy way to get an approximate answer: take the number of bits that you need to represent the number of choices, divide that number of bits by 2, and raise 2 to that power. The larger the numbers you're dealing with, the better this is as an approximation. (It's actually not a very good approximation for a number as low as 365. You get 19 this way, which will cause you to lose money!)

        The log2 comes in because that's how you define the number of bits when your number of choices isn't a power of 2. For example, for 365 the log2 is 8.51, so that's how many "bits" you are dealing with.

        Anyhow, if there really won't ever be more than 10,000 of these codes, you're probably okay.
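The exact birthday-problem probability is easy to compute, which makes it possible to check the bits-divided-by-two shortcut against it. Here is a small sketch (the names are illustrative):

```csharp
using System;

// Illustrative helpers for the birthday problem applied to random codes.
public static class Birthday
{
    // Exact probability that at least two of n codes, each drawn uniformly at
    // random from `space` possibilities, are the same.
    public static double CollisionProbability(double space, long n)
    {
        double allDistinct = 1.0;
        for (long i = 1; i < n; i++)
        {
            // The (i+1)th code must avoid the i codes already drawn.
            allDistinct *= 1.0 - i / space;
        }
        return 1.0 - allDistinct;
    }

    // Standard approximation for the number of codes at which the chance of a
    // duplicate reaches 50%: sqrt(2 ln 2 * space), about 1.18 * sqrt(space).
    public static double FiftyPercentPoint(double space)
        => Math.Sqrt(2.0 * Math.Log(2.0) * space);
}
```

CollisionProbability(365, 23) comes out at about 0.507, which confirms the bar bet. For 28^8 possible codes, the 50% point is about 724,000 codes (roughly 1.18 times 614,656, the square root of 28^8). With 10,000 codes, the chance of any duplicate at all is only about 0.013%.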

    • Greg
    • Mon 15 Nov 2010 04:54 PM

    One other thing you could consider if you want to prevent people guessing your codes is to add a checksum character into the code. At its simplest this can just be the character that might represent the sum of all the other characters. I'm sure you could work out the details of how that would work pretty quickly.

      • Jake Howlett
      • Tue 16 Nov 2010 09:52 AM

      "I'm sure you could work out the details of how that would work pretty quickly."

      Hmm, your faith in me may be misplaced. Never did get checksums.
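A check character along the lines Greg describes might look like the sketch below. It shows one possible scheme (a position-weighted sum of character indices, modulo the alphabet size), not a standard algorithm, and the class name is hypothetical. A check character mainly catches typing mistakes and lets the site reject malformed codes before looking them up. Anyone who knows the scheme can compute it, so it complements randomness rather than replacing it.

```csharp
using System;

// Hypothetical check-character scheme over the 28-character alphabet.
public static class CheckChar
{
    private const string Chars = "ABCDEFGHJKLMNPQRSTWXY3456789";

    public static char Compute(string body)
    {
        int sum = 0;
        for (int i = 0; i < body.Length; i++)
        {
            int index = Chars.IndexOf(body[i]);
            if (index < 0)
                throw new ArgumentException($"'{body[i]}' is not a valid code character.");
            // Weights 1, 2, 3, ... mean that swapping two different neighbouring
            // characters always changes the sum modulo 28.
            sum += (i + 1) * index;
        }
        return Chars[sum % Chars.Length];
    }

    // Append the check character to a code body.
    public static string Append(string body) => body + Compute(body);

    // True if every character is allowed and the last one is the right check character.
    public static bool IsWellFormed(string code)
    {
        if (code.Length < 2 || code.Trim(Chars.ToCharArray()).Length != 0)
            return false;
        string body = code.Substring(0, code.Length - 1);
        return Compute(body) == code[code.Length - 1];
    }
}
```

For example, CheckChar.Append("TWEXB8G") produces an 8-character code that passes IsWellFormed, while the same code with its first two characters swapped does not.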

    • Curtis Kuhn
    • Mon 15 Nov 2010 05:05 PM

    One thing you might want to keep in mind from a usability standpoint is to keep the letters in lowercase. That way users can more easily distinguish between letters and numbers. They won't be left wondering if something is a 0 (number zero) or an O (letter O). Of course you then might run into confusion with lowercase l and the number 1. Maybe a good idea to eliminate 0s, Os, ls and 1s altogether. It decreases your pool of available codes but would probably lead to less frustration and a higher success rate.

  5. The Windows API has a CoCreateGuid() function, which can be called from LotusScript too...

    It creates 128-bit integers, but you can perform a base32 conversion on them, so you will get an alphanumeric text string.

    Oh yeah, make sure your generated codes do not contain any profanity :-))
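In .NET, the closest counterpart to CoCreateGuid is Guid.NewGuid(). The sketch below encodes the GUID's 16 bytes with the RFC 4648 base32 alphabet (A to Z and 2 to 7, padding omitted). The class name is hypothetical. The result is 26 characters long, which shows the typing cost compared with 8-character codes.

```csharp
using System;
using System.Text;

// Sketch: RFC 4648 base32 (no padding) applied to a GUID's bytes.
public static class GuidCode
{
    private const string Base32Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    public static string Base32(byte[] data)
    {
        var sb = new StringBuilder();
        int buffer = 0, bitsInBuffer = 0;
        foreach (byte b in data)
        {
            buffer = (buffer << 8) | b;
            bitsInBuffer += 8;
            while (bitsInBuffer >= 5)
            {
                // Emit the top 5 unconsumed bits as one character.
                bitsInBuffer -= 5;
                sb.Append(Base32Alphabet[(buffer >> bitsInBuffer) & 31]);
            }
            buffer &= (1 << bitsInBuffer) - 1; // drop bits already encoded
        }
        if (bitsInBuffer > 0)
        {
            // Pad the final partial group with zero bits on the right.
            sb.Append(Base32Alphabet[(buffer << (5 - bitsInBuffer)) & 31]);
        }
        return sb.ToString();
    }

    // 16 bytes = 128 bits, which encodes to 26 base32 characters.
    public static string NewCode() => Base32(Guid.NewGuid().ToByteArray());
}
```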

    • Liam McLaughlin
    • Tue 16 Nov 2010 09:45 AM

    The opposite of security is usually usability, and in this case, if the user is typing in the code, then it has to be short-ish. IMHO fewer than 9 characters and, as Hynek suggested, grouped for readability.

    Case sensitivity is to be avoided for good usability, and likewise any similar letters/numbers.

    I'm also interested in the Google search you'll have to do to try to find the list of unsuitable words to parse out... could be some interesting results. Let us know how that one goes.

      • Jake Howlett
      • Tue 16 Nov 2010 09:50 AM

      I think just removing most of the vowels will remove any risk of profanities popping up.

      My new list of chars is:

      ACDEFGHJKLMNPQRTWXY34679

      If you can spell a naughty word with them you're a smarter fecker than me ;-)


  6. Hi Jake, if this is Domino, why not use @Unique (without a parameter)? It gives strings like this:

    AMAG-8BAB9A

      • Jake Howlett
      • Wed 17 Nov 2010 02:29 AM

      It's not Domino, but, if it were, I'm not sure @unique would cut it.

      The first 4 chars are fixed, so there are "only" 308,915,776 possibilities, which I guess is enough in reality, but aren't they produced sequentially?

      My code would produce, say, 100 codes at once. Assuming they are in fact guaranteed unique I'm guessing there's a chance that a user who received code AMAG-8BAB9A could then take a stab at AMAG-8BAB9B and AMAG-8BAB9C etc.


  7. How about using a selection of 1,000 short words? You put two words together with a digit between them. Words are easier to type because they are recognizable. This would give you about 10 million combinations, I think.
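Here is a sketch of the word-based scheme. A deliberately tiny placeholder word list stands in for the suggested 1,000 words (the list and names are hypothetical), and it reuses RandomNumberGenerator.GetInt32 (.NET Core 3.0+) for the random picks. Combinations(1000) returns 10,000,000, which confirms the estimate. For comparison, a five-character code from the original 28-character set has 17,210,368 possibilities.

```csharp
using System;
using System.Security.Cryptography;

// Sketch of "word + digit + word" codes with a placeholder word list.
public static class WordCodes
{
    // Hypothetical stand-in for a curated list of about 1,000 short words.
    private static readonly string[] Words = { "apple", "river", "stone", "cloud", "maple", "tiger" };

    public static string Generate()
    {
        string first = Words[RandomNumberGenerator.GetInt32(Words.Length)];
        int digit = RandomNumberGenerator.GetInt32(10);
        string second = Words[RandomNumberGenerator.GetInt32(Words.Length)];
        return $"{first}{digit}{second}";
    }

    // wordCount choices, then 10 digits, then wordCount choices again.
    public static long Combinations(long wordCount) => wordCount * 10 * wordCount;
}
```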


About This Page

Written by Jake Howlett on Mon 15 Nov 2010



About This Website

CodeStore is all about web development. Concentrating on Lotus Domino, ASP.NET, Flex, SharePoint and all things internet.

Your host is Jake Howlett who runs his own web development company called Rockall Design and is always on the lookout for new and interesting work to do.

You can find me on Twitter and on Linked In.
