PasswordAuthentication


How To Design A Password-Based Authentication System

You're building a website, and you want to control access or allow users to personalize various parts of it. For this, you need to know the proper way to securely store usernames and passwords in a database, so that you don't allow people to break in.

The Solution


This problem has been solved, and the solution is fairly simple. In your database, you'll need three columns. The first I'll refer to as username, the second I'll refer to as salt, and the third I'll refer to as digest.
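As a rough illustration of that layout, here is a minimal sketch using Python's built-in sqlite3 module. The table name users, the TEXT column types, and the in-memory database are assumptions made for the example. The article only asks for the three columns.

```python
import sqlite3

# In-memory database, used purely for illustration.
conn = sqlite3.connect(":memory:")

# The table name and column types are assumptions; the article only
# specifies the three columns: username, salt, and digest.
conn.execute(
    """
    CREATE TABLE users (
        username TEXT PRIMARY KEY,
        salt     TEXT NOT NULL,
        digest   TEXT NOT NULL
    )
    """
)

# List the column names to confirm the layout.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
print(columns)  # ['username', 'salt', 'digest']
```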

When a user is created, stuff their username in the username column. There, that was easy, wasn't it? The password is a bit more difficult.

First, generate a random string. I'd suggest using 8 characters of completely random data. Note: This is a fairly paranoid standard, but it's an easy standard to meet, so I see no problem in suggesting it for general use. Save this random string in the salt column for the user.

Now, take the password provided by the user. Prepend salt to it, so you have a new string, salt + password. This is the salted password.

The final step is to calculate the digest of the salted password. Do this by passing it through a hash function, such as MD5 or SHA-1. Note: I suggest SHA-1 for new applications. Store this in the digest column.
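The user-creation steps so far could be sketched roughly as follows. The function names make_salt and create_user are hypothetical, and so is the choice of letters and digits as the salt alphabet. Using Python's secrets module for the random data is also an assumption. SHA-1 is used only because the text suggests it.

```python
import hashlib
import secrets
import string

# Assumed salt alphabet; the article only asks for 8 random characters.
SALT_ALPHABET = string.ascii_letters + string.digits


def make_salt(length=8):
    """Return a random salt string of the given length."""
    return "".join(secrets.choice(SALT_ALPHABET) for _ in range(length))


def create_user(username, password):
    """Build the (username, salt, digest) record for a new user."""
    salt = make_salt()
    salted_password = salt + password  # prepend the salt
    digest = hashlib.sha1(salted_password.encode("utf-8")).hexdigest()
    return {"username": username, "salt": salt, "digest": digest}


record = create_user("alice", "hunter2")
print(record["salt"], record["digest"])
```

The returned record holds exactly the three values that go into the username, salt, and digest columns. The cleartext password is never stored.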

That's all it takes to create a user. Now, how do you verify a password when logging someone in?

Take the username provided, and look up the salt and digest stored in the database. Prepend salt to the password provided, calculate the digest using the same hash you used before, and compare the result to the digest stored in the database. If the two match, then the password entered was correct. If they do not match, then the password was not correct. Simple.
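A minimal sketch of that login check, under some assumptions: the USERS dictionary is a hypothetical stand-in for the database, and the sample salt and password are made up. The comparison uses hmac.compare_digest, which is a choice made for this example and not something the article prescribes.

```python
import hashlib
import hmac

# Hypothetical stand-in for the database: username -> stored salt and digest.
USERS = {
    "alice": {
        "salt": "Xy7pQ2aB",
        "digest": hashlib.sha1(b"Xy7pQ2aB" + b"hunter2").hexdigest(),
    }
}


def check_password(username, password):
    """Return True if the password matches the stored salted digest."""
    row = USERS.get(username)
    if row is None:
        return False  # unknown user
    salted_password = row["salt"] + password  # same order as at creation
    candidate = hashlib.sha1(salted_password.encode("utf-8")).hexdigest()
    # Compare in constant time so timing does not leak how much matched.
    return hmac.compare_digest(candidate, row["digest"])


print(check_password("alice", "hunter2"))  # True
print(check_password("alice", "wrong"))    # False
```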

Why So Much Work?


Why do we go through all of this work? The idea is to prevent an attacker from being able to determine what people's passwords are, even if they have read access to the database. This way, even if someone breaks into your server and manages to snag a copy of your database, they still don't have access to all of those accounts.

To this end, we don't store cleartext passwords in the database. Instead, we store the digest of the password. A digest is the output of a hash function, a special function which turns its input into gibberish. Note: There are other properties of a good hash function, but we don't care about them here. A hash function makes the following promises:
1. The same input will always generate the same output.
2. Different inputs are extremely likely to generate different output. Note: "extremely" in this sense can be read as "you'll almost certainly never see two inputs that generate the same output, ever."
3. It is very difficult to work out what the input was that generated a specific output.

So, if we are storing the digest, then we can still determine if a login attempt is valid by digesting the proposed password and comparing the digests, without actually storing the cleartext password in our database.
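The first two promises are easy to observe directly. The sketch below uses Python's hashlib with SHA-1. The helper name sha1_hex and the sample inputs are made up for the example.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 digest of text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


same_a = sha1_hex("correct horse")
same_b = sha1_hex("correct horse")
different = sha1_hex("correct horsf")  # differs by one character

print(same_a == same_b)     # True: same input, same output
print(same_a == different)  # False: different input, different output
```

The third promise cannot be shown in a few lines. It is a claim about how hard it is to go from a digest back to its input.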

But, what was all that business with the random salt? Well, let's say that one of your users decided that "password" was a good password for their account. Now, it may be difficult to take an arbitrary digest and determine what input created it, but every script kiddie in the world knows that the SHA-1 digest of "password" is 0x5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8. So, if that digest is found in your database, then script kiddies would instantly be able to break into that account.
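The digest quoted above can be reproduced with Python's hashlib. The variable names are only for illustration.

```python
import hashlib

# The digest quoted in the text, without the 0x prefix.
KNOWN_DIGEST = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

unsalted = hashlib.sha1(b"password").hexdigest()
print(unsalted == KNOWN_DIGEST)  # True
```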

This class of attack against digests is called a dictionary attack. Instead of directly attacking an individual digest, the attacker composes a dictionary of likely passwords, calculates the digest for each, and checks it against each of the stored digests. Rather than spending forever to break a single account, the attacker spends a relatively short amount of time to mount a probabilistic attack on all of the digests.
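To make the attack concrete, here is a small sketch against unsalted SHA-1 digests. Everything in it is hypothetical: the usernames, the passwords, the four-word dictionary, and the function name dictionary_attack.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 digest of text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical stolen table of unsalted digests.
stolen = {
    "alice": sha1_hex("password"),
    "bob": sha1_hex("letmein"),
    "carol": sha1_hex("k3J!x9#Qz-unlikely"),
}

# Hypothetical dictionary of likely passwords.
wordlist = ["123456", "password", "letmein", "qwerty"]


def dictionary_attack(stolen_digests, candidates):
    """Map each username to its password when the digest is in the dictionary."""
    # Digest every candidate once, then reuse the table for all accounts.
    lookup = {sha1_hex(word): word for word in candidates}
    return {
        user: lookup[digest]
        for user, digest in stolen_digests.items()
        if digest in lookup
    }


cracked = dictionary_attack(stolen, wordlist)
print(cracked)  # {'alice': 'password', 'bob': 'letmein'}
```

Each candidate is digested only once and then checked against every account, which is why the attack is cheap when the digests are unsalted.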

In fact, dictionaries of passwords with digests already exist on the Internet, with thousands and thousands of entries. So, common passwords can be gleaned from normal digests with a minimum amount of effort.

But, prepending the salt to the password before digesting it changes the digest that results. Because the number of possible salts is so high, no one has calculated dictionaries of common passwords for every salt. And, because salts are randomly generated, you can't make a dictionary of common passwords for "common" salts.
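A short sketch of that effect, assuming a made-up salt value and a tiny precomputed dictionary of unsalted SHA-1 digests:

```python
import hashlib


def sha1_hex(text):
    """Return the SHA-1 digest of text as a hexadecimal string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical premade dictionary: unsalted digest -> password.
precomputed = {sha1_hex(word): word for word in ["123456", "password", "qwerty"]}

salt = "Xy7pQ2aB"  # example salt; a real one would be randomly generated

unsalted_digest = sha1_hex("password")
salted_digest = sha1_hex(salt + "password")

print(unsalted_digest in precomputed)  # True: found in the premade dictionary
print(salted_digest in precomputed)    # False: the salt changed the digest
```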

Why do we use a different salt for every password, though? Why can't we just hardcode in a single salt for our entire application? There are several reasons. First of all, if all the salts are the same, then an attacker can instantly tell if two accounts have the same password, by comparing their digests. Secondly, if the attacker has a fair amount of processing power on their hands, they could construct a new dictionary from scratch of common passwords for just that salt. It's not as good as finding a premade dictionary, but it is still a viable probabilistic attack on all accounts simultaneously. By using a different salt for each password, the difficulty of breaking the security of the system is greatly increased, because a separate attack must be mounted against each account.
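The first reason can be seen in a few lines. In this sketch the salt values and the helper name salted_digest are invented for the example. Two accounts share the password "password", first under one application-wide salt and then under separate salts.

```python
import hashlib


def salted_digest(salt, password):
    """Return the SHA-1 digest of salt + password as a hexadecimal string."""
    return hashlib.sha1((salt + password).encode("utf-8")).hexdigest()


# One hardcoded salt for the whole application (hypothetical value).
shared_salt = "AppWide1"
alice_shared = salted_digest(shared_salt, "password")
bob_shared = salted_digest(shared_salt, "password")
print(alice_shared == bob_shared)  # True: identical passwords are visible

# A different salt per account (hypothetical values).
alice_unique = salted_digest("Xy7pQ2aB", "password")
bob_unique = salted_digest("m4Lw0RtZ", "password")
print(alice_unique == bob_unique)  # False: the digests no longer match
```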

Finally, is it secure to store the salt along with the password digests? Wouldn't it make more sense to keep it safely encrypted somewhere, or something? The key response to this is, if you can store the salt more securely, why can't you store the digest that way as well? In any system, if it is compromised to the extent where an attacker can read your digests, then they can almost certainly read the salts as well. It's easier to not lie to ourselves about this fact, and it doesn't really decrease the security of your system in any significant manner to store the salts alongside the digests.

This article is ©2008 by the respective authors. Reproduction is prohibited without express permission from all contributors.