Synopsis

We like writing web applications. Increasingly, software that might have once run as a desktop application now runs on the web. So how do we properly authenticate users in web contexts? Passwords, of course! Let’s look at how to handle password authentication on the web.

What not to do

It’s very tempting to use plaintext authentication for web applications. Let’s whip up a quick Python web server that we’ll use to test authentication. Say we want to provide access to a magic number, but only to users we choose.

 from flask import Flask, request 
 app = Flask(__name__)
 
 # app globals 
 secret_number = 42
 
 # routes 
 @app.route("/secret_number") 
 def get_secret_number(): 
     return str(secret_number)  # a Flask view cannot return a bare int
 
 if __name__ == "__main__":
     app.run()

And curling yields:

 $ curl 'localhost:5000/secret_number' 
 42

Right now, there’s no authentication. What if we want to protect the magic number with a username and password? It would be easy to do plaintext authentication, like so:

 # app globals 
 secret_number = 42 
 users = [ 
     { 'name': 'bob', 'password': 'QNJWzjRc' } 
 ]
 
 # helpers 
 def user_allowed(username, password): 
     # any() returns a real boolean; a bare filter() object is always truthy in Python 3 
     return any( 
         user['name'] == username and user['password'] == password 
         for user in users 
     )
 
 # routes 
 @app.route("/secret_number") 
 def get_secret_number(): 
     if user_allowed( 
         request.args.get('username'), 
         request.args.get('password') 
         ): 
         return str(secret_number) 
     return 'not allowed'

Curling as we did before, we see the expected response:

 $ curl 'localhost:5000/secret_number' 
 not allowed

However, if we pass the correct username and password arguments as URL parameters, we can still retrieve the magic number.

 $ curl 'localhost:5000/secret_number?username=bob&password=QNJWzjRc' 
 42

Why is this a bad idea?

Despite the ease in implementing an authentication system like this, it’s really not a good idea to check passwords this way. Here’s why:

  1. If someone gets access to your users list, they can pretend to be any authenticated user.
  2. Since people tend to reuse passwords across sites, a user’s password being read from your site might lead to a catastrophic disclosure of information on other sites!

Yet we still log in to websites with passwords. There must be a better way.

Consider a non-technical analogy: you wouldn’t tell someone your social security number, but you might give them a piece of information that shows you know it, like your SSN’s last four digits. Someone who identifies you by those last four digits never needs to know your full SSN.

Similarly, it would be nice if we had a way to verify that someone knows their password without actually storing their password in its entirety. That way, if an attacker reads your users list, they don’t get users’ actual passwords.

Handy One Way Functions

It turns out that we can solve these issues by employing (easy to use) cryptographic techniques. Many mathematical functions, e.g. f(x) = x + 2, have an inverse, in this case f^-1(x) = x - 2. Some functions, however, such as g(x) = x mod 3, have no inverse, since multiple values in the domain of the function can map to the same output, e.g. g(1) = g(301) = 1.
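
To make the inverse idea concrete, here is a minimal Python sketch of the two functions from the paragraph above. The names f, f_inverse, and g are just illustrative labels for this example.

```python
def f(x):
    # Invertible: every output comes from exactly one input.
    return x + 2

def f_inverse(y):
    # Undoes f, so f_inverse(f(x)) == x for any x.
    return y - 2

def g(x):
    # Not invertible: many different inputs share the same output.
    return x % 3

# Every input below 10 that g maps to 1.
preimages_of_one = [x for x in range(10) if g(x) == 1]

print(f_inverse(f(40)))   # 40
print(g(1), g(301))       # 1 1
print(preimages_of_one)   # [1, 4, 7]
```

Given only the output 1, there is no way to tell which of those inputs produced it.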

We can use similar mathematical functions to take a password and create an identifier based on it that could only be generated by processing that password. As an example, the SHA256 algorithm takes the password password123 and turns it into

ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

If we take a slightly different password, say pAssword123 and run it through SHA256, we’ll get a completely different result:

f5355765f831ee3c9fb35e3a3c701887f6ac33a39fbad7f1740759558716fccf

The crucial realization is that we can’t easily derive the supplied data from this seemingly random string of letters and numbers. SHA256 is, as we described above, a one-way function. This makes the generated string of letters and numbers a sort of ‘fingerprint’ of the data that generates it.
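
As a hedged illustration, Python’s standard hashlib module can compute these fingerprints. The helper name fingerprint is made up for this sketch; it is not part of any library.

```python
import hashlib

def fingerprint(password):
    # SHA256 works on bytes, so encode the text first.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

first = fingerprint("password123")
second = fingerprint("pAssword123")

print(first)
print(second)

# The same input always gives the same 64-character hex string,
# while a one-character change gives an unrelated-looking result.
print(fingerprint("password123") == first)  # True
print(first == second)                      # False
print(len(first))                           # 64
```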

You’ve probably heard of other functions like this; common hash functions include the SHA family, MD5, and bcrypt (which is built on the Blowfish cipher). Going forward, we will be using special hash functions that are specifically designed to take a long time to compute.
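
One such deliberately slow function is PBKDF2, which is available in Python’s standard library. The sketch below is only an illustration: the helper names hash_password and verify_password, the 16-byte salt, and the iteration count are assumptions chosen for this example, not recommendations taken from this article.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch; more rounds means slower guessing

def hash_password(password, salt=None):
    # A fresh random salt makes equal passwords produce different digests.
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password, salt, expected_digest):
    # Recompute with the stored salt and compare in constant time.
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt, digest = hash_password("QNJWzjRc")
print(verify_password("QNJWzjRc", salt, digest))     # True
print(verify_password("wrong-guess", salt, digest))  # False
```

The repeated rounds cost a legitimate login a fraction of a second, but they multiply the work for anyone trying to guess passwords in bulk.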

How do we apply this to our web server?

Taking things back to code, we can use simple APIs for hash functions built into most web platforms. Flask, for example, makes use of Werkzeug’s generate_password_hash and check_password_hash to generate and verify hashed passwords.

Note: Some hash functions are more suitable for storing passwords than others; general-purpose hash functions like MD5 or even SHA256 shouldn’t be used, due to the ease with which they can be computed. Password-specific hash functions like bcrypt and PBKDF2 (which in turn uses general hash functions) should be employed instead.

We’re OK using the Werkzeug API that ships with Flask, however, which makes use of PBKDF2 in the version used here. Let’s augment our previous example.

Adding an import for our hash API functions:

 from flask import Flask, request 
 from werkzeug.security import generate_password_hash, check_password_hash 
 app = Flask(__name__)

We also need to store our passwords using the password’s hash, rather than the password itself:

 users = [ 
     { 'name': 'bob', 'hash': generate_password_hash('QNJWzjRc') } 
 ]

Note: Needless to say, we would never ever hardcode passwords in a production application.

Finally, we just need to update our user_allowed function to check the user’s password against the stored hash:

 def user_allowed(username, password): 
     # any() returns a real boolean; a bare filter() object is always truthy in Python 3 
     return any( 
         user['name'] == username and check_password_hash(user['hash'], password) 
         for user in users 
     )

Once again, we can successfully retrieve our magic number with the right password, and are disallowed from viewing it otherwise. Now, however, if our user list is leaked, an attacker doesn’t learn anything too useful.

 (Pdb) p users 
 [{ 
     'hash': 'pbkdf2:sha1:1000$WNYvjwGm$241780fe007f981b9c9959b3416693bb689b9f91', 
     'name': 'bob' 
 }]

And now we have a pretty good layer of security around our stored passwords.
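
The stored string in the debugger output above packs three pieces into one value, separated by dollar signs: the method, the salt, and the resulting hash. A rough sketch of pulling them apart follows; parse_stored_hash is a hypothetical helper written for this example, not a Werkzeug function, and the layout is inferred from the example string.

```python
def parse_stored_hash(stored):
    # Layout assumed from the example above: method$salt$hash.
    method, salt, hash_hex = stored.split("$", 2)
    # The method itself reads as name:digest:iterations.
    name, digest_name, iterations = method.split(":")
    return {
        "name": name,
        "digest": digest_name,
        "iterations": int(iterations),
        "salt": salt,
        "hash": hash_hex,
    }

stored = "pbkdf2:sha1:1000$WNYvjwGm$241780fe007f981b9c9959b3416693bb689b9f91"
parts = parse_stored_hash(stored)

print(parts["name"])        # pbkdf2
print(parts["digest"])      # sha1
print(parts["iterations"])  # 1000
print(parts["salt"])        # WNYvjwGm
print(len(parts["hash"]))   # 40
```

Because the salt and iteration count travel with the hash, check_password_hash has everything it needs to repeat the computation for a candidate password.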

Further Learning

More Reading & Other Resources