All code examples are available here.

Password storage is now mandatory in several standards and is a well-known and solved problem. Not following best practices for password storage could transform a hack into a nightmare for business and developers.

TL;DR

Don’t store passwords in clear-text. Don’t hash passwords with MD5. Use dedicated algorithms for hashing passwords. Use a distinct salt per user.

Vulnerable code

It’s universally acknowledged that storing clear-text passwords is a bad idea. The recommendation is to hash them, using a cryptographically secure hashing algorithm. Here are some examples of WHAT TO NOT DO.

from hashlib import md5

def hash_password(password):
    return md5(password)
import bcrypt

def hash_password(password):
    return bcrypt.hashpw(password, '$2b$12$/msH5YVuJWZTfw.AdfLxS.')
import os
from passlib.hash import pbkdf2_sha256

def hash_password(password):
    return pbkdf2_sha256('sha256', password, os.urandom(16), iterations=1)

All of theses examples are vulnerable; see below why.

Vulnerability explanation

If an attacker gains access to your database, apart from personal information about your customers, he will try to recover your customers passwords:

Clear-text passwords

First, YOU SHOULD NEVER STORE PASSWORDS IN CLEAR-TEXT. If an attacker gains access to your database, they just need to copy the clear-text password and log in with it.

There are known defences to make the attacker’s life more difficult.

Hash functions

A Hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. - Wikipedia

Hash functions have the property that any given input (string, bytes, …) will always give the same hash as output, and that distinct inputs have a very high probability of producing distinct hashes.

Hashes are used in every language for hash tables, caches, finding duplicate records, etc.

Simple hash functions

Thanks to the hash function’s properties, the Internet has long recommended the use of a simple hash function like MD5 or more recently SHA1 for hashing passwords. The recommendation is better than storing password in clear-text in the database but attackers quickly found a solution.

It’s quite easy and quick to compute the MD5 of most common passwords, as MD5('password') will always be 5f4dcc3b5aa765d61d8327deb882cf99. So the use of rainbow tables quickly became a valuable tool for attackers. If you see the value 5f4dcc3b5aa765d61d8327deb882cf99 in a data leak, you can look in the rainbow table and it will tell you that the clear-text password is password.

Don’t trust me? Go check yourself here.

The crack is basically immediate, all possible passwords are precomputed, the complexity of cracking your password with MD5 or sha1 is basically: O(1). Thus, in reality, MD5 is no better than storing passwords in clear-text.

Wordlist

The problem with brute-forcing passwords is that you need to have a list of “known” passwords in the first place. It’s called a wordlist and is basically a clear-text file of passwords found in previous hacks, usually the most common ones like 123456 or qwerty. The problem is that most users uses these simple password or a combination of them. Security researchers estimate that as many as 50% of passwords found in leaks are from the top 25% most common passwords in 2016. Passwords crackers can also apply some mangling rules, like replacing a letter by a number qw3rty or mixing the case QwErTy. So, even with a strong hash, attackers can determine the passwords of many accounts in your system.

Salt

One solution that quickly came is to use a salt; instead of hashing just the password, you hash the password concatenated with a random value that you store. So MD5('password' + 'SalT3D') == 'e7c3b1834297faf1ea92754f41daf14f' != MD5('password'). This solution blocks the most common rainbow tables but the hackers quickly found a new workaround.

As the salt value is the same across all the stored passwords, the attacker can generate a rainbow table with your salt value; such computations become easier as computer power increases and parallelization improves.

With this solution, the attacker needs to generate a rainbow table, augmenting the complexity to crack your passwords to: O(N), N being the number of passwords “knowns”.

Per-user salt

Instead of using a global salt, the solution is to generate a salt per user. The downside is that you have to store the salt value along with your passwords to be able to check the password.

In this way, the attacker would have to generate a rainbow table per salt value. This quickly raises the complexity of the attacker’s work. If you have 100 users and one distinct salt per user, the attacker would have to compute: 100 (users salt) * 100 (number of password to try) = 10000 hashes to crack all of your customers passwords.

With this solution, the attacker needs to generate a rainbow table per user salt, augmenting the complexity to crack your passwords to: O(M*N), M being the number of customers and N being the number of passwords of a password list.

Let’s slow the world

One solution to counter the augmentation of compute power and use of GPU to crack passwords is to use hash functions designed specifically for password hashing. They are called Cryptographic hash functions. They have the same basic property in that they are deterministic, but they have another property fundamental for hashing passwords: It is infeasible to generate a message from its hash value except by trying all possible messages. This means that if an attacker has a cryptographic hash, it will need to brute-force it, try an enormous number of passwords, hash them with your user salt, and check if they match.

Simple hashing functions are pretty quick and easy to parallelize, latest reports show up to 200 billion per second, yes 200,000,000,000 hash per second for MD5 and only 68 billion for SHA1.

A slower hashing algorithm can make brute-force attackers harder by forcing them to take longer. Algorithms like BCrypt or pkbdf2 are very well suited for password hashing because the algorithmic complexity can be tuned at runtime. You can set the number of iterations required to compute a hash, so if one of these algorithms becomes fast to crack, you need only double the iterations, and you should be safe.

Checking back on the report above, with BCrypt, we are down to 105 thousand hashes per second for BCrypt with 5 iterations and only 9680 hashes per second for pbkdf2 with 1000 iterations, and these iterations are pretty low according to present standards.

Closing thoughts

All these good practices have the objective to slow down a hacker that would have access to your database passwords and try to crack them. There are two complimentary solutions:

And please, do not write your own hashing function or compose them without understanding what you are doing. And, no, MD5(MD5('password') + 's') is not inherently safer.

Here is a graph of cracking time per algorithm:

This image has been generated from this Google Spreadsheet.

Non-vulnerable code

If you are using Django, just follow the Django documentation and use standard Django password hashers.

If you are not using Django, you can use passlib which is easily installable, maintained and with a very opinionated API: For example, you cannot hash a password without a salt. It also support multi-factor auth which is a very good thing.

Example of attack

We have provided a script to generate some data in a format accept by John The Ripper. You can use it to generate password hashing by varying some parameters and see the impact it haves on the time needed to crack them. It uses internaly the Django default password hashers, so the hashes are “real”. They have the same default settings as Django and can be cracked as fast as real ones.

It pick a random password from a wordlist named password.list so it’s the worst case scenario as the password cracker also have access to the password list. The basic list is quite small (~3000 passwords), if you want to have a more realistic scenario, replace password.list by one of password list downloaded here.

Here are some examples of run with MD5, BCrypt and PBKDF2:

Generate some data locally

The script generate_data.py can be used to generate some data, first install the requirements:

pip install -r requirements.txt
usage: generate_data.py [-h] [--salt SALT] [--pepper PEPPER] [--rounds ROUNDS]
                        [--n N]
                        algorithm

Generate username / password list for john the ripper

positional arguments:
  algorithm        which algorithm to use (md5, bcrypt or pbkdf2)

optional arguments:
  -h, --help       show this help message and exit
  --salt SALT      which kind of salt to use (none, same, user)
  --pepper PEPPER  a pepper
  --rounds ROUNDS  Number of rounds to use
  --n N            number of users/password to generate

For example to generate 5 usernames / password with bcrypt and a different salt per user, you can do:

$> python generate_data.py --salt user --n=5 bcrypt
Generating 5 usernames/password with algorithm 'bcrypt' with salt 'user' [12 rounds] and pepper ''

drakeandrea:$2b$12$K2GNkU1jZBnrHntKRTeE/OauVFlpC3JzQ/5ZEZQL9h8gJMJhPTKre
darlene71:$2b$12$p69/XRS1fd/N.EYn/saBhO0DQoh/SX8SR6XnA5BpN/eJF523m.U7e
brandonbates:$2b$12$fRTHDpoiWePbm1dU8AihrOfkaqNsgXkyxcRbDAhIEYEvfwA0TNYKK
danielle29:$2b$12$Q5dqtlRuWzh.wZB1ZADCGOb3HOgOV2mjN3.x5/n33cS8IQxLLvi9O
jennifer93:$2b$12$l7m9vmEKLqfPrvLURDVn/OEbnkdDa0cPZugA/4O56B3mchkVRkEom

Output also write in file `passwd`
You can crack it with `john --wordlist=password.list --format=bcrypt passwd`

Then to crack it with john the ripper:

$> john --wordlist=password.list --format=bcrypt passwd

It can take a long time to crack, it’s intended.

Use docker

We also provided a Dockerfile that you can use if you don’t want to install John.

First you can build with:

docker build -t safe_password_storage_python .

Then launch it with:

docker run -t -i --rm=true safe_password_storage_python

You can then use the same commands as shown above.

References:

Edit this page on Github