## Thursday, November 05, 2015

There are some things you just know.
The gold standard for password generation was set by this xkcd cartoon.

Let's look at password entropy (from the Wikipedia article).
"It is usual in the computer industry to specify password strength in terms of information entropy, measured in bits, a concept from information theory. Instead of the number of guesses needed to find the password with certainty, the base-2 logarithm of that number is given, which is the number of "entropy bits" in a password.

"A password with, say, 42 bits of strength calculated in this way would be as strong as a string of 42 bits chosen randomly, say by a fair coin toss. Put another way, a password with 42 bits of strength would require $2^{42}$ attempts to exhaust all possibilities during a brute force search. Thus, adding one bit of entropy to a password doubles the number of guesses required, which makes an attacker's task twice as difficult."
...

"For passwords generated by a process that randomly selects a string of symbols of length, L, from a set of N possible symbols, the number of possible passwords can be found by raising the number of symbols to the power L, i.e. $N^L$. Increasing either L or N will strengthen the generated password.

"The strength of a random password as measured by the information entropy is just the base-2 logarithm or $\log_2$ of the number of possible passwords, assuming each symbol in the password is produced independently. Thus a random password's information entropy, H, is given by the formula

$$H = \log_2 N^L = L \log_2 N = L \, \frac{\log N}{\log 2}$$

where N is the number of possible symbols and L is the number of symbols in the password. H is measured in bits. In the last expression, log can be to any base."

If we take logs to base 10 and consider the word 'cat' (where the three symbols are each taken from the set of 26 lower-case characters {a-z}) then the entropy is:

3 * (log 26 / log 2) = 3 * (1.415 / 0.3010) = 3 * 4.7 = 14 (approx).

In the xkcd cartoon, the phrase used is "correcthorsebatterystaple", which I make to be 25 characters. These are drawn from the 26 lower-case symbols {a-z}, which deliver 4.7 bits of entropy per character. (Handy table for DIY calculations here).

This would give a pretty impressive total of 25 * 4.7 = 117.5 bits of entropy.
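The arithmetic for 'cat' and for the 25-letter phrase can be checked with a few lines of Python. This is only a sketch of the formula H = L * log2(N); the function name `random_string_entropy` is made up for illustration, and it assumes every character is picked independently and uniformly from the 26 lower-case letters.

```python
import math


def random_string_entropy(length: int, alphabet_size: int) -> float:
    """Entropy in bits of `length` symbols, each drawn independently and
    uniformly from `alphabet_size` possible symbols: H = L * log2(N)."""
    return length * math.log2(alphabet_size)


# Both examples treat the letters as random picks from {a-z}, so N = 26.
cat_bits = random_string_entropy(len("cat"), 26)
phrase_bits = random_string_entropy(len("correcthorsebatterystaple"), 26)

print(f"cat: {cat_bits:.1f} bits")                          # cat: 14.1 bits
print(f"correcthorsebatterystaple: {phrase_bits:.1f} bits")  # 117.5 bits
```

The figure only holds if the letters really are random, which is exactly the assumption that fails for a phrase built from English words.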

But xkcd claims only 44; how come?

The cartoon's author, Randall Munroe, could be using the NIST heuristic analysis of 2004. After all, the phrase is not composed from random letters but words in English, which cuts out a lot of randomness*. If we use the NIST figures, I calculate we get 41 bits of entropy. But the whole area is actually rather vague.
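For anyone wanting to reproduce the 41-bit figure, here is a minimal sketch. It assumes the per-character scale usually quoted from the 2004 NIST guideline (SP 800-63, Appendix A) for user-chosen passwords: 4 bits for the first character, 2 bits each for characters 2 to 8, 1.5 bits each for characters 9 to 20, and 1 bit for each character after that. The guideline's optional bonuses for composition rules and dictionary checks are left out, and the function name is hypothetical.

```python
def nist_2004_estimate(password: str) -> float:
    """Rough entropy estimate in bits using only the per-character scale
    (assumed here from NIST SP 800-63, 2004); bonuses are ignored."""
    bits = 0.0
    for position in range(1, len(password) + 1):
        if position == 1:
            bits += 4.0      # first character
        elif position <= 8:
            bits += 2.0      # characters 2 to 8
        elif position <= 20:
            bits += 1.5      # characters 9 to 20
        else:
            bits += 1.0      # characters 21 onwards
    return bits


print(nist_2004_estimate("correcthorsebatterystaple"))  # 41.0
```

For the 25-character phrase this is 4 + 7 * 2 + 12 * 1.5 + 5 * 1 = 41 bits.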

Personally, I do the best I can. I use relatively insecure passwords for low-grade access to websites which seem to demand them, and much more cryptic (and unmemorable) passwords for stuff which is really important. By now I must have somewhere close to a hundred different username-password combinations.

Some people recommend investing in a password manager; I use Word with the 128-bit AES encryption turned on and a pretty high-entropy password which I have memorised. NIST recommends 80 bits of entropy for (non-military) high-security.

If I ever forget it, I'm toast!

---

* Bruce Schneier has some good advice (and a bit of a critique of the xkcd cartoon).

Johannes Weber has written the excellent 'Password Strength/Entropy: Characters vs. Words'.

Randall Munroe, xkcd author, explains what he was really doing (clue: count the little boxes representing bits under the symbols/words ...). Think of the words in the passphrase as quasi-symbols drawn from a huge 'alphabet': a set with thousands or hundreds of thousands of members. This is the value we give to N (so even $\log_2 N$ is reasonably large). Then with two to six words in the passphrase, L = 2 to 6, and the entropy L * $\log_2 N$ does what you need. In the cartoon, Randall assumes a publicly known dictionary of 2,048 words, from which each word in the passphrase is randomly chosen. This gives $\log_2 2048 = 11$ bits of entropy for each word, so the four words add up to the 44 bits claimed.
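The words-as-symbols view can be sketched in Python too. This is an illustration under stated assumptions, not Munroe's actual procedure: the helper names are invented, and `demo_words` is a stand-in list of 2,048 placeholder strings rather than a real dictionary. The standard-library `secrets` module supplies the random choice, since the entropy figure only holds if each word is picked uniformly at random.

```python
import math
import secrets
from typing import Sequence


def passphrase_entropy(num_words: int, dictionary_size: int) -> float:
    """Entropy in bits when each word is an independent, uniform pick
    from a dictionary of `dictionary_size` words: H = L * log2(N)."""
    return num_words * math.log2(dictionary_size)


def make_passphrase(words: Sequence[str], num_words: int) -> str:
    """Pick `num_words` words uniformly at random and join them with spaces."""
    return " ".join(secrets.choice(words) for _ in range(num_words))


# Placeholder dictionary (an assumption): swap in a real 2,048-word list.
demo_words = [f"word{i:04d}" for i in range(2048)]

print(passphrase_entropy(4, len(demo_words)))  # 44.0
print(make_passphrase(demo_words, 4))          # four random placeholder words
```

Note that the attacker is assumed to know the dictionary; the 44 bits come entirely from the random choice of words.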