Saturday, July 8, 2017

Most Random Password Generators are Bad

Good for you!  You're taking the advice of experts and clicking "Generate Password", resulting in 10 characters of gibberish.  There!  Now your password will take thousands of years to crack.

Um ... not necessarily.  Try a few days.


RAND() LIMITS ENTROPY TO 32 BITS

When using the pseudo-random number generator supplied by most language libraries, the entropy of the resulting password is limited to 32 bits!

Let's take XKCD's algorithm: ~2000 word dictionary, randomly select 4 words, produces 2000**4 different possible passwords, which is 16 trillion.  Log base 2 gives 43.9 bits of entropy.

But using a pseudo-random number generator with a 32-bit initial seed means that it will only generate 2**32 different sequences, or 4 billion.  That's .027% of the total!  In other words, 99.97% of the possible XKCD-style passwords CANNOT BE GENERATED by that program!

Normally, you can add more bits of entropy by either expanding the dictionary size (the number of words to choose from), or increasing the number of words in the password.  But because of the pseudo-random number generator, you are STUCK at 32 bits of entropy.  An attacker could even pre-generate the 4 billion possible XKCD-style passwords that a standard Linux rand() produces.

My point is not that 32 bits bits of entropy aren't enough, it's that you aren't necessarily going to get what you think you're getting if you use the stock pseudo-random number generator.

So if you're running somebody's application and you click "generate random password" and you see a string of gibberish that claims to have a crack time of thousands of years, it is probably wrong.  32 bits of entropy at 1000 guesses per second has a brute force crack time of under 50 days.  (And modern crackers go MUCH faster than 1000/sec.)


TRULY RANDOM NUMBERS ELIMINATE THE LIMIT

For my program, I offer the "-r" option, which reaches out to https://random.org to get random numbers.  It doesn't need very many -- you only need 4 random numbers to generate an XKCD-style password -- but the important feature is that random.org is truly random.  There is no seed.  Each number is uniformly random and independent from the previous number.  (Or at least, so claims the owner of random.org.)  I'm pretty sure this removes any artificial limit on entropy, so you can get as much entropy as you want by increasing the dictionary size and/or the number of words in the password.

Using random.org is not the only way to solve this problem.  Have you ever generated an SSL certificate?  It can take several seconds while the software "generates" enough entropy for long key lengths.  I'm not personally familiar with how that is done, but I've heard the the OS uses external physical events, like keystrokes, network interrupts, etc.  I think I've heard that it also uses disk interrupts, which makes me wonder if SSD drives make it harder for kernels to generate entropy.

If you're going to be demanding a lot of entropy for your application, you should not abuse random.org.  Instead learn how to use locally-generated entropy.

The vast majority of on-line password generators are written in Javascript.  I'm not sure how to get truly random numbers (i.e. entropy) in Javascript, but this might be a good starting point.

(By the way, a bit of reading on my part shows me that I have a lot to learn.  But my reading to date does reinforce my primary point: simply using rand() or a similar/derived function does not produce passwords that take thousands of years to crack.  At best they rely on "security through obscurity".)


PASSWORD MANAGERS' RANDOMIZER???

I'm a little worried about the "random password generators" included in password managers.  The idea is that you should have a different password for every on-line account, and you let the password manager deal with the hundreds of passwords you end up with.  Since you don't need to remember, or even type those passwords, you might as well make them be random character gibberish.  Only the password manager's master password needs to be memorized.

However, if your password manager just uses the normal pseudo-random number generator in the system, that sequence of random characters will not have as much entropy as you think.  I can tell you that LastPass's online password generator just uses Javascript's get_random() function, which only has 32-bits of entropy.  Now maybe their laptop application uses /dev/random, but also maybe the fact that their on-line generator uses built-in random indicates they didn't give the issue much thought.

I haven't done an exhaustive search, but I would wager that 90% of "generate password" functions just use the language's default random number generator, which has a 32-bit seed (or less!).

My suggestion is to use https://www.random.org/passwords/ to generate your gibberish passwords, or my program to generate XKCD-style passwords.


SEED V.S. PERIOD

The rand() man page says that Linux rand() uses the same algorithm as random().  And srandom()'s man page says that the seed is an unsigned int, which is 32 bits.  It also says:
The period of this random number generator is very large,
approximately 16 * ((2^31) - 1).
I.e. the period is approximately 34 billion, which is about 35 bits.  But the seed is 32 bits.  This means that you cannot start the random number generator at any arbitrary point in its period.  Even if you figure out a way to fully-leverage all 35 bits of random()'s period, that still gives you a crack time of 397 days, at 1000 guesses/sec.  And by the way, modern password crackers go much faster than 1000/sec.

XKCD-style Password Generator

I got to thinking about passwords again today.

I wrote my own program to produce XKCD-style passwords from a list of 2126 common words, and calculated some stats.  I reproduced XKCD's calculation of 44 bits of entropy for 4 randomly-selected words.  And I made a few mildly-interesting discoveries, and one more-interesting realization.


PASSWORD LENGTH: MORE = BETTER?

My average XKCD-style password length is 20.8 letters (over a large sample), which is a lot of typing.  So I decided to limit word size.  By filtering my list of 2129 words to nothing longer than 4 letters, I ended up with 709 words.  That's not many, and 4 of them together only gives 37 bits of entropy.  Not so hot.  But if you string 5 of 709 words together, you get 47 bits of entropy, which is better than XKCD!  And the average password drops to 18.3 characters.

I find that interesting: shorter passwords which produce more bits of entropy than longer passwords.  Seems counter-intuitive, until you realize that opening it up to the full 2129 words increases average word length more than it increases entropy.  (See below for the math.)


THE PASSWORDS

So, what do these passwords look like?  Here's 10 of the XKCD-style: 4 words from the 2126 word set:

password: MostlyRelativeSpinAdvanced
length: 26
password: ForBasicallyThinkingExplain
length: 27
password: CookieArmyMysteryConference
length: 27
password: ExpectConvertQuarterbackPresentation
length: 36
password: EverybodyProductHotDemonstrate
length: 30
password: RockIndexWellFloat
length: 18
password: BehaviorNearlyPromotePercentage
length: 31
password: PocketSurviveFourLab
length: 20
password: MuchWeekWillAnd
length: 15
password: DivideMorePeakSeveral
length: 21


So, how easy are those to remember?  Memorizing an XKCD-style password is about creating a mental picture or story around it.  Use some imagination.  It usually helps to make it amusing.  How about the first one: "MostlyRelativeSpinAdvanced"?  Well, I'm a bit of a science geek, so this one makes perfect sense.  You have a particle stream, and most of the particles are moving at relativistic speeds.  So measuring each particle's spin is a pretty advanced thing to do.  Hmm ... what's the amusing part of that?  Oh maybe that an actual physicist would roll his eyes at my explanation and say that I don't know the first thing about particle physics.  But basically, I was able to imagine a mini-story or mental picture for each of those passwords, so while I might not be able to memorize all 10, I could easily memorize one of them.

What about the shorter passwords consisting of 5 words from the 709 words of 4 letters or less?

password: BuyWingSadRideSeed
length: 18
password: SeedPairTankJailDo
length: 18
password: PanBuryDenyDataOld
length: 18
password: GeneRiceTeaYetSin
length: 17
password: WallJailLabNextTent
length: 19
password: HallSnapCashRichRead
length: 20
password: WarmUsKeepRoseLess
length: 18
password: PortMarkSirYouLeaf
length: 18
password: HiAgoHipAnyBe
length: 13
password: EaseSkyRealTossFate
length: 19


Even though those are shorter (and more secure) passwords, I guess I find them more difficult to remember them.  It's about creating a mental picture or story around those words.  Since the words are random, they don't come out in any conceptually correlated way.  So you stretch your imagination to encompass them.  The more words in the password, the more you have to stretch.

Take the last password up there, "EaseSkyRealTossFate", and drop that last word to make it 4 words: "Ease sky real toss".  My first thought is that "toss" is the children's game "ring toss".  Sky and ease kind of fit since the game is usually pretty easy and you toss things towards the sky.  The word "real" is kind of left out, but I imagine throwing something "real", like a laptop or a dinner plate, instead of a game piece.  So I imagine the ease of tossing a real laptop into the sky.  Yeah, that's stretching the imagination a bit, but maybe not too much.  I could pretty easily remember EaseSkyRealToss.

But now throw "fate" in there and my whole mental picture falls apart.  I guess I could say that when the laptop lands, its fate will be sealed, but ... not sure why ... but I would have much more trouble remembering it.

So I'll be sticking to 4 words and more typing.


THE MATH

Passwords are basically taking a set of N things, and taking L of them out with replacement.  For example, a 4-digit PIN consists of a set of 10 digits (N=10), and you take 4 digits out (L=4) with replacement.  The "with replacement" simply means that you might take the same digit out more than once (e.g. 2338).  So the entropy of a 4-digit PIN is 10**4 *(10 to the power of 4), which is 10,000.  To get that in terms of bits, take the log base 2 of it to get 13.  So 13 bits of entropy.

Another example: 8 randomly-selected letters for a password.  Let's assume lower-case only, and no digits or special characters.  The set of N things is the letters of the alphabet, so N=26.  By taking 8 characters, L=8.  26**8 = 208 billion.  Log base 2 of that is 37 bits of entropy.  Cool.  Now let's do random upper/lower case.  N=52, and 52**8 = 53 trillion, giving 45.6 bits of entropy.  Add in 0-9: N=62, 62**8 = 218 trillion, giving 47.6 bits of entropy.

So, back to my XKCD-style passwords.  My original set of 2126 words, taking 4 at a time, gives 2126**4 = 20 trillion, which is 44.2 bits.  My reduced set of 709 short words, taking 5 at a time gives 709**5 = 179 trillion, which is 47.3 bits.

However, see my next blog post for an observation about random password generators and entropy.


MORE WORDS?

My list of 2126 words actually comes from a list of 3000 words from Education First.  I filtered it to limit word length to 7 or fewer characters, resulting in my 2126 words.  Note to the rigorous: you'll find that I'm 2 words short; it made my code easier to ignore the first and last words.

So how about if I remove that filter and pick 4 words from the entire set of 2998?

2998 ** 4 = 80 trillion, which is 46 bits.  I.e. going from 2126 words to 2998 increases the entropy by 2 bits.  My average password length jumps to 25.6, which is 5 more characters.  I tried a few other word length limits and decided that 7 is best.


THE PROGRAM

See https://github.com/fordsfords/pgen Be sure to use "-r" if generating an actual password you want to use.