Don't Scratch Your Entropy

I have a strong conviction that 99% of «security experts» do not know the definition of entropy. This conviction certainly seems wildly deranged to you, unless you know the definition in question. So let's begin with the definition, by the book.

H = -sum(p_i * log(p_i))

This is a function of the probability vector P = {..., p_i, ...} that represents a distribution of a random variable. Entropy is a characteristic of a distribution of a random variable. No more and no less.
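For readers who prefer code to formulas, here is a minimal sketch of the textbook definition, H = -sum(p_i * log(p_i)). The function name `shannon_entropy` and the `base` parameter are my own choices for illustration, not part of any standard API. Note what the function takes as input: a probability vector, nothing else.

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy H = -sum(p_i * log(p_i)) of a probability vector."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i == 0 are skipped: the limit of p * log(p) as p -> 0 is 0.
    return sum(-p * math.log(p, base) for p in probs if p > 0)

# The input is a distribution, never a particular outcome.
fair_coin = shannon_entropy([0.5, 0.5])    # about 1 bit
biased_coin = shannon_entropy([0.9, 0.1])  # less than 1 bit
print(fair_coin, biased_coin)
```

There is no sensible way to call this function on a string such as a password; you have to supply a distribution first.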

Let us find the entropy of your password. Your password's distribution vector is {1}, therefore your password's entropy is:

H = -1 * log(1) = 0

Your password's entropy is ZERO. Try log(1) in different bases on different computers if you are unsure.
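If you would rather not take my word for it, the check is a few lines long. This is a small sketch using only the standard `math` module; the particular list of bases is an arbitrary choice of mine.

```python
import math

# The distribution of a password that is already chosen: one outcome, probability 1.
distribution = [1.0]

# Entropy of that distribution in several logarithm bases.
entropy_by_base = {
    base: sum(-p * math.log(p, base) for p in distribution)
    for base in (2, math.e, 10)
}

for base, h in entropy_by_base.items():
    print(f"base {base}: H = {h}")
```

Whatever base you pick, log(1) is zero, and so is the entropy.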

A sophisticated reader may ask: What if we apply entropy to the password creation procedure? It is doable in a seemingly reasonable way. We can model any password creation procedure as a random choice from a pool of candidate passwords, then characterize the password distribution over this pool with the entropy. The resulting number will tell us how much information our procedure represents. So what? Is this number of any use in the context of «password security»?

Security experts usually jump in here and claim that this number represents the strength of the produced password. For the sake of argument, let's accept this claim and construct a password creation procedure as follows: the password pool is {«123», «password», «gtfr3467ujhbvcddgy6r5ddsefvvs», «###»}; we toss two coins and pick one of these four according to the outcome of the tosses.

The entropy of this procedure, assuming the coin tosses produce uniformly distributed outcomes and taking the logarithm in base 2, is:

H1 = -(1/4) * log2(1/4) * 4 = 2

Now, according to mainstream computer «science» (as dictated by the NIST recommendations), we must label all our passwords with this entropy value:

«123» has the entropy based strength 2
«password» has the entropy based strength 2
«gtfr3467ujhbvcddgy6r5ddsefvvs» has the entropy based strength 2
«###» has the entropy based strength 2
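The labeling exercise can be reproduced mechanically. The sketch below assumes a uniform pick from the four-password pool (two fair coin tosses) and measures entropy in bits; the variable names are mine.

```python
import math

pool = ["123", "password", "gtfr3467ujhbvcddgy6r5ddsefvvs", "###"]

# Two fair coin tosses: every password is picked with probability 1/4.
probs = [1 / len(pool)] * len(pool)

# Entropy of the procedure, in bits.
h1 = sum(-p * math.log2(p) for p in probs)

# The "expert" step: stamp every password with the entropy of the procedure.
labels = {password: h1 for password in pool}

for password, strength in labels.items():
    print(f"{password!r} has the entropy based strength {strength}")
```

The number is computed from `probs` alone. The passwords themselves never enter the calculation, which is why all four get the same label.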

Looks somewhat counterintuitive, and not at all what you are used to thinking about «entropy» when it is pronounced by a respectable «expert» with a straight face.

Furthermore, we can define another password creation procedure: toss one coin and pick from the pool {«123», «gtfr3467ujhbvcddgy6r5ddsefvvs»}. The entropy of this procedure is 1, half that of the previous one. Therefore:

the password «123» has the entropy based strength 1.

This is the very same password «123» that also has the strength 2. A password has two different strengths simultaneously. If we understand «strength» as a measure of how likely the password is to be guessed by an attacker, then a single password cannot have two different values, because the password alone is the input to the hypothetical attack, not the password creation procedure.
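The contradiction is easy to reproduce. In the sketch below both procedures are assumed to pick uniformly from their pools; the helper `procedure_entropy` and the procedure names are hypothetical, introduced only for this illustration.

```python
import math

def procedure_entropy(pool):
    """Entropy in bits of a uniform random pick from `pool`."""
    p = 1 / len(pool)
    return sum(-p * math.log2(p) for _ in pool)

procedures = {
    "two coins": ["123", "password", "gtfr3467ujhbvcddgy6r5ddsefvvs", "###"],
    "one coin": ["123", "gtfr3467ujhbvcddgy6r5ddsefvvs"],
}

# Every "strength" the password 123 receives, one per procedure that can produce it.
strengths_of_123 = {
    name: procedure_entropy(pool)
    for name, pool in procedures.items()
    if "123" in pool
}
print(strengths_of_123)
```

The same string ends up with two different numbers, depending only on which pool we claim it was drawn from.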

Thus, accepting the premise that the password creation entropy characterizes a produced password, we end up with a contradiction. Entropy is thereby demonstrated not to be a function of a password. However, in a slightly less insane world I would have skipped this lengthy demonstration altogether. Entropy is simply defined as a function of the distribution of a random variable; who would have thought that it is also NOT a function of anything else!

But I am not the champion of taking the longer route to obvious conclusions. Matt Weir has conducted a meticulous experiment with leaked passwords to make the statement: entropy based password strength measures do not provide any actionable information to the defender, and also: there is no way to convert the notion of Shannon entropy into the guessing entropy of password creation policies. In other words, he gave us experimental evidence that entropy is irrelevant to the password strength problem. Of course it is irrelevant! This irrelevance is plainly written in the definition of entropy. Matt, you could have just read the definition and said: corollary, dear «experts», don't scratch your entropy. Nevertheless, these experimental results are of great value to humanity, and I am glad we have them; the more evidence the better. In this world of imbeciles, even the most obvious facts require tons of «proofs», since the «experts» do not get along with mathematical logic very well.
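The gap between Shannon entropy and guessability can also be seen in a toy example. To be clear, this is not Weir's experiment and uses none of his data: the two distributions below are hand-built, hypothetical ones, chosen so that their Shannon entropy is the same while an attacker's chance of success on the first guess differs wildly.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a probability vector."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def first_guess_success(probs):
    """Chance that an attacker who knows the distribution wins on the first guess."""
    return max(probs)

# Hypothetical policy A: 64 equally likely passwords.
uniform = [1 / 64] * 64

# Hypothetical policy B: one password chosen half of the time,
# plus 1024 rare passwords sharing the other half.
skewed = [0.5] + [0.5 / 1024] * 1024

for name, probs in (("uniform", uniform), ("skewed", skewed)):
    print(name, entropy_bits(probs), first_guess_success(probs))
```

Both toy policies come out at 6 bits of entropy, yet under the skewed one every second password falls to the very first guess. The entropy figure alone tells the defender nothing about that.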

Still, there is more to the topic! Not only is the entropy of an accurate password creation model irrelevant to the problem of password strength, but such a model is not even attainable in real-life use cases. What distribution are you going to apply to human-created passwords, given that (a) humans are incapable of randomization and (b) the pool of passwords they choose from is not accessible to us, not even by vivisection of the brain? This fact makes entropy even worse than irrelevant; it makes entropy ARBITRARY: whatever distribution we assume for a human-created password, it is inevitably baseless, arbitrary garbage.

Let's recap:

Entropy is a function of the distribution of a random variable.

Corollary:

(a) your password's entropy is 0

(b) every «security expert» pronouncing «entropy» without defining the distribution, or at the very least the pool of candidate passwords, is a brain-dead buffoon.
