Calculation of the hash width in cache-digests

The draft describes how to select P to achieve a particular false
positive probability.  The net effect of this is that you get
approximately the probability.

I think that this would be better if there was a description of how to
decide on an upper bound for the false positive probability.

The calculation is a little less clean:

n is the number of resources
p is the lower bound of the probability of false positive

P is the number of fixed bits allocated to each entry
N+P is the width of the hash output

All of these are integers, except p, which is a number between 0 and 1.

The current draft chooses:

P = ??(0 - log2(p)) - it's not clear, but I imagine that this uses floor()
N = round(log2(n))

I propose that we instead do:

P = round(0 - log2(p))
N+P = ceil(log2(n)-log2(p))

This is, I think, a much better calculation.  We would still benefit
from encoding N and P separately.

Received on Monday, 18 July 2016 13:28:33 UTC