[integrity]: latency tradeoffs

Current examples seem to be using a single hash to authenticate a
whole resource. However, that requires that the whole resource be
buffered before any of it can be used. This extra latency might well
outweigh any performance benefits that one might wish to gain by using
integrity.

This problem isn't fundamental. If we split the resource into chunks
then we have long known how to authenticate a number of objects with a
single hash using a Merkle tree. Consider a binary tree where the
leaves are hashes of chunks and the interior nodes are hashes of their
children. (For details, see
http://tools.ietf.org/html/rfc6962#section-2.1 or many other sources.)

Given a Merkle tree, each chunk in the resource can be proceeded by a
logarithmic number of hashes which give all the missing nodes up the
tree to the root and, by second-preimage resistance, a single hash in
the source document can authenticate any chunk-aligned subset of the
resource.

The chunk size is the amount of data that must be buffered before
being processed. A smaller chunk means lower latency, but more
overhead.

For resources that aren't accessed with Range requests, we can
simplify things a little by assuming that it'll always be received in
order.

Rather than a balanced binary tree for the Merkle tree, the tree can
be made completely unbalanced. Specifically, the left child of every
node is a data chunk and the right node (if any) is an interior node.
This means that the root hash covers the first chunk and the next
interior node. If the second chunk is preceded by this interior node,
and so on, then the whole file is secured with only a single hash per
chunk.

Both of the above require that the resource data itself be altered to
add extra data. This means that a resource suitable for integrity
cannot be used without it and vice versa. If this is unacceptable in
some cases then it's very easy to put a number of hashes straight into
the HTML: all the interior nodes of a Merkle tree could be given. The
downside is that a large amount of hash data might delay loading of
the remainder of the HTML.

In short, this means that the attribute should be something like:

integrity="8192;unbalanced;progressive;sha-256;6b4j122..."

I.e. it contains the chunk size, the tree shape, whether the hashes
are inline in the HTML or loaded progressively in the resource data,
the hash function and the data itself. (I'm not trying to propose a
concrete syntax.)


Cheers

AGL

Received on Tuesday, 14 January 2014 20:09:10 UTC