- From: Adam Langley <agl@google.com>
- Date: Tue, 14 Jan 2014 15:08:22 -0500
- To: public-webappsec@w3.org
Current examples seem to be using a single hash to authenticate a whole resource. However, that requires that the whole resource be buffered before any of it can be used. This extra latency might well outweigh any performance benefits that one might wish to gain by using integrity. This problem isn't fundamental. If we split the resource into chunks then we have long known how to authenticate a number of objects with a single hash using a Merkle tree. Consider a binary tree where the leaves are hashes of chunks and the interior nodes are hashes of their children. (For details, see http://tools.ietf.org/html/rfc6962#section-2.1 or many other sources.) Given a Merkle tree, each chunk in the resource can be proceeded by a logarithmic number of hashes which give all the missing nodes up the tree to the root and, by second-preimage resistance, a single hash in the source document can authenticate any chunk-aligned subset of the resource. The chunk size is the amount of data that must be buffered before being processed. A smaller chunk means lower latency, but more overhead. For resources that aren't accessed with Range requests, we can simplify things a little by assuming that it'll always be received in order. Rather than a balanced binary tree for the Merkle tree, the tree can be made completely unbalanced. Specifically, the left child of every node is a data chunk and the right node (if any) is an interior node. This means that the root hash covers the first chunk and the next interior node. If the second chunk is preceded by this interior node, and so on, then the whole file is secured with only a single hash per chunk. Both of the above require that the resource data itself be altered to add extra data. This means that a resource suitable for integrity cannot be used without it and vice versa. If this is unacceptable in some cases then it's very easy to put a number of hashes straight into the HTML: all the interior nodes of a Merkle tree could be given. The downside is that a large amount of hash data might delay loading of the remainder of the HTML. In short, this means that the attribute should be something like: integrity="8192;unbalanced;progressive;sha-256;6b4j122..." I.e. it contains the chunk size, the tree shape, whether the hashes are inline in the HTML or loaded progressively in the resource data, the hash function and the data itself. (I'm not trying to propose a concrete syntax.) Cheers AGL
Received on Tuesday, 14 January 2014 20:09:10 UTC