Re: [integrity]: latency tradeoffs

This is a great point. Are there any cases in CSP currently where we have a
syntax for describing Merkle trees, or would we need to come up with one? I
think it definitely makes sense to support such structures.
--Joel


On Tue, Jan 14, 2014 at 12:08 PM, Adam Langley <agl@google.com> wrote:

> Current examples seem to be using a single hash to authenticate a
> whole resource. However, that requires that the whole resource be
> buffered before any of it can be used. This extra latency might well
> outweigh any performance benefits that one might wish to gain by using
> integrity.
>
> This problem isn't fundamental. If we split the resource into chunks
> then we have long known how to authenticate a number of objects with a
> single hash using a Merkle tree. Consider a binary tree where the
> leaves are hashes of chunks and the interior nodes are hashes of their
> children. (For details, see
> http://tools.ietf.org/html/rfc6962#section-2.1 or many other sources.)
>
> Given a Merkle tree, each chunk in the resource can be proceeded by a
> logarithmic number of hashes which give all the missing nodes up the
> tree to the root and, by second-preimage resistance, a single hash in
> the source document can authenticate any chunk-aligned subset of the
> resource.
>
> The chunk size is the amount of data that must be buffered before
> being processed. A smaller chunk means lower latency, but more
> overhead.
>
> For resources that aren't accessed with Range requests, we can
> simplify things a little by assuming that it'll always be received in
> order.
>
> Rather than a balanced binary tree for the Merkle tree, the tree can
> be made completely unbalanced. Specifically, the left child of every
> node is a data chunk and the right node (if any) is an interior node.
> This means that the root hash covers the first chunk and the next
> interior node. If the second chunk is preceded by this interior node,
> and so on, then the whole file is secured with only a single hash per
> chunk.
>
> Both of the above require that the resource data itself be altered to
> add extra data. This means that a resource suitable for integrity
> cannot be used without it and vice versa. If this is unacceptable in
> some cases then it's very easy to put a number of hashes straight into
> the HTML: all the interior nodes of a Merkle tree could be given. The
> downside is that a large amount of hash data might delay loading of
> the remainder of the HTML.
>
> In short, this means that the attribute should be something like:
>
> integrity="8192;unbalanced;progressive;sha-256;6b4j122..."
>
> I.e. it contains the chunk size, the tree shape, whether the hashes
> are inline in the HTML or loaded progressively in the resource data,
> the hash function and the data itself. (I'm not trying to propose a
> concrete syntax.)
>
>
> Cheers
>
> AGL
>
>

Received on Tuesday, 14 January 2014 22:52:19 UTC