Re: [integrity]: latency tradeoffs from Mike West on 2014-01-15 (public-webappsec@w3.org from January 2014)

From: Mike West <mkwst@google.com>
Date: Wed, 15 Jan 2014 10:16:28 +0100
To: Adam Langley <agl@google.com>
Cc: "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAKXHy=f0QUBzgfiQwjVBAeZQ1ty9NxhX9uF0a8yp9cCAz0_Ftg@mail.gmail.com>

Adam, you hurt my brain. I need to go read up on Merkle trees. :)

On Tue, Jan 14, 2014 at 9:08 PM, Adam Langley <agl@google.com> wrote:

> Current examples seem to be using a single hash to authenticate a
> whole resource. However, that requires that the whole resource be
> buffered before any of it can be used. This extra latency might well
> outweigh any performance benefits that one might wish to gain by using
> integrity.
>

1. Performance isn't the goal. Integrity is the goal.

2. I think the performance benefits of integrity would be focused on cache.
That is, the second load of a resource, regardless of its URL, could avoid
hitting the network entirely if we already have a matching resource
locally. For this case, we have the whole resource already, by definition.

That said, it would be wonderful to avoid some of the obvious performance
hits that result from verifying a resource only once the entire resource
has been downloaded. Verifying in pieces could, for instance, allow us to
abort a download early if we detect a hash mismatch in the middle of a
file rather than at the end, or to start parsing an HTML document in an
IFrame before the whole document has arrived.
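
To make the early-abort idea concrete, here is a minimal Python sketch of per-block verification. It is only an illustration under assumptions that are not part of this thread: SHA-256 as the digest, a fixed 4096-byte block size, and the hypothetical helper names block_digests and verify_stream. It says nothing about how the digests would reach the client.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_digests(data, block_size=BLOCK_SIZE):
    """Publisher side: one SHA-256 digest per fixed-size block."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def verify_stream(blocks, expected):
    """Consumer side: yield each block as soon as it verifies.

    Raises ValueError on the first mismatching block, so nothing after
    that point is handed to the parser.
    """
    expected = list(expected)
    count = 0
    for index, block in enumerate(blocks):
        if index >= len(expected):
            raise ValueError("more blocks than digests")
        if hashlib.sha256(block).digest() != expected[index]:
            raise ValueError(f"hash mismatch in block {index}")
        count += 1
        yield block
    if count != len(expected):
        raise ValueError("resource truncated")
```

A consumer would iterate over verify_stream(...) and pass each yielded block on for parsing; a corrupted block stops the iteration at that block instead of at the end of the file.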

> Both of the above require that the resource data itself be altered to
> add extra data. This means that a resource suitable for integrity
> cannot be used without it and vice versa.

I think this is problematic in most (all?) cases, given the nature of the
threat we're attempting to address. Trusting the resource to authenticate
itself doesn't provide much benefit if we're not sure we can trust the
resource in the first place.

That said, if I've understood you correctly, we could put only the initial
hash into the HTML document, and subsequent hashes into the resource? I'm
not sure what that would look like on disk, or how we would best be able to
communicate the hashes alongside the resource stream, but it's well worth
considering as an alternative to one-hash-one-resource.
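
One possible reading of "initial hash in the HTML, subsequent hashes in the resource" is a hash chain, sketched below in Python. The framing is purely an assumption made for illustration (SHA-256, a fixed 4096-byte block size, each frame followed by the digest of the next frame, hypothetical names chain_encode and chain_decode); it is not a format proposed in this thread. Because every embedded digest is itself covered by the previous one, the only value that has to be trusted is the one carried in the HTML.

```python
import hashlib

DIGEST_LEN = 32    # SHA-256 digest length in bytes
BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def chain_encode(data, block_size=BLOCK_SIZE):
    """Return (initial_hash, frames).

    Every frame except the last is `block + sha256(next frame)`.
    Only initial_hash would need to live in the HTML document.
    """
    blocks = [data[i:i + block_size]
              for i in range(0, len(data), block_size)] or [b""]
    frames = []
    next_hash = b""
    for block in reversed(blocks):  # build from the end of the resource
        frame = block + next_hash
        frames.append(frame)
        next_hash = hashlib.sha256(frame).digest()
    frames.reverse()
    return next_hash, frames


def chain_decode(initial_hash, frames, block_size=BLOCK_SIZE):
    """Yield verified payload blocks; raise ValueError on any mismatch."""
    expected = initial_hash
    for frame in frames:
        if expected is None:
            raise ValueError("data after final frame")
        if hashlib.sha256(frame).digest() != expected:
            raise ValueError("hash mismatch")
        if len(frame) == block_size + DIGEST_LEN:
            # Full-size frame: the trailing digest authenticates the next frame.
            yield frame[:-DIGEST_LEN]
            expected = frame[-DIGEST_LEN:]
        else:
            # Short frame: this is the last one and carries no digest.
            yield frame
            expected = None
    if expected is not None:
        raise ValueError("resource truncated")
```

The obvious cost is the one raised above: the framed resource is no longer byte-identical to the plain one, so it cannot be served unchanged to clients that do not understand the framing.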

> If this is unacceptable in
> some cases then it's very easy to put a number of hashes straight into
> the HTML: all the interior nodes of a Merkle tree could be given. The
> downside is that a large amount of hash data might delay loading of
> the remainder of the HTML.
>

It would be interesting to evaluate how much overhead this would produce in
the worst case. It sounds significant (a few percent, depending on block
size and digest size).
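
As a rough back-of-the-envelope check on "a few percent", the Python sketch below estimates the size of the hash data relative to the resource. Everything in it is an assumption made for illustration: a binary Merkle tree (so n leaf blocks give n - 1 interior nodes), SHA-256 digests, base64 encoding of each digest in the HTML, and the hypothetical function name merkle_overhead.

```python
import math


def merkle_overhead(resource_len, block_size, digest_len=32, base64_encoded=True):
    """Estimate hash bytes in the HTML as a fraction of the resource size."""
    if resource_len <= 0 or block_size <= 0:
        raise ValueError("lengths must be positive")
    leaves = math.ceil(resource_len / block_size)
    # Assumed binary tree: n leaves -> n - 1 interior nodes (at least the root).
    interior = max(leaves - 1, 1)
    # Base64 expands every 3 bytes to 4 characters, padded per digest.
    per_digest = 4 * math.ceil(digest_len / 3) if base64_encoded else digest_len
    return interior * per_digest / resource_len


if __name__ == "__main__":
    for block_size in (1024, 4096, 16384):
        ratio = merkle_overhead(1 << 20, block_size)
        print(f"{block_size:>6}-byte blocks: {ratio:.2%} of a 1 MiB resource")
```

Under these assumptions a 1 MiB resource costs roughly 1% in hash data with 4 KiB blocks and roughly 4% with 1 KiB blocks, which is in line with the estimate above.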

-mike

Received on Wednesday, 15 January 2014 09:17:16 UTC