
Re: [SRI] Escaping mixed-content blocking for video distribution

From: Mark Watson <watsonm@netflix.com>
Date: Tue, 4 Nov 2014 18:18:00 -0800
Message-ID: <CAEnTvdDX_zoUX8QXU5-JOq98RwzzWieG1-wCwWvQ+saEDu1R3w@mail.gmail.com>
To: Adam Langley <agl@google.com>
Cc: Mike West <mkwst@google.com>, Frederik Braun <fbraun@mozilla.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>
On Tue, Nov 4, 2014 at 5:58 PM, Adam Langley <agl@google.com> wrote:

> On Tue, Nov 4, 2014 at 5:46 PM, Mark Watson <watsonm@netflix.com> wrote:
> > I assumed the script was going to provide the hashes, since the content
> > would be coming over HTTP.
>
> That's a simple solution, but it wasn't what I had in mind at the time.
>
> Consider an HD movie that's 10GiB in size. Chunks of data cannot be
> processed before they have been verified and we don't want to add too
> much verification latency. So let's posit that 16KiB chunks are used.
>

Let's say the movie is 2 hours long. Typically, adaptive streaming
downloads data in small chunks, say 2s in duration. It would be reasonable
to hash each such chunk, so there would be 3600 hashes, or about 112KiB at
32 bytes each (I don't see any reason to base64 them; they can be pulled
from wherever they come via XHR).

A 2s chunk of video, in this example, is about 2.8MB, so this overhead is
small (obviously the relative overhead is larger at lower bitrates).
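
The arithmetic above can be checked directly (a quick sketch, assuming
32-byte SHA-256 digests and the figures quoted in this example):

```javascript
// Back-of-the-envelope check of the numbers above (illustrative only).
const movieSeconds = 2 * 3600;   // 2-hour movie
const chunkSeconds = 2;          // 2s adaptive-streaming chunks
const hashBytes = 32;            // SHA-256 digest size

const chunkCount = movieSeconds / chunkSeconds;   // 3600 chunks
const hashOverhead = chunkCount * hashBytes;      // 115200 bytes ≈ 112.5 KiB

const movieBytes = 10 * 2 ** 30;                  // 10 GiB movie
const bytesPerChunk = movieBytes / chunkCount;    // ≈ 2.8 MiB per 2s chunk

console.log(chunkCount,
            (hashOverhead / 1024).toFixed(1) + ' KiB',
            (bytesPerChunk / 2 ** 20).toFixed(1) + ' MiB');
```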

It's true that having to wait for a complete chunk to download before
playback will affect the quality of experience. There will be corner cases
where playback could have continued if part of a chunk could be played
before verification, but instead it will stall waiting for the chunk to
complete.


>
> If all the hashes for those chunks were sent upfront in the HTML then
> there are 10 * 2^30 / 2^14 chunks * 32 bytes per hash * 4/3 base64
> expansion = ~27MB of hashes to send the client before anything else.
>
> With the Merkle tree construction, the hash data can be interleaved in
> the 10GiB stream such that they are only downloaded as needed. The
> downside is that you either need a server capable of doing the
> interleaving dynamically, or you need two copies of the data on disk:
> one with interleaved hashes and one without. (Unless the data format
> is sufficiently forgiving that you can get away with serving the
> interleaved version to clients that aren't doing SRI processing.)
>

Ok, so I guess this could solve the above problem: have the site provide a
hash for each 2s chunk, say, where this hash is actually the hash of the
concatenation of the hashes of smaller pieces of that chunk, and embed
those piece hashes in the file *before* the chunk they pertain to. That
way data could be fed to the video decoder closer to when it arrives.

At least in the fragmented mp4 file case there are plenty of ways to embed
stuff that will be ignored by existing clients that don't understand it.

At present we are using the Media Source Extensions, with the data being
retrieved into an ArrayBuffer with XHR and this ArrayBuffer being fed into
the Media Source. The XHR does not know the data is for media playback, so
it couldn't do the above.

However, we are discussing how to integrate with Streams, so that a Stream
obtained from the XHR would be connected directly to the Media Source. I
guess in this case there could be some media-specific integrity checking on
the Media Source side that allows this otherwise "untrusted" XHR data to be
used. In this case the data would never be exposed to JS.

...Mark



>
>
> Cheers
>
> AGL
>
Received on Wednesday, 5 November 2014 02:18:29 UTC
