Re: [SRI] Escaping mixed-content blocking for video distribution from Mark Watson on 2014-11-12 (public-webappsec@w3.org from November 2014)

From: Mark Watson <watsonm@netflix.com>
Date: Wed, 12 Nov 2014 09:22:39 -0800
To: Adam Langley <agl@google.com>
Cc: Mike West <mkwst@google.com>, Frederik Braun <fbraun@mozilla.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>
Message-ID: <CAEnTvdA_gGfkJ2yJo8904qCnEQ2Ar+dxuinD7RrJqk94um-f6A@mail.gmail.com>

All,

Are there any further thoughts on this? Again, a solution here offers the
prospect of making it much easier and quicker for video distribution sites
to migrate to secure origins, with the associated user benefits.

The proposal, such as it is, is to add a request integrity mechanism to the
existing SRI mechanism (see below for a video-specific version of SRI). A
new Request-Integrity HTTP header would be included in the HTTP response, giving
an HMAC of the entire request as received by the server. The key used for
this HMAC would be provided to the UA in the same way as the hash used for
SRI. It would be a matter for the site to arrange for this key to be shared
between client and server. The UA would verify this HMAC as well as the
resource hash. If both pass, the resource is allowed as mixed content. If
not, the resource is requested over HTTPS instead.
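
As a rough illustration of the check described above, here is a minimal
sketch in Python. Everything concrete in it is an assumption made for the
example and not part of the proposal: the canonical serialization of the
request, the use of HMAC-SHA-256 with hex encoding, and the helper names
canonical_request, request_mac and accept_as_mixed_content are all
hypothetical.

```python
import hashlib
import hmac


def canonical_request(method, target, headers):
    """Hypothetical serialization of a request: request line, then sorted headers."""
    lines = [f"{method} {target}"]
    for name, value in sorted((n.lower(), v.strip()) for n, v in headers.items()):
        lines.append(f"{name}:{value}")
    return "\n".join(lines).encode("utf-8")


def request_mac(key, method, target, headers):
    """HMAC over the request; the server would compute this over what it received."""
    message = canonical_request(method, target, headers)
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def accept_as_mixed_content(key, sent, response_mac, body, expected_body_hash):
    """UA-side check: both the request HMAC and the resource hash must match.

    `sent` is the (method, target, headers) tuple the UA actually sent.
    A False result means the UA would re-request the resource over HTTPS.
    """
    mac_ok = hmac.compare_digest(request_mac(key, *sent), response_mac)
    body_hash = hashlib.sha256(body).hexdigest()
    hash_ok = hmac.compare_digest(body_hash, expected_body_hash)
    return mac_ok and hash_ok
```

If a middlebox adds or rewrites a header, the server's HMAC covers a
different request than the one the UA sent, so the check fails and the UA
would fall back to HTTPS.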

Request integrity is likely to be broken by the many middleboxes that modify
HTTP headers. In these cases we use HTTPS instead. Nevertheless, the
fraction of video traffic that would need to use HTTPS would likely be small
(and would diminish, since the middleboxes in this case would be serving no
useful purpose whatsoever).

There is likely an additional step needed in the request integrity
mechanism to make it secure. If the UA was involved in choosing the key,
the UA could ensure that random keys are used, rather than a fixed key for
a given site, for example. I'm not proposing a finished design, but I think
the details could be worked out.

Regarding a video-specific alternative to SRI as suggested by AGL: The
existing SRI, when used with XHR, means that media data is not available
for playback until the request is complete. Also, the mechanism applies to
all data, not just audio/video data. A mechanism restricted to audio/video
would be sufficient to achieve the goals here and would reduce the attack
surface somewhat (just as audio/video fetched directly by the video element
sometimes escapes mixed-content blocking).

The idea would be to allow XHR to HTTP resources, but only for the Stream
return type and then returning a special kind of Stream which works only
with the Media Source Extension. Furthermore, the Media Source Extension
would expect such a Stream to contain embedded integrity data and for the
page to provide block-by-block integrity information. Specifically, for
mp4, we would provide a new data structure in the Movie Fragment header
giving hashes of the movie data (this kind of thing has been discussed a
few times in the past). The page would be expected to provide the hash of
this new data structure for each Movie Fragment.
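
A minimal sketch of how such a two-level check could work, in Python. The
details are assumptions for illustration only: SHA-256 as the hash, a flat
list of per-block hashes standing in for the new Movie Fragment structure,
and the helper names build_hash_table, table_hash and verified_blocks are
all hypothetical. No mp4 parsing is attempted.

```python
import hashlib


def digest(data):
    """SHA-256 digest (the hash function is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def build_hash_table(blocks):
    """Packager side: per-block hashes that would be embedded in the fragment."""
    return [digest(block) for block in blocks]


def table_hash(table):
    """Hash of the embedded structure; this is the value the page would supply."""
    return digest(b"".join(table))


def verified_blocks(expected_table_hash, table, blocks):
    """Yield each block as soon as it has been checked against the table.

    The table itself is first checked against the page-provided hash, so
    blocks can be released one at a time instead of after the whole fragment.
    """
    if table_hash(table) != expected_table_hash:
        raise ValueError("embedded hash table does not match page-provided hash")
    for index, block in enumerate(blocks):
        if index >= len(table) or digest(block) != table[index]:
            raise ValueError(f"block {index} failed integrity check")
        yield block
```

Because the generator yields each block right after verifying it, data
could be handed to the decoder without waiting for the rest of the
fragment to arrive.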

I think the generic mechanism is sufficient and much simpler, but the
video-specific option restricts the attack surface. However, the idea of a
"special kind" of Stream object is a little clunky.

From a complexity / standardization perspective I can see the attraction of
simply saying that people should use HTTPS, but that is likely going to
take much longer in practice. In the meantime, the absence of a
mixed-content solution for video distribution means there will continue to
be pressure to keep various APIs desired by those sites open to insecure
origins.

...Mark


On Tue, Nov 4, 2014 at 6:18 PM, Mark Watson <watsonm@netflix.com> wrote:

>
>
> On Tue, Nov 4, 2014 at 5:58 PM, Adam Langley <agl@google.com> wrote:
>
>> On Tue, Nov 4, 2014 at 5:46 PM, Mark Watson <watsonm@netflix.com> wrote:
>> > I assumed the script was going to provide the hashes, since the content
>> > would be coming over HTTP.
>>
>> That's a simple solution, but it wasn't what I had in mind at the time.
>>
>> Consider an HD movie that's 10GiB in size. Chunks of data cannot be
>> processed before they have been verified and we don't want to add too
>> much verification latency. So let's posit that 16KiB chunks are used.
>>
>
> Let's say the movie is 2 hours long. Typically, adaptive streaming
> downloads data in small chunks, say 2s in duration. It would be reasonable
> to hash each such chunk, so there would be 3600 hashes = 112KB (I don't see
> any reason to base64 them, they can be pulled from wherever they come with
> XHR).
>
> A 2s chunk of video, in this example, is 2.8MB, so this overhead is small
> (obviously it's bigger for lower bitrates).
>
> It's true that having to wait for a complete chunk to download before
> playback will affect the quality of experience. There will be corner cases
> where playback could have continued if a chunk could be played before
> verification, but it will stall waiting for the completion of the chunk.
>
>
>>
>> If all the hashes for those chunks were sent upfront in the HTML then
>> there are 10 * 2^^30 / 2^^14 chunks * 32 bytes per hash * 4/3 base64
>> expansion = ~27MB of hashes to send the client before anything else.
>>
>> With the Merkle tree construction, the hash data can be interleaved in
>> the 10GiB stream such that they are only downloaded as needed. The
>> downside is that you either need a server capable of doing the
>> interleaving dynamically, or you need two copies of the data on disk:
>> one with interleaved hashes and one without. (Unless the data format
>> is sufficiently forgiving that you can get away with serving the
>> interleaved version to clients that aren't doing SRI processing.)
>>
>
> Ok, so I guess this could solve the above problem, by having the site
> provide a hash for each 2s chunk, say, where this hash is actually the hash
> of the concatenation of the hashes for smaller pieces of that chunk and
> these hashes are embedded in the file *before* the chunk they pertain to.
> That way data could be fed to the video decoder closer to when it arrives.
>
> At least in the fragmented mp4 file case there are plenty of ways to embed
> stuff that will be ignored by existing clients that don't understand it.
>
> At present we are using the Media Source Extensions, with the data being
> retrieved into an ArrayBuffer with XHR and this ArrayBuffer being fed into
> the Media Source. The XHR does not know the data is for media playback, so
> it couldn't do the above.
>
> However, we are discussing how to integrate with Streams, so that a Stream
> obtained from the XHR would be connected directly to the Media Source. I
> guess in this case there could be some media-specific integrity checking on
> the Media Source side that allows this otherwise "untrusted" XHR data to be
> used. In this case the data would never be exposed to JS.
>
> ...Mark
>
>
>
>>
>>
>> Cheers
>>
>> AGL
>>
>
>
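
The overhead figures in the quoted exchange can be checked with a short
calculation. This is only a sketch of the arithmetic, using the numbers
stated in the thread (a 10GiB, 2-hour movie, 16KiB chunks, 2s segments,
32 bytes per hash); the variable names are made up for the example.

```python
GIB = 2**30
KIB = 2**10

movie_bytes = 10 * GIB  # 10GiB HD movie
hash_bytes = 32         # bytes per hash, as stated in the thread

# Fine-grained case: one hash per 16KiB chunk, base64-encoded (4/3 expansion).
small_chunks = movie_bytes // (16 * KIB)
upfront_bytes = small_chunks * hash_bytes * 4 // 3

# Coarse case: one raw hash per 2s segment of a 2-hour movie.
segments = (2 * 60 * 60) // 2
segment_hash_bytes = segments * hash_bytes
segment_size_bytes = movie_bytes / segments

print(small_chunks, upfront_bytes)           # 655360 27962026
print(segments, segment_hash_bytes)          # 3600 115200
print(round(segment_size_bytes / 2**20, 2))  # 2.84
```

So the fine-grained scheme needs roughly 27MB of hashes up front, while
per-segment hashing needs about 112KB in total against segments of about
2.8MB each.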
Received on Wednesday, 12 November 2014 17:23:13 UTC