Re: [SRI] Escaping mixed-content blocking for video distribution

Mark,

 There is work going on in the OAuth WG on authenticating HTTP requests:

http://tools.ietf.org/html/draft-ietf-oauth-signed-http-request-00


 Have you looked at this to see if it is suitable for your use case?

 I think we would definitely like to continue the discussion on SRI for insecure origins, and on methods like unbalanced Merkle tree hashing for applying integrity to streamed data. However, the consensus seems strong that these should be "Level >= 2" features, and that the discussion should be informed by the results of experimenting with the minimum-viable set of features currently proposed for Level 1.

-Brad

From: Mark Watson <watsonm@netflix.com>
Date: Wednesday, November 12, 2014 at 9:22 AM
To: Adam Langley <agl@google.com>
Cc: Mike West <mkwst@google.com>, Frederik Braun <fbraun@mozilla.com>, "public-webappsec@w3.org" <public-webappsec@w3.org>
Subject: Re: [SRI] Escaping mixed-content blocking for video distribution
Resent-From: <public-webappsec@w3.org>
Resent-Date: Wednesday, November 12, 2014 at 9:23 AM

All,

Are there any further thoughts on this? Again, a solution here offers the prospect of making it much easier and quicker for video distribution sites to migrate to secure origins, with the associated user benefits.

The proposal, such as it is, is to add a request integrity mechanism to the existing SRI mechanism (see below for a video-specific version of SRI). A new Request-Integrity HTTP header would be included in the HTTP response, giving an HMAC of the entire request as received by the server. The key used for this HMAC would be provided to the UA in the same way as the hash used for SRI; it would be a matter for the site to arrange for this key to be shared between client and server. The UA would verify this HMAC as well as the resource hash. If both pass, the resource is allowed as mixed content; if not, the resource is requested over HTTPS instead.
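To make the shape concrete, here is a minimal sketch of the UA-side check, assuming the header is named Request-Integrity, carries a base64 HMAC-SHA256, and is computed over some canonical serialization of the request line and headers. The header name, serialization, and key-delivery details are all illustrative, not settled parts of the proposal:

    // Hypothetical UA-side verification of the proposed Request-Integrity
    // header: an HMAC-SHA256, keyed with the secret delivered alongside the
    // SRI hash, over the request exactly as the server received it. Header
    // name and request serialization are assumptions for illustration.
    async function verifyRequestIntegrity(
      rawKey: Uint8Array,         // key shared via the SRI attribute, per the proposal
      requestAsReceived: string,  // canonical serialization of request line + headers
      headerValue: string,        // base64 value of the Request-Integrity response header
    ): Promise<boolean> {
      const key = await crypto.subtle.importKey(
        "raw", rawKey, { name: "HMAC", hash: "SHA-256" }, false, ["verify"],
      );
      const mac = Uint8Array.from(atob(headerValue), (c) => c.charCodeAt(0));
      return crypto.subtle.verify(
        "HMAC", key, mac, new TextEncoder().encode(requestAsReceived),
      );
    }

If either the HMAC or the resource hash fails to verify, the UA would fall back to fetching the resource over HTTPS, as described above.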

Request integrity would likely be broken by many middleboxes that modify HTTP headers; in those cases HTTPS would be used instead. Nevertheless, the fraction of video traffic that needed to fall back to HTTPS would likely be small (and would diminish over time, since such middleboxes would be serving no useful purpose whatsoever).

There is likely an additional step needed in the request integrity mechanism to make it secure. If the UA were involved in choosing the key, for example, it could ensure that random keys are used rather than a fixed key for a given site. I'm not proposing a finished design, but I think the details could be worked out.

Regarding a video-specific alternative to SRI as suggested by AGL: The existing SRI, when used with XHR, means that media data is not available for playback until the request is complete. Also, the mechanism applies to all data, not just audio/video data. A mechanism restricted to audio/video would be sufficient to achieve the goals here and would reduce the attack surface somewhat (just as audio/video fetched directly by the video element sometimes escapes mixed-content blocking).

The idea would be to allow XHR to HTTP resources, but only for the Stream return type, and then to return a special kind of Stream which works only with Media Source Extensions. Furthermore, Media Source Extensions would expect such a Stream to contain embedded integrity data, and the page to provide block-by-block integrity information. Specifically, for mp4, we would define a new data structure in the Movie Fragment header giving hashes of the movie data (this kind of thing has been discussed a few times in the past). The page would be expected to provide the hash of this new data structure for each Movie Fragment.
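As a sketch of what the per-fragment check might look like (the hash-list box below is invented for illustration; no such ISO BMFF structure currently exists, and I assume a bare concatenation of SHA-256 hashes):

    // Sketch of per-Movie-Fragment verification: the moof carries a
    // hypothetical box listing SHA-256 hashes of the fragment's media data,
    // and the page supplies the expected hash of that box.
    async function verifyFragment(
      hashListBox: Uint8Array,     // the hypothetical box extracted from the moof
      mediaBlocks: Uint8Array[],   // the movie-data blocks the box covers, in order
      expectedBoxHash: Uint8Array, // provided by the page for this fragment
    ): Promise<boolean> {
      // The box itself must match the page-provided hash before it is trusted.
      const boxHash = new Uint8Array(await crypto.subtle.digest("SHA-256", hashListBox));
      if (!bytesEqual(boxHash, expectedBoxHash)) return false;
      // Each media-data block must then match its entry in the box.
      for (let i = 0; i < mediaBlocks.length; i++) {
        const h = new Uint8Array(await crypto.subtle.digest("SHA-256", mediaBlocks[i]));
        if (!bytesEqual(h, hashListBox.subarray(i * 32, (i + 1) * 32))) return false;
      }
      return true;
    }

    function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
      return a.length === b.length && a.every((v, i) => v === b[i]);
    }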

I think the generic mechanism is sufficient and much simpler, but the video-specific option restricts the attack surface. However, the idea of a "special kind" of Stream object is a little clunky.

From a complexity and standardization perspective I can see the attraction of simply saying that people should use HTTPS, but that is likely to take much longer in practice. In the meantime, the absence of a mixed-content solution for video distribution means there will continue to be pressure to keep various APIs desired by those sites open to insecure origins.

...Mark


On Tue, Nov 4, 2014 at 6:18 PM, Mark Watson <watsonm@netflix.com> wrote:


On Tue, Nov 4, 2014 at 5:58 PM, Adam Langley <agl@google.com> wrote:
On Tue, Nov 4, 2014 at 5:46 PM, Mark Watson <watsonm@netflix.com> wrote:
> I assumed the script was going to provide the hashes, since the content
> would be coming over HTTP.

That's a simple solution, but it wasn't what I had in mind at the time.

Consider an HD movie that's 10GiB in size. Chunks of data cannot be
processed before they have been verified and we don't want to add too
much verification latency. So let's posit that 16KiB chunks are used.

Let's say the movie is 2 hours long. Typically, adaptive streaming downloads data in small chunks, say 2s in duration. It would be reasonable to hash each such chunk, so there would be 3600 hashes = ~112KB (I don't see any reason to base64 them; they can be pulled from wherever they come from with XHR).

A 2s chunk of video, in this example, is about 2.8MB, so this overhead is small (obviously the relative overhead is bigger at lower bitrates).

It's true that having to wait for a complete chunk to download before playback will affect the quality of experience. There will be corner cases where playback could have continued if a chunk could be played before verification, but instead it will stall waiting for the chunk to complete.


If all the hashes for those chunks were sent upfront in the HTML then there are 10 * 2^30 / 2^14 chunks * 32 bytes per hash * 4/3 base64 expansion = ~27MB of hashes to send the client before anything else.
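Both back-of-envelope figures in this thread (the ~27MB here and the ~112KB above) are easy to reproduce; a quick TypeScript check, using only the sizes given above:

    // Reproducing the two overhead estimates (plain arithmetic, no web APIs).
    const movieBytes = 10 * 2 ** 30; // 10 GiB movie
    const hashLen = 32;              // SHA-256 output, in bytes

    // Upfront base64 hashes for 16 KiB chunks (AGL's figure):
    const chunks16k = movieBytes / 2 ** 14;             // 655,360 chunks
    const upfrontBytes = chunks16k * hashLen * (4 / 3); // base64 expansion
    console.log((upfrontBytes / 2 ** 20).toFixed(1), "MiB upfront"); // ~26.7 MiB, i.e. the ~27MB above

    // One binary hash per 2s chunk of a 2h movie (Mark's figure):
    const chunks2s = (2 * 3600) / 2;                    // 3,600 chunks
    console.log((chunks2s * hashLen) / 1024, "KiB of hashes");       // 112.5 KiB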

With the Merkle tree construction, the hash data can be interleaved in
the 10GiB stream such that they are only downloaded as needed. The
downside is that you either need a server capable of doing the
interleaving dynamically, or you need two copies of the data on disk:
one with interleaved hashes and one without. (Unless the data format
is sufficiently forgiving that you can get away with serving the
interleaved version to clients that aren't doing SRI processing.)

OK, so I guess this could solve the above problem: have the site provide a hash for each 2s chunk, say, where this hash is actually the hash of the concatenation of the hashes of smaller pieces of that chunk, and those piece hashes are embedded in the file *before* the chunk they pertain to. That way, data could be fed to the video decoder closer to when it arrives.
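A sketch of how that ordering lets data flow incrementally (the framing is invented; the point is only that each piece's hash arrives, already authenticated, before the piece itself):

    // Incremental release under the interleaved layout described above: a
    // hash block, itself verified against the page-provided chunk hash,
    // precedes the pieces it covers, so each piece can be verified and fed
    // to the decoder on arrival rather than at the end of the 2s chunk.
    async function releasePieces(
      hashBlock: Uint8Array,                     // concatenated piece hashes, arrives first
      pageProvidedHash: Uint8Array,              // hash of hashBlock, supplied by the page
      pieces: AsyncIterable<Uint8Array>,         // the pieces, in order, as they download
      feedDecoder: (piece: Uint8Array) => void,  // e.g. append into the media pipeline
    ): Promise<void> {
      const blockHash = new Uint8Array(await crypto.subtle.digest("SHA-256", hashBlock));
      if (!sameBytes(blockHash, pageProvidedHash)) throw new Error("hash block rejected");
      let i = 0;
      for await (const piece of pieces) {
        const h = new Uint8Array(await crypto.subtle.digest("SHA-256", piece));
        if (!sameBytes(h, hashBlock.subarray(i * 32, (i + 1) * 32))) {
          throw new Error(`piece ${i} rejected`);
        }
        i++;
        feedDecoder(piece); // safe to release immediately
      }
    }

    function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
      return a.length === b.length && a.every((v, i) => v === b[i]);
    }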

At least in the fragmented mp4 file case there are plenty of ways to embed stuff that will be ignored by existing clients that don't understand it.

At present we are using the Media Source Extensions, with the data being retrieved into an ArrayBuffer with XHR and this ArrayBuffer being fed into the Media Source. The XHR does not know the data is for media playback, so it couldn't do the above.

However, we are discussing how to integrate with Streams, so that a Stream obtained from the XHR would be connected directly to the Media Source. I guess in that case there could be some media-specific integrity checking on the Media Source side that allows this otherwise "untrusted" XHR data to be used; the data would never be exposed to JS.
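For what it's worth, the wiring might look something like the sketch below, using the Stream response type from the XHR/Streams discussions and the appendStream() method that has appeared in MSE drafts. Both are in flux, so the names are placeholders:

    // Hypothetical wiring of an insecure-origin XHR Stream into MSE. The
    // "stream" responseType and SourceBuffer.appendStream() are taken from
    // draft specs and cast through `any`; the integrity check itself would
    // live inside the UA's media stack, never exposing the bytes to script.
    const video = document.querySelector("video")!;
    const mediaSource = new MediaSource();
    video.src = URL.createObjectURL(mediaSource);

    mediaSource.addEventListener("sourceopen", () => {
      const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
      const xhr = new XMLHttpRequest();
      xhr.open("GET", "http://cdn.example/segment.mp4"); // insecure origin
      (xhr as any).responseType = "stream";              // draft Streams integration
      let appended = false;
      xhr.onreadystatechange = () => {
        if (!appended && xhr.readyState === XMLHttpRequest.LOADING) {
          appended = true;
          // The "special kind" of Stream: opaque to script, consumable only
          // by a SourceBuffer that verifies the embedded per-fragment hashes.
          (sb as any).appendStream(xhr.response);
        }
      };
      xhr.send();
    });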

...Mark




Cheers

AGL
