
Re: Encrypted Media proposal (was RE: ISSUE-179: av_param - Chairs Solicit Alternate Proposals or Counter-Proposals)

From: Mark Watson <watsonm@netflix.com>
Date: Wed, 29 Feb 2012 21:14:27 +0000
To: Henri Sivonen <hsivonen@iki.fi>
CC: "<public-html@w3.org>" <public-html@w3.org>
Message-ID: <B51D5015-F735-48FC-9401-E547B2159196@netflix.com>

On Feb 29, 2012, at 12:27 AM, Henri Sivonen wrote:

> On Tue, Feb 28, 2012 at 12:35 AM, Mark Watson <watsonm@netflix.com> wrote:
>> These are obviously fairly general statements - the proposal doesn't prescribe a model for where CDMs will come from and we appreciate opinions and ideas on that topic.
> Without knowing the nature of the CDMs, the impact of the proposal
> can't be evaluated. It can't even be evaluated if the proposal
> proposes a sensible API without having a good idea of what kind of
> things CDMs would be.

What level of information would you like to see? I assume you're interested in more detail about the functional split between CDM and browser, right? If a detailed example API for a browser communicating with a CDM were provided, would that be sufficient?

> But let's take a step back from CDMs and try to understand the
> motivating requirements better.
> So far, the requirements from the content provider point of view that
> I've seen are:
> * Decrypted data must not be available to JavaScript
> * Speedbump to deter users from saving decrypted content

The speedbump was an analogy to illustrate that something doesn't have to be perfect to be useful. The requirement for content protection is that it really is genuinely difficult (in ways that can be measured in terms of the expertise, equipment and time required) to obtain the keys or decrypted content.

> * Hiding content from untrusted CDNs that host the content
> * Hiding content from eavesdroppers when the content is served over
> HTTP without SSL/TLS
> (I also saw that there are unbounded requirements that can't be
> enumerated for all content providers in general, but I hope it's
> possible to gain a better understanding of the requirements of Netflix
> in particular later on. After all, there must be some process that
> led to Netflix deciding to use whatever DRM Netflix now uses in
> various cases.)

Yes, the process is that we work closely with our content providers to determine what's acceptable to them in any given scenario. It's a two-way cooperative process that ultimately ends in a risk assessment and judgement call for any given platform, type of content and so on. The judgement call is based not only on the properties of the proposed content protection but also on engineering constraints and timescales, platform volume and other business aspects. Our content provider partners understand that their product is more valuable to us if we can deliver it to more platforms, and we understand that they do not want our product to be easily usable for piracy.

So you can see it would not be simple, or even useful, to enumerate the exact outcomes of those decisions to date.

The point is that we are not proposing that the W3C design a content protection system based on a specific set of content protection requirements. Others have designed such systems. Our proposal is about integrating those solutions with HTML.

I can give some examples of the kind of thing I mean, though.

Example 1 (at the extreme end of the scale): A TV contains a secure media pipeline implementation which runs on a separate processor from the general-purpose CPU in which the browser runs. This processor has hardware mechanisms which make it difficult to install new firmware. The owner of the TV cannot easily get root access to the media pipeline processor, but they may more easily get root access to the general-purpose CPU. In this example the CDM runs on the secure media processor. On the general-purpose CPU, there are drivers allowing software to interact with the secure media pipeline. This interaction follows the CDM API, which mirrors the proposed HTML API plus calls to send the actual encrypted media and control how the output is composited.

Example 2 (at the other end of the scale): A content protection system vendor provides a (closed-source, obfuscated) software component for multiple platforms that implements decryption and decoding of audio/video media. They adapt that component to support the CDM APIs offered by various desktop browsers. An OS vendor, browser maker or commercial video service arranges for that component to be installed on the user's machine. Technically, the same component can be used by any service that supports the protection system in question and whose A/V codecs are supported by the CDM. The CDM returns decoded audio/video samples to the browser, which is implemented such that data returned from CDMs is not available to JavaScript (e.g. through Canvas).

I am not saying that either of these is necessary or sufficient for any given scenario (see the note above about a cooperative decision process). It is hard to imagine Example 1 not being sufficient, and easy to imagine Example 2 failing to meet some people's requirements if, for example, the browser could easily be modified to expose the data on a canvas.

> From the browser point of view (well, open source browser at least),

When you say 'open source' do you mean GPLv3 specifically, or open source in the broader sense?

> some obvious requirements are:
> * The system is fully specified and doesn't involve any
> implementation-side secrets
> * The system can be implemented by anyone and in Open Source software
> * The system doesn't require browsers to interface with 3rd-party
> black boxes that the browser vendors don't control (i.e. it should be
> possible to have a fully-functional fully open source implementation
> of the interoperable Web platform including all the supporting
> software in the style of B2G and Chromium OS.)

Why do you think these requirements apply to our proposal when they do not apply today with respect to plugins?

> Here's a description of a straw feature that I believe meets all the
> above requirements. Would this system be adequate for Netflix to serve
> movies to a browser that implements this feature? If not, why not
> specifically? (The main purpose of this exercise is to gain better
> understanding of the requirements. This isn't an offer to implement
> this straw feature.)

I will review the proposal in detail and get back to you. As explained above, the answer to your first question is not a simple binary one, and not one that Netflix could answer alone.

Would you be willing to conduct a similar review of our proposal?


> - -
> This feature adds a decryption layer to the browser's HTTP stack and
> an API for initializing decryption keys from a different origin. Also,
> the Same Origin Policy is extended to block obvious access to
> decrypted data from JavaScript.
> The browser maintains a key storage that holds tuples of key,
> sha1(key), origin of key, list of authorized origins and time to live.
> There's a JavaScript API navigator.addKey(keyUrl,
> arrayOfAuthorizedOrigins, timeToLiveSeconds, doneCallback). keyUrl is
> a URL of the same origin as the caller of the API. The payload
> retrieved from the URL is key material to be added to the key storage.
> arrayOfAuthorizedOrigins is an array of origins serialized as strings
> that are authorized to serve content to be decrypted using the key.
> (This is a privacy mechanism against other origins probing the key
> store in case an untrusted CDN has leaked key hashes. More on hashes
> later.) timeToLiveSeconds is the number of seconds after which the
> browser purges this keystore entry. doneCallback is a JavaScript
> function that the browser calls after it has retrieved and processed
> keyUrl. Upon success, a single argument true is passed. Upon failure,
> a single argument false is passed. (Note: The key material is not
> exposed to JS.) The browser generates an id for the key by hashing the
> key material with SHA-1. Origin of key is set to the origin of the
> caller of the API which has to be the same as the origin of keyUrl.
> When an HTTP response includes the response header Content-Encoding:
> AES256, the processing happens as follows (if any step fails, treat it
> like other HTTP errors):
> The HTTP implementation gets the value of another response header
> called Key-SHA1 and base64-decodes it. Then, the browser's key storage
> is searched for a key whose sha1(key) entry matches this value and
> whose list of authorized origins contains the origin of the HTTP
> response; the response payload is then decrypted with AES256 using the
> located key as the decryption key. The decrypted payload is exposed to
> the other parts of the browser as having origin E(origin of key).
> Origins of the type E(Origin) have the following properties:
> * A resource of origin E(Origin) can be included as embedded content
> (<img>, <video>, <audio>) in (and only in) a document whose origin is
> Origin.
> * For the purpose of JavaScript access to the data of the resource
> (be it raw bytes or pixel data), E(Origin) is considered to be
> different-origin with every origin including the origin(s)
> representing the authority of browser extensions.
> * Browsers disable the "Save As..." context menu for embedded content
> whose origin is of the form E(Origin)
> When the layer above the HTTP layer requests the HTTP stack to perform
> a range request on a Content-Encoding: AES256 resource, the HTTP stack
> must extend the range such that the range consists of full AES256
> blocks in both directions. (The lack of block chaining is deliberate
> so that seeking is possible.)
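The range-extension rule seems straightforward; to confirm my reading, a sketch (names mine, AES block size is 16 bytes regardless of key size):

```javascript
// Illustrative sketch of extending an inclusive byte range so it covers
// whole AES blocks in both directions, as the straw proposal requires
// for seekable decryption. Not from any real HTTP stack.
const BLOCK = 16; // AES block size in bytes (independent of the 256-bit key)

function alignRange(first, last) {
  // Round the start down and the inclusive end up to block boundaries.
  const alignedFirst = Math.floor(first / BLOCK) * BLOCK;
  const alignedLast = Math.ceil((last + 1) / BLOCK) * BLOCK - 1;
  return [alignedFirst, alignedLast];
}

console.log(alignRange(100, 699)); // [ 96, 703 ]
```

The HTTP stack would then decrypt the extended range and hand the caller only the bytes it originally asked for.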
> -- 
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/
Received on Wednesday, 29 February 2012 21:14:57 GMT
