Re: Individualization from Henri Sivonen on 2014-10-27 (public-html-media@w3.org from October 2014)

From: Henri Sivonen <hsivonen@hsivonen.fi>
Date: Mon, 27 Oct 2014 16:24:08 +0200
To: David Dorwin <ddorwin@google.com>
Cc: Joe Steele <steele@adobe.com>, "<public-html-media@w3.org>" <public-html-media@w3.org>
Message-ID: <CANXqsR+n-GNBHKvUODGRcCHD7xYU-4hAEex2QtJ9pu8cTpc3mA@mail.gmail.com>
On Sat, Oct 25, 2014 at 3:22 AM, David Dorwin <ddorwin@google.com> wrote:
>
> On Fri, Oct 24, 2014 at 1:58 AM, Henri Sivonen <hsivonen@hsivonen.fi> wrote:
>>
>> On Wed, Oct 22, 2014 at 5:13 AM, David Dorwin <ddorwin@google.com> wrote:
>
>> > I assume you are referring to per-origin individualization, which Joe has
>> > previously mentioned.
>>
>> As far as spec changes are concerned, I am referring to download-based
>> individualization in general. However, the constraints Firefox places
>> on the CDM are supposed to make origin-independent download-based
>> individualization impossible, so my immediate interest is with the
>> origin-dependent case.
>
> I'm not familiar with the term "download-based individualization". Can you
> define it?

I mean a situation where multiple users are given the same CDM DLL but
in order to perform license requests in a way that causes the licenses
to be bound to a single CDM instance, the CDM needs to obtain some
blob from outside the device on which the CDM is running and to
incorporate that blob into the CDM's operation.

This is in contrast with other imaginable solutions that for the
purpose of this discussion I don't consider download-based
individualization:

1) Each hardware device instance is provisioned with a secret intended
for DRM purposes at the factory and the CDM uses this secret without
having to obtain anything from outside the device in order to make
license requests once the device has left the factory.

2) Each time the CDM DLL as a whole is downloaded, the server that
the CDM DLL is downloaded from responds with a different DLL that has
a different secret baked into it.

3) The CDM gathers device-unique bits locally and generates all the
secrets it needs, including keys, from this data without having to
obtain anything from outside the device before it is ready to perform
license requests. (In contrast to case #1, these bits have not been
put on the device for specifically DRM purposes.)

(This list is not necessarily exhaustive.)

>> > The per-origin identifiers that presumably come with per-origin
>> > individualization are good for privacy. However, it's unclear whether
>> > deferring such individualization to a centralized server maintains those
>> > qualities.
>>
>> This depends on what data the individualization request contains.
>> Firefox provides the CDM with some bits that are unique to the
>> computer, the origin using EME, the origin in the URL and a
>> randomly-generated salt. What the CDM does is up to the CDM, but even
>> if the CDM sent these bits verbatim to a centralized server, the
>> centralized server wouldn't learn anything from these bits.
>
>
> Does Firefox hash these before providing them to the CDM?

Yes. Firefox gives the CDM a hash whose ingredients are
 1) Device-unique bits.
 2) The origin using EME.
 3) The origin of the top-level browsing context.
 4) A randomly-generated salt associated with the pair of #2 and #3
and persisted until the user requests the salt be forgotten.

Additionally, the CDM may ask Firefox to store data on its behalf, but
the data storage is partitioned such that whenever the hash is
different, the storage partition is different, too.
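
As a rough illustration of the hash ingredients and the partitioned
storage described above, here is a minimal Python sketch. The names
derive_node_id and PartitionedStorage, the choice of SHA-256 and the
length-prefixed encoding are all assumptions made for illustration;
this is not Firefox's actual implementation.

```python
import hashlib
from typing import Dict, Optional


def derive_node_id(device_bits: bytes, eme_origin: str,
                   top_level_origin: str, salt: bytes) -> str:
    """Hash the four ingredients into one opaque value (hypothetical scheme)."""
    h = hashlib.sha256()
    for part in (device_bits, eme_origin.encode("utf-8"),
                 top_level_origin.encode("utf-8"), salt):
        # Length-prefix each ingredient so the concatenation is unambiguous.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()


class PartitionedStorage:
    """Toy storage keyed by the hash: a different hash means a different partition."""

    def __init__(self) -> None:
        self._partitions: Dict[str, Dict[str, bytes]] = {}

    def put(self, node_id: str, key: str, value: bytes) -> None:
        self._partitions.setdefault(node_id, {})[key] = value

    def get(self, node_id: str, key: str) -> Optional[bytes]:
        # Data stored under one hash is invisible under any other hash.
        return self._partitions.get(node_id, {}).get(key)
```

In this sketch the CDM only ever sees the hex digest, never the
device-unique bits that went into it.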

> If the CDM or central server is given or can determine the bits that are
> unique to the computer, it can track all origins that computer uses EME on
> (or potentially any origin it visits - see
> http://lists.w3.org/Archives/Public/www-tag/2014Oct/0106.html). Such an
> implementation of per-origin individualization would really just move the
> potential privacy issues from one place to another.

Well, the CDM doesn't get to see #1 directly.

Having #2 as a hash ingredient protects against tracking across
different EME license providers.

Having #3 as a hash ingredient protects against a single EME license
provider that provides video hosting services to multiple sites (or is
a MITM injecting EME-usage <iframe>s to various http sites) from
tracking the user across sites.

Having #4 as a hash ingredient allows the user to opt to have a
trackability discontinuity point even with a single EME license
provider.
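
The effect of ingredients #2 through #4 can be shown with a toy
model. This is only a sketch under stated assumptions: SaltStore and
identifier are hypothetical names, SHA-256 is an arbitrary choice,
and the origins are made-up examples.

```python
import hashlib
import os
from typing import Dict, Tuple


class SaltStore:
    """Hypothetical store of one random salt per (EME origin, top-level origin) pair."""

    def __init__(self) -> None:
        self._salts: Dict[Tuple[str, str], bytes] = {}

    def salt_for(self, eme_origin: str, top_origin: str) -> bytes:
        key = (eme_origin, top_origin)
        if key not in self._salts:
            self._salts[key] = os.urandom(32)  # generated once, then persisted
        return self._salts[key]

    def forget(self, eme_origin: str, top_origin: str) -> None:
        # Models the user asking for the salt to be forgotten.
        self._salts.pop((eme_origin, top_origin), None)


def identifier(device_bits: bytes, eme_origin: str, top_origin: str,
               salts: SaltStore) -> str:
    h = hashlib.sha256()
    for part in (device_bits, eme_origin.encode("utf-8"),
                 top_origin.encode("utf-8"),
                 salts.salt_for(eme_origin, top_origin)):
        h.update(len(part).to_bytes(4, "big"))  # unambiguous concatenation
        h.update(part)
    return h.hexdigest()


salts = SaltStore()
device = b"example-device-bits"
a = identifier(device, "https://keys-a.example", "https://video.example", salts)
# Different EME origin (#2): a different license provider sees a different value.
b = identifier(device, "https://keys-b.example", "https://video.example", salts)
# Different top-level origin (#3): same provider embedded on another site.
c = identifier(device, "https://keys-a.example", "https://other.example", salts)
# Salt forgotten (#4): same provider, same site, new value from now on.
salts.forget("https://keys-a.example", "https://video.example")
d = identifier(device, "https://keys-a.example", "https://video.example", salts)
```

All four values come from the same device bits, yet none of them can
be matched to another without knowing the ingredients.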

> We should probably mention this in the privacy considerations section.

Good idea.

>> AFAICT, a centralized individualization server can always tally
>> information that's baked into the CDM. Potentially, if the CDM builds
>> for different platforms have different in-baked information, the
>> centralized server could count individualizations by platform, but the
>> platform is already information that browsers expose left and right in
>> the UA string (albeit not in a form that's hard for the user to
>> forge).
>
>
> As you note, cryptographic proof/identification of the CDM (i.e. a class of
> devices) is a concern since it cannot be faked like UA strings and could
> potentially be used for platform segmentation (possibly even when not using
> the DRM).
> However, unique identification of a user or client is much more concerning.

The uniqueness is quite partitioned. See above.

>> If the server of the application proxies the individualization
>> requests to the central server, the central server may not even get to
>> learn the IP address of the client the CDM. (Though, of course, the
>> browser has no proof that the application's server won't pass this
>> information along.)
>>
>> If the JS program of the application XHRs the individualization
>> request directly to the central server (and the central server
>> authorizes this via CORS), the centralized server learns that the IP
>> address of the client wanted to individualize for the origin of the
>> application. Also, if the central server requests credentials and the
>> user has for other reasons browsed the site of the CDM vendor to
>> obtain a cookie, the central server can match individualizations with
>> that cookie. That may not be cool, but at least the credentials can't
>> be requested covertly from people who care to inspect what CORS
>> headers are sent, so a CDM vendor doing this would get caught and
>> get blogged about.
>
>
> Wouldn't this be an argument for having the license server proxy to the
> central server instead of the app?

Yes, but
1) the browser and the user get no guarantees that the license server
doesn't reveal the IP address of the user's device to the
individualization server anyway (i.e. there's no cryptographic
mechanism preventing the proxy from adding an X-Forwarded-For header).
2) for better or worse, we allow sites to leak users' browsing
behavior to a third party such as Google Analytics and even have
designed platform features to cater to Google Analytics (e.g. <script
async>). If an EME-using site XHRs to a CDM vendor for
individualization, the CDM vendor learns that a user at an IP address
has chosen to watch some content from that EME-using site. Arguably,
this is less of a privacy violation than e.g. HBO Nordic leaking to
Google, AdForm and Facebook which *titles* the user navigates to on
the site.
3) EME already makes key server as a third-party service a possibility.

Due to #1, it's not like the user can trust the proxying to have the
assumed properties and hide the user's IP address from the
individualization server. Due to #2 and #3, if the site wants to, it
has plenty of opportunity to leak the user's behavior to third
parties.
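
To make point #1 concrete: from the client's side, a proxy that
hides the client address is indistinguishable from one that forwards
it. A minimal sketch, assuming a hypothetical helper
build_upstream_headers that a proxying server might use when relaying
an individualization request:

```python
from typing import Dict


def build_upstream_headers(client_ip: str, incoming: Dict[str, str],
                           reveal_client: bool) -> Dict[str, str]:
    """Build the headers a proxy sends upstream (illustrative only)."""
    # Copy everything except cookies and any forwarding header already present.
    headers = {name: value for name, value in incoming.items()
               if name.lower() not in ("cookie", "x-forwarded-for")}
    if reveal_client:
        # One line is all it takes to disclose the client address upstream.
        headers["X-Forwarded-For"] = client_ip
    return headers
```

Whether reveal_client is set is a decision made entirely on the
server, which is why the browser cannot rely on the proxying.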

> (And thus, not adding the message type.)

Even if the application has been designed to proxy the
individualization requests through a server belonging to the
application, it's conceivable that the application would want to XHR
license requests and individualization requests to different URLs in
its own URL space.
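
A sketch of what that routing could look like, written in Python for
brevity even though a real application would do this in its JS. The
message type strings follow the EME MediaKeyMessageType naming; the
URLs and the endpoint_for helper are hypothetical.

```python
# Hypothetical endpoints within the application's own URL space.
ENDPOINTS = {
    "license-request": "https://app.example/eme/license",
    "individualization-request": "https://app.example/eme/individualization",
}


def endpoint_for(message_type: str) -> str:
    """Pick the URL to send a CDM message to, based on its type."""
    try:
        return ENDPOINTS[message_type]
    except KeyError:
        raise ValueError(f"unhandled message type: {message_type!r}") from None
```

Without a distinct message type, the application would have to
inspect the opaque message body to make this choice.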

> The potential for cookies to be passed is quite concerning.

Yes. With Google Analytics and key server as a service, too.

>> > (I am assuming the reason for the proxying is that the
>> > "individualization server" is not run by the application provider.)
>>
>> The assumption I have is that the individualization server would be
>> run by the CDM vendor.
>
>
> If the central server is just getting some bits that are unique to the
> computer, the origin(s), and a randomly-generated salt, it would seem that
> the content provider's license server could handle this (unless the goal is
> to build a central repository).

If the CDM vendor allows the content provider to operate this piece of
infrastructure, sure.

(While I don't have a problem with explaining the privacy measures we
are putting in place in Firefox, I find this level of vetting for a
new enum item rather surprising. Have spec edit requests catering to
Microsoft's needs been vetted on this level of detail by this Task
Force? Where can I read a similar vetting of the privacy properties of
Microsoft's solution? Or Apple's in response to the initData changes
that catered, in practice, only to Apple?)

-- 
Henri Sivonen
hsivonen@hsivonen.fi
https://hsivonen.fi/
Received on Monday, 27 October 2014 14:24:36 UTC